Skip to content

Conversation

@therealdarkknight
Copy link
Contributor

Addresses issue 198 using the approach documented in the discussion of PR 209.

When creating an index with no dimension specified on an empty table, we set the dimension in the index header to 0. Later, during the first insert, we infer the dimension from the inserted first row and then update it in the index header.

… 0 and then later update it after inferring dimension from first insert
Copy link
Collaborator

@var77 var77 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.
Do you think adding this code

isNull = true;
while(isNull) {
    // Get the indexed column out of the row and return it's dimensions
    datum = heap_getattr(tuple, indexCol, RelationGetDescr(heap), &isNull);
    if(!isNull) {
        array = DatumGetArrayTypePCopy(datum);
        n_items = ArrayGetNItems(ARR_NDIM(array), ARR_DIMS(array));
        break;
    }
    tuple = heap_getnext(scan, ForwardScanDirection);
    if(tuple == NULL) {
        heap_endscan(scan);
        return 0;
    }
}

In line 246 makes sense?

Because now this will fail to infer the dimensions:

create table test_emp (v real[]);
insert into test_emp (v) values (NULL),( '{1,1,1}');
create index on test_emp using hnsw(v);

cc: @Ngalstyan4

@therealdarkknight
Copy link
Contributor Author

Looks good. Do you think adding this code

isNull = true;
while(isNull) {
    // Get the indexed column out of the row and return it's dimensions
    datum = heap_getattr(tuple, indexCol, RelationGetDescr(heap), &isNull);
    if(!isNull) {
        array = DatumGetArrayTypePCopy(datum);
        n_items = ArrayGetNItems(ARR_NDIM(array), ARR_DIMS(array));
        break;
    }
    tuple = heap_getnext(scan, ForwardScanDirection);
    if(tuple == NULL) {
        heap_endscan(scan);
        return 0;
    }
}

In line 246 makes sense?

Because now this will fail to infer the dimensions:

create table test_emp (v real[]);
insert into test_emp (v) values (NULL),( '{1,1,1}');
create index on test_emp using hnsw(v);

cc: @Ngalstyan4

Yes, great point! I agree with this change. As far as I know, this behavior was present prior to this PR/issue (I just realized I also brought it up earlier, see number 2 in my first PR for this ).

I have some other ideas on how the way we do dimension inference can be refactored-- see number 3 in my first PR above. Since this edit is relevant to index creation on non-empty tables, (and this issue/PR is concerned with index creation on empty tables) I can submit a separate PR with this change and also refactor those functions there.

What do you think?

@var77
Copy link
Collaborator

var77 commented Dec 22, 2023

Yep that will work! Actually maybe we won't need that as per discussion with Narek we decided that the tradeoff of UX doesn't worth the complexity added (counting edge cases as well)

Copy link
Contributor

@Ngalstyan4 Ngalstyan4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small ask, then should be good to go!

begin;
-- Our index then infers the dimension from the first inserted row
INSERT INTO small_world5 (id, v) VALUES
('000', '{1,0,0,0,1}'),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you insert a NULL value here first, before inserting anything else, to trigger this case ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants