I'm encountering an issue with my PostgreSQL + pgvector setup:
I have a table containing vectors and a category_id, and I always need to filter on a specific category_id. However, since I have hundreds of distinct categories and the filter is applied after the approximate index scan, my queries suffer from very low recall.
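For context, here's a minimal sketch of the kind of schema and query involved (the table name, column names, and vector dimension are illustrative, not my actual ones):

```sql
-- Hypothetical table: each row has an embedding and a category.
CREATE TABLE items (
    id          bigserial PRIMARY KEY,
    category_id integer NOT NULL,
    embedding   vector(768)
);

-- Approximate (HNSW) index over all rows, regardless of category.
CREATE INDEX ON items USING hnsw (embedding vector_l2_ops);

-- The index scan returns the global nearest neighbors first, and the
-- category filter is applied afterwards. For a rare category_id, few
-- (or none) of those candidates survive the WHERE clause, so recall
-- is poor even though matching rows exist.
SELECT id
FROM items
WHERE category_id = $1              -- the category being filtered on
ORDER BY embedding <-> $2           -- $2 is the query embedding
LIMIT 10;
```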
I understand that partitioning is recommended in such cases, but my challenge is that new category_id values are frequently added, and I need efficient indexing on them immediately for performance reasons.
I'm using Django, and due to external constraints, I'm stuck on PostgreSQL 16.3, meaning I can't upgrade pgvector to 0.8.0 to leverage iterative scanning.
Has anyone faced a similar issue? How did you manage to improve recall in this scenario?
Thanks!
Hi @ggalpra, check out the filtering docs for a list of options. I'd start with a B-tree index on category_id. For iterative scanning, pgvector 0.8.0 supports Postgres 13+, so you can upgrade pgvector even while staying on PostgreSQL 16.3. And for partitioning, you can use hash partitioning if categories are frequently added.
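A rough sketch of those options, reusing the illustrative `items` table from above (all names hypothetical):

```sql
-- Option 1: a plain B-tree index on the filter column. For a
-- selective category_id, the planner may then prefer an exact scan
-- over just that category (full recall) instead of the HNSW index.
CREATE INDEX ON items (category_id);

-- Option 2: iterative index scans (requires pgvector 0.8.0, which
-- supports Postgres 13+, so it works on PostgreSQL 16.3). The scan
-- keeps fetching candidates until enough rows pass the filter.
SET hnsw.iterative_scan = strict_order;  -- or relaxed_order

-- Option 3: hash partitioning on category_id. New categories hash
-- into existing partitions, so no DDL is needed per new category.
CREATE TABLE items_partitioned (
    id          bigserial,
    category_id integer NOT NULL,
    embedding   vector(768)
) PARTITION BY HASH (category_id);

CREATE TABLE items_part_0 PARTITION OF items_partitioned
    FOR VALUES WITH (MODULUS 4, REMAINDER 0);
-- ...and likewise for remainders 1 through 3. Each partition gets its
-- own, smaller HNSW index, so a query filtered to one category scans
-- far fewer candidates.
CREATE INDEX ON items_part_0 USING hnsw (embedding vector_l2_ops);
```

With hash partitioning, an equality filter on category_id prunes the query to a single partition, and new category_id values land in the existing partitions automatically, so you don't need to create anything when a category is added.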