Alembic DDL mismatch for HALFVEC index #118

Closed
jleguina opened this issue Feb 7, 2025 · 1 comment

Comments

jleguina commented Feb 7, 2025

Hi,

I'm facing two issues when using Alembic and SQLAlchemy to create halfvec indexes.

1. The DDL generated by Alembic for the index is invalid

My index:

Index(
    "ix_chunks_vector",
    func.cast(Chunk.vector, HALFVEC(settings.EMBEDDING_DIMS)).label("vector"),
    postgresql_using="hnsw",
    postgresql_with={"m": 16, "ef_construction": 64},
    postgresql_ops={"vector": "halfvec_cosine_ops"},
)

Which translates to Alembic as:

    op.create_index(
        "ix_chunks_vector",
        "chunks",
        [sa.text("CAST(vector AS HALFVEC(3072))")],
        unique=False,
        postgresql_using="hnsw",
        postgresql_with={"m": 16, "ef_construction": 64},
        postgresql_ops={"vector": "halfvec_cosine_ops"},
    )

Which in turn becomes:

CREATE INDEX ix_chunks_vector 
ON chunks
USING hnsw (CAST(vector AS HALFVEC(3072))) 
WITH (m = 16, ef_construction = 64)

As you can see, the operator class from postgresql_ops is not inserted into the statement: hnsw (CAST(vector AS HALFVEC(3072))) should be hnsw (CAST(vector AS HALFVEC(3072)) halfvec_cosine_ops)

2. Manually altering the DDL results in a mismatch the next time I generate a migration

If I manually correct the query to be:

CREATE INDEX ix_chunks_vector 
ON chunks
USING hnsw (CAST(vector AS HALFVEC(3072)) halfvec_cosine_ops) 
WITH (m = 16, ef_construction = 64)
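
One way to apply a hand-corrected statement like this from a migration is to pass raw SQL to Alembic's op.execute. The following is only a minimal sketch of building that string. The helper name hnsw_index_ddl and its parameters are hypothetical, not part of Alembic, SQLAlchemy, or pgvector, and it assumes all arguments are trusted constants, since nothing is quoted or escaped.

```python
def hnsw_index_ddl(name, table, expression, opclass, with_options):
    """Build a CREATE INDEX statement with the operator class placed
    directly after the indexed expression.

    Hypothetical helper: arguments are interpolated as-is, so only pass
    trusted constants, never user input.
    """
    options = ", ".join(f"{key} = {value}" for key, value in with_options.items())
    return (
        f"CREATE INDEX {name} ON {table} "
        f"USING hnsw ({expression} {opclass}) "
        f"WITH ({options})"
    )


# Values taken from the index described in this issue.
sql = hnsw_index_ddl(
    name="ix_chunks_vector",
    table="chunks",
    expression="CAST(vector AS HALFVEC(3072))",
    opclass="halfvec_cosine_ops",
    with_options={"m": 16, "ef_construction": 64},
)

# Inside a migration's upgrade() this could be applied with:
#     op.execute(sql)
# and reverted in downgrade() with:
#     op.execute("DROP INDEX ix_chunks_vector")
```

This only covers creating the index; it does not change what autogenerate compares afterwards.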

The next time I run Alembic autogenerate, it tries to drop this index and create one with the wrong DDL again:

    op.drop_index(
        "ix_chunks_vector",
        table_name="chunks",
        postgresql_with={"m": "16", "ef_construction": "64"},
        postgresql_using="hnsw",
    )
    op.create_index(
        "ix_chunks_vector",
        "chunks",
        [sa.text("CAST(vector AS HALFVEC(3072))")],
        unique=False,
        postgresql_using="hnsw",
        postgresql_with={"m": 16, "ef_construction": 64},
        postgresql_ops={"vector": "halfvec_cosine_ops"},
    )

I'm struggling to find a way to define the index appropriately to avoid this.

Thanks.


For context, here is my ORM table definition:

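from pgvector.sqlalchemy import HALFVEC, Vector
from sqlalchemy import Index, func
from sqlalchemy.orm import Mapped, mapped_column

# Base and settings are project-specific and not shown here.
class Chunk(Base):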
    __tablename__ = "chunks"

    # I've tried setting index to True and False, no change
    vector: Mapped[list[float]] = mapped_column(
        Vector(settings.EMBEDDING_DIMS), unique=False, nullable=False
    )

Index(
    "ix_chunks_vector",
    func.cast(Chunk.vector, HALFVEC(settings.EMBEDDING_DIMS)).label("vector"),
    postgresql_using="hnsw",
    postgresql_with={"m": 16, "ef_construction": 64},
    postgresql_ops={"vector": "halfvec_cosine_ops"},
)

I am using:

[[package]]
name = "pgvector"
version = "0.3.6"

[[package]]
name = "alembic"
version = "1.14.1"

[[package]]
name = "sqlalchemy"
version = "2.0.36"

jleguina commented Feb 7, 2025

Seems like an Alembic bug. Linking here in case someone else faces this: sqlalchemy/alembic#1603
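
In the meantime, a possible stopgap is to keep autogenerate from touching the index at all, using the include_object hook that Alembic accepts in context.configure. The following is a minimal sketch under that assumption. The set name MANUALLY_MANAGED_INDEXES is hypothetical, and the index would then have to be created and maintained by hand-written migrations.

```python
# Hypothetical set of index names that are maintained by hand-written
# migrations and should be skipped by autogenerate.
MANUALLY_MANAGED_INDEXES = {"ix_chunks_vector"}


def include_object(obj, name, type_, reflected, compare_to):
    """Autogenerate filter: return False for objects that should be ignored.

    Alembic passes these five arguments positionally for each object it
    considers; type_ is a string such as "table", "column", or "index".
    """
    if type_ == "index" and name in MANUALLY_MANAGED_INDEXES:
        return False
    return True


# In env.py, the hook would be passed alongside the existing arguments:
#     context.configure(
#         connection=connection,
#         target_metadata=target_metadata,
#         include_object=include_object,
#     )
```

With this filter in place, autogenerate should no longer propose the drop_index / create_index pair for ix_chunks_vector, at the cost of also not detecting genuine changes to that index.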

jleguina closed this as completed Feb 7, 2025