= Couchbase

This section walks you through setting up `CouchbaseSearchVectorStore` to store document embeddings and perform similarity searches using Couchbase.

link:https://docs.couchbase.com/server/current/vector-search/vector-search.html[Couchbase] is a distributed JSON document database that also provides many of the capabilities of a relational DBMS. Among other features, it lets you store vector embeddings and retrieve them with vector search.

== Prerequisites

A running Couchbase instance. The following options are available:

* link:https://hub.docker.com/_/couchbase/[Docker]
* link:https://cloud.couchbase.com/[Capella - Couchbase as a Service]
* link:https://www.couchbase.com/downloads/?family=couchbase-server[Install Couchbase locally]
* link:https://www.couchbase.com/downloads/?family=open-source-kubernetes[Couchbase Kubernetes Operator]

== Auto-configuration

Spring AI provides Spring Boot auto-configuration for the Couchbase Vector Store.
To enable it, add the following dependency to your project's Maven `pom.xml` file:

[source,xml]
----
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-couchbase-store-spring-boot-starter</artifactId>
</dependency>
----

or to your Gradle `build.gradle` build file.

[source,groovy]
----
dependencies {
    implementation 'org.springframework.ai:spring-ai-couchbase-store-spring-boot-starter'
}
----

NOTE: Couchbase vector search is available only in Couchbase Server 7.6 and later, and requires Java SDK 3.6.0 or later.

TIP: Refer to the xref:getting-started.adoc#dependency-management[Dependency Management] section to add the Spring AI BOM to your build file.

TIP: Refer to the xref:getting-started.adoc#repositories[Repositories] section to add Milestone and/or Snapshot Repositories to your build file.

The vector store implementation can initialize the configured bucket, scope, collection, and search index for you with default options, but you must opt in: set the `spring.ai.vectorstore.couchbase.initialize-schema` property to `true`, or call `initializeSchema(true)` on the builder when you configure the store manually.

NOTE: This is a breaking change! In earlier versions of Spring AI, this schema initialization happened by default.

See the list of <<couchbasevector-properties,configuration parameters>> for the vector store to learn about the default values and configuration options.

Additionally, you will need a configured `EmbeddingModel` bean. Refer to the xref:api/embeddings.adoc#available-implementations[EmbeddingModel] section for more information.

Now you can auto-wire the `CouchbaseSearchVectorStore` as a vector store in your application.

[source,java]
----
@Autowired VectorStore vectorStore;

// ...

List<Document> documents = List.of(
    new Document("Spring AI rocks!! Spring AI rocks!! Spring AI rocks!! Spring AI rocks!! Spring AI rocks!!", Map.of("meta1", "meta1")),
    new Document("The World is Big and Salvation Lurks Around the Corner"),
    new Document("You walk forward facing the past and you turn back toward the future.", Map.of("meta2", "meta2")));

// Add the documents to Couchbase
vectorStore.add(documents);

// Retrieve the documents most similar to a query
List<Document> results = vectorStore.similaritySearch(
    SearchRequest.builder().query("Spring").topK(5).build());
----
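
Conceptually, `similaritySearch` embeds the query text, scores it against the stored vectors, and returns the `topK` best matches. The following is a minimal in-memory sketch of that idea in plain Java 17. It is not how `CouchbaseSearchVectorStore` works internally, since Couchbase runs the search server-side against its search index. The class `TopKSketch` and the three-dimensional embeddings are hypothetical stand-ins for real `EmbeddingModel` output.

[source,java]
----
import java.util.Comparator;
import java.util.List;

// Conceptual, in-memory illustration of a top-K similarity search.
// Hypothetical: this is not how CouchbaseSearchVectorStore is implemented.
final class TopKSketch {

    // A stored document: its text plus a (hypothetical) embedding vector.
    record StoredDoc(String text, double[] embedding) {}

    // A search hit: the document text and its similarity score.
    record Hit(String text, double score) {}

    // Dot-product similarity: larger values mean more similar vectors.
    static double dotProduct(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same number of dimensions");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // Scores every stored document against the query vector and keeps the topK best.
    static List<Hit> similaritySearch(List<StoredDoc> docs, double[] queryEmbedding, int topK) {
        return docs.stream()
                .map(doc -> new Hit(doc.text(), dotProduct(doc.embedding(), queryEmbedding)))
                .sorted(Comparator.comparingDouble(Hit::score).reversed())
                .limit(topK)
                .toList();
    }

    public static void main(String[] args) {
        List<StoredDoc> docs = List.of(
                new StoredDoc("Spring AI rocks!!", new double[] {0.9, 0.1, 0.0}),
                new StoredDoc("The World is Big and Salvation Lurks Around the Corner", new double[] {0.1, 0.8, 0.2}),
                new StoredDoc("You walk forward facing the past and you turn back toward the future.", new double[] {0.0, 0.3, 0.9}));

        // Stand-in for the embedding of the query text "Spring".
        double[] queryEmbedding = {1.0, 0.0, 0.0};

        for (Hit hit : similaritySearch(docs, queryEmbedding, 2)) {
            System.out.println(hit.score() + "  " + hit.text());
        }
    }
}
----

With this query vector, the first document scores highest and is returned first; raising `topK` returns additional, lower-scoring matches.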

[[couchbasevector-properties]]
=== Configuration Properties

To connect to Couchbase and use the `CouchbaseSearchVectorStore`, you need to provide access details for your instance.
You can provide a simple configuration through Spring Boot's `application.properties`,

[source,properties]
----
spring.ai.openai.api-key=<key>
spring.couchbase.connection-string=<conn_string>
spring.couchbase.username=<username>
spring.couchbase.password=<password>
----

environment variables,

[source,bash]
----
export SPRING_COUCHBASE_CONNECTION_STRING=<couchbase connection string like couchbase://localhost>
export SPRING_COUCHBASE_USERNAME=<couchbase username>
export SPRING_COUCHBASE_PASSWORD=<couchbase password>
# API key if needed, e.g. OpenAI
export SPRING_AI_OPENAI_API_KEY=<api-key>
----

or a mix of both.
For example, you can store your password in an environment variable and keep the rest in the plain `application.properties` file.

NOTE: If you put these `export` commands in a shell script for later reuse, run it by "sourcing" the file, for example `source <your_script_name>.sh`, before starting your application, so that the variables are set in your current shell.
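
Spring Boot derives the environment variable name from the property name. The sketch below applies the rule described in the Spring Boot reference documentation (replace dots with underscores, remove dashes, convert to uppercase) and also the variant that turns dashes into underscores, which Spring Boot's relaxed binding also accepts and which the `export` commands above use. `EnvVarNames` and its methods are hypothetical helpers for illustration only; Spring Boot performs this mapping itself.

[source,java]
----
import java.util.Locale;

// Hypothetical helpers illustrating how Spring Boot maps property names to
// environment variable names. Spring Boot does this mapping itself; an
// application never needs code like this.
final class EnvVarNames {

    // Canonical rule: dots -> underscores, dashes removed, uppercase.
    // spring.couchbase.connection-string -> SPRING_COUCHBASE_CONNECTIONSTRING
    static String canonical(String propertyName) {
        return propertyName.replace('.', '_').replace("-", "").toUpperCase(Locale.ROOT);
    }

    // Variant with dashes turned into underscores, as in the export commands:
    // spring.couchbase.connection-string -> SPRING_COUCHBASE_CONNECTION_STRING
    static String dashesAsUnderscores(String propertyName) {
        return propertyName.replace('.', '_').replace('-', '_').toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String[] properties = {
                "spring.couchbase.connection-string",
                "spring.couchbase.username",
                "spring.couchbase.password",
                "spring.ai.openai.api-key"
        };
        for (String property : properties) {
            System.out.println(property + " -> " + canonical(property) + " or " + dashesAsUnderscores(property));
        }
    }
}
----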

Spring Boot's auto-configuration for Couchbase creates a `Cluster` bean that the `CouchbaseSearchVectorStore` uses.

The Spring Boot properties starting with `spring.couchbase.*` configure the Couchbase cluster instance:

|===
|Property | Description | Default Value

| `spring.couchbase.connection-string` | A Couchbase connection string. | `couchbase://localhost`
| `spring.couchbase.password` | Password for authentication with Couchbase. | -
| `spring.couchbase.username` | Username for authentication with Couchbase. | -
| `spring.couchbase.env.io.min-endpoints` | Minimum number of sockets per node. | 1
| `spring.couchbase.env.io.max-endpoints` | Maximum number of sockets per node. | 12
| `spring.couchbase.env.io.idle-http-connection-timeout` | Length of time an HTTP connection may remain idle before it is closed and removed from the pool. | 1s
| `spring.couchbase.env.ssl.enabled` | Whether to enable SSL support. Enabled automatically if a "bundle" is provided unless specified otherwise. | -
| `spring.couchbase.env.ssl.bundle` | SSL bundle name. | -
| `spring.couchbase.env.timeouts.connect` | Bucket connect timeout. | 10s
| `spring.couchbase.env.timeouts.disconnect` | Bucket disconnect timeout. | 10s
| `spring.couchbase.env.timeouts.key-value` | Timeout for operations on a specific key-value. | 2500ms
| `spring.couchbase.env.timeouts.key-value-durable` | Timeout for operations on a specific key-value with a durability level. | 10s
| `spring.couchbase.env.timeouts.query` | SQL++ query operations timeout. | 75s
| `spring.couchbase.env.timeouts.view` | Regular and geospatial view operations timeout. | 75s
| `spring.couchbase.env.timeouts.search` | Timeout for the search service. | 75s
| `spring.couchbase.env.timeouts.analytics` | Timeout for the analytics service. | 75s
| `spring.couchbase.env.timeouts.management` | Timeout for the management operations. | 75s
|===

Properties starting with the `spring.ai.vectorstore.couchbase.*` prefix configure the `CouchbaseSearchVectorStore`.

|===
|Property | Description | Default Value

|`spring.ai.vectorstore.couchbase.index-name` | The name of the search index used to store and query the vectors. | `spring-ai-document-index`
|`spring.ai.vectorstore.couchbase.bucket-name` | The name of the Couchbase bucket, parent of the scope. | `default`
|`spring.ai.vectorstore.couchbase.scope-name` | The name of the Couchbase scope, parent of the collection. Search queries are executed in the scope context. | `_default`
|`spring.ai.vectorstore.couchbase.collection-name` | The name of the Couchbase collection that stores the documents. | `_default`
|`spring.ai.vectorstore.couchbase.dimensions` | The number of dimensions in the vector. | 1536
|`spring.ai.vectorstore.couchbase.similarity` | The similarity function to use. | `dot_product`
|`spring.ai.vectorstore.couchbase.optimization` | The index optimization to use. | `recall`
|`spring.ai.vectorstore.couchbase.initialize-schema` | Whether to initialize the required schema. | `false`
|===
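
The `dimensions` value has to match the length of the vectors your `EmbeddingModel` produces; the default of 1536 matches, for example, OpenAI's `text-embedding-ada-002` and `text-embedding-3-small` models. The sketch below shows the kind of check you might run, for instance in a test, when switching embedding models. `DimensionCheck` is a hypothetical helper, not part of Spring AI, and zero-filled `float[]` arrays stand in for real embeddings.

[source,java]
----
import java.util.List;

// Hypothetical helper, not part of Spring AI: verifies that every embedding has
// the number of dimensions configured for the Couchbase vector index.
final class DimensionCheck {

    // Assumed to mirror spring.ai.vectorstore.couchbase.dimensions (default 1536).
    static final int CONFIGURED_DIMENSIONS = 1536;

    // Throws IllegalStateException on the first embedding whose length differs from expected.
    static void requireDimensions(List<float[]> embeddings, int expected) {
        for (int i = 0; i < embeddings.size(); i++) {
            int actual = embeddings.get(i).length;
            if (actual != expected) {
                throw new IllegalStateException("Embedding " + i + " has " + actual
                        + " dimensions, but the index is configured for " + expected);
            }
        }
    }

    public static void main(String[] args) {
        // Zero-filled arrays stand in for real EmbeddingModel output.
        requireDimensions(List.of(new float[1536], new float[1536]), CONFIGURED_DIMENSIONS);
        System.out.println("All embeddings match the configured dimensions.");

        try {
            requireDimensions(List.of(new float[768]), CONFIGURED_DIMENSIONS);
        } catch (IllegalStateException e) {
            System.out.println("Mismatch detected: " + e.getMessage());
        }
    }
}
----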

The following similarity functions are available:

* `l2_norm`
* `dot_product`

The following index optimizations are available:

* `recall`
* `latency`

More details about each are available in the https://docs.couchbase.com/server/current/search/child-field-options-reference.html[Couchbase Documentation] on vector searches.
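
To compare the two similarity functions, the sketch below computes both on plain `double[]` vectors, assuming `l2_norm` ranks by Euclidean distance (smaller is more similar) and `dot_product` by the dot product (larger is more similar). Because `|a - b|^2 = |a|^2 + |b|^2 - 2(a . b)`, the two give the same ranking when all vectors have unit length. `SimilarityFunctions` is a hypothetical class for illustration only.

[source,java]
----
// Hypothetical illustration comparing the two similarity functions on plain vectors.
final class SimilarityFunctions {

    // dot_product: larger values mean more similar vectors.
    static double dotProduct(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // l2_norm, assumed here to be Euclidean distance: smaller values mean more similar vectors.
    static double l2Distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    // Scales a vector to unit length, as many embedding models already do.
    static double[] normalize(double[] v) {
        double norm = Math.sqrt(dotProduct(v, v));
        double[] unit = new double[v.length];
        for (int i = 0; i < v.length; i++) {
            unit[i] = v[i] / norm;
        }
        return unit;
    }

    public static void main(String[] args) {
        double[] query = normalize(new double[] {1.0, 2.0, 2.0});
        double[] close = normalize(new double[] {1.0, 2.0, 1.5});
        double[] far = normalize(new double[] {2.0, -1.0, 0.5});

        // Both functions agree: "close" has the higher dot product and the smaller distance.
        System.out.printf("close: dot=%.3f l2=%.3f%n", dotProduct(query, close), l2Distance(query, close));
        System.out.printf("far:   dot=%.3f l2=%.3f%n", dotProduct(query, far), l2Distance(query, far));

        // For unit vectors, l2^2 equals 2 - 2 * dot (up to floating-point rounding).
        double l2 = l2Distance(query, close);
        System.out.printf("l2^2=%.6f  2-2*dot=%.6f%n", l2 * l2, 2 - 2 * dotProduct(query, close));
    }
}
----

If your embeddings are not normalized, the two functions can rank results differently, which is one factor to weigh when choosing between them.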

== Metadata Filtering

You can leverage the generic, portable link:https://docs.spring.io/spring-ai/reference/api/vectordbs.html#_metadata_filters[metadata filters] with the Couchbase store.

For example, you can use either the text expression language:

[source,java]
----
vectorStore.similaritySearch(
    SearchRequest.builder()
        .query("The World")
        .topK(TOP_K)
        .filterExpression("author in ['john', 'jill'] && article_type == 'blog'")
        .build());
----

or programmatically using the `Filter.Expression` DSL:

[source,java]
----
FilterExpressionBuilder b = new FilterExpressionBuilder();

vectorStore.similaritySearch(SearchRequest.builder()
    .query("The World")
    .topK(TOP_K)
    .filterExpression(b.and(
        b.in("author", "john", "jill"),
        b.eq("article_type", "blog")).build())
    .build());
----

NOTE: These filter expressions are converted into the equivalent Couchbase SQL++ filters.
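
The exact SQL++ produced depends on Spring AI's Couchbase filter converter. As a rough illustration only, the sketch below models the example expression `author in ['john', 'jill'] && article_type == 'blog'` as a tiny hypothetical expression tree and renders it as a SQL++ `WHERE` condition. `FilterToSqlPlusPlus` and its record types are invented for this example; the real converter works on Spring AI's `Filter.Expression` classes and may quote or qualify field names differently. Requires Java 17.

[source,java]
----
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical mini expression tree and SQL++ renderer, for illustration only.
// Spring AI's actual converter uses its own Filter.Expression classes and its
// output may differ (quoting, field qualification, parentheses).
final class FilterToSqlPlusPlus {

    sealed interface Expr permits Eq, In, And {}

    // field == value
    record Eq(String field, String value) implements Expr {}

    // field in [values...]
    record In(String field, List<String> values) implements Expr {}

    // left && right
    record And(Expr left, Expr right) implements Expr {}

    // Renders a SQL++ string literal, escaping backslashes and double quotes.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Renders a field name as an escaped SQL++ identifier.
    static String identifier(String field) {
        return "`" + field + "`";
    }

    // Recursively renders the expression tree as a SQL++ condition.
    static String toSqlPlusPlus(Expr expr) {
        if (expr instanceof Eq eq) {
            return identifier(eq.field()) + " = " + quote(eq.value());
        }
        if (expr instanceof In in) {
            String values = in.values().stream()
                    .map(FilterToSqlPlusPlus::quote)
                    .collect(Collectors.joining(", "));
            return identifier(in.field()) + " IN [" + values + "]";
        }
        if (expr instanceof And and) {
            return "(" + toSqlPlusPlus(and.left()) + " AND " + toSqlPlusPlus(and.right()) + ")";
        }
        throw new IllegalArgumentException("Unsupported expression: " + expr);
    }

    public static void main(String[] args) {
        // Mirrors: author in ['john', 'jill'] && article_type == 'blog'
        Expr filter = new And(
                new In("author", List.of("john", "jill")),
                new Eq("article_type", "blog"));
        System.out.println("WHERE " + toSqlPlusPlus(filter));
    }
}
----

Keeping the portable expression as a tree and rendering it recursively is what lets the same filter be translated for different vector stores.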

== Manual Configuration

Instead of using the Spring Boot auto-configuration, you can manually configure the Couchbase vector store. To do so, add the `spring-ai-couchbase-store` dependency to your project's Maven `pom.xml` file:

[source,xml]
----
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-couchbase-store</artifactId>
</dependency>
----

or to your Gradle `build.gradle` build file.

[source,groovy]
----
dependencies {
    implementation 'org.springframework.ai:spring-ai-couchbase-store'
}
----

Create a Couchbase `Cluster` bean.
Read the link:https://docs.couchbase.com/java-sdk/current/hello-world/start-using-sdk.html[Couchbase Documentation] for more in-depth information about configuring a custom `Cluster` instance.

[source,java]
----
@Bean
public Cluster cluster() {
    // com.couchbase.client.java.Cluster; replace the placeholders with your own connection details
    return Cluster.connect("couchbase://localhost", "username", "password");
}
----

Then create the `CouchbaseSearchVectorStore` bean using the builder pattern:

[source,java]
----
@Bean
public VectorStore couchbaseSearchVectorStore(Cluster cluster, EmbeddingModel embeddingModel) {
    return CouchbaseSearchVectorStore
        .builder(cluster, embeddingModel)
        .bucketName("test")
        .scopeName("test")
        .collectionName("test")
        // Opt in to creating the bucket, scope, collection and search index
        .initializeSchema(true)
        .build();
}

// OpenAI API key, read from the spring.ai.openai.api-key property
@Value("${spring.ai.openai.api-key}")
private String openaiKey;

// This can be any EmbeddingModel implementation.
@Bean
public EmbeddingModel embeddingModel() {
    return new OpenAiEmbeddingModel(OpenAiApi.builder().apiKey(this.openaiKey).build());
}
----

== Limitations

NOTE: The following Couchbase services must be enabled: Data, Query, Index, and Search. Data and Search alone may be enough for basic similarity search, but Query and Index are required to support the complete metadata filtering mechanism.