ESQL: Load from _source if a field was ignore_above-ed #124678

Open
nik9000 opened this issue Mar 12, 2025 · 1 comment
Labels
:Analytics/ES|QL AKA ESQL >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo)

Comments

@nik9000

nik9000 commented Mar 12, 2025

Description

Elasticsearch's ignore_above feature skips indexing a keyword field value if it is longer than some number of characters. That's useful to preserve Lucene's mind, but we lose the ability to search that field, and ES|QL returns null for it. It'd be good if we could load the field from _source in that case.

Example:

curl -uelastic:password -XDELETE localhost:9200/long_field
curl -uelastic:password -XPUT -HContent-Type:application/json localhost:9200/long_field?pretty -d'{
  "mappings": {
    "properties": {
      "url": {
        "properties": {
          "query": {
            "type": "keyword",
            "ignore_above": 10
          }
        }
      }
    }
  },
  "settings": {}
}'

curl -uelastic:password -XPOST -HContent-Type:application/json localhost:9200/long_field/_doc?refresh -d'{
  "url": {
    "query": "short"
  }
}'
curl -uelastic:password -XPOST -HContent-Type:application/json localhost:9200/long_field/_doc?refresh -d'{
  "url": {
    "query": "words words words"
  }
}'
curl -uelastic:password -XPOST -HContent-Type:application/json localhost:9200/_query?pretty -d'{
  "query": "FROM long_field"
}'

Makes:

["short"],
[null]

But after this change it'd make:

["short"],
["words words words"]

The names of these ignored fields are indexed and have a special doc value in the _ignored metadata field:

                context.doc().add(new SortedSetDocValuesField(NAME, new BytesRef(ignoredField)));
                context.doc().add(new StringField(NAME, ignoredField, Field.Store.NO));

This is generally cheap to maintain because there aren't that many unique field values, and it powers the missing query. We could use the doc values to decide when to load these fields from _source. Pushed-down queries are much harder: they'd need to double-check the condition. I think. Something like that.
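
To make the loading idea concrete, here is a minimal sketch of the per-document decision, not the actual ES|QL block loader. It assumes the caller has already read the field's own doc value, the set of field names recorded for the document in _ignored, and the parsed _source. The class IgnoredSourceFallback and the methods loadKeyword and extract are hypothetical names invented for illustration.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; none of these names exist in Elasticsearch.
class IgnoredSourceFallback {

    /**
     * Picks the value to surface for a keyword field in one document.
     *
     * @param docValue      value read from the field's own doc values, or null if none
     * @param ignoredFields field names recorded for this document in _ignored
     * @param source        the document's parsed _source
     * @param fieldPath     dotted field name, e.g. "url.query"
     */
    static Object loadKeyword(String docValue, Set<String> ignoredFields,
                              Map<String, Object> source, String fieldPath) {
        // If any value of this field was ignored, doc values are incomplete,
        // so fall back to _source for the whole field.
        if (ignoredFields.contains(fieldPath)) {
            return extract(source, fieldPath);
        }
        // Otherwise doc values are authoritative (null means the field is absent).
        return docValue;
    }

    /** Walks a dotted path through nested maps; returns null if any step is missing. */
    static Object extract(Map<String, Object> source, String path) {
        Object current = source;
        for (String part : path.split("\\.")) {
            if (!(current instanceof Map)) {
                return null;
            }
            current = ((Map<?, ?>) current).get(part);
        }
        return current;
    }
}
```

With the two documents from the example, "short" would come straight from doc values, while "words words words" would come from _source because url.query is listed in that document's _ignored. Checking _ignored before trusting doc values is a deliberate choice in this sketch: a multi-valued field could have some values indexed and others ignored.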

@nik9000 nik9000 added >enhancement needs:triage Requires assignment of a team area label :Analytics/ES|QL AKA ESQL labels Mar 12, 2025
@elasticsearchmachine

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine elasticsearchmachine added Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) and removed needs:triage Requires assignment of a team area label labels Mar 12, 2025