Skip to content

Translate parameterized collections to inline collections of parameters, with optional bucketization #34346

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
roji opened this issue Aug 3, 2024 · 10 comments

Comments

@roji
Copy link
Member

roji commented Aug 3, 2024

When a collection parameter is referenced in a LINQ query, we currently allow it to be either parameterized in SQL (e.g. converted to a JSON array and processed with OPENJSON), or constantized (constant values integrated as-is in the SQL). Each method has its advantages and disadvantages (constantized queries allow for better query plans, parameterized queries have a single SQL and are better for plan/query caching).

A 3rd mode would be to integrate the values as SQL parameters, into an inline SQL collection (e.g. IN (@p1, @p2, @p3); this would allow the database the ability to take the collection's cardinality when planning (like the constantization option). As a further optimization on top, to partially mitigate the different-SQLs problem for collections of varying sizes, we'd have different sizing steps (or "buckets", e.g. 1, 5, 10, 20...); collections with e.g. 3 items would have their last item repeated to arrive at the next bucket size (5).

-- Constantization:
SELECT * FROM foo WHERE x IN (1, 2, 3);

-- Full parameterization:
SELECT * FROM foo WHERE x IN (SELECT v FROM OPENJSON(@values))

-- Inline collection with parameters:
SELECT * FROM foo WHERE x IN (@p1, @p2, @p3);

-- Inline collection with parameters, with bucketization:
SELECT * FROM foo WHERE x IN (@p1, @p2, @p3, @p3, @p3);

Notes:

  • We support parameterized collections not just in Contains, but in arbitrary LINQ querying. Since bucketization duplicates the last value, we can only do this in very query scenarios - mainly Contains; for example, we can't do this when Count() is composed on top of a parameterized collection.
  • We may consider making this mode the default, based on further perf investigations (#34347).
  • We have a report of inline collections of parameters being sometimes worse for query planning than constants (comment). We don't fully understand this yet, but this would obviously be important to understand, especially if we consider making this the default.
  • Since 8.0, users can already produce inline collections of parameters in SQL: Where(b => new[] { p1, p2, p3 }; but this works only if the size is fixed in the .NET collection. This issue is about having EF produce the same SQL but for when the entire .NET collection is parameterized in the LINQ query.
@PaulVrugt
Copy link

This seems like a very good idea

@IanKemp
Copy link

IanKemp commented Aug 5, 2024

How are you planning to implement this "bucketization"? Would the "bucket size" (e.g. 5 in the example) be specified by the end-user on a per-method-call basis, or as a configuration option in the DbContext, or is this something that EF Core would internally handle, or...? Obviously it's very early on but just trying to get a feel for what the team is thinking here WRT API shape and user control. - keeping in mind how the lack of user control around openjson has bit all involved.

@ErikEJ
Copy link
Contributor

ErikEJ commented Aug 5, 2024

I have shared this implementation for years, based on a proposal from @divega https://gist.github.com/ErikEJ/6ab62e8b9c226ecacf02a5e5713ff7bd

@ErikEJ
Copy link
Contributor

ErikEJ commented Aug 5, 2024

I also blogged about this years ago here: https://erikej.github.io/efcore/sqlserver/2020/03/30/ef-core-cache-pollution.html

@ErikEJ
Copy link
Contributor

ErikEJ commented Aug 5, 2024

This might also be useful (for SQL Server) https://github.com/dotnet/SqlClient/blob/main/release-notes/4.0/4.0.0.md#enable-optimized-parameter-binding

@roji
Copy link
Member Author

roji commented Aug 5, 2024

@IanKemp we'd probably provide both a context option to configure the default bucket size, and also a per-collection configuration as well - similar to how we're implementing context-wide and per-collection configurability for constantization/parameterization in 9 (see #32394 (comment)).

Note that this technique needs to be compared with just going back to full constantization, but with SQL Server RECOMPILE (see #34347). Inline collection with parameters does allow caching query plans where the collection cardinality is the same, but may also have some performance drawbacks and still causes some plan bloat (for different cardinalities).

@dazinator

This comment has been minimized.

@roji

This comment has been minimized.

@roji
Copy link
Member Author

roji commented Apr 10, 2025

Good point by @cincuranet: this technique can hit the SQL Server 2100 parameter limit.

@ErikEJ
Copy link
Contributor

ErikEJ commented Apr 10, 2025

@roji Yes, in the case of 2084 parameters, the user will have to bucket the values.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants