Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cosmos: remove automatic composition of discriminator filter in raw SQL queries #26124

Closed
JeremyLikness opened this issue Sep 21, 2021 · 16 comments · Fixed by #26292
Closed

Cosmos: remove automatic composition of discriminator filter in raw SQL queries #26124

JeremyLikness opened this issue Sep 21, 2021 · 16 comments · Fixed by #26292

Comments

@JeremyLikness
Copy link
Member

To ensure the correct type is project for materialization, FromSqlRaw wraps the query passed, so that this:

var contacts = await context.Contacts.FromSqlRaw("select * from Contacts").ToListAsync();

Really gets resolved as this:

SELECT c FROM (SELECT * From Contacts) c WHERE c.Discriminator = "Contact"

While this works fine in most scenarios, it causes issues when users project explicit columns. Although this works fine as a raw query and returns multiple records:

SELECT Id, Name, Phone, Address FROM Contacts

This will never return any records and doesn't throw an error or log a warning:

var contacts = await context.Contacts.FromSqlRaw("select Id, Name, Phone, Address from Contacts").ToListAsync();

Although this behavior is documented, it's not obvious if the documentation hasn't been read. It could lead to a lot of troubleshooting around a non-issue that just requires the user add one column to their select statement.

I suggest either logging a warning or throwing an exception when Discriminator isn't included so it's clear to the user it won't behave as expected and what to do for mitigation.

EF Core version: EF Core 6.0
Database provider: All
Target framework: .NET 6.0
Operating system: BeOS
IDE: Visual Studio 2022

@ajcvickers
Copy link
Contributor

@AndriySvyryd 6.0?

@AndriySvyryd
Copy link
Member

Yes, otherwise it would be a "breaking" change

@ajcvickers
Copy link
Contributor

@roji Nominating you to do this for 6.0. :-)

@ajcvickers ajcvickers added this to the 6.0.0 milestone Sep 24, 2021
@ajcvickers
Copy link
Contributor

@roji Ping.

@roji
Copy link
Member

roji commented Oct 4, 2021

On my plate for today, promise.

@roji
Copy link
Member

roji commented Oct 4, 2021

I've taken a look at this, and I'm not sure there's a good way of doing it. Here's the SQL we send to Cosmos:

SELECT c
FROM (
    SELECT c["id"], c["Region"], c["PostalCode"], c["Phone"], c["Fax"], c["CustomerID"], c["Country"], c["ContactTitle"], c["ContactName"], c["CompanyName"], c["City"], c["Address"] FROM root c
) c
WHERE (c["Discriminator"] = "Customer")

We simply compose the discriminator filter over the user's raw SQL; if that SQL happens to not return the discriminator, our filter simply returns no rows. Our shaper does make sure that the discriminator is correct, but because of the discriminator filter there's nothing here to shape (no results are returned).

One theoretical solution is for the filter to let through documents missing the discriminator (WHERE c["Discriminator"] = xxx OR NOT IS_DEFINED(c["Discriminator"])); this would make the results reach the shaper, which would promptly throw. However, this modifies pretty much all Cosmos queries and may have an impact on perf, only to detect and throw on an unsupported scenario (plus it pulls back all documents from the container). So I'm not too keen on doing this.

If anyone has a bright idea here or I've missed something, let me know.

@ajcvickers
Copy link
Contributor

@AndriySvyryd @smitpatel Thoughts on this?

@AndriySvyryd
Copy link
Member

We should verify whether the proposed change has any measurable impact on perf (and the cost charge), the worst case scenario would be when all items have a discriminator, but with a different value.

An alternative would be to not do any filtering and make the user do it, we would throw if we get a row with no discriminator or an unexpected one.

@roji
Copy link
Member

roji commented Oct 4, 2021

An alternative would be to not do any filtering and make the user do it, we would throw if we get a row with no discriminator or an unexpected one.

Does it really make sense to burden users with adding the discriminator filter for all their raw queries, just because they may forget to project the discriminator, an unsupported/negative scenario which should be relatively easily discoverable?

I feel a bit similarly about adding the NOT IS_DEFINED... I can look into the perf for that, but this feels a bit risky/too much to me just to flag this scenario...

@smitpatel
Copy link
Contributor

An alternative would be to not do any filtering and make the user do it, we would throw if we get a row with no discriminator or an unexpected one.

I would be in favor of this given the first proposal complicates the SQL.

Historically relational added discriminator filters to avoid rows with unknown discriminator value in TPH scenario. Additional filtering also turned out to be problematic there at times so we had to introduce complete discriminator mapping.
For Cosmos, we have discriminator because of sharing the collection. If you are using FromSql for non hierarchy

  • Is our behavior changing if the collection is not being shared by any other entity type?
  • When materializing value, does our check of discriminator value is really accurate? Should we just verify that discriminator value is expected one rather than trying to materialize value from server. e.g. if user gave us raw sql to product customers, it doesn't have to have discriminator value column projected out because we know what it is. This also aligns with relational behavior upto an extent that discriminator is not need to be projected without hierarchy.

@AndriySvyryd
Copy link
Member

an unsupported/negative scenario which should be relatively easily discoverable?

If @JeremyLikness had trouble figuring it out it doesn't bode well for the average user.

I feel a bit similarly about adding the NOT IS_DEFINED

When using raw SQL users should be very conscious of discriminators. I think that letting the user deal with it provides a learning opportunity and it also empowers the user to do it in a more efficient way.

@roji
Copy link
Member

roji commented Oct 4, 2021

OK, let's maybe give this a quick discussion in triage tomorrow? For the record I do think it's better to omit the filtering (requiring users to do it) than to add NOT IS_DEFINED, although I also think the current situation isn't that bad...

@smitpatel
Copy link
Contributor

My only issue with current situation is incorrect results without exception. At least if we drop the ball on user, either it works or they will get error about other entity records.

@roji
Copy link
Member

roji commented Oct 4, 2021

It's true we have incorrect results without exception, but the query will never ever return a single result the way it's written - this is very different from some of the more subtle "incorrect results" errors we've seen... It just feels like we shouldn't penalize users or do risky things perf-wise for it.

@smitpatel
Copy link
Contributor

Depending on app logic, incorrect result not returning any row can also cause data corruption which is same as any other incorrect result errors. Hence our priority always has been incorrect result > exception > slow query.

@roji
Copy link
Member

roji commented Oct 6, 2021

Design decision: we will remove the discriminator filter for raw SQL queries - it is the user's responsibility to write the correct SQL.

@roji roji changed the title FromRawSql should throw when Discriminator is not included in the result set Cosmos: remove automatic composition of discriminator filter in raw SQL queries Oct 9, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants