-
-
Notifications
You must be signed in to change notification settings - Fork 548
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
select ... not in ...
returns wrong results for nulls
#5954
Comments
Context for people like me who were confused by this: In MySQL, The trouble is that Reasoning behind this: If you think of NULL as representing a missing or unknown value, then it's impossible to demonstrate that A is not in a set if the set contains NULL. |
To elaborate on this, it seems like the two possible fixes would be either:
The trouble with solution 2 is that the other common way to write an anti join in mySql is Ironically, it looks like this other minimum query actually gets built as a LeftJoin, not an AntiJoin, so either it's being optimized by dolthub/go-mysql-server#1773 back into a LeftJoin, or it was never converted into an AntiJoin in the first place:
(Update: I didn't have dolthub/go-mysql-server#1773 pulled locally yet, so it seems like we're not converting this example to an AntiJoin.) So if we wanted to convert both into AntiJoins we'd need to keep some state that tells the builder how to handle NULLs. |
Some additional regression tests. It's important to note that we have the incorrect behavior when either side of the joins contain NULL: Test 2Setup:
Dolt:
MySQL:
Test 3Setup:
Dolt:
MySQL:
Plan:
|
It's simple enough to change the Unfortunately, this breaks a ton of our regression tests around AntiJoin. I'd want to look at performance metrics before deciding whether this is an acceptable workaround. Tim linked me to our performance benchmarking at https://docs.dolthub.com/sql-reference/benchmarks/latency. I'm going to figure out exactly how much our performance hurts if we're only able to generate AntiJoins for nonnull columns. |
dolthub/go-mysql-server#1791 fixes this in the vast majority of cases. There's still one corner case where we don't have MySQL compatibility:
Then MySQL evaluates the expression as NULL and we don't. I decided to let this sit in the back of my mind for a little bit to see if there was an elegant solution, but at a certain point I just had to acknowledge that this is still closer to MySQL than we were before, document how it still differs, and ship it. |
At a glance it is hard to tell if the exception case listed above is still relevant. We have fixed a lot of issues related to this, but all in response to specific queries with regressions. I might close this for now, if we have a specific case for repro we can always make a new issue. |
Setup:
Dolt:
MySQL:
Plan:
It seems like we should either avoid using the optimization in this case or fix the join conditions.
The text was updated successfully, but these errors were encountered: