Support `insert_all` and `upsert_all` using `MERGE` #1312
Extremely exciting! |
Upserting 50k records finishes in 2.07 seconds on my machine (gist). MySQL does it in 0.75 seconds (gist). We likely lose some time because MSSQL can only process 1000 rows at a time. I do this batching in Ruby, which is likely slow. So I think this looks good overall and is ready for a review by @aidanharan. |
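A minimal sketch of the Ruby-side batching described in this comment, assuming the rows arrive as an array of hashes. The constant and the `batches_for` helper are hypothetical names for illustration and are not taken from the adapter:

```ruby
# Hypothetical constant: the per-statement row limit mentioned in the comment.
MAX_ROWS_PER_STATEMENT = 1000

# Hypothetical helper: split the rows into batches of at most `batch_size`,
# so that each batch can be sent in its own statement.
def batches_for(rows, batch_size: MAX_ROWS_PER_STATEMENT)
  rows.each_slice(batch_size).to_a
end

# Illustrative data only: 2,500 simple post records.
rows = Array.new(2_500) { |i| { id: i + 1, title: "Post #{i + 1}" } }
batches = batches_for(rows)
batch_sizes = batches.map(&:size)
```

With 2,500 rows this yields three batches, the last one partially filled. Each batch would then need its own round trip to the database, which may account for part of the gap to MySQL.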
@andyundso Brilliant work! Will port the changes into the |
@aidanharan I realised while driving home today that there is likely a bug with the order of our values when doing an |
For future reference, the Rails PR mentioned in the comment above: rails/rails#54790 |
This PR adds support for `insert_all` and `upsert_all` in the `activerecord-sqlserver-adapter`. As initially proposed in #859 / #869, it uses `MERGE` to do so.

The benefit of `ON DUPLICATE` clauses in other database systems is that you can pass them all the data you have in your Rails app without de-duplicating it first. `MERGE`, by contrast, technically executes a join between the data we want to insert and the data in the target table. Therefore, the values in the columns of our source data that are joined on need to be unique. Unique here means unique across all primary keys and all unique indexes. The following Rails test showcases this well:

From the adapter, we can ask Rails about the unique indexes and all the primary keys for a given table. With that information, I run a `PARTITION BY` for each affected set of columns (in the example above, one is the combination of `author_id` and `name`, the second is just the `id` column as primary key) over the source data, retrieve a `ROW_NUMBER` and only keep the records whose `ROW_NUMBER` is 1.

The rest of the code is then mostly the same as in other database adapters. More details are provided in the code comments.
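To illustrate the de-duplication idea in plain Ruby, here is a rough in-memory sketch. It is not the adapter's code, which does this in SQL. The `deduplicate` helper and the sample rows are hypothetical, and the sketch assumes the unique keys are applied one after another and that the first row seen for each key wins; the SQL in this PR may number and filter rows differently.

```ruby
# Hypothetical helper: for every unique key (a list of column names), keep only
# the first row per distinct key value. This mimics "partition by the key
# columns and keep the row whose row number is 1".
def deduplicate(rows, unique_keys)
  unique_keys.reduce(rows) do |remaining, key_columns|
    # Array#uniq with a block keeps the first element for each block result.
    remaining.uniq { |row| row.values_at(*key_columns) }
  end
end

# Illustrative source data, not taken from the Rails test suite.
rows = [
  { id: 1, author_id: 8, name: "Refactoring" },
  { id: 1, author_id: 9, name: "Other" },       # repeats the primary key 1
  { id: 2, author_id: 8, name: "Refactoring" }, # repeats (author_id, name)
  { id: 3, author_id: 8, name: "Agile" }
]

# One unique index on (author_id, name) plus the primary key on id.
unique_keys = [[:author_id, :name], [:id]]

deduplicated = deduplicate(rows, unique_keys)
kept_ids = deduplicated.map { |row| row[:id] }
```

After this step, no two remaining rows share a value for any unique key, so a join against the target table can match each target row at most once.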
Performance
The big advantage of `insert_all` and `upsert_all` is performance. I did not run extensive benchmarks for now, but just tried a basic case: inserting 50,000 simple post records.

If I insert them all one by one (gist), this takes about 144 seconds on my machine. If I use `insert_all`, this takes 0.48 seconds (gist). If `insert_all` has a lot of collisions, like in this gist, the time goes up to 5 seconds. I also tested this collision script with MySQL, where the insert time remains constant at 0.5 seconds.

I need to test some more.