Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support insert_all and upsert_all using MERGE #1312

Merged
merged 8 commits into from
Mar 20, 2025

Conversation

andyundso
Copy link
Member

This PR adds support for insert_all and upsert_all in the activerecord-sqlserver-adapter. As initially propsed on #859 / #869, it uses MERGE to achieve so.

The benefit of ON DUPLICATE clauses in other database systems is that you can pass it every data that you have in your Rails app without the need for de-duplication. With MERGE, it technically executes a join between the data we want to insert and the data in the target table. therefore, the values in the columns of our source data, which will be joined, need to be unique. Unique means unique across all primary keys and indexes with uniqueness. The following Rails test show-cases this well:

 def test_insert_all_with_skip_duplicates_and_autonumber_id_not_given
    skip unless supports_insert_on_duplicate_skip?

    assert_difference "Book.count", 1 do
      # These two books are duplicates according to an index on %i[author_id name]
      # but their IDs are not specified so they will be assigned different IDs
      # by autonumber. We will get an exception from MySQL if we attempt to skip
      # one of these records by assigning its ID.
      Book.insert_all [
        { author_id: 8, name: "Refactoring" },
        { author_id: 8, name: "Refactoring" }
      ]
    end
  end

From the adapter, we can ask Rails about indexes with uniqueness and all the primary keys for a given table. With that information, I run a PARTITION BY for each affected column (in the example above, one is the combination of author_id and name, the second is just the id column as primary key) over the source data, retrieve a ROW NUMBER and only keep the records where their ROW NUMBER is 1.

The rest of the code is then mostly as in other database adapters. More details are also provided in the code comments.

Performance

The big advantage of insert_all and upsert_all is performance. I did not run extensive benchmarks for now, but just tried a basic case, which is to insert 50'000 simple post records.

If I insert them all one by one (gist), this takes about 144 seconds on my machine. If use insert_all, this takes 0.48 seconds. (gist). If insert_all has a lot of collisions, like in this gist, the time goes up to 5 seconds. I also tested this collision script with MySQL, where the time to insert remains constant at 0.5 seconds.

I need to test some more.

@Michoels
Copy link

Extremely exciting!
Kudos on this work, I can't wait to try it!

@andyundso
Copy link
Member Author

Performance for upsert_all looks good as well.

Upserting 50k records finishes in 2.07 seconds on my machine (gist). MySQL does does in 0.75 seconds (gist). We likely loose some time because MSSQL can only process 1000 rows at the same time. this separation I do in Ruby, which likely is slow.

so I think this looks good overall and is ready for a review by @aidanharan.

@andyundso andyundso marked this pull request as ready for review March 19, 2025 16:35
@andyundso andyundso requested a review from aidanharan March 19, 2025 16:35
@aidanharan aidanharan merged commit aa0d8ef into main Mar 20, 2025
6 checks passed
@aidanharan aidanharan deleted the support-insert-and-upsert-main branch March 20, 2025 13:40
@aidanharan
Copy link
Contributor

@andyundso Brilliant work! Will port the changes into the 8-0-stable branch and get a v8.0 release out containing it. Will help with getting some early feedback.

@andyundso
Copy link
Member Author

@aidanharan I realised while driving home today that there is likely a bug with the order of our values when doing an upsert. I opened a PR against Rails to add tests for these scenarios, but it gives me conflicting results. Let's see if one of the maintainers have time to investigate it.

@aidanharan
Copy link
Contributor

The Rails PR mentioned in the comment above for future reference rails/rails#54790

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants