Support `insert_all` and `upsert_all` using `MERGE` #1312

andyundso · 2025-03-16T19:31:10Z

This PR adds support for insert_all and upsert_all in the activerecord-sqlserver-adapter. As initially propsed on #859 / #869, it uses MERGE to achieve so.

The benefit of ON DUPLICATE clauses in other database systems is that you can pass it every data that you have in your Rails app without the need for de-duplication. With MERGE, it technically executes a join between the data we want to insert and the data in the target table. therefore, the values in the columns of our source data, which will be joined, need to be unique. Unique means unique across all primary keys and indexes with uniqueness. The following Rails test show-cases this well:

 def test_insert_all_with_skip_duplicates_and_autonumber_id_not_given
    skip unless supports_insert_on_duplicate_skip?

    assert_difference "Book.count", 1 do
      # These two books are duplicates according to an index on %i[author_id name]
      # but their IDs are not specified so they will be assigned different IDs
      # by autonumber. We will get an exception from MySQL if we attempt to skip
      # one of these records by assigning its ID.
      Book.insert_all [
        { author_id: 8, name: "Refactoring" },
        { author_id: 8, name: "Refactoring" }
      ]
    end
  end

From the adapter, we can ask Rails about indexes with uniqueness and all the primary keys for a given table. With that information, I run a PARTITION BY for each affected column (in the example above, one is the combination of author_id and name, the second is just the id column as primary key) over the source data, retrieve a ROW NUMBER and only keep the records where their ROW NUMBER is 1.

The rest of the code is then mostly as in other database adapters. More details are also provided in the code comments.

Performance

The big advantage of insert_all and upsert_all is performance. I did not run extensive benchmarks for now, but just tried a basic case, which is to insert 50'000 simple post records.

If I insert them all one by one (gist), this takes about 144 seconds on my machine. If use insert_all, this takes 0.48 seconds. (gist). If insert_all has a lot of collisions, like in this gist, the time goes up to 5 seconds. I also tested this collision script with MySQL, where the time to insert remains constant at 0.5 seconds.

I need to test some more.

Michoels · 2025-03-17T12:52:05Z

Extremely exciting!
Kudos on this work, I can't wait to try it!

andyundso · 2025-03-19T16:35:36Z

Performance for upsert_all looks good as well.

Upserting 50k records finishes in 2.07 seconds on my machine (gist). MySQL does does in 0.75 seconds (gist). We likely loose some time because MSSQL can only process 1000 rows at the same time. this separation I do in Ruby, which likely is slow.

so I think this looks good overall and is ready for a review by @aidanharan.

aidanharan · 2025-03-20T13:42:59Z

@andyundso Brilliant work! Will port the changes into the 8-0-stable branch and get a v8.0 release out containing it. Will help with getting some early feedback.

andyundso · 2025-03-20T22:21:54Z

@aidanharan I realised while driving home today that there is likely a bug with the order of our values when doing an upsert. I opened a PR against Rails to add tests for these scenarios, but it gives me conflicting results. Let's see if one of the maintainers have time to investigate it.

aidanharan · 2025-03-21T13:02:42Z

The Rails PR mentioned in the comment above for future reference rails/rails#54790

andyundso added 5 commits March 16, 2025 17:06

Support supports_insert_on_duplicate_skip

1a78485

Support supports_insert_on_duplicate_update?

c0be62c

extend "query_requires_identity_insert?" for merge queries with inserts

3f88a74

refactor code into more smaller methods

dd20bce

Describe insert_all / upsert_all support

23020a9

andyundso added 2 commits March 19, 2025 17:25

Replace GREATEST with CASE

d68c566

Add Changelog entry

e092ebb

andyundso marked this pull request as ready for review March 19, 2025 16:35

andyundso requested a review from aidanharan March 19, 2025 16:35

aidanharan mentioned this pull request Mar 20, 2025

Small refactor of Support insert_all and upsert_all using MERGE #1314

Merged

Small refactor of Support insert_all and upsert_all using MERGE (#1314)

db8199a

aidanharan added feature rails-8.1 labels Mar 20, 2025

aidanharan merged commit aa0d8ef into main Mar 20, 2025
6 checks passed

aidanharan deleted the support-insert-and-upsert-main branch March 20, 2025 13:40

This was referenced Mar 20, 2025

Check if insert_all/upsert_all can be implemented using MERGE #859

Closed

Tracking rails/solid_cache compatibility #1122

Open

Support insert_all and upsert_all using MERGE #1315

Merged

mgrunberg mentioned this pull request Mar 20, 2025

insert_all/upsert_implementation using MERGE #869

Closed

andyundso mentioned this pull request Mar 20, 2025

Test if insert_all / upsert_all apply correct values when given duplicate identifiers in same call rails/rails#54790

Merged

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support `insert_all` and `upsert_all` using `MERGE` #1312

Support `insert_all` and `upsert_all` using `MERGE` #1312

andyundso commented Mar 16, 2025

Michoels commented Mar 17, 2025

andyundso commented Mar 19, 2025

aidanharan commented Mar 20, 2025

andyundso commented Mar 20, 2025

aidanharan commented Mar 21, 2025

Support insert_all and upsert_all using MERGE #1312

Support insert_all and upsert_all using MERGE #1312

Conversation

andyundso commented Mar 16, 2025

Michoels commented Mar 17, 2025

andyundso commented Mar 19, 2025

aidanharan commented Mar 20, 2025

andyundso commented Mar 20, 2025

aidanharan commented Mar 21, 2025

Support `insert_all` and `upsert_all` using `MERGE` #1312

Support `insert_all` and `upsert_all` using `MERGE` #1312