-
-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP][POC] Use pipelining mode #2706
Conversation
This is really rad! If you can write a failing test for the race condition and put it on this branch I might be able to help with looking into it. |
I succeed in writing an integration test to reproduce the bug in our own repo, I will try to do the same here (but I don't understand why the CI is failing right now here) : it.only("5. Prepared statement same name, loop, with pipelining", (done) => {
pool.connect((err, client, _done) => {
expect(err).to.not.exist
client.pipelining = true
const sql = `SELECT 1`
client.query({ name: 'Foo', text: sql}, (err) => {
expect(err).to.not.exist
})
client.query({ name: 'Foo', text: sql}, (err) => {
expect(err).to.not.exist
return done()
})
})
})
AssertionError: expected error: prepared statement "Foo" already exists { length: 107, severity: 'ERROR', code: '42P05', detail: undefined, hint: undefined, position: undefined, internalPosition: undefined, internalQuery: undefined, where: undefined, schema: undefined, table: undefined, dataType: undefined, constraint: undefined, file: 'prepare.c', routine: 'StorePreparedStatement' } to not exist Important : this bug seems to exhibit even if I set I confirm the cause is indeed a 2nd named statement sent with the same name, before the 1st statement has been received in _handleParseComplete, see logs :
Thus the only solution I found (see last commit a33084f) to avoid the bug is to track the prepared statements submitted to server when queries are submitted here : node-postgres/packages/pg/lib/client.js Line 501 in a33084f
instead to track it in node-postgres/packages/pg/lib/client.js Line 374 in a33084f
and I check if prepared statement has been already processed here : node-postgres/packages/pg/lib/query.js Line 163 in a33084f
I could have used this.connection.preparedStatements to do it, but I prefered to use a distinct property to avoid potential side effects. Tell me if you think it's the right way to do it. On other aspect to review is if the 1st query fails (because of a syntax error, constraint error, etc), is the 2nd query to prepare the statement ? I don't remember what is the current behaviour of the driver in that case. |
Work in progress to adress #2646
Base on the previous work of @jusou here #662, thx to him !
Principle
In current behaviour of node-postgres, a query is not sent to the database until the previous query has completed and the associated result received.
With this PR, pipelining mode can be enabled, it allows the client to send all of the queries to the server up front, minimizing time spent by one side waiting for the other to finish sending data.
Important : pipelining has nothing to do with query execution concurrency/parallelism. With or without pipelining mode, the PostgreSQL server is executing the queries sequentially (while using parallelism capabilities if enabled), pipelining just allows both sides of the connection to work concurrently when possible, and to minimize round-trip time.
The higher the latency is between the driver and the posgres serveur and the number of requests sent, the higher the performance gain will be.
API changes
client.pipelining
can be set to true to enable pipelining mode (default false)Implementation
While it's not finished, we are currently testing it with our product, with massive usage of queries (50 tasks in parallel sending each thousands of queries) and it seems to work (except of prepared statements).
I'm of course open to suggestions/review but I'm very busy nowadays thus don't be frustrated if I don't answer to remarks in a decent delay.
More details concerning the issue with the prepared statements broken in the PR :
If we send immediatly 2 queries with the same named Foo, we got the error
Prepared statement Foo already exists
Looks like the driver try to prepare the statement twice, probably because the event
parseComplete
has not been received yet for the 1st query, when the 2nd query is sent.I'm a bit confused about how to find the proper place to fix it.
First I thought that it would be simple to debounce the parsing in
node-postgres/packages/pg/lib/query.js
Line 36 in 21ccd4f
by checking if connection.preparedStatements[this.name] exists
but then I realized this check is already performed here :
node-postgres/packages/pg/lib/query.js
Line 192 in 21ccd4f
so I don't truly understand why this error occurs 🤔
If someone can help/give some hints, it would be useful !
Perf gain in our use case (⚠️ YMMV)
In our real application (payroll processing) :
165 tasks, concurrency between 1 and 40 :
distributed on 4 clients
each task is sending thousands of DML queries, mostly SELECT
gain : 1 second (no gain expected because of no concurrency)
concurrency 1 w/o pipeline : 2m28s
concurrency 1 with pipeline : 2m27s
gain : 18 seconds
concurrency 5 w/o pipeline : 2m20s
concurrency 5 with pipeline : 2m02s
gain : 25 seconds (test n°2 : 26 seconds)
concurrency 20 w/o pipeline : 2m22s (test 2 : 2m24s)
concurrency 20 with pipeline : 1m57s (test 2 : 1m58s)
gain : 27 seconds (test n°2 : 28 seconds)
concurrency 40 w/o pipeline : 2m20s (test 2 2m27s)
concurrency 40 with pipeline : 1m53s (test 2 1m59s
Conclusion :
Very interesting gain of 20% with concurrency 40 (our default in production)
It's also important to note that we have disabled prepared statements with pipelining mode due to the bug, so
the gain is perhaps even greater (by <5% more I would say).