strategies to prevent OOM when retrieving large field values #3405
Comments
I've tried to poke at it a bit. While implementing things at the "connection" level for each message might help mitigate those issues, I think there is still one uncatchable error which is related to how the underlying … works. Tell me if I'm wrong, but it seems that … The global limit might still help as long as a single row doesn't OOM the program execution on its own, though. With a 100MB limit and a row like this I get the expected error:
With the same limit and data like this I get:
I wonder if there might be a way to handle this gracefully in libpq as well. That might largely increase the scope of a PR, though.
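The snippets behind "a row like this" and "data like this" above are not shown; purely as a hypothetical illustration, the two situations being contrasted might look something like the queries below (names and sizes are invented).

```js
// Hypothetical illustration only; the original snippets are not shown above.
// One row whose single text field is itself enormous (~200 MB):
const oneHugeRow = "SELECT repeat('x', 200 * 1024 * 1024) AS payload";

// Many moderate rows whose combined size exceeds the same 100 MB limit:
const manyRows =
  "SELECT repeat('x', 1024 * 1024) AS payload FROM generate_series(1, 200)";
```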
Unless you're explicitly using the native bindings, you wouldn't be using libpq with this module. The default is the pure-JS driver, which does not call out to libpq, so any allocations are handled by Node and the allocation error would come from the Node runtime. If you are using the native bindings, I don't think there's any option in libpq we could enable to set a max or gracefully handle it.

The idea itself is doable, though, in both pure JS (probably easier) and libpq. The PostgreSQL wire protocol ("FEBE") sends each row in its own data message: https://www.postgresql.org/docs/current/protocol-message-formats.html#PROTOCOL-MESSAGE-FORMATS-DATAROW

Every FEBE message in PostgreSQL starts with a type code and a message length, so it should be possible to have a connection-level max message size; anything bigger could be discarded. The error handling might be tricky (we'd have to create an error for the client, finish reading or close the result, then sync), but it's definitely doable.
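As a rough sketch of that connection-level idea, assuming direct access to the buffered socket data (the constant and function names below are illustrative, not part of pg's API), the header check could look like this:

```js
// Minimal sketch of a connection-level message size cap, assuming raw access
// to the buffered socket bytes. MAX_MESSAGE_BYTES and inspectMessageHeader
// are invented names, not pg APIs.
const MAX_MESSAGE_BYTES = 100 * 1024 * 1024; // e.g. a 100 MB cap per message

// Every FEBE message starts with a 1-byte type code followed by a 4-byte
// big-endian length that includes itself but not the type byte.
function inspectMessageHeader(buffer, offset) {
  const code = buffer.readUInt8(offset);          // 0x44 === 'D' (DataRow)
  const length = buffer.readUInt32BE(offset + 1);
  if (code === 0x44 && length > MAX_MESSAGE_BYTES) {
    // A real implementation would surface an error on the client, skip or
    // drain the rest of this message, then sync before continuing.
    throw new Error(`DataRow of ${length} bytes exceeds ${MAX_MESSAGE_BYTES}`);
  }
  return { code, length };
}
```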
First, thank you for maintaining the pg library. I'm having an issue dealing with large field values that I'd like to get community guidance on.
Problem Description
When working with PostgreSQL tables that contain very large field values (text columns with hundreds of MB or more), we face a risk of out-of-memory (OOM) errors. Even when using pagination techniques like cursors, the library loads entire rows into memory, which becomes problematic with extremely large fields.
Our ideal solution would be a way to set a maximum threshold for query result size and automatically abort when exceeded, preventing memory issues.
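As far as this thread shows, no such threshold exists in pg itself; the sketch below is only an application-level approximation, assuming pg-query-stream and an invented byte budget (queryWithBudget and MAX_RESULT_BYTES are not pg APIs). It caps the total size across rows, but a single oversized field would still be buffered in full before the row ever reaches this check, which is exactly the gap described above.

```js
// Sketch of an application-level byte budget, assuming pg-query-stream.
const { Client } = require('pg');
const QueryStream = require('pg-query-stream');

const MAX_RESULT_BYTES = 100 * 1024 * 1024; // illustrative 100 MB budget

async function queryWithBudget(client, text, values = []) {
  const stream = client.query(new QueryStream(text, values));
  const rows = [];
  let bytes = 0;

  for await (const row of stream) {
    // Rough per-row size estimate; JSON.stringify is approximate but cheap.
    bytes += Buffer.byteLength(JSON.stringify(row));
    if (bytes > MAX_RESULT_BYTES) {
      stream.destroy(); // closes the underlying cursor
      throw new Error(`Result exceeded ${MAX_RESULT_BYTES} bytes, aborting`);
    }
    rows.push(row);
  }
  return rows;
}
```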
Reproduction Case
Here's a minimal script demonstrating the issue:
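The script itself is not reproduced here; a minimal sketch along those lines, assuming a temporary table with one very large text column (table name and sizes are invented for the example), might look like:

```js
// Hypothetical reproduction sketch; the original script is not shown above.
// Assumes a PostgreSQL instance reachable via the standard PG* env vars.
const { Client } = require('pg');

async function main() {
  const client = new Client();
  await client.connect();

  // One row with a text field of roughly 300 MB.
  await client.query('CREATE TEMP TABLE big_values (payload text)');
  await client.query("INSERT INTO big_values SELECT repeat('x', 300 * 1024 * 1024)");

  // The whole row, including the huge field, is buffered in memory at once,
  // which can crash the Node process with an out-of-memory error.
  const res = await client.query('SELECT payload FROM big_values');
  console.log(res.rows[0].payload.length);

  await client.end();
}

main().catch(console.error);
```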
Attempted Solutions
I've tried:
- Using pagination with cursors, which still loads entire rows (and their large fields) into memory
- Listening to pg events to keep track of the current query buffer size (couldn't find a way to do this)

Questions
Thank you for taking the time to answer my questions and point me in the right direction.
Edit: After searching through older issues, I discovered that this issue is actually a duplicate of #2336.
In addition to what was mentioned there, my use case involves an application where the server running the query and the queried database belong to different actors.
If no new way to handle this has emerged, and if the community is still open to reviewing contributions along those lines, I could try to develop a solution for this use case by integrating at the pool connection level, as mentioned in the comments.