Large websocket messages cause error that is silently caught somewhere #3410
Comments
Hi @Xtreemrus, thanks for reporting this issue! It looks like Socket.IO (internally using Engine.IO) has a default maximum frame size of 1,000,000 bytes. When we change it in line 49 (commit ddb95e1) to something like

```python
core.sio = sio = socketio.AsyncServer(async_mode='asgi', cors_allowed_origins='*', json=json,
                                      max_http_buffer_size=10_000_000)
```

your example works. So we need to think about whether to somehow detect an overflow and warn the user about it, or to provide a configuration option. What do you think?
Hi @falkoschindler, Thank you for your reply. I'm glad that the core issue has been identified. While I'm not a professional developer, it seems the problem consists of two parts:
Possible solutions: The solution for point 1 is less obvious, at least from my perspective. Since nothing is propagated to the backend during an overflow, we could potentially add a JS script to each ui.input field where an overflow might occur. This script would check text.value.length and, if it exceeds max_http_buffer_size (which could be passed as a constant to JS at site startup), trigger an exception on the Python side. Simply setting max_http_buffer_size to a very large number and hoping users won't exceed it seems less ideal, as we wouldn't have visibility into how often this event occurs.

In an ideal scenario, all input fields would have an option to specify the maximum number of characters allowed. If a developer sets an input character limit higher than max_http_buffer_size, they would receive a warning during ui.run; this wouldn't be an error, but a caution that an overflow could occur. Additionally, giving developers the ability to call a custom function when a buffer overflow happens would help them implement user-friendly messages like "Warning: This field is limited to 1 million characters." These approaches would provide more robust error handling and improve the developer and end-user experience.
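A minimal sketch of the per-field limit idea, assuming Quasar's standard maxlength attribute can simply be passed through via .props() and using an arbitrary limit of 100,000 characters; neither the prop wiring nor the warning callback is an existing NiceGUI feature:

```python
from nicegui import ui

MAX_CHARS = 100_000  # illustrative limit, well below the default 1,000,000-byte frame size

def warn_if_at_limit() -> None:
    # hypothetical helper: tell the user why further input is being ignored
    if len(text.value) >= MAX_CHARS:
        ui.notify(f'This field is limited to {MAX_CHARS:,} characters.', type='warning')

# Quasar's QInput/QTextarea forward the standard HTML maxlength attribute,
# so the browser itself stops accepting input beyond the limit.
text = ui.textarea(label='Prompt', on_change=warn_if_at_limit).props(f'maxlength={MAX_CHARS}')

ui.run()
```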
According to the engineio sources:

```python
async def _websocket_handler(self, ws):
    """Engine.IO handler for websocket transport."""
    async def websocket_wait():
        data = await ws.wait()
        if data and len(data) > self.server.max_http_buffer_size:
            raise ValueError('packet is too large')
        return data
```

This exception is silently caught elsewhere.
How about …
I guess this would allow any client to send arbitrarily large packets to the server, which could exhaust the memory and kill the server.
That's bad. But where could this be? Here is the line emitting the socket message: line 99 (commit b4fb23d). Any exception caught in …?
Interesting 🤔 I would say, though: maybe a solution would be to send only the difference in the string when the textarea is updated? Then there would be no need to send so many characters across the internet. But that would be quite difficult and require an intensive rework ...
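A minimal sketch of that diff idea using Python's difflib; the delta format and the helper names are illustrative assumptions, not an existing NiceGUI mechanism:

```python
import difflib
from typing import List, Tuple

Delta = List[Tuple[str, int, int, str]]

def string_delta(old: str, new: str) -> Delta:
    """Return only the changed regions as (operation, start, end, replacement)."""
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    return [(op, i1, i2, new[j1:j2])
            for op, i1, i2, j1, j2 in matcher.get_opcodes()
            if op != 'equal']

def apply_delta(old: str, delta: Delta) -> str:
    """Rebuild the new string on the receiving side from the delta."""
    parts, cursor = [], 0
    for _op, i1, i2, replacement in delta:
        parts.append(old[cursor:i1])  # unchanged text before this edit
        parts.append(replacement)     # inserted or replacing text ('' for deletions)
        cursor = i2                   # skip the old text covered by this edit
    parts.append(old[cursor:])
    return ''.join(parts)

old = 'hello world, ' * 50
new = old.replace('world', 'there', 1)
delta = string_delta(old, new)        # one small edit instead of the full text
assert apply_delta(old, delta) == new
```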
And it might not even solve the problem if the diff is large, e.g. loading a new text from a file, converting to uppercase, replacing certain characters, translating into another language, minifying code, ... Sure, each example can be considered an edge case. But it's hard to draw a line. Therefore, I think, the main goal should be to at least raise an exception if a message cannot be sent due to size limitations.
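A minimal sketch of that goal as a pre-emit size check on the Python side; the constant, the exception class, and using the serialized JSON length as a proxy for the actual Engine.IO frame size are all assumptions:

```python
import json

MAX_MESSAGE_SIZE = 1_000_000  # engineio's default max_http_buffer_size

class MessageTooLargeError(Exception):
    """Raised instead of silently dropping an oversized outgoing message."""

def check_message_size(event: str, payload: object) -> None:
    # the JSON length only approximates the final frame size,
    # but it is enough to fail loudly before the message silently disappears
    size = len(json.dumps([event, payload]).encode())
    if size > MAX_MESSAGE_SIZE:
        raise MessageTooLargeError(
            f'outgoing {event!r} message is {size:,} bytes, '
            f'exceeding the {MAX_MESSAGE_SIZE:,}-byte limit')

check_message_size('update', {'text': 'x' * 100})          # fine
# check_message_size('update', {'text': 'x' * 2_000_000})  # would raise
```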
Proposal: every time we do …

Thinking about the socket.io max message length: I think it was set in good faith so that one message will not monopolize the entire communication bandwidth, leaving other clients unable to send messages. Maybe we can break up one message into several ones if it is getting too long?
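A rough sketch of that chunking idea; the packet format, the chunk size, and the reassembly buffer are invented for illustration and do not reflect the chunking later implemented for NiceGUI On Air in #4555:

```python
from typing import Dict, Iterator, List, Optional

CHUNK_SIZE = 500_000  # stay well below the 1,000,000-byte default frame size

def split_message(message_id: str, data: str) -> Iterator[dict]:
    """Yield small packets that together carry one oversized payload."""
    total = (len(data) + CHUNK_SIZE - 1) // CHUNK_SIZE
    for index in range(total):
        yield {
            'id': message_id,
            'index': index,
            'total': total,
            'data': data[index * CHUNK_SIZE:(index + 1) * CHUNK_SIZE],
        }

class Reassembler:
    """Collects chunks on the receiving side until a message is complete."""

    def __init__(self) -> None:
        self._parts: Dict[str, List[Optional[str]]] = {}

    def add(self, packet: dict) -> Optional[str]:
        parts = self._parts.setdefault(packet['id'], [None] * packet['total'])
        parts[packet['index']] = packet['data']
        if all(part is not None for part in parts):
            del self._parts[packet['id']]
            return ''.join(parts)
        return None  # still waiting for more chunks

# each chunk would be sent as its own (small) socket.io message
reassembler = Reassembler()
result = None
for packet in split_message('msg-1', 'x' * 1_200_000):
    result = reassembler.add(packet) or result
assert result is not None and len(result) == 1_200_000
```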
Yes, warning about reaching the size limit or splitting messages are possible solutions. Nevertheless, I'm still assessing the situation: why is the following code silently dropping change messages when editing the value?

```python
ui.textarea(value='x' * 1_000_000, on_change=lambda: ui.notify('changed'))
```

Shouldn't it show any kind of error? Even registering an error handler like this doesn't help:

```js
window.socket.on("error", (err) => console.log(err));
```
Here are the solutions I brainstormed: …
Simply put: Now we know. The exception is in the bit bucket. It will never be logged to the console as such. And yeah, that's indeed bad. Real bad.
Here goes nothing: miguelgrinberg/python-engineio#397
For NiceGUI On Air we had a similar problem and implemented data chunking: #4555
For reference, here is a demo showing the problem:

```python
textarea = ui.textarea(value='x' * 1_000_000, on_change=lambda: ui.notify('changed'))
ui.label().bind_text_from(textarea, 'value', lambda x: f'{len(x)} characters')
number = ui.number(value=0)
ui.timer(1.0, lambda: number.set_value(number.value + 1))
```
After reading miguelgrinberg/python-engineio#397 (comment), it seems to come down to the question of whether the server is responsible for informing the client about errors (like I would expect from an HTTP server), or whether the client needs to comply with certain limits (as seems to be the case for Socket.IO). Regarding the issue of lost messages after editing a long text, I don't think handling the disconnect is a good solution. There should be a way to notice the problem beforehand, either with help from socket.io or not.
The author of python-engineio seems uninterested in taking responsibility for informing either the client or the server about too-long-message errors. Personally I would not go with that design, but anyway... It seems like keeping track of message length on the client side, as well as segmenting, is the way to go so far.
Yeah, so it's our responsibility; engineio is not taking it up. Let's get to implementing segmenting and prevent the client from ever sending a too-long message in the first place, then.
I'm not so sure we're there yet. Engineio might not be responsible, but maybe the Socket.IO client is? It is responsible for serializing and submitting messages. Therefore it should be able to judge the total message size and report an error if it exceeds a limit. Anyway, would we really want to implement segmenting? Do we really want to allow sending MBs of data back and forth? I still think the primary goal is to inform the user about the message not being transmitted. And it would be great to keep the connection alive.
Well, regardless of whether we do segmenting or show a warning when there is too much data, the current response from the engineio author (he also maintains socketio) seems to mean that the responsibility is on us. At the end of the day we have to add something to our codebase, not him adding something to his.
He is the author of https://github.com/miguelgrinberg/python-engineio and https://github.com/miguelgrinberg/python-socketio. But I'm talking about https://github.com/socketio/socket.io, the JavaScript client.
In my opinion we should make it possible to submit large chunks of text (like what this issue describes in the first place), send image data via socketio, etc. without flooding the websocket connection. Purely increasing the engineio max_http_buffer_size …
In my eyes, it is a mixture of techniques: …

Though, #4571 is still beneficial without the handling of the 0.1-1 MB range, since at least the websocket isn't closed and re-opened. And with chunked message transmission we can afford a higher message upper limit (say 50 MB), so that we can fix the core issue we are discussing, which is that updates do not go through when the message gets too large. It won't make 50 MB of text show up easily, just so we are clear. It'll probably lag like crazy on any browser, especially mobile, but at least the issue won't be with NiceGUI then.
And for the chunked message transmission, the loading popup preferably shouldn't be as obvious as the disconnect one from #4571. Perhaps something inspired by Streamlit.
Description
When ui.textarea contains more than a million characters, it starts to behave unpredictably without any visible warnings.

Steps to Reproduce
To demonstrate the issue from different perspectives, I wrote the following code:
When you run this code, it creates a textarea filled with over a million characters, plus three more to make the behavior more evident in different scenarios.
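A minimal sketch of such a reproduction, assuming a text_area, a character counter bound to it, a Calculate button that prints the length and first character, and a button that removes characters from the Python side (all of these names and details are assumptions, not the original snippet):

```python
from nicegui import ui

text_area = ui.textarea(value='1' + 'x' * 1_000_000)

# character counter bound to the textarea value
ui.label().bind_text_from(text_area, 'value', lambda v: f'{len(v)} characters')

def calculate() -> None:
    # print what the backend currently believes the value to be
    print(len(text_area.value), repr(text_area.value[0]))

ui.button('Calculate', on_click=calculate)

# remove characters from the Python side (this path keeps working)
ui.button('Remove 1000 characters',
          on_click=lambda: text_area.set_value(text_area.value[:-1000]))

ui.run()
```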
1. If you press the Calculate button immediately after launching (without interacting with the ui.textarea), everything is calculated correctly, and the console prints the appropriate values.
2. If you modify the ui.textarea (e.g., delete the digit 1 at the beginning or add a new character), the Calculate button will no longer calculate the new value of text_area correctly. The code executes without errors, and you see the print output, but it shows the wrong number of characters, and the first character is also incorrect, as if the connection between the frontend and backend is lost.
3. If you remove characters from the ui.textarea from the Python side, everything works as expected: the counter decreases, and the first characters are read correctly.

Additional Investigation
To see if this is a Quasar issue, I added character counting to their textarea code:
https://codepen.io/Xtreemrus/pen/XWLNPyY?editors=101
JS part:
In this case, working with a million characters is handled correctly.
Impact
A couple of years ago, cases requiring a million characters in a textarea were rare. However, with the development of LLMs, users are inserting increasingly large chunks of information into input fields. One million characters is approximately 250,000 tokens (at roughly four characters per token), which is now a feasible amount.
Difficulty
The challenge with this bug is that no visible errors occur, either on the user side or the developer side. No exceptions are thrown, making it hard to find a workaround.
Environment