Project ACID (Automatic Client-ID) aka "software rugpull" #4552

Draft
wants to merge 9 commits into
base: main

evnchn
Collaborator

@evnchn evnchn commented Mar 31, 2025

This is a reimagination of the "software rugpull" PR #4487

TL-DR: Software rugpull is back, renamed to ACID (Automatic Client-ID). It enables zero-browser-reload (no F5) on server reboot, on client disconnects lasting longer than reconnect_timeout, and on content reload. The name is awesome, the code is clean, the PR is clear.

Introduction to ACID (Automatic Client-ID) aka "software rugpull", for the uninitiated

The ACID approach (previously called "software rugpull") shakes up a foundation of NiceGUI, namely that "for a browser client, there must immediately be a matching client instance in the server, or the client must start over". It also challenges an almost unstated rule that, in tricky circumstances, reloading is the best way to solve all issues.

It is achieved by automatically generating a client instance in the server with the same client_id as the one reported by the browser (which either was deleted due to timeout, never existed since the server had a reboot, or was intentionally deleted for ui.navigate.soft_reload), then deceiving the handshake code into always returning True (hence the "software rugpull" nature, ref: https://youtu.be/q5M0TwnkWUM?t=3735)
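To make the idea concrete, here is a minimal, framework-free sketch of such a handshake. It is not the code in this PR: `FakeClient`, `instances`, `page_builders` and `on_handshake` are hypothetical stand-ins for the real NiceGUI internals, and the only point is that an unknown client_id leads to a freshly built client instead of a failed handshake.

```python
from typing import Callable, Dict, List


class FakeClient:
    """Hypothetical stand-in for a server-side client (not NiceGUI's Client)."""

    def __init__(self, client_id: str) -> None:
        self.id = client_id
        self.content: List[str] = []


# Hypothetical registries: known clients and one builder per page path.
instances: Dict[str, FakeClient] = {}
page_builders: Dict[str, Callable[[FakeClient], None]] = {
    '/sync_page': lambda client: client.content.append('This is a sync page'),
}


def on_handshake(data: dict, acid: bool = True) -> bool:
    """Return True if the browser may keep using the page it already shows."""
    client = instances.get(data['client_id'])
    if client is None:
        builder = page_builders.get(data.get('path', ''))
        if not acid or builder is None:
            return False  # classic behaviour: the browser has to reload
        client = FakeClient(data['client_id'])  # reuse the ID the browser reported
        builder(client)  # re-run the page function for the recreated client
        instances[client.id] = client
    return True
```

With `acid=False` the sketch behaves like the traditional handshake and rejects unknown IDs; with `acid=True` the same request succeeds and leaves a matching client behind.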

What happened to the old PR

I'll be honest, that PR was one of the worst I have done.

  • Not having a good name
  • Implementation is a bit buggy and rough
  • Was done to solve a need which few had (multi-worker without sticky session)

What led to this new PR

Despite the above, the idea proposed in the old PR was sound, and this PR takes what was in that old PR, makes it better, and enables the 3 following functions:

  • Zero-browser-reload on client reconnection due to server reboot
  • Zero-browser-reload on client reconnecting after being disconnected for a duration longer than reconnect_timeout
  • Zero-browser-reload function ui.navigate.soft_reload, which reloads the content of the page without the equivalent of pressing F5 on the browser.

And, to maximize the chance that this PR goes through, I have included:

  • An awesome name, ACID, for this PR and the operation covered within
  • Changes staged in separate commits for easier reading
  • Honestly the best PR description I have written so far

Testing code

I know you can't wait 😉

import asyncio
from nicegui import ui, core
import uuid
import time

def show_client_id():
    client_id = ui.context.client.id
    ui.label(f'Client ID: {client_id}')

def show_page_was_loaded_at_unix_timestamp():
    page_was_loaded_at_unix_timestamp = int(time.time())
    ui.label(f'Page was loaded at unix timestamp: {page_was_loaded_at_unix_timestamp}')

ui.label('Hey! Here is the auto-index page')
show_client_id()
ui.label('The client ID should not change in the auto-index page.')
ui.link('Go to async page', '/async_page')
ui.link('Go to sync page', '/sync_page')

@ui.page('/sync_page', title='Sync page')
def sync_page():
    print('sync_page')
    ui.label('This is a sync page')
    show_client_id()
    show_page_was_loaded_at_unix_timestamp()
    ui.label('The client ID should change on each page load.')
    ui.link('Go back to auto-index page', '/')
    ui.button('Soft reload (built-in code)', on_click=ui.navigate.soft_reload)

@ui.page('/async_page', title='Async page')
async def async_page(arg1: str = 'default', arg2: int = 0):
    print('async_page')
    ui.label('This is an async page')
    ui.label(f'arg1: {arg1}')
    ui.label(f'arg2: {arg2}')
    show_client_id()
    show_page_was_loaded_at_unix_timestamp()
    ui.label('The client ID should change on each page load.')
    random_uuid = str(uuid.uuid4())
    counter_label = ui.label("0")
    def increment_label():
        current_value = int(counter_label.text)
        counter_label.set_text(str(current_value + 1))
    ui.button('Increment label', on_click=increment_label)
    print("Emitted UUID:", random_uuid)
    ui.label(f'Here is a random UUID: {random_uuid}')
    ui.link('Go back to auto-index page', '/')
    await ui.context.client.connected()
    await asyncio.sleep(0.5)
    ui.label('=== The client is connected ===')
    show_client_id()
    ui.label('The client ID should remain consistent to the above.')
    ui.label(f'Here is the all-caps UUID: {random_uuid.upper()}')
    result_js = await ui.run_javascript('30 + 9')
    ui.label(f'Here is the result of a javascript expression: {result_js}. Are you a fan of Hatsune Miku?')
    async def try_and_close_the_websocket():
        print('Trying to close the websocket for id:', ui.context.client.id)
        print("Client's socket_id:", ui.context.client._socket_id)
        await core.sio.disconnect(ui.context.client._socket_id)
        # purge the client
        ui.context.client.delete()

    ui.button('Soft reload (userspace code)', on_click=try_and_close_the_websocket)
    ui.button('Soft reload (built-in code)', on_click=ui.navigate.soft_reload)

ui.run(reconnect_timeout=15, port=4552) # shorter timeout for testing purposes

Notable points:

  • No page reload on server restart, nor after an extended duration (>15s) without internet (block it in DevTools)
  • "Soft reload" simply flashes the page for a bit while the new content comes, no page reload involved

Rough explanation of the commits

General modification to enable ACID maneuver

expose path of page on handshake + expose path of page to client for its handshake

This allows the browser client, when making a handshake to reconnect, to inform the server which page it was originally visiting (say /async_page). Subsequently in the ACID maneuver, the corresponding page function will then be found (for func, path in Client.page_routes.items():) and swiftly executed so as to create the ACID client to match the handshake request.
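The lookup itself can be sketched in a few lines. This assumes, as the loop above suggests, a mapping from page function to path; `find_page_function` is a hypothetical helper name, and the sketch only handles exact path matches (routes with path parameters would need real pattern matching).

```python
from typing import Callable, Dict, Optional


def find_page_function(page_routes: Dict[Callable, str], path: str) -> Optional[Callable]:
    """Return the page function registered for `path`, or None if there is none."""
    for func, route in page_routes.items():
        if route == path:
            return func
    return None


def sync_page() -> str:
    return 'sync'


def async_page_stub() -> str:
    return 'async'


# Hypothetical route table shaped like {page function: path}.
routes = {sync_page: '/sync_page', async_page_stub: '/async_page'}
```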

allow force-set ID for client

Since a client is matched via the client_id, and I personally don't want to implement logic to change the client_id in the browser side, I made it so that the server would accept whatever client_id was passed in to the server. This meant that simply doing self.id = str(uuid.uuid4()) would not be enough. Rather, self.id = force_id is called if force_id was passed in
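As a rough illustration of that change (a simplified stand-in class, not the real `Client` constructor, which takes more arguments):

```python
import uuid
from typing import Optional


class Client:
    """Simplified stand-in showing only the ID logic."""

    def __init__(self, force_id: Optional[str] = None) -> None:
        # Normal page loads get a fresh UUID; a reconnect reuses the reported ID.
        self.id = force_id if force_id is not None else str(uuid.uuid4())
```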

page keep track of instances for ACID to find the right one

This will make sense shortly: we need to find the original page instance again so that it can be passed to with Client(proper_page, request=None, force_id=data['client_id']) as fake_client:

The important bit

The crux of ACID (Automatic Client-ID) aka "software rugpull"

TL-DR:

I stand by my old comment in #4487 (comment)

At this point the behaviour of _on_handshake diverges greatly from the traditional behaviour. For, upon the reception of a non-existent client_id in the handshake, the handler proceeds instead to create a matching client with the correct id and invokes the page function, establishing communication properly to a browser that is none-the-wiser.

Major difference being, in this new PR:

  • Do not instantiate a new page class with the same path. Use the correct page instance as provided by "page keep track of instances for ACID to find the right one"
  • Since the client is connected via WebSocket, there is no need to override Client.connected, and we can await the page function as usual. Since it mirrors nicegui/page.py, I have sneaked in the changes from "use background_tasks.create over bare asyncio.create_task" #4551, which should be fine. If not, we roll back
  • No longer re-implement the logic for handling tab_id and environ and sio.enter_room. Instead, we fully embrace the "rugpull" nature by simply letting client = Client.instances.get(data['client_id']) find our made-up client and handle the logic there.
  • Assert the result is not a FastAPI Response, since over WebSocket, how would we serve an HTTP response...

Enable ui.navigate.soft_reload

client keep track of the socket id (for ui.navigate.soft_reload())

It is used for shutting down the Socket.IO connection. Without it, I don't see how it can be possible...

browser to aggressively reconnect for ACID

If I am not mistaken, as of writing, NiceGUI never offered any public or private methods to disconnect the Socket.IO connection gracefully (or in a way that the disconnect handler will be called), since the socket_id (not any UUID generated by NiceGUI, mind you, but the ID generated by python-socketio, which looks like xq8S7tkOWr3iIu-KAAAD) was never kept track of.
Therefore, my argument is, I can simply define the behaviour of disconnect to be an immediate reconnect, without fear of breaking existing code.

NOTE: When the browser reconnects, it starts over by setting window.nextMessageId = 0. This is because, though there may be 1000 messages from the old client, we still want to listen to every single message broadcasted by the new client.
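A small simulation shows why the reset matters. It assumes the browser drops any message whose ID is lower than the next one it expects; `Outbox` and `Browser` here are simplified stand-ins, not the real classes.

```python
from typing import List, Tuple


class Outbox:
    """Server side: numbers outgoing messages, starting at 0 for a new client."""

    def __init__(self) -> None:
        self.next_message_id = 0

    def send(self, payload: str) -> Tuple[int, str]:
        message = (self.next_message_id, payload)
        self.next_message_id += 1
        return message


class Browser:
    """Browser side: ignores messages it believes it has already applied."""

    def __init__(self) -> None:
        self.next_message_id = 0
        self.applied: List[str] = []

    def receive(self, message: Tuple[int, str]) -> None:
        message_id, payload = message
        if message_id < self.next_message_id:
            return  # looks like a duplicate, so it is skipped
        self.applied.append(payload)
        self.next_message_id = message_id + 1

    def on_reconnect(self) -> None:
        self.next_message_id = 0  # listen to the new client from its first message
```

Without `on_reconnect()`, a browser that has seen messages 0 to 2 from the old client would silently discard message 0 of the recreated client.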

ui.navigate.soft_reload() code

It is simply the try_and_close_the_websocket in the testing code, baked into the NiceGUI library.

The procedure for soft_reload:

  • Find the Socket.IO and shut it down
  • Delete the client
  • When the browser reconnects, the client no longer exists, so the ACID maneuver gets triggered, re-running the page function and serving new content to the browser
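The three steps can be modelled with plain dictionaries. This is only a sketch of the ordering; `clients`, `sockets`, `connect`, `soft_reload` and `build_page` are hypothetical names, and the real code has to go through Socket.IO to drop the connection.

```python
from typing import Dict

clients: Dict[str, dict] = {}  # client_id -> page state
sockets: Dict[str, str] = {}   # client_id -> socket_id
build_count = 0


def build_page() -> dict:
    """Stand-in for running the page function; counts how often it ran."""
    global build_count
    build_count += 1
    return {'build': build_count}


def connect(client_id: str, socket_id: str) -> dict:
    """Handshake: recreate the client if it is gone, then remember the socket."""
    if client_id not in clients:
        clients[client_id] = build_page()
    sockets[client_id] = socket_id
    return clients[client_id]


def soft_reload(client_id: str) -> None:
    sockets.pop(client_id, None)  # 1. shut down the Socket.IO connection
    clients.pop(client_id, None)  # 2. delete the client
    # 3. the browser reconnects on its own and calls connect() again
```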

Remaining concerns

  • Need more exhaustive testing over compatibility. I need sleep (it's 5AM now and I haven't slept) so no I didn't do that...
  • Will this open us up to a potential DDOS attack? (well, if a malicious actor wants to exhaust CPU by running the page function many times, they would originally need to fetch the index.html, now they don't. Not much of a big difference if you ask me, but anyway)
  • args and kwargs passed into the page function (from FastAPI, think: URL params) are dropped on the ACID maneuver. Should we let the browser report to us what they were? Should the server store what they were? (I am leaning towards the first solution, but I am not sure, because there may be some serialization issues when converting between JS-acceptable and Python-acceptable formats. Though I don't want to let that one deficiency hold back this PR.)

@falkoschindler
Contributor

Thanks for another impressive pull request, @evnchn!
Before checking your implementation in detail, let's clarify some general questions first:

  1. What exact problem are you trying to solve? I assume you want to load the page content a little faster after the connection got interrupted or the server has restarted. To be honest, I wouldn't have thought that this is a big deal.
  2. I guess ACID only works if all messages, that occurred in the meantime, are still in the message history queue. After a longer break, the browser will have to reload anyway. Right? Resetting nextMessageId to zero might work for a while, but not after messages have been pruned to save memory.
  3. It looks like you're allowing to connect to any route with any client ID. Is this secure? What about situations where a client has access to specific routes only? Can the server be tricked into serving restricted content?
  4. The page class is keeping references to every instance. Are you sure this isn't causing a memory leak? I guess it is fine because you can't delete pages in NiceGUI anyway. But I just want to raise awareness for this aspect nonetheless.

To summarize: Even though this PR might be an improvement over the "old" rugpull, it still adds quite some complexity that needs to be balanced with the expected benefit.

@falkoschindler falkoschindler added analysis Status: Requires team/community input feature Type/scope: New feature or enhancement 🌳 advanced Difficulty: Requires deep knowledge of the topic labels Mar 31, 2025
@evnchn
Collaborator Author

evnchn commented Mar 31, 2025

Regarding your points:

  1. [Issue this PR solves] If you recall, I am fighting to host a server with a single-digit-megabit uplink. Therefore, I guess the advantages of this PR resonate with me more than with others. Regardless, this PR can be useful in the following practical ways.
    a. Enter an elevator for > 30 seconds, want the page to load in without a browser reload
    b. Enables a global refresh of the page as powerful as encapsulating the page in a @ui.refreshable without browser reload
    c. Rather importantly, in the case of being spammed HTTP requests, the old 30-second timeout for each client means that we are keeping 30-seconds worth of clients in memory, which slows down GC to the point it takes tens of seconds to minutes. With the ACID approach, I see that we probably won't need such a long reconnect timeout, thus we mitigate this old avenue of DDOS risk.
  2. [concern over resetting nextMessageId] Sorry but I disagree. Since the ACID client is created brand-new to re-connect to the browser, there by definition cannot be any existing messages in any queue. All previous messages, whether they are pruned or not, do not matter since:
    a. Those messages belong to the old client, which has been deleted, and
    b. Those messages are straight-up not sent to the browser via the new client. It is similar to how it is valid to set self.next_message_id: int = 0 for a brand-new Outbox in a Client
  3. [concern over security] Last night (night? It's 5AM), I was too focused on DDOS risk since I had some issues with regards to spamming requests to my own server, so I did not quite think about security. Yes, it could be a security risk. My proposal: "Generate a random token on each normal page serve, which is then used as proof of authorization for the ACID maneuver*".
  4. [concern over page memory leak] Dismissed this concern yesterday exactly because "you can't delete pages in NiceGUI anyway"
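The token proposal in point 3 could look roughly like this. It is a sketch under assumptions, not part of the PR: `issue_token` and `may_recreate` are hypothetical names, and the in-memory dictionary is only for illustration.

```python
import hmac
import secrets
from typing import Dict

# Hypothetical store: client_id -> token embedded in the normally served page.
issued_tokens: Dict[str, str] = {}


def issue_token(client_id: str) -> str:
    """Called on a normal HTTP page serve; the token is sent along with the page."""
    token = secrets.token_urlsafe(32)
    issued_tokens[client_id] = token
    return token


def may_recreate(client_id: str, presented_token: str) -> bool:
    """Called in the handshake before recreating a missing client."""
    expected = issued_tokens.get(client_id)
    if expected is None:
        return False
    return hmac.compare_digest(expected, presented_token)  # constant-time compare
```

One caveat of this particular sketch: a plain in-memory dictionary is lost on server restart, which is one of the cases ACID is meant to cover, so the tokens would have to live somewhere persistent.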

Additionally, I shall add

  1. There are no pytest tests for this PR because I am lost as to how one should add them. I am honestly shaking up far more assumptions about the inner workings of NiceGUI than I am comfortable with, so I wish not to proceed on that end.
  2. Focusing on your point 3 about security, we seem to be short-circuiting the FastAPI authentication and middleware system if we allow more than 1 Socket.IO connection attempt, and thus client creation, PER 1 HTTP request. That's sketchy if you ask me 😅 (Ties in to the footnote below)

*An issue will arise when, say, a page is configured to be visitable only once and performs some action which cannot be repeated

# assume some middleware already ensures that /visit_once can be visited once and once only throughout the lifespan of the server...

@ui.page("/visit_once")
def visit_once():
    do_this_once_and_once_only()
    ui.label("The action meant to be done once and once only has been done!")

On the ACID maneuver, do_this_once_and_once_only() would be invoked again, so it would be done more than once throughout the lifespan of the server. This was impossible before this PR, and is undesirable behaviour for this PR.

So, either:

  • Make each ACID maneuver tied to the authorization system (think: middleware, all that stuff) to prevent this from happening
  • Make this feature opt-in (Think: @ui.page(acid=True)), while potentially introducing our own authorization system to account for short-circuiting FastAPI's one in the ACID maneuver

@evnchn
Collaborator Author

evnchn commented Apr 1, 2025

Thinking about it, seems like having it as an opt-in characteristic @ui.page(acid=True) would be the best. It could stay as opt-in until enough time has passed that it is the default. Think: xrange in Python 2 and range in Python 3.

Would say, the ACID maneuver bypassing FastAPI middleware logic is fine, as long as the page code checks for the authorization and returns a Response to reject client reconnection via the ACID maneuver.

Implementing our own authorization system (no longer relying on FastAPI middleware) would help with the adaptation process. Imagine @ui.page(authorization_function=my_auth_func) where my_auth_func returns True or False depending on whether, say, app.storage.user["admin"] was set to True.
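To sketch how the opt-in flag and such an authorization function could fit together: neither `acid` nor `authorization_function` exists on `ui.page` today, so the decorator below is a self-contained stand-in and every name in it is hypothetical.

```python
from typing import Callable, Dict, Optional

AuthFunc = Callable[[dict], bool]
page_options: Dict[str, dict] = {}  # hypothetical per-path settings


def page(path: str, acid: bool = False, authorization_function: Optional[AuthFunc] = None):
    """Stand-in decorator that only records the proposed options."""
    def decorator(func: Callable) -> Callable:
        page_options[path] = {'acid': acid, 'auth': authorization_function}
        return func
    return decorator


def may_recreate_client(path: str, user_storage: dict) -> bool:
    """Decide in the handshake whether a missing client may be rebuilt."""
    options = page_options.get(path)
    if options is None or not options['acid']:
        return False  # page did not opt in: fall back to a normal reload
    auth = options['auth']
    return auth is None or auth(user_storage)


@page('/admin', acid=True, authorization_function=lambda user: user.get('admin') is True)
def admin_page() -> None:
    ...


@page('/visit_once')  # not opted in, so a reconnect never re-runs it
def visit_once() -> None:
    ...
```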

Though, it would still be better if we could somehow pass the reconnection attempt through the FastAPI middleware again as if it were an HTTP request, while taking care not to re-execute the entire page function (which would defeat do_this_once_and_once_only()). Then we don't need 2 sets of authorization systems.

@falkoschindler
Contributor

Ah, I'm finally realizing that you are not reconnecting without losing any messages. You just short-circuit the HTTP request that would be needed to get a fresh page. The following code would restart from 0 (as if you had reloaded):

@ui.page('/')
def index():
    n = ui.number(value=0)
    ui.timer(1.0, lambda: n.set_value(n.value + 1))

So you're enforcing a handshake right on disconnect and re-creating the client if it has already been pruned. The code duplication feels a bit fragile though, and the problem with lost parameters like request remains. It would be so much easier if the client would still be around. Bringing me to the thought: Why don't you simply increase the reconnect_timeout? What would be the disadvantage of setting it to something like one, two or five minutes?

@evnchn
Collaborator Author

evnchn commented Apr 1, 2025

"Ah, I'm finally realizing..."

Yes. What you said in the last comment is true. You finally get the gist of the ACID approach. I like how even the library maintainer read it wrong the first time. Truly goes to show how mind-bending this PR is (or maybe my communication skills are bad, which they are 😅). That said, we should be very careful as we proceed to communicate and navigate this PR.

"Why don't you simply increase the reconnect_timeout"

As mentioned, this would mean we are keeping many clients in memory. As Python GC times tend to exhibit at least logarithmic growth with regards to objects in memory (I think it's more like linear or superlinear if you ask me. Definitely not constant), this would incur a risk that, if we spam requests to the server, GC takes up most of the time, rendering the server useless. This happened during my stress testing.

In retrospect, we can probably implement some sort of rate limiter (Think: https://github.com/long2ice/fastapi-limiter) such that it is possible to stop that from happening and have a long reconnect_timeout, but to what end? If my client is expected to go offline for 10 minutes, then we keep 10 minutes worth of clients in Client.instances by setting reconnect_timeout = 600? The ACID approach scales better, if we want to enable clients which get disconnected for a long time (think: PWAs, Yes I am implementing PWAs in my project, so yeah more viewpoints more better)
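As a back-of-the-envelope estimate of what "10 minutes worth of clients" means: assuming every incoming page request creates one client that is kept for the full reconnect_timeout (the scenario described above, taken at face value), the number of clients held is roughly the request rate times the timeout. The request rate of 50 per second below is a made-up figure for illustration, not a measurement.

```python
def clients_held(requests_per_second: float, reconnect_timeout_s: float) -> int:
    """Rough steady-state count of clients kept alive waiting for a reconnect."""
    return int(requests_per_second * reconnect_timeout_s)


# Hypothetical load of 50 page requests per second:
short_timeout = clients_held(50, 30)   # 30-second timeout
long_timeout = clients_held(50, 600)   # reconnect_timeout = 600
```

Under this assumption, raising the timeout from 30 to 600 seconds multiplies the number of live clients by 20.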

"The code duplication feels a bit fragile though"

It's the same code as page.py. We could refactor, but the logic is (mostly) the same. You can even see the signature task = background_tasks.create(wait_for_result()). I understand the ACID maneuver is fragile, but I don't understand why the duplicated code part is fragile. In fact, if you ask me, it's the least fragile bit of this PR, being mostly copied from known-working code.

"lost parameters like request remains"

More pressingly, the arguments passed in via the GET parameters are gone.
With async def async_page(arg1: str = 'default', arg2: int = 0): you can see that, after the ACID maneuver, arg1 and arg2 fall back to their default values. I am not sure what the request argument would do in a Client, though, but we sure are losing it after the ACID maneuver.

For that, I recommend either:

  • Ban the use of such parameters outright, if the page is marked with @ui.page(acid=True). At least, throw a warning that says "hey those parameters are dropped in the reconnection!"
  • Always store such parameters in, say, app.storage.user (expecting the *args and **kwargs to be JSON-serializable anyways, not sure about request though), and in the ACID maneuver, we recall the parameters from app.storage.user
  • For the GET parameters, the client can include them into the Socket.IO handshake, much like how we now include the path of the page in the handshake for the ACID maneuver to happen.
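For the last option, a sketch of how reported GET parameters could be turned back into keyword arguments for the page function. `kwargs_from_query` is a hypothetical helper; it only keeps names the function actually declares and only converts to `int`, `float` or `str`, which sidesteps most of the serialization questions rather than solving them.

```python
import inspect
from typing import Any, Callable, Dict
from urllib.parse import parse_qsl


def kwargs_from_query(func: Callable, query: str) -> Dict[str, Any]:
    """Map a raw query string onto the parameters declared by `func`."""
    raw = dict(parse_qsl(query))
    kwargs: Dict[str, Any] = {}
    for name, param in inspect.signature(func).parameters.items():
        if name not in raw:
            continue  # parameter keeps its default value
        convert = param.annotation if param.annotation in (int, float, str) else str
        kwargs[name] = convert(raw[name])
    return kwargs


def example_page(arg1: str = 'default', arg2: int = 0) -> None:
    """Same signature as async_page in the testing code, minus the UI."""
```

Calling `kwargs_from_query(example_page, 'arg1=hello&arg2=42')` gives keyword arguments that can be passed to the page function when it is re-run; unknown names in the query string are ignored.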

@evnchn
Collaborator Author

evnchn commented Apr 1, 2025

If you feel that the ACID approach is still a bit sketchy, we may have to resort to marking this approach as a beta feature, and potentially excluding the use of this approach from the security policy (meaning there won't be any support on security, and if you use this new feature and get hacked, you're on your own)

But let's try not to do that, and polish this PR a bit more first. More minds on the PR more better 🧠

@rodja
Member

rodja commented Apr 2, 2025

this PR can be useful in these following practical ways.
a. Enter an elevator for > 30 seconds, want the page to load in without a browser reload
b. Enables a global refresh of the page as powerful as encapsulating the page in a @ui.refreshable without browser reload
c. Rather importantly, in the case of being spammed HTTP requests, the old 30-second timeout for each client means that we are keeping 30-seconds worth of clients in memory, which slows down GC to the point it takes tens of seconds to minutes. With the ACID approach, I see that we probably won't need such a long reconnect timeout, thus we mitigate this old avenue of DDOS risk.

All your listed benefits state that users do not want a browser reload. But why? And ACID is also doing a kind of reload. So what's the benefit of not doing a standard reload?

@falkoschindler
Contributor

What I mean with "fragile" is that we now need to keep page.py and nicegui.py in sync when changing the code in the future. We can certainly refactor it to avoid code duplication. And this might help to clarify what this code is actually doing, namely creating a new client instance based on certain information.
In order to do that - creating a client based on certain information -, you need this information, which is basically a recipe. This brings me to parameters like request: We could keep them together with the corresponding page function (maybe even using partial to combine everything in one callable) in a dictionary so that it only needs to be called in an ACID case.
And when thinking about storing a dictionary of recipes, one could argue to simply store client instances (i.e. increasing reconnect_timeout). It's hard (for me) to estimate the cost and benefit of one over the other approach.
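A minimal sketch of such a recipe dictionary, using `functools.partial` to bind the page function to the arguments of the original request. The names `recipes`, `remember_recipe` and `rebuild` are hypothetical, and the page function just returns its labels so the example runs without NiceGUI.

```python
from functools import partial
from typing import Callable, Dict, List

# Hypothetical store: client_id -> callable that rebuilds the page content.
recipes: Dict[str, Callable[[], List[str]]] = {}


def remember_recipe(client_id: str, page_func: Callable[..., List[str]], *args, **kwargs) -> None:
    """Store the page function together with the arguments of the first request."""
    recipes[client_id] = partial(page_func, *args, **kwargs)


def rebuild(client_id: str) -> List[str]:
    """ACID case: call the stored recipe instead of keeping the client around."""
    return recipes[client_id]()


def async_page_content(arg1: str = 'default', arg2: int = 0) -> List[str]:
    return [f'arg1: {arg1}', f'arg2: {arg2}']
```

A recipe is much smaller than a live client with all its elements, but whatever the arguments reference (a request object, for instance) stays in memory for as long as the recipe does.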

To summarize: I'm still trying to simplify the concept, or at least my understanding of it.

But like @rodja asked: Why not do a normal reload? Sure, after a short disconnect you might want to avoid a big(?) HTTP request. This is what reconnect_timeout is for. But after 10 minutes? Then a proper reload shouldn't be much of a problem. With efficient caching, there shouldn't be much data to be transferred.

@evnchn
Collaborator Author

evnchn commented Apr 2, 2025

I think your understanding of this new approach has been correct since the comment beginning with "Ah".

The code refactoring sounds like a good idea, such that the logic for creating the client according to the "recipe" remains the same.

Now to answer the big questions:

  • [Why not just reload the page?] The thing with reloading via the browser is, since NiceGUI has the content in body script, from the moment the loading begins until it ends, the page remains blank. This will change with SSR in "Server-Side Rendering by invoking NodeJS, Vue and Quasar for the HTML" #4557, but that's an even more radical change than this PR.
    • Understandably, for properly hosted sites like https://nicegui.io, time for reload is not a concern
    • But for other sites with a single server, clients from distant locations will certainly feel the heat. Especially since it's TCP under the hood, involving the 3-way handshake. Not to mention proxies such as Cloudflare muddying the water in that aspect.
    • Look at the loading speeds of some NiceGUI websites out in the wild at "(Incomplete) list of NiceGUI websites indexed on Google / Bing / Yandex" #4506: (1) they are not fast in the first place, and (2) there is at least one website which turns out to be located far from you and loads slowly. Assuming you are in Germany, your experience would be similar to mine, and the slowest of the bunch is the website from China.
    • ACID helps mitigate this, since the page does not get reloaded, so the TCP handshake can take its time for the Socket.IO, while the user can still see something in the old page (also applies to page navigation, if I have time to get ui.navigate.soft_to() working)
    • "Why don't they eat meat?" (said by Emperor Hui of Jin when told that his people didn't have enough rice to eat)
  • [Value of the ACID PR beyond what's on the tin] In my opinion, ACID isn't the full picture. It is best combined with the following future PRs:
    • To reduce load on the origin server, it is often desirable to cache the pages at the proxy server (think: Cloudflare). Right now that's impossible, since each page has a baked-in client_id, which must be correct or the page just wouldn't load. ACID enables aggressive caching of the page (think: public, max-age=31536000, immutable on the page itself), such that the origin server just never receives any HTTP request, and only handles updates via Socket.IO
    • Since the ACID reload is so thorough, should I get ui.navigate.soft_to() working, it could be a great contender for the API to build SPAs on. Think: "Introduce API to build Single Page Applications (SPAs)" #2811 potentially using ACID for the underlying logic.

I may be thinking too many steps ahead here, but I still think there is value in getting this PR working, for servers with less-than-ideal connection quality, as well as paving the way for future PRs.

@rodja
Member

rodja commented Apr 2, 2025

The thing with reloading via the browser is, since NiceGUI has the content in body script, from the moment the loading begins until it ends, the page remains blank

Same is true for ACID. You need to fully clear the page and load elements anew. Otherwise you can not be sure to have the same state between frontend and backend.

clients from distant locations will certainly feel the heat. Especially since it's TCP under the hood, involving the 3-way handshake.

"Clients from distant locations" are always a problem with NiceGUI (and other interactive frameworks). Each interaction needs to be sent via websocket and so the whole page feels laggy if you have a roundtrip of more than say 60 ms. So I would argue that this is not a supported use case for NiceGUI in general.

ACID helps mitigate this, since the page does not get reloaded

But you need to do a full reload with ACID. I am sceptical about an HTTP round trip being measurably slower than a socketio handshake (which is also happening via HTTP).

ACID enables aggressive caching of the page

But after loading, the whole cached content is replaced by the new content sent via socketio. There is only some stuff in the header which can be cached. And that might very well be done better without ACID (for example by putting as much of it as possible in a separate file).

Since the ACID reload is so thorough, should I get ui.navigate.soft_to() working, it could be a great contender for the API to build SPAs on

The main benefit of #2811 is that it only replaces the parts of the page which change. ACID always needs to evaluate and render everything anew.

Sorry for pushing back so hard. But we need to think about NiceGUI's maintainability and ease of use. I still do not see a clear benefit of ACID which would justify the hardship it brings with it. To conclude with something positive: there might be something like "Backend-less NiceGUI" hiding behind ACID. I think of a small (separate) project which uses NiceGUI but does not establish a socket communication at all -- and hence could be used for serving static, cacheable pages written in Python.

@evnchn
Collaborator Author

evnchn commented Apr 2, 2025

First of all, as the library maintainer, you can reject the PR if you think it is too tough to work on it. There is no need to say sorry.

However, I still see some misconceptions, and I'd like to get the point fully across for your most accurate judgement.

TL-DR 1: The ACID maneuver is NOT fast! It could possibly be slower than a plain-old HTTP load! However, since we are not invoking the browser, the ACID maneuver can take as long as it needs, but we will not get a flashing white screen, not even for a millisecond.
TL-DR 2: By carefully placing contents before and after await ui.context.client.connected(), we force the unchanged content to be flushed and served over HTTP, while changed content is served over Socket.IO. Given that we don't need a valid client_id per page connection, it is true that ACID makes caching possible


Have you checked out the code in this PR under adverse network conditions? You can set the network bandwidth and request latency in custom profiles in the Network tab in DevTools as (reasonably) slow as you want, but there will still be no flashing white screen when the ACID maneuver takes place. The content is just loaded in a snap 🫰, always.

Not only that, you can try making the page content longer. The effect is more obvious then.

I apologize for being unable to screen-record on my other computer right now, but if you want, perhaps I can do a video demo?


Misconceptions clarification

Same is true for ACID. You need to fully clear the page and load elements anew. Otherwise you can not be sure to have the same state between frontend and backend.

I believe not. With the ACID approach, the web page is not reloaded; instead, the script handshakes with the server to obtain the latest content. Consider the following workflow of a NiceGUI page load:

  • Browser makes an HTTP request to index.html (3-way handshake)
  • Browser loads content in <head> and <body> (no content in <body>, page is painted white)
  • Browser parses script in <body>
  • nicegui.js parses elements <- We are skipping ahead to here via ACID, involving a 3-way handshake too!
  • Vue.js updates VDOM
  • Browser renders new DOM (content is displayed)

Notably:

  1. We are jumping ahead quite a lot, especially the browser's part.
  2. We do not invoke the browser's reload logic
  3. Skipping the browser is good, since a browser engine is full of overhead (sandboxing, killing and spawning a new process, restart rendering pipeline, etc...)
  4. Browser never paints the page white. Not even for a millisecond.
  5. State between frontend and backend remains consistent, since the ACID handshake provides the latest list of elements

I am sceptical about an HTTP round trip being measurably slower than a socketio handshake (which is also happening via HTTP)

If you are comparing the round trip time between HTTP and a SocketIO handshake and come to the conclusion that the ACID approach is not worth it, then you have misunderstood the point.

Point is:

  • SocketIO is most definitely slower than HTTP, with WebSocket basically being an HTTP response with code 101 (Switching Protocols), not to mention an additional handshaking step by SocketIO.
  • But, the beauty is, the browser reload logic is never invoked (it never fetches index.html and loads the page anew) so the ACID maneuver can take as long as it needs to take, but the user will not see a flashing white page. Not even for a millisecond.

ACID cannot enable aggressive caching of the page, since after loading, the whole cached content is replaced by the new content sent via socketio

That would be right under current implementation, but note the following:

  • What's to say we stick to the status quo in terms of what we serve in the HTML part? We could straight up not include any elements, and serve a minimal page which does nothing but supply a purely random client UUID for the ACID maneuver, expecting all elements to come in via Socket.IO. Think:
      // passing in {} for ACID
      const app = createApp({}, {
        version: "2.13.0",
        prefix: "",
        query: {'client_id': createRandomUUID(), 'next_message_id': 0},
        transports: ['websocket', 'polling'],
        quasarConfig: {"brand":{"primary":"#5898d4"},"loadingBar":{"color":"primary","skipHijack":false}},
      });

That would literally be the "super tiny, generic html page which establishes a websocket connection" @falkoschindler asked for in #1539 (comment)

  • Additionally, I think it's perfectly fine to serve a page like the following, provided you are using the caching trick on a page which (1) gets visited a lot, (2) is bulky to send, and (3) has a few places where fresh data is pulled in (like the homepage):
from nicegui import ui

@ui.page("/count_nicegui_pr_from_evnchn")
async def count_nicegui_pr_from_evnchn():
    # a bunch of fancy UI stuff above, which doesn't change with data
    number_of_PRs_evnchn_made_in_nicegui = ui.label("Loading...")
    # we do this so that `Client.build_response` flushes the update, so that the HTML page says "Loading..." after nicegui.js kicks in but before the Socket.IO connection
    await ui.context.client.connected()
    # this next update will then be served over Socket.IO
    # (get_PR_count is a placeholder helper that is not defined here)
    number_of_PRs_evnchn_made_in_nicegui.set_text(get_PR_count("zauberzeug/nicegui", "evnchn"))  # sets text "alot"

Then, the page which is cached by Nginx / Cloudflare / etc. will have the same consistent but generic content (cache 99%), and the WebSocket message will just be a single update on the element number_of_PRs_evnchn_made_in_nicegui (serve live 1%)

Note that:

  • Attempting to do any of the above without ACID will always fail, since the browser lacks a valid client_id to communicate with the server. The handshake will always return false, and the browser will continuously reload.

Response to other points:

clients from distance, backend-less NiceGUI

Backend-less NiceGUI would probably be a pain, because we would need to implement the reactivity in JavaScript (or use Brython?). However, humans just want immediate visual feedback when they click a button. They don't need to see data for that. That's why, if you check #4451, you see that my approach to a card click button which navigates to another page was not to use JavaScript to clear the page and populate new content, but merely to give the button opacity-50 so that it feels like it has been clicked.

I have thought of implementing such things (.on('click', visual_effect.click)), but thought the impact was too minor (that trick was just scratching the surface of what I did to make my server feel fast despite network latency), that probably not many people would use it, and that a customized solution would be better.

As for the 3-way handshake taking unexpectedly long: with the ACID approach, we can add extra animations and text like "Loading, please wait...". In a browser reload, by contrast, all you see while the browser does its thing is a blank white page, which cannot be configured because it is hardcoded into Chrome/Firefox/etc.

@evnchn
Collaborator Author

evnchn commented Apr 2, 2025

"Backend-less NiceGUI"

Hmm: I am now thinking either @ui.page(client_connection=False) or a special return value for a page function (return NO_CONNECTION_NEEDED), which if set, makes the frontend totally not try to establish a WebSocket connection.

Then, such a page can be safely cached by proxies, because there is no connection to be made, and there won't be any concerns of missing client_id.

Of course, this would mean no reactivity, or at least all reactivity is done via JavaScript on the browser side....

@evnchn
Collaborator Author

evnchn commented Apr 4, 2025

Brainstorming...

So the thing with ACID is that we're creating a new client for the socket.io handshake, because the old one is deleted.

But then, why delete the old client? Perhaps the client can be persistent, and shared, much like the auto-index page?

So, while this PR is still in consideration, I'll explore ways to leverage the existing auto-index page for a more seamless reconnection experience.

Meanwhile, I think, potentially, having a single shared client for the pages defined using @ui.page might be a good idea. That can be in another separate PR.
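The trade-off raised in this comment (deleting a client once it has been disconnected for longer than the reconnect timeout, versus keeping it around) can be sketched as follows. This is a rough illustration under stated assumptions, not NiceGUI's implementation: `disconnected_at`, `prune` and the `persistent` flag are hypothetical names, and the timeout value is chosen only for the example.

```python
from typing import Optional

RECONNECT_TIMEOUT = 3.0  # seconds; assumed value for illustration

# Hypothetical registry: client_id -> time of disconnect (None while connected).
disconnected_at: dict[str, Optional[float]] = {}


def prune(now: float, persistent: bool = False) -> list[str]:
    """Delete clients that stayed disconnected longer than the timeout.

    With persistent=True nothing is deleted, which is the idea floated above.
    Returns the IDs of the clients that were removed.
    """
    if persistent:
        return []
    stale = [cid for cid, t in disconnected_at.items()
             if t is not None and now - t > RECONNECT_TIMEOUT]
    for cid in stale:
        del disconnected_at[cid]
    return stale
```

Every ID returned by `prune` is a browser tab that would fail a strict handshake on its next reconnect; a persistent client avoids that at the cost of keeping its state in memory.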

@falkoschindler falkoschindler requested a review from rodja April 7, 2025 08:36
@rodja rodja marked this pull request as draft April 7, 2025 08:37
@evnchn
Collaborator Author

evnchn commented Apr 11, 2025

Timeline-wise, should we want to do this, it definitely must be in a 3.X release.

Slipping such a big change in 2.X seems totally wrong.

Even if this functionality can be totally disabled, expectations have to be adjusted with this PR, and it's a big feature, not your run-of-the-mill new UI element.

@rodja rodja removed their request for review April 14, 2025 09:37
@falkoschindler
Contributor

Regarding the white flash: This seems to be a common problem (FOUC) and could be addressed by the way <style> is loaded.
https://stackoverflow.com/questions/71799083/white-flash-on-dark-mode-on-refreshing-page
Maybe this path is worth exploring - independent of ACID - and could fix the core problem of reloading pages.

@evnchn
Collaborator Author

evnchn commented Apr 14, 2025

I think the white flash (FOUC) is a solved problem.

That is because the only thing that can show through is the blank page (recall that all UI elements are rendered via JS, so nothing is there in the beginning), and the only styling that matters for it (excluding crazy tricks such as :before) is the background color.

Thus, it would suffice if whoever sets the custom background color also included the following in their code:

https://github.com/zauberzeug/nicegui/wiki/Advanced-operations-and-techniques#combat-page-color-flash-when-using-non-white-page-color
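As a rough sketch of that kind of fix (the exact snippet in the wiki may differ), the background color can be put into a `<style>` tag that ends up in `<head>`, so it applies before any JavaScript runs. `background_style` is a hypothetical helper and the color is an arbitrary example; the block only builds the string so that it runs without NiceGUI installed.

```python
def background_style(color: str) -> str:
    """Build a <style> tag that sets the page background color."""
    return f"<style>body {{ background-color: {color}; }}</style>"


snippet = background_style("#121212")
# In a NiceGUI app this string would be passed to ui.add_head_html(snippet),
# so the color is already part of <head> when the browser first paints.
```

Because the rule lives in `<head>`, the browser never paints the default white background for a page that is meant to be dark.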


Though, here are 2 ideas with regard to NiceGUI doing some of the heavy lifting:

  1. If dark mode is set via NiceGUI, auto-include the said add_head_html code (would this affect switching in and out of dark mode, though?)
  2. If there is something like ui.query('body').style(), can we put the style in the head?

But I'm not sure that's worth the pursuit. It seems like it would be quite a lot of trouble and would need significant rework.

Also, manually inserting the code has the benefit of working even with older NiceGUI versions.

Labels
🌳 advanced (Difficulty: Requires deep knowledge of the topic), analysis (Status: Requires team/community input), feature (Type/scope: New feature or enhancement)