Proposal: Replacing ICU with ztd.text or encoding_rs #45389
Comments
If this is not worth
I don't know (yet) if it's worth, but it's probably too early to bring this to the
Small correction: in Bun's case,
Thanks @Jarred-Sumner. I just updated the description.
Possible small positive side effect of doing what is proposed here: There is at least one place in the code where we use ICU for converting between UTF-8 and UTF-16 even though that part of the code doesn't have any internationalization needs. It's blocking #37954, so this could also help with that, I suppose.
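To illustrate why UTF-8 to UTF-16 conversion has no internationalization needs, here is a minimal sketch of a strict transcoder that uses only bit arithmetic and no locale data. The function name `utf8ToUtf16` is hypothetical and this is not the code path Node.js uses; a production version would more likely rely on an optimized library.

```typescript
// Sketch only: strict UTF-8 -> UTF-16 code unit conversion with no ICU
// and no locale tables. `utf8ToUtf16` is a hypothetical name.
function utf8ToUtf16(bytes: Uint8Array): Uint16Array {
  const out: number[] = [];
  let i = 0;
  while (i < bytes.length) {
    const lead = bytes[i];
    let codePoint: number;
    let extra: number; // number of continuation bytes expected
    let min: number;   // smallest code point allowed for this length
    if (lead < 0x80) {
      codePoint = lead; extra = 0; min = 0;
    } else if (lead >= 0xc2 && lead <= 0xdf) {
      codePoint = lead & 0x1f; extra = 1; min = 0x80;
    } else if (lead >= 0xe0 && lead <= 0xef) {
      codePoint = lead & 0x0f; extra = 2; min = 0x800;
    } else if (lead >= 0xf0 && lead <= 0xf4) {
      codePoint = lead & 0x07; extra = 3; min = 0x10000;
    } else {
      throw new RangeError(`invalid lead byte at offset ${i}`);
    }
    if (i + extra >= bytes.length) {
      throw new RangeError(`truncated sequence at offset ${i}`);
    }
    for (let k = 1; k <= extra; k++) {
      const b = bytes[i + k];
      if ((b & 0xc0) !== 0x80) {
        throw new RangeError(`invalid continuation byte at offset ${i + k}`);
      }
      codePoint = (codePoint << 6) | (b & 0x3f);
    }
    // Reject overlong forms, surrogate code points, and values past U+10FFFF.
    if (codePoint < min || codePoint > 0x10ffff ||
        (codePoint >= 0xd800 && codePoint <= 0xdfff)) {
      throw new RangeError(`invalid code point at offset ${i}`);
    }
    if (codePoint < 0x10000) {
      out.push(codePoint);
    } else {
      // Split supplementary code points into a surrogate pair.
      const v = codePoint - 0x10000;
      out.push(0xd800 | (v >> 10), 0xdc00 | (v & 0x3ff));
    }
    i += extra + 1;
  }
  return Uint16Array.from(out);
}

// Bytes for "A", "é", "€", and U+1F600.
const units = utf8ToUtf16(
  new Uint8Array([0x41, 0xc3, 0xa9, 0xe2, 0x82, 0xac, 0xf0, 0x9f, 0x98, 0x80]),
);
console.log(Array.from(units).map((u) => u.toString(16)).join(" "));
```

The whole conversion is table-free, which is why a dedicated transcoding library could in principle replace ICU for this particular use.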
I don't know
I've worked quite a bit with WTF. It's great but not easy to use outside Chromium's source tree. It's supposed to be standalone but it has at least a partial dependency on base/. WebKit's fork is probably even worse, I could never gather up enough courage to even try: https://github.com/WebKit/WebKit/blob/main/Source/WTF
UTF-8 encoding is now fast thanks to recent developments. Can we consider
What encodings are we talking about? Can you enumerate them?
Any
I suppose? The tradeoff is performance vs. maintenance. A library like Cross-compiling to WASM is an option but means no sharing of code or data. Each thread gets its own copy, and that's probably quite substantial for conversion tables. You'd have to measure it.
I'd be happy to give this a shot. Would anybody want to help/guide me throughout the process?
Happy to field questions. What direction do you plan on taking?
Wouldn't that require copying all inputs and outputs from the JS heap to WebAssembly linear memory? That, plus the performance difference between WebAssembly and native, might diminish any performance benefits that this issue is hoping for.
Copy data in/out: yes, but it may be cheap enough relative to the cost of conversion. Only way to know is to measure.
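One rough way to take that measurement is sketched below. It compares the time to copy an input buffer against the time to decode it with the built-in `TextDecoder`. Several things here are assumptions: the helper names `timeMs` and `compareCopyAndDecode` are hypothetical, a plain `Uint8Array` stands in for WebAssembly linear memory, and the native decoder stands in for whatever conversion a WASM build would perform, so the numbers only hint at the relative cost.

```typescript
// Sketch only: average wall-clock time per call of `fn`, in milliseconds.
function timeMs(fn: () => void, iterations: number): number {
  const start = performance.now();
  for (let i = 0; i < iterations; i++) fn();
  return (performance.now() - start) / iterations;
}

// Hypothetical helper: compares the cost of copying `input` with the cost
// of decoding it. The destination buffer is an assumption standing in for
// WebAssembly linear memory.
function compareCopyAndDecode(
  input: Uint8Array,
  iterations = 50,
): { copyMs: number; decodeMs: number } {
  const linearMemoryStandIn = new Uint8Array(input.length);
  const decoder = new TextDecoder("utf-8");
  let decodedLength = 0; // accumulated so the decode result is actually used

  const copyMs = timeMs(() => {
    linearMemoryStandIn.set(input);
  }, iterations);

  const decodeMs = timeMs(() => {
    decodedLength += decoder.decode(input).length;
  }, iterations);

  if (decodedLength < 0) throw new Error("unreachable");
  return { copyMs, decodeMs };
}

const sample = new TextEncoder().encode("héllo wörld ".repeat(100000));
const { copyMs, decodeMs } = compareCopyAndDecode(sample);
console.log(`copy: ${copyMs.toFixed(3)} ms, decode: ${decodeMs.toFixed(3)} ms`);
```

Results will vary with input size, content (ASCII-only vs. multi-byte), and Node.js version, so it is worth running this over several representative inputs before drawing conclusions.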
Bun started using this package for certain paths: https://github.com/simdutf/simdutf
That could work. The fact it has an amalgamation speaks in its favor, otherwise we're on the hook for maintaining a gyp build. A thing to keep in mind with SIMD is that the numbers can look great in isolation but turn out slower in real-world application. As with all things performance: you have to measure it.
Note that simdutf is now part of Node.js, so this should make such work more relevant?
Am I understanding correctly that
I've been mainly working on the TextDecoder performance gains for the past couple of weeks. It seems that `ICU`, even though it is required for v8 `Intl`, is slow for UTF-8 encoding & decoding.
I recommend either adding `ztd.text` or `encoding_rs` with C++ bindings as a dependency and improving the performance of TextDecoder & TextEncoder, which will improve a lot of applications worldwide.
Deno uses `encoding_rs` and Bun uses a custom implementation.
Some good references: