C++20 coroutines + Native webassembly promise integration #20413
Comments
Yes, this is something I'd like to do. See emscripten/promise.h for the C API we have for interfacing with JS promises, including via JSPI. The next steps would be to create a C++11 wrapper API to take care of the resource management, then a C++20 coroutine API on top of that. Is this something you would be interested in working on? I'd be happy to review patches if so.
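To make the shape of that concrete, here's a rough sketch of what those two layers could look like - just an illustration, not the planned implementation. The C calls are paraphrased from `emscripten/promise.h` (worth double-checking against the header), and the C++ names (`promise`, `promise_awaiter`) are hypothetical:

```cpp
#include <emscripten/promise.h>
#include <coroutine>
#include <utility>

// Layer 1 (C++11): RAII wrapper owning the C promise handle.
class promise {
 public:
  promise() : handle_(emscripten_promise_create()) {}
  ~promise() {
    if (handle_) emscripten_promise_destroy(handle_);
  }
  promise(promise&& other) noexcept
      : handle_(std::exchange(other.handle_, nullptr)) {}
  promise(const promise&) = delete;

  void resolve(void* value) {
    emscripten_promise_resolve(handle_, EM_PROMISE_FULFILL, value);
  }
  em_promise_t get() const { return handle_; }

 private:
  em_promise_t handle_;
};

// Layer 2 (C++20): a minimal awaiter that suspends the coroutine and
// resumes it from the promise's fulfillment callback. Error handling
// and ownership of the promise returned by _then are omitted for brevity.
struct promise_awaiter {
  em_promise_t handle;
  void* value = nullptr;

  bool await_ready() const noexcept { return false; }
  void await_suspend(std::coroutine_handle<> coro) {
    struct ctx {
      promise_awaiter* self;
      std::coroutine_handle<> coro;
    };
    auto* c = new ctx{this, coro};
    emscripten_promise_then(
        handle,
        // A captureless lambda converts to the C callback type.
        [](void**, void* data, void* val) -> em_promise_result_t {
          auto* c = static_cast<ctx*>(data);
          c->self->value = val;
          auto coro = c->coro;
          delete c;
          coro.resume();
          return EM_PROMISE_FULFILL;
        },
        nullptr, c);
  }
  void* await_resume() const noexcept { return value; }
};
```

A full coroutine API would additionally need a `promise_type` so that such coroutines can themselves return a promise to their callers.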
Ha, I actually literally implemented this last week for Embind and was going to send a PR next week. Just randomly saw this issue.
Btw, I saw that experimental API too, but it's pretty low-level, and it's kind of a shame that it uses its own list of promises and handles, as that makes it harder to integrate with Embind's handle system. I decided to go with Embind, as it's more useful for complex JS interactions. I'd be happy to chat more about it, e.g. on Discord.
Note that JSPI is orthogonal to / unnecessary for coroutines. JSPI, like Asyncify, is useful for pausing the entire program, whereas with coroutines all the transformation magic happens at compile time and only the local coroutine itself is paused, so the Wasm engine doesn't need to know about promises and pausing.
Yeah, that's why I implemented mine via Embind instead - it supports passing promises to and from JS. I'll try to submit a PR soon.
This adds support for `co_await`-ing Promises represented by `emscripten::val`. The surrounding coroutine should also return `emscripten::val`, which will be a promise representing the whole coroutine's return value.

Note that this feature uses LLVM coroutines and so doesn't depend on either Asyncify or JSPI. It doesn't pause the entire program, only the coroutine itself, so it serves somewhat different use cases even though all those features operate on promises. Nevertheless, if you are not implementing a syscall that must behave as if it were synchronous, but instead simply want to await some async operations and return a new promise to the user, this feature will be much more efficient.

Here's a simple benchmark measuring runtime overhead from awaiting a no-op Promise repeatedly in a deep call stack:

```cpp
using namespace emscripten;

// clang-format off
EM_JS(EM_VAL, wait_impl, (), {
  return Emval.toHandle(Promise.resolve());
});
// clang-format on

val wait() { return val::take_ownership(wait_impl()); }

val coro_co_await(int depth) {
  co_await wait();
  if (depth > 0) {
    co_await coro_co_await(depth - 1);
  }
  co_return val();
}

val asyncify_val_await(int depth) {
  wait().await();
  if (depth > 0) {
    asyncify_val_await(depth - 1);
  }
  return val();
}

EMSCRIPTEN_BINDINGS(bench) {
  function("coro_co_await", coro_co_await);
  function("asyncify_val_await", asyncify_val_await, async());
}
```

And the JS runner, also comparing with a pure-JS implementation:

```js
import Benchmark from 'benchmark';
import initModule from './async-bench.mjs';

let Module = await initModule();
let suite = new Benchmark.Suite();

function addAsyncBench(name, func) {
  suite.add(name, {
    defer: true,
    fn: (deferred) => func(1000).then(() => deferred.resolve()),
  });
}

for (const name of ['coro_co_await', 'asyncify_val_await']) {
  addAsyncBench(name, Module[name]);
}

addAsyncBench('pure_js', async function pure_js(depth) {
  await Promise.resolve();
  if (depth > 0) {
    await pure_js(depth - 1);
  }
});

suite
  .on('cycle', function (event) {
    console.log(String(event.target));
  })
  .run({async: true});
```

Results with regular Asyncify (I had to bump up `ASYNCIFY_STACK_SIZE` to accommodate said deep stack):

```bash
> ./emcc async-bench.cpp -std=c++20 -O3 -o async-bench.mjs --bind -s ASYNCIFY -s ASYNCIFY_STACK_SIZE=1000000
> node --no-liftoff --no-wasm-tier-up --no-wasm-lazy-compilation --no-sparkplug async-bench-runner.mjs
coro_co_await x 727 ops/sec ±10.59% (47 runs sampled)
asyncify_val_await x 58.05 ops/sec ±6.91% (53 runs sampled)
pure_js x 3,022 ops/sec ±8.06% (52 runs sampled)
```

Results with JSPI (I had to disable `DYNAMIC_EXECUTION` because I was getting "RuntimeError: table index is out of bounds" in random places depending on optimisation mode - JSPI miscompilation?):

```bash
> ./emcc async-bench.cpp -std=c++20 -O3 -o async-bench.mjs --bind -s ASYNCIFY=2 -s DYNAMIC_EXECUTION=0
> node --no-liftoff --no-wasm-tier-up --no-wasm-lazy-compilation --no-sparkplug --experimental-wasm-stack-switching async-bench-runner.mjs
coro_co_await x 955 ops/sec ±9.25% (62 runs sampled)
asyncify_val_await x 924 ops/sec ±8.27% (62 runs sampled)
pure_js x 3,258 ops/sec ±8.98% (53 runs sampled)
```

So this approach is much faster than regular Asyncify and on par with JSPI.

Fixes emscripten-core#20413.
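To give a feel for the usage this enables, here's a hypothetical example, not taken from the PR itself (`fetchJson` and the use of `fetch` are made up for illustration): a coroutine can interleave awaits on arbitrary JS promises and hand a single promise back to JS.

```cpp
#include <emscripten/bind.h>
#include <emscripten/val.h>
#include <string>

using namespace emscripten;

// Awaits fetch() and response.json() without blocking the program;
// the JS caller just sees a regular Promise for the final result.
val fetch_json(std::string url) {
  val response = co_await val::global("fetch")(url);
  val json = co_await response.call<val>("json");
  co_return json;
}

EMSCRIPTEN_BINDINGS(example) {
  function("fetchJson", &fetch_json);
}
```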
See #20420.
Ah right, I guess those are two slightly different issues - adding coroutine support for `emscripten::val` promises vs. for the C API in emscripten/promise.h.
Sometimes I think about an extension or plugin API - for Embind, for the JSPI API, etc. Also, I'd prefer to make my own extension package.
FWIW, I know I mentioned it before, but if you're referring to emscripten/promise.h and Embind: both of those libraries can use JSPI to await promises when compiled with `-sASYNCIFY=2`. That is, I don't want to discourage you, just wanted to clarify, because it sounds like you might think it's a low-level API for JSPI. They are both high-level libraries implemented in JavaScript and exposing C/C++ bindings - one works on JS promises created from C/C++ and the other works on JS values (including promises) received from JS.
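For contrast, here's a minimal sketch of the blocking Embind route described above (the `fetch_blocking` name is made up); unlike the coroutine version from #20420, `val::await()` pauses the whole program, which is why it needs Asyncify or JSPI:

```cpp
#include <emscripten/val.h>
#include <string>

using emscripten::val;

// Blocking style: requires building with -sASYNCIFY (or -sASYNCIFY=2
// for JSPI), because the entire program is paused while awaiting.
val fetch_blocking(std::string url) {
  return val::global("fetch")(url).await();
}
```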
Meanwhile... I'm getting an error when trying to use the promise C API with `-sASYNCIFY=2`.
My makefile:

```make
____:
	EMCC_DEBUG=1 $(CC) -I./include \
	  -I ../../src/cxx/ \
	  -c ./test.cpp \
	  -std=c++23 -sASYNCIFY=2 \
	  -DHALF_ENABLE_CPP11_CFENV=false \
	  -sNO_DISABLE_EXCEPTION_CATCHING \
	  -sDEMANGLE_SUPPORT=1 -sASSERTIONS -frtti \
	  -Wno-limited-postlink-optimizations \
	  -sALLOW_TABLE_GROWTH=1 \
	  -O0 -msimd128 --no-entry -sRESERVED_FUNCTION_POINTERS=1 --target=wasm64
	EMCC_DEBUG=1 $(CC) -g ./test.o -o ./test.js \
	  -I ../../src/cxx/ \
	  -std=c++23 -sASYNCIFY=2 \
	  -Wno-limited-postlink-optimizations \
	  -O0 -msimd128 --no-entry -sRESERVED_FUNCTION_POINTERS=1 --target=wasm64 \
	  -sALLOW_MEMORY_GROWTH=1 \
	  -sSINGLE_FILE -sTOTAL_MEMORY=4MB \
	  -sEXPORT_ES6=1 \
	  -sNODERAWFS=0 \
	  -sUSE_ES6_IMPORT_META=1 \
	  -sNO_DISABLE_EXCEPTION_CATCHING \
	  -sDEMANGLE_SUPPORT=1 -sASSERTIONS -frtti \
	  -sEXPORTED_RUNTIME_METHODS="['addFunction']" \
	  -sALLOW_TABLE_GROWTH=1 \
	  -sEXPORTED_FUNCTIONS="[\
	    '_malloc', '_free', '_calloc', \
	    '_testPromise', \
	    '_emx_promise_then', '_emx_promise_resolve', '_emx_promise_create'\
	  ]"
```
On the C++20 side we have coroutines; on the WebAssembly side (at least in Chrome) we have native promise integration. Why not unify these features into a new and better async/await API? Perhaps as some sort of wrapper or emulation, or a coroutine header with a low-level implementation?