C++20 coroutines + Native webassembly promise integration #20413
Comments
Yes, this is something I'd like to do. See emscripten/promise.h for the C API we have for interfacing with JS promises, including via JSPI. The next steps would be to create a C++11 wrapper API to take care of the resource management, then a C++20 coroutine API on top of that. Is this something you would be interested in working on? I'd be happy to review patches if so.
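To make the shape of that concrete, here's a rough sketch of what those two layers could look like - just an illustration, not the planned implementation. The C calls are paraphrased from `emscripten/promise.h` (worth double-checking against the header), and the C++ names (`promise`, `promise_awaiter`) are hypothetical:

```cpp
#include <emscripten/promise.h>
#include <coroutine>
#include <utility>

// Layer 1 (C++11): RAII wrapper owning the C promise handle.
class promise {
 public:
  promise() : handle_(emscripten_promise_create()) {}
  ~promise() {
    if (handle_) emscripten_promise_destroy(handle_);
  }
  promise(promise&& other) noexcept
      : handle_(std::exchange(other.handle_, nullptr)) {}
  promise(const promise&) = delete;

  void resolve(void* value) {
    emscripten_promise_resolve(handle_, EM_PROMISE_FULFILL, value);
  }
  em_promise_t get() const { return handle_; }

 private:
  em_promise_t handle_;
};

// Layer 2 (C++20): a minimal awaiter that suspends the coroutine and
// resumes it from the promise's fulfillment callback. Error handling
// and ownership of the promise returned by _then are omitted for brevity.
struct promise_awaiter {
  em_promise_t handle;
  void* value = nullptr;

  bool await_ready() const noexcept { return false; }
  void await_suspend(std::coroutine_handle<> coro) {
    struct ctx {
      promise_awaiter* self;
      std::coroutine_handle<> coro;
    };
    auto* c = new ctx{this, coro};
    emscripten_promise_then(
        handle,
        // A captureless lambda converts to the C callback type.
        [](void**, void* data, void* val) -> em_promise_result_t {
          auto* c = static_cast<ctx*>(data);
          c->self->value = val;
          auto coro = c->coro;
          delete c;
          coro.resume();
          return EM_PROMISE_FULFILL;
        },
        nullptr, c);
  }
  void* await_resume() const noexcept { return value; }
};
```

A full coroutine API would additionally need a `promise_type` so that such coroutines can themselves return a promise to their callers.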
Ha, I actually literally implemented this last week for Embind and was going to send a PR next week. Just randomly saw this issue.
Btw, I saw that experimental API too, but it's pretty low-level, and it's kind of a shame that it uses its own list of promises and handles, as that makes it harder to integrate with Embind's handle system. I decided to go with Embind, as it's more useful for complex JS interactions. I'd be happy to chat more about it, e.g. on Discord.
Note that JSPI is orthogonal to / unnecessary for coroutines. JSPI, like Asyncify, is useful for pausing the entire program, whereas with coroutines all the transformation magic happens at compile time and only the local coroutine itself is paused, so the Wasm engine doesn't need to know about promises and pausing.
Yeah, that's why I implemented mine via Embind instead - it supports passing promises to and from JS. I'll try to submit a PR soon.
This adds support for `co_await`-ing Promises represented by `emscripten::val`. The surrounding coroutine should also return `emscripten::val`, which will be a promise representing the whole coroutine's return value.

Note that this feature uses LLVM coroutines and so doesn't depend on either Asyncify or JSPI. It doesn't pause the entire program, only the coroutine itself, so it serves somewhat different use cases even though all those features operate on promises. Nevertheless, if you are not implementing a syscall that must behave as if it were synchronous, but instead simply want to await some async operations and return a new promise to the user, this feature will be much more efficient.

Here's a simple benchmark measuring runtime overhead from awaiting a no-op Promise repeatedly in a deep call stack:

```cpp
using namespace emscripten;

// clang-format off
EM_JS(EM_VAL, wait_impl, (), {
  return Emval.toHandle(Promise.resolve());
});
// clang-format on

val wait() { return val::take_ownership(wait_impl()); }

val coro_co_await(int depth) {
  co_await wait();
  if (depth > 0) {
    co_await coro_co_await(depth - 1);
  }
  co_return val();
}

val asyncify_val_await(int depth) {
  wait().await();
  if (depth > 0) {
    asyncify_val_await(depth - 1);
  }
  return val();
}

EMSCRIPTEN_BINDINGS(bench) {
  function("coro_co_await", coro_co_await);
  function("asyncify_val_await", asyncify_val_await, async());
}
```

And the JS runner, also comparing with a pure-JS implementation:

```js
import Benchmark from 'benchmark';
import initModule from './async-bench.mjs';

let Module = await initModule();
let suite = new Benchmark.Suite();

function addAsyncBench(name, func) {
  suite.add(name, {
    defer: true,
    fn: (deferred) => func(1000).then(() => deferred.resolve()),
  });
}

for (const name of ['coro_co_await', 'asyncify_val_await']) {
  addAsyncBench(name, Module[name]);
}

addAsyncBench('pure_js', async function pure_js(depth) {
  await Promise.resolve();
  if (depth > 0) {
    await pure_js(depth - 1);
  }
});

suite
  .on('cycle', function (event) {
    console.log(String(event.target));
  })
  .run({async: true});
```

Results with regular Asyncify (I had to bump up `ASYNCIFY_STACK_SIZE` to accommodate said deep stack):

```bash
> ./emcc async-bench.cpp -std=c++20 -O3 -o async-bench.mjs --bind -s ASYNCIFY -s ASYNCIFY_STACK_SIZE=1000000
> node --no-liftoff --no-wasm-tier-up --no-wasm-lazy-compilation --no-sparkplug async-bench-runner.mjs
coro_co_await x 727 ops/sec ±10.59% (47 runs sampled)
asyncify_val_await x 58.05 ops/sec ±6.91% (53 runs sampled)
pure_js x 3,022 ops/sec ±8.06% (52 runs sampled)
```

Results with JSPI (I had to disable `DYNAMIC_EXECUTION` because I was getting "RuntimeError: table index is out of bounds" in random places depending on optimisation mode - JSPI miscompilation?):

```bash
> ./emcc async-bench.cpp -std=c++20 -O3 -o async-bench.mjs --bind -s ASYNCIFY=2 -s DYNAMIC_EXECUTION=0
> node --no-liftoff --no-wasm-tier-up --no-wasm-lazy-compilation --no-sparkplug --experimental-wasm-stack-switching async-bench-runner.mjs
coro_co_await x 955 ops/sec ±9.25% (62 runs sampled)
asyncify_val_await x 924 ops/sec ±8.27% (62 runs sampled)
pure_js x 3,258 ops/sec ±8.98% (53 runs sampled)
```

So this approach is much faster than regular Asyncify and on par with JSPI.

Fixes emscripten-core#20413.
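To give a feel for the usage this enables, here's a hypothetical example, not taken from the PR itself (`fetchJson` and the use of `fetch` are made up for illustration): a coroutine can interleave awaits on arbitrary JS promises and hand a single promise back to JS.

```cpp
#include <emscripten/bind.h>
#include <emscripten/val.h>
#include <string>

using namespace emscripten;

// Awaits fetch() and response.json() without blocking the program;
// the JS caller just sees a regular Promise for the final result.
val fetch_json(std::string url) {
  val response = co_await val::global("fetch")(url);
  val json = co_await response.call<val>("json");
  co_return json;
}

EMSCRIPTEN_BINDINGS(example) {
  function("fetchJson", &fetch_json);
}
```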
See #20420.
Ah right, I guess those are two slightly different issues - adding coroutine support for `emscripten::val` promises vs. for the C API in emscripten/promise.h.
Sometimes I think about an extension or plugin API - for Embind, for the JSPI API, etc. Also, I'd prefer to make my own extension package.
FWIW, I know I mentioned it before, but if you're referring to emscripten/promise.h and Embind: both of those libraries can use JSPI to await promises when compiled with `-sASYNCIFY=2`. That is, I don't want to discourage you, just wanted to clarify, because it sounds like you might think it's a low-level API for JSPI. They are both high-level libraries implemented in JavaScript and exposing C/C++ bindings - one works on JS promises created from C/C++ and the other works on JS values (including promises) received from JS.
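For contrast, here's a minimal sketch of the blocking Embind route described above (the `fetch_blocking` name is made up); unlike the coroutine version from #20420, `val::await()` pauses the whole program, which is why it needs Asyncify or JSPI:

```cpp
#include <emscripten/val.h>
#include <string>

using emscripten::val;

// Blocking style: requires building with -sASYNCIFY (or -sASYNCIFY=2
// for JSPI), because the entire program is paused while awaiting.
val fetch_blocking(std::string url) {
  return val::global("fetch")(url).await();
}
```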
Meanwhile... I'm getting an error when trying to use the promise C API with `-sASYNCIFY=2`.
My makefile:

```make
____:
	EMCC_DEBUG=1 $(CC) -I./include \
	  -I ../../src/cxx/ \
	  -c ./test.cpp \
	  -std=c++23 -sASYNCIFY=2 \
	  -DHALF_ENABLE_CPP11_CFENV=false \
	  -sNO_DISABLE_EXCEPTION_CATCHING \
	  -sDEMANGLE_SUPPORT=1 -sASSERTIONS -frtti \
	  -Wno-limited-postlink-optimizations \
	  -sALLOW_TABLE_GROWTH=1 \
	  -O0 -msimd128 --no-entry -sRESERVED_FUNCTION_POINTERS=1 --target=wasm64
	EMCC_DEBUG=1 $(CC) -g ./test.o -o ./test.js \
	  -I ../../src/cxx/ \
	  -std=c++23 -sASYNCIFY=2 \
	  -Wno-limited-postlink-optimizations \
	  -O0 -msimd128 --no-entry -sRESERVED_FUNCTION_POINTERS=1 --target=wasm64 \
	  -sALLOW_MEMORY_GROWTH=1 \
	  -sSINGLE_FILE -sTOTAL_MEMORY=4MB \
	  -sEXPORT_ES6=1 \
	  -sNODERAWFS=0 \
	  -sUSE_ES6_IMPORT_META=1 \
	  -sNO_DISABLE_EXCEPTION_CATCHING \
	  -sDEMANGLE_SUPPORT=1 -sASSERTIONS -frtti \
	  -sEXPORTED_RUNTIME_METHODS="['addFunction']" \
	  -sALLOW_TABLE_GROWTH=1 \
	  -sEXPORTED_FUNCTIONS="[\
	    '_malloc', '_free', '_calloc', \
	    '_testPromise', \
	    '_emx_promise_then', '_emx_promise_resolve', '_emx_promise_create'\
	  ]"
```
On the C++20 side we have coroutines; on the WebAssembly side (at least in Chrome) we have native promise integration. Why not unify these features into a new and better async/await API? Perhaps as some sort of wrapper or emulation, or a coroutine header with a low-level implementation?