WASM build #262

Draft: wants to merge 64 commits into base: main

@agriyakhetarpal commented Feb 27, 2025

Closes #234

  • Build
    • libgmp
    • libmpfr
    • flint
    • python-flint
  • Test

@agriyakhetarpal (Author)

I've been testing this a bit through a PR in my fork, and everything builds fine up to one point in the Cython sources, where it fails at step 107/114: https://github.com/agriyakhetarpal/python-flint/actions/runs/13576857716/job/37954995494?pr=1

with the following error trace:

FAILED: src/flint/types/_gr.cpython-312-wasm32-emscripten.so.p/meson-generated_src_flint_types__gr.pyx.c.o 
/tmp/tmp8t79rr74/cc -Isrc/flint/types/_gr.cpython-312-wasm32-emscripten.so.p -Isrc/flint/types -I../src/flint/types -I/opt/hostedtoolcache/Python/3.12.9/x64/include/python3.12 -I/home/runner/work/python-flint/python-flint/wasm-library-dir/include -fvisibility=hidden -fdiagnostics-color=always -DNDEBUG -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -O3 -fPIC -MD -MQ src/flint/types/_gr.cpython-312-wasm32-emscripten.so.p/meson-generated_src_flint_types__gr.pyx.c.o -MF src/flint/types/_gr.cpython-312-wasm32-emscripten.so.p/meson-generated_src_flint_types__gr.pyx.c.o.d -o src/flint/types/_gr.cpython-312-wasm32-emscripten.so.p/meson-generated_src_flint_types__gr.pyx.c.o -c src/flint/types/_gr.cpython-312-wasm32-emscripten.so.p/src/flint/types/_gr.pyx.c
src/flint/types/_gr.cpython-312-wasm32-emscripten.so.p/src/flint/types/_gr.pyx.c:19051:65: error: incompatible integer to pointer conversion passing 'mp_limb_t' (aka 'unsigned long') to parameter of type 'const fmpz *' (aka 'const long *') [-Wint-conversion]
 19051 |   gr_ctx_init_fq_nmod(__pyx_v_ctx->__pyx_base.__pyx_base.ctx_t, __pyx_v_p, __pyx_v_d, __pyx_v_name);
       |                                                                 ^~~~~~~~~
/home/runner/work/python-flint/python-flint/wasm-library-dir/include/flint/gr.h:1366:53: note: passing argument to parameter 'p' here
 1366 | void gr_ctx_init_fq_nmod(gr_ctx_t ctx, const fmpz_t p, slong d, const char * var);
      |                                                     ^
src/flint/types/_gr.cpython-312-wasm32-emscripten.so.p/src/flint/types/_gr.pyx.c:19150:65: error: incompatible integer to pointer conversion passing 'mp_limb_t' (aka 'unsigned long') to parameter of type 'const fmpz *' (aka 'const long *') [-Wint-conversion]
 19150 |   gr_ctx_init_fq_zech(__pyx_v_ctx->__pyx_base.__pyx_base.ctx_t, __pyx_v_p, __pyx_v_d, __pyx_v_name);
       |                                                                 ^~~~~~~~~
/home/runner/work/python-flint/python-flint/wasm-library-dir/include/flint/gr.h:1367:53: note: passing argument to parameter 'p' here
 1367 | void gr_ctx_init_fq_zech(gr_ctx_t ctx, const fmpz_t p, slong d, const char * var);
      |                                                     ^
2 errors generated.

@agriyakhetarpal (Author)

FWIW, I have hardly any Cython development experience, so I will have to wrap my head around this a bit, unless someone has an idea for a fix. The root of the failure is that the WASM toolchain is stricter about type conversions than the compilers used for other platforms: here the -Wint-conversion diagnostic is reported as an error, which is why this mismatch breaks the build.

@agriyakhetarpal (Author)

Okay, more specifically, the cause of the error is that gr_fq_nmod_ctx and gr_fq_zech_ctx pass a ulong parameter p to gr_ctx_init_fq_nmod and gr_ctx_init_fq_zech:

@cython.no_gc
cdef class gr_fq_zech_ctx(gr_scalar_ctx):
    cdef ulong p
    cdef slong d
    @staticmethod
    cdef inline gr_fq_zech_ctx _new(ulong p, slong d, char* name):
        cdef gr_fq_zech_ctx ctx
        ctx = gr_fq_zech_ctx.__new__(gr_fq_zech_ctx)
        ctx.p = p
        ctx.d = d
        gr_ctx_init_fq_zech(ctx.ctx_t, p, d, name)
        ctx._init = True
        return ctx

but flint/gr.h has fmpz_t in version 3.0.1: https://github.com/flintlib/flint/blob/c05007be863b1aae7221f11a9c135d673d968638/src/gr.h#L1366-L1367

and this has apparently been changed in v3.1.0. I'll try updating FLINT to v3.1.0 first (and then possibly v3.1.2).
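For context on the compiler error above, a minimal sketch in plain C of why the call is rejected. It assumes only what the log shows: fmpz_t is an array of one slong, so a parameter declared `const fmpz_t p` is really a pointer, and passing a ulong there is an integer-to-pointer conversion. All `toy_` names are hypothetical stand-ins, not FLINT's API.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-ins (not FLINT's types): like fmpz, a toy_fmpz is one
   signed machine word, and toy_fmpz_t is an array of one such word. */
typedef long toy_fmpz;
typedef toy_fmpz toy_fmpz_t[1];

/* Hypothetical context that just records its parameters. */
typedef struct {
    unsigned long p;
    long d;
} toy_ctx;

/* Shaped like the 3.0.1 prototype: as a parameter, `const toy_fmpz_t p`
   decays to `const long *`, i.e. a pointer. */
static void toy_ctx_init_fq_301(toy_ctx *ctx, const toy_fmpz_t p, long d)
{
    ctx->p = (unsigned long)p[0];
    ctx->d = d;
}

/* Shaped like the 3.1 prototype: the prime is a plain ulong value. */
static void toy_ctx_init_fq_31(toy_ctx *ctx, unsigned long p, long d)
{
    ctx->p = p;
    ctx->d = d;
}

/* Calling the 3.0.1-style function with a ulong. Writing
   toy_ctx_init_fq_301(ctx, p, d) directly is the integer-to-pointer
   conversion that Clang rejects under -Wint-conversion; the value has to be
   stored in a toy_fmpz_t first and the array (a pointer) passed instead. */
static void toy_ctx_init_fq_301_from_ulong(toy_ctx *ctx, unsigned long p, long d)
{
    toy_fmpz_t pz;
    /* This toy only handles values that fit in one signed word; real FLINT
       code would use fmpz_init/fmpz_set_ui/fmpz_clear, which also handle
       larger values. */
    assert(p <= (unsigned long)LONG_MAX);
    pz[0] = (toy_fmpz)p;
    toy_ctx_init_fq_301(ctx, pz, d);
}
```

Both routes end up recording the same prime; the difference is only whether the callee expects a value or a pointer to a one-word array, which is exactly what changed between the two FLINT versions.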

@agriyakhetarpal (Author)

Successful build: https://github.com/agriyakhetarpal/python-flint/actions/runs/13577635441?pr=1

The next task is to run tests, which I can take a look at later in the day.

@oscarbenjamin (Collaborator)

I haven't looked through this yet but thanks!

@oscarbenjamin (Collaborator)

but flint/gr.h has fmpz_t in version 3.0.1:

Best to stay with the latest FLINT version for the Pyodide build. There is no need to support a range of versions in Pyodide. Generally python-flint tracks the latest version but has some #ifdef-style handling for older versions so that someone could build against them. No one is going to build against old versions in WASM, though.

@agriyakhetarpal (Author)

Sounds good to me! Based on what I noticed in my fork's run, though, I think the tests will hit a bunch of function signature mismatches 😬. Also, the build itself takes ~20 minutes in total; we could roughly halve that with prebuilt WASM binaries, which could be hosted in a GitHub release once things are ready.

@agriyakhetarpal (Author)

This could take a while. I'm disabling tests and running the build on my fork to see how widespread the signature mismatches are. I managed to get test_all.py to run to the end; now there are some more in test_docstrings.py. My next task will be to debug these mismatches, which hasn't been fun because (i) compiling these libraries to WASM on macOS is messy and (ii) a single test here contains many assert statements rather than many tests each containing a few. Please bear with me as I proceed slowly and steadily. We could also ignore test_docstrings.py entirely if you're not too keen on testing it (though I think it's worth testing).

@oscarbenjamin (Collaborator)

I don't quite understand what the problem with the tests is. Is it just caused by the Cython/C signatures not matching?

There is now FLINT 3.2.0-rc1. We could perhaps update all the Cython declarations to that version and use that for the WASM build. I'm not sure when 3.2.0 final will be released but we would want to update to that straight away when it is.

@agriyakhetarpal (Author) commented Mar 3, 2025

I don't quite understand what the problem with the tests is. Is it just caused by the Cython/C signatures not matching?

Yes, mostly. If there's even a slight mismatch between FLINT's upstream signatures and the ones declared here, WebAssembly's type checking terminates the program, and Pyodide raises a fatal error. These mismatches can surface at build time (as happened with version 3.0.1 above) or at runtime, as we are seeing now. I have to drop these tests completely rather than xfail them, because the running Pyodide interpreter can no longer be used once it has hit a fatal error.
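A minimal sketch of the rule involved, in plain C with hypothetical `toy_` names: an indirect call must go through a function pointer whose type matches the callee exactly. Natively a mismatch is undefined behavior that often goes unnoticed; in WebAssembly the `call_indirect` instruction checks the callee's type at runtime and traps, which is what surfaces in Pyodide as "null function or function signature mismatch".

```c
#include <assert.h>
#include <stddef.h>

/* Every entry in the table must have exactly this type. */
typedef int (*toy_predicate)(long);

static int toy_is_even(long x) { return x % 2 == 0; }
static int toy_is_positive(long x) { return x > 0; }

/* Hypothetical dispatch table, indexed at runtime. */
static const toy_predicate toy_table[] = { toy_is_even, toy_is_positive };

/* An indirect call through a correctly typed pointer. Under WebAssembly
   this becomes a call_indirect whose expected type is checked against the
   callee at runtime. If a function of a different type, say
   int f(long, long), were forced into the table with a cast, the C code
   would still compile, but the call would be undefined behavior natively
   and, with Emscripten's default settings, would trap under WebAssembly. */
static int toy_dispatch(size_t i, long x)
{
    assert(i < sizeof toy_table / sizeof toy_table[0]);
    return toy_table[i](x);
}
```

Because the check happens at the moment of the call, a wrong declaration only bites on code paths that actually perform the mismatched call, which fits with failures appearing at runtime in particular tests.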

There are so many of these in SciPy, especially from its wrappers around Fortran libraries, which we can't compile to WASM directly without first converting them to C with f2c. I hope the new Flang from LLVM 19 can handle this better: pyodide/pyodide#5268

There is now FLINT 3.2.0-rc1. We could perhaps update all the Cython declarations to that version and use that for the WASM build. I'm not sure when 3.2.0 final will be released but we would want to update to that straight away when it is.

Yes, this sounds good to me, and I hope they've fixed a few of these upstream. I could start another PR for this when I have enough time to do so; I assume it will help me learn Cython a bit. :)

@oscarbenjamin (Collaborator)

I've opened gh-264 to bump the FLINT version to 3.2.0-rc1 and update all of the Cython declarations.

The Cython declarations are generated from the FLINT docs by running the bin/all_rst_to_pxd.py script. Because the signatures are parsed from the docs rather than from the actual C code, mismatches are fairly common (e.g. the docs may be inaccurate). Usually, though, the differences largely even out in the Cython -> C -> compiled extension module pipeline, because many of FLINT's functions are never actually called directly by python-flint.

@oscarbenjamin (Collaborator)

I've opened gh-264 to bump the FLINT version to 3.2.0-rc1 and update all of the Cython declarations.

I've just merged this to main. If you rebase/merge with main then this PR will get the updated declarations.

There might still be some wrong declarations, which we can fix manually for now and then upstream to FLINT afterwards.

@oscarbenjamin (Collaborator)

It seems to crash on the third or fourth of these lines:

F_cmp = fmpz_mod_ctx(10)
F_sml = fmpz_mod_ctx(p_sml)
F_med = fmpz_mod_ctx(p_med)
F_big = fmpz_mod_ctx(p_big)

That is just constructing the fmpz_mod context. It fails for the larger moduli, yet the same functions (in terms of FLINT's C API) are called for every modulus. Looking at the code, though, I see:
cdef class fmpz_mod_ctx:
    r"""
    Context object for creating :class:`~.fmpz_mod` initialised
    with a modulus :math:`N`.
    >>> fmpz_mod_ctx(2**127 - 1)
    fmpz_mod_ctx(170141183460469231731687303715884105727)
    """
    def __cinit__(self):
        cdef fmpz one = fmpz.__new__(fmpz)
        fmpz_one(one.val)
        fmpz_mod_ctx_init(self.val, one.val)
        fmpz_mod_discrete_log_pohlig_hellman_clear(self.L)
        self._is_prime = 0

I think that the call to fmpz_mod_discrete_log_pohlig_hellman_clear is wrong; it should be fmpz_mod_discrete_log_pohlig_hellman_init. That line was added in gh-95, apparently to prevent a Windows crash seen at the time. That code doesn't use the modulus, though, so it should not matter whether the modulus is large or small...
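To illustrate why the init/clear pairing matters, a minimal sketch of the usual FLINT-style convention, using a hypothetical `toy_dlog` type rather than the real fmpz_mod_discrete_log_pohlig_hellman struct: init puts the object into a valid empty state, and clear may only be called on an object that init has set up.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical object following the init/clear convention. */
typedef struct {
    size_t num_factors;
    long *factors;      /* heap-allocated once data is added */
} toy_dlog_struct;
typedef toy_dlog_struct toy_dlog_t[1];

/* init: put the object into a valid, empty state. */
static void toy_dlog_init(toy_dlog_t L)
{
    L->num_factors = 0;
    L->factors = NULL;
}

/* Example of state that clear must release later. Returns 0 on OOM. */
static int toy_dlog_add_factor(toy_dlog_t L, long f)
{
    long *tmp = realloc(L->factors, (L->num_factors + 1) * sizeof *tmp);
    if (tmp == NULL)
        return 0;
    tmp[L->num_factors++] = f;
    L->factors = tmp;
    return 1;
}

/* clear: release what init/add acquired; only valid after init. */
static void toy_dlog_clear(toy_dlog_t L)
{
    free(L->factors);
    L->factors = NULL;
    L->num_factors = 0;
}
```

In this toy, calling toy_dlog_clear on zero-filled memory happens to be harmless because free(NULL) does nothing, which is one way such a bug can go unnoticed; on memory that init never touched it would free an indeterminate pointer. The sketch says nothing about how the real FLINT clear function behaves.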

The modulus comes in here:

    def __init__(self, mod):
        # Ensure modulus is fmpz type
        if not typecheck(mod, fmpz):
            mod = any_as_fmpz(mod)
            if mod is NotImplemented:
                raise TypeError(
                    "Context modulus must be able to be cast to an `fmpz` type"
                )
        # Ensure modulus is positive
        if mod < 1:
            raise ValueError("Modulus is expected to be positive")
        # Set the modulus
        fmpz_mod_ctx_set_modulus(self.val, (<fmpz>mod).val)
        # Check whether the modulus is prime
        # TODO: should we use a stronger test?
        self._is_prime = fmpz_is_probabprime(self.val.n)

I don't think that any of those functions would have a signature mismatch.

I'm a bit confused about exactly what is happening but the only thing I can immediately see that seems wrong is that call to fmpz_mod_discrete_log_pohlig_hellman_clear.

@oscarbenjamin (Collaborator)

Depending on the modulus, fmpz_mod_ctx_init assigns different function pointers into the context struct:
https://github.com/flintlib/flint/blob/277980c37e22978f05f64e35a317f21ba9d7bcb5/src/fmpz_mod/ctx.c#L16-L75
If that is what trips Pyodide's type checking, then it would make sense that this fails only for particular moduli. So far the crashes seem consistent with it crashing for a medium modulus but not a small one. I'm now testing whether it also crashes for a large modulus.
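A minimal sketch of the dispatch pattern described above, with hypothetical names (`toy_mod_ctx`, `toy_mul_small`, `toy_mul_large`) rather than FLINT's actual struct fields, and an arbitrary 2^32 threshold: the context stores a function pointer chosen from the size of the modulus, so different moduli exercise different callees. If one of those callees had a type that did not exactly match the slot it is stored in, only the moduli routed to it would trip WebAssembly's runtime signature check, which is one way a size-dependent crash could arise.

```c
#include <assert.h>
#include <stdint.h>

typedef struct toy_mod_ctx toy_mod_ctx;

/* The slot type: every implementation stored in the context must match it. */
typedef uint64_t (*toy_mulmod_fn)(const toy_mod_ctx *, uint64_t, uint64_t);

struct toy_mod_ctx {
    uint64_t n;         /* modulus, assumed >= 2 */
    toy_mulmod_fn mul;  /* chosen once, at init time, from the size of n */
};

/* Small moduli: the product fits in 64 bits when a, b < n < 2^32. */
static uint64_t toy_mul_small(const toy_mod_ctx *c, uint64_t a, uint64_t b)
{
    return ((a % c->n) * (b % c->n)) % c->n;
}

/* (x + y) mod n for x, y < n, written so that it cannot overflow. */
static uint64_t toy_addmod(uint64_t x, uint64_t y, uint64_t n)
{
    return (x >= n - y) ? x - (n - y) : x + y;
}

/* Larger moduli: double-and-add, so no intermediate exceeds n. */
static uint64_t toy_mul_large(const toy_mod_ctx *c, uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    a %= c->n;
    while (b != 0) {
        if (b & 1)
            r = toy_addmod(r, a, c->n);
        a = toy_addmod(a, a, c->n);
        b >>= 1;
    }
    return r;
}

static void toy_mod_ctx_init(toy_mod_ctx *c, uint64_t n)
{
    c->n = n;
    c->mul = (n < ((uint64_t)1 << 32)) ? toy_mul_small : toy_mul_large;
}
```

Both implementations compute the same result for a modulus they can both handle (e.g. 50 * 60 mod 97 = 90), so only the choice of callee depends on the size of the modulus.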

@oscarbenjamin (Collaborator)

I'm not sure why but apparently this line causes the main problem:

self._is_prime = fmpz_is_probabprime(self.val.n)

At least if it is commented out then we don't see the crash... (although other tests fail)

Comment on lines +63 to +66
def use_fmpz_is_probabprime():
    cdef fmpz p
    p = fmpz(2**127 - 1)
    return fmpz_is_probabprime(p.val)
Collaborator

Just calling this function is enough to cause the crash:

Pyodide has suffered a fatal error. Please report this to the Pyodide maintainers.
The cause of the fatal error was:
RuntimeError: null function or function signature mismatch
    at wasm://wasm/02e45b4a:wasm-function[1550]:0x1de118
../.venv-pyodide/lib/python3.12/site-packages/flint/test/test_all.py::test_use_fmpz_is_probabprime Stack (most recent call first):
    at wasm://wasm/02e45b4a:wasm-function[1547]:0x1de04f
  File "/home/runner/work/python-flint/python-flint/.venv-pyodide/lib/python3.12/site-packages/flint/test/test_all.py", line 4682 in test_use_fmpz_is_probabprime
  File "/home/runner/work/python-flint/python-flint/.venv-pyodide/lib/python3.12/site-packages/_pytest/python.py", line 159 in pytest_pyfunc_call
    at wasm://wasm/02e45b4a:wasm-function[1531]:0x1dcd97
    at wasm://wasm/02e45b4a:wasm-function[1530]:0x1dcba1
    at wasm://wasm/02e45b4a:wasm-function[1521]:0x1dae4b
    at wasm://wasm/02e45b4a:wasm-function[1522]:0x1db7dd
    at wasm://wasm/02e45b4a:wasm-function[313]:0xa924e
    at wasm://wasm/02e45b4a:wasm-function[220]:0x9dce6
    at wasm://wasm/0268a37a:wasm-function[1160]:0x1c4cf8
    at wasm://wasm/0268a37a:wasm-function[3428]:0x2a2a6d {
  pyodide_fatal_error: true
}

@oscarbenjamin (Collaborator)

It would be great if we could see which functions are in the stack trace, but their names are all replaced with numbers in the minified WASM build. What I don't understand is whether the mismatch occurs exactly when calling fmpz_is_probabprime or during some internal call between FLINT functions.

Note that the crash occurs when the fmpz has a larger value. This is important because the FLINT fmpz type can be either an inline integer or a pointer to a multiprecision integer. In python-flint the C type is declared as:

ctypedef slong fmpz_struct
ctypedef fmpz_struct fmpz_t[1]

This specifies that an fmpz_t is an array of one slong, i.e. a signed long, which is 32 bits in a wasm32 build. The type of p.val is declared as fmpz_t:
cdef class fmpz(flint_scalar):
    """
    The *fmpz* type represents an arbitrary-size integer.
    >>> fmpz(3) ** 25
    847288609443
    """
    cdef fmpz_t val

Ultimately the C code generated for this struct says that val is of type slong[1].

When we call fmpz_is_probabprime(p.val) for large p, the bytes at .val really represent a pointer to a multiprecision integer rather than an inline slong, so somewhere there needs to be a cast from an integer type to a pointer type. I wonder whether that is what causes the signature mismatch here, but I don't really understand how WASM's runtime signature checking works. It doesn't crash for a small modulus, which suggests that the signature is fine when calling fmpz_is_probabprime with an inline integer value, so maybe the problematic cast/call is actually inside FLINT rather than at the point where python-flint calls into the FLINT shared library?
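To make the inline-or-pointer point concrete, a minimal self-contained sketch of that kind of encoding, modeled on my understanding of FLINT's COEFF_MAX / COEFF_IS_MPZ / PTR_TO_COEFF / COEFF_TO_PTR macros in fmpz.h. All `toy_` names and the `toy_mpz` struct are hypothetical, and it uses intptr_t rather than slong so that it behaves the same on any host.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a heap-allocated multiprecision integer. */
typedef struct {
    int sign;
    size_t nlimbs;
    unsigned long *limbs;
} toy_mpz;

/* One machine word holding either a small integer or a tagged pointer. */
typedef intptr_t toy_fmpz;

#define TOY_BITS ((int)(sizeof(uintptr_t) * CHAR_BIT))
/* Largest inline magnitude: 2^(BITS-2) - 1, leaving the top two bits free. */
#define TOY_COEFF_MAX ((intptr_t)(((uintptr_t)1 << (TOY_BITS - 2)) - 1))

/* Top two bits 01 mark a pointer; 00 (small non-negative) and 11 (small
   negative, assuming two's complement) mark inline values. */
static int toy_is_ptr(toy_fmpz x)
{
    return ((uintptr_t)x >> (TOY_BITS - 2)) == 1;
}

/* Drop the two low bits (the pointer must be 4-byte aligned), set the tag. */
static toy_fmpz toy_ptr_to_coeff(const toy_mpz *p)
{
    uintptr_t u = (uintptr_t)p;
    assert((u & 3u) == 0);
    return (toy_fmpz)((u >> 2) | ((uintptr_t)1 << (TOY_BITS - 2)));
}

/* Shifting left by two drops the tag and restores the original address. */
static toy_mpz *toy_coeff_to_ptr(toy_fmpz x)
{
    return (toy_mpz *)((uintptr_t)x << 2);
}
```

In this encoding a small value never involves a pointer, while 2**127 - 1 exceeds the inline range on both 32- and 64-bit builds and is stored as a tagged pointer. That split matches the observation that only the larger value crashes, though the sketch does not show where the mismatched call actually happens.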

Note that in the browser at SymPy Live you can use python-flint 0.6.0 along with FLINT 3.1, and this works fine there:

>>> import flint
>>> flint.fmpz(2**127 - 1).is_probable_prime()
1

The whole test suite also passes there, at least if the doctests are skipped (if they are not skipped, it seems to hang without running them):

from flint.test.__main__ import main
main('-t')

I'm at a bit of a loss here. Maybe better debug output would help.

@oscarbenjamin (Collaborator)

This should go from FLINT main to rc2: flintlib/flint#2247 (comment)

@agriyakhetarpal (Author)

I'm going to try diving deeper into fmpz_is_probabprime with a debug Pyodide build; I will post the results in a few hours.

Development

Successfully merging this pull request may close these issues.

Nightly wheels and WASM/pyodide builds
3 participants