BOLT optimizations fail on Linux aarch64 #128884

Open
zanieb opened this issue Jan 15, 2025 · 8 comments
Labels
build The build process and cross-build type-bug An unexpected behavior, bug, or error

Comments

@zanieb
Contributor

zanieb commented Jan 15, 2025

Bug report

Bug description:

When running the --pgo test suite with a BOLT-instrumented binary, the interpreter crashes with:

./python -m test --pgo --rerun --verbose3 --timeout=
python: ../cpython-ro-srcdir/Python/generated_cases.c.h:1074: _PyEval_EvalFrameDefault: Assertion `tp->tp_alloc == PyType_GenericAlloc' failed.
Aborted (core dumped)

I find this surprising, since _PyEval_EvalFrameDefault is included in the BOLT skip functions, but I am not familiar with the details.

The following patch worked around that error:

diff --git a/Python/generated_cases.c.h b/Python/generated_cases.c.h
index 810beb61d0d..d24add0afab 100644
--- a/Python/generated_cases.c.h
+++ b/Python/generated_cases.c.h
@@ -1071,7 +1071,7 @@
                 DEOPT_IF(FT_ATOMIC_LOAD_UINT32_RELAXED(tp->tp_version_tag) != type_version, CALL);
                 assert(tp->tp_new == PyBaseObject_Type.tp_new);
                 assert(tp->tp_flags & Py_TPFLAGS_HEAPTYPE);
-                assert(tp->tp_alloc == PyType_GenericAlloc);
+                assert(tp->tp_alloc == PyBaseObject_Type.tp_alloc);
                 PyHeapTypeObject *cls = (PyHeapTypeObject *)callable_o;
                 PyFunctionObject *init_func = (PyFunctionObject *)FT_ATOMIC_LOAD_PTR_ACQUIRE(cls->_spec_cache.init);
                 PyCodeObject *code = (PyCodeObject *)init_func->func_code;

Then, the profiling test run succeeded, but BOLT crashed during the apply step.

# Run bolt against the merged data to produce an optimized binary.
for bin in python; do \
  /usr/lib/llvm-19/bin/llvm-bolt "${bin}.prebolt" -o "${bin}.bolt" -data="${bin}.fdata" -update-debug-sections -skip-funcs=_PyEval_EvalFrameDefault,sre_ucs1_match/1,sre_ucs2_match/1,sre_ucs4_match/1  -reorder-blocks=ext-tsp -reorder-functions=cdsort -split-functions -icf=1 -inline-all -split-eh -reorder-functions-use-hot-size -peepholes=none -jump-tables=aggressive -inline-ap -indirect-call-promotion=all -dyno-stats -use-gnu-stack -frame-opt=hot ; \
  mv "${bin}.bolt" "${bin}"; \
done
BOLT-INFO: Target architecture: aarch64
BOLT-INFO: BOLT version: <unknown>
BOLT-INFO: first alloc address is 0x400000
BOLT-INFO: enabling relocation mode
BOLT-INFO: pre-processing profile using branch profile reader
BOLT-INFO: number of removed linker-inserted veneers: 0
BOLT-INFO: 8500 out of 12058 functions in the binary (70.5%) have non-empty execution profile
BOLT-INFO: 41 functions with profile could not be optimized
BOLT-INFO: profile for 1 objects was ignored
BOLT-INFO: removed 1 empty block
BOLT-INFO: ICF folded 678 out of 12439 functions in 5 passes. 0 functions had jump tables.
BOLT-INFO: Removing all identical functions will save 46.23 KB of code space. Folded functions were called 3909549484 times based on profile.
BOLT-INFO: ICP Total indirect calls = 1808544446, 153 callsites cover 99% of all indirect calls
 #0 0x0000aacc1be768cc (/usr/lib/llvm-19/bin/llvm-bolt+0x1ae68cc)
 #1 0x0000aacc1be74b80 (/usr/lib/llvm-19/bin/llvm-bolt+0x1ae4b80)
 #2 0x0000aacc1be77174 (/usr/lib/llvm-19/bin/llvm-bolt+0x1ae7174)
 #3 0x0000ff03feee37e0 (linux-vdso.so.1+0x7e0)
 #4 0x0000aacc1c397200 (/usr/lib/llvm-19/bin/llvm-bolt+0x2007200)
 #5 0x0000aacc1c39aa1c (/usr/lib/llvm-19/bin/llvm-bolt+0x200aa1c)
 #6 0x0000aacc1c39a9e4 (/usr/lib/llvm-19/bin/llvm-bolt+0x200a9e4)
 #7 0x0000aacc1c39a9e4 (/usr/lib/llvm-19/bin/llvm-bolt+0x200a9e4)
 #8 0x0000aacc1bf1ebc4 (/usr/lib/llvm-19/bin/llvm-bolt+0x1b8ebc4)
 #9 0x0000aacc1bf21328 (/usr/lib/llvm-19/bin/llvm-bolt+0x1b91328)
#10 0x0000aacc1becfe3c (/usr/lib/llvm-19/bin/llvm-bolt+0x1b3fe3c)
#11 0x0000aacc1aadf2f0 (/usr/lib/llvm-19/bin/llvm-bolt+0x74f2f0)
#12 0x0000ff03fe8684c4 __libc_start_call_main ./csu/../sysdeps/nptl/libc_start_call_main.h:74:3
#13 0x0000ff03fe868598 call_init ./csu/../csu/libc-start.c:128:20
#14 0x0000ff03fe868598 __libc_start_main ./csu/../csu/libc-start.c:347:5
#15 0x0000aacc1aadd4f0 (/usr/lib/llvm-19/bin/llvm-bolt+0x74d4f0)
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.	Program arguments: /usr/lib/llvm-19/bin/llvm-bolt python.prebolt -o python.bolt -data=python.fdata -update-debug-sections -skip-funcs=_PyEval_EvalFrameDefault,sre_ucs1_match/1,sre_ucs2_match/1,sre_ucs4_match/1 -reorder-blocks=ext-tsp -reorder-functions=cdsort -split-functions -icf=1 -inline-all -split-eh -reorder-functions-use-hot-size -peepholes=none -jump-tables=aggressive -inline-ap -indirect-call-promotion=all -dyno-stats -use-gnu-stack -frame-opt=hot
Segmentation fault (core dumped)

CPython versions tested on:

CPython main branch

Operating systems tested on:

Linux

@zanieb zanieb added build The build process and cross-build type-bug An unexpected behavior, bug, or error labels Jan 15, 2025
@zanieb
Contributor Author

zanieb commented Jan 15, 2025

I suspect the BOLT segfault is a case of llvm/llvm-project#121554, but I have not confirmed it.

@liusy58

liusy58 commented Jan 16, 2025

Please give me more details, such as the BOLT version. Thanks!

@zanieb
Contributor Author

zanieb commented Jan 16, 2025

Hey @liusy58! This was reproduced on GitHub Actions and subsequently disabled there. If you have access to aarch64 GitHub Actions runners, you can reproduce it that way. Otherwise, this was using llvm-19 on Ubuntu, installed with:

sudo bash -c "$(wget -O - https://apt.llvm.org/llvm.sh)" ./llvm.sh 19
sudo apt-get install bolt-19
export PATH="$(llvm-config-19 --bindir):$PATH"

You can see the details in #128845

I can try to reproduce in Docker too, but I don't have a Linux aarch64 machine around.

@zanieb
Contributor Author

zanieb commented Jan 16, 2025

Here's a reproduction of the segfault with LLVM 20 on Debian (this assumes docker build is run from the root of the CPython source tree):

ARG DEBIAN_VERSION=unstable

FROM debian:${DEBIAN_VERSION}

ARG LLVM_VERSION=20
ENV LLVM_VERSION=${LLVM_VERSION}

ARG DEBIAN_VERSION
ENV DEBIAN_VERSION=${DEBIAN_VERSION}

# Add LLVM GPG key
RUN apt-get update && apt-get install -y --no-install-recommends ca-certificates wget gnupg \
    && rm -rf /var/lib/apt/lists/* \
    && mkdir -p /etc/apt/keyrings \
    && wget -O - https://apt.llvm.org/llvm-snapshot.gpg.key | gpg --dearmor -o /etc/apt/keyrings/llvm.gpg

# Add LLVM repositories
RUN if [ "${DEBIAN_VERSION}" = "unstable" ]; then \
        # When using `unstable`, the suffix is omitted
        DIST_SUFFIX=""; \
    else \
        DIST_SUFFIX="-${DEBIAN_VERSION}"; \
    fi && \
    if [ "${LLVM_VERSION}" = "20" ]; then \
        # When using `20`, the suffix is omitted
        LLVM_SUFFIX=""; \
    else \
        LLVM_SUFFIX="-${LLVM_VERSION}"; \
    fi && \
    echo "deb [signed-by=/etc/apt/keyrings/llvm.gpg] http://apt.llvm.org/${DEBIAN_VERSION}/ llvm-toolchain${DIST_SUFFIX} main" >> /etc/apt/sources.list.d/llvm.list && \
    echo "deb-src [signed-by=/etc/apt/keyrings/llvm.gpg] http://apt.llvm.org/${DEBIAN_VERSION}/ llvm-toolchain${DIST_SUFFIX} main" >> /etc/apt/sources.list.d/llvm.list && \
    echo "deb [signed-by=/etc/apt/keyrings/llvm.gpg] http://apt.llvm.org/${DEBIAN_VERSION}/ llvm-toolchain${DIST_SUFFIX}${LLVM_SUFFIX} main" >> /etc/apt/sources.list.d/llvm.list && \
    echo "deb-src [signed-by=/etc/apt/keyrings/llvm.gpg] http://apt.llvm.org/${DEBIAN_VERSION}/ llvm-toolchain${DIST_SUFFIX}${LLVM_SUFFIX} main" >> /etc/apt/sources.list.d/llvm.list

# Install LLVM
RUN apt-get update -y && apt-get install -y \
    libllvm-${LLVM_VERSION}-ocaml-dev \
    libllvm${LLVM_VERSION} \
    llvm-${LLVM_VERSION} \
    llvm-${LLVM_VERSION}-dev \
    llvm-${LLVM_VERSION}-doc \
    llvm-${LLVM_VERSION}-examples \
    llvm-${LLVM_VERSION}-runtime \
    clang-${LLVM_VERSION} \
    clang-tools-${LLVM_VERSION} \
    clang-${LLVM_VERSION}-doc \
    libclang-common-${LLVM_VERSION}-dev \
    libclang-${LLVM_VERSION}-dev \
    libclang1-${LLVM_VERSION} \
    clang-format-${LLVM_VERSION} \
    python3-clang-${LLVM_VERSION} \
    clangd-${LLVM_VERSION} \
    clang-tidy-${LLVM_VERSION} \
    lldb-${LLVM_VERSION} \
    lld-${LLVM_VERSION} \
    libc++-${LLVM_VERSION}-dev \
    libc++abi-${LLVM_VERSION}-dev \
    libbolt-${LLVM_VERSION}-dev \
    bolt-${LLVM_VERSION}

# Install Python build dependencies
RUN apt-get install -y \
    make \
    libc6 \
    build-essential \
    pkg-config \
    ccache \
    gdb \
    lcov \
    libb2-dev \
    libbz2-dev \
    libffi-dev \
    libgdbm-dev \
    libgdbm-compat-dev \
    liblzma-dev \
    libncurses5-dev \
    libreadline6-dev \
    libsqlite3-dev \
    libssl-dev \
    lzma \
    strace \
    tk-dev \
    uuid-dev \
    xvfb \
    zlib1g-dev


ADD . /cpython
WORKDIR /cpython

ENV CC=clang
ENV CXX=clang++
ENV PATH="/usr/lib/llvm-${LLVM_VERSION}/bin/:$PATH"

# Build CPython 
RUN ./configure \
    --with-openssl="/usr/include/openssl" \
    --enable-slower-safety \
    --enable-safety \
    --enable-bolt

RUN make clean
RUN make -j8

Unlike llvm/llvm-project#121554, this is not "fixed" by disabling -update-debug-sections with the configure option:

BOLT_COMMON_FLAGS="-skip-funcs=_PyEval_EvalFrameDefault,sre_ucs1_match/1,sre_ucs2_match/1,sre_ucs4_match/1"

The stack trace is:

 BOLT-INFO: ICP Total indirect calls = 879576737, 135 callsites cover 99% of all indirect calls
  #0 0x0000aaaad325f958 (/usr/lib/llvm-20/bin/llvm-bolt+0x1c0f958)
  #1 0x0000aaaad325dc14 (/usr/lib/llvm-20/bin/llvm-bolt+0x1c0dc14)
  #2 0x0000aaaad3260064 (/usr/lib/llvm-20/bin/llvm-bolt+0x1c10064)
  #3 0x0000ffffa82377a0 (linux-vdso.so.1+0x7a0)
  #4 0x0000aaaad37d5208 (/usr/lib/llvm-20/bin/llvm-bolt+0x2185208)
  #5 0x0000aaaad37d8a80 (/usr/lib/llvm-20/bin/llvm-bolt+0x2188a80)
  #6 0x0000aaaad33138b0 (/usr/lib/llvm-20/bin/llvm-bolt+0x1cc38b0)
  #7 0x0000aaaad33176e8 (/usr/lib/llvm-20/bin/llvm-bolt+0x1cc76e8)
  #8 0x0000aaaad32c2c00 (/usr/lib/llvm-20/bin/llvm-bolt+0x1c72c00)
  #9 0x0000aaaad1e26cd0 (/usr/lib/llvm-20/bin/llvm-bolt+0x7d6cd0)
 #10 0x0000ffffa623229c __libc_start_call_main ./csu/../sysdeps/nptl/libc_start_call_main.h:74:3
 #11 0x0000ffffa623237c call_init ./csu/../csu/libc-start.c:128:20
 #12 0x0000ffffa623237c __libc_start_main ./csu/../csu/libc-start.c:347:5
 #13 0x0000aaaad1e24e70 (/usr/lib/llvm-20/bin/llvm-bolt+0x7d4e70)
 PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
 Stack dump:
 0.	Program arguments: /usr/lib/llvm-20/bin/llvm-bolt python.prebolt -o python.bolt -data=python.fdata -skip-funcs=_PyEval_EvalFrameDefault,sre_ucs1_match/1,sre_ucs2_match/1,sre_ucs4_match/1 -reorder-blocks=ext-tsp -reorder-functions=cdsort -split-functions -icf=1 -inline-all -split-eh -reorder-functions-use-hot-size -peepholes=none -jump-tables=aggressive -inline-ap -indirect-call-promotion=all -dyno-stats -use-gnu-stack -frame-opt=hot
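The repository-selection step in the Dockerfile above packs two suffix rules into a single RUN. As a hedged sketch only, the same logic is restated in Python below; llvm_apt_sources is a hypothetical helper, not part of CPython or the apt.llvm.org tooling, and it only reproduces the four lines the Dockerfile echoes into /etc/apt/sources.list.d/llvm.list. Debian unstable drops the distribution suffix, and LLVM 20 drops the version suffix (presumably because it was the unversioned development snapshot at the time).

```python
def llvm_apt_sources(debian_version: str, llvm_version: str) -> list[str]:
    """Return the apt source lines the Dockerfile appends to llvm.list.

    Hypothetical helper that mirrors the shell logic in the Dockerfile:
    - `unstable` omits the distribution suffix on the suite name;
    - LLVM `20` omits the version suffix on the suite name.
    """
    dist_suffix = "" if debian_version == "unstable" else f"-{debian_version}"
    llvm_suffix = "" if llvm_version == "20" else f"-{llvm_version}"
    base = f"[signed-by=/etc/apt/keyrings/llvm.gpg] http://apt.llvm.org/{debian_version}/"

    lines = []
    # Same order as the four echo commands: unversioned suite first, then versioned.
    for suite in (f"llvm-toolchain{dist_suffix}", f"llvm-toolchain{dist_suffix}{llvm_suffix}"):
        for kind in ("deb", "deb-src"):
            lines.append(f"{kind} {base} {suite} main")
    return lines


if __name__ == "__main__":
    for line in llvm_apt_sources("bookworm", "19"):
        print(line)
```

With the Dockerfile's defaults (unstable and 20), both suite names collapse to llvm-toolchain, so the file ends up with duplicate entries; apt may warn that those sources are configured multiple times, but the install should still proceed.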

@zanieb
Contributor Author

zanieb commented Jan 16, 2025

And here's a stack trace from LLVM 19.6.1 built from source, so we get some symbols:

106.0  #0 0x0000aaaaab302010 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/usr/bin/llvm-bolt+0x832010)
106.0  #1 0x0000aaaaab3000f0 llvm::sys::RunSignalHandlers() (/usr/bin/llvm-bolt+0x8300f0)
106.0  #2 0x0000aaaaab30285c SignalHandler(int) Signals.cpp:0:0
106.0  #3 0x0000ffffa6e127a0 (linux-vdso.so.1+0x7a0)
106.0  #4 0x0000aaaaab744220 llvm::bolt::IndirectCallPromotion::rewriteCall(llvm::bolt::BinaryBasicBlock&, llvm::MCInst const&, std::vector<std::pair<llvm::MCSymbol*, std::vector<llvm::MCInst, std::allocator<llvm::MCInst>>>, std::allocator<std::pair<llvm::MCSymbol*, std::vector<llvm::MCInst, std::allocator<llvm::MCInst>>>>>&&, std::vector<llvm::MCInst*, std::allocator<llvm::MCInst*>> const&) const (/usr/bin/llvm-bolt+0xc74220)
106.0  #5 0x0000aaaaab7482b0 llvm::bolt::IndirectCallPromotion::runOnFunctions(llvm::bolt::BinaryContext&) (/usr/bin/llvm-bolt+0xc782b0)
106.0  #6 0x0000aaaaab3ae840 llvm::bolt::BinaryFunctionPassManager::runPasses() (/usr/bin/llvm-bolt+0x8de840)
106.0  #7 0x0000aaaaab3b28c8 llvm::bolt::BinaryFunctionPassManager::runAllPasses(llvm::bolt::BinaryContext&) (/usr/bin/llvm-bolt+0x8e28c8)
106.0  #8 0x0000aaaaab35e2fc llvm::bolt::RewriteInstance::run() (/usr/bin/llvm-bolt+0x88e2fc)
106.0  #9 0x0000aaaaaac60dfc main (/usr/bin/llvm-bolt+0x190dfc)
106.0 #10 0x0000ffffa68a229c __libc_start_call_main ./csu/../sysdeps/nptl/libc_start_call_main.h:74:3
106.0 #11 0x0000ffffa68a237c call_init ./csu/../csu/libc-start.c:128:20
106.0 #12 0x0000ffffa68a237c __libc_start_main ./csu/../csu/libc-start.c:347:5
106.0 #13 0x0000aaaaaac5efb0 _start (/usr/bin/llvm-bolt+0x18efb0)
106.0 PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
106.0 Stack dump:
106.0 0.	Program arguments: /usr/bin/llvm-bolt python.prebolt -o python.bolt -data=python.fdata -update-debug-sections -skip-funcs=_PyEval_EvalFrameDefault,sre_ucs1_match/1,sre_ucs2_match/1,sre_ucs4_match/1 -reorder-blocks=ext-tsp -reorder-functions=cdsort -split-functions -icf=1 -inline-all -split-eh -reorder-functions-use-hot-size -peepholes=none -jump-tables=aggressive -inline-ap -indirect-call-promotion=all -dyno-stats -use-gnu-stack -frame-opt=hot
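The symbolized frames place the crash in llvm::bolt::IndirectCallPromotion::rewriteCall, reached from IndirectCallPromotion::runOnFunctions. One hedged way to narrow this down is to rerun the apply step without -indirect-call-promotion=all and see whether BOLT then completes; that has not been tested here, and it gives up the indirect-call-promotion optimization. The sketch below assumes the flag string copied from the failing invocation above; bolt_argv is a hypothetical helper, and in a CPython build the same change would presumably be made through the BOLT_APPLY_FLAGS configure variable.

```python
import shlex

# Flags copied verbatim from the failing llvm-bolt invocation in this report.
APPLY_FLAGS = (
    "-update-debug-sections "
    "-skip-funcs=_PyEval_EvalFrameDefault,sre_ucs1_match/1,sre_ucs2_match/1,sre_ucs4_match/1 "
    "-reorder-blocks=ext-tsp -reorder-functions=cdsort -split-functions -icf=1 "
    "-inline-all -split-eh -reorder-functions-use-hot-size -peepholes=none "
    "-jump-tables=aggressive -inline-ap -indirect-call-promotion=all -dyno-stats "
    "-use-gnu-stack -frame-opt=hot"
)


def bolt_argv(binary: str, flags: str, drop_prefixes: tuple[str, ...] = ()) -> list[str]:
    """Build an llvm-bolt argv like the Makefile loop, omitting flags that
    start with any of drop_prefixes (hypothetical diagnostic helper)."""
    kept = [f for f in shlex.split(flags) if not f.startswith(drop_prefixes)]
    return ["llvm-bolt", f"{binary}.prebolt", "-o", f"{binary}.bolt",
            f"-data={binary}.fdata", *kept]


if __name__ == "__main__":
    # Same command as the failing run, minus the indirect-call-promotion pass.
    print(shlex.join(bolt_argv("python", APPLY_FLAGS, ("-indirect-call-promotion",))))
```

Dropping flags by prefix keeps the rest of the invocation byte-for-byte identical, so any change in behavior can be attributed to the removed pass.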

@liusy58

liusy58 commented Jan 16, 2025

Thank you~

@aaupov

aaupov commented Jan 22, 2025

Currently we can't skip functions for aarch64, and it's not trivial to fix.
llvm/llvm-project#120267 adds support for computed goto; we just need to check whether it solves the problem.

@zanieb
Contributor Author

zanieb commented Jan 22, 2025

Oh, good to know that's why those are being ignored!

I'm tracking that computed goto feature.

Projects
None yet
Development

No branches or pull requests

3 participants