Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Missed optimization: big immutable locals are not promoted to constants #136218

Open
WaffleLapkin opened this issue Jan 28, 2025 · 1 comment
Open
Labels
A-codegen Area: Code generation A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@WaffleLapkin
Copy link
Member

I tried this code, on riscv32i-unknown-none-elf (this code is originally from #136216):

#![no_std]

pub enum Idx { _0, _1, _2, _3 }

#[no_mangle]
pub unsafe fn lookup_local(x: Idx) -> u8 {
    [0xEF, 0x0E, 0x0D, 0xBD][x as usize]
}

#[no_mangle]
pub unsafe fn lookup_const(x: Idx) -> u8 {
    (const { [0xEF, 0x0E, 0x0D, 0xBD] })[x as usize]
}

I expected to see this happen: compiler outputs identical asm for both functions.

Instead, this happened: the asm for lookup_local is suboptimal (godbolt):

lookup_local:
        addi    sp, sp, -16
        li      a1, -17
        sb      a1, 12(sp)
        li      a1, 14
        sb      a1, 13(sp)
        li      a1, 13
        sb      a1, 14(sp)
        li      a1, 189
        sb      a1, 15(sp)
        addi    a1, sp, 12
        add     a0, a1, a0
        lbu     a0, 0(a0)
        addi    sp, sp, 16
        ret

lookup_const:
        lui     a1, %hi(.L__unnamed_1)
        addi    a1, a1, %lo(.L__unnamed_1)
        add     a0, a1, a0
        lbu     a0, 0(a0)
        ret

.L__unnamed_1:
        .ascii  "\357\016\r\275"

Instead of a using constant, lookup_local constructs the array on the stack.

This might be a missed optimization in the LLVM, but we could also potentially handle it in rustc (and personally I feel like the latter might be easier).

Meta

rustc --version --verbose:

rustc 1.86.0-nightly (2f348cb7c 2025-01-27)
binary: rustc
commit-hash: 2f348cb7ce4063fa4eb40038e6ada3c5214717bd
commit-date: 2025-01-27
host: x86_64-unknown-linux-gnu
release: 1.86.0-nightly
LLVM version: 19.1.7
above is the code from godbolt, locally I get slightly different output

rustc +nightly ./t.rs --target riscv32i-unknown-none-elf -Copt-level=3 -Zexport-executable-symbols, objdump --disassemble t

#![no_std]
#![no_main]

pub enum Idx {
    _0,
    _1,
    _2,
    _3,
}

#[no_mangle]
#[inline(never)]
pub unsafe fn lookup_local(x: Idx) -> u8 {
    [0xEF, 0x0E, 0x0D, 0xBD][x as usize]
}

#[inline(never)]
#[no_mangle]
pub unsafe fn lookup_const(x: Idx) -> u8 {
    (const { [0xEF, 0x0E, 0x0D, 0xBD] })[x as usize]
}

#[panic_handler]
fn disco_loop(_: &core::panic::PanicInfo) -> ! {
    loop {}
}
000110d8 <lookup_local>:
   110d8: ff010113     	addi	sp, sp, -0x10
   110dc: fef00593     	li	a1, -0x11
   110e0: 00b10623     	sb	a1, 0xc(sp)
   110e4: 00e00593     	li	a1, 0xe
   110e8: 00b106a3     	sb	a1, 0xd(sp)
   110ec: 00d00593     	li	a1, 0xd
   110f0: 00b10723     	sb	a1, 0xe(sp)
   110f4: 0bd00593     	li	a1, 0xbd
   110f8: 00b107a3     	sb	a1, 0xf(sp)
   110fc: 00c10593     	addi	a1, sp, 0xc
   11100: 00a58533     	add	a0, a1, a0
   11104: 00054503     	lbu	a0, 0x0(a0)
   11108: 01010113     	addi	sp, sp, 0x10
   1110c: 00008067     	ret

00011110 <lookup_const>:
   11110: 000105b7     	lui	a1, 0x10
   11114: 0d458593     	addi	a1, a1, 0xd4
   11118: 00a58533     	add	a0, a1, a0
   1111c: 00054503     	lbu	a0, 0x0(a0)
   11120: 00008067     	ret

@WaffleLapkin WaffleLapkin added A-codegen Area: Code generation A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-bug Category: This is a bug. I-slow Issue: Problems and improvements with respect to performance of generated code. labels Jan 28, 2025
@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Jan 28, 2025
@saethlin saethlin added C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such and removed C-bug Category: This is a bug. needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Jan 29, 2025
@jieyouxu jieyouxu added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Jan 29, 2025
@faptc
Copy link

faptc commented Mar 19, 2025

See also this closed issue #73825 and the associated zulip thread https://rust-lang.zulipchat.com/#narrow/stream/136281-t-opsem/topic/Could.20const.20read-only.20arrays.20be.20const.20promoted.3F

TLDR: Rust defers those optimizations to LLVM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-codegen Area: Code generation A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

5 participants