riscv64 fails to build Python/perf_jit_trampoline.c: Unsupported target architecture #121201
cc @furkanonder |
configure says:
|
Hummm, it looks like we are missing the register definitions for riscv64 here: cpython/Python/perf_jit_trampoline.c Lines 352 to 376 in af8c3d7
@furkanonder, can you take a look? Otherwise we may need to deactivate riscv64 support while we figure out the DWARF definitions. |
I think it may be enough to add riscv here: cpython/Python/perf_jit_trampoline.c Line 371 in af8c3d7
but I am not sure about the numbers. It seems that they match the aarch64 ones, but I would need a riscv machine to try it out. |
One way to check the numbers is to generate DWARF on riscv for the same function and check the numeric values of DWRF_REG_SP and DWRF_REG_RA. |
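Besides generating DWARF on real hardware, the numbers can be cross-checked against the ABI documents: the RISC-V ELF psABI numbers the integer registers x0-x31 as DWARF registers 0-31, and the AArch64 DWARF ABI uses 0-30 for x0-x30 and 31 for sp. Below is a minimal Python sketch of that lookup; the table and helper names are illustrative only and are not part of CPython.

```python
# Integer register ABI names in x0..x31 order, from the RISC-V calling
# convention. The RISC-V ELF psABI assigns DWARF numbers 0-31 to x0-x31,
# so a name's index in this tuple is its DWARF register number.
RISCV_INT_ABI_NAMES = (
    "zero", "ra", "sp", "gp", "tp", "t0", "t1", "t2",
    "s0", "s1", "a0", "a1", "a2", "a3", "a4", "a5",
    "a6", "a7", "s2", "s3", "s4", "s5", "s6", "s7",
    "s8", "s9", "s10", "s11", "t3", "t4", "t5", "t6",
)


def riscv_dwarf_regno(name):
    """DWARF register number for a RISC-V integer register (ABI or xN name)."""
    if name == "fp":  # fp is an alias for s0 (x8)
        name = "s0"
    if name.startswith("x") and name[1:].isdigit():
        number = int(name[1:])
    else:
        number = RISCV_INT_ABI_NAMES.index(name)  # ValueError if unknown
    if not 0 <= number <= 31:
        raise ValueError(f"not a RISC-V integer register: {name}")
    return number


def aarch64_dwarf_regno(name):
    """DWARF register number for an AArch64 general-purpose register."""
    aliases = {"fp": 29, "lr": 30, "sp": 31}  # x29 = fp, x30 = lr, 31 = sp
    if name in aliases:
        return aliases[name]
    if name.startswith("x") and name[1:].isdigit() and int(name[1:]) <= 30:
        return int(name[1:])
    raise ValueError(f"not an AArch64 general-purpose register: {name}")


if __name__ == "__main__":
    # The two values perf_jit_trampoline.c needs per architecture.
    print("riscv:   RA =", riscv_dwarf_regno("ra"), "SP =", riscv_dwarf_regno("sp"))
    print("aarch64: RA =", aarch64_dwarf_regno("lr"), "SP =", aarch64_dwarf_regno("sp"))
```

Running it prints RA = 1, SP = 2 for riscv and RA = 30, SP = 31 for aarch64, the latter matching the existing aarch64 entries in the enum.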
Deactivated for now in #121328 |
…-121328) Disable perf_trampoline on riscv64 for now Until support is added in perf_jit_trampoline.c pythongh-120089 was incomplete. (cherry picked from commit ca2e876) Co-authored-by: Stefano Rivera <[email protected]>
The backport automerge failed and is still open. |
According to Table 18.2 (RISC-V calling convention register usage), ra is x1 and sp is x2.
Therefore, I set DWRF_REG_RA to 1 and DWRF_REG_SP to 2.
$ git diff
diff --git a/Python/perf_jit_trampoline.c b/Python/perf_jit_trampoline.c
index 0a8945958b..6e30ed2865 100644
--- a/Python/perf_jit_trampoline.c
+++ b/Python/perf_jit_trampoline.c
@@ -371,6 +371,9 @@ enum {
#elif defined(__aarch64__) && defined(__AARCH64EL__) && !defined(__ILP32__)
DWRF_REG_SP = 31,
DWRF_REG_RA = 30,
+#elif defined(__riscv)
+ DWRF_REG_RA = 1,
+ DWRF_REG_SP = 2,
#else
# error "Unsupported target architecture"
#endif
I got another error here, and I have no idea about the extra registers.
@@ -477,7 +480,7 @@ elf_init_ehframe(ELFObjectContext* ctx)
DWRF_U8(DWRF_CFA_advance_loc | 6);
DWRF_U8(DWRF_CFA_def_cfa_offset); DWRF_UV(8);
/* Extra registers saved for JIT-compiled code. */
-#elif defined(__aarch64__) && defined(__AARCH64EL__) && !defined(__ILP32__)
+#elif (defined(__aarch64__) && defined(__AARCH64EL__) && !defined(__ILP32__)) || defined(__riscv)
DWRF_U8(DWRF_CFA_advance_loc | 1);
DWRF_U8(DWRF_CFA_def_cfa_offset); DWRF_UV(16);
DWRF_U8(DWRF_CFA_offset | 29); DWRF_UV(2);
Following these changes, the build completed successfully, and I didn't encounter any failing test cases.
|
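The aarch64 block builds its CFA program byte by byte with the DWRF_* macros. As a minimal sketch (opcode values from the DWARF specification; the helper and table names are hypothetical), the same bytes can be encoded in Python and then decoded under each architecture's register numbering. It covers only the three instructions visible in the hunk, and factored offsets are shown raw because the sketch does not model the CIE's data alignment factor.

```python
# DWARF call-frame instruction opcodes (DWARF spec, "Call Frame Instructions").
# DW_CFA_advance_loc and DW_CFA_offset keep their operand in the low 6 bits.
DW_CFA_advance_loc = 0x40
DW_CFA_offset = 0x80
DW_CFA_def_cfa_offset = 0x0E

# Hypothetical name tables for the register numbers discussed in this thread.
AARCH64_NAMES = {29: "x29 (fp)", 30: "x30 (lr)", 31: "sp"}
RISCV_NAMES = {1: "ra", 2: "sp", 8: "s0 (fp)", 29: "t4", 30: "t5"}


def uleb128(value):
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def read_uleb128(data, pos):
    """Decode unsigned LEB128 starting at data[pos]; return (value, next_pos)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return result, pos


def decode(program, names):
    """Decode the subset of CFA opcodes used in this sketch into text."""
    steps, pos = [], 0
    while pos < len(program):
        op = program[pos]
        pos += 1
        if (op & 0xC0) == DW_CFA_advance_loc:
            steps.append(f"advance_loc {op & 0x3F}")
        elif (op & 0xC0) == DW_CFA_offset:
            reg = op & 0x3F
            name = names.get(reg, f"r{reg}")
            factored, pos = read_uleb128(program, pos)
            steps.append(f"offset {name} at factored offset {factored}")
        elif op == DW_CFA_def_cfa_offset:
            value, pos = read_uleb128(program, pos)
            steps.append(f"def_cfa_offset {value}")
        else:
            raise ValueError(f"opcode {op:#04x} is not handled by this sketch")
    return steps


# The three instructions shown in the aarch64 branch of the hunk above.
PROGRAM = (
    bytes([DW_CFA_advance_loc | 1])
    + bytes([DW_CFA_def_cfa_offset]) + uleb128(16)
    + bytes([DW_CFA_offset | 29]) + uleb128(2)
)

if __name__ == "__main__":
    print("aarch64:", decode(PROGRAM, AARCH64_NAMES))
    print("riscv:  ", decode(PROGRAM, RISCV_NAMES))
```

Decoding shows that DW_CFA_offset | 29 names x29 (the frame pointer) on aarch64 but t4 on RISC-V, where the frame pointer is s0 (DWARF 8) and the return address is ra (DWARF 1). Whether reusing the aarch64 branch for __riscv is correct therefore depends on which registers the riscv trampoline actually saves, which the thread does not show.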
@furkanonder, can you create another PR with the previous changes plus the ones you just made? We can then test it against the buildbot. Also, can you please show me the output of |
PR for the changes.
Output:
== CPython 3.14.0a0 (heads/main-dirty:1dc9a4f6b2, Jul 2 2024, 23:29:42) [GCC 13.2.0]
== Linux-6.1.81-riscv64-with-glibc2.38 little-endian
== Python build: release
== cwd: /home/dietpi/desktop/cpython/build/test_python_worker_53029æ
== CPU count: 4
== encodings: locale=UTF-8 FS=utf-8
== resources: all test resources are disabled, use -u option to unskip tests
Using random seed: 3537377140
0:00:00 load avg: 4.39 Run 1 test sequentially in a single process
0:00:00 load avg: 4.39 [1/1] test_perf_profiler
test_pre_fork_compile (test.test_perf_profiler.TestPerfProfiler.test_pre_fork_compile) ... skipped "perf command doesn't work"
test_python_calls_appear_in_the_stack_if_perf_activated (test.test_perf_profiler.TestPerfProfiler.test_python_calls_appear_in_the_stack_if_perf_activated) ... skipped "perf command doesn't work"
test_python_calls_do_not_appear_in_the_stack_if_perf_deactivated (test.test_perf_profiler.TestPerfProfiler.test_python_calls_do_not_appear_in_the_stack_if_perf_deactivated) ... skipped "perf command doesn't work"
test_python_calls_appear_in_the_stack_if_perf_activated (test.test_perf_profiler.TestPerfProfilerWithDwarf.test_python_calls_appear_in_the_stack_if_perf_activated) ... skipped "perf command doesn't work"
test_python_calls_do_not_appear_in_the_stack_if_perf_deactivated (test.test_perf_profiler.TestPerfProfilerWithDwarf.test_python_calls_do_not_appear_in_the_stack_if_perf_deactivated) ... skipped "perf command doesn't work"
test_sys_api (test.test_perf_profiler.TestPerfTrampoline.test_sys_api) ... ok
test_sys_api_get_status (test.test_perf_profiler.TestPerfTrampoline.test_sys_api_get_status) ... ok
test_sys_api_with_existing_trampoline (test.test_perf_profiler.TestPerfTrampoline.test_sys_api_with_existing_trampoline) ... ok
test_sys_api_with_invalid_trampoline (test.test_perf_profiler.TestPerfTrampoline.test_sys_api_with_invalid_trampoline) ... ok
test_trampoline_works (test.test_perf_profiler.TestPerfTrampoline.test_trampoline_works) ... ok
test_trampoline_works_with_forks (test.test_perf_profiler.TestPerfTrampoline.test_trampoline_works_with_forks) ... ok
----------------------------------------------------------------------
Ran 11 tests in 1.052s
OK (skipped=5)
== Tests result: SUCCESS ==
1 test OK.
Total duration: 2.4 sec
Total tests: run=11 skipped=5
Total test files: run=1/1
Result: SUCCESS
I followed the "Python support for the Linux perf profiler" guide to test perf profiling. I think perf is not well supported on my buildbot.
dietpi@DietPi:~/desktop/cpython$ cat my_script.py
def foo(n):
    result = 0
    for _ in range(n):
        result += 1
    return result

def bar(n):
    foo(n)

def baz(n):
    bar(n)

if __name__ == "__main__":
    baz(1000000)
dietpi@DietPi:~/desktop/cpython$ perf record -F 9999 -g -o perf.data -a ./python my_script.py
Error:
cycles:P: PMU Hardware doesn't support sampling/overflow-interrupts. Try 'perf stat'
dietpi@DietPi:~/desktop/cpython$
dietpi@DietPi:~/desktop/cpython$ perf stat -e cycles -o perf.data ./python my_script.py
dietpi@DietPi:~/desktop/cpython$ cat perf.data
# started on Fri Jul 5 00:40:49 2024
Performance counter stats for './python my_script.py':
742098135 cycles
0.547787000 seconds time elapsed
0.485775000 seconds user
0.010120000 seconds sys
dietpi@DietPi:~/desktop/cpython$
$ perf list
List of pre-defined events (to be used in -e or -M):
branch-instructions OR branches [Hardware event]
branch-misses [Hardware event]
bus-cycles [Hardware event]
cache-misses [Hardware event]
cache-references [Hardware event]
cpu-cycles OR cycles [Hardware event]
instructions [Hardware event]
ref-cycles [Hardware event]
stalled-cycles-backend OR idle-cycles-backend [Hardware event]
stalled-cycles-frontend OR idle-cycles-frontend [Hardware event]
alignment-faults [Software event]
bpf-output [Software event]
cgroup-switches [Software event]
context-switches OR cs [Software event]
cpu-clock [Software event]
cpu-migrations OR migrations [Software event]
dummy [Software event]
emulation-faults [Software event]
major-faults [Software event]
minor-faults [Software event]
page-faults OR faults [Software event]
task-clock [Software event]
duration_time [Tool event]
user_time [Tool event]
system_time [Tool event]
cpu:
L1-dcache-loads OR cpu/L1-dcache-loads/
L1-dcache-load-misses OR cpu/L1-dcache-load-misses/
L1-dcache-stores OR cpu/L1-dcache-stores/
L1-dcache-store-misses OR cpu/L1-dcache-store-misses/
L1-dcache-prefetches OR cpu/L1-dcache-prefetches/
L1-dcache-prefetch-misses OR cpu/L1-dcache-prefetch-misses/
L1-icache-loads OR cpu/L1-icache-loads/
L1-icache-load-misses OR cpu/L1-icache-load-misses/
L1-icache-prefetches OR cpu/L1-icache-prefetches/
L1-icache-prefetch-misses OR cpu/L1-icache-prefetch-misses/
LLC-loads OR cpu/LLC-loads/
LLC-load-misses OR cpu/LLC-load-misses/
LLC-stores OR cpu/LLC-stores/
LLC-store-misses OR cpu/LLC-store-misses/
LLC-prefetches OR cpu/LLC-prefetches/
LLC-prefetch-misses OR cpu/LLC-prefetch-misses/
dTLB-loads OR cpu/dTLB-loads/
dTLB-load-misses OR cpu/dTLB-load-misses/
dTLB-stores OR cpu/dTLB-stores/
dTLB-store-misses OR cpu/dTLB-store-misses/
dTLB-prefetches OR cpu/dTLB-prefetches/
dTLB-prefetch-misses OR cpu/dTLB-prefetch-misses/
iTLB-loads OR cpu/iTLB-loads/
iTLB-load-misses OR cpu/iTLB-load-misses/
branch-loads OR cpu/branch-loads/
branch-load-misses OR cpu/branch-load-misses/
node-loads OR cpu/node-loads/
node-load-misses OR cpu/node-load-misses/
node-stores OR cpu/node-stores/
node-store-misses OR cpu/node-store-misses/
node-prefetches OR cpu/node-prefetches/
node-prefetch-misses OR cpu/node-prefetch-misses/
(null) [Kernel PMU event]
firmware:
fw_access_load
[Load access trap event]
fw_access_store
[Store access trap event]
fw_fence_i_received
[Received FENCE.I request from other HART event]
fw_fence_i_sent
[Sent FENCE.I request to other HART event]
fw_hfence_gvma_received
[Received HFENCE.GVMA request from other HART event]
fw_hfence_gvma_sent
[Sent HFENCE.GVMA request to other HART event]
fw_hfence_gvma_vmid_received
[Received HFENCE.GVMA with VMID request from other HART event]
fw_hfence_gvma_vmid_sent
[Sent HFENCE.GVMA with VMID request to other HART event]
fw_hfence_vvma_asid_received
[Received HFENCE.VVMA with ASID request from other HART event]
fw_hfence_vvma_asid_sent
[Sent HFENCE.VVMA with ASID request to other HART event]
Error: failed to open tracing events directory
fw_hfence_vvma_received
[Received HFENCE.VVMA request from other HART event]
fw_hfence_vvma_sent
[Sent HFENCE.VVMA request to other HART event]
fw_illegal_insn
[Illegal instruction trap event]
fw_ipi_received
[Received IPI from other HART event]
fw_ipi_sent
[Sent IPI to other HART event]
fw_misaligned_load
[Misaligned load trap event]
fw_misaligned_store
[Misaligned store trap event]
fw_set_timer
[Set timer event]
fw_sfence_vma_asid_received
[Received SFENCE.VMA with ASID request from other HART event]
fw_sfence_vma_received
[Sent SFENCE.VMA with ASID request to other HART event]
fw_sfence_vma_sent
[Sent SFENCE.VMA request to other HART event]
instructions:
atomic_memory_retired
[Atomic memory operation retired]
conditional_branch_retired
[Conditional branch retired]
exception_taken
[Exception taken]
fp_addition_retired
[Floating-point addition retired]
fp_div_sqrt_retired
[Floating-point division or square-root retired]
fp_fusedmadd_retired
[Floating-point fused multiply-add retired]
fp_load_retired
[Floating-point load instruction retired]
fp_multiplication_retired
[Floating-point multiplication retired]
fp_store_retired
[Floating-point store instruction retired]
integer_arithmetic_retired
[Integer arithmetic instruction retired]
integer_division_retired
[Integer division instruction retired]
integer_load_retired
[Integer load instruction retired]
integer_multiplication_retired
[Integer multiplication instruction retired]
integer_store_retired
[Integer store instruction retired]
jal_instruction_retired
[JAL instruction retired]
jalr_instruction_retired
[JALR instruction retired]
other_fp_retired
[Other floating-point instruction retired]
system_instruction_retired
[System instruction retired]
memory:
data_tlb_miss
[Data TLB miss]
dcache_miss_mmio_accesses
[Data cache miss or memory-mapped I/O access]
dcache_writeback
[Data cache write-back]
icache_retired
[Instruction cache miss]
inst_tlb_miss
[Instruction TLB miss]
utlb_miss
[UTLB miss]
microarch:
addressgen_interlock
[Address-generation interlock]
branch_direction_misprediction
[Branch direction misprediction]
branch_target_misprediction
[Branch/jump target misprediction]
csr_read_interlock
[CSR read interlock]
dcache_dtim_busy
[Data cache/DTIM busy]
fp_interlock
[Floating-point interlock]
icache_itim_busy
[Instruction cache/ITIM busy]
integer_multiplication_interlock
longlat_interlock
[Long-latency interlock]
pipe_flush_csr_write
[Pipeline flush from CSR write]
pipe_flush_other_event
[Pipeline flush from other event]
rNNN [Raw hardware event descriptor]
cpu/t1=v1[,t2=v2,t3 ...]/modifier [Raw hardware event descriptor]
[(see 'man perf-list' on how to encode it)]
mem:<addr>[/len][:access] [Hardware breakpoint] |
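The perf record invocation above runs ./python my_script.py without -X perf (or PYTHONPERFSUPPORT=1), which the perf profiling guide uses to enable the trampoline, so Python frames would probably not appear even where sampling works. As a sketch, assuming Python 3.12+ on Linux, the script can turn the trampoline on itself through sys.activate_stack_trampoline() and fall back to a plain run where that call is missing or fails; main() is a hypothetical wrapper, not part of the original script. Separately, the listing above includes the cpu-clock software event, and perf record -e cpu-clock may be worth trying on a board whose PMU cannot sample, though that has not been tried here.

```python
import sys


def foo(n):
    result = 0
    for _ in range(n):
        result += 1
    return result


def bar(n):
    foo(n)


def baz(n):
    bar(n)


def main(n=1000000):
    """Run baz(n) with the perf trampoline enabled when possible.

    Returns True if the trampoline was activated for the run.
    """
    # Added in Python 3.12; look it up dynamically so the script still
    # runs on interpreters or platforms without it.
    activate = getattr(sys, "activate_stack_trampoline", None)
    enabled = False
    if activate is not None:
        try:
            activate("perf")
            enabled = True
        except (ValueError, OSError, RuntimeError):
            # Treat any failure to enable it as "not available".
            pass
    try:
        baz(n)
    finally:
        if enabled:
            sys.deactivate_stack_trampoline()
    return enabled


if __name__ == "__main__":
    main()
```

With this variant the command line does not need -X perf, but samples will only show Python frames if perf itself can sample on the machine.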
…21328) Disable perf_trampoline on riscv64 for now Until support is added in perf_jit_trampoline.c pythongh-120089 was incomplete.
Unfortunately, without testing on a system that has a working perf, we won't be able to merge the PR, because we cannot validate that it works. I am not comfortable merging these changes without being able to corroborate the functionality :( |
build: https://buildbot.python.org/all/#/builders/1379/builds/625
cc @pablogsal