Jump-and-link instruction #630

Dentosal · 2025-03-14T11:33:37Z

Closes #627. VM issue: FuelLabs/fuel-vm#857. VM PR: FuelLabs/fuel-vm#925.

The design is quite similar to the RISC-V instruction of the same name. JAL $ra $rb imm stores the address of the next instruction in $ra, so that register can be used as the return address from a subroutine. If $ra is $zero, the value is discarded instead, so the instruction can also serve as a plain jump without clobbering a register. After storing the return address, it jumps to the instruction at memory address $rb + imm * 4.

The main purpose of this instruction is efficient subroutine calling and returning. JAL $ret_addr $subroutine_addr 0 performs the call, and JAL $zero $ret_addr 0 returns from it. For nested function calls, the callee is responsible for saving $ret_addr.
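The call and return behaviour described above can be modelled in a few lines of Python. This is only an illustrative sketch, not the fuel-vm implementation: the register file is a plain dict, the addresses are made up, and the helper name `jal` is hypothetical. The description does not say what happens when $ra and $rb are the same register; the sketch reads $rb first, which is an assumption.

```python
# Illustrative model of JAL as described in this PR; not the fuel-vm code.
INSTR_SIZE = 4  # every instruction is 4 bytes wide


def jal(regs, ra, rb, imm):
    """Execute `JAL ra rb imm` on a dict-based register file."""
    # Assumption: the jump target is computed before $ra is written.
    target = regs[rb] + imm * INSTR_SIZE
    if ra != "zero":
        # Link: store the address of the instruction following the JAL.
        regs[ra] = regs["pc"] + INSTR_SIZE
    regs["pc"] = target
    return regs


# Call: the subroutine sits at the hypothetical address 0x100.
regs = {"zero": 0, "pc": 0x40, "ret_addr": 0, "sub_addr": 0x100}
jal(regs, "ret_addr", "sub_addr", 0)
after_call = dict(regs)

# Return: pretend the subroutine body ran up to 0x120, then jump back.
regs["pc"] = 0x120
jal(regs, "zero", "ret_addr", 0)
after_return = dict(regs)

# PC-relative call, as in `jal $ret_addr $pc 2`.
relative = jal({"zero": 0, "pc": 0x10, "ret_addr": 0}, "ret_addr", "pc", 2)
```

After the call, `pc` is 0x100 and `ret_addr` holds 0x44; after the return, `pc` is back at 0x44 and no register other than `pc` has changed.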

The following snippet shows a minimal program using the functionality:

// main function
jal $ret_addr $pc 2 // call subroutine
ret $zero // end program

// subroutine
/* subroutine body comes here */
jal $zero $ret_addr 0 // Return from the subroutine

Fibonacci example

To show how compact the resulting code is, I wrote a small Fibonacci function using it. The function uses the following register-based ABI:

Function argument and return value: $fnarg in 0x10
Function return address: $return_addr in 0x11

The code also uses the following locals: $local1 in 0x12, $local2 in 0x13, $local3 in 0x14 (named for pshl/popl)

// Set argument
movi $fnarg 10 // <- this computes fibo(10), i.e. the 10th Fibonacci number, 55

// Main function
jal $return_addr $pc 3 // <- offset to the subroutine
log $fnarg $zero $zero $zero
ret $one

// Fibonacci subroutine
// fibo(0) = 0, fibo(1) = 1, fibo(n) = fibo(n-1) + fibo(n-2)
pshl 0b11110 // Save return_address and local{1,2,3}
// Compute fn pointer to the current function and place it in local3
subi $local3 $pc 4 // <- subtract 4 to get prev instruction start
// If n < 2 no computation needed
movi $local1 2
lt $local1 $fnarg $local1
jnzf $local1 $zero 8 // Skip over computation
// Else call self with n - 1 and n - 2 and sum those
subi $local2 $fnarg 2         // Save n - 2 to local2
subi $fnarg $fnarg 1          // n -= 1
jal $return_addr $local3 0    // Call self
move $local1 $fnarg           // Copy result to local1
move $fnarg $local2           // Restore n - 2 from local2
jal $return_addr $local3 0    // Call self
move $local2 $fnarg           // Copy result to local2
add $fnarg $local1 $local2 // result = local1 + local2
// Computation ends here; this is where jnzf jumps to
popl 0b11110 // Restore return_address and local{1,2,3}
jal $zero $return_addr 0 // Return from subroutine
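The relative offsets in the listing above are easy to get wrong, so here is a small Python sanity check. It is a sketch under stated assumptions: every instruction is 4 bytes, JAL jumps to $rb + imm * 4 as described in this PR, and JNZF with $zero as its second operand jumps forward to $pc + (imm + 1) * 4, which is my reading of the Fuel ISA. The labels in `program` are hypothetical names for the instructions in the listing.

```python
INSTR_SIZE = 4

# One entry per instruction of the Fibonacci listing, in order.
# Only the labelled entries are looked up; the rest are placeholders.
program = [
    "movi_arg", "jal_call", "log", "ret",                  # main
    "pshl", "subi_fnptr", "movi_two", "lt", "jnzf",        # prologue, n < 2 test
    "subi_n2", "subi_n1", "jal_rec1", "move_a", "move_b",  # first recursive call
    "jal_rec2", "move_c", "add",                           # second call and sum
    "popl", "jal_ret",                                     # epilogue
]


def address(label):
    """Byte offset of a labelled instruction from the start of the program."""
    return program.index(label) * INSTR_SIZE


# `jal $return_addr $pc 3` should land on the first subroutine instruction.
call_target = address("jal_call") + 3 * INSTR_SIZE

# `subi $local3 $pc 4` should yield the address of the subroutine entry.
fn_ptr = address("subi_fnptr") - 4

# `jnzf $local1 $zero 8` should skip the recursion and land on popl.
skip_target = address("jnzf") + (8 + 1) * INSTR_SIZE


def fibo(n):
    """Reference definition matching the comment in the listing."""
    return n if n < 2 else fibo(n - 1) + fibo(n - 2)
```

Under these assumptions, `call_target` and `fn_ptr` both equal `address("pshl")`, `skip_target` equals `address("popl")`, and `fibo(10)` is 55, matching the value in the listing's comment.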

Before requesting review

I have reviewed the changes myself

Voxelot · 2025-03-17T20:40:00Z

cc @vaivaswatha can you comment on the impact of this change? I.e., any concerns regarding register allocation for nested subroutines?

xunilrj · 2025-03-17T22:01:42Z

Today this is how we compile the following fn.

fn main() -> u64 {
    1337
}

This is the function ASM (not super optimized to avoid inlining):

pshl i3                       ; save registers 16..40
pshh i524288                  ; save registers 40..64
move $$locbase $sp            ; save locals base register for function main_0
move $r0 $$reta               ; save return address
movi $r1 i1337                ; initialize constant into register
move $$retv $r1               ; set return value
move $$reta $r0               ; restore return address
poph i524288                  ; restore registers 40..64
popl i3                       ; restore registers 16..40
jmp $$reta                    ; return from call

This is the ASM calling the fn:

sub  $$reta $pc $is           ; get current instruction offset from instructions start ($is) 
srli $$reta $$reta i2         ; get current instruction offset in 32-bit words
addi $$reta $$reta i4         ; [call]: set new return address
jmpf $zero i76                ; [call]: call main_0
move $r0 $$retv               ; [call]: copy the return value

With this new instruction, we could call fns like:

jal $$reta $pc i76
move $r0 $$retv

and the fn would be

pshl i3                       ; save registers 16..40
pshh i524288                  ; save registers 40..64
move $$locbase $sp            ; save locals base register for function main_0
move $r0 $$reta               ; save return address
movi $r1 i1337                ; initialize constant into register
move $$retv $r1               ; set return value
poph i524288                  ; restore registers 40..64
popl i3                       ; restore registers 16..40
jal $zero $r0 0

Which means we can save 3 instructions when calling fns (huge gains!), and none in the function definition, given that jal $zero $ret_addr 0 seems to be identical to jmp $$reta

We could save extra 4 instructions per function definition, by using JAL last argument as a flag to do register pushing and popping.

vaivaswatha · 2025-03-19T04:35:40Z

> cc @vaivaswatha can you comment on the impact of this change? Ie. any concerns regarding register allocation for nested sub-routines?

There shouldn't be any problem. When we enter a function, we save all (used) registers and pop them all back at the end. So register allocation shouldn't be affected. I don't see any downsides, and the upside is as elaborated by @xunilrj .

Dentosal · 2025-03-21T11:10:41Z

> Which means we can save 3 instructions when calling fns (huge gains!), and none in the function definition, given that jal $zero $ret_addr 0 seems to be identical to jmp $$reta

jal $zero $ret_addr 0 isn't exactly identical to jmp $$reta, in the sense that jmp is $is-relative and jal is not.
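To make the difference concrete, here is a small Python sketch comparing the two return conventions. It assumes JMP sets $pc to $is + $rA * 4 (my reading of the Fuel ISA) and that JAL uses an absolute byte address, as described in this PR. The value of `is_` and the call-site position are hypothetical.

```python
INSTR_SIZE = 4


def jmp_target(is_, reg_value):
    """JMP (assumed semantics): register holds a word index relative to $is."""
    return is_ + reg_value * INSTR_SIZE


def jal_target(reg_value, imm=0):
    """JAL: register holds an absolute byte address."""
    return reg_value + imm * INSTR_SIZE


is_ = 0x2000                       # hypothetical start of the code
call_site = is_ + 40 * INSTR_SIZE  # hypothetical address of the first call instruction

# Current call sequence: sub, srli, addi, jmpf, then the `move` to return to.
old_reta = ((call_site - is_) >> 2) + 4  # word index of the `move`
old_return = jmp_target(is_, old_reta)

# JAL-based call: a single `jal` at call_site, followed by the `move`.
new_reta = call_site + INSTR_SIZE        # absolute address of the `move`
new_return = jal_target(new_reta)

# Feeding a JAL-style absolute address into JMP lands somewhere else entirely.
mixed = jmp_target(is_, new_reta)
```

Both conventions return to the instruction right after the call, but the register contents differ: a word index relative to $is for jmp, an absolute byte address for jal. The two must therefore not be mixed within one call/return pair.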

> We could save extra 4 instructions per function definition, by using JAL last argument as a flag to do register pushing and popping.

I'm not sure how that would work? The immediate part here is at most 12 bits long, and the VM has 48 user-writable registers. Unless we special-case some of these registers, of course, but that seems unwise.
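As a rough illustration of the bit budget, the sketch below decodes push/pop masks in Python. It assumes that pshl/popl and pshh/poph each take a 24-bit mask whose lowest bit selects register 16 and register 40 respectively; this matches the 0b11110 mask in the Fibonacci example from the PR description, but treat it as an assumption. The helper name `selected_registers` is hypothetical.

```python
MASK_BITS = 24      # assumed width of a pshl/pshh immediate
USER_WRITABLE = 48  # registers 16..63
JAL_IMM_BITS = 12   # immediate width available in JAL


def selected_registers(mask, base):
    """Register numbers selected by a mask whose bit 0 maps to `base`."""
    return [base + bit for bit in range(MASK_BITS) if (mask >> bit) & 1]


# Mask from the Fibonacci example: $return_addr and the three locals.
fib_low = selected_registers(0b11110, base=16)

# Masks from the compiler output quoted earlier in the thread.
compiler_low = selected_registers(3, base=16)
compiler_high = selected_registers(524288, base=40)

# A one-bit-per-register mask needs two full immediates; 12 bits fall short.
uncovered = USER_WRITABLE - JAL_IMM_BITS
```

Under these assumptions `fib_low` is `[17, 18, 19, 20]`, and a 12-bit immediate used as a per-register mask would leave 36 of the 48 user-writable registers unaddressable.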

I'm noticing that the function calls could be optimized a lot further with smarter register allocation. For instance...

you could save two instructions by using only higher-half (pshh/poph) registers in the function body, so pshl/popl isn't required at all
there's no actual need to move $r0 $$reta, just use jal $zero $$reta 0 directly
and of course, the whole function should be inlined in this case
after returning, the move $r0 $$retv could be optimized away by treating $$retv as the return value

xunilrj · 2025-03-22T14:45:34Z

I was imagining one bit per push/pop. So of the 12 bits not being used, 4 would allow jump and push, or jump and pop all registers.

Dentosal · 2025-03-24T10:22:50Z

> I was imagining one bit per push/pop. So from the 12bits not being used, 4 would allow jump and push, or jump and pop all registers.

I don't think push/pop all registers are sensible operations. At least you'd like to keep the return value and address as-is.

Dentosal · 2025-03-27T20:06:31Z

Some benchmarks with a sway compiler modified to use this instruction:

Build command: forc build --release.

| Project | `d821dcb` | `d821dcb` with `JAL` support | Reduction |
|---|---|---|---|
| mira-v1-core | 89.384 KB | 85.704 KB | 4.3% |
| sway-applications name-registry/registry-contract | 24.664 KB | 23.128 KB | 6.2% |

Add JAL instruction

3e8c8af

Dentosal self-assigned this Mar 14, 2025

Dentosal mentioned this pull request Mar 14, 2025

Jump-and-link instruction FuelLabs/fuel-vm#925

Open

6 tasks

Dentosal marked this pull request as ready for review March 14, 2025 11:42

Dentosal requested review from a team March 14, 2025 11:42

Dentosal added the comp:FVM Component: FuelVM label Mar 14, 2025

Dentosal mentioned this pull request Oct 31, 2024

New function call/return helper opcodes FuelLabs/fuel-vm#857

Open

Dentosal and others added 2 commits March 24, 2025 12:22

Merge branch 'master' into dento/jal-instruction

023cd4b

Correctly use imm * 4 in all fields

8d726c8

Merge branch 'master' into dento/jal-instruction

b33db4c

Dentosal mentioned this pull request Apr 14, 2025

Subroutine calls using the new JAL instruction FuelLabs/sway#7085

Draft

8 tasks
