Jump-and-link instruction #630

Open: wants to merge 4 commits into base: master

Dentosal (Member) commented Mar 14, 2025

Closes #627. VM issue: FuelLabs/fuel-vm#857. VM PR: FuelLabs/fuel-vm#925.

The design is quite similar to the RISC-V instruction of the same name. JAL $ra $rb imm stores the address of the next instruction in $ra, so that register can be used as the return address from a subroutine. If $ra is $zero, the value is discarded instead, so the instruction can also serve as a plain jump without trashing a register. After storing the return address, it jumps to the instruction at memory address $rb + imm * 4.
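The semantics described above can be sketched as a small Python model. This is only an illustration, not the VM implementation: the `jal` helper and the dict-based register file are hypothetical, the register indices for $zero and $pc are assumed to be the usual FuelVM ones, and reading $rb before writing $ra is an assumption, since the description does not cover the case where both are the same register.

```python
ZERO = 0x00      # assumed register index of $zero
PC = 0x03        # assumed register index of $pc
INSTR_SIZE = 4   # bytes per instruction

def jal(regs, ra, rb, imm):
    """Model JAL $ra $rb imm on a dict mapping register index to value."""
    return_addr = regs[PC] + INSTR_SIZE    # address of the next instruction
    target = regs[rb] + imm * INSTR_SIZE   # assumption: $rb is read before $ra is written
    if ra != ZERO:                         # writing to $zero discards the return address
        regs[ra] = return_addr
    regs[PC] = target
    return regs

regs = jal({ZERO: 0, PC: 100, 0x10: 0}, ra=0x10, rb=PC, imm=2)
print(regs[0x10], regs[PC])  # 104 108
```

With $ra set to $zero the same helper behaves as a plain jump, which is how the return path uses it.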

The main purpose of this instruction is efficient subroutine calling and returning. JAL $ret_addr $subroutine_addr 0 performs the call, and JAL $zero $ret_addr 0 returns from it. For nested function calls, the callee is responsible for saving $ret_addr.

The following snippet shows a minimal program using the functionality:

// main function
jal $ret_addr $pc 2 // call subroutine
ret $zero // end program

// subroutine
/* subroutine body comes here */
jal $zero $ret_addr 0 // Return from the subroutine
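To make the control flow of the snippet above concrete, here is a minimal Python trace of it. This is a sketch under stated assumptions: the program is assumed to be loaded at address 0, the `noop` entry is a hypothetical stand-in for the subroutine body, and only `jal` and `ret` are modelled.

```python
INSTR_SIZE = 4  # bytes per instruction

# Each entry is (mnemonic, operands...); the program is assumed to start at address 0.
PROGRAM = [
    ("jal", "ret_addr", "pc", 2),     # call subroutine
    ("ret", "zero"),                  # end program
    ("noop",),                        # stand-in for the subroutine body
    ("jal", "zero", "ret_addr", 0),   # return from the subroutine
]

def run(program):
    """Execute the program and return the instruction indices in execution order."""
    regs = {"zero": 0, "pc": 0, "ret_addr": 0}
    trace = []
    while True:
        index = regs["pc"] // INSTR_SIZE
        op = program[index]
        trace.append(index)
        if op[0] == "ret":
            return trace
        if op[0] == "jal":
            _, ra, rb, imm = op
            target = regs[rb] + imm * INSTR_SIZE
            if ra != "zero":
                regs[ra] = regs["pc"] + INSTR_SIZE
            regs["pc"] = target
        else:
            regs["pc"] += INSTR_SIZE

print(run(PROGRAM))  # [0, 2, 3, 1]
```

The trace shows the call skipping over `ret`, the subroutine running, and the return landing on the instruction right after the call.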

Fibonacci example

To show how compact the resulting code is, I wrote a small Fibonacci function using it. The function uses the following register-based ABI:

  • Function argument and return value $fnarg in 0x10
  • Function return address $return_addr in 0x11

Also the code uses the following locals: $local1: 0x12, $local2: 0x13, $local3: 0x14 (named for pshl/popl)

// Set argument
movi $fnarg 10 // <- this computes fibo(10), i.e. 10th fibonacci number, 55

// Main function
jal $return_addr $pc 3 // <- offset to the subroutine
log $fnarg $zero $zero $zero
ret $one

// Fibonacci subroutine
// fibo(0) = 0, fibo(1) = 1, fibo(n) = fibo(n-1) + fibo(n-2)
pshl 0b11110 // Save return_address and local{1,2,3}
// Compute fn pointer to the current function and place it in local3
subi $local3 $pc 4 // <- subtract 4 to get prev instruction start
// If n < 2 no computation needed
movi $local1 2
lt $local1 $fnarg $local1
jnzf $local1 $zero 8 // Skip over computation
// Else call self with n - 1 and n - 2 and sum those
subi $local2 $fnarg 2         // Save n - 2 to local2
subi $fnarg $fnarg 1          // n -= 1
jal $return_addr $local3 0    // Call self
move $local1 $fnarg           // Copy result to local1
move $fnarg $local2           // Restore n - 2 from local2
jal $return_addr $local3 0    // Call self
move $local2 $fnarg           // Copy result to local2
add $fnarg $local1 $local2 // result = local1 + local2
// Computation ends here; this is where jnzf jumps to
popl 0b11110 // Restore return_address and local{1,2,3}
jal $zero $return_addr 0 // Return from subroutine
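As a sanity check on the offsets in the listing above, the program can be run through a small Python interpreter. This is a rough sketch, not the VM: the `run` and `program` helpers are hypothetical, the program is assumed to start at address 0, `jnzf` is assumed to jump forward by ($rB + imm + 1) instructions, and `pshl`/`popl` are modelled as saving and restoring the registers 16 + bit selected by the mask, without modelling memory or $sp.

```python
INSTR_SIZE = 4
# Register numbers from the ABI above; $zero, $one and $pc use assumed FuelVM indices.
R = {"zero": 0x00, "one": 0x01, "pc": 0x03, "fnarg": 0x10, "return_addr": 0x11,
     "local1": 0x12, "local2": 0x13, "local3": 0x14}

def program(n):
    return [
        ("movi", "fnarg", n),
        ("jal", "return_addr", "pc", 3),
        ("log", "fnarg"),
        ("ret", "one"),
        ("pshl", 0b11110),
        ("subi", "local3", "pc", 4),
        ("movi", "local1", 2),
        ("lt", "local1", "fnarg", "local1"),
        ("jnzf", "local1", "zero", 8),
        ("subi", "local2", "fnarg", 2),
        ("subi", "fnarg", "fnarg", 1),
        ("jal", "return_addr", "local3", 0),
        ("move", "local1", "fnarg"),
        ("move", "fnarg", "local2"),
        ("jal", "return_addr", "local3", 0),
        ("move", "local2", "fnarg"),
        ("add", "fnarg", "local1", "local2"),
        ("popl", 0b11110),
        ("jal", "zero", "return_addr", 0),
    ]

def run(code):
    """Interpret the listing and return the values passed to `log`."""
    regs = [0] * 64
    regs[R["one"]] = 1
    stack, logged = [], []
    while True:
        pc = regs[R["pc"]]
        op, *args = code[pc // INSTR_SIZE]
        next_pc = pc + INSTR_SIZE
        if op == "ret":
            return logged
        elif op == "log":
            logged.append(regs[R[args[0]]])
        elif op == "movi":
            regs[R[args[0]]] = args[1]
        elif op == "move":
            regs[R[args[0]]] = regs[R[args[1]]]
        elif op == "subi":
            regs[R[args[0]]] = regs[R[args[1]]] - args[2]
        elif op == "add":
            regs[R[args[0]]] = regs[R[args[1]]] + regs[R[args[2]]]
        elif op == "lt":
            regs[R[args[0]]] = int(regs[R[args[1]]] < regs[R[args[2]]])
        elif op == "jnzf":
            if regs[R[args[0]]] != 0:
                next_pc = pc + (regs[R[args[1]]] + args[2] + 1) * INSTR_SIZE
        elif op == "jal":
            target = regs[R[args[1]]] + args[2] * INSTR_SIZE
            if args[0] != "zero":
                regs[R[args[0]]] = next_pc
            next_pc = target
        elif op == "pshl":
            stack.append([regs[16 + b] for b in range(24) if (args[0] >> b) & 1])
        elif op == "popl":
            saved = stack.pop()
            for b in [b for b in range(24) if (args[0] >> b) & 1]:
                regs[16 + b] = saved.pop(0)
        regs[R["pc"]] = next_pc

print(run(program(10)))  # [55]
```

Under these assumptions the jump offsets (3 for the call, 4 for the self-pointer, 8 for the early exit) all land on the intended instructions.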

Before requesting review

  • I have reviewed the changes myself

@Dentosal Dentosal self-assigned this Mar 14, 2025
@Dentosal Dentosal marked this pull request as ready for review March 14, 2025 11:42
@Dentosal Dentosal requested review from a team March 14, 2025 11:42
@Dentosal Dentosal added the comp:FVM Component: FuelVM label Mar 14, 2025
Voxelot (Member) commented Mar 17, 2025

cc @vaivaswatha can you comment on the impact of this change? I.e. any concerns regarding register allocation for nested subroutines?

xunilrj (Contributor) commented Mar 17, 2025

Today this is how we compile the following fn.

fn main() -> u64 {
    1337
}

This is the function ASM (not super optimized to avoid inlining):

pshl i3                       ; save registers 16..40
pshh i524288                  ; save registers 40..64
move $$locbase $sp            ; save locals base register for function main_0
move $r0 $$reta               ; save return address
movi $r1 i1337                ; initialize constant into register
move $$retv $r1               ; set return value
move $$reta $r0               ; restore return address
poph i524288                  ; restore registers 40..64
popl i3                       ; restore registers 16..40
jmp $$reta                    ; return from call

This is the ASM calling the fn:

sub  $$reta $pc $is           ; get current instruction offset from instructions start ($is) 
srli $$reta $$reta i2         ; get current instruction offset in 32-bit words
addi $$reta $$reta i4         ; [call]: set new return address
jmpf $zero i76                ; [call]: call main_0
move $r0 $$retv               ; [call]: copy the return value

With this new instruction, we could call fns like:

jal $$reta $pc i76
move $r0 $$retv  

and the fn would be

pshl i3                       ; save registers 16..40
pshh i524288                  ; save registers 40..64
move $$locbase $sp            ; save locals base register for function main_0
move $r0 $$reta               ; save return address
movi $r1 i1337                ; initialize constant into register
move $$retv $r1               ; set return value
poph i524288                  ; restore registers 40..64
popl i3                       ; restore registers 16..40
jal $zero $r0 0

Which means we can save 3 instructions when calling fns (huge gains!), and none in the function definition, given that jal $zero $ret_addr 0 seems to be identical to jmp $$reta

We could save extra 4 instructions per function definition, by using JAL last argument as a flag to do register pushing and popping.

vaivaswatha commented

cc @vaivaswatha can you comment on the impact of this change? I.e. any concerns regarding register allocation for nested subroutines?

There shouldn't be any problem. When we enter a function, we save all (used) registers and pop them all back at the end. So register allocation shouldn't be affected. I don't see any downsides, and the upside is as elaborated by @xunilrj .

Dentosal (Member, Author) commented

Which means we can save 3 instructions when calling fns (huge gains!), and none in the function definition, given that jal $zero $ret_addr 0 seems to be identical to jmp $$reta

jal $zero $ret_addr 0 isn't exactly identical to jmp $$reta: the jmp target is $is-relative, while the jal target is an absolute address.
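The difference can be illustrated with a little address arithmetic in Python. This is a sketch only: the helper names and the example addresses are hypothetical, and jmp is assumed to jump to $is + $rA * 4. The legacy return value follows the sub / srli / addi sequence from the call site quoted earlier, evaluated at the address of the `sub`.

```python
INSTR_SIZE = 4

def jmp_target(is_base, reg_value):
    # jmp: the register holds an instruction index relative to $is
    return is_base + reg_value * INSTR_SIZE

def jal_target(reg_value, imm=0):
    # jal: the register holds an absolute memory address
    return reg_value + imm * INSTR_SIZE

def legacy_return_value(pc, is_base):
    # sub $$reta $pc $is; srli $$reta $$reta 2; addi $$reta $$reta 4
    return ((pc - is_base) >> 2) + 4

def jal_return_value(pc):
    # jal stores the address of the next instruction
    return pc + INSTR_SIZE

IS_BASE = 0x1000     # hypothetical value of $is
CALL_SITE = 0x1040   # hypothetical address where the call sequence starts

old_reta = legacy_return_value(CALL_SITE, IS_BASE)
print(old_reta, hex(jmp_target(IS_BASE, old_reta)))  # 20 0x1050
new_reta = jal_return_value(CALL_SITE)
print(hex(new_reta), hex(jal_target(new_reta)))      # 0x1044 0x1044
```

So a return address produced by the legacy sequence is a word offset that only jmp interprets correctly, while one produced by jal is a byte address that only jal interprets correctly; the two conventions should not be mixed within one call.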

We could save extra 4 instructions per function definition, by using JAL last argument as a flag to do register pushing and popping.

I'm not sure how that would work? The immediate part here is at most 12 bits long, and the VM has 48 user-writable registers. Unless we special-case some of these registers, of course, but that seems unwise.
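The bit budget behind this objection can be written out explicitly. The sketch below assumes the usual FuelVM encoding of 32-bit instructions with an 8-bit opcode and 6-bit register IDs; those widths are assumptions not stated in this thread, while the 12-bit immediate and the 48 user-writable registers are the figures quoted above.

```python
INSTR_BITS = 32    # assumed instruction width
OPCODE_BITS = 8    # assumed opcode width
REG_ID_BITS = 6    # assumed register ID width (64 registers)

# JAL takes two register operands, leaving the rest for the immediate.
jal_imm_bits = INSTR_BITS - OPCODE_BITS - 2 * REG_ID_BITS
# pshl/pshh take no register operands, so the whole remainder is a bitmask.
push_mask_bits = INSTR_BITS - OPCODE_BITS
user_writable_registers = 48  # figure quoted in the comment above

print(jal_imm_bits)                                    # 12
print(push_mask_bits * 2 >= user_writable_registers)   # True: pshl + pshh cover all 48
print(jal_imm_bits >= user_writable_registers)         # False: no room for a per-register mask
```

Under these assumptions a per-register save mask needs the two 24-bit masks of pshl and pshh, which is four times what the JAL immediate can hold.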

I'm noticing that the function calls could be optimized a lot further with smarter register allocation. For instance...

  • you could save two instructions by using only higher-half (pshh/poph) registers in the function body, so pshl/popl isn't required at all
  • there's no actual need to move $r0 $$reta, just use jal $zero $$reta 0 directly
  • and of course, the whole function should be inlined in this case
  • after returning, the move $r0 $$retv could be optimized away by treating $$retv as the return value

xunilrj (Contributor) commented Mar 22, 2025

I was imagining one bit per push/pop. So from the 12 bits not being used, 4 would allow jump and push, or jump and pop all registers.

Dentosal (Member, Author) commented

I was imagining one bit per push/pop. So from the 12 bits not being used, 4 would allow jump and push, or jump and pop all registers.

I don't think push/pop all registers are sensible operations. At least you'd like to keep the return value and address as-is.

Dentosal (Member, Author) commented

Some benchmarks with a Sway compiler modified to use this instruction:

Build command: forc build --release.

| Project | d821dcb | d821dcb with JAL support | Reduction |
|---|---|---|---|
| mira-v1-core | 89.384 KB | 85.704 KB | 4.3% |
| sway-applications name-registry/registry-contract | 24.664 KB | 23.128 KB | 6.2% |
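As a quick check, the reduction figures can be recomputed from the reported sizes. This sketch assumes "reduction" means bytes saved relative to the original build; that definition is an assumption, and the `reduction_percent` helper is hypothetical. Under that reading the second row matches the reported 6.2%, while the first row comes out at about 4.1% (the reported 4.3% matches the saving relative to the JAL build instead).

```python
# Sizes in KB as reported above: (baseline d821dcb, d821dcb with JAL support).
SIZES_KB = {
    "mira-v1-core": (89.384, 85.704),
    "name-registry/registry-contract": (24.664, 23.128),
}

def reduction_percent(before, after, relative_to_original=True):
    """Size saved as a percentage of the original (or, optionally, the new) size."""
    base = before if relative_to_original else after
    return (before - after) / base * 100

for name, (before, after) in SIZES_KB.items():
    print(name, round(reduction_percent(before, after), 1))
# mira-v1-core 4.1
# name-registry/registry-contract 6.2
```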

Successfully merging this pull request may close these issues.

Combined jump operation for internal function calls