Remove goto #630
Comments
Unless there are local functions with access to parent stack
|
Your example code can be implemented with defer like this:
fn foo() -> %void {
%defer {
// nontrivial error handling code
};
%return errorCondition();
%return someOtherErrorCondition();
%return yetAnotherErrorCondition();
} |
Here's another way to write it:
fn foo() {
e: {
if (error_condition) return :e;
if (some_other_error_condition) return :e;
if (yet_another_error_condition) return :e;
return;
}
// nontrivial error handling code
} |
@andrewrk: these two examples are syntactically very tricky. They move the simple and explicit error checking from its proper place to some unnatural location, or they require creating additional functions (and inventing more names). They are replacing the most simple and readable flow
with something strange. The examples also assume that everyone wants to return Zig (I am also unsure whether the examples would work, because of the need to access local variables.) Here's another real code advantage of
To implement such feature (shared error handling code) is impossible w/o |
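For readers who have not seen the pattern being defended here, a minimal C sketch of forward-only goto with one shared error block follows. The `config` struct and `config_init` function are hypothetical and only illustrate the shape; they are not taken from any code linked in this thread.

```c
/* Hypothetical example: validate three fields, with one shared error path. */
struct config {
    int width;
    int height;
    int depth;
    int valid;
};

int config_init(struct config *c, int w, int h, int d)
{
    if (w <= 0) goto fail;      /* each check stays next to the value it guards */
    c->width = w;

    if (h <= 0) goto fail;
    c->height = h;

    if (d <= 0) goto fail;
    c->depth = d;

    c->valid = 1;
    return 0;

fail:
    /* shared error handling: written once, reached only by forward jumps */
    c->width = c->height = c->depth = 0;
    c->valid = 0;
    return -1;
}
```

The checks read top to bottom, and the error block needs no extra function or nesting level, which is the property argued for above.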
The second one has the same source locations as your pseudocode.
Where is the Zig error in the second example? Please respect the OP's request:
|
I should have commented on the examples separately. The second example is still tricky: it creates a named block and adds a new level of indentation. These are the two local-level problems I try to minimize: the need to invent names and excessive indentation. (That's why I proposed Zig having
There's not.
The 3pp code which uses A non-simple use is I do not write open source. I use goto occasionally, only for error handling, and only going down. It never caused problems for me. |
Btw, the second example could be rewritten as:
No need for blocks and one artificial name less, but I would not like it either. Unlike occasional goto such trick is also not present in C codebases I know. |
|
I wrote this code today. It made me regret that there is no goto in JavaScript.
In the worst case it calls I honestly don't understand why you want to remove goto so much. |
re2c looked like spam, but I guess it's not. It uses goto heavily, doesn't it? |
pluto439, I am not convinced this JavaScript is a good use case. Why don't you just do a loop and use "break"? |
Why t f would you use a loop here? There is nothing to loop. |
can be done with a loop:
But I am obviously missing something and looking like a fool. Sorry. |
Oh, this. Yes, it could be better with a loop, bad example then. The flat code feels simpler though... The other example then, I often need to do something like
I guess I have to learn how to use exceptions. My code is pretty unstable right now, it breaks from time to time. I probably should learn D just for its exception handling. |
@pluto439, you can write code like that in js with nested if statements or if you want to make it flat then you can wrap the code in a
In Zig that kind of behaviour can be achieved with while loops if you really want to, but you should use %defer (or whatever the syntax for it is in the future), so there is no need for goto. To be honest, I believe the only reasons for having goto are state machines/interpreters, where you would most likely want computed goto, which we don't have in Zig, and error handling, for which Zig already has a better replacement. So unless we can get computed goto, goto should probably be removed as an obsolete language construct. Computed goto rationales and examples: |
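For the state-machine case mentioned above, the usual goto-free shape is a switch inside a loop. Here is a minimal C sketch, assuming a toy recognizer for one or more 'a' followed by one or more 'b'; the state names and the function name are made up for illustration.

```c
#include <stdbool.h>

enum state { START, IN_A, IN_B };

/* Returns true if s consists of one or more 'a' followed by one or more 'b'. */
bool matches_a_then_b(const char *s)
{
    enum state st = START;

    while (true) {
        char c = *s;
        switch (st) {
        case START:
            if (c != 'a') return false;
            st = IN_A;
            break;
        case IN_A:
            if (c == 'b') st = IN_B;
            else if (c != 'a') return false;
            break;
        case IN_B:
            if (c == '\0') return true;
            if (c != 'b') return false;
            break;
        }
        s++;    /* consume c only after a successful transition */
    }
}
```

Every transition goes back through the single switch, which is the dispatch point that computed goto is meant to eliminate.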
Ilariel, you introduced that I thought jido was going to -- fake loop. Goto is so much simpler than all of this. Why do you want to make my life harder? |
find: {
const view_tag = document.getElementById("view") ?? return :find;
const a_tags = view_tag.getElementsByTagName("a");
if (a_tags.len < 3) return :find;
const third_elem = a_tags[2];
// ...
} This looks better to me than any of the JS code up there. Also it found a superfluous check for |
With goto I don't need to create code blocks needlessly. I had a tendency to create too many code blocks, and code went all the way to the right as a result. Now that I'm not afraid to use return anymore, it's a bit better.
How do I read it? Like this?
It wasn't superfluous. There are different actions on "found" and "not_found".
|
Thank you @PavelVozenilek, @Jurily, and @Ilariel for the real actual examples of goto. Here's my analysis of them:
(And configurable macro functions can be implemented with comptime-dynamic function references.)
and:
"computed goto"-style finite state machines can be implemented with As I said here, while-switch can be converted to computed goto by an optimization pass in the compiler. Even if LLVM doesn't have this, we can do it on the Zig side. Even if it proves to be difficult to do, we can add a language feature later that does this better than general-purpose gotos with respect to communicating intent precisely. And we're not discussing removing computed goto, because Zig doesn't have that. This proposal is to remove goto with hardcoded labels. I don't know if re2c uses hardcoded goto or computed goto, because I couldn't find any examples of the output C/C++ code. |
Why the dislike for So there's a simple concept proven in practice versus a novelty feature. This novelty feature is IMHO harder to think about and less expressive than the old one. I personally do not like novelties and would recommend staying as close as possible to C syntax/features. |
There are good reasons to have But I'll still give my opinion on
fn foo() {
if (is_error()) goto handle_error;
var value = bar();
return value;
handle_error:
return error.Foo;
} If you can't tell why that's a compile error, then that's evidence that To make unstructured So that's my opinion on |
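For comparison, C accepts the same shape. A goto may jump past the initializer of an ordinary (non-variable-length) local, which leaves that local in scope but uninitialized at the label. The sketch below assumes a hypothetical `bar` helper and uses `-1` as a stand-in for the error value.

```c
static int bar(void) { return 42; }     /* hypothetical helper */

int foo(int is_error)
{
    if (is_error)
        goto handle_error;      /* jumps past the initializer of `value` */

    int value = bar();
    return value;

handle_error:
    /* `value` is in scope here, but on this path its initializer never ran,
     * so reading it would give an indeterminate value. This sketch
     * deliberately does not touch it. */
    return -1;
}
```

The code is well-defined only because the error path never reads `value`. Nothing in the language enforces that, which is the hazard under discussion.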
@thejoshwolfe: I do not know about implementation issues, and cannot argue about these. But it seems to be sold here as a backward, dangerous feature to be replaced by a shiny new thing. (I remember goto horribleness was discussed long ago in D, but then Walter Bright said that he converts all control flow constructs into gotos internally.) Also, some proposed syntax could be written using existing constructs, as mentioned here. But I would not like this kind of minimalism. |
So I now have to write uglier code to make the compiler a bit simpler. I thought your goal was to make close-to-the-metal programs, which is why you avoided garbage collection. Removing goto means going in the opposite direction. Eh, whatever, I need to make my own language anyway. |
Nope. Making the compiler simpler is just my opinion. The real argument for removing
That's oversimplified. We want to make optimal, readable, maintainable, etc. programs. The Zen of Zig does not include imitating assembly. If you can make an argument that Pseudo code doesn't impress me much, because it's possible to invent pseudo code to justify any imaginable language feature. Pseudo code does not demonstrate a need for |
defer already breaks it. You can make the same code even without defer, and it will be even more oblivious. Defer was made for exception handling (in D, by a different name), it doesn't make much sense by itself, and zig doesn't have exceptions. (Well, it has one, but it's one way and is always fatal.)
I think I talked about this here #594 (comment) . Gotoing after a variable creation will mean that a variable exists, but is simply uninitialized. Also look at this: gotoing over a declaration is OK for variables and fixed-length arrays, but not OK for variable-length arrays https://stackoverflow.com/questions/29880836/skip-variable-declaration-using-goto
When the time comes for livecoding, you will be jumping all over the place in the debugger. You know how in Visual Studio you can just drag the yellow arrow all over the place to go back or forward? Internally, this is the "goto". Why forbid something in the language if you will need it in the debugger later?
Defer will somewhat complicate debugging too, because code won't flow from top to bottom anymore; it will sometimes return into the defer. Imagine gotoing into the defer block: what will happen at exit? (It will run all defers in the code block in reverse order, starting from the one it is in right now.)
Goto is the way the hardware works, and it's very natural simply for that reason.
Removing goto will mean that I will have to create many code blocks that I otherwise wouldn't need.
With goto I'm 100% sure that I will be able to implement everything I can imagine. I'm imagining something like a <> diamond, where code blocks will have to intersect. Or some very complex expressions in the "if" statements, where the only alternative to goto will be creating extra variables, and the compiler will not be able to optimize them back into a goto properly. Or a situation where I will need to uninitialize something in a very specific order that the standard defer will not be able to handle.
I can't invest any time in Zig unless you figure out the situation with exceptions and goto. Especially exceptions, because they affect the rest of a program significantly. Exceptions can be faster than error codes if used carefully. #578 (comment) |
That's going to be time-consuming, in that I'd have to a) create an artificial example, and b) create a benchmark harness in zig that works well enough to capture the difference (non-trivial, wasn't trivial in C either) - the difference is on the order of ~1 cycle for keys with len <= 32 iirc and a poisson distribution of key lengths iirc. Plus an extra 1 cycle that clang loses over gcc, which I'm assuming is in codegen. Realistically, I'm not sure when I would get to that. Out of curiosity, what was gained by the removal? I read the original thread, and this one, and didn't get a sense of what drove it. |
Hi, Is this issue still active? If so I want to try my attempt to post my code that uses goto here... |
@Trung0246 You can do that |
I don't have that much ziglang experience but I did read through the documentation Code: https://pastebin.com/vxS5WSmk (removed some unnecessary parts, example usage is probably at the bottom) Here is my own personal code that uses Basically the logic of The I think I could refactor the code by using 2 lambdas in C++ and the same for ziglang, but then it's not really elegant compared to using a single If you have any questions just ask (I know my code is ridiculously over-engineered 😂) |
Does translate-c support goto yet? Once that's done we can tell people to write their goto code in C and see how it translates into zig 😄 |
Here is a use-case for Here's a godbolt comparing a C goto implementation and a Zig Gotos aren't the only way to represent DFAs -- there's the (IMO nicer) way of using a set of mutually tail-recursive functions. There are two problems with this, though: there's no TCO in debug mode, and LLVM still doesn't optimize it as well as it does the Note that this is a different use-case to computed goto -- LLVM actually optimizes |
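As a rough picture of the goto style of DFA described above, here is a minimal C sketch in which each state is a label and each transition is a direct jump. The automaton (one or more 'a' followed by one or more 'b') and the function name are assumptions chosen for illustration; this is not the code from the godbolt link.

```c
#include <stdbool.h>

/* Returns true if s consists of one or more 'a' followed by one or more 'b'. */
bool matches_a_then_b_goto(const char *s)
{
    /* start state: the first character must be 'a' */
    if (*s != 'a') return false;
    s++;

in_a:
    if (*s == 'a') { s++; goto in_a; }   /* stay in state "a" */
    if (*s != 'b') return false;
    s++;                                  /* the first 'b' moves us to state "b" */

in_b:
    if (*s == 'b') { s++; goto in_b; }   /* stay in state "b" */
    return *s == '\0';                    /* accept only at end of input */
}
```

No state variable is stored or re-tested: the current state is simply the position in the code, which is what a loop around a switch has to recover through its dispatch.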
@mb64 you can use the guaranteed tail call optimization of Zig:
fn lex_a(data: *Data) callconv(.C) void
{
while(data.cursor < data.input.len and data.input[data.cursor] == 'a') {
data.a_count += 1;
data.cursor += 1;
}
return @call(.{ .modifier = .always_tail }, lex, .{data});
} Full example is here: https://zig.godbolt.org/z/K1c817 |
How is this
fn foo() !u8 {
if (is_error()) goto handle_error;
var value = bar();
return value;
handle_error:
return error.Foo;
}
different from this?
fn foo() !u8 {
attempt: {
if (is_error()) break :attempt;
var value = bar();
return value;
}
return error.Foo;
} |
It's not really. Zig essentially has goto via labeled breaks |
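The same equivalence can be shown in C, where the closest thing to a labeled block is the `do { ... } while (0)` idiom. This is a minimal sketch with a hypothetical `bar` helper and `-1` standing in for the error value.

```c
static int bar(void) { return 7; }      /* hypothetical helper */

/* goto version: jump forward to the error path */
int foo_goto(int is_error)
{
    if (is_error) goto handle_error;
    return bar();

handle_error:
    return -1;
}

/* block version: `break` leaves the block and lands on the statement after it */
int foo_break(int is_error)
{
    do {
        if (is_error) break;
        return bar();
    } while (0);

    return -1;
}
```

Both functions return the same value for every input. Unlike a Zig labeled block, the C idiom only covers one level, because `break` cannot name an outer block.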
In one particular way it's very different and much better. The version with |
Labeled break is essentially goto with a built-in off-by-one error, since the break goes to the statement after the label instead of the labelled statement. It also causes way too many levels of nesting to appear (something I want to avoid). |
Is there a good way to jump to a specific location within a while loop from outside said while loop? E.g.
while (true) {
// logic A
if (condition) goto :LABEL;
// logic B
if (condition2) break;
}
while (true) {
// logic C
LABEL:
// logic D
}
In a language like C# I would implement the 2nd while loop as a goto to make Of course, one could always use a boolean switch like so:
var b = LABEL: while (true) {
// logic A
if (condition) break :LABEL false;
// logic B
if (condition2) break :LABEL true;
};
while (true) {
if (b) {
// logic C
}
b = true;
// logic D
} To me, doing this is more complicated than it needs to be. I would prefer a goto statement. However, I would settle for someone telling me how I could get LLVM to optimize this to be equivalent in the emitted assembly. In my code, the mov al, 1
test al, 1
jne .LBB1_443 ; why do we need to check if ((1 & 1) != 0)? 😔
jmp .LBB1_460 ; this is impossible. We all know 1 and 0 are not equal. (I am using ReleaseFast on zig trunk on godbolt) |
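To make the two shapes concrete, here is a minimal C sketch that runs the goto version and the boolean-flag version side by side and records which "logic" step ran. The `trace` struct, the `emit` helper, the `skip_on`/`stop_on` parameters, and the artificial stop after two runs of logic D are all assumptions added so that the sketch terminates and can be compared.

```c
#include <stddef.h>

/* Records one letter per "logic" step so both versions can be compared. */
struct trace { char buf[32]; size_t len; };

static void emit(struct trace *t, char c)
{
    if (t->len + 1 < sizeof t->buf) {
        t->buf[t->len++] = c;
        t->buf[t->len] = '\0';
    }
}

void run_goto(struct trace *t, int skip_on, int stop_on)
{
    int i = 0;                           /* iterations of the first loop */
    int d = 0;                           /* how often logic D has run */

    for (;;) {
        emit(t, 'A');                    /* logic A */
        i++;
        if (i == skip_on) goto label;    /* jump into the second loop */
        emit(t, 'B');                    /* logic B */
        if (i == stop_on) break;
    }
    for (;;) {
        emit(t, 'C');                    /* logic C */
label:
        emit(t, 'D');                    /* logic D */
        if (++d == 2) return;            /* artificial stop for the sketch */
    }
}

void run_flag(struct trace *t, int skip_on, int stop_on)
{
    int i = 0;
    int d = 0;
    int do_c = 1;                        /* the extra boolean */

    for (;;) {
        emit(t, 'A');
        i++;
        if (i == skip_on) { do_c = 0; break; }
        emit(t, 'B');
        if (i == stop_on) break;
    }
    for (;;) {
        if (do_c) emit(t, 'C');          /* skipped once after the early exit */
        do_c = 1;
        emit(t, 'D');
        if (++d == 2) return;
    }
}
```

Both functions produce the same trace. The flag version pays for that with one extra variable and one test per iteration, which is the check the assembly above still shows.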
Your solution would also require guarding the logic between the two loops, if there is any:
while (true) {
// logic A
if (condition) goto :LABEL;
// logic B
if (condition2) break;
}
// logic I
while (true) {
// logic C
LABEL:
// logic D
}
Actually it looks like a state machine, so I would express it with explicit state:
var state: u3 = 0;
while (true) {
if (state == 0) {
// logic A
if (condition) state = 2 else state = 1; // condition jumps straight to logic D
}
else if (state == 1) {
// logic B
if (condition2) state = 3 else state = 0;
}
else if (state == 2) {
// logic D
state = 4;
}
else if (state == 3) {
// logic I
state = 4;
}
else if (state == 4) {
// logic C
state = 2;
}
} |
Just to add, that solution specifically would be further improved by #8220:
sw: switch (@as(u3, 0)) {
0 => {
// logic A
continue :sw if (condition) 2 else 1; // condition jumps straight to logic D
},
1 => {
// logic B
continue :sw if (condition2.*) 3 else 0;
},
2 => {
// logic D
continue :sw 4;
},
3 => {
// logic I
continue :sw 4;
},
4 => {
// logic C
continue :sw 2;
},
else => unreachable,
} |
For reference, here is the code I am working on: https://zig.godbolt.org/z/sjzYj7nGG. It is not cleaned up very much yet, but what I am doing is trying to find the most optimal way to implement my DynSDT data structure: https://validark.github.io/DynSDT You can see I have multiple implementations of the The function
mov bpl, 1
test bpl, 1
jne .LBB16_44
jmp .LBB16_50 The if (depq_len == 10 - k and next_i != NULL) {
do_cur_insertion = false;
break; // this should skip the if (do_cur_insertion) statement in the following loop
} It should jump to the second half of this loop: while (true) {
if (do_cur_insertion) {
// ...
}
// please jump here!
do_cur_insertion = true;
if (next_i != NULL) {
// ...
}
depq_len -%= 1;
if (depq_len == std.math.maxInt(@TypeOf(depq_len))) return @intCast(u4, k);
cur_i = depq[depq_len];
}
Instead, it jumps to code that checks
More complaints about LLVM
Oddly enough, in that same function, not related to the issue of goto, this code:
k += 1;
if (k == 10) return 10; translates to this assembly: inc al
cmp al, 10
je .LBB16_45
; ... stuff
.LBB16_45:
mov al, 10
jmp .LBB16_2 Randomly scrolling through the assembly, I found another example: cmp r11b, 4
jne .LBB16_38 ; if (r11b != 4)
jmp .LBB16_41 ; if (r11b == 4)
; ...
; this is only referenced once in the assembly, so it was generated specifically to go with the previous jmp!
.LBB16_41:
vpextrd esi, xmm0, 3
mov r11b, 4 ; r11b = 4 😔 Maybe this is just a problem in general for LLVM? The emit is also pretty temperamental. Changing the comments in the code can lead to instructions changing order, which is odd (when the instruction order does not matter). My first assembly in this comment sometimes gets one of the other if (depq_len == 0) return k;
depq_len -= 1; This gives me this emit: .LBB16_38:
test r11b, r11b
je .LBB16_2
.LBB16_39:
dec r11b However, this same code in the second while loop gives me this emit: sub r11b, 1
jb .LBB16_2 I can ask more forcefully for this optimization by using this code instead: depq_len -%= 1;
if (depq_len == std.math.maxInt(@TypeOf(depq_len))) return k; Yes, this is something of a micro-optimization, but it is still weird to me that if you use the exact same code to check |
Update: I tried changing the code to look more like a state machine by using an enum rather than a boolean, and it translated the end of each block to this instead: xor r9b, 1
jmp .LBB16_43
; ...
.LBB16_43:
test r9b, 1
je .LBB16_51 At the moment, it might be impossible to achieve what I am trying to do. If anyone wants to take a crack at it, I would be happy to be shown there is a way to get the optimized output I am looking for. Otherwise, maybe #8220 will guarantee the jumps I am trying to do? I also tried implementing this via tail calls: https://zig.godbolt.org/z/Gj9qeahMK |
This is a good example of idiomatic goto in a VM dispatch loop: https://github.com/sqmedeiros/lpeglabel/blob/master/lplvm.c It only stands as an argument for goto if there isn't a better way to write it in Zig. Is there? |
The I really don't see any logic in that file which has a compelling use of (Regardless, I should note that this part of Zig is essentially as decided as it gets; |
I don't have opinions about whether Zig should, or should not, have I saw this in the first comment:
And then read a lot of abstract discussion, with relatively few real-life examples of nontrivial Here is another example, also from the Lua team, where eight gotos are used to structure a VM dispatch loop. Lua is renowned for its relative speed and very small binary. They're using I'm evaluating Zig for writing a VM, so I'm keenly interested in whether it's possible to get the same performance, and a clearer instruction flow, using what Zig has now. If it is, why would anyone want You might look at the question this way: if someone replies with a detailed answer, and demonstration that the machine code generated by a Zig version is broadly equivalent, that's conclusive: Zig doesn't need But I did read all the comments, and code like this wasn't addressed (a related question about computed gotos was). So, given the specific call to action in the first post, I thought it was appropriate to resurrect the discussion. |
@mlugg's answer is still applicable. #8220 would allow you to do arbitrary jumps without the normal footguns associated with goto. Right now, you can only do it with tail call optimization, which is a massive pain and doesn't work on some platforms (iirc WASM). Right now, you can write your VM as a regular switch statement, and once #8220 comes out you can switch over. Otherwise, you might want to use DynAsm or something that allows you to have precise control over register allocation if you really want to write a performant VM. A high level language + LLVM, by itself, is not going to be the ideal choice for a VM if you want to beat LuaJIT. If you're interested in more ideas, I'd look at https://github.com/luajit-remake/luajit-remake |
Thanks for the pointer to that LuaJIT project, definitely the sort of thing I'm interested in generally! Custom ASM-enhanced VMs have their place, as does tracing. There is also a place for portable platform-independent VMs which perform well: so Lua, not LuaJIT. Reading #8220 more closely, I'm beginning to see how it would more than likely cover the parts of a VM's state machine which can't be expressed in C without |
I'd like to note that the use case you bring up is actually very important to the compiler itself. The main loop of semantic analysis in the Zig compiler - our primary bottleneck excluding LLVM - is essentially a VM instruction dispatch loop, which in fact largely motivated #8220 in the first place. Similar loops also exist in all of our code generation backends. So emitting optimal assembly here is definitely something we want to make sure we can do! (Incidentally, I am actively working on #8220, and expect to have a PR up within days.) |
Late to the party, but I think I have an interesting use case for goto in C to contribute: at the end of a switch-case branch to jump to a handful of different 'epilog blocks' (this is code-generated): The compiler output is exactly as you'd expect (in native code at least): the code branches via a jump-table, and the end of each case-branch has a direct jump to one of the three epilog blocks. Empty case branches are "short-cutted" and directly have the address of the goto target in the jump table. I didn't inspect WASM output, but performance is close to native code, so I guess it's fine. Here's a blogpost about how that emulator works in general (only the first couple of sections are interesting): https://floooh.github.io/2021/12/17/cycle-stepped-z80.html PS: as a special case, the ...think of it as a giant state machine, not unlike the result of an async/await code transformation. PPS: I'm planning to do some emulator experiments in Zig soon-ish, and I hope I can somehow emulate this construct without a C-like goto (maybe with nested named blocks and labelled break?) - in any case I kinda like how straightforward the C version is for this sort of low-level "outside-the-box" contraption :) |
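The construct described above, where each case branch ends by jumping to one of a few shared epilog blocks, can be sketched in C roughly as follows. This is a heavily reduced, hypothetical stand-in: the `cpu` struct, the `PIN_WAIT` bit, the step numbers, and the reset of `step` in `fetch_next` are assumptions for illustration, not the real emulator code.

```c
#include <stdint.h>

/* Hypothetical, heavily reduced CPU state. */
struct cpu { int step; uint64_t pins; int fetches; };

#define PIN_WAIT (UINT64_C(1) << 0)     /* assumed pin bit, not the real layout */

uint64_t cpu_tick(struct cpu *cpu, uint64_t pins)
{
    switch (cpu->step) {
    case 0:                       /* a step that may stall on WAIT */
        if (pins & PIN_WAIT)
            goto track_pins;      /* stay on this step */
        goto step_next;
    case 1:
        goto step_next;
    case 2:                       /* last step of an instruction */
        goto fetch_next;
    default:
        goto track_pins;
    }

fetch_next:
    cpu->fetches++;               /* stand-in for the real opcode fetch */
    cpu->step = -1;               /* so that step_next lands on step 0 */
step_next:
    cpu->step++;
track_pins:
    cpu->pins = pins;             /* runs on every path */
    return pins;
}
```

The three labels fall through into each other, so a branch chooses how much of the shared tail it runs simply by choosing where to enter it.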
I ported a small amount of the code for you, just enough to get the point across. Here is what I came up with:
fn z80_tick(cpu: *z80_t, pins_: u64) u64 {
var pins = pins_ & ~(Z80_CTRL_PIN_MASK | Z80_RETI);
sw: switch (@as(enum { start, fetch_next, step_next, track_int_bits }, .start)) {
.start => switch (cpu.step) {
1506 => {
if (pins & Z80_WAIT) continue :sw .track_int_bits;
cpu.iff1 = false;
continue :sw .step_next;
},
1515 => continue :sw .step_next,
1516 => continue :sw .fetch_next,
},
.fetch_next => {
pins = cpu.fetch(pins);
continue :sw .step_next;
},
.step_next => {
cpu.step += 1;
continue :sw .track_int_bits;
},
.track_int_bits => {
// track NMI 0 => 1 edge and current INT pin state, this will track the
// relevant interrupt status up to the last instruction cycle and will
// be checked in the first M1 cycle (during _fetch)
const rising_nmi: u64 = (pins ^ cpu.pins) & pins; // NMI 0 => 1
cpu.pins = pins;
cpu.int_bits = ((cpu.int_bits | rising_nmi) & Z80_NMI) | (pins & Z80_INT);
},
}
} With the new labeled-switch-continue feature, you can now get the |
@Validark wow that looks interesting, I hadn't even considered such an "outer switch statement", very neat! In the meantime I used a slightly different approach using 'labeled break': ...basically using nested labeled scope blocks, and the labeled break at the end of switch statement controls which 'inner' scope blocks to skip. It works, but I like your approach more. The labeled continue approach immediately looks very useful for building all sorts of 'switch-based state machines' :) |
Amazing how much complexity you have to throw at your code just to be able to avoid goto. Is it really worth it? |
IMHO the above example isn't any more complex than the original goto-based code in C, and while I'm not a 'goto considered harmful zealot' myself, the C/C++ goto does have a couple of problems that are worth fixing (like being able to skip variable declarations and initializations). In the long run there might even be some advantages if the compilation target is a 'structured bytecode' like WASM (but that's speculation on my side). |
@floooh Yes, in this case you could use labeled breaks as well.
fn z80_tick(cpu: *z80_t, pins_: u64) u64 {
var pins = pins_ & ~(Z80_CTRL_PIN_MASK | Z80_RETI);
track_int_bits: {
step_next: {
fetch_next: switch (cpu.step) {
1506 => {
if (pins & Z80_WAIT) break :track_int_bits;
cpu.iff1 = false;
break :step_next;
},
1515 => break :step_next,
1516 => break :fetch_next,
}
pins = cpu.fetch(pins);
}
cpu.step += 1;
}
// track NMI 0 => 1 edge and current INT pin state, this will track the
// relevant interrupt status up to the last instruction cycle and will
// be checked in the first M1 cycle (during _fetch)
const rising_nmi: u64 = (pins ^ cpu.pins) & pins; // NMI 0 => 1
cpu.pins = pins;
cpu.int_bits = ((cpu.int_bits | rising_nmi) & Z80_NMI) | (pins & Z80_INT);
} However, I think this code is more confusing as written above, since I am labeling the block with what happens after it, which doesn't make sense. With labeled breaks you are saying what to break out of, whereas with goto/labeled-continue we are saying what to jump to. If I thought of it as "we can now stop X process", I would write it as a "break", but if I thought of it as "jump to X process", I would write it as a labeled continue. Also the labeled continue version is significantly more powerful, and can express arbitrary control flows that cannot be expressed with the cascading blocks strategy. |
Please everyone post links to your zig code that uses goto, so we won't remove the feature. Alternatively, post links to C code that uses goto, so we can assess if there's a more zig-like way to do it, or if goto is really the best solution for those cases.
I want real actual code here, not pseudocode examples. Please link to open source projects.
This discussion started in #346.
For reference, here's how you can do without goto:
- defer for the try-finally pattern.
- %defer for the cleanup-on-error pattern.
- while (true) { switch (state) { ... for cases where you want computed goto. Here's some existing discussion on that: labeled loops, labeled break, labeled continue #346 (comment)
- break to jump forward and labeled continue to jump backward, placing loops where necessary. This is the plan for the C-to-Zig translator. This is the ugliest solution, but should always work when all else fails.