Remove goto #630

thejoshwolfe · 2017-11-28T17:34:14Z

Please everyone post links to your zig code that uses goto, so we won't remove the feature. Alternatively, post links to C code that uses goto, so we can assess if there's a more zig-like way to do it, or if goto is really the best solution for those cases.

I want real actual code here, not pseudocode examples. Please link to open source projects.

This discussion started in #346.

For reference, here's how you can do without goto:

use defer for the try-finally pattern.
use %defer for the cleanup-on-error pattern.
use while (true) { switch (state) { ... for cases where you want computed goto. Here's some existing discussion on that: labeled loops, labeled break, labeled continue #346 (comment)
use labeled break to jump forward and labeled continue to jump backward, placing loops where necessary. This is the plan for the C-to-Zig translator. This is the ugliest solution, but should always work when all else fails.

The text was updated successfully, but these errors were encountered:

PavelVozenilek · 2017-11-28T23:25:53Z

Unless there are local functions with access to parent stack defer cannot do the following:

if (error condition) goto error;
...
if (some other error condition) goto error;
...
if (yet another error condition) goto error;
...

error:
   // nontrivial error handling code

defer would need to duplicate all the error handling code, or one has to create a top level function and pass needed parameters there.

andrewrk · 2017-11-28T23:29:47Z

Your example code can be implemented with defer like this:

fn foo() -> %void {
    %defer {
        // nontrivial error handling code
    };
    %return errorCondition();
    %return someOtherErrorCondition();
    %return yetAnotherErrorCondition();
}

andrewrk · 2017-11-28T23:31:35Z

Here's another way to write it:

fn foo() {
    e: {
        if (error_condition) return :e;
        if (some_other_error_condition) return :e;
        if (yet_another_error_condition) return :e;
        return;
    }
    // nontrivial error handling code
}

PavelVozenilek · 2017-11-29T00:50:57Z

@andrewrk: these two examples are syntactically very tricky. They move the simple and explicit error checking from its proper place to some unnatural location, or they require creating additional functions (and inventing more names).

They are replacing the most simple and readable flow

do_A();
if (wrong) handle error
do_B();
if (wrong) handle error
do_C();
if (wrong) handle error

with something strange.

The examples also assume that everyone wants to return Zig error when the function gets into internal error situation handling business. The function may try to return some valid substitute, not the error. This also makes %defer not full replacement for traditional error handling.

(I am also unsure whether the examples would work, because of the need to access local variables.)

Here's another real code advantage of goto based error handling:

if (error condition) goto error1;
...
if (some other error condition) goto error2;
...
if (yet another error condition) goto error3;
...

error3:
  // the least common part of error handling
  ...
error2: // fallthrough  
  // error handling code for some scenarios
  ...
error1: // fallthrough
   // error handling code common for all scenarios
    ...

To implement such feature (shared error handling code) is impossible w/o goto (except for certain Forth dialects which offer so called "headless functions", where it is possible to jump inside of a function).

andrewrk · 2017-11-29T00:57:57Z

They move the simple and explicit error checking from its proper place to some unnatural location

The second one has the same source locations as your pseudocode.

The examples also assume that everyone wants to return Zig error

Where is the Zig error in the second example?

Please respect the OP's request:

I want real actual code here, not pseudocode examples. Please link to open source projects.

PavelVozenilek · 2017-11-29T02:07:11Z

@andrewrk:

The second one has the same source locations as your pseudocode.

I should had comment the examples separately. The second example is still tricky and creates named block and adds new level of indentation.

These are two local level problems I try to minimize: the need to invent names and the excessive indentation. (That's why I proposed Zig having elif or anonymous tuples and enums.)

Where is the Zig error in the second example?

There's not.

real code

The 3pp code which uses goto in a restricted way (handling errors down the flow) is DL malloc ( ftp://g.oswego.edu/pub/misc/malloc.c ), LZ4 compressor ( https://github.com/lz4/lz4/blob/dev/lib/lz4.c ) or SQLite. These libraries use fairly long functions and goto is the least intrusive way to handle errors (I guess).

A non-simple use is quicksort implementation in MSVC stdlib, it jumps up. This is IME quite rare occurence.

I do not write open source. I use goto occasionally, only for error handling, and only going down. It never caused problems for me.

PavelVozenilek · 2017-11-29T04:12:58Z

Btw, the second example could be rewritten as:

fn foo() {
    while (true)
        if (error_condition) break;
        if (some_other_error_condition) break;
        if (yet_another_error_condition) break;
        return;
    }
    // nontrivial error handling code
}

No need for blocks and one artificial name less, but I would not like it either. Unlike occasional goto such trick is also not present in C codebases I know.

ghost · 2017-11-30T18:52:57Z

re2c is a free and open-source lexer generator for C and C++. Its main goal is generating fast lexers: at least as fast as their reasonably optimized hand-coded counterparts. Instead of using traditional table-driven approach, re2c encodes the generated finite state automata directly in the form of conditional jumps and comparisons. The resulting programs are faster and often smaller than their table-driven analogues, and they are much easier to debug and understand.

pluto439 · 2017-12-04T07:26:46Z

I wrote this code today. Made me regret that there is no goto in javascript.

        var file = FileUtils.getFile("Home", comment_path_from.slice(0,1));
        if ( file.exists() ) file = FileUtils.getFile("Home", comment_path_from.slice(0,2));
        if ( file.exists() ) file = FileUtils.getFile("Home", comment_path_from.slice(0,3));
        if ( file.exists() ) file = FileUtils.getFile("Home", comment_path_from.slice(0,4));
        if ( file.exists() ) file = FileUtils.getFile("Home", comment_path_from.slice(0,5));

        if (! file.exists() ) {
            hijack.innerHTML = "NO COMMENTS FOUND"
        } else {

In the worst case it calls file.exists() 4 extra times. Have to create a new variable, but I'm too lazy for that. And I doubt any compiler will optimize it properly.

I honestly don't understand why you want to remove goto so much.

pluto439 · 2017-12-04T08:26:10Z

re2c looked like a spam, but I guess it's not. It uses goto heavily, is it?

jido · 2017-12-04T11:59:36Z

pluto439, I am not convinced this JavaScript is a good use case. Why don't you just do a loop and use "break"?

pluto439 · 2017-12-04T12:11:17Z

Why t f would you use a loop here? There is nothing to loop.

jido · 2017-12-04T12:30:22Z

tryWith(1);
if (ok()) tryWith(2);
if (ok()) tryWith(3);
if (ok()) tryWith(4);
if (ok()) tryWith(5);

can be done with a loop:

for (n = 1; n <= 5; n++) {
   tryWith(n);
   if (!ok()) break;
}

But I am obviously missing something and looking like a fool. Sorry.

pluto439 · 2017-12-04T13:44:49Z

Oh, this. Yes, it could be better with loop, bad example then. The flat code feels simplier though...

The ofter example then, I often need to do something like var el = document.getElementById('view').firstChild.firstChild.getElementsByTagName('a')[2].firstChild.nextElement, and every step can potentially break. I wish I could do something like

var temp = document.getElementById('view')
if ( ! temp ) goto not_found
temp = temp.getElementsByTagName('a')
if (temp.length < 3) goto not_found
temp = temp[2]
if ( ! temp ) goto not_found

I guess I have to learn how to use exceptions. My code is pretty unstable right now, it breaks from time to time. I probably should learn D just for its exception handling.

Ilariel · 2017-12-04T14:30:09Z

@pluto439, you can write code like that in js with nested if statements or if you want to make it flat then you can wrap the code in a while(cond) {cond = false; // do your thing }/do {//do your thing} while(false) and use break to break out early and jump to your error handling code.

do {
var temp = document.getElementById('view');
if ( ! temp ) break;
temp = temp.getElementsByTagName('a');
if (temp.length < 3) break;
temp = temp[2];
if ( ! temp ) break;
} while (false);

In Zig that kind of behaviour can be achieved with while loops if you really want to but you should use %defer (or whatever the syntax for it is in the future), so there is no need for goto.

To be honest I believe only reasons for having goto are state machines/interpreters where you would most likely want to have computed goto which we don't have in Zig or error handling for which Zig already has better replacement already. So unless we can get computed goto, goto should probably be removed as obsolete language construct.

Computed goto rationales and examples:
https://github.com/python/cpython/blob/31a8393cf6a74c870c3484dd68500619f6232c6d/Python/ceval.c#L581
https://eli.thegreenplace.net/2012/07/12/computed-goto-for-efficient-dispatch-tables

pluto439 · 2017-12-04T16:29:09Z

Ilariel, you introduced that I thought jido was going to -- fake loop. Goto is so much simplier than all of this. Why you want to make my life harder?

andrewrk · 2017-12-04T16:40:36Z

find: {
    const view_tag = document.getElementById("view") ?? return :find;
    const a_tags = view_tag.getElementsByTagName("a");
    if (a_tags.len < 3) return :find;
    const third_elem = a_tags[2];
    // ...
}

This looks better to me than any of the JS code up there. Also it found a superfluous check for !temp.

pluto439 · 2017-12-04T17:00:59Z

With goto I don't need to create code blocks needlessly. I had a tendency to create too many code blocks, and code went all the way to the right as a result. Now that I'm not afraid to use return anymore, it's a bit better.

const view_tag = document.getElementById("view") ?? return :find;

How do I read it? Like this?

                 document
                 document.getElementById("view")
                 document.getElementById("view") ?? return :find;
const view_tag = document.getElementById("view") ?? return :find;

Also it found a superfluous check for !temp.

It wasn't superfluous. There are different actions on "found" and "not_found".

var temp = document.getElementById('view')
if ( ! temp ) goto not_found
temp = temp.getElementsByTagName('a')
if (temp.length < 3) goto not_found
temp = temp[2]
if ( ! temp ) goto not_found

do_something_with_temp(temp)
return

:not_found
print('something went wrong')
return

thejoshwolfe · 2017-12-04T20:36:30Z

Thank you @PavelVozenilek, @Jurily, and @Ilariel for the real actual examples of goto. Here's my analysis of them:

DL malloc ( ftp://g.oswego.edu/pub/misc/malloc.c )

The postaction label can be better implemented with defer.
I don't quite understand why the erroraction label exists. You can just call a function for that.

(And configurable macro functions can be implemented with comptime-dynamic function references.)

LZ4 compressor ( https://github.com/lz4/lz4/blob/dev/lib/lz4.c )

The _last_literals label is effectively a labeled break.
The _next_match label is really a while loop. Refactoring that to avoid goto would involve indenting some code, and converting some breaks to labeled breaks, but overall the change would be arguably clearer.
The _output_error label is just a return statement. That should be refactored into a function that does the arithmetic, and then all the goto _output_error; statements should be changed to return outputError(ip, src);.

re2c ( http://re2c.org/ )

and:

Computed goto rationales and examples:
https://github.com/python/cpython/blob/31a8393cf6a74c870c3484dd68500619f6232c6d/Python/ceval.c#L581
https://eli.thegreenplace.net/2012/07/12/computed-goto-for-efficient-dispatch-tables

"computed goto"-style finite state machines can be implemented with while (true) { switch (state) { ... } }. The advantages of goto/computed goto over while-switch seem to be entirely related to performance due to low-level things like assembly branch prediction.

As i said here, while-switch can be converted to computed goto by an optimization pass in the compiler. Even if llvm doesn't have this, we can do it on the zig side. Even if it proves to be difficult to do, we can add a language feature later that does this better than general-purpose gotos with respect to communicating intent precisely.

And we're not discussing removing computed goto, because zig doesn't have that. This proposal is to remove goto with hardcoded labels. I don't know if re2c uses hardcoded goto or computed goto, because I couldn't find any examples of the output C/C++ code.

PavelVozenilek · 2017-12-04T21:43:18Z

@thejoshwolfe:

Why the dislike for goto, though? It is present in many languages and I never read about problems with it. True, Dijkstra complained two generations ago, but this was different goto than what is used these days.

So there's simple concept proven in practice versus novelty feature. This novelty feature is IMHO harder to think about and less expressive than the old one. I personally do not like novelties and would recommend to keep as close as possible with C syntax/features.

thejoshwolfe · 2017-12-04T22:30:11Z

There are good reasons to have defer, switch, break, and continue, so those features get to be in the language. Now that we have those features, and we want only one obvious way to do things, what reasons are left for goto? The burden of proof is on arguing for the inclusion of features, not on the omission of them.

But I'll still give my opinion on goto from a compiler design perspective, which is mostly negative:

goto is much more complicated than it appears, because Zig's language features are designed for structured, {block}-based semantics, and goto generally violates structured programming. This complexity comes from things like variable scope and initialization and defer triggers. For example, this code is a compile error:

fn foo() {
    if (is_error()) goto handle_error;
    var value = bar();
    return value;
handle_error:
    return error.Foo;
}

If you can't tell why that's a compile error, then that's evidence that goto is complicated and a bit confusing. The general problem is that you should have used and if, but you used goto instead, and now you have to deal with the complexities of that decision.

To make unstructured goto work with structured Zig, there's some weird conversions that happen behind the scenes. The one relevant here is that every variable declaration or defer statement effectively starts a new scope that lasts until the end of the containing scope. It is illegal to jump into a scope, which is why the above example is an error. If we didn't have goto, there would be little motivation to preserve this strategy of creating invisible, effective scopes.

So that's my opinion on goto. But I'll repeat what I said above: The burden of proof is on the inclusion of goto. Without compelling usecases for goto, it will not be included in Zig.

PavelVozenilek · 2017-12-04T23:03:07Z

@thejoshwolfe: I do not know about implementation issues, and cannot argue about these. But it seems to be sold here as backward, dangerous feature to be replaced by shiny new thing.

(I remember goto horribleness was discussed long ago in D, but then Walter Bright said that he converts all control flow constructs in gotos internally.
http://forum.dlang.org/thread/[email protected]?page=3#post-l2f5bv:24rie:241:40digitalmars.com )

Also some proposed syntax could be written using existing constructs, as mentioned here. But I would not like this kind of minimalism.

pluto439 · 2017-12-05T07:32:25Z

So I now have to write a more ugly code to make compiler a bit simplier. I thought your goal was to make close to metal programs, which is why you avoided garbage collection. Removing goto means going in the opposite direction.

Eh, whatever, I need to make my own language anyway.

thejoshwolfe · 2017-12-05T17:28:16Z

So I now have to write a more ugly code to make compiler a bit simplier.

Nope. Making the compiler simpler is just my opinion. The real argument for removing goto is that we don't need goto.

I thought your goal was to make close to metal programs

That's over simplified. We want to make optimal, readable, maintainable, etc. programs. The Zen of Zig does not include imitating assembly. If you can make an argument that goto is in line with the Zen of Zig, then we can have a discussion. Right now goto is in violation of "one obvious way to do things". There are arguments for and against the readability of goto, so I don't think we can agree there; we'll need to argue from a different perspective.

Pseudo code doesn't impress me much, because it's possible to invent pseudo code to justify any imaginable language feature. Pseudo code does not demonstrate a need for goto; real code does. I'll reiterate that all the real uses of goto mentioned in this thread I have analyzed, and they can all be replaced by defer, break, continue, return, switch, etc. We don't have a need for goto yet in this discussion.

pluto439 · 2017-12-06T23:20:59Z

one obvious way to do things

defer already breaks it. You can make the same code even without defer, and it will be even more oblivious. Defer was made for exception handling (in D, by a different name), it doesn't make much sense by itself, and zig doesn't have exceptions. (Well, it has one, but it's one way and is always fatal.)

every variable declaration or defer statement effectively starts a new scope that lasts until the end of the containing scope. It is illegal to jump into a scope

I think I talked about this here #594 (comment) . Gotoing after a variable creation will mean that a variable exists, but is simply uninitialized. Also look at this, gotoing over a declaration is ok for variables and fixed-length arrays, but not ok for variable-length arrays https://stackoverflow.com/questions/29880836/skip-variable-declaration-using-goto

When time comes to livecoding, you will be jumping all over the place in the debugger. You know how in visual studio you can just drag the yellow arrow all over the place to go back or forward? Internally, this is the "goto". Why forbid something in the language, if you will need that in the debugger later? Defer will somewhat complicate debugging too, because code wouldn't flow from top to bottom anymore, it will return in the defer sometimes. Imagine gotoing into the defer block, what will happen at exit? (It will run all defers in the code block in the reverse order, starting from the one it is in right now.) Goto is the way the hardware works, it's very natural simply for that reason.

Removing goto will mean that I will have to create many code blocks that I overwise wouldn't need. With goto I'm 100% sure that everything I can imagine I will be able to implement. I'm imagening something like <> diamond, where code blocks will have intersect. Or some very complex expressions in the "if" statements, when the only alternative to goto will be creating extra variables, and compiler will not be able to optimize them out back to goto properly. Or a situation when I will need to do uninitialization of something in a very specific order, that the standard defer will not be able to handle.

I can't invest any time in zig unless you figure out the situation with exeptions and goto. Especially exceptions, because it affects the rest of a program significally. Exceptions can be faster than error codes if used carefully. #578 (comment)

paulie-g · 2020-08-17T07:41:44Z

That's going to be time-consuming, in that I'd have to a) create an artificial example, and b) create a benchmark harness in zig that works well enough to capture the difference (non-trivial, wasn't trivial in C either) - the difference is on the order of ~1 cycle for keys with len <= 32 iirc and a poisson distribution of key lengths iirc. Plus an extra 1 cycle that clang loses over gcc, which I'm assuming is in codegen.

Realistically, I'm not sure when I would get to that. Out of curiosity, what was gained by the removal? I read the original thread, and this one, and didn't get a sense of what drove it.

Trung0246 · 2020-09-28T18:58:11Z

Hi,

Is this issue still active? If so I want to try my attempt to post my code that uses goto here...

pfgithub · 2020-09-28T18:59:15Z

@Trung0246 You can do that

Trung0246 · 2020-09-28T19:19:26Z

I don't have that much ziglang experience but I did read through the documentation

Code: https://pastebin.com/vxS5WSmk (removed some unnecessary part, example usage is probably at the botom)
Another usage example: https://pastebin.com/2eDmm762

Here is own personal code that uses goto to jumps into "if" scope (you can ctrl+f to search for it anyways)

Basically the logic of insert inside Manage class will be creating a new "event group" and "event node" by passing required tuple as arguments and passing a lambda wrapper for insert (a.k.a METHODS::INSERT) or emplace (a.k.a METHODS::EMPLACE) to actually put those things inside std::multimap or std::multiset.

The group_key will be used to check if the "event group" already exist, then it will attempt to skip creating new "event group" object and reuse the old one that have the same group_key. The problematic part is that either we reuse "event group" or create new "event group" we will have to put the "event node" we created into them.

I think I could refactor the code by using 2 lambdas in C++ and same for ziglang but then it's not really elegant compare to using a single goto (or may not even possible, I'm not sure yet because of complex type system of C++).

If you have any question just ask (I know my code is ridiculously over engineered 😂)

andrewrk · 2020-09-28T19:47:41Z

Does translate-c support goto yet? Once that's done we can tell people to write their goto code in C and see how it translates into zig 😄

mb64 · 2020-11-28T10:14:36Z

Here is a use-case for goto that I haven't been able to find a replacement for in Zig. The fastest way to tokenize text in general is with a DFA. This is easily expressed with goto, and hard to efficiently express without goto.

Here's a godbolt comparing a C goto implementation and a Zig while (true) { switch (st) { ... } } implementation. LLVM doesn't optimize away the state variable.

Gotos aren't the only way to represent DFAs -- there's the (IMO nicer) way of using a set of mutually tail-recursive functions. There are two problems with this, though: there's no TCO in debug mode, and LLVM still doesn't optimize it as well as it does the goto version.

Note that this is a different use-case to computed goto -- LLVM actually optimizes while (true) { switch (@intToEnum(OpCode, code[pc])) { ... } } to use the same table dispatch as computed gotos. As for why it optimizes this but not regular gotos, my guess is that it's because the DFA has many dispatch points where the state is calculated (each st = .whatever; continue), as opposed to just one at the top of the loop.

ikskuh · 2020-11-28T12:36:21Z

Gotos aren't the only way to represent DFAs -- there's the (IMO nicer) way of using a set of mutually tail-recursive functions. There are two problems with this, though: there's no TCO in debug mode, and LLVM still doesn't optimize it as well as it does the goto version.

@mb64 you can use the guaranteed tail call optimization of zig:

fn lex_a(data: *Data) callconv(.C) void
{
    while(data.cursor < data.input.len and data.input[data.cursor] == 'a') {
        data.a_count += 1;
        data.cursor += 1;
    }
    return @call(.{ .modifier = .always_tail }, lex, .{data});
}

Full example is here: https://zig.godbolt.org/z/K1c817

windows-h8r · 2022-01-19T23:12:03Z

There are good reasons to have defer, switch, break, and continue, so those features get to be in the language. Now that we have those features, and we want only one obvious way to do things, what reasons are left for goto? The burden of proof is on arguing for the inclusion of features, not on the omission of them.

But I'll still give my opinion on goto from a compiler design perspective, which is mostly negative:

goto is much more complicated than it appears, because Zig's language features are designed for structured, {block}-based semantics, and goto generally violates structured programming. This complexity comes from things like variable scope and initialization and defer triggers. For example, this code is a compile error:
fn foo() {
    if (is_error()) goto handle_error;
    var value = bar();
    return value;
handle_error:
    return error.Foo;
}
If you can't tell why that's a compile error, then that's evidence that goto is complicated and a bit confusing. The general problem is that you should have used and if, but you used goto instead, and now you have to deal with the complexities of that decision.

To make unstructured goto work with structured Zig, there's some weird conversions that happen behind the scenes. The one relevant here is that every variable declaration or defer statement effectively starts a new scope that lasts until the end of the containing scope. It is illegal to jump into a scope, which is why the above example is an error. If we didn't have goto, there would be little motivation to preserve this strategy of creating invisible, effective scopes.

So that's my opinion on goto. But I'll repeat what I said above: The burden of proof is on the inclusion of goto. Without compelling usecases for goto, it will not be included in Zig.

How is this

fn foo() !u8 {
    if (is_error()) goto handle_error;
    var value = bar();
    return value;
handle_error:
    return error.Foo;
}

different from this?

fn foo() !u8 {
    attempt: {
        if (is_error()) break :attempt;
        var value = bar();
        return value;
    }
    return error.Foo;
}

andrewrk · 2022-01-19T23:14:45Z

How is this ... different from this?

It's not really. Zig essentially has goto via labeled breaks

SpexGuy · 2022-01-19T23:29:56Z

In one particular way it's very different and much better. The version with goto skips over the initialization of value, to a place where value is in scope and has not been initialized. This is a compile error in C/C++. Zig's use of scopes makes the problem disappear, since you can only break to an outer scope, and any inner variables disappear at the same time.

clausecker · 2022-01-19T23:54:13Z

How is this ... different from this?

It's not really. Zig essentially has goto via labeled breaks

Labeled break is essential goto with a builtin off-by-one error, since the break goes to the statement after the label instead of the labelled statement. It also causes way too many levels of nesting to appear (something I want to avoid).

Validark · 2022-11-09T22:26:26Z

Is there a good way to jump to a specific location within a while loop from outside said while loop? E.g.

while (true) {
    // logic A
    if (condition) goto :LABEL;
    // logic B
    if (condition2) break;
}

while (true) {
    // logic C
    LABEL:
    // logic D
}

In a language like C# I would implement the 2nd while loop as a goto to make LABEL in scope, but in Zig I am not sure this is possible to express.

Of course, one could always use a boolean switch like so:

const b = LABEL: while (true) {
    // logic A
    if (condition) break :LABEL false;
    // logic B
    if (condition2) break :LABEL true;
};

while (true) {
    if (b) {
        // logic C
    }
    b = true;
    // logic D
}

To me, doing this is more complicated than it needs to be. I would prefer a goto statement. However, I would settle for someone telling me how I could get LLVM to optimize this to be equivalent in the emitted assembly. In my code, the b condition does not get optimized away. No matter where I move the b = true line, even when I put it at the end of the second while loop, I still get this in assembly:

        mov     al, 1
        test    al, 1
        jne     .LBB1_443 ; why do we need to check if ((1 & 1) != 0)? 😔 
        jmp     .LBB1_460 ; this is impossible. We all know 1 and 0 are not equal.

(I am using ReleaseFast on zig trunk on godbolt)

jido · 2022-11-09T23:41:50Z

Your solution would require to guard the logic between the two loops if there is any too:

while (true) {
    // logic A
    if (condition) goto :LABEL;
    // logic B
    if (condition2) break;
}
// logic I
while (true) {
    // logic C
    LABEL:
    // logic D
}

Actually it looks like a state machine, so I would express it with explicit state:

var state = 0;
while (true) {
   if (state == 0) {
      // logic A
      if (condition) state = 1 else state = 2;
   }
   else if (state == 1) {
       // logic B
       if (condition2) state = 3 else state = 0;
   }
   else if (state == 2) {
       // logic D
       state = 4;
   }
   else if (state == 3) {
       // logic I
       state = 4;
   }
   else if (state == 4) {
       // logic C
       state = 2;
   }
}

InKryption · 2022-11-10T00:02:33Z

Just to add, that solution specifically would be further improved by #8220:

const initial_state = ;
sw: switch (@as(u3, 0)) {
    0 => {
        // logic A
        continue :sw if (condition) 1 else 2;
    },
    1 => {
        // logic B
        continue :sw if (condition2.*) 3 else 0;
    },
    2 => {
        // logic D
        continue :sw 4;
    },
    3 => {
        // logic I
        continue :sw 4;
    },
    4 => {
        // logic C
        continue :sw 2;
    },
    else => unreachable,
}

Validark · 2022-11-11T00:55:27Z

For reference, here the code I am working on: https://zig.godbolt.org/z/sjzYj7nGG. It is not cleaned up very much yet but what I am doing is trying to find the most optimal way to implement my DynSDT data structure: https://validark.github.io/DynSDT

You can see I have multiple implementations of the topKCompletionsToPrefix function, and I am experimenting with different techniques like using sentinels to avoid boundary checks, storing the data in plain variables, using SIMD techniques, etc. Also I am looking at what the right balance is between inlining more loop iterations where possible or condensing things. A lot of fun stuff, but if it looks messy or unpolished, that is why.

The function topKCompletionsToPrefix on line 806 is the code I was working on most recently. I put some comments in there describing the assembly generation I am seeing that seems suboptimal. Line 9504 in the assembly is what I was alluding to in my previous comment. Although it now is in a different register:

        mov     bpl, 1
        test    bpl, 1
        jne     .LBB16_44
        jmp     .LBB16_50

The goto jump that I was hoping for has to do with do_cur_insertion. This code is what I mean:

if (depq_len == 10 - k and next_i != NULL) {
    do_cur_insertion = false;
    break; // this should skip the if (do_cur_insertion) statement in the following loop
}

It should jump to the second half of this loop:

while (true) {
    if (do_cur_insertion) {
        // ... 
    }

    // please jump here!
    do_cur_insertion = true;

    if (next_i != NULL) {
        // ...
    }

    depq_len -%= 1;
    if (depq_len == std.math.maxInt(@TypeOf(depq_len))) return @intCast(u4, k);
    cur_i = depq[depq_len];
}

Instead, it jumps to code that checks test bpl, 1 (bpl is do_cur_insertion), followed by je .LBB16_50 to potentially skip over the if (do_cur_insertion) section we are guaranteed to skip when coming from the zig source above. Then the end of my second loop ends as shown in the asm above, with writing a 1 to bpl and then testing to see if bpl is 1. Ideally, the compiler should realize that the program never actually needs to check do_cur_insertion at all, so long as we jump to the place after the if (do_cur_insertion) statement from the aforementioned break expression. From then on, we should always run that code when we loop back to the top of the while loop.

More complaints about LLVM

Oddly enough, in that same function, not related to the issue of goto, this code:

k += 1;
if (k == 10) return 10;

translates to this assembly:

        inc     al
        cmp     al, 10
        je      .LBB16_45
        ; ... stuff
.LBB16_45:
        mov     al, 10
        jmp     .LBB16_2

Randomly scrolling through the assembly, I found another example:

        cmp     r11b, 4
        jne     .LBB16_38 ; if (r11b != 4)
        jmp     .LBB16_41 ; if (r11b == 4)

        ; ...
; this is only referenced once in the assembly, so it was generated specifically to go with the previous jmp!
.LBB16_41:
        vpextrd esi, xmm0, 3
        mov     r11b, 4 ; r11b = 4 😔

Maybe this is just a problem in general for LLVM? The emit is also pretty temperamental. Changing the comments in the code can lead to instructions changing order, which is odd (when the instruction order does not matter). My first assembly in this comment sometimes gets one of the other mov instructions from the function logic under its initial mov instruction. It also seems to do some optimizations sometimes but other times it won't. Consider this code from my original version of the first while loop:

if (depq_len == 0) return k;
depq_len -= 1;

This gives me this emit:

.LBB16_38:
        test    r11b, r11b
        je      .LBB16_2
.LBB16_39:
        dec     r11b

However, this same code in the second while loop gives me this emit:

sub     r11b, 1
jb      .LBB16_2

I can ask more forcefully for this optimization by using this code instead:

depq_len -%= 1;
if (depq_len == std.math.maxInt(@TypeOf(depq_len))) return k;

Yes, this is something of a micro-optimization, but it is still weird to me that if you use the exact same code to check if (depq_len == 0) in the second while loop, it gives me the more optimized asm with sub r11b, 1, followed by jb .LBB16_2, but it won't do the same in the first while loop. So it will do the optimization only sometimes, even though the circumstances are roughly the same and the state of depq_len does not matter when returning from the function happens.

Validark · 2022-11-11T18:33:41Z

Update: I tried changing the code to look more like a state machine by using an enum rather than a boolean, and it translated the end of each block to this instead:
https://zig.godbolt.org/z/1YYxMo9vE

    xor     r9b, 1
    jmp     .LBB16_43

; ...

.LBB16_43:
    test    r9b, 1
    je      .LBB16_51

At the moment, it might be impossible to achieve what I am trying to do. If anyone wants to take a crack at it, I would be happy to be shown there is a way to get the optimized output I am looking for. Otherwise, maybe #8220 will guarantee the jumps I am trying to do?

I also tried implementing this via tail calls: https://zig.godbolt.org/z/Gj9qeahMK
It gave me the jumps I was hoping for, but isn't the cleanest solution in terms of Zig code, IMO. I think Zig should be capable of doing this within a single function. I'm fine with using this solution for the time being, though.

mnemnion · 2024-04-22T02:15:53Z

This is a good example of idiomatic goto in a VM dispatch loop: https://github.com/sqmedeiros/lpeglabel/blob/master/lplvm.c

It only stands as an argument for goto if there isn't a better way to write it in Zig.

Is there?

mlugg · 2024-04-22T02:32:46Z

@mnemnion

The goto fails are easily modeled with either function calls or #8220 depending on specifics. goto pushcapture, either function calls or combining the switch cases.

I really don't see any logic in that file which has a compelling use of goto.

(Regardless, I should note that this part of Zig is essentially as decided as it gets; goto is not coming back.)

mnemnion · 2024-04-25T15:24:37Z

I don't have opinions about whether Zig should, or should not, have goto.

I saw this in the first comment:

Please everyone post links to your zig code that uses goto, so we won't remove the feature. Alternatively, post links to C code that uses goto, so we can assess if there's a more zig-like way to do it, or if goto is really the best solution for those cases.

And then read a lot of abstract discussion, with relatively few real-life examples of nontrivial goto, where it might be challenging to produce comparable performance without using the construct. It's clear that the most common pattern, which is a jump to cleanup code, isn't necessary in Zig.

Here is another example, also from the Lua team, where eight gotos are used to structure a VM dispatch loop. Lua is renowned for its relative speed and very small binary. They're using goto because (in C) they need to.

I'm evaluating Zig for writing a VM, so I'm keenly interested in whether it's possible to get the same performance, and a clearer instruction flow, using what Zig has now. If it is, why would anyone want goto? If it isn't, that's a marked lack in a language like Zig, which should certainly be usable to craft an optimal dispatch loop for a VM.

You might look at the question this way: if someone replies with a detailed answer, and demonstration that the machine code generated by a Zig version is broadly equivalent, that's conclusive: Zig doesn't need goto, because there aren't other cases where it's used to good effect. Providing branchless state-machine effects in a hot dispatch loop is it, that's what goto is good for in C, other than making up for some defects which Zig has already remedied.

But I did read all the comments, and code like this wasn't addressed (a related question about computed gotos was). So, given the specific call to action in the first post, I thought it was appropriate to resurrect the discussion.

Validark · 2024-04-25T15:57:16Z

@mlugg's answer is still applicable. #8220 would allow you to do arbitrary jumps without the normal footguns associated with goto. Right now, you can only do it with tail call optimization, which is a massive pain and doesn't work on some platforms (iirc WASM). Right now, you can write your VM as a regular switch statement, and once #8220 comes out you can switch over.

Otherwise, you might want to use DynAsm or something that allows you to have precise control over register allocation if you really want to write a performant VM. A high level language + LLVM, by itself, is not going to be the ideal choice for a VM if you want to beat LuaJIT.

If you're interested in more ideas, I'd look at https://github.com/luajit-remake/luajit-remake

mnemnion · 2024-04-25T17:29:32Z

Thanks for the pointer to that LuaJIT project, definitely the sort of thing I'm interested in generally!

Custom ASM-enhanced VMs have their place, as does tracing. There is also a place for portable platform-independent VMs which perform well: so Lua, not LuaJIT.

Reading #8220 more closely, I'm beginning to see how it would more than likely cover the parts of a VM's state machine which can't be expressed in C without goto or compromises. "You can't yet, but we know that and have a better solution on the roadmap" is a reasonable answer to my question. Thanks for taking the time to reply.

mlugg · 2024-04-25T17:45:42Z

I'd like to note that the use case you bring up is actually very important to the compiler itself. The main loop of semantic analysis in the Zig compiler - our primary bottleneck excluding LLVM - is essentially a VM instruction dispatch loop, which in fact largely motivated #8220 in the first place. Similar loops also exist in all of our code generation backends. So emitting optimal assembly here is definitely something we want to make sure we can do!

(Incidentally, I am actively working on #8220, and expect to have a PR up within days.)

floooh · 2024-05-15T08:56:51Z

Late to the party, but I think I have an interesting use case for goto in C to contribute: at the end of a switch-case branch to jump to a handful of different 'epilog blocks' (this is code-generated):

https://github.com/floooh/chips/blob/bd1ecff58337574bb46eba5e7e16937899360e56/chips/z80.h#L4706-L4767

The compiler output is exactly as you'd expect (in native code at least): the code branches via a jump-table, and the end of each case-branch has a direct jump to one of the three epilog blocks. Empty case branches are "short-cutted" and directly have the address of the goto target in the jump table.

I didn't inspect WASM output, but performance is close to native code, so I guess it's fine.

Here's a blogpost about how that emulator works in general (only the first couple of sections are interesting): https://floooh.github.io/2021/12/17/cycle-stepped-z80.html

PS: as a special case, the _wait() macro (which implements the Z80's WAIT handling) jumps out of the middle of a switch-case branch to the NMI edge-detection epilog block:

https://github.com/floooh/chips/blob/bd1ecff58337574bb46eba5e7e16937899360e56/chips/z80.h#L1805

...think of it as a giant state machine, not unlike the result of an async/await code transformation.

PPS: I'm planning to do some emulator experiments in Zig soon-ish, and I hope I can somehow emulate this construct without a C-like goto (maybe with nested named blocks and labelled break?) - in any case I kinda like how straightforward the C version is for this sort of low-level "outside-the-box" contraption :)

Validark · 2024-09-25T08:26:22Z

(@floooh) PPS: I'm planning to do some emulator experiments in Zig soon-ish, and I hope I can somehow emulate this construct without a C-like goto (maybe with nested named blocks and labelled break?) - in any case I kinda like how straightforward the C version is for this sort of low-level "outside-the-box" contraption :)

I ported a small amount of the code for you, just enough to get the point across. Here is what I came up with:

fn z80_tick(cpu: *z80_t, pins_: u64) u64 {
    var pins = pins_ & ~(Z80_CTRL_PIN_MASK | Z80_RETI);

    sw: switch (@as(enum { start, fetch_next, step_next, track_int_bits }, .start)) {
        .start => switch (cpu.step) {
            1506 => {
                if (pins & Z80_WAIT) continue :sw .track_int_bits;
                cpu.iff1 = false;
                continue :sw .step_next;
            },
            1515 => continue :sw .step_next,
            1516 => continue :sw .fetch_next,
        },

        .fetch_next => {
            pins = cpu.fetch(pins);
            continue :sw .step_next;
        },

        .step_next => {
            cpu.step += 1;
            continue :sw .track_int_bits;
        },

        .track_int_bits => {
            // track NMI 0 => 1 edge and current INT pin state, this will track the
            // relevant interrupt status up to the last instruction cycle and will
            // be checked in the first M1 cycle (during _fetch)
            const rising_nmi: u64 = (pins ^ cpu.pins) & pins; // NMI 0 => 1
            cpu.pins = pins;
            cpu.int_bits = ((cpu.int_bits | rising_nmi) & Z80_NMI) | (pins & Z80_INT);
        },
    }
}

With the new labeled-switch-continue feature, you can now get the goto control flow that you want!

floooh · 2024-09-27T10:59:32Z

@Validark wow that looks interesting, I hadn't even considered such an "outer switch statement", very neat!

In the meantime I used a slightly different approach using 'labeled break':

https://github.com/floooh/chipz/blob/fdd72d54e3da85604c70ee062e09780437a48d31/src/chips/z80.zig#L1196-L1209

...basically using nested labeled scope blocks, and the labeled break at the end of switch statement controls which 'inner' scope blocks to skip. It works, but I like your approach more. The labeled continue approach immediately looks very useful for building all sorts of 'switch-based state machines' :)

clausecker · 2024-09-27T11:05:50Z

Amazing how much complexity you have to throw at your code just to be able to avoid goto. Is it really worth it?

floooh · 2024-09-27T11:16:22Z

IMHO the above example isn't any more complex than the original goto-based code in C, and while I'm not a 'goto considered harmful zealot' myself, the C/C++ goto does have a couple of problems that are worth fixing (like being able to skip variable declarations and initializations).

In the long run there might even be some advantages if the compilation target is a 'structured bytecode' like WASM (but that's speculation on my side).

Validark · 2024-09-27T13:13:00Z

@floooh Yes, in this case you could use labeled_breaks as well.

fn z80_tick(cpu: *z80_t, pins_: u64) u64 {
    var pins = pins_ & ~(Z80_CTRL_PIN_MASK | Z80_RETI);

    track_int_bits: {
        step_next: {
            fetch_next: switch (cpu.step) {
                1506 => {
                    if (pins & Z80_WAIT) break :track_int_bits;
                    cpu.iff1 = false;
                    break :step_next;
                },
                1515 => break :step_next,
                1516 => break :fetch_next,
            }

            pins = cpu.fetch(pins);
        }
        cpu.step += 1;
    }

    // track NMI 0 => 1 edge and current INT pin state, this will track the
    // relevant interrupt status up to the last instruction cycle and will
    // be checked in the first M1 cycle (during _fetch)
    const rising_nmi: u64 = (pins ^ cpu.pins) & pins; // NMI 0 => 1
    cpu.pins = pins;
    cpu.int_bits = ((cpu.int_bits | rising_nmi) & Z80_NMI) | (pins & Z80_INT);
}

However, I think this code is more confusing as written above, since I am labeling the block with what happens after it, which doesn't make sense. With labeled breaks you are saying what to break out of, whereas with goto/labeled-continue we are saying what to jump to. If I thought of it as "we can now stop X process", I would write it as a "break", but if I thought of it as "jump to X process", I would write it as a labeled continue. Also the labeled continue version is significantly more powerful, and can express arbitrary control flows that cannot be expressed with the cascading blocks strategy.

thejoshwolfe added breaking Implementing this issue could cause existing code to no longer compile or have different behavior. proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. labels Nov 28, 2017

thejoshwolfe mentioned this issue Nov 28, 2017

labeled loops, labeled break, labeled continue #346

Closed

andrewrk added this to the 0.2.0 milestone Nov 28, 2017

andrewrk added the accepted This proposal is planned. label Nov 28, 2017

andrewrk mentioned this issue Mar 12, 2021

introduce labeled continue syntax inside a switch expression #8220

Closed

kassane mentioned this issue Jul 6, 2023

Suggestion: Zig support skvadrik/re2c#451

Closed

Remove goto #630

Remove goto #630

Comments

thejoshwolfe commented Nov 28, 2017 • edited Loading

PavelVozenilek commented Nov 28, 2017

andrewrk commented Nov 28, 2017

andrewrk commented Nov 28, 2017

PavelVozenilek commented Nov 29, 2017

andrewrk commented Nov 29, 2017

PavelVozenilek commented Nov 29, 2017

PavelVozenilek commented Nov 29, 2017 • edited Loading

ghost commented Nov 30, 2017

pluto439 commented Dec 4, 2017

pluto439 commented Dec 4, 2017

jido commented Dec 4, 2017

pluto439 commented Dec 4, 2017

jido commented Dec 4, 2017

pluto439 commented Dec 4, 2017

Ilariel commented Dec 4, 2017 • edited Loading

pluto439 commented Dec 4, 2017

andrewrk commented Dec 4, 2017

pluto439 commented Dec 4, 2017

thejoshwolfe commented Dec 4, 2017

PavelVozenilek commented Dec 4, 2017

thejoshwolfe commented Dec 4, 2017

PavelVozenilek commented Dec 4, 2017

pluto439 commented Dec 5, 2017

thejoshwolfe commented Dec 5, 2017

pluto439 commented Dec 6, 2017

paulie-g commented Aug 17, 2020

Trung0246 commented Sep 28, 2020

pfgithub commented Sep 28, 2020

Trung0246 commented Sep 28, 2020 • edited Loading

andrewrk commented Sep 28, 2020

mb64 commented Nov 28, 2020

ikskuh commented Nov 28, 2020

windows-h8r commented Jan 19, 2022

andrewrk commented Jan 19, 2022

SpexGuy commented Jan 19, 2022

clausecker commented Jan 19, 2022

Validark commented Nov 9, 2022 • edited Loading

jido commented Nov 9, 2022

InKryption commented Nov 10, 2022

Validark commented Nov 11, 2022 • edited Loading

Validark commented Nov 11, 2022 • edited Loading

mnemnion commented Apr 22, 2024

mlugg commented Apr 22, 2024

mnemnion commented Apr 25, 2024

Validark commented Apr 25, 2024

mnemnion commented Apr 25, 2024

mlugg commented Apr 25, 2024

floooh commented May 15, 2024 • edited Loading

Validark commented Sep 25, 2024 • edited Loading

floooh commented Sep 27, 2024 • edited Loading

clausecker commented Sep 27, 2024

floooh commented Sep 27, 2024

Validark commented Sep 27, 2024

thejoshwolfe commented Nov 28, 2017 •

edited

Loading

PavelVozenilek commented Nov 29, 2017 •

edited

Loading

Ilariel commented Dec 4, 2017 •

edited

Loading

Trung0246 commented Sep 28, 2020 •

edited

Loading

Validark commented Nov 9, 2022 •

edited

Loading

Validark commented Nov 11, 2022 •

edited

Loading

Validark commented Nov 11, 2022 •

edited

Loading

floooh commented May 15, 2024 •

edited

Loading

Validark commented Sep 25, 2024 •

edited

Loading

floooh commented Sep 27, 2024 •

edited

Loading