Roughly track the last token in each AST node #14391

tomc1998 · 2023-01-20T21:39:57Z

This is important for tools like LSP servers, because we can't properly reconstruct the extents of an AST node just from its structure. Take the following example:

fn foo() void {
    var bar = 10;
    if (ba
}

If the cursor is positioned after the if (ba, zls will not autocomplete bar. This is because Ast.lastToken tries to find the end of the function by traversing the children, so it never includes the final }, meaning zls doesn't think that the cursor is inside foo.

To remedy this, I've added an parser option that will record the last token of ast nodes as well as the first. Optional so that we don't take a performance&memory hit when we don't need to track this data.

I've been using a forked zig with this based on 0.10.0 along with a patched zls for the past couple months, and it works fine.

Here is an example that you can run to see what I'm talking about:

const std = @import("std");

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    const alloc = gpa.allocator();
    const my_src =
        \\pub fn foo() void {
        \\var bar = 10;
        \\    if (b
        \\}
    ;

    const my_ast = try std.zig.parse(alloc, my_src);

    // Find the 'foo' fn_decl node
    const decls = my_ast.rootDecls();
    std.debug.assert(decls.len == 1);
    const foo_decl = decls[0];

    // Find the last token for this fn_decl node
    const last_tok_ix = my_ast.lastToken(foo_decl);
    const last_tok_offset = my_ast.tokens.items(.start)[last_tok_ix];

    // Print the offset of the last token
    try std.io.getStdOut().writer().print(
        "Source is {} bytes long, last tok in foo is {}\n",
        .{ my_src.len, last_tok_offset },
    );
}

This prints:

Source is 45 bytes long, last tok in foo is 32

It thinks the last token of the function decl foo starts at 32, but the source is 45 bytes long. With correct tracking of AST extents, this should show 44 instead.

This is important for tools like LSP servers, because we can't properly reconstruct the extents of an AST node just from its structure

Vexu · 2023-01-21T23:33:11Z

LSP servers should really be using a custom parser that is designed to produce valid output on bad input. Attaching options like this onto a parser that is designed to be as fast as possible on valid input is not sustainable on the long run.

tomc1998 · 2023-01-22T11:34:38Z

This costs 1 predictable branch per node to fix a huge issue in the main zig LSP server.

The change is very high value for basically 0 cost, up until ZLS has a custom parser, at a time when the zig syntax is still changing.

ZLS is incredibly frustrating to use without this change, because currently you don't get completions if you're in the middle of writing an if or while or switch stmt at the end of a block - an incredibly common situation. It's so frustrating that I stayed on and old forked 0.10.0 build of zls for months, almost entirely for this change, despite other zls crashes being fixed in upstream.

I'm asking that you consider this change on it's own, not as a slippery slope to some generalised tree sitter - which I don't want.

andrewrk · 2023-01-22T20:37:14Z

Thank you for taking the time to send this patch, and for explaining the context for it.

This zig parser exists primarily for this codebase - the one whose issue tracker we are commenting on. It is exposed in the standard library for the benefit of any tooling that wants to use the very same parser as the compiler itself. For some use cases, this may not be the best parser implementation to use. We have a fully-specified grammar should anyone want to create a custom parser that is designed to be used in an IDE plugin.

https://github.com/zigtools/zls is a third-party project. Bugs in that project are not tracked as issues in this project unless they are caused by an issue with the zig compiler itself. In fact, this issue tracker has an old issue open - #615 - for implementing a server in the compiler designed to work with IDEs. This will tackle some of the same problems, but in a different way. I could foresee this sub-project motivating similar changes in zig's main parser, but I do not wish to modify zig's main parser to better support a third party Language Server Protocol implementation.

Roughly track the last token in each AST node

9284694

This is important for tools like LSP servers, because we can't properly reconstruct the extents of an AST node just from its structure

andrewrk closed this Jan 22, 2023

llogick mentioned this pull request Oct 17, 2023

no switch case completion on function inside container zigtools/zls#1535

Open

Techatrix mentioned this pull request Oct 21, 2023

Explore modifying Zig's parser to fit the needs of a language server zigtools/zls#1536

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Roughly track the last token in each AST node #14391

Roughly track the last token in each AST node #14391

tomc1998 commented Jan 20, 2023

Vexu commented Jan 21, 2023

tomc1998 commented Jan 22, 2023

andrewrk commented Jan 22, 2023

Roughly track the last token in each AST node #14391

Roughly track the last token in each AST node #14391

Conversation

tomc1998 commented Jan 20, 2023

Vexu commented Jan 21, 2023

tomc1998 commented Jan 22, 2023

andrewrk commented Jan 22, 2023