Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Roughly track the last token in each AST node #14391

Closed
wants to merge 1 commit into from

Conversation

tomc1998
Copy link
Contributor

This is important for tools like LSP servers, because we can't properly reconstruct the extents of an AST node just from its structure. Take the following example:

fn foo() void {
    var bar = 10;
    if (ba
}

If the cursor is positioned after the if (ba, zls will not autocomplete bar. This is because Ast.lastToken tries to find the end of the function by traversing the children, so it never includes the final }, meaning zls doesn't think that the cursor is inside foo.

To remedy this, I've added an parser option that will record the last token of ast nodes as well as the first. Optional so that we don't take a performance&memory hit when we don't need to track this data.

I've been using a forked zig with this based on 0.10.0 along with a patched zls for the past couple months, and it works fine.


Here is an example that you can run to see what I'm talking about:

const std = @import("std");

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    const alloc = gpa.allocator();
    const my_src =
        \\pub fn foo() void {
        \\var bar = 10;
        \\    if (b
        \\}
    ;

    const my_ast = try std.zig.parse(alloc, my_src);

    // Find the 'foo' fn_decl node
    const decls = my_ast.rootDecls();
    std.debug.assert(decls.len == 1);
    const foo_decl = decls[0];

    // Find the last token for this fn_decl node
    const last_tok_ix = my_ast.lastToken(foo_decl);
    const last_tok_offset = my_ast.tokens.items(.start)[last_tok_ix];

    // Print the offset of the last token
    try std.io.getStdOut().writer().print(
        "Source is {} bytes long, last tok in foo is {}\n",
        .{ my_src.len, last_tok_offset },
    );
}

This prints:

Source is 45 bytes long, last tok in foo is 32

It thinks the last token of the function decl foo starts at 32, but the source is 45 bytes long. With correct tracking of AST extents, this should show 44 instead.

This is important for tools like LSP servers, because we can't properly
reconstruct the extents of an AST node just from its structure
@Vexu
Copy link
Member

Vexu commented Jan 21, 2023

LSP servers should really be using a custom parser that is designed to produce valid output on bad input. Attaching options like this onto a parser that is designed to be as fast as possible on valid input is not sustainable on the long run.

@tomc1998
Copy link
Contributor Author

This costs 1 predictable branch per node to fix a huge issue in the main zig LSP server.

The change is very high value for basically 0 cost, up until ZLS has a custom parser, at a time when the zig syntax is still changing.

ZLS is incredibly frustrating to use without this change, because currently you don't get completions if you're in the middle of writing an if or while or switch stmt at the end of a block - an incredibly common situation. It's so frustrating that I stayed on and old forked 0.10.0 build of zls for months, almost entirely for this change, despite other zls crashes being fixed in upstream.

I'm asking that you consider this change on it's own, not as a slippery slope to some generalised tree sitter - which I don't want.

@andrewrk
Copy link
Member

Thank you for taking the time to send this patch, and for explaining the context for it.

This zig parser exists primarily for this codebase - the one whose issue tracker we are commenting on. It is exposed in the standard library for the benefit of any tooling that wants to use the very same parser as the compiler itself. For some use cases, this may not be the best parser implementation to use. We have a fully-specified grammar should anyone want to create a custom parser that is designed to be used in an IDE plugin.

https://github.com/zigtools/zls is a third-party project. Bugs in that project are not tracked as issues in this project unless they are caused by an issue with the zig compiler itself. In fact, this issue tracker has an old issue open - #615 - for implementing a server in the compiler designed to work with IDEs. This will tackle some of the same problems, but in a different way. I could foresee this sub-project motivating similar changes in zig's main parser, but I do not wish to modify zig's main parser to better support a third party Language Server Protocol implementation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants