proposal: improved compiler test suite (compiler errors, run and compare, translate-c, etc.) #11288
Comments
A few notes:
Note that for
I don't think we should have any notion of test dependencies. All tests should be able to be run independently from each other. I see that you addressed this in more detail in the "Explicit Test Dependencies" section. If you feel strongly that this is useful and helpful then I'm fine with going with it. My only resistance is that it's a lot of meta work for dubious benefits. I'm just imagining myself reviewing a big PR that tries to add test dependencies all over the place and then a bunch of people, who contribute suspiciously seldom, bikeshedding over it.
Failing backends should be excluded; the default should be all backends. This follows the philosophy of the existing tests (see lines 668 to 721 in 2af6971).
It exists, and it is implemented everywhere except for the part where the stage2 compiler does not support
Default output mode for error tests should be `obj`.
The compile errors should go at the end of the file so that adding/removing lines in the expected errors does not shift the other error lines up and down. To keep it consistent, I suggest that the entire manifest go at the very end of the file and consist of a run of line comments at the end of the file. This also frees us up to test doc comments such as the `//!` container doc comments.
If we want to be able to move our existing CLI tests to this mechanism then we need a way to check the following things:
See test/cli.zig for more details. All these tests currently have bespoke logic for testing these parts of the CLI. Perhaps the best option would be to leave CLI testing alone, since the common case of exit 0 is already handled better through the other test mechanisms described in this proposal.
I think a valid solution for this is to support
Let's drop it unless @matu3ba can convince you otherwise!
How about this wording: The one thing I'm thinking is, for compiler errors, we've already changed the wording of, for example, out-of-bounds messages to be much more explicit in stage2, so we don't want those to always run on all backends.
Ack.
Greatttttt point. Updating this now.
Agreed, let's leave them alone for now, and then reconsider later.
Just as a heads up, the way this is handled in #11284 right now is that the container doc comments are simply not considered part of the source, so they don't affect line numbers. I have no problem with either direction - it was just the choice I made at the time to get things up and running.
It'd be much easier to not modify files, because then we can pass them as-is into the compiler, specifically around test modes that require subprocessing. If you have no problem either way, I'd say let's move them to the end and make them normal comments (`//`).
I am fine with either statement. My intention was to clarify the situation.
This does not explain if, and if so how many, processes are spawned to isolate error sources when using a thread pool.
The test output should then be defined. The error trace is buffered somewhere with allocations, so the behavior for OOM should be specified. Question on embedded manifests:
Rephrase to mention that the error may occur while compiling or while running the binary.
Edge cases: Is calling a test
Please specify what constitutes success and what constitutes error, for completeness and consistency.
I'd say that this is a bespoke format for a bespoke use case and it doesn't belong as-is in libstd since it isn't a general purpose thing. It also isn't a particularly complex task. The tokenizer/parser is in libstd, you could build it yourself. That would be my opinion, anyways. Other projects (such as Go) do not expose their compiler test "manifest"-equivalent parsing into the standard library and have historically said the same thing: you have the parser available, build it yourself.
Good point. I would recommend outputting failures as soon as they occur so we don't have to deal with buffering and memory allocations for that (beyond the buffering necessary for building up the error message/trace for that single failure). What does the normal test runner promise? I agree that specifying the OOM behavior here is a good idea.
Defer to @andrewrk for clarification here, but my interpretation was that we'd use multiple threads in a single process. These tests aren't expected to have any side effects (and it is a bug if they do). The compiler itself is the same way. If there are heisenbugs/flaky tests, we chase them down as we normally do with that class of bug.
👍
Good clarifications, will make some edits. My proposal:
My reasoning is just to avoid two ways to do the same thing. There is only one way to do incremental tests and that is having an explicit numeric suffix.
I did in the first sentence, but I'll repeat it for consistency and being explicit.
The current translate-c tests do substring matching when checking for specific translations, because some predefined macros and constants can vary depending on the target. For example, with a 32-bit `long` you get `pub const __LONG_MAX__ = @as(c_long, 2147483647);`, but with a 64-bit `long` you get `pub const __LONG_MAX__ = @import("std").zig.c_translation.promoteIntLiteral(c_long, 9223372036854775807, .decimal);`. Even if we force the tests to always use a specific target, the double-underscore namespace is reserved for the compiler (clang in this case), which means an LLVM version update could add/remove those types of constants, requiring all the tests to be updated even if nothing really changed.
Thanks @ehaas, I missed that bit of nuance. I think in that case, then, we should cleanly separate those. For the others, they should specify a manifest with the expected output, and it should follow the same substring-matching semantics as today. Thoughts on that? Question then, since
Note that when you do
I'm happy to do an early merge - I didn't look at it yet because you said you wanted to push some more changes today, something about directories. Generally I think this proposal can and should be rolled out with incremental enhancements such as the one in your PR.
👍
This already exists and is already observed by the test harness :-)
It seems reasonable to have the same manifest syntax in both .c source files and .zig source files to me.
I think embedding the manifest in C would be fine. One thing to note is that it's not always matching against one string; sometimes the test runner searches for multiple strings in the output, so we'd need a good syntax for that. Example here
Got it, I've updated the proposal to support that. The manifest would look like the following. Each case would be separated by a line with only a `,`:
@andrewrk Yep, I've split out
Done.
Now that #11530 is merged, let's break this proposal into separate, independently completable issues for each task that is left.
Alright, I'm closing this. Anyone is free to extract the unsolved issues out of this proposal into separately completable issues 👍
The main problems this addresses are:

- Diffs are noisy. Datamodel tests involve large string literals, and very little significant information otherwise. In a single file, even for simple changes like reordering or renaming tests, or changing indentation, the diffs get unreadable quickly, as the large blocks of Prisma schemas and validation errors get mixed with the test function names and calls to test helpers. This state of things disincentivizes reorganizing tests. The new setup has one test per file, with the test name being the file name, including its full path from the validation tests root.
- Churn. We often need to subtly adjust many tests to account for trivial changes in the public APIs, when all we want to do is assert that a certain schema is valid, or invalid with a given error. This prevents us from evolving the datamodel API as much as we would like, and as a consequence of these changes being a lot of work, we have a lot of inconsistency in how tests are written and which test helpers are used. This has manifested recently in difficulties simply moving tests, without changing them, from `libs/datamodel/core` to `psl/psl`. With this test suite, we rely on a lot less context, setup and noise (variable names for expectations). We don't expect the shape of these tests to evolve much, churn should be minimal, and if we need to change something, it will be possible to autofix through UPDATE_EXPECT=1.
- The test suite gets slower to compile as we add more tests, and that noticeably hurts our feedback loops. This setup is less Rust code to compile: each test case translates to a single test function containing a single synchronous function call. We can transparently make test runs compile no Rust code at all in the future if we need to.

Prior art:

- In prisma-engines: we already did that in introspection engine tests in commit ef84679.
- In other compiler-like projects, this is a common approach. A few examples:
  - go compiler (https://github.com/golang/go/blob/master/test/assign1.go)
  - rustc UI tests (https://github.com/rust-lang/rust/tree/master/src/test/ui)
  - zig compiler (ziglang/zig#11288)

Next steps: same setup for reformatting tests.
Requirement for #89
Proposal
An improved test harness for the compiler that supports the following:
- `run` - Build and run to completion and compare output.
- `error` - Build and expect specific error output.
- `translate-c` - Build a `.c` file and assert an expected `.zig` file is produced. The Zig file can further optionally define a subsequent test to run (i.e. `run`) so you can translate C and then optionally assert the proper behavior also happens.
- `incremental` - Build and incrementally update a program. Each incremental step is itself a test case so that we can support incremental updates introducing errors, and a subsequent update fixing the errors. [stage2-only]
- `cli` - Run a specific `zig` CLI command (typically `zig build test`) to test the Zig CLI or Zig build machinery.

This will be built on the existing `test-stage2` harness (`src/test.zig`) and won't be a totally new implementation. This is almost purely iteratively improving the existing harness rather than writing anything new. The tests will continue to be invoked using `zig build test-stage2` or similar targets.

NOTE: Some of these are not new features, but features that are currently unimplemented in the stage2 test harness or that just need slight improvement there.
Backwards Compatibility
A limited amount of backwards compatibility with the existing compiler test API will be retained so we can incrementally introduce these test cases. In some cases, certain test types such as stack trace tests and certain native backend tests will continue to use the Zig API and will not support the file-based definition format. Therefore, we will continue to support the necessary subset of the Zig API.
Note that changes to these tests will not gain the benefits of the above harness. For example, changes will trigger a full rebuild. They won’t be parallelized as well, etc. However, these tests tend to have very few cases and/or are updated very infrequently.
Test Definitions
Tests are now defined using files within a directory structure rather than a Zig API. This has a number of benefits.
Directory Structure
The directory structure has no meaning to the test definitions. Any test can go into any directory, and multiple tests can go into a single directory. The test harness determines test mode and settings using the manifest (defined below). Why? Originally, Andrew and I felt the directory structure should imply some stuff such as backend (stage1 vs stage2). But we pulled back on this initial plan because many tests should behave the same on stage1 AND stage2, and incremental tests want to test multiple test modes.
However, we may wish to enforce directory idioms, such as all stage1-exclusive tests going into a `tests/stage1` directory. The test harness itself will not care about these details.

The name of a test is determined by its filename. It is an error to have duplicate test names. Therefore, two tests named `foo.zig` and `foo.S` would produce an error since they are both named "foo" despite having different extensions.

Test Manifests
Manifest Syntax
The first line of the manifest is the test type.
Subsequent non-empty lines are `key=value` configuration for that test type.

An empty line followed by additional data is "trailing" configuration that is dependent on the test type. For example, for `error` tests, it defines the expected error output to match.
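Putting those pieces together, a hypothetical manifest might look like the following; this is a sketch of the shape described above, and the error line text is illustrative, not finalized syntax:

```
error
backends=stage2

:4:13: error: expected type 'u32', found 'bool'
```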
Manifest Location

The test manifest can be either embedded or adjacent. Only Zig and C files support embedded manifests.
A manifest must have a test. However, a test does not require a manifest. There are a handful of implicit manifest conditions defined in a later section.
For embedded manifests, the manifest is the last comment in a Zig or C file. The comment is not stripped for test types that compile the source. Example, filename `my_test.zig`:
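A minimal sketch, assuming the manifest is the trailing run of line comments as discussed in the comments above (the program body is illustrative):

```zig
// A trivial test: must compile and exit with code 0.
pub fn main() !void {}

// run
```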
Manifests can also be defined adjacent to a test. In this case, the manifest file must have the same filename as the test file and end in the `.manifest` suffix. Adjacent manifests are supported specifically for non-Zig tests and multi-file or incremental tests.

Example, the pair `my_test.zig` and `my_test.manifest`:
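A sketch of such a pair (contents illustrative):

`my_test.zig`:

```zig
const std = @import("std");

pub fn main() !void {
    // Trailing manifest data is matched against stdout.
    try std.io.getStdOut().writer().writeAll("hello\n");
}
```

`my_test.manifest`:

```
run

hello
```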
Implicit Manifests
Whilst a manifest requires a test, a test does not require a manifest. If a manifest is not found for a test file, a default `run` manifest is assumed (the test must compile and exit with exit code 0).

Incremental Tests
Tests that exercise the stage2 compiler's incremental compilation have an additional special filename format: each incremental compilation step is suffixed numerically to denote ordering.
For example, `foo_0.zig` and `foo_1.zig` would denote that the test named "foo" is incremental and will run test case 0 followed by test case 1 within the same stage2 compilation context. Each individual test case (`foo_0`, `foo_1`) can define its own test mode. This enables us to test that the incremental compiler handles errors after successes and so on (see the sketch after the list below).

For naming, the following test names will be created and filterable:

- `foo` - the incremental sequence and all cases within foo.

It is an error for any of the following conditions:
- Holes in the sequence: you cannot have a `foo_0` and a `foo_2` without a `foo_1`. Likewise, you cannot have a `foo_1` without a `foo_0`.
- Having both a `foo_0.zig` and a `foo.zig`. This is an error since it would create a duplicate test name "foo" whilst also being unclear to contributors.
- Omitting the `_0` suffix. The `_0` is not implied. For example, `foo.zig` and `foo_1.zig` do not comprise an incremental test case; the harness will see `_1` as an incremental case with a hole missing `_0` and error.
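A hypothetical two-step incremental test, assuming the embedded comment-manifest format described above, where step 0 expects a compile error and step 1 fixes it (the error text is illustrative):

`foo_0.zig`:

```zig
pub fn main() !void {
    // 'std' is never imported, so this step expects a compile error.
    std.debug.print("hello\n", .{});
}

// error
//
// error: use of undeclared identifier 'std'
```

`foo_1.zig`:

```zig
const std = @import("std");

pub fn main() !void {
    // The import is added, so this step must now build and exit 0.
    std.debug.print("hello\n", .{});
}

// run
```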
Test Execution and Ordering
The test harness will do the following at a very high level:
There is no guarantee on test ordering.
Test failures are output to the console immediately when they happen. Failures are not buffered in memory except for the minimal amount needed to build the error message. In the case of OOM, a message will be output with the name of the test that failed but no further information (if it reached this point, the filename is already in memory).
Backends
Backends are specified using the `backends` manifest configuration option, which is available for all test types. This is a comma-separated list of the backends to test against. A backend can be prefixed with a `!` to test all backends but exclude that specific backend.

Supported backends:

- `stage1`
- `stage2` - Stage2 using the default backend (LLVM or native).
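For example, a manifest that opts out of stage1 might look like this sketch of the syntax above:

```
run
backends=!stage1
```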
Native Backends

There are a handful of tests for the native backends. These primarily test things that the behavior tests already cover. According to Andrew, they're historically only there for backends before they are able to execute behavior tests using at least the simplified test runner. Therefore, the plan for now is to remove these since they're covered by behavior tests, or keep them using the old Zig API.
Test Filtering
The `-Dtest-filter` build option will be used to filter tests by name.

Different test targets from the `zig build` command will be used to filter by test types, i.e. error tests vs. run tests.

Test Types
Run Tests
Run tests build an exe and run it to completion. Any exit code other than zero is a test failure. Additionally, output can be specified and it will be asserted to match the output on stdout byte for byte.
The following type-specific manifest options are supported:
- `translate-c` (default: false) - If non-empty, C files will be processed through `translate-c` prior to being run. Without this, any C files will use the Zig compiler frontend for C (clang at the time of writing).

Manifest trailing data is the stdout data to assert. stderr is not matched.
Run tests use the suffix of the associated test file to perform the proper build. For example, `.zig` files go through the normal Zig compilation pipeline; `.S` files are assembled and linked.

Example, a run test that just asserts exit code 0 and ignores all output:
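A sketch, assuming the embedded comment-manifest format (note the output goes to stderr, which is not matched):

```zig
const std = @import("std");

pub fn main() !void {
    // std.debug.print writes to stderr; run tests only match stdout.
    std.debug.print("noise\n", .{});
}

// run
```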
Example, a run test that asserts output in addition to the exit code being 0:
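A sketch, assuming the trailing manifest data is matched against stdout byte for byte:

```zig
const std = @import("std");

pub fn main() !void {
    try std.io.getStdOut().writer().writeAll("hello\n");
}

// run
//
// hello
```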
Error Tests
Error tests build an exe, obj, or lib and assert that specific errors exist. This only tests errors that happen during compilation, not during runtime. The built exe (if the output mode is exe) is not executed. Runtime errors should be checked using the `run` test type with a panic handler.

The compiler is expected to error. If the compiler is subprocessed (i.e. testing stage1), then a non-zero exit code is expected.
The following type-specific manifest options are supported:
- `output_mode` (default: `obj`) - one of `exe`, `lib`, `obj`. This is the compilation output mode.
- `is_test` (default: false) - non-empty specifies this has `is_test` set on the compilation mode.

The manifest trailing data is the error output to assert. One error is expected per line, and will be matched using a string subset match. The subset match lets you omit things like the filename. Non-empty trailing data is required.
Example:
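A sketch of a default (`obj`) error test; the exact compiler error text is illustrative:

```zig
export fn foo() void {
    // A string literal cannot coerce to u32, so this must fail to compile.
    var x: u32 = "hello";
    _ = x;
}

// error
//
// :3:18: error: expected type 'u32', found '*const [5:0]u8'
```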
Example that specifies an output mode:
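A sketch that builds as an exe instead (error text illustrative):

```zig
pub fn main() !void {
    // main is analyzed in exe mode, so this compile error fires.
    @compileError("this exe cannot build");
}

// error
// output_mode=exe
//
// error: this exe cannot build
```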
Example that tests errors in test mode:
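A sketch using `is_test` (the value `1` simply satisfies "non-empty"; error text illustrative):

```zig
test "types must match" {
    // bool cannot coerce to u32, so analyzing this test must fail.
    const x: u32 = true;
    _ = x;
}

// error
// is_test=1
//
// :3:20: error: expected type 'u32', found 'bool'
```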
Translate C Tests
`translate-c` tests check that a C file is translated into a specific Zig file by testing one or more substring matches against the resulting Zig source after translation.

This test type has no type-specific configuration.
The manifest trailing data is one or more sets of strings to compare against. Each set is separated from the next by a line containing only a comma.
Example of a single match:
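A sketch of a C test with an embedded manifest (embedding in C was agreed on in the comments above; the expected translation is illustrative):

```c
#define FOO 42

// translate-c
//
// pub const FOO = @as(c_int, 42);
```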
Example of multiple matches, in which case every match must be found:
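A sketch with two match sets, separated by a line containing only a comma (translations illustrative):

```c
#define FOO 1
#define BAR 2

// translate-c
//
// pub const FOO = @as(c_int, 1);
// ,
// pub const BAR = @as(c_int, 2);
```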
Deprecated or Ignored Test Types
The following test types will be deprecated since they are just special cases of the file-definable test types. In all cases, these tests are simply migrated, not lost.
- `assemble_and_link` - This is just the `run` test case above where the run input supports assembly files.
- `runtime_safety` - This is a special case of `run`.

The following test types are ignored and kept as-is. They are tests that aren't frequently updated and don't have many cases:
- `stack_traces`
- `standalone` - This looks like something we can convert to this new test suite later, but it is ignored for now due to some complexities and details that we should probably iron out in a dedicated issue.
- This looks like something we can convert to this new test suite later, but is ignored for now due to some complexities and details that we should probably iron out in a dedicated issue.Abandoned Ideas
The ideas below were initially discussed or existed but ultimately abandoned:
Explicit Test Dependencies
One configuration setting that can be specified in the manifest for all test types is an `after` setting. This is a comma-separated list of tests that you want executed before your test.

This is optional. Tests should not have side effects, and therefore the ordering should not matter. However, this feature can be used to point to other tests that cover functionality that is used but not tested by the current test, thereby optimizing test suite speed.
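A sketch of how this might have looked in a manifest (hypothetical, since the idea was abandoned; `parser_basics` is a made-up test name):

```
run
after=parser_basics
```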
CLI Tests
CLI tests execute specific `zig` CLI commands and assert exit code zero.

CLI tests have no type-specific manifest options.
The manifest trailing data specifies one command to run per line (all assumed to be subcommands of `zig`). This lets you test multiple targets separately. They are run in order. If no trailing data is specified, `zig build test` is run.

CLI tests can run against any Zig binary, so these are a good way to test consistent behavior against both a stage1 and a stage2 binary build.
Examples would include running with a custom target, running a single file, and running plus building; a sketch follows.
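A hypothetical manifest for a CLI test; since this idea was abandoned, the exact format was never pinned down, and the commands and flag below are illustrative (though `zig build test` and `zig run` are real subcommands):

```
cli

build test -Dtarget=x86_64-linux
run hello.zig
```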