
Sourcify Tests #1305


Open

wants to merge 52 commits into main

Conversation

@mjoerussell (Contributor) commented Apr 15, 2025

Run e2e tests on real-world contracts using the Sourcify dataset. This will eventually replace the Sanctuary tests.

Still to-do:

  • Better diagnostic messages when parsing fails (restore messages that we get in Sanctuary tests)
  • Test for more import resolution corner cases
  • CI Job

github-actions added 18 commits March 31, 2025 08:12
The test runner checks the Sourcify manifest for a list of all shards for a particular chain, then downloads each sequentially, unpacks it, and finally tests each source file.

This is a very basic runner and there are a lot of improvements that can be made, plus a lot of missing features. But this is a pretty good start.
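
In outline, the flow looks something like this (an editorial sketch; every name below is an illustrative stand-in, not the runner's actual API):

```rust
// Sketch of the runner described above; all names are stand-ins.
struct Shard; // one archive listed in the Sourcify manifest

fn fetch_manifest(chain_id: u64) -> Vec<Shard> {
    // GET manifest.json and keep the shards for `chain_id`
    let _ = chain_id;
    Vec::new()
}

fn download_and_unpack(_shard: &Shard) -> Vec<String> {
    // download the shard archive and return its unpacked source files
    Vec::new()
}

fn run(chain_id: u64) {
    for shard in fetch_manifest(chain_id) {
        // shards are downloaded and unpacked sequentially
        for source in download_and_unpack(&shard) {
            // finally, parse/test each source file
            let _ = source;
        }
    }
}
```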
…imports

* Reuse a buffer when reading source files for efficiency
* Reorganize code
…e parallelized. Now they use proper iterators, and `Contract::Item` is a struct containing a file handle instead of the file contents. This means we can still use a shared buffer to read all of the files, improving performance, without having to borrow the string through several layers of iterators. That borrowing problem was what prompted me to move away from "real" iterators previously.

Like the Sanctuary tests, I'm using `rayon` to parallelize the runner. Here, contracts are processed in parallel, but the source files within a single contract are processed sequentially. Most contracts don't have a large number of source files, and parallelizing them as well would add unnecessary complexity.
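
The shape of that parallelism, as a minimal sketch (`Contract` here is a simplified stand-in for the real type):

```rust
use rayon::prelude::*;
use std::io::Read;

// Simplified stand-in for the real Contract type.
struct Contract {
    source_paths: Vec<std::path::PathBuf>,
}

fn test_contracts(contracts: &[Contract]) {
    // Contracts are processed in parallel...
    contracts.par_iter().for_each(|contract| {
        // ...while each worker reads that contract's files sequentially,
        // reusing a single buffer across all of them.
        let mut buffer = String::new();
        for path in &contract.source_paths {
            buffer.clear();
            if let Ok(mut file) = std::fs::File::open(path) {
                let _ = file.read_to_string(&mut buffer);
                // parse/test `buffer` here
            }
        }
    });
}
```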
…t them "in the background". This means less time waiting in-between shards, which matters more now that processing each shard is internally parallelized.

There could still be some improvements here, specifically with making the fetches truly async. Right now we're using `reqwest`, which requires `tokio` to be used as the async runtime. It's possible that we could find a different library that would allow us to make async calls using only the `futures` crate, which would be much better suited to our use case.
…of how imports/files are resolved when building a compilation unit, since the previous way wasn't working in practice.
* Allow the user to exclude partial_match contracts; they are still included by default
* Categorize contracts between full_match and partial_match
…ning up confusing code.

* Fetch archives in the main thread, and process them in a separate thread. This means that the thread doesn't have to take ownership of the `Repository` instance, and so that will get dropped at the correct time.
* Following that change, moved the fetching logic into a closure that we invoke immediately. This is all so that `tx` (the `Sender`) can be dropped before calling `process_thread.join()`. Otherwise, the processing thread would wait for a new message forever and never be joined (see the sketch after this list).
* Removing all of the custom iterators that I built for `ContractArchive` and `Contract`. The `Contract` iterators were no longer being used since I started building compilation units by traversing the import tree. The `ContractArchive` iterator was replaced by a function `ContractArchive::contracts`, which returns an `impl Iterator`. I thought that this would be a much simpler way to express this logic, especially since the other custom iterator was removed.
* Adding additional terminal output to inform the user about the progress of the test runner.
* Emit events report after testing is complete
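
A minimal sketch of that channel setup (the `ContractArchive` body and the fetch loop are placeholders; the point is the ownership of `tx`):

```rust
use std::sync::mpsc;
use std::thread;

struct ContractArchive; // stand-in for the real type

fn run() {
    let (tx, rx) = mpsc::channel::<ContractArchive>();

    // The processing thread drains the channel until every Sender is gone.
    let process_thread = thread::spawn(move || {
        for archive in rx {
            let _ = archive; // test the archive's contracts here
        }
    });

    // Fetch on the main thread inside an immediately-invoked closure that
    // takes ownership of `tx`, so the Sender is dropped the moment fetching
    // finishes. Without that drop, `rx` never disconnects and the join
    // below would block forever.
    (move |tx: mpsc::Sender<ContractArchive>| {
        for _ in 0..3 {
            if tx.send(ContractArchive).is_err() {
                break;
            }
        }
    })(tx);

    process_thread.join().unwrap();
}
```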

changeset-bot bot commented Apr 15, 2025

⚠️ No Changeset found

Latest commit: fb85549

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.


@mjoerussell (Contributor, Author):

Just ran a parsing test locally on the entire Ethereum mainnet. Here are the high-level stats:

Source Files          3,149,211
Contracts Passed        820,565
Contracts Failed              4
Unresolved Imports            0
  • 2 of the parse failures were related to multi-line comments. Example:
Error: Expected ContractKeyword or ImportKeyword or InterfaceKeyword or LibraryKeyword or PragmaKeyword.
     ╭─[Presscoins.sol:335:1]
     │
 335 │ ╭─▶ /** PressOnDemand is the first decentralized ondemand platform with in-Dapp store,
     ┆ ┆
 341 │ ├─▶   /*
     │ │
     │ ╰───────── Error occurred here.
─────╯
  • 2 of the parse failures were on function arguments which used the indexed keyword twice. Example:
Error: Expected CloseParen or Comma.
     ╭─[MEXPToken.sol:207:32]
     │
 207 │     event Burn(address indexed indexed from, uint256 value);
     │                                ─────────────┬─────────────
     │                                             ╰─────────────── Error occurred here.
─────╯

@mjoerussell mjoerussell marked this pull request as ready for review April 16, 2025 22:16
@mjoerussell mjoerussell requested a review from a team as a code owner April 16, 2025 22:16
@ggiraldez (Contributor) left a comment


I left some comments; I think the most important are the ones about error reporting for bindings and version inference.

Other than that, and more importantly, I struggled to follow the structure of the code; it's not immediately obvious from the code order and the current set of modules. In particular:

  • Contract seems to be a central structure, yet its definition and implementation are buried in sourcify.rs, along with code specific to the Sourcify data layout (which is expected, of course)
  • metadata.rs contains the JSON structures for the data returned by Sourcify, but the main file-resolution code also lives there
  • JSON API structs and internal structs are mixed together, and it's unclear which are constructed from JSON and which are not
  • the scaffolding and infrastructure code seems fine, except the test code itself is located in main.rs; I'd extract that to its own module
  • the API for Repository, Shard and ContractArchive is confusing; I left a comment with a suggestion for an alternative structure

github-actions added 13 commits April 18, 2025 09:53
…r/more coherent API. `Repository` is gone, and `Manifest` has become the "entrypoint" to fetching data from Sourcify.

1. First, call `Manifest::new` to fetch `manifest.json` and get descriptors of all the archives that can be tested with the current configuration.
2. Next, call `manifest.archives()` to get an iterator over all the `ContractArchives` that can be fetched. Each `ArchiveDescriptor` corresponds to one `ContractArchive`.
3. `ContractArchive`'s API is mostly unchanged; from there you can iterate over all the contracts in the unpacked archive and test each of them.

This is not _too_ different from the original API (if you squint), but it requires managing fewer structs and is overall cleaner.
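
Put together, usage looks roughly like this (the method names come from the steps above; `Config`, the error type, and `run_tests` are assumptions):

```rust
// Sketch only: signatures are guessed from the description above.
fn test_everything(config: &Config) -> Result<(), Error> {
    let manifest = Manifest::new(config)?;           // 1. fetch manifest.json
    for archive_desc in manifest.archives() {        // 2. one descriptor per archive
        let archive = ContractArchive::fetch(archive_desc)?;
        for contract in archive.contracts() {        // 3. iterate unpacked contracts
            run_tests(&contract);                    //    hypothetical test hook
        }
    }
    Ok(())
}
```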
…ort resolution + holding the `target` and `version` fields for a contract. Instead, I've moved those fields onto `Contract` itself and renamed `ContractMetadata` to `ImportResolver`. This should simplify things and make it clearer what's responsible for what.
…ug()` instead of `!remap.has_known_bug()`.

Refactoring `import_resolver.rs` a bit
…just allocate a new string for the file contents instead.
…s are being written to the project's `target` directory
…t of these file reads are happening when printing error diagnostics, which is not something we want to early-return on. If a file read fails, we can simply skip reporting that error and try the next one.
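
In sketch form, the reporting loop now skips unreadable files rather than propagating the error (names are illustrative):

```rust
use std::fs;
use std::path::PathBuf;

fn report_diagnostics(files_with_errors: &[PathBuf]) {
    for path in files_with_errors {
        // If a file can't be read, skip this diagnostic and try the next
        // one instead of early-returning out of the whole report.
        let Ok(contents) = fs::read_to_string(path) else {
            continue;
        };
        let _ = contents; // render the diagnostic against `contents` here
    }
}
```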
@ggiraldez (Contributor) left a comment


Thank you for the refactors. The code is a lot clearer now. I still left a couple more questions and requests.

Comment on lines 43 to 44
// Sometimes imports from URL-imports don't share the URL prefix
self.get_real_name(import_path)
@ggiraldez (Contributor):

Q: Shouldn't this case be covered by the check for path_is_url(import_path) above?

@mjoerussell (Contributor, Author):

The first check is for when the import path is a URL; the highlighted check is for when the source path is a URL but the import path isn't.

@ggiraldez (Contributor):

Right, but if the source is a URL, then the import path should either be:

  • a relative path, which should be resolved correctly in resolve_relative_url_import above, or
  • an absolute URL path/@-path which should be covered by the ifs above

If we got here, it means the import is an absolute path (or a relative path that couldn't be resolved correctly), which for a URL source doesn't make sense to try to resolve as a local path. This may very well be a quirk with solc, but that's why it got my attention.

@mjoerussell (Contributor, Author):

Not necessarily. I think I misunderstood what you were asking before, so let me explain the issue:

From what I've seen, there are two possibilities when resolving a relative import path inside a URL source. Say the source path is https://github.com/@library/a/b/main.sol and the import path is ../c/other.sol. The import's virtual path might be either of these:

  1. https://github.com/@library/a/c/other.sol
  2. a/c/other.sol

The call to resolve_relative_url_import produces the first version: it does the regular path resolution, ignoring the host, and then attaches the host to the result. The highlighted fallback is for the second case, which is fairly rare but does happen; I haven't noticed a pattern as to why.
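
In sketch form, my reading of that order of checks (only `resolve_relative_url_import` and `get_real_name` are real names from the PR; the method and its signature are hypothetical):

```rust
impl ImportResolver {
    // Hypothetical reconstruction, for illustration only.
    fn resolve_url_source_import(&self, source_path: &str, import_path: &str) -> Option<String> {
        // Case 1: regular path resolution ignoring the host, with the host
        // re-attached -> https://github.com/@library/a/c/other.sol
        if let Some(resolved) = self.resolve_relative_url_import(source_path, import_path) {
            return Some(resolved);
        }
        // Case 2 (rare): the file lives under a host-less virtual path,
        // e.g. a/c/other.sol, so fall back to a lookup by "real" name.
        self.get_real_name(import_path)
    }
}
```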

Comment on lines 326 to 328
fs::read_dir(&self.sources_path)
.map(|i| i.count())
.unwrap_or(0)
@ggiraldez (Contributor):

You should be able to get this value from the import_resolver's source_maps and avoid the I/O operation.
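
A sketch of the suggested change, assuming the count lives on `Contract` and that `source_maps` is map-like (both assumptions):

```rust
impl Contract {
    fn source_count(&self) -> usize {
        // Before: fs::read_dir(&self.sources_path).map(|i| i.count()).unwrap_or(0)
        // After: reuse the in-memory data the resolver already built.
        self.import_resolver.source_maps.len()
    }
}
```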

@mjoerussell mjoerussell requested a review from ggiraldez April 23, 2025 21:25
@ggiraldez (Contributor) left a comment


Looks great! I left one more question, but it's non-blocking IMO.

let fetcher = |t: std::sync::mpsc::Sender<ContractArchive>| {
    for archive_desc in manifest.archives() {
        let Ok(archive) = ContractArchive::fetch(archive_desc) else {
            continue;
@ggiraldez (Contributor):

Should we somehow indicate the I/O error? Otherwise it's possible the run succeeds, but in reality it didn't run any tests.
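
For example, a minimal way to surface it (assuming the fetch error implements Display; the message format is illustrative):

```rust
let fetcher = |t: std::sync::mpsc::Sender<ContractArchive>| {
    for archive_desc in manifest.archives() {
        let archive = match ContractArchive::fetch(archive_desc) {
            Ok(archive) => archive,
            Err(err) => {
                // Report the I/O failure instead of skipping silently, so a
                // run that fetched nothing can't look like a clean pass.
                eprintln!("failed to fetch archive: {err}");
                continue;
            }
        };
        let _ = t.send(archive);
    }
};
```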
