
Sourcify Tests #1305


Open

wants to merge 52 commits into main

Conversation

@mjoerussell (Contributor) commented Apr 15, 2025

Run e2e tests on real-world contracts using the Sourcify dataset. This will eventually replace the Sanctuary tests.

Still to-do:

  • Better diagnostic messages when parsing fails (restore messages that we get in Sanctuary tests)
  • Test for more import resolution corner cases
  • CI Job

github-actions added 18 commits March 31, 2025 08:12
The test runner checks the Sourcify manifest for a list of all shards for a particular chain, then downloads each sequentially, unpacks it, and finally tests each source file.

This is a very basic runner and there are a lot of improvements that can be made, plus a lot of missing features. But this is a pretty good start.
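
In outline, the flow looks something like this (an editorial sketch; every name below is an illustrative stand-in, not the runner's actual API):

```rust
// Sketch of the runner described above; all names are stand-ins.
struct Shard; // one archive listed in the Sourcify manifest

fn fetch_manifest(chain_id: u64) -> Vec<Shard> {
    // GET manifest.json and keep the shards for `chain_id`
    let _ = chain_id;
    Vec::new()
}

fn download_and_unpack(_shard: &Shard) -> Vec<String> {
    // download the shard archive and return its unpacked source files
    Vec::new()
}

fn run(chain_id: u64) {
    for shard in fetch_manifest(chain_id) {
        // shards are downloaded and unpacked sequentially
        for source in download_and_unpack(&shard) {
            // finally, parse/test each source file
            let _ = source;
        }
    }
}
```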
…imports

* Reuse a buffer when reading source files for efficiency
* Reorganize code
…e parallelized. Now they use proper iterators, and `Contract::Item` is a struct containing a file handle instead of the file contents. This means we can still use a shared buffer to read all of the files, improving performance, without having to borrow the string through several layers of iterators. That borrowing problem was what prompted me to move away from "real" iterators previously.

Like the Sanctuary tests, I'm using `rayon` to parallelize the runner. Here, contracts are processed in parallel, but the source files within a single contract are processed sequentially. Most contracts don't have a large number of source files, and parallelizing them as well would add unnecessary complexity.
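
The shape of that parallelism, as a minimal sketch (`Contract` here is a simplified stand-in for the real type):

```rust
use rayon::prelude::*;
use std::io::Read;

// Simplified stand-in for the real Contract type.
struct Contract {
    source_paths: Vec<std::path::PathBuf>,
}

fn test_contracts(contracts: &[Contract]) {
    // Contracts are processed in parallel...
    contracts.par_iter().for_each(|contract| {
        // ...while each worker reads that contract's files sequentially,
        // reusing a single buffer across all of them.
        let mut buffer = String::new();
        for path in &contract.source_paths {
            buffer.clear();
            if let Ok(mut file) = std::fs::File::open(path) {
                let _ = file.read_to_string(&mut buffer);
                // parse/test `buffer` here
            }
        }
    });
}
```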
…t them "in the background". This means less time waiting in-between shards, which matters more now that processing each shard is internally parallelized.

There could still be some improvements here, specifically with making the fetches truly async. Right now we're using `reqwest`, which requires `tokio` to be used as the async runtime. It's possible that we could find a different library that would allow us to make async calls using only the `futures` crate, which would be much better suited to our use case.
…of how imports/files are resolved when building a compilation unit, since the previous way wasn't working in practice.
* Allow the user to exclude partial_match contracts; they are still included by default
* Categorize contracts between full_match and partial_match
…ning up confusing code.

* Fetch archives in the main thread, and process them in a separate thread. This means that the thread doesn't have to take ownership of the `Repository` instance, and so that will get dropped at the correct time.
* Following that change, moved the fetching logic into a closure that we invoke immediately. This is all so that `tx` (the `Sender`) can be dropped before calling `process_thread.join()`. Otherwise, the processing thread would wait for a new message forever and never be joined (see the sketch after this list).
* Removing all of the custom iterators that I built for `ContractArchive` and `Contract`. The `Contract` iterators were no longer being used since I started building compilation units by traversing the import tree. The `ContractArchive` iterator was replaced by a function `ContractArchive::contracts`, which returns an `impl Iterator`. I thought that this would be a much simpler way to express this logic, especially since the other custom iterator was removed.
* Adding additional terminal output to inform the user about the progress of the test runner.
* Emit events report after testing is complete
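
A minimal sketch of that channel setup (the `ContractArchive` body and the fetch loop are placeholders; the point is the ownership of `tx`):

```rust
use std::sync::mpsc;
use std::thread;

struct ContractArchive; // stand-in for the real type

fn run() {
    let (tx, rx) = mpsc::channel::<ContractArchive>();

    // The processing thread drains the channel until every Sender is gone.
    let process_thread = thread::spawn(move || {
        for archive in rx {
            let _ = archive; // test the archive's contracts here
        }
    });

    // Fetch on the main thread inside an immediately-invoked closure that
    // takes ownership of `tx`, so the Sender is dropped the moment fetching
    // finishes. Without that drop, `rx` never disconnects and the join
    // below would block forever.
    (move |tx: mpsc::Sender<ContractArchive>| {
        for _ in 0..3 {
            if tx.send(ContractArchive).is_err() {
                break;
            }
        }
    })(tx);

    process_thread.join().unwrap();
}
```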

changeset-bot bot commented Apr 15, 2025

⚠️ No Changeset found

Latest commit: fb85549

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.


@mjoerussell (Contributor, Author):

Just ran a parsing test locally on the entire Ethereum mainnet. Here are the high-level stats:

Source Files          3,149,211
Contracts Passed        820,565
Contracts Failed              4
Unresolved Imports            0
  • 2 of the parse failures were related to multi-line comments. Example:
Error: Expected ContractKeyword or ImportKeyword or InterfaceKeyword or LibraryKeyword or PragmaKeyword.
     ╭─[Presscoins.sol:335:1]
     │
 335 │ ╭─▶ /** PressOnDemand is the first decentralized ondemand platform with in-Dapp store,
     ┆ ┆
 341 │ ├─▶   /*
     │ │
     │ ╰───────── Error occurred here.
─────╯
  • 2 of the parse failures were on function arguments which used the indexed keyword twice. Example:
Error: Expected CloseParen or Comma.
     ╭─[MEXPToken.sol:207:32]
     │
 207 │     event Burn(address indexed indexed from, uint256 value);
     │                                ─────────────┬─────────────
     │                                             ╰─────────────── Error occurred here.
─────╯

@mjoerussell mjoerussell marked this pull request as ready for review April 16, 2025 22:16
@mjoerussell mjoerussell requested a review from a team as a code owner April 16, 2025 22:16
@ggiraldez (Contributor) left a comment


I left some comments; I think the most important are the ones about error reporting for bindings and version inference.

Other than that, and more importantly, I struggled to follow the structure of the code; it's not immediately obvious from the code order and the current set of modules. In particular:

  • Contract seems to be a central structure, yet its definition and implementation are buried in sourcify.rs, along with code specific to the Sourcify data layout (which is expected, of course)
  • metadata.rs contains the JSON structures for the data returned by Sourcify, but the main file-resolution code also lives there
  • JSON API structs and internal structs are mixed together, and it's unclear which are constructed from JSON and which are not
  • the scaffolding and infrastructure code seems fine, except the test code itself is located in main.rs; I'd extract that to its own module
  • the API for Repository, Shard and ContractArchive is confusing; I left a comment with a suggestion for an alternative structure

github-actions added 13 commits April 18, 2025 09:53
…r/more coherent API. `Repository` is gone, and `Manifest` has become the "entrypoint" to fetching data from Sourcify.

1. First, call `Manifest::new` to fetch `manifest.json` and get descriptors of all the archives that can be tested with the current configuration.
2. Next, call `manifest.archives()` to get an iterator over all the `ContractArchives` that can be fetched. Each `ArchiveDescriptor` corresponds to one `ContractArchive`.
3. `ContractArchive`'s API is mostly unchanged; from there you can iterate over all the contracts in the unpacked archive and test each of them.

This is not _too_ different from the original API (if you squint), but it requires managing fewer structs and is overall cleaner.
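
Put together, usage looks roughly like this (the method names come from the steps above; `Config`, the error type, and `run_tests` are assumptions):

```rust
// Sketch only: signatures are guessed from the description above.
fn test_everything(config: &Config) -> Result<(), Error> {
    let manifest = Manifest::new(config)?;           // 1. fetch manifest.json
    for archive_desc in manifest.archives() {        // 2. one descriptor per archive
        let archive = ContractArchive::fetch(archive_desc)?;
        for contract in archive.contracts() {        // 3. iterate unpacked contracts
            run_tests(&contract);                    //    hypothetical test hook
        }
    }
    Ok(())
}
```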
…ort resolution + holding the `target` and `version` fields for a contract. Instead, I've moved those fields onto `Contract` itself and renamed `ContractMetadata` to `ImportResolver`. This should simplify things and make it clearer what's responsible for what.
…ug()` instead of `!remap.has_known_bug()`.

Refactoring `import_resolver.rs` a bit
…just allocate a new string for the file contents instead.
…s are being written to the project's `target` directory
…t of these file reads are happening when printing error diagnostics, which is not something we want to early-return on. If a file read fails, we can simply skip reporting that error and try the next one.
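
In sketch form, the reporting loop now skips unreadable files rather than propagating the error (names are illustrative):

```rust
use std::fs;
use std::path::PathBuf;

fn report_diagnostics(files_with_errors: &[PathBuf]) {
    for path in files_with_errors {
        // If a file can't be read, skip this diagnostic and try the next
        // one instead of early-returning out of the whole report.
        let Ok(contents) = fs::read_to_string(path) else {
            continue;
        };
        let _ = contents; // render the diagnostic against `contents` here
    }
}
```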
@ggiraldez (Contributor) left a comment


Thank you for the refactors. The code is a lot clearer now. I still left a couple more questions and requests.

Comment on lines 43 to 44
// Sometimes imports from URL-imports don't share the URL prefix
self.get_real_name(import_path)
@ggiraldez (Contributor):

Q: Shouldn't this case be covered by the check for path_is_url(import_path) above?

@mjoerussell (Contributor, Author):

The first check is for when the import path is a URL; the highlighted check is for when the source path is a URL but the import path isn't.

@ggiraldez (Contributor):

Right, but if the source is a URL, then the import path should either be:

  • a relative path, which should be resolved correctly in resolve_relative_url_import above, or
  • an absolute URL path/@-path which should be covered by the ifs above

If we got here, it means the import is an absolute path (or a relative path that couldn't be resolved correctly), which for a URL source doesn't make sense to try to resolve as a local path. This may very well be a quirk with solc, but that's why it got my attention.

@mjoerussell (Contributor, Author):

Not necessarily. I think I misunderstood what you were asking before, so let me explain the issue:

From what I've seen, there are two possibilities when resolving a relative import path inside a URL source. Say the source path is https://github.com/@library/a/b/main.sol and the import path is ../c/other.sol. The import's virtual path might be either of these:

  1. https://github.com/@library/a/c/other.sol
  2. a/c/other.sol

The call to resolve_relative_url_import produces the first version: it does the regular path resolution, ignoring the host, and then attaches the host to the result. The highlighted fallback is for the second case, which is fairly rare but does happen; I haven't noticed a pattern as to why.
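
In sketch form, my reading of that order of checks (only `resolve_relative_url_import` and `get_real_name` are real names from the PR; the method and its signature are hypothetical):

```rust
impl ImportResolver {
    // Hypothetical reconstruction, for illustration only.
    fn resolve_url_source_import(&self, source_path: &str, import_path: &str) -> Option<String> {
        // Case 1: regular path resolution ignoring the host, with the host
        // re-attached -> https://github.com/@library/a/c/other.sol
        if let Some(resolved) = self.resolve_relative_url_import(source_path, import_path) {
            return Some(resolved);
        }
        // Case 2 (rare): the file lives under a host-less virtual path,
        // e.g. a/c/other.sol, so fall back to a lookup by "real" name.
        self.get_real_name(import_path)
    }
}
```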

Comment on lines 326 to 328
fs::read_dir(&self.sources_path)
.map(|i| i.count())
.unwrap_or(0)
@ggiraldez (Contributor):

You should be able to get this value from the import_resolver's source_maps and avoid the I/O operation.
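
A sketch of the suggested change, assuming the count lives on `Contract` and that `source_maps` is map-like (both assumptions):

```rust
impl Contract {
    fn source_count(&self) -> usize {
        // Before: fs::read_dir(&self.sources_path).map(|i| i.count()).unwrap_or(0)
        // After: reuse the in-memory data the resolver already built.
        self.import_resolver.source_maps.len()
    }
}
```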

@mjoerussell mjoerussell requested a review from ggiraldez April 23, 2025 21:25
@ggiraldez (Contributor) left a comment


Looks great! I left one more question, but it's non-blocking IMO.

let fetcher = |t: std::sync::mpsc::Sender<ContractArchive>| {
    for archive_desc in manifest.archives() {
        let Ok(archive) = ContractArchive::fetch(archive_desc) else {
            continue;
@ggiraldez (Contributor):

Should we somehow indicate the I/O error? Otherwise it's possible the run succeeds, but in reality it didn't run any tests.
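
For example, a minimal way to surface it (assuming the fetch error implements Display; the message format is illustrative):

```rust
let fetcher = |t: std::sync::mpsc::Sender<ContractArchive>| {
    for archive_desc in manifest.archives() {
        let archive = match ContractArchive::fetch(archive_desc) {
            Ok(archive) => archive,
            Err(err) => {
                // Report the I/O failure instead of skipping silently, so a
                // run that fetched nothing can't look like a clean pass.
                eprintln!("failed to fetch archive: {err}");
                continue;
            }
        };
        let _ = t.send(archive);
    }
};
```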
