Sourcify Tests #1305
Conversation
The test runner checks the Sourcify manifest for a list of all shards for a particular chain, then downloads each sequentially, unpacks it, and finally tests each source file. This is a very basic runner and there are a lot of improvements that can be made, plus a lot of missing features, but it's a pretty good start.
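For orientation, here is a minimal sketch of that control flow. All of the names (`fetch_manifest`, `shards_for_chain`, `download`, `unpack`, `test_source_file`) are hypothetical stand-ins, not the actual API:

```rust
// Hypothetical sketch of the runner's flow; none of these names are the real API.
fn run(chain_id: u64) -> Result<(), Box<dyn std::error::Error>> {
    let manifest = fetch_manifest()?; // Sourcify manifest listing all shards
    for shard in manifest.shards_for_chain(chain_id) {
        let archive = shard.download()?; // fetch one shard...
        let contracts = archive.unpack()?; // ...and unpack it
        for contract in contracts {
            for source in contract.sources() {
                test_source_file(&source); // finally, test each source file
            }
        }
    }
    Ok(())
}
```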
…imports
* Reuse a buffer when reading source files for efficiency
* Reorganize code
…e parallelized. Now they are using proper iterators, and `Contract::Item` is a struct which contains a file handle instead of the file contents. This means that we can still use a shared buffer to read all of the files, improving performance, without having to borrow the string through several layers of iterators. This was what prompted me to move away from "real" iterators previously. Like the Sanctuary tests, I'm using `rayon` to parallelize the runner. Here, contracts are processed in parallel, but the source files within a single contract are processed sequentially. Most contracts don't have a large number of source files, and trying to parallelize them as well would be unnecessary complexity.
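A rough sketch of that scheme, under assumptions: `Contract`, `items()`, and `run_test` are hypothetical stand-ins; only the `rayon` usage and the reused per-contract buffer mirror the description above:

```rust
use rayon::prelude::*;
use std::io::Read;

// Contracts run in parallel; the source files within one contract are read
// sequentially into a scratch buffer, so nothing has to borrow file
// contents across iterator layers.
fn test_contracts(contracts: &[Contract]) {
    contracts.par_iter().for_each(|contract| {
        let mut buf = String::new();
        for mut item in contract.items() {
            buf.clear(); // reuse the allocation for every file handle
            if item.file.read_to_string(&mut buf).is_ok() {
                run_test(contract, &buf);
            }
        }
    });
}
```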
…t them "in the background". This means less time waiting in-between shards, which matters more now that processing each shard is internally parallelized. There could still be some improvements here, specifically with making the fetches truly async. Right now we're using `reqwest`, which requires `tokio` to be used as the async runtime. It's possible that we could find a different library that would allow us to make async calls using only the `futures` crate, which would be much better suited to our use case.
…of how imports/files are resolved when building a compilation unit, since the previous way wasn't working in practice.
… specify a specific contract to test
* Allow the user to exclude `partial_match` contracts; they are still included by default
* Categorize contracts as either `full_match` or `partial_match`
…ning up confusing code.
* Fetch archives in the main thread, and process them in a separate thread. This means that the thread doesn't have to take ownership of the `Repository` instance, so it will get dropped at the correct time.
* Following that change, moved the fetching logic into a closure that we invoke immediately. This is all so that `tx` (the `Sender`) can be dropped before calling `process_thread.join()`. Otherwise, the processing thread would get stuck waiting for a new message forever and would never be joined.
* Removed all of the custom iterators that I built for `ContractArchive` and `Contract`. The `Contract` iterators were no longer being used once I started building compilation units by traversing the import tree. The `ContractArchive` iterator was replaced by a function, `ContractArchive::contracts`, which returns an `impl Iterator`. I thought this would be a much simpler way to express the logic, especially since the other custom iterator was removed.
* Added additional terminal output to inform the user about the progress of the test runner.
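A sketch of that fetch/process split, with `Manifest`, `ContractArchive`, and `process` as stand-ins. The point is the lifetime of `tx`: because the fetch loop is a closure invoked immediately, `tx` is moved in and dropped the moment fetching finishes, `rx.recv()` returns `Err`, the processing loop ends, and `join()` can succeed:

```rust
use std::sync::mpsc;
use std::thread;

fn run(manifest: Manifest) {
    let (tx, rx) = mpsc::channel::<ContractArchive>();

    let process_thread = thread::spawn(move || {
        // recv() returns Err once every Sender has been dropped.
        while let Ok(archive) = rx.recv() {
            process(archive);
        }
    });

    // Invoking the closure immediately scopes `tx`: it is moved in and
    // dropped when the closure returns, unblocking the thread above.
    (move |tx: mpsc::Sender<ContractArchive>| {
        for desc in manifest.archives() {
            if let Ok(archive) = ContractArchive::fetch(desc) {
                let _ = tx.send(archive);
            }
        }
    })(tx);

    process_thread.join().unwrap();
}
```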
* Emit events report after testing is complete
Just ran a parsing test locally on the entire Ethereum mainnet. Here are the high-level stats:
… Sanctuary tests. There are 4 known ones in the Ethereum mainnet, but right now I only have the address for 1
… when there is no context
* Add GitHub workflow for sourcify tests
I left some comments; especially important, I think, are the ones referring to the reporting of errors for bindings and version inference.
But other than that, and more importantly, I struggled to follow the structure of the code; it's not immediately obvious from the code order and the current set of modules. In particular:
- the `Contract` struct seems to be a central structure, yet its definition and implementation are buried in `sourcify.rs`, along with code specific to the Sourcify structure (which is expected, of course)
- `metadata.rs` contains the JSON structure of the data returned by Sourcify, but the main code to do file resolution is also there; both JSON API structs and internal structs are mixed together, and it's unclear what is constructed from JSON and what is not
- the scaffolding and infrastructure code seems fine, except the test code itself is located in `main.rs`; I'd extract that to its own module
- the API for `Repository`, `Shard`, and `ContractArchive` is confusing; I left a comment with a suggestion for an alternative structure
…r/more coherent API. `Repository` is gone, and `Manifest` has become the "entrypoint" to fetching data from Sourcify.
1. First, call `Manifest::new` to fetch `manifest.json` and get descriptors of all the archives that can be tested with the current configuration.
2. Next, call `manifest.archives()` to get an iterator over all the `ContractArchive`s that can be fetched. Each `ArchiveDescriptor` corresponds to one `ContractArchive`.
3. `ContractArchive`'s API is mostly unchanged: from there you can iterate over all the contracts in the unpacked archive and test each of them.

This is not _too_ different from the original API (if you squint), but it requires managing fewer structs and is overall cleaner.
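In usage terms, the three steps look roughly like this. This is a sketch under assumptions: `TestConfig`, `Error`, and `test_contract` are hypothetical placeholders around the named API:

```rust
fn run_tests(config: &TestConfig) -> Result<(), Error> {
    // 1. Fetch manifest.json and build descriptors for every archive
    //    that can be tested with the current configuration.
    let manifest = Manifest::new(config)?;

    // 2. Each ArchiveDescriptor corresponds to one ContractArchive.
    for descriptor in manifest.archives() {
        let archive = ContractArchive::fetch(descriptor)?;

        // 3. Iterate the contracts in the unpacked archive and test each.
        for contract in archive.contracts() {
            test_contract(&contract);
        }
    }
    Ok(())
}
```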
…ort resolution + holding the `target` and `version` fields for a contract. Instead, I've moved those fields onto the `Contract` itself and renamed `ContractMetadata` to `ImportResolver`. This should simplify things and make it more clear what's responsible for what.
…ug()` instead of `!remap.has_known_bug()`. Refactoring `import_resolver.rs` a bit
… being the biggest change
…just allocate a new string for the file contents instead.
…s are being written to the project's `target` directory
…t of these file reads are happening when printing error diagnostics, which is not something we want to early-return on. If a file read fails, we can simply skip reporting that error and try the next one.
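A sketch of that behavior, with hypothetical `errors` and `print_diagnostic` shapes:

```rust
use std::fs;

// A failed read skips reporting that one error rather than
// early-returning out of the diagnostics loop.
for error in &errors {
    let Ok(source) = fs::read_to_string(&error.path) else {
        continue; // can't load this error's source; try the next one
    };
    print_diagnostic(error, &source);
}
```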
Thank you for the refactors. The code is a lot clearer now. I still left a couple more questions and requests.
```rust
// Sometimes imports from URL-imports don't share the URL prefix
self.get_real_name(import_path)
```
Q: Shouldn't this case be covered by the check for `path_is_url(import_path)` above?
The first check is for when the import path is a URL, the highlighted check is for when the source path is a URL but the import path isn't.
Right, but if the source is a URL, then the import path should be either:
- a relative path, which should be resolved correctly in `resolve_relative_url_import` above, or
- an absolute URL path/@-path, which should be covered by the ifs above

If we got here, it means the import is an absolute path (or a relative path that couldn't be resolved correctly), which for a URL source doesn't make sense to try to resolve as a local path. This may very well be a quirk with solc, but that's why it got my attention.
Not necessarily. I think I misunderstood what you were asking before, so let me explain the issue:
There are (as far as I've seen) two possibilities when resolving a relative import path inside a URL source. Say the source path is `https://github.com/@library/a/b/main.sol` and the import path is `../c/other.sol`. The import's virtual path might be either of these:
- `https://github.com/@library/a/c/other.sol`
- `a/c/other.sol`

The call to `resolve_relative_url_import` produces the first version: it does the regular path resolution, ignoring the host, and then attaches the host to the result. This fallback is for the second case. The second case is fairly rare, but it does happen, and I haven't noticed a pattern as to why.
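To make the first strategy concrete, here is a hedged sketch (not the actual implementation): strip the scheme and host, do ordinary `..`/`.` resolution against the source's directory, then reattach them:

```rust
// Hedged sketch of "resolve ignoring the host, then attach the host".
fn resolve_relative_url_import(source_url: &str, import: &str) -> Option<String> {
    let (scheme, rest) = source_url.split_once("://")?; // "https" + "github.com/@library/a/b/main.sol"
    let (host, path) = rest.split_once('/')?;           // "github.com" + "@library/a/b/main.sol"
    let mut parts: Vec<&str> = path.split('/').collect();
    parts.pop(); // drop the file name, leaving the source's directory
    for segment in import.split('/') {
        match segment {
            ".." => {
                parts.pop()?; // climb one directory; bail if we run out
            }
            "." => {}
            s => parts.push(s),
        }
    }
    Some(format!("{scheme}://{host}/{}", parts.join("/")))
}

// resolve_relative_url_import("https://github.com/@library/a/b/main.sol", "../c/other.sol")
//   => Some("https://github.com/@library/a/c/other.sol")
```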
```rust
fs::read_dir(&self.sources_path)
    .map(|i| i.count())
    .unwrap_or(0)
```
You should be able to get this value from the `import_resolver`'s `source_maps` and avoid the I/O operation.
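Something along these lines, assuming `source_maps` holds one entry per source file (a hypothetical shape, not the confirmed API):

```rust
// Hypothetical sketch: count sources from metadata already in memory
// instead of re-reading the directory.
fn source_count(&self) -> usize {
    self.import_resolver.source_maps.len()
}
```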
* Refactor a bit in `ArchiveDescriptor` to make things clearer
* Refactor `ImportResolver`
Looks great! I left one more question, but it's non-blocking IMO.
```rust
let fetcher = |t: std::sync::mpsc::Sender<ContractArchive>| {
    for archive_desc in manifest.archives() {
        let Ok(archive) = ContractArchive::fetch(archive_desc) else {
            continue;
```
Should we somehow indicate the I/O error? Otherwise it's possible the run succeeds, but in reality it didn't run any tests.
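One hedged option, keeping the shape of the snippet above (logging shown here, though a counter folded into the final report would work just as well):

```rust
let fetcher = |t: std::sync::mpsc::Sender<ContractArchive>| {
    for archive_desc in manifest.archives() {
        match ContractArchive::fetch(archive_desc) {
            Ok(archive) => {
                let _ = t.send(archive);
            }
            // Surface the I/O failure so an empty run can't pass silently.
            Err(e) => eprintln!("failed to fetch archive: {e}"),
        }
    }
};
```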
…t chains
* Updated the available chains: added "base" and removed some others, leaving only a few key ones that we want to test
Run e2e tests on real-world contracts using the Sourcify dataset. This will eventually replace the Sanctuary tests.
Still to-do: