Initial ValidationOracle implementation #331
Conversation
Force-pushed 202904b to 5d9a4ac
A few funky things. Generally I think it's okay, but I flew through it pretty quickly.
With the move of the accumulator into the history network, we may not actually need the cross-network validation super soon, but I think we'll want it eventually anyway.
src/lib.rs
.iter()
.any(|val| val == HISTORY_NETWORK)
{
    initialize_history_network(
nit Was the order swap necessary? If not, it would be nice to leave it after state network just for the cleaner diff.
@@ -117,6 +117,7 @@ pub struct PortalnetConfig {
    pub data_radius: U256,
    pub no_stun: bool,
    pub enable_metrics: bool,
    pub infura_url: String,
Weird that this url is in the config, but then not used in get_infura_url(). Is this something old and leftover that can be removed?
trin-history/src/lib.rs
// Currently running HistoryNetwork solo will not perform any validation,
// since supporting running individual networks is not currently a priority.
I didn't understand this, and I don't see how it's shown in the code here. You're saying we will not attempt to run validation if only the history network is running? So will we reject all content? Could you re-explain in the comment?
I went ahead and added an infura project id requirement for running the history network solo, so now it'll perform validation normally.
trin-history/src/validation.rs
let rlp = Rlp::new(&header_rlp);

let header: Header = Header::decode_rlp(&rlp).expect("error decoding header");
let infura_url = get_infura_url(&infura_project_id);
Am I reading this right that our tests would hit Infura here? Seems unwise to link our CI to Infura uptime (and to rack up Infura requests for each run). Would be good to run a slightly more isolated test.
trin-history/src/validation.rs
let header = Header::decode_rlp(&rlp).expect("invalid header");
let number = format!("0x{:02X}", header.number);
let expected_hash = self.validation_oracle.get_hash_at_height(number).unwrap();
assert_eq!(hex::encode(key.block_hash), expected_hash);
Wait, panic all of trin on bad content? I was expecting an Err instead.
Looks like we should add a test where the content is wrong, too.
self.validator
    .validate_content(ck_clone, content_clone)
    .await;
So here is where I expect an Err check and then an early return before storage if it was invalid.
Force-pushed 8e6ec9b to cf8355f
I approve of the general pattern of a Validator instance per network, containing shared Oracle instances which are each able to validate a specific piece of network data. 👍
In your initial comment, the checklist says that validation is currently only done when requesting on-demand data (item 2), but it seems like the code is only validating content before storing in the DB (item 1). Not sure if that's a mistake or misunderstanding on my part.
src/lib.rs
utp_listener_tx,
portalnet_config.clone(),
storage_config.clone(),
header_oracle.clone(),
Doesn't matter currently with only the history network being given the header_oracle, but won't clone()ing the oracle result in us having one copy per network, each with only that network's jsonrpc_tx and thus unable to do cross-network validation? I'd think we want to pass an Arc<HeaderOracle> around instead.
Yeah, good call. I had some contrived idea of how this was going to happen with a sort of path-dependent initialization process, but this is a much better solution - though I had to go with Arc<RwLock<HeaderOracle>> to be able to get a mut ref.
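The shared-oracle wiring discussed here can be sketched with std's Arc and RwLock. This is a hypothetical, stripped-down HeaderOracle (the real one holds jsonrpc tx channels); the point is that cloning the Arc gives every network a handle to the same oracle, and the RwLock supplies the mut ref.

```rust
use std::sync::{Arc, RwLock};

// Hypothetical stand-in for the real HeaderOracle.
#[derive(Debug, Default)]
pub struct HeaderOracle {
    pub infura_url: String,
}

fn main() {
    // One oracle, wrapped so every network can hold a handle and take a mut ref.
    let oracle = Arc::new(RwLock::new(HeaderOracle::default()));

    // Cloning the Arc copies only the pointer; all networks share one oracle.
    let history_handle = Arc::clone(&oracle);
    let state_handle = Arc::clone(&oracle);

    // A write through one handle is visible through every other handle.
    history_handle.write().unwrap().infura_url = "https://example.invalid/v3/abc".to_string();
    assert_eq!(
        state_handle.read().unwrap().infura_url,
        "https://example.invalid/v3/abc"
    );
}
```

By contrast, a plain `header_oracle.clone()` duplicates the struct itself, so later mutations (like registering a network's tx channel) would only be seen by that one copy.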
Err(_) => error!("Content received, but not stored: Error writing to db."),
},
false => debug!(
    "Content received, but not stored: Distance falls outside current radius."
nit: Technically this could be because it's outside our radius or because we're already storing it. It could get confusing that currently there's no way to distinguish between these two reasons.
pub history_jsonrpc_tx: Option<tokio::sync::mpsc::UnboundedSender<HistoryJsonRpcRequest>>,
pub header_gossip_jsonrpc_tx: Option<bool>,
pub block_indices_jsonrpc_tx: Option<bool>,
pub state_jsonrpc_tx: Option<bool>,
Is there a scenario in which we need to talk to the state network in order to validate a header?
Not off the top of my head. I understand that this field (along with header_gossip_jsonrpc_tx and block_indices_jsonrpc_tx) is not necessary atm, but personally I don't mind leaving them in there to maintain some context for the object's purpose.
Ok(should_store) => match should_store {
    true => match self.storage.write().store(&content_key, &content.into()) {
        Ok(_) => (),
        Err(_) => error!("Content received, but not stored: Error writing to db."),
nit:
Err(_) => error!("Content received, but not stored: Error writing to db."),
Err(err) => error!("Content received, but not stored due to: {err}."),
trin-core/src/types/validation.rs
// This is a mock Validator for use in tests where no validation is required.
pub struct IdentityValidator {
    pub header_oracle: HeaderOracle,
Any reason this needs to have a HeaderOracle, instead of nothing?
pub history_jsonrpc_tx: Option<tokio::sync::mpsc::UnboundedSender<HistoryJsonRpcRequest>>,
pub header_gossip_jsonrpc_tx: Option<bool>,
pub block_indices_jsonrpc_tx: Option<bool>,
pub state_jsonrpc_tx: Option<bool>,
I agree with having these individually instead of the main portal jsonrpc tx channel for the reason mentioned in the comment, but down the line maybe we want to wrap these jsonrpc_txs up into a wrapper struct to avoid duplicating the same four jsonrpc_tx members in the BlockBodyOracle and BlockReceiptsOracle. Though that's only if all oracles need access to all networks. If not, I think the clearest thing would be to have each oracle have access to the minimum number of networks necessary.
There may be some confusion in how I explained this, but the idea was to only have a single "Oracle" - the object that has access to the various tx channels. This will be shared (via Arc<RwLock>) between the different "Validators" (eg. BlockBodyValidator and ReceiptsValidator) which will contain all the type-specific logic for validation. Iiuc - then the HeaderOracle is already behaving like the wrapper struct you're describing here?
Hmm, I'm not seeing why there would be a BlockBodyValidator and ReceiptsValidator (instead of BlockBodyOracle and ReceiptsOracle) when currently there's ChainHistoryValidator and StateValidator. Seems like it's currently set up so that each network has one validator, and each data type has an oracle, but you're saying there will be a single oracle (in which case, HeaderOracle sounds like it's overly specific to headers) and a validator per data type.
Ahh, my bad, I got mixed up. You're right. The ChainHistoryValidator will do all validation for chain history network types (headers, bodies, and receipts). But, yeah, the name HeaderOracle isn't necessarily an oracle for just the BlockHeader type. HeaderOracle refers more to the idea of an oracle that can answer questions like is_hash_canonical, get_hash_at_height, get_latest_header (etc), which might be used to validate state network data, for example. This is one reason why I considered the name ValidationOracle - which seems like a more generic name, but since we've used HeaderOracle in discussion I went that way. However, we don't need a BlockBodyOracle and ReceiptsOracle - we can simply use the HeaderOracle to validate those data types.
// for data to perform validation. Currently, it just proxies these requests
// on to infura.
#[derive(Clone)]
pub struct HeaderOracle {
I'd call this BlockHeaderOracle to be consistent with other places where BlockHeader is used instead of just Header, like ContentKey::BlockHeader.
In conversation during devconnect/syncs we've been throwing around this HeaderOracle term, which is a main reason I chose that name. I also considered ValidationOracle. Imo the "block" part of "BlockHeader" is somewhat superfluous, since there is only one kind of header and therefore no need to distinguish between header types. Also, ContentKey::BlockHeader is a defined type from the specs, while this oracle object isn't strictly defined, is a bit more general in purpose, and could end up performing various non-header related functions down the road. These are just some thoughts I had for why I went with HeaderOracle, but definitely lmk if you still feel it's much more useful for understanding to go with BlockHeaderOracle - I'm definitely open to the change.
Force-pushed 229641d to 7b81338
@mrferris Gotcha, in my head #1 was a bit more general (and probably redundant now that you point this out), referring to all instances in which we'd store something in the db (findcontent / findcontent via utp / offer / direct injection). Since those aren't all complete, I left that checkmark blank for now.
I have not been able to come up with a better way to perform validation, but I can't help but feel like 3 PhantomData fields in OverlayProtocol is something indicative of an anti-pattern.
The only structural alternative I can imagine is some channel to bubble up unvalidated content items from the overlay, and some channel to send validation confirmations back down to the overlay.
Another thought I had was the addition of validation metadata to the OverlayContentKey, such as which other content keys are necessary to validate this content key, but this would potentially require that different implementations of OverlayContentKey be aware of each other, which we don't want. In general, I was thinking about ways to make the OverlayContentKey trait provide richer behavior. I would be interested to know if you have any ideas on that topic.
If possible, I would love to stay away from async-trait, but bypassing that might require some unwelcome complexity.
trin-core/src/types/validation.rs
// This is a mock Validator for use in tests where no validation is required.
pub struct IdentityValidator {}
I don't think the "identity" naming makes sense in this context. I named IdentityContentKey that way because the function that maps the content key to the content ID is the identity function. Perhaps a more appropriate name here would be something like InerrantValidator, because it will always return Ok(..). The implementation could define this behavior for all implementors of OverlayContentKey.
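A minimal sketch of what an InerrantValidator could look like, assuming a simplified synchronous Validator trait (the real one is async and bounded by OverlayContentKey); the blanket impl is what gives the no-op behavior for every content key type:

```rust
// Hypothetical simplification of the Validator trait: synchronous, and the
// error type is a plain String instead of anyhow::Error.
pub trait Validator<TContentKey> {
    fn validate_content(&self, content_key: &TContentKey, content: &[u8]) -> Result<(), String>;
}

/// A validator that never errs: it accepts any key/content pair. For tests.
pub struct InerrantValidator;

// Blanket impl: InerrantValidator validates content for *every* key type.
impl<TContentKey> Validator<TContentKey> for InerrantValidator {
    fn validate_content(&self, _key: &TContentKey, _content: &[u8]) -> Result<(), String> {
        Ok(())
    }
}

fn main() {
    let validator = InerrantValidator;
    assert!(validator
        .validate_content(&[0u8; 33], b"any content".as_ref())
        .is_ok());
}
```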
trin-core/src/types/validation.rs
}
}

// This trait is used by all overlay-network Validators to validate content in the overlay service.
nit: Doc comments meant to describe types begin with 3 slashes: ///. This nit concerns several definitions in the changes here. For the description of a trait, we can omit "this trait" from the description.
};
let content_key = match TContentKey::try_from(find_content.content_key) {
    Ok(val) => val,
    Err(_) => {
Here and below we should log the specific error that we encounter.
Updated the one below, but I don't think it's currently possible with the message on 971 - due to this bug. Basically the compiler bugs out and says Debug isn't implemented for the error message, but Debug is most definitely implemented for an anyhow Error. The same compiler error happens up in handle_find_content() when we decode a content key there.
Hmm, well the compiler only knows that TContentKey implements OverlayContentKey, but in the trait definition for OverlayContentKey we do not specify that the Error associated type for TryFrom implements Debug. I think we could fix this with a trait bound on the associated type.
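The suggested bound can be sketched like this; decode_key is a hypothetical helper standing in for the overlay code, and the `K::Error: Debug` bound is what lets the `{:?}` logging compile:

```rust
use std::fmt::Debug;

// Hypothetical helper: decode a content key from raw bytes, logging failures.
// Without the `K::Error: Debug` bound, the eprintln! with {:?} won't compile,
// because the compiler can't prove the associated Error type is printable.
fn decode_key<K>(bytes: Vec<u8>) -> Option<K>
where
    K: TryFrom<Vec<u8>>,
    K::Error: Debug,
{
    match K::try_from(bytes) {
        Ok(key) => Some(key),
        Err(err) => {
            eprintln!("Unable to process received content: invalid content key: {:?}", err);
            None
        }
    }
}

fn main() {
    // Vec<u8> -> [u8; 4] fails on wrong lengths; its error type implements Debug.
    let ok: Option<[u8; 4]> = decode_key(vec![1, 2, 3, 4]);
    assert!(ok.is_some());
    let bad: Option<[u8; 4]> = decode_key(vec![9]);
    assert!(bad.is_none());
}
```

The same effect could be achieved once, in the OverlayContentKey trait definition, with a where-clause on the associated type.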
Ok(should_store) => match should_store {
    true => match self.storage.write().store(&content_key, &content.into()) {
        Ok(_) => (),
        Err(msg) => error!("{}", format!("Content received, but not stored: {msg}")),
nit: error! can take a format string as an argument, so we don't need a separate format!.
impl<
    TContentKey: OverlayContentKey + Send + Sync,
    TMetric: Metric + Send,
    TValidator: 'static + Validator<TContentKey> + Send,
Let's remove this 'static lifetime if possible. I encountered something like that for TContentKey and TMetric, but I was able to remove it. I can't remember what was required, so I'm not sure whether it would work for TValidator.
From what I can tell, async_trait makes this difficult if not impossible.
let find_content = match request {
    Request::FindContent(find_content) => find_content,
    _ => {
        error!("Unable to process received content: Invalid request message.");
        return;
    }
};
This far in the processing pipeline, we shouldn't have to perform this match. If we are in this function, we should be sure that we have a FindContent request. We could perform that matching further upstream in the pipeline.
Yeah I can see that, but I don't have any strong instinct as to why it's problematic now or down the road.
Iiuc, something like this was considered. But due to the requirement that we want to validate content that we've
I considered this as well. It would have been lovely to essentially merge the
I'm just not sure if this is possible. Requests to networks for data used in validation should be async imo. But I'm curious why you want to stay away from async-trait.
My aversion to async-trait is
return;
}
};
match self
This match can be rewritten as:
if let Err(err) = self.validator.validate_content(&content_key, &content).await {
    error!("Unable to validate received content: {err:?}");
    return;
}
}
}
match self.storage.read().should_store(&content_key) {
    Ok(should_store) => match should_store {
should_store returns a boolean, so we can say if should_store {.....} else {....}
}
match self.storage.read().should_store(&content_key) {
    Ok(should_store) => match should_store {
        true => match self.storage.write().store(&content_key, &content.into()) {
true => match self.storage.write().store(&content_key, &content.into()) {
true => if let Err(err) = self.storage.write().store(&content_key, &content.into()) {
    error!("Content received, but not stored: {err}")
}
trin-core/src/types/validation.rs
/// Responsible for dispatching cross-overlay-network requests
/// for data to perform validation. Currently, it just proxies these requests
/// on to infura.
#[derive(Clone)]
I think we should make some kind of a rule for every struct to always implement derive(Debug); it is quite useful when logging/debugging.
trin-core/src/types/validation.rs
impl Default for HeaderOracle {
    fn default() -> Self {
        Self {
            infura_url: "https://mainnet.infura.io:443/v3/".to_string(),
Could we add this URL as a constant at the beginning of the file?
trin-core/src/utils/infura.rs
),
};

format!("https://mainnet.infura.io:443/v3/{}", infura_project_id)
If we define the default Infura URL as a constant, we can use it here too
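A sketch of the suggested constant (the constant name is an assumption), used in both places mentioned:

```rust
// Assumed name for the shared base-URL constant suggested in the review.
const INFURA_BASE_URL: &str = "https://mainnet.infura.io:443/v3/";

// get_infura_url then just appends the project id to the shared constant.
fn get_infura_url(infura_project_id: &str) -> String {
    format!("{}{}", INFURA_BASE_URL, infura_project_id)
}

fn main() {
    assert_eq!(
        get_infura_url("abc123"),
        "https://mainnet.infura.io:443/v3/abc123"
    );
}
```

The Default impl for HeaderOracle could then use `INFURA_BASE_URL.to_string()` instead of repeating the literal.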
trin-history/src/validation.rs
use trin_core::portalnet::types::content_key::HistoryContentKey;
use trin_core::portalnet::types::messages::ByteList;
use trin_core::types::header::Header;
use trin_core::types::validation::{HeaderOracle, Validator};
It would be nice if we kept all imports aggregated by crate. The easiest way to do this is to install the nightly rust channel locally, add a rustfmt.toml file in the main folder with imports_granularity = "Crate", and run cargo +nightly fmt --all every time before pushing.
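The suggested configuration, as a file sketch (imports_granularity is an unstable rustfmt option, hence the nightly channel):

```toml
# rustfmt.toml, in the repository root
imports_granularity = "Crate"
```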
&mut self,
content_key: &HistoryContentKey,
content: &ByteList,
) -> anyhow::Result<()>
I think it may be useful to return some type here on success (instead of an empty tuple), for example, if we are verifying the block header, we can return the actual hash.
I'm not so sure. Imo validate_xxx is simply asking to validate the content or return an error. Pretty much every time I've used this naming, I've stuck with this basic pattern. I've read some posts (that I can't find now) which say returning Result<(), Err> is the preferred way to handle this in rust. I get that it seems kind of useless, but I'd also argue against implementing unnecessary logic (like returning data that we don't currently need). Now ... maybe down the road we do need this data. At which point, I'd say that's fine, but we should change the name of the validation function to something like validate_and_get_content_key().
Force-pushed 7df2dae to d3264e1
ethportal-peertest/src/jsonrpc.rs
let response_obj = match next_obj {
    Some(val) => val,
    None => return Err(anyhow!("Empty JsonRpc response")),
};
Just for my understanding, why was this change necessary?
Clippy was complaining about an ok_or() containing a function, but it looks like ok_or_else(|| ...) works as well; not sure why I went so verbose the first time around...
src/lib.rs
// todo: pass header_oracle into state network
// - to initialize StateValidator
// - for oracle to pick up handle to state_jsonrpc_tx
Move todo to issue?
let content_key = match TContentKey::try_from(request.content_key) {
    Ok(val) => val,
    Err(msg) => {
        error!("Unable to process received content: Invalid content key: {msg:?}");
This is a weird situation. We requested content with an invalid content key and then find out about it while processing the response, yeah? The error message doesn't really convey how weird that is. I think a log like this might give more up-front understanding:
error!("Unable to process received content: Invalid content key: {msg:?}");
error!("Unable to process received content: we sent a request with an invalid content key: {msg:?}");
if let Err(err) = self
    .validator
    .validate_content(&content_key, &content)
    .await
{
    error!("Unable to validate received content: {err:?}");
    return;
};
match self.storage.read().should_store(&content_key) {
    Ok(should_store) => {
        if should_store {
            if let Err(err) = self.storage.write().store(&content_key, &content.into()) {
Since validation might be expensive (literally, at first), maybe we should be validating after testing should_store(), but obviously before store().
trin-core/src/types/validation.rs
impl HeaderOracle {
    // Currently falls back to infura, to be updated to use canonical block indices network.
    pub fn get_hash_at_height(&mut self, hex_number: String) -> anyhow::Result<String> {
Passing in the block number as a hex string at this level of the API is surprising to me. I guess it's because Infura/json-rpc expects a hex string, but that seems like an implementation detail that's leaking up into the API. I would expect a uint here, converted to a hex string inside of get_hash_at_height() before building the JsonRequest.
trin-history/src/validation.rs
.header_oracle
    .write()
    .unwrap()
    .get_hash_at_height(number)?;
... continuing from the previous comment, I'd expect to be able to do this here:
.get_hash_at_height(number)?;
.get_hash_at_height(header.number)?;
What was wrong?
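The suggested API change can be sketched as a small helper that keeps the hex conversion internal; to_json_rpc_block_param is a hypothetical name. Note that json-rpc block parameters are conventionally lowercase 0x-prefixed hex without leading zeros, so `{:x}` is likely a safer choice than the `{:02X}` used earlier (which uppercases and zero-pads small numbers):

```rust
// Hypothetical helper: get_hash_at_height would take a u64 and call this
// internally when building the JsonRequest, keeping hex out of the API.
fn to_json_rpc_block_param(block_number: u64) -> String {
    format!("0x{:x}", block_number)
}

fn main() {
    assert_eq!(to_json_rpc_block_param(0), "0x0");
    assert_eq!(to_json_rpc_block_param(15_000_000), "0xe4e1c0");
}
```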
Initial datatype to help with portal network data validation. This is a fairly minimal implementation, but the general structure that I'm considering is there. Looking for feedback / suggestions / improvements on this layout before continuing to finish out the todos.
Validation needs to happen for..
- Accepted content that's been Offered to us (via utp)
- BlockBodies & Receipts in the Chain History network
in the Chain History networkAssumptions
Where to validate?
Validation could happen "on top" of an overlay network (eg. at the eth_* jsonrpc endpoints). But since validation needs to happen "within" an overlay network, we might as well perform all the validation at that level. We want to validate Accepted content before storing it locally, so the OverlayService must be capable of validating data.
Current Implementation
- A single ValidationOracle is shared with each network's Validator, and it collects a handle to that network's jsonrpc tx channel.
- This could also use the events channel, but jsonrpc seemed sufficient & simpler.
- The ValidationOracle is passed into each network's Validator, which contains the validation logic for the content_key / content types that network supports.
To-Do