refactor(sampler): clean up sampler attributes, part 3 #5402
Codecov Report
@@ Coverage Diff @@
## master #5402 +/- ##
==========================================
- Coverage 83.35% 83.33% -0.02%
==========================================
Files 343 342 -1
Lines 18800 18784 -16
==========================================
- Hits 15671 15654 -17
- Misses 3129 3130 +1
…ric into remote_backend_4
Looks great overall, left some very minor comments
…n classes, part 4 (#5404)

This PR continues the effort to consolidate PyG's sampling interface in preparation for moving `sample(...)` behind the `GraphStore` interface. This effort is somewhat large in scope and will be broken into multiple PRs for ease of review.

It builds off of #5402, and makes a significant move to abstract data loading behind a `data: Union[Data, HeteroData, Tuple[FeatureStore, GraphStore]]` and a `sampler: BaseSampler`. It does so by introducing two base implementation classes: `NodeLoader` and `LinkLoader`. `NodeLoader` performs sampling from nodes (using `sample_from_nodes`), and `LinkLoader` does the same from edges (using `sample_from_edges`).

They both expose parameters in their initializers that are intended for **loading** (that is, the process of using a sampler to get subgraphs, using a feature fetcher to get features, and joining these together to construct a `HeteroData` object to pass downstream). Samplers are intended to expose parameters that are used for **sampling** (that are particular to the sampling method).

The implementations of `NeighborLoader` and `LinkNeighborLoader` are now very simple: they pass the `NeighborSampler` and any necessary initialization parameters directly in `__init__`, with no other change.
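The loading/sampling split described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not PyG's actual code: `SamplerOutput`, `ToyNeighborSampler`, and the dict-based `features` and `adj` inputs are hypothetical stand-ins, and only the names `BaseSampler`, `NodeLoader`, `LinkLoader`, `NeighborLoader`, `sample_from_nodes`, and `sample_from_edges` come from the PR description. Their signatures here are invented for the sketch.

```python
import random
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, Iterator, List, Sequence, Tuple

Edge = Tuple[int, int]  # (src, dst)


@dataclass
class SamplerOutput:
    """Hypothetical sampler result: sampled nodes and the edges between them."""
    nodes: List[int]
    edges: List[Edge]


class BaseSampler(ABC):
    """Samplers own *sampling* parameters only (assumed interface)."""

    @abstractmethod
    def sample_from_nodes(self, seeds: Sequence[int]) -> SamplerOutput: ...

    @abstractmethod
    def sample_from_edges(self, seed_edges: Sequence[Edge]) -> SamplerOutput: ...


class ToyNeighborSampler(BaseSampler):
    """One-hop sampler; `adj` maps each node to its in-neighbors."""

    def __init__(self, adj: Dict[int, List[int]], num_neighbors: int, seed: int = 0):
        self.adj, self.num_neighbors = adj, num_neighbors
        self.rng = random.Random(seed)

    def sample_from_nodes(self, seeds: Sequence[int]) -> SamplerOutput:
        nodes = list(dict.fromkeys(seeds))  # de-duplicate, keep order
        seen, edges = set(nodes), []
        for dst in list(nodes):  # iterate over a copy: expand seeds only
            neighbors = self.adj.get(dst, [])
            k = min(self.num_neighbors, len(neighbors))
            for src in self.rng.sample(neighbors, k):
                edges.append((src, dst))
                if src not in seen:
                    seen.add(src)
                    nodes.append(src)
        return SamplerOutput(nodes, edges)

    def sample_from_edges(self, seed_edges: Sequence[Edge]) -> SamplerOutput:
        # Sample around both endpoints of every seed edge.
        return self.sample_from_nodes([n for edge in seed_edges for n in edge])


def join(features: Dict[int, List[float]], out: SamplerOutput) -> dict:
    """Loading step: fetch features for sampled nodes and attach the subgraph."""
    return {"x": {n: features[n] for n in out.nodes}, "edges": out.edges}


class NodeLoader:
    """Loaders own *loading* parameters (here: the seeds and `batch_size`)."""

    def __init__(self, features, sampler: BaseSampler, input_nodes, batch_size=1):
        self.features, self.sampler = features, sampler
        self.input_nodes, self.batch_size = list(input_nodes), batch_size

    def __iter__(self) -> Iterator[dict]:
        for i in range(0, len(self.input_nodes), self.batch_size):
            seeds = self.input_nodes[i:i + self.batch_size]
            yield join(self.features, self.sampler.sample_from_nodes(seeds))


class LinkLoader:
    """Same as NodeLoader, but seeded from edges."""

    def __init__(self, features, sampler: BaseSampler, input_edges, batch_size=1):
        self.features, self.sampler = features, sampler
        self.input_edges, self.batch_size = list(input_edges), batch_size

    def __iter__(self) -> Iterator[dict]:
        for i in range(0, len(self.input_edges), self.batch_size):
            seeds = self.input_edges[i:i + self.batch_size]
            yield join(self.features, self.sampler.sample_from_edges(seeds))


class NeighborLoader(NodeLoader):
    """Thin wrapper: builds the sampler and forwards everything else."""

    def __init__(self, features, adj, num_neighbors, input_nodes, batch_size=1):
        sampler = ToyNeighborSampler(adj, num_neighbors)
        super().__init__(features, sampler, input_nodes, batch_size)
```

In this sketch, swapping the sampling strategy means passing a different `BaseSampler` subclass to `NodeLoader` or `LinkLoader`; the batching and feature-joining code is untouched, which is the separation the PR description is aiming for.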
This PR continues the effort to consolidate PyG's sampling interface in preparation for moving `sample(...)` behind the `GraphStore` interface. This effort is somewhat large in scope and will be broken into multiple PRs for ease of review. It resolves some TODOs from #5365.

The major change removes special handling of `input_type` and `perm`/`perm_dict` in the neighbor loader. This enables a full separation between the sampler interface and the `NeighborLoader`. The only other interaction that is not clearly documented or exposed well in the sampler class is `*SamplerOutput.metadata`, which will take a bit more work to properly define.