Add memory and storage awareness #289
Conversation
Hey, two general ideas:
I ended up deciding to implement this logic in MemPool, since it's the most reasonable place to do this, and it has the greatest control over memory management: JuliaData/MemPool.jl#60. With that PR posted and basically ready to go, I'm slightly changing what we'll be implementing in Dagger:
The non-optional items in this list are the basics necessary to let Dagger handle "big data" problems; the MemPool PR also gives us swap-to-disk automatically, so we don't need to worry about that for now. The optional items improve scheduling decisions; they're helpful, but not strictly necessary (and will be partially obviated by future work-stealing).
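For intuition, here is a minimal sketch of the swap-to-disk mechanism that MemPool takes ownership of in JuliaData/MemPool.jl#60: serialize a value to a file when memory is tight, and reload it on demand. The function names and layout below are illustrative only, not MemPool's actual API.

```julia
using Serialization

# Illustrative swap-out: write the value to a scratch file and return the path,
# so the in-memory copy can be dropped (these helpers are not MemPool API).
function swap_out(x; dir=tempdir())
    path = joinpath(dir, string("swap-", hash(x), ".bin"))
    open(io -> serialize(io, x), path, "w")
    return path
end

# Illustrative swap-in: deserialize the value back from disk on demand.
swap_in(path) = open(deserialize, path, "r")

# Usage: spill a large array, then reload it later when it's needed again.
path = swap_out(rand(10^6))
data = swap_in(path)
```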
> By default we'll follow whatever

This would be a decision for the MemPool allocator to make. I'd like to see how far we can get with basic allocation strategies (maybe MRU or similar) before we consider passing such information directly to the allocator. I'd prefer not to end up with an API like Linux's.
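For reference, a basic eviction strategy of the kind meant here can be very small; the toy MRU victim picker below is purely illustrative and is not how MemPool's allocator is actually structured.

```julia
# Toy MRU ("most recently used") eviction choice: given last-access times for
# in-memory refs, pick the one touched most recently as the swap-out victim.
function pick_victim_mru(last_access::Dict{Int,Float64})
    isempty(last_access) && return nothing
    victim, victim_t = first(last_access)
    for (id, t) in last_access
        if t > victim_t
            victim, victim_t = id, t
        end
    end
    return victim
end

pick_victim_mru(Dict(1 => 0.5, 2 => 3.2, 3 => 1.1))  # returns 2
```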
- Track worker storage resources and devices
- Track thunk return value allocations
- Expand procutil option to time_util and alloc_util
- Add storage option for specifying MemPool storage device
- Format bytes in debug logs
- Add locking around CHUNK_CACHE
- Move return value Chunks to MemPool device
- Chunk: Update tochunk docstring
- Walk data to determine serialization safety
- Drop Julia 1.6 support

- Split suites out into individual files
- Provide usage info when run without BENCHMARK env var
- Add option to save logs to output file
- Add DTable CSV/Arrow reading suite
This PR adds awareness of memory and storage (disk, etc.) to Dagger, introducing a new "storage subsystem" alongside the existing "processor subsystem". The intention is that by modeling storage resources explicitly (detecting their real-time capacities and free space, and providing methods to move data to and from storage), we can teach the scheduler to swap data to disk when memory is full, or to perform any other kind of capacity-protecting data movement or scheduling.
We will additionally begin tracking GC allocations at runtime, and use estimates of those allocations to limit scheduling when memory would otherwise become exhausted. This should make it easier to execute code over "big data", even when that data is too large for a single worker, or even all workers, to keep in memory at one time. The model should also extend to GPUs (which have their own memory space), so that GPU OOMs can be avoided.
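As a rough sketch of how allocation estimates could gate scheduling, consider the helper below; its name and logic are hypothetical, not code from this PR.

```julia
# Hypothetical helper: decide whether a worker can take a task whose estimated
# GC allocation is `estimated_alloc` bytes, leaving some free-memory headroom.
function can_schedule(estimated_alloc::Integer; headroom=0.1)
    free = Sys.free_memory()  # bytes currently free on this machine
    # Defer the task (or swap existing data to disk) if it would push the
    # worker past (1 - headroom) of its currently free memory.
    return estimated_alloc <= (1 - headroom) * free
end

can_schedule(2 * 1024^3)  # e.g. a task expected to allocate ~2 GiB
```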
Todo:
- `storage` thunk option to indicate which MemPool `StorageDevice` to use (defaults to the global device); see the usage sketch after this list
- Track `StorageDevice` transfer times per-byte and compression amount (per-type?), and teach Sch to compute `StorageDevice` transfer costs in scheduling (`estimate_task_costs`)
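A hedged usage sketch for the `storage` option above; the `Dagger.@spawn option=value` form is existing Dagger syntax, but the `SerializationFileDevice` constructor is an assumption here (see the MemPool PR for the real device API).

```julia
using Dagger, MemPool

# Assumed device constructor from JuliaData/MemPool.jl#60: a device that
# serializes chunks into a scratch directory on disk.
disk = MemPool.SerializationFileDevice(mktempdir())

# The `storage` thunk option added in this PR would direct this task's return
# value to `disk` instead of the global default device.
t = Dagger.@spawn storage=disk sum(rand(10^8))
fetch(t)
```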