Add mutable tensor support for local cluster #464
Conversation
I would like to share my current progress. My current implementation allocates chunks directly on workers for the mutable tensor. Now the trouble is that I cannot find a clean way to make …
Another approach is creating …
Or, should I make …
We now have a …
Excellent work! I will take some time to review the implementation.
@sighingnow Please rebase your fork onto mars/master and update your code, as we renamed all …
The branch was force-pushed from 4309a8f to ea63136.
Currently, every single write retrieves the data first, modifies it, then writes it back to plasma. This is an option without doubt, but I think this method has some flaws, listed below:
- It's not safe: if many clients try to modify the same chunk, there is no transaction guarantee.
- The data would have to stay pinned in the plasma store all the time; if there is other computation, there may be no room for it.
- I doubt the performance would be great, especially for writes, when many small modifications happen.
I would prefer that each write just stores the affected index, timestamp, and data into the chunk store; when a read or seal comes, fetch all the historical writes and assemble them into one chunk, writing it back if it is a seal.
I think this way will have better write performance.
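For concreteness, here is a minimal sketch of the record-and-merge design described above: each write only appends its affected index, timestamp, and data, and the chunk is assembled on read or seal. The class and method names are illustrative only, not Mars internals.

```python
import time
import numpy as np

class ChunkWriteLog:
    """Illustrative only: an append-only write log merged on read/seal."""

    def __init__(self, shape, dtype):
        self._shape = shape
        self._dtype = np.dtype(dtype)
        self._records = []  # list of (timestamp, index, value)

    def write(self, index, value):
        # Store only the affected index and data; no read-modify-write.
        self._records.append(
            (time.time(), index, np.asarray(value, dtype=self._dtype)))

    def assemble(self):
        # Replay historical writes in timestamp order to build the chunk.
        chunk = np.zeros(self._shape, dtype=self._dtype)
        for _, index, value in sorted(self._records, key=lambda r: r[0]):
            chunk[index] = value
        return chunk

log = ChunkWriteLog((4, 4), np.float64)
log.write((slice(2, 4), slice(None)), np.arange(8).reshape(2, 4))
log.write(1, np.arange(4))
sealed = log.assemble()   # the merge happens only on read/seal
```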
@@ -68,6 +70,30 @@ def _update_tileable_shape(self, tileable):
        tileable._update_shape(tuple(sum(nsplit) for nsplit in new_nsplits))
        tileable.nsplits = new_nsplits

    def create_mutable_tensor(self, name, shape, dtype, *args, **kwargs):
I think we can still let the user create a mutable tensor via mt.mutable_tensor, because a session is not always required when processing data.
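As a usage sketch only: the session-free entry point suggested here might look roughly like the following. The exact name and signature of mt.mutable_tensor are hypothetical at this point in the discussion.

```python
import numpy as np
import mars.tensor as mt

# Hypothetical API sketch: create a mutable tensor without an explicit session;
# a default local session would be resolved internally.
t = mt.mutable_tensor("t", shape=(4, 4), dtype=np.float64, chunk_size=3)
t[1:3] = np.ones((2, 4))
x = t.seal()  # turns the mutable tensor into an ordinary tensor
```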
Thanks for the comprehensive review. I would like to explain some implementation decisions based on the comments. The expected design is to store the affected index, timestamp, and data separately and merge them into a whole tensor on seal.
Now back to the current design. The basic insight is that the mutable tensor won't be kept in memory for a long time, since it is only used to create a tensor by a series of write operations.
Based on the assumption that the mutable tensor is only used to create a tensor, the write operations issued by the client won't overlap, or the overlap doesn't affect the result. If we used the design that just records the affected index and value, I don't think we could give such a transaction guarantee either.
If the mutable tensor won't exist in memory for a long time, keeping it in memory provides the best write performance. When it is sealed, the chunks are unpinned.
I think the write performance should be even better compared to the design that just records the write operations and then does a merge, because …
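For concreteness, here is a minimal sketch of the zero-copy write path using pyarrow's plasma client directly, not the actual worker-side code in this PR: a chunk buffer is created in the store, written through an np.ndarray view over its memory, and sealed afterwards. The socket path and sizes are assumptions for illustration.

```python
import numpy as np
import pyarrow.plasma as plasma

# Assumption: a plasma store is already running on this socket path.
client = plasma.connect("/tmp/plasma")

shape, dtype = (3, 3), np.dtype(np.float64)
object_id = plasma.ObjectID(np.random.bytes(20))

# Create the chunk buffer in the store; it stays pinned until sealed/released.
buf = client.create(object_id, int(np.prod(shape)) * dtype.itemsize)

# Writes go straight into shared memory through an ndarray over the memory view
# (zero copy, no read-modify-write of the whole chunk).
view = np.frombuffer(memoryview(buf), dtype=dtype).reshape(shape)
view[:] = 0
view[0, :] = np.arange(3)
view[1:, 1] = 8

# Sealing makes the chunk immutable and visible to other clients.
client.seal(object_id)
```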
A previous bug has been fixed, and a local test shows that the following now works:
mtensor = session.create_mutable_tensor("test", (4, 4), chunk_size=3)
# write slice cross chunks
mtensor[2:4] = np.arange(8).reshape(2, 4)
# write with broadcast
mtensor[1] = np.arange(4).reshape(4)
# write scalar value (cross chunks)
mtensor[1:4, 2] = 8
x = mtensor.seal()
print(session.fetch(x))
I have a few suggestions.
- Some functions could be util functions, for instance, the function that, given a slice or integer, calculates the index array and value array (see the sketch after this list). Some tests could be added to ensure correctness.
- GraphActor looks like a sort of final class, and so does MutableTensor. So when seal is triggered, is it possible to create a GraphActor with a tensor that owns a FetchTensor op?
- A few functions added to SenderActor seem not suitable; maybe ReceiveActor can be leveraged to receive data. When seal is triggered, some actor can do the job of merging the historical arrays into one and then put it back into plasma, or just onto disk if no room can be cleared out.
- Tests should be added.
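As a rough illustration of the first suggestion (the helper name and signature are mine, not from this PR), such a util function could normalize an integer or slice index into explicit flat index and value arrays, which is then straightforward to unit-test:

```python
import numpy as np

def index_to_arrays(index, value, shape, dtype):
    """Illustrative helper: turn an int/slice index on a tensor of `shape`
    into flat index and value arrays (not Mars code)."""
    # Build a boolean mask over the full shape for the affected region.
    mask = np.zeros(shape, dtype=bool)
    mask[index] = True
    flat_index = np.flatnonzero(mask)
    # Broadcast the value over the affected region, then flatten it
    # in the same row-major order as flat_index.
    flat_value = np.broadcast_to(np.asarray(value, dtype=dtype),
                                 mask[index].shape).ravel()
    return flat_index, flat_value

# e.g. a scalar write across one column of a (4, 4) tensor
idx, val = index_to_arrays((slice(1, 4), 2), 8, (4, 4), np.float64)
assert np.array_equal(idx, [6, 10, 14])
assert np.array_equal(val, [8.0, 8.0, 8.0])
```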
The branch was force-pushed from f6b567e to 968a40c.
CI is green.
Great, I will review it soon.
Looks pretty good to me, almost mergeable, but there are still two issues to solve.
- Remove the inheritance from GraphActor for MutableTensorActor.
- As ResultSenderActor may be removed soon, the logic based on it should be refined. We can see whether we should wait for the PR that removes ResultSenderActor.
Besides, there are a small number of lines not covered, as Coveralls shows. Could you add some unit tests to cover them as much as you can?
Overall, a really exciting PR for me.
Well done, LGTM. We could pass the CI by constraining the pyarrow version as a workaround. I think we can then merge this PR.
Please rebase onto the master branch to pass Travis CI.
Could someone help me relaunch the failed py35 job on AppVeyor? The job under my personal account is OK for the latest commit.
LGTM
What do these changes do?
Support mutable tensor.
- A MutableTensorActor on the scheduler to manage mutable tensors and perform the underlying operations.
- A MutableTensor and MutableTensorData to manage chunks of mutable tensors.
- An allocate/read/write interface in ResultSenderActor (on workers).
- Chunk allocation: the read/write operation is implemented by reading/writing a np.ndarray constructed from the memory view directly (zero copy and no read/write overhead).
TODO
- write_mutable_tensor.
- Tensor (and TensorData) using those chunks directly when sealed.
- read/write dispatch logic on the client (it currently runs on the scheduler).
Related issue number
#415
Resolves #439
Resolves #450.