exists() method for Checkpoints class #142
@KrishnaPG I'm curious under what circumstances this is too expensive? In the default Jupyter ContentsManager, list_checkpoints is basically just |
Thank you @ssanderson
In scenarios where the older checkpoints get moved to secondary (slower) storage, there is no way to tell the We do not see it as impossible for a user to create 25+ checkpoints throughout the life of a notebook, whereas their active "working set" would be only the last 5 or 10 checkpoints. A typical Redis implementation would hold this working set in memory, while pushing all the older ones to secondary storage. Of course, one can certainly circumvent those problems by always keeping all the checkpoints available in main memory, ready to be listed, but then having an "exists()" method looks like a simpler and semantically valid solution. |
Another point worth noting: the It could be just an ordinary method with default implementation that falls back to
This way, it is backward compatible and also does not put pressure on implementors who do not want to take advantage of that |
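A minimal sketch of that backward-compatible default, under stated assumptions: the Checkpoints class here is a simplified stand-in rather than the real Jupyter class, the method name checkpoint_exists_for is the one floated in the issue text and is hypothetical, and the fallback is assumed to go through list_checkpoints.

```python
class Checkpoints:
    """Simplified stand-in for the checkpoints base class (not the real Jupyter class)."""

    def list_checkpoints(self, path):
        """Return a list of checkpoint models (dicts) for ``path``."""
        raise NotImplementedError

    def checkpoint_exists_for(self, path):
        """Hypothetical method: True if ``path`` has at least one checkpoint.

        The default falls back to list_checkpoints, so subclasses that
        never heard of this method keep working unchanged.
        """
        return len(self.list_checkpoints(path)) > 0


class ListOnlyCheckpoints(Checkpoints):
    """An existing implementation that only knows how to list."""

    def __init__(self, ids_by_path):
        self._ids_by_path = ids_by_path

    def list_checkpoints(self, path):
        return [{"id": cid} for cid in self._ids_by_path.get(path, [])]


class FastExistsCheckpoints(ListOnlyCheckpoints):
    """An implementation that opts in to a cheaper existence check."""

    def checkpoint_exists_for(self, path):
        # Answers from the index alone, without building any checkpoint models.
        return bool(self._ids_by_path.get(path))
```

A backend that keeps older checkpoints on slow storage could override only checkpoint_exists_for, while every other implementation inherits the list-based default.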
OK, I think this is a reasonable addition. @KrishnaPG, do you want to make a pull request adding it? There might be some bikeshedding over naming.
|
@takluyver vote on naming would be 👍 for |
I think it might be better to be a bit more specific, because there are some different possible 'exists' questions you might ask of the checkpoints implementation:
But it's not a strong preference, and I don't currently have a name I like better for it. |
Thanks @takluyver and @ssanderson. Forgive me for the It was not my intention to promote it; it's just that, not being familiar with Python or its type-conversion system, I used the But I see from your messages that Python has this As for the naming, I have no preference either. Whichever works for you is fine with me. But as @takluyver pointed out, the semantics certainly need to be clear for the user. For example, something along the lines of BTW, does Python support polymorphic functions (where multiple methods can have the same function name with different parameter types)? In that case, multiple methods with the same name, such as below, may satisfy @takluyver's criteria:
If not, what is the typical way Python achieves the above kind of mechanism? One simple way would be - if the
where The |
This isn't possible in Python without adding some extra machinery (see below for an example). If you define two functions with the same name in a given namespace, the second just overwrites the first.
You can write something like:
I'd say this is generally considered a bit of an anti-pattern though unless you've got a good reason for wanting to accept multiple input types. For cases where it makes sense (e.g. when you're implementing an addition function on various number-like types), there are libraries like @mrocklin's multipledispatch that let you specify type signatures as decorators. For this particular case, dispatching on type for an |
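To illustrate the two points about same-named functions, here is a small sketch. It is not the snippet from the comment above; the function names describe and describe_any are hypothetical, and the manual isinstance check is just one common way to accept several input types in a single function.

```python
def describe(value):
    return "first definition"


def describe(value):  # rebinds the name; the first definition is no longer reachable
    return "second definition"


def describe_any(value):
    """Hypothetical single entry point that branches on the argument type."""
    if isinstance(value, int):
        return f"int: {value}"
    if isinstance(value, str):
        return f"str: {value}"
    raise TypeError(f"unsupported type: {type(value).__name__}")
```

Calling describe(1) runs the second body, because Python keeps only the latest binding for a name. describe_any handles both types in one place, at the cost of the type checks living inside the function body.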
Sorry for the late reply. So, what is the action plan? Shall we go ahead with the name Also, I would like to bring up the issue of adding 'descriptions' metadata to the files and checkpoints. I know the issue has been put on hold at #130 due to performance reasons. However, for the alternate content manager classes being discussed here (such as MongoDB / Postgres based managers), which are not file-based, reading the metadata (such as the descriptions) happens anyway as part of reading the filename and related info, so there is no additional overhead. Primarily my concern is about 'checkpoints'. Listing 10 different checkpoints with different timestamps in the menu is not as effective as listing checkpoints with their 'commit messages'. So, my question is: assuming that performance overhead is not a problem (let the content/checkpoint manager worry about it), what change is required, architecture-wise, to add these descriptions for files and checkpoints? Poor performers like the file-based content manager can always skip reading the description and return "" (empty string) for it, if they want. What are your views? |
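One way to picture the proposal, as a sketch only: the description key and both helper functions below are hypothetical and not part of any existing API. The checkpoint model is assumed to be a plain dict carrying an id and a last_modified timestamp.

```python
from datetime import datetime, timezone


def make_checkpoint_model(checkpoint_id, last_modified, description=""):
    """Build a checkpoint model; ``description`` is a hypothetical extra field.

    Managers for which reading a description is expensive (for example a
    file-based one) can leave it at the empty-string default.
    """
    return {
        "id": checkpoint_id,
        "last_modified": last_modified,
        "description": description,
    }


def menu_label(model):
    """Prefer the description for display; fall back to the timestamp."""
    return model["description"] or model["last_modified"].isoformat()
```

With this shape, a front end could show commit-message-style labels where a backend supplies them and plain timestamps everywhere else.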
It kind of depends on what people want to do. With extensions, I suppose it is still possible to write that externally, as another API endpoint for the time being, and iterate on it. We are about to release 4.0, so we will not change the API right now. |
Not a problem, @Carreau. You guys know how to do it right and when to do it, so I will leave it in your good hands and hope to see it implemented sometime soon. Meanwhile, I will keep adding my views, comments, and suggestions on other features and wishlist items from an architecture standpoint (which is what I do better), as we keep encountering them in our own use of this product. I thank you and all your other project members for taking the time to review issues such as this one, and also for providing this good open source project for the community. It is helpful. |
Not always, and our bandwidth is pretty limited. It would be (relatively) easy to get more experience than one of us on a specific part of the project. I would expect @ssanderson to be the most experienced on the content manager side. Draft implementations are always welcome; they at least often reveal issues in the implementation. We almost never get things right the first time.
No problem, I know we are not always super responsive and we can seem reluctant sometimes, maybe out of extra caution. |
@gnestor : was this implemented in 5.0? What are the next steps? Thanks! |
@KrishnaPG Are you still interested in contributing this feature? Marking as backlog for now... |
Closing due to inactivity... |
Presently the Checkpoint class has a list_checkpoints() method that is being used to verify whether a checkpoint exists for a given file, such as here:
In some scenarios, listing all checkpoints is too costly when we are only interested in checking the existence of at least one.
Perhaps the Checkpoint class should have a checkpoint_exists_for(path) kind of mechanism.
(PS: Originally posted here)