Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a new attachment property for cells. #21

Merged
merged 4 commits into from
Mar 8, 2016
Merged

Add a new attachment property for cells. #21

merged 4 commits into from
Mar 8, 2016

Conversation

julienr
Copy link
Contributor

@julienr julienr commented Oct 20, 2015

This is a proposal to add a new 'attachments' property to cells to allow for inline medias.
The primary use case is to be able to add images to markdown cells and store them directly in the notebook (as opposed to having them on the filesystem).

References :
original issue : jupyter/notebook#613
corresponding notebook PR : jupyter/notebook#621

@minrk
Copy link
Member

minrk commented Oct 20, 2015

From the proposal, it looks like attachments should only be present on Markdown cells, not all cells.

@julienr
Copy link
Contributor Author

julienr commented Oct 20, 2015

The argument for having attachments for all cells is so that they are not removed if the user changes the cell type by mistake (this often happens to me).

@ellisonbg
Copy link
Contributor

@minrk in general I like this idea. I do think it would help to have the attachments on all cell types. How would we handle this from a versioning perspective?

@minrk
Copy link
Member

minrk commented Oct 20, 2015

Okay, all cells makes some sense, then. Losing on accidental cell type change isn't a good enough argument for me, because this can be handled in the notebook js without adding it to the nbformat. I don't love the idea of adding another always-empty attribute to each cell, especially when it should always be empty on all cell types other than markdown. What is the use case for other cell types?

New attributes should bump the minor version of the nbformat. This should work fine, but we should do some careful testing that the UI in earlier versions behaves correctly.

@jhamrick
Copy link
Member

👍 I think this is a great idea.

I could see a use case for having attachments in raw cells, simply for the same reason as markdown cells (e.g. you might be creating a raw cell that has rst in it).

For code cells, this might be a way to allow people to store data in the notebooks, which I know is a feature many people want. This might be more complex than what we actually want, though, because it would probably require that the kernel knows how to find that data from the notebook data structure -- unless it was something that the notebook frontend sent along?

@julienr
Copy link
Contributor Author

julienr commented Oct 21, 2015

@jhamrick I have been thinking about the same thing for code cell. It would be nice to have a %get_attachment magic or similar to build self-contained notebooks. There would need to be some warning about attaching big files this way, otherwise the notebook might grow too large for the JSON parser to handle.

@minrk
Copy link
Member

minrk commented Oct 21, 2015

I wouldn't ever encourage storing data in the notebook for use in code. Storing the data files next to the notebook will always be better in many different ways. For this reason, I would oppose a %get_attachment magic in IPython, since I think it would encourage bad practice, even if it were possible.

Now that I think about it, perhaps we should be working on an interface for uploading files to the contents service and using relative URLs, rather than embedding attachments in the JSON.

@ellisonbg
Copy link
Contributor

Yeah, I agree with @minrk that the notebook shouldn't be used to store data
that the kernel needs to use.

On Wed, Oct 21, 2015 at 10:24 AM, Min RK [email protected] wrote:

I wouldn't ever encourage storing data in the notebook, especially for
use in code. Storing the data files next to the notebook will always be
better in many different ways. For this reason, I would always oppose a
%get_attachment magic in IPython, since I think it would encourage bad
practice, even if it were possible.

Now that I think about it, perhaps we should be working on an interface
for uploading files to the contents service and using relative URLs, rather
than embedding attachments in the JSON.


Reply to this email directly or view it on GitHub
#21 (comment).

Brian E. Granger
Associate Professor of Physics and Data Science
Cal Poly State University, San Luis Obispo
@ellisonbg on Twitter and GitHub
[email protected] and [email protected]

@minrk
Copy link
Member

minrk commented Feb 8, 2016

Sorry this slipped off the review stack. @julienr do you have some time to come back to it? If not, we can finish it up:

This is what I think is left to do:

  • remove attachments from code cells (as mentioned, UI can avoid data loss without adding it to metadata in file format), but leave them on markdown + raw
  • bump nbformat version to 4.1
  • add attachments key to cells when opening 4.0 notebooks, unless we want to define it as an optional key.

@julienr
Copy link
Contributor Author

julienr commented Feb 8, 2016

@minrk Yes I'll find the time to push it. I'll start by rebasing everything onto the current master

@julienr
Copy link
Contributor Author

julienr commented Feb 9, 2016

@minrk How does bumping nbformat to 4.1 work ? Should I just create a new v4.1 directory based on v4 + the new attachments stuff ?

@minrk
Copy link
Member

minrk commented Feb 9, 2016

No, just update nbformat_minor in v4.nbbase to 1. We don't need separate minor implementations, since they are by definition backward-compatible.

@julienr
Copy link
Contributor Author

julienr commented Feb 9, 2016

I think it makes sense to have attachments be an optional attribute. They are most likely to be a rare
occurrence. What do you think ?

@minrk
Copy link
Member

minrk commented Feb 9, 2016

It's a trade-off:

  • optional:
    • + reduces flood of empty dicts all over the place (we already have this with metadata), but consumers can always use cell['attachments']
    • - all consumers must do cell.get('attachments', {})), which is often omitted if it is technically allowed to be absent, but the default implementation typically stores the empty dict.

@takluyver
Copy link
Member

I think I'd prefer optional: when people save notebooks in format 4.1 for the first time, people with notebooks in version control are not going to thank us if their diff includes attachments: {} added to every cell.

@minrk
Copy link
Member

minrk commented Feb 9, 2016

I think I'd prefer optional: when people save notebooks in format 4.1 for the first time, people with notebooks in version control are not going to thank us if their diff includes attachments: {} added to every cell.

Should we pop attachments before writing if defined and empty? Or do that in the js client?

@takluyver
Copy link
Member

Hmm, I don't have a strong preference on that. If we pop them before writing in nbformat, do we recreate them on load? i.e. does it become another difference between the format of on-disk and in-memory notebooks? My inclination is that we should minimise those differences if possible, but then we do already have the one difference in multi-line strings, and we probably can't get rid of that, so I don't think it's a major problem if we introduce another one.

@julienr
Copy link
Contributor Author

julienr commented Feb 12, 2016

In the current PR, every cell in the js client has an attachments attribute that defaults to {}. At the toJSON conversion time, only TextCell have their attachments attribute written to the on-disk notebook.

This simplify the JS code (no need for cell.attachments !== undefined everywhere), but of course this can be modified.


Cell attachments
----------------
Markdown and raw cells can have a number of attachments, typically inline
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add a clear, explicit statement that attachments may be undefined, and add a

.. versionadded:: 4.1

@minrk
Copy link
Member

minrk commented Feb 17, 2016

no need for cell.attachments !== undefined everywhere

@julienr that doesn't solve the problem in general, because unless nbformat itself makes some guarantee, other consumers (e.g. nbconvert) must all implement their own handling of attachments being undefined or not.

I think we are pretty much set here, with the decision:

  • attachments may be undefined
  • empty attachments will be left as-is, according to the higher level writer (js).

We can leave the handling of whether to include empty attachments (sounds like not) to the js.

@minrk minrk added this to the 4.1 milestone Mar 8, 2016
@minrk minrk changed the title [WIP] Add a new attachment property for cells. Add a new attachment property for cells. Mar 8, 2016
minrk added a commit that referenced this pull request Mar 8, 2016
Add a new attachment property for cells.
@minrk minrk merged commit 7ab61f3 into jupyter:master Mar 8, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants