NBectomy #891

davidbrochart · 2022-06-24T14:57:04Z

Jupyter Server should not special-case notebook documents. There should only be two models, as far as the contents API is concerned: directories and files. A notebook is a file.

kevin-bates · 2022-06-24T19:07:21Z

I don't have any particular affinity for the Contents service, but do have an affinity for our users, and believe this is too aggressive, especially given the lack of a deprecation cycle. In essence, this pull request is removing notebook validation and "notarization", which has been a given across every client application and custom ContentsManager implementation. Are we proposing that it's now up to each client application to perform validation (despite the fact that validation should be performed on the server)?

One path forward, which was touched on in yesterday's Server meeting, is to keep the existing classes (thereby preserving existing applications and customizations), but perhaps inject a new base class (e.g., BaseContentsManager) consisting of everything in ContentsManager except validation and notarization and from which ContentsManager derives. You'd then need a separate file-based ContentsManager that derives from BaseContentsManager and would probably need to utilize a separate mixin (sans any validation) that consists primarily of today's FileContentsManager. The default ContentsManager could then either remain AsyncFileContentsManager or we make the new hierarchy (that skips validation/notary) the default. Any exiting configurations that bring their own ContentsManager will continue to override the default and continue to work.

Another approach may be to make validation configurable (and optional). For this, I think you'd need to let it be enabled by default, but could deprecate the default value such that the future default is to skip validation. End users and custom ContentsManager implementations could still override the (new) default to retain validation. This configurable could be an enumeration allowing for combinations of validation and/or notarization.

I agree that it would be a nice simplification to only recognize files (and directories) and also agree that validation is probably over-performed. But I also believe notebook validation is core to a notebook server and find it a bit disturbing that we'd entertain the notion that invalid notebooks could be persisted without any idea they're invalid.

davidbrochart · 2022-06-24T20:37:16Z

Thanks Kevin for the feedback.
You're right, this is a bit aggressive, but I wanted to show that things still work when removing everything related to notebooks from the server (this PR works with JupyterLab). I was pretty confident it would be the case because Jupyverse already does that.
I'll consider the options you gave, but keep in mind that the contents API was basically removed with the latest RTC improvements: all the contents go through the YDoc websocket. If we take the direction that RTC will be the default (and that you will be "collaborating with yourself" without noticing it in a single user scenario), then maybe we shouldn't invest too many efforts in the REST API anyway.
Actually, this simplification is motivated by an RTC issue.

kevin-bates · 2022-06-24T21:32:02Z

Thanks @davidbrochart - that explanation is helpful for my understanding.

keep in mind that the contents API was basically removed with the latest RTC improvements: all the contents go through the YDoc websocket

Removed from where? Is Lab (and all its extensions) moving to the RTC interfaces to persist and retrieve content? I must be missing something here - sorry.

I wanted to show that things still work when removing everything related to notebooks from the server

I can remove any core piece of internal functionality (like validation) and demonstrate that clients of that software still work. In fact, it may never fail again!

davidbrochart · 2022-06-24T21:49:09Z

Removed from where?

Sorry, "removed" is not the right word, but JupyterLab in RTC mode doesn't use the REST contents API to load and save documents. It uses the YDocWebSocketHandler.
It's not here yet, but in the future, I can see RTC being the only supported mode, meaning that our document data structures will be implemented using CRDTs.

With RTC, validation becomes a shared responsibility, it's not centralized in the server. Actually, from the CRDT point of view, the server is just another client. Every client is responsible for emitting valid document changes.

kevin-bates · 2022-06-24T22:02:15Z

With RTC, validation becomes a shared responsibility, it's not centralized in the server. Actually, from the CRDT point of view, the server is just another client. Every client is responsible for emitting valid document changes.

I see. Thanks.

echarles · 2022-06-25T06:33:10Z

this pull request is removing notebook validation and "notarization"

With RTC, validation becomes a shared responsibility, it's not centralized in the server.

Can you give more context on how you see validation, trust and notarization features being kept in the Jupyter RTC world?

davidbrochart · 2022-06-25T07:11:02Z

I think that in RTC, document validity is almost "built-in": a client which modifies e.g. a notebook does it in a consistent way. You might say that it was already the case before RTC, but we still validated the whole notebook on save. I'm not sure we should do that in RTC for performance reasons, since the document is automatically saved after each change. Also we currently have an architecture where all the updates go to the server, before being forwarded to other clients, but there is no guarantee that it will always be the case. For instance, in JupyterLite every client could be directly talking to every other ones.
As far as trust is concerned, there are discussions about whether we should keep it, but this seems like a fragile mechanism anyway, and it's kind of broken currently. @hbcarlos and @fcollonval know more about it.
This PR also removes checkpoints, since they will be replaced with Ystores in RTC.
Let's keep discussing, this PR is about that.

echarles · 2022-06-25T07:54:00Z

IMHO Notebook trust and security is something important that should be upgraded whatever feature is considered in a consistent and predicable way . Adding @Carreau to this thread as he may have thoughts on that.

Carreau · 2022-06-27T07:58:41Z

The trust model is really to prevent someone from forging and send you a notebook with potentially malicious code. I agree that once you passed the RTC barrier and are collaborating with other folks you do "trust" them.

But ho boy with respect to crafting malicious notebook that can infect each other at load time, just look at the history of excel viruses spreading themselves via macros, and transpose it to notebooks.

We went through many iteration of the trust mechanism, including and not limited to trusting each cell independently. But we figured out how to make a notebook, where simply the ordering of the cell would change its state from harmless to dangerous. Or we found out that some service at the time were sensitive to script tag injections in prompt numbers.

Personally with what I have witnessed in term of notebook capabilities, if signing is removed and not replaced by another mechanism, there is no way I ever use jupyter to open a notebook I received by email, nor ever open a notebook from a cloned github repository or similar. For me ipynb without signature is worse than executing a .exe downloaded on a pirate website.

So really I don't think we should load/save notebook w/o signing.

As for the simplification and treating notebooks vs other files in a more similar matter, I'm all for it.

Another concern for me is if removing the trust mechanism will make some companies back out of jupyter.

cc @rcthomas, @rpwagner @jweill-aws

Carreau · 2022-06-27T08:01:12Z

Oh, this week-end I read about Chesterton's Fence, this may be a good example.

Carreau · 2022-06-27T08:11:44Z

And yeah, it took me 8 minutes to pwned jupyverse, including fixing ImportError: cannot import name 'User' from 'fps_auth.models' (.../jupyverse/plugins/auth/fps_auth/models.py):

davidbrochart · 2022-06-27T08:17:23Z

I don't understand what you mean, but if you found something wrong, do you mind opening an issue in jupyverse?

Carreau · 2022-06-27T09:06:00Z

Sure, jupyter-server/jupyverse#186

But that seem like a fundamental piece of the Jupyter Security model. If I send you a file that contains javascript, the javascript should not execute on load. Unless of course you are the author of the javascript or trust it.

Which is all what the notebook notary deal with, if the signature does not match, and the notebook is potentially tempered with, it will show at untrusted. Which you can't do by just loading the Json.

But for me that raise the question (and a concern): What is your security model for real-time collaboration ? Because if things like the above are not understood or taken into consideration, i'm scared of the security implications of RTC...

davidbrochart · 2022-06-27T09:17:51Z

Thanks for opening the issue. Jupyverse doesn't implement the current trust mechanism. As it is probably going to change with RTC, we preferred to wait for things to settle.

But for me that raise the question (and a concern): What is your security model for real-time collaboration ? Because if things like the above are not understood or taken into consideration, i'm scared of the security implications of RTC...

There is @trungleduc's PR which deals with rendering widgets in RTC. I think it is a solid base that should generalize to all outputs, and become the new trust mechanism.

davidbrochart · 2022-06-27T09:40:35Z

including fixing ImportError: cannot import name 'User' from 'fps_auth.models' (.../jupyverse/plugins/auth/fps_auth/models.py)

BTW, you didn't mention this in your issue, mind opening a new issue in jupyverse?

Carreau · 2022-06-27T09:43:46Z

that's just on the released version. latest git seem fine now.

…

On Mon, Jun 27, 2022 at 11:40 AM David Brochart ***@***.***> wrote: including fixing ImportError: cannot import name 'User' from 'fps_auth.models' (.../jupyverse/plugins/auth/fps_auth/models.py) BTW, you didn't mention this in your issue, mind opening a new issue in jupyverse? — Reply to this email directly, view it on GitHub <#891 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AACR5T7CWMEP2OJC5DTVWV3VRFZKBANCNFSM5ZYDVJZA> . You are receiving this because you were mentioned.Message ID: ***@***.***>

davidbrochart · 2022-06-27T09:48:45Z

that's just on the released version. latest git seem fine now.

I could not reproduce, mind opening an issue?

Carreau · 2022-06-27T09:53:22Z

Sure: jupyter-server/jupyverse#187

minrk · 2022-07-08T17:28:02Z

I think we're a pretty long way from being ready for this. JupyterLab isn't the only client for Jupyter server to support, and removing the single most fundamental feature of the server (serving notebooks via the contents API) without deprecating for at least one, and probably multiple major release cycles doesn't seem prudent to me.

At the very least, an equivalent or improved trust model must be fully supported before this should even be deprecated.

Zsailer · 2023-02-16T16:56:18Z

Hey @davidbrochart, we're triaging PRs in the Jupyter Server meeting today. Because there hasn't been activity on this PR in a little while, I'm going to close it. Thanks for your work here and feel free to re-open anytime if this is in-progress.

davidbrochart marked this pull request as draft June 24, 2022 14:57

davidbrochart added the enhancement label Jun 24, 2022

Remove notebook file type

47edc5a

davidbrochart force-pushed the file_type branch from a43b5ac to 47edc5a Compare June 24, 2022 15:14

Fix/remove tests

c6f82ea

davidbrochart force-pushed the file_type branch from 58791e2 to c6f82ea Compare June 24, 2022 16:32

davidbrochart marked this pull request as ready for review June 24, 2022 16:43

davidbrochart mentioned this pull request Jun 24, 2022

Register new file types #884

Closed

davidbrochart mentioned this pull request Jun 27, 2022

Render widgets on all clients in collaborative mode jupyterlab/jupyterlab#11494

Closed

davidbrochart mentioned this pull request Jun 28, 2022

RTC and document validation jupyterlab/jupyterlab#12743

Open

Zsailer closed this Feb 16, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

NBectomy #891

NBectomy #891

davidbrochart commented Jun 24, 2022 •

edited

Loading

kevin-bates commented Jun 24, 2022

davidbrochart commented Jun 24, 2022

kevin-bates commented Jun 24, 2022

davidbrochart commented Jun 24, 2022

kevin-bates commented Jun 24, 2022

echarles commented Jun 25, 2022

davidbrochart commented Jun 25, 2022

echarles commented Jun 25, 2022

Carreau commented Jun 27, 2022

Carreau commented Jun 27, 2022

Carreau commented Jun 27, 2022

davidbrochart commented Jun 27, 2022

Carreau commented Jun 27, 2022

davidbrochart commented Jun 27, 2022 •

edited

Loading

davidbrochart commented Jun 27, 2022

Carreau commented Jun 27, 2022 via email

davidbrochart commented Jun 27, 2022

Carreau commented Jun 27, 2022

minrk commented Jul 8, 2022

Zsailer commented Feb 16, 2023

NBectomy #891

NBectomy #891

Conversation

davidbrochart commented Jun 24, 2022 • edited Loading

kevin-bates commented Jun 24, 2022

davidbrochart commented Jun 24, 2022

kevin-bates commented Jun 24, 2022

davidbrochart commented Jun 24, 2022

kevin-bates commented Jun 24, 2022

echarles commented Jun 25, 2022

davidbrochart commented Jun 25, 2022

echarles commented Jun 25, 2022

Carreau commented Jun 27, 2022

Carreau commented Jun 27, 2022

Carreau commented Jun 27, 2022

davidbrochart commented Jun 27, 2022

Carreau commented Jun 27, 2022

davidbrochart commented Jun 27, 2022 • edited Loading

davidbrochart commented Jun 27, 2022

Carreau commented Jun 27, 2022 via email

davidbrochart commented Jun 27, 2022

Carreau commented Jun 27, 2022

minrk commented Jul 8, 2022

Zsailer commented Feb 16, 2023

davidbrochart commented Jun 24, 2022 •

edited

Loading

davidbrochart commented Jun 27, 2022 •

edited

Loading