make repl_history.jl less racy #37015

Open
StefanKarpinski opened this issue Aug 12, 2020 · 11 comments · May be fixed by #45450
Labels
REPL Julia's REPL (Read Eval Print Loop)

Comments

@StefanKarpinski
Member

We're hearing from users on clusters with shared home directories that their ~/.julia/logs/repl_history.jl files are frequently corrupted. It would be good to make this less racy, either by doing some file locking with Pidfile or by some other mechanism.
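
For concreteness, a minimal sketch of the Pidfile direction (the helper name, the .pid lock-file path and the stale_age value are illustrative only, and whether pid-file locking behaves acceptably on NFS-mounted home directories is part of what would need validating):

```julia
# Hypothetical helper, not the current REPL implementation.
using Pidfile  # FileWatching.Pidfile in recent Julia versions

function append_history_entry(histpath::AbstractString, entry::AbstractString)
    # Hold an advisory pid-file lock while appending so that concurrent
    # sessions don't interleave partial entries; stale_age lets the lock of
    # a crashed session eventually be reclaimed.
    mkpidlock(histpath * ".pid"; stale_age = 10) do
        open(histpath; append = true) do io
            print(io, entry)
        end
    end
end
```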

@stevengj
Member

Just use git to manage the history. Or maybe PostgreSQL?

@stevengj added the REPL (Julia's Read Eval Print Loop) label Aug 17, 2020
@StefanKarpinski
Member Author

Requiring a database or git (dependencies we're trying to reduce and eventually eliminate) for REPL history seems a bit excessive. How would one use git for this?

@stevengj
Member

stevengj commented Aug 17, 2020

Sorry, I thought it was obvious that I was joking. 🤷 Next time I'll put in a 😉 .

@StefanKarpinski
Member Author

That went right over my head 😬

@notinaboat
Contributor

I have this problem a few times per day.

Others have the same problem with clusters: #36895 (comment)

The thing* I'm working on has two Julia processes on my Intel Mac and three more, each on a separate Raspberry Pi.
All the processes share a JULIA_DEPOT_PATH. The Raspberry Pis have this directory mounted via NFS.
(sometimes there is also a qemu vm in the mix)

I can't see file locking ever solving this problem. Even if there were a reliable cross-platform (and cross-filesystem?) way to do the locking, it wouldn't be great to have everyone waiting for one node in a cluster to release its file lock.

Perhaps it would be simpler for every process to log to logs/repl_history.d/$host.$pid.
Then, when loading the history, iterate through all of logs/repl_history.d/* in parallel, pushing items into the REPLHistoryProvider in timestamp order.

I imagine an eventual-consistency scheme for coalescing files, where at history-loading time:

  • any file loaded that was last modified more than one day ago is deleted
  • content from the deleted file is saved under the new $host.$pid file of the process doing the history loading.

This could result in duplication if two processes both decided to do deletion at the same time, but exact duplicates of timestamp and command could simply be ignored at load time.
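
A rough sketch of what load-time merging could look like, assuming the per-process logs/repl_history.d/$host.$pid layout above and the existing entry format (a "# time: ..." header line followed by indented code); the function name and the naive lexicographic sort on the timestamp line are placeholders:

```julia
# Hypothetical load-time merge across per-session history files.
function load_merged_history(dir::AbstractString)
    entries = Tuple{String,String}[]        # (timestamp line, full entry text)
    for file in readdir(dir; join = true)
        buf = String[]
        for line in eachline(file; keep = true)
            # a new "# time:" header starts the next entry
            if startswith(line, "# time:") && !isempty(buf)
                push!(entries, (first(buf), join(buf)))
                empty!(buf)
            end
            push!(buf, line)
        end
        isempty(buf) || push!(entries, (first(buf), join(buf)))
    end
    sort!(entries; by = first)    # timestamp order across sessions
    unique!(last, entries)        # ignore exact timestamp+command duplicates
    return map(last, entries)
end
```

Pushing the merged entries into the REPLHistoryProvider, and the coalescing/deletion step described above, would sit on top of this.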

@StefanKarpinski, if you would support a PR along these lines I would consider working on it. I can't quite bring myself to use a startup.jl hack like this: #36895 (comment)


* the "thing" is a test harness for an embedded industrial control system. The R'Pis simulate sensor inputs and record outputs from the control board. The Mac does test scenario orchestration, log recording and log analysis. Each Julia process has a REPL accessible via tmux for interactive use.

"""
# TODO: write-lock history file
seekend(hist.history_file)
print(hist.history_file, entry)
flush(hist.history_file)
nothing

@StefanKarpinski
Member Author

I do think something like that may be the way to go. I don't love having that $host part in there for people on local file systems, but I guess it doesn't do much harm. Another approach would be to just use a random slug in the file name instead of the host and pid.

@notinaboat
Contributor

Possible benefits of $host:

  • Later supporting --history-file={yes|no|local}. I can imagine sometimes wanting to keep separate history for different nodes in a heterogeneous cluster.
  • A new process could append to an existing file if the host matches and there is no process running for the pid.
  • Might aid debugging of "Which host is screwing up the REPL history?" in future.

But, just using logs/repl_history.d/$(UUIDs.uuid4()) would solve the problem at hand.
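
For illustration, a minimal sketch of that path construction (directory name as proposed above; the helper name is made up):

```julia
using UUIDs

# Hypothetical per-session history path under the depot's logs directory.
function session_history_path(depot::AbstractString = first(DEPOT_PATH))
    dir = joinpath(depot, "logs", "repl_history.d")
    mkpath(dir)
    return joinpath(dir, string(uuid4()) * ".jl")  # random slug, no host/pid
end
```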

I sympathise with the idea of not exposing people on local file systems to complexity that does not affect them. However, I can't think of any useful way to define "local file system" given the common use of various layers of virtualisation.

@StefanKarpinski
Member Author

The idea of --history-file={yes|no|local} is interesting. It suggests to me structuring the history paths like this:

  • logs/repl_history/$host/$pid.jl while recording
  • logs/repl_history/$host.jl once aggregated

I.e. doing the consolidation at the host level at most. For most users this will just be logs/repl_history/$host/*.jl and logs/repl_history/$host.jl with a single host name, which is not too bad. Why structure it like this? That way if you decide later to do --history-file=local you can start Julia with history that's only coming from the current node, regardless of how you ran it previously.

Other related issues: log aggregation & rotation. When does one aggregate logs from multiple PIDs? This has to be done carefully so that multiple Julias trying to do it at the same time don't step all over each other. There's also a question of log rotation — having an endlessly growing log file isn't ideal. Eventually it will be unmanageably large. It might make sense when aggregating logs to also split out really old stuff and only load it on demand.
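
To make the aggregation step concrete, a hedged sketch (directory layout as described above; guarding with a Pidfile lock is just one option, and deciding when a per-pid file is safe to fold in is glossed over):

```julia
using Pidfile

# Hypothetical host-level aggregation: fold logs/repl_history/$host/*.jl
# into logs/repl_history/$host.jl. A real implementation would skip files
# still owned by live sessions and handle rotation of very old history.
function aggregate_host_history(histdir::AbstractString, host::AbstractString = gethostname())
    piddir   = joinpath(histdir, host)
    hostfile = joinpath(histdir, host * ".jl")
    isdir(piddir) || return hostfile
    mkpidlock(hostfile * ".pid"; stale_age = 60) do
        open(hostfile; append = true) do out
            for f in readdir(piddir; join = true)
                write(out, read(f))   # append the whole per-pid file
                rm(f)
            end
        end
    end
    return hostfile
end
```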

@Moelf
Contributor

Moelf commented Nov 12, 2024

Do we know how bash/zsh/fish/ipython deal with this?

@giordano
Contributor

Do we know how bash/zsh/fish/ipython deal with this?

In zsh you need to set the sharehistory option to let multiple sessions handle this situation nicely, but the out-of-the-box experience is worse than Julia's. I'm not sure bash has any support for handling multiple sessions writing the history file. I don't know how sharehistory is actually implemented, though, nor whether it's any better than Julia's parallel writing.

@caleb-allen

I appreciate the approach taken by fish shell, documented on this issue.

fish doesn't implement any automatic synchronization mechanism because of the added performance penalties and complexity, and instead has a history merge command:

Immediately incorporates history changes from other sessions. Ordinarily fish ignores history changes from sessions started after the current one. This command applies those changes immediately.

It strikes me as a nice balance. It's a simple command which exposes the feature for those who need it, but it avoids the complexity that would come with an automatic feature.
