Fix build and populate index in parallel #86
Conversation
Some rough measurements:
I tried to drop the caches before the third measurement, but I'm not sure it worked (ZFS on Linux). Tested on a Celeron J1900 on a library with 1 200 directories and 12 000 songs.
Force-pushed from 8f5d1e7 to 3c07046
```diff
@@ -322,8 +341,8 @@ pub fn populate(db: &DB) -> Result<()> {
 		Regex::new(&settings.index_album_art_pattern)?
 	};

-	let (directory_sender, directory_receiver) = channel();
-	let (song_sender, song_receiver) = channel();
+	let (directory_sender, directory_receiver) = crossbeam_channel::unbounded();
```
Switched to `crossbeam-channel` because its `Sender` is `Sync` (and it's also faster and more reliable than the `std` one).
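As an aside, a minimal standalone sketch (not from this PR, assuming `rayon` and `crossbeam-channel` as dependencies) of why a `Sync` sender matters when fanning work out across threads:

```rust
use rayon::prelude::*;

fn main() {
    let (sender, receiver) = crossbeam_channel::unbounded();

    // crossbeam's Sender is Sync, so every rayon worker can use the
    // same sender by reference; std's mpsc Sender would have to be
    // cloned into each closure instead.
    (0..100).into_par_iter().for_each(|i| {
        sender.send(i).expect("receiver should still be alive");
    });

    drop(sender); // close the channel so the drain below terminates
    let received: Vec<i32> = receiver.iter().collect();
    assert_eq!(received.len(), 100);
}
```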
```rust
let song_tags = song_files
	.into_par_iter()
	.filter_map(song_metadata)
	.collect::<Vec<_>>();
```
We could avoid the allocation here by doing a `fold`.
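For reference, a rough sketch of what a fold-based version might look like with rayon (not part of this PR; the `song_files`/`song_metadata` stand-ins below are hypothetical). It accumulates into one `Vec` per worker and merges them at the end instead of going through a single `collect`:

```rust
use rayon::prelude::*;

// Stand-ins for the song_files / song_metadata names used above.
fn song_metadata(path: String) -> Option<String> {
    path.ends_with(".mp3").then(|| format!("tags for {}", path))
}

fn main() {
    let song_files: Vec<String> =
        (0..1000).map(|i| format!("song{}.mp3", i)).collect();

    // fold accumulates into one Vec per rayon worker; reduce then
    // merges the per-worker Vecs into the final result.
    let song_tags: Vec<String> = song_files
        .into_par_iter()
        .filter_map(song_metadata)
        .fold(Vec::new, |mut acc, tags| {
            acc.push(tags);
            acc
        })
        .reduce(Vec::new, |mut a, mut b| {
            a.append(&mut b);
            a
        });

    assert_eq!(song_tags.len(), 1000);
}
```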
Good enough for me and very readable in this form. I like this a lot better than what it was before.
```rust
Ok(())

sub_directories
	.into_par_iter()
	.map(|sub_directory| self.populate_directory(Some(path), &sub_directory))
```
For clarity, could we use `.for_each()` here instead of map and collect? I'm also unsure how this `collect()` compacts the Iter of results into a single value.
I think it's the same as the similar `Iterator` method. It propagates one of the `Err` results (note how you were using `?` before, exiting at the first error). With my other error printing change, we should get an error if the channel was closed. It probably doesn't matter much, but it seemed clearer to exit like this than to not handle the errors.

I'll add a comment here.
See https://doc.rust-lang.org/stable/rust-by-example/error/iter_result.html#fail-the-entire-operation-with-collect, it's a lesser-known trick.
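A tiny standalone demonstration of the trick, using std's `impl FromIterator<Result<A, E>> for Result<V, E>`: collecting stops at the first `Err` and returns it.

```rust
fn main() {
    let ok: Result<Vec<i32>, String> =
        vec![Ok(1), Ok(2), Ok(3)].into_iter().collect();
    assert_eq!(ok, Ok(vec![1, 2, 3]));

    let err: Result<Vec<i32>, String> =
        vec![Ok(1), Err("boom".to_string()), Ok(3)].into_iter().collect();
    assert_eq!(err, Err("boom".to_string()));

    // With unit items, as in the PR, collecting into Result<(), _>
    // simply propagates the first error and otherwise yields Ok(()).
    let unit: Result<(), String> =
        vec![Ok(()), Ok(()), Ok(())].into_iter().collect();
    assert_eq!(unit, Ok(()));
}
```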
```rust
.iter()
.par_bridge()
.map(|target| updater.populate_directory(None, target.as_path()))
.collect::<Result<()>>()?;
```
This looks like the same `collect` magic! Does something implement `Into<Result>` for `Iter<Result>`?
Same as above: it returns an error if one of the mounts failed.
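The same short-circuiting `collect` also exists for rayon's parallel iterators; a minimal sketch (names hypothetical, assuming `rayon` as a dependency):

```rust
use rayon::prelude::*;

// Hypothetical stand-in for populate_directory: fail for one target.
fn check_mount(n: i32) -> Result<(), String> {
    if n == 7 {
        Err(format!("mount {} failed", n))
    } else {
        Ok(())
    }
}

fn main() {
    let targets: Vec<i32> = (0..16).collect();

    // par_bridge turns a plain iterator into a parallel one, and
    // collecting into Result<(), _> propagates one of the errors.
    let result: Result<(), String> = targets
        .iter()
        .par_bridge()
        .map(|n| check_mount(*n))
        .collect();

    assert_eq!(result, Err("mount 7 failed".to_string()));
}
```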
Left one minor question before merging, but this is a great change! Very promising numbers, although I assume it might be especially beneficial on lower-end CPUs.
I think it's beneficial on slower drives because you get more IO operations at once. Even mechanical drives are fastest at around 8 concurrent IOs. On my CPU this still doesn't go above 50-80% overall usage (over the four cores), even with ZFS trying to decompress the files.
Hello! So yesterday I added my collection of about 200k mp3 files, at the time of opening #85. Unfortunately, after a couple of hours it 'timed out', so it indexed just a few folders of it. I clicked on scan now again once I restarted the container, but it is the same CPU usage as yesterday, around 3% max. The music collection is on ZFS on Linux (LZ4 compression) with an i5-4590. Maybe this will fix it. I'm eagerly waiting for a new build. :)
Even if you closed the browser, the scan should have run in the background (it runs periodically anyway). I'm not familiar with LXC; do you have the logs? They're normally printed to the console, but they might get redirected somewhere else if you run it in the background. There are no logs for files scanned successfully, but you might see some errors. You can also
😄
It doesn't write to a file, but to the console. On my distro I see them with
There is a log entry at the end of each scan, you've been using it for your benchmarks @lnicola :D On Linux, running without the
As a datapoint @eisengrau, I run Polaris on a Raspberry Pi 3 with about 80k songs on a USB HDD, and the initial indexing (cold cache) takes about 10-15 minutes IIRC.
@agersant I'm curious what effect this PR has on that.
Ok, I ran `polaris -f` and could now see what has stopped the scanning:

10:09:26 [ERROR] [src/index/mod.rs:90] Error while updating index: Invalid directory path

Could it be directories/file names with accented/invalid characters?
I think I fixed an issue causing it to give up on some errors, can you try with the latest version? Only UTF-8 works.
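(For context, a hypothetical sketch of the failure mode, not Polaris code: in Rust, `Path::to_str` only succeeds when the name is valid UTF-8, so accented names are fine as long as the filesystem stores them as UTF-8.)

```rust
use std::path::Path;

// Hypothetical helper: converting a path to &str fails for names
// that are not valid UTF-8, which can surface as a path error.
fn require_utf8(path: &Path) -> Result<&str, String> {
    path.to_str()
        .ok_or_else(|| format!("Invalid directory path: {}", path.display()))
}

fn main() {
    // An accented name is fine when it is valid UTF-8.
    assert!(require_utf8(Path::new("Motörhead")).is_ok());
}
```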
Should I git clone and recompile again? Previously I just unpacked the latest archive and used that.
Clone,
Fixes #85
Fixes #84
CC @eisengrau