
Fix build and populate index in parallel #86

Merged
agersant merged 2 commits into agersant:master from lnicola:parallel-index on Jul 21, 2020

Conversation

lnicola (Contributor) commented Jul 20, 2020

Fixes #85
Fixes #84

CC @eisengrau

lnicola (Contributor Author) commented Jul 20, 2020

Some rough measurements:

# baseline (before this PR)
[INFO] [src/index/update.rs:27] Library index update took 322.893 seconds

# rayon within a directory
[INFO] [src/index/update.rs:28] Library index update took 172.079 seconds

# rayon both within a directory and across directories
[INFO] [src/index/update.rs:28] Library index update took 49.357 seconds

# same, warm cache
[INFO] [src/index/update.rs:28] Library index update took 14.523 seconds

I tried to drop the caches before the third measurement, but I'm not sure it worked (ZFS on Linux). Tested on a Celeron J1900 on a library with 1 200 directories and 12 000 songs.

@lnicola force-pushed the parallel-index branch 3 times, most recently from 8f5d1e7 to 3c07046 on July 20, 2020 13:50
@@ -322,8 +341,8 @@ pub fn populate(db: &DB) -> Result<()> {
 		Regex::new(&settings.index_album_art_pattern)?
 	};

-	let (directory_sender, directory_receiver) = channel();
-	let (song_sender, song_receiver) = channel();
+	let (directory_sender, directory_receiver) = crossbeam_channel::unbounded();
lnicola (Contributor Author):

Switched to crossbeam-channel because its Sender is Sync (it's also faster and more reliable than the std one).
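A minimal sketch of why the Sync sender matters, assuming many rayon workers feed a single channel (the names below are illustrative, not taken from the diff):

use crossbeam_channel::unbounded;
use rayon::prelude::*;

fn main() {
    let (sender, receiver) = unbounded();
    // crossbeam_channel::Sender is Sync, so one sender can be shared by
    // reference across all rayon worker threads; std::sync::mpsc::Sender
    // (at the time of this PR) was Send but not Sync.
    (0..100u32).into_par_iter().for_each(|n| {
        sender.send(n).expect("receiver dropped");
    });
    drop(sender); // close the channel so the receiving iterator terminates
    let total: u32 = receiver.iter().sum();
    assert_eq!(total, 4950);
}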

let song_tags = song_files
.into_par_iter()
.filter_map(song_metadata)
.collect::<Vec<_>>();
lnicola (Contributor Author):
We could avoid the allocation here by doing a fold.
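For illustration, the fold-based alternative in rayon could take the shape sketched below (song_files and song_metadata as in the diff above; this shape is an assumption, not code from the PR):

// Each worker folds matches into its own accumulator, and the per-worker
// accumulators are then merged, instead of collect() assembling one Vec.
let song_tags = song_files
    .into_par_iter()
    .filter_map(song_metadata)
    .fold(Vec::new, |mut acc, tag| {
        acc.push(tag);
        acc
    })
    .reduce(Vec::new, |mut a, mut b| {
        a.append(&mut b);
        a
    });

The real saving would come from doing something with each batch inside the fold body (for example flushing it to the database) rather than materializing a single large Vec at the end.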

agersant (Owner), Jul 21, 2020:
Good enough for me and very readable in this form. I like this a lot better than what it was before.

Ok(())
sub_directories
.into_par_iter()
.map(|sub_directory| self.populate_directory(Some(path), &sub_directory))
agersant (Owner), Jul 21, 2020:
For clarity, could we use .for_each() here instead of map and collect? I'm also unsure how this collect() compacts the Iter of results into a single value.

lnicola (Contributor Author), Jul 21, 2020:
I think it's the same as the similar Iterator method: it propagates one of the Err results (note how you were using ? before, exiting at the first error). With my other error-printing change, we should get an error if the channel was closed. It probably doesn't matter much, but exiting like this seemed clearer than not handling the errors.

I'll add a comment here.
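For reference, the mechanism both comments refer to is FromIterator<Result<T, E>> for Result<V, E>: collecting an iterator of Results short-circuits at the first Err, and rayon's parallel collect likewise propagates one of the errors (in an unspecified order). A standalone sketch:

fn check_all(results: Vec<Result<(), String>>) -> Result<(), String> {
    // Result implements FromIterator over Results: collect stops at the
    // first Err it encounters and returns it; otherwise it yields Ok(()).
    results.into_iter().collect()
}

fn main() {
    assert!(check_all(vec![Ok(()), Ok(())]).is_ok());
    let failing = vec![Ok(()), Err("channel closed".to_string()), Ok(())];
    assert_eq!(check_all(failing), Err("channel closed".to_string()));
}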


.iter()
.par_bridge()
.map(|target| updater.populate_directory(None, target.as_path()))
.collect::<Result<()>>()?;
agersant (Owner), Jul 21, 2020:
This looks like the same collect magic! Does something implement Into<Result> for Iter<Result>?

lnicola (Contributor Author):
Same as above, it returns an error if one of the mounts failed.

agersant (Owner) commented:
Left one minor question before merging, but this is a great change! Very promising numbers, although I assume the benefit is especially large on a lower-end CPU.

lnicola (Contributor Author) commented Jul 21, 2020

I think it's beneficial on slower drives because you get more I/O operations in flight at once; even mechanical drives are fastest at around 8 concurrent I/Os.

On my CPU this still doesn't go above 50-80% overall usage (across the four cores), even with ZFS decompressing the files.

eisengrau commented:

Hello,

So yesterday I added my collection of about 200k mp3 files, at the time of opening #85. Unfortunately, after a couple of hours it 'timed out', so it indexed just a few folders. Once I restarted the container, I clicked on "scan now" again, and it shows the same CPU usage as yesterday, around 3% max. The music collection is on ZFS on Linux (LZ4 compression) with an i5-4590.

Maybe this will fix it. I'm eagerly waiting for a new build. :)

lnicola (Contributor Author) commented Jul 21, 2020

Even if you closed the browser, the scan should have kept running in the background (it runs periodically anyway).

I'm not familiar with LXC; do you have the logs? They're normally printed to the console, but they might get redirected somewhere else if you run it in the background. There are no logs for files scanned successfully, but you might see some errors. You can also strace the process to get an indication of progress.

> ZFS on Linux

😄

eisengrau commented:

It's definitely doing something:

[two attached screenshots]

I'll get back and try to find some logs once it stops. Does the polaris binary generate any, or does it just run in the background once launched?

lnicola (Contributor Author) commented Jul 21, 2020

> teddybear

😄

> I'll get back and try to find some logs once it stops. Does the polaris binary generate any, or does it just run in the background once launched?

It doesn't write to a file, only to the console. On my distro I see the logs with journalctl, and in Docker docker logs probably works, but I'm not sure what Debian and LXC do.

agersant (Owner) commented Jul 21, 2020

> There are no logs for files scanned successfully

There is a log entry at the end of each scan; you've been using it for your benchmarks, @lnicola :D

On Linux, running without the -f flag makes the output go to a polaris.log file under your $XDG_CACHE_HOME directory. Running with -f keeps the process in the foreground and makes it log to the console.

@agersant agersant merged commit 17976dc into agersant:master Jul 21, 2020
@lnicola lnicola deleted the parallel-index branch July 21, 2020 08:47
agersant (Owner) commented:
As a datapoint, @eisengrau: I run Polaris on a Raspberry Pi 3 with about 80k songs on a USB HDD, and the initial indexing (cold cache) takes about 10-15 minutes IIRC.

lnicola (Contributor Author) commented Jul 21, 2020

@agersant I'm curious what effect this PR has on that.

eisengrau commented:

OK, I ran polaris -f and could now see what stopped the scanning:

10:09:26 [ERROR] [src/index/mod.rs:90] Error while updating index: Invalid directory path

Could it be directories or file names with accented or invalid characters?

lnicola (Contributor Author) commented Jul 21, 2020

I think I fixed an issue that caused it to give up on some errors; can you try the latest version?

> Could it be directories or file names with accented or invalid characters?

Only UTF-8 works.

eisengrau commented:

Should I git clone and recompile again? Previously I just unpacked the latest archive and used that.

lnicola (Contributor Author) commented Jul 22, 2020

Clone the repository, run cargo build --release, and take the binary from target/release/polaris.
