
Suspected live-lock issue in lxd.db under contention #2398

Closed · leehambley opened this issue Sep 19, 2016 · 52 comments

@leehambley

leehambley commented Sep 19, 2016

Required information

  • Distribution:
root@virthost ~ # uname -a
Linux virthost 4.4.0-34-generic #53-Ubuntu SMP Wed Jul 27 16:06:39 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
root@virthost ~ # cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=16.04
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04.1 LTS"
  • The output of "lxc info":
apicompat: 0
auth: trusted
environment:
  addresses: []
  architectures:
  - x86_64
  - i686
  certificate: |
    -----BEGIN CERTIFICATE-----
    ....snip....
    -----END CERTIFICATE-----
  certificatefingerprint: ....snip....
  driver: lxc
  driverversion: 2.0.3
  kernel: Linux
  kernelarchitecture: x86_64
  kernelversion: 4.4.0-34-generic
  server: lxd
  serverpid: 3207
  serverversion: 2.0.3
  storage: btrfs
  storageversion: "4.4"
config: {}
public: false

Issue description

We run an environment where the lxc command line tool is used to run short-lived workloads in containers. The lxc commands are dispatched into the background with a fork/exec/detach process (very typical) from our worker processes.

The behaviour we observe is a progressive slowing-down of the system in spite of negligible host load. With three parallel processes we have a throughput of ~3.8 containers per minute; with more, that drops significantly.

Initial research suggests that this is related to live-lock contention on lxd.db, as processes have a fixed 100x100ms retry behaviour which causes all the processes to line up and bang on the lock simultaneously.

First indications are that introducing a little entropy into the loop sleep times increases throughput dramatically, such as adding a rand(100,i) ms sleep to each loop iteration.
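
To illustrate, the kind of change I have in mind looks roughly like this (a hypothetical sketch with made-up names, not the actual lxd/db.go code):

// Hypothetical sketch of adding jitter to a fixed retry loop, so that waiters
// stop retrying a busy SQLite lock at exactly the same moment.
// Not the actual lxd/db.go code.
package dbretry

import (
	"errors"
	"math/rand"
	"time"
)

var errBusy = errors.New("database is locked")

func withRetry(op func() error) error {
	var err error
	for i := 0; i < 100; i++ {
		if err = op(); err != errBusy {
			return err
		}
		// Fixed 100ms plus 0-49ms of random jitter, instead of a bare 100ms.
		time.Sleep(time.Duration(100+rand.Intn(50)) * time.Millisecond)
	}
	return err
}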

Steps to reproduce

  1. Run a tight loop, say 100 entries: for i in $(seq 1 100); do lxc exec ubuntu-32 sleep 500 & done
  2. Observe that with more concurrency the loop gets slower and slower, spending more and more time with lxc processes hanging. The output of lsof /var/lib/lxd/lxd.db shows ~3+n open file handles (where n is roughly the number of lxc exec commands we've dispatched).
  3. Lament the naïve locking in https://github.com/lxc/lxd/blob/master/lxd/db.go around each DB operation.

I'm not quite sure how best to time the issue; I thought about the following:

  • Emit UDP packets (statsd format?) on each lock-state transition, e.g. in https://github.com/lxc/lxd/blob/master/lxd/db.go#L287 (and all similar places); a minimal emitter is sketched after this list
  • Plot that somehow, rrdtool springs to mind, but doesn't have precision better than 1s, so is unsuitable.
  • Probably observe all processes lining up to fail to acquire the lock, whilst sitting potentially for as long as 10s per call, failing to get a database lock.
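
A minimal sketch of the emitter mentioned in the first bullet (illustrative only; nothing like this exists in LXD today, and the metric names are made up):

// Illustrative statsd-style UDP emitter for timing lock-state transitions;
// not existing LXD code. Metrics would land on a local statsd/telegraf listener.
package lockstats

import (
	"fmt"
	"net"
	"time"
)

type Emitter struct {
	conn net.Conn
}

func NewEmitter(addr string) (*Emitter, error) {
	conn, err := net.Dial("udp", addr) // e.g. "127.0.0.1:8125"
	if err != nil {
		return nil, err
	}
	return &Emitter{conn: conn}, nil
}

// Timing emits "<name>:<ms>|ms", the statsd timing format.
func (e *Emitter) Timing(name string, d time.Duration) {
	fmt.Fprintf(e.conn, "%s:%d|ms", name, d/time.Millisecond)
}

// Example use around a lock acquisition:
//   start := time.Now()
//   ... try to acquire the DB lock ...
//   emitter.Timing("lxd.db.lock_wait", time.Since(start))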

Before attempting a PR I'd like to know what discussion, if any, has been had around the internal database, its design choices, the implementation (~3+n opens of the same file), and the naïve choice of sleep/wait with the locking of the database, before I potentially push an unwelcome change to the internals.

@leehambley
Author

I was just looking through the documentation for SQLite3, mostly for my own benefit, and wanted to quote the requisite paragraphs here:

(5) Can multiple applications or multiple instances of the same application access a single database file at the same time?

Multiple processes can have the same database open at the same time. Multiple processes can be doing a SELECT at the same time. But only one process can be making changes to the database at any moment in time, however.

SQLite uses reader/writer locks to control access to the database. (Under Win95/98/ME which lacks support for reader/writer locks, a probabilistic simulation is used instead.) But use caution: this locking mechanism might not work correctly if the database file is kept on an NFS filesystem. This is because fcntl() file locking is broken on many NFS implementations. You should avoid putting SQLite database files on NFS if multiple processes might try to access the file at the same time. On Windows, Microsoft's documentation says that locking may not work under FAT filesystems if you are not running the Share.exe daemon. People who have a lot of experience with Windows tell me that file locking of network files is very buggy and is not dependable. If what they say is true, sharing an SQLite database between two or more Windows machines might cause unexpected problems.

We are aware of no other embedded SQL database engine that supports as much concurrency as SQLite. SQLite allows multiple processes to have the database file open at once, and for multiple processes to read the database at once. When any process wants to write, it must lock the entire database file for the duration of its update. But that normally only takes a few milliseconds. Other processes just wait on the writer to finish then continue about their business. Other embedded SQL database engines typically only allow a single process to connect to the database at once.

However, client/server database engines (such as PostgreSQL, MySQL, or Oracle) usually support a higher level of concurrency and allow multiple processes to be writing to the same database at the same time. This is possible in a client/server database because there is always a single well-controlled server process available to coordinate access. If your application has a need for a lot of concurrency, then you should consider using a client/server database. But experience suggests that most applications need much less concurrency than their designers imagine.

When SQLite tries to access a file that is locked by another process, the default behavior is to return SQLITE_BUSY. You can adjust this behavior from C code using the sqlite3_busy_handler() or sqlite3_busy_timeout() API functions.
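
For completeness, from Go that busy-timeout knob is typically reachable through the driver's DSN; a minimal sketch, assuming a mattn/go-sqlite3 version that understands the _busy_timeout parameter (the query itself is purely illustrative):

// Sketch: let SQLite block up to 5s on a locked database instead of returning
// SQLITE_BUSY immediately. Assumes a go-sqlite3 version that recognises the
// _busy_timeout DSN parameter.
package main

import (
	"database/sql"
	"log"

	_ "github.com/mattn/go-sqlite3"
)

func main() {
	db, err := sql.Open("sqlite3", "/var/lib/lxd/lxd.db?_busy_timeout=5000")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Illustrative statement: a single writer at a time; others wait (up to
	// the timeout) inside SQLite rather than in a hand-rolled retry loop.
	if _, err := db.Exec("UPDATE config SET value = ? WHERE key = ?", "x", "y"); err != nil {
		log.Fatal(err)
	}
}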

@leehambley
Author

leehambley commented Sep 19, 2016

First-pass benchmarking script:

# Required on my Vagrant box, for some reason
echo 'nameserver 8.8.8.8' | sudo tee -a /etc/resolv.conf

# Required because I'm a second class citizen (British…)
sudo locale-gen en_GB.UTF-8 

# Stop nagging from apt-get
export DEBIAN_FRONTEND=noninteractive 

sudo apt update --fix-missing
sudo apt install -y lxd zfsutils-linux parallel

# accept all defaults, unfortunately doesn't respect DEBIAN_FRONTEND variable
sudo lxd init

# Don't ask.......
echo "will cite" | parallel --bibtex

# Get Started
lxc remote add images images.linuxcontainers.org

for i in 1 5 10 100 1000; do
    printf "## %d\n" "$i"
    time seq 100 | parallel -j$i --eta lxc launch images:ubuntu/trusty/i386 ubuntu-32-{}
done

As mentioned in the initial issue report, I suspect the issue is parallelism and the struggle over locks on lxd.db, so the -j (also -P in parallel) option seems to be the crux of the issue.

Unfortunately I can't test this script for real, as it simply doesn't work on my Vagrant box; I suspect the problem lies with Vagrant, however, as the machine just keeps quitting on me.

@stgraber
Contributor

You could use lxd-benchmark, included in the main git repository, to test parallel startup time.

Last time I did some testing and fixes around that, I could easily get to 8-10 containers a second on a dual quad-core server, so either we've severely regressed since then or there's something else which differs between our test environments.

@leehambley
Author

Hi @stgraber thanks for checking in, I'll direct my team to test using lxd-benchmark.

@leehambley
Author

Whilst waiting for my team so I can corroborate the issue, can you speak at all about the choice of extremely naïve locking of the sqlite3 DB versus a design with one goroutine synchronizing the database access?
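
To make the question concrete, the kind of design I mean would be roughly this (purely illustrative; I'm not proposing this exact code):

// Rough sketch of serializing all DB work through one goroutine, so only a
// single connection ever touches lxd.db and in-process callers never race for
// the SQLite lock. Illustrative only.
package dbworker

import "database/sql"

type request struct {
	fn   func(*sql.DB) error
	done chan error
}

type Worker struct {
	reqs chan request
}

func New(db *sql.DB) *Worker {
	w := &Worker{reqs: make(chan request)}
	go func() {
		for r := range w.reqs {
			r.done <- r.fn(db)
		}
	}()
	return w
}

// Do runs fn on the shared handle, one caller at a time.
func (w *Worker) Do(fn func(*sql.DB) error) error {
	r := request{fn: fn, done: make(chan error, 1)}
	w.reqs <- r
	return <-r.done
}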

@stgraber
Contributor

No particular reason other than simplicity; in our tests the database has never been the bottleneck, so we never felt there was a reason to focus on changing such a critical and potentially fragile piece of the code.

@dhamidi

dhamidi commented Sep 20, 2016

Hi @stgraber, I'm working on the team Lee mentioned :) We did a benchmark on our server using lxd-benchmark and reached less than one container per second, using the ubuntu:16.04 image and btrfs as the storage backend. Here are the results of the benchmark (running on an Intel(R) Xeon(R) CPU E3-1245 V2 @ 3.40GHz, with 32 GiB of RAM):

root@virthost /usr/local/src/go/src/github.com/lxc/lxd # lxd-benchmark spawn --count=50 --image=ubuntu-16.04 --parallel=10
Test environment:
  Server backend: lxd
  Server version: 2.0.3
  Kernel: Linux
  Kernel architecture: x86_64
  Kernel version: 4.4.0-34-generic
  Storage backend: btrfs
  Storage version: 4.4
  Container backend: lxc
  Container version: 2.0.3

Test variables:
  Container count: 50
  Container mode: unprivileged
  Startup mode: normal startup
  Image: ubuntu-16.04
  Batches: 5
  Batch size: 10
  Remainder: 0

[Sep 19 20:10:51.830] Found image in local store: ubuntu-16.04
[Sep 19 20:10:51.830] Starting the test
[Sep 19 20:12:34.363] Failed to spawn container 'benchmark-07': Error calling 'lxd forkstart benchmark-07 /var/lib/lxd/containers /var/log/lxd/benchmark-07/lxc.conf': err='exit status 1'
[Sep 19 20:13:05.474] Started 10 containers in 133.643s (0.075/s)
[Sep 19 20:14:42.796] Failed to spawn container 'benchmark-20': Error calling 'lxd forkstart benchmark-20 /var/lib/lxd/containers /var/log/lxd/benchmark-20/lxc.conf': err='exit status 1'
[Sep 19 20:14:48.861] Started 20 containers in 237.031s (0.084/s)
[Sep 19 20:15:39.395] Failed to spawn container 'benchmark-28': Error calling 'lxd forkstart benchmark-28 /var/lib/lxd/containers /var/log/lxd/benchmark-28/lxc.conf': err='exit status 1'
[Sep 19 20:16:23.009] Started 40 containers in 331.179s (0.121/s)
[Sep 19 20:17:01.868] Test completed in 370.037s

Is this a configuration mistake on our end or might this be related to the locking issue?

@stgraber
Contributor

root@nuturo:~# /home/ubuntu/lxd-benchmark spawn --count=50 --image=images:alpine/edge/amd64 --parallel=10 --privileged
Test environment:
  Server backend: lxd
  Server version: 2.0.4
  Kernel: Linux
  Kernel architecture: x86_64
  Kernel version: 4.4.0-38-generic
  Storage backend: btrfs
  Storage version: 4.4
  Container backend: lxc
  Container version: 2.0.4

Test variables:
  Container count: 50
  Container mode: privileged
  Startup mode: normal startup
  Image: images:alpine/edge/amd64
  Batches: 5
  Batch size: 10
  Remainder: 0

[Sep 20 16:57:57.561] Found image in local store: 419583948579ed190a1314e7fbe20d95122ca8a06c335de69c6c565d324ca16e
[Sep 20 16:57:57.561] Starting the test
[Sep 20 16:57:59.908] Started 10 containers in 2.347s (4.260/s)
[Sep 20 16:58:02.700] Started 20 containers in 5.139s (3.892/s)
[Sep 20 16:58:08.643] Started 40 containers in 11.082s (3.609/s)
[Sep 20 16:58:11.269] Test completed in 13.708s

That's with LXD 2.0.4 from Ubuntu 16.04 on a rather old server (dual Xeon E5430 with 12GB of RAM).

Note that the choice of alpine and --privileged helps get cleaner benchmark results, as no filesystem remapping is done and the containers themselves don't do very much at all when they start, compared to a full Ubuntu system.

@leehambley
Author

leehambley commented Sep 20, 2016

Thanks for the reply @stgraber. I'm not sure what to make of it: you're telling us that the performance characteristics are different when LXD is used in a totally different way, one that doesn't suit our workload?

I'd like to get back to the reported issue: given our workload, LXD is at least a couple of orders of magnitude slower than other solutions (which we can't use...) even when the host system is unloaded, hence our suspicion of a (live-)locking problem.

I will endeavour to benchmark this some more and come back to you, and we can decide how to proceed with my report.

Thanks for your time so far!

@stgraber
Contributor

What I'm saying is that if we suspect a database issue, it's best to remove other potential sources of slowdown from the benchmark, which is what I did above.

@leehambley
Author

I'll run both benchmarks with some UDP (statsd) logging enabled; I'll get the data plotted and we can compare results.

@stgraber
Contributor

For comparison, that's the result of the same benchmark you ran:

root@nuturo:~# /home/ubuntu/lxd-benchmark spawn --count=50 --image=ubuntu-16.04 --parallel=10
Test environment:
  Server backend: lxd
  Server version: 2.0.4
  Kernel: Linux
  Kernel architecture: x86_64
  Kernel version: 4.4.0-38-generic
  Storage backend: btrfs
  Storage version: 4.4
  Container backend: lxc
  Container version: 2.0.4

Test variables:
  Container count: 50
  Container mode: unprivileged
  Startup mode: normal startup
  Image: ubuntu-16.04
  Batches: 5
  Batch size: 10
  Remainder: 0

[Sep 20 17:39:47.693] Found image in local store: ubuntu-16.04
[Sep 20 17:39:47.693] Starting the test
[Sep 20 17:39:58.064] Started 10 containers in 10.371s (0.964/s)
[Sep 20 17:40:22.903] Started 20 containers in 35.210s (0.568/s)
[Sep 20 17:41:27.847] Started 40 containers in 100.154s (0.399/s)
[Sep 20 17:41:53.969] Test completed in 126.276s

So we see a speed decrease as the machine gets loaded (again, running this on CPUs that are approaching 10 years old) but it's still nowhere near as bad as what you're getting.

@stgraber
Contributor

Ok, so I found one thing that's very weird. Performance is massively better with the lxd that's shipped by Ubuntu vs a hand-built one.

ubuntu version

root@djanet:~# /home/ubuntu/lxd-benchmark spawn --count=50 --image=images:alpine/edge/amd64 --parallel=10 --privileged
Test environment:
  Server backend: lxd
  Server version: 2.0.4
  Kernel: Linux
  Kernel architecture: x86_64
  Kernel version: 4.4.0-36-generic
  Storage backend: zfs
  Storage version: 5
  Container backend: lxc
  Container version: 2.0.4

Test variables:
  Container count: 50
  Container mode: privileged
  Startup mode: normal startup
  Image: images:alpine/edge/amd64
  Batches: 5
  Batch size: 10
  Remainder: 0

[Sep 20 17:52:44.244] Importing image into local store: 419583948579ed190a1314e7fbe20d95122ca8a06c335de69c6c565d324ca16e
[Sep 20 17:52:45.568] Starting the test
[Sep 20 17:52:47.213] Started 10 containers in 1.645s (6.080/s)
[Sep 20 17:52:48.940] Started 20 containers in 3.372s (5.931/s)
[Sep 20 17:52:52.575] Started 40 containers in 7.007s (5.709/s)
[Sep 20 17:52:54.593] Test completed in 9.025s

Same thing but hand built

root@djanet:~# /home/ubuntu/lxd-benchmark spawn --count=50 --image=images:alpine/edge/amd64 --parallel=10 --privileged
Test environment:
  Server backend: lxd
  Server version: 2.0.4
  Kernel: Linux
  Kernel architecture: x86_64
  Kernel version: 4.4.0-36-generic
  Storage backend: zfs
  Storage version: 5
  Container backend: lxc
  Container version: 2.0.4

Test variables:
  Container count: 50
  Container mode: privileged
  Startup mode: normal startup
  Image: images:alpine/edge/amd64
  Batches: 5
  Batch size: 10
  Remainder: 0

[Sep 20 17:53:51.725] Found image in local store: 419583948579ed190a1314e7fbe20d95122ca8a06c335de69c6c565d324ca16e
[Sep 20 17:53:51.725] Starting the test
[Sep 20 17:54:03.224] Started 10 containers in 11.499s (0.870/s)
[Sep 20 17:54:19.729] Started 20 containers in 28.003s (0.714/s)
[Sep 20 17:54:47.893] Started 40 containers in 56.167s (0.712/s)
[Sep 20 17:54:58.629] Test completed in 66.904s

@stgraber
Contributor

The main difference I can think of is that the packaged lxd is built using source dependencies from the Ubuntu archive rather than the latest version from upstream.

@stgraber
Contributor

Confirmed that re-building with the same set of source dependencies as Ubuntu gives me the same speed too... now to figure out which of those dependencies is causing that big speed change.

Based on your feedback, I'm going to start by testing sqlite since that'd explain what you're seeing :)

@stgraber
Contributor

And sure enough, that's the one causing the speed difference.

@stgraber
Contributor

So yeah, looks like something happened to go-sqlite3 that makes it massively less performant in the way we use it now... Ubuntu and Debian ship a very old version of mattn/go-sqlite3 which isn't affected.

@leehambley
Author

Thanks for looking into that for us @stgraber - I tried to find out which refs of mattn/go-sqlite3 are in use upstream, so I can compare the changelog and start looking for slow-downs.

Since we're talking about cgo, I wonder if the general downturn in compiler speed after Go 1.5 is responsible?

@leehambley
Author

Hi @stgraber, I finally had time to look into the way Ubuntu packages things. It's not a domain I'm familiar with, but here's what I found out; I'd be glad if you could validate my assumptions for me:

So, that gives us a solid timeline; to confirm:

  • With golang-github-mattn-go-sqlite3-dev v1.0.0 performance is good
  • With golang-github-mattn-go-sqlite3-dev v1.1.0 performance is not good

So, what changed in mattn-go-sqlite3 in that time?

Immediately, nothing jumps out in the compare view for me as being horrible, but if you can confirm that I'm at least barking up the right tree, I don't mind bisecting the builds.

I haven't factored in the different Go versions yet, but I think the Xenial golang package still pulls 1.6, Go 1.6 having been released on 2016-02-17.

I think we can rule out different Go versions, unless the builds of LXD for Xenial pre-date Feb '16, so it shouldn't be too hard to bisect the 11 commits between the tagged versions; or, if there's a chance they've already fixed it on their master branch, we can also bisect those ~90 commits (many of which are merges, so there are nowhere near 90 real changes to examine).

@stgraber
Contributor

Let me try to clarify a bit more where the problem is.

If I build LXD master with Golang 1.7 using Ubuntu's golang-github-mattn-go-sqlite3-dev at version v1.1.0, I'm getting the good performance I mentioned above; that's the test build I did yesterday.

Xenial and Yakkety both ship 1.1.0:

stgraber@castiana:~$ rmadison golang-github-mattn-go-sqlite3-dev
 golang-github-mattn-go-sqlite3-dev | 1.1.0~dfsg1-1        | xenial  | all
 golang-github-mattn-go-sqlite3-dev | 1.1.0~dfsg1-1ubuntu2 | yakkety | amd64, arm64, armhf, i386, powerpc, ppc64el, s390x

Now replace that packaged version of golang-github-mattn-go-sqlite3-dev with what they've got in git master and you get the terrible performance result. Because of the way we build things, this means that:

  • Package from Ubuntu 16.04 is fast (LXD 2.0.4)
  • Package from Ubuntu 16.10 is fast (LXD 2.2)
  • PPA packages are all slow (any of the PPAs at ppa:ubuntu-lxc/*)
  • Backport packages are all slow (lxd from trusty-backports)
  • Snap packages are all slow

I suspect it'd be useful to confirm that using upstream's v1.1.0 branch fixes things. If it does, we can simply replace our import with "https://gopkg.in/mattn/go-sqlite3.v1" which would pull that particular branch instead of master.
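
The change itself would be tiny, roughly this (a sketch, not a tested diff):

// Sketch of the proposed import swap (not a tested diff): pull the driver from
// upstream's v1 branch via gopkg.in instead of tracking master.
package db

import (
	_ "gopkg.in/mattn/go-sqlite3.v1" // previously: _ "github.com/mattn/go-sqlite3"
)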

@leehambley
Author

leehambley commented Sep 21, 2016

Thanks for the clarification, that helps a lot. I hope my "archeology" was at least borderline useful!

I'll check with my team where we're installing LXD from; my understanding is that we're literally running # apt-get install lxd zfstools [sic] and taking whatever is current, not doing a manual build or using a PPA, etc.

@stgraber
Contributor

Hmm, so building using go-sqlite3.v1 doesn't help...

@leehambley
Author

leehambley commented Sep 21, 2016

I'm not on my work machine right now unfortunately, so I don't have a build environment to hand, but could you humour me and try adding rand.Intn(50) to the sleep time in the time.Sleep(100 * time.Millisecond) lines in db.go and see if the benchmark results improve?

@stgraber
Contributor

Sure, I'll do that now. I also tracked down the performance difference between go-sqlite3.v1 and the Ubuntu package: it's because Ubuntu links against the system libsqlite3 instead of using a bundled copy.

@stgraber
Contributor

LXD built with "go install -v -x github.com/lxc/lxd/lxd" => 0.63 containers/s
LXD built with "go install -v -x -tags libsqlite3 github.com/lxc/lxd/lxd" and the embedded copy of sqlite3 removed => 6.123 containers/s

@leehambley
Author

That's quite the difference! Excellent detective work, and really, heartfelt kudos for being so engaged with us here, I sincerely appreciate it.

Can I somehow help you in identifying which version of libsqlite3 is bundled in the horribly performing version?

@stgraber
Contributor

mattn/go-sqlite3#330 is about the -tags libsqlite3 trick not actually working so well...

@stgraber
Contributor

He's apparently using 3.14.0 and Ubuntu is on 3.14.1, but I'd be surprised if the problem were so trivial... I instead suspect it's got to do with the way cgo works when the code is embedded vs used through a shared library.

Anyway, I'll refresh his embedded copy to 3.14.1 and see if that fixes things somehow.

@stgraber
Contributor

Hmm, actually, Ubuntu has 3.14.1 now but Xenial had 3.11, so it's almost certainly not the problem since Ubuntu's 3.11 was fast too.

@stgraber
Contributor

Confirmed that updating with a copy of 3.14.1 doesn't fix the performance problem.

@stgraber
Contributor

As for the rand.Intn idea, I applied http://paste.ubuntu.com/23212369/ and I'm getting 0.79 containers/s so nowhere near the kind of performance boost you get from using the system libsqlite3.so

@leehambley
Author

As for the rand.Intn idea, I applied http://paste.ubuntu.com/23212369/ and I'm getting 0.79 containers/s so nowhere near the kind of performance boost you get from using the system libsqlite3.so

Thanks @stgraber I really appreciate you shooting in the dark with my bonkers idea there :)

I didn't even know that Go could bundle shared libraries to use as part of the binary.... I'm trying to read into cgo a bit more to improve my understanding in this field. Thanks for opening the issue at mattn/go-sqlite3#330, I've subscribed

@leehambley
Author

leehambley commented Sep 21, 2016

I didn't even know that Go could bundle shared libraries to use as part of the binary.... I'm trying to read into cgo a bit more to improve my understanding in this field.

Is it as simple as static vs. dynamic linking?

@stgraber
Contributor

It could be, yes; I'm still not sure why it'd matter though... All it should be doing is adding a few kB to the binary for that extra section rather than loading it from somewhere else at startup...

@leehambley
Author

Talking to someone on my team today, they theorised that maybe the cgo C compiler is much, much, much worse than gcc for speed and optimization, so the resulting library might simply be horribly poorly optimized.

Unfortunately I know virtually nothing about cgo and how it deals with linking, or with bundling dynamic libraries to make portable static binaries.

@leehambley
Author

leehambley commented Sep 22, 2016

So I did a bit more digging and wanted to leave my observations here, for you and/or any of my team who're checking in in the morning before I do:

  • SQLite3 has something called the amalgamation; they claim it can be faster…

This is what their docs say (emphasis mine):

In addition to making SQLite easier to incorporate into other projects, the amalgamation also makes it run faster. Many compilers are able to do additional optimizations on code when it is contained within a single translation unit such as it is in the amalgamation. We have measured performance improvements of between 5 and 10% when we use the amalgamation to compile SQLite rather than individual source files. The downside of this is that the additional optimizations often take the form of function inlining which tends to make the size of the resulting binary image larger.

So, I also noted that running the following on 16.04 (OS shouldn't matter):

go clean -v -i --tags libsqlite3 -x github.com/lxc/lxd/lxd
go get -v -x -tags libsqlite3 github.com/lxc/lxd/lxd
go install -v -x -tags libsqlite3 github.com/lxc/lxd/lxd

This gives me lxd 2.2, whereas the system package gives me 2.0.4, which is what we've been talking about up until now. I guess I need to check out the code locally and select the 2.0.4 tag instead of just using go get ...

I can confirm your results about the -tags libsqlite3 flag apparently having no effect; when compiling with or without -tags libsqlite3 the resulting output of ldd is always:

root@virthost ~/go # ldd $GOPATH/bin/lxd
        linux-vdso.so.1 =>  (0x00007fff013b2000)
        liblxc.so.1 => /usr/lib/x86_64-linux-gnu/liblxc.so.1 (0x00007f3ccc70f000)
        libutil.so.1 => /lib/x86_64-linux-gnu/libutil.so.1 (0x00007f3ccc50c000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f3ccc2ee000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f3ccc0ea000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f3ccbd21000)
        libcap.so.2 => /lib/x86_64-linux-gnu/libcap.so.2 (0x00007f3ccbb1a000)
        libapparmor.so.1 => /lib/x86_64-linux-gnu/libapparmor.so.1 (0x00007f3ccb90a000)
        libselinux.so.1 => /lib/x86_64-linux-gnu/libselinux.so.1 (0x00007f3ccb6e8000)
        libseccomp.so.2 => /lib/x86_64-linux-gnu/libseccomp.so.2 (0x00007f3ccb4a9000)
        /lib64/ld-linux-x86-64.so.2 (0x000055ba0f384000)
        libpcre.so.3 => /lib/x86_64-linux-gnu/libpcre.so.3 (0x00007f3ccb239000)

The flag USE_LIBSQLITE3 is widely used, for example here.
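
From what I can tell, the tag-guarded file amounts to roughly this (paraphrased from memory, not the exact upstream source):

// +build libsqlite3

// Rough paraphrase of how a cgo file guarded by the libsqlite3 build tag can
// point the build at the system library: it adds -lsqlite3 at link time and
// defines USE_LIBSQLITE3 so the bundled amalgamation is meant to be compiled
// out. Not the exact upstream file.
package sqlite3

/*
#cgo CFLAGS: -DUSE_LIBSQLITE3
#cgo LDFLAGS: -lsqlite3
*/
import "C"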

This is strange, because it looks like the appropriate CFLAG is being set, and the output of go build ..... (seems to) confirm it .... ...LE_FTS4_UNICODE61 -DSQLITE_TRACE_SIZE_LIMIT=15 -DUSE_LIBSQLITE3 -I. backup.go callback.go doc.g...:

root@virthost ~/go # go install -v -tags libsqlite3 -x github.com/lxc/lxd/lxd | grep CFLAGS
WORK=/tmp/go-build434220573
github.com/mattn/go-sqlite3
mkdir -p $WORK/github.com/mattn/go-sqlite3/_obj/
mkdir -p $WORK/github.com/mattn/
cd /root/go/src/github.com/mattn/go-sqlite3
CGO_LDFLAGS="-g" "-O2" "-lsqlite3" "-ldl" /usr/lib/go-1.6/pkg/tool/linux_amd64/cgo -objdir $WORK/github.com/mattn/go-sqlite3/_obj/ -importpath github.com/mattn/go-sqlite3 -- -I $WORK/github.com/mattn/go-sqlite3/_obj/ -std=gnu99 -DSQLITE_ENABLE_RTREE -DSQLITE_THREADSAFE -DSQLITE_ENABLE_FTS3 -DSQLITE_ENABLE_FTS3_PARENTHESIS -DSQLITE_ENABLE_FTS4_UNICODE61 -DSQLITE_TRACE_SIZE_LIMIT=15 -DUSE_LIBSQLITE3 -I. backup.go callback.go doc.go error.go sqlite3.go sqlite3_libsqlite3.go sqlite3_load_extension.go sqlite3_other.go
gcc -I . -fPIC -m64 -pthread -fmessage-length=0 -I $WORK/github.com/mattn/go-sqlite3/_obj/ -g -O2 -std=gnu99 -DSQLITE_ENABLE_RTREE -DSQLITE_THREADSAFE -DSQLITE_ENABLE_FTS3 -DSQLITE_ENABLE_FTS3_PARENTHESIS -DSQLITE_ENABLE_FTS4_UNICODE61 -DSQLITE_TRACE_SIZE_LIMIT=15 -DUSE_LIBSQLITE3 -I. -o $WORK/github.com/mattn/go-sqlite3/_obj/_cgo_main.o -c $WORK/github.com/mattn/go-sqlite3/_obj/_cgo_main.c
gcc -I . -fPIC -m64 -pthread -fmessage-length=0 -I $WORK/github.com/mattn/go-sqlite3/_obj/ -g -O2 -std=gnu99 -DSQLITE_ENABLE_RTREE -DSQLITE_THREADSAFE -DSQLITE_ENABLE_FTS3 -DSQLITE_ENABLE_FTS3_PARENTHESIS -DSQLITE_ENABLE_FTS4_UNICODE61 -DSQLITE_TRACE_SIZE_LIMIT=15 -DUSE_LIBSQLITE3 -I. -o $WORK/github.com/mattn/go-sqlite3/_obj/_cgo_export.o -c $WORK/github.com/mattn/go-sqlite3/_obj/_cgo_export.c
.... snip ....

Curiously, sqlite3-binding.c, which those guarded header includes refer to, is the amalgamation to which I linked; I'd have expected sqlite3-binding.h to be some bindings to a shared library. Probably this is just my unfamiliarity with the code.

You mentioned something about removing the bundled version, but I wasn't able to work out what you were referring to.

I now need to find out why cgo is ignoring CFLAGS.

@stgraber
Contributor

Go to $GOPATH/src/github.com/mattn/go-sqlite3 and remove sqlite3-binding.c and sqlite3-binding.h, then run the go install again and LXD will be built using the shared library.

@leehambley
Author

Go to $GOPATH/src/github.com/mattn/go-sqlite3 and remove sqlite3-binding.c and sqlite3-binding.h, then run the go install again and LXD will be built using the shared library.

Thanks, I thought you'd modified the binary somehow, I was wondering exactly which kind of wizard you were.

Then it seems like it's an interplay between linker flags and cflags not doing the right thing?

@stgraber
Contributor

Yeah, my current assumption is that cgo builds all .c files, then the linker just picked up the local symbols over the system ones, which makes sense.

I suspect that just moving the .c and .h files into a sub-directory may fix this issue but I haven't had time to test yet.

@leehambley
Author

Interestingly, as a final point: our horrible performance is on 16.04 with 2.0.4, with a build using the shared system library at /usr/lib/x86_64-linux-gnu/libsqlite3.so from libsqlite3[-dev]:amd64 at 3.11.0.... so whilst this might be an interesting look into some weird build problems, I don't think this issue can be the root cause of our 10-4 performance issues?

@leehambley
Author

Yeah, my current assumption is that cgo builds all .c files, then the linker just picked up the local symbols over the system ones, which makes sense.

Excellent intuition, I'd not have guessed at that, but I can imagine that!

@stgraber
Contributor

And you can confirm that "ldd /usr/bin/lxd" on your test systems shows that it's linked with the system's libsqlite3?

If so, then this is very weird as I certainly cannot reproduce the issue on any of my systems unless I use a build of lxd that's not linked against it.

@leehambley
Author

leehambley commented Sep 22, 2016

Confirmed, barring user error. The process tree shows /usr/bin/lxd, ldd on that file links to libsqlite3.so 3.11.0, and both come from the default package repos.

@stgraber
Contributor

Hmm, that's a bit puzzling then; I wonder what else would be causing the exact same kind of performance degradation for you if it's not that issue... It's also weird that I can't reproduce it while using the same LXD version, kernel and filesystem driver.

@leehambley
Author

leehambley commented Sep 22, 2016

Tomorrow we'll write a short script to start lxd with the -cpuprofile flag
enabled and lxd-benchmark it, then stop the daemon (theory being that a
long lived daemon would distort benchmarks).

I was glad to see Go's profiling tools baked in already when I forked lxd
to patch them in.

I hope this shows us where we're spending the most time.
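
For reference, the patch is essentially the standard runtime/pprof wiring; a sketch of what we added (not the exact diff):

// Sketch of wiring a --cpuprofile flag with runtime/pprof (roughly what we
// patched in; not the exact LXD diff). Note the profile only captures on-CPU
// time, which matters for interpreting the results below.
package main

import (
	"flag"
	"log"
	"os"
	"runtime/pprof"
)

func main() {
	cpuprofile := flag.String("cpuprofile", "", "write CPU profile to this file")
	flag.Parse()

	if *cpuprofile != "" {
		f, err := os.Create(*cpuprofile)
		if err != nil {
			log.Fatal(err)
		}
		if err := pprof.StartCPUProfile(f); err != nil {
			log.Fatal(err)
		}
		defer pprof.StopCPUProfile()
	}

	// ... daemon main loop would run here ...
}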

@stgraber
Contributor

Yeah, I always restart the lxd daemon before doing a benchmark run; I've not noticed a significant slowdown over time, but it's always best to have a clean, identical setup when doing benchmarks :)

@leehambley
Author

leehambley commented Sep 23, 2016

CPU profiling leads to very interesting results; our binary, CPU profile, and test script are attached to this comment.

I will now close this issue and open a fresh one... the bottom line seems to be that we lose most of our time in int shiftowner.

That leads me to suspect the kernel version or configuration, as it deals solely with procfs... but I'd appreciate a second opinion on our methodology.

Our test script:

#!/bin/bash
set -euo pipefail
IFS=$'\n\t'

# (we modified the systemd unit to pass --cpuprofile /tmp/lxd.cpuprofile)

systemctl stop lxd;
rm /tmp/lxd.cpuprofile

systemctl start lxd;

/usr/bin/time /usr/local/src/go/bin/lxd-benchmark spawn --count=50 --image=images:ubuntu-16.04 --parallel=10 | tee "lxd-$(date --rfc-3339=seconds).log"

systemctl stop lxd;

cp /tmp/lxd.cpuprofile "lxd-$(date --rfc-3339=seconds).cpuprofile"

/usr/bin/time /usr/local/src/go/bin/lxd-benchmark delete --count=50 --image=images:ubuntu-16.04 --parallel=10

Output of LXD bench for this run:

Test environment:
  Server backend: lxd
  Server version: 2.0.3
  Kernel: Linux
  Kernel architecture: x86_64
  Kernel version: 4.4.0-38-generic
  Storage backend: btrfs
  Storage version: 4.4
  Container backend: lxc
  Container version: 2.0.4

Test variables:
  Container count: 50
  Container mode: unprivileged
  Startup mode: normal startup
  Image: ubuntu-16.04
  Batches: 5
  Batch size: 10
  Remainder: 0

[Sep 23 13:42:16.216] Found image in local store: ubuntu-16.04
[Sep 23 13:42:16.216] Starting the test
[Sep 23 13:42:55.619] Failed to spawn container 'benchmark-09': Error calling 'lxd forkstart benchmark-09 /var/lib/lxd/containers /var/log/lxd/benchmark-09/lxc.conf': err='exit status 1'
[Sep 23 13:43:05.266] Started 10 containers in 49.050s (0.204/s)
[Sep 23 13:44:07.018] Failed to spawn container 'benchmark-11': Error calling 'lxd forkstart benchmark-11 /var/lib/lxd/containers /var/log/lxd/benchmark-11/lxc.conf': err='exit status 1'
[Sep 23 13:44:14.921] Started 20 containers in 118.705s (0.168/s)
[Sep 23 13:45:40.346] Started 40 containers in 204.131s (0.196/s)
[Sep 23 13:46:19.997] Test completed in 243.782s

Result of calling (pprof) web on the resulting CPU profile:

[pprof web call-graph image]

Our exact binary version, and the CPU profile:

lxd-binary-and-cpu-profile.zip

@tych0
Contributor

tych0 commented Sep 23, 2016

Is that CPU time or wall time? shiftowner touches every file in the container, so I'd expect it to eat a lot of wall time, but not necessarily a lot of CPU time.

@leehambley
Author

Honestly, I don't know; I assume it's CPU time, since this is a CPU profile - just thinking out loud, but thanks for the hint to go look more closely at that.

@leehambley
Author

We have another data point: upgrading from 4.4.0-34-generic to 4.4.0-38-generic seems to give us about 50% more speed; we got upgraded automagically today by the system auto-update.

# 4.4.0-34-generic
[Sep 19 20:13:05.474] Started 10 containers in 133.643s (0.075/s)
[Sep 19 20:14:42.796] Failed to spawn container 'benchmark-20': Error calling 'lxd forkstart benchmark-20 /var/lib/lxd/containers /var/log/lxd/benchmark-20/lxc.conf': err='exit status 1'
[Sep 19 20:14:48.861] Started 20 containers in 237.031s (0.084/s)
[Sep 19 20:15:39.395] Failed to spawn container 'benchmark-28': Error calling 'lxd forkstart benchmark-28 /var/lib/lxd/containers /var/log/lxd/benchmark-28/lxc.conf': err='exit status 1'
[Sep 19 20:16:23.009] Started 40 containers in 331.179s (0.121/s)
# 4.4.0-38-generic
[Sep 23 13:42:55.619] Failed to spawn container 'benchmark-09': Error calling 'lxd forkstart benchmark-09 /var/lib/lxd/containers /var/log/lxd/benchmark-09/lxc.conf': err='exit status 1'
[Sep 23 13:43:05.266] Started 10 containers in 49.050s (0.204/s)
[Sep 23 13:44:07.018] Failed to spawn container 'benchmark-11': Error calling 'lxd forkstart benchmark-11 /var/lib/lxd/containers /var/log/lxd/benchmark-11/lxc.conf': err='exit status 1'
[Sep 23 13:44:14.921] Started 20 containers in 118.705s (0.168/s)
[Sep 23 13:45:40.346] Started 40 containers in 204.131s (0.196/s)

@tych0
Contributor

tych0 commented Sep 23, 2016

Ok. It looks like that function is switching to C, so it's possible that the context switch from go to cgo/c is expensive, although I wouldn't necessarily expect it. The C function doesn't look like it really has to be written in C, though, so it might be worth a science experiment to see if doing it in go is any faster. Anyway, thanks for looking into it!
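
Something like this, roughly (an untested, hypothetical sketch, not the existing shiftowner code; it ignores details such as extended attributes and hard links that a real implementation would need):

// Hypothetical pure-Go sketch of a shiftowner-style walk: chown every file in
// a rootfs by a fixed uid/gid offset. Not the existing LXD code.
package shift

import (
	"os"
	"path/filepath"
	"syscall"
)

func ShiftOwner(root string, offset int) error {
	return filepath.Walk(root, func(path string, info os.FileInfo, err error) error {
		if err != nil {
			return err
		}
		st, ok := info.Sys().(*syscall.Stat_t)
		if !ok {
			return nil
		}
		// Lchown so symlinks themselves are shifted rather than their targets.
		return os.Lchown(path, int(st.Uid)+offset, int(st.Gid)+offset)
	})
}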

@leehambley
Author

leehambley commented Sep 23, 2016

I think it must be CPU time; the graph shows:

[screenshot of the pprof CPU profile graph]

however, /usr/bin/time shows that this test run took a few seconds longer than what was reported by lxd-benchmark:

[Sep 23 13:46:19.997] Test completed in 243.782s

That's a pretty huge time difference, so perhaps we lose a massive amount of time in the lxc client and there's nothing wrong with the server :-\

Unfortunately I don't think the CPU profiling will help us here: as verified by htop and general load testing, the machine seems to be basically idle, so we're not CPU bound.
