CI flakiness #798

Closed
jasnell opened this issue Jul 18, 2017 · 6 comments

Comments

@jasnell
Member

jasnell commented Jul 18, 2017

CI appears to be having a number of issues today. Jobs have either been hanging or erroring out in weird ways. I've also attempted to start CI jobs, only to have absolutely nothing happen after pressing submit. Can someone give the CI a bit of a kick?

@gibfahn
Member

gibfahn commented Jul 18, 2017

Previously discussed in #761 (comment)

@gibfahn
Member

gibfahn commented Jul 18, 2017

This seems relevant:

Your Jenkins data directory "/var/lib/jenkins" (AKA JENKINS_HOME) is almost full. You should act on it before it gets completely full.

@rvagg
Member

rvagg commented Jul 18, 2017

Yeah, it's a full disk and I'm working on it. We bumped into this not long ago and it turned out to be a couple of very large jobs. We're going to need to go full block device on this machine, perhaps I'll try and do that now eh?

@gibfahn
Member

gibfahn commented Jul 18, 2017

> We're going to need to go full block device on this machine, perhaps I'll try and do that now eh?

Sounds reasonable, I don't think there was any objection when it was discussed before.

So how does access to the infra machines work? Everyone in build has access to test, the release WG (and some other people?) have access to release, and some subset of people have access to infra?

It'd be great to document this somewhere, maybe the build README.

@refack
Contributor

refack commented Jul 18, 2017

This is probably not the same "disk full" issue, but it fits under the "CI flakiness" title: node-test-commit-arm-fanned is stuck.
https://ci.nodejs.org/computer/node-msft-cross-compiler-1/

@rvagg
Member

rvagg commented Jul 18, 2017

So I think I figured it out: the backup job that also does the cleanup wasn't doing a full cleanup. It uses find with the roots /var/lib/jenkins/*/builds/ and /var/lib/jenkins/*/configurations/axis-nodes/*/builds/, but the axis jobs have names other than 'axis-nodes', including 'axis-MACHINE', 'axis-label', 'axis-v8test' and a bunch of others. Basically, any multi-axis job has a name corresponding to the axis selected. What's more, the build directories sit pretty deep within some of these because of the way they are configured.
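The gap described above can be reproduced in a throwaway directory. This is a minimal sketch, assuming a made-up layout: the job names (`job-a`, `job-b`, `job-c`) and axis values (`node-x`, `host-y`) are hypothetical and not taken from the real server. It expands the two old root patterns and shows that a build directory under `axis-MACHINE` is never visited.

```shell
#!/bin/sh
set -eu

# Expand the two old root patterns relative to a base directory and print
# every root that actually exists on disk.
old_roots() {
    base=$1
    for dir in "$base"/*/builds/ "$base"/*/configurations/axis-nodes/*/builds/; do
        # An unmatched glob stays a literal string, so skip non-directories.
        if [ -d "$dir" ]; then
            printf '%s\n' "$dir"
        fi
    done
}

# Hypothetical sandbox standing in for /var/lib/jenkins; nothing real is touched.
sandbox=$(mktemp -d)
mkdir -p "$sandbox/job-a/builds/1" \
         "$sandbox/job-b/configurations/axis-nodes/node-x/builds/2" \
         "$sandbox/job-c/configurations/axis-MACHINE/host-y/builds/3"

roots=$(old_roots "$sandbox")
printf '%s\n' "$roots"

rm -rf "$sandbox"
```

Run as written, it lists the `job-a` and `axis-nodes` roots and omits the `axis-MACHINE` one, which is the kind of directory the old cleanup was leaving behind.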

So, I've changed the find to effectively do this:

    find /var/lib/jenkins/ -depth -type d -regex '/var/lib/jenkins/.*/builds/[0-9]+' -mtime +7 -exec rm -rf '{}' \;

It now matches build directories at any depth and finds the numbered subdirectories underneath that are older than 7 days.
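Before pointing a command like this at live data, it may help to try the same tests in a sandbox with -print in place of -exec rm -rf. The following is a sketch under stated assumptions: GNU find and GNU touch are available, and the directory layout and names (`job-a`, `job-b`, `host-y`) are invented for illustration. The function name `list_stale_builds` is likewise hypothetical.

```shell
#!/bin/sh
set -eu

# Print (do not delete) numbered build directories under $1 that were last
# modified more than 7 days ago. Same tests as the command above, with
# -print replacing -exec rm -rf so the sketch is safe to run.
# Note: $1 is pasted into the regex, so it should not contain regex operators.
list_stale_builds() {
    find "$1" -depth -type d -regex "$1/.*/builds/[0-9]+" -mtime +7 -print
}

# Hypothetical sandbox standing in for /var/lib/jenkins.
sandbox=$(mktemp -d)
old_plain="$sandbox/job-a/builds/12"
old_axis="$sandbox/job-b/configurations/axis-MACHINE/host-y/builds/34"
fresh_axis="$sandbox/job-b/configurations/axis-MACHINE/host-y/builds/35"
mkdir -p "$old_plain" "$old_axis" "$fresh_axis"

# Backdate two of the build directories (GNU touch syntax).
touch -d '10 days ago' "$old_plain" "$old_axis"

stale=$(list_stale_builds "$sandbox" | sort)
printf '%s\n' "$stale"

rm -rf "$sandbox"
```

Only the two backdated directories (`builds/12` and `builds/34`) are listed; the freshly created `builds/35` is left alone because -mtime +7 does not match it. The -regex test has to match the whole path, which is why the pattern starts with the search root.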

We're back to a 7-day retention and, now that the cleanup includes these extra build directories, the machine is at 86% disk usage.
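For keeping an eye on that number, something along these lines could be run by hand or from cron. It is only a sketch: the 90% threshold and the `usage_percent` helper are assumptions made here, not part of the existing backup job, and the parsing assumes the filesystem name contains no spaces.

```shell
#!/bin/sh
set -eu

# Print the used-capacity percentage (digits only) of the filesystem holding $1.
# df -P selects the POSIX output format, where column 5 is the capacity.
usage_percent() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Hypothetical threshold; adjust to suit the machine.
threshold=90

# Directory to check; defaults to the current directory when no argument is given.
dir=${1:-.}

used=$(usage_percent "$dir")
if [ "$used" -ge "$threshold" ]; then
    echo "WARNING: $dir is at ${used}% (threshold ${threshold}%)"
else
    echo "OK: $dir is at ${used}%"
fi
```

On the CI master it would be invoked with /var/lib/jenkins as the argument.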

@jbergstroem and @joaocgreis, you might want to check my work on that, particularly the axis thing @joaocgreis, since that's your wheelhouse.

Regarding block storage, it turns out the reason we haven't done this yet is that this machine is in DigitalOcean's SFO1 which doesn't do block storage, so we'd have to redeploy the machine in a new datacenter to get this functionality unfortunately.

Regarding access, some of these key machines are accessible only by the "infra" group which is myself, @jbergstroem, @mhdawson and @joaocgreis. ci-release, www, backup(s) and a few others are reserved for this group.

Regarding further flakiness since my last message, as per @refack's comment: that's my fault. I inadvertently upgraded Jenkins when I shouldn't have, which got us into the Java 8 requirement territory that @gibfahn outlined in #775, and a large chunk of the machines couldn't properly connect! I've downgraded it again and it seems to be back to normal, but we may need a bit of time to flush out current work.

@rvagg rvagg closed this as completed Jul 19, 2017
gibfahn added a commit to gibfahn/build that referenced this issue Jul 24, 2017
Initial stab at covering who has access to what.

Refs: nodejs#798 (comment)
gibfahn added a commit to gibfahn/build that referenced this issue Aug 10, 2017
Initial stab at covering who has access to what.

PR-URL: nodejs#811
Refs: nodejs#798 (comment)
Reviewed-By: Johan Bergström <[email protected]>
Reviewed-By: Michael Dawson <[email protected]>