CI flakiness #798

jasnell · 2017-07-18T04:24:45Z

CI appears to be having a number of issues today. Jobs have either been hanging or erroring out in weird ways. I've also attempted to start CI jobs, only to have it do absolutely nothing after pressing submit. Can someone give the CI a bit of a kick?

gibfahn · 2017-07-18T05:55:58Z

Previously discussed in #761 (comment)

gibfahn · 2017-07-18T05:59:10Z

This seems relevant:

Your Jenkins data directory "/var/lib/jenkins" (AKA JENKINS_HOME) is almost full. You should act on it before it gets completely full.

rvagg · 2017-07-18T06:13:38Z

Yeah, full disk, working on it. We bumped into this not long ago and it turned out to be a couple of very large jobs. We're going to need to go full block device on this machine, perhaps I'll try and do that now eh?

gibfahn · 2017-07-18T06:29:42Z

We're going to need to go full block device on this machine, perhaps I'll try and do that now eh?

Sounds reasonable, I don't think there was any objection when it was discussed before.

So how does access to the infra machines work? Everyone in build has access to test, the release WG (and some other people?) have access to release, and some subset of people have access to infra?

It'd be great to document this somewhere, maybe the build README.

refack · 2017-07-18T10:57:58Z

This is probably not the same "disk full" issue, but it fits under the "CI flakiness" title: node-test-commit-arm-fanned is stuck
https://ci.nodejs.org/computer/node-msft-cross-compiler-1/

rvagg · 2017-07-18T12:15:15Z

So I think I figured it out, the backup job that also does the cleanup wasn't doing a full cleanup. It uses find to do the job and uses the following roots /var/lib/jenkins/*/builds/ and /var/lib/jenkins/*/configurations/axis-nodes/*/builds/ but the axis jobs have names other than 'axis-nodes' including 'axis-MACHINE', 'axis-label', 'axis-v8test' and a bunch of others. Basically any multi-axis job has a name corresponding to the axis selected. What's more, the build directories are pretty deep within some of these due to the way they are configured.

So, I've changed the find to effectively do this:

find /var/lib/jenkins/ -depth -type d -regex '/var/lib/jenkins/.*/builds/[0-9]+' -mtime +7 -exec rm -rf '{}' \;

So it now matches any depth of build directory and finds the job subdirectories underneath that are older than 7 days.
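For anyone checking this, a dry-run version of the same match may be easier to review than the destructive one. This is only a sketch, assuming GNU find; `list_stale_builds` is a hypothetical helper name and not part of the actual backup job. It prints the directories that would be selected instead of removing them.

```shell
#!/usr/bin/env bash
# Dry-run sketch (assumes GNU find): list numbered build directories older
# than N days. list_stale_builds is a hypothetical helper, not the real job.
list_stale_builds() {
  local root="${1%/}"   # strip a trailing slash so the regex lines up
  local days="${2:-7}"  # default retention of 7 days
  # -regex is matched against the whole path, so any nesting depth between
  # the root and builds/<number> is accepted, whatever the axis is called.
  find "$root" -depth -type d \
    -regex "${root}/.*/builds/[0-9]+" \
    -mtime "+${days}" -print
}
```

Running `list_stale_builds /var/lib/jenkins 7` and reading the output before swapping `-print` for `-exec rm -rf '{}' \;` should show whether the axis-named directories are being picked up. Note that the root is interpolated into the regex as-is, so a root path containing regex metacharacters would need escaping.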

Retention is back to 7 days, and with these extra build directories included in the cleanup we're now at 86% disk usage on the machine.
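To keep an eye on that figure, something like the following could be run against JENKINS_HOME. It is a minimal sketch: `disk_usage_pct` and `warn_if_full` are hypothetical helper names, and any threshold passed in is an arbitrary choice rather than a value anyone has agreed on.

```shell
#!/usr/bin/env bash
# disk_usage_pct is a hypothetical helper: prints the used-space percentage
# (digits only) of the filesystem that holds the given path.
disk_usage_pct() {
  # -P forces the POSIX one-line-per-filesystem format; column 5 is Capacity.
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# warn_if_full PATH THRESHOLD: warns on stderr and returns non-zero when
# usage is at or above THRESHOLD percent.
warn_if_full() {
  local pct
  pct="$(disk_usage_pct "$1")"
  if [ "$pct" -ge "$2" ]; then
    echo "WARNING: $1 is at ${pct}% (threshold $2%)" >&2
    return 1
  fi
}
```

For example, `warn_if_full /var/lib/jenkins 90` would flag the machine before Jenkins starts complaining that the data directory is almost full. The column parsing assumes the filesystem name contains no spaces.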

@jbergstroem and @joaocgreis you might want to check my work on that .. particularly the axis thing @joaocgreis since that's your wheelhouse.

Regarding block storage, it turns out the reason we haven't done this yet is that this machine is in DigitalOcean's SFO1 which doesn't do block storage, so we'd have to redeploy the machine in a new datacenter to get this functionality unfortunately.

Regarding access, some of these key machines are accessible only by the "infra" group which is myself, @jbergstroem, @mhdawson and @joaocgreis. ci-release, www, backup(s) and a few others are reserved for this group.

Regarding further flakiness since my last message, as per @refack's comment, that's my fault: I inadvertently upgraded Jenkins when I shouldn't have and got us into the Java 8 requirement territory that @gibfahn outlined in #775, so a large chunk of the machines couldn't properly connect! I've downgraded it again and it seems to be back to normal, but we may need a bit of time to flush out current work.

refack mentioned this issue Jul 18, 2017

zlib: check if the stream is destroyed before push nodejs/node#14330

Closed

3 tasks

rvagg closed this as completed Jul 19, 2017

gibfahn added a commit to gibfahn/build that referenced this issue Jul 24, 2017

doc: add a powers.md to document who has access

5bc19ac

Initial stab at covering who has access to what. Refs: nodejs#798 (comment)

gibfahn mentioned this issue Jul 24, 2017

doc: add a powers.md to document who has access #811

Merged

3 tasks

gibfahn added 11 further commits to gibfahn/build that referenced this issue Jul 24, 2017, each titled "doc: add a powers.md to document who has access" with the same message (Initial stab at covering who has access to what. Refs: nodejs#798 (comment)):

5a8c7c0, af972e8, 6cd51ba, 5055e96, e5ca37c, 7cc5206, 45f3712, 7b5166e, 6706b90, eea8697, 22e9807

gibfahn added 2 more commits with the same title and message to gibfahn/build Aug 10, 2017:

df9f404, de2b956

gibfahn added a commit to gibfahn/build that referenced this issue Aug 10, 2017

doc: rename jenkins job configuration doc

efb11cc

PR-URL: nodejs#811 Refs: nodejs#798 (comment) Reviewed-By: Johan Bergström <[email protected]> Reviewed-By: Michael Dawson <[email protected]>
