Commit 0138556

bestpractices: add crash pre-requisites
This covers the preparation of crash diagnostics Refs: nodejs#254
1 parent baa43e9 commit 0138556

File tree

1 file changed: +130 -0 lines changed


documentation/crash/crash_setup.md

## Node.js application crash diagnostics: Best Practices series #1

This is the first in a series of best practices and useful tips for
using Node.js in large-scale production systems.

## Introduction

Typical production systems do not enjoy the benefits of development
and staging systems in many respects:

- they are isolated from the public internet
- they are not loaded with development and debug tools
- they are configured with the most robust and secure
  configurations possible at the OS level
- in certain deployment scenarios (such as cloud) they
  operate in head-less mode [no ssh]
- in certain deployment scenarios (such as cloud) they
  operate in state-less mode [no persistent disk]

The net effect of these constraints is that your production system
needs to be manually `prepared` in advance to enable crash diagnostic
data generation on the first failure itself, without losing vital data.
The rest of this document illustrates these preparation steps.

## Available disk space

Ensure that there is enough disk space available for the core file
to be written:

- Maximum of 4GB for a 32-bit process.
- Much larger for a 64-bit process (the common case). To know the precise
  requirement, measure the peak-load memory usage of your application.
  Add 10% to that to accommodate core metadata. If you are using
  common monitoring tools, one of the graphs should reveal the peak
  memory. If not, you can measure it directly on the system.

On Linux variants, you can use `top -p <pid>` to see the instantaneous
memory usage of the process:

```
$ top -p 106916

   PID USER     PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
106916 user     20   0  600404  54500  15572 R 109.7  0.0  81098:54 node
```

On Darwin, the flag is `-pid`.
On AIX, the command is `topas`.
On FreeBSD, the command is `top`. On both AIX and FreeBSD, there is no
flag to show per-process details. On Windows, you can use the Task
Manager window and view the process attributes visually.
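
The sizing rule above can be sketched as a one-liner on Linux, reading the
peak memory directly from `/proc`. This is only a sketch: it uses the
shell's own PID (`$$`) for illustration; substitute the PID of your node
process.

```shell
# Read VmPeak (peak virtual memory, in KB) from /proc/<pid>/status and
# add ~10% headroom for core metadata, per the guidance above.
# $$ (this shell's PID) stands in for your node process id.
peak_kb=$(awk '/^VmPeak:/ {print $2}' "/proc/$$/status")
need_kb=$(( peak_kb + peak_kb / 10 ))
echo "estimated core size: ${need_kb} KB"
```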

Insufficient file system space will result in truncated core files,
and can severely hamper the ability to diagnose the problem.

Figure out how much free space is available in the file system:
`df -k` can be used uniformly across UNIX platforms.
On Windows, Windows Explorer, when pointed at a disk partition,
shows the available space in that partition.
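
As a sketch, the free-space check can be scripted so it is comparable with
the estimate above; `.` stands in for whatever directory your cores will be
written to.

```shell
# Check free space (in KB) in the prospective core-dump directory.
# -P forces POSIX single-line output, so the data row is always line 2;
# column 4 of that row is the available space.
avail_kb=$(df -Pk . | awk 'NR==2 {print $4}')
echo "available: ${avail_kb} KB"
```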

## Core file location and name

By default, a core file is generated on a crash event, and in most
UNIX variants it is written to the current working directory - the
location from where the node process was started.
On Darwin, it appears in the `/cores` location.

By default, core files from node processes on Linux are named
`core` or `core.<pid>`, where `<pid>` is the node process id.
By default, core files from node processes on AIX and Darwin are
named `core`.
By default, core files from node processes on FreeBSD are named
`%N.core`, where `%N` is the name of the crashed process.

However, the superuser (root) can control and change these defaults.

On Linux, `sysctl kernel.core_pattern` shows the current core file pattern.
Modify the pattern using `sysctl -w kernel.core_pattern=pattern` as root.
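
A sketch of what such a pattern can look like on Linux, using the standard
`core(5)` specifiers; `/var/crash` here is a hypothetical directory, and
actually applying the pattern requires root.

```shell
# Inspect the current pattern (readable by any user).
cat /proc/sys/kernel/core_pattern
# Compose a richer pattern: %e = executable name, %p = PID,
# %t = UNIX timestamp (see core(5)). Applying it needs root:
#   sysctl -w kernel.core_pattern="$new_pattern"
new_pattern='/var/crash/core.%e.%p.%t'
echo "$new_pattern"
```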

On AIX, `lscore` shows the current core file pattern.
Enable full core dump generation using `chdev -l sys0 -a fullcore=true`.
Modify the current pattern using `chcore -p on -n on -l /path/to/coredumps`.
On Darwin and FreeBSD, `sysctl kern.corefile` shows the current core file pattern.
Modify the current pattern using `sysctl -w kern.corefile=newpattern` as root.

To obtain full core files, set the following ulimit options across UNIX variants:

`ulimit -c unlimited` - turn on core file generation with unlimited size
`ulimit -d unlimited` - set the user data limit to unlimited
`ulimit -f unlimited` - set the file limit to unlimited

The current ulimit settings can be displayed using:

`ulimit -a`

However, these are the `soft` limits and are enforced per user, per
shell environment. Please note that these values are themselves
constrained by the system-wide `hard` limits set by the
system administrator. System administrators (with superuser privileges)
may display, set or change the hard limits by adding the `-H` flag to
the standard set of ulimit commands.
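
The soft/hard distinction can be seen directly in the shell; a sketch:

```shell
# Show the soft (-S) and hard (-H) core-file size limits for this shell.
# A non-privileged user may raise the soft value only up to the hard value.
soft=$(ulimit -Sc)
hard=$(ulimit -Hc)
echo "soft=${soft} hard=${hard}"
```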

## Manual dump generation

Under certain circumstances you may want to collect a core dump
manually; follow these steps:

On Linux, use `gcore [-a] [-o filename] pid`, where `-a`
specifies to dump everything.
On AIX, use `gencore [pid] [filename]`.
On FreeBSD and Darwin, use `gcore [-s] [executable] pid`.
On Windows, you can use the Task Manager window: right-click on the
node process and select the `Create dump file` option.

Special note on Ubuntu systems with a Yama hardened kernel:

The Yama security policy inhibits a second process from collecting a
dump, practically rendering `gcore` unusable. Work around this by
granting `gdb` the required ptrace capability as root:

```
setcap cap_sys_ptrace=+ep `which gdb`
```
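
Before reaching for `setcap`, it is worth checking whether Yama is what is
blocking you; a sketch (the sysctl key only exists on Yama-enabled kernels):

```shell
# Check whether Yama restricts ptrace attach: 0 = classic permissive
# behaviour; 1 or higher restricts attaching to non-child processes,
# which is what breaks gcore.
scope=$(sysctl -n kernel.yama.ptrace_scope 2>/dev/null || echo "Yama not present")
echo "$scope"
```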

These steps make sure that when your Node.js application crashes in
production, a valid, full core dump is generated at a known location that
can be loaded into debuggers that understand Node.js internals to
diagnose the issue. The next article in this series will focus on that part.
