Node.js process rss keeps rising when heapUsed/heapTotal does not (native memory leak)? #1518
@addaleax In case you get a chance, I would love your expertise on this 👍
1A) Yes, you clearly do have a memory leak... And, by all indications, it is on the native memory side. Native memory allocations would not be reflected within the v8/JS heap; Buffers are a typical source of native memory allocations that are conducted outside of the JS heap.
@shellberg Thanks so much for the super detailed response! Now I understand that there must be a native leak because of the growing `rss`. Any ideas on how I can inspect native memory to find the leaking allocations?
Buffers should show up in external memory, though, and that seems to be pretty low here. I’m not an expert on debugging memory leaks, but are you using any native modules in your code?
@addaleax Thank you so much for that addition. I completely forgot about this being the case (Buffers showing up in `external`). No native modules are being used. The application is actually an MQTT server that handles around 90k concurrent connections at any given moment. The clients constantly connect/disconnect over time, but the number of connections never exceeds 90k, therefore the workload stays the same, yet `rss` keeps growing. So, if Buffers show up in `external`, what could explain the rising `rss` while `external` stays low?
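One low-effort way to watch `rss` against the heap and `external` numbers over time is to log `process.memoryUsage()` periodically; a minimal standalone sketch (the real server would log this from its own code rather than via `node -e`):

```sh
# Print the memory stats of a fresh node process once per second.
# In the actual server you would call process.memoryUsage() on a timer inside the app.
node -e 'setInterval(() => console.log(new Date().toISOString(), process.memoryUsage()), 1000)'
```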
@eladnava Looking at some other (random) reports of the actions of the oom-killer and a 4.4.0 kernel, I get the strong sense that it's a bit trigger-happy when it believes that physical RAM is approaching full and it has no recourse to page, i.e. use any swap. The kernel is the only process that truly runs in your physical memory; all other user jobs run in an abstracted virtual memory system that gains access to some amount of physical memory for efficiency: hence, the kernel is very protective of physical memory! Of interest, you appear to have configured no swap resources at all?! That would raise some alarm bells for me, especially given Linux's optimistic assumptions and over-commit memory model... (You might want to consult some swap setting guidelines too.) Incidentally, swap space is used for paging purposes in a modern kernel; it's not just about swapping out a process (which used to be quite a draconian action). Before you go trying to find a problem that might not be present, I suggest you give the kernel more memory resources, so it believes it has a few more options to consider! And not just physical RAM!
@shellberg Thanks for the recommendation. My EC2 servers indeed are not configured to use any swap. Can we say for sure that we are not talking about a memory leak, though? Is it common practice nowadays to configure swap just to avoid OOM-killer invocation? I'm scared that swap will just serve as a temporary patch and the Node.js process will keep growing until it exhausts the swap as well.
@eladnava I can't say for sure (re: leak). I don't have any known-good profile of an MQTT server to go by, especially not of an MQTT server in JS code (is this …?). I'd say it's common for Linux images to need swap because of their over-commit tendency. That, or you have to configure the kernel to lessen over-commit/confine resources used by the kernel, or both! Nevertheless, you are being caught between heuristics: that of node, sizing its heap based on reported available memory, and that of a (dodgy?) kernel running with only bare-metal semiconductor storage; you might want to configure and fix the JS heap of node for your MQTT server. Introducing some swap (even via a memory disk?!) may be enough to suggest to the system that it's healthier than it thinks, and gives it more 'rope to hang itself'! And it provides you with some early warning of further issues, based on how that swap device is utilised and which processes become largely paged out.
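For reference, a minimal sketch of adding a swap file on an Ubuntu-style EC2 instance (the 2G size is just an example, pick something proportional to the instance):

```sh
sudo fallocate -l 2G /swapfile          # reserve the file
sudo chmod 600 /swapfile                # restrict permissions, required by swapon
sudo mkswap /swapfile                   # format it as swap
sudo swapon /swapfile                   # enable it immediately
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab   # persist across reboots
sudo sysctl vm.swappiness=10            # optional: discourage eager swapping
```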
@eladnava Is there any chance at all that you could share code in order to debug this? I know that might be a lot to ask for production apps…
@eladnava Can you check whether this also happens with Node.js 10? Do you pass any particular flags to the Node.js binary?
@shellberg Thanks for the additional insight. Indeed the server is configured with a 4.4.0 kernel. I've just configured the server with a swap file. I still don't understand why `rss` keeps rising while `heapUsed`/`heapTotal` do not, though.

@addaleax Sure, I could share the code, however the leak is not reproducible without having real MQTT devices constantly connecting and disconnecting to/from the server. I was not able to reproduce the leak with "fake" MQTT clients, therefore it only happens in production with real connections. I've previously tested with Node v10, but this was during a different stage of testing when I was attempting to fix other possible leaks. I will give it a shot if the issue persists.

I have a really strong feeling about this not being a memory leak, as the `rss` appears to stabilize once the system runs low on memory.

As per Node.js binary args, here they are:
I've just realized that one of these flags might actually be responsible for the rising `rss`, since it changes how the garbage collector behaves?
I don’t know how the internals of V8 work here, but I guess there’s a decent chance of that being true given the option description?
@addaleax I've just realized that this flag is no longer supported by current versions of V8. It used to be a way to control when the V8 garbage collector would run: passing it would completely disable automatic GCs and would require the Node.js process to call the garbage collector manually. Thanks for the fast response. So it is not the cause of the rising `rss` after all.
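For context, manual collection in Node.js is normally only possible when the process is started with `--expose-gc`, which makes `global.gc()` available; a quick demonstration (not the flags this server actually runs with):

```sh
# Start node with the GC exposed, force a full collection, and compare heapUsed before and after.
node --expose-gc -e 'console.log(process.memoryUsage().heapUsed); global.gc(); console.log(process.memoryUsage().heapUsed)'
```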
For whatever reason the OS is not reclaiming, or Node.js is not releasing, its `rss`. And then, once memory runs low, the OOM killer eventually terminates the process. Possibly the flag still plays some part in this. In any case I've removed the flag to rule that out. Also @shellberg, I will consider updating the kernel to rule out the OOM killer bug in 4.4.0.
@eladnava The OS is reclaiming physical memory it has decided it needs to use! But it's doing so at the expense of the only active process in your workload that you care about!!! Linux heuristics are designed for a mixed workload (rather than a containerised workload), and hence it has picked on the one process it considers to have most egregiously overcommitted memory consumption, and the means of reclamation is to reap (kill) that targeted process. But your workload only consists of one active process, so by definition it is killing the one process you care about. The most recent kernels are reportedly less trigger-happy in this regard.

In terms of the rising RSS profile of your node process, it's related to the way the generational GC works with the pages of memory allocated to the v8/JS heap it is managing (3584MB, or 3.5GB), the distinction between an allocation and committing pages of physical backing store to support that allocation as pages are needed to host data, plus some heap fragmentation. (The propensity to accept an allocation without potentially having actual physical pages to host it is precisely what overcommit is!) Nevertheless, you can see that the RSS of the node process is asymptotically converging on your total process size (code size + total allocations, crudely speaking), which is just under 3.6GB: your configured JS heap (3.5GB) + some native data (some: think 'external' plus a bit) + size of executable/shared libraries + JIT code cache.

Still, the substantial size of your node process is determined by your explicit command-line configuration of a 3.5GB heap: so, do you need that much?! Based on the heap utilisation you've shown so far, you can probably easily trim that to 2.5 or 3GB? (Is this just a normal MQTT loading, or do you have to maintain capacity for a higher peak load?)
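A sketch of what trimming the heap cap would look like; `server.js` stands in for the actual entry point, and 3072 MB is just one of the suggested values:

```sh
# Cap the V8 old generation at ~3 GB instead of 3.5 GB.
node --max-old-space-size=3072 server.js
```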
Thank you @shellberg for the detailed explanation. I now understand a little better what is going on behind the scenes. If the `rss` is essentially converging on the configured heap size, I will try trimming `--max-old-space-size` to leave more headroom.
It appears I was finally able to fix the memory leak issue! 🎉 Since I tried 4 fixes at once, I don't know which one specifically fixed it, but here are the last 4 things I tried:
It has now been 27 days and none of the Node.js processes have been terminated by OOM (where they would usually be terminated after a week or so)! 😄 Thanks so much @shellberg and @addaleax for all of your helpful tips and suggestions! 😄 👍 🎖 💯
I had the same problem and debugged it for quite a while; in the end I also found a solution: I changed the default memory allocator to jemalloc. We are using a Debian distribution, which uses glibc malloc as the default memory allocator. After some research, it turns out jemalloc handles memory fragmentation better than glibc malloc, and I can clearly see the rss decrease after load since changing to jemalloc. Here is the part I added to our Dockerfile to change the default memory allocator.
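A minimal sketch of that kind of Dockerfile step on a recent Debian base image; the package name (`libjemalloc2`) and library path are assumptions for buster/bullseye-era images, older releases ship `libjemalloc1` with a different `.so` name:

```sh
# Inside a Dockerfile RUN step on a Debian-based image:
apt-get update && apt-get install -y libjemalloc2
# Preload jemalloc for every process started in the container.
echo "/usr/lib/x86_64-linux-gnu/libjemalloc.so.2" >> /etc/ld.so.preload
```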
For macOS, I use a per-process override for development purposes, to alter the memory allocator only for the node process (a sketch follows below). Hope this helps anyone who runs into the same problem in the future.
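A sketch of what such a per-process override on macOS typically looks like, assuming jemalloc installed via Homebrew and `app.js` as a placeholder entry point:

```sh
brew install jemalloc
# Inject jemalloc only into this node process via the dynamic loader;
# adjust the path if your Homebrew layout differs.
DYLD_INSERT_LIBRARIES="$(brew --prefix jemalloc)/lib/libjemalloc.dylib" node app.js
```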
@xr Thanks so much for sharing! Can confirm this resolves the issue on my end as well! 🎉 The growing `rss` problem is gone after switching allocators. On Ubuntu 18.04 LTS, these commands install and configure `jemalloc` (see the sketch at the end of this comment):
Restart the node process for it to start using `jemalloc`.
Check the PID of your running `node` process and confirm that `jemalloc` shows up in `/proc/PID/smaps`:
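A sketch of the general shape of that setup and check on Ubuntu 18.04; `libjemalloc1` is the stock package there, but verify the exact library path with `dpkg -L libjemalloc1`:

```sh
sudo apt-get install -y libjemalloc1
# Preload jemalloc system-wide.
echo "/usr/lib/x86_64-linux-gnu/libjemalloc.so.1" | sudo tee -a /etc/ld.so.preload
# After restarting the node process, confirm jemalloc is mapped into it.
grep -i jemalloc "/proc/$(pgrep -x node | head -n1)/smaps"
```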
@xr Thank you for the post, this also worked for the same type of issue I was having.
@xr Really appreciated, I also solved this problem thanks to your post. Thanks.
Hi @xr @eladnava, I have found this helpful. Also, can you please tell me: if I change the default memory allocator in the Docker container, do I need to install jemalloc on the host OS as well? More precisely, I have installed jemalloc in the Docker container and verified with the grep command that jemalloc is being used; do I also need it on the host?
Hi @addaleax, |
We also had this issue, even when using Jemalloc, but we were on Alpine. |
My environment is Docker, and the image is node:14-alpine3.16, but it doesn't work:

RUN echo "/usr/lib/libjemalloc.so.2" >> /etc/ld.so.preload
CMD ["node", "./app.js", "LD_PRELOAD=/usr/lib/libjemalloc.so.2"]

I tried both of the above methods, but I can’t see any information about jemalloc in /proc/PID/smaps.
Try to run it with `LD_PRELOAD` set as an actual environment variable for the node process, rather than passing it as an argument to `node`.
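A sketch of the environment-variable route, reusing the library path from the Dockerfile above; note that, as far as I know, musl (Alpine's libc) does not read `/etc/ld.so.preload`, and the exec-form `CMD` above passes `LD_PRELOAD=...` to `node` as a script argument rather than setting it in the environment:

```sh
# Set LD_PRELOAD in the environment of the node process
# (in a Dockerfile this would be an ENV line instead of a CMD argument).
export LD_PRELOAD=/usr/lib/libjemalloc.so.2
node ./app.js &
# Confirm the library is actually mapped into the running process.
grep -i jemalloc "/proc/$!/smaps"
```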
Setting the environment variable made it work as expected, but I found that `rss` became higher after startup.
@lwmxiaobei This issue is closed and your Node 14 runtime has been unmaintained for 8 months now. If you have an issue with a version of Node that is currently maintained, I suggest creating a new issue.
Hi all,
First thanks so much for reading this and helping me out.
I've been attempting to fix a stubborn memory leak for a while now and thought that I did manage to finally get it (#1484), but it appears to have resurfaced in the form of a slowly increasing `rss` while `heapUsed` and `heapTotal` remain stable. This graph is from a two-day period.

`rss` does appear to kind of stabilize towards the end of the graph, and this appears to be when the system becomes low on memory (`free -m` reports `283` MB free). But my Node.js process does eventually (after 3-4 days) get terminated by the OOM killer. Since Node.js is the only process on the server consuming significant resources, I assumed a growing `rss` was due to no demand for memory allocation by other processes. But since the OOM killer is invoked, this must not be the case.

Here's the second half of that graph zoomed in (after system memory becomes low):

The instance in use is an AWS `t3.large` with 8GB of RAM.

Questions:

Why is the `rss` amount more than twice the size of `heapTotal`? This makes no sense, I think.

Thanks again to anyone pitching in any advice, tips, observations or pointers. 🙏
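For anyone hitting the same symptoms, the OOM kills and the memory pressure can be confirmed with standard tools, for example:

```sh
free -m                                               # overall memory and swap picture
dmesg -T | grep -iE "out of memory|killed process"    # OOM-killer activity in the kernel log
```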