
GRPC Node Client high memory usage, likely a memory leak #686

Closed
Vijendra07Kulhade opened this issue Dec 22, 2018 · 15 comments

@Vijendra07Kulhade

Problem description

The gRPC Node.js client is consuming a lot of heap memory and crashing the Node application.

Reproduction steps

Call the gRPC Node client 10,000 times in a loop; heap memory usage becomes very high. I tested this with both simple RPC and client-streaming RPC, and both consume a large amount of heap memory.

Environment

  • OS name, version and architecture: [Mac Mojave]
  • Node version [10.7.0 and 9.11.2]
  • Node installation method [brew, yum]
  • If applicable, compiler version [Apple LLVM version 10.0.0 (clang-1000.11.45.5) ]
  • Package name and version [[email protected] and [email protected] ]

Additional context

I am using the gRPC client to send metrics to a gRPC server; the server is written in Java. The Node application consumes data from Kafka and generates the metrics. When I ran the test with 10,000 records, the heap usage observed for simple RPC was:

Memory usage in the beginning  { rss: 39030784,
  heapTotal: 23445504,
  heapUsed: 12618584,
  external: 82137 }
Memory usage at the end  { rss: 2993180672,
  heapTotal: 1056776192,
  heapUsed: 961288312,
  external: 12063228 }

Client-streaming RPC appears to be even costlier:

Memory usage in the beginning  { rss: 37113856,
  heapTotal: 22396928,
  heapUsed: 14024248,
  external: 600014 }
Memory usage at the end  { rss: 3347972096,
  heapTotal: 926752768,
  heapUsed: 831483600,
  external: 24105468 }
@murgatroid99
Member

Can you show some specific code that experiences this problem?

@Vijendra07Kulhade
Author

@murgatroid99 Thanks for your reply. I did a simple test with the gRPC Node example here:
https://github.com/grpc/grpc/tree/master/examples/node
I started the gRPC server from this code:
https://github.com/grpc/grpc/blob/master/examples/node/dynamic_codegen/greeter_server.js

and made a small change in the client:

function main() {
  var client = new hello_proto.Greeter('localhost:50051',
                                       grpc.credentials.createInsecure());
  var user;
  if (process.argv.length >= 3) {
    user = process.argv[2];
  } else {
    user = 'world';
  }
  console.log('Memory before ', JSON.stringify(process.memoryUsage()));
  for (let i = 0; i < 100000; i++) {
    client.sayHello({name: user}, function(err, response) {
      console.log('Greeting:', response.message);
    });
  }
  console.log('Memory After grpc call ', JSON.stringify(process.memoryUsage()));
}

I observed that RSS grows quickly, and so does the heap, but the two do not grow proportionately.

@murgatroid99
Member

gRPC client requests are asynchronous. What you are doing here is initiating 100,000 requests and then checking the memory usage again without waiting for any of them to complete. Each request uses some memory while it is active, so it is completely normal for memory usage to be higher at that point.

Even accounting for that, increased memory usage alone is not sufficient to conclude that there is a memory leak. If you can show that, after allowing the calls to complete and the garbage collector to run, memory usage remains elevated in proportion to the number of calls made, then there is a memory leak.

@Vijendra07Kulhade
Author

Yes, you are right. I have been observing a memory leak in my production code, where I am using the gRPC client to support logging. Let me come up with a correct test case. Thanks.

@dpkirchner

dpkirchner commented Mar 7, 2019

Here's a modified example that demonstrates the issue:

function main() {
  var client = new hello_proto.Greeter('localhost:50051',
                                       grpc.credentials.createInsecure());
  var user;
  if (process.argv.length >= 3) {
    user = process.argv[2];
  } else {
    user = 'world';
  }

  global.gc();
  console.log('Memory before ',JSON.stringify(process.memoryUsage()));

  var count = 100000;
  var done = 0;
  for(let i=0;i<count;i++){
     client.sayHello({name: user}, function(err, response) {
       //console.log('Greeting:', response.message);
       done++;

       if (done == count) {
         console.log('Memory After grpc call ',JSON.stringify(process.memoryUsage()));
         global.gc();
         console.log('Memory After global gc ',JSON.stringify(process.memoryUsage()));
       }
     });
  }
}

I ran this against the example server, unmodified, and here's the output:

$ node --expose_gc ./dynamic_codegen/greeter_client.js 
Memory before  {"rss":42614784,"heapTotal":19783680,"heapUsed":9054992,"external":722523}
Memory After grpc call  {"rss":1805291520,"heapTotal":290840576,"heapUsed":257489912,"external":941424}
Memory After global gc  {"rss":1575428096,"heapTotal":59629568,"heapUsed":9215912,"external":811472}

The heap stayed about the same after GC, but RSS shot up and stayed up. Perhaps there is a memory leak in the native code (code that doesn't use Node's heap)?

This is with:

$ cat package.json
{
  "name": "grpc-examples",
  "version": "0.1.0",
  "dependencies": {
    "@grpc/proto-loader": "^0.1.0",
    "async": "^1.5.2",
    "google-protobuf": "^3.0.0",
    "grpc": "^1.11.0",
    "lodash": "^4.6.1",
    "minimist": "^1.2.0"
  }
}
$ node --version
v8.15.1

@murgatroid99
Member

The fact that memory increases here does not imply that there is a memory leak. In this example, the client object is still in scope, and it makes sense that the internal objects associated with it will use an approximately fixed amount of memory after making some requests as opposed to before.

If you want to show that calls are leaking memory, you have to show that the increase in memory usage after garbage collection is proportional to the number of calls made. In other words, you need more data points.
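
A minimal sketch of that methodology (not code from this thread; it assumes the client and user variables from the example above and must be run with node --expose_gc): run batches of increasing size, let each batch finish, force a garbage collection, and record the residual heapUsed. A per-call leak would show the residual value growing roughly in proportion to the batch size.

// Hypothetical sketch: measure residual heap after batches of different sizes.
function runBatch(size, callback) {
  let done = 0;
  for (let i = 0; i < size; i++) {
    client.sayHello({name: user}, function(err, response) {
      if (++done === size) {
        global.gc();                               // requires --expose_gc
        callback(process.memoryUsage().heapUsed);  // residual heap after GC
      }
    });
  }
}

const sizes = [1000, 10000, 100000];
(function next(idx) {
  if (idx === sizes.length) return;
  runBatch(sizes[idx], function(residualHeap) {
    console.log(sizes[idx] + ' calls -> residual heapUsed: ' + residualHeap);
    next(idx + 1);
  });
})(0);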

@nicolasnoble
Member

The fact that RSS stays up is normal behavior. Worrying about this value is the wrong thing to do.

RSS isn't physically allocated memory. It's "reserved" memory, and it essentially never goes down once it has gone up. The pages inside the RSS sections are given back to the kernel, but the RSS high watermark doesn't move, because there may still be pages in there that are in use.

You can end up with a machine where the RSS sections of its processes add up to 10 times the total physical memory plus swap space. That's not an indication of bad behavior; it only indicates that the memory usage of those processes went up at some point, and it helps gauge their peak memory usage.
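
As an illustration of that behavior (a minimal standalone sketch, not taken from this thread; run with node --expose_gc): allocate and then release a few hundred megabytes of heap objects, force a collection, and compare heapUsed with rss. heapUsed falls back down, while rss typically stays near its high watermark.

// Hypothetical sketch of the RSS high-watermark effect.
function snapshot(label) {
  const { rss, heapTotal, heapUsed } = process.memoryUsage();
  console.log(label, { rss, heapTotal, heapUsed });
}

snapshot('before');

let junk = [];
for (let i = 0; i < 2000000; i++) {
  junk.push({ id: i, payload: 'x'.repeat(100) });  // roughly a few hundred MB of heap
}
snapshot('after allocation');

junk = null;          // drop the only reference
global.gc();          // requires --expose_gc
snapshot('after gc'); // heapUsed drops back down; rss typically stays near its peak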

Closing this one.

@murgatroid99
Member

murgatroid99 commented Mar 7, 2019

As an additional note, the heapUsed and external values show actual memory usage, and the increases to those values are so small compared to the number of calls that this test almost surely rules out a per-call memory leak.

The much greater increase to RSS comes from the fact that you had 100,000 calls pending at the same time.
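
For comparison, a minimal sketch of capping concurrency (not from this thread; maxInFlight is an illustrative value, and client and user come from the example above): with only a bounded number of calls pending at any time, peak memory stays far lower than when all 100,000 calls are started at once.

// Hypothetical sketch: keep at most maxInFlight calls pending at a time.
const count = 100000;
const maxInFlight = 100;
let started = 0;
let done = 0;

function startNext() {
  if (started >= count) return;
  started++;
  client.sayHello({name: user}, function(err, response) {
    done++;
    if (done === count) {
      console.log('Memory after all calls ', JSON.stringify(process.memoryUsage()));
    } else {
      startNext();  // replace the completed call with a new one
    }
  });
}

// Prime the window with the first maxInFlight calls.
for (let i = 0; i < maxInFlight; i++) startNext();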

@nicolasnoble
Member

For further proof, if I modify the code further to dump memory usage after every RPC, I get the following graph:

[chart: rss, heapTotal, heapUsed, and external plotted for each completed RPC]

If there were a leak demonstrable by your numbers, you'd see linear growth somewhere. That isn't the case here.

Link to the data + live graph:

https://docs.google.com/spreadsheets/d/1jXcZ88EnR15k5QUcmgvFF0BI5BRe7jRaS-hVPOkjwTo/edit?usp=sharing

New code:

  let count = 100000
  for (let i = 0; i < count; i++) {
    client.sayHello({name: user}, function(err, response) {
      let usage = process.memoryUsage()
      console.log(usage.rss + ',' + usage.heapTotal + ',' + usage.heapUsed + ',' + usage.external)
    })
  }

@nicolasnoble
Member

Also, as @murgatroid99 pointed out, the methodology of starting all of the RPCs at once is inherently flawed. If we run them sequentially, however, the measurement is much more relevant; here is the corresponding graph, which shows very clearly that there is no memory leak:

[chart: rss, heapTotal, heapUsed, and external plotted for sequential RPCs]

Link to the data:

https://docs.google.com/spreadsheets/d/12i9MC7Nl1Zkg3gInOhrhk_JRVAA17Rc-6UF5dxRCPAE/edit?usp=sharing

Associated code:

  let count = 100000
  let done = 0
  let startCall
  let cb = function(err, response) {
    let usage = process.memoryUsage()
    console.log(usage.rss + ',' + usage.heapTotal + ',' + usage.heapUsed + ',' + usage.external)
    if (++done != count) startCall()
  }
  startCall = function() {
    client.sayHello({ name: user }, cb)
  }
  startCall()

@nicolasnoble
Member

By comparison, here's what a memory leak looks like:

[image: memory-usage graph showing steady growth over time]

(taken from nodejs/help#1518)

@dpkirchner

OK. Maybe I have a leak somewhere else. I'm getting a lot of gRPC-related errors in my logs that coincide with extremely high and ever-growing RSS (>10 GB), matching the OP's symptoms. I'll poke around some more. Thanks for investigating.

@nicolasnoble
Member

It might still be gRPC-related, I'm not contesting that, but your reproduction case definitely wasn't demonstrating a memory leak. I'm not excluding the possibility that some error path leaks memory, for instance, but you'd need to come up with a reproduction case that triggers that error and demonstrates a leak.

@dpkirchner

Yeah, I was mostly just adjusting the original reproduction case to address the concern raised (the one about the async nature), hoping that it would help highlight the issue.

As for the Node graph, I'm trying to figure out the difference between it and the graph from the reproduction case, other than the fact that the Node graph spans two days while the gRPC graph is immediate (because we are running all 100k requests at once). If I can show slow but steady growth over time using that sample script, just slowed down substantially, would that help?

@nicolasnoble
Member

The graph I produced took samples over a few seconds, and it plateaued within milliseconds. I could let it run forever and it would probably still plateau similarly. Also, the location of the plateau is correlated with the moment V8's garbage collector changed strategy:

[image: the per-RPC memory chart, annotated at the point where the plateau begins]

The other graph took days to gather, and the memory grew linearly over that entire timeframe. It started slowing down after some hours because the machine was thrashing from running low on swap space, but I'm fairly certain that if its X axis were "number of operations", like mine, instead of "time", the RSS line would be growing fully linearly.

So yes, if you can come up with some code sample that shows a linear growth in memory usage, that'd be useful. It should probably look like this over a couple of minutes: (yay paint)

[image: hand-drawn sketch of the expected linear memory growth]
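
A minimal sketch of the kind of test being asked for (not from this thread; it assumes the client and user variables from the example and must be run with node --expose_gc): sequential calls with memory sampled after a forced GC every 10,000 operations. A leak would show the logged values climbing roughly linearly with the operation count.

// Hypothetical sketch: sequential calls, sampling memory every 10,000 operations.
const total = 1000000;
let done = 0;

function startCall() {
  client.sayHello({name: user}, function(err, response) {
    done++;
    if (done % 10000 === 0) {
      global.gc();  // requires --expose_gc
      const u = process.memoryUsage();
      console.log(done + ',' + u.rss + ',' + u.heapTotal + ',' + u.heapUsed + ',' + u.external);
    }
    if (done < total) startCall();
  });
}

startCall();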

The lock bot locked this issue as resolved and limited the conversation to collaborators on Jun 6, 2019.