Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

xpath.select crashing node with Maximum call stack size exceeded error #78

Closed
folderol4 opened this issue Jan 16, 2018 · 14 comments
Closed

Comments

@folderol4
Copy link

folderol4 commented Jan 16, 2018

Node process is crashing with this error "Maximum call stack size exceeded" when trying to do a xpath.select call on these versions of [email protected] - [email protected], [email protected] works just fine. Raising the default node heap to 4GB didn't solve the issue. Raising it to 8GB seemed to allow the select to finish. Something is drastically less performant from a memory standpoint. A change between 0.0.24 and 0.0.25 is causing the issue.

Below is the pseudo code of how I hit the issue. Maybe I am doing something incorrectly with v0.0.25, but it is not clear to me:

const Promise = require('bluebird');
const fs = Promise.promisifyAll(require('fs-extra'));
const DOMParser = require('xmldom').DOMParser;
const xpath = require('xpath');

return fs.readFileAsync(file.path, 'utf8').then(data => {
    const xmlErrors = [];
    const doc = new DOMParser({
        errorHandler: function(key, msg) {
            xmlErrors.push(msg);
        }
    }).parseFromString(data, 'text/xml');

    if (xmlErrors.length) {
        return Promise.reject(new errors.BadRequestError(`${file.name} contains invalid content`));
    }

    const directories = xpath.select('//syscheck/directories', doc);
});

I've narrowed the issues down to PathExpr.applySteps but I can't tell what is happening after that.

I'm attaching the file that I can recreate the issue with:
file.xml.zip

@JLRishe
Copy link
Collaborator

JLRishe commented Jan 16, 2018

Thank you for reporting this.

I will look into why the changes between 0.0.24 and 0.0.25 caused this to start happening, but I suspect that the reason you have encountered this issue while others haven't is that you are using //syscheck to query a large file. // is a very inefficient way to use XPath (not just in this library, but in any library), because it means that the XPath evaluator must traverse the entire document to find any elements with the name syscheck.

If there is a determinate path that you can use from the document root to the nodes you want to query (I looked at your XML document, but //syscheck/directories doesn't seem to have anything to do with the document), then I would advise using that as that will probably remedy your issue:

/rootElement/childName/nextChildName

@folderol4
Copy link
Author

folderol4 commented Jan 16, 2018

What I really end doing is xpath.select('//Rules/PosixFileRule/StartPoints/Start'). I'm not sure if that crashes, but what is a more performant way to get at all the Start nodes?

@JLRishe
Copy link
Collaborator

JLRishe commented Jan 16, 2018

/Rules/PosixFileRule/StartPoints/Start should hopefully be more performant. The double slash is completely unnecessary there and a detriment to the performance.

@JLRishe
Copy link
Collaborator

JLRishe commented Jan 16, 2018

However, I suspect that that may not solve the issue here given the extent of the nodes you are querying (3697 nodes in total, at a considerable depth and spread out throughout the document). I will see what I can do to alleviate this, but probably won't be able to get to it right away.

@njam
Copy link

njam commented Mar 16, 2018

I experienced the same on a 6MB XML file.
Downgrading to 0.0.24 works for me too.

@folderol4
Copy link
Author

As @JLRishe mentioned moving away from // xpaths to just / avoided the issue on a version greater than 0.0.24 for me.

@jan-tosovsky-cz
Copy link

Same exception here. I can't avoid // in my XPath as the number of possible combinations is really huge (and I have no control over the input).

@folderol4
Copy link
Author

Do you know all possible combinations? I'm sure there is some number of selects that end up being slower than then the single // xpath. But the number if low enough and you could do something like this (except mode the hard coded pushes to a for loop over your array of possible paths)?

    _mySelect(path, node) {
        const nodes = [];
        nodes.push(...xpath.select(`/parents/can/be/different${path}`, node));
        nodes.push(...xpath.select(`/very/different${path}`, node));
        return nodes;
    }

    getMyNodes(doc) {
        const pathObjects = [];
        const gotMyNodes = this._mySelect('/always/the/same/child', doc);

Doing that was significantly faster than just doing xpath.select('/always/the/same/child', doc) for me

@jan-tosovsky-cz
Copy link

It seems to be related to 36a6850 revision. There is huge rafactoring in XPath processing. Stack overflows usually relate to deep recursions. This is my candidate:

PathExpr.applySteps = function (steps, xpc, nodes) {
	return reduce(function (inNodes, step) {
		return [].concat.apply([], map(function (node) {
			
			return PathExpr.applyPredicates(step.predicates, xpc, PathExpr.applyStep(step, xpc, node));
		}, inNodes));
	}, nodes, steps);
}

There are suggestions to use tail recursions to avoid stack problems. I think it is used here, but there is no tail recursion optimization implemented in V8 :-(
https://bugs.chromium.org/p/v8/issues/detail?id=4698

@gitgrimbo
Copy link

Hi, is this issue being looked at? Will it be fixed or is the only alternative to downgrade?

@JLRishe
Copy link
Collaborator

JLRishe commented Sep 12, 2018

@gitgrimbo I definitely want to fix this, but I'm just one person and have limited time. In the meantime, I would advise sticking with 0.24 until the situation can be improved.

@gitgrimbo
Copy link

Thanks @JLRishe. Do you have any further insights into the issue that could help someone (maybe myself!) looking into this?

@JLRishe
Copy link
Collaborator

JLRishe commented Aug 21, 2020

@michael-smith-tanium @gitgrimbo @bob-at-author-it @fpmk @sballesteros

Hello everyone, I'm sorry I've let this issue fall by the wayside for so long.

I am trying to look into this issue, but am unable to observe any performance differences between 0.0.24 and the current version. The example that Michael provided runs in the same amount of time on both versions (about 1 second on my machine), and that is the only example that has been provided.

I am wondering if maybe there have been changes to the V8 engine that have changed the situation here. I'm currently running Node 13.9.0.

For anyone still seeing this issue - can you please provide an example that demonstrates a noticeable difference between 0.0.24 and 0.0.27?

@JLRishe
Copy link
Collaborator

JLRishe commented Aug 21, 2020

@michael-smith-tanium @gitgrimbo @sballesteros

As noted in my previous comment, Michael's example seemed to be similar in performance when run in Node, but I observed a 4x difference in Chrome.

I have made some changes in 0.0.29 that seem to have improved the performance. On Michael's example, it seems to be comparable in performance to 0.0.24 in Chrome, and runs in half the time of 0.0.24 in Node.

If you are still seeing issues, please provide a specific, reproducible example.

@JLRishe JLRishe closed this as completed Aug 21, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants