tracing: prevent race conditions during shutdown #21335

Closed

eugeneo
Contributor

@eugeneo eugeneo commented Jun 14, 2018

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • commit message follows commit guidelines

@eugeneo eugeneo requested a review from ofrobots June 14, 2018 23:24
@nodejs-github-bot added the c++ and lib / src labels Jun 14, 2018
@ofrobots
Contributor

/cc @nodejs/trace-events

@ofrobots
Contributor

It would be good to add the failing test case as well.

@@ -66,16 +70,15 @@ void Agent::Start() {
started_ = true;
}

Agent::ClientHandle Agent::AddClient(const std::set<std::string>& categories,
std::unique_ptr<AsyncTraceWriter> writer) {
std::unique_ptr<ClientHandle> Agent::AddClient(
Member

@jasnell jasnell Jun 14, 2018

~~nit: need two spaces at the start here. Looks like it's only one?~~

Sigh... nevermind... misread the diff lol...sigh.

@jasnell
Member

jasnell commented Jun 14, 2018

main change looks good, test case would make it better.

}

void AgentHandle::AppendTraceEvent(TraceObject* trace_event) {
Mutex::ScopedLock scoped_lock(mutex_);
Contributor

Can we benchmark the impact of this?

Member

@bnoordhuis bnoordhuis left a comment

Left some comments. It's not entirely clear to me how this fixes #21383.

void AgentHandle::Disconnect(int client) {
// This is supposed to be only callable from a main thread, same as Reset
// This code needs to be updated to enforce that.
if (agent_ != nullptr)
Member

Shouldn't this method take out the lock first?

Contributor Author

I cannot add a lock without significant rework (which would delay this PR) because node::Mutex is not reentrant. Currently this is only called on the isolate thread (i.e. it will be correct for web workers), so it is correct without the lock, but that is not obvious from the code.

Member

Okay, can you update the comment and maybe make it a bit more imperative? "Is supposed to" isn't strong enough.

Contributor Author

Done.
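
One way to make the single-thread requirement in this thread explicit, rather than relying on a comment, is to record the owning thread and check it on entry. This is a minimal sketch only: `OwnerThreadChecker` and `HandleSketch` are hypothetical names that do not appear in the Node.js source, and production code would more likely abort through an assertion macro than return `false`.

```cpp
#include <thread>

// Hypothetical helper: remembers which thread constructed it.
class OwnerThreadChecker {
 public:
  OwnerThreadChecker() : owner_(std::this_thread::get_id()) {}

  // True only when called on the thread that constructed this object.
  bool CalledOnOwnerThread() const {
    return std::this_thread::get_id() == owner_;
  }

 private:
  const std::thread::id owner_;
};

// Hypothetical stand-in for the handle under review.
class HandleSketch {
 public:
  // Must be called on the owning thread. Because no lock is taken here,
  // a call from any other thread is rejected instead of racing.
  bool Disconnect(int client) {
    if (!checker_.CalledOnOwnerThread()) return false;
    last_disconnected_ = client;
    return true;
  }

  int last_disconnected() const { return last_disconnected_; }

 private:
  OwnerThreadChecker checker_;
  int last_disconnected_ = -1;
};
```

The check avoids taking the non-reentrant mutex a second time while still turning the "only call this on the main thread" rule into something the code verifies.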

static ClientHandle EmptyClientHandle() {
return ClientHandle(nullptr, DisconnectClient);
private:
node::Mutex mutex_;
Member

Not necessary to qualify, right?

Contributor Author

Done

ClientHandle(const ClientHandle&) = delete;
ClientHandle& operator=(const ClientHandle&) = delete;
~ClientHandle();
int id() {
Member

int id() const?

Contributor Author

Done


private:
std::shared_ptr<AgentHandle> agent_;
int id_;
Member

And maybe make this const as well?

Contributor Author

Done

using ClientHandle = std::unique_ptr<std::pair<Agent*, int>,
void (*)(std::pair<Agent*, int>*)>;
explicit AgentHandle(Agent*);
void Reset();
Member

Maybe call Reset() from the destructor so you don't have to call it explicitly in Agent::~Agent()?

Contributor Author

Here, AgentHandle is meant to act as a weak pointer: it allows access to the tracing agent but does not prevent its destruction. I had a naive implementation with shared_ptr/weak_ptr, but the issue there is that it may cause the Agent to be deleted on the wrong thread.
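
The idea described here can be sketched as a non-owning handle that the owner clears before destroying the agent. With `std::weak_ptr`, a client thread that calls `lock()` may end up holding the last `std::shared_ptr`, so the destructor would run on that client thread; a raw pointer guarded by a mutex avoids this because only the owner ever deletes the object. The names below (`AgentSketch`, `WeakAgentHandle`) are hypothetical and this is not the code from the PR.

```cpp
#include <mutex>

// Hypothetical stand-in for the tracing agent.
struct AgentSketch {
  int events = 0;
};

// Non-owning handle: clients call through it, but it never deletes the agent.
class WeakAgentHandle {
 public:
  explicit WeakAgentHandle(AgentSketch* agent) : agent_(agent) {}

  // Called by the owner, on the owner's thread, before it destroys the agent.
  void Reset() {
    std::lock_guard<std::mutex> lock(mutex_);
    agent_ = nullptr;
  }

  // Callable from any thread. Returns false once the handle has been reset,
  // so late callers never touch a destroyed agent.
  bool AppendTraceEvent() {
    std::lock_guard<std::mutex> lock(mutex_);
    if (agent_ == nullptr) return false;
    ++agent_->events;
    return true;
  }

 private:
  std::mutex mutex_;
  AgentSketch* agent_;  // Not owned.
};
```

In this sketch the owner must call `Reset()` explicitly before deleting the agent, which mirrors the explicit `Reset()` call discussed in this thread.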

// Resetting the pointer disconnects client
using ClientHandle = std::unique_ptr<std::pair<Agent*, int>,
void (*)(std::pair<Agent*, int>*)>;
explicit AgentHandle(Agent*);
Member

Agent* agent and consider changing the data member to Agent* const agent_;

Contributor Author

See comment above.

ClientHandle& operator=(const ClientHandle&) = delete;
~ClientHandle();
int id() const {
return id_;
}
Member

The style guide allows doing this on one line.

Contributor Author

Done.

@apapirovski
Contributor

@matthiaskrgr Would you be able to test this patch against your build with AddressSanitizer and see if it fixes the issue? Thanks!


// These 3 methods operate on a "default" client, e.g. the file writer
void Enable(const std::string& categories);
void Enable(const std::set<std::string>& categories);
void Disable(const std::set<std::string>& categories);
std::string GetEnabledCategories();

void AppendTraceEvent(TraceObject* trace_event);
// void AppendTraceEvent(TraceObject* trace_event);
Member

Should this be removed rather than commented out?

Contributor Author

I pushed a wrong version - this method is still in use...

PR-URL: #21335
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Ben Noordhuis <[email protected]>
@eugeneo
Contributor Author

eugeneo commented Jun 21, 2018

@eugeneo added the wip label Jun 21, 2018
@eugeneo
Contributor Author

eugeneo commented Jun 21, 2018

Test failure needs to be debugged on Windows.

@ofrobots
Contributor

The test failure on Windows is actually a crash:

12:38:43     AssertionError [ERR_ASSERTION]: Input A expected to strictly equal input B:
12:38:43     + expected - actual
12:38:43     
12:38:43     - 1
12:38:43     + 3221225477

3221225477 is 0xC0000005, an access violation on Windows.
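
The conversion above is easy to check by printing the exit code in hexadecimal. This is a small illustrative helper, not something from the PR, and `ExitCodeToHex` is a hypothetical name.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Formats a process exit code as an 8-digit hex value, the form in which
// Windows NTSTATUS codes are usually written.
std::string ExitCodeToHex(std::uint32_t code) {
  char buf[11];  // "0x" + 8 hex digits + terminating NUL
  std::snprintf(buf, sizeof(buf), "0x%08X", static_cast<unsigned>(code));
  return std::string(buf);
}
```

Calling `ExitCodeToHex(3221225477u)` yields `0xC0000005`, which is `STATUS_ACCESS_VIOLATION`.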

@ofrobots added the trace_events label Jun 26, 2018
@ofrobots
Contributor

I have been unable to reproduce the crash on my Windows machine so far.

@ofrobots
Contributor

The second CI run is green on Windows as well: https://ci.nodejs.org/job/node-test-pull-request/15656/ and I cannot reproduce the crashes on Windows.

@ofrobots
Contributor

Launched a stress run of the single test on Windows: https://ci.nodejs.org/job/node-stress-single-test/1929/

@ofrobots
Contributor

The stress test definitely shows crashes. @eugeneo

@eugeneo
Contributor Author

eugeneo commented Jul 13, 2018

This will be reimplemented from scratch; the current design cannot work.

@eugeneo eugeneo closed this Jul 13, 2018
@eugeneo eugeneo deleted the tracing-crash branch July 13, 2018 17:53
Labels
c++ Issues and PRs that require attention from people who are familiar with C++. lib / src Issues and PRs related to general changes in the lib or src directory. trace_events Issues and PRs related to V8, Node.js core, and userspace code trace events. wip Issues and PRs that are still a work in progress.
6 participants