[v2] Gather crash reports #117


Closed
13 of 18 tasks
ArtemGr opened this issue Aug 4, 2018 · 1 comment

ArtemGr commented Aug 4, 2018

  • Functional test (those things tend to be flaky, we need to automatically test that crashes are caught).
  • [should] Targeting the binary Docker deployments and internal testing, there should be an option of letting the OS dump the core instead of manually collecting the stack trace (see the core-dump sketch after this list). No need to upload the core anywhere, at least not within this issue's timeframe: the testing developers will get it manually from the image.
  • Handle Windows exceptions (example, _EXCEPTION_POINTERS, ru, StackWalker); a SEH filter sketch follows this list. This should speed up development by allowing me to test the crash-handling code from my primary development environments. It will also work for the Windows deployments of MM. => POC.
  • -> A full Windows build is likely necessary. Otherwise we'll be getting linking errors with things like os_portable::OS_init missing.
  • -> -> Try to link the C MM1 library to the MM2 Rust binary instead of the other way around.
  • Handle Unix signals, with the Linux and macOS deployments in mind (a signal-handler sketch follows this list).
  • [should] Capture not just the C crashes but also the Rust panics (a panic-hook sketch follows this list). => Global panic handlers aren't stable yet (#[panic_handler]). We could set thread-local hooks, but first I'll try a simpler (?) route: using RUST_BACKTRACE and getting crash reports by capturing the standard output.
  • -> Dump C/Rust backtraces to standard output and save-to-share log.
  • -> -> Rehash invariants regarding the logging of sensitive information.
  • Backtrace without line numbers first, to improve reliability? => Not an option.
  • Scan the logs folder and see whether any of the logs indicate a crash or a failure.
  • [won't] Watchdog. Mark the log dirty when we're doing something and clean it when we're finished. That way we can tell there was a failure even if we were too dead to capture the backtrace in the log. We'll probably need a helper, a forked process that leaves a trace showing that the computer and filesystem were online. If the system was online but MM failed to leave the clean mark, then something went wrong and the log might help us figure it out. It's important to leave the clean mark whenever MM is killed. Won't do it in this issue, but should probably create a separate one.
  • Make sure it works on macOS; the HyperDEX team mostly deploys there.
  • -> Test the macOS build?
  • -> CI macOS build?
  • -> Remove the strip step from the builds and turn the debug options on.
  • Figure out where to send/store them. => I'd like to use Google Cloud Storage for that. Should ask Artem for alternatives. => For starters it might be good enough if they're printed to stdout.
  • Consider implementing this for MM1 too. => The plan is to make a new release of MM reusing the same CI build chain.
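
For the core-dump item above, a minimal sketch of raising the core-file size limit at MM2 startup so the OS is allowed to dump core inside the Docker container. It assumes the `libc` crate; `enable_core_dumps` is a hypothetical helper name, and where the core file lands is still governed by the kernel's `core_pattern`.

```rust
extern crate libc;

/// Allow the OS to write core files of unlimited size for this process.
/// Hypothetical helper, to be called early in MM2 startup (Unix only).
pub fn enable_core_dumps() {
    let rl = libc::rlimit {
        rlim_cur: libc::RLIM_INFINITY,
        rlim_max: libc::RLIM_INFINITY,
    };
    let rc = unsafe { libc::setrlimit(libc::RLIMIT_CORE, &rl) };
    if rc != 0 {
        eprintln!("setrlimit (RLIMIT_CORE) failed: {}", std::io::Error::last_os_error());
    }
}
```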
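
For the Windows exceptions item, a minimal sketch of installing a top-level SEH filter, assuming the `winapi` 0.3 crate and the feature names in the comment; `win_crash` and `install` are hypothetical names. A real handler would walk the stack (e.g. with StackWalker) instead of only printing the exception code.

```rust
// Cargo.toml (assumed): winapi = { version = "0.3", features = ["errhandlingapi", "winnt", "excpt"] }
#[cfg(windows)]
extern crate winapi;

#[cfg(windows)]
pub mod win_crash {
    use winapi::um::errhandlingapi::SetUnhandledExceptionFilter;
    use winapi::um::winnt::{EXCEPTION_POINTERS, LONG};
    use winapi::vc::excpt::EXCEPTION_EXECUTE_HANDLER;

    /// Top-level SEH filter: runs when no other handler catches the exception.
    unsafe extern "system" fn crash_filter(info: *mut EXCEPTION_POINTERS) -> LONG {
        let code = (*(*info).ExceptionRecord).ExceptionCode;
        eprintln!("MM2 crashed, SEH exception code {:#x}", code);
        // TODO: produce a proper backtrace here (StackWalker or similar).
        EXCEPTION_EXECUTE_HANDLER
    }

    /// Install the filter once, early in startup.
    pub fn install() {
        unsafe { SetUnhandledExceptionFilter(Some(crash_filter)) };
    }
}
```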
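
For the Unix signals item, a minimal sketch assuming the `libc` and `backtrace` crates; `install_signal_handlers` is a hypothetical helper. Note that `Backtrace::new()` and `eprintln!` are not async-signal-safe, so this is only a starting point for internal testing, not a production-grade handler.

```rust
extern crate backtrace;
extern crate libc;

/// Print a backtrace and abort when a fatal signal arrives.
/// Not async-signal-safe; good enough for a first internal-testing pass.
extern "C" fn on_fatal_signal(sig: libc::c_int) {
    eprintln!("MM2 caught fatal signal {}", sig);
    eprintln!("{:?}", backtrace::Backtrace::new());
    ::std::process::abort();
}

/// Hook the signals that typically indicate a crash (Unix only).
pub fn install_signal_handlers() {
    let handler: extern "C" fn(libc::c_int) = on_fatal_signal;
    unsafe {
        libc::signal(libc::SIGSEGV, handler as libc::sighandler_t);
        libc::signal(libc::SIGBUS, handler as libc::sighandler_t);
        libc::signal(libc::SIGABRT, handler as libc::sighandler_t);
        libc::signal(libc::SIGILL, handler as libc::sighandler_t);
    }
}
```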
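
And for the Rust panics item, a sketch of one possible route: a process-wide hook via the stable `std::panic::set_hook` (unlike `#[panic_handler]`), again assuming the `backtrace` crate and a hypothetical `install_panic_hook` helper. Simply running with `RUST_BACKTRACE=1` and capturing stderr is the other option mentioned above.

```rust
extern crate backtrace;

use std::panic;

/// Route every panic through our own reporter so the backtrace ends up
/// on stderr / in the save-to-share log even without RUST_BACKTRACE.
pub fn install_panic_hook() {
    panic::set_hook(Box::new(|info| {
        eprintln!("MM2 panic: {}", info);
        eprintln!("{:?}", backtrace::Backtrace::new());
    }));
}
```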
ArtemGr self-assigned this Aug 4, 2018

ArtemGr commented Aug 13, 2018

I think that printing stack traces to stderr is good enough for now, while MM2 is only used internally, among the testers and GUI developers. We can suspend this issue, see how the stack traces perform in internal tests, and then get back to it if/when we need the crash reports gathered automatically from the end users.

ArtemGr closed this as completed Aug 13, 2018