Functional test (these things tend to be flaky; we need to automatically verify that crashes are caught).
should] Targeted at the binary Docker deployments and internal testing: there should be an option of letting the OS dump the core instead of manually collecting the stack trace. No need to upload the core anywhere, at least not within this issue's timeframe: the testing developers will get it manually from the image.
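For reference, a minimal sketch (Unix-only, assuming the `libc` crate; `enable_core_dumps` is a hypothetical name) of lifting the core-size limit from inside the process, so the OS can actually dump core in the Docker images:

```rust
// A sketch, assuming the `libc` crate: lift the core-size limit from
// inside the process so the kernel is allowed to write a core file.
// Raising rlim_max needs sufficient privileges (usually fine in a
// container running as root).
#[cfg(unix)]
fn enable_core_dumps() {
    let no_limit = libc::rlimit {
        rlim_cur: libc::RLIM_INFINITY,
        rlim_max: libc::RLIM_INFINITY,
    };
    let rc = unsafe { libc::setrlimit(libc::RLIMIT_CORE, &no_limit) };
    if rc != 0 {
        eprintln!("enable_core_dumps] setrlimit(RLIMIT_CORE) failed");
    }
}
```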
Handle Windows exceptions (for example via _EXCEPTION_POINTERS and StackWalker). This should speed up development by allowing me to test the crash handling code from my primary development environments, and it will also work for the Windows deployments of MM. => POC (a sketch follows the sub-items below).
-> A full Windows build is likely necessary; otherwise we'll get linker errors about symbols like os_portable::OS_init missing.
-> -> Try to link the C MM1 library to the MM2 Rust binary instead of the other way around.
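A rough POC sketch of installing a top-level exception filter, assuming the `winapi` crate (features "errhandlingapi" and "winnt"; `install_seh_filter` is a hypothetical name). A real handler would walk the stack here, e.g. via StackWalker/StackWalk64, before the process dies:

```rust
// A sketch of a top-level SEH filter, assuming the `winapi` crate.
#[cfg(windows)]
fn install_seh_filter() {
    use winapi::shared::ntdef::LONG;
    use winapi::um::errhandlingapi::SetUnhandledExceptionFilter;
    use winapi::um::winnt::EXCEPTION_POINTERS;

    unsafe extern "system" fn filter(info: *mut EXCEPTION_POINTERS) -> LONG {
        let code = (*(*info).ExceptionRecord).ExceptionCode;
        eprintln!("Unhandled Windows exception, code {:#x}", code);
        // A real handler would walk and print the stack here.
        1 // EXCEPTION_EXECUTE_HANDLER: terminate after we've logged.
    }

    unsafe { SetUnhandledExceptionFilter(Some(filter)) };
}
```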
Handle Unix signals (with the Linux and macOS deployments in mind).
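A minimal sketch of catching the fatal signals, assuming the `libc` crate (`install_signal_handlers` is a hypothetical name). The handler sticks to async-signal-safe calls (write(2), _exit(2)); a real version would dump a backtrace with the same care:

```rust
// A sketch of hooking the fatal Unix signals via the `libc` crate.
#[cfg(unix)]
fn install_signal_handlers() {
    extern "C" fn handler(_sig: libc::c_int) {
        // Only async-signal-safe calls in here: no allocation, no println!.
        const MSG: &[u8] = b"MM2 caught a fatal signal\n";
        unsafe {
            libc::write(2, MSG.as_ptr() as *const libc::c_void, MSG.len());
            libc::_exit(1);
        }
    }
    for &sig in &[libc::SIGSEGV, libc::SIGBUS, libc::SIGILL, libc::SIGABRT] {
        unsafe { libc::signal(sig, handler as libc::sighandler_t) };
    }
}
```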
should] Capture not just the C crashes but also the Rust panics. => Global panic handlers (#[panic_handler]) aren't stable yet. We could set thread-local hooks, but I'll first try a simpler (?) route: using RUST_BACKTRACE and getting the crash reports by capturing the standard output. (A hook sketch follows the sub-items below.)
-> Dump the C/Rust backtraces to the standard output and to the save-to-share log.
-> -> Rehash invariants regarding the logging of sensitive information.
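If we do go the hook route, here's a minimal sketch using the process-wide std::panic::set_hook from the stable standard library (`install_panic_hook` is a hypothetical name). It chains to the default hook so that RUST_BACKTRACE=1 still prints the backtrace:

```rust
// A sketch of logging Rust panics via std::panic::set_hook,
// chaining to the default hook to keep the backtrace output.
fn install_panic_hook() {
    let default_hook = std::panic::take_hook();
    std::panic::set_hook(Box::new(move |info| {
        if let Some(loc) = info.location() {
            eprintln!("MM2 panic at {}:{}", loc.file(), loc.line());
        }
        // The default hook prints the panic message and, with
        // RUST_BACKTRACE=1 in the environment, the backtrace.
        default_hook(info);
    }));
}
```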
Backtrace without line numbers first, to improve reliability? => Not an option.
Scan the logs folder and see if some of the logs might indicate a crash or a failure.
won't] Watchdog. Mark the log dirty when we're doing something and clean when we're finished; that way we know there was a failure even if we were too dead to capture the backtrace in the log. We'll probably need a helper, a forked process that leaves a trace showing the computer and filesystem were online. If the system was online but MM failed to leave the clean mark, then something went wrong and the log might help us figure it out. It's important to leave the clean mark whenever MM is killed. Won't do it in this issue, but we should probably create a separate one (a marker sketch follows below).
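For the record, the marker scheme could be as simple as this hypothetical sketch (all names are illustrative):

```rust
// A hypothetical dirty/clean marker: touch a marker file when work
// starts, remove it on orderly shutdown; finding the marker on the
// next start means the previous run died mid-flight.
use std::fs;
use std::io;
use std::path::Path;

fn mark_dirty(marker: &Path) -> io::Result<()> {
    fs::write(marker, b"dirty")
}

fn mark_clean(marker: &Path) -> io::Result<()> {
    match fs::remove_file(marker) {
        Err(ref e) if e.kind() == io::ErrorKind::NotFound => Ok(()),
        other => other,
    }
}

fn previous_run_crashed(marker: &Path) -> bool {
    marker.exists()
}
```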
Make sure it works on macOS; the HyperDEX team mostly deploys there.
-> Test the macOS build?
-> CI macOS build?
-> Remove the strip step from the builds and turn the debug options on (see the profile sketch below).
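Assuming the release binaries are built with Cargo, keeping the debug info might be a one-line profile tweak (a sketch, not necessarily how our CI is wired):

```toml
# Hypothetical Cargo.toml tweak: keep debug symbols in release builds
# so the backtraces resolve to function names and line numbers.
[profile.release]
debug = true
```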
Figure out where to send/store the reports. => I'd like to use Google Cloud Storage for that. Should ask Artem for alternatives. => For starters it might be good enough if they're printed to stdout.
Consider implementing this for MM1 too. => The plan is to make a new release of MM reusing the same CI build chain.
I think that printing the stack traces to stderr is good enough for now, while MM2 is only used internally, among the testers and GUI developers. We can suspend this issue for now, see how the stack traces perform in the internal tests, then get back to it if/when we need the crash reports gathered automatically from the end users.