Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bugfix: Do not lock EventLoop::mutex after EventLoop is done #159

Merged
merged 1 commit into from
Feb 10, 2025

Conversation

ryanofsky
Copy link
Collaborator

Currently, there are two cases where code may attempt to lock EventLoop::mutex after the EventLoop::loop() method has finished running. This is not a problem by itself, but is a problem if there is external code which deletes the EventLoop object immediately after the method returns, which is the case in Bitcoin Core. Both cases of locking after the loop method returns happen due to uses of the Unlock() utility function which unlocks a mutex temporarily, runs a callback function and relocks it.

The first case happens in EventLoop::removeClient when the EventLoop::done() condition is reached and it calls Unlock() in order to be able to call write(post_fd, ...) without blocking and wake the EventLoop thread. In this case, since EventLoop execution is done there is not really any point to using Unlock() and relocking, so the code is changed to use a simple lock.unlock() call and not try to relock.

The second case happens in EventLoop::post where Unlock() is used in a similar way, and depending on thread scheduling (if the thread is delayed for a long time before relocking) can result in failing to relock EventLoop::m_mutex after calling write(). In this case, since relocking the mutex is actually necessary for the code that follows, the fix is different: new addClient/removeClient calls are added to this code, so the EventLoop::loop() function will never exit while the post() function is waiting.

These two changes are being labeled as a bugfix even though there is not technically a bug here in the libmultiprocess code or API. The EventLoop object itself was safe before these changes as long as outside code waited for EventLoop methods to finish executing before deleting the EventLoop object. Originally the multiprocess code added in
bitcoin/bitcoin#10102 did this, but later as more features were added for binding and connecting to unix sockets, and unit tests were added which would immediately delete the EventLoop object after EventLoop::loop() returned, it meant this code could now start failing to relock after unlocking.

A previous attempt was made to fix this bug in
bitcoin/bitcoin#31815 outside of libmultiprocess code. But it only addressed the first case of a failing Unlock() in EventLoop::removeClient(), not the second case in EventLoop::post().

This first case described above was not observed but is theoretically possible. The second case happens intermittently on macos and was reported #154

Currently, there are two cases where code may attempt to lock
`EventLoop::mutex` after the `EventLoop::loop()` method has finished running.
This is not a problem by itself, but is a problem if there is external code
which deletes the `EventLoop` object immediately after the method returns,
which is the case in Bitcoin Core. Both cases of locking after the loop method
returns happen due to uses of the `Unlock()` utility function which unlocks a
mutex temporarily, runs a callback function and relocks it.

The first case happens in `EventLoop::removeClient` when the
`EventLoop::done()` condition is reached and it calls `Unlock()` in order to be
able to call `write(post_fd, ...)` without blocking and wake the EventLoop
thread. In this case, since `EventLoop` execution is done there is not really
any point to using `Unlock()` and relocking, so the code is changed to use a
simple `lock.unlock()` call and not try to relock.

The second case happens in `EventLoop::post` where `Unlock()` is used in a
similar way, and depending on thread scheduling (if the thread is delayed for a
long time before relocking) can result in failing to relock
`EventLoop::m_mutex` after calling `write()`. In this case, since relocking the
mutex is actually necessary for the code that follows, the fix is different:
new `addClient`/`removeClient` calls are added to this code, so the
`EventLoop::loop()` function will never exit while the `post()` function is
waiting.

These two changes are being labeled as a bugfix even though there is not
technically a bug here in the libmultiprocess code or API. The `EventLoop`
object itself was safe before these changes as long as outside code waited for
`EventLoop` methods to finish executing before deleting the `EventLoop` object.
Originally the multiprocess code added in
bitcoin/bitcoin#10102 did this, but later as more
features were added for binding and connecting to unix sockets, and unit tests
were added which would immediately delete the `EventLoop` object after
`EventLoop::loop()` returned, it meant this code could now start failing to
relock after unlocking.

A previous attempt was made to fix this bug in
bitcoin/bitcoin#31815 outside of libmultiprocess code.
But it only addressed the first case of a failing `Unlock()` in
`EventLoop::removeClient()`, not the second case in `EventLoop::post()`.

This first case described above was not observed but is theoretically possible.
The second case happens intermittently on macos and was reported
bitcoin-core#154
@ryanofsky ryanofsky merged commit 26b9f3d into bitcoin-core:master Feb 10, 2025
@ryanofsky ryanofsky mentioned this pull request Feb 10, 2025
ryanofsky added a commit to ryanofsky/bitcoin that referenced this pull request Feb 10, 2025
Bring in a few cmake changes to improve cmake support and fix compile/lint/tidy warnings.

bitcoin-core/libmultiprocess#140 build: don't clobber user/superproject c++ version
bitcoin-core/libmultiprocess#142 build: add option for external mpgen binary
bitcoin-core/libmultiprocess#143 cleanup: initialize vars in the EventLoop constructor in the correct order
bitcoin-core/libmultiprocess#146 cmake: Suppress compiler warnings from capnproto headers
bitcoin-core/libmultiprocess#147 cmake: EXTERNAL_MPGEN cleanups
bitcoin-core/libmultiprocess#148 util: fix -Wpessimizing-move warning
bitcoin-core/libmultiprocess#145 CTest: Module must be included at the top level
bitcoin-core/libmultiprocess#149 Avoid `-Wundef` compiler warnings
bitcoin-core/libmultiprocess#152 refactor: Fix compiler and clang-tidy warnings
bitcoin-core/libmultiprocess#155 scripted-diff: s/Libmultiprocess_EXTERNAL_MPGEN/MPGEN_EXECUTABLE/g
bitcoin-core/libmultiprocess#156 refactor: Remove locale-dependent function calls
bitcoin-core/libmultiprocess#157 refactor: Avoid using std::format
bitcoin-core/libmultiprocess#159 bugfix: Do not lock EventLoop::mutex after EventLoop is done
@ryanofsky
Copy link
Collaborator Author

#160 contains some followups to try to make this code a little safer and easier to maintain.

Sjors pushed a commit to Sjors/bitcoin that referenced this pull request Feb 10, 2025
Bring in a few cmake changes to improve cmake support and fix compile/lint/tidy warnings.

bitcoin-core/libmultiprocess#140 build: don't clobber user/superproject c++ version
bitcoin-core/libmultiprocess#142 build: add option for external mpgen binary
bitcoin-core/libmultiprocess#143 cleanup: initialize vars in the EventLoop constructor in the correct order
bitcoin-core/libmultiprocess#146 cmake: Suppress compiler warnings from capnproto headers
bitcoin-core/libmultiprocess#147 cmake: EXTERNAL_MPGEN cleanups
bitcoin-core/libmultiprocess#148 util: fix -Wpessimizing-move warning
bitcoin-core/libmultiprocess#145 CTest: Module must be included at the top level
bitcoin-core/libmultiprocess#149 Avoid `-Wundef` compiler warnings
bitcoin-core/libmultiprocess#152 refactor: Fix compiler and clang-tidy warnings
bitcoin-core/libmultiprocess#155 scripted-diff: s/Libmultiprocess_EXTERNAL_MPGEN/MPGEN_EXECUTABLE/g
bitcoin-core/libmultiprocess#156 refactor: Remove locale-dependent function calls
bitcoin-core/libmultiprocess#157 refactor: Avoid using std::format
bitcoin-core/libmultiprocess#159 bugfix: Do not lock EventLoop::mutex after EventLoop is done
bitcoin-core/libmultiprocess#161 cmake: Avoid including CTest if not top level project
Sjors pushed a commit to Sjors/bitcoin that referenced this pull request Feb 13, 2025
Bring in a few cmake changes to improve cmake support and fix compile/lint/tidy warnings.

bitcoin-core/libmultiprocess#140 build: don't clobber user/superproject c++ version
bitcoin-core/libmultiprocess#142 build: add option for external mpgen binary
bitcoin-core/libmultiprocess#143 cleanup: initialize vars in the EventLoop constructor in the correct order
bitcoin-core/libmultiprocess#146 cmake: Suppress compiler warnings from capnproto headers
bitcoin-core/libmultiprocess#147 cmake: EXTERNAL_MPGEN cleanups
bitcoin-core/libmultiprocess#148 util: fix -Wpessimizing-move warning
bitcoin-core/libmultiprocess#145 CTest: Module must be included at the top level
bitcoin-core/libmultiprocess#149 Avoid `-Wundef` compiler warnings
bitcoin-core/libmultiprocess#152 refactor: Fix compiler and clang-tidy warnings
bitcoin-core/libmultiprocess#155 scripted-diff: s/Libmultiprocess_EXTERNAL_MPGEN/MPGEN_EXECUTABLE/g
bitcoin-core/libmultiprocess#156 refactor: Remove locale-dependent function calls
bitcoin-core/libmultiprocess#157 refactor: Avoid using std::format
bitcoin-core/libmultiprocess#159 bugfix: Do not lock EventLoop::mutex after EventLoop is done
bitcoin-core/libmultiprocess#161 cmake: Avoid including CTest if not top level project
ryanofsky added a commit to ryanofsky/bitcoin that referenced this pull request Feb 24, 2025
Bump libmultiprocess library to include bugfix
bitcoin-core/libmultiprocess#159 which should fix
intermittent CI failure reported
bitcoin#31921

This change is bumping the libmultiprocess version instead of cherry picking
the bugfix. It could cherry-pick the bugfix instead, but there are reasons to
prefer just bumping the version:

- Bugfix might interact with earlier PRs, and the latest version is better
  tested with testing done in many CI configurations in bitcoin#30975 and bitcoin#31802

- Even though we are in feature freeze for a release, the MULTIPROCESS=1 option
  is currently not enabled for release, so this PR only affect CI builds and
  local builds, not the release build.

bitcoin-core/libmultiprocess#140 build: don't clobber user/superproject c++ version
bitcoin-core/libmultiprocess#142 build: add option for external mpgen binary
bitcoin-core/libmultiprocess#143 cleanup: initialize vars in the EventLoop constructor in the correct order
bitcoin-core/libmultiprocess#146 cmake: Suppress compiler warnings from capnproto headers
bitcoin-core/libmultiprocess#147 cmake: EXTERNAL_MPGEN cleanups
bitcoin-core/libmultiprocess#148 util: fix -Wpessimizing-move warning
bitcoin-core/libmultiprocess#145 CTest: Module must be included at the top level
bitcoin-core/libmultiprocess#149 Avoid `-Wundef` compiler warnings
bitcoin-core/libmultiprocess#152 refactor: Fix compiler and clang-tidy warnings
bitcoin-core/libmultiprocess#155 scripted-diff: s/Libmultiprocess_EXTERNAL_MPGEN/MPGEN_EXECUTABLE/g
bitcoin-core/libmultiprocess#156 refactor: Remove locale-dependent function calls
bitcoin-core/libmultiprocess#157 refactor: Avoid using std::format
bitcoin-core/libmultiprocess#159 bugfix: Do not lock EventLoop::mutex after EventLoop is done
bitcoin-core/libmultiprocess#161 cmake: Avoid including CTest if not top level project
bitcoin-core/libmultiprocess#164 Bump minimum required cmake to 3.12
ryanofsky added a commit to ryanofsky/bitcoin that referenced this pull request Feb 24, 2025
Bump libmultiprocess library to include bugfix
bitcoin-core/libmultiprocess#159 which should fix
intermittent CI failures reported in
bitcoin#31921

This change is bumping the libmultiprocess version instead of cherry picking
the bugfix. It could cherry-pick the bugfix instead, but there are reasons to
prefer bumping the version:

- Bugfix might interact with earlier PRs, and the latest version is better
  tested with testing done in many CI configurations in bitcoin#30975 and bitcoin#31802

- Even though we are in feature freeze for a release, the MULTIPROCESS=1 option
  is currently not enabled for release, so this PR only affect CI builds and
  local builds, not the release build.

This update brings in the following changes:

bitcoin-core/libmultiprocess#140 build: don't clobber user/superproject c++ version
bitcoin-core/libmultiprocess#142 build: add option for external mpgen binary
bitcoin-core/libmultiprocess#143 cleanup: initialize vars in the EventLoop constructor in the correct order
bitcoin-core/libmultiprocess#146 cmake: Suppress compiler warnings from capnproto headers
bitcoin-core/libmultiprocess#147 cmake: EXTERNAL_MPGEN cleanups
bitcoin-core/libmultiprocess#148 util: fix -Wpessimizing-move warning
bitcoin-core/libmultiprocess#145 CTest: Module must be included at the top level
bitcoin-core/libmultiprocess#149 Avoid `-Wundef` compiler warnings
bitcoin-core/libmultiprocess#152 refactor: Fix compiler and clang-tidy warnings
bitcoin-core/libmultiprocess#155 scripted-diff: s/Libmultiprocess_EXTERNAL_MPGEN/MPGEN_EXECUTABLE/g
bitcoin-core/libmultiprocess#156 refactor: Remove locale-dependent function calls
bitcoin-core/libmultiprocess#157 refactor: Avoid using std::format
bitcoin-core/libmultiprocess#159 bugfix: Do not lock EventLoop::mutex after EventLoop is done
bitcoin-core/libmultiprocess#161 cmake: Avoid including CTest if not top level project
bitcoin-core/libmultiprocess#164 Bump minimum required cmake to 3.12
ryanofsky added a commit to ryanofsky/bitcoin that referenced this pull request Feb 24, 2025
Bump libmultiprocess library to include bugfix
bitcoin-core/libmultiprocess#159 which should fix
intermittent CI failures reported in
bitcoin#31921

This change is bumping the libmultiprocess version instead of cherry picking
the bugfix. It could cherry-pick the bugfix instead, but there are reasons to
prefer bumping the version:

- Bugfix might interact with earlier PRs, and the latest version is better
  tested with testing done in many CI configurations in bitcoin#30975 and bitcoin#31802

- Even though we are in feature freeze for a release, the MULTIPROCESS=1 option
  is currently not enabled for release, so this PR only affect CI builds and
  local builds, not the release build.

This update brings in the following changes:

bitcoin-core/libmultiprocess#140 build: don't clobber user/superproject c++ version
bitcoin-core/libmultiprocess#142 build: add option for external mpgen binary
bitcoin-core/libmultiprocess#143 cleanup: initialize vars in the EventLoop constructor in the correct order
bitcoin-core/libmultiprocess#146 cmake: Suppress compiler warnings from capnproto headers
bitcoin-core/libmultiprocess#147 cmake: EXTERNAL_MPGEN cleanups
bitcoin-core/libmultiprocess#148 util: fix -Wpessimizing-move warning
bitcoin-core/libmultiprocess#145 CTest: Module must be included at the top level
bitcoin-core/libmultiprocess#149 Avoid `-Wundef` compiler warnings
bitcoin-core/libmultiprocess#152 refactor: Fix compiler and clang-tidy warnings
bitcoin-core/libmultiprocess#155 scripted-diff: s/Libmultiprocess_EXTERNAL_MPGEN/MPGEN_EXECUTABLE/g
bitcoin-core/libmultiprocess#156 refactor: Remove locale-dependent function calls
bitcoin-core/libmultiprocess#157 refactor: Avoid using std::format
bitcoin-core/libmultiprocess#159 bugfix: Do not lock EventLoop::mutex after EventLoop is done
bitcoin-core/libmultiprocess#161 cmake: Avoid including CTest if not top level project
bitcoin-core/libmultiprocess#164 Bump minimum required cmake to 3.12
fanquake pushed a commit to bitcoin/bitcoin that referenced this pull request Feb 25, 2025
Bump libmultiprocess library to include bugfix
bitcoin-core/libmultiprocess#159 which should fix
intermittent CI failures reported in
#31921

This change is bumping the libmultiprocess version instead of cherry picking
the bugfix. It could cherry-pick the bugfix instead, but there are reasons to
prefer bumping the version:

- Bugfix might interact with earlier PRs, and the latest version is better
  tested with testing done in many CI configurations in #30975 and #31802

- Even though we are in feature freeze for a release, the MULTIPROCESS=1 option
  is currently not enabled for release, so this PR only affect CI builds and
  local builds, not the release build.

This update brings in the following changes:

bitcoin-core/libmultiprocess#140 build: don't clobber user/superproject c++ version
bitcoin-core/libmultiprocess#142 build: add option for external mpgen binary
bitcoin-core/libmultiprocess#143 cleanup: initialize vars in the EventLoop constructor in the correct order
bitcoin-core/libmultiprocess#146 cmake: Suppress compiler warnings from capnproto headers
bitcoin-core/libmultiprocess#147 cmake: EXTERNAL_MPGEN cleanups
bitcoin-core/libmultiprocess#148 util: fix -Wpessimizing-move warning
bitcoin-core/libmultiprocess#145 CTest: Module must be included at the top level
bitcoin-core/libmultiprocess#149 Avoid `-Wundef` compiler warnings
bitcoin-core/libmultiprocess#152 refactor: Fix compiler and clang-tidy warnings
bitcoin-core/libmultiprocess#155 scripted-diff: s/Libmultiprocess_EXTERNAL_MPGEN/MPGEN_EXECUTABLE/g
bitcoin-core/libmultiprocess#156 refactor: Remove locale-dependent function calls
bitcoin-core/libmultiprocess#157 refactor: Avoid using std::format
bitcoin-core/libmultiprocess#159 bugfix: Do not lock EventLoop::mutex after EventLoop is done
bitcoin-core/libmultiprocess#161 cmake: Avoid including CTest if not top level project
bitcoin-core/libmultiprocess#164 Bump minimum required cmake to 3.12
fanquake added a commit to bitcoin/bitcoin that referenced this pull request Feb 25, 2025
01f7715 depends: Update libmultiprocess library to fix CI failure (Ryan Ofsky)

Pull request description:

  Bump libmultiprocess library to include bugfix bitcoin-core/libmultiprocess#159 which should fix intermittent CI failures reported in #31921

  This change is bumping the libmultiprocess version instead of cherry picking the bugfix. It could cherry-pick the bugfix instead, but there are reasons to prefer bumping the version:

  - Bugfix might interact with earlier PRs, and the latest version is better tested with testing done in many CI configurations in [#30975](#30975) and [#31802](#31802)

  - Even though we are in feature freeze for a release, the MULTIPROCESS=1 option is currently not enabled for release, so this PR only affect CI builds and local builds, not the release build.

  This update brings in the following changes:

  bitcoin-core/libmultiprocess#140 build: don't clobber user/superproject c++ version
  bitcoin-core/libmultiprocess#142 build: add option for external mpgen binary
  bitcoin-core/libmultiprocess#143 cleanup: initialize vars in the EventLoop constructor in the correct order
  bitcoin-core/libmultiprocess#146 cmake: Suppress compiler warnings from capnproto headers
  bitcoin-core/libmultiprocess#147 cmake: EXTERNAL_MPGEN cleanups
  bitcoin-core/libmultiprocess#148 util: fix -Wpessimizing-move warning
  bitcoin-core/libmultiprocess#145 CTest: Module must be included at the top level
  bitcoin-core/libmultiprocess#149 Avoid `-Wundef` compiler warnings
  bitcoin-core/libmultiprocess#152 refactor: Fix compiler and clang-tidy warnings
  bitcoin-core/libmultiprocess#155 scripted-diff: s/Libmultiprocess_EXTERNAL_MPGEN/MPGEN_EXECUTABLE/g
  bitcoin-core/libmultiprocess#156 refactor: Remove locale-dependent function calls
  bitcoin-core/libmultiprocess#157 refactor: Avoid using std::format
  bitcoin-core/libmultiprocess#159 bugfix: Do not lock EventLoop::mutex after EventLoop is done
  bitcoin-core/libmultiprocess#161 cmake: Avoid including CTest if not top level project
  bitcoin-core/libmultiprocess#164 Bump minimum required cmake to 3.12

  ---

  This PR is part of the [process separation project](#28722).

ACKs for top commit:
  fanquake:
    ACK 01f7715

Tree-SHA512: a6a795e4d4e13e9d35c9346f3c83d5da817f1452bdc4a9412aeadc4c652ad6e5047f4c77756570594a70ec9095cc786772a0469e306dc19bb5a508fd515c37f4
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant