Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Promote disable raft flush flag to non-experimental #11519

Closed
deepthidevaki opened this issue Feb 1, 2023 · 5 comments · Fixed by #11705
Closed

Promote disable raft flush flag to non-experimental #11519

deepthidevaki opened this issue Feb 1, 2023 · 5 comments · Fixed by #11705
Assignees
Labels
kind/toil Categorizes an issue or PR as general maintenance, i.e. cleanup, refactoring, etc. support Marks an issue as related to a customer support request version:8.2.0-alpha5 Marks an issue as being completely or in parts released in 8.2.0-alpha5 version:8.2.0 Marks an issue as being completely or in parts released in 8.2.0

Comments

@deepthidevaki
Copy link
Contributor

deepthidevaki commented Feb 1, 2023

Description

Now that #11423 is done, we can officially support this flag.

-[ ] Properly document dangers of disabling raft flush.
-[ ] Move the flag out of non-experimental. TBD whether we can make breaking changes for this config.

Related https://github.com/camunda/product-hub/issues/642
Relates to: https://jira.camunda.com/browse/SUPPORT-15888

@deepthidevaki deepthidevaki added the kind/toil Categorizes an issue or PR as general maintenance, i.e. cleanup, refactoring, etc. label Feb 1, 2023
@deepthidevaki deepthidevaki self-assigned this Feb 3, 2023
@npepinpe npepinpe assigned npepinpe and unassigned deepthidevaki Feb 3, 2023
@deepthidevaki
Copy link
Contributor Author

Summary from discussion with @npepinpe

We would first like to evaluate #11488. On suggestion there is to flush every X seconds. In addition we can also do flush asynchronously without waiting. With a proper value for X, we might be able get similar performance as disabling raft flush, but with added guarantee that the data is flushed atleast once in X seconds. This would be better than disabling flush completely.

If the performance with this is comparable to disabling flush, then it would be worth re-evaluating whether we should support disabling raft completely. Instead we can support configurable flush interval.

@npepinpe
Copy link
Member

npepinpe commented Feb 7, 2023

Will be split into two PRs. The first one will introduce the configuration, the various flush strategies, but will be synchronous. This is because flushing asynchronously is not safe at the moment, as the write path of the journal is not thread safe.

A second PR will look into making the delayed flush strategy asynchronous.

@npepinpe
Copy link
Member

After the PR, the next step will be to make flushing asynchronous. So we don't forget, here's the idea we came up with during brainstorming:

  • Introduce a background thread context to the journal
  • Synchronize compaction, next segment creation, and flushing on that thread.
  • When the last reader is closed on a segment marked for deletion, enqueue the task to delete the segment on the background context.

This approach ensures we can safely flush asynchronously without having to worry about a mapped buffer being unmapped.

After this is done, we should look into pushing down the concept of lastFlushedIndex to the journal.

@ChrisKujawa
Copy link
Member

@npepinpe does it make sense to already introduced here our actors?

ghost pushed a commit that referenced this issue Feb 12, 2023
11569: Allow configuring different Raft flush strategies r=npepinpe a=npepinpe

## Description

This PR introduce a new user facing configuration which allows user to configure different Raft flush strategies. This can help users for whom the default performance of Zeebe is insufficient by trading off safety for performance.

The configuration looks like this:

```yaml
zeebe:
  broker:
    cluster:
      # Configure raft properties
      # raft:
        # Configures how often data is explicitly flushed to disk. By default, for a given
        # partition, data is flushed on every leader commit, and every follower append. This is to
        # ensure consistency across all replicas. Disabling this can cause inconsistencies, and at
        # worst, data corruption or data loss scenarios.
        #
        # The default behavior is optimized for safety, and flushing occurs on every leader commit
        # and follower append in a synchronous fashion. You can introduce a delay to reduce the
        # performance penalty of flushing via `delayTime`.
        flush:
          # If false, explicit flushing of the Raft log is disabled, and flushing only occurs right
          # before a snapshot is taken. You should only disable explicit flushing if you are willing
          # to accept potential data loss at the expense of performance. Before disabling it, try
          # the delayed options, which provide a trade-off between safety and performance.
          # This setting can also be overridden using the environment variable ZEEBE_BROKER_CLUSTER_RAFT_FLUSH_ENABLED
          # enabled: true
          # If the delay is > 0, then flush requests are delayed by at least the given period. It is
          # recommended that you find the smallest delay here with which you achieve your
          # performance goals. It's also likely that anything above 30s is not useful, as this is
          # the typical default flush interval for the Linux OS.
          # This setting can also be overridden using the environment variable ZEEBE_BROKER_CLUSTER_RAFT_FLUSH_DELAYTIME
          # delayTime: 0s
```

There are 3 strategies:

- NoOp: calling `RaftLog#flush` will do nothing. This is the fastest, but most dangerous. You can enable this by configuring `ZEEBE_BROKER_CLUSTER_RAFT_FLUSH_ENABLED=false`
- Direct: calling `RaftLog#flush` will flush the journal. This is the default configuration, and the safest, and what's currently in use. You can configure it by setting `ZEEBE_BROKER_CLUSTER_RAFT_FLUSH_ENABLED=true` and `ZEEBE_BROKER_CLUSTER_RAFT_FLUSH_DELAYTIME=0s`.
- Delayed: calling `RaftLog#flush` will signal the flusher there is something to flush. If nothing is scheduled yet, a flush operation is scheduled for the configured delay in `ZEEBE_BROKER_CLUSTER_RAFT_FLUSH_DELAYTIME`. If there is already a scheduled operation, then nothing happens. This means under constant load you would be flushing periodically with the delay as period. Under low load, then when there is nothing to flush, nothing happens.

Because flushing the journal is asynchronously is not thread safe at the moment, this PR flushes synchronously on the Raft thread, and some corners were cut. This will be tackled in a second PR to keep things manageable.

The old `disableExplicitRaftFlush` flag is still respected and will set the strategy to `NoOp`. It's marked as deprecated now and marked for removal in 8.3.0.

## Related issues

related to #11519 



Co-authored-by: Nicolas Pepin-Perreault <[email protected]>
@npepinpe
Copy link
Member

I would keep it simpler and introduce the smallest possible interface, which in this case is just an Executor. Possibly a combination of AutoCloseable and Executor, if we decide the journal manages the thread or not (probably).

@npepinpe npepinpe mentioned this issue Feb 20, 2023
14 tasks
@ghost ghost closed this as completed in d24d29b Feb 27, 2023
@deepthidevaki deepthidevaki added the version:8.2.0-alpha5 Marks an issue as being completely or in parts released in 8.2.0-alpha5 label Mar 7, 2023
@megglos megglos added the support Marks an issue as related to a customer support request label Mar 14, 2023
@npepinpe npepinpe added the version:8.2.0 Marks an issue as being completely or in parts released in 8.2.0 label Apr 5, 2023
This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/toil Categorizes an issue or PR as general maintenance, i.e. cleanup, refactoring, etc. support Marks an issue as related to a customer support request version:8.2.0-alpha5 Marks an issue as being completely or in parts released in 8.2.0-alpha5 version:8.2.0 Marks an issue as being completely or in parts released in 8.2.0
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants