Do not unban replicas if a primary is available #843

Merged 1 commit on Nov 7, 2024
10 changes: 10 additions & 0 deletions CONFIG.md
@@ -130,6 +130,16 @@ default: 60 # seconds

How long to ban a server if it fails a health check (seconds).

### unban_replicas_when_all_banned
```
path: general.unban_replicas_when_all_banned
default: true
```

Whether to unban all replicas once every replica is banned. This is set
to true by default so that clients are not disconnected when replicas are banned
because of false-positive health checks.

### log_client_connections
```
path: general.log_client_connections
2 changes: 1 addition & 1 deletion README.md
@@ -175,7 +175,7 @@ The setting will persist until it's changed again or the client disconnects.
By default, all queries are routed to the first available server; `default_role` setting controls this behavior.

### Failover
- All servers are checked with a `;` (very fast) query before being given to a client. Additionally, the server health is monitored with every client query that it processes. If the server is not reachable, it will be banned and cannot serve any more transactions for the duration of the ban. The queries are routed to the remaining servers. If all servers become banned, the ban list is cleared: this is a safety precaution against false positives. The primary can never be banned.
+ All servers are checked with a `;` (very fast) query before being given to a client. Additionally, the server health is monitored with every client query that it processes. If the server is not reachable, it will be banned and cannot serve any more transactions for the duration of the ban. The queries are routed to the remaining servers. If all servers become banned, the behavior is controlled by the configuration parameter `unban_replicas_when_all_banned`. If it is set to true (the default), the ban list is cleared: this is a safety precaution against false positives. If it is set to false, no replicas will be available until they become healthy. The primary can never be banned.

The ban time can be changed with `ban_time`. The default is 60 seconds.
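
To make the ban duration concrete, here is a minimal sketch of a time-limited ban check, assuming a ban is recorded as the instant at which it started. `is_banned` is a hypothetical helper written for illustration, not code from this repository; the 60-second value mirrors the documented `ban_time` default.

```rust
use std::time::{Duration, Instant};

/// Returns true while `ban_time` has not yet elapsed since `banned_at`.
/// Hypothetical helper: the name and signature are assumptions.
fn is_banned(banned_at: Instant, now: Instant, ban_time: Duration) -> bool {
    now.saturating_duration_since(banned_at) < ban_time
}

fn main() {
    let ban_time = Duration::from_secs(60); // documented default
    let banned_at = Instant::now();

    // 30 seconds into the ban the server is still excluded.
    let mid_ban = banned_at + Duration::from_secs(30);
    println!("banned after 30s: {}", is_banned(banned_at, mid_ban, ban_time));

    // Once more than `ban_time` has passed, the server can be used again.
    let after_ban = banned_at + Duration::from_secs(61);
    println!("banned after 61s: {}", is_banned(banned_at, after_ban, ban_time));
}
```

Passing `now` as a parameter instead of calling `Instant::now()` inside the helper keeps the check deterministic, so it can be exercised without sleeping.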

4 changes: 4 additions & 0 deletions src/config.rs
@@ -315,6 +315,9 @@ pub struct General {
#[serde(default = "General::default_ban_time")]
pub ban_time: i64,

#[serde(default)] // Note: a bare serde default is bool::default(), i.e. false, when the key is omitted
pub unban_replicas_when_all_banned: bool,

#[serde(default = "General::default_idle_client_in_transaction_timeout")]
pub idle_client_in_transaction_timeout: u64,

@@ -460,6 +463,7 @@ impl Default for General {
healthcheck_timeout: Self::default_healthcheck_timeout(),
healthcheck_delay: Self::default_healthcheck_delay(),
ban_time: Self::default_ban_time(),
unban_replicas_when_all_banned: true,
idle_client_in_transaction_timeout: Self::default_idle_client_in_transaction_timeout(),
server_lifetime: Self::default_server_lifetime(),
server_round_robin: Self::default_server_round_robin(),
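
One detail in the `src/config.rs` hunks is worth checking: a bare `#[serde(default)]` fills a missing field with `Default::default()`, which for `bool` is `false`, while `impl Default for General` sets the same field to `true`. The std-only sketch below models that difference without depending on serde. `resolve_bare_default`, `resolve_with_default_fn`, and `default_unban_replicas_when_all_banned` are hypothetical names, not functions from this repository.

```rust
/// Hypothetical default function in the style of `General::default_ban_time`.
/// With serde it would be referenced as
/// `#[serde(default = "General::default_unban_replicas_when_all_banned")]`.
fn default_unban_replicas_when_all_banned() -> bool {
    true
}

/// Models a bare `#[serde(default)]`: a missing key becomes `bool::default()`.
fn resolve_bare_default(parsed: Option<bool>) -> bool {
    parsed.unwrap_or_default()
}

/// Models `#[serde(default = "...")]`: a missing key calls the named function.
fn resolve_with_default_fn(parsed: Option<bool>) -> bool {
    parsed.unwrap_or_else(default_unban_replicas_when_all_banned)
}

fn main() {
    // `None` stands for a config file that omits the key entirely.
    println!("bare default:     {}", resolve_bare_default(None));
    println!("default function: {}", resolve_with_default_fn(None));

    // An explicitly configured value wins in both cases.
    println!("explicit false:   {}", resolve_with_default_fn(Some(false)));
}
```

If the documented default of `true` is meant to apply when the key is left out of the config file, a named default function along these lines is one way to keep deserialization and `Default for General` in agreement.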
10 changes: 9 additions & 1 deletion src/pool.rs
@@ -189,6 +189,9 @@ pub struct PoolSettings {
// Ban time
pub ban_time: i64,

// Should we automatically unban replicas when all are banned?
pub unban_replicas_when_all_banned: bool,

// Regex for searching for the sharding key in SQL statements
pub sharding_key_regex: Option<Regex>,

@@ -228,6 +231,7 @@ impl Default for PoolSettings {
healthcheck_delay: General::default_healthcheck_delay(),
healthcheck_timeout: General::default_healthcheck_timeout(),
ban_time: General::default_ban_time(),
unban_replicas_when_all_banned: true,
sharding_key_regex: None,
shard_id_regex: None,
regex_search_limit: 1000,
@@ -541,6 +545,9 @@ impl ConnectionPool {
healthcheck_delay: config.general.healthcheck_delay,
healthcheck_timeout: config.general.healthcheck_timeout,
ban_time: config.general.ban_time,
unban_replicas_when_all_banned: config
.general
.unban_replicas_when_all_banned,
sharding_key_regex: pool_config
.sharding_key_regex
.clone()
@@ -946,8 +953,9 @@ impl ConnectionPool {
let read_guard = self.banlist.read();
let all_replicas_banned = read_guard[address.shard].len() == replicas_available;
drop(read_guard);
// `bool` is `Copy`, so the flag can be read without cloning the settings.
let unban_replicas_when_all_banned = self.settings.unban_replicas_when_all_banned;

- if all_replicas_banned {
+ if all_replicas_banned && unban_replicas_when_all_banned {
let mut write_guard = self.banlist.write();
warn!("Unbanning all replicas.");
write_guard[address.shard].clear();
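
The hunk above gates the existing unban step on the new setting. Below is a simplified, single-shard model of that decision, assuming a plain `HashSet<String>` in place of the pool's lock-guarded, per-shard ban list. `BanList` and `unban_if_all_banned` are hypothetical names used only for illustration.

```rust
use std::collections::HashSet;

/// Simplified single-shard ban list (hypothetical; the pool in the diff keeps
/// one list per shard behind a read/write lock).
struct BanList {
    banned: HashSet<String>,
    replicas_available: usize,
    unban_replicas_when_all_banned: bool,
}

impl BanList {
    /// Clears the ban list only when every replica is banned and the setting
    /// allows it. Returns true if the list was cleared.
    fn unban_if_all_banned(&mut self) -> bool {
        let all_replicas_banned = self.banned.len() == self.replicas_available;
        if all_replicas_banned && self.unban_replicas_when_all_banned {
            self.banned.clear();
            return true;
        }
        false
    }
}

fn main() {
    let replicas = vec!["replica-1", "replica-2"];
    for flag in vec![true, false] {
        // Start each run with every replica banned.
        let mut list = BanList {
            banned: replicas.iter().map(|r| r.to_string()).collect(),
            replicas_available: replicas.len(),
            unban_replicas_when_all_banned: flag,
        };
        let cleared = list.unban_if_all_banned();
        println!(
            "flag={} cleared={} still_banned={}",
            flag,
            cleared,
            list.banned.len()
        );
    }
}
```

With the flag off, the bans simply stay in place, which matches the README wording that no replicas are available until they become healthy.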
2 changes: 2 additions & 0 deletions src/query_router.rs
@@ -1464,6 +1464,7 @@ mod test {
healthcheck_delay: PoolSettings::default().healthcheck_delay,
healthcheck_timeout: PoolSettings::default().healthcheck_timeout,
ban_time: PoolSettings::default().ban_time,
unban_replicas_when_all_banned: true,
sharding_key_regex: None,
shard_id_regex: None,
default_shard: crate::config::DefaultShard::Shard(0),
@@ -1542,6 +1543,7 @@ mod test {
healthcheck_delay: PoolSettings::default().healthcheck_delay,
healthcheck_timeout: PoolSettings::default().healthcheck_timeout,
ban_time: PoolSettings::default().ban_time,
unban_replicas_when_all_banned: true,
sharding_key_regex: Some(Regex::new(r"/\* sharding_key: (\d+) \*/").unwrap()),
shard_id_regex: Some(Regex::new(r"/\* shard_id: (\d+) \*/").unwrap()),
default_shard: crate::config::DefaultShard::Shard(0),