Commit cdcfa99

Do not unban replicas if a primary is available (postgresml#843)
Add `unban_replicas_when_all_banned` to control unbanning replicas behavior.
1 parent f27dc6b commit cdcfa99

5 files changed, +26 -2 lines changed

CONFIG.md

+10
@@ -130,6 +130,16 @@ default: 60 # seconds
 
 How long to ban a server if it fails a health check (seconds).
 
+### unban_replicas_when_all_banned
+```
+path: general.unban_replicas_when_all_banned
+default: true
+```
+
+Whether or not we should unban all replicas when they are all banned. This is set
+to true by default to prevent disconnection when we have replicas with a false positive
+health check.
+
 ### log_client_connections
 ```
 path: general.log_client_connections

README.md

+1 -1
@@ -175,7 +175,7 @@ The setting will persist until it's changed again or the client disconnects.
 By default, all queries are routed to the first available server; `default_role` setting controls this behavior.
 
 ### Failover
-All servers are checked with a `;` (very fast) query before being given to a client. Additionally, the server health is monitored with every client query that it processes. If the server is not reachable, it will be banned and cannot serve any more transactions for the duration of the ban. The queries are routed to the remaining servers. If all servers become banned, the ban list is cleared: this is a safety precaution against false positives. The primary can never be banned.
+All servers are checked with a `;` (very fast) query before being given to a client. Additionally, the server health is monitored with every client query that it processes. If the server is not reachable, it will be banned and cannot serve any more transactions for the duration of the ban. The queries are routed to the remaining servers. If all servers become banned, the behavior is controlled by the configuration parameter `unban_replicas_when_all_banned`. If it is set to true (the default), the ban list is cleared: this is a safety precaution against false positives. If it is set to false, no replicas will be available until they become healthy. The primary can never be banned.
 
 The ban time can be changed with `ban_time`. The default is 60 seconds.
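
The failover rule described in the README change can be sketched as a small pure function. This is only an illustration under stated assumptions: the `Role` enum, the `eligible_servers` function, and the index-based ban set are hypothetical and do not appear in the commit. The sketch also assumes the ban set holds replica indices only, since the primary is never banned.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified roles; the real pooler has its own role type.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Role {
    Primary,
    Replica,
}

/// Returns the indices of servers that may receive queries.
///
/// `banned` is assumed to hold indices of banned replicas only.
/// The primary is never banned, so it is always eligible.
pub fn eligible_servers(
    roles: &[Role],
    banned: &mut HashSet<usize>,
    unban_replicas_when_all_banned: bool,
) -> Vec<usize> {
    let replica_count = roles.iter().filter(|r| **r == Role::Replica).count();

    // Every replica is banned: clear the ban list only if the setting allows it.
    if replica_count > 0 && banned.len() == replica_count && unban_replicas_when_all_banned {
        banned.clear();
    }

    roles
        .iter()
        .enumerate()
        .filter(|(i, role)| **role == Role::Primary || !banned.contains(i))
        .map(|(i, _)| i)
        .collect()
}
```

With the flag set to false and both replicas banned, only the primary's index is returned and the ban set is left untouched. With the flag set to true, the ban set is cleared and every server becomes eligible again.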

src/config.rs

+4
@@ -315,6 +315,9 @@ pub struct General {
     #[serde(default = "General::default_ban_time")]
     pub ban_time: i64,
 
+    #[serde(default)] // True
+    pub unban_replicas_when_all_banned: bool,
+
     #[serde(default = "General::default_idle_client_in_transaction_timeout")]
     pub idle_client_in_transaction_timeout: u64,
 
@@ -460,6 +463,7 @@ impl Default for General {
             healthcheck_timeout: Self::default_healthcheck_timeout(),
             healthcheck_delay: Self::default_healthcheck_delay(),
             ban_time: Self::default_ban_time(),
+            unban_replicas_when_all_banned: true,
             idle_client_in_transaction_timeout: Self::default_idle_client_in_transaction_timeout(),
             server_lifetime: Self::default_server_lifetime(),
             server_round_robin: Self::default_server_round_robin(),
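
One detail in this hunk may be worth verifying. A bare field-level `#[serde(default)]` fills a missing field with the field type's `Default::default()`, which is `false` for `bool`. The `// True` comment and the `impl Default for General` hunk both indicate `true`. The standard-library-only sketch below illustrates the two fallback values. `GeneralSketch`, `bare_field_default`, and `default_unban_replicas_when_all_banned` are hypothetical names that are not part of the commit, and whether the difference matters depends on how the real configuration is deserialized.

```rust
/// Hypothetical stand-in for the relevant part of `General`.
#[derive(Debug, PartialEq)]
pub struct GeneralSketch {
    pub ban_time: i64,
    pub unban_replicas_when_all_banned: bool,
}

impl Default for GeneralSketch {
    // Mirrors the `impl Default for General` hunk: the flag defaults to true.
    fn default() -> Self {
        GeneralSketch {
            ban_time: 60,
            unban_replicas_when_all_banned: true,
        }
    }
}

/// The value a bare field-level `#[serde(default)]` falls back to for a `bool`.
pub fn bare_field_default() -> bool {
    bool::default()
}

/// Hypothetical helper in the style of `General::default_ban_time`. Referencing a
/// function like this via `#[serde(default = "...")]` would make a missing key mean `true`.
pub fn default_unban_replicas_when_all_banned() -> bool {
    true
}
```

If the `[general]` section is present but omits the key, the bare attribute would yield the first value rather than the second, so a named default function is one possible way to keep the documented default consistent.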

src/pool.rs

+9 -1
@@ -189,6 +189,9 @@ pub struct PoolSettings {
     // Ban time
     pub ban_time: i64,
 
+    // Should we automatically unban replicas when all are banned?
+    pub unban_replicas_when_all_banned: bool,
+
     // Regex for searching for the sharding key in SQL statements
     pub sharding_key_regex: Option<Regex>,
 
@@ -228,6 +231,7 @@ impl Default for PoolSettings {
             healthcheck_delay: General::default_healthcheck_delay(),
             healthcheck_timeout: General::default_healthcheck_timeout(),
             ban_time: General::default_ban_time(),
+            unban_replicas_when_all_banned: true,
             sharding_key_regex: None,
             shard_id_regex: None,
             regex_search_limit: 1000,
@@ -541,6 +545,9 @@ impl ConnectionPool {
                         healthcheck_delay: config.general.healthcheck_delay,
                         healthcheck_timeout: config.general.healthcheck_timeout,
                         ban_time: config.general.ban_time,
+                        unban_replicas_when_all_banned: config
+                            .general
+                            .unban_replicas_when_all_banned,
                         sharding_key_regex: pool_config
                             .sharding_key_regex
                             .clone()
@@ -946,8 +953,9 @@ impl ConnectionPool {
         let read_guard = self.banlist.read();
         let all_replicas_banned = read_guard[address.shard].len() == replicas_available;
         drop(read_guard);
+        let unban_replicas_when_all_banned = self.settings.clone().unban_replicas_when_all_banned;
 
-        if all_replicas_banned {
+        if all_replicas_banned && unban_replicas_when_all_banned {
             let mut write_guard = self.banlist.write();
             warn!("Unbanning all replicas.");
             write_guard[address.shard].clear();
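
The read-then-write locking pattern in the last hunk can be sketched in isolation. This is a minimal sketch, not the pool's actual code: `PoolSketch` and `maybe_unban_all` are hypothetical, and it uses `std::sync::RwLock` (hence the `unwrap()` calls), whereas the lock type used by `self.banlist` is not shown in the diff. It also reads the flag directly. Because `bool` is `Copy`, cloning the whole settings value as the diff does should not be needed just to read one field.

```rust
use std::collections::HashSet;
use std::sync::RwLock;

/// Hypothetical, minimal stand-in for the pool: one ban set per shard.
pub struct PoolSketch {
    /// Banned replica names, indexed by shard.
    pub banlist: RwLock<Vec<HashSet<String>>>,
    /// Number of replicas configured for each shard.
    pub replicas_per_shard: Vec<usize>,
    pub unban_replicas_when_all_banned: bool,
}

impl PoolSketch {
    /// Clears the shard's ban list if every replica is banned and the
    /// setting allows it. Returns true if the list was cleared.
    pub fn maybe_unban_all(&self, shard: usize) -> bool {
        // Hold the read lock only long enough to compare the counts.
        let all_replicas_banned = {
            let read_guard = self.banlist.read().unwrap();
            read_guard[shard].len() == self.replicas_per_shard[shard]
        };

        // Read the flag directly; no clone of the settings is required.
        if all_replicas_banned && self.unban_replicas_when_all_banned {
            let mut write_guard = self.banlist.write().unwrap();
            write_guard[shard].clear();
            return true;
        }

        false
    }
}
```

As in the diff, the read lock is released before the write lock is taken, so another thread may change the ban list in between. For a clear-all operation that is usually harmless.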

src/query_router.rs

+2
@@ -1464,6 +1464,7 @@ mod test {
             healthcheck_delay: PoolSettings::default().healthcheck_delay,
             healthcheck_timeout: PoolSettings::default().healthcheck_timeout,
             ban_time: PoolSettings::default().ban_time,
+            unban_replicas_when_all_banned: true,
             sharding_key_regex: None,
             shard_id_regex: None,
             default_shard: crate::config::DefaultShard::Shard(0),
@@ -1542,6 +1543,7 @@ mod test {
             healthcheck_delay: PoolSettings::default().healthcheck_delay,
             healthcheck_timeout: PoolSettings::default().healthcheck_timeout,
             ban_time: PoolSettings::default().ban_time,
+            unban_replicas_when_all_banned: true,
             sharding_key_regex: Some(Regex::new(r"/\* sharding_key: (\d+) \*/").unwrap()),
             shard_id_regex: Some(Regex::new(r"/\* shard_id: (\d+) \*/").unwrap()),
             default_shard: crate::config::DefaultShard::Shard(0),
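
Both test hunks had to add the new field to a fully spelled-out struct literal. As a possible alternative, Rust's struct update syntax lets a test override only the fields it cares about and take the rest from `Default`. The sketch below uses a hypothetical, trimmed-down `SettingsSketch` and `test_settings` helper rather than the real `PoolSettings`, so it only illustrates the language feature and is not a change the commit makes.

```rust
/// Hypothetical, trimmed-down stand-in for `PoolSettings`.
#[derive(Debug, Clone, PartialEq)]
pub struct SettingsSketch {
    pub ban_time: i64,
    pub unban_replicas_when_all_banned: bool,
    pub regex_search_limit: usize,
}

impl Default for SettingsSketch {
    fn default() -> Self {
        SettingsSketch {
            ban_time: 60,
            unban_replicas_when_all_banned: true,
            regex_search_limit: 1000,
        }
    }
}

/// Builds test settings, overriding only the field the test cares about.
pub fn test_settings(ban_time: i64) -> SettingsSketch {
    SettingsSketch {
        ban_time,
        // Every other field, including any added later, comes from `Default`.
        ..SettingsSketch::default()
    }
}
```

With this approach, adding a field to the struct only requires updating its `Default` implementation, not every literal in the tests.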
