Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Automatic discovery of Primary and Replica nodes #228

Open
Kobaruon opened this issue Nov 10, 2022 · 12 comments
Open

Automatic discovery of Primary and Replica nodes #228

Kobaruon opened this issue Nov 10, 2022 · 12 comments
Labels
enhancement New feature or request

Comments

@Kobaruon
Copy link

Is your feature request related to a problem? Please describe.
When a primary node failover on PostgreSQL cluster specifically Patroni in my case, pgcat needs to be configured again to correctly route queries.

[pools.simple_db]
pool_mode = "transaction"
default_role = "any"
query_parser_enabled = true
primary_reads_enabled = false
sharding_function = "pg_bigint_hash"

[pools.simple_db.users.0]
username = "dba"
password = "password"
pool_size = 100
statement_timeout = 0

[pools.simple_db.shards.0]
servers = [
    [ "10.85.246.169", 5432, "replica"],
    [ "10.85.241.153", 5432, "primary"], # this node failover and another node becomes primary
    [ "10.85.245.26", 5432, "replica"]
]
database = "simple_db"

When this happens, it bans that node correctly, but it can not detect another node that becomes primary on its own. As a result of this, write queries get errors.

[2022-11-10T12:59:00.125226Z ERROR pgcat::pool] Banning instance Address { id: 1, host: "10.85.241.153", port: 5432, shard: 0, database: "simple_db", role: Primary, replica_number: 1, address_index: 1, username: "dba", pool_name: "simple_db" }, error: TimedOut
[2022-11-10T12:59:00.125272Z ERROR pgcat::pool] Banning Address { id: 1, host: "10.85.241.153", port: 5432, shard: 0, database: "simple_db", role: Primary, replica_number: 1, address_index: 1, username: "dba", pool_name: "simple_db" }
[2022-11-10T12:59:00.125340Z ERROR pgcat::client] Could not get connection from pool: { pool_name: "simple_db", username: "dba", shard: 0, role: "Some(Primary)", error: "AllServersDown" }

Describe the solution you'd like
I think this behaviour can be improved by checking nodes with a query like this to see if they are primary or replica.

postgres=# select pg_is_in_recovery();
 pg_is_in_recovery
-------------------
 t
(1 row)

If this query returns true pgcat can mark that node replica, if it's false then it knows that's a primary node.

Describe alternatives you've considered
Another alternative can be creating a test table when pgcat starts, and it can run INSERT/DELETE/UPDATE on that table to see if it runs successfully.

What do you think? Is this a viable solution to this problem?

@levkk
Copy link
Contributor

levkk commented Nov 10, 2022

I agree, I was thinking of implementing the same thing. Another approach could be to patch Patroni to update PgCat's config when something changes in the cluster.

Generally I think config-driven routing is better than probing nodes for information because configs are changed and verified by some system who's entire job is to make sure configs are correct. Meanwhile, someone can just remove standby.signal by accident from a replica and promote it, and we'll have a split-brain situation or a config validation failure because PgCat does not accept multiple primaries in one shard.

I think overall though, in most situations, this feature is likely to do more good than harm, so we should implement it but put it behind a config, e.g. automatic_role_discovery = true and set it to false by default for the time being until we understand all corner cases better.

@levkk levkk added the enhancement New feature or request label Nov 10, 2022
@tsingson
Copy link

hi,
mark "primary" is write-able with read
, and 'replica' for read only ?

@Kobaruon
Copy link
Author

hi, mark "primary" is write-able with read , and 'replica' for read only ?

Yes, primary is a leader node with both reads and writes. Replica is for read only.

@thedodd
Copy link

thedodd commented Jan 12, 2023

Patroni definitely does a lot of great stuff; however, would it be within scope to just replace Patroni with something like this? The primary mechanism with pgcat seems to be a bit more narrow in scope. It doesn't appear as though the primary can be dynamically changed in this system, instead such would require a manual — or at least an externally driven — reconfig.

Thoughts?

@levkk
Copy link
Contributor

levkk commented Jan 12, 2023

At the moment, that's true, but we can make it dynamic. Patroni does more things to the actual DBs, like reconfiguration after failover, so pgcat can't replace Patroni yet, and might be out of scope at the moment.

There is benefit to having a unified tool that handles Postgres production deployments as a whole, in the future.

@SLoeuillet
Copy link

Got the same issue with pg-auto-failover
select pg_is_in_recovery(); => t on primary
select pg_is_in_recovery(); => f on secondaries

@rim99
Copy link

rim99 commented Aug 27, 2023

Patroni uses confd for proxy config patching, https://github.com/zalando/patroni/tree/master/extras#confd
I think pgcat can maintain a set of configs of its own

@kevinelliott
Copy link
Contributor

Got the same issue with pg-auto-failover
select pg_is_in_recovery(); => t on primary
select pg_is_in_recovery(); => f on secondaries

I'm also using pg-auto-failover, so it would be great to make it easier to mark the new primary automatically, and then at least offer some notification capability.

@brunoprietog
Copy link

pg_autofailover here too

@BhautikChudasama
Copy link

Hi @levkk
Consider I have postgres running on A machine and I have B which is a replica. Now I want to do failover from A to B without downtime or data loss, is it possible to use pgCat or other tools?
Thanks

@AkashRajvanshi
Copy link

can someone found any solution for that issue? I'm getting same issue.

@nik736
Copy link

nik736 commented Aug 19, 2024

This would be a great feature to have. We are not using Patroni but this feature would be perfect for our use case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests