MongoDB replica sets provide high availability through replication and automatic failover. We have a cluster of three nodes: two data-bearing members, "mentos-a" and "mentos-b", plus an arbiter ("mentos-c"). The problem is that every X seconds the PRIMARY steps down and the cluster fails over to the other node.
SOLUTION:
Replica set members detect a downed node through missing heartbeats. Each member sends a heartbeat to the other members every two seconds, and if a heartbeat gets no response within 10 seconds (the default timeout), the other members mark that node as down. As a result, an election commonly starts only about 10 seconds after a node becomes unreachable.
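To make the "about 10 seconds" concrete, here is a simplified timing model built only from the 2-second interval and 10-second timeout described above. It is an illustration under stated assumptions, not MongoDB's internal algorithm: heartbeats are assumed to go out on a fixed 2-second schedule, and the hypothetical function `timeMarkedDown` assumes that any heartbeat sent at or after the failure is never answered.

```javascript
// Simplified model of heartbeat-based failure detection (illustration only,
// not MongoDB's implementation). Assumptions:
//  - heartbeats are sent at t = 0, intervalSecs, 2 * intervalSecs, ...
//  - a heartbeat sent at or after the moment the node fails is never answered
//  - an unanswered heartbeat marks the node down timeoutSecs after it was sent
function timeMarkedDown(failAtSecs, intervalSecs = 2, timeoutSecs = 10) {
  // First heartbeat sent at or after the failure: it will time out.
  const firstUnanswered = Math.ceil(failAtSecs / intervalSecs) * intervalSecs;
  return firstUnanswered + timeoutSecs;
}

// Detection delay = time the node is marked down minus time it actually failed.
for (const failAt of [0, 1, 2, 3]) {
  const downAt = timeMarkedDown(failAt);
  console.log(`failed at ${failAt}s -> marked down at ${downAt}s (delay ${downAt - failAt}s)`);
}
```

Under this model the delay is always at least the timeout and stays below timeout + interval, i.e. between 10 and 12 seconds with the defaults, which is why an election rarely begins sooner than about 10 seconds after a failure.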
The heartbeatTimeoutSecs setting controls how many seconds the replica set members wait for a successful heartbeat from each other. If a member does not respond in time, the other members mark the delinquent member as inaccessible.
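The rule above can be sketched as a tiny function to show the trade-off of raising the timeout. `memberStatus` is a hypothetical helper, and the 15-second stall is a made-up input (think of a brief network hiccup), not a measurement from this cluster.

```javascript
// Hypothetical helper illustrating the heartbeatTimeoutSecs rule:
// a member whose heartbeat reply takes longer than the timeout
// is marked inaccessible by its peers.
function memberStatus(replyDelaySecs, heartbeatTimeoutSecs = 10) {
  return replyDelaySecs > heartbeatTimeoutSecs ? "inaccessible" : "reachable";
}

// A hypothetical transient 15-second stall under two timeout settings.
console.log(memberStatus(15));     // "inaccessible" with the 10 s default
console.log(memberStatus(15, 30)); // "reachable" with a 30 s timeout
```

The flip side is that a genuine outage is also only noticed after the longer timeout, so raising the value trades fewer spurious step-downs for slower failover.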
In the following example we raise heartbeatTimeoutSecs from its default of 10 seconds to 30 seconds (the heartbeats themselves are still sent every two seconds):
    rs0:PRIMARY> cfg = rs.conf();
    {
        "_id" : "rs0",
        "version" : 2,
        "members" : [
            { "_id" : 0, "host" : "mentos-a:27017" },
            { "_id" : 1, "host" : "mentos-b:27017" },
            { "_id" : 2, "host" : "mentos-c:27017", "arbiterOnly" : true }
        ]
    }
    rs0:PRIMARY> cfg["settings"] = { heartbeatTimeoutSecs : 30 }
    { "heartbeatTimeoutSecs" : 30 }
    rs0:PRIMARY> rs.reconfig(cfg);
    { "down" : [ "mentos-a:27017" ], "ok" : 1 }
    rs0:PRIMARY> rs.conf()
    {
        "_id" : "rs0",
        "version" : 3,
        "members" : [
            { "_id" : 0, "host" : "mentos-a:27017" },
            { "_id" : 1, "host" : "mentos-b:27017" },
            { "_id" : 2, "host" : "mentos-c:27017", "arbiterOnly" : true }
        ],
        "settings" : { "heartbeatTimeoutSecs" : 30 }
    }
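Note that `cfg["settings"] = { ... }` replaces the whole settings document. That was harmless in the transcript above because the config had no settings yet, but on a config that already has other settings they would be dropped. A minimal sketch of a safer alternative merges the new field into whatever is already there; `withHeartbeatTimeout` is a hypothetical helper, and the `chainingAllowed` value below is only there to show that existing fields survive.

```javascript
// Sketch: set heartbeatTimeoutSecs without discarding other settings.
// Spreading the existing settings keeps any fields already present,
// and the input config object is not mutated.
function withHeartbeatTimeout(cfg, secs) {
  return {
    ...cfg,
    settings: { ...(cfg.settings || {}), heartbeatTimeoutSecs: secs },
  };
}

// A config shaped like the rs.conf() output above, plus a hypothetical
// existing setting to show that it is preserved.
const cfg = {
  _id: "rs0",
  version: 2,
  members: [
    { _id: 0, host: "mentos-a:27017" },
    { _id: 1, host: "mentos-b:27017" },
    { _id: 2, host: "mentos-c:27017", arbiterOnly: true },
  ],
  settings: { chainingAllowed: true },
};

const newCfg = withHeartbeatTimeout(cfg, 30);
console.log(JSON.stringify(newCfg.settings));
// {"chainingAllowed":true,"heartbeatTimeoutSecs":30}
```

In a shell that supports object spread (such as the Node.js-based mongosh), the same helper could be used as `rs.reconfig(withHeartbeatTimeout(rs.conf(), 30))`; this is a suggestion, not something taken from the original setup.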