We have a two-node cluster. One AWS node froze on us; we could neither start nor stop it. That made the first node unresponsive to management DB actions, and all API requests timed out. We restarted the first node, but many queues are now inaccessible:
=ERROR REPORT==== 1-Apr-2014::02:58:03 ===
connection <0.5559.279>, channel 1 - soft error:
"home node 'rabbit@node2' of durable queue 'celery' in vhost 'vhost1' is down or inaccessible",
We issued rabbitmqctl forget_cluster_node rabbit@node2, as we still couldn't access node2.
Node1 continues to report many "home node of queue is down" errors.
Node2 has now restarted, but can't join the cluster. Is there a way to rejoin the cluster without resetting?
We reset node2 and tried join_cluster again, with the following result:
Clustering node 'rabbit@node2' with 'rabbit@node1' ...
node2# rabbitmqctl cluster_status
Cluster status of node 'rabbit@node2' ...
But after start_app, node2 still doesn't join node1:
node1# rabbitmqctl cluster_status
Cluster status of node 'rabbit@node1' ...
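
For reference, the reset-and-rejoin sequence described above can be sketched roughly as follows, run on node2. This is only a minimal sketch, assuming the standard rabbitmqctl subcommands (stop_app, reset, join_cluster, start_app, cluster_status). The run wrapper, the rejoin_cluster function and the DRY_RUN variable are illustrative names, not part of RabbitMQ or of our setup. By default the script only prints the commands instead of executing them.

```shell
#!/bin/sh
# Sketch of the reset-and-rejoin sequence for node2.
# DRY_RUN=1 (the default here) only prints each command;
# set DRY_RUN=0 to actually execute them on the node.
set -e
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: rejoin the local node to the cluster node given as $1.
rejoin_cluster() {
    run rabbitmqctl stop_app                # stop the RabbitMQ application, keep the Erlang VM running
    run rabbitmqctl reset                   # wipe this node's cluster state (and its local data)
    run rabbitmqctl join_cluster "$1"       # point this node at the surviving cluster member
    run rabbitmqctl start_app               # start the application again
    run rabbitmqctl cluster_status          # check whether both nodes are now listed
}

rejoin_cluster rabbit@node1
```

Running it with DRY_RUN=0 on node2 would execute the same five commands in order, and set -e stops the script at the first one that fails.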