We are currently running a 10-node RabbitMQ cluster on RHEL 6.5 (64-bit). We use auto-configuration for the cluster: the first node has no cluster_nodes entry in rabbitmq.config, and every other node lists only the first node in cluster_nodes. I bring up the first node and then the rest of the nodes. The cluster forms correctly and things seem to work fine.
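To make the setup concrete, the rabbitmq.config on the non-first nodes looks roughly like the sketch below. The node name rabbit@node1 is a placeholder rather than our real hostname, and the tuple form with disc assumes the RabbitMQ 3.x syntax for cluster_nodes (older releases took a plain list of node names).

```erlang
%% rabbitmq.config on nodes 2..10 (sketch; 'rabbit@node1' is a placeholder name)
[
  {rabbit, [
    %% Only the first node is listed; 'disc' is an assumed node type.
    {cluster_nodes, {['rabbit@node1'], disc}}
  ]}
].
```

On the first node the cluster_nodes entry is simply left out.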
However, occasionally when the nodes reboot, startup hangs at "Starting rabbitmq-cluster". It appears to hang indefinitely and never times out. In some cases we have left the system for a couple of hours and it still had not timed out, which suggests a deadlock or something similar. Resetting the hung node sometimes recovers it and sometimes does not.
The strange part is that I cannot reproduce this at will, but it happens nevertheless.
Has anyone seen this behavior?
Is specifying cluster_nodes the way I described the correct way to do so?
I would appreciate any suggestions on how to deal with this issue.