Deploying to Rackspace cloud -- network partitions?
Does anyone on this list have experience running RabbitMQ on Rackspace? If so, how have you dealt with network partitions?
We have a cluster of 3 RabbitMQ nodes hosted in Rackspace. In the last few months we've seen two network partitioning events: there is some kind of network hiccup, and all 3 rabbit nodes end up partitioned from each other. Recovering requires manual intervention to restart rabbit.
We've been experimenting with pause-minority and autoheal ( https://www.rabbitmq.com/partitions.html#automatic-handling ). We've found that with pause-minority, each of the 3 nodes ends up alone in its own single-node partition; each one then thinks it's in the minority, and all 3 nodes stop accepting messages.
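To illustrate why all 3 nodes pause, here is a rough sketch of the minority rule as we understand it from the partitions guide: a node pauses when its partition holds half or fewer of the cluster's nodes. This is only a model of the rule, not RabbitMQ's actual code; the function names and the node names "a", "b", "c" are made up for the example.

```python
def is_minority(partition_size, cluster_size):
    # Assumed rule: a partition is a minority when it contains
    # half or fewer of the total cluster nodes.
    return partition_size * 2 <= cluster_size


def paused_nodes(partitions, cluster_size):
    # partitions: list of lists of node names, one list per partition.
    # Returns the sorted names of nodes that would pause under the rule above.
    return sorted(
        node
        for part in partitions
        if is_minority(len(part), cluster_size)
        for node in part
    )


# Three-way split: every node is alone, so every node pauses.
three_way = paused_nodes([["a"], ["b"], ["c"]], cluster_size=3)

# Two-versus-one split: only the isolated node pauses.
two_one = paused_nodes([["a", "b"], ["c"]], cluster_size=3)
```

Under this model a 2-versus-1 split leaves two nodes running, but the 3-way split we see leaves no partition with a majority, which matches the whole cluster going quiet.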
With autoheal we've hit some bizarre errors. In one test the cluster split into 3 separate partitions, and the nodes would not rejoin the cluster. In a second case two of the nodes became partitioned from each other, and the third node would not start. The error message was: