deploying to rackspace cloud -- network partitions?

Ben Hsu
Hello,

Does anyone on this list have experience running RabbitMQ on Rackspace? If so, how have you dealt with network partitions?

We have a cluster of 3 RabbitMQ nodes hosted on Rackspace. In the last few months we've seen two network partitioning events: there is some kind of network hiccup, and all 3 nodes end up partitioned from each other. Recovering requires manually restarting RabbitMQ.

We've been experimenting with the pause-minority and autoheal modes (https://www.rabbitmq.com/partitions.html#automatic-handling). With pause-minority, we've found that each of the 3 nodes ends up in a partition by itself; each then thinks it is in the minority, and all 3 nodes stop accepting messages.
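
To illustrate why a three-way split stops the whole cluster, here is a minimal sketch of the minority rule as the partitions guide describes it: a node pauses when it can see half or fewer of the cluster's nodes. This is not RabbitMQ's actual implementation; the function names should_pause and surviving_nodes and the node names are hypothetical, chosen only for this example.

```python
from typing import List


def should_pause(reachable_nodes: int, cluster_size: int) -> bool:
    """Return True if a node should pause under a pause-minority style rule.

    `reachable_nodes` counts the node itself plus every peer it can still see.
    Assumption: "minority" means half or fewer of the configured cluster.
    """
    return 2 * reachable_nodes <= cluster_size


def surviving_nodes(partitions: List[List[str]], cluster_size: int) -> List[str]:
    """Return the nodes that keep running after the cluster splits into `partitions`."""
    return [
        node
        for partition in partitions
        for node in partition
        if not should_pause(len(partition), cluster_size)
    ]


# Hypothetical node names. A 2/1 split leaves the majority side running,
# while a 1/1/1 split leaves every node in a minority, so all of them pause.
two_one_split = surviving_nodes([["rabbit1", "rabbit2"], ["rabbit3"]], 3)
three_way_split = surviving_nodes([["rabbit1"], ["rabbit2"], ["rabbit3"]], 3)
```

Under this rule a 3-node cluster only keeps serving if at least two nodes can still see each other, which matches what we observed: when every node is isolated, none of them qualifies as the majority.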

With autoheal we've seen some bizarre errors. In one test the cluster split into 3 separate partitions, and the nodes would not rejoin the cluster. In a second test two of the nodes became partitioned from each other, and the third node would not start. The error message was:

"inet_tcp",{{badmatch,{error,ehostunreach}
