Queue data recovery after master failure

Andrei D.
We have a 12-node cluster with dozens of mirrored queues (2 slaves per queue).
Here's the scenario we're trying to understand how to recover from.

Say we have a complete power failure, and when power is restored one of the nodes is dead.
For at least one queue, that node used to be the master. The queue is now unresponsive, which is somewhat expected (no failover happened before the crash, so we now have 2 slaves and no master). The queue data (the messages) must be physically present on at least one slave (at least one of them is a disc node). However, we seem to have no way to recover the queue and keep that data.
If we bring up a new node to replace the old one (we reset the old one to simulate a fresh node), the queue becomes available but is now empty (we assume this is the result of the now-empty master synchronizing with the slaves, sort of in the "opposite direction" of what we'd like).
Is there a way to either designate the slave that still has the data as master for the troubled queue, or to push that queue data onto the new (resurrected) node?

Thanks in advance!


Re: Queue data recovery after master failure

Simon MacMullen-2
On 18/06/2014 04:43, Andrei wrote:
> Is there a way to either designate the slave that still has the data as
> a master for the troubled queue, or to push that queue data to the new
> (resurrected) node?

I'm afraid not. Really "rabbitmqctl forget_cluster_node" should be able
to cause down slaves to come back as new masters, which would be the
right solution to this. I'm hoping that we'll be able to do that for
3.4.0, but it's a somewhat intrusive change. The bug number for this
branch will be 26191, so you can keep an eye on it in future (currently
there's nothing there).

Cheers, Simon
Re: Queue data recovery after master failure

Andrei D.
Thanks for the quick response Simon.
I assume there's no easy workaround? (Such as manually extracting the queue data from the slave and copying it to the new master before it rejoins the cluster; I'm not familiar with Rabbit's queue data storage format, so I'm not sure whether that's feasible - probably not, since you haven't mentioned it ;) )
PS: couldn't access 26191 in Bugzilla; I assume it's private to contributors?

thanks, andrei
Re: Queue data recovery after master failure

Simon MacMullen-2
On 18/06/14 16:21, Andrei D. wrote:
> Thanks for the quick response Simon.
> I assume there's no easy workaround?

I can't think of an easy one. If I was desperate then I would try the
following, assuming we start from a completely stopped cluster:

0) Back up Mnesia dirs on all machines, obviously.

1) Start a slave node with RABBITMQ_NODE_ONLY set. Make sure it really is
set, or the slave will start the rabbit app, which will clear out the
slave's persistent storage, and you'll be restoring from 0).

2) Run "rabbitmqctl forget_cluster_node --offline <dead-master>"

3) Start the mnesia app on the slave.

4) Update the rabbit_durable_queue records for queues that need
recovering from this slave, moving the slave pid for the appropriate
node() from the 'slave_pids' field to the 'pid' field.

5) Start the rabbit app on the slave.

I think that stands a decent chance of working, but obviously the
usability of such a solution is exceptionally poor. Step 4) in
particular would require some Erlang programming.
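
For the terminally curious, here's roughly what that might look like, run
on the slave's host. This is an untested sketch only: the node name
rabbit@deadnode and the Mnesia path are made up, and the pid / slave_pids
positions in the #amqqueue tuple are assumptions - check rabbit.hrl for
your version before attempting step 4).

  # 0) back up Mnesia dirs on all machines first
  cp -a /var/lib/rabbitmq/mnesia /var/lib/rabbitmq/mnesia.bak

  # 1) start just the Erlang VM on the slave, not the rabbit app
  RABBITMQ_NODE_ONLY=true rabbitmq-server -detached

  # 2) remove the dead master from the cluster metadata
  rabbitmqctl forget_cluster_node --offline rabbit@deadnode

  # 3) start mnesia on the slave
  rabbitmqctl eval 'mnesia:start().'

  # 4) for each durable queue with a slave on this node, promote that
  #    slave pid into the master pid field; PidPos / SlavePidsPos are
  #    ASSUMED field positions - verify them in rabbit.hrl
  rabbitmqctl eval 'begin
      PidPos = 7, SlavePidsPos = 8,  %% assumed, check rabbit.hrl!
      Me = node(),
      ok = mnesia:wait_for_tables([rabbit_durable_queue], 30000),
      {atomic, ok} = mnesia:transaction(fun () ->
          Pat = mnesia:table_info(rabbit_durable_queue, wild_pattern),
          Qs = mnesia:match_object(rabbit_durable_queue, Pat, write),
          [case [P || P <- element(SlavePidsPos, Q), node(P) =:= Me] of
               [MyPid | _] ->
                   Q1 = setelement(PidPos, Q, MyPid),
                   Q2 = setelement(SlavePidsPos, Q1,
                                   element(SlavePidsPos, Q) -- [MyPid]),
                   mnesia:write(rabbit_durable_queue, Q2, write);
               [] ->
                   ok
           end || Q <- Qs],
          ok
      end)
  end.'

  # 5) finally start the rabbit app
  rabbitmqctl start_app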

> (such as manually extracting the queue
> data from the slave and copying it to the new master before it rejoins the
> cluster; I'm not familiar with the rabbit queue data storage format so I'm
> not sure if that's feasible - probably not since you haven't mentioned it ;)

I'm not sure how well that would work; you'd have the problem that you
need not just the individual queue's index files but also the files
containing that queue's messages from the message store. I can't see
that being fun to sort out.

> ps: couldn't access the 26191 in bugzilla, I assume it's private to
> contributors?

Yes. But you can look out for it in future release notes, and as a
branch in hg.

Cheers, Simon

--
Simon MacMullen
RabbitMQ, Pivotal
Re: Queue data recovery after master failure

Andrei D.
makes sense, thanks again
Re: Queue data recovery after master failure

Simon MacMullen-2
On 18/06/14 09:42, Simon MacMullen wrote:
> I'm afraid not. Really "rabbitmqctl forget_cluster_node" should be able
> to cause down slaves to come back as new masters, which would be the
> right solution to this. I'm hoping that we'll be able to do that for
> 3.4.0, but it's a somewhat intrusive change. The bug number for this
> branch will be 26191, so you can keep an eye on it in future (currently
> there's nothing there).

To follow on:

This turns out to be easy if we can assume that there will be a slave to
promote that is also down (quite likely since you tend to encounter this
while your cluster is down), so I've done that as a first pass at the
problem. See:

http://next.rabbitmq.com/ha.html#promotion-while-down

for documentation of how it will work in tonight's nightly build.
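
In short, with everything stopped it should boil down to a single
command (node name made up):

  rabbitmqctl forget_cluster_node --offline rabbit@deadnode

after which a stopped slave gets promoted and comes back as master,
messages intact, when you start the nodes again.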

26191 is still reserved for the more thorny case of how to do this when
the rest of the cluster has come back up - there are a lot of issues
there so it might not happen soon.

Cheers, Simon

--
Simon MacMullen
RabbitMQ, Pivotal
Re: Queue data recovery after master failure

Andrei D.
Great, I think we can make that assumption (slaves down) in the scenario I described.
I'm thinking the recovery procedure would look like this:
1. power up all nodes without starting rabbit; say node X doesn't come up.
2. start rabbit on all the nodes that were not a slave for (any queue on) X.
3. run the new and improved :) forget_cluster_node X -> this should promote some (offline) slave S to master.
4. start rabbit on S (and the rest of the nodes); S should now be master and have all the messages it had when the cluster went down.
Assuming the above works (could you kindly confirm?), what do you think the ETA would be for the next official release that includes the required forget_cluster_node fix? (the one that's already in the nightly build)
Thanks!
Andrei
Re: Queue data recovery after master failure

Simon MacMullen-2
On 15/07/2014 8:34PM, Andrei D. wrote:
> Great, I think we can make that assumption (slaves down) in the scenario I
> described.
> I'm thinking the recovery procedure would look like this:
> 1. power up all nodes without starting rabbit; say node X doesn't come up.
> 2. start rabbit on all the nodes that were not a slave for (any queue on) X
> 3. run the new and improved :) forget_cluster_node X -> this should promote
> some (offline) slave S as master
> 4. start rabbit on S (and the rest of the nodes) which should now be master
> and have all the messages it had when the cluster went down.

Yes, that's correct. Except that you don't need to do 2); the cluster
can be completely down for this to happen.
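
So, with hypothetical node names, the whole thing collapses to
something like:

  # all rabbit nodes stopped; run on one of the surviving nodes
  rabbitmqctl forget_cluster_node --offline rabbit@X

  # then just bring the nodes back up; S comes back as master
  rabbitmq-server -detached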

> Assuming the above should work (could you kindly confirm?), what do you
> think the ETA would be for the next official release that would include the
> required forget_cluster_node fix? (the one that's already in the nightly
> build)

It will be in 3.4.0. We usually make two feature releases per year, in
spring and autumn, so I would guess that would mean September-ish.

Cheers, Simon

--
Simon MacMullen
RabbitMQ, Pivotal