Recover crashing node

Recover crashing node

ranjitiyer
I am using RabbitMQ 3.2.3 and my situation is this:

Machine A
> rabbitmq-server -detached

Machine B
> rabbitmq-server -detached
> rabbitmqctl stop_app
> rabbitmqctl join_cluster rabbit@A
> rabbitmqctl stop (shuts down erl)

Machine A
> rabbitmqctl stop_app

Machine B
> rabbitmq-server -detached (starts erl but crashes after 30 secs because A isn't running)

> rabbitmqctl status (hangs)
> rabbitmqctl cluster_status (hangs)
> rabbitmqctl reset (errors)
> rabbitmqctl forget_cluster (errors)

How do I fix B?

Re: Recover crashing node

Michael Klishin-2


On 14 May 2014 at 21:32:26, ranjitiyer ([hidden email]) wrote:
> > rabbitmqctl status (hangs)
> > rabbitmqctl cluster_status (hangs)
> > rabbitmqctl reset (errors)
> > rabbitmqctl forget_cluster (errors)
>
> How do I fix B?

See Breaking up a Cluster:
http://www.rabbitmq.com/clustering.html

but if you are willing to reset a node, you can simply wipe out
its database directory. 
--  
MK  

Software Engineer, Pivotal/RabbitMQ
_______________________________________________
rabbitmq-discuss mailing list
[hidden email]
https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss
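
A minimal sketch of the "wipe out its database directory" option, done by moving the directory aside rather than deleting it. It assumes the node is fully stopped and that the data lives in one directory per node under the Mnesia base directory (for example /var/lib/rabbitmq/mnesia/rabbit@B). The function name is hypothetical and not part of RabbitMQ.

```shell
#!/bin/sh
# Hypothetical helper (not part of RabbitMQ): move a stopped node's database
# directory aside so the node starts with a blank database on next boot.
# Assumes one directory per node under the Mnesia base directory.
move_node_dir_aside() {
    base="$1"     # e.g. /var/lib/rabbitmq/mnesia
    node="$2"     # e.g. rabbit@B
    suffix="$3"   # label for the backup copy, e.g. a date
    dir="$base/$node"
    if [ ! -d "$dir" ]; then
        echo "no such directory: $dir" >&2
        return 1
    fi
    # Keep the old data under a new name instead of deleting it outright.
    mv "$dir" "$dir.bak-$suffix" && echo "$dir.bak-$suffix"
}
```

Usage might look like `move_node_dir_aside /var/lib/rabbitmq/mnesia rabbit@B 2014-05-14` followed by `rabbitmq-server -detached`. The node then comes up on its own, with no cluster membership and none of its previous queues or messages.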

Re: Recover crashing node

ranjitiyer
Appreciate your quick response. The documentation assumes erl is running. I have trouble starting erl and keeping it running when the master (A) isn't available. Note that in our scenario someone could physically turn off Machine B, so when B comes back up I first need to run "rabbitmq-server -detached", and I am seeing that erl crashes if A isn't available. So I don't have an opportunity to fix the cluster using rabbitmqctl commands.

I deleted the database folders under /var/lib/rabbitmq/mnesia but erl crashes right away when I start rabbit (rabbitmq-server -detached). I am certain I am doing something wrong here. Please advise.

Thanks,
Ranjit

Re: Recover crashing node

ranjitiyer
Ok, deleting the Mnesia database worked and I can start the node. Is it possible to pull out the persisted messages from the database?

Back to my scenario. A had some persisted messages when it went down. When B doesn't start because A is down I'll fix up B to start on its own, but I want A's persisted messages brought into B. (Assume A's mnesia database is on a network file share that B has access to). Is this possible? This is really important for our async job processing solution to work on RabbitMQ. We don't want to lose jobs persisted on A when A went down.


Re: Recover crashing node

Michael Klishin-2
On 14 May 2014 at 22:28:40, ranjitiyer ([hidden email]) wrote:

> A had some persisted messages when it went down. When B doesn't start
> because A is down I'll fix up B to start on its own, but I want A's
> persisted messages brought into B. (Assume A's mnesia database is on a
> network file share that B has access to). Is this possible? This is really
> important for our async job processing solution to work on RabbitMQ. We
> don't want to lose jobs persisted on A when A went down.

Everything on B is gone because you deleted its database. A's database
is just fine, so if you cluster B with it, mirrored
queues can be synchronised. See http://www.rabbitmq.com/ha.html
--  
MK  

Software Engineer, Pivotal/RabbitMQ
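
A sketch of the re-clustering step, using the same rabbitmqctl commands as the first post plus start_app. It assumes A is running again and reachable as rabbit@A. The wrapper function and the RABBITMQCTL override are hypothetical conveniences for dry runs, not RabbitMQ features. Mirroring only protects messages in queues that were already mirrored before a node went down.

```shell
#!/bin/sh
# Hypothetical wrapper: re-cluster a freshly reset node with a running peer.
# RABBITMQCTL is an assumed override so the sequence can be dry-run.
rejoin_cluster() {
    ctl="${RABBITMQCTL:-rabbitmqctl}"
    peer="$1"    # e.g. rabbit@A, which must be up and reachable
    # Stop the RabbitMQ application (the Erlang node keeps running),
    # join the peer's cluster, then start the application again.
    "$ctl" stop_app &&
    "$ctl" join_cluster "$peer" &&
    "$ctl" start_app
}
```

On B this would be run as `rejoin_cluster rabbit@A`. Each step runs only if the previous one succeeded, so a failed join_cluster leaves the application stopped for inspection.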

Re: Recover crashing node

ranjitiyer
I am imagining a situation in which A never comes up. Assuming B has access to A's DB, can it import the persisted queues into B's DB? I know this would be getting into Erlang/Mnesia space.

Ranjit

Re: Recover crashing node

Michael Klishin-2


On 14 May 2014 at 23:04:35, ranjitiyer ([hidden email]) wrote:
> I am imagining a situation in which A never comes up. Assuming B has access
> to A's DB, can it import the persisted queues into B's DB? I know this would
> be getting into Erlang/Mnesia space.

B has no access to A's message store. Messages and their positions are not stored in Mnesia.
--  
MK  

Software Engineer, Pivotal/RabbitMQ

Re: Recover crashing node

ranjitiyer
MK, I'd configure the RABBITMQ_MNESIA_BASE to be a network share so B has access to A's DB.

If A went down last but never came up (say the machine was permanently taken down), I don't want to lose those messages. So when B comes up (by resetting itself because I can't reach A) I am looking for a way by which B can import A's messages into its own database.

    \\share
        A
            mnesia
                rabbit@A
                    rabbit_durable_queue.DCD

        B
            mnesia
                rabbit@B
                    rabbit_durable_queue.DCD


Re: Recover crashing node

Matthias Radestock-3
On 14/05/14 21:20, ranjitiyer wrote:
> MK, I'd configure the RABBITMQ_MNESIA_BASE to be a network share so B has
> access to A's DB.
>
> If A went down last but never came up (say the machine was permanently taken
> down), I don't want to lose those messages. So when B comes up (by resetting
> itself because I can't reach A) I am looking for a way by which B can import
> A's messages into its own database.

Do you want to *add* A's data to B or *replace* B's data with A's? There
is no (easy) way to do the former. For the latter, change B's hostname
to A and then just use A's data instead of (what used to be) B's.

Matthias.
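
A sketch of the *replace* option, assuming A is permanently gone and its directory on the share is laid out as in the diagram quoted above. Rather than pointing the broker at the share, it copies A's node directory to local disk and refuses to overwrite anything already there. The function name and the example paths are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: copy another node's database directory into the local
# Mnesia base directory, refusing to overwrite existing data.
adopt_node_dir() {
    src_base="$1"   # e.g. a local mount of \\share\A\mnesia
    dst_base="$2"   # e.g. /var/lib/rabbitmq/mnesia
    node="$3"       # e.g. rabbit@A
    if [ ! -d "$src_base/$node" ]; then
        echo "missing source: $src_base/$node" >&2
        return 1
    fi
    if [ -e "$dst_base/$node" ]; then
        echo "refusing to overwrite: $dst_base/$node" >&2
        return 1
    fi
    mkdir -p "$dst_base" &&
    # -R copies recursively, -p preserves modes and timestamps.
    cp -Rp "$src_base/$node" "$dst_base/$node"
}
```

This only places the data. The node picks it up at its next start once the machine's hostname has been changed to A as suggested, so that it identifies as rabbit@A. The copied files also need to be readable and writable by the user the broker runs as.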