Rabbitmq boot failure with "tables_not_present"

Zhao, Shanyu

Hi,

 

We have two RabbitMQ servers that form a cluster. It mostly runs great, but sometimes after a redeploy we see boot failure errors in the RabbitMQ server log.

The relevant part of the log is shown below. The problem is that these log messages repeat every 7-8 seconds, and this can last as long as 80 minutes before Rabbit finally starts up correctly. During this time any connection to the RabbitMQ cluster gets a disconnected exception.

Any idea what might have caused this problem?

 

=INFO REPORT==== 16-Jan-2013::14:11:36 ===
Starting RabbitMQ 3.0.1 on Erlang R14B04

=INFO REPORT==== 16-Jan-2013::14:11:37 ===
Limiting to approx 924 file handles (829 sockets)

=INFO REPORT==== 16-Jan-2013::14:11:37 ===
Error description:
   {case_clause,{error,tables_not_present}}

Log files (may contain more information):
   /var/log/rabbitmq/[hidden email]
   /var/log/rabbitmq/[hidden email]

Stack trace:
   [{rabbit_mnesia,discover_cluster,1},
    {rabbit_mnesia,init_from_config,0},
    {rabbit_mnesia,init,0},
    {rabbit,'-run_boot_step/1-lc$^1/1-1-',1},
    {rabbit,run_boot_step,1},
    {rabbit,'-start/2-lc$^0/1-0-',1},
    {rabbit,start,2},
    {application_master,start_it_old,4}]

=INFO REPORT==== 16-Jan-2013::14:11:38 ===
    application: rabbit
    exited: {bad_return,
                {{rabbit,start,[normal,[]]},
                 {'EXIT',
                     {rabbit,failure_during_boot,
                         {case_clause,{error,tables_not_present}}}}}}
    type: temporary

 

 

I really appreciate any insight into this problem!

 

Thanks,

Shanyu

 

_______________________________________________
rabbitmq-discuss mailing list
[hidden email]
https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss
Re: Rabbitmq boot failure with "tables_not_present"

Jerry Kuch
On Wed, Jan 16, 2013 at 11:54 AM, Zhao, Shanyu <[hidden email]> wrote:
> The relevant part of the log is shown below. But the problem is that we saw these log messages repeated every 7-8 seconds and can last as long as 80 minutes before rabbit finally start up correctly. During this time any connection to the rabbitmq cluster will get a disconnected exception.
>
> Any idea on what might have caused this problem?
>
> =INFO REPORT==== 16-Jan-2013::14:11:37 ===
> Error description:
>    {case_clause,{error,tables_not_present}}
>
> Log files (may contain more information):
>    /var/log/rabbitmq/[hidden email]
>    /var/log/rabbitmq/[hidden email]
>
> Stack trace:
>    [{rabbit_mnesia,discover_cluster,1},
>     {rabbit_mnesia,init_from_config,0},
>     {rabbit_mnesia,init,0},
>     {rabbit,'-run_boot_step/1-lc$^1/1-1-',1},
>     {rabbit,run_boot_step,1},
>     {rabbit,'-start/2-lc$^0/1-0-',1},
>     {rabbit,start,2},
>     {application_master,start_it_old,4}]
>
> =INFO REPORT==== 16-Jan-2013::14:11:38 ===
>     application: rabbit
>     exited: {bad_return,
>                 {{rabbit,start,[normal,[]]},
>                  {'EXIT',
>                      {rabbit,failure_during_boot,
>                          {case_clause,{error,tables_not_present}}}}}}
>     type: temporary


You mention that you sometimes see this after a redeploy. Depending on how you've redeployed, have you successfully clustered the nodes in the first place? The error means that some of the tables in Erlang's Mnesia distributed database, which Rabbit relies on to maintain broker metadata, weren't found, suggesting that some prior state or configuration perished during your redeploy process.

Best regards,
Jerry



Re: Rabbitmq boot failure with "tables_not_present"

Zhao, Shanyu

Hi Jerry,

> From: [hidden email] [mailto:[hidden email]] On Behalf Of Jerry Kuch
> Sent: Wednesday, January 16, 2013 4:02 PM
> To: Discussions about RabbitMQ
> Subject: Re: [rabbitmq-discuss] Rabbitmq boot failure with "tables_not_present"
>
> You mention that you sometime see this after a redeploy.  Depending on how you've redeployed, have you successfully clustered the nodes in the first place?  The error means that some of the tables in Erlang's Mnesia distributed database upon which Rabbit relies to maintain broker metadata weren't found, suggesting that some prior state or configuration perished during your redeploy process.

 

I think that while the error logs are being generated, the cluster may not have formed successfully. As part of the deployment scripts, I delete everything in /var/lib/rabbitmq/mnesia to recover from scenarios in which the cluster cannot be formed. Here is the relevant part of the deployment scripts:

sudo("bash -c 'echo XXXXXXXXXXXXXXXX > /var/lib/rabbitmq/.erlang.cookie'")
sudo("chown rabbitmq /var/lib/rabbitmq/.erlang.cookie")
sudo("chmod 600 /var/lib/rabbitmq/.erlang.cookie")
sudo("rm -fr /var/lib/rabbitmq/mnesia")

What I want to achieve after redeployment is to erase the previous state completely and let the cluster start from a clean state; that's why I erase the mnesia folder (is there a better way to do that?). The problem is that sometimes the error messages show up for a few minutes and then everything works fine, but other times the error message is logged for 80 minutes before the cluster works correctly. Do you have any suggestions?

 

Thanks,

Shanyu

 

 



Re: Rabbitmq boot failure with "tables_not_present"

Jerry Kuch

> I think during the time the error logs are generated, the cluster may not be successfully formed. As part of the deployment scripts, I deleted all content in /var/lib/rabbitmq/mnesia to recover from some scenario when cluster cannot be formed. Here is the relevant part of the deployment scripts:
>
> sudo("bash -c 'echo XXXXXXXXXXXXXXXX > /var/lib/rabbitmq/.erlang.cookie'")
> sudo("chown rabbitmq /var/lib/rabbitmq/.erlang.cookie")
> sudo("chmod 600 /var/lib/rabbitmq/.erlang.cookie")
> sudo("rm -fr /var/lib/rabbitmq/mnesia")


That is indeed a fine way to get rid of your Mnesia contents including clustering info and any metadata that needs to be shared amongst the nodes (queue, exchange, binding, user, vhost, etc. definitions).

On the other hand, after you've done it, you've got no really good reason to expect your nodes to act as clustered.
 

> What I want to achieve after redeployment is to erase previous states completely and let the cluster starts with a clean state, that’s why I erased the /mnesia folder (is there a better way to do that?). The problem is sometimes the error messages show up for a few minutes then everything works fine after that, but other times I saw the error message being logged for 80 minutes before the cluster works correctly. Do you have any suggestions?


Are you establishing your clusters using the rabbitmq command line tools or by statically encoding their properties in your rabbitmq.config files?  You're going to have to repeat whichever you did when you bring a newly redeployed cluster, having gone through the cleansing you outline above, back online.

You might consider setting up scripts to execute the appropriate commands, as per our clustering guide, on the appropriate nodes after you've done the scripted clean-up you describe.
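
A rough sketch of what such a script could issue, written in the same Python style as the deployment snippet quoted above. It only builds the command strings. The helper name `join_cluster_commands` is made up for illustration, and feeding each string to the deployment tool's `sudo(...)` call is an assumption about that tool; `stop_app`, `join_cluster` and `start_app` are the `rabbitmqctl` subcommands covered by the clustering guide for the 3.x series.

```python
def join_cluster_commands(seed_node):
    """Return the rabbitmqctl commands that make the local node join seed_node's cluster.

    seed_node is the Erlang node name of a broker that is already running,
    for example 'rabbit@ip-10-0-2-97'. (Hypothetical helper, not part of RabbitMQ.)
    """
    return [
        "rabbitmqctl stop_app",                     # stop the broker, keep the Erlang VM running
        "rabbitmqctl join_cluster %s" % seed_node,  # point this node at the running seed node
        "rabbitmqctl start_app",                    # start the broker again as a cluster member
    ]


# Print the commands that would be run on the second node.
for command in join_cluster_commands("rabbit@ip-10-0-2-97"):
    print(command)
```

The commands would be run on every node except the seed node, and only once the seed node itself is up.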

Best regards,
Jerry



Re: Rabbitmq boot failure with "tables_not_present"

Simon MacMullen-2
On 17/01/13 00:45, Jerry Kuch wrote:
> That is indeed a fine way to get rid of your Mnesia contents including
> clustering info and any metadata that needs to be shared amongst the
> nodes (queue, exchange, binding, user, vhost, etc. definitions).

Yes. You could also use 'rabbitmqctl reset'.
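
For illustration, a minimal sketch of the command sequence around a reset, again in Python to match the deployment snippet. `rabbitmqctl reset` only works while the RabbitMQ application is stopped, hence the surrounding `stop_app` and `start_app`. The helper name `reset_commands` is hypothetical, and the `force` switch (which selects `rabbitmqctl force_reset`) is an addition not discussed in this thread.

```python
def reset_commands(force=False):
    """Return the rabbitmqctl commands that wipe a node's state and restart it.

    force=True selects 'force_reset', which resets the node without
    consulting the rest of the cluster. (Hypothetical helper.)
    """
    reset = "force_reset" if force else "reset"
    return [
        "rabbitmqctl stop_app",     # reset is refused while the broker is running
        "rabbitmqctl %s" % reset,   # remove cluster membership and all broker metadata
        "rabbitmqctl start_app",    # boot again from a clean state
    ]


for command in reset_commands():
    print(command)
```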

> On the other hand, after you've done it, you've got no really good
> reason to expect your nodes to act as clustered.

...unless you have set cluster_nodes in the configuration file.

> Are you establishing your clusters using the rabbitmq command line tools
> or by statically encoding their properties in your rabbitmq.config
> files?

The stack trace in the OP shows the latter.

Cheers, Simon


--
Simon MacMullen
RabbitMQ, VMware

Re: Rabbitmq boot failure with "tables_not_present"

Simon MacMullen-2
In reply to this post by Zhao, Shanyu
On 16/01/13 19:54, Zhao, Shanyu wrote:
> We have two rabbitmq servers to form a cluster. It mostly runs great.
> But sometimes after redeploy, we saw some boot failure error in rabbitmq
> server log.

Hi. I've reproduced the problem. In order for this to occur:

* You must be using the cluster_nodes configuration parameter
* You must stop all nodes and then fully reset them
* You must then start all nodes simultaneously

I suspect it's the last condition which is causing this to be intermittent
for you. I'll file a bug to get this fixed, but in the meantime if you
could stagger the start of your cluster nodes (so that one node starts
on its own, and then others can start simultaneously or apart) that will
act as a workaround.
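
The staggered start could be scripted roughly as follows. This is a sketch only: `staggered_start`, `start_node` and `is_running` are hypothetical names, and the two callables stand in for whatever the deployment tool uses to start a broker and to check that it is up.

```python
import time


def staggered_start(nodes, start_node, is_running, poll_interval=1.0, timeout=60.0):
    """Start nodes[0] on its own, wait until it is running, then start the rest.

    start_node(node) and is_running(node) are caller-supplied callables
    (hypothetical hooks into the deployment tool).
    Returns the order in which the nodes were started.
    """
    if not nodes:
        return []
    first, rest = nodes[0], list(nodes[1:])
    start_node(first)
    deadline = time.monotonic() + timeout
    while not is_running(first):
        if time.monotonic() >= deadline:
            raise RuntimeError("%s did not come up within %s seconds" % (first, timeout))
        time.sleep(poll_interval)
    for node in rest:
        start_node(node)  # the remaining nodes may start together or apart
    return [first] + rest
```

With the two nodes from this thread, the call would look like `staggered_start(['rabbit@ip-10-0-2-97', 'rabbit@ip-10-0-2-106'], start_node, is_running)`. One possible `is_running`, stated as an assumption, is to run `rabbitmqctl status` on the node and treat a zero exit status as "up".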

Cheers, Simon

--
Simon MacMullen
RabbitMQ, VMware

Re: Rabbitmq boot failure with "tables_not_present"

Zhao, Shanyu
In reply to this post by Jerry Kuch

Hi Jerry,

> Are you establishing your clusters using the rabbitmq command line tools or by statically encoding their properties in your rabbitmq.config files?  You're going to have to repeat whichever you did when you bring a newly redeployed cluster, having gone through the cleansing you outline above, back online.
>
> You might consider setting up scripts to execute the appropriate commands, as per our clustering guide, on the appropriate nodes after you've done the scripted clean-up you describe.

 

Oh, I used rabbitmq.conf to configure clustering, as Simon pointed out in another email. Here is what it looks like:

[
{rabbit,
  [
    {tcp_listeners, [5672]},
    {cluster_nodes, {['rabbit@ip-10-0-2-97', 'rabbit@ip-10-0-2-106'], disc}}
  ]
}
].

I have the same config file shown above on the two RabbitMQ servers, 10.0.2.97 and 10.0.2.106.

Do you have any suggestions as to what might have gone wrong? This configuration works fine about 80% of the time, when the "tables_not_present" error only shows up for a few minutes. About 20% of the time, this error appears in the log file for as long as several hours, but in the end the cluster is successfully established. Is this normal behavior?

 

Thanks,

Shanyu

 



Re: Rabbitmq boot failure with "tables_not_present"

Zhao, Shanyu
In reply to this post by Simon MacMullen-2
Simon,

>On 17/01/13 00:45, Jerry Kuch wrote:
>> That is indeed a fine way to get rid of your Mnesia contents including
>> clustering info and any metadata that needs to be shared amongst the
>> nodes (queue, exchange, binding, user, vhost, etc. definitions).
>
>Yes. You could also use 'rabbitmqctl reset'.

I think I used this before. But in rare situations when the clustering config gets messed up (when the two RabbitMQ servers appear to be two standalone servers), doing a 'rabbitmqctl reset' couldn't fix the problem.


Thanks,
Shanyu

Re: Rabbitmq boot failure with "tables_not_present"

Zhao, Shanyu
In reply to this post by Simon MacMullen-2
Hi Simon,

>-----Original Message-----
>From: Simon MacMullen [mailto:[hidden email]]
>Sent: Thursday, January 17, 2013 2:35 AM
>To: Discussions about RabbitMQ
>Cc: Zhao, Shanyu
>Subject: Re: [rabbitmq-discuss] Rabbitmq boot failure with
>"tables_not_present"
>
>On 16/01/13 19:54, Zhao, Shanyu wrote:
>> We have two rabbitmq servers to form a cluster. It mostly runs great.
>> But sometimes after redeploy, we saw some boot failure error in
>> rabbitmq server log.
>
>Hi. I've reproduced the problem. In order for this to occur:
>
>* You must be using the cluster_nodes configuration parameter
>* You must stop all nodes and then fully reset them
>* You must then start all nodes simultaneously
>
>I suspect it's the latter bit which is causing this to be intermittent for
>you. I'll file a bug to get this fixed, but in the mean time if you could
>stagger the start of your cluster nodes (so that one node starts on its own,
>and then others can start simultaneously or apart) that will act as a
>workaround.

Thanks a lot for your help! Yes, our deployment scripts simultaneously deploy the rabbitmq servers for efficiency. I'll try the workaround you mentioned here.

Shanyu


Re: Rabbitmq boot failure with "tables_not_present"

Matthias Radestock-3
In reply to this post by Zhao, Shanyu
On 17/01/13 17:35, Zhao, Shanyu wrote:
> Do you have any suggestions that what might have gone wrong?

See Simon's reply
(http://rabbitmq.1065348.n5.nabble.com/Rabbitmq-boot-failure-with-tables-not-present-tp24494p24512.html)

Matthias.

Re: Rabbitmq boot failure with "tables_not_present"

Zhao, Shanyu
In reply to this post by Zhao, Shanyu
Hi Simon,

>>On 16/01/13 19:54, Zhao, Shanyu wrote:
>>> We have two rabbitmq servers to form a cluster. It mostly runs great.
>>> But sometimes after redeploy, we saw some boot failure error in
>>> rabbitmq server log.
>>
>>Hi. I've reproduced the problem. In order for this to occur:
>>
>>* You must be using the cluster_nodes configuration parameter
>>* You must stop all nodes and then fully reset them
>>* You must then start all nodes simultaneously
>>
>>I suspect it's the latter bit which is causing this to be intermittent
>>for you. I'll file a bug to get this fixed, but in the mean time if you
>>could stagger the start of your cluster nodes (so that one node starts
>>on its own, and then others can start simultaneously or apart) that
>>will act as a workaround.
>
>Thanks a lot for your help! Yes, our deployment scripts simultaneously
>deploy the rabbitmq servers for efficiency. I'll try the workaround you
>mentioned here.

We've tried your workaround and it works great! Thanks a lot!

Shanyu

Re: Rabbitmq boot failure with "tables_not_present"

waseemtaj
In reply to this post by Simon MacMullen-2
Hi Simon


> I'll file a bug to get this fixed, but in the mean time if you
> could stagger the start of your cluster nodes (so that one node starts
> on its own, and then others can start simultaneously or apart) that will
> act as a workaround.

Has this issue been fixed, and if so, in which version? The release notes for 3.0.2 (http://www.rabbitmq.com/release-notes/README-3.0.2.txt) mention this bug fix:

25420 fix issue causing crash at startup if another node reports Mnesia
      starting or stopping

Is this the one?

Thanks in advance.

Waseem Taj

Re: Rabbitmq boot failure with "tables_not_present"

Simon MacMullen-2
On 30/08/2013 12:41AM, waseemtaj wrote:
> Has this issue been fixed and if so which version was this fixed in? Thanks
> in advance.

I'm afraid it hasn't been fixed yet.

Cheers, Simon

--
Simon MacMullen
RabbitMQ, Pivotal

Re: Rabbitmq boot failure with "tables_not_present"

srinath
I faced this issue using RabbitMQ 3.2.2.
If you have a process for up-voting an issue, please count my vote.
I would also be interested to know when this is likely to be fixed.

By the way, what is the bug ID?