Quantcast

Active-active crash report

classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

Active-active crash report

vchekan
Hi all,

I'm testing my active-active setup (2.8.1, linux 64) and I am randomly running into some crashes when I'm stopping a node. I can stop one node abut another one fails along with it. Below is a crash log.

Is "rabbitmq_management" the UI plugin? Could it bring down the whone node, because I keep UI open in browser while doing my testing.

Vadim.

=INFO REPORT==== 26-Apr-2012::12:15:58 ===
Adding mirror of queue 'test_33' in vhost '/' on node 'rabbit@rabbitmq-dev-2': <3066.4773.0>

=INFO REPORT==== 26-Apr-2012::12:15:58 ===
Adding mirror of queue 'test_34' in vhost '/' on node 'rabbit@rabbitmq-dev-2': <3066.4777.0>

=INFO REPORT==== 26-Apr-2012::12:15:59 ===
Adding mirror of queue 'test_35' in vhost '/' on node 'rabbit@rabbitmq-dev-2': <3066.4782.0>

=INFO REPORT==== 26-Apr-2012::12:15:59 ===
Adding mirror of queue 'test_36' in vhost '/' on node 'rabbit@rabbitmq-dev-2': <3066.4786.0>

=INFO REPORT==== 26-Apr-2012::12:15:59 ===
Adding mirror of queue 'test_37' in vhost '/' on node 'rabbit@rabbitmq-dev-2': <3066.4790.0>

=INFO REPORT==== 26-Apr-2012::12:15:59 ===
Adding mirror of queue 'test_38' in vhost '/' on node 'rabbit@rabbitmq-dev-2': <3066.4795.0>

=INFO REPORT==== 26-Apr-2012::12:15:59 ===
Adding mirror of queue 'test_39' in vhost '/' on node 'rabbit@rabbitmq-dev-2': <3066.4799.0>

=ERROR REPORT==== 26-Apr-2012::12:15:59 ===
Discarding message {'$gen_call',{<0.1955.0>,#Ref<0.0.0.5513>},{add_on_right,{9,<0.1955.0>}}} from <0.1955.0> to <0.26823.834>
 in an old incarnation (2) of this node (3)


=INFO REPORT==== 26-Apr-2012::12:15:59 ===
Adding mirror of queue 'test_40' in vhost '/' on node 'rabbit@rabbitmq-dev-2': <3066.4803.0>

=INFO REPORT==== 26-Apr-2012::12:15:59 ===
Adding mirror of queue 'test_41' in vhost '/' on node 'rabbit@rabbitmq-dev-2': <3066.4807.0>

=ERROR REPORT==== 26-Apr-2012::12:15:59 ===
Discarding message {'$gen_call',{<0.1983.0>,#Ref<0.0.0.5648>},{add_on_right,{9,<0.1983.0>}}} from <0.1983.0> to <0.26850.834>
 in an old incarnation (2) of this node (3)


=INFO REPORT==== 26-Apr-2012::12:15:59 ===
Adding mirror of queue 'test_42' in vhost '/' on node 'rabbit@rabbitmq-dev-2': <3066.4811.0>

=INFO REPORT==== 26-Apr-2012::12:15:59 ===
Adding mirror of queue 'test_43' in vhost '/' on node 'rabbit@rabbitmq-dev-2': <3066.4816.0>

=INFO REPORT==== 26-Apr-2012::12:16:39 ===
rabbit on node 'rabbit@rabbitmq-dev-3' up

=WARNING REPORT==== 26-Apr-2012::12:18:15 ===
closing AMQP connection <0.1433.0> (172.16.51.100:16360 -> 10.7.1.21:5672):
connection_closed_abruptly

=INFO REPORT==== 26-Apr-2012::12:23:57 ===
    application: rabbitmq_management
    exited: shutdown
    type: permanent

=INFO REPORT==== 26-Apr-2012::12:23:57 ===
stopped TCP Listener on 0.0.0.0:5672

=ERROR REPORT==== 26-Apr-2012::12:23:57 ===
** Generic server <0.1800.0> terminating
** Last message in was {'$gen_cast',{gm_deaths,[<0.4684.0>]}}
** When Server state == {state,
                            {amqqueue,
                                {resource,<<"/">>,queue,<<"test_29">>},
                                true,false,<0.1433.0>,
                                [{<<"x-ha-policy">>,longstr,<<"all">>},
                                 {<<"x-message-ttl">>,signedint,600000}],
                                <0.1799.0>,[],all},
                            <0.1801.0>,
                            {dict,0,16,16,8,80,48,
                                {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                                 []},
                                {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                                  [],[]}}},
                            #Fun<rabbit_mirror_queue_master.1.2951048>,
                            #Fun<rabbit_mirror_queue_master.2.72654940>}
** Reason for termination ==
** {{case_clause,{ok,<3066.9234.0>,[<0.4683.0>]}},
    [{rabbit_mirror_queue_coordinator,handle_cast,2},
     {gen_server2,handle_msg,2},
     {proc_lib,wake_up,3}]}

=ERROR REPORT==== 26-Apr-2012::12:23:57 ===
** Generic server <0.1714.0> terminating
** Last message in was {'$gen_cast',{gm_deaths,[<0.4653.0>]}}
** When Server state == {state,
                            {amqqueue,
                                {resource,<<"/">>,queue,<<"test_23">>},
                                true,false,<0.1433.0>,
                                [{<<"x-ha-policy">>,longstr,<<"all">>},
                                 {<<"x-message-ttl">>,signedint,600000}],
                                <0.1712.0>,[],all},
                            <0.1715.0>,
                            {dict,0,16,16,8,80,48,
                                {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                                 []},
                                {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                                  [],[]}}},
                            #Fun<rabbit_mirror_queue_master.1.2951048>,
                            #Fun<rabbit_mirror_queue_master.2.72654940>}
** Reason for termination ==
** {{case_clause,{ok,<3066.9155.0>,[<0.4652.0>]}},
    [{rabbit_mirror_queue_coordinator,handle_cast,2},
     {gen_server2,handle_msg,2},
     {proc_lib,wake_up,3}]}

=ERROR REPORT==== 26-Apr-2012::12:23:57 ===
** Generic server <0.1531.0> terminating
** Last message in was {'$gen_cast',{gm_deaths,[<0.4587.0>]}}
** When Server state == {state,
                            {amqqueue,
                                {resource,<<"/">>,queue,<<"test_10">>},
                                true,false,<0.1433.0>,
                                [{<<"x-ha-policy">>,longstr,<<"all">>},
                                 {<<"x-message-ttl">>,signedint,600000}],
                                <0.1530.0>,[],all},
                            <0.1532.0>,
                            {dict,0,16,16,8,80,48,
                                {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                                 []},
                                {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                                  [],[]}}},
                            #Fun<rabbit_mirror_queue_master.1.2951048>,
                            #Fun<rabbit_mirror_queue_master.2.72654940>}
** Reason for termination ==
** {noproc,{gen_server2,call,[worker_pool,next_free,infinity]}}
(END)



_______________________________________________
rabbitmq-discuss mailing list
[hidden email]
https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

Re: Active-active crash report

Matthew Sackman-3
Hi Vadim,

On Thu, Apr 26, 2012 at 01:01:20PM -0700, Vadim Chekan wrote:
> I'm testing my active-active setup (2.8.1, linux 64) and I am randomly
> running into some crashes when I'm stopping a node. I can stop one node
> abut another one fails along with it. Below is a crash log.
>
> =ERROR REPORT==== 26-Apr-2012::12:15:59 ===
> Discarding message
> {'$gen_call',{<0.1955.0>,#Ref<0.0.0.5513>},{add_on_right,{9,<0.1955.0>}}}
> from <0.1955.0> to <0.26823.834>
>  in an old incarnation (2) of this node (3)

I'm worried about these messages. Someone else on this list has seen
this sort of thing too and it's causing them trouble. I've not seen this
issue myself in testing which is frustrating. However, that's not the
cause of your crash in this case (I think).

> ** Generic server <0.1800.0> terminating
> ** Last message in was {'$gen_cast',{gm_deaths,[<0.4684.0>]}}
> ** When Server state == {state,
>                             {amqqueue,
>                                 {resource,<<"/">>,queue,<<"test_29">>},
>                                 true,false,<0.1433.0>,
>                                 [{<<"x-ha-policy">>,longstr,<<"all">>},
>                                  {<<"x-message-ttl">>,signedint,600000}],
>                                 <0.1799.0>,[],all},
>                             <0.1801.0>,
>                             {dict,0,16,16,8,80,48,
>
> {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
>                                  []},
>                                 {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],
>                                   [],[]}}},
>                             #Fun<rabbit_mirror_queue_master.1.2951048>,
>                             #Fun<rabbit_mirror_queue_master.2.72654940>}
> ** Reason for termination ==
> ** {{case_clause,{ok,<3066.9234.0>,[<0.4683.0>]}},
>     [{rabbit_mirror_queue_coordinator,handle_cast,2},
>      {gen_server2,handle_msg,2},
>      {proc_lib,wake_up,3}]}

Well this is very odd. We fixed a bug that looked like this, but it got
fixed in 2.7.1 (and related to x-ha-policy = nodes. Could you just check
that you really are running 2.8.1? We're not aware of any bug in this
area in 2.8.1, but that's certainly not saying there's not one there! Is
there any particular sequence of events that you can perform that
reliably triggers this crash? Could you also check the logs of the other
nodes (both .log and -sasl.log) to see if there's further crash reports
in there?

Also, there have been discovered lots of bugs relating to the code
changes made to add DLX support in 2.8.1, especially in relation to HA.
It's possible one of the issues I found with TTL and HA is causing this.
2.8.2 should be out soonish which might introduce fewer new bugs than it
fixes, but in the mean time, could you try without the TTL and see if
that helps?

Matthew
_______________________________________________
rabbitmq-discuss mailing list
[hidden email]
https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

Re: Active-active crash report

vchekan
Hi Matthew,

Thanks for your reply. At least web UI reports that all 3 nodes are 2.8.1 so I guess they are  up to date.
Reproducing this bug is difficult. I'll work on it for may be 2 more days, trying to find a combination which would be reproducible. Very preliminary, I have impression that "exclusive" queue flag have something to do with it, but I will experiment with ttl too, as you suggested.

I dont know if it is related, but I had a strange situation when there was an exclusive autodelete queue without a connection (connection reported "unknown") and it was impossible to delete it because "resource lock" error.
Hopefully I'll be able to report more tomorrow.

Thanks,
Vadim.

On Thu, Apr 26, 2012 at 3:31 PM, Matthew Sackman <[hidden email]> wrote:
Hi Vadim,

On Thu, Apr 26, 2012 at 01:01:20PM -0700, Vadim Chekan wrote:
> I'm testing my active-active setup (2.8.1, linux 64) and I am randomly
> running into some crashes when I'm stopping a node. I can stop one node
> abut another one fails along with it. Below is a crash log.
>
> =ERROR REPORT==== 26-Apr-2012::12:15:59 ===
> Discarding message
> {'$gen_call',{<0.1955.0>,#Ref<0.0.0.5513>},{add_on_right,{9,<0.1955.0>}}}
> from <0.1955.0> to <0.26823.834>
>  in an old incarnation (2) of this node (3)

I'm worried about these messages. Someone else on this list has seen
this sort of thing too and it's causing them trouble. I've not seen this
issue myself in testing which is frustrating. However, that's not the
cause of your crash in this case (I think).

> ** Generic server <0.1800.0> terminating
> ** Last message in was {'$gen_cast',{gm_deaths,[<0.4684.0>]}}
> ** When Server state == {state,
>                             {amqqueue,
>                                 {resource,<<"/">>,queue,<<"test_29">>},
>                                 true,false,<0.1433.0>,
>                                 [{<<"x-ha-policy">>,longstr,<<"all">>},
>                                  {<<"x-message-ttl">>,signedint,600000}],
>                                 <0.1799.0>,[],all},
>                             <0.1801.0>,
>                             {dict,0,16,16,8,80,48,
>
> {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
>                                  []},
>                                 {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],
>                                   [],[]}}},
>                             #Fun<rabbit_mirror_queue_master.1.2951048>,
>                             #Fun<rabbit_mirror_queue_master.2.72654940>}
> ** Reason for termination ==
> ** {{case_clause,{ok,<3066.9234.0>,[<0.4683.0>]}},
>     [{rabbit_mirror_queue_coordinator,handle_cast,2},
>      {gen_server2,handle_msg,2},
>      {proc_lib,wake_up,3}]}

Well this is very odd. We fixed a bug that looked like this, but it got
fixed in 2.7.1 (and related to x-ha-policy = nodes. Could you just check
that you really are running 2.8.1? We're not aware of any bug in this
area in 2.8.1, but that's certainly not saying there's not one there! Is
there any particular sequence of events that you can perform that
reliably triggers this crash? Could you also check the logs of the other
nodes (both .log and -sasl.log) to see if there's further crash reports
in there?

Also, there have been discovered lots of bugs relating to the code
changes made to add DLX support in 2.8.1, especially in relation to HA.
It's possible one of the issues I found with TTL and HA is causing this.
2.8.2 should be out soonish which might introduce fewer new bugs than it
fixes, but in the mean time, could you try without the TTL and see if
that helps?

Matthew
_______________________________________________
rabbitmq-discuss mailing list
[hidden email]
https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss



--
From RFC 2631: In ASN.1, EXPLICIT tagging is implicit unless IMPLICIT is explicitly specified

_______________________________________________
rabbitmq-discuss mailing list
[hidden email]
https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

Re: Active-active crash report

vchekan
Ok, I got it again.
3 nodes. Stop slave, restart app(client), bring slave back online, stop master, restart app, bring master back online, BOOM!
So here are logs from both nodes, the one which I shutdown (dev-2) and the one which failed unexpectedly (dev-3)

dev-2 was the master node and when I stopped it, it brought dev-3 node down with it.
Did everything started because of this?
> Reason:     reached_max_restart_intensity

Vadim

dev-3 log :
===============================================
=INFO REPORT==== 26-Apr-2012::16:50:38 ===
Management agent started.

=INFO REPORT==== 26-Apr-2012::16:50:38 ===
Management plugin started. Port: 55672, path: /

=INFO REPORT==== 26-Apr-2012::16:51:07 ===
    application: rabbitmq_management
    exited: shutdown
    type: permanent

=INFO REPORT==== 26-Apr-2012::16:51:08 ===
stopped TCP Listener on 0.0.0.0:5672
===============================================

dev-3-sasl.log
===============================================
=SUPERVISOR REPORT==== 26-Apr-2012::16:51:07 ===
     Supervisor: {local,rabbit_mgmt_sup}
     Context:    child_terminated
     Reason:     shutdown
     Offender:   [{pid,<0.396.0>},
                  {name,mirroring},
                  {mfa,
                      {mirrored_supervisor,start_internal,
                          [rabbit_mgmt_sup,
                           [{rabbit_mgmt_db,
                                {rabbit_mgmt_db,start_link,[]},
                                permanent,4294967295,worker,
                                [rabbit_mgmt_db]}]]}},
                  {restart_type,permanent},
                  {shutdown,4294967295},
                  {child_type,worker}]


=SUPERVISOR REPORT==== 26-Apr-2012::16:51:07 ===
     Supervisor: {local,rabbit_mgmt_sup}
     Context:    shutdown
     Reason:     reached_max_restart_intensity
     Offender:   [{pid,<0.396.0>},
                  {name,mirroring},
                  {mfa,
                      {mirrored_supervisor,start_internal,
                          [rabbit_mgmt_sup,
                           [{rabbit_mgmt_db,
                                {rabbit_mgmt_db,start_link,[]},
                                permanent,4294967295,worker,
                                [rabbit_mgmt_db]}]]}},
                  {restart_type,permanent},
                  {shutdown,4294967295},
                  {child_type,worker}]

===============================================

dev-2 log
===============================================
=INFO REPORT==== 26-Apr-2012::16:51:07 ===
Stopping Rabbit

=INFO REPORT==== 26-Apr-2012::16:51:07 ===
    application: rabbitmq_management_visualiser
    exited: stopped
    type: permanent

=INFO REPORT==== 26-Apr-2012::16:51:07 ===
    application: rabbitmq_management
    exited: stopped
    type: permanent

=INFO REPORT==== 26-Apr-2012::16:51:07 ===
    application: rabbitmq_management_agent
    exited: stopped
    type: permanent

=INFO REPORT==== 26-Apr-2012::16:51:07 ===
stopped TCP Listener on 0.0.0.0:5672

=INFO REPORT==== 26-Apr-2012::16:51:08 ===
rabbit on node 'rabbit@rabbitmq-dev-3' down

=INFO REPORT==== 26-Apr-2012::16:51:08 ===
Mirrored-queue (queue 'Advertiser-ONYX' in vhost '/'): Slave <'rabbit@rabbitmq-dev-2'.1.265.0> saw deaths of mirrors <'rabbit@rabbitmq-dev-3'.2.290.0>

=INFO REPORT==== 26-Apr-2012::16:51:08 ===
Mirrored-queue (queue 'Search-JROSE6420' in vhost '/'): Slave <'rabbit@rabbitmq-dev-2'.1.276.0> saw deaths of mirrors <'rabbit@rabbitmq-dev-3'.2.328.0>

=INFO REPORT==== 26-Apr-2012::16:51:08 ===
Mirrored-queue (queue 'Advertiser-DHOHMEYERGX280' in vhost '/'): Slave <'rabbit@rabbitmq-dev-2'.1.268.0> saw deaths of mirrors <'rabbit@rabbitmq-dev-3'.2.288.0>

=INFO REPORT==== 26-Apr-2012::16:51:08 ===
Mirrored-queue (queue 'CustomerAdmin-DHOHMEYERGX280' in vhost '/'): Slave <'rabbit@rabbitmq-dev-2'.1.270.0> saw deaths of mirrors <'rabbit@rabbitmq-dev-3'.2.294.0>

=INFO REPORT==== 26-Apr-2012::16:51:08 ===
Mirrored-queue (queue 'Tracking-JROSE6420' in vhost '/'): Slave <'rabbit@rabbitmq-dev-2'.1.250.0> saw deaths of mirrors <'rabbit@rabbitmq-dev-3'.2.247.0>

=ERROR REPORT==== 26-Apr-2012::16:51:08 ===
** Generic server rabbit_mirror_queue_slave_sup terminating
** Last message in was {'EXIT',<0.127.0>,shutdown}
** When Server state == {state,
                         {local,rabbit_mirror_queue_slave_sup},
                         simple_one_for_one_terminate,
                         [{child,undefined,rabbit_mirror_queue_slave,
                           {rabbit_mirror_queue_slave,start_link,[]},
                           temporary,4294967295,worker,
                           [rabbit_mirror_queue_slave]}],
                         {dict,11,16,16,8,80,48,
                          {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
                          {{[],[],
                            [[<0.265.0>,
                              {amqqueue,
                               {resource,<<"/">>,queue,<<"Advertiser-ONYX">>},
                               false,false,none,
                               [{<<"x-ha-policy">>,longstr,<<"all">>},
                                {<<"x-message-ttl">>,signedint,1200000}],
                               <3067.289.0>,
                               [<3065.334.0>],
                               all}]],
                            [],
                            [[undefined,
                              {amqqueue,
                               {resource,<<"/">>,queue,
                                <<"Advertiser-heartbeat-DHOHMEYERGX280">>},
                               false,true,<3065.24765.0>,
                               [{<<"x-ha-policy">>,longstr,<<"all">>}],
                               <3065.24784.0>,
                               [<3067.12370.2>],
                               all}],
                             [<0.6730.0>,
                              {amqqueue,
                               {resource,<<"/">>,queue,
                                <<"Advertiser-heartbeat-DHOHMEYERGX280">>},
                               false,true,<3065.14349.1>,
                               [{<<"x-ha-policy">>,longstr,<<"all">>}],
                               <3065.14370.1>,[],all}]],
                            [[<0.268.0>,
                              {amqqueue,
                               {resource,<<"/">>,queue,
                                <<"Advertiser-DHOHMEYERGX280">>},
                               false,false,none,
                               [{<<"x-ha-policy">>,longstr,<<"all">>},
                                {<<"x-message-ttl">>,signedint,1200000}],
                               <3067.287.0>,
                               [<3065.332.0>],
                               all}]],
                            [],
                            [[<0.270.0>,
                              {amqqueue,
                               {resource,<<"/">>,queue,
                                <<"CustomerAdmin-DHOHMEYERGX280">>},
                               false,false,none,
                               [{<<"x-ha-policy">>,longstr,<<"all">>},
                                {<<"x-message-ttl">>,signedint,1200000}],
                               <3067.295.0>,
                               [<3065.338.0>],
                               all}]],
                            [],
                            [[<0.272.0>,
                              {amqqueue,
                               {resource,<<"/">>,queue,
                                <<"Sales-DHOHMEYERGX280">>},
                               false,false,none,
                               [{<<"x-ha-policy">>,longstr,<<"all">>},
                                {<<"x-message-ttl">>,signedint,1200000}],
                               <3067.317.0>,
                               [<3065.356.0>],
                               all}]],
                            [[<0.250.0>,
                              {amqqueue,
                               {resource,<<"/">>,queue,
                                <<"Tracking-JROSE6420">>},
                               false,false,none,
                               [{<<"x-ha-policy">>,longstr,<<"all">>},
                                {<<"x-message-ttl">>,signedint,1200000}],
                               <3067.247.0>,
                               [<3065.249.0>],
                               all}]],
                            [[<0.274.0>,
                              {amqqueue,
                               {resource,<<"/">>,queue,
                                <<"RelatedLinks-ROGERMOORE">>},
                               false,false,none,
                               [{<<"x-ha-policy">>,longstr,<<"all">>},
                                {<<"x-message-ttl">>,signedint,1200000}],
                               <3067.331.0>,
                               [<3065.370.0>],
                               all}]],
                            [[<0.252.0>,
                              {amqqueue,
                               {resource,<<"ActivityLogging">>,queue,
                                <<"vsw.activitylogging.conversion.pendingconversion">>},
                               true,false,none,
                               [{<<"x-ha-policy">>,longstr,<<"all">>}],
                               <3067.253.0>,
                               [<3065.257.0>],
                               all}]],
                            [[<0.276.0>,
                              {amqqueue,
                               {resource,<<"/">>,queue,<<"Search-JROSE6420">>},
                               false,false,none,
                               [{<<"x-ha-policy">>,longstr,<<"all">>},
                                {<<"x-message-ttl">>,signedint,1200000}],
                               <3067.26259.1>,
                               [<3065.4161.0>],
                               all}]],
                            [[<0.261.0>,
                              {amqqueue,
                               {resource,<<"/">>,queue,
                                <<"RelatedLinks-ONYX">>},
                               false,false,none,
                               [{<<"x-ha-policy">>,longstr,<<"all">>},
                                {<<"x-message-ttl">>,signedint,1200000}],
                               <3067.277.0>,
                               [<3065.322.0>],
                               all}]],
                            []}}},
                         10,10,[],rabbit_mirror_queue_slave_sup,[]}
** Reason for termination ==
** {badarg,[{erlang,unlink,[undefined]},
            {supervisor2,'-terminate_simple_children/3-fun-0-',4},
            {dict,fold_bucket,3},
            {dict,fold_seg,4},
            {dict,fold_segs,4},
            {supervisor2,terminate_simple_children,3},
            {supervisor2,terminate,2},
            {gen_server,terminate,6}]}

=INFO REPORT==== 26-Apr-2012::16:51:08 ===
Mirrored-queue (queue 'vsw.activitylogging.conversion.pendingconversion' in vhost 'ActivityLogging'): Slave <'rabbit@rabbitmq-dev-2'.1.252.0> saw deaths of mirrors <'rabbit@rabbitmq-dev-3'.2.253.0>

=ERROR REPORT==== 26-Apr-2012::16:51:08 ===
** Generic server <0.265.0> terminating
** Last message in was {'EXIT',<0.249.0>,
                           {badarg,
                               [{erlang,unlink,[undefined]},
                                {supervisor2,
                                    '-terminate_simple_children/3-fun-0-',4},
                                {dict,fold_bucket,3},
                                {dict,fold_seg,4},
                                {dict,fold_segs,4},
                                {supervisor2,terminate_simple_children,3},
                                {supervisor2,terminate,2},
                                {gen_server,terminate,6}]}}
** When Server state == {state,
                         {amqqueue,
                          {resource,<<"/">>,queue,<<"Advertiser-ONYX">>},
                          false,false,none,
                          [{<<"x-ha-policy">>,longstr,<<"all">>},
                           {<<"x-message-ttl">>,signedint,1200000}],
                          <3067.289.0>,
                          [<3065.334.0>],
                          all},
                         <0.267.0>,<3065.334.0>,rabbit_variable_queue,
                         {vqstate,
                          {0,{[],[]}},
                          {0,{[],[]}},
                          {delta,undefined,0,undefined},
                          {0,{[],[]}},
                          {0,{[],[]}},
                          0,
                          {0,nil},
                          undefined,
                          {0,nil},
                          {qistate,
                           "/var/lib/rabbitmq/mnesia/rabbit@rabbitmq-dev-2/queues/1ON03RTBZAE64B67H41L4ICDE",
                           {{dict,0,16,16,8,80,48,
                             {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
                             {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                               []}}},
                            []},
                           undefined,0,262144,
                           #Fun<rabbit_variable_queue.2.91259159>,
                           {0,nil}},
                          {undefined,
                           {client_msstate,msg_store_transient,
                            <<228,99,153,84,177,114,138,215,76,30,0,48,128,77,
                              68,245>>,
                            {dict,0,16,16,8,80,48,
                             {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
                             {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                               []}}},
                            {state,368709,
                             "/var/lib/rabbitmq/mnesia/rabbit@rabbitmq-dev-2/msg_store_transient"},
                            rabbit_msg_store_ets_index,
                            "/var/lib/rabbitmq/mnesia/rabbit@rabbitmq-dev-2/msg_store_transient",
                            <0.239.0>,372806,364585,376903,381000}},
                          false,0,#Fun<rabbit_mirror_queue_slave.3.29941254>,
                          0,0,infinity,0,0,0,0,0,
                          {rates,
                           {{1335,483942,985021},0},
                           {{1335,483942,985021},0},
                           0.0,0.0,
                           {1335,484240,226498}},
                          {0,nil},
                          {0,nil},
                          {0,nil},
                          {0,nil},
                          0,0,
                          {rates,
                           {{1335,483942,985021},0},
                           {{1335,483942,985021},0},
                           0.0,0.0,
                           {1335,484240,226498}}},
                         undefined,#Ref<0.0.9.158766>,
                         {dict,0,16,16,8,80,48,
                          {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
                          {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}},
                         {dict,0,16,16,8,80,48,
                          {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
                          {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}},
                         0,
                         {dict,0,16,16,8,80,48,
                          {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
                          {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}},
                         {dict,0,16,16,8,80,48,
                          {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
                          {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}},
                         true}
** Reason for termination ==
** {badarg,[{erlang,unlink,[undefined]},
            {supervisor2,'-terminate_simple_children/3-fun-0-',4},
            {dict,fold_bucket,3},
            {dict,fold_seg,4},
            {dict,fold_segs,4},
            {supervisor2,terminate_simple_children,3},
            {supervisor2,terminate,2},
            {gen_server,terminate,6}]}

=ERROR REPORT==== 26-Apr-2012::16:51:08 ===
** Generic server <0.6730.0> terminating
** Last message in was {'EXIT',<0.249.0>,
                           {badarg,
                               [{erlang,unlink,[undefined]},
                                {supervisor2,
                                    '-terminate_simple_children/3-fun-0-',4},
                                {dict,fold_bucket,3},
                                {dict,fold_seg,4},
                                {dict,fold_segs,4},
                                {supervisor2,terminate_simple_children,3},
                                {supervisor2,terminate,2},
                                {gen_server,terminate,6}]}}
** When Server state == {state,
                         {amqqueue,
                          {resource,<<"/">>,queue,
                           <<"Advertiser-heartbeat-DHOHMEYERGX280">>},
                          false,true,<3065.14349.1>,
                          [{<<"x-ha-policy">>,longstr,<<"all">>}],
                          <3065.14370.1>,[],all},
                         <0.6731.0>,<3065.14370.1>,rabbit_variable_queue,
                         {vqstate,
                          {0,{[],[]}},
                          {0,{[],[]}},
                          {delta,undefined,0,undefined},
                          {0,{[],[]}},
                          {0,{[],[]}},
                          0,
                          {0,nil},
                          undefined,
                          {0,nil},
                          {qistate,
                           "/var/lib/rabbitmq/mnesia/rabbit@rabbitmq-dev-2/queues/5R0GE9UEB8S0EMSJORBUKHBTE",
                           {{dict,0,16,16,8,80,48,
                             {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
                             {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                               []}}},
                            []},
                           undefined,0,262144,
                           #Fun<rabbit_variable_queue.2.91259159>,
                           {0,nil}},
                          {undefined,
                           {client_msstate,msg_store_transient,
                            <<182,86,2,135,69,87,14,108,163,188,15,159,127,119,
                              55,241>>,
                            {dict,0,16,16,8,80,48,
                             {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
                             {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                               []}}},
                            {state,368709,
                             "/var/lib/rabbitmq/mnesia/rabbit@rabbitmq-dev-2/msg_store_transient"},
                            rabbit_msg_store_ets_index,
                            "/var/lib/rabbitmq/mnesia/rabbit@rabbitmq-dev-2/msg_store_transient",
                            <0.239.0>,372806,364585,376903,381000}},
                          false,0,#Fun<rabbit_mirror_queue_slave.3.29941254>,
                          0,0,infinity,0,0,0,0,0,
                          {rates,
                           {{1335,484239,805553},0},
                           {{1335,484239,805553},0},
                           0.0,0.0,
                           {1335,484255,991021}},
                          {0,nil},
                          {0,nil},
                          {0,nil},
                          {0,nil},
                          0,0,
                          {rates,
                           {{1335,484239,805553},0},
                           {{1335,484239,805553},0},
                           0.0,0.0,
                           {1335,484255,991021}}},
                         undefined,#Ref<0.0.9.158602>,
                         {dict,0,16,16,8,80,48,
                          {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
                          {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}},
                         {dict,0,16,16,8,80,48,
                          {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
                          {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}},
                         0,
                         {dict,0,16,16,8,80,48,
                          {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
                          {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}},
                         {dict,0,16,16,8,80,48,
                          {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
                          {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}},
                         true}
** Reason for termination ==
** {badarg,[{erlang,unlink,[undefined]},
            {supervisor2,'-terminate_simple_children/3-fun-0-',4},
            {dict,fold_bucket,3},
            {dict,fold_seg,4},
            {dict,fold_segs,4},
            {supervisor2,terminate_simple_children,3},
            {supervisor2,terminate,2},
            {gen_server,terminate,6}]}

=ERROR REPORT==== 26-Apr-2012::16:51:08 ===
** Generic server <0.252.0> terminating
** Last message in was {'EXIT',<0.249.0>,shutdown}
** When Server state == {state,
                         {amqqueue,
                          {resource,<<"ActivityLogging">>,queue,
                           <<"vsw.activitylogging.conversion.pendingconversion">>},
                          true,false,none,
                          [{<<"x-ha-policy">>,longstr,<<"all">>}],
                          <3067.253.0>,
                          [<3065.257.0>],
                          all},
                         <0.253.0>,<3065.257.0>,rabbit_variable_queue,
                         {vqstate,
                          {0,{[],[]}},
                          {0,{[],[]}},
                          {delta,undefined,0,undefined},
                          {0,{[],[]}},
                          {0,{[],[]}},
                          0,
                          {0,nil},
                          undefined,
                          {0,nil},
                          {qistate,
                           "/var/lib/rabbitmq/mnesia/rabbit@rabbitmq-dev-2/queues/9KDJJKR11HTT40SVOPWG0AWDM",
                           {{dict,0,16,16,8,80,48,
                             {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
                             {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                               []}}},
                            []},
                           undefined,0,262144,
                           #Fun<rabbit_variable_queue.2.91259159>,
                           {0,nil}},
                          {{client_msstate,msg_store_persistent,
                            <<58,135,182,192,228,241,29,217,255,38,4,53,94,213,
                              95,102>>,
                            {dict,0,16,16,8,80,48,
                             {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
                             {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                               []}}},
                            {state,389194,
                             "/var/lib/rabbitmq/mnesia/rabbit@rabbitmq-dev-2/msg_store_persistent"},
                            rabbit_msg_store_ets_index,
                            "/var/lib/rabbitmq/mnesia/rabbit@rabbitmq-dev-2/msg_store_persistent",
                            <0.244.0>,393291,385097,397388,401485},
                           {client_msstate,msg_store_transient,
                            <<63,170,201,248,36,125,208,20,133,142,139,71,219,
                              91,212,33>>,
                            {dict,0,16,16,8,80,48,
                             {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
                             {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                               []}}},
                            {state,368709,
                             "/var/lib/rabbitmq/mnesia/rabbit@rabbitmq-dev-2/msg_store_transient"},
                            rabbit_msg_store_ets_index,
                            "/var/lib/rabbitmq/mnesia/rabbit@rabbitmq-dev-2/msg_store_transient",
                            <0.239.0>,372806,364585,376903,381000}},
                          true,0,#Fun<rabbit_mirror_queue_slave.3.29941254>,0,
                          0,infinity,0,0,0,0,0,
                          {rates,
                           {{1335,483943,551021},0},
                           {{1335,483943,551021},0},
                           0.0,0.0,
                           {1335,484240,51118}},
                          {0,nil},
                          {0,nil},
                          {0,nil},
                          {0,nil},
                          0,0,
                          {rates,
                           {{1335,483943,551021},0},
                           {{1335,483943,551021},0},
                           0.0,0.0,
                           {1335,484240,51118}}},
                         undefined,#Ref<0.0.9.158618>,
                         {dict,0,16,16,8,80,48,
                          {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
                          {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}},
                         {dict,0,16,16,8,80,48,
                          {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
                          {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}},
                         0,
                         {dict,0,16,16,8,80,48,
                          {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
                          {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}},
                         {dict,0,16,16,8,80,48,
                          {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
                          {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}},
                         true}
** Reason for termination ==
** {shutdown,
       {gen_server2,call,
           [msg_store_persistent,
            {client_terminate,
                <<58,135,182,192,228,241,29,217,255,38,4,53,94,213,95,102>>},
            infinity]}}
** In 'terminate' callback with reason ==
** shutdown

=INFO REPORT==== 26-Apr-2012::16:51:08 ===
    application: rabbit
    exited: stopped
    type: permanent

=INFO REPORT==== 26-Apr-2012::16:51:08 ===
    application: os_mon
    exited: stopped
    type: permanent

=INFO REPORT==== 26-Apr-2012::16:51:08 ===
    application: mnesia
    exited: stopped
    type: permanent

=INFO REPORT==== 26-Apr-2012::16:51:08 ===
Halting Erlang VM

===============================================

dev-2-sasl.log
===============================================
=CRASH REPORT==== 26-Apr-2012::16:51:08 ===
  crasher:
    initial call: supervisor2:init/1
    pid: <0.249.0>
    registered_name: rabbit_mirror_queue_slave_sup
    exception exit: {badarg,
                        [{erlang,unlink,[undefined]},
                         {supervisor2,'-terminate_simple_children/3-fun-0-',4},
                         {dict,fold_bucket,3},
                         {dict,fold_seg,4},
                         {dict,fold_segs,4},
                         {supervisor2,terminate_simple_children,3},
                         {supervisor2,terminate,2},
                         {gen_server,terminate,6}]}
      in function  gen_server:terminate/6
    ancestors: [rabbit_sup,<0.126.0>]
    messages: [{'DOWN',#Ref<0.0.9.159703>,process,
                          {undefined,'rabbit@rabbitmq-dev-2'},
                          noproc}]
    links: [<0.265.0>,<0.6730.0>]
    dictionary: []
    trap_exit: true
    status: running
    heap_size: 987
    stack_size: 24
    reductions: 5064
  neighbours:

=SUPERVISOR REPORT==== 26-Apr-2012::16:51:08 ===
     Supervisor: {local,rabbit_sup}
     Context:    shutdown_error
     Reason:     {badarg,
                     [{erlang,unlink,[undefined]},
                      {supervisor2,'-terminate_simple_children/3-fun-0-',4},
                      {dict,fold_bucket,3},
                      {dict,fold_seg,4},
                      {dict,fold_segs,4},
                      {supervisor2,terminate_simple_children,3},
                      {supervisor2,terminate,2},
                      {gen_server,terminate,6}]}
     Offender:   [{pid,<0.249.0>},
                  {name,rabbit_mirror_queue_slave_sup},
                  {mfargs,{rabbit_mirror_queue_slave_sup,start_link,[]}},
                  {restart_type,transient},
                  {shutdown,infinity},
                  {child_type,supervisor}]


=CRASH REPORT==== 26-Apr-2012::16:51:08 ===
  crasher:
    initial call: gen:init_it/6
    pid: <0.265.0>
    registered_name: []
    exception exit: {badarg,
                        [{erlang,unlink,[undefined]},
                         {supervisor2,'-terminate_simple_children/3-fun-0-',4},
                         {dict,fold_bucket,3},
                         {dict,fold_seg,4},
                         {dict,fold_segs,4},
                         {supervisor2,terminate_simple_children,3},
                         {supervisor2,terminate,2},
                         {gen_server,terminate,6}]}
      in function  gen_server2:terminate/3
    ancestors: [rabbit_mirror_queue_slave_sup,rabbit_sup,<0.126.0>]
    messages: [{'EXIT',<0.267.0>,normal},
                  {'DOWN',#Ref<0.0.3.44831>,process,<3065.334.0>,normal}]
    links: []
    dictionary: [{{xtype_to_module,direct},rabbit_exchange_type_direct},
                  {{xtype_to_module,topic},rabbit_exchange_type_topic},
                  {guid,{{3831732564,2977073879,1277034544,2152547573},0}}]
    trap_exit: true
    status: running
    heap_size: 2584
    stack_size: 24
    reductions: 4916
  neighbours:

=CRASH REPORT==== 26-Apr-2012::16:51:08 ===
  crasher:
    initial call: gen:init_it/6
    pid: <0.6730.0>
    registered_name: []
    exception exit: {badarg,
                        [{erlang,unlink,[undefined]},
                         {supervisor2,'-terminate_simple_children/3-fun-0-',4},
                         {dict,fold_bucket,3},
                         {dict,fold_seg,4},
                         {dict,fold_segs,4},
                         {supervisor2,terminate_simple_children,3},
                         {supervisor2,terminate,2},
                         {gen_server,terminate,6}]}
      in function  gen_server2:terminate/3
    ancestors: [rabbit_mirror_queue_slave_sup,rabbit_sup,<0.126.0>]
    messages: [{'EXIT',<0.6731.0>,normal},
                  {'DOWN',#Ref<0.0.4.191216>,process,<3065.14370.1>,normal}]
    links: []
    dictionary: [{credit_blocked,[]},
                  {{xtype_to_module,direct},rabbit_exchange_type_direct},
                  {{xtype_to_module,fanout},rabbit_exchange_type_fanout},
                  {guid,{{3059090055,1163333228,2747010975,2138519537},0}}]
    trap_exit: true
    status: running
    heap_size: 1597
    stack_size: 24
    reductions: 8290
  neighbours:
===============================================




_______________________________________________
rabbitmq-discuss mailing list
[hidden email]
https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

Re: Active-active crash report

Matthias Radestock-3
Vadim,

On 27/04/12 01:31, Vadim Chekan wrote:
> Ok, I got it again.

This does look like a different bug, and one that, for a change, is
easily fixed.

It is possible that this may either be triggered by or be the trigger of
the previous bug. Is there nothing in the logs this time about
"Discarding message" or "gm_deaths"?

Matthias.
_______________________________________________
rabbitmq-discuss mailing list
[hidden email]
https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

Re: Active-active crash report

Matthias Radestock-3
Vadim,

(putting the list back on cc)

On 27/04/12 23:28, Vadim Chekan wrote:
> I've spent some time today playing with different client settings. Seems
> ttl does not affect failures at all.

That's good to know. The more factors we can eliminate as possible
causes the better.

> I managed to reproduce crash many times today. Basic idea is: an
> application, 40 threads create a pub/sub exchange and publish a message
> every second. Under this load I bring the master node (all queues are
> usually created on the same node) and often it causes another node to
> fail. Here is my load simulator (in c#):
> http://www.heypasteit.com/clip/0B5W

That code connects to "rabbitmq-dev". Is that a load balancer sitting in
front of the three nodes?

How do you deal with the disconnects resulting from the shutting down of
nodes? There doesn't seem to be any code to handle that.

Mind you, I suspect that the failure should still be reproducible
without any subscriptions and publishes. Would be good to try that and
just watch the broker logs for errors.

> As long as maillist does not allow zip attaches, I'm mailing to you guys
> directly

Thanks for posting these. There is an error in the logs that we haven't
seen before:

            {{badmatch,[]},
             [{rabbit_mirror_queue_misc,'-remove_from_queue/2-fun-0-',2},
              {mnesia_tm,apply_fun,3},
              {mnesia_tm,execute_transaction,5},
              {rabbit_misc,'-execute_mnesia_transaction/1-fun-0-',1},
              {worker_pool_worker,handle_call,3},
              {gen_server2,handle_msg,2},
              {proc_lib,wake_up,3}]}

Looking at the code, this appears to indicate that there are no
master/mirror processes left for the queue. Which is...unexpected. That
should give us something to go on.

Regards,

Matthias.
_______________________________________________
rabbitmq-discuss mailing list
[hidden email]
https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

Re: Active-active crash report

vchekan
Thanks for looking into it Matthias,

"rabbitmq-dev": yes, it is load balancer.
Disconnects: I was too lazy to write code for it and just restarted client manually.
I did make a short try to reproduce failure without any load but it worked just fine. I'll try more today.

Vadim.

On Sat, Apr 28, 2012 at 12:13 AM, Matthias Radestock <[hidden email]> wrote:
Vadim,

(putting the list back on cc)


On 27/04/12 23:28, Vadim Chekan wrote:
I've spent some time today playing with different client settings. Seems
ttl does not affect failures at all.

That's good to know. The more factors we can eliminate as possible causes the better.


I managed to reproduce crash many times today. Basic idea is: an
application, 40 threads create a pub/sub exchange and publish a message
every second. Under this load I bring the master node (all queues are
usually created on the same node) and often it causes another node to
fail. Here is my load simulator (in c#):
http://www.heypasteit.com/clip/0B5W

That code connects to "rabbitmq-dev". Is that a load balancer sitting in front of the three nodes?

How do you deal with the disconnects resulting from the shutting down of nodes? There doesn't seem to be any code to handle that.

Mind you, I suspect that the failure should still be reproducible without any subscriptions and publishes. Would be good to try that and just watch the broker logs for errors.


As long as maillist does not allow zip attaches, I'm mailing to you guys
directly

Thanks for posting these. There is an error in the logs that we haven't seen before:

          {{badmatch,[]},
           [{rabbit_mirror_queue_misc,'-remove_from_queue/2-fun-0-',2},
            {mnesia_tm,apply_fun,3},
            {mnesia_tm,execute_transaction,5},
            {rabbit_misc,'-execute_mnesia_transaction/1-fun-0-',1},
            {worker_pool_worker,handle_call,3},
            {gen_server2,handle_msg,2},
            {proc_lib,wake_up,3}]}

Looking at the code, this appears to indicate that there are no master/mirror processes left for the queue. Which is...unexpected. That should give us something to go on.

Regards,

Matthias.



--
From RFC 2631: In ASN.1, EXPLICIT tagging is implicit unless IMPLICIT is explicitly specified

_______________________________________________
rabbitmq-discuss mailing list
[hidden email]
https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss
Loading...