|
Hi all,
I'm testing my active-active setup (2.8.1, linux 64) and I am randomly running into some crashes when I'm stopping a node. I can stop one node abut another one fails along with it. Below is a crash log. Is "rabbitmq_management" the UI plugin? Could it bring down the whone node, because I keep UI open in browser while doing my testing. Vadim. =INFO REPORT==== 26-Apr-2012::12:15:58 === Adding mirror of queue 'test_33' in vhost '/' on node 'rabbit@rabbitmq-dev-2': <3066.4773.0> =INFO REPORT==== 26-Apr-2012::12:15:58 === Adding mirror of queue 'test_34' in vhost '/' on node 'rabbit@rabbitmq-dev-2': <3066.4777.0> =INFO REPORT==== 26-Apr-2012::12:15:59 === Adding mirror of queue 'test_35' in vhost '/' on node 'rabbit@rabbitmq-dev-2': <3066.4782.0> =INFO REPORT==== 26-Apr-2012::12:15:59 === Adding mirror of queue 'test_36' in vhost '/' on node 'rabbit@rabbitmq-dev-2': <3066.4786.0> =INFO REPORT==== 26-Apr-2012::12:15:59 === Adding mirror of queue 'test_37' in vhost '/' on node 'rabbit@rabbitmq-dev-2': <3066.4790.0> =INFO REPORT==== 26-Apr-2012::12:15:59 === Adding mirror of queue 'test_38' in vhost '/' on node 'rabbit@rabbitmq-dev-2': <3066.4795.0> =INFO REPORT==== 26-Apr-2012::12:15:59 === Adding mirror of queue 'test_39' in vhost '/' on node 'rabbit@rabbitmq-dev-2': <3066.4799.0> =ERROR REPORT==== 26-Apr-2012::12:15:59 === Discarding message {'$gen_call',{<0.1955.0>,#Ref<0.0.0.5513>},{add_on_right,{9,<0.1955.0>}}} from <0.1955.0> to <0.26823.834> in an old incarnation (2) of this node (3) =INFO REPORT==== 26-Apr-2012::12:15:59 === Adding mirror of queue 'test_40' in vhost '/' on node 'rabbit@rabbitmq-dev-2': <3066.4803.0> =INFO REPORT==== 26-Apr-2012::12:15:59 === Adding mirror of queue 'test_41' in vhost '/' on node 'rabbit@rabbitmq-dev-2': <3066.4807.0> =ERROR REPORT==== 26-Apr-2012::12:15:59 === Discarding message {'$gen_call',{<0.1983.0>,#Ref<0.0.0.5648>},{add_on_right,{9,<0.1983.0>}}} from <0.1983.0> to <0.26850.834> in an old incarnation (2) of this node (3) =INFO REPORT==== 26-Apr-2012::12:15:59 === Adding mirror of queue 'test_42' in vhost '/' on node 'rabbit@rabbitmq-dev-2': <3066.4811.0> =INFO REPORT==== 26-Apr-2012::12:15:59 === Adding mirror of queue 'test_43' in vhost '/' on node 'rabbit@rabbitmq-dev-2': <3066.4816.0> =INFO REPORT==== 26-Apr-2012::12:16:39 === rabbit on node 'rabbit@rabbitmq-dev-3' up =WARNING REPORT==== 26-Apr-2012::12:18:15 === closing AMQP connection <0.1433.0> (172.16.51.100:16360 -> 10.7.1.21:5672): connection_closed_abruptly =INFO REPORT==== 26-Apr-2012::12:23:57 === application: rabbitmq_management exited: shutdown type: permanent =INFO REPORT==== 26-Apr-2012::12:23:57 === stopped TCP Listener on 0.0.0.0:5672 =ERROR REPORT==== 26-Apr-2012::12:23:57 === ** Generic server <0.1800.0> terminating ** Last message in was {'$gen_cast',{gm_deaths,[<0.4684.0>]}} ** When Server state == {state, {amqqueue, {resource,<<"/">>,queue,<<"test_29">>}, true,false,<0.1433.0>, [{<<"x-ha-policy">>,longstr,<<"all">>}, {<<"x-message-ttl">>,signedint,600000}], <0.1799.0>,[],all}, <0.1801.0>, {dict,0,16,16,8,80,48, {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[], []}, {{[],[],[],[],[],[],[],[],[],[],[],[],[],[], [],[]}}}, #Fun<rabbit_mirror_queue_master.1.2951048>, #Fun<rabbit_mirror_queue_master.2.72654940>} ** Reason for termination == ** {{case_clause,{ok,<3066.9234.0>,[<0.4683.0>]}}, [{rabbit_mirror_queue_coordinator,handle_cast,2}, {gen_server2,handle_msg,2}, {proc_lib,wake_up,3}]} =ERROR REPORT==== 26-Apr-2012::12:23:57 === ** Generic server <0.1714.0> terminating ** Last message in was {'$gen_cast',{gm_deaths,[<0.4653.0>]}} ** When Server state == {state, {amqqueue, {resource,<<"/">>,queue,<<"test_23">>}, true,false,<0.1433.0>, [{<<"x-ha-policy">>,longstr,<<"all">>}, {<<"x-message-ttl">>,signedint,600000}], <0.1712.0>,[],all}, <0.1715.0>, {dict,0,16,16,8,80,48, {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[], []}, {{[],[],[],[],[],[],[],[],[],[],[],[],[],[], [],[]}}}, #Fun<rabbit_mirror_queue_master.1.2951048>, #Fun<rabbit_mirror_queue_master.2.72654940>} ** Reason for termination == ** {{case_clause,{ok,<3066.9155.0>,[<0.4652.0>]}}, [{rabbit_mirror_queue_coordinator,handle_cast,2}, {gen_server2,handle_msg,2}, {proc_lib,wake_up,3}]} =ERROR REPORT==== 26-Apr-2012::12:23:57 === ** Generic server <0.1531.0> terminating ** Last message in was {'$gen_cast',{gm_deaths,[<0.4587.0>]}} ** When Server state == {state, {amqqueue, {resource,<<"/">>,queue,<<"test_10">>}, true,false,<0.1433.0>, [{<<"x-ha-policy">>,longstr,<<"all">>}, {<<"x-message-ttl">>,signedint,600000}], <0.1530.0>,[],all}, <0.1532.0>, {dict,0,16,16,8,80,48, {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[], []}, {{[],[],[],[],[],[],[],[],[],[],[],[],[],[], [],[]}}}, #Fun<rabbit_mirror_queue_master.1.2951048>, #Fun<rabbit_mirror_queue_master.2.72654940>} ** Reason for termination == ** {noproc,{gen_server2,call,[worker_pool,next_free,infinity]}} (END) _______________________________________________ rabbitmq-discuss mailing list [hidden email] https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss |
|
Hi Vadim,
On Thu, Apr 26, 2012 at 01:01:20PM -0700, Vadim Chekan wrote: > I'm testing my active-active setup (2.8.1, linux 64) and I am randomly > running into some crashes when I'm stopping a node. I can stop one node > abut another one fails along with it. Below is a crash log. > > =ERROR REPORT==== 26-Apr-2012::12:15:59 === > Discarding message > {'$gen_call',{<0.1955.0>,#Ref<0.0.0.5513>},{add_on_right,{9,<0.1955.0>}}} > from <0.1955.0> to <0.26823.834> > in an old incarnation (2) of this node (3) I'm worried about these messages. Someone else on this list has seen this sort of thing too and it's causing them trouble. I've not seen this issue myself in testing which is frustrating. However, that's not the cause of your crash in this case (I think). > ** Generic server <0.1800.0> terminating > ** Last message in was {'$gen_cast',{gm_deaths,[<0.4684.0>]}} > ** When Server state == {state, > {amqqueue, > {resource,<<"/">>,queue,<<"test_29">>}, > true,false,<0.1433.0>, > [{<<"x-ha-policy">>,longstr,<<"all">>}, > {<<"x-message-ttl">>,signedint,600000}], > <0.1799.0>,[],all}, > <0.1801.0>, > {dict,0,16,16,8,80,48, > > {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[], > []}, > {{[],[],[],[],[],[],[],[],[],[],[],[],[],[], > [],[]}}}, > #Fun<rabbit_mirror_queue_master.1.2951048>, > #Fun<rabbit_mirror_queue_master.2.72654940>} > ** Reason for termination == > ** {{case_clause,{ok,<3066.9234.0>,[<0.4683.0>]}}, > [{rabbit_mirror_queue_coordinator,handle_cast,2}, > {gen_server2,handle_msg,2}, > {proc_lib,wake_up,3}]} Well this is very odd. We fixed a bug that looked like this, but it got fixed in 2.7.1 (and related to x-ha-policy = nodes. Could you just check that you really are running 2.8.1? We're not aware of any bug in this area in 2.8.1, but that's certainly not saying there's not one there! Is there any particular sequence of events that you can perform that reliably triggers this crash? Could you also check the logs of the other nodes (both .log and -sasl.log) to see if there's further crash reports in there? Also, there have been discovered lots of bugs relating to the code changes made to add DLX support in 2.8.1, especially in relation to HA. It's possible one of the issues I found with TTL and HA is causing this. 2.8.2 should be out soonish which might introduce fewer new bugs than it fixes, but in the mean time, could you try without the TTL and see if that helps? Matthew _______________________________________________ rabbitmq-discuss mailing list [hidden email] https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss |
|
Hi Matthew,
Thanks for your reply. At least web UI reports that all 3 nodes are 2.8.1 so I guess they are up to date. Reproducing this bug is difficult. I'll work on it for may be 2 more days, trying to find a combination which would be reproducible. Very preliminary, I have impression that "exclusive" queue flag have something to do with it, but I will experiment with ttl too, as you suggested. I dont know if it is related, but I had a strange situation when there was an exclusive autodelete queue without a connection (connection reported "unknown") and it was impossible to delete it because "resource lock" error. Hopefully I'll be able to report more tomorrow. Thanks, Vadim. On Thu, Apr 26, 2012 at 3:31 PM, Matthew Sackman <[hidden email]> wrote: Hi Vadim, -- From RFC 2631: In ASN.1, EXPLICIT tagging is implicit unless IMPLICIT is explicitly specified _______________________________________________ rabbitmq-discuss mailing list [hidden email] https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss |
|
Ok, I got it again.
3 nodes. Stop slave, restart app(client), bring slave back online, stop master, restart app, bring master back online, BOOM! So here are logs from both nodes, the one which I shutdown (dev-2) and the one which failed unexpectedly (dev-3) dev-2 was the master node and when I stopped it, it brought dev-3 node down with it. Did everything started because of this? > Reason: reached_max_restart_intensity Vadim dev-3 log : =============================================== =INFO REPORT==== 26-Apr-2012::16:50:38 === Management agent started. =INFO REPORT==== 26-Apr-2012::16:50:38 === Management plugin started. Port: 55672, path: / =INFO REPORT==== 26-Apr-2012::16:51:07 === application: rabbitmq_management exited: shutdown type: permanent =INFO REPORT==== 26-Apr-2012::16:51:08 === stopped TCP Listener on 0.0.0.0:5672 =============================================== dev-3-sasl.log =============================================== =SUPERVISOR REPORT==== 26-Apr-2012::16:51:07 === Supervisor: {local,rabbit_mgmt_sup} Context: child_terminated Reason: shutdown Offender: [{pid,<0.396.0>}, {name,mirroring}, {mfa, {mirrored_supervisor,start_internal, [rabbit_mgmt_sup, [{rabbit_mgmt_db, {rabbit_mgmt_db,start_link,[]}, permanent,4294967295,worker, [rabbit_mgmt_db]}]]}}, {restart_type,permanent}, {shutdown,4294967295}, {child_type,worker}] =SUPERVISOR REPORT==== 26-Apr-2012::16:51:07 === Supervisor: {local,rabbit_mgmt_sup} Context: shutdown Reason: reached_max_restart_intensity Offender: [{pid,<0.396.0>}, {name,mirroring}, {mfa, {mirrored_supervisor,start_internal, [rabbit_mgmt_sup, [{rabbit_mgmt_db, {rabbit_mgmt_db,start_link,[]}, permanent,4294967295,worker, [rabbit_mgmt_db]}]]}}, {restart_type,permanent}, {shutdown,4294967295}, {child_type,worker}] =============================================== dev-2 log =============================================== =INFO REPORT==== 26-Apr-2012::16:51:07 === Stopping Rabbit =INFO REPORT==== 26-Apr-2012::16:51:07 === application: rabbitmq_management_visualiser exited: stopped type: permanent =INFO REPORT==== 26-Apr-2012::16:51:07 === application: rabbitmq_management exited: stopped type: permanent =INFO REPORT==== 26-Apr-2012::16:51:07 === application: rabbitmq_management_agent exited: stopped type: permanent =INFO REPORT==== 26-Apr-2012::16:51:07 === stopped TCP Listener on 0.0.0.0:5672 =INFO REPORT==== 26-Apr-2012::16:51:08 === rabbit on node 'rabbit@rabbitmq-dev-3' down =INFO REPORT==== 26-Apr-2012::16:51:08 === Mirrored-queue (queue 'Advertiser-ONYX' in vhost '/'): Slave <'rabbit@rabbitmq-dev-2'.1.265.0> saw deaths of mirrors <'rabbit@rabbitmq-dev-3'.2.290.0> =INFO REPORT==== 26-Apr-2012::16:51:08 === Mirrored-queue (queue 'Search-JROSE6420' in vhost '/'): Slave <'rabbit@rabbitmq-dev-2'.1.276.0> saw deaths of mirrors <'rabbit@rabbitmq-dev-3'.2.328.0> =INFO REPORT==== 26-Apr-2012::16:51:08 === Mirrored-queue (queue 'Advertiser-DHOHMEYERGX280' in vhost '/'): Slave <'rabbit@rabbitmq-dev-2'.1.268.0> saw deaths of mirrors <'rabbit@rabbitmq-dev-3'.2.288.0> =INFO REPORT==== 26-Apr-2012::16:51:08 === Mirrored-queue (queue 'CustomerAdmin-DHOHMEYERGX280' in vhost '/'): Slave <'rabbit@rabbitmq-dev-2'.1.270.0> saw deaths of mirrors <'rabbit@rabbitmq-dev-3'.2.294.0> =INFO REPORT==== 26-Apr-2012::16:51:08 === Mirrored-queue (queue 'Tracking-JROSE6420' in vhost '/'): Slave <'rabbit@rabbitmq-dev-2'.1.250.0> saw deaths of mirrors <'rabbit@rabbitmq-dev-3'.2.247.0> =ERROR REPORT==== 26-Apr-2012::16:51:08 === ** Generic server rabbit_mirror_queue_slave_sup terminating ** Last message in was {'EXIT',<0.127.0>,shutdown} ** When Server state == {state, {local,rabbit_mirror_queue_slave_sup}, simple_one_for_one_terminate, [{child,undefined,rabbit_mirror_queue_slave, {rabbit_mirror_queue_slave,start_link,[]}, temporary,4294967295,worker, [rabbit_mirror_queue_slave]}], {dict,11,16,16,8,80,48, {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}, {{[],[], [[<0.265.0>, {amqqueue, {resource,<<"/">>,queue,<<"Advertiser-ONYX">>}, false,false,none, [{<<"x-ha-policy">>,longstr,<<"all">>}, {<<"x-message-ttl">>,signedint,1200000}], <3067.289.0>, [<3065.334.0>], all}]], [], [[undefined, {amqqueue, {resource,<<"/">>,queue, <<"Advertiser-heartbeat-DHOHMEYERGX280">>}, false,true,<3065.24765.0>, [{<<"x-ha-policy">>,longstr,<<"all">>}], <3065.24784.0>, [<3067.12370.2>], all}], [<0.6730.0>, {amqqueue, {resource,<<"/">>,queue, <<"Advertiser-heartbeat-DHOHMEYERGX280">>}, false,true,<3065.14349.1>, [{<<"x-ha-policy">>,longstr,<<"all">>}], <3065.14370.1>,[],all}]], [[<0.268.0>, {amqqueue, {resource,<<"/">>,queue, <<"Advertiser-DHOHMEYERGX280">>}, false,false,none, [{<<"x-ha-policy">>,longstr,<<"all">>}, {<<"x-message-ttl">>,signedint,1200000}], <3067.287.0>, [<3065.332.0>], all}]], [], [[<0.270.0>, {amqqueue, {resource,<<"/">>,queue, <<"CustomerAdmin-DHOHMEYERGX280">>}, false,false,none, [{<<"x-ha-policy">>,longstr,<<"all">>}, {<<"x-message-ttl">>,signedint,1200000}], <3067.295.0>, [<3065.338.0>], all}]], [], [[<0.272.0>, {amqqueue, {resource,<<"/">>,queue, <<"Sales-DHOHMEYERGX280">>}, false,false,none, [{<<"x-ha-policy">>,longstr,<<"all">>}, {<<"x-message-ttl">>,signedint,1200000}], <3067.317.0>, [<3065.356.0>], all}]], [[<0.250.0>, {amqqueue, {resource,<<"/">>,queue, <<"Tracking-JROSE6420">>}, false,false,none, [{<<"x-ha-policy">>,longstr,<<"all">>}, {<<"x-message-ttl">>,signedint,1200000}], <3067.247.0>, [<3065.249.0>], all}]], [[<0.274.0>, {amqqueue, {resource,<<"/">>,queue, <<"RelatedLinks-ROGERMOORE">>}, false,false,none, [{<<"x-ha-policy">>,longstr,<<"all">>}, {<<"x-message-ttl">>,signedint,1200000}], <3067.331.0>, [<3065.370.0>], all}]], [[<0.252.0>, {amqqueue, {resource,<<"ActivityLogging">>,queue, <<"vsw.activitylogging.conversion.pendingconversion">>}, true,false,none, [{<<"x-ha-policy">>,longstr,<<"all">>}], <3067.253.0>, [<3065.257.0>], all}]], [[<0.276.0>, {amqqueue, {resource,<<"/">>,queue,<<"Search-JROSE6420">>}, false,false,none, [{<<"x-ha-policy">>,longstr,<<"all">>}, {<<"x-message-ttl">>,signedint,1200000}], <3067.26259.1>, [<3065.4161.0>], all}]], [[<0.261.0>, {amqqueue, {resource,<<"/">>,queue, <<"RelatedLinks-ONYX">>}, false,false,none, [{<<"x-ha-policy">>,longstr,<<"all">>}, {<<"x-message-ttl">>,signedint,1200000}], <3067.277.0>, [<3065.322.0>], all}]], []}}}, 10,10,[],rabbit_mirror_queue_slave_sup,[]} ** Reason for termination == ** {badarg,[{erlang,unlink,[undefined]}, {supervisor2,'-terminate_simple_children/3-fun-0-',4}, {dict,fold_bucket,3}, {dict,fold_seg,4}, {dict,fold_segs,4}, {supervisor2,terminate_simple_children,3}, {supervisor2,terminate,2}, {gen_server,terminate,6}]} =INFO REPORT==== 26-Apr-2012::16:51:08 === Mirrored-queue (queue 'vsw.activitylogging.conversion.pendingconversion' in vhost 'ActivityLogging'): Slave <'rabbit@rabbitmq-dev-2'.1.252.0> saw deaths of mirrors <'rabbit@rabbitmq-dev-3'.2.253.0> =ERROR REPORT==== 26-Apr-2012::16:51:08 === ** Generic server <0.265.0> terminating ** Last message in was {'EXIT',<0.249.0>, {badarg, [{erlang,unlink,[undefined]}, {supervisor2, '-terminate_simple_children/3-fun-0-',4}, {dict,fold_bucket,3}, {dict,fold_seg,4}, {dict,fold_segs,4}, {supervisor2,terminate_simple_children,3}, {supervisor2,terminate,2}, {gen_server,terminate,6}]}} ** When Server state == {state, {amqqueue, {resource,<<"/">>,queue,<<"Advertiser-ONYX">>}, false,false,none, [{<<"x-ha-policy">>,longstr,<<"all">>}, {<<"x-message-ttl">>,signedint,1200000}], <3067.289.0>, [<3065.334.0>], all}, <0.267.0>,<3065.334.0>,rabbit_variable_queue, {vqstate, {0,{[],[]}}, {0,{[],[]}}, {delta,undefined,0,undefined}, {0,{[],[]}}, {0,{[],[]}}, 0, {0,nil}, undefined, {0,nil}, {qistate, "/var/lib/rabbitmq/mnesia/rabbit@rabbitmq-dev-2/queues/1ON03RTBZAE64B67H41L4ICDE", {{dict,0,16,16,8,80,48, {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}, {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[], []}}}, []}, undefined,0,262144, #Fun<rabbit_variable_queue.2.91259159>, {0,nil}}, {undefined, {client_msstate,msg_store_transient, <<228,99,153,84,177,114,138,215,76,30,0,48,128,77, 68,245>>, {dict,0,16,16,8,80,48, {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}, {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[], []}}}, {state,368709, "/var/lib/rabbitmq/mnesia/rabbit@rabbitmq-dev-2/msg_store_transient"}, rabbit_msg_store_ets_index, "/var/lib/rabbitmq/mnesia/rabbit@rabbitmq-dev-2/msg_store_transient", <0.239.0>,372806,364585,376903,381000}}, false,0,#Fun<rabbit_mirror_queue_slave.3.29941254>, 0,0,infinity,0,0,0,0,0, {rates, {{1335,483942,985021},0}, {{1335,483942,985021},0}, 0.0,0.0, {1335,484240,226498}}, {0,nil}, {0,nil}, {0,nil}, {0,nil}, 0,0, {rates, {{1335,483942,985021},0}, {{1335,483942,985021},0}, 0.0,0.0, {1335,484240,226498}}}, undefined,#Ref<0.0.9.158766>, {dict,0,16,16,8,80,48, {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}, {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}}, {dict,0,16,16,8,80,48, {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}, {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}}, 0, {dict,0,16,16,8,80,48, {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}, {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}}, {dict,0,16,16,8,80,48, {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}, {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}}, true} ** Reason for termination == ** {badarg,[{erlang,unlink,[undefined]}, {supervisor2,'-terminate_simple_children/3-fun-0-',4}, {dict,fold_bucket,3}, {dict,fold_seg,4}, {dict,fold_segs,4}, {supervisor2,terminate_simple_children,3}, {supervisor2,terminate,2}, {gen_server,terminate,6}]} =ERROR REPORT==== 26-Apr-2012::16:51:08 === ** Generic server <0.6730.0> terminating ** Last message in was {'EXIT',<0.249.0>, {badarg, [{erlang,unlink,[undefined]}, {supervisor2, '-terminate_simple_children/3-fun-0-',4}, {dict,fold_bucket,3}, {dict,fold_seg,4}, {dict,fold_segs,4}, {supervisor2,terminate_simple_children,3}, {supervisor2,terminate,2}, {gen_server,terminate,6}]}} ** When Server state == {state, {amqqueue, {resource,<<"/">>,queue, <<"Advertiser-heartbeat-DHOHMEYERGX280">>}, false,true,<3065.14349.1>, [{<<"x-ha-policy">>,longstr,<<"all">>}], <3065.14370.1>,[],all}, <0.6731.0>,<3065.14370.1>,rabbit_variable_queue, {vqstate, {0,{[],[]}}, {0,{[],[]}}, {delta,undefined,0,undefined}, {0,{[],[]}}, {0,{[],[]}}, 0, {0,nil}, undefined, {0,nil}, {qistate, "/var/lib/rabbitmq/mnesia/rabbit@rabbitmq-dev-2/queues/5R0GE9UEB8S0EMSJORBUKHBTE", {{dict,0,16,16,8,80,48, {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}, {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[], []}}}, []}, undefined,0,262144, #Fun<rabbit_variable_queue.2.91259159>, {0,nil}}, {undefined, {client_msstate,msg_store_transient, <<182,86,2,135,69,87,14,108,163,188,15,159,127,119, 55,241>>, {dict,0,16,16,8,80,48, {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}, {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[], []}}}, {state,368709, "/var/lib/rabbitmq/mnesia/rabbit@rabbitmq-dev-2/msg_store_transient"}, rabbit_msg_store_ets_index, "/var/lib/rabbitmq/mnesia/rabbit@rabbitmq-dev-2/msg_store_transient", <0.239.0>,372806,364585,376903,381000}}, false,0,#Fun<rabbit_mirror_queue_slave.3.29941254>, 0,0,infinity,0,0,0,0,0, {rates, {{1335,484239,805553},0}, {{1335,484239,805553},0}, 0.0,0.0, {1335,484255,991021}}, {0,nil}, {0,nil}, {0,nil}, {0,nil}, 0,0, {rates, {{1335,484239,805553},0}, {{1335,484239,805553},0}, 0.0,0.0, {1335,484255,991021}}}, undefined,#Ref<0.0.9.158602>, {dict,0,16,16,8,80,48, {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}, {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}}, {dict,0,16,16,8,80,48, {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}, {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}}, 0, {dict,0,16,16,8,80,48, {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}, {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}}, {dict,0,16,16,8,80,48, {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}, {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}}, true} ** Reason for termination == ** {badarg,[{erlang,unlink,[undefined]}, {supervisor2,'-terminate_simple_children/3-fun-0-',4}, {dict,fold_bucket,3}, {dict,fold_seg,4}, {dict,fold_segs,4}, {supervisor2,terminate_simple_children,3}, {supervisor2,terminate,2}, {gen_server,terminate,6}]} =ERROR REPORT==== 26-Apr-2012::16:51:08 === ** Generic server <0.252.0> terminating ** Last message in was {'EXIT',<0.249.0>,shutdown} ** When Server state == {state, {amqqueue, {resource,<<"ActivityLogging">>,queue, <<"vsw.activitylogging.conversion.pendingconversion">>}, true,false,none, [{<<"x-ha-policy">>,longstr,<<"all">>}], <3067.253.0>, [<3065.257.0>], all}, <0.253.0>,<3065.257.0>,rabbit_variable_queue, {vqstate, {0,{[],[]}}, {0,{[],[]}}, {delta,undefined,0,undefined}, {0,{[],[]}}, {0,{[],[]}}, 0, {0,nil}, undefined, {0,nil}, {qistate, "/var/lib/rabbitmq/mnesia/rabbit@rabbitmq-dev-2/queues/9KDJJKR11HTT40SVOPWG0AWDM", {{dict,0,16,16,8,80,48, {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}, {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[], []}}}, []}, undefined,0,262144, #Fun<rabbit_variable_queue.2.91259159>, {0,nil}}, {{client_msstate,msg_store_persistent, <<58,135,182,192,228,241,29,217,255,38,4,53,94,213, 95,102>>, {dict,0,16,16,8,80,48, {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}, {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[], []}}}, {state,389194, "/var/lib/rabbitmq/mnesia/rabbit@rabbitmq-dev-2/msg_store_persistent"}, rabbit_msg_store_ets_index, "/var/lib/rabbitmq/mnesia/rabbit@rabbitmq-dev-2/msg_store_persistent", <0.244.0>,393291,385097,397388,401485}, {client_msstate,msg_store_transient, <<63,170,201,248,36,125,208,20,133,142,139,71,219, 91,212,33>>, {dict,0,16,16,8,80,48, {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}, {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[], []}}}, {state,368709, "/var/lib/rabbitmq/mnesia/rabbit@rabbitmq-dev-2/msg_store_transient"}, rabbit_msg_store_ets_index, "/var/lib/rabbitmq/mnesia/rabbit@rabbitmq-dev-2/msg_store_transient", <0.239.0>,372806,364585,376903,381000}}, true,0,#Fun<rabbit_mirror_queue_slave.3.29941254>,0, 0,infinity,0,0,0,0,0, {rates, {{1335,483943,551021},0}, {{1335,483943,551021},0}, 0.0,0.0, {1335,484240,51118}}, {0,nil}, {0,nil}, {0,nil}, {0,nil}, 0,0, {rates, {{1335,483943,551021},0}, {{1335,483943,551021},0}, 0.0,0.0, {1335,484240,51118}}}, undefined,#Ref<0.0.9.158618>, {dict,0,16,16,8,80,48, {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}, {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}}, {dict,0,16,16,8,80,48, {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}, {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}}, 0, {dict,0,16,16,8,80,48, {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}, {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}}, {dict,0,16,16,8,80,48, {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}, {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}}, true} ** Reason for termination == ** {shutdown, {gen_server2,call, [msg_store_persistent, {client_terminate, <<58,135,182,192,228,241,29,217,255,38,4,53,94,213,95,102>>}, infinity]}} ** In 'terminate' callback with reason == ** shutdown =INFO REPORT==== 26-Apr-2012::16:51:08 === application: rabbit exited: stopped type: permanent =INFO REPORT==== 26-Apr-2012::16:51:08 === application: os_mon exited: stopped type: permanent =INFO REPORT==== 26-Apr-2012::16:51:08 === application: mnesia exited: stopped type: permanent =INFO REPORT==== 26-Apr-2012::16:51:08 === Halting Erlang VM =============================================== dev-2-sasl.log =============================================== =CRASH REPORT==== 26-Apr-2012::16:51:08 === crasher: initial call: supervisor2:init/1 pid: <0.249.0> registered_name: rabbit_mirror_queue_slave_sup exception exit: {badarg, [{erlang,unlink,[undefined]}, {supervisor2,'-terminate_simple_children/3-fun-0-',4}, {dict,fold_bucket,3}, {dict,fold_seg,4}, {dict,fold_segs,4}, {supervisor2,terminate_simple_children,3}, {supervisor2,terminate,2}, {gen_server,terminate,6}]} in function gen_server:terminate/6 ancestors: [rabbit_sup,<0.126.0>] messages: [{'DOWN',#Ref<0.0.9.159703>,process, {undefined,'rabbit@rabbitmq-dev-2'}, noproc}] links: [<0.265.0>,<0.6730.0>] dictionary: [] trap_exit: true status: running heap_size: 987 stack_size: 24 reductions: 5064 neighbours: =SUPERVISOR REPORT==== 26-Apr-2012::16:51:08 === Supervisor: {local,rabbit_sup} Context: shutdown_error Reason: {badarg, [{erlang,unlink,[undefined]}, {supervisor2,'-terminate_simple_children/3-fun-0-',4}, {dict,fold_bucket,3}, {dict,fold_seg,4}, {dict,fold_segs,4}, {supervisor2,terminate_simple_children,3}, {supervisor2,terminate,2}, {gen_server,terminate,6}]} Offender: [{pid,<0.249.0>}, {name,rabbit_mirror_queue_slave_sup}, {mfargs,{rabbit_mirror_queue_slave_sup,start_link,[]}}, {restart_type,transient}, {shutdown,infinity}, {child_type,supervisor}] =CRASH REPORT==== 26-Apr-2012::16:51:08 === crasher: initial call: gen:init_it/6 pid: <0.265.0> registered_name: [] exception exit: {badarg, [{erlang,unlink,[undefined]}, {supervisor2,'-terminate_simple_children/3-fun-0-',4}, {dict,fold_bucket,3}, {dict,fold_seg,4}, {dict,fold_segs,4}, {supervisor2,terminate_simple_children,3}, {supervisor2,terminate,2}, {gen_server,terminate,6}]} in function gen_server2:terminate/3 ancestors: [rabbit_mirror_queue_slave_sup,rabbit_sup,<0.126.0>] messages: [{'EXIT',<0.267.0>,normal}, {'DOWN',#Ref<0.0.3.44831>,process,<3065.334.0>,normal}] links: [] dictionary: [{{xtype_to_module,direct},rabbit_exchange_type_direct}, {{xtype_to_module,topic},rabbit_exchange_type_topic}, {guid,{{3831732564,2977073879,1277034544,2152547573},0}}] trap_exit: true status: running heap_size: 2584 stack_size: 24 reductions: 4916 neighbours: =CRASH REPORT==== 26-Apr-2012::16:51:08 === crasher: initial call: gen:init_it/6 pid: <0.6730.0> registered_name: [] exception exit: {badarg, [{erlang,unlink,[undefined]}, {supervisor2,'-terminate_simple_children/3-fun-0-',4}, {dict,fold_bucket,3}, {dict,fold_seg,4}, {dict,fold_segs,4}, {supervisor2,terminate_simple_children,3}, {supervisor2,terminate,2}, {gen_server,terminate,6}]} in function gen_server2:terminate/3 ancestors: [rabbit_mirror_queue_slave_sup,rabbit_sup,<0.126.0>] messages: [{'EXIT',<0.6731.0>,normal}, {'DOWN',#Ref<0.0.4.191216>,process,<3065.14370.1>,normal}] links: [] dictionary: [{credit_blocked,[]}, {{xtype_to_module,direct},rabbit_exchange_type_direct}, {{xtype_to_module,fanout},rabbit_exchange_type_fanout}, {guid,{{3059090055,1163333228,2747010975,2138519537},0}}] trap_exit: true status: running heap_size: 1597 stack_size: 24 reductions: 8290 neighbours: =============================================== _______________________________________________ rabbitmq-discuss mailing list [hidden email] https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss |
|
Vadim,
On 27/04/12 01:31, Vadim Chekan wrote: > Ok, I got it again. This does look like a different bug, and one that, for a change, is easily fixed. It is possible that this may either be triggered by or be the trigger of the previous bug. Is there nothing in the logs this time about "Discarding message" or "gm_deaths"? Matthias. _______________________________________________ rabbitmq-discuss mailing list [hidden email] https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss |
|
Vadim,
(putting the list back on cc) On 27/04/12 23:28, Vadim Chekan wrote: > I've spent some time today playing with different client settings. Seems > ttl does not affect failures at all. That's good to know. The more factors we can eliminate as possible causes the better. > I managed to reproduce crash many times today. Basic idea is: an > application, 40 threads create a pub/sub exchange and publish a message > every second. Under this load I bring the master node (all queues are > usually created on the same node) and often it causes another node to > fail. Here is my load simulator (in c#): > http://www.heypasteit.com/clip/0B5W That code connects to "rabbitmq-dev". Is that a load balancer sitting in front of the three nodes? How do you deal with the disconnects resulting from the shutting down of nodes? There doesn't seem to be any code to handle that. Mind you, I suspect that the failure should still be reproducible without any subscriptions and publishes. Would be good to try that and just watch the broker logs for errors. > As long as maillist does not allow zip attaches, I'm mailing to you guys > directly Thanks for posting these. There is an error in the logs that we haven't seen before: {{badmatch,[]}, [{rabbit_mirror_queue_misc,'-remove_from_queue/2-fun-0-',2}, {mnesia_tm,apply_fun,3}, {mnesia_tm,execute_transaction,5}, {rabbit_misc,'-execute_mnesia_transaction/1-fun-0-',1}, {worker_pool_worker,handle_call,3}, {gen_server2,handle_msg,2}, {proc_lib,wake_up,3}]} Looking at the code, this appears to indicate that there are no master/mirror processes left for the queue. Which is...unexpected. That should give us something to go on. Regards, Matthias. _______________________________________________ rabbitmq-discuss mailing list [hidden email] https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss |
|
Thanks for looking into it Matthias,
"rabbitmq-dev": yes, it is load balancer. Disconnects: I was too lazy to write code for it and just restarted client manually. I did make a short try to reproduce failure without any load but it worked just fine. I'll try more today. Vadim. On Sat, Apr 28, 2012 at 12:13 AM, Matthias Radestock <[hidden email]> wrote: Vadim, -- From RFC 2631: In ASN.1, EXPLICIT tagging is implicit unless IMPLICIT is explicitly specified _______________________________________________ rabbitmq-discuss mailing list [hidden email] https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss |
| Powered by Nabble | Edit this page |
