rabbitmqctl start_app hangs when replacing mirrored cluster instances in EC2

Mike Zraly
[I tried posting this to the new group, rabbitmq-users, but got no response.  Google groups tells me rabbitmq-users only has 101 members now, compared to 1800 or so for rabbitmq-discuss, so I hope re-posting to the larger group will at least elicit some (non-meta) feedback.]

Hi all,

I am setting up a RabbitMQ cluster in an Amazon EC2 region.  Each host is in the same geographical region, so I do not expect network partitions in the sense that two members of the cluster are both running but cannot communicate with each other.  However, it is reasonable to expect individual cluster hosts to be terminated and replaced with new hosts having the same hostname but a new IP address and a fresh install of RabbitMQ.  A typical use case for this is a rolling upgrade where we keep 2 of the 3 cluster nodes up at all times to continue providing service during the upgrade period.

What I hope is that the same post-install provisioning script that joins a newly created instance into the cluster will work for the new instance that is taking over for an older one.  What I am seeing instead is that rabbitmqctl start_app hangs.

The installation sequence is basically this:

install rabbitmq-server_3.3.1-1
enable management plugin
add health check user account with monitoring tag
add application user account
add HA policy '{"ha-mode": "all", "ha-sync-mode": "automatic"}' for all application queues
service rabbitmq-server stop
set /var/lib/rabbitmq/.erlang.cookie
reboot system (restarting rabbitmq server)
for each hostname 'target' that this host should join into a cluster with:
    if target is listening on port 5672
        rabbitmqctl stop_app
        if rabbitmqctl join_cluster target has non-zero exit status
            rabbitmqctl start_app

What I see if I start a cluster with hosts A, B, and C, then terminate instance C and replace it with a new instance that executes these same steps, is that rabbitmqctl join_cluster succeeds saying C is already part of the cluster, then rabbitmqctl start_app hangs.

What am I doing wrong?


_______________________________________________
rabbitmq-discuss mailing list has moved to https://groups.google.com/forum/#!forum/rabbitmq-users,
please subscribe to the new list!

[hidden email]
https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss

Re: rabbitmqctl start_app hangs when replacing mirrored cluster instances in EC2

mc717990
If this is what I've seen before, the cluster thinks C is node XYZ.  But you're trying to tell the cluster that C is really your new host YZX.  You need to remove the old node from the cluster before you can add your new node as a replacement for C.  Your new node tries to start up and thinks it should be part of the cluster because you just tried to join it, but the cluster refuses to accept the new node, so it seems to hang.  I could be completely wrong on this, though.

As I recall, there was a rabbitmqctl command to completely remove a node from the cluster, though I don't recall what it is offhand.  You could try doing that first and then adding your node?

Jason


On Mon, Jul 7, 2014 at 6:30 AM, Mike Zraly <[hidden email]> wrote:
> [...]

--
Jason McIntosh
https://github.com/jasonmcintosh/
573-424-7612
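
The suggestion above (drop the cluster's stale record of C, then join the replacement as a new member) might be sketched in shell as follows. rabbitmqctl does have a forget_cluster_node command, though whether it is the one Jason had in mind is an assumption, as is the exact ordering. The node names rabbit@A and rabbit@C, the function names, and the RABBITMQCTL override variable are hypothetical and only for illustration.

```shell
#!/bin/sh
# Sketch only, not a tested provisioning script.
# RABBITMQCTL may be overridden (e.g. "echo rabbitmqctl") for a dry run.
RABBITMQCTL="${RABBITMQCTL:-rabbitmqctl}"

# forget_stale_node SURVIVOR STALE
# Ask a surviving, running member (SURVIVOR) to drop its record of STALE.
# forget_cluster_node expects the node being removed to be offline, so this
# is assumed to run before the replacement node is started.
forget_stale_node() {
    $RABBITMQCTL -n "$1" forget_cluster_node "$2"
}

# join_as_new_node SURVIVOR
# Run on the replacement host: join the cluster as a brand-new member.
# Each step runs only if the previous one succeeded.
join_as_new_node() {
    $RABBITMQCTL stop_app &&
    $RABBITMQCTL join_cluster "$1" &&
    $RABBITMQCTL start_app
}
```

With these definitions loaded, a replacement for C would call "forget_stale_node rabbit@A rabbit@C" and then "join_as_new_node rabbit@A". Setting RABBITMQCTL="echo rabbitmqctl" first prints the commands instead of running them.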


Re: rabbitmqctl start_app hangs when replacing mirrored cluster instances in EC2

Simon MacMullen-2
Correct.

See my answer in the new group here :-)

https://groups.google.com/forum/#!topic/rabbitmq-users/QoHbNrK_Zg4

Cheers, Simon

On 10/07/2014 3:39PM, Jason McIntosh wrote:

> [...]

--
Simon MacMullen
RabbitMQ, Pivotal