Clustering - just can't get it going

Derek Wyatt
Hi,

I've seen a number of people failing to get clustering running and, unfortunately, I can't get it going either.  Here's the summary of what I've got:
  • Two nodes - RMQ1 and RMQ2
  • I can ping RMQ1 from RMQ2, and vice versa
  • I can telnet from RMQ1 to RMQ2:epmd, and vice versa
  • I can telnet from RMQ1 to RMQ2:amqp, and vice versa
  • The cookie file is identical, as is clear from the startup INFO
My goal is to have RMQ2 join RMQ1 in a cluster.

The servers are started using the init script in Ubuntu (i.e. service rabbitmq-server start).  This differs from the guide at http://www.rabbitmq.com/clustering.html, which says to start with "rabbitmq-server -detached".  I've tried that and it doesn't seem to make any difference, so I always use the init script instead.

So, the guide says to stop the app on RMQ2 and then join the cluster.  The following transcript shows how well all this goes:

02:~$ sudo rabbitmqctl stop_app
Stopping node 'rabbit@RMQ2' ...
...done.

02:~$ sudo rabbitmqctl join_cluster --ram rabbit@RMQ1
Clustering node 'rabbit@RMQ2' with 'rabbit@RMQ1' ...
Error: {cannot_discover_cluster,"The nodes provided are either offline or not running"}

However, as I said above, telnetting to the ports works just fine:

02:~$ telnet RMQ1 epmd
Trying <ip address>...
Connected to RMQ1
Escape character is '^]'.
booger!
Connection closed by foreign host.

02:~$ telnet RMQ1 amqp
Trying <ip address>...
Connected to RMQ1
Escape character is '^]'.
booger!
AMQP Connection closed by foreign host.

I'm stuck for what else to test.  Does anyone know how to troubleshoot this thing further?

Thanks,
Derek


_______________________________________________
rabbitmq-discuss mailing list
[hidden email]
https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss

Re: Clustering - just can't get it going

mc717990
Check your Erlang cookie on both servers to make sure it matches; I think it's in /var/lib/rabbitmq/. Then you can use rabbitmqctl from one machine and see if you can connect to the other to list queues.  I THINK that's rabbitmqctl -n <servernode> list_queues, for example.  If both servers can talk to each other, then it should be rabbitmqctl stop_app, join_cluster, start_app.

Jason


--
Jason McIntosh
http://mcintosh.poetshome.com/blog/
573-424-7612
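
The sequence suggested in this reply (check that the remote node answers, then stop_app, join_cluster, start_app) could be sketched as a small shell function. This is only a sketch under assumptions: the function name join_rabbit_cluster and the RABBITMQCTL override variable are made up for illustration, and the --ram flag from the original transcript is left out.

```shell
#!/bin/sh
# Hypothetical helper, not part of RabbitMQ. Run on the node that should join
# (RMQ2 in this thread). RABBITMQCTL is an illustrative override; set it to
# "sudo rabbitmqctl" if sudo is needed, as in the transcripts above.
RABBITMQCTL="${RABBITMQCTL:-rabbitmqctl}"

join_rabbit_cluster() {
    target="$1"    # e.g. rabbit@RMQ1

    # 1. Confirm the remote node answers before touching the local node.
    if ! $RABBITMQCTL -n "$target" list_queues >/dev/null; then
        echo "cannot reach $target; check cookie and name resolution" >&2
        return 1
    fi

    # 2. Stop the RabbitMQ application on the local node.
    $RABBITMQCTL stop_app || return 1

    # 3. Join the target's cluster, then start the application again.
    $RABBITMQCTL join_cluster "$target" || return 1
    $RABBITMQCTL start_app
}
```

It would be called as join_rabbit_cluster rabbit@RMQ1. Stopping at the first failing step keeps the local node untouched when the remote node cannot be reached at all, which is the situation described in this thread.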

Re: Clustering - just can't get it going

Derek Wyatt
Thanks Jason.  The cookies are the same.  Running anything from rabbitmqctl to the remote host fails, unfortunately.  The list_queues call fails to connect, the same as join_cluster.

As I indicated, the telnets all work just fine, so I'm stuck for what to diagnose next.  Any ideas would be great.


Re: Clustering - just can't get it going

Simon MacMullen-2
Are other ports firewalled? If so please read
http://www.rabbitmq.com/clustering.html#firewall

Cheers, Simon

--
Simon MacMullen
RabbitMQ, Pivotal
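
Telnetting to epmd and amqp only covers two ports. The node-to-node (Erlang distribution) port is a separate one that epmd hands out. A minimal sketch for finding it, assuming `epmd -names` prints lines of the form "name <node> at port <port>" (the helper name dist_port and the sample port in the comment are illustrative only):

```shell
#!/bin/sh
# Hypothetical helper: print the distribution port that epmd reports for a
# node. Reads `epmd -names` style output on stdin, for example:
#   epmd: up and running on port 4369 with data:
#   name rabbit at port 45123
dist_port() {
    awk -v node="$1" '
        $1 == "name" && $2 == node && $3 == "at" && $4 == "port" { print $5 }
    '
}
```

On RMQ1 this could be run as epmd -names | dist_port rabbit, and the printed port then tested from RMQ2 with telnet in the same way as the epmd and amqp ports above.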

Re: Clustering - just can't get it going

Derek Wyatt
In reply to this post by mc717990
Ah, I do have more information though:

DIAGNOSTICS
===========

nodes in question: ['RMQ1']

hosts, their running nodes and ports:
- unable to connect to epmd on RMQ1: nxdomain (non-existing domain)

current node details:
- node name: 'rabbitmqctl1577@RMQ2'
- home dir: /var/lib/rabbitmq
- cookie hash: ohQKEF09peb6bAgNqawvKA==

And just to be clear, the cookie is the same:

01:~$ sudo md5sum /var/lib/rabbitmq/.erlang.cookie 
a2140a105d3da5e6fa6c080da9ac2f28  /var/lib/rabbitmq/.erlang.cookie
02:~$ sudo md5sum /var/lib/rabbitmq/.erlang.cookie
a2140a105d3da5e6fa6c080da9ac2f28  /var/lib/rabbitmq/.erlang.cookie

Somehow, telnet to epmd works just fine, but whatever RMQ does to reach epmd fails.  Is it doing some sort of DNS lookup instead of using the hosts file?

For example, one thing I found is that nslookup fails:

02:~$ nslookup RMQ1
;; Got SERVFAIL reply from <ipaddress>, trying next server
Server: <ipaddress>
Address: <ipaddress>

** server can't find RMQ1: SERVFAIL

But if I ping RMQ1 it works fine.  /etc/nsswitch.conf specifies that, for hosts, files should be tried first, before DNS.

So, it looks like RMQ is doing something more rigorous to resolve the host, and I don't know how to change that.  I also don't have access to the DNS server configuration in order to modify it in any way.
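
One way to narrow this down is to ask each lookup path separately and compare the answers. The following is a rough sketch, not a definitive diagnostic: the helper names report and check_resolution are made up, it assumes nslookup exits non-zero when a lookup fails, and the erl line only shows what the local Erlang VM's default resolver returns via inet:gethostbyname/1.

```shell
#!/bin/sh
# Hypothetical helpers for comparing name-resolution paths on one machine.

# Run a command quietly and print "<label>: ok", "FAILED" or "not installed".
report() {
    label="$1"; shift
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "$label: not installed"
    elif "$@" >/dev/null 2>&1; then
        echo "$label: ok"
    else
        echo "$label: FAILED"
    fi
}

check_resolution() {
    host="$1"
    # getent follows /etc/nsswitch.conf, so it should see /etc/hosts entries.
    report "getent" getent hosts "$host"
    # nslookup queries the DNS server directly and ignores /etc/hosts.
    report "nslookup" nslookup "$host"
    # Ask the Erlang VM itself; exit status 0 only if the lookup succeeds.
    report "erl" erl -noshell -eval \
        "case inet:gethostbyname(\"$host\") of {ok,_} -> halt(0); _ -> halt(1) end."
}
```

Running check_resolution RMQ1 on RMQ2 prints one line per tool. If getent reports ok while erl reports FAILED, that would point at how Erlang resolves names on that machine rather than at the network.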



Re: Clustering - just can't get it going

Derek Wyatt
In reply to this post by Simon MacMullen-2
I forgot to mention that, sorry.  Everything is wide open.

01:~$ sudo iptables --list
Chain INPUT (policy ACCEPT)
target     prot opt source               destination         

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination         

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination       


02:~$ sudo iptables --list
Chain INPUT (policy ACCEPT)
target     prot opt source               destination         

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination         

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination       






Re: Clustering - just can't get it going

Robin Lawrie - HostelBookers
In reply to this post by Derek Wyatt

Hi,

In my case, I have a 2-node cluster (called cache1 and cache2) and I needed to add an entry to the hosts file on both nodes to ensure each node can resolve the name of the other node before clustering worked for me.

My hosts file is in /etc and is called hosts.

In there I entered the following:

On cache1.lon.hosting, enter the line 192.168.3.1 Cache2.domain.com Cache2
On cache2.lon.hosting, enter the line 192.168.3.0 Cache1.domain.com Cache1

Once done, I needed to confirm I could ping each node using its hostname from the other node. I don’t care about DNS or nslookup working/resolving the name.

HTH

Robin
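
To double-check that a hosts file really contains the entry described above, the file can be searched directly instead of relying on ping. A minimal sketch, assuming a standard hosts layout (address first, then names, with # comments); the function name hosts_ip is made up, and the case-insensitive match is a choice of this sketch:

```shell
#!/bin/sh
# Hypothetical helper: print the address a hosts file maps a name to.
# Usage: hosts_ip FILE NAME
hosts_ip() {
    awk -v want="$2" '
        { sub(/#.*/, "") }                 # strip comments before matching
        NF >= 2 {
            # field 1 is the address; every later field is a name or alias
            for (i = 2; i <= NF; i++)
                if (tolower($i) == tolower(want)) { print $1; exit }
        }
    ' "$1"
}
```

For example, hosts_ip /etc/hosts Cache2 on cache1 should print 192.168.3.1 if the line above is present. Empty output means the name is not in the file at all.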

 

Re: Clustering - just can't get it going

Derek Wyatt
Damn. That's exactly the setup I have.


Re: Clustering - just can't get it going

Robin Lawrie - HostelBookers

The other issue I had in creating a cluster was that it wouldn't form if the web management plugin was running/installed on both nodes.

I needed to enter the following commands on each node:

rabbitmq-plugins disable rabbitmq_management
service rabbitmq-server restart

Regards

Robin

 

From: [hidden email] [mailto:[hidden email]] On Behalf Of Derek Wyatt
Sent: 25 September 2013 16:04
To: Discussions about RabbitMQ
Subject: Re: [rabbitmq-discuss] Clustering - just can't get it going

 

Damn. That's exactly the setup I have.

On 25 September 2013 10:55, Robin Lawrie - HostelBookers <[hidden email]> wrote:

Hi,

In my case, I have a 2-node cluster (the nodes are called cache1 and cache2), and I needed to add an entry to the hosts file on both nodes so that each node could resolve the name of the other before clustering worked for me.

My hosts file is /etc/hosts.

In there I entered the following:

On cache1.lon.hosting, enter the line 192.168.3.1 Cache2.domain.com Cache2
On cache2.lon.hosting, enter the line 192.168.3.0 Cache1.domain.com Cache1

Once done, I needed to confirm I could ping each node using its hostname from the other node. I don't care about DNS or nslookup working/resolving the name.
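
A quick way to confirm that such an entry is really in the hosts file is to look the name up in the file directly. This is just a sketch: hosts_lookup is a made-up helper, and the case-insensitive match is my assumption about how the system treats host names.

```shell
#!/bin/sh
# Print the address that a hosts file maps NAME to; fail if there is none.
# Usage: hosts_lookup NAME [FILE]   (FILE defaults to /etc/hosts)
# hosts_lookup is a hypothetical helper, not part of RabbitMQ.
hosts_lookup() {
    name="$1"
    file="${2:-/etc/hosts}"
    awk -v want="$name" '
        { sub(/#.*/, "") }                 # drop comments
        {
            # Field 1 is the address; fields 2..NF are names and aliases.
            for (i = 2; i <= NF; i++)
                if (tolower($i) == tolower(want)) { print $1; found = 1; exit }
        }
        END { exit (found ? 0 : 1) }
    ' "$file"
}
```

For example, hosts_lookup Cache2 on cache1 should print 192.168.3.1 if the line quoted above is in place, and return a non-zero status if it is missing.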

 

HTH

 

Robin

 

From: [hidden email] [mailto:[hidden email]] On Behalf Of Derek Wyatt
Sent: 25 September 2013 15:47
To: Discussions about RabbitMQ
Subject: Re: [rabbitmq-discuss] Clustering - just can't get it going

 

Ah, I do have more information though:

DIAGNOSTICS
===========

nodes in question: ['RMQ1']

hosts, their running nodes and ports:
- unable to connect to epmd on RMQ1: nxdomain (non-existing domain)

current node details:
- node name: 'rabbitmqctl1577@RMQ2'
- home dir: /var/lib/rabbitmq
- cookie hash: ohQKEF09peb6bAgNqawvKA==

And just to be clear, the cookie is the same:

01:~$ sudo md5sum /var/lib/rabbitmq/.erlang.cookie
a2140a105d3da5e6fa6c080da9ac2f28  /var/lib/rabbitmq/.erlang.cookie
02:~$ sudo md5sum /var/lib/rabbitmq/.erlang.cookie
a2140a105d3da5e6fa6c080da9ac2f28  /var/lib/rabbitmq/.erlang.cookie
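
As a side note, the "cookie hash" in the diagnostics and the md5sum output in this post appear to be the same 16 bytes, once base64-encoded and once hex-encoded. A small sketch to convert one into the other (hex_to_base64 is a made-up helper; it assumes the base64 tool from coreutils is installed):

```shell
#!/bin/sh
# Convert a hex string (as printed by md5sum) to base64, so it can be
# compared with the "cookie hash" line in the rabbitmqctl diagnostics.
# hex_to_base64 is a hypothetical helper name.
hex_to_base64() {
    hex="$1"
    while [ -n "$hex" ]; do
        pair=$(printf '%s' "$hex" | cut -c1-2)    # next two hex digits
        hex=$(printf '%s' "$hex" | cut -c3-)      # remainder of the string
        # Emit the byte: %b expands the \0NNN octal escape.
        printf '%b' "\\0$(printf '%03o' "0x$pair")"
    done | base64
}
```

With the value from the transcript, hex_to_base64 a2140a105d3da5e6fa6c080da9ac2f28 gives ohQKEF09peb6bAgNqawvKA==, which matches the diagnostics output. So the cookie really is identical from the tool's point of view, and the problem has to be elsewhere.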

 

Somehow, telnet to epmd works just fine, but whatever RMQ does to connect fails.  Is it doing some sort of DNS lookup of its own, instead of using the hosts file?

For example, one thing I found is that nslookup fails:

02:~$ nslookup RMQ1
;; Got SERVFAIL reply from <ipaddress>, trying next server
Server:       <ipaddress>
Address:  <ipaddress>

** server can't find RMQ1: SERVFAIL

But if I ping RMQ1 it works fine.  /etc/nsswitch.conf specifies that, for hosts, files should be tried before DNS.

So, it looks like RMQ is doing something more rigorous to resolve the host, and I don't know how to change that.  I also don't have access to the DNS server configuration, so I can't modify it in any way.
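
One thing that may explain the ping/nslookup difference: nslookup queries the DNS servers directly and never reads /etc/hosts, while ping goes through the system resolver, which follows the hosts: line in /etc/nsswitch.conf. A sketch for printing that lookup order (hosts_order is a made-up helper name):

```shell
#!/bin/sh
# Print the lookup sources from the "hosts:" line of nsswitch.conf,
# e.g. "files dns" means /etc/hosts is consulted before DNS.
# Usage: hosts_order [FILE]   (FILE defaults to /etc/nsswitch.conf)
# hosts_order is a hypothetical helper name.
hosts_order() {
    file="${1:-/etc/nsswitch.conf}"
    # Strip the "hosts:" key and surrounding whitespace, keep the value.
    sed -n 's/^[[:space:]]*hosts:[[:space:]]*//p' "$file" | head -n 1
}
```

To test resolution the same way ping does, getent hosts RMQ1 is a better check than nslookup. To see what Erlang itself resolves, something like erl -noshell -eval 'io:format("~p~n", [inet:gethostbyname("RMQ1")]), halt().' should show it, although I have not confirmed that this is the exact lookup rabbitmqctl performs.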

 

 


Re: Clustering - just can't get it going

Derek Wyatt
That was something new to try - thanks - but didn't help :)

I'm looking into this nxdomain (non-existing domain) issue - it's gotta have something to do with that.  Seems odd though... the hosts file is just fine.



Re: Clustering - just can't get it going

Simon MacMullen-2
In reply to this post by Derek Wyatt
You may (depending on Erlang version) need to make sure that each
machine can resolve its own hostname too, as well as the other one.
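
A rough way to check both names at once on each machine is sketched here. check_names and the LOOKUP variable are made-up names; the default lookup uses getent hosts, which follows nsswitch.conf and therefore sees /etc/hosts.

```shell
#!/bin/sh
# Command used to resolve a name. "getent hosts" goes through the normal
# system lookup path (nsswitch.conf), unlike nslookup.
LOOKUP="${LOOKUP:-getent hosts}"

# Print "ok NAME" or "FAIL NAME" for each argument.
# Returns non-zero if any name failed to resolve.
# check_names is a hypothetical helper name.
check_names() {
    rc=0
    for name in "$@"; do
        if $LOOKUP "$name" >/dev/null 2>&1; then
            echo "ok $name"
        else
            echo "FAIL $name"
            rc=1
        fi
    done
    return $rc
}
```

On RMQ2 you would run check_names "$(hostname -s)" RMQ1, and on RMQ1 the same with RMQ2. Any FAIL line points at a name that machine cannot resolve.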

Cheers, Simon



--
Simon MacMullen
RabbitMQ, Pivotal

Re: Clustering - just can't get it going

Simon MacMullen-2
In reply to this post by Robin Lawrie - HostelBookers
On 25/09/13 16:08, Robin Lawrie - HostelBookers wrote:
> The other issue I had in creating a cluster was that it wouldn’t create
> if the web management plugin was running/installed on both nodes.

Uh, really?

If you try to create a cluster on a single machine then you need to
configure management to open a different port for each node. But
clusters where every node runs management are definitely supported (and
indeed rather common). What error message did you see?
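
For the single-machine case, a sketch of what "a different port for each node" can look like when starting a second node by hand. node_cmd is a made-up helper that only prints the command line; the node name and port numbers in the example are arbitrary, and the RABBITMQ_SERVER_START_ARGS form is how I remember it from the clustering guide, so check it against your version.

```shell
#!/bin/sh
# Print (do not run) a start command for an extra node on the same machine.
# Usage: node_cmd NODENAME AMQP_PORT MGMT_PORT
# node_cmd is a hypothetical helper name.
node_cmd() {
    name="$1"
    amqp_port="$2"
    mgmt_port="$3"
    # Each node gets its own AMQP listener and its own management listener.
    printf 'RABBITMQ_NODENAME=%s RABBITMQ_NODE_PORT=%s RABBITMQ_SERVER_START_ARGS="-rabbitmq_management listener [{port,%s}]" rabbitmq-server -detached\n' \
        "$name" "$amqp_port" "$mgmt_port"
}
```

For example, node_cmd hare 5673 15673 prints a command for a second node called hare that stays clear of the default 5672 and 15672 ports. None of this is needed when the nodes are on separate machines, as in this thread.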

Cheers, Simon


Re: Clustering - just can't get it going

Robin Lawrie - HostelBookers
Yep, really. I can't recall the error message, though. After the cluster was created, the web management plugin was enabled again.


Re: Clustering - just can't get it going

Derek Wyatt
In reply to this post by Simon MacMullen-2
This is rabbitmq 3.1.5 - I'm not sure what the erlang version is, but the erts version is 5.8.5.  I'm a little new to the whole erlang thing, so I just picked a component with 'e' in it :P
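
As far as I know, erts 5.8.5 corresponds to Erlang/OTP R14B04. A sketch for asking the runtime directly (otp_release is a made-up helper name; it assumes erl is on the PATH and reports an error if it is not):

```shell
#!/bin/sh
# Print the Erlang/OTP release and the erts version of the installed runtime.
# otp_release is a hypothetical helper name.
otp_release() {
    if command -v erl >/dev/null 2>&1; then
        # system_info(otp_release) is the OTP release string,
        # system_info(version) is the erts version.
        erl -noshell -eval 'io:format("OTP ~s, erts ~s~n", [erlang:system_info(otp_release), erlang:system_info(version)]), halt().'
    else
        echo "erl not found on PATH" >&2
        return 127
    fi
}
```

Running otp_release on each node also shows quickly whether both machines have the same Erlang version.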

I'm planning to get these machines into the DNS proper to see if that helps.  It would be pretty weird if it works, since everyone else works OK with just /etc/hosts resolution, but it's worth a try.


On 25 September 2013 11:16, Simon MacMullen <[hidden email]> wrote:
You may (depending on Erlang version) need to make sure that each machine can resolve its own hostname too, as well as the other one.

Cheers, Simon

On 25/09/13 16:04, Derek Wyatt wrote:
Damn. That's exactly the setup I have.


On 25 September 2013 10:55, Robin Lawrie - HostelBookers
<[hidden email] <mailto:[hidden email]>>
wrote:

    Hi,

    In my case, I have a 2 node cluster (called cache1 and cache2) and I
    needed to add an entry to the hosts file on both nodes to ensure
    each node can resolve the name of the other node before clustering
    worked for me.

    My hosts file is in /etc and is called hosts

    In there I entered the following:

    On cache1.lon.hosting, enter the line 192.168.3.1 Cache2.domain.com
    <http://Cache2.domain.com> Cache2

    On cache2.lon.hosting, enter the line 192.168.3.0 Cache1.domain.com
    <http://Cache1.domain.com> Cache1

    Once done, I needed to confirm I could ping each node using it’s
    hostname from the other node. I don’t care about DNS or nslookup
    working/resolving the name.

    HTH

    Robin

    *From:*[hidden email]
    <mailto:[hidden email]>
    [mailto:[hidden email]
    <mailto:[hidden email]>] *On Behalf Of
    *Derek Wyatt
    *Sent:* 25 September 2013 15:47
    *To:* Discussions about RabbitMQ
    *Subject:* Re: [rabbitmq-discuss] Clustering - just can't get it going

    Ah, I do have more information though:

    DIAGNOSTICS

    ===========

    nodes in question: ['RMQ1']

    hosts, their running nodes and ports:

    - unable to connect to epmd on RMQ1: nxdomain (non-existing domain)

    current node details:

    - node name: 'rabbitmqctl1577@RMQ2'

    - home dir: /var/lib/rabbitmq

    - cookie hash: ohQKEF09peb6bAgNqawvKA==

    And just to be clear, the cookie is the same:

    *01*:~$ sudo md5sum /var/lib/rabbitmq/.erlang.cookie

    a2140a105d3da5e6fa6c080da9ac2f28  /var/lib/rabbitmq/.erlang.cookie

    *02*:~$ sudo md5sum /var/lib/rabbitmq/.erlang.cookie

    a2140a105d3da5e6fa6c080da9ac2f28  /var/lib/rabbitmq/.erlang.cookie

    Somehow, telnet to epmd works just fine, but something that RMQ is
    doing fails to make that happen.  Is there some sort of DNS work
    that it's doing, instead of using the hosts files?

    i.e. one thing I found is that nslookup fails:

    02:~$ nslookup RMQ1

    ;; Got SERVFAIL reply from <ipaddress>, trying next server

    Server:       <ipaddress>

    Address:  <ipaddress>

    ** server can't find RMQ1: SERVFAIL

    But if I ping RMQ1 it works fine. /etc/nsswitch.conf specifies that
    files should be tried first, before DNS w.r.t. hosts.

    So, it looks like RMQ is doing something more rigorous to resolve
    the host, and I don't know how to change that.  I also don't have
    access to the DNS server configuration in order to modify it in any way.

    On 25 September 2013 09:57, Jason McIntosh <[hidden email]
    <mailto:[hidden email]>> wrote:

    Check your erlang cookie on both servers to make sure it matches I
    think it's in - /var/lib/rabbitmq/ - then you can use rabbitmqctl
    from one machine and see if you can connect to another to list
    queues.  I THINK that's rabbitmqctl -n <servernode> list_queues for
    example.  If both servers can talk to each other then it should be
    rabbitmqctl stop_app, join_cluster, start_app.

    Jason



Re: Clustering - just can't get it going

Derek Wyatt
In reply to this post by Robin Lawrie - HostelBookers
SOLVED.

After getting these machines into the DNS and ensuring that /etc/resolv.conf was correct (the dynamic configuration puts them in the wrong domain), things are working.  Checking cluster_status on both machines shows everything listed properly.
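
For anyone checking the same thing: the "wrong domain" problem usually shows up in the search/domain lines of /etc/resolv.conf, which decide what gets appended to a short name like RMQ1. Below is a minimal Python sketch that pulls those domains out of the file's text. The helper name search_domains and the sample domains are made up for illustration; it is not part of RabbitMQ or Erlang.

```python
def search_domains(resolv_conf_text):
    """Return the domains a resolver would append to short host names."""
    domains = []
    for line in resolv_conf_text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines ('#' or ';').
        if not line or line[0] in "#;":
            continue
        fields = line.split()
        if fields[0] in ("search", "domain"):
            # If several 'search'/'domain' lines exist, the last one wins.
            domains = fields[1:]
    return domains


# Hypothetical file contents, only to show the behaviour.
sample = """\
# written by the DHCP client
nameserver 10.0.0.1
search wrong.example
search lan.example corp.example
"""
print(search_domains(sample))
```

To run it against a real machine, pass the contents of /etc/resolv.conf instead of the sample string and compare the result with the domain the nodes are actually registered under.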

Thanks for the help.

Regs,
Derek


On 25 September 2013 11:19, Robin Lawrie - HostelBookers <[hidden email]> wrote:
Yep, really. I can't recall the error message, but after creating the cluster the web management plugin was enabled again.

-----Original Message-----
From: Simon MacMullen [mailto:[hidden email]]
Sent: 25 September 2013 16:18
To: Discussions about RabbitMQ
Cc: Robin Lawrie - HostelBookers
Subject: Re: [rabbitmq-discuss] Clustering - just can't get it going

On 25/09/13 16:08, Robin Lawrie - HostelBookers wrote:
> The other issue I had in creating a cluster was that it wouldn't
> create if the web management plugin was running/installed on both nodes.

Uh, really?

If you try to create a cluster on a single machine then you need to configure management to open a different port for each node. But clusters where every node runs management are definitely supported (and indeed rather common). What error message did you see?

Cheers, Simon

--
Simon MacMullen
RabbitMQ, Pivotal

Re: Clustering - just can't get it going

Allan Baker
In reply to this post by Derek Wyatt
Hello Derek

Three things I would recommend, given my experience with issues similar to yours:

1) Make sure that every package you have installed is updated to the latest version available for your Linux distribution and OS version.
In my case, I apparently had permission problems with Erlang and a segmentation fault. I added the Debian repository
along with the Squeeze repositories, installed the latest version, and it worked. This only applies to Debian Squeeze (6) and RabbitMQ 3.1.5.

2) Make sure the hosts' ports aren't restricted by any firewall or TCP Wrappers software. Check that you have no entries
in /etc/hosts.allow and /etc/hosts.deny, and that your network router/switch doesn't impose any port or network restrictions.

3) Test the connectivity and the ports. You can try "telnet hostname port" to see whether the ports on one host are reachable from the
other. I would particularly recommend testing this against the Rabbit ports and then simply disconnecting.
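
The telnet test in point 3 can also be scripted. This is a minimal Python sketch, assuming you only want to know whether a TCP connection can be opened and then closed again. The function name port_open is made up for illustration.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # Connect and immediately close again, like telnet followed by a disconnect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

For example, port_open("RMQ1", 4369) would check epmd and port_open("RMQ1", 5672) would check AMQP, assuming the default ports are in use. As the rest of this thread shows, a successful connection here does not prove that Erlang itself can resolve the host name.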

I hope this helps.

Regards,
Allan Baker

On 25/09/2013 10:27 a.m., Derek Wyatt wrote:
This is rabbitmq 3.1.5 - I'm not sure what the erlang version is, but the erts version is 5.8.5.  I'm a little new to the whole erlang thing, so I just picked a component with 'e' in it :P

I'm planning to get these machines into the DNS proper to see if that helps.  It would be pretty weird if it works, since everyone else works OK with just /etc/hosts resolution, but it's worth a try.


On 25 September 2013 11:16, Simon MacMullen <[hidden email]> wrote:
You may (depending on Erlang version) need to make sure that each machine can resolve its own hostname too, as well as the other one.
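
A quick way to sanity-check this on each machine is sketched below in Python. It assumes the node name uses the short host name (the part after the @ in rabbit@RMQ2), and it goes through the operating system's resolver, which may not behave the same way as Erlang's own resolver. The helper names are made up for illustration.

```python
import socket


def can_resolve(name):
    """Return True if the system resolver finds an address for name."""
    try:
        socket.getaddrinfo(name, None)
        return True
    except socket.gaierror:
        return False


def short_hostname():
    """Host name up to the first dot, e.g. 'RMQ2' for 'RMQ2.example.com'."""
    return socket.gethostname().split(".", 1)[0]
```

On RMQ2 you would want both can_resolve(short_hostname()) and can_resolve("RMQ1") to return True, and the same the other way round on RMQ1.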

Cheers, Simon

On 25/09/13 16:04, Derek Wyatt wrote:
Damn. That's exactly the setup I have.


On 25 September 2013 10:55, Robin Lawrie - HostelBookers <[hidden email]> wrote:

    Hi,

    In my case, I have a 2-node cluster (called cache1 and cache2), and
    I needed to add an entry to the hosts file on both nodes so that
    each node could resolve the name of the other before clustering
    worked for me.

    My hosts file is /etc/hosts.

    In there I entered the following:

    On cache1.lon.hosting, enter the line:
    192.168.3.1 Cache2.domain.com Cache2

    On cache2.lon.hosting, enter the line:
    192.168.3.0 Cache1.domain.com Cache1

    Once done, I needed to confirm I could ping each node using its
    hostname from the other node. I don't care about DNS or nslookup
    working/resolving the name.
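
To double-check that a hosts file really contains the entry you expect, something like the following Python sketch can help. It is only an illustration: hosts_lookup is a made-up helper, and the sample text reuses the cache2 line from above.

```python
def hosts_lookup(hosts_text, name):
    """Return the addresses whose hosts-file entry lists name (case-insensitive)."""
    wanted = name.lower()
    addresses = []
    for line in hosts_text.splitlines():
        # Everything after '#' is a comment in hosts-file syntax.
        fields = line.split("#", 1)[0].split()
        if len(fields) < 2:
            continue
        address, names = fields[0], fields[1:]
        if wanted in (n.lower() for n in names):
            addresses.append(address)
    return addresses


sample = """\
127.0.0.1   localhost
192.168.3.1 Cache2.domain.com Cache2
"""
print(hosts_lookup(sample, "cache2"))  # ['192.168.3.1']
```

On a real node you would pass in the contents of /etc/hosts, read as the same user the broker runs as, so that permission problems show up too.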

    HTH

    Robin

    From: [hidden email] On Behalf Of Derek Wyatt
    Sent: 25 September 2013 15:47
    To: Discussions about RabbitMQ
    Subject: Re: [rabbitmq-discuss] Clustering - just can't get it going

    Ah, I do have more information though:

    DIAGNOSTICS
    ===========

    nodes in question: ['RMQ1']

    hosts, their running nodes and ports:
    - unable to connect to epmd on RMQ1: nxdomain (non-existing domain)

    current node details:
    - node name: 'rabbitmqctl1577@RMQ2'
    - home dir: /var/lib/rabbitmq
    - cookie hash: ohQKEF09peb6bAgNqawvKA==


Re: Clustering - just can't get it going

peter.slovak
In reply to this post by Derek Wyatt
I know this is an old thread, but the problem resurfaced in my setup too. RabbitMQ - or rather, Erlang - has its own DNS resolver that, apart from the resolving itself, consults and monitors "standard" Unix files like /etc/hosts and /etc/resolv.conf. So I couldn't understand why it didn't take into account the static records I'd put in /etc/hosts.

Then I realized that rabbitmq was running under the "rabbit" user and my unfortunate /etc/hosts had mode 600 instead of 644, so the Erlang resolver couldn't even peek into the file. Changing the mode and re-clustering (stop_app, join_cluster etc.) solved the issue for me, no external DNS needed.
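
A minimal Python sketch of that permission check, assuming the broker's user is neither the owner of the file nor in its group, so only the "other" read bit matters. The function name readable_by_others is made up for illustration, and ACLs are not considered.

```python
import os
import stat


def readable_by_others(path):
    """Return True if the file's mode grants read permission to 'other' users."""
    mode = os.stat(path).st_mode
    return bool(mode & stat.S_IROTH)
```

With mode 644, readable_by_others("/etc/hosts") returns True. With mode 600 it returns False, which matches the situation described above.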