Friday, December 3, 2010

ICMP Ping in Erlang, part 2

I've previously covered sending ICMP packets from Erlang using BSD raw sockets and Linux's PF_PACKET sockets.

gen_icmp tries to be a simple interface for ICMP sockets, using the BSD raw socket interface for portability. It should work on both Linux and the BSDs (I've tested on Ubuntu and Mac OS X).

Sending Pings

To ping a host:
1> gen_icmp:ping("erlang.org").
[{ok,{193,180,168,20},
     {{33786,0,129305},
      <<" !\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJK">>}}]
The response is a list of 3-tuples, one per host: the status, the address of the host and a 2-tuple. The 2-tuple holds a tuple of the ICMP echo request ID, the sequence number and the elapsed time, followed by the payload.
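
Results in this shape can be taken apart with ordinary pattern matching. As a minimal sketch (the module name ping_result and the function elapsed/1 are made up for illustration and are not part of gen_icmp), this collects the elapsed time of every successful reply:

```erlang
-module(ping_result).
-export([elapsed/1]).

%% Return [{Address, Elapsed}] for every successful reply.
%% Entries that do not match the {ok, ...} shape, such as error
%% results, are skipped by the generator pattern.
elapsed(Results) ->
    [{Addr, Elapsed}
     || {ok, Addr, {{_Id, _Seq, Elapsed}, _Payload}} <- Results].
```

Because a list comprehension silently drops elements that do not match the generator pattern, the function can be fed the output of gen_icmp:ping/1 directly without filtering it first.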

A bad response looks like:
2> gen_icmp:ping("192.168.213.4").
[{{error,host_unreachable},
  {192,168,213,4},
  {{34491,0},
   <<69,0,0,84,0,0,64,0,64,1,14,220,192,168,213,119,192,
     168,213,4,8,0,196,...>>}}]
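
The binary in the error result appears to be the start of the echo request that triggered the error: it begins with 69 (16#45), the first byte of an IPv4 header without options. A rough sketch of decoding it with the bit syntax follows; the module name icmp_quote and the returned field names are assumptions for this example, not something gen_icmp provides:

```erlang
-module(icmp_quote).
-export([ipv4/1]).

%% Decode the IPv4 header at the start of the binary returned with an
%% error result. Returns {ok, Fields, Rest}, where Rest is whatever
%% follows the header, or {error, Reason}.
ipv4(<<4:4, IHL:4, _ToS:8, Len:16, _Id:16, _Flags:3, _Off:13,
       TTL:8, Proto:8, _Sum:16,
       S1, S2, S3, S4, D1, D2, D3, D4, Rest0/binary>>) when IHL >= 5 ->
    %% IHL counts 32-bit words; anything beyond 5 words is IP options.
    OptLen = (IHL - 5) * 4,
    case Rest0 of
        <<_Opts:OptLen/binary, Rest/binary>> ->
            {ok, [{len, Len}, {ttl, TTL}, {protocol, Proto},
                  {saddr, {S1, S2, S3, S4}},
                  {daddr, {D1, D2, D3, D4}}], Rest};
        _ ->
            {error, truncated}
    end;
ipv4(_) ->
    {error, not_ipv4}.
```

Applied to the bytes shown above, this gives a total length of 84, a TTL of 64, protocol 1 (ICMP), source 192.168.213.119 and destination 192.168.213.4. The remaining bytes start with 8,0, which is the type and code of an ICMP echo request.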
gen_icmp:ping/1 takes either a single host or a list of hosts. Hosts in the list can be given as strings or, as here, as IPv4 address tuples. For example, to ping every host on a /24 network:
1> gen_icmp:ping([ {192,168,213,N} || N <- lists:seq(1,254) ]).
[{{error,host_unreachable},
  {192,168,213,254},
  {{54370,0},
   <<69,0,0,84,0,0,64,0,64,1,13,226,192,168,213,119,192,
     168,213,254,8,0,82,...>>}},
 {{error,host_unreachable},
  {192,168,213,190},
  {{54370,0},
   <<69,0,0,84,0,0,64,0,64,1,14,34,192,168,213,119,192,168,
     213,190,8,0,...>>}},
gen_icmp:ping/1 takes care of opening and closing the raw socket. This operation is somewhat expensive because Erlang spawns a setuid executable to get the socket. If you'll be doing a lot of pings, it's better to keep the socket around and use ping/3:
1> {ok,Socket} = gen_icmp:open().
{ok,<0.308.0>}

2> gen_icmp:ping(Socket, ["www.yahoo.com", "erlang.org", {192,168,213,1}],
    [{id, 123}, {sequence, 0}, {timeout, 5000}]).
[{ok,{193,180,168,20},
     {{123,0,126270},
      <<" !\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJK">>}},
 {ok,{69,147,125,65},
     {{123,0,29377},
      <<" !\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJK">>}},
 {ok,{192,168,213,1},
     {{123,0,3586},
      <<" !\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJK">>}}]

3> gen_icmp:close(Socket).
ok

Creating Other ICMP Packet Types


ICMP destination unreachable and time exceeded packets return the IP header and at least the first 64 bits of the payload of the original packet. Here is an example of generating an ICMP port unreachable for a fake TCP packet sent to port 80.
-module(icmperr).
-export([unreachable/3]).
-include("epcap_net.hrl").

-define(IPV4HDRLEN, 20).
-define(TCPHDRLEN, 20).

unreachable(Saddr, Daddr, Data) ->
    {ok, Socket} = gen_icmp:open(),

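    %% The quoted packet is the one Daddr supposedly sent to Saddr, so
    %% the addresses are swapped relative to the ICMP error sent below.
    IP = #ipv4{
        p = ?IPPROTO_TCP,
        len = ?IPV4HDRLEN + ?TCPHDRLEN + byte_size(Data),
        saddr = Daddr,
        daddr = Saddr
    },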

    TCP = #tcp{
        dport = 80,
        sport = crypto:rand_uniform(0, 16#FFFF),
        seqno = crypto:rand_uniform(0, 16#FFFF),
        syn = 1
    },

    IPsum = epcap_net:makesum(IP),
    TCPsum = epcap_net:makesum([IP, TCP, Data]),

    Packet = <<
        (epcap_net:ipv4(IP#ipv4{sum = IPsum}))/bits,
        (epcap_net:tcp(TCP#tcp{sum = TCPsum}))/bits,
        Data/bits
        >>,

    ICMP = gen_icmp:packet([
        {type, ?ICMP_DEST_UNREACH},
        {code, ?ICMP_UNREACH_PORT}
    ], Packet),

    ok = gen_icmp:send(Socket, Daddr, ICMP),
    gen_icmp:close(Socket).

To create the IPv4 and TCP headers, we make the protocol records and use the epcap_net module functions to encode the headers with the proper checksums. For creating the ICMP packet, we use the gen_icmp:packet/2 function (which again simply calls epcap_net).
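
For reference, the checksum used in IPv4 and ICMP headers is the Internet checksum described in RFC 1071. The following is a standalone sketch of that algorithm, not the epcap_net code; the module name inet_sum is made up for this example:

```erlang
-module(inet_sum).
-export([checksum/1]).

%% RFC 1071 Internet checksum over a binary: sum the data as 16-bit
%% big-endian words, fold the carries back in, then take the one's
%% complement. An odd trailing byte is padded with a zero byte.
checksum(Bin) when is_binary(Bin) ->
    16#FFFF - fold(sum(Bin, 0)).

sum(<<W:16, Rest/binary>>, Acc) -> sum(Rest, Acc + W);
sum(<<B:8>>, Acc) -> Acc + (B bsl 8);
sum(<<>>, Acc) -> Acc.

%% Add the bits above the low 16 back into the low 16 until none remain.
fold(Sum) when Sum > 16#FFFF -> fold((Sum band 16#FFFF) + (Sum bsr 16));
fold(Sum) -> Sum.
```

As a check, take the 20 IPv4 header bytes from the first host_unreachable response earlier and zero the checksum field (bytes 11 and 12). The function then returns 3804, which is 14,220 as two bytes, matching the value in the captured header. Run over the header with the checksum left in place, it returns 0.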


ICMP Ping Tunnel


We can tunnel any data we like in the payload of an ICMP packet. In this example, we'll use ICMP echo requests to tunnel an ssh connection between 2 hosts. The ICMP echo replies sent back by the peer OS ensure the data was received, like the ACK in a TCP connection.

The tunnel exports 2 functions:

  • ptun:server(ClientAddress, LocalPort) -> void()

    Types   ClientAddress = tuple()
                LocalPort = 0..65534
    
        ClientAddress is the IPv4 address of the peer represented as a tuple.
    
        The server listens on LocalPort for TCP connections and will close
        the port after a TCP client connects.  Data received on this port
        will be sent to the peer as the payload of the ICMP packets.
    


  • ptun:client(ServerAddress, LocalPort) -> void()

    Types   ServerAddress = tuple()
                LocalPort = 0..65534
    
        ServerAddress is the IPv4 address of the peer.
    
        When the client receives an ICMP echo request, the client opens a
        TCP connection to the LocalPort on localhost, proxying the data.
    
To start the tunnel, you'll need 2 hosts. In this example, 192.168.213.7 is the client and 192.168.213.119 is the server. 192.168.213.7 forwards any tunnelled data it receives to a local SSH server. On 192.168.213.7:
1> ptun:client({192,168,213,119}, 22).
On 192.168.213.119:
1> ptun:server({192,168,213,7}, 8787).
On 192.168.213.119, open another shell and start an SSH connection to port 8787 on localhost:
ssh -p 8787 127.0.0.1
<...>
$ ifconfig eth0 | awk '/inet addr/{print $2}'
addr:192.168.213.7

20 comments:

  1. Here you implemented the ping function using the gen_udp:send function. If I understand correctly, it's impossible to change the TTL value of an IP packet with gen_udp. I wonder how it could be done. I'm trying to understand how a traceroute function could be written in Erlang. Would it be possible for you to show an example of an Erlang traceroute implementation?

  2. The use of gen_udp is a bit misleading. It's only being used to read/write to the socket. We're constructing the full packet (IPv4 header, ICMP header) by hand.

    So to set the TTL, just set the value in the IPv4 header. For example, to construct an IPv4 header with a TTL set to 1 using pkt/epcap_net:

    IP = #ipv4{ ttl = 1 }.

    See the record definition for IPv4 headers here:

    https://github.com/msantos/pkt/blob/master/include/pkt.hrl#L134

    Are you interested in using UDP or ICMP probes for the traceroute? I think adding a traceroute to gen_icmp is a great idea. I'll see if I can put something together.

  3. As you know, sometimes we need the ICMP and sometimes the UDP version of traceroute. It would be really cool to have both of them implemented :)

    And what about IPv6? ))

  4. I still can't find where it constructs IPv4 header in gen_icmp module. I found only packet/2 function where it prepares ICMP packet.

    Supposing I want to change the TTL in a ping packet for some reason, where exactly do I have to set the #ipv4{ ttl = 1 } value?

  5. And there is another question from me :) I tried to compile and run gen_icmp:ping("google.com") in the Erlang shell (Ubuntu Linux), and it seems that it only works with root privileges; otherwise it throws an exception {badmatch, {error, eperm}} in function gen_icmp:init/1.

    Could it be somehow corrected?

  6. Hey there!

    1. Sure, having UDP and ICMP probes would be nice and wouldn't be much more work. Which would you want working first though?

    I'm planning on adding IPv6 support. The library gen_icmp uses to interact with sockets (procket) was updated a few months ago to support IPv6, so sending IPv6 ICMP packets should work from Erlang if you call procket directly.

    2. Sorry, I forgot gen_icmp uses the raw socket interface. The IP header is added by the OS in this case. This behaviour can be changed by using the HDRINCL socket option, something like (untested):

    -define(IP_HDRINCL, 3).

    {ok, Socket} = procket:open(0, [{protocol, icmp}, {type, raw}, {family, inet}]),
    ok = procket:setsockopt(Socket, ?IPPROTO_IP, ?IP_HDRINCL, <<1:32/native>>),

    Packet = list_to_binary([
        #ipv4{ ttl = 1},
        ...
    ]).

    3. So under Linux, you can either give beam the CAP_NET_RAW capability or use sudo to run the procket helper binary (or make it setuid).

    The gen_icmp and procket READMEs have instructions to set it up:

    https://github.com/msantos/gen_icmp/blob/master/README.md

    https://github.com/msantos/procket/blob/master/README.md

    If any of that is unclear, please let me know! Hope this helps!

  7. 1. Logic suggests that the gen_icmp module should get an ICMP traceroute implementation first. :) And then the UDP version :)

    2. I'll try it out. Thanks for the hint.

    3. I ran the following commands:

    $ sudo setcap 'cap_net_raw=ep' /usr/local/lib/erlang/erts-5.8.4/bin/beam.smp

    $ getcap /usr/local/lib/erlang/erts-5.8.4/bin/beam.smp
    /usr/local/lib/erlang/erts-5.8.4/bin/beam.smp = cap_net_raw+ep

    and still have that:

    Eshell V5.8.4 (abort with ^G)
    1> gen_icmp:ping("google.com").
    ** exception exit: {badmatch,{error,eperm}}
    in function gen_icmp:init/1
    in call from gen_server:init_it/6
    in call from proc_lib:init_p_do_apply/3

    I use Ubuntu Linux 11.04 and Erlang with the default configuration.

    Maybe I should set the cap_net_raw capability on some other files?

  8. For using gen_icmp with the capability set, you have to use the long form (ping/3):

    {ok, Socket} = gen_icmp:open([{setuid,false}], []),
    gen_icmp:ping(Socket, ["www.google.com"], []).

    If you want to use ping/1, set up sudo:

    sudo visudo
    ALL = NOPASSWD: /path/to/procket/priv/procket

  9. It didn't help. Now I have the same error in gen_icmp:open/2 call.


    Erlang R14B03 (erts-5.8.4) [source] [64-bit] [rq:1] [async-threads:0] [kernel-poll:false]

    Eshell V5.8.4 (abort with ^G)
    1> {ok, Socket} = gen_icmp:open([{setuid,false}], []).
    ** exception exit: {badmatch,{error,eperm}}
    in function gen_icmp:init/1
    in call from gen_server:init_it/6
    in call from proc_lib:init_p_do_apply/3

  10. Looks like you setcap the smp beam but are running the non-smp beam. So either:

    erl -smp

    or

    sudo setcap 'cap_net_raw=ep' /usr/local/lib/erlang/erts-5.8.4/bin/beam

  11. Michael, thank you very much for your help! :) Now it works just fine.

    So now I'm really looking forward to having traceroute implemented in the gen_icmp library :)

  12. Hi, I'm thinking about writing an Erlang-based IPv4/IPv6 router with firewall, connection tracking (ICMP/UDP/TCP), NAT44, NAT66, NAT46, sit tunnels and a few other things. I was wondering what performance overheads your libraries and interfaces incur compared to native kernel performance? Will it scale to multi-core? Is it asynchronous or synchronous? What are the advantages / disadvantages of NIFs vs ports here?

    Thanks in advance. Lots of cool and good work you have done!

  13. Hey Witold!

    That sounds like an awesome project!

    > I was wondering what performance overheads your libraries and interfaces
    > incur compared to native kernel performance?

    Hard to say, but expect the same problems and (maybe) similar performance as other network stacks running in user space like OpenVPN or Qemu/KVM (both of these use tun/tap).

    Here are some old iperf results. The test was first run directly to the server (192.168.213.7) over 802.11 and then across a tunnel set up between the client and the server (10.11.11.2).

    The tunnel runs over the Erlang distribution protocol. Here's the code:

    https://gist.github.com/1020605

    It uses tunctl to configure the interfaces:

    https://github.com/msantos/tunctl

    No idea what these results demonstrate, except maybe that the tunnel doesn't crash under this amount of load.

    [Run 1]

    [client to server over 802.11]
    $ iperf -c 192.168.213.7
    ------------------------------------------------------------
    Client connecting to 192.168.213.7, TCP port 5001
    TCP window size: 16.0 KByte (default)
    ------------------------------------------------------------
    [ 3] local 192.168.213.119 port 37025 connected with 192.168.213.7 port 5001
    [ ID] Interval Transfer Bandwidth
    [ 3] 0.0-10.3 sec 8.75 MBytes 7.14 Mbits/sec

    [client to server tunnelled over distribution]
    $ iperf -c 10.11.11.2
    ------------------------------------------------------------
    Client connecting to 10.11.11.2, TCP port 5001
    TCP window size: 16.0 KByte (default)
    ------------------------------------------------------------
    [ 3] local 10.11.11.1 port 58447 connected with 10.11.11.2 port 5001
    [ ID] Interval Transfer Bandwidth
    [ 3] 0.0-10.5 sec 7.07 MBytes 5.66 Mbits/sec


    [Run 2]

    [client to server over 802.11]
    $ iperf -c 192.168.213.7
    ------------------------------------------------------------
    Client connecting to 192.168.213.7, TCP port 5001
    TCP window size: 16.0 KByte (default)
    ------------------------------------------------------------
    [ 3] local 192.168.213.119 port 40688 connected with 192.168.213.7 port 5001
    [ ID] Interval Transfer Bandwidth
    [ 3] 0.0-10.0 sec 13.5 MBytes 11.3 Mbits/sec

    [client to server tunnelled over distribution]
    $ iperf -c 10.11.11.2
    ------------------------------------------------------------
    Client connecting to 10.11.11.2, TCP port 5001
    TCP window size: 16.0 KByte (default)
    ------------------------------------------------------------
    [ 3] local 10.11.11.1 port 35319 connected with 10.11.11.2 port 5001
    [ ID] Interval Transfer Bandwidth
    [ 3] 0.0-10.1 sec 10.6 MBytes 8.83 Mbits/sec

    > Will it scale to multi-core?

    Depends on your code. Most operations involve doing sequential reads/writes from a file descriptor. This will run on only one core.

    But, for example, say you were writing a firewall. One process could read packets from the network. You could compile groups of firewall rules to Erlang modules and run each set of rules as a filter process. These processes could run on other cores or on other nodes using distribution.

    I played with something similar for snort rules and it seemed to work ok.

    > Is it asynchronous or synchronous?

    Has to be async, otherwise the Erlang scheduler will be blocked. For simplicity, some of the interfaces presented to the caller are blocking though. Something along the lines of:

    read(Socket) ->
        case procket:read(Socket, 16#FFFF) of
            {ok, Data} -> {ok, Data};
            {error, eagain} ->
                timer:sleep(10),
                read(Socket);
            Error -> Error
        end.

    > What are the advantages / disadvantages of NIFs vs ports here?

    Either would work really. Using a port running as a Unix process would involve some overhead passing the fd back and forth to beam over a Unix socket.

  14. hello Michael,

    Is it supposed to build using:

    Erlang R14B04 (erts-5.8.5) [source] [64-bit] [smp:16:16] [rq:16] [async-threads:0] [kernel-poll:false]

  15. Hey Ben!

    Yes, it should work with R14B04. What errors are you getting? (And 16 cores ... nice! :)

  16. here it is :
    git@gist.github.com:20952a23e9c392fe3f04.git

  17. Hey Ben, it looks as if your shell is picking up an older version of Erlang (/usr/lib/erlang/erts-5.7.4/include/erl_nif_api_funcs.h). Try putting the path to the R14B04 erl at the front of your PATH.

    If you still get a compile error, try updating procket from git.

    Hope that works for you and be sure to let me know if you have any questions!

  18. Thx, that fixed the build. Now I'm getting a runtime issue.
    See the same gist for an update:
    git@gist.github.com:20952a23e9c392fe3f04.git

  19. From this:

    =ERROR REPORT==== 12-Dec-2011::23:04:34 ===
    Error in process <0.45.0> with exit value: {undef,[{pkt,icmp,[{icmp,8,0,0,123,0,{127,0,0,1},<<4 bytes>>,0,0,0,0,0}]},{gen_icmp,packet,2},{gen_icmp,'-ping/3-fun-0-',5}]}

    It looks as if you don't have the pkt library. It should have been downloaded as a dependency by rebar (to the deps directory).

    Try doing a make clean; make in gen_icmp or do a clone of pkt in deps.
