Sunday, July 4, 2010

DNS Programming with Erlang

I have this strange fascination with DNS. By which I mean loathing. Yet somehow I've already written 3 small DNS servers in Erlang:
  • emdns: An unfinished multicast DNS server with unspecified yet no doubt awesome features. Someday I'll finish it. Maybe.
  • spood: A strange, little program; a spoofing DNS proxy that will send out your DNS requests from somebody else's IP address and sniff the responses. Maybe (if you're somewhat sketchy) you could use it to hide your DNS lookups. Maybe, you could use it to ramp up your DNS requests on networks that throttle them down. Not that I would do any of that.
  • seds: a DNS server that tunnels TCP/IP. I'm typing this blog over a DNS tunnel right now, stress testing it (with my blazing fast ASCII input) and trying to make seds crash. Also stress testing my patience.
Since the programmatic interfaces to DNS in Erlang are mostly undocumented, I thought I'd go over them briefly, so I'll remember how to use them if I ever finish emdns. I figured out how they worked mainly by reading the source and dumping DNS packets to see the record structures. The DNS parsing functions are kept in lib/kernel/src/inet_dns.erl. Pretty much the only functions that you will need from this module are encode/1 and decode/1. The tricky part is passing in the appropriate data structures.
  • decode/1 takes a binary and returns {ok, #dns_rec{}}, or {error, fmt} if the DNS payload cannot be decoded
  • encode/1 as you might expect, does the inverse, taking an appropriate record and returning a binary
The record structure is defined in lib/kernel/src/inet_dns.hrl.
-record(dns_rec,
    {
        header,       %% dns_header record
        qdlist = [],  %% list of question entries
        anlist = [],  %% list of answer entries
        nslist = [],  %% list of authority entries
        arlist = []   %% list of resource entries
    }).
  • The DNS header is another record:
    -record(dns_header,
        {
         id = 0,       %% ushort query identification number
         %% byte F0
         qr = 0,       %% :1   response flag
         opcode = 0,   %% :4   purpose of message
         aa = 0,       %% :1   authoritive answer
         tc = 0,       %% :1   truncated message
         rd = 0,       %% :1   recursion desired
         %% byte F1
         ra = 0,       %% :1   recursion available
         pr = 0,       %% :1   primary server required (non standard)
                       %% :2   unused bits
         rcode = 0     %% :4   response code
        }).
    
    While the defaults are initialized to small integers, inet_dns replaces them with atoms. So, the 1 bit values are either the atoms 'true' or 'false' and the opcode is set to an atom, for example, 'query'. Both integers and the atom representations are usually accepted by the functions though.

  • qdlist is a list of DNS query records:
    -record(dns_query,
        {
         domain,     %% query domain
         type,        %% query type
         class      %% query class
         }).
    
    • domain is a string representing the domain name, e.g., "foo.bar.example.com"
    • type is an atom describing the DNS type: a, cname, txt, null, srv, ns, ...
    • class will most commonly be 'in' (Internet), though multicast DNS uses "cache flush" (32769) for some operations

Making a valid Erlang DNS query would look something like:
-module(dns).
-compile(export_all).

-include_lib("kernel/src/inet_dns.hrl").

q(Domain, NS) ->
    Query = inet_dns:encode(
        #dns_rec{
            header = #dns_header{
                id = crypto:rand_uniform(1,16#FFFF),
                opcode = 'query',
                rd = true
            },
            qdlist = [#dns_query{
                domain = Domain,
                type = a,
                class = in
            }]
        }),
    {ok, Socket} = gen_udp:open(0, [binary, {active, false}]),
    ok = gen_udp:send(Socket, NS, 53, Query),
    {ok, {NS, 53, Reply}} = gen_udp:recv(Socket, 65535),
    ok = gen_udp:close(Socket),  % don't leak the socket after each query
    inet_dns:decode(Reply).
I enabled recursion because the request will be going through one of the public Google nameservers (8.8.8.8) instead of going directly to the authoritative nameserver.

Testing the results:
$ erl
Erlang R14A (erts-5.8) [source] [smp:2:2] [rq:2] [async-threads:0] [hipe] [kernel-poll:false]

Eshell V5.8  (abort with ^G)
1> {ok, Q} = dns:q("listincomprehension.com", {8,8,8,8}).
{ok,{dns_rec,{dns_header,7296,true,'query',false,false,
                         true,true,false,0},
             [{dns_query,"listincomprehension.com",a,in}],
             [{dns_rr,"listincomprehension.com",a,in,0,656,
                      {216,239,32,21},
                      undefined,[],false},
              {dns_rr,"listincomprehension.com",a,in,0,656,
                      {216,239,34,21},
                      undefined,[],false},
              {dns_rr,"listincomprehension.com",a,in,0,656,
                      {216,239,36,21},
                      undefined,[],false},
              {dns_rr,"listincomprehension.com",a,in,0,656,
                      {216,239,38,21},
                      undefined,[],false}],
             [],[]}}
2> rr("/usr/local/lib/erlang/lib/kernel-2.14/src/inet_dns.hrl").
[dns_header,dns_query,dns_rec,dns_rr,dns_rr_opt]
3> Q.
#dns_rec{header = #dns_header{id = 7296,qr = true,
                              opcode = 'query',aa = false,tc = false,rd = true,ra = true,
                              pr = false,rcode = 0},
         qdlist = [#dns_query{domain = "listincomprehension.com",
                              type = a,class = in}],
         anlist = [#dns_rr{domain = "listincomprehension.com",
                           type = a,class = in,cnt = 0,ttl = 656,
                           data = {216,239,32,21},
                           tm = undefined,bm = [],func = false},
                   #dns_rr{domain = "listincomprehension.com",type = a,
                           class = in,cnt = 0,ttl = 656,
                           data = {216,239,34,21},
                           tm = undefined,bm = [],func = false},
                   #dns_rr{domain = "listincomprehension.com",type = a,
                           class = in,cnt = 0,ttl = 656,
                           data = {216,239,36,21},
                           tm = undefined,bm = [],func = false},
                   #dns_rr{domain = "listincomprehension.com",type = a,
                           class = in,cnt = 0,ttl = 656,
                           data = {216,239,38,21},
                           tm = undefined,bm = [],func = false}],
         nslist = [],arlist = []}
The records are displayed as tuples. You can pretty print the records by using the shell rr() command to include the header file wherever it is on your system.

The query returned the same packet we sent with some changes to the header:
  • The response flag (qr) is set to true
  • The recursion available flag (ra) is also set to true
The answer to our query is a list bound to the anlist field of the record. The #dns_rr{} record looks like:
-record(dns_rr,
    {
     domain = "",   %% resource domain
     type = any,    %% resource type
     class = in,    %% resource class
     cnt = 0,       %% access count
     ttl = 0,       %% time to live
     data = [],     %% raw data
     %%
     tm,            %% creation time
     bm = [],       %% Bitmap storing domain character case information.
     func = false   %% Optional function calculating the data field.
    }).
The data field is interesting. Although it's initialized as an empty list, the data structure bound to it depends on the DNS record type. For example, from the ones I remember:
  • A: tuple representing the IP address
  • TXT: a list of strings
  • NULL: a binary
  • CNAME: a domain name string with its labels separated by dots, e.g., "ghs.google.com". inet_dns takes care of converting the string to and from the compressed wire format, in which each label is prefixed by its length, the name ends with a zero byte, and a trailing part of the name may be replaced by a pointer to an earlier occurrence in the packet (hence the compression).
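To make the label format concrete, here is a minimal sketch of the uncompressed encoding of a domain name. It is not how inet_dns does it internally (inet_dns also emits compression pointers, which this leaves out); the module and function names are made up for illustration.

```erlang
-module(dns_labels).
-export([encode_name/1]).

%% Encode a dotted domain name as uncompressed DNS labels:
%% each label is preceded by its length and the name ends with a zero byte.
encode_name(Domain) ->
    Labels = string:tokens(Domain, "."),
    iolist_to_binary([[encode_label(L) || L <- Labels], 0]).

%% A label is 1 to 63 bytes long; anything else crashes with function_clause.
encode_label(Label) when length(Label) >= 1, length(Label) =< 63 ->
    [length(Label), Label].
```

For example, dns_labels:encode_name("ghs.google.com") returns <<3,"ghs",6,"google",3,"com",0>>.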

Pattern Matching

The cool thing is that, since the DNS records are nested records, it's very easy to pattern match on the results. Modifying the example above:
-module(dns1).
-compile(export_all).

-include_lib("kernel/src/inet_dns.hrl").

q(Type, Domain, NS) ->
    Query = inet_dns:encode(
        #dns_rec{
            header = #dns_header{
                id = crypto:rand_uniform(1,16#FFFF),
                opcode = 'query',
                rd = true
            },
            qdlist = [#dns_query{
                    domain = Domain,
                    type = Type,
                    class = in
                }]
        }),
    {ok, Socket} = gen_udp:open(0, [binary, {active, true}]),
    gen_udp:send(Socket, NS, 53, Query),
    loop(Socket, Type, Domain, NS).


loop(Socket, Type, Domain, NS) ->
    receive
        {udp, Socket, NS, _, Packet} ->
            {ok, Response} = inet_dns:decode(Packet),
            match(Type, Domain, Response)
    end.

match(a, Domain, #dns_rec{
        header = #dns_header{
            qr = true,
            opcode = 'query'
        },
        qdlist = [#dns_query{
                domain = Domain,
                type = a,
                class = in
            }],
        anlist = [#dns_rr{
                domain = Domain,
                type = a,
                class = in,
                data = {IP1, IP2, IP3, IP4}
            }|_]}) ->
    {a, Domain, {IP1,IP2,IP3,IP4}};
match(cname, Domain, #dns_rec{
        header = #dns_header{
            qr = true,
            opcode = 'query'
        },
        qdlist = [#dns_query{
                domain = Domain,
                type = cname,
                class = in
            }],
        anlist = [#dns_rr{
                domain = Domain,
                type = cname,
                class = in,
                data = Data
            }|_]}) ->
    {cname, Domain, Data}.

And the results:

$ erl
Erlang R14A (erts-5.8) [source] [smp:2:2] [rq:2] [async-threads:0] [hipe] [kernel-poll:false]

Eshell V5.8 (abort with ^G)
1> dns1:q(cname, "blog.listincomprehension.com", {8,8,8,8}).
{cname,"blog.listincomprehension.com","ghs.google.com"}
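The match/3 clauses above only look at the first answer. The same kind of record pattern also works as a filter in a list comprehension, which is one way to collect every address in a reply. This is a sketch: the module name, the function names and the made-up reply in example/0 are assumptions for illustration, and the include assumes the kernel source headers are installed.

```erlang
-module(dns_answers).
-export([addresses/1, example/0]).

-include_lib("kernel/src/inet_dns.hrl").

%% Return the data field of every A record in the answer section.
%% Records of other types (for example a CNAME in front of the A records)
%% fail the generator pattern and are skipped.
addresses(#dns_rec{anlist = Answers}) ->
    [IP || #dns_rr{type = a, class = in, data = IP} <- Answers].

%% A hand-built reply, so the function can be tried without a nameserver.
example() ->
    Reply = #dns_rec{
        header = #dns_header{qr = true},
        anlist = [
            #dns_rr{domain = "www.example.com", type = cname,
                    data = "example.com"},
            #dns_rr{domain = "example.com", type = a, data = {192,0,2,1}},
            #dns_rr{domain = "example.com", type = a, data = {192,0,2,2}}
        ]},
    addresses(Reply).
```

dns_answers:example() returns [{192,0,2,1},{192,0,2,2}]; the CNAME record is dropped by the pattern in the generator.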

Thursday, July 1, 2010

Fun with Raw Sockets in Erlang: Finding MAC and IP Addresses

(See the update for versions of some of these functions in standard Erlang).

When working with PF_PACKET raw sockets, the caller needs to provide the source/destination MAC and IP addresses.
Playing with a spoofing DNS proxy, I got tired of hardcoding the addresses, then WTF'ing every time I switched networks. So I added some functions to procket to look up the system network interface and its MAC and IP addresses.

Retrieving the MAC Address of an Interface


Under Linux, getting the MAC address of an interface involves calling an ioctl() with the request set to SIOCGIFHWADDR and passing in a struct ifreq.

Here is the code to do so in C:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <err.h>

#include <sys/types.h>
#include <sys/socket.h>
#include <sys/ioctl.h>

#include <net/if.h>
#include <netinet/ether.h>


int
main(int argc, char *argv[])
{
    int s = -1;

    struct ifreq ifr = {0};
    char *dev = NULL;
    struct sockaddr *sa;

    dev = strdup((argc == 2 ? argv[1] : "eth0"));

    if (dev == NULL)
        err(EXIT_FAILURE, "strdup");

    if ( (s = socket(AF_INET, SOCK_DGRAM, 0)) < 0)
        err(EXIT_FAILURE, "socket");

    (void)strncpy(ifr.ifr_name, dev, sizeof(ifr.ifr_name)-1);

    if (ioctl(s, SIOCGIFHWADDR, &ifr) < 0)
        err(EXIT_FAILURE, "ioctl");

    sa = (struct sockaddr *)&ifr.ifr_hwaddr;

    (void)printf("%02x:%02x:%02x:%02x:%02x:%02x\n",
            (unsigned char)sa->sa_data[0], (unsigned char)sa->sa_data[1],
            (unsigned char)sa->sa_data[2], (unsigned char)sa->sa_data[3],
            (unsigned char)sa->sa_data[4], (unsigned char)sa->sa_data[5]);

    free(dev);

    exit (EXIT_SUCCESS);

}

The equivalent in Erlang uses procket:ioctl/3:
macaddress(Socket, Dev) ->
    {ok, <<_Ifname:16/bytes,
        _Family:16,                 % sa_family (hardware type, e.g. ARPHRD_ETHER)
        SM1,SM2,SM3,SM4,SM5,SM6,    % mac address
        _/binary>>} = procket:ioctl(Socket,
        ?SIOCGIFHWADDR,
        list_to_binary([
                Dev, <<0:((15*8) - (length(Dev)*8)), 0:8, 0:128>>
            ])),
    {SM1,SM2,SM3,SM4,SM5,SM6}.
Results may differ depending on the endian-ness of your platform.
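The binary handed to the ioctl is just the interface name padded with zeroes to 16 bytes, followed by 16 zeroed bytes for the sockaddr. A small helper makes the padding arithmetic easier to read. This is a sketch: the module name, the function name and the IFNAMSIZ macro are made up here and are not part of procket.

```erlang
-module(ifreq_sketch).
-export([ifreq/1]).

-define(IFNAMSIZ, 16).  % size of ifr_name on Linux

%% Build the same 32 bytes as the expression in macaddress/2: the interface
%% name padded with NULs to IFNAMSIZ bytes, then 16 zeroed bytes for the
%% sockaddr. Names of 16 characters or more crash with function_clause.
ifreq(Dev) when length(Dev) < ?IFNAMSIZ ->
    Name = list_to_binary(Dev),
    Pad = ?IFNAMSIZ - byte_size(Name),
    <<Name/binary, 0:(Pad*8), 0:128>>.
```

ifreq_sketch:ifreq("eth0") is 32 bytes long and starts with <<"eth0">> followed by 12 zero bytes.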

Retrieving the IP Address of an Interface


The IP address of an interface can be obtained by another ioctl() with a request value of SIOCGIFADDR. In C:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <err.h>

#include <sys/types.h>
#include <sys/socket.h>
#include <sys/ioctl.h>

#include <netinet/in.h>
#include <arpa/inet.h>

#include <net/if.h>


int
main(int argc, char *argv[])
{
    int s = -1;

    struct ifreq ifr = {0};
    char *dev = NULL;
    struct sockaddr_in *sa;

    dev = strdup((argc == 2 ? argv[1] : "eth0"));

    if (dev == NULL)
        err(EXIT_FAILURE, "strdup");

    if ( (s = socket(AF_INET, SOCK_DGRAM, 0)) < 0)
        err(EXIT_FAILURE, "socket");

    (void)strncpy(ifr.ifr_name, dev, sizeof(ifr.ifr_name)-1);
    ifr.ifr_addr.sa_family = PF_INET;

    if (ioctl(s, SIOCGIFADDR, &ifr) < 0)
        err(EXIT_FAILURE, "ioctl");

    sa = (struct sockaddr_in *)&ifr.ifr_addr;

    (void)printf("%s\n", inet_ntoa(sa->sin_addr));

    free(dev);

    exit (EXIT_SUCCESS);

}

And the Erlang version:
ipv4address(Socket, Dev) ->
    {ok, <<_Ifname:16/bytes,
        ?PF_INET:16/native, % sin_family
        _:16,               % sin_port 
        SA1,SA2,SA3,SA4,    % sin_addr
        _/binary>>} = procket:ioctl(Socket,
            ?SIOCGIFADDR,
            list_to_binary([
                Dev, <<0:((15*8) - (length(Dev)*8)), 0:8>>,
                <<?PF_INET:16/native,       % family
                0:112>>
            ])),
    {SA1,SA2,SA3,SA4}.

Looking Up an IP Address in the ARP Cache


ARP cache lookups can be done by using:
ioctl(socket, SIOCGARP, struct arpreq);
But utilities on Linux just seem to parse /proc/net/arp:
arplookup({IP1,IP2,IP3,IP4}) ->
    {ok, FD} = file:open("/proc/net/arp", [read,raw]),
    arploop(FD, inet_parse:ntoa({IP1,IP2,IP3,IP4})).

arploop(FD, Address) ->
    case file:read_line(FD) of
        eof ->
            file:close(FD),
            not_found;
        {ok, Line} ->
            % Columns: IP address, HW type, Flags, HW address, Mask, Device.
            % Match the whole first column so 10.0.0.1 does not match 10.0.0.10.
            case string:tokens(Line, " \n") of
                [Address, _HWType, _Flags, HWAddr | _] ->
                    file:close(FD),
                    M = string:tokens(HWAddr, ":"),
                    list_to_tuple([ erlang:list_to_integer(E, 16) || E <- M ]);
                _ -> arploop(FD, Address)
            end
    end.

Getting a List of Interfaces


To get the list of interfaces on a system, yet another ioctl() is used, this time passing in SIOCGIFCONF and this structure as arguments:
struct ifconf
{
    int ifc_len;            /* Size of buffer.  */
    union
    {
        __caddr_t ifcu_buf;
        struct ifreq *ifcu_req;
    } ifc_ifcu;
};
Here is an example of retrieving the interface list in C.

The ioctl() takes, as an argument, a structure using a length and a pointer to a buffer. procket doesn't have a way of allocating a piece of memory though it could be modified to have an NIF that allocates a binary and returns the address of the binary as an integer. Functions could then pass in the memory address but a buggy piece of Erlang code might pass in the wrong value and crash the VM. (Edit: The erl_nif interface in Erlang R14A supports safely passing a reference to a block of memory between functions using enif_alloc_resource() to create a "Resource Object".)

Instead, I simply parse the output of /proc/net/dev. For example, here is the output on my laptop:

$ cat /proc/net/dev
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo:  526441    3620    0    0    0     0          0         0   526441    3620    0    0    0     0       0          0
  eth0:       0       0    0    0    0     0          0         0        0       0    0    0    0     0       0          0
 wifi0:10960093   26897    0    0    0  1578          0         0   734661    4876    0    0    0     0       0          0
  ath0:14422892   12536    0    0    0     0          0         0   576599    4706    0    0    0     0       0          0
The code is even nastier than the ARP cache lookup, since I resorted to using regular expressions. Note that the pattern only matches names that end in a digit, so lo is skipped:
iflist() ->
    {ok, FD} = file:open("/proc/net/dev", [raw, read]),
    iflistloop(FD, []).

iflistloop(FD, Ifs) ->
    case file:read_line(FD) of
        eof ->
            file:close(FD),
            Ifs;
        {ok, Line} ->
            iflistloop(FD, iflistmatch(Line, Ifs))
    end.

iflistmatch(Data, Ifs) ->
    case re:run(Data, "^\\s*([a-z]+[0-9]+):", [{capture, [1], list}]) of
        nomatch -> Ifs;
        {match, [If]} -> [If|Ifs]
    end.

Finding the Default Interface


In spood, I took the easy way and just sort of guessed. A proper solution would check the routing table. Instead, I look for the first interface whose IPv4 address is not a loopback (127.x.x.x) or link-local address:
device() ->
    {ok, S} = procket:listen(0, [{protocol, udp}, {family, inet}, {type, dgram}]),
    [Dev|_] = [ If || If <- packet:iflist(), ipcheck(S, If) ],
    procket:close(S),
    Dev.

ipcheck(S, If) ->
    try packet:ipv4address(S, If) of
        {127,_,_,_} -> false;
        {169,254,_,_} -> false;
        _ -> true
    catch
        error:_ -> false
    end.

Update: Using the inet Module

After having gone through all the above, I discovered that the inet module, which is part of the Erlang standard library, is able to retrieve information about the local interfaces.

inet has 2 relevant functions:
  • getiflist/0: retrieve a list of all the local interfaces, e.g., {ok, ["eth0", "eth1"]}
  • ifget/2: retrieve interface attributes. The arguments can be:
    • addr: IP address of interface
    • hwaddr: the MAC address of the interface. Works on Linux, doesn't work on Mac OS X (returns an empty list).
      (Update: I've submitted a patch to get the MAC address on Mac OS X)
    • dstaddr
    • netmask
    • broadcast
    • mtu
    • flags: returns the interface status, e.g., [up, broadcast, running, multicast]

For example:
  • To get a list of interfaces:
    1> inet:getiflist().
    {ok,["lo","eth0","eth1"]}
    
  • To retrieve the IP and MAC addresses of an interface:
    3> inet:ifget("eth0", [addr, hwaddr]).
    {ok,[{addr,{192,168,1,11}},{hwaddr,[0,11,22,33,44,55]}]}
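The hwaddr value comes back as a list of integers rather than the usual colon-separated string. A minimal sketch for printing it, which also accepts the 6-tuple returned by macaddress/2 above (the module and function names are made up for illustration):

```erlang
-module(mac_format).
-export([to_string/1]).

%% Accepts the list returned by inet:ifget/2 or a 6-tuple of bytes and
%% returns the address as lowercase, zero-padded hex separated by colons.
to_string(Mac) when is_tuple(Mac) ->
    to_string(tuple_to_list(Mac));
to_string(Mac) when is_list(Mac), length(Mac) =:= 6 ->
    Hex = [lists:flatten(io_lib:format("~2.16.0b", [Byte])) || Byte <- Mac],
    string:join(Hex, ":").
```

With the address from the example above, mac_format:to_string([0,11,22,33,44,55]) returns "00:0b:16:21:2c:37".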
    

Update2: Using inet:getifaddrs/0

As of R14B01, Erlang has a supported, cross-platform method for retrieving interface attributes. The function returns a list holding the interface information. For example:
1> inet:getifaddrs().
{ok,[{"lo",
      [{flags,[up,loopback,running]},
       {hwaddr,[0,0,0,0,0,0]},
       {addr,{127,0,0,1}},
       {netmask,{255,0,0,0}},
       {addr,{0,0,0,0,0,0,0,1}},
       {netmask,{65535,65535,65535,65535,65535,65535,65535,
                 65535}}]}]}
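
With this function, the guess at a default interface from earlier no longer needs procket or /proc. The following is a sketch of the same heuristic on top of inet:getifaddrs/0; the module and function names are made up, and it still only guesses rather than consulting the routing table.

```erlang
-module(default_if).
-export([device/0, first_usable/1]).

%% Ask the system for its interfaces and pick one.
device() ->
    {ok, Ifs} = inet:getifaddrs(),
    first_usable(Ifs).

%% Return the name of the first interface that is up and has an IPv4
%% address outside 127.0.0.0/8 and 169.254.0.0/16, or not_found.
first_usable([]) ->
    not_found;
first_usable([{Name, Opts} | Rest]) ->
    Flags = proplists:get_value(flags, Opts, []),
    Addrs = [A || {addr, A} <- Opts, usable(A)],
    case lists:member(up, Flags) andalso Addrs =/= [] of
        true -> Name;
        false -> first_usable(Rest)
    end.

usable({127,_,_,_}) -> false;
usable({169,254,_,_}) -> false;
usable({_,_,_,_}) -> true;
usable(_) -> false.   % IPv6 addresses are ignored in this sketch
```

first_usable/1 takes the interface list as an argument so that it can be tried in the shell with a hand-written list instead of the real interfaces.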