Saturday, December 25, 2010

Unix Sockets

There are various ways to use Unix sockets from within Erlang such as gen_socket and unixdom_drv. Code examples are even bundled with the Erlang source.

To work with Unix sockets, I've broken out the socket primitives in the procket NIF and made them accessible from Erlang.

Unix (or local or file) sockets reside as files on the local server filesystem. Like internet sockets, the Unix version can be created as either stream (reliable, connected, no packet boundary) or datagram (unreliable, packet boundaries) sockets.

Creating a Datagram Socket

The Erlang procket functions are simple wrappers around the C library. See the C library man pages for more details.

To register the server, we get a socket file descriptor and bind it to the pathname of the socket on the filesystem. The bind function takes 2 arguments, the file descriptor and a sockaddr_un. On Linux, the sockaddr_un is defined as:

typedef unsigned short int sa_family_t;

struct sockaddr_un {
    sa_family_t sun_family;         /* 2 bytes: AF_UNIX */
    char sun_path[UNIX_PATH_MAX];   /* 108 bytes: pathname */
};

We use a binary to compose the structure, zero'ing out the unused portion:

-define(UNIX_PATH_MAX, 108).
-define(PATH, <<"/tmp/unix.sock">>).

<<?PF_LOCAL:16/native,        % sun_family
  ?PATH/binary,               % address
  0:((?UNIX_PATH_MAX - byte_size(?PATH)) * 8)   % zero the unused portion
>>

This binary representation of the socket structure has a portability issue. On BSD systems, the first byte of the structure holds the length of the socket address and the second byte is set to the protocol family. The value for UNIX_PATH_MAX is also smaller:

typedef __uint8_t   __sa_family_t;  /* socket address family */

struct sockaddr_un {
    unsigned char   sun_len;    /* 1 byte: sockaddr len including null */
    sa_family_t sun_family;     /* 1 byte: AF_UNIX */
    char    sun_path[104];      /* path name (gag) */
};

The binary can be built like:

-define(UNIX_PATH_MAX, 104).
-define(PATH, <<"/tmp/unix.sock">>).

<<(byte_size(?PATH)):8,         % socket address length
  ?PF_LOCAL:8,                  % sun_family
  ?PATH/binary,                 % address
  0:((?UNIX_PATH_MAX - byte_size(?PATH)) * 8)   % zero the unused portion
>>

The code below might need to be adjusted for BSD. Or it might just work. Some code I tested on Mac OS X just happened to work, presumably because the length field was ignored, the endianness happened to put the protocol family in the second byte and the extra 4 bytes were truncated.
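The two layouts described above can be put side by side in a small module. This is only a sketch: the module name sockaddr_un and its functions are made up for illustration, PF_LOCAL is assumed to be 1 (true on Linux and the BSDs I know of), and treating every non-Linux Unix as BSD-like in for_os/1 is an assumption, not something procket does for you.

```erlang
-module(sockaddr_un).
-export([linux/1, bsd/1, for_os/1]).

-define(PF_LOCAL, 1).   % assumed value of AF_UNIX/PF_LOCAL

%% Linux layout: 2 byte family (native endian) followed by a 108 byte path.
linux(Path) when is_binary(Path), byte_size(Path) < 108 ->
    Pad = (108 - byte_size(Path)) * 8,
    <<?PF_LOCAL:16/native, Path/binary, 0:Pad>>.

%% BSD layout: 1 byte length, 1 byte family, then a 104 byte path.
bsd(Path) when is_binary(Path), byte_size(Path) < 104 ->
    Pad = (104 - byte_size(Path)) * 8,
    <<(byte_size(Path)):8, ?PF_LOCAL:8, Path/binary, 0:Pad>>.

%% Choose a layout from os:type/0.
%% Assumption: any Unix that is not Linux uses the BSD layout.
for_os(Path) ->
    case os:type() of
        {unix, linux} -> linux(Path);
        {unix, _} -> bsd(Path)
    end.
```

The result of sockaddr_un:for_os(<<"/tmp/unix.sock">>) could then be passed as the second argument to procket:bind/2.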

Here is the code to send data from the client to the server:

Start up an Erlang VM and run the server (remembering to include the path to the procket library):

$ erl -pa /path/to/procket/ebin
Erlang R14B02 (erts-5.8.3) [source] [smp:2:2] [rq:2] [async-threads:0] [hipe] [kernel-poll:false]

Eshell V5.8.3  (abort with ^G)
1> unix_dgram:server().

And in a second Erlang VM run:
1> unix_dgram:client(<<104,101,108,108,111,32,119,111,114,108,100>>). % Erlangish for <<"hello world">>, I am being a smartass

In the first VM, you should see printed out:
<<"hello world">>

Creating an Abstract Socket

Linux allows you to bind an arbitrary name (a name that is not a file system path) by using an abstract socket. The abstract socket naming convention uses a NUL byte prefacing arbitrary bytes in place of the path used by traditional Unix sockets. To define an abstract socket, a binary is passed as the second argument to procket:bind/2, in the format of a struct sockaddr_un:

<<?PF_LOCAL:16/native,        % sun_family
  0:8,                        % abstract address
  "1234",                     % the address
  0:((?UNIX_PATH_MAX - (1+4)) * 8)   % zero the unused portion
>>

To create a datagram echo server, the source address of the client socket is bound to an address so the server has somewhere to send the response. We modify the datagram server to use recvfrom/4, passing in an additional flag argument (which is set to 0) and a length. recvfrom/4 will return an additional value containing up to length bytes of the socket address.

We also need to modify the client to bind to an abstract socket. The server will receive this socket address in the return value of recvfrom/4; this value can be passed to sendto/4.
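Here is a minimal sketch of building and recognising an abstract socket address, so the client and the server construct the name the same way. The module abstract_addr and both function names are hypothetical; the layout is the Linux one (PF_LOCAL assumed to be 1, a 108 byte path). As far as I can tell, when the whole structure is handed to bind the zero padding becomes part of the abstract name, so both sides must pad identically.

```erlang
-module(abstract_addr).
-export([new/1, is_abstract/1]).

-define(PF_LOCAL, 1).           % assumed value of AF_UNIX/PF_LOCAL
-define(UNIX_PATH_MAX, 108).    % Linux size of sun_path

%% Family, a leading NUL byte, the name, then zero padding.
new(Name) when is_binary(Name), byte_size(Name) < ?UNIX_PATH_MAX ->
    Pad = (?UNIX_PATH_MAX - 1 - byte_size(Name)) * 8,
    <<?PF_LOCAL:16/native, 0:8, Name/binary, 0:Pad>>.

%% An address whose path starts with a NUL byte is not a filesystem path.
%% Note this also matches an all-zero (unnamed) path.
is_abstract(<<?PF_LOCAL:16/native, 0:8, _/binary>>) -> true;
is_abstract(_) -> false.
```

A server could use is_abstract/1 on the address returned by recvfrom/4 before replying with sendto/4.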

1> unix_dgram1:server().

1> unix_dgram1:client(<<104,101,108,108,111,32,119,111,114,108,100>>).
<<"hello world">>

Creating a Stream Socket

To create a stream socket, we use the SOCK_STREAM type (or 1) for the second value passed to socket/3. The socket arguments can be either integers or atoms; for variety, atoms are used here.

After the socket is bound, we mark the socket as listening and poll it (rather inefficiently) for connections. When a new connection is received, it is accepted, the file descriptor for the new connection is returned and a process is spawned to handle the connection.

On the client side, after obtaining a stream socket, we connect the socket, so we do not need to bind it explicitly.

Running the same steps for the client and server as above:

1> unix_stream:server().
<<"hello world">>
** client disconnected

1> unix_stream:client(<<104,101,108,108,111,32,119,111,114,108,100>>).
<<"hello world">>

Friday, December 3, 2010

ICMP Ping in Erlang, part 2

I've covered sending ICMP packets from Erlang using BSD raw sockets and Linux's PF_PACKET socket option.

gen_icmp tries to be a simple interface for ICMP sockets using the BSD raw socket interface for portability. It should work on both Linux and the BSDs (I've tested on Ubuntu and Mac OS X).

Sending Pings

To ping a host:
1> gen_icmp:ping("").
      <<" !\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJK">>}}]
The response is a list of 3-tuples. The third element is a 2-tuple: the first part holds the ICMP echo request ID, the sequence number and the elapsed time, and the second part is the payload.

A bad response looks like:
2> gen_icmp:ping("").
gen_icmp:ping/1 takes either a single host or a list of hosts. A host can be given as a string or as an IPv4 address tuple. For example, to ping every host on a /24 network:
1> gen_icmp:ping([ {192,168,213,N} || N <- lists:seq(1,254) ]).
gen_icmp:ping/1 takes care of opening and closing the raw socket. This operation is somewhat expensive because Erlang is spawning a setuid executable to get the socket. If you'll be doing a lot of pings, it's better to keep the socket around and use ping/3:
1> {ok,Socket} = gen_icmp:open().

2> gen_icmp:ping(Socket, ["", "", {192,168,213,1}],
    [{id, 123}, {sequence, 0}, {timeout, 5000}]).
      <<" !\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJK">>}},
      <<" !\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJK">>}},
      <<" !\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJK">>}}]

3> gen_icmp:close(Socket).

Creating Other ICMP Packet Types

ICMP destination unreachable and time exceeded packets return the IP header and at least the first 64 bits of the payload of the original packet. Here is an example of generating an ICMP port unreachable for a fake TCP packet sent to port 80.

%% The #ipv4{} and #tcp{} records and the ?IPPROTO_TCP and ?ICMP_* macros
%% are assumed to come from the epcap_net header file.
-define(IPV4HDRLEN, 20).
-define(TCPHDRLEN, 20).

unreachable(Saddr, Daddr, Data) ->
    {ok, Socket} = gen_icmp:open(),

    % The fake "original" packet travels from Daddr to Saddr
    IP = #ipv4{
        p = ?IPPROTO_TCP,
        len = ?IPV4HDRLEN + ?TCPHDRLEN + byte_size(Data),
        saddr = Daddr,
        daddr = Saddr
    },

    TCP = #tcp{
        dport = 80,
        sport = crypto:rand_uniform(0, 16#FFFF),
        seqno = crypto:rand_uniform(0, 16#FFFF),
        syn = 1
    },

    IPsum = epcap_net:makesum(IP),
    TCPsum = epcap_net:makesum([IP, TCP, Data]),

    Packet = <<
        (epcap_net:ipv4(IP#ipv4{sum = IPsum}))/bits,
        (epcap_net:tcp(TCP#tcp{sum = TCPsum}))/bits,
        Data/bits
        >>,

    ICMP = gen_icmp:packet([
        {type, ?ICMP_DEST_UNREACH},
        {code, ?ICMP_UNREACH_PORT}
    ], Packet),

    % The ICMP error goes back to the supposed sender of the TCP packet
    ok = gen_icmp:send(Socket, Daddr, ICMP),
    gen_icmp:close(Socket).

To create the IPv4 and TCP headers, we make the protocol records and use the epcap_net module functions to encode the headers with the proper checksums. For creating the ICMP packet, we use the gen_icmp:packet/2 function (which again simply calls epcap_net).
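For the curious, the checksum itself is not magic. Below is a sketch of the standard Internet checksum (RFC 1071) and of an ICMP echo request built with it. I'm assuming this is what the checksum helpers boil down to; the module inet_checksum and its functions are illustrative names and are not part of gen_icmp or epcap.

```erlang
-module(inet_checksum).
-export([checksum/1, echo_request/3]).

%% One's complement of the one's complement sum of the 16-bit words.
checksum(Bin) when is_binary(Bin) ->
    16#FFFF - fold(sum(Bin, 0)).

sum(<<W:16, Rest/binary>>, Acc) -> sum(Rest, Acc + W);
sum(<<B:8>>, Acc) -> Acc + (B bsl 8);   % odd length: pad with a zero byte
sum(<<>>, Acc) -> Acc.

%% Add the carries back in until the sum fits in 16 bits.
fold(Sum) when Sum > 16#FFFF -> fold((Sum band 16#FFFF) + (Sum bsr 16));
fold(Sum) -> Sum.

%% ICMP echo request: type 8, code 0. The checksum is computed over the
%% packet with the checksum field set to 0.
echo_request(Id, Seq, Payload) when is_binary(Payload) ->
    Sum = checksum(<<8:8, 0:8, 0:16, Id:16, Seq:16, Payload/binary>>),
    <<8:8, 0:8, Sum:16, Id:16, Seq:16, Payload/binary>>.
```

A receiver can verify a packet by running checksum/1 over the whole packet including the checksum field: the result is 0 for an intact packet.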

ICMP Ping Tunnel

We can tunnel any data we like in the payload of an ICMP packet. In this example, we'll use ICMP echo requests to tunnel an ssh connection between 2 hosts. The ICMP echo replies sent back by the peer OS ensure the data was received, like the ACK in a TCP connection.

The tunnel exports 2 functions:

  • ptun:server(ClientAddress, LocalPort) -> void()

    Types   ClientAddress = tuple()
                LocalPort = 0..65534
        ClientAddress is the IPv4 address of the peer represented as a tuple.
        The server listens on LocalPort for TCP connections and will close
        the port after a TCP client connects.  Data received on this port
        will be sent to the peer as the payload of the ICMP packets.

  • ptun:client(ServerAddress, LocalPort) -> void()

    Types   ServerAddress = tuple()
                LocalPort = 0..65534
        ServerAddress is the IPv4 address of the peer.
        When the client receives an ICMP echo request, the client opens a
        TCP connection to the LocalPort on localhost, proxying the data.
To start the tunnel, you'll need 2 hosts. In this example, 192.168.213.7 is the client and 192.168.213.119 is the server. The client forwards any tunnelled data it receives to a local SSH server. On the client:
1> ptun:client({192,168,213,119}, 22).
On the server:
1> ptun:server({192,168,213,7}, 8787).
Open another shell and start an SSH connection to port 8787 on localhost:
ssh -p 8787 localhost
$ ifconfig eth0 | awk '/inet addr/{print $2}'

Sunday, November 14, 2010

Playing with Diagrams

websequencediagrams is a site that generates diagrams, sort of like GraphViz or Diagrammr.

The site has some sample code for generating the image files programmatically. Here is an example in Erlang:

To generate an image from a description in a text file such as:
client->server: POST request containing style and message
note right of server: server generates PNG file
server->client: server returns JSON containing image name
client->server: GET request for image
server->client: server returns PNG file

Read the file into Erlang and render the description:
{ok, Bin} = file:read_file("descr.txt"),
wsd:render([{message, binary_to_list(Bin)}]).
The output looks like:

Tuesday, October 26, 2010

Fun with Raw Sockets in Erlang: Bridging

Hosts connecting to another system on the same network map the protocol address (e.g., IPv4 address) of the destination host to a hardware address (e.g., ethernet MAC address) using the ARP protocol. Clients on different networks can communicate by having a device forward the packets between networks. These devices, usually multi-homed and spanning networks, forward packets by re-writing the packet headers: for example, the ethernet header (a bridge), the IP header (a router) or the IP and TCP headers (a NAT).

An ethernet II header:

  • Preamble:56
  • Start of Frame Delimiter:8
  • Destination Host:48
  • Source Host:48
  • Type:16
  • Protocol Payload:N
  • CRC:32

Not all of these fields will be visible to processes running on the operating system.

  • The Preamble starts the Ethernet frame (not available to the OS)
  • The Start of Frame Delimiter marks the end of the preamble and the start of the destination host address field (not available to the OS)
  • The Destination Host is the 6 byte MAC address of the target host
  • The Source Host is the 6 byte MAC address of the sending host
  • The Type represents the protocol held in the payload. Common types are IPv4 (ETH_P_IP (16#0800)), ARP (ETH_P_ARP (16#0806)) and IPv6 (ETH_P_IPV6 (16#86DD)).
  • A trailing CRC checksum of the packet (frame check sequence) (usually not available to the OS)

    Ethernet frames can apparently include other trailing data, leading to trailing junk when doing packet captures.

    When capturing data off the network, it's important to properly calculate the size of the packet. For example, for an IPv4 TCP packet:

    IP length - (IP header length * 4) - (TCP offset * 4)
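The calculation above translates directly into bit syntax. This is a sketch under a few assumptions: the ethernet header has already been stripped, the packet is IPv4 carrying TCP, and the module and function names (tcp_payload, payload_size/1, payload/1) are invented for the example.

```erlang
-module(tcp_payload).
-export([payload_size/1, payload/1]).

%% Returns {combined IP and TCP header length, payload length} in bytes.
lengths(<<4:4, HL:4, _ToS:8, Len:16, _/binary>> = Packet) ->
    IPHdrLen = HL * 4,                       % IP header length is in 32-bit words
    <<_IPHdr:IPHdrLen/bytes,
      _Ports:32, _Seq:32, _Ack:32,
      Off:4, _/bits>> = Packet,              % TCP data offset, also in 32-bit words
    HdrLen = IPHdrLen + Off * 4,
    {HdrLen, Len - HdrLen}.

%% IP length - (IP header length * 4) - (TCP offset * 4)
payload_size(Packet) ->
    {_HdrLen, Size} = lengths(Packet),
    Size.

%% Extract exactly the payload, dropping any trailing junk after the packet.
payload(Packet) ->
    {HdrLen, Size} = lengths(Packet),
    <<_Hdrs:HdrLen/bytes, Payload:Size/bytes, _Junk/binary>> = Packet,
    Payload.
```

Matching on the computed size rather than taking "the rest of the binary" is what keeps the trailing ethernet padding out of the payload.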

The ethernet header has fixed size fields and so does not explicitly include a field for length. Interestingly, Ethernet 802.3 frames use the Type field for the length. According to Wikipedia, 802.3 packets can be distinguished from Ethernet II packets by:

  1. if the value of the Type field is equal to or less than 1500 (the maximum payload length in bytes), the frame is Ethernet 802.3 and the value represents the length of the payload

    The protocol of raw 802.3 frames (those without an LLC header) is always IPX.

  2. if the value of the Type field is equal to or greater than 1536, the frame is Ethernet II and the value represents the protocol type of the encapsulated packet
  3. if the value of the Type field is between 1501 and 1535, the behaviour is undefined

An Erlang binary representation of an Ethernet II frame:

<<
Dhost:6/bytes,        % destination MAC address
Shost:6/bytes,        % source MAC address
Type:16,              % protocol type, usually ETH_P_IP
Payload/binary        % protocol payload
>>
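As a sketch, decoding a captured frame and applying the Type field rules from above might look like this. The module name ether_frame and the atoms it returns are made up for illustration.

```erlang
-module(ether_frame).
-export([decode/1, kind/1]).

%% Split a raw frame (as seen by the OS, without preamble or CRC)
%% into its header fields and payload.
decode(<<Dhost:6/bytes, Shost:6/bytes, Type:16, Payload/binary>>) ->
    {kind(Type), Dhost, Shost, Type, Payload}.

%% Classify the frame using the value of the Type field.
kind(Type) when Type =< 1500 -> ieee8023;    % the value is a length
kind(Type) when Type >= 1536 -> ethernet2;   % the value is a protocol type
kind(_) -> undefined.
```

For example, a frame with Type 16#0800 decodes as ethernet2 with an IPv4 packet as its payload.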

Using Erlang to Bridge Packets

To go along with the ARP poisoning, we need a sort of "one armed bridging" to forward packets from our spoofing host to the real destinations. Once we have the raw ethernet frames, doing the bridging is quite simple. For the complete code, see herp on GitHub (yes, the herp is what you get when you've been promiscuous. I hear like 80% of adults have it).

I won't include much of the code here because it's so trivial. The bridging process captures packets off the network and pattern matches on the headers:

filter(#ether{shost = MAC}, _, #state{mac = MAC}) ->
    ok;     % frame sent by us: ignore
filter(#ether{type = ?ETH_P_IP}, Packet, State) ->
    {#ipv4{daddr = DA}, _} = epcap_net:ipv4(Packet),
    filter1(DA, Packet, State);
filter(_, _, _) ->
    ok.     % not IPv4: ignore

filter1(IP, _, #state{ip = IP}) ->
    ok;     % addressed to our own IP: nothing to bridge
filter1(IP, Packet, #state{gw = GW}) ->
    MAC = case packet:arplookup(IP) of
        false -> GW;
        {M1,M2,M3,M4,M5,M6} -> <<M1,M2,M3,M4,M5,M6>>
    end,
    bridge(MAC, Packet).

  1. Check if the frame was sent from our MAC address and, if so, ignore it
  2. Retrieve the IP address from the frame and check the system ARP cache. A real bridge would monitor the network for ARP packets and cache the results.
  3. If the IP exists in our ARP cache, use the MAC address, otherwise, send it to the gateway (it's possible the IP address does not exist in the system ARP cache and will be wrongly forwarded to the gateway)

If the frame should be bridged, we create a new frame with the source hardware address set to our host's MAC address. The IP header and payload are not touched.

handle_call({packet, DstMAC, Packet}, _From, #state{
        mac = MAC,
        s = Socket,
        i = Ifindex} = State) ->

    Ether = epcap_net:ether(#ether{
            dhost = DstMAC,
            shost = MAC,
            type = ?ETH_P_IP
        }),

    packet:send(Socket, Ifindex, list_to_binary([Ether, Packet])),
    {reply, ok, State};
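The same rewrite can be sketched on a raw frame without any records: only the first 12 bytes change and everything after them is copied untouched. The module bridge_rewrite and its function are hypothetical and are not part of herp.

```erlang
-module(bridge_rewrite).
-export([rewrite/3]).

%% Replace the destination and source MAC addresses of a raw frame.
%% The type field and the payload are passed through unchanged.
rewrite(<<_Dhost:6/bytes, _Shost:6/bytes, Rest/binary>>, DstMAC, SrcMAC)
        when byte_size(DstMAC) =:= 6, byte_size(SrcMAC) =:= 6 ->
    <<DstMAC/binary, SrcMAC/binary, Rest/binary>>.
```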

It'd be interesting to experiment with making herp into a traditional network bridge: quite likely very slow, but also redundant, distributed and fault tolerant.

Monday, October 25, 2010

Fun with Raw Sockets in Erlang: ARP Poisoning

On IP ethernet networks, hosts use a peer to peer method called ARP (address resolution protocol) to discover the hardware address of the peer with which they intend to communicate.

IPv4 ethernet ARP packets are specified as:
  • Hardware Type:16
  • Protocol Type:16
  • Hardware Length:8
  • Protocol Length:8
  • Operation:16
  • Sending Hardware Address:48
  • Sending IP Address:32
  • Target Hardware Address:48
  • Target IP Address:32

The numbers after the colon represent the size in bits of the field.

  • The Hardware Type of the network is ethernet, so the value is set to ARPHRD_ETHER (1)
  • The Protocol Type of the network is IPv4, so the value is set to ETH_P_IP (0x0800)
  • The Hardware Length of an ethernet MAC address is 6 bytes
  • The Protocol Length of an IPv4 address is 4 bytes
  • Operation is usually an ARP request (ARPOP_REQUEST (1)) or reply (ARPOP_REPLY (2))
  • The Sending Hardware Address is the MAC address of the host sending the ARP packet
  • The Sending IP Address is the IPv4 address of the host sending the ARP packet
  • The Target Hardware Address is the MAC address of the host receiving the ARP packet

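    In an ARP request the target hardware address is not yet known, so the field is usually zeroed (00:00:00:00:00:00) or set to the ethernet broadcast address (FF:FF:FF:FF:FF:FF). All hosts receive the request because the enclosing ethernet frame is sent to the broadcast address.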

  • The Target IP Address is the IPv4 address of the host receiving the ARP packet

The corresponding Erlang representation of an IPv4 ethernet ARP packet is:

<<Hrd:16, Pro:16,
Hln:8, Pln:8, Op:16,
Sha:48, Sip:32,
Tha:48, Tip:32>>

Behaviour of the ARP Cache

ARP caches are key/value stores holding a mapping of the protocol address to the hardware address. To prevent caching of stale data, entries eventually expire. The expiry timeout varies; for example, on MS Windows, arp entries are kept for 2 minutes if another session to the remote host is not initiated. If a session is initiated within the 2 minute period, the ARP cache expiry time is extended to 10 minutes.

ARP is opportunistic and trust-based. If a host sees an ARP request or reply for which it is not the target, the host may cache the information. However, caching all requests would be pointless, since arp lookup would be slow on a network with a large number of peers with which the host might never communicate.

Gratuitous ARPs

ARPs are gratuitous when no request was made for the information. Gratuitous ARPs are useful for:
  • discovering IP conflict
  • IP take over: in a high availability cluster of servers, one of the hosts is active (holding the service IP address). In the event of a failure of the active node, one of the slave nodes can assume the service IP address, sending a gratuitous ARP to inform the other nodes and the gateway that the IP address is associated with a new MAC address

An Erlang binary representing a gratuitous ARP looks like:

<<
1:16,                                 % hardware type
16#0800:16,                           % protocol type
6:8,                                  % hardware length
4:8,                                  % protocol length
2:16,                                 % operation: ARPOP_REPLY
0,1,2,3,4,5,                          % sending MAC address
192,168,1,100,                        % sending IPv4 address
16#FF,16#FF,16#FF,16#FF,16#FF,16#FF,  % target MAC address: ethernet broadcast
192,168,1,100                         % target IPv4 address: set to sending address
>>

Behaving badly by gratuitously arp'ing for all the IP addresses on the network will DoS the other hosts, eventually forcing them to report a network error condition and go offline.
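
To tie the field list and the gratuitous ARP together, here is a small sketch that builds the 28 byte packet for any MAC and IPv4 address and decodes it again. The module arp_sketch and its function names are invented for this example; the constants are the ones listed above.

```erlang
-module(arp_sketch).
-export([gratuitous/2, decode/1]).

-define(ARPHRD_ETHER, 1).
-define(ETH_P_IP, 16#0800).
-define(ARPOP_REPLY, 2).

%% Gratuitous ARP reply: the sender announces its own IP address, so the
%% target IP is set to the sending IP and the target MAC to broadcast.
gratuitous(<<_:48>> = Sha, {A,B,C,D}) ->
    <<?ARPHRD_ETHER:16, ?ETH_P_IP:16, 6:8, 4:8, ?ARPOP_REPLY:16,
      Sha/binary, A,B,C,D,
      16#FF,16#FF,16#FF,16#FF,16#FF,16#FF,
      A,B,C,D>>.

%% Decode an IPv4 ethernet ARP packet into a property list.
decode(<<Hrd:16, Pro:16, Hln:8, Pln:8, Op:16,
         Sha:6/bytes, S1,S2,S3,S4,
         Tha:6/bytes, T1,T2,T3,T4>>) ->
    [{hrd, Hrd}, {pro, Pro}, {hln, Hln}, {pln, Pln}, {op, Op},
     {sha, Sha}, {sip, {S1,S2,S3,S4}},
     {tha, Tha}, {tip, {T1,T2,T3,T4}}].
```

Note the packet still needs an ethernet header in front of it before it can be written to a PF_PACKET socket.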

Sending an ARP Reply

To send an ARP packet from Erlang, we'll use the procket module on GitHub. The functions in procket used for these examples are unfortunately Linux-only. For this example, the binaries are manually specified. The epcap_net module on GitHub has convenience functions for creating and decomposing ARP packets into a record structure. If the network was set up like:
  • arping Erlang node:
  • source host:
  • target host (doesn't exist):
Log in to another host on your network (the source node) and run tcpdump:
tcpdump -n -e arp
In another window, run the code:
$ erl -pa /path/to/procket/ebin
Erlang R14B01 (erts-5.8.2) [source] [smp:2:2] [rq:2] [async-threads:0] [hipe] [kernel-poll:false]

Eshell V5.8.2  (abort with ^G)
1> carp:send({10,11,11,11}). % Use an address on your network
If the MAC address of the Erlang node is 00:aa:bb:cc:dd:ee, then on the other host you should see something like:
00:59:35.051302 00:aa:bb:cc:dd:ee > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: arp reply is-at 00:aa:bb:cc:dd:ee
Check the ARP cache on the source node:
arp -an
The ARP entry may not exist. Since ARP caching is opportunistic, it is up to a host to decide whether it will optimize future connections by caching an unsolicited entry. To force an ARP cache entry on the remote host, ping the fake IP address:
Then, in the Erlang shell, run carp:send/1. Run "arp -an" again on the remote host. The ARP cache entry for should now be there.
? ( at 00:aa:bb:cc:dd:ee [ether] on eth0
If you run tcpdump on the host doing the ARP'ing, you should see ICMP traffic for
01:13:36.075040 IP > ICMP echo request, id 35604, seq 17, length 64
01:13:36.088794 IP > ICMP echo request, id 35604, seq 18, length 64
01:13:36.106572 IP > ICMP echo request, id 35604, seq 19, length 64
Since this IP address is not bound to any interface on your host, there will, of course, be no reply.

Poisoning the ARPs

On old networks using hubs or open networks using 802.11 access points, data is broadcast to all the hosts. Running a packet sniffer displays the network activity for everyone, not just yourself.

Switched networks learn the MAC address of the host connected to the port of the switch and send data only to the recipient. ARP poisoning or spoofing fools other hosts on the network into sending data to the host under your control. If your host spoofs the gateway, you'll also see the internet traffic. Obviously, if your host is not somehow routing, the packets will go nowhere, performing a denial of service on your network. But you will be able to see the data other hosts are sending.

Using the code we ran earlier, we can test ARP spoofing. You'll need 3 hosts: one doing the poisoning and two victims, a source and a target. The process is the same as above. For example, if were a real host, we would have convinced to send its data through our host.

I've put my experiment in progress with ARP poisoning, farp, on GitHub. farp works by replying to ARP requests with our host's MAC address. It can also optionally send out gratuitous arps as the gateway to speed up the process.

Sunday, October 17, 2010

Setting Parity on DES Keys

DES keys are 8 bytes, of which only 7 bits of each byte are used for the key. The least significant bit is used for parity and is thrown away for the actual encryption/decryption (resulting in a 56-bit key for single DES). Mostly, it seems, the parity is ignored by DES implementations, but occasionally a system using DES will check the parity and reject the key if the parity is not odd (or so Google tells me, I've never actually seen this happen). The parity bit was intended to prevent corruption or tampering with the key.

The DES parity calculation works as follows:
00001001 = 9
  • for each byte, count the number of bits that are set. For the example byte above, 2 bits are set
  • if the number of bits set is odd, do nothing
  • if the number of bits is even, set or unset the least significant bit to make the count odd
In the example, the new value for the byte would be
00001000 = 8
Reading through Erlang's crypto support for DES and DES3, it's up to the caller to use a valid key. For example, the NIF making up crypto:des_cbc_crypt/3 is defined as:
DES_set_key((const_DES_cblock*), &schedule);
DES_ncbc_encrypt(, enif_make_new_binary(env, text.size, &ret),
    text.size, &schedule, &ivec_clone, (argv[3] == atom_true));
DES_set_key() is an OpenSSL compatibility function that, in this case, is identical to DES_set_key_unchecked(). The corresponding checking function, DES_set_key_checked(), returns some information about the key: -1 (if the parity is even) and -2 (if the key is weak).

So I was curious how to go about setting the parity in a functional language. It turns out to be quite easy:

-export([set_parity/1, check_parity/1, odd_parity/1]).

set_parity(Key) ->
    << <<(check_parity(N))>> || <<N>> <= Key >>.

check_parity(N) ->
    case odd_parity(N) of
        true -> N;
        false -> N bxor 1
    end.

odd_parity(N) ->
    Set = length([ 1 || <<1:1>> <= <<N>> ]),
    Set rem 2 == 1.

set_parity/1 uses a binary comprehension to read 1 byte at a time from the 8 byte key.

check_parity/1 checks whether an integer has an odd or even parity and returns the integer XOR'ed with 1 if the parity is even.

odd_parity/1 counts the bits set by using a bit string generator in a list comprehension to build a list with one element per set bit. The length of this list modulo 2 tells us whether the count is odd or even.

To test if this is correct, we can check if a few cases (all even bits or all odd bits in a key) work and then test that a key with corrected parity produces the same cipher text as the uncorrected key:

test() ->
    <<1,1,1,1,1,1,1,1>> = set_parity(<<0:(8*8)>>),
    <<1,1,1,1,1,1,1,1>> = set_parity(<<1,1,1,1,1,1,1,1>>),
    K1 = <<"Pa5Sw0rd">>,
    K2 = set_parity(K1),
    Enc = crypto:des_cbc_encrypt(K1, <<0:64>>, "12345678"),
    Enc = crypto:des_cbc_encrypt(K2, <<0:64>>, "12345678"),
    <<"12345678">> = crypto:des_cbc_decrypt(K2, <<0:64>>, Enc),
    ok.

Tuesday, August 31, 2010

Dumping Payloads with epcap and procket

epcap and procket allow Erlang code to sniff data off of a network. With either, once the packets are in the Erlang VM, manipulating the contents is straightforward using pattern matching.

Which to use?

epcap and procket sort of overlap in functionality. The main differences, at the moment, are:
  • portability

    epcap: should work on any Unix with pcap installed

    procket: for sniffing, procket uses Linux's PF_PACKET socket option, so Linux only. I plan to add support for BPF someday, so maybe in the future procket will support BSD as well.

  • safety

    epcap: runs as a separate system process. Any bugs in epcap will not affect the Erlang VM.

    procket: linked into the Erlang VM using the NIF interface. Bugs may stall or crash the VM.

  • packet generation

    epcap: can only sniff packets

    procket: can generate whole packets. Again, currently Linux only, but should work under the BSDs, like Mac OS X, when BPF is supported.

    Raw sockets (for example, generating ICMP echo packets) work under BSD as well.

    In fact, it should be possible to combine the power of procket, epcap, and BSD to send and receive arbitrary TCP or UDP packets now (since TCP/UDP raw sockets can send data only, we need to use epcap to sniff the response).

  • filtering

    epcap: packet filtering rules are processed in C, either in the kernel or in a library.

    procket: all packets are received and must be filtered by an Erlang process

Decapsulating Packets

procket and epcap have different ways of being started and reading packets but once the raw packets are received by an Erlang process, they can be decapsulated with a small module ("epcap_net.erl") distributed with epcap.

Say, for example, we wanted to monitor http requests and write out a file containing just the client side:

We request a raw socket from procket and then loop, polling the socket every 10ms for data. We look for established connections by matching packets with only the ACK bit set (we ignore connections in progress) and spawn another process to accumulate the data. When we see a packet indicating a connection has been closed, we tell the spawned process to write out its state to a file. The spawned process will also terminate if it reaches an arbitrary timeout.

You've probably noticed that this code mimics some of the functionality of OTP behaviours. I wrote it this way for simplicity, but it certainly could be more compactly (and elegantly) written as a gen_server or a gen_fsm.
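
The flag matching described above can be sketched with plain bit syntax on the raw TCP header. This is an illustration only: the module tcp_flags, the function classify/1 and the atoms it returns are my own names, and the real code works on the records produced by epcap_net instead.

```erlang
-module(tcp_flags).
-export([classify/1]).

%% Classify a raw TCP header (the IP header already removed) by its flags.
classify(<<_SPort:16, _DPort:16, _Seq:32, _Ack:32, _Off:4, _Reserved:4,
           _CWR:1, _ECE:1, _URG:1, ACK:1, PSH:1, RST:1, SYN:1, FIN:1,
           _/binary>>) ->
    case {ACK, PSH, RST, SYN, FIN} of
        {1, 0, 0, 0, 0} -> established;  % only the ACK bit is set
        {_, _, 1, _, _} -> closed;       % connection reset
        {_, _, _, _, 1} -> closed;       % connection finished
        {_, _, _, 1, _} -> connecting;   % handshake in progress: ignored
        _ -> data                        % for example PSH and ACK together
    end.
```

A capture loop would spawn the accumulating process on established, feed it on data and tell it to write out its state on closed.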

Matching on Payloads

A similar example with epcap: match on all http requests and write the complete transaction to a file based on the etag. epcap doesn't require polling as with procket. Instead, messages are received from the port similar to gen_tcp in {active, true} mode. The message contains some additional information about the length and time of the captured packet. For this example, we ignore it. We're only interested in the packet contents.

Similar to the procket example, we loop, blocking in receive. When data is received, we check if the connection is in the established state, spawning a process to accumulate the data if we haven't seen this session before.

Finally, we write out the data to the file system when the connection is closed, using the value of the "ETag" header for the file name. For succinctness, I used a regular expression to match on the payload. Probably better to write a parser.

Thanks, Zabrane, for suggesting this post!

Sunday, July 4, 2010

DNS Programming with Erlang

I have this strange fascination with DNS. By which I mean loathing. Yet somehow I've already written 3 small DNS servers in Erlang:
  • emdns: An unfinished multicast DNS server with unspecified yet no doubt awesome features. Someday I'll finish it. Maybe.
  • spood: A strange, little program; a spoofing DNS proxy that will send out your DNS requests from somebody else's IP address and sniff the responses. Maybe (if you're somewhat sketchy) you could use it to hide your DNS lookups. Maybe, you could use it to ramp up your DNS requests on networks that throttle them down. Not that I would do any of that.
  • seds: a DNS server that tunnels TCP/IP. I'm typing this blog over a DNS tunnel right now, stress testing it (with my blazing fast ASCII input) and trying to make seds crash. Also stress testing my patience.
Since the programmatic interfaces to DNS in Erlang are mostly undocumented, I thought I'd go over them briefly. So I'll remember how to use them if I ever finish emdns. I figured out how they worked mainly by reading the source and dumping DNS packets to see the record structures.

The DNS parsing functions are kept in lib/kernel/src/inet_dns.erl. Pretty much the only functions that you will need from this module are encode/1 and decode/1. The tricky part is passing in the appropriate data structures.

  • decode/1 takes a binary and returns {ok, #dns_rec{}} or {error, fmt} if the DNS payload cannot be decoded
  • encode/1 as you might expect, does the inverse, taking an appropriate record and returning a binary

The record structure is defined in lib/kernel/src/inet_dns.hrl.

-record(dns_rec, {
        header,       %% dns_header record
        qdlist = [],  %% list of question entries
        anlist = [],  %% list of answer entries
        nslist = [],  %% list of authority entries
        arlist = []   %% list of resource entries
    }).

  • The DNS header is another record:

-record(dns_header, {
         id = 0,       %% ushort query identification number
         %% byte F0
         qr = 0,       %% :1   response flag
         opcode = 0,   %% :4   purpose of message
         aa = 0,       %% :1   authoritive answer
         tc = 0,       %% :1   truncated message
         rd = 0,       %% :1   recursion desired
         %% byte F1
         ra = 0,       %% :1   recursion available
         pr = 0,       %% :1   primary server required (non standard)
                       %% :2   unused bits
         rcode = 0     %% :4   response code
    }).

    While the defaults are initialized to small integers, inet_dns replaces them with atoms. So, the 1 bit values are either the atoms 'true' or 'false' and the opcode is set to an atom, for example, 'query'. Both integers and the atom representations are usually accepted by the functions though.

  • qdlist is a list of DNS query records:

-record(dns_query, {
         domain,     %% query domain
         type,        %% query type
         class      %% query class
    }).

    • domain is a string representing the domain name, e.g., ""
    • type is an atom describing the DNS type: a, cname, txt, null, srv, ns, ...
    • class will most commonly be 'in' (Internet), though multicast DNS uses "cache flush" (32769) for some operations
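
The ":1" and ":4" comments in the header record are bit widths, and they map one to one onto Erlang's bit syntax. As a sketch that does not depend on inet_dns at all, here is a hand-rolled query encoder and header decoder. The module dns_hdr and its functions are illustrative names, and the domain is passed as a binary rather than a string to keep the example short.

```erlang
-module(dns_hdr).
-export([encode_query/2, decode/1]).

%% Encode a recursive A/IN query for Domain, e.g. <<"example.com">>.
encode_query(Id, Domain) when is_binary(Domain) ->
    Labels = << <<(byte_size(L)):8, L/binary>>
                || L <- binary:split(Domain, <<".">>, [global]) >>,
    <<Id:16,
      0:1, 0:4, 0:1, 0:1, 1:1,      % byte F0: qr, opcode, aa, tc, rd
      0:1, 0:1, 0:2, 0:4,           % byte F1: ra, pr, unused, rcode
      1:16, 0:16, 0:16, 0:16,       % question, answer, authority, resource counts
      Labels/binary, 0:8,           % length-prefixed labels, ended by the root label
      1:16, 1:16>>.                 % type A, class IN

%% Decode the 12 byte header of any DNS message into a property list.
decode(<<Id:16,
         QR:1, Opcode:4, AA:1, TC:1, RD:1,
         RA:1, PR:1, _:2, Rcode:4,
         QDCount:16, ANCount:16, NSCount:16, ARCount:16, _/binary>>) ->
    [{id, Id}, {qr, QR =:= 1}, {opcode, Opcode}, {aa, AA =:= 1},
     {tc, TC =:= 1}, {rd, RD =:= 1}, {ra, RA =:= 1}, {pr, PR =:= 1},
     {rcode, Rcode}, {qdcount, QDCount}, {ancount, ANCount},
     {nscount, NSCount}, {arcount, ARCount}].
```

Dumping a packet through decode/1 is a quick way to see which header flags a nameserver changed in its reply.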

Making a valid Erlang DNS query would look something like:


-module(dns).
-export([q/2]).

-include_lib("kernel/src/inet_dns.hrl").

q(Domain, NS) ->
    Query = inet_dns:encode(
        #dns_rec{
            header = #dns_header{
                id = crypto:rand_uniform(1,16#FFFF),
                opcode = 'query',
                rd = true
            },
            qdlist = [#dns_query{
                domain = Domain,
                type = a,
                class = in
            }]
        }),
    {ok, Socket} = gen_udp:open(0, [binary, {active, false}]),
    gen_udp:send(Socket, NS, 53, Query),
    {ok, {NS, 53, Reply}} = gen_udp:recv(Socket, 65535),
    ok = gen_udp:close(Socket),
    inet_dns:decode(Reply).

I enabled recursion because the request will be going through one of the public Google nameservers (8.8.8.8) instead of going directly to the authoritative nameserver.

Testing the results:
$ erl
Erlang R14A (erts-5.8) [source] [smp:2:2] [rq:2] [async-threads:0] [hipe] [kernel-poll:false]

Eshell V5.8  (abort with ^G)
1> {ok, Q} = dns:q("", {8,8,8,8}).
                      {216,239,32,21},                      undefined,[],false},
2> rr("/usr/local/lib/erlang/lib/kernel-2.14/src/inet_dns.hrl").
3> Q.
#dns_rec{header = #dns_header{id = 7296,qr = true,
                              opcode = 'query',aa = false,tc = false,rd = true,ra = true,
                              pr = false,rcode = 0},
         qdlist = [#dns_query{domain = "",
                              type = a,class = in}],
         anlist = [#dns_rr{domain = "",
                           type = a,class = in,cnt = 0,ttl = 656,
                           data = {216,239,32,21},
                           tm = undefined,bm = [],func = false},
                   #dns_rr{domain = "",type = a,
                           class = in,cnt = 0,ttl = 656,
                           data = {216,239,34,21},
                           tm = undefined,bm = [],func = false},
                   #dns_rr{domain = "",type = a,
                           class = in,cnt = 0,ttl = 656,
                           data = {216,239,36,21},
                           tm = undefined,bm = [],func = false},
                   #dns_rr{domain = "",type = a,
                           class = in,cnt = 0,ttl = 656,
                           data = {216,239,38,21},
                           tm = undefined,bm = [],func = false}],
         nslist = [],arlist = []}
The records are displayed as tuples. You can pretty print them by using the shell rr() command to load the record definitions from the header file, wherever it is on your system.

The query returned the same packet we sent with some changes to the header:
  • The response flag (qr) is set to true
  • The recursion available flag (ra) is also set to true
The answer to our query is a list bound to the anlist field of the record. The #dns_rr{} record looks like:
-record(dns_rr, {
     domain = "",   %% resource domain
     type = any,    %% resource type
     class = in,    %% resource class
     cnt = 0,       %% access count
     ttl = 0,       %% time to live
     data = [],     %% raw data
     tm,            %% creation time
     bm = [],       %% bitmap storing domain character case information
     func = false   %% optional function calculating the data field
}).
The data field is interesting. Although it's initialized as an empty list, the data structure bound to it depends on the DNS record type. For example, from the ones I remember:
  • A: tuple representing the IP address
  • TXT: a list of strings
  • NULL: a binary
  • CNAME: a domain name string appropriately "labelled" (canonicalized by the "."'s), e.g., "". inet_dns takes care of converting the domain name into the compressed wire format. This is a weird form where the "."'s disappear, each component is prefaced by its length, the name ends with a null byte, and a component can instead be a pointer redirecting to an earlier occurrence of the name in the packet (hence the compression).
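To make the label format concrete, here is a minimal sketch of the uncompressed encoding only. The module name dnslabel is made up, there is no pointer compression, and there are no checks on label length (labels are limited to 63 bytes on the wire), so this is an illustration and not what inet_dns does internally:

```erlang
-module(dnslabel).
-export([encode/1]).

%% Encode a dotted domain name as a sequence of length-prefixed labels,
%% terminated by a zero byte (the empty root label).
%% Assumes every label is shorter than 64 bytes.
encode(Domain) ->
    Labels = string:tokens(Domain, "."),
    iolist_to_binary([[<<(length(L)):8>>, L] || L <- Labels] ++ [<<0>>]).
```

For example, "www.example.com" becomes the bytes 3, "www", 7, "example", 3, "com", 0.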

Pattern Matching

The cool thing is that, since the DNS records are nested records, it's very easy to pattern match on the results. Modifying the example above:


-module(dns1).
-export([q/3]).

-include_lib("kernel/src/inet_dns.hrl").

q(Type, Domain, NS) ->
    Query = inet_dns:encode(
        #dns_rec{
            header = #dns_header{
                id = crypto:rand_uniform(1,16#FFFF),
                opcode = 'query',
                rd = true
            },
            qdlist = [#dns_query{
                    domain = Domain,
                    type = Type,
                    class = in
                }]
        }),
    {ok, Socket} = gen_udp:open(0, [binary, {active, true}]),
    gen_udp:send(Socket, NS, 53, Query),
    loop(Socket, Type, Domain, NS).

% Wait for the reply from the nameserver we queried
loop(Socket, Type, Domain, NS) ->
    receive
        {udp, Socket, NS, _, Packet} ->
            {ok, Response} = inet_dns:decode(Packet),
            match(Type, Domain, Response)
    end.

% The first answer must be of the same type and for the same domain as the query
match(a, Domain, #dns_rec{
        header = #dns_header{
            qr = true,
            opcode = 'query'
        },
        qdlist = [#dns_query{
                domain = Domain,
                type = a,
                class = in
            }],
        anlist = [#dns_rr{
                domain = Domain,
                type = a,
                class = in,
                data = {IP1, IP2, IP3, IP4}
            }|_]}) ->
    {a, Domain, {IP1,IP2,IP3,IP4}};
match(cname, Domain, #dns_rec{
        header = #dns_header{
            qr = true,
            opcode = 'query'
        },
        qdlist = [#dns_query{
                domain = Domain,
                type = cname,
                class = in
            }],
        anlist = [#dns_rr{
                domain = Domain,
                type = cname,
                class = in,
                data = Data
            }|_]}) ->
    {cname, Domain, Data}.

And the results:

$ erl
Erlang R14A (erts-5.8) [source] [smp:2:2] [rq:2] [async-threads:0] [hipe] [kernel-poll:false]

Eshell V5.8 (abort with ^G)
1> dns1:q(cname, "", {8,8,8,8}).
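Record patterns also work as filters in list comprehensions, which is handy when you want every answer rather than just the first one. The following is a sketch only: the module name dnsanswers is made up, and the record is a local copy of the #dns_rr{} fields listed earlier so that the example compiles without inet_dns.hrl (real code would include the header instead).

```erlang
-module(dnsanswers).
-export([addresses/1]).

%% Local copy of the dns_rr record fields, for illustration only.
-record(dns_rr, {domain = "", type = any, class = in, cnt = 0, ttl = 0,
                 data = [], tm, bm = [], func = false}).

%% Collect the address of every A record in an answer list.
%% Elements that do not match the record pattern are silently skipped.
addresses(Answers) ->
    [IP || #dns_rr{type = a, class = in, data = {_,_,_,_} = IP} <- Answers].
```

Calling addresses/1 on the anlist of a decoded response returns the IP tuples of the A records and ignores CNAME or other record types.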

Thursday, July 1, 2010

Fun with Raw Sockets in Erlang: Finding MAC and IP Addresses

(See the update for versions of some of these functions in standard Erlang).

When working with PF_PACKET raw sockets, the caller needs to provide the source/destination MAC and IP addresses.
Playing with a spoofing DNS proxy, I got tired of hardcoding the addresses, then WTF'ing every time I switched networks. So I added some functions to procket to look up the system network interface and its MAC and IP addresses.

Retrieving the MAC Address of an Interface

Under Linux, getting the MAC address of an interface involves calling an ioctl() with the request set to SIOCGIFHWADDR and passing in a struct ifreq.

Here is the code to do so in C:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <err.h>

#include <sys/types.h>
#include <sys/socket.h>
#include <sys/ioctl.h>

#include <net/if.h>
#include <netinet/ether.h>

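int
main(int argc, char *argv[])
{
    int s = -1;

    struct ifreq ifr = {0};
    char *dev = NULL;
    struct sockaddr *sa;

    dev = strdup((argc == 2 ? argv[1] : "eth0"));

    if (dev == NULL)
        err(EXIT_FAILURE, "strdup");

    if ( (s = socket(AF_INET, SOCK_DGRAM, 0)) < 0)
        err(EXIT_FAILURE, "socket");

    /* ifr is zeroed, so the copied name stays NUL terminated */
    (void)strncpy(ifr.ifr_name, dev, sizeof(ifr.ifr_name)-1);

    if (ioctl(s, SIOCGIFHWADDR, &ifr) < 0)
        err(EXIT_FAILURE, "ioctl");

    sa = (struct sockaddr *)&ifr.ifr_hwaddr;

    /* sa_data is a char array: cast to avoid sign extension when printing */
    (void)printf("%02x:%02x:%02x:%02x:%02x:%02x\n",
            (unsigned char)sa->sa_data[0], (unsigned char)sa->sa_data[1],
            (unsigned char)sa->sa_data[2], (unsigned char)sa->sa_data[3],
            (unsigned char)sa->sa_data[4], (unsigned char)sa->sa_data[5]);

    free(dev);
    (void)close(s);

    exit (EXIT_SUCCESS);
}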

The equivalent in Erlang uses procket:ioctl/3:
macaddress(Socket, Dev) ->
    {ok, <<_Ifname:16/bytes,
        ?PF_INET:16,                % family
        SM1,SM2,SM3,SM4,SM5,SM6,    % mac address
        _/binary>>} = procket:ioctl(Socket,
        ?SIOCGIFHWADDR,             % macro assumed to hold the ioctl request value
        list_to_binary([
                Dev, <<0:((15*8) - (length(Dev)*8)), 0:8, 0:128>>
            ])),
    {SM1,SM2,SM3,SM4,SM5,SM6}.
Results may differ depending on the endian-ness of your platform.
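The endianness caveat comes from the way Erlang's bit syntax encodes integers: unless told otherwise, a 16-bit integer is written big-endian, while the kernel writes the address family in host byte order. A small sketch (the module name endian is made up) shows the difference for a family value such as 2:

```erlang
-module(endian).
-export([family/2]).

%% Encode a 16-bit address family value with an explicit byte order.
%% Leaving out the endianness flag in a binary is the same as 'big'.
family(Value, big)    -> <<Value:16/big>>;
family(Value, little) -> <<Value:16/little>>;
family(Value, native) -> <<Value:16/native>>.
```

On a little-endian machine, family(2, native) yields the same bytes as family(2, little), so a pattern written without /native will not match what the kernel returns there.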

Retrieving the IP Address of an Interface

The IP address of an interface can be obtained by another ioctl() with a request value of SIOCGIFADDR. In C:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <err.h>

#include <sys/types.h>
#include <sys/socket.h>
#include <sys/ioctl.h>

#include <netinet/in.h>
#include <arpa/inet.h>

#include <net/if.h>

int
main(int argc, char *argv[])
{
    int s = -1;

    struct ifreq ifr = {0};
    char *dev = NULL;
    struct sockaddr_in *sa;

    dev = strdup((argc == 2 ? argv[1] : "eth0"));

    if (dev == NULL)
        err(EXIT_FAILURE, "strdup");

    if ( (s = socket(AF_INET, SOCK_DGRAM, 0)) < 0)
        err(EXIT_FAILURE, "socket");

    /* ifr is zeroed, so the copied name stays NUL terminated */
    (void)strncpy(ifr.ifr_name, dev, sizeof(ifr.ifr_name)-1);
    ifr.ifr_addr.sa_family = AF_INET;

    if (ioctl(s, SIOCGIFADDR, &ifr) < 0)
        err(EXIT_FAILURE, "ioctl");

    /* SIOCGIFADDR returns the address in ifr_addr */
    sa = (struct sockaddr_in *)&ifr.ifr_addr;

    (void)printf("%s\n", inet_ntoa(sa->sin_addr));

    free(dev);
    (void)close(s);

    exit (EXIT_SUCCESS);
}

And the Erlang version:
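ipv4address(Socket, Dev) ->
    {ok, <<_Ifname:16/bytes,
        ?PF_INET:16/native, % sin_family
        _:16,               % sin_port
        SA1,SA2,SA3,SA4,    % sin_addr
        _/binary>>} = procket:ioctl(Socket,
        ?SIOCGIFADDR,       % macro assumed to hold the ioctl request value
        list_to_binary([
                Dev, <<0:((15*8) - (length(Dev)*8)), 0:8>>,
                <<?PF_INET:16/native,       % family
                0:112>>                     % rest of the 16-byte sockaddr
            ])),
    {SA1,SA2,SA3,SA4}.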
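The `<<0:((15*8) - (length(Dev)*8)), 0:8>>` expression in the Erlang versions pads the interface name with NULs so that it fills the 16-byte ifr_name field of struct ifreq. Pulled out into a helper, the idea might look like the sketch below (the module name ifname and the IFNAMSIZ macro are introduced here for illustration; 16 is the field size on Linux):

```erlang
-module(ifname).
-export([pad/1]).

-define(IFNAMSIZ, 16).  % size of ifr_name on Linux

%% Pad an interface name with NULs to IFNAMSIZ bytes. The guard rejects
%% names that would leave no room for the terminating NUL.
pad(Dev) when is_list(Dev), length(Dev) < ?IFNAMSIZ ->
    Pad = (?IFNAMSIZ - length(Dev)) * 8,
    list_to_binary([Dev, <<0:Pad>>]).
```

A name of 16 characters or more fails with a function_clause error instead of building an oversized buffer.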

Looking Up an IP Address in the ARP Cache

ARP cache lookups can be done by using:
ioctl(socket, SIOCGARP, struct arpreq);
But utilities on Linux just seem to parse /proc/net/arp:
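-define(HWADDR_OFF, 4).     % "HW address" is the 4th column of /proc/net/arp

arplookup({IP1,IP2,IP3,IP4}) ->
    {ok, FD} = file:open("/proc/net/arp", [read,raw]),
    % The trailing space keeps 10.0.0.1 from matching the entry for 10.0.0.10
    arploop(FD, inet_parse:ntoa({IP1,IP2,IP3,IP4}) ++ " ").

arploop(FD, Address) ->
    case file:read_line(FD) of
        eof ->
            file:close(FD),
            not_found;      % no entry for this address
        {ok, Line} ->
            case lists:prefix(Address, Line) of
                true ->
                    file:close(FD),
                    M = string:tokens(
                        lists:nth(?HWADDR_OFF, string:tokens(Line, " \n")), ":"),
                    list_to_tuple([ erlang:list_to_integer(E, 16) || E <- M ]);
                false -> arploop(FD, Address)
            end
    end.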
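The parsing step can also be separated from the file handling, which makes it easy to try out on a sample line. This is a sketch under the assumption that the columns of /proc/net/arp are IP address, HW type, flags and HW address, in that order; the module name arpline and the return values are made up:

```erlang
-module(arpline).
-export([parse/1]).

%% Parse one line of /proc/net/arp into {IPTuple, MACTuple}.
%% Lines that do not start with an IP address (such as the column
%% header) return 'skip'.
parse(Line) ->
    case string:tokens(Line, " \n") of
        [IP, _HWType, _Flags, HWAddr | _] ->
            case inet_parse:address(IP) of
                {ok, Addr} -> {Addr, mac(HWAddr)};
                {error, _} -> skip
            end;
        _ ->
            skip
    end.

%% "00:1a:2b:3c:4d:5e" -> {0,26,43,60,77,94}
mac(HWAddr) ->
    list_to_tuple([list_to_integer(E, 16) || E <- string:tokens(HWAddr, ":")]).
```

Matching on the parsed address tuple rather than on a string prefix avoids confusing addresses that share leading digits.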

Getting a List of Interfaces

To get the list of interfaces on a system, yet another ioctl() is used, this time passing in SIOCGIFCONF and this structure as arguments:
struct ifconf
{
    int ifc_len;            /* Size of buffer.  */
    union
    {
        __caddr_t ifcu_buf;
        struct ifreq *ifcu_req;
    } ifc_ifcu;
};
Here is an example of retrieving the interface list in C.

The ioctl() takes, as an argument, a structure using a length and a pointer to a buffer. procket doesn't have a way of allocating a piece of memory though it could be modified to have an NIF that allocates a binary and returns the address of the binary as an integer. Functions could then pass in the memory address but a buggy piece of Erlang code might pass in the wrong value and crash the VM. (Edit: The erl_nif interface in Erlang R14A supports safely passing a reference to a block of memory between functions using enif_alloc_resource() to create a "Resource Object".)

Instead, I simply parse the output of /proc/net/dev. For example, here is the output on my laptop:

$ cat /proc/net/dev
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo:  526441    3620    0    0    0     0          0         0   526441    3620    0    0    0     0       0          0
  eth0:       0       0    0    0    0     0          0         0        0       0    0    0    0     0       0          0
 wifi0:10960093   26897    0    0    0  1578          0         0   734661    4876    0    0    0     0       0          0
  ath0:14422892   12536    0    0    0     0          0         0   576599    4706    0    0    0     0       0          0
This is even nastier than the ARP cache lookup, since I resorted to using regular expressions:
iflist() ->
    {ok, FD} = file:open("/proc/net/dev", [raw, read]),
    iflistloop(FD, []).

iflistloop(FD, Ifs) ->
    case file:read_line(FD) of
        eof ->
            file:close(FD),
            Ifs;
        {ok, Line} ->
            iflistloop(FD, iflistmatch(Line, Ifs))
    end.

% Only names made of letters followed by digits are matched (eth0, ath0, ...)
iflistmatch(Data, Ifs) ->
    case re:run(Data, "^\\s*([a-z]+[0-9]+):", [{capture, [1], list}]) of
        nomatch -> Ifs;
        {match, [If]} -> [If|Ifs]
    end.
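For comparison, the interface name can be pulled out of a /proc/net/dev line without a regular expression by splitting at the first colon. This is only a sketch of an alternative (the module name procdev is made up), and unlike the regular expression above it also returns names without a trailing digit, such as lo:

```erlang
-module(procdev).
-export([name/1]).

%% Return the interface name from a /proc/net/dev line, or 'none' for
%% lines without a colon (the two header lines).
name(Line) ->
    case lists:splitwith(fun(C) -> C =/= $: end, Line) of
        {_, []} ->
            none;
        {Prefix, _Counters} ->
            % the name is right-aligned, so drop the leading spaces
            lists:dropwhile(fun(C) -> C =:= $\s end, Prefix)
    end.
```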

Finding the Default Interface

In spood, I took the easy way and just sort of guessed. A proper solution would check the routing table. Instead, I look for the first interface whose IPv4 address does not start with 127 (loopback) or 169 (link-local, 169.254.x.x):
device() ->
    {ok, S} = procket:listen(0, [{protocol, udp}, {family, inet}, {type, dgram}]),
    [Dev|_] = [ If || If <- packet:iflist(), ipcheck(S, If) ],
    procket:close(S),
    Dev.

% Interfaces without an IPv4 address fail the ioctl and are skipped
ipcheck(S, If) ->
    try packet:ipv4address(S, If) of
        {127,_,_,_} -> false;
        {169,_,_,_} -> false;
        _ -> true
    catch
        error:_ -> false
    end.

Update: Using the inet Module

After having gone through all the above, I discovered that the inet module, which is part of the Erlang standard library, is able to retrieve information about the local interfaces.

inet has 2 functions:
  • getiflist/0: retrieve a list of all the local interfaces, e.g., {ok, ["eth0", "eth1"]}
  • ifget/2: retrieve interface attributes. The arguments can be:
    • addr: IP address of interface
    • hwaddr: the MAC address of the interface. Works on Linux, doesn't work on Mac OS X (returns an empty list).
      (Update: I've submitted a patch to get the MAC address on Mac OS X)
    • dstaddr
    • netmask
    • broadcast
    • mtu
    • flags: returns the interface status, e.g., [up, broadcast, running, multicast]

For example:
  • To get a list of interfaces:
    1> inet:getiflist().
  • To retrieve the IP and MAC addresses of an interface:
    3> inet:ifget("eth0", [addr, hwaddr]).

Update2: Using inet:getifaddrs/0

As of R14B01, Erlang has a supported, cross-platform method for retrieving interface attributes. The function returns a list holding the interface information. For example:
1> inet:getifaddrs().
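Since inet:getifaddrs/0 returns {ok, List}, where each element of the list is an {Ifname, Options} tuple, the guessing done in device/0 above can be rewritten as a filter over that list. The following is a rough sketch: the module name ifaddrs is made up, and it only skips the 127.x.x.x range.

```erlang
-module(ifaddrs).
-export([ipv4/1]).

%% Given the interface list from inet:getifaddrs/0, return
%% {Ifname, Address} for every non-loopback IPv4 address.
%% IPv6 addresses are 8-tuples and fail the 4-tuple pattern,
%% so the comprehension skips them.
ipv4(Ifs) ->
    [{If, Addr} || {If, Opts} <- Ifs,
                   {addr, {A,_,_,_} = Addr} <- Opts,
                   A =/= 127].
```

Usage would be along the lines of {ok, Ifs} = inet:getifaddrs(), ifaddrs:ipv4(Ifs).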