
Showing posts with label nif. Show all posts

Saturday, April 23, 2011

BPF and Erlang

An Erlang Interface to the Berkeley Packet Filter

The man page about bpf says:

The Berkeley Packet Filter provides a raw interface to data link layers
in a protocol independent fashion.  All packets on the network, even
those destined for other hosts, are accessible through this mechanism.

On BSD systems, bpf can be used for capturing and generating raw network frames. We can use bpf to send and receive network packets from Erlang code in a similar way to using the PF_PACKET socket interface on Linux.

All of the code presented here was run on Mac OS X (Snow Leopard). Hopefully the code is portable to all BSD operating systems. If it isn't, let me know in the comments or email me.

Pre-Requisites for Running the Code Examples

For the Erlang examples below, you'll need a fairly recent version of Erlang and a few libraries. You can download the Erlang source code or check it out from github:

git clone git://github.com/erlang/otp.git

procket is an NIF used to extend the Erlang runtime to make the various system calls. You can check it out here:

git clone git://github.com/msantos/procket.git

See the README: there is a bit of work involved in setting up sudo to run a helper app.

Finally, the examples use another small library (pkt) to parse the packets using Erlang binaries and convert packets from Erlang records to binaries. It's available here:

git clone git://github.com/msantos/pkt.git

To compile the example Erlang modules:

erlc -I /path/to/procket/include -I /path/to/pkt/include annoy.erl

The BPF C Interface

The bpf character device is used to transfer raw network frames between user space and the kernel. On Mac OS X, there is a fixed number of available devices, each acting as a communication path for a single process.

Using bpf works as follows:

  • Starting from 0, open the bpf devices. If the open() system call returns failure (-1) and errno is set to EBUSY, try the next character device, e.g., /dev/bpf1.

    Typical values for errno might be:

    • EPERM: the process does not have permission to access the character device. Either the process can temporarily be given superuser privileges or the permissions of the character device can be modified.

    Since bpf relies on the file permissions of the character device, the act of opening the device is the only operation that requires privileges. Other operations on the file descriptor do not require any special privileges.

    • ENOENT: the bpf device does not exist
  • The open() call returns a file descriptor.

  • ioctl() is used to associate the bpf device with an interface

  • ioctl() is used to retrieve the bpf buffer length

    The bpf device maintains a fixed buffer size. For efficiency, reads performed on the bpf device will block until either the buffer is full or a timeout is reached (by default, infinity). As a consequence, several packets may be returned by a single read.

  • Set some optional attributes that affect the behaviour of the bpf device.

    For example:

    • BIOCSHDRCMPLT: by default, the bpf device will construct a valid packet header for the underlying datalink type. Setting the "header complete" attribute allows the user to set the packet headers themselves.

    • BIOCSSEESENT: controls whether the bpf device returns packets sent from the host. According to the man page, the flag is initialized to 1 (locally generated packets are returned); setting it to 0 returns only incoming packets.

    • BIOCIMMEDIATE: causes reads to immediately return after a packet is returned rather than buffering the packets

  • Apply filtering rules using BPF bytecode.

    bpf supports a set of instructions that allow the user to restrict which packets are returned by the device.

See this tutorial for a clear, concise example of capturing packets in C.

I've also put together some simple, runnable code. To compile it:

gcc -g -Wall -o bpf bpf.c

To keep the example from becoming too huge, not much is done except printing out the ethernet header. We'll cover more interesting ways of using bpf later.

The Erlang BPF Interface

To use bpf within Erlang, we'll need to be able to open the bpf device, perform the appropriate ioctl() operations, generate BPF filtering code and read and write from the device.

Opening the BPF Device

procket uses a setuid helper executable to open the bpf character device and pass the file descriptor back to Erlang:

{ok, Socket} = procket:dev("bpf").

Or using the bpf module:

{ok, Socket, Length} = bpf:open("en1").

Controlling the BPF Device

Once we have the file descriptor, we can set the device attributes by using the procket:ioctl/3 NIF.

The bpf header file defines a number of macros for calculating the correct ioctl request. Porting these macros to Erlang is straightforward.

According to the man page on Mac OS X, the ioctl signature is defined as:

int ioctl(int fildes, unsigned long request, ...);

An ioctl request is an unsigned long, so the size of the request will be either 4 or 8 bytes, depending on whether the platform is 32 or 64-bit. However, the ioctl macros compute the request on the assumption that it is a 32-bit word: the lower half of the word holds the command group and the command, and the top half has the direction and the length of the argument.

(The number after the colon represents the number of bits in the field.)

  1. Copy argument into kernel:1
  2. Copy argument from kernel:1
  3. No arguments:1
  4. Parameter length:13
  5. Command group:8
  6. Command:8

The fields in order are:

  • IN: if set, the argument (the 3rd argument to ioctl) is read from the user space buffer

  • OUT: if set, the argument is written to the user space buffer

A command that is IN/OUT will have the contents of the buffer read by the kernel and written back.

  • VOID: no arguments are required by the ioctl request

  • Length: the size of the argument in bytes

  • Group: the command group acts as namespace for organizing the ioctl requests

  • Command: 1 byte is reserved for the actual command

For example, the BIOCSHDRCMPLT macro in C is:

#define IOCPARM_MASK    0x1fff      /* parameter length, at most 13 bits */
#define _IOC(inout,group,num,len) \
    (inout | ((len & IOCPARM_MASK) << 16) | ((group) << 8) | (num))
#define IOC_IN      (__uint32_t)0x80000000
#define _IOW(g,n,t) _IOC(IOC_IN,    (g), (n), sizeof(t))

#define BIOCSHDRCMPLT   _IOW('B',117, u_int)

The corresponding macro defined in Erlang:

-define(SIZEOF_U_INT, 4).
-define(IOCPARM_MASK, 16#1fff).
-define(IOC_OUT, 16#40000000).
-define(IOC_IN, 16#80000000).
-define(IOC_INOUT, (?IOC_IN bor ?IOC_OUT)).

-define(BIOCSHDRCMPLT, bpf:iow($B, 117, ?SIZEOF_U_INT)).

ioc(Inout, Group, Num, Len) ->
    Inout bor ((Len band ?IOCPARM_MASK) bsl 16) bor (Group bsl 8) bor Num.

iow(G,N,T) ->
    ioc(?IOC_IN, G, N, T).
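As a sanity check on the field layout, a request can be unpacked again with the bit syntax. This is only a sketch: the ioc_decode module and its decode/1 function are made up for illustration and are not part of procket or the bpf module.

```erlang
-module(ioc_decode).
-export([decode/1]).

%% Unpack a 32-bit BSD ioctl request into the fields described above:
%% IN:1, OUT:1, VOID:1, parameter length:13, command group:8, command:8.
decode(Request) ->
    <<In:1, Out:1, Void:1, Len:13, Group:8, Num:8>> = <<Request:32>>,
    [{in, In}, {out, Out}, {void, Void},
     {len, Len}, {group, Group}, {num, Num}].
```

Working the BIOCSHDRCMPLT macro through by hand gives 16#80000000 bor (4 bsl 16) bor ($B bsl 8) bor 117 = 16#80044275, and ioc_decode:decode(16#80044275) splits that back into IN set, a length of 4, group $B and command 117.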

To set the "header complete" mode from within Erlang:

procket:ioctl(Socket, ?BIOCSHDRCMPLT, <<1:32/native>>).

Or using the bpf module:

bpf:ctl(Socket, hdrcmplt, true).

BPF Filtering

bpf has a bytecode language to filter out unwanted packets. The bytecode is generated by a set of macros in the bpf header file. For convenience in porting examples to Erlang, I defined macros wrapping the Erlang functions.

A BPF filtering program consists of a sequence of 8 byte instructions:

struct bpf_insn {
    u_short code;
    u_char  jt;
    u_char  jf;
    bpf_u_int32 k;
};
  • code: 2 bytes

The opcodes are a set of instructions for moving within the packet, testing values and control flow. The opcodes are OR'ed together.

  • jt: 1 byte

jump true: if the operation evaluates as true (non-0), jump this many instructions. Instructions are numbered from 0 (the statement following the test is instruction 0).

  • jf: 1 byte

jump false: if the operation evaluates as false (0), jump this many instructions. Instructions use a 0 offset, starting with the following instruction.

  • k: 4 bytes (the man page incorrectly defines this field as a u_long)

A value whose usage depends on the opcode.
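The struct maps directly onto an 8 byte Erlang binary. The following is a minimal sketch of how an instruction might be packed and unpacked; the bpf_insn module and its function names are hypothetical and only loosely mirror the BPF_STMT and BPF_JUMP macros.

```erlang
-module(bpf_insn).
-export([stmt/2, jump/4, decode/1]).

%% Pack one struct bpf_insn: u_short code, u_char jt, u_char jf, 32-bit k.
%% The multi-byte fields use the host's native byte order.
insn(Code, Jt, Jf, K) ->
    <<Code:16/native-unsigned, Jt:8, Jf:8, K:32/native-unsigned>>.

%% Statements never branch, so jt and jf are 0.
stmt(Code, K) ->
    insn(Code, 0, 0, K).

%% Argument order follows the C macro: opcode, k, jt, jf.
jump(Code, K, Jt, Jf) ->
    insn(Code, Jt, Jf, K).

%% Reverse of insn/4, handy for inspecting a generated program.
decode(<<Code:16/native-unsigned, Jt:8, Jf:8, K:32/native-unsigned>>) ->
    {Code, Jt, Jf, K}.
```

A program is then just a list of these binaries, and its length in instructions is the length of the list.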

An Example in C

I'll illustrate how the filters work by using an example from the man page. This example filters out all packets except reverse ARP requests:

struct bpf_insn insns[] = {
    BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12),
    BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_REVARP, 0, 3),
    BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 20),
    BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, REVARP_REQUEST, 0, 1),
    BPF_STMT(BPF_RET+BPF_K, sizeof(struct ether_arp) +
        sizeof(struct ether_header)),
    BPF_STMT(BPF_RET+BPF_K, 0),
};
  • BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12)

    The BPF_STMT macro takes an opcode and a k value as arguments.

    • BPF_LD: load value (move to offset)

    • BPF_H: load a half word value (2 bytes)

    • BPF_ABS: use an absolute offset from the beginning of the packet

    • 12: move 12 bytes into the ethernet frame

    An ethernet frame looks like (the numbers are bytes):

    1. Destination MAC Address:6
    2. Source MAC Address:6
    3. Type:2

    A 12 byte offset leaves the program at the ethernet type.

  • BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_REVARP, 0, 3)

    The BPF_JUMP macro arguments are: opcode, k, jt, jf

    • BPF_JMP: A branching operation, depending on whether the test evaluates as true or not

    • BPF_JEQ: the equality of the value at this offset (defined in the previous instruction as a half-word) is tested against the value held in the k field.

      If the value is equal to ETHERTYPE_REVARP (0x8035), the packet is a reverse ARP packet and control drops to the next statement. If the statement is false (for example, it is an IP packet), control jumps to the final statement:

      BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_REVARP, 0, 3),
      0: BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 20),
      1: BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, REVARP_REQUEST, 0, 1),
      2: BPF_STMT(BPF_RET+BPF_K, sizeof(struct ether_arp) +
          sizeof(struct ether_header)),
      3: BPF_STMT(BPF_RET+BPF_K, 0),
      
  • BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 20)

The packet is a reverse arp packet. Move to offset 20. A reverse ARP packet looks like (numbers are bytes):

  1. Hardware Type:2
  2. Protocol Type:2
  3. Hardware Length:1
  4. Protocol Length:1
  5. Operation:2
  6. Sending Hardware Address:6
  7. Sending IP Address:4
  8. Target Hardware Address:6
  9. Target IP Address:4

An offset of 20 (ethernet header = 14 bytes, so 6 bytes into the ARP packet) puts the program at the ARP operation, a 2 byte (half-word) field.

  • BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, REVARP_REQUEST, 0, 1)

If the value at the offset is equal to REVARP_REQUEST (3), move to the next instruction.

Otherwise, jump 1 instruction to the final return statement:

    BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, REVARP_REQUEST, 0, 1),
    0: BPF_STMT(BPF_RET+BPF_K, sizeof(struct ether_arp) +
        sizeof(struct ether_header)),
    1: BPF_STMT(BPF_RET+BPF_K, 0),
  • BPF_STMT(BPF_RET+BPF_K, sizeof(struct ether_arp) + sizeof(struct ether_header))

BPF_RET: return the value in the k field. "k" number of bytes of the packet will be returned to the bpf device.

  • BPF_STMT(BPF_RET+BPF_K, 0)

0 bytes are returned to the bpf device. The packet is dropped.
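To make the jump offsets concrete, here is a toy interpreter for just the three operations used by this filter. It is a sketch under simplifying assumptions: bpf_trace is a hypothetical module, instructions are written as tuples rather than packed struct bpf_insn values, and the return value 42 assumes sizeof(struct ether_header) is 14 and sizeof(struct ether_arp) is 28.

```erlang
-module(bpf_trace).
-export([run/2, revarp/0]).

%% The reverse ARP filter from the man page, as tuples.
revarp() ->
    [{ld_h_abs, 12},            % load the ethernet type
     {jeq_k, 16#8035, 0, 3},    % ETHERTYPE_REVARP
     {ld_h_abs, 20},            % load the ARP operation
     {jeq_k, 3, 0, 1},          % REVARP_REQUEST
     {ret_k, 42},               % accept: return 42 bytes
     {ret_k, 0}].               % drop

%% Run a program against a packet, returning the number of bytes to keep.
run(Prog, Packet) ->
    step(Prog, 0, 0, Packet).

%% Pc is the index of the current instruction, A is the accumulator.
step(Prog, Pc, A, Packet) ->
    case lists:nth(Pc + 1, Prog) of
        {ld_h_abs, Off} ->
            %% Load a 2 byte value at an absolute offset into the packet.
            <<_:Off/binary, V:16, _/binary>> = Packet,
            step(Prog, Pc + 1, V, Packet);
        {jeq_k, K, Jt, Jf} ->
            %% Jumps are relative to the instruction after the test.
            Skip = if A =:= K -> Jt; true -> Jf end,
            step(Prog, Pc + 1 + Skip, A, Packet);
        {ret_k, N} ->
            N
    end.
```

Feeding it an IP frame makes the first test fail, so control skips 3 instructions past the following one and lands on the final return of 0.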

An Example in Erlang

Here is another example from the bpf man page: sniffing finger requests. Yes, the bpf man page appears to have been written in a long ago age, when reverse ARP and finger requests roamed the networks.

First the C version:

struct bpf_insn insns[] = {
    BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12),
    BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_IP, 0, 10),
    BPF_STMT(BPF_LD+BPF_B+BPF_ABS, 23),
    BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, IPPROTO_TCP, 0, 8),
    BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 20),
    BPF_JUMP(BPF_JMP+BPF_JSET+BPF_K, 0x1fff, 6, 0),
    BPF_STMT(BPF_LDX+BPF_B+BPF_MSH, 14),
    BPF_STMT(BPF_LD+BPF_H+BPF_IND, 14),
    BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 79, 2, 0),
    BPF_STMT(BPF_LD+BPF_H+BPF_IND, 16),
    BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 79, 0, 1),
    BPF_STMT(BPF_RET+BPF_K, (u_int)-1),
    BPF_STMT(BPF_RET+BPF_K, 0),
};

Now the Erlang version (with comments):

-include("bpf.hrl").

-define(ETHERTYPE_IP, 16#0800).
-define(IPPROTO_TCP, 6).

finger() ->
    [   
        % Ethernet
        ?BPF_STMT(?BPF_LD+?BPF_H+?BPF_ABS, 12),                     % offset = Ethernet Type
        ?BPF_JUMP(?BPF_JMP+?BPF_JEQ+?BPF_K, ?ETHERTYPE_IP, 0, 10),  % type = IP

        % IP
        ?BPF_STMT(?BPF_LD+?BPF_B+?BPF_ABS, 23),                     % offset = ip protocol
        ?BPF_JUMP(?BPF_JMP+?BPF_JEQ+?BPF_K, ?IPPROTO_TCP, 0, 8),    % protocol = TCP

        ?BPF_STMT(?BPF_LD+?BPF_H+?BPF_ABS, 20),                     % offset = flags, frag offset
        ?BPF_JUMP(?BPF_JMP+?BPF_JSET+?BPF_K, 16#1fff, 6, 0),        % frag offset: mask the top 3 bits
                                                                    %  and AND with 1's
                                                                    %  If any non-0 value is returned from the
                                                                    %  AND (i.e., frag offset is non-0), jump
                                                                    %  to the end and drop the packet

        ?BPF_STMT(?BPF_LDX+?BPF_B+?BPF_MSH, 14),                    % offset = IP version, IP header length
                                                                    %  Load the header length into the index
                                                                    %  register

        % TCP
        ?BPF_STMT(?BPF_LD+?BPF_H+?BPF_IND, 14),                     % offset = TCP source port
                                                                    %  Move from offset 14 (start of IP packet)
                                                                    %  plus the value held in the index register
                                                                    %  (IP header length). Puts us at the start
                                                                    %  of the TCP packet (at the source port)
        ?BPF_JUMP(?BPF_JMP+?BPF_JEQ+?BPF_K, 79, 2, 0),              % source port = 79
        ?BPF_STMT(?BPF_LD+?BPF_H+?BPF_IND, 16),                     % offset = destination port
        ?BPF_JUMP(?BPF_JMP+?BPF_JEQ+?BPF_K, 79, 0, 1),              % destination port = 79
        ?BPF_STMT(?BPF_RET+?BPF_K, 16#FFFFFFFF),                    % return: entire packet
        ?BPF_STMT(?BPF_RET+?BPF_K, 0)                               % return: drop packet
].

Note that this filter does not check if the packet is IPv4.

Loading the Filter

To load the filter, another ioctl (BIOCSETF) is called. The ioctl takes a structure with a length and a pointer to the instructions:

struct bpf_program {
    u_int bf_len;
    struct bpf_insn *bf_insns;
};

The length field is set to the number of instructions, not the size of the instructions. In the first example (the reverse ARP filter), the length is 6.

In Erlang, the filter is loaded using:

Insn = finger(),
{ok, Code, [Res]} = procket:alloc([
    <<(length(Insn)):4/native-unsigned-integer-unit:8>>,
    {ptr, list_to_binary(Insn)}
]),
case procket:ioctl(Socket, ?BIOCSETF, Code) of
    {ok, _} ->
        procket:buf(Res);
    Error ->
        Error
end.

Or, more simply, using the bpf module:

bpf:ctl(Socket, setf, finger()).

BPF Filter Examples

BPF Packet Capture

Capturing packets is as simple as reading from the bpf device.

To work with file descriptors, procket needs to support the read and write system calls.

{ok, Buf} = procket:read(FD, Length).

The captured packet is not an ethernet frame (or a frame of whatever datalink type you happen to be sniffing): it's a buffer prepended with a header containing information about the packet that follows.

struct bpf_hdr {
    struct timeval bh_tstamp;     /* time stamp */
    u_long bh_caplen;             /* length of captured portion */
    u_long bh_datalen;            /* original length of packet */
    u_short bh_hdrlen;            /* length of bpf header (this struct
                                     plus alignment padding) */
};
  • bh_tstamp differs between 32 and 64-bit platforms

    • On a 32-bit platform, struct timeval has a 4 byte sec and usec field.

    • On a 64-bit platform, struct timeval has an 8 byte sec and a 4 byte usec field.

  • bh_caplen is the size of the captured packet that follows

  • bh_datalen is the real packet length. The packet may have been truncated.

  • bh_hdrlen is the real size of the bpf_hdr structure which may be padded due to alignment

To determine the start of the next packet, the bpf header provides a macro. Similarly, the Erlang bpf module provides a macro to calculate the proper offset:

?BPF_WORDALIGN(Hdrlen + Caplen).

The bpf module will do the calculations for you:

{ok, Length} = bpf:ctl(Socket, blen),
{ok, Buf} = procket:read(Socket, Length),
{bpf_buf, Time, Datalen, Packet, Rest} = bpf:buf(Buf).
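For illustration, walking the read buffer by hand might look something like the sketch below. The bpf_split module is hypothetical, and it hard codes two assumptions that will not hold everywhere: an alignment of 4 bytes, and a header made up of a 4 byte sec, 4 byte usec, 4 byte caplen, 4 byte datalen and 2 byte hdrlen. Check the bpf header file on your platform before relying on either.

```erlang
-module(bpf_split).
-export([wordalign/1, packets/1]).

-define(BPF_ALIGNMENT, 4).  % assumption: 4 byte alignment
-define(MIN_HDRLEN, 18).    % assumption: 4 + 4 + 4 + 4 + 2 byte header

%% Round X up to the next multiple of the alignment.
wordalign(X) ->
    (X + (?BPF_ALIGNMENT - 1)) band (bnot (?BPF_ALIGNMENT - 1)).

%% Split a buffer returned by read() into the captured packets.
packets(<<_Sec:32/native, _Usec:32/native, Caplen:32/native,
          _Datalen:32/native, Hdrlen:16/native, _/binary>> = Buf)
        when Hdrlen >= ?MIN_HDRLEN, byte_size(Buf) >= Hdrlen + Caplen ->
    %% The packet starts after the (possibly padded) header.
    <<_:Hdrlen/binary, Packet:Caplen/binary, _/binary>> = Buf,
    %% The next header starts at the aligned end of this packet.
    Next = wordalign(Hdrlen + Caplen),
    Rest = case Buf of
        <<_:Next/binary, R/binary>> -> R;
        _ -> <<>>
    end,
    [Packet | packets(Rest)];
packets(_) ->
    [].
```

The guard on the header length stops the walk when it runs into trailing bytes that do not form a complete header and packet.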

Here is a complete example of using the bpf module to dump packets. It can be used with the filt module, if you want to play with the fcode filtering. To start it:

% en1 is the wireless device
% rule = ( src host 10.10.10.10 or dst host 10.10.10.10 ) and ( src port 80 or dst port 80)
dump:start("en1", filt:tcp({10,10,10,10}, 80)).

BPF Packet Generation

Generating crafted packets is even simpler: write the packet to the bpf device. The packet must be valid.

Be careful: crafted packets can have strange effects. On Mac OS X, I found a few odd cases that caused the network interface to go down. For example, sending out ARP replies from a spoofed MAC address or even advertising 0.0.0.0 with the MacBook's MAC address.

This example acts as a sort of peer to peer QoS, should you ever need to kick someone off of a local network. This code acts in 2 ways:

  • It continually arps for whatever IP the target advertises. Eventually, the target system will give up and go offline.

  • Since gratuitous ARPs are sent aggressively, the gateway will consider our MAC address to be the MAC address for the target's IP address and will send packets to us, effectively cutting off the target system.

To use the code, you will need to know the target's MAC and IP address.

% our interface: en1
% target:
%  MAC = "00:aa:bb:cc:dd:ee"
%  IP = "10.10.10.10"
annoy:er("en1", "00:aa:bb:cc:dd:ee", "10.10.10.10").

Saturday, December 25, 2010

Unix Sockets

There are various ways to use Unix sockets from within Erlang such as gen_socket and unixdom_drv. Code examples are even bundled with the Erlang source.

To work with Unix sockets, I've broken out the socket primitives in the procket NIF and made them accessible from Erlang.

Unix (or local or file) sockets reside as files on the local server filesystem. Like internet sockets, the Unix version can be created as either stream (reliable, connected, no packet boundary) or datagram (unreliable, packet boundaries) sockets.

Creating a Datagram Socket


The Erlang procket functions are simple wrappers around the C library. See the C library man pages for more details.

To register the server, we get a socket file descriptor and bind it to the pathname of the socket on the filesystem. The bind function takes 2 arguments, the file descriptor and a sockaddr_un. On Linux, the sockaddr_un is defined as:

typedef unsigned short int sa_family_t;

struct sockaddr_un {
    sa_family_t sun_family;         /* 2 bytes: AF_UNIX */
    char sun_path[UNIX_PATH_MAX];   /* 108 bytes: pathname */
};

We use a binary to compose the structure, zero'ing out the unused portion:

-define(UNIX_PATH_MAX, 108).
-define(PATH, <<"/tmp/unix.sock">>).

<<?PF_LOCAL:16/native,        % sun_family
  ?PATH/binary,               % address
  0:((?UNIX_PATH_MAX-byte_size(?PATH))*8)
>>

This binary representation of the socket structure has a portability issue. For BSD systems, the first byte of the structure holds the length of the socket address. The second byte is set to the protocol family. The value for UNIX_PATH_MAX is also smaller:
typedef __uint8_t   __sa_family_t;  /* socket address family */

struct sockaddr_un {
    unsigned char   sun_len;    /* 1 byte: sockaddr len including null */
    sa_family_t sun_family;     /* 1 byte: AF_UNIX */
    char    sun_path[104];      /* path name (gag) */
};
The binary can be built like:
-define(UNIX_PATH_MAX, 104).
-define(PATH, <<"/tmp/unix.sock">>).

<<
  (byte_size(?PATH)):8,         % socket address length
  ?PF_LOCAL:8,                  % sun_family
  ?PATH/binary,                 % address
  0:((?UNIX_PATH_MAX-byte_size(?PATH))*8)
>>

The code below might need to be adjusted for BSD. Or it might just work. Some code I tested on Mac OS X just happened to work, presumably because the length field was ignored, the endianness happened to put the protocol family in the second byte and the extra 4 bytes were truncated.
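One way to keep the two layouts straight is to put them behind a single function. This is a minimal sketch: the sockaddr_un module and its new/2 function are hypothetical, PF_LOCAL is assumed to be 1, and the BSD length byte is set to the path length as in the binary above.

```erlang
-module(sockaddr_un).
-export([new/2]).

-define(PF_LOCAL, 1).   % assumption: value of PF_LOCAL / AF_UNIX

%% Linux: 2 byte family followed by a 108 byte, zero padded path.
new(linux, Path) when byte_size(Path) < 108 ->
    Pad = (108 - byte_size(Path)) * 8,
    <<?PF_LOCAL:16/native, Path/binary, 0:Pad>>;
%% BSD: 1 byte length, 1 byte family and a 104 byte, zero padded path.
new(bsd, Path) when byte_size(Path) < 104 ->
    Pad = (104 - byte_size(Path)) * 8,
    <<(byte_size(Path)):8, ?PF_LOCAL:8, Path/binary, 0:Pad>>.
```

The guards reject paths that would not leave room for the terminating zero byte.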

Here is the code to send data from the client to the server:


Start up an Erlang VM and run the server (remembering to include the path to the procket library):

$ erl -pa /path/to/procket/ebin
Erlang R14B02 (erts-5.8.3) [source] [smp:2:2] [rq:2] [async-threads:0] [hipe] [kernel-poll:false]

Eshell V5.8.3  (abort with ^G)
1> unix_dgram:server().

And in a second Erlang VM run:
1> unix_dgram:client(<<104,101,108,108,111,32,119,111,114,108,100>>). % Erlangish for <<"hello world">>, I am being a smartass

In the first VM, you should see printed out:
<<"hello world">>
ok

Creating an Abstract Socket


Linux allows you to bind an arbitrary name (a name that is not a file system path) by using an abstract socket. The abstract socket naming convention uses a NUL byte prefacing arbitrary bytes in place of the path used by traditional Unix sockets. To define an abstract socket, a binary is passed as the second argument to procket:bind/2, in the format of a struct sockaddr:
<<?PF_LOCAL:16/native,        % sun_family
  0:8,                        % abstract address
  "1234",                     % the address
  0:((?UNIX_PATH_MAX-(1+4))*8)
>>

To create a datagram echo server, the source address of the client socket is bound to an address so the server has somewhere to send the response. We modify the datagram server to use recvfrom/4, passing in an additional flag argument (which is set to 0) and a length. recvfrom/4 will return an additional value containing up to length bytes of the socket address.

We also need to modify the client to bind to an abstract socket. The server will receive this socket address in the return value of recvfrom/4; this value can be passed to sendto/4.


1> unix_dgram1:server().

1> unix_dgram1:client(<<104,101,108,108,111,32,119,111,114,108,100>>).
<<"hello world">>
ok

Creating a Stream Socket


To create a stream socket, we use the SOCK_STREAM type (or 1) for the second value passed to socket/3. The socket arguments can be either integers or atoms; for variety, atoms are used here.

After the socket is bound, we mark the socket as listening and poll it (rather inefficiently) for connections. When a new connection is received, it is accepted, the file descriptor for the new connection is returned and a process is spawned to handle the connection.

On the client side, after obtaining a stream socket, we connect the socket and so do not need to explicitly bind it.


Running the same steps for the client and server as above:

1> unix_stream:server().
<<"hello world">>
** client disconnected

1> unix_stream:client(<<104,101,108,108,111,32,119,111,114,108,100>>).
<<"hello world">>
ok

Thursday, June 24, 2010

Fun with Raw Sockets in Erlang: A Spoofing DNS Proxy

UDP Header

UDP headers are specified as:
  • Source Port:16
  • Destination Port:16
  • Length:16
  • Checksum:16

  • The Source Port is a 2 byte value representing the originating port
  • The Destination Port is the 2 byte value specifying the target port
  • The Length is the size of the UDP header and packet in bytes. The size of the UDP header is 8 bytes.
  • The Checksum algorithm is the same as for TCP, involving the creation of an IP pseudo-header.
    If the length of the UDP packet is an odd number of bytes, the packet is zero padded with an additional byte only for checksumming purposes.
The equivalent UDP header in Erlang is:
<<SourcePort:16,
DestinationPort:16,
Length:16,
Checksum:16>>
The pseudo-header used for checksumming is:
  • Source Address:32
  • Destination Address:32
  • Zero:8
  • Protocol:8
  • UDP Packet Length:16
  • UDP Header and Payload1:8
  • ...
  • UDP Header and PayloadN:8
  • optional padding:8
  • The Source Address from the IP header
  • The Destination Address from the IP header
  • The Protocol for UDP is 17
  • The UDP Packet Length in bytes, for both the UDP header and payload. The length of the IP pseudo-header is not included.
  • The UDP Header and Payload is the full UDP packet
  • If the UDP packet length is odd, an additional zero'ed byte is included for checksumming purposes. The extra byte is not used in the length computation or sent with the packet.
  • The length of the UDP packet is included in both the IP pseudo-header and the UDP header.
  • The checksum field of the UDP header is set to 0.
In Erlang, the pseudo-header is represented as:
<<SA1:8,SA2:8,SA3:8,SA4:8,  % bytes representing the IP source address
DA1:8,DA2:8,DA3:8,DA4:8,    % bytes representing the IP destination address
0:8,                        % Zero
17:8,                       % Protocol: UDP
UDPlen:16,                  % UDP packet size in bytes

SourcePort:16,
DestinationPort:16,
UDPlen:16,
0:16,
Data/binary,
0:UDPpad>>                  % UDPpad may be 0 or 8 bits
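The checksum itself is the usual internet checksum: the ones' complement of the ones' complement sum of the data taken as 16-bit words. Here is a minimal sketch of the calculation over the pseudo-header; the inet_sum module and its functions are hypothetical stand-ins written for this example.

```erlang
-module(inet_sum).
-export([checksum/1, udp/5]).

%% Ones' complement of the ones' complement sum of 16-bit words.
checksum(Bin) ->
    16#FFFF - fold(sum(Bin, 0)).

sum(<<W:16, Rest/binary>>, Acc) -> sum(Rest, Acc + W);
sum(<<B:8>>, Acc) -> Acc + (B bsl 8);   % odd length: pad with a zero byte
sum(<<>>, Acc) -> Acc.

%% Add the carries back in until the sum fits in 16 bits.
fold(Sum) when Sum > 16#FFFF ->
    fold((Sum band 16#FFFF) + (Sum bsr 16));
fold(Sum) ->
    Sum.

%% Checksum of a UDP packet: pseudo-header, UDP header with the
%% checksum field set to 0, then the payload.
udp({SA1,SA2,SA3,SA4}, {DA1,DA2,DA3,DA4}, SourcePort, DestinationPort, Data) ->
    UDPlen = 8 + byte_size(Data),
    checksum(<<SA1, SA2, SA3, SA4,
               DA1, DA2, DA3, DA4,
               0:8, 17:8, UDPlen:16,
               SourcePort:16, DestinationPort:16, UDPlen:16, 0:16,
               Data/binary>>).
```

A handy property for testing: running checksum/1 over the same bytes with the computed checksum filled in gives 0.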

A Spoofing DNS Proxy

What?

spood is a spoofing DNS proxy with a vaguely obscene name. spood works by accepting DNS queries on localhost and then spoofing the source IP address of the DNS request using the Linux PF_PACKET interface. DNS replies are sniffed off the network and returned to localhost.

Why?

Maybe for using with IP over DNS tunnels?

How?

spood works with procket and epcap.

This Will Probably Be the First Page Returned For Searches For "Erlang Promiscuity"

While spood can run by spoofing its own IP address, it's more fun running it on a hubbed or public wireless network. To allow spood to sniff the network, I added support for setsockopt() to procket as an additional NIF. Promiscuous mode, under Linux, can be activated/deactivated globally by using an ioctl() or per application by using setsockopt(). To enable promiscuous mode, the application needs to call:
setsockopt(socket, SOL_PACKET, PACKET_ADD_MEMBERSHIP, (void *)&mreq, sizeof(mreq))
Though obtaining a PF_PACKET socket requires root privileges, performing the setsockopt() call on the socket does not. mreq is a struct packet_mreq:
struct packet_mreq {
    int            mr_ifindex;    /* interface index */
    unsigned short mr_type;       /* action */
    unsigned short mr_alen;       /* address length */
    unsigned char  mr_address[8]; /* physical layer address */
};
  • mr_ifindex is the interface index returned by doing an ioctl() in host endian format
  • mr_type is set to PACKET_MR_PROMISC in host endian format
  • the remainder of the struct is zero'ed
The Erlang version looks like:
-define(SOL_PACKET, 263).
-define(PACKET_ADD_MEMBERSHIP, 1).
-define(PACKET_DROP_MEMBERSHIP, 2).
-define(PACKET_MR_PROMISC, 1).

promiscuous(Socket, Ifindex) ->
    procket:setsockopt(Socket, ?SOL_PACKET, ?PACKET_ADD_MEMBERSHIP, <<
        Ifindex:32/native,              % mr_ifindex: interface index
        ?PACKET_MR_PROMISC:16/native,   % mr_type: action
        0:16,                           % mr_alen: address length
        0:64                            % mr_address[8]:  physical layer address
        >>).

Sniffing Packets

Sniffing packets involves running procket:recvfrom/2 in a loop. Erlang's pattern matching makes filtering the packets simple. One trick is retrieving the default nameservers in Erlang.
{ok, PL} = inet_parse:resolv(
    proplists:get_value(resolv_conf, inet_db:get_rc(), "/etc/resolv.conf")),
NS = proplists:get_value(nameserver, PL).
inet_db:get_rc() will return the path to the system resolv.conf file. inet_parse has an undocumented function to parse resolv.conf and return the attributes as a list of key/value pairs.

Spoofing Packets

Spoofing packets is done by constructing a packet consisting of the Ethernet, IP and UDP header and payload.
dns_query(SourcePort, Data, #state{
    shost = {SM1,SM2,SM3,SM4,SM5,SM6},
    dhost = {DM1,DM2,DM3,DM4,DM5,DM6},
    saddr = {SA1,SA2,SA3,SA4},
    daddr = {DA1,DA2,DA3,DA4}
    }) ->

    Id = 1,
    TTL = 64,

    UDPlen = 8 + byte_size(Data),
    IPlen = 20 + UDPlen,

    IPsum = epcap_net:makesum(
        <<
        % IPv4 header
        4:4, 5:4, 0:8, IPlen:16,
        Id:16, 0:1, 1:1, 0:1,
        0:13, TTL:8, 17:8, 0:16,
        SA1:8, SA2:8, SA3:8, SA4:8,
        DA1:8, DA2:8, DA3:8, DA4:8
        >>
    ),

    UDPpad = case UDPlen rem 2 of
        0 -> 0;
        1 -> 8
    end,

    UDPsum = epcap_net:makesum(
        <<
        SA1:8,SA2:8,SA3:8,SA4:8,
        DA1:8,DA2:8,DA3:8,DA4:8,
        0:8,
        17:8,
        UDPlen:16,

        SourcePort:16,
        53:16,
        UDPlen:16,
        0:16,
        Data/binary,
        0:UDPpad
        >>),

    <<
    % Ethernet header
    DM1:8,DM2:8,DM3:8,DM4:8,DM5:8,DM6:8,
    SM1:8,SM2:8,SM3:8,SM4:8,SM5:8,SM6:8,
    16#08, 16#00,

    % IPv4 header
    4:4, 5:4, 0:8, IPlen:16,
    Id:16, 0:1, 1:1, 0:1,
    0:13, TTL:8, 17:8, IPsum:16,
    SA1:8, SA2:8, SA3:8, SA4:8,
    DA1:8, DA2:8, DA3:8, DA4:8,

    % UDP header
    SourcePort:16,
    53:16,
    UDPlen:16,
    UDPsum:16,
    Data/binary
    >>.

Running spood

Setup isn't all automatic yet (but see the README, maybe this has changed). After everything is compiled, find the MAC and IP address of your client and name server. Then run:
erl -pa ebin deps/*/ebin
1> spood:start("eth0",
    {{16#00,16#aa,16#bb,16#cc,16#dd,16#ee}, {list, [{192,168,100,100}, {192,168,100,101}]}},
    {{16#00,16#11,16#22,16#33,16#44,16#55}, {192,168,100,1}}).
Where:
  • The first argument is your interface device name
  • The second argument is a 2-tuple composed of your source MAC address and a representation of what should be used for your client IP address. Unless you're ARP spoofing or have published the ARP entries yourself, the IPs should be of clients on the network.

    The second element of the tuple can be a tuple or a string representing an IP or a tuple consisting of the keyword "list" followed by a list of IP addresses. The source IP for each query will be randomly chosen from the list.

  • The third argument is the name server MAC and IP address
Then test it:
$ nslookup
> server 127.0.0.1
Default server: 127.0.0.1
Address: 127.0.0.1#53
> www.google.com
Server:         127.0.0.1
Address:        127.0.0.1#53

Non-authoritative answer:
www.google.com  canonical name = www.l.google.com.
Name:   www.l.google.com
Address: 173.194.33.104
If you happen to be running the sods client, you can use the DNS proxy by using the "-r" option:
sdt -r 127.0.0.1 sshdns.s.example.com
Update: Well, I've tested spood in the wild now and made a few changes. By default, spood will discover the IP addresses on your network and add them to the list of source addresses to spoof. spood now takes a proplist as an argument. However, if no argument is passed, spood will try to figure out your network by:
  • guessing which interface device to use
  • finding the MAC and IP address assigned to the device
  • looking up the MAC address of the name server in the ARP cache
The argument to spood:start/1 is a proplist consisting of:
  • {dev, string() | undefined}
  • {srcmac, tuple()}
  • {dstmac, tuple()}
  • {saddr, tuple() | string() | discover | {discover, list()} | {list, list()}}
  • {nameserver, tuple() | undefined}

Or call spood:start() to use the defaults.

Monday, June 14, 2010

Fun with Raw Sockets in Erlang: Sending ICMP Packets

I've covered pinging other hosts before using the IPPROTO_ICMP protocol and raw sockets, all in Erlang.

Now I'd like to go through the exercise of sending ICMP echo packets using the Linux PF_PACKET interface. The process is somewhat tedious and complicated, so this post will reflect this, but it should be helpful since documentation for PF_PACKET tends to be a bit sparse.

Even if you're only interested in the PF_PACKET C interface, this tutorial should be helpful. But you'll have to read a bit and mentally censor the Erlang bits.

Erlang Binaries vs C structs

To provide a direct interface to sendto(), I've added an NIF interface for Erlang in procket.

The procket sendto() interface is system dependent, relying on the layout of your computer's struct sockaddr. struct sockaddr is typically constructed as follows:
struct sockaddr {
    sa_family_t     sa_family;      /* unsigned short int */
    char            sa_data[14];    /* buffer holding data, dependent on socket type */
};
The data held in the sa_data member of struct sockaddr varies based on the different socket types. For example, for a typical internet socket, a struct sockaddr_in socket address is used:
struct sockaddr_in {
    sa_family_t     sin_family;
    in_port_t       sin_port;
    struct in_addr  sin_addr;
    char            sin_zero[8];
};
Of course, these structures will vary by platform. On Linux, aside from a few inscrutable macros, the layout is similar to those shown above. BSDs, such as Mac OS X, add another structure member with the size of the structure:
u_int8_t sin_len
The appearance and placement of this member will cause a lot of portability problems for you if you need to get code running on different OSes. And it's only natural, since in this tutorial we are bypassing the normal library interfaces.

So, just remember, PF_PACKET is pretty much a Linux-specific interface, so we will be concentrating on the Linux eccentricities.

An Erlang Interface to sendto()


According to the man page, sendto() takes the following arguments:
ssize_t sendto(
        int s,
        const void *buf,
        size_t len,
        int flags,
        const struct sockaddr *to,
        socklen_t tolen
        )
  • s is the file descriptor representing the socket, as returned by socket().
  • buf is the payload to be sent in the packet.
  • len is the size of the buffer in bytes.
  • flags is the result of OR'ing together integers which affect the behaviour of the socket. Typically, flags is set to 0.
  • to is a buffer holding a socket address structure, whose layout depends on the type of socket. It is cast to the "generic" sockaddr structure. Different types of socket addresses are, for example, sockaddr_in for Internet sockets, sockaddr_un for Unix (local) sockets and sockaddr_ll for link layer sockets. We'll be looking at sockaddr_in sockets in this section and sockaddr_ll sockets when investigating sending out packets using the PF_PACKET raw socket interface later on.
  • tolen is the size of the socket address structure in bytes.
It's worth noting that
sendto(socket, buf, buflen, flags, NULL, 0)
is equivalent to
send()
and
sendto(socket, buf, buflen, 0, NULL, 0)
is equivalent to
write()
With a bit of tweaking (the procket NIF may have to change a bit), we'll be able to use sendto() to do both send()s and write()s in the future (both can be used once the socket has been bound using bind()). The procket Erlang sendto/4 interface looks like this:
sendto(Socket, Packet, Flags, Sockaddr)
Where:
  • Socket is an integer returned from procket:open/1 representing the file descriptor.
  • Packet is a binary holding the packet payload.
  • Flags is the result of OR'ing the socket options. See the sendto() man page for the possible parameters.
  • Sockaddr is an Erlang binary representation of the sockaddr structure for the type of socket in use.

An Example of Using sendto/4

In the original example of sending an ICMP echo packet from Erlang, we (mis-)used gen_udp to send and receive ICMP packets. This time, we'll send the ICMP packet using the sendto/4 NIF. To do that, we must create the struct sockaddr_in as an Erlang binary. In linux/in.h, the structure is defined as:
struct sockaddr_in {
    sa_family_t     sin_family; /* Address family: 2 bytes */
    in_port_t       sin_port;   /* Port number: 2 bytes */
    struct in_addr  sin_addr;   /* Internet address: 4 bytes */

    /* Pad to size of `struct sockaddr'. */
    unsigned char   sin_zero[8];
};
Both sa_family_t and in_port_t are 2 bytes. The total size of the struct is 16 bytes. The Erlang binary used to represent this is:
<<
?PF_INET:16/native,             % sin_family
0:16,                           % sin_port
IP1:8, IP2:8, IP3:8, IP4:8,     % sin_addr
0:64                            % sin_zero
>>
  • The value of the PF_INET macro (or 2) is taken from bits/socket.h. The value of the different PF_* macros is always in native endian format.
  • Since we are sending an ICMP packet, the port has no meaning and is set to 0.
  • IP1 through IP4 refer to the components of an IPv4 address, represented in Erlang as a 4-tuple of bytes such as {192,168,10,1}.
  • The sin_zero member is always set to 8 zeroed bytes.
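As a minimal sketch, the binary above can be wrapped in a small helper that takes the IPv4 address as a 4-tuple. The module and function names are made up for illustration, and the value of PF_INET is assumed to be 2, as on Linux:

```erlang
-module(sockaddr_sketch).
-export([sockaddr_in/1]).

%% Assumption: PF_INET is 2, as on Linux.
-define(PF_INET, 2).

%% Build a 16 byte Linux struct sockaddr_in for an IPv4 address tuple.
sockaddr_in({IP1, IP2, IP3, IP4}) ->
    <<?PF_INET:16/native,          % sin_family, host byte order
      0:16,                        % sin_port, unused for ICMP
      IP1:8, IP2:8, IP3:8, IP4:8,  % sin_addr
      0:64>>.                      % sin_zero
```

The result of sockaddr_sketch:sockaddr_in({192,168,10,1}) could then be passed as the Sockaddr argument of procket:sendto/4.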
The corresponding NIF function can be found in procket.c:
static ERL_NIF_TERM
nif_sendto(ErlNifEnv *env, int argc, const ERL_NIF_TERM argv[])
{
    int sockfd = -1;
    int flags = 0;

    ErlNifBinary buf;
    ErlNifBinary sa;


    if (!enif_get_int(env, argv[0], &sockfd))
        return enif_make_badarg(env);

    if (!enif_inspect_binary(env, argv[1], &buf))
        return enif_make_badarg(env);

    if (!enif_get_int(env, argv[2], &flags))
        return enif_make_badarg(env);

    if (!enif_inspect_binary(env, argv[3], &sa))
        return enif_make_badarg(env);

    if (sendto(sockfd, buf.data, buf.size, flags, (struct sockaddr *)sa.data, sa.size) == -1)
        return enif_make_tuple(env, 2,
            atom_error,
            enif_make_tuple(env, 2,
            enif_make_int(env, errno),
            enif_make_string(env, strerror(errno), ERL_NIF_LATIN1)));

    return atom_ok;
}
The nif_sendto() function takes the Erlang binary and casts it to a sockaddr structure.

Tedium, or the Perils of Constructing Packets by Hand

When requesting a file descriptor using socket(), the PF_PACKET interface allows the user to construct either whole ethernet frames (using the SOCK_RAW type) or cooked packets to which the kernel will prepend ethernet headers (using the SOCK_DGRAM type). I had some problems with SOCK_DGRAM packets which I'll probably talk about in another blog post. But for now, I'll describe how to create ICMP echo packets using the PF_PACKET SOCK_RAW type. To get a file descriptor with the appropriate settings from procket:
{ok, FD} = procket:listen(0, [{protocol, 16#0008}, {type, raw}, {family, packet}])
Notice that, since I'm on a little endian platform, I byte swapped the definition of ETH_P_IP to big endian format.
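Rather than hard coding the swapped value, the conversion can be done with the bit syntax. This is only a sketch of the idea; the module and function names are hypothetical and not part of procket:

```erlang
-module(byteorder_sketch).
-export([htons/1]).

%% Convert a 16 bit value from host to network byte order.
%% The integer is encoded in native order, then read back as big endian.
%% On a big endian host the value comes back unchanged.
htons(N) when is_integer(N), N >= 0, N =< 16#FFFF ->
    <<Swapped:16/big>> = <<N:16/native>>,
    Swapped.
```

On a little endian host, byteorder_sketch:htons(16#0800) gives the 16#0008 used for the protocol option above.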

Retrieving the Interface Index

To figure out the index of our interface, we need to call an ioctl(). Conveniently, procket provides an NIF ioctl() interface. The C ioctl() interface is defined as:
int ioctl(int d, int request, ...);
  • d is the file descriptor.
  • request is an integer representing a device dependent instruction.
  • The remaining argument to ioctl() is usually a buffer holding a device dependent structure. In this case, we will pass an ifreq structure. The buffer acts as both the input and output for the ioctl.
struct ifreq, as defined in net/if.h, is composed of 2 unions.
struct ifreq
{
# define IFHWADDRLEN    6
# define IFNAMSIZ   IF_NAMESIZE
    union
    {
        char ifrn_name[IFNAMSIZ];   /* Interface name, e.g. "en0".  */
    } ifr_ifrn;

    union
    {
        struct sockaddr ifru_addr;
        struct sockaddr ifru_dstaddr;
        struct sockaddr ifru_broadaddr;
        struct sockaddr ifru_netmask;
        struct sockaddr ifru_hwaddr;
        short int ifru_flags;
        int ifru_ivalue;
        int ifru_mtu;
        struct ifmap ifru_map;
        char ifru_slave[IFNAMSIZ];  /* Just fits the size */
        char ifru_newname[IFNAMSIZ];
        __caddr_t ifru_data;
    } ifr_ifru;
};
But to get the interface index, we're only interested in these struct members:
struct ifreq {
    char ifrn_name[16];
    int  ifr_ifindex;
};
# define ifr_name   ifr_ifrn.ifrn_name  /* interface name   */
# define ifr_ifindex    ifr_ifru.ifru_ivalue    /* interface index      */
In Erlang terms, we can pass in the full 32 byte structure (only 4 bytes of the second union are actually used). The 32 byte size applies to 32-bit Linux; on 64-bit Linux, struct ifreq is larger (40 bytes on x86_64). On input, if we are interested in using the "eth0" interface:
<<
"eth0", 0:96,   % ifrn_name, 16 bytes
0:128           % ifr_ifru union for the response, 16 bytes
>>
On output:
<<
"eth0", 0:96,   % ifrn_name, 16 bytes
Ifr:32/native,  % interface index, a C int in host byte order
0:96            % unused
>>
So, to retrieve the value in Erlang:
{ok, <<_Ifname:16/bytes, Ifr:32/native, _/binary>>} = procket:ioctl(S,
    ?SIOCGIFINDEX,
    list_to_binary([
            Dev, <<0:((16*8) - (length(Dev)*8)), 0:128>>
        ])),
  • Dev is a list holding the device name, such as "eth0" or "ath0".
  • Ifr is the part of the binary holding the interface index returned by the ioctl().
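The buffer handling can be separated from the ioctl() call itself. The following is a sketch with hypothetical module and function names, assuming the 32 byte struct ifreq layout described above. The index is decoded in native byte order, since ioctl() writes a C int:

```erlang
-module(ifreq_sketch).
-export([ifreq/1, ifindex/1]).

-define(IFNAMSIZ, 16).

%% Build the 32 byte input buffer: the device name padded with zeroes
%% to IFNAMSIZ bytes, followed by 16 zeroed bytes for the union.
ifreq(Dev) when is_list(Dev), length(Dev) < ?IFNAMSIZ ->
    Name = list_to_binary(Dev),
    Pad = (?IFNAMSIZ - byte_size(Name)) * 8,
    <<Name/binary, 0:Pad, 0:128>>.

%% Extract the interface index from the buffer returned by the ioctl.
ifindex(<<_Name:?IFNAMSIZ/bytes, Index:32/native, _/binary>>) ->
    Index.
```

The binary returned by ifreq_sketch:ifreq("eth0") would be the third argument to procket:ioctl/3, and ifindex/1 would be applied to the binary in the {ok, Binary} result.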
The corresponding NIF function can be found in procket.c:
static ERL_NIF_TERM
nif_ioctl(ErlNifEnv *env, int argc, const ERL_NIF_TERM argv[])
{
    int s = -1;
    int req = 0;
    ErlNifBinary ifr;


    if (!enif_get_int(env, argv[0], &s))
        return enif_make_badarg(env);

    if (!enif_get_int(env, argv[1], &req))
        return enif_make_badarg(env);

    if (!enif_inspect_binary(env, argv[2], &ifr))
        return enif_make_badarg(env);

    if (!enif_realloc_binary(env, &ifr, ifr.size))
        return enif_make_badarg(env);

    if (ioctl(s, req, ifr.data) < 0)
        return error_tuple(env, strerror(errno));

    return enif_make_tuple(env, 2,
            atom_ok,
            enif_make_binary(env, &ifr));
}
The nif_ioctl() function takes, as arguments, the socket descriptor and a binary buffer representing the ifreq structure. The binary is made writable, passed to ioctl() and returned to the caller.

Preparing the ICMP Packet

Unlike the other examples of sending an ICMP packet, we'll need to prepare more than the ICMP header and payload. Because we are sending directly out on the interface, we have to add the ethernet and IPv4 headers.

Ethernet Header

The ethernet header is composed of 6 bytes each for the destination and source MAC addresses and two bytes for the ethernet type.
  • Destination MAC Address:48
  • Source MAC Address:48
  • Type:16
The list of ethernet types can be found in linux/if_ether.h. The Erlang specification for this message format would be (assuming the destination mac address is 00:aa:bb:cc:dd:ee and the source mac address is 00:11:22:33:44:55):
<<
16#00, 16#aa, 16#bb, 16#cc, 16#dd, 16#ee,   % destination MAC address
16#00, 16#11, 16#22, 16#33, 16#44, 16#55,   % source MAC address
16#08, 16#00                                % type: ETH_P_IP
>>
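Since MAC addresses are passed around as 6-tuples elsewhere in this post, a helper along these lines could build the header. This is a sketch with made up module and function names:

```erlang
-module(ether_sketch).
-export([header/3]).

%% Build a 14 byte ethernet header from two MAC address tuples and a
%% 16 bit ethernet type, for example 16#0800 for ETH_P_IP.
header({D1,D2,D3,D4,D5,D6}, {S1,S2,S3,S4,S5,S6}, Type) ->
    <<D1:8, D2:8, D3:8, D4:8, D5:8, D6:8,   % destination MAC address
      S1:8, S2:8, S3:8, S4:8, S5:8, S6:8,   % source MAC address
      Type:16>>.                            % type, big endian on the wire
```

Note that the type is written here in its normal (big endian) form, because it goes directly onto the wire rather than through a C integer.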

IPv4 Header

The IPv4 header is:
  • Version:4
  • IHL:4
  • ToS:8
  • Total Length:16
  • Identification:16
  • Flags:3
  • Fragment Offset:13
  • Time to Live:8
  • Protocol:8
  • Checksum:16
  • Source Address:32
  • Destination Address:32
I won't bother to explain each field. See RFC 791 for details. Constructing an Erlang IPv4 header involves declaring the header once with the checksum field set to zero, performing a checksum on the header, then incorporating the checksum in the 2 byte checksum field.
IPv4 = <<
4:4, 5:4, 0:8, 84:16,
Id:16, 0:1, 1:1, 0:1,
0:13, TTL:8, ?IPPROTO_ICMP:8, 0:16,
SA1:8, SA2:8, SA3:8, SA4:8,
DA1:8, DA2:8, DA3:8, DA4:8
>>.
  • Id is a hint for reconstructing fragmented packets by the receiving host.
  • IPPROTO_ICMP is a macro set to 1. The value is defined in netinet/in.h.
  • The checksum field is set to 0 for checksumming purposes. After the checksum has been calculated, the resulting value is placed in this field.
  • The TTL is set to 64. Packets with a time to live of 0 are discarded.
  • SA1 to SA4 are the bytes representing the IPv4 source address.
  • DA1 to DA4 are the bytes representing the IPv4 destination address.
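The two-pass construction can be sketched as follows. The module and function names are hypothetical, the total length is fixed at 84 bytes as in the header above, and the checksum is the usual one's complement sum over the 16 bit words of the header:

```erlang
-module(ipv4_sketch).
-export([header/4, checksum/1]).

-define(IPPROTO_ICMP, 1).

%% Build a 20 byte IPv4 header for an 84 byte ICMP datagram.
header(Id, TTL, {SA1,SA2,SA3,SA4}, {DA1,DA2,DA3,DA4}) ->
    Pre = <<4:4, 5:4, 0:8, 84:16,
            Id:16, 0:1, 1:1, 0:1, 0:13,
            TTL:8, ?IPPROTO_ICMP:8>>,
    Addr = <<SA1:8, SA2:8, SA3:8, SA4:8,
             DA1:8, DA2:8, DA3:8, DA4:8>>,
    %% First pass: checksum the header with the checksum field zeroed.
    Sum = checksum(<<Pre/binary, 0:16, Addr/binary>>),
    %% Second pass: place the result in the checksum field.
    <<Pre/binary, Sum:16, Addr/binary>>.

%% One's complement of the one's complement sum of the 16 bit words.
checksum(Hdr) ->
    16#FFFF - lists:foldl(fun fold/2, 0, [W || <<W:16>> <= Hdr]).

%% Add one word, folding any carry back into the low 16 bits.
fold(Word, Acc) ->
    Sum = Word + Acc,
    (Sum band 16#FFFF) + (Sum bsr 16).
```

Calling ipv4_sketch:header(16#1caa, 64, {192,168,213,213}, {192,168,213,1}) produces a checksum field of 16#f1d6, which matches the IPv4 header in the tcpdump output at the end of this post.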

ICMP Header

I won't go over constructing the ICMP header, since it's been covered in the earlier post on ICMP ping in Erlang.

Finally Sending the Packet

We have a raw PF_PACKET socket, the index of the interface to use for the sendto() operation and a binary representing the ICMP packet and payload. We have the pieces in place now to send out the ping. We could bind() the interface and then use write() or send() to push out packets. In this example, we'll instead specify, for each packet, the link layer socket address structure holding the routing information.
struct sockaddr_ll {
    unsigned short sll_family;   /* Always AF_PACKET */
    unsigned short sll_protocol; /* Physical layer protocol */
    int            sll_ifindex;  /* Interface number */
    unsigned short sll_hatype;   /* Header type */
    unsigned char  sll_pkttype;  /* Packet type */
    unsigned char  sll_halen;    /* Length of address */
    unsigned char  sll_addr[8];  /* Physical layer address */
};
  • sll_family is, as the comment says, always AF_PACKET (the same value as PF_PACKET), in host endian format.
  • sll_protocol is usually either ETH_P_ALL or ETH_P_IP. It is passed in big endian format but is defined in the header file in host endian format. For many Linux installs, this will be little endian, so it will need to be byte swapped.
  • sll_halen is the length of the physical layer address. Although up to 8 bytes are allowed for the physical layer address, only 6 bytes are used for ethernet.
<<
?PF_PACKET:16/native,   % sll_family: PF_PACKET
16#0:16,             % sll_protocol: Physical layer protocol, big endian
Interface:32/native,    % sll_ifindex: Interface number
0:16,                   % sll_hatype: Header type
0:8,                    % sll_pkttype: Packet type
0:8,                    % sll_halen: address length

0:8,                    % sll_addr[8]: physical layer address
0:8,                    % sll_addr[8]: physical layer address
0:8,                    % sll_addr[8]: physical layer address
0:8,                    % sll_addr[8]: physical layer address
0:8,                    % sll_addr[8]: physical layer address
0:8,                    % sll_addr[8]: physical layer address

0:8,                    % sll_addr[8]: physical layer address
0:8                     % sll_addr[8]: physical layer address
>>
From trial and error, only sll_ifindex needs to be set. Even sll_family does not seem to be required in this context, although the man page suggests it is. (Otherwise, sll_halen would be set to 6 and the first 6 bytes of sll_addr to the MAC address of the destination ethernet device.) The source and destination appear to be read directly from the ethernet header.

The pkt module will construct an ethernet frame and send it on the network. The function interface is a bit cumbersome, forcing you to specify the MAC and IP address of both the source and destination, but allows spoofing packets from different IP/MAC combinations.
pkt:ping(
    {"eth0", {16#00,16#11,16#22,16#33,16#44,16#55}, {192,168,213,213}},
    {{16#00,16#aa,16#bb,16#cc,16#dd,16#ee}, {192,168,213,1}}
).
The first argument is a 3-tuple representing the network interface, source MAC and IP address. The second argument is a 2-tuple representing the destination MAC and IP address. Looking at the output from tcpdump:
# tcpdump -n -s 0 -XX -i ath0
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ath0, link-type EN10MB (Ethernet), capture size 65535 bytes
18:58:40.077600 IP 192.168.213.213 > 192.168.213.1: ICMP echo request, id 7338, seq 0, length 64
        0x0000:  0011 2233 4455 00aa bbcc ddee 0800 4500  ....>....Y.&..E.
        0x0010:  0054 1caa 4000 4001 f1d6 c0a8 d5d5 c0a8  .T..@.@.........
        0x0020:  d501 0800 ea06 1caa 0000 0000 04fc 0007  ................
        0x0030:  2ba0 0001 2e02 2021 2223 2425 2627 2829  +......!"#$%&'()
        0x0040:  2a2b 2c2d 2e2f 3031 3233 3435 3637 3839  *+,-./0123456789
        0x0050:  3a3b 3c3d 3e3f 4041 4243 4445 4647 4849  :;<=>?@ABCDEFGHI
        0x0060:  4a4b                                     JK
18:58:40.078464 IP 192.168.213.1 > 192.168.213.213: ICMP echo reply, id 7338, seq 0, length 64
        0x0000:  00aa bbcc ddee 0011 2233 4455 0800 4500  ...Y.&....>...E.
        0x0010:  0054 86ee 0000 4001 c792 c0a8 d501 c0a8  .T....@.........
        0x0020:  d5d5 0000 f206 1caa 0000 0000 04fc 0007  ................
        0x0030:  2ba0 0001 2e02 2021 2223 2425 2627 2829  +......!"#$%&'()
        0x0040:  2a2b 2c2d 2e2f 3031 3233 3435 3637 3839  *+,-./0123456789
        0x0050:  3a3b 3c3d 3e3f 4041 4243 4445 4647 4849  :;<=>?@ABCDEFGHI
        0x0060:  4a4b                                     JK

Saturday, May 29, 2010

Raw Socket Programming in Erlang: Reading Packets Using PF_PACKET

BSD has BPF, Solaris has DLPI and Linux, well, has had many interfaces. The latest uses a Linux-specific protocol family, PF_PACKET. PF_PACKET can receive whole packets from the network as well as generate them, like a combination of BPF and the BSD raw socket interface.

PCAP is an abstraction over these different interfaces. epcap uses a system process linked to the PCAP library to read ethernet frames and send them as messages into Erlang using the port interface. Using procket with the PF_PACKET socket option, I've been playing with reading packets directly off the network and generating them as well.

The PF_PACKET interface is used by passing options to socket().
int socket(int domain, int type, int protocol);
  • The protocol family is, of course, PF_PACKET.
  • The type may be either SOCK_RAW or SOCK_DGRAM. SOCK_RAW will return the whole packet, including the ethernet header. A process sending a packet must prepend a link layer header. A socket with type SOCK_DGRAM will strip off the link layer header and generate a valid header for outgoing packets.
  • The protocol is selected from one of the values in linux/if_ether.h.
    #define ETH_P_IP    0x0800
    #define ETH_P_ALL   0x0003
    
    ETH_P_ALL will retrieve all network packets and ETH_P_IP just the IP packets. The values are in host-endian format and will need to be converted to network byte order before being used as arguments to socket().

Receiving Packets using recvfrom()

To send and receive packets from a socket using PF_PACKET, the normal connection-less socket operations are used: sendto() and recvfrom(). By default, socket operations will block, unless the O_NONBLOCK flag is set using fcntl(). The gen_udp module in Erlang internally calls recvfrom(), so it can deal with raw sockets. Another example of using gen_udp in this way is for sending ICMP packets and reading the ICMP ECHO replies. Alternatively, I've added a recvfrom/2 function to the procket NIF for testing.

Sniffing Packets in Erlang

To read packets from the network device, either gen_udp or the NIF recvfrom/2 can be used. Using gen_udp:
-module(packet).
-export([sniff/0]).

-define(ETH_P_IP, 16#0008).
-define(ETH_P_ALL, 16#0300).

sniff() ->
    {ok, S} = procket:listen(0, [
            {protocol, ?ETH_P_ALL},
            {type, raw},
            {family, packet}
        ]),
    {ok, S1} = gen_udp:open(0, [binary, {fd, S}, {active, false}]),
    loop(S1).

loop(S) ->
    Data = gen_udp:recv(S, 2048),
    error_logger:info_report([{data, Data}]),
    loop(S).
The definitions of ETH_P_IP and ETH_P_ALL are byte swapped to big endian format for a little endian host. The port is irrelevant and is set to 0 in procket:listen/2. The type is raw but can also be set to dgram.

Using the NIF, the process must poll the socket. procket:recvfrom/2 will return the atom nodata if the socket returns EAGAIN, the tuple {ok, binary} with the binary data representing the packet, or a tuple holding the value of errno, e.g., {error, {errno, strerror(errno)}}. The return values will probably change in the future.
sniff() ->
    {ok, S} = procket:listen(0, [
            {protocol, ?ETH_P_ALL},
            {type, raw},
            {family, packet}
        ]),
    loop(S).

loop(S) ->
    case procket:recvfrom(S, 2048) of
        nodata ->
            timer:sleep(1000),
            loop(S);
        {ok, Data} ->
            error_logger:info_report([{data, Data}]),
            loop(S);
        Error ->
            error_logger:error_report(Error)
    end.
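The binaries handed back by either loop are raw ethernet frames. As a rough sketch of what could be done with them (the module name, function name and return tuples are invented for this example), the ethernet and IPv4 headers can be picked apart with a single pattern match:

```erlang
-module(decode_sketch).
-export([decode/1]).

-define(ETH_P_IP, 16#0800).

%% Split a raw frame into its ethernet header fields and, for IPv4,
%% the protocol number, the addresses and the remaining payload.
decode(<<Dst:6/bytes, Src:6/bytes, ?ETH_P_IP:16,
         4:4, IHL:4, _ToS:8, _Len:16, _Id:16, _Flags:3, _Off:13,
         _TTL:8, Proto:8, _Sum:16,
         SA1:8, SA2:8, SA3:8, SA4:8,
         DA1:8, DA2:8, DA3:8, DA4:8,
         Rest/binary>>) when IHL >= 5 ->
    %% Skip any IPv4 options: the header is IHL * 4 bytes long.
    OptLen = (IHL - 5) * 4,
    <<_Opts:OptLen/bytes, Payload/binary>> = Rest,
    {ipv4, Dst, Src, Proto, {SA1,SA2,SA3,SA4}, {DA1,DA2,DA3,DA4}, Payload};
%% Anything else is returned with just the ethernet header decoded.
decode(<<Dst:6/bytes, Src:6/bytes, Type:16, Payload/binary>>) ->
    {ether, Dst, Src, Type, Payload}.
```

Unlike the protocol argument to socket(), the ethernet type inside the frame is matched in its normal big endian form, 16#0800, because it is read straight off the wire.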

Monday, May 24, 2010

ICMP Ping in Erlang

(Also see ICMP Ping in Erlang, part 2)

ICMP ECHO Packet Structure


RFC 792 describes an ICMP ECHO packet as:
  • Type:8
  • Code:8
  • Checksum:16
  • Identifier:16
  • Sequence Number:16
  • Data1:8
  • ...
  • DataN:8

The number after the colon represents the number of bits in the field.
  • The type field for ICMP ECHO is set to 8. The response (ICMP ECHO REPLY) has a value of 0.
  • The code is 0.
  • The checksum is a one's complement checksum that covers both the ICMP header and the data portion of the packet. An Erlang version looks like:
    makesum(Hdr) -> 16#FFFF - checksum(Hdr).
    
    checksum(Hdr) ->
        lists:foldl(fun compl/2, 0, [ W || <<W:16>> <= Hdr ]).
    
    compl(N) when N =< 16#FFFF -> N;
    compl(N) -> (N band 16#FFFF) + (N bsr 16).
    compl(N,S) -> compl(N+S).
    
  • The identifier and sequence number allow clients on a host to differentiate their packets, for example, if multiple pings are running. The client will usually increment the sequence number for each ICMP ECHO packet sent.
  • Data is the payload. Traditionally, it holds a struct timeval so the client can calculate the delay without having to maintain state, but any value can be used, such as the output of erlang:now/0. The remainder is padded with ASCII characters.
The description of an ICMP packet in Erlang is very close to the specification. For ICMP ECHO:
<<8:8, 0:8, Checksum:16, Id:16, Sequence:16, Payload/binary>>
The ICMP ECHO reply is the same packet returned, with the type field set to 0 and an updated checksum:
<<0:8, 0:8, Checksum:16, Id:16, Sequence:16, Payload/binary>>
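Putting the checksum and the packet description together, here is a small self-contained sketch. The module name and the echo/3 and valid/1 functions are invented for illustration, the checksum functions are the ones shown above, and the payload is assumed to have an even number of bytes:

```erlang
-module(icmp_sketch).
-export([echo/3, valid/1]).

%% Build an ICMP ECHO packet: once with a zeroed checksum to compute
%% the sum, then again with the checksum in place.
echo(Id, Seq, Payload) when is_binary(Payload),
                            byte_size(Payload) rem 2 =:= 0 ->
    Sum = makesum(<<8:8, 0:8, 0:16, Id:16, Seq:16, Payload/binary>>),
    <<8:8, 0:8, Sum:16, Id:16, Seq:16, Payload/binary>>.

%% A packet carrying a correct checksum sums to 16#FFFF.
valid(Packet) when byte_size(Packet) rem 2 =:= 0 ->
    checksum(Packet) =:= 16#FFFF.

makesum(Hdr) -> 16#FFFF - checksum(Hdr).

checksum(Hdr) ->
    lists:foldl(fun compl/2, 0, [W || <<W:16>> <= Hdr]).

%% Fold the carry back into the low 16 bits.
compl(N) when N =< 16#FFFF -> N;
compl(N) -> (N band 16#FFFF) + (N bsr 16).
compl(N, S) -> compl(N + S).
```

valid/1 works for both the request and the reply: changing the type from 8 to 0 lowers the sum by 16#0800, and the replying host raises the checksum by the same amount.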

Opening a Socket

Sending out ICMP packets requires opening a raw socket. Aside from the issues of having the appropriate privileges, Erlang does not have native support for handling raw sockets. I used procket to handle the privileged socket operations and pass the file descriptor into Erlang. Once the socket is returned to Erlang, we can perform operations on it as an unprivileged user. Since there isn't a gen_icmp module, we need some way of calling sendto()/recvfrom() on the socket. gen_udp uses sendto(), so we can misuse it (with some quirks) for our ICMP packets.
% Get an ICMP raw socket
{ok, FD} = procket:listen(0, [{protocol, icmp}]),
% Use the file descriptor to create an Erlang socket structure
{ok, S} = gen_udp:open(0, [binary, {fd, FD}]),
The port is meaningless, so 0 is passed in as an argument. We create the packet twice: first with a zeroed checksum, then with the result of the checksum.
make_packet(Id, Seq) ->
    {Mega,Sec,USec} = erlang:now(),
    Payload = list_to_binary(lists:seq(32, 75)),
    CS = makesum(<<?ICMP_ECHO:8, 0:8, 0:16, Id:16, Seq:16, Mega:32, Sec:32, USec:32, Payload/binary>>),
    <<
        8:8,    % Type
        0:8,    % Code
        CS:16,  % Checksum
        Id:16,  % Id
        Seq:16, % Sequence
        Mega:32, Sec:32, USec:32,   % Payload: time
        Payload/binary
    >>.
The packet can be sent via the raw socket using gen_udp:send/4, with the port again set to 0.
ok = gen_udp:send(S, IP, 0, Packet)
Since we're abusing gen_udp, we can wait for a message to be sent to the process:
receive
    {udp, S, _IP, _Port, <<_:20/bytes, Data/binary>>} ->
        {ICMP, <<Mega:32/integer, Sec:32/integer, Micro:32/integer, Payload/binary>>} = icmp(Data),
        error_logger:info_report([
            {type, ICMP#icmp.type},
            {code, ICMP#icmp.code},
            {checksum, ICMP#icmp.checksum},
            {id, ICMP#icmp.id},
            {sequence, ICMP#icmp.sequence},
            {payload, Payload},
            {time, timer:now_diff(erlang:now(), {Mega, Sec, Micro})}
        ])
after
    5000 ->
        error_logger:error_report([{noresponse, Packet}])
end
In the above code snippet, you may have noticed that the first 20 bytes of the payload are stripped off. Comparing the ICMP packet we sent and the response handed to the process by gen_udp:
icmp: <<8,0,186,30,80,228,0,0,0,0,4,250,0,12,16,77,0,1,69,0,32,33,34,35,36,
        37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,
        59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75>>
response: <<69,0,0,84,101,155,64,0,64,1,154,44,192,168,220,187,192,168,220,
            212,0,0,194,30,80,228,0,0,0,0,4,250,0,12,16,77,0,1,69,0,32,33,
            34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,
            55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75>>
While the process sent a 64 byte ICMP packet, gen_udp hands it an 84 byte packet which includes the 20 byte IPv4 header. An example of an Erlang ping is included with procket on github. The example will just print out the packets using error_logger:info_report/1:
1> icmp:ping("192.168.213.1").

=INFO REPORT==== 24-May-2010::16:21:37 ===
    type: 0
    code: 0
    checksum: 52034
    id: 14837
    sequence: 0
    payload: <<" !\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJK">>
    time: 16790
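Going back to the 20 byte IPv4 header that gen_udp hands over with the response: the receive clause above strips a fixed 20 bytes, which assumes the header carries no options. A slightly more careful sketch (module and function names are hypothetical) reads the header length from the IHL field instead:

```erlang
-module(strip_sketch).
-export([strip_ipv4/1]).

%% Drop the IPv4 header from a packet read off a raw socket.
%% IHL counts 32 bit words, so the header is IHL * 4 bytes long.
strip_ipv4(<<4:4, IHL:4, _/bits>> = Packet) when IHL >= 5 ->
    HdrLen = IHL * 4,
    <<_Hdr:HdrLen/bytes, Data/binary>> = Packet,
    Data.
```

For the 84 byte response shown earlier, the first byte is 69 (16#45), so IHL is 5 and the 64 byte ICMP packet is returned.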

Monday, January 11, 2010

Continuing the procket clean up

Some small changes to procket. If you're playing along at home, the changes don't affect the procket:listen/1 and procket:listen/2 interface.

The procket NIF now requires the elements to be acted on (the path to the Unix socket and/or the file descriptor of the listener on the Unix socket) to be explicitly passed. The Erlang code must track the path to the Unix socket, the fd listening on the Unix socket and the privileged socket returned from procket_cmd.

Since the procket NIF does not internally track any state or allocate memory, it can handle multiple file descriptors up to the process' resource limits.

The changes from the commit message:

open/2 -> open/1 : vestigal protocol arg removed, stick to streams. Erlang module changed to match (along with the bizarre passing in of the port as a protocol, who did that? o_O)

Returns the socket descriptor listening on the Unix socket: {ok, FD}

poll/0 -> poll/1 : takes the socket descriptor, returns {ok, Socket}

close/0 -> close/2 : close(SocketPath, SocketDescriptor), closes the socket descriptor and deletes the socket path, returns "ok"