Saturday, April 23, 2011

BPF and Erlang

An Erlang Interface to the Berkeley Packet Filter

The man page about bpf says:

The Berkeley Packet Filter provides a raw interface to data link layers
in a protocol independent fashion.  All packets on the network, even
those destined for other hosts, are accessible through this mechanism.

On BSD systems, bpf can be used for capturing and generating raw network frames. We can use bpf to send and receive network packets from Erlang code in a similar way to using the PF_PACKET socket interface on Linux.

All of the code presented here was run on Mac OS X (Snow Leopard). Hopefully the code is portable to all BSD operating systems. If it isn't, let me know in the comments or email me.

Pre-Requisites for Running the Code Examples

For the Erlang examples below, you'll need a fairly recent version of Erlang and a few libraries. You can download the Erlang source code or check it out from github:

git clone git://github.com/erlang/otp.git

procket is a NIF used to extend the Erlang runtime to make the various system calls. You can check it out here:

git clone git://github.com/msantos/procket.git

See the README: there is a bit of work involved in setting up sudo to run a helper app.

Finally, the examples use another small library (pkt) to parse the packets using Erlang binaries and convert packets from Erlang records to binaries. It's available here:

git clone git://github.com/msantos/pkt.git

To compile the example Erlang modules:

erlc -I /path/to/procket/include -I /path/to/pkt/include annoy.erl

The BPF C Interface

The bpf character device is used to transfer raw network frames between user space and the kernel. On Mac OS X, there are a fixed number of available devices, each acting as a communication path for a single process.

Using bpf works as follows:

  • Starting from 0, open the bpf devices. If the open() system call returns failure (-1) and errno is set to EBUSY, try the next character device, e.g., /dev/bpf1.

    Typical values for errno might be:

    • EPERM: the process does not have permission to access the character device. Either the process can temporarily be given superuser privileges or the permissions of the character device can be modified.

    Since bpf relies on the file permissions of the character device, the act of opening the device is the only operation that requires privileges. Other operations on the file descriptor do not require any special privileges.

    • ENOENT: the bpf device does not exist
  • The open() call returns a file descriptor.

  • ioctl() is used to associate the bpf device with an interface

  • ioctl() is used to retrieve the bpf buffer length

    The bpf device maintains a fixed buffer size. For efficiency, reads performed on the bpf device will block until either the buffer is full or a timeout is reached (by default, infinity). As a consequence, several packets may be returned by a single read.

  • Set some optional attributes that affect the behaviour of the bpf device.

    For example:

    • BIOCSHDRCMPLT: by default, the bpf device will construct a valid packet header for the underlying datalink type. Setting the "header complete" attribute allows the user to set the packet headers themselves.

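    • BIOCSSEESENT: controls whether packets sent from the host are returned by the bpf device. According to the bpf man page, the flag is initialized to 1 (locally generated packets are returned along with incoming ones). This ioctl request can be used to change that behaviour.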

    • BIOCIMMEDIATE: causes reads to return immediately after a packet is received rather than buffering the packets

  • Apply filtering rules using BPF bytecode.

    bpf supports a set of instructions that allow the user to restrict which packets are returned by the device.
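The first step in the list above (probing the devices until one opens) can be sketched in a few lines of C. This is only a rough sketch: the function name open_first_free and its prefix and max parameters are made up for illustration, and the ioctl() steps are left out because they need the BSD bpf header.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>

/* Try prefix0, prefix1, ... until one opens. Returns a file descriptor,
 * or -1 with errno describing the failure. `prefix` and `max` are
 * illustrative parameters, e.g. "/dev/bpf" and 256. */
int open_first_free(const char *prefix, int max)
{
    char path[64];
    int i, fd;

    for (i = 0; i < max; i++) {
        snprintf(path, sizeof(path), "%s%d", prefix, i);
        fd = open(path, O_RDWR);
        if (fd >= 0)
            return fd;      /* success: this device is ours */
        if (errno != EBUSY)
            return -1;      /* EPERM, ENOENT, ...: give up */
        /* EBUSY: in use by another process, try the next device */
    }
    errno = EBUSY;          /* every device we tried was busy */
    return -1;
}
```

Calling open_first_free("/dev/bpf", 256) from a process with permission to open the device would return a descriptor ready for the ioctl() calls described above.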

See this tutorial for a clear, concise example of capturing packets in C.

I've also put together some simple, runnable code. To compile it:

gcc -g -Wall -o bpf bpf.c

To keep the example from becoming too huge, not much is done except printing out the ethernet header. We'll cover more interesting ways of using bpf later.

The Erlang BPF Interface

To use bpf within Erlang, we'll need to be able to open the bpf device, perform the appropriate ioctl() operations, generate BPF filtering code and read and write from the device.

Opening the BPF Device

procket uses a setuid helper executable to open the bpf character device and pass the file descriptor back to Erlang:

{ok, Socket} = procket:dev("bpf").

Or using the bpf module:

{ok, Socket, Length} = bpf:open("en1").

Controlling the BPF Device

Once we have the file descriptor, we can set the device attributes by using the procket:ioctl/3 NIF.

The bpf header file defines a number of macros for calculating the correct ioctl request. Porting these macros to Erlang is straightforward.

According to the man page on Mac OS X, the ioctl signature is defined as:

int ioctl(int fildes, unsigned long request, ...);

An ioctl request is an unsigned long, so the size of the command will either be 4 or 8 bytes, depending on whether the platform is 32 or 64-bit. However, the ioctl macros compute the request with the assumption that the command is a word (or 4 bytes): the lower half of the word holds the command and the top half has the length and the direction of the command.

(The number after the colon represents the number of bits in the field.)

  1. Copy argument into kernel:1
  2. Copy argument from kernel:1
  3. No arguments:1
  4. Parameter length:13
  5. Command group:8
  6. Command:8

The fields in order are:

  • IN: if set, the argument (the 3rd argument to ioctl) is read from the user space buffer

  • OUT: if set, the argument is written to the user space buffer

A command that is IN/OUT will have the contents of the buffer read by the kernel and written back.

  • VOID: no arguments are required by the ioctl request

  • Length: the size of the argument in bytes

  • Group: the command group acts as a namespace for organizing the ioctl requests

  • Command: 1 byte is reserved for the actual command

For example, the BIOCSHDRCMPLT macro in C is:

#define IOCPARM_MASK    0x1fff      /* parameter length, at most 13 bits */
#define _IOC(inout,group,num,len) \
    (inout | ((len & IOCPARM_MASK) << 16) | ((group) << 8) | (num))
#define IOC_IN      (__uint32_t)0x80000000
#define _IOW(g,n,t) _IOC(IOC_IN,    (g), (n), sizeof(t))

#define BIOCSHDRCMPLT   _IOW('B',117, u_int)

The corresponding macro defined in Erlang:

-define(SIZEOF_U_INT, 4).
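-define(IOCPARM_MASK, 16#1fff).
-define(IOC_OUT, 16#40000000).    % copy argument out of the kernel
-define(IOC_IN, 16#80000000).     % copy argument into the kernel
-define(IOC_INOUT, (?IOC_IN bor ?IOC_OUT)).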

-define(BIOCSHDRCMPLT, bpf:iow($B, 117, ?SIZEOF_U_INT)).

ioc(Inout, Group, Num, Len) ->
    Inout bor ((Len band ?IOCPARM_MASK) bsl 16) bor (Group bsl 8) bor Num.

iow(G,N,T) ->
    ioc(?IOC_IN, G, N, T).
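As a check on the arithmetic, here is the same expansion written out as plain C functions. The sketch_ names are my own, chosen so they don't clash with the real macros, and the code assumes a 4 byte u_int.

```c
#include <stdint.h>

#define SKETCH_IOCPARM_MASK 0x1fffU      /* 13 bits of parameter length */
#define SKETCH_IOC_IN       0x80000000U  /* copy argument into the kernel */

/* Same layout as _IOC: direction | length << 16 | group << 8 | command */
uint32_t sketch_ioc(uint32_t inout, uint32_t group, uint32_t num, uint32_t len)
{
    return inout | ((len & SKETCH_IOCPARM_MASK) << 16) | (group << 8) | num;
}

/* BIOCSHDRCMPLT = _IOW('B', 117, u_int), assuming sizeof(u_int) == 4 */
uint32_t sketch_biocshdrcmplt(void)
{
    return sketch_ioc(SKETCH_IOC_IN, 'B', 117, 4);
}
```

With 'B' = 0x42 and 117 = 0x75, the request works out to 0x80000000 | 0x00040000 | 0x4200 | 0x75 = 0x80044275.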

To set the "header complete" mode from within Erlang:

procket:ioctl(Socket, ?BIOCSHDRCMPLT, <<1:32/native>>).

Or using the bpf module:

bpf:ctl(Socket, hdrcmplt, true).

BPF Filtering

bpf has a bytecode language to filter out unwanted packets. The bytecode is generated by a set of macros in the bpf header file. For convenience in porting examples to Erlang, I defined macros wrapping the Erlang functions.

A BPF filtering program is an array of instructions. Each instruction is 8 bytes:

struct bpf_insn {
    u_short code;
    u_char  jt;
    u_char  jf;
    bpf_u_int32 k;
};

  • code: 2 bytes

The opcodes are a set of instructions for moving within the packet, testing values and control flow. The opcodes are OR'ed together.

  • jt: 1 byte

jump true: if the operation evaluates as true (non-0), jump this many instructions. Instructions are numbered from 0 (the statement following the test is instruction 0).

  • jf: 1 byte

jump false: if the operation evaluates as false (0), jump this many instructions. Instructions use a 0 offset, starting with the following instruction.

  • k: 4 bytes (the man page incorrectly defines this field as a u_long)

A value whose usage depends on the opcode.

An Example in C

I'll illustrate how the filters work by using an example from the man page. This example filters out all packets except reverse ARP requests:

struct bpf_insn insns[] = {
    BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12),
    BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_REVARP, 0, 3),
    BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 20),
    BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, REVARP_REQUEST, 0, 1),
    BPF_STMT(BPF_RET+BPF_K, sizeof(struct ether_arp) +
        sizeof(struct ether_header)),
    BPF_STMT(BPF_RET+BPF_K, 0),
};

  • BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12)

    The BPF_STMT macro takes an opcode and a k value as arguments.

    • BPF_LD: load a value into the accumulator (here, the value found at the given offset)

    • BPF_H: load a half word value (2 bytes)

    • BPF_ABS: use an absolute offset from the beginning of the packet

    • 12: move 12 bytes into the ethernet frame

    An ethernet frame looks like (the numbers are bytes):

    1. Destination MAC Address:6
    2. Source MAC Address:6
    3. Type:2

    A 12 byte offset leaves the program at the ethernet type.

  • BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_REVARP, 0, 3)

    The BPF_JUMP macro arguments are: opcode, k, jt, jf

    • BPF_JMP: A branching operation, depending on whether the test evaluates as true or not

    • BPF_JEQ: the equality of the value at this offset (defined in the previous instruction as a half-word) is tested against the value held in the k field.

      If the value is equal to ETHERTYPE_REVARP (0x8035), the packet is a reverse ARP packet and control drops to the next statement. If the statement is false (for example, it is an IP packet), control jumps to the final statement:

      BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_REVARP, 0, 3),
      0: BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 20),
      1: BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, REVARP_REQUEST, 0, 1),
      2: BPF_STMT(BPF_RET+BPF_K, sizeof(struct ether_arp) +
          sizeof(struct ether_header)),
      3: BPF_STMT(BPF_RET+BPF_K, 0),
      
  • BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 20)

The packet is a reverse ARP packet. Move to offset 20. A reverse ARP packet looks like (numbers are bytes):

  1. Hardware Type:2
  2. Protocol Type:2
  3. Hardware Length:1
  4. Protocol Length:1
  5. Operation:2
  6. Sending Hardware Address:6
  7. Sending IP Address:4
  8. Target Hardware Address:6
  9. Target IP Address:4

An offset of 20 (ethernet frame = 14, so 6 bytes into the ARP packet) puts the program at the ARP operation, a 2 byte (half-word) field.

  • BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, REVARP_REQUEST, 0, 1)

If the value at the offset is equal to REVARP_REQUEST (3), move to the next instruction.

Otherwise, jump 1 instruction to the final return statement:

    BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, REVARP_REQUEST, 0, 1),
    0: BPF_STMT(BPF_RET+BPF_K, sizeof(struct ether_arp) +
        sizeof(struct ether_header)),
    1: BPF_STMT(BPF_RET+BPF_K, 0),

  • BPF_STMT(BPF_RET+BPF_K, sizeof(struct ether_arp) + sizeof(struct ether_header))

BPF_RET: return the value in the k field. The first "k" bytes of the packet will be handed to whoever is reading from the bpf device.

  • BPF_STMT(BPF_RET+BPF_K, 0)

0 bytes are returned: the packet is dropped.
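To make the jump offsets concrete, here is a toy interpreter that runs the reverse ARP filter in user space. It is a sketch only: run_filter and struct insn are names I made up, the opcode values are copied from the bpf header, and only the three opcode combinations this filter uses are handled (anything else drops the packet). The constants are expanded by hand: ETHERTYPE_REVARP is 0x8035, REVARP_REQUEST is 3 and the two struct sizes add up to 28 + 14 = 42 bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Opcode values as in the bpf header; only the subset needed here. */
enum { LD = 0x00, JMP = 0x05, RET = 0x06,   /* instruction classes */
       H = 0x08, ABS = 0x20,                /* load size and mode */
       JEQ = 0x10, K = 0x00 };

struct insn { uint16_t code; uint8_t jt, jf; uint32_t k; };

/* Returns the number of bytes to keep (0 = drop the packet). */
uint32_t run_filter(const struct insn *prog, size_t n,
                    const uint8_t *pkt, size_t len)
{
    uint32_t acc = 0;
    size_t pc = 0;

    while (pc < n) {
        const struct insn *i = &prog[pc++];  /* pc is now "instruction 0" */
        switch (i->code) {
        case LD + H + ABS:                   /* acc = big-endian half word at k */
            if ((size_t)i->k + 2 > len)
                return 0;
            acc = (uint32_t)pkt[i->k] << 8 | pkt[i->k + 1];
            break;
        case JMP + JEQ + K:                  /* skip jt or jf instructions */
            pc += (acc == i->k) ? i->jt : i->jf;
            break;
        case RET + K:
            return i->k;
        default:                             /* not handled in this sketch */
            return 0;
        }
    }
    return 0;
}

/* The reverse ARP filter, one row per instruction: code, jt, jf, k */
static const struct insn revarp[] = {
    { LD + H + ABS,  0, 0, 12 },
    { JMP + JEQ + K, 0, 3, 0x8035 },
    { LD + H + ABS,  0, 0, 20 },
    { JMP + JEQ + K, 0, 1, 3 },
    { RET + K,       0, 0, 42 },
    { RET + K,       0, 0, 0 },
};
```

Feeding run_filter a 42 byte frame with 0x8035 at offset 12 and 3 at offset 20 falls through both tests and reaches the first return. Any other frame jumps to the final return and is dropped.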

An Example in Erlang

Here is another example from the bpf man page: sniffing finger requests. Yes, the bpf man page appears to have been written in a long ago age, when reverse ARP and finger requests roamed the networks.

First the C version:

struct bpf_insn insns[] = {
    BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12),
    BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_IP, 0, 10),
    BPF_STMT(BPF_LD+BPF_B+BPF_ABS, 23),
    BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, IPPROTO_TCP, 0, 8),
    BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 20),
    BPF_JUMP(BPF_JMP+BPF_JSET+BPF_K, 0x1fff, 6, 0),
    BPF_STMT(BPF_LDX+BPF_B+BPF_MSH, 14),
    BPF_STMT(BPF_LD+BPF_H+BPF_IND, 14),
    BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 79, 2, 0),
    BPF_STMT(BPF_LD+BPF_H+BPF_IND, 16),
    BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 79, 0, 1),
    BPF_STMT(BPF_RET+BPF_K, (u_int)-1),
    BPF_STMT(BPF_RET+BPF_K, 0),
};

Now the Erlang version (with comments):

-include("bpf.hrl").

-define(ETHERTYPE_IP, 16#0800).
-define(IPPROTO_TCP, 6).

finger() ->
    [   
        % Ethernet
        ?BPF_STMT(?BPF_LD+?BPF_H+?BPF_ABS, 12),                     % offset = Ethernet Type
        ?BPF_JUMP(?BPF_JMP+?BPF_JEQ+?BPF_K, ?ETHERTYPE_IP, 0, 10),  % type = IP

        % IP
        ?BPF_STMT(?BPF_LD+?BPF_B+?BPF_ABS, 23),                     % offset = ip protocol
        ?BPF_JUMP(?BPF_JMP+?BPF_JEQ+?BPF_K, ?IPPROTO_TCP, 0, 8),    % protocol = TCP

        ?BPF_STMT(?BPF_LD+?BPF_H+?BPF_ABS, 20),                     % offset = flags, frag offset
        ?BPF_JUMP(?BPF_JMP+?BPF_JSET+?BPF_K, 16#1fff, 6, 0),        % frag offset: mask the top 3 bits
                                                                    %  and AND with 1's
                                                                    %  If any non-0 value is returned from the
                                                                    %  AND (i.e., frag offset is non-0), jump
                                                                    %  to the end and drop the packet

        ?BPF_STMT(?BPF_LDX+?BPF_B+?BPF_MSH, 14),                    % offset = IP version, IP header length
                                                                    %  Load the header length into the index
                                                                    %  register

        % TCP
        ?BPF_STMT(?BPF_LD+?BPF_H+?BPF_IND, 14),                     % offset = TCP source port
                                                                    %  Move from offset 14 (start of IP packet)
                                                                    %  plus the value held in the index register
                                                                    %  (IP header length). Puts us at the start
                                                                    %  of the TCP packet (at the source port)
        ?BPF_JUMP(?BPF_JMP+?BPF_JEQ+?BPF_K, 79, 2, 0),              % source port = 79
        ?BPF_STMT(?BPF_LD+?BPF_H+?BPF_IND, 16),                     % offset = destination port
        ?BPF_JUMP(?BPF_JMP+?BPF_JEQ+?BPF_K, 79, 0, 1),              % destination port = 79
        ?BPF_STMT(?BPF_RET+?BPF_K, 16#FFFFFFFF),                    % return: entire packet
        ?BPF_STMT(?BPF_RET+?BPF_K, 0)                               % return: drop packet
].

Note that this filter does not check the version field in the IP header. It relies on the ethernet type (0x0800) to select IPv4 packets.
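The two less obvious instructions in the finger filter are the BPF_JSET test and the BPF_MSH load. Written as ordinary C, they come down to the following arithmetic. The function names are invented for this sketch, and the frame is assumed to be an ethernet frame carrying an IPv4 packet.

```c
#include <stdint.h>

/* BPF_LDX+BPF_B+BPF_MSH with k = 14: X = 4 * (P[14] & 0x0f), i.e. the
 * IP header length in bytes, taken from the low nibble of the first
 * byte of the IP header. */
uint32_t ip_header_len(const uint8_t *frame)
{
    return 4u * (frame[14] & 0x0fu);
}

/* BPF_JSET with k = 0x1fff on the half word at offset 20: true when
 * the fragment offset is non-zero (not the first fragment). */
int is_later_fragment(const uint8_t *frame)
{
    uint16_t flags_frag = (uint16_t)(frame[20] << 8 | frame[21]);
    return (flags_frag & 0x1fff) != 0;
}

/* BPF_LD+BPF_H+BPF_IND reads at X + k, with k = 14 and k = 16. */
uint32_t tcp_src_port_offset(const uint8_t *frame)
{
    return ip_header_len(frame) + 14;
}

uint32_t tcp_dst_port_offset(const uint8_t *frame)
{
    return ip_header_len(frame) + 16;
}
```

For the usual first IP byte of 0x45 (version 4, header length 5 words), the header is 20 bytes, so the source port is read at offset 34 and the destination port at offset 36.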

Loading the Filter

To load the filter, another ioctl (BIOCSETF) is called. The ioctl takes a structure with a length and a pointer to the instructions:

struct bpf_program {
    u_int bf_len;
    struct bpf_insn *bf_insns;
};

The length field is set to the number of instructions, not the size of the instructions. In the first example (the reverse ARP filter), the length is 6.
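The difference between the instruction count and the byte size is easy to get wrong, so here is a small sketch of it in C. The sketch_ structs are stand-ins for struct bpf_insn and struct bpf_program (so the code compiles without the bpf header), and six_insns is just a placeholder array for the 6 reverse ARP instructions.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for struct bpf_insn and struct bpf_program */
struct sketch_insn { uint16_t code; uint8_t jt, jf; uint32_t k; };
struct sketch_program { unsigned int bf_len; struct sketch_insn *bf_insns; };

/* Number of elements in an array, not its size in bytes */
#define INSN_COUNT(a) (sizeof(a) / sizeof((a)[0]))

/* Placeholder for the 6 instructions of the reverse ARP filter */
static struct sketch_insn six_insns[6];

struct sketch_program make_program(void)
{
    struct sketch_program p;

    p.bf_len = (unsigned int)INSN_COUNT(six_insns);  /* 6, not 48 */
    p.bf_insns = six_insns;
    return p;
}
```

Here sizeof(six_insns) is 48 bytes (6 instructions of 8 bytes each), but bf_len is set to 6.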

In Erlang, the filter is loaded using:

Insn = finger(),
{ok, Code, [Res]} = procket:alloc([
    <<(length(Insn)):4/native-unsigned-integer-unit:8>>,
    {ptr, list_to_binary(Insn)}
]),
case procket:ioctl(Socket, ?BIOCSETF, Code) of
    {ok, _} ->
        procket:buf(Res);
    Error ->
        Error
end.

Or, more simply, using the bpf module:

bpf:ctl(Socket, setf, finger()).

BPF Filter Examples

BPF Packet Capture

Capturing packets is as simple as reading from the bpf device.

To work with file descriptors, procket needs to support the read and write system calls.

{ok, Buf} = procket:read(FD, Length).

The captured packet is not an ethernet frame (or a frame of whatever datalink type you happen to be sniffing): it's a buffer prepended with a header containing information about the packet that follows.

struct bpf_hdr {
    struct timeval bh_tstamp;     /* time stamp */
    u_long bh_caplen;             /* length of captured portion */
    u_long bh_datalen;            /* original length of packet */
    u_short bh_hdrlen;            /* length of bpf header (this struct
                                     plus alignment padding) */
};

  • bh_tstamp differs between 32 and 64-bit platforms

    • On a 32-bit platform, struct timeval has a 4 byte sec and usec field.

    • On a 64-bit platform, struct timeval has an 8 byte sec and a 4 byte usec field.

  • bh_caplen is the size of the captured packet that follows

  • bh_datalen is the real packet length. The packet may have been truncated.

  • bh_hdrlen is the real size of the bpf_hdr structure which may be padded due to alignment

To determine the start of the next packet, the bpf header provides a macro. Similarly, the Erlang bpf module provides a macro to calculate the proper offset:

?BPF_WORDALIGN(Hdrlen + Caplen).
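The rounding itself is the usual power-of-two trick. A rough C equivalent is below; word_align and next_record are names made up for this sketch, and the alignment is passed in as a parameter because the real macro takes it from the platform's bpf header.

```c
#include <stddef.h>

/* Round x up to a multiple of `alignment`, which must be a power of
 * two. Passing the alignment as a parameter is a choice made for this
 * sketch; the real macro uses a platform-defined constant. */
size_t word_align(size_t x, size_t alignment)
{
    return (x + (alignment - 1)) & ~(alignment - 1);
}

/* Offset of the next record in the read buffer, given the offset of
 * the current record and its bh_hdrlen and bh_caplen fields. */
size_t next_record(size_t offset, size_t hdrlen, size_t caplen,
                   size_t alignment)
{
    return offset + word_align(hdrlen + caplen, alignment);
}
```

For example, a record with an 18 byte header and 60 captured bytes takes up 78 bytes, so with a 4 byte alignment the next record starts at offset 80.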

The bpf module will do the calculations for you:

{ok, Length} = bpf:ctl(Socket, blen),
{ok, Buf} = procket:read(Socket, Length),
{bpf_buf, Time, Datalen, Packet, Rest} = bpf:buf(Buf).

Here is a complete example of using the bpf module to dump packets. It can be used with the filt module, if you want to play with the fcode filtering. To start it:

% en1 is the wireless device
% rule = ( src host 10.10.10.10 or dst host 10.10.10.10 ) and ( src port 80 or dst port 80)
dump:start("en1", filt:tcp({10,10,10,10}, 80)).

BPF Packet Generation

Generating crafted packets is even simpler: write the packet to the bpf device. The packet must be valid.

Be careful: crafted packets can have strange effects. On Mac OS X, I found a few odd cases that caused the network interface to go down, for example, sending out ARP replies from a spoofed MAC address or even advertising 0.0.0.0 with the MacBook's MAC address.

This example acts as a sort of peer to peer QoS, should you ever need to kick someone off of a local network. This code acts in 2 ways:

  • It continually ARPs for whatever IP the target advertises. Eventually, the target system will give up and go offline.

  • Since gratuitous ARPs are sent aggressively, the gateway will consider our MAC address to be the MAC address for the target's IP address and will send packets to us, effectively cutting off the target system.

To use the code, you will need to know the target's MAC and IP address.

% our interface: en1
% target:
%  MAC = "00:aa:bb:cc:dd:ee"
%  IP = "10.10.10.10"
annoy:er("en1", "00:aa:bb:cc:dd:ee", "10.10.10.10").