
Tuesday, August 31, 2010

Dumping Payloads with epcap and procket

epcap and procket allow Erlang code to sniff data off a network. With either, once the packets are in the Erlang VM, manipulating the contents is straightforward using pattern matching.

Which to use?

epcap and procket overlap somewhat in functionality. The main differences, at the moment, are:
  • portability

    epcap: should work on any Unix with pcap installed

    procket: for sniffing, procket uses Linux's PF_PACKET socket family, so Linux only. I plan to add support for BPF someday, so maybe in the future procket will support BSD as well.

  • safety

    epcap: runs as a separate system process. Any bugs in epcap will not affect the Erlang VM.

    procket: linked into the Erlang VM using the NIF interface. Bugs may stall or crash the VM.

  • packet generation

    epcap: can only sniff packets

    procket: can generate whole packets. Again, currently Linux only, but this should work under BSDs, like Mac OS X, when BPF is supported.

    Raw sockets (for example, generating ICMP echo packets) work under BSD as well.

    In fact, it should be possible to combine the power of procket, epcap, and BSD to send and receive arbitrary TCP or UDP packets now (since TCP/UDP raw sockets can send data only, we need to use epcap to sniff the response).

  • filtering

    epcap: packet filtering rules are processed in C, either in the kernel or in a library.

    procket: all packets are received and must be filtered by an Erlang process.
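To give a feel for what filtering in an Erlang process involves, here is a minimal sketch that keeps only TCP segments addressed to port 80. It assumes the process is handed complete Ethernet II frames carrying IPv4; the module and function names (frame_filter, is_http_request/1) are made up for illustration and are not part of procket or epcap.

```erlang
-module(frame_filter).
-export([is_http_request/1]).

%% Hypothetical helper: true if the frame is Ethernet II + IPv4 + TCP
%% with destination port 80. EtherType 16#0800 is IPv4, protocol 6 is TCP.
%% IHL is the IPv4 header length in 32-bit words, so IP options are skipped.
is_http_request(<<_Dst:6/bytes, _Src:6/bytes, 16#0800:16,
                  4:4, IHL:4, _ToS:8, _Len:16, _Id:16, _Frag:16,
                  _TTL:8, 6:8, _Sum:16, _SrcIP:4/bytes, _DstIP:4/bytes,
                  Rest/binary>>) when IHL >= 5 ->
    OptLen = (IHL - 5) * 4,
    case Rest of
        <<_Opts:OptLen/bytes, _SrcPort:16, 80:16, _/binary>> -> true;
        _ -> false
    end;
is_http_request(_) ->
    %% Anything else (ARP, IPv6, UDP, truncated frames) is dropped.
    false.
```

Every frame on the interface passes through a function like this, which is the cost of not having a kernel-side filter.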

Decapsulating Packets

procket and epcap are started differently and read packets differently, but once a raw packet reaches an Erlang process it can be decapsulated with a small module ("epcap_net.erl") distributed with epcap.

Say, for example, we want to monitor HTTP requests and write out a file containing just the client side. We request a raw socket from procket and then loop, polling the socket every 10ms for data. We look for established connections by matching packets with only the ACK bit set (connections still being set up are ignored) and spawn another process to accumulate the data. When we see a packet indicating a connection has been closed, we tell the spawned process to write out its state to a file. The spawned process will also terminate if it reaches an arbitrary timeout.

You've probably noticed that this code mimics some of the functionality of OTP behaviours. I wrote it this way for simplicity, but it could certainly be written more compactly (and elegantly) as a gen_server or a gen_fsm.
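As a rough illustration of the two pieces described above, the sketch below classifies a TCP segment by its flags and runs a small accumulator process. It is not the code from epcap_net.erl: the module name (session), the message shapes ({data, Payload} and close) and the Sink fun are assumptions made for the example. It also treats ACK with or without PSH as "established", since data-bearing segments usually set PSH, and it assumes the accumulator simply exits without writing when the timeout fires.

```erlang
-module(session).
-export([classify/1, start/2]).

%% Decode a TCP header. Off is the header length in 32-bit words, so any
%% TCP options are skipped before the payload.
classify(<<_SPort:16, _DPort:16, _Seq:32, _AckNo:32, Off:4, _Res:4,
           _CWR:1, _ECE:1, _URG:1, ACK:1, _PSH:1, RST:1, SYN:1, FIN:1,
           _Win:16, _Sum:16, _Urp:16, Rest/binary>>) when Off >= 5 ->
    OptLen = (Off - 5) * 4,
    case Rest of
        <<_Opts:OptLen/bytes, Payload/binary>> ->
            {state(ACK, RST, SYN, FIN), Payload};
        _ ->
            {error, truncated}
    end;
classify(_) ->
    {error, truncated}.

%% ACK without SYN/FIN/RST: connection is established.
state(1, 0, 0, 0) -> established;
%% RST or FIN: connection is going away.
state(_, 1, _, _) -> closed;
state(_, _, _, 1) -> closed;
%% Handshake segments and anything else are ignored.
state(_, _, _, _) -> other.

%% Spawn an accumulator. Sink is a fun/1 called once with the collected
%% bytes, for example fun(Bin) -> file:write_file("client.bin", Bin) end.
start(Sink, Timeout) ->
    spawn(fun() -> collect(Sink, Timeout, []) end).

collect(Sink, Timeout, Acc) ->
    receive
        {data, Payload} ->
            collect(Sink, Timeout, [Payload | Acc]);
        close ->
            Sink(iolist_to_binary(lists:reverse(Acc)))
    after Timeout ->
        %% Assumption: an idle session is dropped without being written.
        ok
    end.
```

Passing the sink in as a fun keeps the file handling out of the accumulator, which makes it easy to swap the file write for something else.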

Matching on Payloads

A similar example with epcap: match on all HTTP requests and write the complete transaction to a file named after the ETag. Unlike procket, epcap doesn't require polling. Instead, messages arrive from the port, much as with gen_tcp in {active, true} mode. Each message carries some additional information about the length and capture time of the packet. For this example, we ignore it; we're only interested in the packet contents.

Similar to the procket example, we loop, blocking in receive. When data is received, we check if the connection is in the established state, spawning a process to accumulate the data if we haven't seen this session before.
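A minimal sketch of such a loop follows. To stay self-contained it does not use epcap's real message format: it assumes packets have already been decoded into a hypothetical {segment, Key, State, Payload} tuple, where Key identifies the connection and State is established or closed. The module name (tracker) and the Spawn fun, which must return the pid of a new accumulator, are also invented for the example.

```erlang
-module(tracker).
-export([start/1]).

%% Spawn is a fun/1 that takes the session key and returns the pid of a
%% freshly started accumulator process.
start(Spawn) ->
    spawn(fun() -> loop(Spawn, #{}) end).

%% Sessions maps a connection key to its accumulator pid.
loop(Spawn, Sessions) ->
    receive
        {segment, Key, established, Payload} ->
            case maps:find(Key, Sessions) of
                {ok, Pid} ->
                    Pid ! {data, Payload},
                    loop(Spawn, Sessions);
                error ->
                    %% First segment for this session: start an accumulator.
                    Pid = Spawn(Key),
                    Pid ! {data, Payload},
                    loop(Spawn, Sessions#{Key => Pid})
            end;
        {segment, Key, closed, _Payload} ->
            case maps:take(Key, Sessions) of
                {Pid, Rest} ->
                    Pid ! close,
                    loop(Spawn, Rest);
                error ->
                    loop(Spawn, Sessions)
            end;
        stop ->
            ok;
        _Other ->
            %% Unknown messages are discarded so the mailbox does not grow.
            loop(Spawn, Sessions)
    end.
```

Because the process blocks in receive, it uses no CPU while the network is idle, which is the main practical difference from a polling loop.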

Finally, we write out the data to the file system when the connection is closed, using the value of the "ETag" header for the file name. For succinctness, I used a regular expression to match on the payload. It would probably be better to write a parser.
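One way the regular expression step could look is sketched below, using the standard re module. The module and function names (etag, filename/1) and the exact pattern are my own choices for illustration. Since the header value comes off the wire, the sketch also replaces any character outside a conservative set before the value is used as a file name.

```erlang
-module(etag).
-export([filename/1]).

%% Return {ok, Name} built from the ETag header value, or none if the
%% payload has no ETag header. Quotes and a weak-validator prefix (W/)
%% are dropped.
filename(Payload) ->
    RE = "^ETag:\\s*(?:W/)?\"?([^\"\\r\\n]+)\"?",
    Opts = [multiline, caseless, {capture, [1], binary}],
    case re:run(Payload, RE, Opts) of
        {match, [Tag]} ->
            %% Keep only characters that are safe in a file name.
            Safe = re:replace(Tag, "[^A-Za-z0-9._-]", "_",
                              [global, {return, binary}]),
            {ok, Safe};
        nomatch ->
            none
    end.
```

For example, a response containing the header `ETag: "abc123"` yields `{ok, <<"abc123">>}`, and a payload without the header yields `none`.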

Thanks, Zabrane, for suggesting this post!