
Saturday, December 25, 2010

Unix Sockets

There are various ways to use Unix sockets from within Erlang, such as gen_socket and unixdom_drv; example code is even bundled with the Erlang source.

To work with Unix sockets, I've broken out the socket primitives in the procket NIF and made them accessible from Erlang.

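Unix (also called local or file) sockets appear as files on the local filesystem. Like Internet sockets, Unix sockets can be created as either stream sockets (connection-oriented, reliable, no message boundaries) or datagram sockets (connectionless, message boundaries preserved). Unlike UDP, datagram sockets in the Unix domain are reliable on Linux and do not reorder messages.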
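On Linux the boundary difference is easy to see with socketpair(2), which creates an already-connected pair of Unix sockets. In this C sketch (the helper name first_read_size is hypothetical), two 5-byte writes followed by one read return exactly one 5-byte message on a datagram socket; on a stream socket the same read may return one or both writes, because the stream carries no message boundaries.

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Write two 5-byte messages into one end of a connected Unix socket pair
 * of the given type (SOCK_DGRAM or SOCK_STREAM), then return the size of
 * a single read on the other end. Returns -1 on error. */
static ssize_t first_read_size(int type)
{
    int sv[2];
    char buf[64];
    ssize_t n;

    if (socketpair(AF_UNIX, type, 0, sv) < 0)
        return -1;

    /* Two separate writes: "hello" and "world". */
    if (write(sv[0], "hello", 5) != 5 || write(sv[0], "world", 5) != 5) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    /* SOCK_DGRAM: one read == one datagram (5 bytes).
     * SOCK_STREAM: the read may span both writes. */
    n = read(sv[1], buf, sizeof buf);

    close(sv[0]);
    close(sv[1]);
    return n;
}
```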

Creating a Datagram Socket


The Erlang procket functions are simple wrappers around the C library. See the C library man pages for more details.

To register the server, we get a socket file descriptor and bind it to the pathname of the socket on the filesystem. The procket:bind/2 function takes two arguments: the file descriptor and a struct sockaddr_un. On Linux, sockaddr_un is defined as:

typedef unsigned short int sa_family_t;

struct sockaddr_un {
    sa_family_t sun_family;         /* 2 bytes: AF_UNIX */
    char sun_path[UNIX_PATH_MAX];   /* 108 bytes: pathname */
};
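In C, filling this structure for a pathname looks roughly like the sketch below, which zeroes the unused tail of sun_path just as the Erlang binary does. make_unix_addr is a hypothetical helper; on Linux, sizeof(struct sockaddr_un) is 2 + 108 = 110 bytes.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Fill a struct sockaddr_un for a filesystem path, zeroing the unused
 * part of sun_path. Returns 0 on success, or -1 if the path plus its
 * terminating NUL does not fit in sun_path. */
static int make_unix_addr(struct sockaddr_un *sun, const char *path)
{
    size_t len;

    assert(sun != NULL && path != NULL);
    len = strlen(path);
    if (len >= sizeof sun->sun_path)
        return -1;

    memset(sun, 0, sizeof *sun);        /* zero the whole structure */
    sun->sun_family = AF_UNIX;          /* same value as PF_LOCAL */
    memcpy(sun->sun_path, path, len);   /* remaining bytes stay 0 */
    return 0;
}
```

The length passed to bind(2) would then typically be sizeof(struct sockaddr_un), which corresponds to passing the whole 110-byte binary from Erlang.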

We use a binary to compose the structure, zeroing out the unused portion:

-define(UNIX_PATH_MAX, 108).
-define(PATH, <<"/tmp/unix.sock">>).

<<?PF_LOCAL:16/native,        % sun_family
  ?PATH/binary,               % address
  0:((?UNIX_PATH_MAX-byte_size(?PATH))*8)  % zero padding
>>
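To check that this binary has the same layout as struct sockaddr_un on Linux, the C sketch below serializes the bytes the same way (native-endian 16-bit family, path bytes, zero padding up to 108 bytes) so they can be compared against a zeroed and filled structure. The function encode_linux_sun and the constant LINUX_UNIX_PATH_MAX are hypothetical names; the sizes assume Linux.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

#define LINUX_UNIX_PATH_MAX 108   /* size of sun_path on Linux */

/* Serialize a Linux sockaddr_un byte by byte, mirroring the Erlang
 * binary: ?PF_LOCAL:16/native, ?PATH/binary, then zero padding.
 * buf must hold 2 + LINUX_UNIX_PATH_MAX bytes. Returns the number of
 * bytes written, or 0 if the path leaves no room for a terminating NUL. */
static size_t encode_linux_sun(unsigned char *buf, const char *path)
{
    uint16_t family = AF_UNIX;          /* PF_LOCAL == AF_UNIX */
    size_t len = strlen(path);

    if (len >= LINUX_UNIX_PATH_MAX)
        return 0;

    memcpy(buf, &family, sizeof family);                  /* native endian */
    memcpy(buf + 2, path, len);                           /* path bytes */
    memset(buf + 2 + len, 0, LINUX_UNIX_PATH_MAX - len);  /* padding */
    return 2 + LINUX_UNIX_PATH_MAX;
}
```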

This binary representation of the socket structure has a portability issue. On BSD systems, the first byte of the structure holds the length of the socket address and the second byte holds the protocol family. The value of UNIX_PATH_MAX is also smaller:

typedef __uint8_t   sa_family_t;    /* socket address family */

struct sockaddr_un {
    unsigned char   sun_len;    /* 1 byte: sockaddr len including null */
    sa_family_t sun_family;     /* 1 byte: AF_UNIX */
    char    sun_path[104];      /* path name (gag) */
};
The equivalent binary can be built as:

-define(UNIX_PATH_MAX, 104).
-define(PATH, <<"/tmp/unix.sock">>).

<<
  (2 + ?UNIX_PATH_MAX):8,       % sun_len: total structure length (106)
  ?PF_LOCAL:8,                  % sun_family
  ?PATH/binary,                 % address
  0:((?UNIX_PATH_MAX-byte_size(?PATH))*8)  % zero padding
>>
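The BSD layout can be sketched in C the same way. This version writes raw bytes rather than using <sys/un.h>, so it also compiles on Linux, where struct sockaddr_un has no sun_len field; encode_bsd_sun and BSD_UNIX_PATH_MAX are hypothetical names. Setting sun_len to the full structure size (106 bytes, the same length handed to bind()) is an assumption; BSD-derived kernels typically overwrite sun_len with the length argument anyway, which may help explain why an imprecise value often goes unnoticed.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>

#define BSD_UNIX_PATH_MAX 104    /* size of sun_path on the BSDs */

/* Serialize a BSD-style sockaddr_un: 1-byte length, 1-byte family,
 * path bytes, zero padding. buf must hold 2 + BSD_UNIX_PATH_MAX bytes.
 * Returns bytes written, or 0 if the path leaves no room for a NUL. */
static size_t encode_bsd_sun(unsigned char *buf, const char *path)
{
    size_t len = strlen(path);
    size_t total = 2 + BSD_UNIX_PATH_MAX;   /* sizeof(struct sockaddr_un) */

    if (len >= BSD_UNIX_PATH_MAX)
        return 0;

    buf[0] = (unsigned char)total;          /* sun_len (assumed: full size) */
    buf[1] = AF_UNIX;                       /* sun_family, a single byte */
    memcpy(buf + 2, path, len);
    memset(buf + 2 + len, 0, BSD_UNIX_PATH_MAX - len);
    return total;
}
```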

The code below may need to be adjusted for BSD, or it may just work. Some code I tested on Mac OS X happened to work, presumably because the length field was ignored, the endianness happened to put the protocol family in the second byte and the extra 4 bytes were truncated.

Here is the code to send data from the client to the server:


Start up an Erlang VM and run the server (remembering to include the path to the procket library):

$ erl -pa /path/to/procket/ebin
Erlang R14B02 (erts-5.8.3) [source] [smp:2:2] [rq:2] [async-threads:0] [hipe] [kernel-poll:false]

Eshell V5.8.3  (abort with ^G)
1> unix_dgram:server().

In a second Erlang VM, run:
1> unix_dgram:client(<<104,101,108,108,111,32,119,111,114,108,100>>). % Erlangish for <<"hello world">>, I am being a smartass

In the first VM, you should see printed out:
<<"hello world">>
ok
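Because procket wraps the C socket calls, the datagram exchange above maps onto a handful of C calls. This sketch runs the server and the client in one process; dgram_roundtrip and the /tmp path are hypothetical. The unlink() before bind() is an addition not taken from the post: bind() fails with EADDRINUSE if a socket file is left over from a previous run.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/un.h>
#include <unistd.h>

/* Bind a datagram socket to path (the "server"), send msg to it from a
 * second, unbound socket (the "client") and receive it into out.
 * Returns the number of bytes received, or -1 on error. */
static ssize_t dgram_roundtrip(const char *path, const char *msg,
                               char *out, size_t outlen)
{
    struct sockaddr_un sun;
    ssize_t n = -1;
    int srv = -1, cli = -1;

    if (strlen(path) >= sizeof sun.sun_path)
        return -1;
    memset(&sun, 0, sizeof sun);
    sun.sun_family = AF_UNIX;
    strcpy(sun.sun_path, path);

    unlink(path);                   /* remove a stale socket file */

    srv = socket(AF_UNIX, SOCK_DGRAM, 0);
    cli = socket(AF_UNIX, SOCK_DGRAM, 0);
    if (srv < 0 || cli < 0)
        goto out;
    if (bind(srv, (struct sockaddr *)&sun, sizeof sun) < 0)
        goto out;

    /* Client: an unbound socket can still send to the server's path. */
    if (sendto(cli, msg, strlen(msg), 0,
               (struct sockaddr *)&sun, sizeof sun) < 0)
        goto out;

    /* Server: one recv() returns exactly one datagram. */
    n = recv(srv, out, outlen, 0);

out:
    if (cli >= 0) close(cli);
    if (srv >= 0) close(srv);
    unlink(path);                   /* the file outlives the descriptor */
    return n;
}
```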

Creating an Abstract Socket


Linux allows you to bind a socket to an arbitrary name (one that is not a filesystem path) by using an abstract socket address. An abstract address starts with a NUL byte followed by arbitrary bytes, in place of the path used by traditional Unix sockets. To define an abstract socket, a binary in the format of a struct sockaddr_un is passed as the second argument to procket:bind/2:
<<?PF_LOCAL:16/native,        % sun_family
  0:8,                        % leading NUL marks an abstract address
  "1234",                     % the address
  0:((?UNIX_PATH_MAX-(1+4))*8)  % zero padding
>>
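One Linux detail worth knowing: for an abstract address, every byte of sun_path covered by the address length is part of the name, and NUL bytes have no special meaning. The zero padding in a full-size binary therefore becomes part of the name, which works as long as client and server pad identically. This C sketch (helper names and the name procket-sketch-1234 are hypothetical) computes the exact length instead, and shows that sending to the same name with the full sizeof(struct sockaddr_un) length targets a different, padded name, so sendto() fails.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/un.h>
#include <unistd.h>

/* Build a Linux abstract address: sun_path[0] is NUL and the name
 * follows. Returns the exact address length covering only the name. */
static socklen_t abstract_addr(struct sockaddr_un *sun, const char *name)
{
    size_t len = strlen(name);

    assert(len + 1 <= sizeof sun->sun_path);
    memset(sun, 0, sizeof *sun);
    sun->sun_family = AF_UNIX;
    memcpy(sun->sun_path + 1, name, len);   /* sun_path[0] stays '\0' */
    return (socklen_t)(offsetof(struct sockaddr_un, sun_path) + 1 + len);
}

/* Bind a datagram socket to the abstract name (exact length), then send
 * "ping" to it. If pad_to_full_size is nonzero, the client uses
 * sizeof(struct sockaddr_un) as the length, i.e. a zero-padded name.
 * Returns bytes received by the bound socket, or -1 on error. */
static ssize_t abstract_ping(const char *name, int pad_to_full_size)
{
    struct sockaddr_un sun;
    socklen_t len = abstract_addr(&sun, name);
    socklen_t sendlen = pad_to_full_size ? (socklen_t)sizeof sun : len;
    char buf[16];
    ssize_t n = -1;
    int srv = socket(AF_UNIX, SOCK_DGRAM, 0);
    int cli = socket(AF_UNIX, SOCK_DGRAM, 0);

    if (srv >= 0 && cli >= 0 &&
        bind(srv, (struct sockaddr *)&sun, len) == 0 &&
        sendto(cli, "ping", 4, 0, (struct sockaddr *)&sun, sendlen) == 4)
        n = recv(srv, buf, sizeof buf, 0);

    if (cli >= 0) close(cli);
    if (srv >= 0) close(srv);   /* abstract names vanish with the socket */
    return n;
}
```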

To create a datagram echo server, the client socket must also be bound to an address so that the server has somewhere to send the response. We modify the datagram server to use recvfrom/4, passing an additional flags argument (set to 0) and a length. recvfrom/4 returns an additional value containing up to length bytes of the client's socket address.

We also need to modify the client to bind to an abstract socket. The server will receive this socket address in the return value of recvfrom/4; this value can be passed to sendto/4.


1> unix_dgram1:server().

1> unix_dgram1:client(<<104,101,108,108,111,32,119,111,114,108,100>>).
<<"hello world">>
ok
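The same echo pattern in C, as a sketch: both ends bind abstract names, the server's recvfrom() fills in the client's address, and that address goes straight back into sendto(). The function and socket names are hypothetical, and the code is Linux-only because of the abstract namespace.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/un.h>
#include <unistd.h>

/* Linux abstract address for name; returns the exact address length. */
static socklen_t abs_addr(struct sockaddr_un *sun, const char *name)
{
    size_t len = strlen(name);

    assert(len + 1 <= sizeof sun->sun_path);
    memset(sun, 0, sizeof *sun);
    sun->sun_family = AF_UNIX;
    memcpy(sun->sun_path + 1, name, len);
    return (socklen_t)(offsetof(struct sockaddr_un, sun_path) + 1 + len);
}

/* One echo exchange. The client binds its own abstract name so the
 * server's recvfrom() learns where to reply. Returns the length of the
 * reply received by the client, or -1 on error. */
static ssize_t echo_once(const char *srv_name, const char *cli_name,
                         const char *msg, char *reply, size_t replylen)
{
    struct sockaddr_un srv_addr, cli_addr, from;
    socklen_t srv_len = abs_addr(&srv_addr, srv_name);
    socklen_t cli_len = abs_addr(&cli_addr, cli_name);
    socklen_t from_len = sizeof from;
    char buf[256];
    ssize_t n, result = -1;
    int srv = socket(AF_UNIX, SOCK_DGRAM, 0);
    int cli = socket(AF_UNIX, SOCK_DGRAM, 0);

    if (srv < 0 || cli < 0)
        goto out;
    if (bind(srv, (struct sockaddr *)&srv_addr, srv_len) < 0 ||
        bind(cli, (struct sockaddr *)&cli_addr, cli_len) < 0)
        goto out;

    /* Client -> server. */
    if (sendto(cli, msg, strlen(msg), 0,
               (struct sockaddr *)&srv_addr, srv_len) < 0)
        goto out;

    /* Server: recvfrom() also returns the client's address... */
    n = recvfrom(srv, buf, sizeof buf, 0, (struct sockaddr *)&from, &from_len);
    if (n < 0)
        goto out;

    /* ...which is passed straight back to sendto() for the reply. */
    if (sendto(srv, buf, (size_t)n, 0, (struct sockaddr *)&from, from_len) != n)
        goto out;

    result = recv(cli, reply, replylen, 0);

out:
    if (cli >= 0) close(cli);
    if (srv >= 0) close(srv);
    return result;
}
```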

Creating a Stream Socket


To create a stream socket, we use the SOCK_STREAM type (or 1) for the second value passed to socket/3. The socket arguments can be either integers or atoms; for variety, atoms are used here.

After the socket is bound, we mark it as listening and poll it (rather inefficiently) for connections. When a connection arrives, it is accepted, which returns a file descriptor for the new connection, and a process is spawned to handle it.

On the client side, after obtaining a stream socket, we connect it to the server's address, so there is no need to bind it explicitly.


Running the same steps for the client and server as above:

1> unix_stream:server().
<<"hello world">>
** client disconnected

1> unix_stream:client(<<104,101,108,108,111,32,119,111,114,108,100>>).
<<"hello world">>
ok
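For comparison, here is a single-process C sketch of the stream exchange: bind and listen on the server socket, connect() the unbound client, accept() the queued connection and read. The helper stream_once and the /tmp path are hypothetical. Instead of polling, it relies on connect() completing as soon as the connection is queued in the listen backlog.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/un.h>
#include <unistd.h>

/* Server binds and listens, client connects without binding, server
 * accepts and reads. Returns bytes read by the server, or -1 on error. */
static ssize_t stream_once(const char *path, const char *msg,
                           char *out, size_t outlen)
{
    struct sockaddr_un sun;
    ssize_t n = -1;
    int srv = -1, cli = -1, conn = -1;

    if (strlen(path) >= sizeof sun.sun_path)
        return -1;
    memset(&sun, 0, sizeof sun);
    sun.sun_family = AF_UNIX;
    strcpy(sun.sun_path, path);
    unlink(path);                       /* remove a stale socket file */

    srv = socket(AF_UNIX, SOCK_STREAM, 0);
    if (srv < 0 || bind(srv, (struct sockaddr *)&sun, sizeof sun) < 0 ||
        listen(srv, 5) < 0)
        goto out;

    /* The connection waits in the listen backlog, so connect() returns
     * even though accept() has not been called yet. */
    cli = socket(AF_UNIX, SOCK_STREAM, 0);
    if (cli < 0 || connect(cli, (struct sockaddr *)&sun, sizeof sun) < 0)
        goto out;

    conn = accept(srv, NULL, NULL);     /* new descriptor for this client */
    if (conn < 0)
        goto out;

    if (write(cli, msg, strlen(msg)) < 0)
        goto out;
    close(cli);                         /* server sees EOF after the data */
    cli = -1;

    n = read(conn, out, outlen);

out:
    if (conn >= 0) close(conn);
    if (cli >= 0) close(cli);
    if (srv >= 0) close(srv);
    unlink(path);
    return n;
}
```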
