
Monday, February 15, 2010

Erlang and Excessively Long Hostnames

I've been reading through the Erlang source code, trying to get familiar with it and looking for small bugs.
inet_gethost is a port that handles name lookups. The source code for it is in:
$ERL_TOP/erts/etc/common/inet_gethost.c
inet_gethost is surprisingly complicated, mainly because of portability concerns and probably because of its age as well (the code looks to be 10 or more years old). It works by starting a master process that forks a pool of slave processes (called workers in the debug output) and waits for data on stdin. When the master reads a packet from the Erlang side, it sends the data to a slave over a pipe. The slave calls gethostbyname() (or the IPv6 equivalents), blocks in the lookup, then writes the response back.
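To make the master/worker round trip concrete, here is a minimal sketch of the pattern, not the actual inet_gethost code. It forks a single worker instead of a pool, and the names ask_worker, lookup_fn, stub_lookup and crashing_lookup are hypothetical. The stub lookups stand in for the blocking gethostbyname() call so the sketch does not touch the resolver.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the blocking gethostbyname() call in a worker. */
typedef void (*lookup_fn)(const char *name, char *reply, size_t reply_size);

/* Stub lookup: formats a fake answer instead of querying the resolver. */
static void stub_lookup(const char *name, char *reply, size_t reply_size)
{
    snprintf(reply, reply_size, "resolved(%s)", name);
}

/* Stub lookup that simulates a worker dying in the middle of a request. */
static void crashing_lookup(const char *name, char *reply, size_t reply_size)
{
    (void) name; (void) reply; (void) reply_size;
    _exit(1);
}

/* Fork one worker, send it `name` over a pipe and read the reply back.
 * Returns the reply length, or -1 on error or if the worker died
 * before replying (the master then sees end of file on the pipe). */
static ssize_t ask_worker(const char *name, lookup_fn lookup,
                          char *reply, size_t reply_size)
{
    int to_worker[2], from_worker[2];
    pid_t pid;
    ssize_t n;

    if (reply_size == 0 || pipe(to_worker) < 0) return -1;
    if (pipe(from_worker) < 0) {
        close(to_worker[0]); close(to_worker[1]);
        return -1;
    }
    pid = fork();
    if (pid < 0) {
        close(to_worker[0]); close(to_worker[1]);
        close(from_worker[0]); close(from_worker[1]);
        return -1;
    }
    if (pid == 0) {                       /* worker side */
        char req[256], res[256];
        ssize_t len;
        close(to_worker[1]); close(from_worker[0]);
        len = read(to_worker[0], req, sizeof(req) - 1);
        if (len <= 0) _exit(1);
        req[len] = '\0';
        lookup(req, res, sizeof(res));    /* blocks here in the real thing */
        if (write(from_worker[1], res, strlen(res)) < 0) _exit(1);
        _exit(0);
    }
    /* master side */
    close(to_worker[0]); close(from_worker[1]);
    n = write(to_worker[1], name, strlen(name));
    close(to_worker[1]);
    if (n >= 0) n = read(from_worker[0], reply, reply_size - 1);
    close(from_worker[0]);
    waitpid(pid, NULL, 0);
    if (n <= 0) return -1;                /* EOF: worker never replied */
    reply[n] = '\0';
    return n;
}
```

Calling ask_worker("nnn", stub_lookup, reply, sizeof(reply)) fills reply with "resolved(nnn)". With crashing_lookup the master reads end of file from the pipe and gets -1, which is the situation a master has to detect when a worker dies.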
Setting the environment variable "ERL_INET_GETHOST_DEBUG" makes inet_gethost print extra debug messages. The values in the code range from 0 (debug disabled) to 5:
export ERL_INET_GETHOST_DEBUG=5
erl
To test how inet_gethost handles some simple edge cases, we run the following:
1> inet:gethostbyname(lists:duplicate(3,"n")).
inet_gethost[4924] (DEBUG):Saved domainname .
inet_gethost[4924] (DEBUG):Created worker[4925] with fd 3
inet_gethost[4924] (DEBUG):Saved domainname .
inet_gethost[4925] (DEBUG):Worker got request, op = 1, proto = 1, data = nnn.
inet_gethost[4925] (DEBUG):Starting gethostbyname(nnn)
inet_gethost[4925] (DEBUG):gethostbyname error 1
{error,nxdomain}
Increasing the number of characters in the domain name reveals something interesting:
4> inet:gethostbyname(lists:duplicate(100000,"n")).
inet_gethost[4924] (DEBUG):Saved domainname .
inet_gethost[4924] (DEBUG):Saved domainname .
inet_gethost[4924] (DEBUG):reap_children: res = -1, errno = 10.
inet_gethost[4924] (DEBUG):End of file while reading from pipe.
inet_gethost[4924]: WARNING:Malformed reply (header) from worker process 4925.
inet_gethost[4924] (DEBUG):Killing worker[4925] with fd 3, serial 4
{error,timeout}
On Mac OS X, the CPU usage for inet_gethost shoots up as well.
The odd warning ("Malformed reply ...") appears because the slave process crashes. Comparing the two outputs, the debug message that should follow "Saved domainname ." ("Worker got request ...") is never printed. The cause is a buffer overflow in the debug output itself: debugf() formats the 100000-character name into buff, which is only 2048 bytes, using unbounded sprintf() and vsprintf() calls:
static void debugf(char *format, ...)
{
    char buff[2048];
    char *ptr;
    va_list ap;

    va_start(ap,format);
    sprintf(buff,"%s[%d] (DEBUG):",program_name,(int) getpid());
    ptr = buff + strlen(buff);
    vsprintf(ptr,format,ap);
    strcat(ptr,"\r\n");
    write(2,buff,strlen(buff));
    va_end(ap);
}
Replacing vsprintf() with vsnprintf() fixes that bug, but inet_gethost will still crash on Mac OS X Snow Leopard.
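As a rough sketch of that change (not the actual patch), the formatting can be bounded with snprintf() and vsnprintf() so that an over-long message is truncated instead of overrunning the buffer. The helpers debug_vformat and debug_format are hypothetical names introduced here, and the value of program_name is assumed for the example.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static const char *program_name = "inet_gethost"; /* assumed for the sketch */

/* Hypothetical helper: write prefix + message + "\r\n" into buff, truncating
 * the message so nothing lands past buff[size - 1]. Returns the number of
 * bytes to write out (excluding the terminating NUL). */
static size_t debug_vformat(char *buff, size_t size,
                            const char *format, va_list ap)
{
    size_t used, room;
    int n;

    if (size < 3) return 0;             /* no room for "\r\n" plus NUL */
    room = size - 2;                    /* reserve two bytes for "\r\n" */

    n = snprintf(buff, room, "%s[%d] (DEBUG):", program_name, (int) getpid());
    used = (n < 0) ? 0 : (size_t) n;
    if (used > room - 1) used = room - 1;   /* prefix itself was truncated */

    n = vsnprintf(buff + used, room - used, format, ap);
    if (n > 0) {
        size_t avail = room - used - 1;     /* characters that actually fit */
        used += ((size_t) n > avail) ? avail : (size_t) n;
    }
    memcpy(buff + used, "\r\n", 3);     /* copies the terminating NUL too */
    return used + 2;
}

/* Variadic wrapper for callers that supply their own buffer. */
size_t debug_format(char *buff, size_t size, const char *format, ...)
{
    va_list ap;
    size_t len;

    va_start(ap, format);
    len = debug_vformat(buff, size, format, ap);
    va_end(ap);
    return len;
}

/* Bounded variant of debugf(): same output format, but truncates. */
void debugf(const char *format, ...)
{
    char buff[2048];
    va_list ap;
    size_t len;

    va_start(ap, format);
    len = debug_vformat(buff, sizeof(buff), format, ap);
    va_end(ap);
    if (write(2, buff, len) < 0) {
        /* nothing useful to do if stderr is gone */
    }
}
```

With this version, debugf("Starting gethostbyname(%s)", name) prints at most 2047 bytes for a 100000-character name, and the line still ends in "\r\n".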
A small standalone program that calls gethostbyname() directly on Mac OS X reproduces the problem, so it is not an Erlang bug.
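The test program itself is not reproduced in this post. The following is only a minimal sketch of what such a test could look like, assuming a name built from repeated "n" characters as in the Erlang call above. make_long_name and try_lookup are hypothetical names.

```c
#include <netdb.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Build a hostname of `len` repeated 'n' characters; the caller frees it. */
char *make_long_name(size_t len)
{
    char *name = malloc(len + 1);

    if (name == NULL) return NULL;
    memset(name, 'n', len);
    name[len] = '\0';
    return name;
}

/* Call gethostbyname() directly, with no Erlang involved.
 * Returns 0 if the name resolved, -1 otherwise. */
int try_lookup(const char *name)
{
    struct hostent *h = gethostbyname(name);

    if (h == NULL) {
        fprintf(stderr, "gethostbyname failed, h_errno = %d\n", h_errno);
        return -1;
    }
    printf("resolved to %s\n", h->h_name);
    return 0;
}
```

Calling try_lookup() from main() on the result of make_long_name(100000) sends the same kind of name straight to the system resolver. A name that long is not a valid DNS name, so a well-behaved resolver should just return an error.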

It looks like a bug in gethostbyname() on Mac OS X Snow Leopard, triggered while doing a multicast DNS lookup. The log shows:
Feb 15 12:42:48 ack mDNSResponder[18]:  77: ERROR: read_msg - hdr.datalen 70001 (11171) > 70000
Feb 15 12:42:48 ack ./gho[4852]: dnssd_clientstub write_all(4) failed -1/70028 32 Broken pipe
Feb 15 12:42:48 ack ./gho[4852]: dnssd_clientstub deliver_request ERROR: write_all(4, 70028 bytes) failed
Feb 15 12:42:48 ack ./gho[4852]: dnssd_clientstub write_all(4) failed -1/28 32 Broken pipe
