
Misc: Coding with the DNS protocol

2013.12.28 14:53

Posted by 푸우, views: 13345

********************************
Coding with the DNS protocol
--------------------------------
A tutorial by JimJones
http://zsh.interniq.org [2000]
********************************
What's in a name? (I'm sorry for starting this passage off on such 
a trite tone, but it was inevitable.) For many, it's either a 
numerical IP or a hostname. A simple gethostbyaddr() or 
gethostbyname() can be used on the socket API layer to extract a 
hostent structure. We can then read the respective hostname or IP 
address from there. Simple? Yes. Sufficient? For almost all cases. 
But there is often that need to make a further step, whether out 
of sheer curiosity or because our application demands it. This 
tutorial will show you that extra step into the gray area - of DNS
on the UDP and TCP packet level.
 
Back to Square One
------------------
A lot of you may know this, so there's nothing stopping you from 
skipping to the next section. But just a quick recap for the others :
We all know that an IPv4 address is a 4 byte structure (32 bits), which 
was historically classified under 1 of 5 types - Class A, B, C, D, and E. To 
avoid confusion between operating systems which store multi-byte values 
differently, a common network order is used. This network order is in 
the Big Endian style, which sends the most significant byte first.
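To make network order concrete, here is a minimal sketch that converts a host-order address with htonl() and copies out its bytes as they would travel on the wire. The helper name addr_bytes is made up for this example.

```c
#include <arpa/inet.h>   /* htonl */
#include <stdint.h>
#include <string.h>

/* Copy the four bytes of a host-order IPv4 address into out[] in
   network (big endian) order. addr_bytes is a hypothetical helper. */
void
addr_bytes (uint32_t host_order, unsigned char out[4])
{
  uint32_t net = htonl (host_order);   /* most significant byte first */
  memcpy (out, &net, sizeof (net));
}
```

Called with 0xC0000201 (192.0.2.1), out[] holds c0 00 02 01 regardless of the byte order of the machine running it.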
A domain name server typically runs on port 53 of UDP and TCP.
My /etc/services file contains the following, for example:
domain          53/tcp          nameserver      # name-domain server
domain          53/udp          nameserver
The large majority of name server requests will be handled on 
the UDP port. The UDP name service is used to make simple 
queries and resolutions. The TCP name service will be used when 
grabbing or transferring zones, which are typically very large 
in size. The reason for the predominant quantity of DNS traffic 
being UDP-based is really quite simple. The user datagram protocol 
is lightweight and requires little overhead. If a client simply 
needs to convert a hostname to an IP, only a small string of 
characters is being sent to the name server, and a few meaningful 
bytes returned. This really does not require the complexity of a 
virtual connection as provided by TCP. DNS traffic also accounts 
for a very large percentage of Internet traffic, since end user 
applications usually take host input over IPs, as they are more 
memorable. Using TCP would be an overkill in most cases.
You will almost always want to run your name server on, and send your 
requests to, port 53 except in the rarest conditions. Unlike FTP servers, 
for example, that sometimes run on high ports, NS's are almost 
always on port 53. The source port is a different matter: clients 
normally send from an arbitrary high port, and it is mainly older 
server-to-server setups and strict firewall rules that expect port 53 
on both ends. The destination port is the one that is very important 
to preserve!
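One practical difference between the two transports is worth a sketch: over TCP, each DNS message is preceded by a two-byte length in network order (RFC 1035, section 4.2.2), while a UDP datagram carries the message alone. The helper name tcp_frame is an assumption made for this example.

```c
#include <stddef.h>
#include <string.h>

/* Prefix a DNS message with its 16-bit big endian length, as needed
   on TCP. Returns the framed size, or 0 if the message does not fit.
   tcp_frame is a hypothetical helper. */
size_t
tcp_frame (const unsigned char *msg, size_t len,
           unsigned char *out, size_t outsize)
{
  if (len > 0xffff || outsize < len + 2)
    return 0;
  out[0] = (unsigned char) (len >> 8);     /* high byte first */
  out[1] = (unsigned char) (len & 0xff);
  memcpy (out + 2, msg, len);
  return len + 2;
}
```

The same message bytes can then be sent unchanged with sendto() on a UDP socket, or framed like this and written to a connected TCP socket.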
 
Dissecting a DNS Packet: A Practical Laboratory
-----------------------------------------------
We will now discuss how to construct DNS packets. The simplest way 
of creating or reading your own DNS packet is to 
use the HEADER structure, which is defined in /usr/include/arpa/nameser.h 
(simply reference with #include <arpa/nameser.h>).
To fetch a DNS packet, we may create a socket descriptor with 
socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP) and recv into a buffer. 
Once the buffer has been filled with incoming data, or we have 
created our own buffer to send, we can typecast it to a (HEADER *).
A DNS packet could be defined as char dnspacket[PACKETSZ], where 
PACKETSZ is defined to be 512, the largest message a plain UDP query 
or reply may carry. We create a HEADER *myhead and 
initialize it with myhead = (HEADER *) dnspacket. The next steps 
are really quite simple. Type HEADER is simply a struct of packed 
bitfields that mirrors the 12 bytes of the DNS header. The first two 
bytes are the ID, or identification sequence. This lets a client match 
each reply to the query that caused it when a large volume of name 
service traffic is being sent. Another differentiation between the DNS 
transport modes of UDP and TCP is that TCP is used for responses which 
exceed that 512 byte limit: the server sets the TC (truncated) flag in 
its UDP reply and the client retries over TCP. Thus it is natural that 
zone transfers occur in this form.
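The HEADER approach described above might look like the sketch below. It assumes the BIND-style HEADER typedef and HFIXEDSZ constant that <arpa/nameser.h> provides on most Unix systems; make_header is a made-up name. The header is filled in a local struct and copied out, which avoids casting a char buffer that may not be suitably aligned.

```c
#include <arpa/inet.h>      /* htons */
#include <arpa/nameser.h>   /* HEADER, HFIXEDSZ, PACKETSZ */
#include <string.h>

/* Write a 12-byte query header into pkt: the given ID, recursion
   desired, one question. make_header is a hypothetical helper. */
void
make_header (unsigned char *pkt, unsigned short id)
{
  HEADER h;
  memset (&h, 0, sizeof (h));   /* qr = 0 (query), opcode = 0 (QUERY) */
  h.id = htons (id);            /* multi-byte fields go in network order */
  h.rd = 1;
  h.qdcount = htons (1);
  memcpy (pkt, &h, HFIXEDSZ);
}
```

With an ID of 0x451b this yields the bytes 45 1b 01 00 00 01 followed by six zero bytes.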
*NOTE: Be +VERY+ cautious when choosing IDs in applications which 
require real responses or delegate trust to an external server. 
IDs derived from getpid() manipulation or from simple random 
functions are insecure because they can be predicted. 
This opens up the possibility of DNS cache poisoning or 
corruption, since a malicious host can inject false DNS replies
into the stream. For secure applications, a method that has a large 
degree of entropy, or randomness, should be chosen. There are many 
such ways of reading "more random" bits such as keyboard latency 
and certain system environmental factors. This, however, is beyond 
the scope of this article.
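Without going into entropy gathering itself, one common shortcut is to let the operating system do it. The sketch below assumes a system that provides /dev/urandom (Linux and the BSDs do); random_id is a made-up name, and this is only one of several reasonable ways to pick an ID.

```c
#include <stdio.h>

/* Read a 16-bit query ID from the kernel's random pool.
   Returns 0 on success, -1 on failure. Assumes /dev/urandom exists;
   random_id is a hypothetical helper. */
int
random_id (unsigned short *id)
{
  unsigned char b[2];
  FILE *f = fopen ("/dev/urandom", "rb");
  if (f == NULL)
    return -1;
  if (fread (b, 1, sizeof (b), f) != sizeof (b))
    {
      fclose (f);
      return -1;
    }
  fclose (f);
  *id = (unsigned short) ((b[0] << 8) | b[1]);
  return 0;
}
```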
The next field is the QR flag, a single bit which marks a query as 0 and 
a response as 1. Obviously, when sending back a reply, set this 
field to 1, or 0 when making a request.
The next field, consisting of one nibble (4 bits), is the opcode, which 
specifies the kind of query. This can be 0 for a standard query (QUERY) 
or 1 for an inverse query (IQUERY), which is now obsolete. Value 2 
(STATUS) was meant for name server status requests, 3 is reserved, 
and the option with value 4 
(NS_NOTIFY_OP) is beyond the scope of this article. The next fields
are a slew of miscellaneous flags. AA, the authoritative answer bit, 
is set in a response (QR of 1) when a name server has answered a 
request for which it has authority.
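If you would rather not depend on the HEADER bitfields, the same flags can be pulled out of the third and fourth header bytes with shifts and masks. This is a minimal sketch using the bit positions from RFC 1035; the struct and function names are made up for the example.

```c
/* Flag fields of the third and fourth header bytes.
   dns_flags and decode_flags are hypothetical names. */
struct dns_flags
{
  unsigned qr, opcode, aa, tc, rd, ra, rcode;
};

/* hi is byte 2 of the packet, lo is byte 3 (counting from 0). */
struct dns_flags
decode_flags (unsigned char hi, unsigned char lo)
{
  struct dns_flags f;
  f.qr     = (hi >> 7) & 0x1;   /* 0 = query, 1 = response */
  f.opcode = (hi >> 3) & 0xf;   /* 0 = QUERY, 1 = IQUERY, ... */
  f.aa     = (hi >> 2) & 0x1;   /* authoritative answer */
  f.tc     = (hi >> 1) & 0x1;   /* truncated */
  f.rd     =  hi       & 0x1;   /* recursion desired */
  f.ra     = (lo >> 7) & 0x1;   /* recursion available */
  f.rcode  =  lo       & 0xf;   /* 0 = no error */
  return f;
}
```

For example, decode_flags (0x01, 0x00) describes a standard query with only the rd flag set.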
 
 
A typical DNS packet will look like this:
Byte
|-----------------|-----------------|-----------------|-----------------|
         0                 1                 2                 3
        ID              (Cont)        qr,op,aa,tc,rd   ra,z,ad,cd,rcode
|-----------------|-----------------|-----------------|-----------------|
         4                 5                 6                 7
    # Questions         (Cont)           # Answers          (Cont)
|-----------------|-----------------|-----------------|-----------------|
         8                 9                10                11
    # Authority         (Cont)         # Additional         (Cont)
|-----------------|-----------------|-----------------|-----------------|
                            Question Section
|-----------------|-----------------|-----------------|-----------------|
                             Answer Section
|-----------------|-----------------|-----------------|-----------------|
                            Authority Section
|-----------------|-----------------|-----------------|-----------------|
                           Additional Section
|-----------------|-----------------|-----------------|-----------------|
(z marks the single unused bit between ra and ad.)

Where the last 4 sections take on variable length.

I guess one of the best ways to show DNS at work is to take a real 
life example. For this, we will be using a hex data dump of named 
requests. You can really do this any way you want. I'm going to use 
netcat for this example. (Yeah I know, netcat isn't exactly the most 
intensive tool for dumping data, but it's lightweight and serves the 
purposes we want perfectly.)
We will parse a simple address request, so we run it all in UDP mode. 
Kill named first so that port 53 is free for netcat to listen on!
JonesTown:/# killall named
JonesTown:/# netcat -l -p 53 -u -o dump &; host -t A test.domain.com localhost;
------ Some output ------
JonesTown:/# cat dump
< 00000000 45 1b 01 00 00 01 00 00 00 00 00 00 04 74 65 73 # E............tes
< 00000010 74 06 64 6f 6d 61 69 6e 03 63 6f 6d 00 00 01 00 # t.domain.com....
< 00000020 01 
JonesTown:/#
45 1b 01 00 00 01 00 00 00 00 00 00 04 74 65 73 74 06 64 6f 6d 61 69 6e 03 63 6f 6d 00  00 01 00 01
|_________________________________| |________________________________________________|  |_________|
 The DNS packet header (12 bytes)           Actual DNS request (encoded name)           Suffix (type, class)
Now let's get down and dirty and dissect this. As seen, the first 12
bytes are the packet header, the first two of which are the DNS ID. 
This will be different each time, so the value of these 2 bytes is 
really irrelevant here. The next byte is set to 1. We see that the 
respective field for this value of "1" is the rd flag. Of course when 
querying a nameserver for an address, we wish to request recursion in 
case more nameservers must be first contacted. The rest of the flag fields 
are nulled (qr = 0 since this is a query, and opcode = 0 for a standard 
QUERY). We reach the 6th 
byte, which is not 0. The value of this, too, is 1. Why? We are in the 
number of questions field, a 16-bit count of which this is the low byte. 
We are only asking one question: "What is 
the address of test.domain.com ?" Thus this number is one. There are no 
answer, authority, or additional records so the rest of the DNS header is 
set to 0. Now we reach the formatted domain name.
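The walk through the header above can be repeated in code. This sketch reads the ID and the four record counts as 16-bit big endian values straight from the byte buffer; dns_counts, get16 and read_counts are names invented for the example.

```c
#include <stddef.h>

/* The ID and the four 16-bit counts of a DNS header.
   dns_counts and read_counts are hypothetical names. */
struct dns_counts
{
  unsigned id, qdcount, ancount, nscount, arcount;
};

/* Read a 16-bit big endian value. */
static unsigned
get16 (const unsigned char *p)
{
  return ((unsigned) p[0] << 8) | p[1];
}

/* Returns 0 on success, -1 if pkt is shorter than a header. */
int
read_counts (const unsigned char *pkt, size_t len, struct dns_counts *c)
{
  if (len < 12)
    return -1;
  c->id      = get16 (pkt);
  c->qdcount = get16 (pkt + 4);    /* bytes 4-5: questions */
  c->ancount = get16 (pkt + 6);    /* bytes 6-7: answers */
  c->nscount = get16 (pkt + 8);    /* bytes 8-9: authority */
  c->arcount = get16 (pkt + 10);   /* bytes 10-11: additional */
  return 0;
}
```

Fed the 33 bytes of the dump, it reports an ID of 0x451b, one question, and zero for the other three counts.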
A hostname is simply a series of letters and numbers and hyphens that 
is delimited (separated) by periods. For example, our query is for 
test.domain.com. Now, "test.domain.com" consists of three words - "test", 
"domain", and "com" separated by periods. The respective lengths of 
these words are 4 (for "test"), 6 (for "domain"), and 3 (for "com"). 
The formatted string becomes the length of each word followed by the 
word, and terminated by a null (0). Let's examine the DNS request chunk
again.
04 74 65 73 74     06 64 6f 6d 61 69 6e     03 63 6f 6d       00
|  t  e  s  t      |  d  o  m  a  i  n      |  c  o  m        |-> terminated by a null 
|-> "test" len     |-> "domain" len         |-> "com" len
The four bytes that follow are two 16-bit fields, each sent high byte first. 
The first pair, 00 01, is the value of the resource query type. We see this 
line in <arpa/nameser.h>:
#define T_A             1               
This is where the value of 01 comes from; the 00 in front of it is simply 
the high byte of the 16-bit field. 
The final pair, 00 01, is the class type, or Internet in this case.
#define C_IN            1               
Internet records are all that are really relevant, as CHAOS and Hesiod 
are rarely used any more. Thus we almost always want class C_IN records. 
It's as simple as that.
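To pull the type and class out of a query like the one above, you first have to step over the name, one length byte at a time. A minimal sketch, assuming an uncompressed name in the first question; skip_name and read_question are made-up names.

```c
#include <stddef.h>

/* Skip an uncompressed name starting at pkt[off]. Returns the offset
   just past its terminating zero byte, or 0 on malformed input.
   skip_name is a hypothetical helper. */
size_t
skip_name (const unsigned char *pkt, size_t len, size_t off)
{
  while (off < len && pkt[off] != 0)
    {
      if (pkt[off] > 63)          /* not a plain label (e.g. a pointer) */
        return 0;
      off += 1 + pkt[off];        /* length byte plus the label itself */
    }
  return (off < len) ? off + 1 : 0;
}

/* Read the 16-bit type and class that follow the first question's name.
   Returns 0 on success, -1 on malformed input. */
int
read_question (const unsigned char *pkt, size_t len,
               unsigned *qtype, unsigned *qclass)
{
  size_t off = skip_name (pkt, len, 12);   /* question follows the header */
  if (off == 0 || off + 4 > len)
    return -1;
  *qtype  = ((unsigned) pkt[off] << 8) | pkt[off + 1];
  *qclass = ((unsigned) pkt[off + 2] << 8) | pkt[off + 3];
  return 0;
}
```

On the test.domain.com dump this gives a type of 1 (T_A) and a class of 1 (C_IN).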

Let's do one final example. This is the hex dump of an HINFO query 
for "my.host.org" (HINFO, being a hardware information request).
< 00000000 09 52 01 00 00 01 00 00 00 00 00 00 02 6d 79 04 # .R...........my.
< 00000010 68 6f 73 74 03 6f 72 67 00 00 0d 00 01          # host.org.....
We can bypass the 12 byte DNS header, since we know what that does. 
Then comes the query data.
02 6d 79         04 68 6f 73 74      03 6f 72 67       00
|  m  y          |  h  o  s  t       |  o  r  g        |-> null terminator
|-> "my" len     |-> "host" len      |-> "org" len
Here we see 0d in place of 01. T_A is 1, but HINFO is declared as 
#define T_HINFO         13              
Thus the hexadecimal representation of "13" is 0d. The final 01 is
an Internet class.

The following function is one I wrote for use in DNS based applications. 
It will parse a DNS label sequence, such as the one seen above, and print 
its elements delimited by a specific character. You might think that this 
delimiter would always be a dot ('.') but for HINFO records and some 
others, it's a space. Remember to pass a pointer to the beginning of the 
encoded data (the first length byte).
----------------------------------------
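#include <ctype.h>
#include <stdio.h>

/* Print a length-prefixed name, joining its elements with delim.
   Compressed names are not followed: printing stops at a pointer. */
void
printaddress (const char *pointer, char delim)
{
  /* Work on unsigned bytes so lengths above 127 do not turn negative */
  const unsigned char *p = (const unsigned char *) pointer;
  int i, z;
  /* A zero length ends the name; two set top bits mark a pointer */
  while ((*p != '\0') && ((*p & 0xc0) != 0xc0))
    {
      z = *p;
      for (i = 0; (i < z); i++)
        {
          p++;
          if (isprint (*p))
            printf ("%c", *p);
        }
      if (*(p + 1) != '\0')
        printf ("%c", delim);
      p++;
    }
}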
----------------------------------------
Reversing this function, for the purposes of creating a function to 
convert a dotted hostname to a formatted DNS name, would be trivial.
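For completeness, here is one way that reverse direction might be written: a sketch that turns a dotted name into length-prefixed labels. encode_name is a made-up name, and the 63 byte limit per label comes from RFC 1035.

```c
#include <stddef.h>
#include <string.h>

/* Encode a dotted host name as length-prefixed labels ending in a zero
   byte. Returns the encoded length, or 0 on error (empty or oversized
   label, or out[] too small). encode_name is a hypothetical helper. */
size_t
encode_name (const char *name, unsigned char *out, size_t outsize)
{
  size_t pos = 0;

  while (*name != '\0')
    {
      const char *dot = strchr (name, '.');
      size_t n = dot ? (size_t) (dot - name) : strlen (name);
      /* leave room for this label and the final zero byte */
      if (n == 0 || n > 63 || pos + 1 + n >= outsize)
        return 0;
      out[pos++] = (unsigned char) n;     /* label length */
      memcpy (out + pos, name, n);        /* label text */
      pos += n;
      name += n;
      if (*name == '.')
        name++;                           /* step over the separator */
    }
  if (pos >= outsize)
    return 0;
  out[pos++] = 0;                         /* zero byte terminates the name */
  return pos;
}
```

Encoding "test.domain.com" this way produces the same 17 bytes seen in the dump, from 04 74 65 73 74 through the closing 00.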

For the purposes of a reverse lookup, a PTR record is returned. Basically,
the target IP is passed to the name server in reverse order, and an
ordinary query of type PTR is performed (this is not the IQUERY opcode
mentioned earlier). An IP of the form "a.b.c.d" is passed to the
name server as "d.c.b.a.in-addr.arpa". Perhaps this form seems familiar
to you now, as you have probably seen the "in-addr.arpa" notation used before.
This function is one to convert between the formatted IP contained in a
PTR record and a normal IP address.
----------------------------------------
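#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Turn "d.c.b.a.in-addr.arpa" into a newly allocated "a.b.c.d".
   Note that strtok() modifies ptrstring. The caller frees the result;
   NULL is returned if fewer than four fields are found. */
char *
ptrtoip (char *ptrstring)
{
  char *ip[4], *parse, *ret;
  int n = 1;
  if ((ip[0] = strtok (ptrstring, ".")) == NULL)
    return NULL;
  while ((n < 4) && ((parse = strtok (NULL, ".")) != NULL))
    {
      ip[n] = parse;
      n++;
    }
  if (n < 4)
    return NULL;
  if ((ret = calloc (1, 16)) == NULL)
    return NULL;
  /* 16 bytes hold "255.255.255.255" plus the terminating NUL */
  snprintf (ret, 16, "%s.%s.%s.%s", ip[3], ip[2], ip[1], ip[0]);
  return ret;
}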
----------------------------------------
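Going the other way, from a dotted-quad string to the name you would put in a PTR query, could be sketched like this. iptoptr is a made-up name, and the parsing is deliberately simple.

```c
#include <stdio.h>
#include <string.h>

/* Write the reverse-lookup name for a dotted-quad IPv4 string into out.
   Returns 0 on success, -1 on bad input or a too-small buffer.
   iptoptr is a hypothetical helper. */
int
iptoptr (const char *ip, char *out, size_t outsize)
{
  unsigned a, b, c, d;
  char extra;
  int n;

  if (strlen (ip) > 15)         /* longer than "255.255.255.255" */
    return -1;
  /* exactly four numbers; a fifth conversion means trailing junk */
  if (sscanf (ip, "%3u.%3u.%3u.%3u%c", &a, &b, &c, &d, &extra) != 4)
    return -1;
  if (a > 255 || b > 255 || c > 255 || d > 255)
    return -1;
  n = snprintf (out, outsize, "%u.%u.%u.%u.in-addr.arpa", d, c, b, a);
  return (n < 0 || (size_t) n >= outsize) ? -1 : 0;
}
```

For "192.0.2.5" the result is "5.2.0.192.in-addr.arpa", which can then be encoded into labels like any other name.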
For reading and writing the fixed-size fields, the GETSHORT(),
GETLONG(), PUTSHORT(), and PUTLONG() macros are also
defined in <arpa/nameser.h>. These can be used to either
inject or extract 16 and 32 bit elements of records in DNS packets,
in network order. You
just have to remember that in DNS-intensive applications,
compression methods will be employed that involve the use of
DNS pointers. A pointer takes the place of a length byte: it is two
bytes long, its top two bits are set (so the first byte is 0xc0 or
above), and the remaining 14 bits give an offset from the start of
the packet. These pointers refer to names located in
previous portions of the packet, so as to avoid being repetitive
and using unnecessary space. Obviously, this is key for huge
name servers that handle thousands of requests every single
minute.
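Following such pointers by hand is not hard. Here is a minimal sketch that expands a possibly compressed name into dotted text, with a cap on the number of jumps so a malicious packet cannot loop forever. expand_name and the limit of 16 jumps are choices made for this example; the resolver library's dn_expand() in <resolv.h> does the same job if you would rather not roll your own.

```c
#include <stddef.h>
#include <string.h>

/* Expand the (possibly compressed) name at pkt[off] into dotted text.
   Returns 0 on success, -1 on malformed input or a too-small buffer.
   expand_name is a hypothetical helper. */
int
expand_name (const unsigned char *pkt, size_t len, size_t off,
             char *out, size_t outsize)
{
  size_t pos = 0;
  int jumps = 0;

  while (off < len && pkt[off] != 0)
    {
      size_t n = pkt[off];
      if ((n & 0xc0) == 0xc0)             /* pointer: 14-bit offset */
        {
          if (off + 1 >= len || ++jumps > 16)
            return -1;                    /* truncated, or a pointer loop */
          off = ((n & 0x3f) << 8) | pkt[off + 1];
          continue;
        }
      /* need room for an optional dot, the label, and the final NUL */
      if (n > 63 || off + 1 + n > len || pos + n + 2 > outsize)
        return -1;
      if (pos > 0)
        out[pos++] = '.';
      memcpy (out + pos, pkt + off + 1, n);
      pos += n;
      off += 1 + n;
    }
  if (off >= len || pos >= outsize)
    return -1;
  out[pos] = '\0';
  return 0;
}
```

Given a packet where "com" is stored at offset 12 and a later name reads 04 "test" c0 0c, expanding the later name yields "test.com".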
Well, that's about it. This was a quick little dip into DNS,
nothing major. You should be able to figure out the rest of it
reading the RFC's, or if you're more adventurous, it's really
pretty easy to reverse engineer the protocol through dumps.
Take care and happy coding.
 
There are some good references for this sort of thing:
As always, consult the appropriate RFCs. These include :
RFC #1035 : Domain Names - Implementation and Specification (obsoletes RFC #883)
RFC #1995 : Incremental Zone Transfer in DNS (IXFR)
RFC #2535 : Domain Name System Security Extensions
Another good resource for network programming (even though it's 
certainly geared towards Windows/Winsock programmers) is the DNS portion 
of the "Rolling Your Own Intranet" article found at 
http://users.neca.com/vmis/dns.htm. This site, as I said, focuses on 
Winsock, but is still pretty good for picking apart basic DNS packets. 
And of course, my favorite Internet basics book, 
"Internetworking with TCP/IP", contains a pretty good DNS section.

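#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>
#include <signal.h>
#include <unistd.h>
#include <netdb.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
void printaddress (const char *pointer, char delim); // Print a stored host name, joined with delimiter
void gotalarm (int sig);
int sock;
#define usage "\"<zone name> <ns>\" where\n<zone name> denotes a zone name to query,\n<ns> is the name server to use in the query.\n"
#define TIMEOUT 10
#define A 1
#define NS 2
#define CNAME 5
#define HINFO 13
#define MX 15
#define TXT 16

/* Print a length-prefixed name, joining its elements with delim.
   Compressed names are not followed: printing stops at a pointer. */
void
printaddress (const char *pointer, char delim)
{
  /* Work on unsigned bytes so lengths above 127 do not turn negative */
  const unsigned char *p = (const unsigned char *) pointer;
  int i, z;
  while ((*p != '\0') && ((*p & 0xc0) != 0xc0))
    {
      z = *p;
      for (i = 0; (i < z); i++)
        {
          p++;
          if (isprint (*p))
            printf ("%c", *p);
        }
      if (*(p + 1) != '\0')
        printf ("%c", delim);
      p++;
    }
}

/* SIGALRM handler: give up when connect() or recv() takes too long */
void
gotalarm (int sig)
{
  (void) sig;
  alarm (0);
  fprintf (stderr, "Connection timed out! Exiting.\n");
  close (sock);
  exit (-1);
}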
int
main (int argc, char *argv[])
{
  int offset = 0, recsize = 1;
  struct sockaddr_in sa;
  struct hostent *he;
  char buf[4096], last[4096], *host, *point, *point2, *parse, *temp;

  char dnscode[] = { 0, 25, 8, 8, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0 };
  char dnscode2[] = { 0, 0, 252, 0, 01 };
  char searchcode[] = { 0, 0, 0, 0, 1, 0, 0, 0, 0 };
  if (argc != 3)
    {
      fprintf (stderr, "Improper usage: try %s %s", argv[0], usage);
      exit (-1);
    }
  bzero (&sa, sizeof (sa));
  sa.sin_family = AF_INET;
  sa.sin_port = htons (53);
  if ((sa.sin_addr.s_addr = inet_addr (argv[2])) == INADDR_NONE)
    {
      /* Not a dotted-quad address, so try it as a host name */
      if ((he = gethostbyname (argv[2])) == NULL)
        {
          printf ("Error: Could not resolve name server!\n");
          exit (-1);
        }
      memcpy (&sa.sin_addr.s_addr, he->h_addr, he->h_length);
    }
  sock = socket (AF_INET, SOCK_STREAM, 0);
  signal (SIGALRM, gotalarm);
  alarm (TIMEOUT);
  if (connect (sock, (struct sockaddr *) &sa, sizeof (sa)) == -1)
    {
      printf ("Error: Could not connect to name server! Exiting.\n");
      exit (-1);
    }
  alarm (0);
  memset (&buf, 0, sizeof (buf));
  memcpy (buf, dnscode, sizeof (dnscode));
  point = (buf + sizeof (dnscode));

  /* One extra byte for the leading "." and one for the terminating NUL */
  temp = (char *) malloc (strlen (argv[1]) + 2);
  if (temp == NULL)
    exit (-1);
  sprintf (temp, "%s%s", ".", argv[1]);
  parse = strtok (temp, ".");
  *point = strlen (parse);
  point++;
  strcpy (point, parse);
  point += strlen (parse);
  while ((parse = strtok (NULL, ".")) != NULL)
    {
      *point = strlen (parse);
      point++;
      strcpy (point, parse);
      point += strlen (parse);
    }
  free (temp);
  memcpy (point, dnscode2, sizeof (dnscode2));
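  /* Low byte of the TCP length prefix: 12-byte header, strlen + 2 bytes
     of encoded name, and 4 bytes of type and class */
  if (strlen (argv[1]) == 1)
    buf[1] = 17;
  else
    buf[1] = strlen (argv[1]) + 18;
  /* NOTE: a getpid()-based ID is predictable (see the warning about
     IDs earlier in this tutorial) */
  buf[2] = getpid () >> 8;
  buf[3] = getpid () & 0xff;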
  point += sizeof (dnscode2);
  send (sock, buf, (point - buf), 0);
  for (;;)
    {
      /* Start each pass with the unparsed tail carried over from the last */
      memset (buf, 0, sizeof (buf));
      memcpy (buf, last, offset);

      temp = (buf + offset);
      alarm (TIMEOUT);
      recsize = recv (sock, temp, (sizeof (buf) - offset), 0);
      alarm (0);
      if (recsize <= 0)
        {
          printf ("\n----------End of mapping for %s----------\n\n", argv[1]);
          exit (0);
        }

      /* Scan backwards for the marker, starting far enough from the end
         of buf that memcmp never reads past it */
      host = buf;
      parse = (buf + sizeof (buf) - sizeof (searchcode));
      for (;
           ((parse > host)
            && (memcmp (parse, searchcode, sizeof (searchcode)) != 0));
           parse--)
        {
        }
      offset = offset + recsize - (parse - buf);
      memset (last, 0, sizeof (last));
      memcpy (last, parse, offset);
     
      while (host < parse)
        {
          /* Look for the fixed part of a record: the zero byte ending the
             owner name, TYPE (0, t) and CLASS (0, 1). host sits on the low
             byte of TYPE. This only matches uncompressed owner names. */
          if ((host - buf >= 3)
              && ((*host == A) || (*host == NS) || (*host == CNAME)
                  || (*host == HINFO) || (*host == MX) || (*host == TXT))
              && (*(host + 1) == '\0') && (*(host + 2) == '\1')
              && (*(host - 1) == '\0') && (*(host - 2) == '\0'))
            {
              /* Walk back to the zero byte in front of the owner name */
              point = (host - 3);
              while ((point > buf) && (*point != '\0'))
                point--;
              point2 = (point + 1);
              /* Skip the rest of TYPE (1), CLASS (2), TTL (4) and
                 RDLENGTH (2) to land on the record data */
              point = host;
              point += 9;
              switch (*host)
                {
                case A:
                  printf ("A\t");
                  printaddress (point2, '.');
                  printf (" -> ");
                  printf ("%d.%d.%d.%d", (unsigned char) *point,
                          (unsigned char) *(point + 1),
                          (unsigned char) *(point + 2),
                          (unsigned char) *(point + 3));
                  break;
                case NS:
                  printf ("NS\t");
                  printaddress (point2, '.');
                  printf (" -> ");
                  printaddress (point, '.');
                  break;
                case CNAME:
                  printf ("CNAME\t");
                  printaddress (point2, '.');
                  printf (" -> ");
                  printaddress (point, '.');
                  break;
                case HINFO:
                  printf ("HINFO\t");
                  printaddress (point2, '.');
                  printf (" -> ");
                  printaddress (point, ' ');
                  break;
                case MX:
                  printf ("MX\t");
                  printaddress (point2, '.');
                  printf (" -> ");
                  /* The record data starts with a 16-bit preference value */
                  printf ("%d ", (((unsigned char) *point) << 8)
                          | (unsigned char) *(point + 1));
                  point += 2;
                  printaddress (point, '.');
                  break;
                case TXT:
                  printf ("TXT\t");
                  printaddress (point2, '.');
                  printf (" -> ");
                  printaddress (point, ' ');
                  break;
                }
              printf ("\n");
            }
          host++;
        }
    }
  close (sock);
}