The Design of the Domain Name System, Part II - Exact and Approximate Name Matching


By John Levine, Author, Consultant & Speaker
August 23, 2011

In the previous installment, we looked at the overall design of the DNS. Today we’ll look at the ways it does and does not allow clients to look up data by name.

The most important limitation of the DNS, compared to other databases, is that it only does exact match lookups. That is, with a few minor exceptions, the name in the query has to match the name of the desired records exactly. One exception is folding of upper and lower case characters, which has little effect; the other is DNS wildcards.

Wildcards have always been part of the DNS, but the details of their definition have been confusing. The definition was clarified by RFC 4592 in 2006.

Wildcards provide a very constrained form of pattern matching, telling the server to synthesize records for all nodes below a specified node that don’t have explicit records. That is, if there is a DNS record named *.foo.example, it will match requests for something.foo.example, so long as there aren’t any records with that explicit name. A single star as the leftmost component of a domain name is the only form of wildcard; stars anywhere else in a name are just normal characters.
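The matching rule can be sketched in a few lines of Python. This is a toy model, not how any real server is implemented: the zone is assumed to be a plain dictionary from owner name to a single piece of data, record types are ignored, and the function names are made up for illustration. It does follow the RFC 4592 idea that a wildcard is only consulted at the closest existing ancestor of the query name, and that a name with records below it exists even if it has no records of its own.

```python
def existing_names(zone):
    """Every owner name plus all of its ancestors (empty non-terminals exist too)."""
    names = set()
    for owner in zone:
        labels = owner.lower().split(".")
        for i in range(len(labels)):
            names.add(".".join(labels[i:]))
    return names


def lookup(zone, qname):
    """Return the data for qname, synthesized from a wildcard if needed, else None."""
    qname = qname.lower()  # case folding, the other exception to exact matching
    records = {owner.lower(): data for owner, data in zone.items()}
    if qname in records:
        return records[qname]  # an explicit record always wins
    exists = existing_names(zone)
    if qname in exists:
        return None  # the name exists but has no data; wildcards do not apply
    labels = qname.split(".")
    for i in range(1, len(labels)):
        ancestor = ".".join(labels[i:])
        if ancestor in exists:
            # closest existing ancestor: only a wildcard directly below it can match
            return records.get("*." + ancestor)
    return None


zone = {
    "foo.example": "apex",
    "*.foo.example": "wild",
    "www.foo.example": "explicit",
    "x.y.foo.example": "deep",
}
print(lookup(zone, "something.foo.example"))  # synthesized from the wildcard
print(lookup(zone, "www.foo.example"))        # explicit record
print(lookup(zone, "z.y.foo.example"))        # y.foo.example exists, so no match
```

Note the last case: because x.y.foo.example exists, y.foo.example exists too, and it blocks *.foo.example from matching anything below it.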

In practice, this turns out to be of limited use, with typical applications being web servers that catch any variation of their name, e.g. http://anything.example.com, and mail systems that give each user a separate domain, e.g., [email protected]. Wildcards do not work with prefixed names, such as _attribute.*.example.com, nor are they useful to handle ranges of queries except in some very stylized cases.

Some applications have proposed sequences of multiple queries to simulate range queries. For example, DNS blacklists (DNSBLs) map IP addresses into DNS names using a modified version of the mapping used for reverse DNS. If a DNSBL is called dnsbl.example, the entry for the IP address 12.34.56.78 would be 78.56.34.12.dnsbl.example.
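As a small illustration of that mapping, here is a sketch in Python. The function name dnsbl_name is hypothetical, and dnsbl.example is just the placeholder zone used above; the sketch handles IPv4 only.

```python
import ipaddress


def dnsbl_name(ip, zone="dnsbl.example"):
    """Reverse the octets of an IPv4 address and append the DNSBL zone name."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")  # also validates the address
    return ".".join(reversed(octets)) + "." + zone


print(dnsbl_name("12.34.56.78"))  # 78.56.34.12.dnsbl.example
```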

When a DNSBL wants to list a range of IP addresses, it conceptually needs to include a record for each name corresponding to an IP address in the range. For DNS servers that use traditional master files, since each component in the name represents eight bits of the IP address, this involves breaking an IPv4 range into a minimal covering set of blocks on eight-bit boundaries, adding wildcards for each block, and an individual entry for each individual address not in a larger block.
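One way to do that decomposition is a greedy walk through the range, taking the largest aligned block that still fits at each step. The sketch below assumes an inclusive first and last IPv4 address and uses a hypothetical function name; it produces owner names only, not complete master file records, and it never emits anything larger than a /8.

```python
import ipaddress


def range_to_owner_names(first, last, zone="dnsbl.example"):
    """Cover the inclusive range first..last with /8, /16, /24 wildcards and single names."""
    cur = int(ipaddress.IPv4Address(first))
    end = int(ipaddress.IPv4Address(last))
    names = []
    while cur <= end:
        # Pick the largest block on an eight-bit boundary that starts here and fits.
        for host_bits in (24, 16, 8, 0):
            size = 1 << host_bits
            if cur % size == 0 and cur + size - 1 <= end:
                break
        octets = str(ipaddress.IPv4Address(cur)).split(".")
        kept = octets[: 4 - host_bits // 8]  # the octets that are fixed for this block
        labels = (["*"] if host_bits else []) + list(reversed(kept))
        names.append(".".join(labels + [zone]))
        cur += size
    return names


for name in range_to_owner_names("1.2.3.254", "1.2.5.1"):
    print(name)
```

For the range 1.2.3.254 through 1.2.5.1, this yields two individual names, one wildcard for all of 1.2.4.x, and two more individual names.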

Some people have suggested approaches to try to optimize range listings by querying for prefixes of a desired address, e.g., if the address is 1.2.3.4 and the name is 4.3.2.1.dnsbl.example, query for 2.1.dnsbl.example to see if any of the 1.2.xx.xx range of addresses are listed. According to DNS rules, the query should return NXDOMAIN if there are no entries in the range, or NODATA if there are some. While this technique might work, it is quite fragile, due to DNS servers that don’t correctly distinguish between NXDOMAIN and NODATA responses, currently including the most popular DNSBL server rbldnsd. Also, at this point there is no evidence that the probes really would save queries or cache entries compared to just querying for each address as needed. In principle, a DNS cache could synthesize its own NXDOMAIN responses for names below existing NXDOMAIN (anything in *.2.1.dnsbl.example here), but again, it’s fragile, and as far as I know, no widely used DNS cache does that other than as an experiment.
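The NXDOMAIN versus NODATA distinction that the probing idea depends on can be shown with a toy classifier. This is only a sketch of what a correct authoritative server should say: it assumes the zone is a flat set of owner names, ignores record types and wildcard synthesis, and uses a made-up function name.

```python
def classify(owner_names, qname):
    """Say what a correct server should return for qname in this toy zone."""
    qname = qname.lower()
    owners = {name.lower() for name in owner_names}
    if qname in owners:
        return "ANSWER"    # the name has records of its own
    if any(name.endswith("." + qname) for name in owners):
        return "NODATA"    # nothing at this name, but something exists below it
    return "NXDOMAIN"      # nothing at or below this name


listed = {"4.3.2.1.dnsbl.example"}
print(classify(listed, "2.1.dnsbl.example"))  # NODATA: something in 1.2.xx.xx is listed
print(classify(listed, "9.9.dnsbl.example"))  # NXDOMAIN: nothing in 9.9.xx.xx is listed
```

A server that answers NXDOMAIN in the first case makes the probe useless, which is why the technique is fragile in practice.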

As a general rule, a successful DNS application makes one query, or at most a small bounded number of queries, for each application call.

Note that the issue of DNS range queries is separate from that of application ranges. Most notably, the NAPTR RRTYPE, defined in RFC 3403 and used to find servers for things like telephone numbers, includes a string which is interpreted by applications as a regular expression to be matched against a source string to find a domain for a subsequent lookup. While one can debate the wisdom of the rather complicated application design of which NAPTR is a part, it does not involve any pattern matching in the DNS. The NAPTR lookup algorithm makes a small set of specific DNS queries which the DNS handles without difficulty. NAPTR does pose potential provisioning problems, since regular expressions include a lot of special characters and escape sequences, something that few other RRTYPEs include and whose handling by provisioning software may not be well debugged.
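To make the point concrete that the pattern matching happens in the application rather than in the DNS, here is a rough sketch of how a client might apply a NAPTR regexp field after it has fetched the record. Several things here are assumptions: the function name is hypothetical, the field is split naively on its delimiter (so an escaped delimiter inside the expression would break it), and Python's re module stands in for POSIX extended regular expressions, which it only approximates. The sketch returns just the expanded replacement, which is the same as a sed-style substitution when the expression matches the whole input, as typical anchored expressions do.

```python
import re


def apply_naptr_regexp(field, source):
    """Apply a NAPTR-style 'delim ere delim repl delim flags' field to source."""
    delim = field[0]
    parts = field[1:].split(delim)  # naive: assumes no escaped delimiters
    if len(parts) != 3:
        raise ValueError("expected delim-ere-delim-repl-delim-flags")
    ere, repl, flags = parts
    match = re.search(ere, source, re.IGNORECASE if "i" in flags else 0)
    if match is None:
        return None  # this record does not apply to the source string
    return match.expand(repl)  # substitutes \1 .. \9 back-references


# The DNS only returned the record; the rewriting happens locally.
print(apply_naptr_regexp(r"!^\+(.*)$!sip:\1@example.net!", "+441632960083"))
```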

In the next installment, we’ll look at the way delegation of parts of the DNS works, and how that affects the way applications use it.
