[This document was originally written so as to more clearly define the
specification for a fuzzy idea I had. It also seems to pass as a
reasonable description of how to perform DNS tunneling and the
technical limitations and hurdles involved.]

How does the DNS tunnel work?

There are three hosts involved in DNS tunneling: the client, the server, and
the DNS server. We will call these hosts A, B, and D respectively.

The client is the box on the inside of a walled garden or other restricted
setup that does not allow the required level of Internet access. It is one
endpoint of the tunnel.

The walled garden normally provides a DNS server for clients in the walled
garden to look up names of services within the walled garden. This host will
often also perform DNS queries on the public Internet.

The server is a host on the outside. It acts as an authoritative DNS server
and participates in the public Internet's DNS by having appropriate NS
records pointing at it.

When A wants to send an IP packet to B, it will encode the IP packet into a
DNS name. This name will be within the domain that B is authoritative
for. It will then send a TXT query for that name to D. As D is a recursive
resolver, it will (after following the DNS delegation chain) ask B. B then
decodes the DNS name to extract the packet.

The encoded DNS hostname has to fall within the usual conventions for DNS
queries. Since it is not an A, MX or NS query, the restrictions on hostname
characters (per RFC 952/RFC 1123 and the HOSTS.TXT format) do not apply, but
the limits on label length (63 octets), total name length (255 octets) and
UDP DNS message size (512 bytes) do apply.

A scheme that fits the requirements is one where the packet is base64
encoded, a period is inserted every 31 characters, and the DNS domain is
appended. The resulting MTU is usually around 150-160 bytes.
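As a rough illustration, here is a minimal sketch of that encoding in Python;
the domain name, the function names and the use of standard base64 padding
are illustrative assumptions rather than a fixed protocol:

    import base64

    DOMAIN = "tunnel.example.com"   # assumed domain that B is authoritative for
    LABEL_LEN = 31                  # a period every 31 characters, as above

    def packet_to_name(packet: bytes) -> str:
        """Client side: encode a raw IP packet as a DNS name under DOMAIN."""
        b64 = base64.b64encode(packet).decode("ascii")
        labels = [b64[i:i + LABEL_LEN] for i in range(0, len(b64), LABEL_LEN)]
        return ".".join(labels + [DOMAIN])

    def name_to_packet(name: str) -> bytes:
        """Server side: recover the IP packet from a queried name."""
        suffix = "." + DOMAIN
        if not name.endswith(suffix):
            raise ValueError("query outside the tunnel domain")
        return base64.b64decode(name[:-len(suffix)].replace(".", ""))

The 150-160 byte figure follows from the limits above: the full name is
capped at 255 octets, the tunnel domain takes up part of that, and base64
carries only 3 bytes of payload per 4 characters.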

Host D is going to expect a response to the DNS query. Thus, B has to
respond to it within a reasonable timeout. Assuming D is running BIND, this
means that D has to receive a reply to a query within five seconds of D
sending it. This would imply that B has about 1-2 seconds to decide what to
return - this provides a return channel.

When B wants to send an IP packet to A, it will encode the IP packet into a
TXT response and return that to D in response to a previous DNS query. D
will then return it to A.

The encoded TXT response has to fall within the usual conventions for DNS
responses. The important one is the overall size of the DNS response: since
the response also carries the original query, the space available for payload
is reduced by the size of the query it is answering.

A scheme that fits the requirements is one where the packet is simply base64
encoded. The MTU is slightly larger than for the forward channel, but for
sanity should probably be kept the same.
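A similarly rough sketch of the return-channel encoding, again in Python; the
255-byte chunking reflects the limit on a single character-string within a
TXT record, and the function names are again illustrative:

    import base64

    def packet_to_txt_strings(packet: bytes) -> list[bytes]:
        """Server side: encode an outbound IP packet as TXT character-strings.

        Each character-string in TXT RDATA is limited to 255 bytes, so the
        base64 text is chunked; the client concatenates before decoding.
        """
        b64 = base64.b64encode(packet)
        return [b64[i:i + 255] for i in range(0, len(b64), 255)]

    def txt_strings_to_packet(strings: list[bytes]) -> bytes:
        """Client side: reassemble a TXT answer back into an IP packet."""
        return base64.b64decode(b"".join(strings))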

This boils down to the following: A can send packets to B as often as it
likes; B can only send a packet to A in response to a packet from A; and B
has to send something back to A for every packet it receives, even if it has
nothing to say.

As it happens, TCP handshaking will mostly ensure this works. A will send a
query (e.g. an HTTP request) and B will reply (with the web page). TCP flow
control and acknowledgements will ensure there are packets flowing both ways
even when traffic is only moving in one direction over a TCP connection.

But there will be occasions where B wants to send to A but there's no query
waiting to be answered. So A has to periodically send null queries to check.

The algorithm is thus as follows:

Server:

The server initiates processing when it receives a packet from the client.
The packet is decoded and handed to the kernel.

The kernel is then queried for how many packets are queued waiting to go
out. If there is a packet, it is returned in response to the query; otherwise
the server waits a short while (a suitable delay is 2s) to see if one turns
up, and if none arrives by then an empty response is sent. Either way, the
response also includes the number of packets left in the server's queue, so
that the client knows how many further queries it needs to make to drain it.
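A sketch of that server logic in Python, reusing DOMAIN and name_to_packet()
from the earlier sketch; a queue.Queue stands in for the kernel's outbound
packet queue, and deliver_to_kernel() is a hypothetical helper (in practice a
TUN device or similar):

    import queue

    HOLD_TIME = 2.0            # how long to wait for an outbound packet (the 2s above)

    to_client = queue.Queue()  # filled elsewhere with packets the kernel wants sent to A

    def handle_query(name: str) -> tuple[bytes, int]:
        """Handle one tunnel query; return (payload, backlog) for the TXT response."""
        if name != DOMAIN:                            # a bare query for DOMAIN is a null query
            deliver_to_kernel(name_to_packet(name))   # hypothetical: e.g. write to a TUN device

        try:
            payload = to_client.get(timeout=HOLD_TIME)  # wait briefly for something to send back
        except queue.Empty:
            payload = b""                               # nothing turned up: empty response
        return payload, to_client.qsize()               # backlog tells the client what is left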

Client:

The client initiates processing when it receives a packet from the kernel, a
response from the server, or just periodically on a heartbeat.

Packets from the kernel are encoded and sent to the server immediately.

Packets from the server are decoded and sent to the kernel immediately. If
the server indicates that it has more packets to send, null queries are made
immediately to collect them.

When otherwise quiet, periodic null queries are made to see if there is any
data queued up on the server. The delay depends on how long the client is
happy to delay inbound connections for. This probably should not be more
than about 60s.
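Finally, a sketch of the client side in Python, collapsing the event-driven
description above into one synchronous loop for brevity; read_from_kernel(),
send_query() and deliver_to_kernel() are hypothetical helpers (a TUN device
read, a TXT query issued via D returning the decoded answer and reported
backlog, and a TUN device write respectively):

    POLL_INTERVAL = 60.0       # heartbeat when idle; the upper bound suggested above

    def client_loop():
        """Encode kernel packets into queries and feed answers back to the kernel."""
        while True:
            packet = read_from_kernel(timeout=POLL_INTERVAL)  # None on timeout -> heartbeat
            name = packet_to_name(packet or b"")              # empty packet = null query

            payload, backlog = send_query(name)               # TXT query via D
            if payload:
                deliver_to_kernel(payload)

            # The server reported how many packets it still has queued; drain
            # them immediately with null queries rather than waiting for the
            # next heartbeat.
            while backlog > 0:
                payload, backlog = send_query(packet_to_name(b""))
                if payload:
                    deliver_to_kernel(payload)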