Dnsanon_rssac is an implementation of RSSAC002 processing for DNS statistics. It implements all of RSSAC002v5. For v4 output it defaults to a “lax” mode that provides a superset of v4, and with --version v3 or v2 it also implements most of those prior versions (all but zone size). Given the “RSSAC Advisory on Measurements of the Root Server System”, at https://www.icann.org/en/system/files/files/rssac-002-measurements-root-20nov14-en.pdf, it provides all values that can be computed from packet captures. Its processing can be parallelized and done incrementally.
Explicit design goals:
Incremental computation. It must be possible to compute statistics over the day and merge once, or to compute statistics at different sites and back-haul only minimal information for a central merge.
Extensibility. It should be easy to add measurements in the future.
Constant memory usage. Roots get attacked; we don’t want attacks to take out (computationally) the measurement system by having steps that require O(n) memory.
Optional parallel processing. It works with Hadoop or GNU parallel, and it can also run sequentially.
Non-goals:
High performance (we know some ways to make it faster; maybe in the future). (However, it seems plenty fast enough to track B-root’s statistics with a small, 40-core Hadoop cluster.)
Pedantic levels of accuracy. The goal is to support root operation, and that does not require 5 decimal places of precision. We believe our approach is correct (we’re just adding up sums), but we do not currently implement careful checks around time boundaries (midnight).
Computation of RSSAC002v3 values that cannot be easily derived from packet captures. We do not compute the load time nor zone size metrics.
Graphs. (Although if you want to add some, please let us know.)
Although not an explicit goal, this implementation is largely independent of the other implementations we know of. We depend on dnsanon, which includes some code from DSC (TCP reassembly).
The basic idea: nearly everything in RSSAC-002 is a specialized version of “word count”, if you write the words carefully. That lets one use Hadoop-style parallelism to process and combine data.
Get pcaps and extract the DNS queries to Fsdb format (Fsdb is tab-separated text with a header, see http://www.isi.edu/~johnh/FSDB.)
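(As a minimal sketch of Fsdb, with hypothetical column names rather than the actual dnsanon schema, a file might look like
#fsdb -F t name count
example-key	12
another-key	3
where the first line names the columns and “-F t” marks the file as tab-separated.)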
Convert each pcap’s queries to “rssacint” format, an internal format that supports easy aggregation. Each line of rssacint format has the form (OPERATOR)(KEY) (COUNT). For example, in “+udp-ipv4-queries 10” the operator is “+”, the key is “udp-ipv4-queries”, and we’ve seen 10 of them. The “+” means that if we see two rows with the same key, we can add them together. (In practice we use terser keys because we move a lot of bytes around, so this key is actually “+3u04”.) Operators allow one to compute sums, minima and maxima, lists that check for completeness, and some others; see rssacint_reduce for details.
Rssacint files can be arbitrarily combined using the rssacint_reduce command. Just merge and sort two or more files; the reduce command will then sum up counts (or, more generally, apply the operator) without losing information.
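As an illustration of the “+” operator, using the long-form key from the example above (real files use the terse keys), two rows that each counted UDP-over-IPv4 queries reduce to a single row:
+udp-ipv4-queries 10
+udp-ipv4-queries 32
become, after sorting and rssacint_reduce,
+udp-ipv4-queries 42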
As the last step, count the number of unique sources and convert to YAML. These steps lose information.
A full pipeline is:
1. Collect pcaps of all traffic. We use LANDER; dnscap is an alternative.
We assume pcaps show up as a series of files with dates and/or sequence numbers. For B, they look like 20151227-050349-00203216.lax.pcap, where the last set of numbers is a sequence number and “lax” is a site name.
2. Extract the DNS queries to “message” format. We use dnsanon, which is packaged separately at https://ant.isi.edu/software/dnsanon/index.html. The command
< 20151227-050349-00203216.pcap dnsanon -i - -o . -p mQ -f 20151227-050349-00203216
will write the file 20151227-050349-00203216.message_question.xz
(this code should actually be
< 20151227-050349-00203216.pcap dnsanon -i - -o - -p Q > 20151227-050349-00203216.message_question.xz
but a bug in dnsanon-1.3 (to be fixed in dnsanon-1.4) causes this pipeline to not work.)
3. Convert messages to rssacint format, using ./message_to_rssacint:
xzcat 20151227-050349-00203216.message_question.xz | \
./message_to_rssacint --file-seqno=203216 >20151227-050349-00203216.rssacint
4. Optionally (but recommended), process that rssacint format locally to reduce data size:
< 20151227-050349-00203216.rssacint LC_COLLATE=C sort -k 1,1 | \
./rssacint_reduce > smaller.20151227-050349-00203216.rssacint
5. Merge all rssacint files into one big one and reduce it (this can be done multiple times):
cat smaller.*.rssacint | LC_COLLATE=C sort -k 1,1 | ./rssacint_reduce > complete.rssacint.fsdb
6. Reduce it again to count unique IPs:
< complete.rssacint.fsdb ./rssacint_reduce --count-ips > complete.rssacfin.fsdb
7. Convert rssacfin to YAML, using ./rssacfin_to_rssacyaml:
< complete.rssacfin.fsdb ./rssacfin_to_rssacyaml
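(As a quick sanity check, these steps can also be chained for a single capture file. The following is only a sketch built from the commands above: it assumes rssacfin_to_rssacyaml writes YAML to standard output, as in step 7, and the output filename is arbitrary. A single file covers only a fraction of a day, so this is useful mainly as a smoke test.)
< 20151227-050349-00203216.pcap dnsanon -i - -o . -p mQ -f 20151227-050349-00203216
xzcat 20151227-050349-00203216.message_question.xz | \
./message_to_rssacint --file-seqno=203216 | \
LC_COLLATE=C sort -k 1,1 | ./rssacint_reduce | \
./rssacint_reduce --count-ips | ./rssacfin_to_rssacyaml > single-file.yaml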
In Hadoop terms, steps 2 and 3 are the map phase, 4 is a combiner, step 5 is a reduce phase, and steps 6 and 7 are a second reduce phase. When we run with Hadoop we often do steps 6 and 7 as a single process.
(And there is nothing magical about Hadoop. The only requirement is that data be sorted before any rssacint_reduce step.)
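For example (a sketch that assumes GNU parallel and the file naming used above), the per-file sort and reduce of step 4 can be run across many files at once:
ls 20151227-*.rssacint | \
parallel 'LC_COLLATE=C sort -k 1,1 {} | ./rssacint_reduce > smaller.{}'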
Each program has a manual page with examples and short sample input and output.
Extended sample output is included in the sample_data subdirectory. Run “cd sample_data; make test” to exercise this sample output as a test suite.
For B-Root, we capture about 1 pcap file every minute or two (step 1), and we process them incrementally over the day (steps 2, 3, and 4). Every night we run step 5 as a map-reduce job with Hadoop, and run the final reduce directly (without Hadoop).
On occasion we have re-run an entire day’s computation (steps 2 through 7). We can process that in a few hours on a moderate-size (about 120-core) Hadoop cluster.
Each pcap file is 2GB uncompressed.
Each message file is about 200MB compressed (xz). A merged rssacint file
for a day of traffic is typically 10MB after xz compression. After
counting unique IPs, this drops to about 2KB.
We have checked our computations for internal consistency and against the Hedgehog implementation of RSSAC-002. We believe our results are internally consistent. We see some differences from Hedgehog’s numbers, but they are close. We believe some differences are due to B-Root’s specific use of Hedgehog, which triggers a limitation of Hedgehog that we have not worked around.
The included program dsc_to_rssacint converts Hedgehog’s modified DSC output to rssacint. Although we do not recommend it for production use, it may be useful to compare implementations.
These programs use the standard Perl build system. To install:
perl Makefile.PL
make
make test
make install
For customization options, see ExtUtils::MakeMaker::FAQ(3) or http://perldoc.perl.org/ExtUtils/MakeMaker/FAQ.html.
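For example, to install into a private directory rather than the system Perl tree (the target path here is arbitrary):
perl Makefile.PL INSTALL_BASE=$HOME/dnsanon_rssac
make
make install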
The current version of dnsanon_rssac is at https://ant.isi.edu/software/dnsanon_rssac/.
This program depends on dnsanon, available from https://ant.isi.edu/software/dnsanon/.
Recent changes:
dsc_to_rssacint, rssacfin_to_rssacyaml, and a fix to --file-seqno with sites (as in --file-seqno=lax:1) in message_to_rssacint.
rssacint_reduce now correctly propagates “:” rows, rather than throwing an error.
rssacint_reduce no longer adds /e to overlapping rangelists.
message_to_rssacint now reports service addresses (with “s”) and rssacfin_to_rssacyaml puts them in “extra”; update for, and default to, RSSAC002v5.
message_to_rssacint now outputs tls and https for DoT and DoH, and rssacfin_to_rssacyaml reports them.
message_to_rssacint now understands dnstapmq comments.
message_to_rssacint now handles a dash in site names; rssacfin_to_rssacyaml fixes bugs and typos in tls accounting.
We are interested in feedback, particularly about correctness, and in hearing from other active users.
Please contact John Heidemann johnh@isi.edu with comments.