DITL (Day In The Life of the Internet) is a periodic data collection activity currently coordinated by DNS-OARC.
This web page documents DITL data capture and anonymization as done at B-Root, using software developed by ANT.
We document this procedure to support other groups that may wish to do something similar.
We collect data as fixed-length pcaps using LANDER. However, one can accomplish the same thing by running `tcpdump -i $INTERFACE -C 2048 -w ditl port 53`.
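As a minimal sketch of the tcpdump alternative, the helper below only prints the capture command so it can be reviewed before it is run with capture privileges. The function name `capture_cmd` and the example interface `eth0` are assumptions for illustration, not part of the B-Root setup.

```shell
#!/bin/sh
# Sketch: assemble the tcpdump capture command for a given interface.
FILE_SIZE=2048   # tcpdump -C: start a new file after this many million bytes
PREFIX=ditl      # tcpdump -w: files are named ditl, ditl1, ditl2, ...

capture_cmd() {
    # $1 = capture interface (hypothetical example: eth0)
    # Options come first; the filter expression (port 53) comes last.
    printf 'tcpdump -i %s -C %s -w %s port 53\n' "$1" "$FILE_SIZE" "$PREFIX"
}

capture_cmd "${INTERFACE:-eth0}"
```

Because tcpdump appends a counter to the `-w` name once the first file fills up, the resulting files all share the `ditl` prefix.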
We anonymize pcaps with dag_scrubber. Our goals are to: (a) scramble the low-order bits of IP addresses (except for the DNS service addresses, which are passed through unchanged), (b) scramble MAC addresses, and (c) remove everything but DNS data (ports 53 and 853).
“Scrambling” IP addresses means re-arranging the low-order bits with CryptoPAN (using CryptoPANT, our implementation).
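To make "low-order bits" concrete, here is a small illustration that splits an IPv4 address into the part that would stay readable and the part that would be scrambled. It assumes a pass length of 24 bits means the high 24 bits are kept unchanged; that reading, the helper name `split_ipv4`, and the documentation address 192.0.2.77 are assumptions for this sketch. It does not perform any scrambling itself.

```shell
#!/bin/sh
# Sketch: show which part of an IPv4 address survives when the top 24 bits pass through.
split_ipv4() (
    # Runs in a subshell so the IFS and globbing changes stay local.
    set -f          # no pathname expansion while splitting
    IFS=.           # split the dotted quad on dots
    set -- $1
    printf 'prefix=%s.%s.%s.0/24 scrambled_octet=%s\n' "$1" "$2" "$3" "$4"
)

split_ipv4 192.0.2.77
```

Under that assumption, an analyst can still group traffic by /24 network, while the host within the network is hidden.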
The specific procedure:
# generate the cryptographic key and store it in KEY_FILE
KEY_FILE=/tmp/keyfile
./dag_scrubber -m -s $KEY_FILE
# cryptographic key, generated above
KEY_FILE=/tmp/keyfile
# service addresses are in DONTSCRAMBLE
DONTSCRAMBLE='192.228.79.201/32 199.9.14.201/32 2001:500:84::b/128 2001:500:200::b/128'
# tcpdump expression of what traffic to keep
FILTER_PAYL='( port 53 || port 853 )'
FILTER_PASS=${FILTER_PASS:-"(tcp || udp) && $FILTER_PAYL"}
# OUTPUT_DIR=
# iterate over all files
for f in ditl*
do
dag_scrubber -P -s $KEY_FILE -m --pass4=24 --pass6=64 \
--dont-scramble="$DONTSCRAMBLE" \
-F "$FILTER_PASS" -n "$FILTER_PAYL" <"$f" | \
gzip >"$OUTPUT_DIR/$f.pcap.gz"
done
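After the loop finishes, it can be worth confirming that every compressed output is intact before the raw captures are removed. The following is a minimal sketch of such a check using `gzip -t`; the function name `check_outputs` is hypothetical and this step is not part of the documented B-Root procedure.

```shell
#!/bin/sh
# Sketch: verify that each *.pcap.gz file in a directory is a valid gzip stream.
check_outputs() {
    # $1 = directory holding the anonymized outputs
    status=0
    for out in "$1"/*.pcap.gz; do
        # If the glob matched nothing, report it and stop.
        [ -e "$out" ] || { echo "no outputs found in $1" >&2; return 1; }
        if ! gzip -t "$out" 2>/dev/null; then
            echo "corrupt: $out" >&2
            status=1
        fi
    done
    return $status
}

# Example: check_outputs "$OUTPUT_DIR" && echo "all outputs readable"
```

This only tests the compression layer; it does not inspect the pcap contents.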
Alternatively, use our ditl_anonymization.sh script.
With data split into different files, each can be processed in parallel. Invoke ditl_anonymization.sh with the -p option and it will produce commands that can run in parallel. Pipe these through GNU parallel on a big machine.
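The idea of emitting one command per file can be sketched as follows. This is not the output of ditl_anonymization.sh, whose exact format may differ; the helper `emit_commands` is hypothetical and simply prints the loop body from the procedure above once per input file. It assumes file names without spaces or shell metacharacters.

```shell
#!/bin/sh
# Sketch: print one independent anonymization command per input file.
emit_commands() {
    # $1 = output directory; remaining arguments = input pcap files
    outdir=$1
    shift
    for f in "$@"; do
        # Variables such as $KEY_FILE stay unexpanded here; they are
        # resolved later by the shell that runs each command.
        printf 'dag_scrubber -P -s "$KEY_FILE" -m --pass4=24 --pass6=64 --dont-scramble="$DONTSCRAMBLE" -F "$FILTER_PASS" -n "$FILTER_PAYL" <%s | gzip >%s/%s.pcap.gz\n' "$f" "$outdir" "$f"
    done
}

# Example (needs GNU parallel; export the variables so the jobs inherit them):
#   export KEY_FILE DONTSCRAMBLE FILTER_PASS FILTER_PAYL
#   emit_commands "$OUTPUT_DIR" ditl* | parallel -j 8
```

When GNU parallel is given no command of its own, it runs each input line as a separate job, so the number of simultaneous scrubbers is bounded by `-j`.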