DITL Anonymization at B-Root


DITL (Day In The Life of the Internet) is a periodic data collection activity currently coordinated by DNS-OARC. This page documents the anonymization software we use at B-Root.

  • (2023-03-27); current release

This web page documents DITL data capture and anonymization as done at B-Root, using software developed by ANT.

We document this procedure to support other groups that may wish to do something similar.

Data Collection

We collect data as fixed-length pcaps using LANDER. However, one can accomplish the same thing by running `tcpdump -i $INTERFACE -C 2048 -w ditl port 53`.

Data Anonymization

We anonymize pcaps with dag_scrubber. Our goals are: (a) scramble the low-order bits of IP addresses (except for the DNS service addresses, which are passed through unchanged), (b) scramble MAC addresses, and (c) remove everything but DNS data (ports 53 and 853).

“Scrambling” IP addresses means re-arranging the low-order bits with CryptoPAN (using CryptoPANT, our implementation).
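To make "low-order bits" concrete, here is a minimal sketch of how an IPv4 address splits into a part that is kept and a part that is scrambled. It assumes that the --pass4=24 option used in the procedure below passes the high 24 bits through unchanged, which is our reading of the option and not something this page states. The function name ipv4_split is hypothetical, and the code only illustrates the bit split; it does not perform CryptoPAN scrambling and is not part of dag_scrubber.

```shell
#!/bin/sh
# Illustration only: split an IPv4 address into the high bits that would be
# passed through and the low bits that would be scrambled.
# ASSUMPTION: "pass" means "keep this many high-order bits unchanged".
ipv4_split() {
    ip=$1
    pass=${2:-24}
    # Break the dotted quad into four positional parameters.
    old_ifs=$IFS; IFS=.; set -- $ip; IFS=$old_ifs
    n=$(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
    # Mask with the top $pass bits set.
    mask=$(( (0xFFFFFFFF << (32 - pass)) & 0xFFFFFFFF ))
    kept=$(( n & mask ))
    low=$(( n & ~mask & 0xFFFFFFFF ))
    printf 'kept prefix: %d.%d.%d.%d/%d  scrambled bits: %d (value %d)\n' \
        $(( (kept >> 24) & 255 )) $(( (kept >> 16) & 255 )) \
        $(( (kept >> 8) & 255 )) $(( kept & 255 )) \
        "$pass" $(( 32 - pass )) "$low"
}
```

For example, `ipv4_split 192.0.2.57 24` reports 192.0.2.0/24 as the kept prefix and 8 scrambled bits, so under this assumption clients in the same /24 remain grouped together after anonymization.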

The specific procedure:

  1. Generate a key
KEY_FILE=/tmp/keyfile
./dag_scrubber -m -s $KEY_FILE
  2. Run
# cryptographic key, generated above
KEY_FILE=/tmp/keyfile
# service addresses are in DONTSCRAMBLE
DONTSCRAMBLE='192.228.79.201/32 199.9.14.201/32 2001:500:84::b/128 2001:500:200::b/128'
# tcpdump expression of what traffic to keep
FILTER_PAYL='( port 53 || port 853 )'
FILTER_PASS=${FILTER_PASS:-"(tcp || udp) && $FILTER_PAYL"}
# set OUTPUT_DIR to the directory that will receive the anonymized files
# OUTPUT_DIR=
# iterate over all files
for f in ditl*
do
  dag_scrubber -P -s "$KEY_FILE" -m --pass4=24 --pass6=64 \
               --dont-scramble="$DONTSCRAMBLE" \
               -F "$FILTER_PASS" -n "$FILTER_PAYL" <"$f" | \
    gzip >"$OUTPUT_DIR/$f.pcap.gz"
done

Alternatively, use our ditl_anonymization.sh script.

Parallel Processing

With data split into different files, each can be processed in parallel. Invoke ditl_anonymization.sh with the -p option and it will produce commands that can run in parallel. Pipe these through GNU parallel on a big machine.
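The pipeline described above might look like the following minimal sketch. The wrapper name run_commands_in_parallel and its job-count argument are hypothetical, and the usage line assumes ditl_anonymization.sh sits in the current directory and needs no arguments other than -p, which this page does not confirm.

```shell
#!/bin/sh
# Sketch: run a list of independent commands (one per line on stdin)
# through GNU parallel. The function name and argument are hypothetical.
run_commands_in_parallel() {
    njobs=${1:-4}   # number of concurrent jobs; tune to the machine
    # With no command argument, GNU parallel treats each input line
    # as a complete command to run.
    parallel -j "$njobs"
}

# Usage (assumption: the script takes no other required arguments):
#   ./ditl_anonymization.sh -p | run_commands_in_parallel 8
```

Each pcap is scrubbed independently, so the job count is limited mainly by available cores and disk throughput.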