Description of IP Accumulation Datasets

This web page documents our datasets on IPv4 accumulation: counts of the number of active addresses per /24 block.

Datasets are distributed as a number of xz-compressed files named part-NNNNN.xz, where NNNNN are decimal digits.
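As a minimal sketch, assuming the part files sit together in one local directory and contain plain ASCII/UTF-8 text, they can be located and decompressed with Python's standard `lzma` module. The helper names `list_part_files` and `iter_lines` are hypothetical, chosen for illustration.

```python
import lzma
import os
import re

# Matches the documented naming pattern part-NNNNN.xz (NNNNN = decimal digits).
PART_RE = re.compile(r"^part-(\d+)\.xz$")


def list_part_files(directory):
    """Return paths of part-NNNNN.xz files in `directory`, sorted by part number."""
    parts = []
    for name in os.listdir(directory):
        m = PART_RE.match(name)
        if m:
            parts.append((int(m.group(1)), os.path.join(directory, name)))
    return [path for _, path in sorted(parts)]


def iter_lines(path):
    """Yield text lines (without line endings) from one xz-compressed part file.

    Assumption: the decompressed content is UTF-8 (ASCII) text.
    """
    with lzma.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            yield line.rstrip("\r\n")
```

Since blocks are spread across files, analyzing all blocks means reading every part; sorting by part number only makes the processing order reproducible.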

Each file contains tab-separated values and begins with the following header line:

#fsdb -F t block timestamp duration n_active_ip probed_ip
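The header line can be split into its options (here `-F t`, selecting tab as the field separator) and its column names. This is a minimal sketch that assumes every option takes exactly one argument, as `-F t` does; the full Fsdb header syntax may allow more, and `parse_fsdb_header` is a hypothetical helper.

```python
def parse_fsdb_header(line):
    """Return (options, column_names) from an '#fsdb' header line.

    Assumption: every token starting with '-' is an option followed by
    exactly one argument; all other tokens are column names.
    """
    tokens = line.split()
    if not tokens or tokens[0] != "#fsdb":
        raise ValueError("not an fsdb header: %r" % line)
    options, columns = {}, []
    i = 1
    while i < len(tokens):
        tok = tokens[i]
        if tok.startswith("-") and i + 1 < len(tokens):
            options[tok] = tokens[i + 1]  # e.g. "-F" -> "t"
            i += 2
        else:
            columns.append(tok)
            i += 1
    return options, columns
```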

This header defines the schema:

  • block: The address space over which we report accumulation. By default it is an 8-hex-digit representation of the /24 IP block, so the last two hex digits (the final byte) will always be 00. (In some datasets with different groupings, this field may be called “group” and represent a routable prefix or an Autonomous System number.)

  • timestamp: A Unix timestamp (seconds since the epoch) indicating when this period begins.

  • duration: The length of the period, in seconds, over which this count is believed to apply.

  • n_active_ip: The number of IP addresses that were responsive the last time they were scanned.

  • probed_ip: The number of IP addresses that are being scanned. (We often scan only part of a /24 or other group.)
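A data row can be turned into a typed record, and a /24 block value decoded into a conventional network prefix, with Python's standard library. This is a minimal sketch for the default /24 "block" datasets only (not the prefix or AS "group" variants); `AccumRow`, `parse_row`, and `block_to_network` are hypothetical names.

```python
import ipaddress
from typing import NamedTuple


class AccumRow(NamedTuple):
    block: str        # 8 hex digits, e.g. "0104de00"
    timestamp: int    # Unix seconds when the period begins
    duration: int     # seconds over which the count applies
    n_active_ip: int  # addresses responsive when last scanned
    probed_ip: int    # addresses being scanned in this block


def parse_row(line):
    """Parse one tab-separated data line into an AccumRow."""
    block, ts, dur, active, probed = line.rstrip("\r\n").split("\t")
    return AccumRow(block, int(ts), int(dur), int(active), int(probed))


def block_to_network(block):
    """Convert an 8-hex-digit /24 block such as '0104de00' to an IPv4Network."""
    return ipaddress.IPv4Network((int(block, 16), 24))
```

For example, `block_to_network("0104de00")` yields the network 1.4.222.0/24.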

Typically probed_ip has the same value in every row for a block, corresponding to the number of ever-active addresses in that block. For the first few timebins, probed_ip may be smaller until every address has been scanned at least once. After that, each subsequent entry is incrementally updated from the previous one.
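One way to set aside the warm-up rows, for example before computing the fraction of probed addresses that are active, is sketched below. The rule used here (skip leading rows until probed_ip first reaches the block's maximum) is our own heuristic assumption, not part of the dataset definition, and `steady_state_fractions` is a hypothetical name.

```python
def steady_state_fractions(rows):
    """Given (timestamp, n_active_ip, probed_ip) tuples for one block, sorted
    by timestamp, return (timestamp, n_active_ip / probed_ip) for rows after
    the warm-up period.

    Heuristic assumption: warm-up rows are the leading rows whose probed_ip
    is below the block's maximum probed_ip.
    """
    if not rows:
        return []
    full = max(p for _, _, p in rows)
    if full == 0:
        return []  # nothing probed; avoid division by zero
    start = next(i for i, (_, _, p) in enumerate(rows) if p == full)
    return [(ts, a / p) for ts, a, p in rows[start:]]
```

On the sample rows for block 0104de00, the first two rows (probed_ip 253 and 255) are treated as warm-up and the remaining rows give 152/256 and 153/256.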

Data is sorted by block and then timestamp.
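Because rows are sorted by block and then timestamp, each block's rows are contiguous and can be collected in one streaming pass without holding a whole file in memory. A minimal sketch using `itertools.groupby`; `group_by_block` is a hypothetical helper, and it simply skips any line starting with `#` (the header and any comment lines).

```python
import itertools


def group_by_block(lines):
    """Yield (block, [fields, ...]) for each block in a sorted stream of lines.

    Relies on the documented sort order so that all rows for a block are
    adjacent. Lines starting with '#' and blank lines are skipped.
    """
    rows = (
        line.rstrip("\r\n").split("\t")
        for line in lines
        if line.strip() and not line.startswith("#")
    )
    for block, group in itertools.groupby(rows, key=lambda fields: fields[0]):
        yield block, list(group)
```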

When data is stored in multiple files, blocks are distributed randomly across all files.

Timestamps may be uniformly spaced with identical durations, or they may be irregularly spaced with variable durations. (Typically they are sequential and completely cover time.)
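Since spacing and durations can vary, a consumer may want to confirm that a block's periods are contiguous before treating them as a complete time series. A minimal sketch, assuming each period covers the interval from timestamp to timestamp + duration (our interpretation); `find_coverage_breaks` is a hypothetical helper.

```python
def find_coverage_breaks(periods):
    """Given (timestamp, duration) pairs for one block, sorted by timestamp,
    return (end_of_previous, start_of_next) pairs wherever consecutive
    periods leave a gap (end < start) or overlap (end > start)."""
    breaks = []
    for (ts, dur), (next_ts, _) in zip(periods, periods[1:]):
        end = ts + dur
        if end != next_ts:
            breaks.append((end, next_ts))
    return breaks
```

On the sample rows for block 0104de00, including the 3960-second period, each period ends exactly where the next begins, so the function returns an empty list.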

Sample data showing one block:

#fsdb -F t block timestamp duration n_active_ip probed_ip
0104de00        1577900940      660     149     253
0104de00        1577901600      660     150     255
0104de00        1577902260      3960    152     256
0104de00        1577906220      660     153     256
...
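To read the sample more easily, each row's period can be rendered as a UTC interval. A minimal sketch using Python's standard `datetime` module; `format_period` is a hypothetical helper, and the ISO 8601 start/end style is our choice.

```python
from datetime import datetime, timezone


def format_period(timestamp, duration):
    """Return 'start/end' in ISO 8601 UTC form for a period of `duration` seconds."""
    def iso(ts):
        return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return "%s/%s" % (iso(timestamp), iso(timestamp + duration))
```

For the first sample row (timestamp 1577900940, duration 660) it returns 2020-01-01T17:49:00Z/2020-01-01T18:00:00Z.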