Description of Internet Outage Datasets

This page describes the format of our Internet outage datasets.

We have three primary formats:

outagedownup data, a more “cooked” form of outages data
outages data, an integrated output that merges data from all observers
outageraw data, the direct output from each observer

These formats are listed from most processed and most reliable (outagedownup) to least processed (outageraw).

We recommend using outagedownup data for most purposes, since it includes post-processing that cleans up some known flaws in the raw data. Outagedownup format data is in the “all” datasets, so for example internet_outage_adaptive_a58all-20241001 has outagedownup data for 2024q4.

You may want to use outages format if you want to see when different observers see different things. Outages data is also in the “all” datasets.

If you want to see the individual probes each observer sends, then you need outageraw format. Outageraw data is provided separately for each site, so one would request internet_outage_adaptive_a58w-20241001 and internet_outage_adaptive_a58c-20241001 to get two of the outageraw datasets that went into a58all.

Sites

As of Oct. 2017, our sites (Vantage Points) are:

w: ISI-West in Los Angeles;
c: Colorado data from Ft. Collins;
j: Japan data from Keio University (SJF campus) near Tokyo;
e: ISI-East data from near Washington, DC;
g: Greek data from Athens University of Economics and Business;
n: Netherlands data from SurfNet near Amsterdam.
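
The dataset names used in the examples on this page appear to follow a regular pattern built from a series number, a site letter (or “all”), and a start date. The following is a minimal sketch of a helper that builds such names. The function dataset_name and the table SITES are hypothetical names introduced here for illustration, and the naming pattern is an assumption inferred only from the example names above.

```python
# Site letters as listed above. This table is a hypothetical helper,
# not something shipped with the datasets.
SITES = {
    "w": "ISI-West, Los Angeles",
    "c": "Ft. Collins, Colorado",
    "j": "Keio University, near Tokyo",
    "e": "ISI-East, near Washington, DC",
    "g": "Athens, Greece",
    "n": "SurfNet, near Amsterdam",
}


def dataset_name(series, site, start_date):
    """Build a name such as internet_outage_adaptive_a58w-20241001.

    series     -- dataset series number, e.g. 58
    site       -- one site letter from SITES, or "all" for the merged dataset
    start_date -- start date as YYYYMMDD text, e.g. "20241001"

    The pattern is an assumption inferred from the examples on this page.
    """
    if site != "all" and site not in SITES:
        raise ValueError("unknown site: %r" % (site,))
    return "internet_outage_adaptive_a%d%s-%s" % (series, site, start_date)


if __name__ == "__main__":
    print(dataset_name(58, "all", "20241001"))  # merged outages and outagedownup data
    print(dataset_name(58, "w", "20241001"))    # outageraw data from site w
```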

“Outagedownup” Format Integrated Outages

“Outagedownup” format data merges all observing sites for one time period (see “Sites” above for details; time periods are typically quarters). It also includes several post-processing steps:

insufficient VP detection
merging roles
unmeasurability detection
hole filling of periods with insufficient observers

Outagedownup format represents our best estimate for outages at any given target block, using all the information we have.

Output is in tab-separated text (FSDB format), with the following schema:

block: block address of the /24 in hex (with trailing zeros)
start: when the status takes effect, in seconds since the Unix epoch.
duration: how long the status is in effect, in seconds.
uncertainty: our confidence in the precision of the start time. The true start time is sometime between start and start-uncertainty. The true duration is between duration-NextEventUncertainty and duration+ThisEventUncertainty. In non-raw data uncertainty is sometimes lowered when we merge observations from multiple observers.
downup: up (1), down (0), unmeasurable (-1, typically due to insufficient active observers), or gone dark (-2, typically out for more than 10 days)
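
To make the uncertainty arithmetic concrete, the sketch below computes the bounds described in the uncertainty field. The function names are hypothetical, and the numbers are for illustration only: they come from a down event for block 01005000 in the a30all sample file, whose following event also has an uncertainty of 7920 seconds.

```python
from typing import Tuple


def start_bounds(start, uncertainty):
    # type: (int, int) -> Tuple[int, int]
    """Return (earliest, latest) possible true start time, in epoch seconds.

    Per the schema, the true start lies between start - uncertainty and start.
    """
    return start - uncertainty, start


def duration_bounds(duration, this_uncertainty, next_uncertainty):
    # type: (int, int, int) -> Tuple[int, int]
    """Return (shortest, longest) possible true duration, in seconds.

    Per the schema, the true duration lies between
    duration - NextEventUncertainty and duration + ThisEventUncertainty.
    """
    return duration - next_uncertainty, duration + this_uncertainty


if __name__ == "__main__":
    # Down event: start 1510596747, duration 32046, uncertainty 7920.
    # The next event for the same block also has uncertainty 7920.
    print(start_bounds(1510596747, 7920))    # (1510588827, 1510596747)
    print(duration_bounds(32046, 7920, 7920))  # (24126, 39966)
```

In this example the outage reported as 32046 seconds long could have lasted anywhere from about 6.7 to about 11.1 hours.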

Sample data, from dataset internet_outage_adaptive_a30all-20171006, file a30all.outagedownup.fsdb.bz2:

#fsdb -F t block start duration uncertainty downup
01000400        1507326957      7439968 331     1
01000500        1507326628      7439966 660     1
01000600        1507327123      7439309 333     1
01005000        1507326901      3269846 12540   1
01005000        1510596747      32046   7920    0
01005000        1510628793      47225   7920    1
01005000        1510676018      35681   11972   0
...

This data shows that blocks 0x01000400 (1.0.4.0/24), 0x01000500 (1.0.5.0/24), and 0x01000600 (1.0.6.0/24) were up (downup is 1) for the entire observation period (for the first block, starting at 1507326957, 2017-10-06t21:55:57Z, and continuing for 7439968 seconds, just more than 86 days).

Block 0x01005000 (1.0.80.0/24) was up starting at 1507326901 (2017-10-06t21:55:01Z) for 3269846 seconds (37.8 days), then down for 32046 seconds (8.9 hours), then up for 47225 seconds (13.1 hours), etc.
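
The conversions used in the explanation above (hex block to dotted /24, epoch seconds to UTC time) can be sketched in a few lines of Python. This is a minimal reader, not an official tool: the names DownUp, parse_outagedownup, and read_outagedownup are hypothetical, and the sketch assumes 8-digit hex blocks and that the first five columns are the ones in the schema above (later versions may append columns such as a_oper, which it ignores).

```python
import bz2
import ipaddress
from datetime import datetime, timezone
from typing import Iterable, Iterator, NamedTuple

# Meaning of the downup codes, as given in the schema above.
DOWNUP_LABELS = {1: "up", 0: "down", -1: "unmeasurable", -2: "gone dark"}


class DownUp(NamedTuple):
    block: ipaddress.IPv4Network
    start: datetime
    duration: int
    uncertainty: int
    downup: int


def parse_outagedownup(lines: Iterable[str]) -> Iterator[DownUp]:
    """Yield one DownUp record per data line, skipping '#' header lines."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Only the first five fields are used; extra columns are ignored.
        block_hex, start, duration, uncertainty, downup = line.split()[:5]
        yield DownUp(
            block=ipaddress.IPv4Network((int(block_hex, 16), 24)),
            start=datetime.fromtimestamp(int(start), tz=timezone.utc),
            duration=int(duration),
            uncertainty=int(uncertainty),
            downup=int(downup),
        )


def read_outagedownup(path: str) -> Iterator[DownUp]:
    """Read a bzip2-compressed outagedownup file from a local path."""
    with bz2.open(path, "rt") as handle:
        yield from parse_outagedownup(handle)


SAMPLE = [
    "#fsdb -F t block start duration uncertainty downup",
    "01000400\t1507326957\t7439968\t331\t1",
    "01005000\t1510596747\t32046\t7920\t0",
]

if __name__ == "__main__":
    for row in parse_outagedownup(SAMPLE):
        print(row.block, row.start.isoformat(), row.duration,
              DOWNUP_LABELS[row.downup])
```

For a downloaded file, read_outagedownup("a30all.outagedownup.fsdb.bz2") would iterate over the same kind of records, assuming the file is in the current directory.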

“Outages” Format Multiple-VP Data

“Outages” format data merges all observing sites for one time period (see “Sites” above for details; time periods are typically quarters).

Outages datasets show our best estimate from all observers, but they also reveal when observers disagree.

Output is in tab-separated text (FSDB format), with the following schema:

block: block address of the /24 in hex (with trailing zeros)
start: when the status takes effect (seconds since the Unix epoch)
duration: how long the status is in effect
uncertainty: our confidence in the precision of the start time. In non-raw data uncertainty is sometimes lowered when we merge observations from multiple observers.
precision_improvement: is either unused (‘-‘) or precision improvement of the onset of a state change resulting from merging data from multiple vantage points
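status: which vantage points saw the outage. Each lowercase letter (‘w’, ‘c’, ‘j’, ‘g’) names a site that saw the outage; the corresponding capital letter (‘W’, ‘C’, ‘J’, ‘G’) means that vantage point saw no outage. The order is fixed to [wW][cC][jJ][gG]. (Datasets with more sites, such as the sample below, also use the letters for those sites, for example ‘E’ and ‘N’.)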

A sample of outages data from dataset internet_outage_adaptive_a30all-20171006, file a30all.outages.fsdb.bz2:

#fsdb -F t block start duration uncertainty status
01000400        1507326957      23      660     W
01000400        1507326980      890349  594     WCJGEN
...
01000400        1512522614      2242390 660     WJGEN
01000400        1514765004      839     242     WEN
01000400        1514765843      1082    331     E
01000500        1507326628      23      660     W
01000500        1507326651      1957    591     WCJGEN
01000500        1507328608      692     637     Wcjgen
...

These two segments show outages for two blocks. The first block, 0x01000400 (1.0.4.0/24), was up (capital letters in status), as detected by site W at time 1507326957 (2017-10-06t21:55:57Z), and as seen by all 6 sites in the next line, 23 seconds later.

The second block, 0x01000500 (1.0.5.0/24) was detected as up by site W at time 1507326628 (2017-10-06t21:50:28Z), followed by all the other sites 23 seconds later. However, at time 1507328608 (2017-10-06t22:23:28Z) all sites except for W failed to detect it as up.
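
The status field can be split mechanically into the sites that saw the block up and the sites that saw it down. The sketch below does this; split_status is a hypothetical helper name, and it relies only on the capital-versus-lowercase convention described in the schema above.

```python
def split_status(status):
    """Split a status string into (sites_up, sites_down).

    Capital letters mean the site saw no outage (block up);
    lowercase letters mean the site saw an outage (block down).
    Both sets are returned as lowercase site letters.
    """
    sites_up = {ch.lower() for ch in status if ch.isupper()}
    sites_down = {ch for ch in status if ch.islower()}
    return sites_up, sites_down


if __name__ == "__main__":
    for status in ("W", "WCJGEN", "Wcjgen"):
        up, down = split_status(status)
        print(status, "up:", sorted(up), "down:", sorted(down))
```

For the last sample line of block 01000500, status Wcjgen yields w as the only site that still saw the block up, matching the description above.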

Outageraw Format: Single-VP Trinocular Output

Outage probing output is provided for each vantage point in “outageraw” format (see “Sites” above for details).

This dataset format includes information about every single ping, plus our evolving estimate of block responsiveness (the “A” value). We convert it into address accumulation datasets and merge multiple sites into outages format (described above).

Each dataset includes the input to the prober, in several formats, and the output.

Output documents Trinocular “rounds”. Each round is a set of pings that conclude in a determination of block status, or, rarely, abort with an indeterminate status after 15 tries.

Output is in tab-separated text (FSDB format), with the following schema:

block: hex format of /24 IP block, with trailing zeros (A7 omits trailing zeros)
round_no: round number in this batch (will reset each time we restart)
round_start_epoch: when the round began, in seconds since 1970
a_short: the short term estimate of availability
a_oper: the operational estimate of A value (long term and reflecting variance)
status: status of this block: A12 and later: 0 for down, 1 for up, 2 for unknown
belief: our belief the block is down
n_pos: number of positive responses in this round
n_neg: number of negative responses in this round
probe_log: A base-64-encoded list of what specific addresses were probed. (Only in a18 and later).
rtt_us: estimated round-trip time in microseconds. (Only in a20 and later.)

A sample of raw data from dataset internet_outage_adaptive_a30w-20171006, file data/pinger-w4.e1507326545.a30w.2.r0.001.fsdb.bz2:

#fsdb -F t block round_no round_start_epoch a_short a_oper status belief n_pos n_neg probe_log rtt_us
58bae200        0       1507326545      0.8766  0.4383  1       0.01    1       0       CiQ=    219051
bd378a00        0       1507326545      0.3644  0.1822  1       0.01    1       0       CqQ=    204900
342e1a00        0       1507326545      0.2879  0.1439  1       0.01    1       0       CgQ=    158242
83c1c200        0       1507326545      0.1323  0.06614 1       0.01    1       0       CqU=    64556
d02afa00        0       1507326545      0.7865  0.3932  1       0.01    1       0       CuQ=    60168
...

The data shows the schema (the #fsdb line), followed by data for block 0x58bae200, which is 88.186.226.0/24, taken at 1507326545 seconds past the Unix epoch (2017-10-06t21:49:05Z). The block was detected as up (status is 1), and the positive ping replied in 219.051ms. Other lines show other blocks, all probed at this time.
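
A reader for outageraw rows can take its column names from the #fsdb header line, which helps because older datasets lack the probe_log and rtt_us columns. The following is a minimal sketch under stated assumptions: the header has exactly the form "#fsdb -F t" followed by column names, blocks are written with all 8 hex digits, and parse_outageraw and the added keys (cidr, rtt_ms, probe_log_bytes) are hypothetical names. The internal layout of the decoded probe log is not described on this page, so the sketch only exposes the raw bytes.

```python
import base64
import ipaddress


def parse_outageraw(lines):
    """Yield one dict per data row, keyed by the header's column names."""
    columns = None
    for line in lines:
        line = line.strip()
        if line.startswith("#fsdb"):
            # Assumes the header is "#fsdb -F t col1 col2 ...".
            columns = line.split()[3:]
            continue
        if not line or line.startswith("#") or columns is None:
            continue
        row = dict(zip(columns, line.split()))
        # Hex block to dotted /24 notation.
        row["cidr"] = str(ipaddress.IPv4Network((int(row["block"], 16), 24)))
        if "rtt_us" in row:
            row["rtt_ms"] = int(row["rtt_us"]) / 1000.0
        if "probe_log" in row:
            # Base-64 decode only; the byte layout is not interpreted here.
            row["probe_log_bytes"] = base64.b64decode(row["probe_log"])
        yield row


SAMPLE = [
    "#fsdb -F t block round_no round_start_epoch a_short a_oper status "
    "belief n_pos n_neg probe_log rtt_us",
    "58bae200\t0\t1507326545\t0.8766\t0.4383\t1\t0.01\t1\t0\tCiQ=\t219051",
]

if __name__ == "__main__":
    for row in parse_outageraw(SAMPLE):
        print(row["cidr"], "status", row["status"], "rtt_ms", row["rtt_ms"])
```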

Versions

We have had several different versions of our outage data processing pipeline as we have learned more. In general, we have two goals in our datasets: to be as accurate as possible to what really happened, and to provide a long-term result for others to use.

These two goals are in conflict, so to resolve that conflict we sometimes update our datasets with recomputed results while preserving the old results as different files in the same database.

All datasets now include a “vX” tag that indicates the version.

Here is our summary:

Version	input	raw Trinocular (icmptrain, per-site data)	aXXall.vYY.outages.fsdb.bz2 (FBS+LABR, raw to outages, merge, precision improvement)	aXXall.vYY.outagedownup.fsdb.bz2 (disagreement resolution, hole filling, gone-dark)
v1	target blocks: |E(b)| >= 15 and |A(E(b))| equal to 0.1, from Quan13c	a_oper: do not include down events to calculate a_oper; probing order: per-block probe order is randomized from full round (FR) to FR	survey edges: no pre-staging of before & after quarter data; hole filling: raw to outages: a single unknown state between 2 rounds of equal status has its status set to the same status as the other 2; precision improvement: forward precision improvement, from Quan13c section 4.5	gone-dark: 1 week windows need 0.8 up time, otherwise set to -2, from Alwabel15a; multi-site resolution: any-up
v1b	same	same	same	gone-dark: improvements in downup_to_unmeasurable.py code, window increase from 1 to 3 weeks
v2	same	same	raw to outages: unknowns are not fixed; precision improvement: backward precision improvement	gone-dark: same as in v1 but fixed some bugs
v3	same	same	same	internal only
v4	same	same	a_oper: a_oper as a new column in outages format (from Quan14c)	gone-dark: outages longer than 1 week set to -2; a_oper: adds a_oper as a new column in outagedownup format (from Quan14c)
v5	same	same	survey edges: added from 1 week before/after survey for proper gone-dark filtering, motivated by gone-dark in Alwabel15a; FBS: full block scanning over flaky blocks, with outages in sparse blocks sometimes mapped to up, from Baltra19b; LABR: lone block recovery algorithm, single-address down events mapped to unknown, from Baltra19b	multi-site resolution: majority voting, from Baltra19a
v5b	same	same	FBS: a_short bug fixed; FBS: full round (FR) completion (after a non-UP round): we are willing to count down probes in the first TR that includes a positive response against the FR accumulation; FBS: windowing - require 2 FRs due to round reordering, for data on or before 2019q4	same
v5c	target blocks: blocks with |E(b)| >= 3	extra probe: send a 16th probe to the old known replier if the block changes state; probing order: no longer change order each round	FBS: windowing - FBS defaults to 1 FR (although when we run on datasets from 2019q4 or earlier, we need to manually override to 2 FRs)	same
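
The table names two multi-site resolution rules: any-up (v1 through v4) and majority voting (v5 and later). The sketch below illustrates the difference on an outages-format status string. It is only an illustration of the two ideas, not the pipeline's actual code: the function names are hypothetical, and treating a tie as down under majority voting is an assumption made here, since this page does not say how ties are handled.

```python
def any_up(status):
    """any-up: the block counts as up if at least one site saw it up.

    Capital letters in the status string mean a site saw the block up.
    """
    return any(ch.isupper() for ch in status)


def majority_up(status):
    """Majority voting: the block counts as up if more sites saw it up
    than saw it down. Ties count as down here, which is an assumption.
    """
    ups = sum(1 for ch in status if ch.isupper())
    downs = sum(1 for ch in status if ch.islower())
    return ups > downs


if __name__ == "__main__":
    for status in ("WCJGEN", "Wcjgen", "WCJgen"):
        print(status, "any-up:", any_up(status),
              "majority:", majority_up(status))
```

With status Wcjgen, where only one of six sites sees the block, any-up reports the block as up while majority voting reports it as down, which is the kind of disagreement the resolution rule has to settle.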