LANDER:internet address survey tcp1-20031120

From Predict

README version: 4031, last modified: 2014-06-6.

This file describes the trace dataset "internet_address_survey_tcp1-20031120" provided by the LANDER project.

Warning: This dataset has been marked as possibly damaged; see a full description in User Annotations.

Contents

  • 1 LANDER Metadata
  • 2 Dataset Contents
  • 3 Data Format
  • 4 Collection Method
      • 4.1 Probing Location(s)
      • 4.2 Coverage
      • 4.3 Beginning/Ending Date and Time Zone
  • 5 Citation
  • 6 Results Using This Dataset
  • 7 User Annotations
      • 7.1 Non conformity of this dataset

LANDER Metadata

┌───────────────────────────┬────────────────────────────────────────────────────────────────────────────────────┐
│ dataSetName               │ internet_address_survey_tcp1-20031120                                              │
├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
│ status                    │ usc-web-and-predict                                                                │
├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
│ shortDesc                 │ TCP probe census of alloc IPv4 addresses                                           │
├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
│ longDesc                  │ To collect this data, an Internet-wide IP address sweep was conducted. Every IP    │
│                           │ address in the ranges allocated by IANA was pinged at most 5 times by sending TCP  │
│                           │ SYN (other flags were experimented with) packets to it, before giving up. A high   │
│                           │ unused destination port was used. If the response (typically TCP RST) came, its IP │
│                           │ address was recorded in this data-set. In all, approximately 2.5 billion distinct  │
│                           │ IP addresses were probed during this experiment.                                   │
├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
│ datasetClass              │ Quasi-Restricted                                                                   │
├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
│ commercialAllowed         │ true                                                                               │
├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
│ requestReviewRequired     │ true                                                                               │
├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
│ productReviewRequired     │ false                                                                              │
├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
│ ongoingMeasurement        │ false                                                                              │
├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
│ submissionMethod          │ Upload                                                                             │
├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
│ collectionStartDate       │ 2003-11-20                                                                         │
├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
│ collectionStartTime       │ 00:00:00                                                                           │
├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
│ collectionEndDate         │ 2004-03-19                                                                         │
├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
│ collectionEndTime         │ 00:00:00                                                                           │
├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
│ availabilityStartDate     │ 2012-01-27                                                                         │
├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
│ availabilityStartTime     │ 17:06:15                                                                           │
├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
│ availabilityEndDate       │ 2030-01-01                                                                         │
├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
│ availabilityEndTime       │ 00:00:00                                                                           │
├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
│ anonymization             │ none                                                                               │
├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
│ archivingAllowed          │ false                                                                              │
├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
│ keywords                  │ category:address-space-status-data, subcategory:internet-census-and-survey-data,   │
│                           │ ip-address, sweep, address-collection, ping, icmp, one-time                        │
├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
│ format                    │ binary                                                                             │
├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
│ access                    │ https                                                                              │
├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
│ hostName                  │ USC-LANDER                                                                         │
├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
│ providerName              │ USC                                                                                │
├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
│ groupingId                │                                                                                    │
├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
│ groupingSummaryFlag       │ false                                                                              │
├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
│ retrievalInstructions     │ download                                                                           │
├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
│ byteSize                  │ 495976448                                                                          │
├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
│ expirationDays            │ 14                                                                                 │
├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
│ uncompressedSize          │ 2006864174                                                                         │
├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
│ impactDoi                 │ 10.23721/109/1353570                                                               │
├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
│ useAgreement              │ dua-ni-160816                                                                      │
├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
│ irbRequired               │ false                                                                              │
├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
│ privateAccessInstructions │ See http://www.isi.edu/ant/traces/index.html#getting_datasets for information on   │
│                           │ obtaining this dataset.                                                            │
│                           │ See                                                                                │
└───────────────────────────┴────────────────────────────────────────────────────────────────────────────────────┘

Dataset Contents

internet_adddress_survey_tcp1-20031120.README.txt      copy of this README
iana--ipv4-address-space.txt                           IANA allocations used for probing
data/
    tcp1.doe.bz2                                       binary data files
    tcp1.kid.bz2
    tcp1.urn.bz2
    .sha1sum                                           SHA-1 checksum
raw/
    tcp1.doe.txt.gz                                    original text version of the trace
    tcp1.kid.txt.gz
    tcp1.urn.txt.gz
    .sha1sum                                           SHA-1 checksum
info/ (where available)
    subnet_stats.slash-16.fsdb                         stats computed over dataset in FSDB text format
    subnet_stats.slash-16.png                          plot of the above
    summary.txt                                        summary of IP address usage in human-readable form

Subdirectory "data" contains three bzip2-compressed binary files containing probe records. Each file is named after the probing machine that collected the data; e.g., tcp1.doe.bz2 was collected by machine "doe.isi.edu".
These machines have statically assigned IP addresses:

    doe.isi.edu    128.9.160.251
    kid.isi.edu    128.9.168.80
    urn.isi.edu    128.9.160.131

The address space was divided among the probing machines in a mutually exclusive way, such that each /24 subnet was probed by a single machine.

The file ".sha1sum" contains SHA-1 checksums of the individual compressed files. The integrity of the distribution can thus be checked by independently calculating the SHA-1 sums of the files and comparing them with those listed in the file. If you have the sha1sum utility installed on your system, you can do that by executing:

    sha1sum --check .sha1sum

This has to be done before the files are uncompressed.

Subdirectory "raw" contains gzip-compressed text files with responsive IP addresses.

Subdirectory "info" contains statistics computed over /16s in the dataset.

Data Format

The binary format of the trace files is described in detail here:
http://www.isi.edu/ant/traces/topology/address_surveys/binformat_description.html

Collection Method

Data collection involves pinging all allocated addresses. A full description of this method is in:

> John Heidemann, Yuri Pradkin, Ramesh Govindan, Christos Papadopoulos, Genevieve Bartlett, and Joseph Bannister.
> Census and Survey of the Visible Internet. In Proceedings of the ACM Internet Measurement Conference, p.169-182.
> Vouliagmeni, Greece, ACM. October, 2008 http://www.isi.edu/~johnh/PAPERS/Heidemann08c.html.

Probing Location(s)

The probing locations for censuses and surveys are indicated in their names. A "w" means west, which is from isi.edu in Marina del Rey, California; "c" means center, from colostate.edu in Ft. Collins, Colorado; "e" means east, which is from east.isi.edu in Arlington, Virginia; "j" means Japan, which is from WIDE in Fujisawa-shi, Kanagawa, Japan; "g" stands for AUEB, located in Athens, Greece; "n" stands for Utrecht, The Netherlands. See Dataset Contents for more information on the systems that performed this survey.

Coverage

The earliest censuses, those started before June 2007 (i.e. up to and including it17), did not attempt to probe addresses whose last octet is 0 or 255, e.g. x.y.z.0 (/24 subnet address) or x.y.z.255 (/24 broadcast). Censuses started between September 2007 and June 2008 (i.e. it18-it21) probed all values of the last octet except .255. Censuses started after September 2008 (i.e. from it22 onward) probed all values of the last octet, including both .0 and .255.

Beginning/Ending Date and Time Zone

Dates and times specified in the metadata are in UTC. Earlier censuses (before it37) used local time in their metadata description. Their metadata will be updated to switch to UTC in the near future.

Citation

If you use this trace to conduct additional research, please cite it as:

Internet Addresses Census dataset, PREDICT ID: USC-LANDER/internet_address_survey_tcp1-20031120. Traces taken 2003-11-20 to 2004-03-19. Provided by the USC/LANDER project (http://www.isi.edu/ant/lander).

Results Using This Dataset

Traces similar to this one, containing collections of "live" IP addresses, have been used in the following previously published work:

  • John Heidemann, Yuri Pradkin, Ramesh Govindan, Christos Papadopoulos, Genevieve Bartlett, and Joseph Bannister. Census and Survey of the Visible Internet. In Proceedings of the ACM Internet Measurement Conference, p.169-182. Vouliagmeni, Greece, ACM. October, 2008 http://www.isi.edu/~johnh/PAPERS/Heidemann08c.html.
  • Yuri Pryadkin, Robert Lindell, Joseph Bannister, and Ramesh Govindan. An Empirical Evaluation of IP Address Space Occupancy. Technical Report ISI-TR-2004-598, USC/Information Sciences Institute, November 2004 ftp://ftp.isi.edu/isi-pubs/tr-598.pdf.
  • Lin Quan, John Heidemann, Yuri Pradkin. Detecting Internet Outages with Precise Active Probing (extended). Technical Report ISI-TR-2012-678b, USC/Information Sciences Institute, May, 2012 ftp://ftp.isi.edu/isi-pubs/tr-678b.pdf.

User Annotations

Non conformity of this dataset

This dataset is "non-conformant" in several ways:

 1. TCP protocol used instead of ICMP for probes.
 2. Each address is probed at most 5 times before giving up.
 3. Only self-responsive IP addresses were recorded (i.e. probedIP==responseIP).

--Yuri 19:42, 2 June 2010 (UTC)

Categories: • Datasets • LANDER • LANDER:Datasets • LANDER:Datasets:AddressSpace:Census • LANDER:Datasets:AddressSpace