LANDER:classify internet address blocks-20080912

From Predict

README version: 4043, last modified: 2014-06-6.

This file describes the trace dataset "classify_internet_address_blocks-20080912" provided by the LANDER project.

This is a derived dataset with processed data obtained from three sources:

 1. LANDER:internet_address_survey_reprobing_it16w-20070216
    Traces taken 2007-02-16 to 2007-02-23.
 2. LANDER:internet_address_survey_reprobing_it17w-20070601
    Traces taken 2007-06-01 to 2007-06-13.
 3. Internet Software Consortium. Internet Domain Survey, ISC DS-2007JAN. web page http://www.isc.org/ds, Jan. 2007.

Contents

  • 1 LANDER Metadata
  • 2 Dataset Contents
  • 3 Data Format
  • 4 Metrics Computation
      • 4.1 Ping Survey Fields
      • 4.2 ISC Name Survey Fields
      • 4.3 Relating Ping and ISC categories to Dynamic IP Addresses
  • 5 Citation
  • 6 Results Using This Dataset
  • 7 User Annotations

LANDER Metadata

┌───────────────────────────┬────────────────────────────────────────────────────────────────────────────────────┐
│ dataSetName               │ classify_internet_address_blocks-20080912                                          │
├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
│ status                    │ usc-web-and-predict                                                                │
├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
│ shortDesc                 │ Active probes to classify addr blocks                                              │
├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
│ longDesc                  │ This derived dataset is derived from 2 survey datasets and 1 ISC dataset. It       │
│                           │ contains survey information and ISC information for IP addresses. Survey is done   │
│                           │ by pinging (ICMP ECHO_REQUEST) each IP address every 11 minutes for around 1 week. │
│                           │ We analyzed the ping responses and provide survey information including sum        │
│                           │ uptime, uptime count, mean uptime, median uptime, max_uptime and ping-observable   │
│                           │ category. We joined the ISC dataset with a survey dataset and analyzed them for    │
│                           │ training and validation. We provide ISC information including keywords and         │
│                           │ hostname-inferred usage category.                                                  │
├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
│ datasetClass              │ Quasi-Restricted                                                                   │
├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
│ commercialAllowed         │ true                                                                               │
├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
│ requestReviewRequired     │ true                                                                               │
├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
│ productReviewRequired     │ false                                                                              │
├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
│ ongoingMeasurement        │ false                                                                              │
├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
│ submissionMethod          │ Upload                                                                             │
├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
│ collectionStartDate       │ 2008-09-12                                                                         │
├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
│ collectionStartTime       │ 00:00:00                                                                           │
├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
│ collectionEndDate         │ 2008-09-12                                                                         │
├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
│ collectionEndTime         │ 00:00:00                                                                           │
├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
│ availabilityStartDate     │ 2013-03-04                                                                         │
├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
│ availabilityStartTime     │ 18:13:20                                                                           │
├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
│ availabilityEndDate       │ 2030-01-01                                                                         │
├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
│ availabilityEndTime       │ 00:00:00                                                                           │
├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
│ anonymization             │ none                                                                               │
├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
│ archivingAllowed          │ false                                                                              │
├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
│                           │ category:address-space-status-data,                                                │
│ keywords                  │ subcategory:internet-address-block-classification, active-measurement, topology,   │
│                           │ ip-address, ping, icmp                                                             │
├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
│ format                    │ text                                                                               │
├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
│ access                    │ https                                                                              │
├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
│ hostName                  │ USC-LANDER                                                                         │
├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
│ providerName              │ USC                                                                                │
├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
│ groupingId                │                                                                                    │
├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
│ groupingSummaryFlag       │ false                                                                              │
├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
│ retrievalInstructions     │ download                                                                           │
├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
│ byteSize                  │ 448790528                                                                          │
├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
│ expirationDays            │ 14                                                                                 │
├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
│ uncompressedSize          │ 448214884                                                                          │
├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
│ impactDoi                 │ 10.23721/109/1353590                                                               │
├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
│ useAgreement              │ dua-ni-160816                                                                      │
├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
│ irbRequired               │ false                                                                              │
├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
│ privateAccessInstructions │ See https://ant.isi.edu/datasets/#getting-datasets for information on obtaining    │
│                           │ this dataset.                                                                      │
│                           │ See                                                                                │
└───────────────────────────┴────────────────────────────────────────────────────────────────────────────────────┘

Dataset Contents

classify_internet_address_blocks-20080912.README.txt
      copy of this README
classify_internet_address_blocks_it16w-20080912.jdb
      IP addresses with ping-observable information from survey it16w dataset and hostname-inferred information from ISC 0701 dataset
classify_internet_address_blocks_it17w-20080912.jdb
      IP addresses with ping-observable information from survey it17w dataset and hostname-inferred information from ISC 0701 dataset
.sha1sum
      SHA-1 checksum

The file ".sha1sum" contains SHA1 checksums of individual compressed files. The integrity of the distribution thus can be checked by independently calculating SHA1 sums of files and comparing them with those listed in the file. If you have the sha1sum utility installed on your system, you can do that by executing:

    sha1sum --check .sha1sum

This has to be done before files are uncompressed.
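
Where the sha1sum utility is not installed, the same check can be approximated in Python. The following is a minimal sketch, not part of the distribution: it assumes the ".sha1sum" listing uses the usual sha1sum output format (a hex digest, whitespace, an optional "*", then the file name), and the function names sha1_of_file and verify_sha1sums are hypothetical.

```python
import hashlib
from pathlib import Path


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-1 digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha1sums(listing_path):
    """Check every file named in a sha1sum-style listing.

    Assumption: each non-empty line is "<hex digest> <file name>", with an
    optional "*" in front of the file name. File names are resolved relative
    to the directory that holds the listing.
    Returns a dict mapping file name -> True (match) or False (mismatch).
    """
    listing = Path(listing_path)
    results = {}
    for line in listing.read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*").strip()
        actual = sha1_of_file(listing.parent / name)
        results[name] = (actual == expected.lower())
    return results
```

Calling verify_sha1sums(".sha1sum") in the download directory, before uncompressing anything, returns one True/False entry per listed file.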

Data Format

  • .jdb files are in JDB file format. JDB is a package of commands for manipulating flat-ASCII databases from shell scripts. You can find more information about JDB at .

In a nutshell, a JDB file is a flat-ASCII file with rows and columns. Each row in these two files represents an IP address, while the columns record information about that IP address. There are 11 columns in total:

┌───────────────────┬────────────────────────────────────────────────────────────────────────────────────────────┐
│                   │ IPv4 address we pinged, in hex format. For example, 3b42a5cc is the hex format of          │
│ ip                │ 59.66.165.204 (These IPs are not anonymized. Dataset users are reminded that the USC MOA   │
│                   │ forbids attempting to map these IP addresses back to the identities of human users).       │
├───────────────────┴────────────────────────────────────────────────────────────────────────────────────────────┤
│ /* the following fields are derived from the survey it17w dataset */                                           │
├───────────────────┬────────────────────────────────────────────────────────────────────────────────────────────┤
│ a                 │ availability, fraction of time address is reachable (fraction between 0 and 1)             │
├───────────────────┼────────────────────────────────────────────────────────────────────────────────────────────┤
│ v                 │ volatility, fraction of times node has changed state from up to down (fraction between 0   │
│                   │ and 1)                                                                                     │
├───────────────────┼────────────────────────────────────────────────────────────────────────────────────────────┤
│ sum_u             │ cumulative uptime (in seconds)                                                             │
├───────────────────┼────────────────────────────────────────────────────────────────────────────────────────────┤
│ n_u               │ number of up periods (count)                                                               │
├───────────────────┼────────────────────────────────────────────────────────────────────────────────────────────┤
│ mean_u            │ mean duration of up periods (in seconds)                                                   │
├───────────────────┼────────────────────────────────────────────────────────────────────────────────────────────┤
│ median_u          │ median duration of up periods (in seconds)                                                 │
├───────────────────┼────────────────────────────────────────────────────────────────────────────────────────────┤
│ max_u             │ maximum observed duration of any up period (in seconds)                                    │
├───────────────────┼────────────────────────────────────────────────────────────────────────────────────────────┤
│ ping_category     │ ping-observable category based on (a, v, median_u)                                         │
├───────────────────┴────────────────────────────────────────────────────────────────────────────────────────────┤
│ /* the following fields are derived from the ISC 200701 dataset */                                             │
├───────────────────┬────────────────────────────────────────────────────────────────────────────────────────────┤
│ keywords          │ as described below. Multiple keywords are separated by "_".                                │
├───────────────────┼────────────────────────────────────────────────────────────────────────────────────────────┤
│ hostname_category │ Hostname-inferred usage category. Multiple categories are separated by "_".                │
└───────────────────┴────────────────────────────────────────────────────────────────────────────────────────────┘

If the value in a column is "-", the information is not available for that IP address.

The ISC dataset contains the original hostnames, but we are not permitted to redistribute them. If you have the ISC dataset, you may contact us to obtain this information.
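
To illustrate the column layout described above, here is a minimal Python sketch that decodes the hex ip field and turns one data row into a dictionary. It rests on assumptions not stated in this README: that fields are separated by whitespace and that non-data lines in a .jdb file start with "#". The names COLUMNS, hex_to_dotted, parse_row and read_rows are hypothetical.

```python
import ipaddress

# Column order as listed in the table above.
COLUMNS = ["ip", "a", "v", "sum_u", "n_u", "mean_u", "median_u", "max_u",
           "ping_category", "keywords", "hostname_category"]
NUMERIC = {"a", "v", "sum_u", "n_u", "mean_u", "median_u", "max_u"}
MULTI_VALUED = {"keywords", "hostname_category"}  # values joined by "_"


def hex_to_dotted(hex_ip):
    """Convert a hex IPv4 address such as 3b42a5cc to dotted-quad form."""
    return str(ipaddress.IPv4Address(int(hex_ip, 16)))


def parse_row(line):
    """Parse one data row into a dict; "-" (not available) becomes None.

    Assumption: the 11 fields are separated by whitespace.
    """
    values = line.split()
    if len(values) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(values)}")
    record = {}
    for name, value in zip(COLUMNS, values):
        if value == "-":
            record[name] = None
        elif name == "ip":
            record[name] = hex_to_dotted(value)
        elif name in NUMERIC:
            record[name] = float(value)
        elif name in MULTI_VALUED:
            record[name] = value.split("_")
        else:
            record[name] = value
    return record


def read_rows(lines):
    """Yield parsed rows, skipping blank lines and lines starting with "#".

    Assumption: header and comment lines in the file begin with "#".
    """
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        yield parse_row(line)
```

For example, hex_to_dotted("3b42a5cc") gives "59.66.165.204", matching the example in the ip row of the table.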

Metrics Computation

Ping Survey Fields

We define these metrics to analyze data in the ping survey dataset:

┌──────────┬──────────────────────────────────────────────────────────┐
│ D        │ = probing duration (i.e., around 1 week; precisely,      │
│          │   843931 seconds for it17w,                              │
│          │   556197 seconds for it16w)                              │
├──────────┼──────────────────────────────────────────────────────────┤
│ I        │ = probing interval (i.e., around 11 min)                 │
├──────────┼──────────────────────────────────────────────────────────┤
│ N        │ = number of pings = D/I                                  │
├──────────┼──────────────────────────────────────────────────────────┤
│ r_i      │ = i-th ping response (positive/negative), i=1, ..., N    │
├──────────┼──────────────────────────────────────────────────────────┤
│ u_j      │ = up durations, j=1, ..., N_u                            │
│          │ = duration of the j-th run of consecutive positive r_i   │
├──────────┼──────────────────────────────────────────────────────────┤
│ sum_u    │ = sum(u_j), j=1, ..., N_u, in seconds                    │
├──────────┼──────────────────────────────────────────────────────────┤
│ n_u      │ = N_u                                                    │
├──────────┼──────────────────────────────────────────────────────────┤
│ mean_u   │ = sum_u/n_u = mean(u_j), j=1, ..., N_u, in seconds       │
├──────────┼──────────────────────────────────────────────────────────┤
│ median_u │ = median(u_j), j=1, ..., N_u, in seconds                 │
├──────────┼──────────────────────────────────────────────────────────┤
│ max_u    │ = max(u_j), j=1, ..., N_u, in seconds                    │
└──────────┴──────────────────────────────────────────────────────────┘

We define four "ping-observable categories" to characterize IP addresses in the survey dataset:

 1. always-stable: sum_u >= 0.95*D && n_u == 1
 2. sometimes-stable: (sum_u < 0.95*D || n_u > 1) && median_u >= 6 hours && sum_u >= 0.1*D
 3. intermittent: (sum_u < 0.95*D || n_u > 1) && median_u < 6 hours && sum_u >= 0.1*D
 4. underutilized: sum_u < 0.1*D

ISC Name Survey Fields

We distilled keywords from the ISC dataset and derived 15 "hostname-inferred usage categories":

┌───────────────────┬───────────────────────────────────────────────────┐
│ hostname-category │ keywords                                          │
├───────────────────┼───────────────────────────────────────────────────┤
│ static            │ static, sta                                       │
├───────────────────┼───────────────────────────────────────────────────┤
│ dynamic           │ dynamic, dyn                                      │
├───────────────────┼───────────────────────────────────────────────────┤
│ dhcp              │ dhcp                                              │
├───────────────────┼───────────────────────────────────────────────────┤
│ pool-pond         │ pool, pond                                        │
├───────────────────┼───────────────────────────────────────────────────┤
│ ppp               │ ppp                                               │
├───────────────────┼───────────────────────────────────────────────────┤
│ dial              │ dial                                              │
├───────────────────┼───────────────────────────────────────────────────┤
│ dsl               │ dsl                                               │
├───────────────────┼───────────────────────────────────────────────────┤
│ cable             │ cable                                             │
├───────────────────┼───────────────────────────────────────────────────┤
│ wireless          │ wireless, wifi                                    │
├───────────────────┼───────────────────────────────────────────────────┤
│ ded               │ dedicate, ded                                     │
├───────────────────┼───────────────────────────────────────────────────┤
│ biz               │ business, biz                                     │
├───────────────────┼───────────────────────────────────────────────────┤
│ res               │ resident, res                                     │
├───────────────────┼───────────────────────────────────────────────────┤
│ client            │ client                                            │
├───────────────────┼───────────────────────────────────────────────────┤
│ server            │ server, srv, svr, www, mx, mail, ftp, smtp, proxy │
├───────────────────┼───────────────────────────────────────────────────┤
│ rtr-gw            │ router, rtr, rt, gateway, gw                      │
└───────────────────┴───────────────────────────────────────────────────┘

Relating Ping and ISC categories to Dynamic IP Addresses

We have been asked which addresses correspond to dynamic assignments.
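
As a concrete reading of the four ping-observable categories defined above, which the rest of this section relies on, here is a minimal Python sketch. It is not the code used to produce the dataset: the function name ping_category and the DURATION table are hypothetical, and the thresholds are copied from the category list and the D row of the metrics table.

```python
# Probing durations D in seconds, from the metrics table above.
DURATION = {"it16w": 556197, "it17w": 843931}

SIX_HOURS = 6 * 3600  # median_u threshold, in seconds


def ping_category(sum_u, n_u, median_u, duration):
    """Return the ping-observable category for one address.

    sum_u, median_u and duration are in seconds; n_u is a count.
    The checks are ordered so that each later rule may assume the
    earlier ones did not match.
    """
    if sum_u < 0.1 * duration:
        return "underutilized"
    if sum_u >= 0.95 * duration and n_u == 1:
        return "always-stable"
    # Remaining addresses have sum_u >= 0.1*D and either
    # sum_u < 0.95*D or more than one up period.
    if median_u >= SIX_HOURS:
        return "sometimes-stable"
    return "intermittent"
```

For an it17w address with sum_u = 400000, n_u = 50 and median_u = 3600, the call ping_category(400000, 50, 3600, DURATION["it17w"]) returns "intermittent".
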
From our analysis, dynamic addresses can be inferred from both ping-observable categories and hostname-inferred usage categories.

As to ping-observable categories, we believe intermittent ((sum_u < 0.95*D || n_u > 1) && median_u < 6 hours && sum_u >= 0.1*D) and underutilized (sum_u < 0.1*D) may suggest dynamic assignment. There are also dynamic addresses that are sometimes-stable, but many static addresses are sometimes-stable as well, so we cannot tell the two apart by inspecting the survey dataset alone.

Alternatively, as to hostname-inferred categories, the dynamic category certainly suggests dynamic addresses. Beyond it, dhcp, pool-pond, ppp, dial, dsl, and wireless may also suggest dynamic assignment. But this may not always be true, because these addresses can be statically assigned, too.

Citation

If you use this trace to conduct additional research, please cite it as:

    Internet Addresses Survey dataset, PREDICT ID USC-LANDER/classify_internet_address_blocks-20080912. Traces generated on 2008-09-12. Provided by the USC/LANDER project (http://www.isi.edu/ant/lander).

Results Using This Dataset

This dataset has been used in the following previously published work:

  • Xue Cai and John Heidemann. Active Probing to Classify Internet Address Blocks (Extended Abstract for SIGCOMM'08 Poster). Technical Report ISI-TR-653, USC/Information Sciences Institute, August, 2008. http://www.isi.edu/~johnh/PAPERS/Cai08a.pdf

User Annotations

Currently no annotations.