ANT Packet Processing

timefind and indexer

Software to handle indexing and selection of multiple network data types based on a given time range.

Pre-built versions for RPM (Fedora, CentOS, RHEL): see https://copr.fedorainfracloud.org/coprs/johnh/timefind/

The latest code can also be checked out via git:

introduction

A group of folks at Los Alamos National Laboratory and at USC/ISI have developed two tools to handle indexing and selection of multiple network data types: timefind and indexer.

Most of us have processed or will be processing large amounts of timestamped data (.pcap, logs, and so on). For example, if we had .pcap files spanning 2010-2015, we’d probably want to downselect on a time range, e.g., 2015-Jan-01 to 2015-Feb-01.

One way people do downselection now is to build regexes and walk the directory tree. This probably works fine with only one consistently formatted data source (good luck to the next person who has to decode and inevitably rebuild the regex).

indexer will walk through all your data and index the timestamps of the earliest and latest records.

timefind will then use the indexes to retrieve the names of the files whose records overlap the given time range. For example, if I want to downselect 2015-Jan-01 to 2015-Feb-01 on DNS .pcap data:

timefind --begin="2015-01-01" --end="2015-02-01" dns

It’s that simple and consistent.

Please send email to calvin@isi.edu with questions, bugs, feature requests, patches, and any notes on your usage!

instructions

Requires Go v1.5+.

Download and extract the tarball, and run make. Binaries and corresponding README.* files will be built in bin/.

indexer

indexer reads in a configuration file describing a data source and outputs an index in CSV format. For each file, the index lists the filename, the timestamp of the earliest record, and the timestamp of the latest record.

Using timefind in conjunction with these indexes, a user can narrow the set of files down to those that fall within a time range.

timefind

Given a large data store, a user may only need a subset of data for processing. For example, a user may only want to process a month’s worth of data (e.g., January 2015) instead of the entire collection.

Given a time range, timefind consults an index generated by indexer and returns the names of the files whose records overlap that range.

For example, to retrieve all DNS data from January 2015, we might run timefind as follows:

timefind --begin="2015-01-01" --end="2015-02-01" dns

Copyright (C) 2015. Los Alamos National Security, LLC.

This software has been authored by an employee or employees of Los Alamos National Security, LLC, operator of the Los Alamos National Laboratory (LANL) under Contract No. DE-AC52-06NA25396 with the U.S. Department of Energy. The U.S. Government has rights to use, reproduce, and distribute this software. The public may copy, distribute, prepare derivative works and publicly display this software without charge, provided that this Notice and any statement of authorship are reproduced on all copies. Neither the Government nor LANS makes any warranty, express or implied, or assumes any liability or responsibility for the use of this software. If software is modified to produce derivative works, such modified software should be clearly marked, so as not to confuse it with the version available from LANL.

Additionally, this program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.