public class ICMPTrainRecordReader
extends java.lang.Object
This reader parses and prints the contents of a binary icmptrain output file, which may be compressed. These are the files used to record data from Internet censuses and surveys.
Name | Length | Comment |
---|---|---|
type | 1 byte | ICMP type field |
code | 1 byte | ICMP code field |
typeandcode | 2 bytes | ICMP type and code concatenated |
typecode | 1 byte | ICMP type and code fields bunched together (legacy) |
ttl | 1 byte | ICMP response remaining time to live |
time | 4 bytes | Seconds since the Epoch that the probe was sent |
rtt | 4 bytes | Round-trip time in microseconds (zero if there was no reply) |
flags | 1 byte | Format flags |
probeaddr | 4 bytes | Probed IP address (or zero if it didn't match) |
replyaddr | 4 bytes | IP address of the responder (zero if none) |
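
To make the field widths above concrete, here is a minimal sketch of how the raw integer values might be turned into readable form. It is not taken from the reader's source: the class `IcmpFieldSketch` and its methods are hypothetical, and it assumes that addresses are in network (big-endian) byte order, that `time` and `rtt` are unsigned, and that `typeandcode` keeps the type in the high byte.

```java
import java.time.Instant;

/** Hypothetical helpers for interpreting icmptrain field values (illustration only). */
public final class IcmpFieldSketch {
    private IcmpFieldSketch() {}

    /** Formats a 4-byte address as a dotted quad, assuming big-endian byte order. */
    public static String toDottedQuad(int addr) {
        return ((addr >>> 24) & 0xFF) + "." + ((addr >>> 16) & 0xFF) + "."
             + ((addr >>> 8) & 0xFF) + "." + (addr & 0xFF);
    }

    /** Concatenates type and code into 2 bytes, assuming type is the high byte. */
    public static int typeAndCode(int type, int code) {
        return ((type & 0xFF) << 8) | (code & 0xFF);
    }

    /** Interprets the 4-byte time field as unsigned seconds since the Epoch. */
    public static Instant probeTime(int time) {
        return Instant.ofEpochSecond(Integer.toUnsignedLong(time));
    }

    /** Converts the 4-byte rtt field (unsigned microseconds) to milliseconds. */
    public static double rttMillis(int rttMicros) {
        return Integer.toUnsignedLong(rttMicros) / 1000.0;
    }

    public static void main(String[] args) {
        System.out.println(toDottedQuad(0xC0A80001)); // 192.168.0.1
        System.out.println(typeAndCode(3, 1));        // 769 (0x0301)
        System.out.println(probeTime(0));             // 1970-01-01T00:00:00Z
        System.out.println(rttMillis(1500));          // 1.5
    }
}
```

If the actual files use a different byte order or signedness, only the shifts and the unsigned conversions would need to change.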
Examples:

    -D stream.recordreader.icmptrain.keys=probeaddr
    -D stream.recordreader.icmptrain.vals=type_code_typecode
    -D stream.recordreader.icmptrain.vals=type,code,typeandcode

As mentioned earlier, the input to this reader can be a compressed stream. A few codecs can split the input file into chunks, while most process the whole file at once. If a codec can handle split files, splitting is done seamlessly. This behavior can be overridden by setting the option mapred.input.codec.nosplitting to true, as follows:


    -D mapred.input.codec.nosplitting=true

As noted earlier, ICMP data is binary, so whenever an ICMP file is split, this reader needs to align itself with an ICMP record boundary. It uses a few heuristics to do so. The strength of these heuristics can be controlled with the option stream.recordreader.icmptrain.lookaheadcount, as follows:


    -D stream.recordreader.icmptrain.lookaheadcount=100

In this case the reader will use 100 ICMP records to verify that its alignment choice is right.
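
The lookahead idea can be sketched roughly as follows. This is an illustration under stated assumptions, not the reader's actual heuristic: it assumes fixed-length records and a caller-supplied plausibility check, and the names `AlignmentSketch`, `RecordCheck`, and `findAlignment` are hypothetical.

```java
/** Hypothetical sketch of lookahead-based record alignment (illustration only). */
public final class AlignmentSketch {
    private AlignmentSketch() {}

    /** Decides whether the bytes starting at {@code offset} look like a valid record. */
    @FunctionalInterface
    public interface RecordCheck {
        boolean plausible(byte[] buf, int offset);
    }

    /**
     * Returns the first offset in [0, recordLen) at which {@code lookahead}
     * consecutive fixed-length records all pass the check, or -1 if none does.
     * A candidate is rejected when the buffer is too short to hold all of them.
     */
    public static int findAlignment(byte[] buf, int recordLen, int lookahead, RecordCheck check) {
        for (int offset = 0; offset < recordLen; offset++) {
            if (verify(buf, offset, recordLen, lookahead, check)) {
                return offset;
            }
        }
        return -1;
    }

    private static boolean verify(byte[] buf, int offset, int recordLen, int lookahead, RecordCheck check) {
        for (int k = 0; k < lookahead; k++) {
            int pos = offset + k * recordLen;
            if (pos + recordLen > buf.length || !check.plausible(buf, pos)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Two stray bytes followed by five 4-byte records whose first byte is 8
        // (a made-up marker used only for this demonstration).
        byte[] buf = new byte[2 + 4 * 5];
        buf[0] = 1;
        buf[1] = 2;
        for (int i = 0; i < 5; i++) {
            buf[2 + 4 * i] = 8;
        }
        System.out.println(findAlignment(buf, 4, 3, (b, off) -> b[off] == 8)); // 2
    }
}
```

A larger lookahead count makes a false alignment less likely, at the cost of reading more data before the first record is emitted.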

    HADOOP=$HADOOP_HOME/bin/hadoop
    STREAMING_JAR=$HADOOP_HOME/build/hadoop-streaming.jar
    INPUTREADER_JAR=$HADOOP_HOME/build/hadoop-icmptrain.jar
    INPUTREADER_CLASS=edu.isi.hadoop.icmptrain.ICMPTrainRecordReader
    ...
    export INPUTFORMAT_CLASS=edu.isi.hadoop.icmptrain.ICMPTrainInputFormat HADOOP_CLASSPATH=${INPUTREADER_JAR}
    ...
    $HADOOP jar $STREAMING_JAR -file $INPUTREADER_JAR -inputreader $INPUTREADER_CLASS ... ...
    or
    $INPUTREADER_JAR -inputformat $INPUTFORMAT_CLASS ... ...

Constructor and Description |
---|
ICMPTrainRecordReader(Configuration job, FileSplit split) |
ICMPTrainRecordReader(FSDataInputStream in, FileSplit split, Reporter reporter, JobConf job, FileSystem fs) |
Modifier and Type | Method and Description |
---|---|
void | close() |
Text | createKey() |
Text | createValue() |
long | getPos() |
float | getProgress() |
boolean | next(Text key, Text value) |
public ICMPTrainRecordReader(FSDataInputStream in, FileSplit split, Reporter reporter, JobConf job, FileSystem fs) throws java.io.IOException, java.lang.NumberFormatException
Throws: java.io.IOException, java.lang.NumberFormatException
public ICMPTrainRecordReader(Configuration job, FileSplit split) throws java.io.IOException, java.lang.NumberFormatException
Throws: java.io.IOException, java.lang.NumberFormatException
public boolean next(Text key, Text value) throws java.io.IOException
Throws: java.io.IOException
public Text createKey()
public Text createValue()
public float getProgress() throws java.io.IOException
Throws: java.io.IOException
public long getPos() throws java.io.IOException
Throws: java.io.IOException
public void close() throws java.io.IOException
Throws: java.io.IOException