I found a clean way to stop reading from a Node.js stream
The fast-csv Node module makes it easy to parse .list files from the DARPA
Intrusion Detection Data Sets. Set the delimiter option to a space character
and you’re good to go.
#!/usr/bin/env node
const fs = require('fs');
const csv = require('fast-csv');
const csvFile = process.argv[2];
const fileStream = fs.createReadStream(csvFile);
const csvParser = csv({'delimiter': ' '});
let lines = 0;
csvParser.on('data', function (line) {
  lines += 1;
  console.log(line.join(' '));
});
csvParser.on('end', function () {
  console.log(`read ${lines} lines`);
});
fileStream.pipe(csvParser);
Here’s output from that code running on example data.
$ node darpa-parser.js bsm.list
1 01/23/1998 16:56:48 00:01:26 telnet 1754 23 192.168.1.30 192.168.0.20 0 -
2 01/23/1998 16:56:51 00:00:14 ftp 1755 21 192.168.1.30 192.168.0.20 0 -
10 01/23/1998 16:57:02 00:01:00 telnet 1769 23 192.168.1.30 192.168.0.20 0 -
12 01/23/1998 16:57:12 00:00:03 finger 1772 79 192.168.1.30 192.168.0.20 0 -
13 01/23/1998 16:57:22 00:00:03 smtp 1778 25 192.168.1.30 192.168.0.20 0 -
14 01/23/1998 16:57:23 00:00:03 smtp 1783 25 192.168.1.30 192.168.0.20 0 -
20 01/23/1998 16:57:00 00:01:11 telnet 43496 23 192.168.0.40 192.168.0.20 0 -
... many lines later ...
270 01/23/1998 17:04:29 00:00:05 exec 2032 512 192.168.1.30 192.168.0.20 1 port-scan
308 01/23/1998 17:05:08 00:00:37 telnet 1042 23 192.168.1.30 192.168.0.20 0 -
310 01/23/1998 17:05:31 00:00:01 smtp 1048 25 192.168.1.30 192.168.0.20 0 -
311 01/23/1998 17:06:00 00:00:01 finger 1050 79 192.168.1.30 192.168.0.20 0 -
read 64 lines
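Each row reaches the data handler as an array of strings, one per space-separated column. As a rough sketch of what that array holds, the columns could be given names like the ones below. The names are guesses based on the sample output above, not an official schema, and toRecord is a hypothetical helper that is not part of fast-csv.

```javascript
// Column names are assumptions inferred from the sample output,
// not an official description of the .list format.
const COLUMNS = [
  'id', 'date', 'startTime', 'duration', 'service',
  'srcPort', 'dstPort', 'srcIp', 'dstIp', 'attackScore', 'attackName',
];

// Hypothetical helper: turn one parsed row (an array of strings)
// into an object keyed by the assumed column names.
function toRecord(row) {
  const record = {};
  COLUMNS.forEach((name, i) => {
    record[name] = row[i];
  });
  return record;
}

// One line from the sample output, split the same way the parser splits it.
const row = '13 01/23/1998 16:57:22 00:00:03 smtp 1778 25 192.168.1.30 192.168.0.20 0 -'.split(' ');
const record = toRecord(row);

console.log(row[4]);         // prints "smtp"
console.log(record.service); // prints "smtp"
```

The service name sits at index 4 of every row, which makes it an easy field to test inside the data handler.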
But those .list files can get pretty big, and you might not want to parse the
entire thing. If you want to stop the first time you see the smtp program, you
can try emitting an end event.
if (line[4] === 'smtp') {
  csvParser.emit('end');
}
Let’s put those three lines after the first console.log statement and run the
program again.
$ node darpa-parser.js bsm.list
1 01/23/1998 16:56:48 00:01:26 telnet 1754 23 192.168.1.30 192.168.0.20 0 -
2 01/23/1998 16:56:51 00:00:14 ftp 1755 21 192.168.1.30 192.168.0.20 0 -
10 01/23/1998 16:57:02 00:01:00 telnet 1769 23 192.168.1.30 192.168.0.20 0 -
12 01/23/1998 16:57:12 00:00:03 finger 1772 79 192.168.1.30 192.168.0.20 0 -
13 01/23/1998 16:57:22 00:00:03 smtp 1778 25 192.168.1.30 192.168.0.20 0 -
read 5 lines
14 01/23/1998 16:57:23 00:00:03 smtp 1783 25 192.168.1.30 192.168.0.20 0 -
20 01/23/1998 16:57:00 00:01:11 telnet 43496 23 192.168.0.40 192.168.0.20 0 -
... many lines later ...
270 01/23/1998 17:04:29 00:00:05 exec 2032 512 192.168.1.30 192.168.0.20 1 port-scan
308 01/23/1998 17:05:08 00:00:37 telnet 1042 23 192.168.1.30 192.168.0.20 0 -
310 01/23/1998 17:05:31 00:00:01 smtp 1048 25 192.168.1.30 192.168.0.20 0 -
311 01/23/1998 17:06:00 00:00:01 finger 1050 79 192.168.1.30 192.168.0.20 0 -
Oops. That didn’t stop the processing; it just moved the line count into the
middle of the output. Fortunately, streams in Node can be paused and resumed.
Calling csvParser.pause() instead might get us what we want.
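Before trying that, here is a rough sketch of why emitting end on its own was not enough. emit() only calls the listeners registered for an event; it does not tell whatever is producing the data to stop. In this sketch a plain EventEmitter stands in for the parser and a for loop stands in for the file being read. Both are simplifications for illustration, and replay and fakeParser are made-up names.

```javascript
const { EventEmitter } = require('events');

// Hypothetical stand-in for the parser. The for loop plays the role of the
// file stream: it keeps emitting rows no matter what the listeners do.
function replay(rows) {
  const log = [];
  const fakeParser = new EventEmitter();

  fakeParser.on('data', (row) => {
    log.push(row[4]);
    if (row[4] === 'smtp') {
      fakeParser.emit('end'); // runs the 'end' listeners and nothing more
    }
  });

  // once() so the handler fires a single time, as in the output above.
  fakeParser.once('end', () => {
    log.push(`read ${log.length} lines`);
  });

  for (const row of rows) {
    fakeParser.emit('data', row);
  }
  fakeParser.emit('end');
  return log;
}

const log = replay([
  '1 01/23/1998 16:56:48 00:01:26 telnet 1754 23 192.168.1.30 192.168.0.20 0 -'.split(' '),
  '13 01/23/1998 16:57:22 00:00:03 smtp 1778 25 192.168.1.30 192.168.0.20 0 -'.split(' '),
  '311 01/23/1998 17:06:00 00:00:01 finger 1050 79 192.168.1.30 192.168.0.20 0 -'.split(' '),
]);

console.log(log.join('\n'));
// prints:
// telnet
// smtp
// read 2 lines
// finger
```

The count shows up in the middle and the rows keep coming, just like the real run. Pausing is different because it acts on the stream itself: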
if (line[4] === 'smtp') {
  csvParser.pause();
}
Let’s replace the if statement we added with this one and run our code again.
$ node darpa-parser.js bsm.list
1 01/23/1998 16:56:48 00:01:26 telnet 1754 23 192.168.1.30 192.168.0.20 0 -
2 01/23/1998 16:56:51 00:00:14 ftp 1755 21 192.168.1.30 192.168.0.20 0 -
10 01/23/1998 16:57:02 00:01:00 telnet 1769 23 192.168.1.30 192.168.0.20 0 -
12 01/23/1998 16:57:12 00:00:03 finger 1772 79 192.168.1.30 192.168.0.20 0 -
13 01/23/1998 16:57:22 00:00:03 smtp 1778 25 192.168.1.30 192.168.0.20 0 -
That’s closer. The processing stopped, but it didn’t print out how many lines
were read. If we combine the two, we might get what we want. We’ll call pause()
first to stop the stream, and then emit() to trigger the printing.
if (line[4] === 'smtp') {
  csvParser.pause();
  csvParser.emit('end');
}
Let’s replace the if statement with this one and run the program one more time.
$ node darpa-parser.js bsm.list
1 01/23/1998 16:56:48 00:01:26 telnet 1754 23 192.168.1.30 192.168.0.20 0 -
2 01/23/1998 16:56:51 00:00:14 ftp 1755 21 192.168.1.30 192.168.0.20 0 -
10 01/23/1998 16:57:02 00:01:00 telnet 1769 23 192.168.1.30 192.168.0.20 0 -
12 01/23/1998 16:57:12 00:00:03 finger 1772 79 192.168.1.30 192.168.0.20 0 -
13 01/23/1998 16:57:22 00:00:03 smtp 1778 25 192.168.1.30 192.168.0.20 0 -
read 5 lines
That did it! Stream processing stopped early and the end event handler was
called. This sort of thing can be useful if you encounter errors when parsing
files and want to halt processing early. And it’s not just for reading files.
The same pause and emit trick should work on any stream with a readable side,
such as a transform or duplex stream.
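To try the pattern without fast-csv or a data file, here is a minimal sketch that uses only Node's built-in stream module. It assumes a core Readable reacts to pause() and a hand-emitted end the same way the parser did above. readUntil and sampleRows are hypothetical names made up for this example.

```javascript
const { Readable } = require('stream');

// Hypothetical helper: read rows until one matches the given service,
// then pause the stream and emit 'end' by hand.
function readUntil(rows, service) {
  return new Promise((resolve) => {
    const seen = [];
    const source = Readable.from(rows); // object mode: one row per 'data' event

    source.on('data', (row) => {
      seen.push(row[4]);
      if (row[4] === service) {
        source.pause();     // stop further 'data' events
        source.emit('end'); // run the 'end' handler now
      }
    });

    source.once('end', () => resolve(seen));
  });
}

// A few rows taken from the sample output, already split on spaces.
const sampleRows = [
  '1 01/23/1998 16:56:48 00:01:26 telnet 1754 23 192.168.1.30 192.168.0.20 0 -'.split(' '),
  '2 01/23/1998 16:56:51 00:00:14 ftp 1755 21 192.168.1.30 192.168.0.20 0 -'.split(' '),
  '13 01/23/1998 16:57:22 00:00:03 smtp 1778 25 192.168.1.30 192.168.0.20 0 -'.split(' '),
  '311 01/23/1998 17:06:00 00:00:01 finger 1050 79 192.168.1.30 192.168.0.20 0 -'.split(' '),
];

readUntil(sampleRows, 'smtp').then((seen) => {
  console.log(seen.join(', ')); // prints "telnet, ftp, smtp"
});
```

The finger row is never handed to the data handler. One caveat: a paused stream is not closed, so if the underlying file or socket needs to be released before the process exits, readable.destroy() is the documented call for that.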