Making Peace with Logstash Part 1 – Input and Output


February 21, 2018 by Mike Hillwig

Logstash is an incredibly powerful tool. If you can put data into a text file, Logstash can parse it. It works well with a lot of data, but I find myself using it most often for event data. By event data, I mean anything that triggers a log event and gets written to a log. For the purposes of my demos, I’m using data from the Bureau of Transportation Statistics. They track flight performance data, which works perfectly for my purposes: it’s a great example dataset that has nothing to do with my real job.

Logstash configuration files typically have three sections: INPUT, FILTER, and OUTPUT. The FILTER section is optional.
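As a quick sketch, the overall shape of a config file is just these three blocks; the plugin names in the comments are illustrative, not part of my config:

```
input {
  # where events come from: file, beats, tcp, etc.
}

filter {
  # optional: parse, enrich, or drop events
}

output {
  # where events go: elasticsearch, stdout, etc.
}
```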

In this case, my configuration looks like this:

input {
  file {
    path => ["/Users/mikehillwig/elastic/flights/*.csv"]
    sincedb_path => "/dev/null"
    start_position => "beginning"
  }
}

output {
  stdout {}
}
This is pretty bare bones for a Logstash config. I did put a wildcard in the file path. The one unusual thing I added is the sincedb_path option. Logstash is smart enough to avoid processing a file twice; it keeps track of how far it got in each file during an execution. In this case, I don’t want Logstash to remember any of that, so I pointed sincedb_path at /dev/null.
Note that I’m sending the output to stdout. That allows me to see the data passing through my pipeline without having to look for it in Elasticsearch. We’ll put this into Elasticsearch in a future post.
The output of this config looks something like this:
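To give you a rough idea, with the default codec on stdout, each event prints roughly like this. The CSV contents, timestamp, and host name below are placeholders I made up for illustration, not actual values from the BTS data:

```
{
       "message" => "2018-01-15,AA,1234,BOS,ORD,-5,10",
      "@version" => "1",
    "@timestamp" => 2018-02-21T14:00:00.000Z,
          "path" => "/Users/mikehillwig/elastic/flights/flights.csv",
          "host" => "mikes-macbook"
}
```

Note that the entire CSV row lands in the message field as one string; nothing has been parsed into separate fields yet.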


Some of this data makes sense, but some of it is just noise. Next time, I’ll show you how to parse this as a CSV file, and then we’ll eliminate some of that noise.