Making Peace with Logstash Part 3 – Dropping unnecessary columns

March 6, 2018 by Mike Hillwig

This is the third part of my Making Peace with Logstash series. In the first part, I showed you how to have Logstash talk to a CSV. In the second part, I showed you how to turn the CSV into fields.

I’m a sci-fi geek, so the word mutate has a certain connotation for me. In the Logstash world, mutate simply means changing the data. We’ll be using the remove_field option inside a mutate filter.

Like I said earlier, we have some data that I know I’ll never use. This is flight performance data, and the dataset contains diversion information. If a flight gets diverted more than once, it’s tracked here. I don’t care about that, so I’m dropping the diversion information for the second through fifth diversions. I’m also dropping some airport ID fields that I believe I won’t need. This is the tricky part, because somewhere down the road I’m going to need to enhance this data by converting all of the times to UTC.

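Once again, this is pretty simple. The list also drops message, path, and host, which are fields Logstash attaches to each event rather than columns from the CSV.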

    mutate {
        remove_field => [ "message", "path", "host",
            "Div2Airport","Div2AirportID","Div2AirportSeqID","Div2WheelsOn","Div2TotalGTime","Div2LongestGTime","Div2WheelsOff","Div2TailNum","Div3Airport",
            "Div3AirportID","Div3AirportSeqID","Div3WheelsOn","Div3TotalGTime","Div3LongestGTime","Div3WheelsOff","Div3TailNum","Div4Airport","Div4AirportID",
            "Div4AirportSeqID","Div4WheelsOn","Div4TotalGTime","Div4LongestGTime","Div4WheelsOff","Div4TailNum","Div5Airport","Div5AirportID","Div5AirportSeqID",
            "Div5WheelsOn","Div5TotalGTime","Div5LongestGTime","Div5WheelsOff","Div5TailNum","OriginAirportID",
            "OriginAirportSeqID","DestAirportID","DestAirportSeqID"]
    }
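
If typing out every Div2 through Div5 field name feels tedious, one possible alternative is the prune filter, which can drop fields whose names match a regular expression. This is a sketch of a different technique from the mutate filter used in this post, and it assumes the logstash-filter-prune plugin is available in your install (if it isn't, bin/logstash-plugin install logstash-filter-prune should add it). The regular expression is my own illustration, not something from the original configuration.

```
filter {
    # Alternative sketch, not the approach used in this series:
    # drop every top-level field whose name starts with Div2, Div3, Div4, or Div5.
    # Div1* and fields like DivDistance do not match and are kept.
    prune {
        blacklist_names => [ "^Div[2-5]" ]
    }
}
```

The trade-off is that a pattern is shorter but less explicit. The spelled-out remove_field list makes it obvious exactly which fields disappear.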

Yeah. It’s that easy. If you’re keeping track of this at home, my flights.conf configuration file looks like this:

input {
    file {
        path => ["/Users/mikehillwig/elastic/flights/*.csv"]
        sincedb_path => "/dev/null"
        start_position => "beginning"
    }
}


filter {
	csv {

		columns => [
			"Year", "Quarter", "Month", "DayofMonth", "DayOfWeek", "FlightDate", "UniqueCarrier", "AirlineID", "Carrier", "TailNum", "FlightNum", "OriginAirportID", 
			"OriginAirportSeqID", "OriginCityMarketID", "Origin", "OriginCityName", "OriginState", "OriginStateFips", "OriginStateName", "OriginWac",
			"DestAirportID","DestAirportSeqID","DestCityMarketID","Dest","DestCityName","DestState","DestStateFips","DestStateName","DestWac",
			"CRSDepTime","DepTime","DepDelay","DepDelayMinutes","DepDel15","DepartureDelayGroups","DepTimeBlk","TaxiOut","WheelsOff","WheelsOn","TaxiIn",
			"CRSArrTime","ArrTime","ArrDelay","ArrDelayMinutes","ArrDel15","ArrivalDelayGroups","ArrTimeBlk","Cancelled","CancellationCode","Diverted",
			"CRSElapsedTime","ActualElapsedTime","AirTime","Flights","Distance","DistanceGroup","CarrierDelay","WeatherDelay","NASDelay","SecurityDelay",
			"LateAircraftDelay","FirstDepTime","TotalAddGTime","LongestAddGTime","DivAirportLandings","DivReachedDest","DivActualElapsedTime","DivArrDelay",
			"DivDistance","Div1Airport","Div1AirportID","Div1AirportSeqID","Div1WheelsOn","Div1TotalGTime","Div1LongestGTime","Div1WheelsOff","Div1TailNum",
			"Div2Airport","Div2AirportID","Div2AirportSeqID","Div2WheelsOn","Div2TotalGTime","Div2LongestGTime","Div2WheelsOff","Div2TailNum","Div3Airport",
			"Div3AirportID","Div3AirportSeqID","Div3WheelsOn","Div3TotalGTime","Div3LongestGTime","Div3WheelsOff","Div3TailNum","Div4Airport","Div4AirportID",
			"Div4AirportSeqID","Div4WheelsOn","Div4TotalGTime","Div4LongestGTime","Div4WheelsOff","Div4TailNum","Div5Airport","Div5AirportID","Div5AirportSeqID",
			"Div5WheelsOn","Div5TotalGTime","Div5LongestGTime","Div5WheelsOff","Div5TailNum"]
		separator => ","
	}

	mutate {
		remove_field => [ "message", "path", "host",
			"Div2Airport","Div2AirportID","Div2AirportSeqID","Div2WheelsOn","Div2TotalGTime","Div2LongestGTime","Div2WheelsOff","Div2TailNum","Div3Airport",
			"Div3AirportID","Div3AirportSeqID","Div3WheelsOn","Div3TotalGTime","Div3LongestGTime","Div3WheelsOff","Div3TailNum","Div4Airport","Div4AirportID",
			"Div4AirportSeqID","Div4WheelsOn","Div4TotalGTime","Div4LongestGTime","Div4WheelsOff","Div4TailNum","Div5Airport","Div5AirportID","Div5AirportSeqID",
			"Div5WheelsOn","Div5TotalGTime","Div5LongestGTime","Div5WheelsOff","Div5TailNum","OriginAirportID",
			"OriginAirportSeqID","DestAirportID","DestAirportSeqID"]
	}
}

output {
    stdout { codec => rubydebug }	
}

Now, let’s run this puppy through Logstash. We’ll get these results:

Mikes-MacBook-Pro:logstash-6.1.2 mikehillwig$ bin/logstash -f ../flights/flights.conf
Sending Logstash's logs to /Users/mikehillwig/elastic/logstash-6.1.2/logs which is now configured via log4j2.properties
[2018-02-19T11:10:32,148][INFO ][logstash.modules.scaffold] Initializing module {:module_name=>"netflow", :directory=>"/Users/mikehillwig/elastic/logstash-6.1.2/modules/netflow/configuration"}
[2018-02-19T11:10:32,166][INFO ][logstash.modules.scaffold] Initializing module {:module_name=>"fb_apache", :directory=>"/Users/mikehillwig/elastic/logstash-6.1.2/modules/fb_apache/configuration"}
[2018-02-19T11:10:32,401][WARN ][logstash.config.source.multilocal] Ignoring the 'pipelines.yml' file because modules or command line options are specified
[2018-02-19T11:10:33,022][INFO ][logstash.runner          ] Starting Logstash {"logstash.version"=>"6.1.2"}
[2018-02-19T11:10:33,456][INFO ][logstash.agent           ] Successfully started Logstash API endpoint {:port=>9600}
[2018-02-19T11:10:40,920][INFO ][logstash.pipeline        ] Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>4, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>5, "pipeline.max_inflight"=>500, :thread=>"#<Thread:0x1ac5944f run>"}
[2018-02-19T11:10:41,269][INFO ][logstash.pipeline        ] Pipeline started {"pipeline.id"=>"main"}
[2018-02-19T11:10:41,475][INFO ][logstash.agent           ] Pipelines running {:count=>1, :pipelines=>["main"]}
{
                "ArrDel15" => "0.00",
                   "Month" => "12",
            "WeatherDelay" => nil,
              "DepTimeBlk" => "1500-1559",
               "DestState" => "CA",
       "LateAircraftDelay" => nil,
                 "TailNum" => "N613AS",
            "DestCityName" => "Burbank, CA",
    "DepartureDelayGroups" => "-1",
               "OriginWac" => "92",
         "OriginStateFips" => "41",
                 "DepTime" => "1535",
                "DepDelay" => "-3.00",
         "ArrDelayMinutes" => "1.00",
                    "Year" => "2017",
         "LongestAddGTime" => nil,
              "FlightDate" => "2017-12-02",
         "OriginStateName" => "Oregon",
           "DestStateFips" => "06",
        "Div1LongestGTime" => nil,
                    "Dest" => "BUR",
                "ArrDelay" => "1.00",
          "DivReachedDest" => nil,
                 "DestWac" => "91",
           "DestStateName" => "California",
             "OriginState" => "OR",
               "AirlineID" => "19930",
             "DivDistance" => nil,
           "UniqueCarrier" => "AS",
        "Div1AirportSeqID" => nil,
         "DepDelayMinutes" => "0.00",
               "column110" => nil,
      "DivAirportLandings" => "0",
               "FlightNum" => "490",
                  "Origin" => "PDX",
      "OriginCityMarketID" => "34057",
                "@version" => "1",
              "@timestamp" => 2018-02-19T16:10:41.841Z,
               "DayOfWeek" => "6",
          "OriginCityName" => "Portland, OR",
                "DepDel15" => "0.00",
                 "Quarter" => "4",
           "DistanceGroup" => "4",
          "Div1TotalGTime" => nil,
           "Div1AirportID" => nil,
                 "ArrTime" => "1749",
           "Div1WheelsOff" => "",
               "WheelsOff" => "1547",
                "NASDelay" => nil,
           "SecurityDelay" => nil,
            "FirstDepTime" => "",
              "DayofMonth" => "2",
                  "TaxiIn" => "4.00",
             "Div1Airport" => "",
                "WheelsOn" => "1745",
    "DivActualElapsedTime" => nil,
            "Div1WheelsOn" => "",
              "ArrTimeBlk" => "1700-1759",
             "Div1TailNum" => "",
                "Distance" => "817.00",
                 "TaxiOut" => "12.00",
        "CancellationCode" => "",
            "CarrierDelay" => nil,
        "DestCityMarketID" => "32575",
                 "AirTime" => "118.00",
              "CRSDepTime" => "1538",
                "Diverted" => "0.00",
              "CRSArrTime" => "1748",
                 "Carrier" => "AS",
             "DivArrDelay" => nil,
      "ArrivalDelayGroups" => "0",
               "Cancelled" => "0.00",
          "CRSElapsedTime" => "130.00",
                 "Flights" => "1.00",
           "TotalAddGTime" => nil,
       "ActualElapsedTime" => "134.00"
}
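
One thing worth noticing in that output is the column110 field. The columns list names 109 columns, and by default the csv filter auto-generates a name for any extra column it finds, so a 110th value in each row shows up as column110. The likely cause is a trailing comma at the end of each row in the source file, though that is an assumption since the raw CSV isn't shown here. If you want it gone as well, a minimal sketch would be one more mutate filter, or simply adding the name to the existing remove_field list:

```
filter {
    # Sketch: drop the auto-named extra column produced by the csv filter.
    # Assumes the field is always empty and never needed downstream.
    mutate {
        remove_field => [ "column110" ]
    }
}
```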

We’re making some serious progress with this. Next time, let’s find a field to use as our timestamp. That’ll make it easier to get the data into Elasticsearch.