July 7, 2020, 1:19 p.m.
Posted by soar

LogStash JSON filter

Usage of the LogStash JSON filter is very simple, and it is described in the official docs. All you need to do is create a special object mapping in your index:

PUT /logstash/_mapping?pretty
{
  "properties": {
    "data": {
      "type": "object",
      "dynamic": true
    }
  }
}

And add something like this to the LogStash config:

filter {
    json {
        skip_on_invalid_json => true
        source => "message"
        target => "data"
        add_tag => [ "_message_json_parsed" ]
    }
}

But what can go wrong? With this configuration you may see a lot of warnings like this:

[2020-07-06T18:51:37,837][WARN ][logstash.outputs.elasticsearch][main][...] 
Could not index event to Elasticsearch. 
{
    :status=>400, 
    :action=>[
        "index", {
            :_id=>nil, 
            :_index=>"logstash", 
            :routing=>nil, 
            :_type=>"_doc"
        }, 
        #<LogStash::Event:0x2bd37cf7>
    ], 
    :response=>{
        "index"=>{
            "_index"=>"logstash-2020.07.06-000007", 
            "_type"=>"_doc", 
            "_id"=>"1wN4JXMBewtat5szJUjd", 
            "status"=>400, 
            "error"=>{
                "type"=>"mapper_parsing_exception", 
                "reason"=>"object mapping for [data] tried to parse field [data] as object, but found a concrete value"
            }
        }
    }
}

What does it mean? The LogStash JSON parser is not so strict: if a message isn't a valid JSON object but is a valid JSON string, the data field will contain only that string, not an "object".
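The root cause is easy to reproduce with any standards-compliant JSON parser (Python's json module here, purely as an illustration): a bare quoted string is valid JSON on its own, so parsing it succeeds but yields a scalar, not an object.

```python
import json

# A JSON document does not have to be an object: a bare quoted
# string is also valid JSON. This mirrors what the json filter
# stores in [data] for such a message.
parsed_object = json.loads('{"user": "soar", "action": "login"}')
parsed_string = json.loads('"just a plain log line"')

print(type(parsed_object).__name__)  # dict -> indexed as an ES object
print(type(parsed_string).__name__)  # str  -> the "concrete value" from the error
```

Elasticsearch then refuses to index the string into a field mapped as object, which is exactly the mapper_parsing_exception above.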

Moreover, if this happens right after a log rotation, it could create a data field mapped to the string type, which causes further problems, such as having to re-create the index.

To avoid this, you need to extend your LogStash configuration with additional logic:

filter {
    json {
        skip_on_invalid_json => true
        source => "message"
        target => "data"
        add_tag => [ "_message_json_parsed" ]
    }

    if [data] =~ /.*/ {
        mutate {
            remove_field => [ "data" ]
        }
    }
}

What does it do? LogStash has no built-in way to check that data is a valid object, so instead we check whether it is a string. If the json filter left a plain string in data, the regex /.*/ matches and the field is removed; if data holds a parsed object, the regex condition does not match and the field is kept. So now only properly parsed JSON objects end up in the data property of our logs.
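The combined behaviour of the json filter plus the string-removal guard can be sketched in a few lines of Python (parse_message is a hypothetical helper for illustration, not part of any LogStash API):

```python
import json

def parse_message(message):
    """Return a dict only when the message parses to a JSON object;
    otherwise return None (i.e. the 'data' field is dropped)."""
    try:
        data = json.loads(message)
    except (ValueError, TypeError):
        # roughly what skip_on_invalid_json does: leave data unset
        return None
    # the mutate/remove_field guard: drop anything that is not an object
    if not isinstance(data, dict):
        return None
    return data

print(parse_message('{"level": "info"}'))  # {'level': 'info'}
print(parse_message('"just a string"'))    # None
print(parse_message('not json at all'))    # None
```

Only the first case survives, which matches what the extended filter chain ships to Elasticsearch.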
