Usage of the LogStash JSON filter is very simple and is described in the official docs. All you need to do is create a special object mapping in your index:
PUT /logstash/_mapping?pretty HTTP/1.0
Content-Type: application/json

{
  "properties": {
    "data": {
      "type": "object",
      "dynamic": true
    }
  }
}
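If you want to double-check that the mapping was applied, you can read it back through the mapping API (the index name here matches the example above):

GET /logstash/_mapping?pretty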
Then add something like this to the LogStash config:
filter {
  json {
    skip_on_invalid_json => true
    source => "message"
    target => "data"
    add_tag => [ "_message_json_parsed" ]
  }
}
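To make the effect concrete, here is a hypothetical event after the filter has run (the field values are invented for illustration). A log line like {"level":"info","user":"alice"} gets parsed into the data field and the event is tagged:

{
  "message" => "{\"level\":\"info\",\"user\":\"alice\"}",
  "data" => {
    "level" => "info",
    "user" => "alice"
  },
  "tags" => [ "_message_json_parsed" ]
}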
But what can go wrong? With this configuration you may see a lot of warnings like this:
[2020-07-06T18:51:37,837][WARN ][logstash.outputs.elasticsearch][main][...]
Could not index event to Elasticsearch.
{
  :status=>400,
  :action=>[
    "index", {
      :_id=>nil,
      :_index=>"logstash",
      :routing=>nil,
      :_type=>"_doc"
    },
    #<LogStash::Event:0x2bd37cf7>
  ],
  :response=>{
    "index"=>{
      "_index"=>"logstash-2020.07.06-000007",
      "_type"=>"_doc",
      "_id"=>"1wN4JXMBewtat5szJUjd",
      "status"=>400,
      "error"=>{
        "type"=>"mapper_parsing_exception",
        "reason"=>"object mapping for [data] tried to parse field [data] as object, but found a concrete value"
      }
    }
  }
}
What does this mean? The LogStash JSON parser is not strict: if a message contains valid JSON that is a plain string rather than an object, the data field will hold just that string, not an "object". Moreover, if this happens right after an index rollover, it can get the data field dynamically mapped to the string type, which causes further problems, such as having to re-create the index.
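For example (an invented message, for illustration): the text "service restarted", quotes included, is valid JSON, so skip_on_invalid_json does not skip it, and the filter writes a concrete string into data:

# message contains valid JSON, but it is a scalar, not an object:
"message" => "\"service restarted\""
# after the json filter, data is a plain string:
"data" => "service restarted"
# Elasticsearch then rejects the document, because [data] is mapped as an object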
To avoid this, you need to extend your LogStash configuration with additional logic:
filter {
  json {
    skip_on_invalid_json => true
    source => "message"
    target => "data"
    add_tag => [ "_message_json_parsed" ]
  }

  if [data] =~ /.*/ {
    mutate {
      remove_field => [ "data" ]
    }
  }
}
What does it do? LogStash has no way to check that data is a valid object, so we check whether it is a string instead. The regex /.*/ matches any string, so if data holds a string the condition is true and we remove the field; if data is a parsed object, the regex comparison does not match and the field is kept. So now only properly parsed JSON objects end up in the data property of our logs.
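If you prefer an explicit type check over the regex trick, a small ruby filter can do the same job. This is just a sketch; the is_a?(Hash) check assumes the event API hands the parsed object back as a Ruby hash:

filter {
  ruby {
    code => "
      data = event.get('data')
      # drop the field unless the json filter produced an actual object
      event.remove('data') if data && !data.is_a?(Hash)
    "
  }
}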