How to ship JSON logs via Rsyslog

It's pretty straightforward to ship text logs via Rsyslog; but what if your log files are in JSON and you want to ship them?

In my previous article How to ship logs with Rsyslog and Logstash I described how to ship text-based log files. Since writing that article I have moved on to having pretty much every log file in JSON, which requires some extra Rsyslog config. The advantage of using JSON is that you need minimal filters on the Logstash side of things, which gives developers more control over what they push into your ELK stack. The downside is that you will need to enforce some standards on field names and formats: Elasticsearch will throw errors if the same field name appears with different types (int, string, object etc.), and you will get mysteriously missing log lines in your ELK stack.
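
To make the field-type clash concrete, here is a small Python sketch of the kind of conflict to watch for; the field name `context` and the values are purely illustrative:

```python
import json

# Two services logging the same field name with different JSON types.
# Elasticsearch maps a field from whichever document arrives first, then
# rejects later documents where the type conflicts.
line_a = '{"message":"ok","context":"request-123"}'        # context is a string
line_b = '{"message":"ok","context":{"id":"request-123"}}'  # context is an object

doc_a = json.loads(line_a)
doc_b = json.loads(line_b)

# Same field name, different types: this is the conflict to avoid
print(type(doc_a["context"]).__name__)  # str
print(type(doc_b["context"]).__name__)  # dict
```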

Here is an example JSON log line that you might want to parse:

{"message":"Matched route /healthcheck","context":[],"channel":"webservice","datetime":"2017-03-05T11:08:38+00:00","extra":[],"log_level":"INFO"}

What we want to do is get this into the ELK stack pretty much as is, with the caveat that we will add some extra fields that describe the environment the service is running in, such as the machine's hostname or role; we shouldn't force applications to include this information, as we can use Rsyslog to add boilerplate data like this to all log messages. With this in mind we will need to actually parse our log files as JSON using Rsyslog, so we can include what we want from the original message and add in these extra bits.

Rsyslog config

We can parse JSON with Rsyslog by employing the mmjsonparse module. You will need to install this module first as it isn't included with Rsyslog by default; on Ubuntu you can install it like so:

# Add rsyslog PPA
add-apt-repository ppa:adiscon/v8-stable 
apt-get update
apt-get install rsyslog-mmjsonparse

Now we need to load the module in our Rsyslog config file, add this near the top of the file:

# /etc/rsyslog.conf
...
module(load="mmjsonparse")
...

In order to add extra fields to log lines we will need to define a template that concatenates our extra fields with the fields in the original message. The mmjsonparse module makes each field in the parsed message available, so you can reference and include them individually; alternatively, the all-json property represents the entire parsed line.

Here is an example that concatenates some extra fields with all fields in a log line:

# /etc/rsyslog.conf
...

# Template for shipping JSON logs
# Just adds some furniture to the json message
template(name="allJsonLogTemplate"
   type="list") {
   constant(value="{ ")
   constant(value="\"type\":\"")
   property(name="programname")
   constant(value="\", ")
   constant(value="\"host\":\"")
   property(name="hostname")
   constant(value="\", ")
   constant(value="\"@version\":\"1\", ")
   constant(value="\"role\":\"api_something\", ")
   constant(value="\"sourcefile\":\"")
   property(name="$!metadata!filename")
   constant(value="\", ")
   property(name="$!all-json" position.from="2")
}
...

So here we are adding in the extra fields type, host, @version, role and sourcefile (some of which are filled in using Rsyslog properties). Notice that we have to open the JSON with an initial brace, and we also have to add commas, quotes and escapes in order to form a valid JSON message. The slightly funky bit is where we add in the original log line via the all-json property: we start the output at the second character via the position.from parameter (see Rsyslog templates for more information on this), which snips off the opening brace of the original line so that it combines with our extra fields into a single valid JSON message.
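
The concatenation can be simulated in a few lines of Python to see why only the leading brace is dropped; the extra fields below are hard-coded for illustration:

```python
import json

def apply_template(original_line):
    """Mimic the allJsonLogTemplate: prepend the extra fields, then append
    the original JSON line with its opening brace snipped off, as the
    position.from="2" parameter does in the Rsyslog template."""
    extra = ('{ "type":"application-logs", "host":"web-01", "@version":"1", '
             '"role":"api_something", "sourcefile":"/var/log/api-something.log", ')
    return extra + original_line[1:]  # drop the original opening brace

line = '{"message":"Matched route /healthcheck","log_level":"INFO"}'
combined = apply_template(line)
doc = json.loads(combined)  # valid JSON: extra fields merged with the original
print(doc["role"], doc["message"])
```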

Alternatively we could reference fields in the original message directly; this way we bring in only the fields we want from the log line, as opposed to including the entire message:

# /etc/rsyslog.conf
...
# Template for shipping JSON logs
# Just adds some furniture to the json message
template(name="pickedJsonLogTemplate"
   type="list") {
   constant(value="{ ")
   constant(value="\"type\":\"")
   property(name="programname")
   constant(value="\", ")
   constant(value="\"host\":\"")
   property(name="hostname")
   constant(value="\", ")
   constant(value="\"@version\":\"1\", ")
   constant(value="\"role\":\"api_something\", ")
   constant(value="\"sourcefile\":\"")
   property(name="$!metadata!filename")
   constant(value="\", ")
   constant(value="\"message\":\"")
   property(name="$!message")
   constant(value="\", ")
   constant(value="\"datetime\":\"")
   property(name="$!datetime")
   constant(value="\", ")
   constant(value="\"log_level\":\"")
   property(name="$!log_level")
   constant(value="\" ")
   constant(value=" }")
}
...

Once again, in order to output valid JSON to Logstash we need to add quotes, commas etc. where needed, along with a closing brace.

OK, now that we have our templates in place, let's use them to read from a log file and parse each line with the JSON parse module. My preference is to define a ruleset which you can apply to any input, so here is our JSON log ruleset:

# /etc/rsyslog.conf

ruleset(name="remoteAllJsonLog") {
    action(type="mmjsonparse" cookie="")
    action(
        type="omfwd"
        Target="logstash.endpoint"
        Port="5001"
        Protocol="udp"
        template="allJsonLogTemplate"
    )
    stop
}

The key line here is the mmjsonparse action. Adding this action tells Rsyslog to parse each log line from an input as JSON, which makes each field in the message available (or the whole thing available via all-json). By default the module will only parse lines that begin with the @cee: cookie (a leftover from the deprecated Project Lumberjack log format); I don't think anyone wants to force applications to start each line with this prefix, so we add the parameter cookie="" which tells the module to parse plain JSON lines that have no prefix at all.
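
To illustrate what the cookie parameter changes, here is a rough Python model of the prefix handling; this is only a sketch of the behaviour, not the module's actual code:

```python
import json

def parse_json_line(line, cookie="@cee:"):
    """Rough model of mmjsonparse's cookie handling: only lines starting
    with the cookie are parsed; cookie="" means parse every line."""
    if not line.startswith(cookie):
        return None  # line is left unparsed
    return json.loads(line[len(cookie):])

plain = '{"message":"hello","log_level":"INFO"}'
print(parse_json_line(plain))             # None: default cookie "@cee:" not present
print(parse_json_line(plain, cookie=""))  # parsed into a dict
```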

Finally let's define a file input that uses this ruleset:

# /etc/rsyslog.d/10-api-something.conf

input(type="imfile"
   File="/var/log/api-something.log"
   Tag="application-logs"
   addMetadata="on"
   ruleset="remoteAllJsonLog"
)

So that's it: we've got the JSON parse module loaded, we've got a ruleset that uses a JSON-friendly template, and we've set up an input based on this.

Logstash config

On the Logstash side of things you will just need a JSON input. You will probably also need some filters to deal with different date formats, since applications will no doubt log the time in different ways.

Here is an example bit of Logstash config that takes JSON and parses a few different date formats:

# /etc/logstash/conf.d/01-main.conf

input {
    udp {
        port => 5001
        codec => json
    }
}

filter {
    # Deal with different time formats
    if [datetime] {
        date {
            match => [ "datetime" , "ISO8601" ]
            remove_field => [ "datetime" ]
        }
    }
    if [timestamp_unix] {
        date {
            match => [ "timestamp_unix" , "UNIX" ]
            remove_field => [ "timestamp_unix" ]
        }
    }
}

output {
    elasticsearch {
        flush_size => '100'
        hosts => [ 'elasticsearch.elk.internal:9200' ]
        index => 'logstash-%{+YYYY.MM.dd}'
    }
}

Debugging

If your log lines aren't appearing in Kibana then it's possible that they aren't getting parsed by Logstash due to some kind of formatting issue. You have a few different areas where you can figure out what's going on.

  • You can check the logs of Logstash and Elasticsearch to see if there are any messages about bad formatting or conflicting fields
  • You can use tcpdump on the source machine (filtering on the UDP port) to view log lines as they are shipped
  • Similarly you can employ tcpdump on your Logstash machine to see what's coming in
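
If tcpdump isn't convenient, a throwaway UDP listener also works for checking what is actually arriving on the Logstash port (stop Logstash first so the port is free). A minimal Python sketch; port 5001 matches the config above:

```python
import json
import socket

def receive_one(port=5001, timeout=5.0):
    """Bind to the Logstash UDP port and return the first datagram,
    parsed as JSON if possible."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)
    sock.bind(("0.0.0.0", port))
    try:
        data, addr = sock.recvfrom(65536)
        try:
            return json.loads(data.decode("utf-8", errors="replace"))
        except ValueError:
            return data  # not valid JSON: this is your formatting problem
    finally:
        sock.close()
```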

Useful bits of config

There are some Rsyslog settings that you might want to tweak.

MaxMessageSize - by default the maximum size of a message is 8k. In my experience you will hit this limit quite often, which results in truncated log lines being sent to Logstash. You can increase the limit by placing the following at the top of your rsyslog.conf file:

# /etc/rsyslog.conf
$MaxMessageSize 64k
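
To check whether this limit is actually biting, you can scan a log file for lines near or over the default 8k. A quick Python sketch; the path in the usage comment is just an example:

```python
# Report log lines that exceed rsyslog's default 8k message limit,
# which would otherwise arrive truncated in Logstash.
LIMIT = 8 * 1024

def oversized_lines(path, limit=LIMIT):
    """Yield (line_number, length_in_bytes) for lines longer than the limit."""
    with open(path, "rb") as f:
        for number, line in enumerate(f, start=1):
            if len(line.rstrip(b"\n")) > limit:
                yield number, len(line)

# Example usage (path is hypothetical):
#   for number, length in oversized_lines("/var/log/api-something.log"):
#       print(f"line {number}: {length} bytes")
```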

RebindInterval - this setting is for the omfwd module; it specifies how many messages are sent before the network connection to Logstash is re-established. This is useful for a Logstash cluster behind a load balancer: without it you will not get an even distribution of traffic between the load-balanced Logstash instances:

# /etc/rsyslog.conf

ruleset(name="remoteAllJsonLog") {
    action(type="mmjsonparse" cookie="")
    action(
        type="omfwd"
        Target="logstash.endpoint"
        Port="5001"
        Protocol="udp"
        RebindInterval="100"
        template="allJsonLogTemplate"
    )
    stop
}

Wrapping up

OK, that's it for now. If you have any questions about this article then please pop them into the comments section below; oh, and you can take a look at my Rsyslog example repo for some example snippets of config.