How to ship logs with Rsyslog and Logstash

Rsyslog is lightweight and powerful, which makes it an excellent tool for log processing and remote shipping. With it you can ship not only syslog messages but also messages from any application log file, with precise control over the format. This guide looks at using Rsyslog as a shipper for Logstash.

Note: I've written a follow-up article to this one on shipping JSON logs; take a look once you've read this one.

Rsyslog for log processing

Getting to grips with Rsyslog can be somewhat confusing; there is a lot of information floating around covering several different versions, so some of the information you will find can be dated and/or irrelevant. Before diving into the specifics let me detail what I wanted to achieve with Rsyslog:

  • Ship logs to different endpoints if needed. In my case that meant different Logstash ports; I used one in the end but the option is still there. You can of course use the methods described here to ship logs to any receiver.
  • Ship syslog messages. Many applications can log straight to syslog, which makes life a lot easier.
  • Ship messages from any number of application log files, such as Symfony2 and Django logs as well as Nginx access logs.
  • Add details of the origin server to the log messages, such as the name/role and hostname/ip address of the origin server.
  • I wanted to enforce a standard format for log messages so I wouldn't need complicated filters to understand them.
  • Finally I wanted to be able to filter out unwanted messages such as ELB health checks in access logs.

You may ask why I didn't just use the Logstash Forwarder, since I am forwarding to Logstash. I find Rsyslog to be far more powerful and configurable. Honestly, I am not sure what the Logstash Forwarder brings to the table compared to Rsyslog except simplicity, though this will no doubt change as the project matures.

Picking up on the last point, Rsyslog can seem daunting. The documentation can be cryptic and, as mentioned, many of the examples out there cover a range of versions, so you will have to do a lot of reading in order to fully understand how it all works.

Before I get into detailed examples, here are some basic pointers for Rsyslog:

  • Ensure that you are using the latest version (version 8 at the time of writing). Rsyslog is bundled with many OSs, but you will most likely need to upgrade to the latest release. There is a PPA available for Ubuntu that gets you version 8.
  • The main configuration file is /etc/rsyslog.conf; general settings should go in there.
  • Extra config files are read from /etc/rsyslog.d/. Put single-purpose config files in here (e.g. one per log file), using the classic double-digit prefix pattern to ensure ordering.
  • Rsyslog comes with a selection of modules (input, output and others) that allow you to do various things, such as reading from a file or sending messages over TCP. Most of your setup will involve picking a module that fits your needs and then configuring it.
  • RainerScript is a DSL that you can use for conditionals and other dynamic things.
  • You can use Rsyslog to write logs to files locally and to ship them externally; you can even ship straight into Elasticsearch if you wish. By default Rsyslog just writes syslog and kernel messages to the various system log files in /var/log.
  • You can construct templates which define how log files are formatted.

The Rsyslog documentation is a good reference for all the available modules. It's worth reading through them to get an understanding of what you can do.

General Rsyslog settings

Let's set up some basic settings in Rsyslog's main config file to load some modules, configure queues and handle a few other bits:

# /etc/rsyslog.conf
module(load="imfile" PollingInterval="10")
module(load="imklog")
module(load="imuxsock")

$RepeatedMsgReduction off

$WorkDirectory /var/spool/rsyslog

$ActionQueueFileName mainqueue
$ActionQueueMaxDiskSpace 500M
$ActionQueueSaveOnShutdown on
$ActionQueueType LinkedList
$ActionResumeRetryCount -1

# Use standard RFC5424 log format for local logs
$ActionFileDefaultTemplate RSYSLOG_SyslogProtocol23Format

# Include extra config files
$IncludeConfig /etc/rsyslog.d/*.conf

Here I am setting up queuing, loading a few necessary modules and configuring some other bits that will become clearer later on; I normally put any custom templates into the global file as well. The imuxsock and imklog modules should always be loaded, since they are how Rsyslog receives local syslog messages and kernel messages respectively.

Shipping syslog messages

This one is easily done. Later on I will add a template that is more Logstash friendly, but for now let's just look at shipping syslog messages externally.

Rsyslog comes with some default config in /etc/rsyslog.d/ for sending system messages to the various /var/log files; the file is normally called 50-default.conf. Take a look in the default file and find the line that writes to /var/log/syslog. It should look something like this:

*.*;syslog;auth,authpriv.none /var/log/syslog

This line is a rule made up of a selector and an action. The selector is a semicolon-separated list of facility.priority entries: the facility says where a message comes from and the priority (level) says how severe it is. Here the wildcards in *.* match every facility at every priority, the middle entry names the syslog facility, and a priority of none excludes the facilities listed before it, so auth and authpriv messages are left out. The last part is the action, which simply writes the output to a file. To summarise: all syslog messages except auth and authpriv messages are written to /var/log/syslog.
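
To make the selector logic concrete, here is a rough Python model of how a list of facility.priority entries is evaluated. It is only a sketch of the classic syslog rules as I understand them, not Rsyslog's actual parser: the function name selector_matches is made up, the = and ! modifiers are ignored, and every entry is assumed to have both a facility and a priority part (as in the stock form *.*;auth,authpriv.none).

```python
# Severity names, most urgent first (standard syslog ordering).
SEVERITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]


def selector_matches(selector: str, facility: str, severity: str) -> bool:
    """Simplified model: is a message with this facility/severity selected?"""
    allowed = set()  # severities currently selected for our facility
    # Entries are applied from left to right.
    for entry in selector.split(";"):
        facilities, _, priority = entry.partition(".")
        names = facilities.split(",")
        if "*" not in names and facility not in names:
            continue  # this entry says nothing about our facility
        if priority == "none":
            allowed.clear()  # exclude the facility entirely
        elif priority == "*":
            allowed.update(SEVERITIES)  # every severity
        else:
            # A named priority selects that severity and anything more urgent.
            cutoff = SEVERITIES.index(priority)
            allowed.update(SEVERITIES[: cutoff + 1])
    return severity in allowed
```

With this model, selector_matches("*.*;auth,authpriv.none", "cron", "info") is true while the same call with the auth facility is false, which is the behaviour described above.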

Create a new file ordered before this one (e.g. /etc/rsyslog.d/49-ship-syslog.conf). We are going to add a similar line to the new file, but direct the output to a remote server over UDP, using the more detailed action object to describe what to do with the messages:

# /etc/rsyslog.d/49-ship-syslog.conf
*.*;syslog;auth,authpriv.none action(
  type="omfwd"
  Target="remote.server.com"
  Port="5001"
  Protocol="udp"
)

Now syslog messages are forwarded to remote.server.com on port 5001. We are using the omfwd module, which can forward messages via TCP or UDP. Note that the rule in the default syslog config file will still pick up messages and save them to /var/log/syslog. You can use Rsyslog's stop command to prevent further processing if you wish; this will come in handy later on, but we don't want it here.

It's worth noting that many languages and frameworks can log directly to syslog, so if you can update your apps to log directly to syslog instead of their own files then this may be the easiest option for remote logging.
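
As an illustration of an app logging straight to syslog, here is a minimal sketch using the SysLogHandler from Python's standard library. The helper name make_syslog_logger, the hello-app name and the choice of the local0 facility are all assumptions for the example; /dev/log is the usual local syslog socket on Linux.

```python
import logging
import logging.handlers


def make_syslog_logger(name: str = "hello-app", address="/dev/log") -> logging.Logger:
    """Return a logger that writes to syslog instead of its own log file.

    address is either the path of the local syslog socket or a
    (host, port) tuple, in which case messages are sent over UDP.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.handlers.SysLogHandler(
        address=address,
        facility=logging.handlers.SysLogHandler.LOG_LOCAL0,
    )
    # Prefix each message with the app name so the syslog daemon can use
    # it as the program name.
    handler.setFormatter(logging.Formatter(name + ": %(message)s"))
    logger.addHandler(handler)
    return logger
```

Calling make_syslog_logger().info("user signed in") hands the message to the local syslog daemon, which can then ship it like any other syslog message.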

Debugging Rsyslog

After restarting Rsyslog you should see syslog messages being shipped to your remote server. If you want to see exactly what Rsyslog is doing, add the following lines to rsyslog.conf (and restart the rsyslog service again):

# /etc/rsyslog.conf
$DebugFile /var/log/rsyslog-debug.log
$DebugLevel 2

Tail the debug file and then push some messages to syslog; you should see output detailing exactly what Rsyslog is doing with them. You can send messages to syslog using logger (for example: logger test message).
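
If the remote end isn't ready yet, a throwaway UDP listener is enough to confirm that messages are actually leaving the box. This is a minimal Python sketch rather than anything Rsyslog provides: receive_one is a made-up helper, and port 5001 is simply the port used in the forwarding examples here.

```python
import socket


def receive_one(port: int = 5001, host: str = "0.0.0.0", timeout: float = 30.0) -> str:
    """Wait for a single UDP datagram and return its payload as text."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)  # raises socket.timeout if nothing arrives
        sock.bind((host, port))
        data, _sender = sock.recvfrom(65535)
    return data.decode("utf-8", errors="replace")
```

Run print(receive_one()) on the receiving machine, then use logger on the sending machine; whatever Rsyslog ships is printed exactly as it arrived.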

Shipping messages from a log file

The imfile module can monitor log files. This is useful for applications that cannot log to syslog, such as Apache or Nginx access logs.

In this example let's monitor an Nginx access log:

# /etc/rsyslog.d/01-nginx-access.conf
input(type="imfile"
      File="/var/log/nginx/nginx-access.log"
      Tag="nginx-access"
)

if $programname == 'nginx-access' then {
    action(
        type="omfwd"
        Target="remote.server.com"
        Port="5001"
        Protocol="udp"
    )
    stop
}

Put file monitors into their own config file in /etc/rsyslog.d/; in this instance I am going to call the file 01-nginx-access.conf.

The first part of the config tells Rsyslog to monitor the access log file; you can read up on the imfile module for a list of all of the available config options. The Tag option assigns log messages from this file a programname, a variable that can be used to identify messages from this log source.

The second part of the config is a conditional that catches only messages from the access log by checking the programname variable, then sends them to a remote server and port using the omfwd module. The stop command indicates that no other processing should take place, which prevents access log messages from getting caught by other log rules (such as the default syslog rules).

This is the basic setup for monitoring a file; you can adapt it to monitor any file on a server.

Before going on, it's worth noting that you can prevent lines that contain certain pieces of text from getting shipped. In my case I want to drop AWS ELB health checks, so here's the revised config with filter conditions added:

# /etc/rsyslog.d/01-nginx-access.conf
input(type="imfile"
      File="/var/log/nginx/nginx-access.log"
      Tag="nginx-access"
)

if $programname == 'nginx-access' then {
    # Don't log ELB healthchecks
    :msg, contains, "ELB-HealthChecker" stop
    :msg, contains, "healthcheck" stop
    action(
        type="omfwd"
        Target="remote.server.com"
        Port="5001"
        Protocol="udp"
    )
    stop
}
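
The two filter conditions are plain case-sensitive substring checks, which is why both spellings are listed. A tiny Python sketch of the same decision, handy for checking sample lines before touching the config (should_ship and the sample text are made up for illustration):

```python
# Same markers as the two ":msg, contains" conditions in the config above.
HEALTHCHECK_MARKERS = ("ELB-HealthChecker", "healthcheck")


def should_ship(line: str) -> bool:
    """Return False for lines that the filter conditions would drop."""
    return not any(marker in line for marker in HEALTHCHECK_MARKERS)
```

For example, should_ship('"GET /healthcheck HTTP/1.1" 200') is false, while a line for a normal request is shipped.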

Formatting messages with templates

This is probably the most powerful feature of Rsyslog: you can build custom templates that format the messages you send to remote servers in precisely the way the receiver expects.

The Rsyslog docs describe how templates are built and also list some built-in ones. I am going to build a few custom ones for massaging log messages into a Logstash friendly format.

We want to ship in json so we can add some extra metadata fields to messages. Here's a very basic json format for shipping syslog rfc5424 formatted messages to Logstash:

{
    "type": "[eg. syslog, nginx-access-log]",
    "host": "[eg. ip-10-192-8-252 in aws]",
    "role": "[eg. api-hello-app]",
    "message": "[rfc5424 message]"
}

As you can see I have added in a type field which we can use to apply different filters in Logstash along with a role field that will contain a description of the sending machine, this could be something general like php-webserver or it could be the name of the application that the machine is hosting. The message field simply contains the 5424 formatted message itself for Logstash to parse. It's worth noting that the 5424 format includes a program part so it's possible to determine which application on the sending machine sent the message.
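
To show what one shipped line looks like on the wire, here is a small Python sketch that builds the same envelope. build_envelope is a hypothetical helper used only for illustration; the field names and the api-hello-app role come from the format above.

```python
import json
import socket
from typing import Optional


def build_envelope(message: str, log_type: str = "syslog",
                   role: str = "api-hello-app",
                   host: Optional[str] = None) -> str:
    """Wrap a raw log message in the json envelope, one document per line."""
    envelope = {
        "type": log_type,                      # selects the Logstash filter
        "host": host or socket.gethostname(),  # origin server
        "role": role,                          # what the machine is for
        "message": message,                    # the original log message
    }
    return json.dumps(envelope) + "\n"
```

The receiver only needs to parse each line as json and look at the type field to decide how to handle the message field.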

To ship in this format I need a template that does two things: it formats the message as rfc5424 and then wraps the result in the json format described above for Logstash. Here is my template for doing all of this:

# /etc/rsyslog.conf
$template jsonRfc5424Template,"{\"type\":\"syslog\",\"host\":\"%HOSTNAME%\",\"role\":\"api-hello-app\",\"message\":\"<%PRI%>1 %TIMESTAMP:::date-rfc3339% %HOSTNAME% %APP-NAME% %PROCID% %MSGID% %STRUCTURED-DATA% %msg:::json%\"}\n"

Here I have taken Rsyslog's built-in 5424 template and wrapped it in a Logstash friendly json format. Note the use of property replacers to massage the date field into rfc3339 format and the msg field into a json friendly format (see the :::date-rfc3339 and :::json parts); property replacers are essentially filters that you can use to format parts of a message. Since syslog messages are sent directly to Rsyslog, it can manipulate each part of the message in detail.
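
To see why the json escaping of the msg field matters, compare what happens when a message containing double quotes is dropped into a json string as-is. In this Python sketch json.dumps merely stands in for the :::json replacer (the exact escaping will differ in detail) and both render functions are made up for the comparison.

```python
import json


def render_naive(msg: str) -> str:
    """Insert the message as-is, like a template without :::json."""
    return '{"message":"' + msg + '"}'


def render_escaped(msg: str) -> str:
    """Escape the message first, as the :::json replacer is meant to."""
    return '{"message":' + json.dumps(msg) + "}"


def is_valid_json(text: str) -> bool:
    """Return True if the text parses as a json document."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True
```

For a message such as GET "/hello" failed, the naive rendering is rejected by a json parser while the escaped one round-trips intact.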

All that's needed now is to assign this template to the syslog messages that are shipped remotely. Here's the revised syslog shipping config:

# /etc/rsyslog.d/49-ship-syslog.conf
*.*;syslog;auth,authpriv.none action(
  type="omfwd"
  Target="remote.server.com"
  Port="5001"
  Protocol="udp"
  template="jsonRfc5424Template"
)

Note the addition of the template option, which tells Rsyslog to use our new json template when shipping messages.

This template is only really useful for syslog and kernel messages that are sent directly to Rsyslog. You might also want to ship messages from a file that is already in rfc5424 format, in which case you can use a template that simply puts the log message into the Logstash json format without any 5424 formatting, like so:

# /etc/rsyslog.conf
$template jsonSyslogTemplate,"{\"type\":\"syslog\",\"host\":\"%HOSTNAME%\",\"role\":\"api-hello-app\",\"message\":\"%rawmsg:::json%\"}\n"

Non rfc5424 log template

For things like access logs and other custom format log files you will need to set up a custom filter in Logstash. As mentioned before, the type field is used to decide which filter to apply in Logstash, so we need a template that sets the type field dynamically based on the programname, which is assigned by the Tag option of the imfile module.

Here's the template for this:

# /etc/rsyslog.conf
$template jsonLogTemplate,"{\"type\":\"%programname%\",\"host\":\"%HOSTNAME%\",\"role\":\"api-hello-app\",\"message\":\"%rawmsg:::json%\"}\n"

Here is a complete Nginx access logs example with the new template:

# /etc/rsyslog.d/01-nginx-access.conf
input(type="imfile"
      File="/var/log/nginx/nginx-access.log"
      Tag="nginx-access"
)

if $programname == 'nginx-access' then {
    # Don't log ELB healthchecks
    :msg, contains, "ELB-HealthChecker" stop
    :msg, contains, "healthcheck" stop
    action(
        type="omfwd"
        Target="remote.server.com"
        Port="5001"
        Protocol="udp"
        template="jsonLogTemplate"
    )
    stop
}

Setting up Logstash

Please check the official docs for info on how to install Logstash and the other parts of the ELK stack (Elasticsearch and Kibana).

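Once you've got Logstash installed it's pretty simple to configure it to receive logs from Rsyslog: pop your config files into /etc/logstash/conf.d/. I'm just going to create one TCP json input for shipping logs. Note that the input protocol has to match the sender, so either set Protocol="tcp" in the omfwd actions shown earlier or use a udp input here instead: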

# /etc/logstash/conf.d/01-input-json-tcp.conf
input {
	tcp {
		port => 5001
		codec => json
	}
}
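
To check the input without involving Rsyslog at all, you can push a single json line at it by hand. A minimal Python sketch, assuming Logstash is listening on TCP port 5001 as configured above; send_event and the localhost default are made up for the example.

```python
import json
import socket


def send_event(event: dict, host: str = "localhost", port: int = 5001) -> None:
    """Send one newline-terminated json document over TCP."""
    line = json.dumps(event) + "\n"
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(line.encode("utf-8"))
```

For example, send_event({"type": "syslog", "host": "test-box", "role": "api-hello-app", "message": "hello"}) should produce one event on the Logstash side.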

Now we need a filter for rfc5424 messages. Logstash doesn't support this format out of the box, but the logstash-patterns-core plugin adds support for it. You can install the plugin by running the following from your Logstash install dir (on newer Logstash releases the command is bin/logstash-plugin rather than bin/plugin):

# /opt/logstash
bin/plugin install logstash-patterns-core

Once this is installed you can then create a syslog filter like so:

# /etc/logstash/conf.d/20-filter-syslog.conf
filter {
	if [type] == "syslog" {
		grok {
			match => { "message" => "(?m)%{SYSLOG5424LINE}" }
		}
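		# The grok pattern stores the priority in syslog5424_pri, not the
		# syslog_pri field that this filter reads by default
		syslog_pri { syslog_pri_field_name => "syslog5424_pri" }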
		if !("_grokparsefailure" in [tags]) {
			mutate {
				replace => [ "message", "%{syslog5424_msg}" ]
				replace => [ "timestamp", "%{syslog5424_ts}" ]
				replace => [ "priority", "%{syslog5424_pri}" ]
				replace => [ "program", "%{syslog5424_app}" ]
				replace => [ "facility", "%{syslog_facility}" ]
				replace => [ "severity", "%{syslog_severity}" ]
				replace => [ "received_at", "%{@timestamp}" ]
			}
			mutate {
				remove_field => [ "syslog5424_host", "syslog5424_msg", "syslog5424_ts", "syslog5424_pri", "syslog5424_app", "syslog5424_proc", "syslog5424_ver", "syslog_facility", "syslog_facility_code" , "syslog_severity", "syslog_severity_code" ]
			}
		}
	}
}

Note the switch on the type field. The 5424 Logstash pattern prefixes all of its fields with syslog5424_ and the syslog_pri filter adds fields prefixed with syslog_. I'm not a big fan of that, so I have massaged the field names somewhat and removed some fields that I don't want.

Next is the filter for Nginx access logs. You will have to create a custom filter like this for any other non rfc5424 logs that you ship:

# /etc/logstash/conf.d/21-filter-nginx-access.conf
filter {
	if [type] == "nginx-access" {
		grok {
			match => [
				"message", "%{IPORHOST:http_host} %{IPORHOST:clientip} \[%{HTTPDATE:timestamp}\] \"(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})\" %{NUMBER:response} (?:%{NUMBER:bytes}|-) %{QS:referrer} %{QS:agent} %{NUMBER:request_time:float} %{NUMBER:upstream_time:float}"
			]
			add_field => [ "received_at", "%{@timestamp}" ]
			add_field => [ "index_name", "accesslogs" ]
		}
		date {
			match => [ "timestamp" , "dd/MMM/YYYY:HH:mm:ss Z" ]
		}
	}
}

Note that some fields are converted to numbers (the :float suffix in the grok pattern), which means that they can be used to produce graphs in Kibana later on.

As a side note here is the Nginx access log format with the request time included:

log_format main '$http_host '
    '$remote_addr [$time_local] '
    '"$request" $status $body_bytes_sent '
    '"$http_referer" "$http_user_agent" '
    '$request_time '
    '$upstream_response_time';
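
When the log format and the grok pattern drift apart, every line ends up tagged with _grokparsefailure, so it helps to test sample lines offline. Below is a rough Python regex that mirrors the log_format above. It is a sketch rather than a translation of the grok pattern: parse_access_line is a made-up helper, and the upstream time is allowed to be a dash because I assume Nginx writes one when no upstream was involved.

```python
import re
from typing import Optional

# One named group per variable in the log_format above.
ACCESS_LINE = re.compile(
    r'(?P<http_host>\S+) (?P<clientip>\S+) \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<response>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)" '
    r'(?P<request_time>[\d.]+) (?P<upstream_time>[\d.]+|-)'
)


def parse_access_line(line: str) -> Optional[dict]:
    """Split one access log line into fields, or return None if it doesn't fit."""
    match = ACCESS_LINE.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["request_time"] = float(fields["request_time"])  # numeric, for graphing
    return fields
```

Feeding it a few real lines from your own access log is a quick way to spot a format mismatch before Logstash does.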

Finally an output to Elasticsearch is needed:

# /etc/logstash/conf.d/90-output-elasticsearch.conf
output {
    elasticsearch {
        flush_size => '100'
        host => [ 'http://remote.elasticsearch:9200' ]
        index => 'logstash-%{+YYYY.MM.dd}'
    }
}

Done and done

So that covers how to set up shipping in Rsyslog and receiving with Logstash. Please ask any questions in the comments section below.

I've put together a collection of Rsyslog config snippets that contain more detailed examples (article on this coming soon).