Kubernetes Logging and Monitoring: The Elasticsearch, Fluentd, and Kibana (EFK) Stack – Part 1: Fluentd Architecture and Configuration

This is a 3-part series on Kubernetes monitoring and logging:

  1. Requirements and recommended toolset 
  2. EFK Stack – Part 1: Fluentd Architecture and Configuration (this article)
  3. EFK Stack – Part 2: Elasticsearch Configuration

In the previous article, we discussed the proven components and architecture of a logging and monitoring stack for Kubernetes, composed of Fluentd, Elasticsearch, and Kibana.

In this article, we’ll dive deeper into best practices and configuration of fluentd.

What is fluentd?

Fluentd is an efficient log aggregator. It is written in Ruby and scales very well. For most small to medium-sized deployments, fluentd is fast and consumes relatively few resources. "Fluent Bit", a newer project from the creators of fluentd, claims to scale even better and has an even smaller resource footprint. For the purposes of this discussion, let's focus on fluentd, as it is more mature and more widely used.

How does fluentd work?

Fluentd scrapes logs from a given set of sources, processes them (converting them into a structured data format), and then forwards them to other services such as Elasticsearch, object storage, etc. Fluentd is especially flexible when it comes to integrations – it works with 300+ log storage and analytic services.

  1. Fluentd gets data from multiple sources.
  2. It structures and tags data.
  3. It then sends the data to multiple destinations, based on matching tags.
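The three steps above can be sketched as a toy model in plain Python (illustrative only, not fluentd's actual implementation or its exact pattern semantics): every event carries a tag, and each destination is chosen by matching that tag against a glob-style pattern.

```python
from fnmatch import fnmatch

# Toy model of fluentd's tag-based routing (illustrative only).
# Each entry pairs a glob-style pattern with a destination,
# like a <match> section in a fluentd config.
routes = [
    ("raw.kubernetes.*", "elasticsearch"),
    ("app.metrics",      "prometheus"),
    ("**",               "stdout"),  # catch-all, like <match **>
]

def route(tag):
    """Return the destination of the first pattern that matches the tag."""
    for pattern, destination in routes:
        if fnmatch(tag, pattern):
            return destination
    return None

print(route("raw.kubernetes.nginx"))  # elasticsearch
print(route("some.other.tag"))        # stdout (catch-all)
```

Note that `fnmatch` is only an approximation: in fluentd's real match patterns, `*` does not cross dot-separated tag parts while `**` does.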

(Figure: fluentd architecture)

Source Configuration in fluentd

For the purpose of this discussion, to capture all container logs on a Kubernetes node, the following source configuration is required:


<source>
  @id fluentd-containers.log
  @type tail
  path /var/log/containers/*.log
  pos_file /var/log/fluentd-containers.log.pos
  time_format %Y-%m-%dT%H:%M:%S.%NZ
  tag raw.kubernetes.*
  format json
  read_from_head true
</source>

  1. id: A unique identifier to reference this source. It can be used for further filtering and routing of structured log data.
  2. type: Built-in directive understood by fluentd. In this case, "tail" instructs fluentd to gather data by tailing files at a given location. Another example is "http", which instructs fluentd to collect data by accepting events pushed to an HTTP endpoint.
  3. path: Specific to type "tail". Instructs fluentd to collect all logs under the /var/log/containers directory. This is the location the docker daemon uses on a Kubernetes node to store stdout/stderr from running containers.
  4. pos_file: Used as a checkpoint. If the fluentd process restarts, it uses the position recorded in this file to resume log collection where it left off.
  5. tag: A custom string for matching sources to destinations/filters. fluentd matches source and destination tags to route log data.
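As an aside, a source need not tail files: the "http" input type mentioned above accepts events pushed over HTTP. A minimal sketch of such a source (the id, port, and bind address here are arbitrary examples, not part of the configuration above):

```
<source>
  @id http-input
  @type http
  port 9880
  bind 0.0.0.0
</source>
```

With this source, an event POSTed to the path /debug.test on port 9880 is ingested with tag debug.test.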

Routing Configuration in fluentd

Let's look at the config instructing fluentd to send logs to Elasticsearch:

<match **>
  @id elasticsearch
  @type elasticsearch
  @log_level info
  include_tag_key true
  type_name fluentd
  host "#{ENV['OUTPUT_HOST']}"
  port "#{ENV['OUTPUT_PORT']}"
  logstash_format true
  <buffer>
    @type file
    path /var/log/fluentd-buffers/kubernetes.system.buffer
    flush_mode interval
    retry_type exponential_backoff
    flush_thread_count 2
    flush_interval 5s
    retry_max_interval 30
    chunk_limit_size "#{ENV['OUTPUT_BUFFER_CHUNK_LIMIT']}"
    queue_limit_length "#{ENV['OUTPUT_BUFFER_QUEUE_LIMIT']}"
    overflow_action block
  </buffer>
</match>

  1. The "match" directive indicates a destination. It is followed by a pattern for matching source tags (glob-style, not a full regular expression). In this case, we want to capture all logs and send them to Elasticsearch, so we simply use **.
  2. id: A unique identifier for the destination.
  3. type: Identifier of the output plugin. In this case, we are using the Elasticsearch output plugin, provided by the fluent-plugin-elasticsearch gem, which ships with the common fluentd images for Kubernetes.
  4. log_level: Sets the verbosity of this plugin's own logging. With "info", the plugin emits its internal log messages at level info and above – info, warn, error, fatal.
  5. host/port: Elasticsearch host/port. Credentials can be configured as well, but are not shown here.
  6. logstash_format: The Elasticsearch service builds inverted indices on the log data forwarded by fluentd for searching; hence, it needs to interpret the data. By setting logstash_format to "true", fluentd forwards the structured log data in logstash format, which Elasticsearch understands.
  7. Buffer: fluentd allows a buffer configuration for the event that the destination becomes unavailable, e.g. if the network goes down or Elasticsearch is unreachable. Buffering also helps reduce disk activity by batching writes.
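The retry_type exponential_backoff setting in the buffer section above means failed flushes are retried with exponentially growing waits, capped by retry_max_interval. A minimal sketch of that schedule in plain Python (fluentd's actual implementation also randomizes the waits; this is just the shape of the curve):

```python
# Sketch of an exponential backoff schedule with a cap, mirroring
# retry_type exponential_backoff with retry_max_interval 30.
def backoff_schedule(retries, base=1.0, cap=30.0):
    """Wait times (seconds) for successive retries: base * 2^n, capped at `cap`."""
    return [min(base * (2 ** n), cap) for n in range(retries)]

print(backoff_schedule(7))  # [1.0, 2.0, 4.0, 8.0, 16.0, 30.0, 30.0]
```

The cap matters: without it, a destination that is down for an hour would push the next retry out by over an hour; with the cap, fluentd keeps probing every 30 seconds.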

Fluentd as Kubernetes Log Aggregator

To collect logs from a Kubernetes cluster, fluentd is deployed as a privileged DaemonSet, so that it can read logs from a location on each Kubernetes node. Kubernetes ensures that exactly one fluentd container is always running on each node in the cluster. For the impatient, you can simply deploy it as a Helm chart:

$ helm install stable/fluentd-elasticsearch
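If you want to see what such a deployment boils down to, the core of it is a DaemonSet whose pods mount the node's log directories. A trimmed sketch follows – the names, namespace, image tag, and env values are illustrative assumptions, not the chart's exact manifest:

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd            # illustrative name
  namespace: logging        # illustrative namespace
spec:
  selector:
    matchLabels:
      app: fluentd
  template:
    metadata:
      labels:
        app: fluentd
    spec:
      containers:
      - name: fluentd
        image: fluent/fluentd-kubernetes-daemonset:v1-debian-elasticsearch
        env:
        - name: OUTPUT_HOST   # consumed by the match section shown earlier
          value: elasticsearch.logging.svc.cluster.local
        - name: OUTPUT_PORT
          value: "9200"
        volumeMounts:
        - name: varlog
          mountPath: /var/log
          readOnly: true
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
```

The second mount is needed because files under /var/log/containers are symlinks into /var/lib/docker/containers on docker-based nodes.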

To summarize, fluentd is a highly scalable log aggregation solution. It provides a compelling option for log management in a Kubernetes cluster. In the next post, we will look at fluentd deployment along with Elasticsearch and Kibana for an end-to-end log management solution.

This is a 3-part series on Kubernetes monitoring and logging. Continue to Part 2: Elasticsearch Configuration.

Recommended Reading

  • Kubernetes Monitoring at Scale with Prometheus and Cortex
  • Comparing Fluentd vs. Logstash
