Kubernetes Logging and Monitoring: The Elasticsearch, Fluentd, and Kibana (EFK) Stack – Part 1: Fluentd Architecture and Configuration
This is a 3-part series on Kubernetes monitoring and logging:
- Requirements and recommended toolset
- EFK Stack – Part 1: Fluentd Architecture and Configuration (this article)
- EFK Stack – Part 2: Elasticsearch Configuration
In the previous article, we discussed the proven components and architecture of a logging and monitoring stack for Kubernetes, composed of Fluentd, Elasticsearch, and Kibana.
In this article, we’ll dive deeper into best practices and configuration of fluentd.
What is fluentd?
Fluentd is an efficient log aggregator. It is written in Ruby and scales very well. For most small- to medium-sized deployments, fluentd is fast and consumes relatively minimal resources. "Fluent Bit", a newer project from the creators of fluentd, claims to scale even better and has an even smaller resource footprint. For the purpose of this discussion, let's focus on fluentd, as it is more mature and more widely used.
How does fluentd work?
Fluentd scrapes logs from a given set of sources, processes them (converting them into a structured data format), and then forwards them to other services such as Elasticsearch or object storage. Fluentd is especially flexible when it comes to integrations: it works with 300+ log storage and analytics services.
- Fluentd gets data from multiple sources.
- It structures and tags data.
- It then sends the data to multiple destinations based on matching tags, as shown in the minimal example below.
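Here is a minimal, illustrative configuration that ties these steps together. The file path, tag, and stdout destination are placeholders for this sketch; the Kubernetes-specific configuration follows in the next sections.

# Collect and structure: tail a log file, parse each line as JSON, and tag records "app.access"
<source>
  @type tail
  path /var/log/app/access.log
  pos_file /var/log/fluentd-app-access.log.pos
  format json
  tag app.access
</source>

# Route: send every record whose tag matches "app.**" to stdout (handy for debugging)
<match app.**>
  @type stdout
</match>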
Source Configuration in fluentd
For the purpose of this discussion, to capture all container logs on a Kubernetes node, the following source configuration is required:
<source>
  @id fluentd-containers.log
  @type tail
  path /var/log/containers/*.log
  pos_file /var/log/fluentd-containers.log.pos
  time_format %Y-%m-%dT%H:%M:%S.%NZ
  tag raw.kubernetes.*
  format json
  read_from_head true
</source>
- id: A unique identifier to reference this source. It can be used for further filtering and routing of structured log data.
- type: A built-in directive understood by fluentd. In this case, "tail" instructs fluentd to gather data by tailing logs from a given location. Another example is "http", which makes fluentd accept log data pushed to an HTTP endpoint that it exposes.
- path: Specific to type "tail". Instructs fluentd to collect all logs under the /var/log/containers directory. This is where a Kubernetes node exposes the stdout/stderr of running containers, as log files written by the Docker daemon (see the sample record after this list).
- pos_file: Used as a checkpoint. If the fluentd process restarts, it uses the position recorded in this file to resume log collection where it left off.
- tag: A custom string for matching sources to destinations/filters. fluentd matches source and destination tags to route log data.
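Each line in those container log files is a JSON record written by Docker's json-file logging driver, which is why the source above specifies format json and the %Y-%m-%dT%H:%M:%S.%NZ time format. A sample record (the message and timestamp here are made up) looks like this:

{"log":"GET /healthz HTTP/1.1 200\n","stream":"stdout","time":"2019-05-10T09:12:34.123456789Z"}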
Routing Configuration in fluentd
Let's look at the configuration instructing fluentd to send logs to Elasticsearch:
<match **>
  @id elasticsearch
  @type elasticsearch
  @log_level info
  include_tag_key true
  type_name fluentd
  host "#{ENV['OUTPUT_HOST']}"
  port "#{ENV['OUTPUT_PORT']}"
  logstash_format true
  <buffer>
    @type file
    path /var/log/fluentd-buffers/kubernetes.system.buffer
    flush_mode interval
    retry_type exponential_backoff
    flush_thread_count 2
    flush_interval 5s
    retry_forever
    retry_max_interval 30
    chunk_limit_size "#{ENV['OUTPUT_BUFFER_CHUNK_LIMIT']}"
    queue_limit_length "#{ENV['OUTPUT_BUFFER_QUEUE_LIMIT']}"
    overflow_action block
  </buffer>
</match>
- "match": Indicates a destination. It is followed by a pattern that is matched against source tags. In this case, we want to capture all logs and send them to Elasticsearch, so we simply use **, which matches every tag.
- id: Unique identifier of the destination.
- type: Supported output plugin identifier. In this case, we are using the Elasticsearch output plugin, which is bundled with the standard fluentd images for Kubernetes.
- log_level: Sets the verbosity of this plugin's own logging. With "info", messages at level INFO and above (INFO, WARN, ERROR) are written to fluentd's log; it does not filter the log records being forwarded to Elasticsearch.
- host/port: Elasticsearch host and port, injected here from environment variables (see the snippet after this list). Credentials can be configured as well, but are not shown here.
- logstash_format: The Elasticsearch service builds inverted indices on the log data forwarded by fluentd so that it can be searched, which means it needs to interpret the data. Setting logstash_format to "true" makes fluentd forward the structured log data in Logstash format (logstash-* index names with an @timestamp field), which Elasticsearch understands.
- buffer: fluentd allows a buffer configuration for the case where the destination becomes unavailable, e.g. if the network goes down or Elasticsearch has an outage. Buffering also improves throughput by batching writes to the destination.
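The "#{ENV[...]}" expressions in the match block are resolved from the fluentd container's environment at startup. As a rough sketch, they could be supplied in the DaemonSet's container spec like this (the service name and values below are assumptions, not part of the configuration shown above):

env:
  - name: OUTPUT_HOST
    value: "elasticsearch-logging"   # assumed name of the Elasticsearch Service
  - name: OUTPUT_PORT
    value: "9200"
  - name: OUTPUT_BUFFER_CHUNK_LIMIT
    value: "2M"
  - name: OUTPUT_BUFFER_QUEUE_LIMIT
    value: "8"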
Fluentd as Kubernetes Log Aggregator
To collect logs from a Kubernetes cluster, fluentd is deployed as a privileged DaemonSet, so that it can read logs from the relevant locations on each Kubernetes node. Kubernetes ensures that exactly one fluentd pod is running on each node in the cluster (a trimmed-down manifest is sketched below the command). For the impatient, you can simply deploy it as a Helm chart:
$ helm install stable/fluentd-elasticsearch
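For reference, here is a trimmed-down sketch of what such a DaemonSet looks like. The image tag and labels are illustrative, and the chart above generates a far more complete manifest (RBAC, tolerations, and a ConfigMap carrying the fluentd configuration); the key points are the privileged security context and the hostPath mounts that give fluentd access to the node's container logs.

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: fluentd
  template:
    metadata:
      labels:
        app: fluentd
    spec:
      containers:
      - name: fluentd
        # illustrative image; the Helm chart pins its own
        image: fluent/fluentd-kubernetes-daemonset:v1-debian-elasticsearch
        securityContext:
          privileged: true
        volumeMounts:
        - name: varlog
          mountPath: /var/log                     # /var/log/containers/*.log lives here
        - name: dockercontainers
          mountPath: /var/lib/docker/containers   # targets of the symlinks in /var/log/containers
          readOnly: true
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: dockercontainers
        hostPath:
          path: /var/lib/docker/containers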
To summarize, fluentd is a highly scalable log aggregation solution. It provides a compelling option for log management in a Kubernetes cluster. In the next post, we will look at a fluentd deployment along with Elasticsearch and Kibana for an end-to-end log management solution.
This is a 3-part series on Kubernetes monitoring and logging. Continue to:
- EFK Stack – Part 2: Elasticsearch Configuration
Recommended Reading
- Kubernetes Monitoring at Scale with Prometheus and Cortex
- Comparing Fluentd vs. Logstash