Logging is an important part of the observability and operations requirements of any large-scale distributed system. Kubernetes is such a system, and with the growth of microservices applications, logging is more critical than ever for monitoring and troubleshooting.
There are multiple log aggregators and analysis tools in the DevOps space, but two dominate Kubernetes logging: Fluentd and Logstash from the ELK stack.
Both log aggregators address the same DevOps needs but differ in their approach, which makes one preferable to the other depending on your use case.
This article compares these log collectors against a set of critical features and capabilities. It also discusses which solution is preferable for different types of applications or environments.
The logging stack components
Log analysis can’t be done without log collectors. But to ensure the logging process is managed correctly, we need a logging stack. A logging stack is a set of components working together to ensure proper logging management.
Standard components of a logging stack are:
- Log exporter (configures logs per host)
- Log collector (listens for log input)
- Log storage
- Log visualization
As we already saw, Fluentd and Logstash are log collectors. How do they interact in the logging stack?
Let’s first get acquainted with these tools.
Elasticsearch is a distributed search engine. Raw data flows into Elasticsearch from different types of sources, including logs, system metrics, and web applications. Data ingestion is the process by which this raw data is parsed, normalized, and enriched before it is indexed in Elasticsearch. Once the data is indexed, users can run queries against it and use aggregations to retrieve summaries. With Kibana, users can create powerful visualizations of their data, share dashboards, and manage the Elastic Stack.
Logstash is the open-source data collection engine of the ELK stack, with real-time pipelining capabilities. All components of Logstash are available under the Apache 2.0 license.
Logstash can unify data from disparate sources dynamically and also normalize the data into destinations of your choice. Here is a great tutorial on configuring the ELK stack with Kubernetes.
Fluentd, like Logstash in the ELK stack, is an open-source data collector that lets you unify data collection and consumption for better insight into your data. Fluentd scrapes logs from a given set of sources, processes them (converting them into a structured data format), and then forwards them to other services such as Elasticsearch or object storage. Fluentd also works together with Elasticsearch and Kibana; this combination is known as the EFK stack.
We’ve previously covered the Fluentd architecture, and you can also follow the tutorial for setting it up along with Elasticsearch for Kubernetes logging. The Platform9 Managed Kubernetes solution also includes Managed Prometheus and Fluentd, so that you can consume these as a service, with 99% SLA on any environment.
Comparing Logstash and Fluentd
Let’s now compare the two tools against important DevOps features and capabilities.
Both tools run on Windows and Linux.
Event routing is an important feature of a log collector. Logstash and Fluentd are different in their approach concerning event routing.
Logstash routes events with conditionals: you define criteria in if/else statements, and those criteria determine which actions are performed on the data.
Fluentd routes events on tags. Every input (source) must be tagged; Fluentd then matches each event's tag against the configured outputs and sends the event to the corresponding output.
From our experience, tagging events is much easier than using if-then-else for each event type, so Fluentd has an advantage here.
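To make the difference concrete, here is a minimal Python sketch of the two routing styles. It is a simplified model, not code from either tool: the event fields, tag names, and output names are hypothetical, and the tag matcher only imitates Fluentd's documented wildcard behavior, where `*` stands for exactly one tag part and `**` for zero or more.

```python
from typing import Dict, List, Tuple

Event = Dict[str, str]


def route_with_conditionals(event: Event) -> str:
    """Conditional routing: each event's fields are tested in if/else branches."""
    if event.get("type") == "nginx" and event.get("level") == "error":
        return "alerts"
    elif event.get("type") == "nginx":
        return "web-logs"
    else:
        return "default"


def tag_matches(pattern: str, tag: str) -> bool:
    """Simplified tag matching: '*' is one tag part, '**' is zero or more parts."""
    def match(p: List[str], t: List[str]) -> bool:
        if not p:
            return not t
        head, rest = p[0], p[1:]
        if head == "**":
            # Let '**' absorb 0..len(t) parts and try to match the remainder.
            return any(match(rest, t[i:]) for i in range(len(t) + 1))
        if not t:
            return False
        return (head == "*" or head == t[0]) and match(rest, t[1:])

    return match(pattern.split("."), tag.split("."))


# Ordered (pattern, output) pairs; the first matching pattern wins.
ROUTES: List[Tuple[str, str]] = [
    ("web.nginx.error", "alerts"),
    ("web.**", "web-logs"),
    ("**", "default"),
]


def route_with_tags(tag: str) -> str:
    """Tag routing: the event's tag alone decides the output."""
    for pattern, output in ROUTES:
        if tag_matches(pattern, tag):
            return output
    raise LookupError(f"no route for tag {tag!r}")
```

With tags, adding a new event type usually means adding one entry to the route table, whereas the conditional style grows another branch that inspects the event's fields.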
Plugins extend the tool’s functionality. Both tools are flexible and work with hundreds of integrations for analytics and storage solutions.
The Logstash plugin ecosystem is centralized under a single GitHub repository. Fluentd has an official repository, but most of its plugins are hosted elsewhere. Whether you prefer to find and manage plugins in one central place (Logstash) or across several places (Fluentd) is a matter of preference. Efficiency-wise, a centralized place is usually preferable.
Inputs (such as files, syslog, and data stores) are used to get data into Logstash. Logstash is limited to an in-memory queue that holds 20 events and therefore relies on an external queue, such as Redis, for persistence across restarts. Redis is often used as a “broker” in a centralized Logstash installation, queueing events from remote Logstash “shippers”.
This means that with Logstash you need to install and configure an additional tool to get data in reliably. That extra dependency adds complexity to the system and can increase the risk of failure. This is not the case with Fluentd, which collects its data independently and has a configurable in-memory or on-disk buffering system. Fluentd is therefore ‘safer’ than Logstash with regard to data transport.
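The practical difference between a small in-memory queue and an on-disk buffer can be illustrated with a short Python sketch. This is not how either tool implements buffering; the class names and the JSON-lines file layout are assumptions made for illustration, and the limit of 20 simply mirrors the figure quoted above.

```python
import json
import os
import tempfile
from collections import deque
from typing import Deque, Dict, List


class MemoryBuffer:
    """Bounded in-memory buffer: contents vanish when the process exits."""

    def __init__(self, limit: int = 20) -> None:
        self.limit = limit
        self.events: Deque[Dict] = deque()

    def put(self, event: Dict) -> bool:
        if len(self.events) >= self.limit:
            return False  # full: the caller must wait, retry, or drop
        self.events.append(event)
        return True


class FileBuffer:
    """Append-only on-disk buffer: events can be re-read after a restart."""

    def __init__(self, path: str) -> None:
        self.path = path

    def put(self, event: Dict) -> bool:
        with open(self.path, "a", encoding="utf-8") as fh:
            fh.write(json.dumps(event) + "\n")
            fh.flush()
            os.fsync(fh.fileno())  # ask the OS to persist the write
        return True

    def pending(self) -> List[Dict]:
        if not os.path.exists(self.path):
            return []
        with open(self.path, encoding="utf-8") as fh:
            return [json.loads(line) for line in fh if line.strip()]


def simulate_restart() -> Dict[str, int]:
    """Send 25 events to each buffer, then 'restart' by creating new objects."""
    path = os.path.join(tempfile.mkdtemp(), "buffer.jsonl")
    mem, disk = MemoryBuffer(limit=20), FileBuffer(path)
    accepted = sum(mem.put({"n": i}) for i in range(25))
    for i in range(25):
        disk.put({"n": i})
    mem, disk = MemoryBuffer(limit=20), FileBuffer(path)  # simulated restart
    return {
        "memory_accepted": accepted,
        "memory_after_restart": len(mem.events),
        "disk_after_restart": len(disk.pending()),
    }
```

In this sketch the in-memory buffer refuses events once it is full and is empty after the simulated restart, while the file-backed buffer still holds every event. That gap is what an external broker such as Redis fills in the Logstash setup described above.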
Performance and high-volume logging
While performance depends on your particular use case, Logstash is generally known to consume more memory than Fluentd. Fluentd is an efficient log aggregator. It is written in a combination of Ruby and C, and scales very well. For most small to medium-sized deployments, Fluentd is fast and consumes relatively few resources.
Fluentd uses Ruby and RubyGems to install and manage its 500+ plugins. Because Ruby is an interpreted language, Fluentd relies on C extensions for parsing log files and forwarding data to provide the necessary speed. However, with very large volumes of ingested logs, performance problems can still be expected, since so much of the work depends on those C extensions (extra code alongside the Ruby code).
Fluent Bit is recommended for small or embedded applications. It is implemented primarily in C, yet it can provide the functionality you need while meeting performance expectations.
Elastic Beats is the lightweight variant of Logstash. However, if your use case goes beyond mere data transport and also requires data pulling and aggregation, then you need both Logstash and Elastic Beats.
The components for log parsing differ per logging tool. Fluentd ships with standard built-in parsers (JSON, regex, CSV, etc.), while Logstash uses plugins for parsing. This favors Fluentd, because it does not need extra plugins installed; extra plugins make the architecture more complex and more prone to errors.
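As a rough illustration of what such parsers do, whichever tool provides them, the following Python sketch turns raw log lines in the three formats mentioned above into structured records. The line layout matched by the regex and the CSV column order are assumptions chosen for the example, not formats defined by Fluentd or Logstash.

```python
import csv
import json
import re
from typing import Dict, Optional

# Hypothetical line layout for the regex example: "<LEVEL> <message>"
LEVEL_LINE = re.compile(r"^(?P<level>[A-Z]+)\s+(?P<message>.*)$")
CSV_FIELDS = ["time", "level", "message"]  # assumed column order


def parse_json(line: str) -> Optional[Dict[str, str]]:
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return None
    return record if isinstance(record, dict) else None


def parse_regex(line: str) -> Optional[Dict[str, str]]:
    m = LEVEL_LINE.match(line)
    return m.groupdict() if m else None


def parse_csv(line: str) -> Optional[Dict[str, str]]:
    row = next(csv.reader([line]), None)
    if row is None or len(row) != len(CSV_FIELDS):
        return None
    return dict(zip(CSV_FIELDS, row))


PARSERS = {"json": parse_json, "regex": parse_regex, "csv": parse_csv}


def parse(line: str, fmt: str) -> Optional[Dict[str, str]]:
    """Turn one raw log line into a structured record, or None if it does not fit."""
    return PARSERS[fmt](line)
```

For example, `parse("ERROR disk full", "regex")` yields a record with separate `level` and `message` fields, which is the kind of structured event a collector forwards to storage.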
Docker has a built-in logging driver for Fluentd, but doesn’t have one for Logstash. With Fluentd, no extra agent is required on the container to push logs to Fluentd. Logs are shipped directly from STDOUT to the Fluentd service, without requiring an extra log file.
Logstash requires an additional agent (Filebeat) to read the application logs from STDOUT before they can be sent to Logstash.
Thus, when using Docker containers, Fluentd is the preferred candidate, as it makes the architecture less complex, which in turn reduces the risk of logging mistakes.
Container metrics data collection
Both tools have vendors offering enterprise support. However, Logstash is part of the ELK stack and, when used with Elasticsearch and Kibana, may offer a better enterprise support experience.
Logstash vs. Fluentd: Which one to use for Kubernetes?
Data logging can be divided into two areas: event and error logging. Both Fluentd and Logstash can handle both logging types and can be used for different use cases, and even co-exist in your environments for logging both VMs/legacy applications as well as Kubernetes-based microservices.
For Kubernetes environments, Fluentd seems the ideal candidate thanks to the built-in Docker logging driver and its built-in parsers, which mean that no extra agent has to be present on the container to push logs to Fluentd. Compared with Logstash, this makes the architecture less complex and reduces the risk of logging mistakes. The fact that Fluentd, like Kubernetes, is a CNCF project is an added bonus!