Shortly after co-founding Treasure Data, Inc., Sada found that a lot of data (especially time-series data) was not being used effectively because there was no easy way to collect it reliably. He wanted to invent an ‘easy’ and ‘flexible’ solution to unify data collection around a machine-readable format. That’s how Fluentd was born. Treasure Data is a primary sponsor of the Fluentd project. The first commit was made in June 2011, and since it was open-sourced in October 2011 the project has grown dramatically: dozens of contributors, hundreds of community-contributed plugins, thousands of users, and trillions of events collected, filtered, and stored. Masahiro “Masa” Nakagawa is currently the main maintainer.
Installing Fluentd is pretty straightforward, so I won’t repeat the steps here. The installation guide is here: http://docs.fluentd.org/v0.12/categories/installation Be sure to read the pre-installation notes first: http://docs.fluentd.org/articles/before-install
So you’ve read up on Fluentd and decided that you like it, and now you want to see it in action. A Fluentd configuration generally contains two kinds of sections: a source, which defines where the log stream comes from, and a match, which decides what to do with the log stream once it arrives. The two examples below show how they work together.
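Before the real configuration, here is a rough mental model of the source and match idea in Python. It is only a sketch: the regex is a simplified stand-in for Fluentd's apache2 format rather than the exact one, the sample log line is made up, and the function names source and route are mine, not Fluentd's.

```python
import re

# Simplified stand-in for the predefined apache2 format (not Fluentd's exact regex).
APACHE_LINE = re.compile(
    r'^(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<code>\d{3}) (?P<size>\S+)'
)


def source(line, tag="apache.access"):
    """Play the role of a source: turn one raw log line into a tagged event."""
    found = APACHE_LINE.match(line)
    if found is None:
        return None  # unparseable lines are simply skipped in this sketch
    return tag, found.groupdict()


def route(event, matches):
    """Play the role of the match sections: pass the record to the first handler whose tag fits."""
    tag, record = event
    for pattern, handler in matches:
        if pattern == tag:  # exact comparison only; real Fluentd also supports wildcards
            handler(record)
            return True
    return False


# Hypothetical access log line, for illustration only.
line = '192.168.0.1 - - [28/Feb/2013:12:00:00 +0900] "GET / HTTP/1.1" 200 777'
event = source(line)
stored = []  # stands in for the output file
route(event, [("apache.access", stored.append)])
print(event[0], stored[0]["path"], stored[0]["code"])
```

The point is just the division of labour: the source parses and tags, and the match picks events up by tag and decides where they go.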
Find your Fluentd configuration file (located at /etc/td-agent/td-agent.conf) and add the following lines to it.
```
<source>
  @type tail # We are using the in_tail module of Fluentd
  path /var/log/httpd/access_log # Location of Apache access log
  pos_file /var/log/td-agent/httpd-access.log.pos # Where to record file position
  tag apache.access # Fluentd tag. This is used in the match section to process logs further.
  format apache2 # apache2 is a predefined format. You can add your own regex here.
</source>

<match apache.access>
  @type file # We are using the out_file module of Fluentd. This will store the incoming stream in a file.
  path /tmp/my-fluent-app
  time_slice_format %Y%m%d
  time_slice_wait 10m
  time_format %Y%m%dT%H%M%S%z
  utc
</match>
```
Then restart Fluentd.
If everything is fine, you should have a new file in your /tmp directory where the Fluentd logs show up in JSON format. If that’s not the case, make sure you send some traffic to Apache so that some access logs are generated. If that still fails, check the Fluentd logs.
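If you want to peek at the stored events from a script, something like the following might do. It assumes out_file's default line layout of time, tag, and a JSON record separated by tabs, and the glob pattern is only my guess based on the path setting above, so adjust it to whatever file names you actually see in /tmp.

```python
import glob
import json


def parse_out_file_line(line):
    """Split one output line into (time, tag, record), assuming tab-separated fields."""
    time_str, tag, payload = line.rstrip("\n").split("\t", 2)
    return time_str, tag, json.loads(payload)


def read_events(pattern="/tmp/my-fluent-app*"):
    """Yield events from every file matching the glob (the pattern is an assumption)."""
    for filename in sorted(glob.glob(pattern)):
        with open(filename, encoding="utf-8") as handle:
            for line in handle:
                if line.strip():
                    yield parse_out_file_line(line)


if __name__ == "__main__":
    for time_str, tag, record in read_events():
        print(time_str, tag, record.get("path"), record.get("code"))
```

Nothing is printed if no file matches, which is itself a hint that no events have been written yet.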
The second example forwards the Apache access logs from one server to another. Add the following configuration to the first server.
```
<source>
  @type tail
  path /var/log/httpd/access_log # ...or where you placed your Apache access log
  pos_file /var/log/td-agent/httpd-access.log.pos # This is where you record file position
  tag apache.access # Fluentd tag!
  format apache2 # Do you have a custom format? You can write your own regex.
</source>

<match apache.access>
  @type forward
  send_timeout 1s
  recover_wait 10s
  heartbeat_interval 1s
  phi_threshold 16
  hard_timeout 60s

  <server>
    name box1 # Name of second server used in logs
    host 10.1.1.61 # IP address for second server
    port 24224 # Port of second server where logs are forwarded. This port should be open on second server for both TCP and UDP.
  </server>
</match>
```
Add the following to the second server.
```
<source>
  @type forward
  port 24224
</source>

<match *.**>
  @type file
  path /tmp/my-fluent-app
  time_slice_format %Y%m%d
  time_slice_wait 10m
  time_format %Y%m%dT%H%M%S%z
  utc
</match>
```
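The match pattern on the second server uses wildcards instead of a literal tag. As far as I understand the documented rules, * stands for exactly one tag part and ** for zero or more tag parts. Here is a small Python sketch of that rule. It is my own simplified reimplementation, not Fluentd's code, and it ignores other pattern features such as {a,b} alternatives.

```python
def tag_matches(pattern, tag):
    """Return True if a dot-separated tag fits a match pattern built from * and **."""

    def walk(pattern_parts, tag_parts):
        if not pattern_parts:
            return not tag_parts  # both must run out at the same time
        head, rest = pattern_parts[0], pattern_parts[1:]
        if head == "**":
            # ** may absorb zero or more tag parts
            return any(walk(rest, tag_parts[i:]) for i in range(len(tag_parts) + 1))
        if not tag_parts:
            return False
        if head == "*" or head == tag_parts[0]:
            return walk(rest, tag_parts[1:])  # * or a literal consumes exactly one part
        return False

    return walk(pattern.split("."), tag.split("."))


print(tag_matches("apache.access", "apache.access"))  # True
print(tag_matches("*.**", "apache.access"))           # True
print(tag_matches("a.*", "a.b.c"))                    # False
print(tag_matches("a.**", "a.b.c"))                   # True
```

So the forwarded apache.access events are picked up by the second server even though its match section never mentions that tag by name.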
Restart Fluentd on both machines, and you should see events forwarded from the first Fluentd instance to the second one. Be sure to generate some traffic so that logs are produced. As always, you can check the logs for any problems. Be sure to open the Fluentd port (24224 in this example) for both TCP and UDP on the second server.
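If nothing arrives, a quick way to rule out a firewall problem is to test the TCP side of the port from the first server. This is a minimal sketch using only Python's standard library. The helper name tcp_port_open is mine, and it says nothing about UDP, which has no handshake to test in the same way, so you would still need to check your firewall rules for that.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Running tcp_port_open("10.1.1.61", 24224) on the first server, with the address and port taken from the forward configuration, should return True once the second server is listening and the port is open.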
The above two examples shouldn’t be used in production, but they should get you started.
And that’s it! Let me know what you think.
Have fun using Fluentd :)