EFK (Elasticsearch + Fluentd + Kibana) 容器部署问题
2025-06-08 10:00:00
ubuntu with docker
Fluentd 官方文档 Docker Compose
照搬原文关键内容,规避官方文档未来的更新
mkdir -p ~/docker/efk/fluentd/conf
~/docker/efk/docker-compose.yml
services:
web:
image: httpd
ports:
- "8080:80"
depends_on:
- fluentd
logging:
driver: "fluentd"
options:
fluentd-address: localhost:24224
tag: httpd.access
fluentd:
build: ./fluentd
volumes:
- ./fluentd/conf:/fluentd/etc
depends_on:
# Launch fluentd after that elasticsearch is ready to connect
elasticsearch:
condition: service_healthy
ports:
- "24224:24224"
- "24224:24224/udp"
elasticsearch:
image: docker.elastic.co/elasticsearch/elasticsearch:8.17.1
container_name: elasticsearch
hostname: elasticsearch
environment:
- discovery.type=single-node
- xpack.security.enabled=false # Disable security for testing
healthcheck:
# Check whether service is ready
test: ["CMD", "curl", "-f", "http://localhost:9200/_cluster/health"]
interval: 10s
retries: 5
timeout: 5s
ports:
- 9200:9200
kibana:
image: docker.elastic.co/kibana/kibana:8.17.1
depends_on:
# Launch fluentd after that elasticsearch is ready to connect
elasticsearch:
condition: service_healthy
ports:
- "5601:5601"
~/docker/efk/fluentd/Dockerfile
FROM fluent/fluentd:edge-debian
USER root
RUN ["gem", "install", "fluent-plugin-elasticsearch", "--no-document", "--version", "5.4.3"]
USER fluent
~/docker/efk/fluentd/conf/fluent.conf
<source>
@type forward
port 24224
bind 0.0.0.0
</source>
<match *.**>
@type copy
<store>
@type elasticsearch
host elasticsearch
port 9200
logstash_format true
logstash_prefix fluentd
logstash_dateformat %Y%m%d
include_tag_key true
type_name access_log
tag_key @log_name
flush_interval 1s
</store>
<store>
@type stdout
</store>
</match>
cd ~/docker/efk
docker compose up -d
若遇到这个问题
=> [fluentd internal] load build definition from Dockerfile
=> => transferring dockerfile: 208B
=> ERROR [fluentd internal] load metadata for docker.io/fluent/fluentd:edge-debian
------
[+] Running 0/1ernal] load metadata for docker.io/fluent/fluentd:edge-debian:
⠙ Service fluentd Building
failed to solve: fluent/fluentd:edge-debian: failed to resolve source metadata for docker.io/fluent/fluentd:edge-debian: failed to authorize: failed to fetch anonymous token: Get "https://auth.docker.io/token?scope=repository%3Afluent%2Ffluentd%3Apull&service=registry.docker.io": dial tcp 31.13.83.2:443: i/o timeout
请拉取镜像后再试
docker pull fluent/fluentd:edge-debian
若遇到这个问题
=> ERROR [fluentd 4/4] RUN ["gem", "install", "fluent-plugin-elasticsearch", "--no-document", "--version", "5.4.3"]
------
> [fluentd 4/4] RUN ["gem", "install", "fluent-plugin-elasticsearch", "--no-document", "--version", "5.4.3"]:
357.0 #<Thread:0x0000740b65dab788 /usr/local/lib/ruby/3.2.0/rubygems/request_set.rb:168 run> terminated with exception (report_on_exception is true)
357.0 /usr/local/lib/ruby/3.2.0/rubygems/remote_fetcher.rb:262:in `rescue in fetch_path': Errno::ECONNRESET: Connection reset by peer - SSL_connect (https://index.rubygems.org/gems/excon-1.2.7.gem) (Gem::RemoteFetcher::FetchError)
...
⠙ Service fluentd Building
failed to solve: process "gem install fluent-plugin-elasticsearch --no-document --version 5.4.3" did not complete successfully: exit code: 1
请添加换源命令后再试
RUN ["gem", "sources", "--add", "https://mirrors.aliyun.com/rubygems/", "--remove", "https://rubygems.org/"]
官方文档成文于 2025年1月 左右,查看 fluentd 容器日志应该会有如下报错
2025-06-01 02:27:00 +0000 [warn]: #0 failed to flush the buffer. retry_times=0 next_retry_time=2025-06-01 02:27:02 +0000 chunk="6363d083e93240666f8f0488a6eeef23" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"elasticsearch\", :port=>9200, :scheme=>\"http\"}): [400] {\"error\":{\"root_cause\":[{\"type\":\"media_type_header_exception\",\"reason\":\"Invalid media-type value on headers [Content-Type, Accept]\"}],\"type\":\"media_type_header_exception\",\"reason\":\"Invalid media-type value on headers [Content-Type, Accept]\",\"caused_by\":{\"type\":\"status_exception\",\"reason\":\"Content-Type version must be either version 8 or 7, but found 9. Content-Type=application/vnd.elasticsearch+x-ndjson; compatible-with=9\"}},\"status\":400}"
这是因为 gem 插件版本不匹配的缘故,可通过以下命令确认
docker exec -it efk-fluentd-1 fluent-gem list | grep elasticsearch
# elasticsearch (9.0.3)
# elasticsearch-api (9.0.3)
# fluent-plugin-elasticsearch (5.4.3)
请添加指定版本命令后再试
RUN ["gem", "install", "elasticsearch", "--no-document", "--version", "8.17.1"]
RUN ["gem", "install", "elasticsearch-api", "--no-document", "--version", "8.17.1"]
docker container rm -f efk-fluentd-1
docker image rm efk-fluentd
docker compose up -d
基于上面的成功,此时应该是能轻松完成以下版本升级
image: docker.elastic.co/elasticsearch/elasticsearch:8.17.1 => 9.0.1
image: docker.elastic.co/kibana/kibana:8.17.1 => 9.0.1
RUN ["gem", "install", "fluent-plugin-elasticsearch", "--no-document", "--version", "5.4.3"] => 6.0.0
RUN ["gem", "install", "elasticsearch", "--no-document", "--version", "8.17.1"] => 9.0.3 或删掉
RUN ["gem", "install", "elasticsearch-api", "--no-document", "--version", "8.17.1"] => 9.0.3 或删掉
以上修改完毕,清理后重新部署
docker compose down -v
docker image rm efk-fluentd
docker compose up -d
Elastic 官方文档 Install Elasticsearch with Docker
接下来以这个文档为准,生产环境部署日志聚合查看系统
docker network create elastic
docker run --name elasticsearch --network elastic -h elasticsearch -p 9200:9200 -d elasticsearch:9.0.1
docker exec -it elasticsearch /usr/share/elasticsearch/bin/elasticsearch-reset-password -u elastic
cd ~/docker/efk/fluentd
docker build -t efk-fluentd .
docker run --name fluentd --network elastic -p 24224:24224 -v ~/docker/efk/fluentd/conf:/fluentd/etc -d efk-fluentd
fluentd 应该会有如下报错
2025-06-01 06:25:53 +0000 [error]: #0 unexpected error error_class=Elastic::Transport::Transport::Error error="EOFError (EOFError)"
不好确定是哪里问题就去看 elasticsearch 容器日志,应该同时会有如下报错
{"@timestamp":"2025-06-01T06:25:53.037Z", "log.level": "WARN", "message":"received plaintext http traffic on an https channel, closing connection Netty4HttpChannel{localAddress=/172.18.0.2:9200, remoteAddress=/172.18.0.3:45556}", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[elasticsearch][transport_worker][T#1]","log.logger":"org.elasticsearch.http.netty4.Netty4HttpServerTransport","elasticsearch.cluster.uuid":"6FuH2MGVSI2rSFXEI8McWA","elasticsearch.node.id":"KJaQJwmZQwulm4SQfZAe_w","elasticsearch.node.name":"elasticsearch","elasticsearch.cluster.name":"docker-cluster"}
fluent.conf 增加如下配置后再试
<match *.**>
<store>
scheme https
user elastic
password xxxx
此时仍会有如下报错
2025-06-01 06:37:49 +0000 [error]: #0 unexpected error error_class=Elastic::Transport::Transport::Error error="SSL_connect returned=1 errno=0 peeraddr=172.18.0.2:9200 state=error: certificate verify failed (self-signed certificate in certificate chain) (OpenSSL::SSL::SSLError) Unable to verify certificate. This may be an issue with the remote host or with Excon. Excon has certificates bundled, but these can be customized:\n\n `Excon.defaults[:ssl_ca_path] = path_to_certs`\n `ENV['SSL_CERT_DIR'] = path_to_certs`\n `Excon.defaults[:ssl_ca_file] = path_to_file`\n `ENV['SSL_CERT_FILE'] = path_to_file`\n `Excon.defaults[:ssl_verify_callback] = callback`\n (see OpenSSL::SSL::SSLContext#verify_callback)\nor:\n `Excon.defaults[:ssl_verify_peer] = false` (less secure).\n"
此问题可通过以下两种方式解决
<match *.**>
<store>
ssl_verify false
其实更推荐下面这种
docker cp elasticsearch:/usr/share/elasticsearch/config/certs/http_ca.crt ~/docker/efk/fluentd/conf
<match *.**>
<store>
ca_file /fluentd/etc/http_ca.crt
但是仍会有如下报错
2025-06-01 07:47:32 +0000 [error]: #0 unexpected error error_class=Elastic::Transport::Transport::Error error="SSL_CTX_load_verify_file: system lib (OpenSSL::SSL::SSLError)"
2025-06-01 07:47:32 +0000 [error]: #0 /usr/local/bundle/gems/elastic-transport-8.4.0/lib/elastic/transport/transport/base.rb:324:in `rescue in perform_request'
这里是挂载的证书权限问题
docker run --name fluentd --network elastic --user 1000:1000 -p 24224:24224 -v ~/docker/efk/fluentd/conf:/fluentd/etc -d efk-fluentd
即便这里采用 ssl_verify false
没有权限的问题,后面 fluentd 想在挂载的目录里写文件还是会遇到
docker run --name kibana --network elastic -p 5601:5601 -d kibana:9.0.1
# 获取带验证码的访问链接
docker logs -f kibana
# 获取注册令牌
docker exec -it elasticsearch /usr/share/elasticsearch/bin/elasticsearch-create-enrollment-token -s kibana
docker exec -it fluentd bash -c "echo '{\"message\":\"hello\"}' | fluent-cat debug.log"
换源使用 tail 抓取服务日志
<source>
@type tail
path /fluentd/log/**/*.log
pos_file /fluentd/log/.pos
tag wolf.log
<parse>
@type none
</parse>
docker run --name fluentd --network elastic --user 1000:1000 -v ~/docker/efk/fluentd/conf:/fluentd/etc -v /data/log:/fluentd/log -d efk-fluentd
这里可以采用 ssl_verify false
的方式且去掉 --user 1000:1000
来启动容器,故意触发上面提到的权限报错
2025-06-01 08:56:50 +0000 [error]: #0 unexpected error error_class=Errno::EACCES error="Permission denied @ rb_sysopen - /fluentd/log/.pos"
2025-06-01 08:56:50 +0000 [error]: #0 /usr/local/bundle/gems/fluentd-1.16.9/lib/fluent/plugin/in_tail.rb:243:in `initialize'
权限报错作如下说明,默认 fluentd 容器使用新建的 fluent 999:999
用户,可以看这里,但想在挂载的目录写入文件,要么使用特权用户(不建议)root 0:0
,要么使用对挂载目录有权限的用户(建议)ubuntu 1000:1000
fluentd 容器部署在没有 --network elastic
的节点上
docker image save -o efk-fluentd.tar efk-fluentd
docker image load -i efk-fluentd.tar
docker run --name fluentd --user 1000:1000 --add-host elasticsearch:192.168.10.8 -v ~/docker/efk/fluentd/conf:/fluentd/etc -v /data/log:/fluentd/log -d efk-fluentd
不同节点上的配置可以按需调整 logstash_prefix