
EFK (Elasticsearch + Fluentd + Kibana) Container Deployment Issues

2025-06-08 10:00:00


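Ubuntu with Docker

Fluentd official documentation: Docker Compose

The key parts of the original are copied here verbatim, so this post stays usable if the official documentation changes later.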

mkdir -p ~/docker/efk/fluentd/conf

~/docker/efk/docker-compose.yml

services:
  web:
    image: httpd
    ports:
      - "8080:80"
    depends_on:
      - fluentd
    logging:
      driver: "fluentd"
      options:
        fluentd-address: localhost:24224
        tag: httpd.access

  fluentd:
    build: ./fluentd
    volumes:
      - ./fluentd/conf:/fluentd/etc
    depends_on:
      # Launch fluentd only after elasticsearch is ready to accept connections
      elasticsearch:
        condition: service_healthy
    ports:
      - "24224:24224"
      - "24224:24224/udp"

  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.17.1
    container_name: elasticsearch
    hostname: elasticsearch
    environment:
      - discovery.type=single-node
      - xpack.security.enabled=false # Disable security for testing
    healthcheck:
      # Check whether service is ready
      test: ["CMD", "curl", "-f", "http://localhost:9200/_cluster/health"]
      interval: 10s
      retries: 5
      timeout: 5s
    ports:
      - 9200:9200

  kibana:
    image: docker.elastic.co/kibana/kibana:8.17.1
    depends_on:
      # Launch kibana only after elasticsearch is ready to accept connections
      elasticsearch:
        condition: service_healthy
    ports:
      - "5601:5601"

~/docker/efk/fluentd/Dockerfile

FROM fluent/fluentd:edge-debian
USER root
RUN ["gem", "install", "fluent-plugin-elasticsearch", "--no-document", "--version", "5.4.3"]
USER fluent

~/docker/efk/fluentd/conf/fluent.conf

<source>
  @type forward
  port 24224
  bind 0.0.0.0
</source>

<match *.**>
  @type copy

  <store>
    @type elasticsearch
    host elasticsearch
    port 9200
    logstash_format true
    logstash_prefix fluentd
    logstash_dateformat %Y%m%d
    include_tag_key true
    type_name access_log
    tag_key @log_name
    flush_interval 1s
  </store>

  <store>
    @type stdout
  </store>
</match>

Start the stack:

cd ~/docker/efk
docker compose up -d
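The compose healthcheck above polls /_cluster/health every 10s with up to 5 retries before fluentd and kibana are allowed to start. The same wait can be sketched from the host with a small retry helper. `retry` is a hypothetical name (not part of Docker or curl), and the URL in the usage comment assumes the 9200 port mapping from the compose file.

```shell
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND until it succeeds or ATTEMPTS is exhausted.
retry() {
  attempts=$1
  delay=$2
  shift 2
  n=1
  while [ "$n" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    n=$((n + 1))
    # Do not sleep after the final failed attempt
    if [ "$n" -le "$attempts" ]; then
      sleep "$delay"
    fi
  done
  return 1
}

# Usage (assumes Elasticsearch is published on localhost:9200):
# retry 5 10 curl -fs http://localhost:9200/_cluster/health
```

If the helper returns non-zero, Elasticsearch never became healthy, and the fluentd and kibana containers will not have been started either.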

If you hit this error:

 => [fluentd internal] load build definition from Dockerfile
 => => transferring dockerfile: 208B
 => ERROR [fluentd internal] load metadata for docker.io/fluent/fluentd:edge-debian
------
[+] Running 0/1
 ⠙ Service fluentd  Building
failed to solve: fluent/fluentd:edge-debian: failed to resolve source metadata for docker.io/fluent/fluentd:edge-debian: failed to authorize: failed to fetch anonymous token: Get "https://auth.docker.io/token?scope=repository%3Afluent%2Ffluentd%3Apull&service=registry.docker.io": dial tcp 31.13.83.2:443: i/o timeout

Pull the image manually, then retry:

docker pull fluent/fluentd:edge-debian

If you hit this error:

 => ERROR [fluentd 4/4] RUN ["gem", "install", "fluent-plugin-elasticsearch", "--no-document", "--version", "5.4.3"]
------
 > [fluentd 4/4] RUN ["gem", "install", "fluent-plugin-elasticsearch", "--no-document", "--version", "5.4.3"]:
357.0 #<Thread:0x0000740b65dab788 /usr/local/lib/ruby/3.2.0/rubygems/request_set.rb:168 run> terminated with exception (report_on_exception is true)
357.0 /usr/local/lib/ruby/3.2.0/rubygems/remote_fetcher.rb:262:in `rescue in fetch_path': Errno::ECONNRESET: Connection reset by peer - SSL_connect (https://index.rubygems.org/gems/excon-1.2.7.gem) (Gem::RemoteFetcher::FetchError)
...
 ⠙ Service fluentd  Building
failed to solve: process "gem install fluent-plugin-elasticsearch --no-document --version 5.4.3" did not complete successfully: exit code: 1

Add a command to the Dockerfile that switches the gem source to a mirror, then retry:

RUN ["gem", "sources", "--add", "https://mirrors.aliyun.com/rubygems/", "--remove", "https://rubygems.org/"]

The official documentation was written around January 2025, so the fluentd container log will likely show the following error:

2025-06-01 02:27:00 +0000 [warn]: #0 failed to flush the buffer. retry_times=0 next_retry_time=2025-06-01 02:27:02 +0000 chunk="6363d083e93240666f8f0488a6eeef23" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"elasticsearch\", :port=>9200, :scheme=>\"http\"}): [400] {\"error\":{\"root_cause\":[{\"type\":\"media_type_header_exception\",\"reason\":\"Invalid media-type value on headers [Content-Type, Accept]\"}],\"type\":\"media_type_header_exception\",\"reason\":\"Invalid media-type value on headers [Content-Type, Accept]\",\"caused_by\":{\"type\":\"status_exception\",\"reason\":\"Content-Type version must be either version 8 or 7, but found 9. Content-Type=application/vnd.elasticsearch+x-ndjson; compatible-with=9\"}},\"status\":400}"

The cause is a gem version mismatch, which the following command confirms:

docker exec -it efk-fluentd-1 fluent-gem list | grep elasticsearch
# elasticsearch (9.0.3)
# elasticsearch-api (9.0.3)
# fluent-plugin-elasticsearch (5.4.3)
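The error message says that an 8.x server accepts a compatible-with value of 8 or 7 but received 9. A minimal sketch of that check, reading the gem list output shown above, could look like this. Both helper names are hypothetical, and the "same major or one lower" rule is read off the error message for 8.x; applying it to other server versions is an assumption.

```shell
# gem_major NAME: reads `fluent-gem list` output on stdin and prints the
# major version of the first version listed for gem NAME.
gem_major() {
  awk -v name="$1" '$1 == name {
    gsub(/[(),]/, "", $2)      # "(9.0.3)" -> "9.0.3"
    split($2, v, ".")
    print v[1]
    exit
  }'
}

# accepts_client SERVER_MAJOR CLIENT_MAJOR: succeeds when the client major
# equals the server major or is exactly one lower (assumed rule, see above).
accepts_client() {
  [ "$2" -eq "$1" ] || [ "$2" -eq $(( $1 - 1 )) ]
}

# Usage:
# client=$(docker exec efk-fluentd-1 fluent-gem list | gem_major elasticsearch)
# accepts_client 8 "$client" || echo "client gem $client.x does not match an 8.x server"
```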

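Pin the client gem versions by adding these lines to the Dockerfile, then retry:

RUN ["gem", "install", "elasticsearch", "--no-document", "--version", "8.17.1"]
RUN ["gem", "install", "elasticsearch-api", "--no-document", "--version", "8.17.1"]

Remove the old container and image so that the image is rebuilt:

docker container rm -f efk-fluentd-1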
docker image rm efk-fluentd
docker compose up -d
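Putting the fragments so far together, the whole Dockerfile might end up looking like the one written by the sketch below. `write_dockerfile` is a hypothetical helper, and the ordering is an assumption on my part: the pins are placed before the plugin install so that the plugin's elasticsearch dependency is already satisfied when it is installed.

```shell
# write_dockerfile DIR: writes a Dockerfile with the mirror and version pins into DIR.
write_dockerfile() {
  cat > "$1/Dockerfile" <<'EOF'
FROM fluent/fluentd:edge-debian
USER root
# Optional: only needed when rubygems.org is unreachable
RUN ["gem", "sources", "--add", "https://mirrors.aliyun.com/rubygems/", "--remove", "https://rubygems.org/"]
# Assumption: pins come first so the plugin reuses the installed 8.17.1 client
RUN ["gem", "install", "elasticsearch", "--no-document", "--version", "8.17.1"]
RUN ["gem", "install", "elasticsearch-api", "--no-document", "--version", "8.17.1"]
RUN ["gem", "install", "fluent-plugin-elasticsearch", "--no-document", "--version", "5.4.3"]
USER fluent
EOF
}

# Usage:
# write_dockerfile ~/docker/efk/fluentd
```

After rebuilding, the fluent-gem list check should report the elasticsearch gem at 8.17.1 if the ordering worked as intended.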

With the stack working, the following version upgrades should now be straightforward:

image: docker.elastic.co/elasticsearch/elasticsearch:8.17.1 => 9.0.1
image: docker.elastic.co/kibana/kibana:8.17.1 => 9.0.1
RUN ["gem", "install", "fluent-plugin-elasticsearch", "--no-document", "--version", "5.4.3"] => 6.0.0
RUN ["gem", "install", "elasticsearch", "--no-document", "--version", "8.17.1"] => 9.0.3, or delete the line
RUN ["gem", "install", "elasticsearch-api", "--no-document", "--version", "8.17.1"] => 9.0.3, or delete the line

After making these changes, clean up and redeploy:

docker compose down -v
docker image rm efk-fluentd
docker compose up -d
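The two image bumps can also be applied mechanically. This is only a sketch: `bump_stack` is a hypothetical helper that rewrites the Elasticsearch and Kibana image tags from 8.17.1 to 9.0.1 and leaves every other line alone. The gem versions in the Dockerfile still have to be edited by hand.

```shell
# bump_stack: reads docker-compose.yml on stdin, writes the upgraded file to stdout.
bump_stack() {
  sed -E 's#(docker\.elastic\.co/(elasticsearch/elasticsearch|kibana/kibana)):8\.17\.1#\1:9.0.1#'
}

# Usage:
# cd ~/docker/efk
# bump_stack < docker-compose.yml > docker-compose.yml.new
# mv docker-compose.yml.new docker-compose.yml
```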

Elastic official documentation: Install Elasticsearch with Docker

From here on, follow that document to deploy the log aggregation and viewing system for production.

docker network create elastic

docker run --name elasticsearch --network elastic -h elasticsearch -p 9200:9200 -d elasticsearch:9.0.1

docker exec -it elasticsearch /usr/share/elasticsearch/bin/elasticsearch-reset-password -u elastic

cd ~/docker/efk/fluentd
docker build -t efk-fluentd .

docker run --name fluentd --network elastic -p 24224:24224 -v ~/docker/efk/fluentd/conf:/fluentd/etc -d efk-fluentd

fluentd will likely log the following error:

2025-06-01 06:25:53 +0000 [error]: #0 unexpected error error_class=Elastic::Transport::Transport::Error error="EOFError (EOFError)"

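That message does not say much about the cause, so check the elasticsearch container log, which should show the following error at the same time: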

{"@timestamp":"2025-06-01T06:25:53.037Z", "log.level": "WARN", "message":"received plaintext http traffic on an https channel, closing connection Netty4HttpChannel{localAddress=/172.18.0.2:9200, remoteAddress=/172.18.0.3:45556}", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[elasticsearch][transport_worker][T#1]","log.logger":"org.elasticsearch.http.netty4.Netty4HttpServerTransport","elasticsearch.cluster.uuid":"6FuH2MGVSI2rSFXEI8McWA","elasticsearch.node.id":"KJaQJwmZQwulm4SQfZAe_w","elasticsearch.node.name":"elasticsearch","elasticsearch.cluster.name":"docker-cluster"}

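Add the following settings to the elasticsearch store in fluent.conf, then retry:

<match *.**>
  <store>
    # existing elasticsearch settings stay as they are
    scheme https
    user elastic
    password xxxx
  </store>
</match>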

The following error still appears at this point:

2025-06-01 06:37:49 +0000 [error]: #0 unexpected error error_class=Elastic::Transport::Transport::Error error="SSL_connect returned=1 errno=0 peeraddr=172.18.0.2:9200 state=error: certificate verify failed (self-signed certificate in certificate chain) (OpenSSL::SSL::SSLError) Unable to verify certificate. This may be an issue with the remote host or with Excon. Excon has certificates bundled, but these can be customized:\n\n            `Excon.defaults[:ssl_ca_path] = path_to_certs`\n            `ENV['SSL_CERT_DIR'] = path_to_certs`\n            `Excon.defaults[:ssl_ca_file] = path_to_file`\n            `ENV['SSL_CERT_FILE'] = path_to_file`\n            `Excon.defaults[:ssl_verify_callback] = callback`\n                (see OpenSSL::SSL::SSLContext#verify_callback)\nor:\n            `Excon.defaults[:ssl_verify_peer] = false` (less secure).\n"

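There are two ways to fix this. The first is to disable certificate verification:

<match *.**>
  <store>
    # existing elasticsearch settings stay as they are
    ssl_verify false
  </store>
</match>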

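The second, which is the better option, is to copy the CA certificate out of the elasticsearch container and point fluentd at it:

docker cp elasticsearch:/usr/share/elasticsearch/config/certs/http_ca.crt ~/docker/efk/fluentd/conf

<match *.**>
  <store>
    # existing elasticsearch settings stay as they are
    ca_file /fluentd/etc/http_ca.crt
  </store>
</match>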

Even so, the following error still appears:

2025-06-01 07:47:32 +0000 [error]: #0 unexpected error error_class=Elastic::Transport::Transport::Error error="SSL_CTX_load_verify_file: system lib (OpenSSL::SSL::SSLError)"
  2025-06-01 07:47:32 +0000 [error]: #0 /usr/local/bundle/gems/elastic-transport-8.4.0/lib/elastic/transport/transport/base.rb:324:in `rescue in perform_request'

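This is a permission problem on the mounted certificate. Remove the old container (docker rm -f fluentd) and recreate it as a user that can read the file: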

docker run --name fluentd --network elastic --user 1000:1000 -p 24224:24224 -v ~/docker/efk/fluentd/conf:/fluentd/etc -d efk-fluentd

Using ssl_verify false avoids the permission problem here, but it comes back later as soon as fluentd needs to write files into a mounted directory.
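To see why one UID works and another does not, the classic owner/group/other permission check can be sketched against the copied certificate. `can_read` is a hypothetical helper. It deliberately ignores supplementary groups and ACLs, and it relies on GNU stat, which is an assumption that holds on Ubuntu.

```shell
# can_read UID GID PATH: succeeds if a process running as UID:GID could read PATH
# according to the plain permission bits (simplified, see the notes above).
can_read() {
  uid=$1
  gid=$2
  owner=$(stat -c %u "$3") || return 2
  group=$(stat -c %g "$3") || return 2
  mode=$(stat -c %a "$3") || return 2
  mode=$((0$mode))                    # interpret e.g. "660" as octal
  if [ "$uid" -eq 0 ]; then
    return 0                          # root bypasses the permission bits
  elif [ "$uid" -eq "$owner" ]; then
    [ $((mode & 0400)) -ne 0 ]        # owner read bit
  elif [ "$gid" -eq "$group" ]; then
    [ $((mode & 0040)) -ne 0 ]        # group read bit
  else
    [ $((mode & 0004)) -ne 0 ]        # other read bit
  fi
}

# Usage:
# can_read 999 999 ~/docker/efk/fluentd/conf/http_ca.crt || echo "not readable by 999:999"
# can_read 1000 1000 ~/docker/efk/fluentd/conf/http_ca.crt && echo "readable by 1000:1000"
```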

docker run --name kibana --network elastic -p 5601:5601 -d kibana:9.0.1

# Get the access link that includes the verification code
docker logs -f kibana

# Generate the enrollment token
docker exec -it elasticsearch /usr/share/elasticsearch/bin/elasticsearch-create-enrollment-token -s kibana

docker exec -it fluentd bash -c "echo '{\"message\":\"hello\"}' | fluent-cat debug.log"

Switch the source to tail to collect service logs

<source>
  @type tail
  path /fluentd/log/**/*.log
  pos_file /fluentd/log/.pos
  tag wolf.log
  <parse>
    @type none
  </parse>
</source>

docker run --name fluentd --network elastic --user 1000:1000 -v ~/docker/efk/fluentd/conf:/fluentd/etc -v /data/log:/fluentd/log -d efk-fluentd
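Before starting the container it can help to preview which host files the path pattern will pick up. The sketch below approximates /fluentd/log/**/*.log with find; `tail_candidates` is a hypothetical helper. It is only an approximation, because find also descends into hidden directories and matches hidden files, which a Ruby glob does not do by default.

```shell
# tail_candidates DIR: prints the regular files under DIR ending in .log, at any depth.
tail_candidates() {
  find "$1" -type f -name '*.log' | sort
}

# Usage (host directory that is mounted at /fluentd/log):
# tail_candidates /data/log
```

The pos_file /fluentd/log/.pos lives in the same mounted directory but does not end in .log, so it is not tailed itself.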

To deliberately trigger the permission error mentioned above, start the container with ssl_verify false and without --user 1000:1000:

2025-06-01 08:56:50 +0000 [error]: #0 unexpected error error_class=Errno::EACCES error="Permission denied @ rb_sysopen - /fluentd/log/.pos"
  2025-06-01 08:56:50 +0000 [error]: #0 /usr/local/bundle/gems/fluentd-1.16.9/lib/fluent/plugin/in_tail.rb:243:in `initialize'

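About the permission error: by default the fluentd container runs as a newly created user, fluent 999:999 (see here). To write files into a mounted directory, either run as the privileged user root 0:0 (not recommended) or run as a user that has permission on the mounted directory, such as ubuntu 1000:1000 (recommended).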
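The errors covered in this post map fairly directly to their fixes, so they can be collected into a small lookup. This is a sketch that only knows the messages quoted above; `diagnose` is a hypothetical helper, not a fluentd tool.

```shell
# diagnose: reads one log line on stdin and prints the fix described in this post.
# Returns 1 when the line matches none of the known errors.
diagnose() {
  line=$(cat)
  case $line in
    *media_type_header_exception*)
      echo "pin the elasticsearch and elasticsearch-api gems to the server's major version" ;;
    *"received plaintext http traffic on an https channel"*)
      echo "set scheme https, user and password in the elasticsearch store" ;;
    *EOFError*)
      echo "check the elasticsearch container log for a plaintext-on-https warning" ;;
    *"certificate verify failed"*)
      echo "set ca_file to the copied http_ca.crt (or ssl_verify false)" ;;
    *SSL_CTX_load_verify_file*)
      echo "run the container as a user that can read the mounted certificate" ;;
    *"Permission denied @ rb_sysopen"*)
      echo "run the container as a user that can write to the mounted directory" ;;
    *)
      echo "unknown"
      return 1 ;;
  esac
}

# Usage:
# docker logs fluentd 2>&1 | grep -m 1 '\[error\]' | diagnose
```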

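Deploying the fluentd container on a node that has no --network elastic

# On the node where the image was built
docker image save -o efk-fluentd.tar efk-fluentd
# On the target node, after copying efk-fluentd.tar over
docker image load -i efk-fluentd.tar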

docker run --name fluentd --user 1000:1000 --add-host elasticsearch:192.168.10.8 -v ~/docker/efk/fluentd/conf:/fluentd/etc -v /data/log:/fluentd/log -d efk-fluentd

On each node, adjust logstash_prefix in the configuration as needed.
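With logstash_format true and logstash_dateformat %Y%m%d, each prefix produces its own daily index, so a per-node prefix keeps the nodes apart in Kibana. A minimal sketch of the resulting name follows. `daily_index` is a hypothetical helper, and the "-" separator and the use of UTC dates are, to my understanding, the plugin's defaults rather than something set in the configuration above.

```shell
# daily_index PREFIX: prints the index name today's events should land in
# (assumes the default "-" separator and UTC dates).
daily_index() {
  printf '%s-%s\n' "$1" "$(date -u +%Y%m%d)"
}

# Usage (curl prompts for the elastic password):
# curl --cacert ~/docker/efk/fluentd/conf/http_ca.crt -u elastic \
#   "https://localhost:9200/_cat/indices/$(daily_index fluentd)?v"
```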