Troubleshooting and fixing an ES cluster where all indices stopped receiving data at the same moment, for unknown reasons | Elastic, Logstash

愚人乙 · 4 months ago · 525 views

Pipeline: filebeat -> kafka -> logstash -> Elasticsearch (version 7.1.1)

The symptom is as the title describes. First I tested that Logstash could still consume from Kafka and dump events to the console with the rubydebug codec (a minimal test sketch follows), which ruled out Kafka, Filebeat, and Logstash themselves and pinned the problem on the ES cluster.
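One quick way to run that consume test is an inline pipeline; the broker address and topic here are hypothetical placeholders, substitute your own:

bin/logstash -e '
input {
  kafka {
    bootstrap_servers => "kafka01:9092"    # hypothetical Kafka broker
    topics            => ["nginx-access"]  # hypothetical topic name
  }
}
output {
  stdout { codec => rubydebug }  # print every consumed event to the console
}
'

Events printed to the screen as expected, so the next stop was the logs, where Logstash was reporting the following error: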

[2019-07-08T10:12:41,878][WARN ][logstash.outputs.elasticsearch] Could not index event to Elasticsearch.
{:status=>400, :action=>["index", {:_id=>nil, :_index=>"logstash-im-httpdns-nginx-access-2019.07.08.02", :_type=>"_doc", :routing=>nil}, #<LogStash::Event:0x29adb1ed>],
:response=>{"index"=>{"_index"=>"logstash-im-httpdns-nginx-access-2019.07.08.02", "_type"=>"_doc", "_id"=>nil, "status"=>400,
"error"=>{"type"=>"illegal_argument_exception", "reason"=>"Validation Failed: 1: this action would add [2] total shards, but this cluster currently has [2999]/[3000] maximum shards open;"}}}}

The key phrase is cluster currently has [2999]/[3000] maximum shards open: the cluster already holds 2,999 open shards against a cluster-wide ceiling of 3,000 shards (shards, not indices). Creating the new index would add [2] total shards (e.g. one primary plus one replica), and 2,999 + 2 exceeds the cap, so the index creation failed and no data could be written. Since every Logstash index rolls over at the same time (hourly, per the -2019.07.08.02 suffix), all of them hit the wall at the same moment, which explains the "all indices at once" symptom.

With the cause located, three questions came next: 1) where does the 3,000 threshold come from; 2) why are there so many shards; 3) how can the threshold be raised. Straight to the official docs.


Where does the 3,000 threshold come from?

If the cluster is already over the limit, due to changes in node membership or setting changes, all operations that create or open indices will fail until either the limit is increased as described below, or some indices are closed or deleted to bring the number of shards below the limit.

Replicas count towards this limit, but closed indexes do not. An index with 5 primary shards and 2 replicas will be counted as 15 shards. Any closed index is counted as 0, no matter how many shards and replicas it contains.

The limit defaults to 1,000 shards per data node, and can be dynamically adjusted using the following property:

The docs explain it: the limit defaults to 1,000 shards per data node, and our ES cluster has exactly 3 data nodes, hence the cluster-wide maximum of 3,000 shards.

Docs: https://www.elastic.co/guide/en/elasticsearch/reference/master/misc-cluster.html#cluster-shard-limit
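Both numbers are easy to verify against the cluster health API (same elided hostname as in the fix below):

curl -s "c3-im-****-es01.bj:9200/_cluster/health?pretty"
# the response includes, among other fields:
#   "number_of_data_nodes" : 3,   <- 3 data nodes x 1,000 = the 3,000-shard ceiling
#   "active_shards" : 2999        <- right at the limit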


Why were there so many shards?

Digging through the index list turned up a large number of indices that no user had defined: the system had created them automatically, and their shards were empty (a screenshot of the list appeared in the original post).
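A quick way to enumerate them is the _cat APIs (hostname elided as in the original):

# list every index with its shard and doc counts, sorted by name
curl -s "c3-im-****-es01.bj:9200/_cat/indices?v&s=index"

# count how many shards are open across the cluster
curl -s "c3-im-****-es01.bj:9200/_cat/shards" | wc -l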


How do you raise the threshold?

The source of the auto-created indices couldn't be tracked down right away, so the quickest way to restore ingestion was to raise the threshold first. The setting that controls it is

cluster.max_shards_per_node

The command to apply it:

curl -X PUT "c3-im-****-es01.bj:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
    "persistent" : {
        "cluster.max_shards_per_node" : "2000"
    }
}
'


After applying the change, the ceiling became 3 nodes x 2,000 = 6,000 shards, Logstash could create its indices again, and the problem was solved. (The value is set under "persistent", so it survives a full cluster restart.)
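To confirm the new limit took effect, read the settings back:

curl -s "c3-im-****-es01.bj:9200/_cluster/settings?pretty"
# should show:
#   "persistent" : { "cluster" : { "max_shards_per_node" : "2000" } }

Raising the ceiling is a stopgap; as the quoted docs point out, closing or deleting indices also brings the shard count back under the limit, so the longer-term fix is to find what keeps auto-creating those empty indices and clean them up.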
