alertmanager持久化原创
# 前言
prometheus通过抓取监控数据,将触发规则的告警,推送到Alertmanger,Alertmanger对告警进行分组、聚合等处理后,可以通过邮件、Slack、webhook等方式对用户进行发送告警信息。alertmanager告警的记录不支持持久化记录,发送的告警信息不会存储在数据库中,prometheus将所有数据存储为时间序列,却不会将alertmanager发送的告警信息做为一条记录存储下来。
解决的思路就是通过webhook发送告警信息到指定服务,并将记录输出到stdout,再通过fluentd采集容器日志推送到ES
# alertsnitch + mysql + grafana
alertsnitch支持将alertmanager的告警数据写入mysql和pg, 并结合grafna面板,将数据进行展示。
# alertsnitch
https://gitlab.com/yakshaving.art/alertsnitch
配置变量: 环境变量数据库配置格式ALERTSNITCH_DSN="${MYSQL_USER}😒{MYSQL_PASSWORD}@/${MYSQL_DATABASE}" 部署完成后配置ingress,映射域名对外提供服务,过程略,配置域名后调用方式为: https://alertsnitch.test.com/webhook
配置alertmanager:
# 配置接收者
- name: default
webhook_configs:
- send_resolved: true
http_config:
follow_redirects: true
url: https://alertsnitch.test.cn/webhook
2
3
4
5
6
7
# grafana dashboard
ID: 15833 https://grafana.com/grafana/dashboards/15833-prometheus-alert-history/
# alertmanager2es + ES + kibana
https://github.com/cloudflare/alertmanager2es
https://github.com/webdevops/alertmanager2es
docker pull webdevops/alertmanager2es:20.11.0
docker pull webdevops/alertmanager2es:development
# alertmanager-webhook-logger
https://github.com/tomtom-international/alertmanager-webhook-logger docker pull ghcr.io/tomtom-international/alertmanager-webhook-logger:1.0
https://github.com/grafana/alertmanager-webhook-logger docker pull grafana/alertmanager-webhook-logger:0.3
通events持久化,通过fluentd+容器标准输出采集到ES
# 部署
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: alertmanager-webhook-logger
namespace: monitor
spec:
replicas: 1
template:
metadata:
labels:
app: alertmanager-webhook-logger
spec:
containers:
- name: alertmanager-webhook-logger
image: ghcr.io/tomtom-international/alertmanager-webhook-logger:1.0
imagePullPolicy: IfNotPresent
selector:
matchLabels:
app: alertmanager-webhook-logger
---
kind: Service
apiVersion: v1
metadata:
name: alertmanager-webhook-logger
namespace: monitor
spec:
ports:
- name: '6725'
protocol: TCP
port: 6725
targetPort: 6725
selector:
app: alertmanager-webhook-logger
type: ClusterIP
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
# alertmanager配置
route:
receiver: default-receiver
routes:
- receiver: "all-to-webhook"
continue: true
group_wait: 30s
repeat_interval: 5m
receivers:
- name: all-to-webhook # 增加webhook
webhook_configs:
- url: http://alertmanager-webhook-logger.monitor:6725
# 修改配置后reload promethus(alertmanager): curl -XPOST <prometheus-url>/-/reload
2
3
4
5
6
7
8
9
10
11
12
13
14