1. Introduction
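Prometheus is an open-source systems monitoring and alerting toolkit. It collects and stores metrics as time series data: each sample is stored with the timestamp at which it was recorded, along with optional key-value pairs called labels. Prometheus joined the Cloud Native Computing Foundation in 2016 as its second hosted project, after Kubernetes.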
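As the Prometheus architecture diagram shows, Prometheus mainly pulls metrics from its targets. Earlier monitoring software in the industry mostly relied on clients pushing their metrics, which could overload the monitoring service and delay metric reporting, in turn causing a chain of alerting problems. Because Prometheus is a cloud-native project, its service discovery works very well with Kubernetes, so monitoring Kubernetes with Prometheus is easy. Prometheus is already a cornerstone of cloud-native monitoring, so its concepts are not repeated here; for more background, see the official Prometheus documentation.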
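To make the pull model concrete, here is a minimal Go sketch of what a scraper does on each interval: issue an HTTP GET against a target and read samples from the plain-text response. It is not how Prometheus itself is implemented. The names parseSamples and scrape and the metric demo_requests_total are hypothetical, and the parser assumes sample lines without trailing timestamps. main starts a local stand-in target so the sketch runs on its own.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strconv"
	"strings"
)

// parseSamples reads text-format lines of the form "<series> <value>".
// Assumption: no trailing timestamps; comment (#) and blank lines are skipped.
func parseSamples(text string) (map[string]float64, error) {
	samples := make(map[string]float64)
	for _, line := range strings.Split(text, "\n") {
		line = strings.TrimSpace(line)
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		i := strings.LastIndex(line, " ")
		if i < 0 {
			return nil, fmt.Errorf("malformed sample line: %q", line)
		}
		v, err := strconv.ParseFloat(line[i+1:], 64)
		if err != nil {
			return nil, fmt.Errorf("bad value in %q: %w", line, err)
		}
		samples[strings.TrimSpace(line[:i])] = v
	}
	return samples, nil
}

// scrape pulls one target: the monitor initiates the request, not the client.
func scrape(url string) (map[string]float64, error) {
	resp, err := http.Get(url)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	return parseSamples(string(body))
}

func main() {
	// Hypothetical stand-in target; a real one would be e.g. a node exporter.
	target := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "# TYPE demo_requests_total counter\ndemo_requests_total 3\n")
	}))
	defer target.Close()

	samples, err := scrape(target.URL + "/metrics")
	if err != nil {
		fmt.Println("scrape failed:", err)
		return
	}
	fmt.Println(samples["demo_requests_total"]) // prints 3
}
```

Because the scraper decides when to fetch, a slow or overloaded monitoring server simply scrapes less often instead of being flooded by client pushes.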
Lab environment
K8S: 192.168.0.3 (master), 192.168.0.2 (node); setting up the cluster is left to the reader
Prometheus: 192.168.0.3:9090
2. Quick start
2.1 Installing Prometheus with Docker
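- Normally Prometheus should not be installed inside the cluster it monitors, so this walkthrough runs it with Docker; for production, a bare-metal deployment is recommended.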
- Create prometheus.yml
mkdir -p /prometheus/config && cd /prometheus/config
cat > prometheus.yml << EOF
global:
  # scrape targets every 60s
  scrape_interval: 60s
EOF
- Create the start and reload scripts. Once started, the container is published on 192.168.0.3:9090; run reload whenever the configuration needs to be reloaded.
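cd /prometheus/
# The heredoc delimiter is quoted so that the variables below are expanded
# when start.sh runs, not while the file is being written.
cat > start.sh << 'EOF'
#!/bin/bash
BASE_DIR=$(pwd)
CONFIG_NAME="prometheus.yml"
CONFIG_DIR="${BASE_DIR}/config"

main() {
    docker run -d --name pm \
        -p 9090:9090 \
        -v "${CONFIG_DIR}":/config \
        prom/prometheus:v2.30.0 \
        --web.enable-lifecycle \
        --config.file="/config/${CONFIG_NAME}"
}

main
EOF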
cat > reload.sh << 'EOF'
#!/bin/bash
URL="192.168.0.3:9090"

main() {
    # -f makes curl exit non-zero on an HTTP error status
    if curl -fsS -X POST "http://${URL}/-/reload"; then
        echo "Succeed!"
    else
        echo "Failed!"
    fi
}

main
EOF
- Now start Prometheus. Once the container is up, open http://192.168.0.3:9090/ in a browser.
sh start.sh
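The reload script boils down to a single HTTP POST to /-/reload, an endpoint that is only served because the container was started with --web.enable-lifecycle. As a sketch, the same call can be made from Go; the function names reloadURL and reloadPrometheus are hypothetical, and the 5-second timeout is an arbitrary choice.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
	"time"
)

// reloadURL builds the lifecycle endpoint from the Prometheus base URL.
func reloadURL(base string) string {
	return strings.TrimRight(base, "/") + "/-/reload"
}

// reloadPrometheus asks Prometheus to re-read its configuration file.
// Any reply other than 200 OK is reported as an error.
func reloadPrometheus(base string) error {
	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Post(reloadURL(base), "", nil)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("reload failed: %s", resp.Status)
	}
	return nil
}

func main() {
	// Address taken from the lab environment of this article.
	if err := reloadPrometheus("http://192.168.0.3:9090"); err != nil {
		fmt.Println("Failed!", err)
		return
	}
	fmt.Println("Succeed!")
}
```

Checking the status code matters here: a configuration file with errors makes the reload request fail, and the old configuration stays active.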
2.2 Installing NodeExporter
- NodeExporter exposes node-level metrics. Install it with the following manifest, node_exporter.yml:
apiVersion: v1
kind: Namespace
metadata:
  name: prometheus
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: prometheus
  labels:
    name: node-exporter
spec:
  selector:
    matchLabels:
      name: node-exporter
  template:
    metadata:
      labels:
        name: node-exporter
    spec:
      # share the host PID namespace
      hostPID: true
      # share the host IPC namespace
      hostIPC: true
      # share the host network
      hostNetwork: true
      containers:
        - name: node-exporter
          image: bitnami/node-exporter:1.2.2
          ports:
            - containerPort: 9100
          resources:
            requests:
              cpu: 100m
              memory: 100Mi
            limits:
              cpu: 1000m
              memory: 1Gi
          securityContext:
            # run the container privileged
            privileged: true
          args:
            - --path.procfs
            - /host/proc
            - --path.sysfs
            - /host/sys
            - --collector.filesystem.ignored-mount-points
            # no inner double quotes: they would become part of the regex
            - '^/(sys|proc|dev|host|etc)($|/)'
          volumeMounts:
            - name: dev
              mountPath: /host/dev
            - name: proc
              mountPath: /host/proc
            - name: sys
              mountPath: /host/sys
            - name: rootfs
              mountPath: /rootfs
      tolerations:
        - key: "node-role.kubernetes.io/master"
          operator: "Exists"
          effect: "NoSchedule"
      volumes:
        - name: proc
          hostPath:
            path: /proc
        - name: dev
          hostPath:
            path: /dev
        - name: sys
          hostPath:
            path: /sys
        - name: rootfs
          hostPath:
            path: /
kubectl apply -f node_exporter.yml
2.2.1 Configuring Prometheus to scrape node exporter metrics
- Static configuration, prometheus.yml
global:
  scrape_interval: 60s
scrape_configs:
  # static configuration
  # requires node-exporter to be installed
  - job_name: 'node-exporter'
    static_configs:
      - targets: ['192.168.0.3:9100', '192.168.0.2:9100']
- Reload the Prometheus configuration
sh reload.sh
- Check that the new node-exporter target has been added by opening http://192.168.0.3:9090/targets in a browser.
- The following configuration discovers node exporter dynamically instead:
global:
  scrape_interval: 60s
scrape_configs:
  - job_name: 'k8s-node'
    # scrape path
    metrics_path: /metrics
    kubernetes_sd_configs:
      - api_server: https://192.168.0.3:6443/
        # service discovery roles include node, service, pod, endpoints and ingress
        role: node
        # token of a service account with view permission; obtaining it is left to the reader
        bearer_token_file: /config/sa.token
        tls_config:
          # the Kubernetes CA certificate
          ca_file: /config/ca.crt
          # verification can also be skipped, in which case ca_file is not needed
          # insecure_skip_verify: true
    # without relabeling, the nodes discovered above are all addressed on port 10250
    # (the port the kubelet listens on)
    relabel_configs:
      # source label
      - source_labels: [__address__]
        regex: '(.*):10250'
        # 192.168.0.3:10250 -> 192.168.0.3:9100
        replacement: '${1}:9100'
        # target label
        target_label: __address__
        action: replace
- Check that the new k8s-node target has been added by opening http://192.168.0.3:9090/targets in a browser.
- Inspect the metrics exposed by node exporter; their names start with node_.
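The relabel rule in the k8s-node job rewrites each discovered address from the kubelet port to the node exporter port. The Go sketch below imitates what the replace action does to a single label value; relabelReplace is a hypothetical helper, not Prometheus code. It relies on two facts: Prometheus anchors the regex on both ends, and the replacement uses the same ${1} template syntax as Go's regexp package.

```go
package main

import (
	"fmt"
	"regexp"
)

// relabelReplace imitates a relabel rule with action "replace" on one value.
// The pattern is fully anchored; when it does not match, the value is
// returned unchanged, as an unmatched replace rule leaves the target label alone.
func relabelReplace(value, pattern, replacement string) string {
	re := regexp.MustCompile("^(?:" + pattern + ")$")
	match := re.FindStringSubmatchIndex(value)
	if match == nil {
		return value
	}
	return string(re.ExpandString(nil, replacement, value, match))
}

func main() {
	addrs := []string{"192.168.0.3:10250", "192.168.0.2:10250", "192.168.0.2:8080"}
	for _, addr := range addrs {
		fmt.Printf("%s -> %s\n", addr, relabelReplace(addr, `(.*):10250`, "${1}:9100"))
	}
	// prints:
	// 192.168.0.3:10250 -> 192.168.0.3:9100
	// 192.168.0.2:10250 -> 192.168.0.2:9100
	// 192.168.0.2:8080 -> 192.168.0.2:8080
}
```

The third address shows why the regex includes the port: only addresses that end in :10250 are rewritten.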
2.2.2 Configuring Prometheus to scrape kubelet metrics
- The kubelet's default metrics endpoint can be queried like this:
SA_TOKEN=`cat sa.token`; curl -k https://192.168.0.3:10250/metrics --header "Authorization: Bearer $SA_TOKEN"
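For reference, the same request can be issued from Go. This is a sketch that mirrors the curl command: it sends the service account token as a bearer token and, like curl -k, skips TLS certificate verification, which is acceptable only in a lab. The helper names newMetricsRequest and fetchMetrics are hypothetical, and the token is assumed to sit in a file named sa.token in the working directory.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"io"
	"net/http"
	"os"
	"strings"
	"time"
)

// newMetricsRequest builds a GET request carrying the bearer token.
func newMetricsRequest(url, token string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	// Trim the trailing newline that a token file usually ends with.
	req.Header.Set("Authorization", "Bearer "+strings.TrimSpace(token))
	return req, nil
}

// fetchMetrics returns the response body of the metrics endpoint.
func fetchMetrics(url, token string) (string, error) {
	client := &http.Client{
		Timeout: 5 * time.Second,
		Transport: &http.Transport{
			// Equivalent of curl -k: do not verify the kubelet certificate.
			TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
		},
	}
	req, err := newMetricsRequest(url, token)
	if err != nil {
		return "", err
	}
	resp, err := client.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status %s", resp.Status)
	}
	return string(body), nil
}

func main() {
	token, err := os.ReadFile("sa.token")
	if err != nil {
		fmt.Println("read token:", err)
		return
	}
	body, err := fetchMetrics("https://192.168.0.3:10250/metrics", string(token))
	if err != nil {
		fmt.Println("fetch failed:", err)
		return
	}
	fmt.Println(len(body), "bytes of kubelet metrics")
}
```

These are the same three settings the scrape job needs: the https scheme, the bearer token, and a TLS policy.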
- Add a kubelet scrape job to prometheus.yml, then run sh reload.sh to reload the configuration.
global:
  scrape_interval: 60s
scrape_configs:
  # ... (jobs defined earlier)
  - job_name: 'k8s-kubelet'
    # scheme used to scrape the kubelet
    scheme: https
    # the kubelet metrics path, as seen above
    metrics_path: /metrics
    # bearer token used for scraping
    bearer_token_file: /config/sa.token
    # skip certificate verification
    tls_config:
      insecure_skip_verify: true
    kubernetes_sd_configs:
      - api_server: https://192.168.0.3:6443/
        role: node
        bearer_token_file: /config/sa.token
        tls_config:
          ca_file: /config/ca.crt
    relabel_configs:
      - source_labels: [__meta_kubernetes_node_address_InternalIP]
        regex: '(.+)'
        replacement: '${1}:10250'
        target_label: __address__
        action: replace
- Check that the new k8s-kubelet target has been added by opening http://192.168.0.3:9090/targets in a browser.
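That concludes the quick start. The next sections cover how to write an exporter, and how to let Prometheus dynamically discover a deployed application and collect the business metrics it exposes.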
3. Writing an exporter
- Two commonly used metric types are Counter and Gauge; for the other types, consult the official documentation.
- Counter: a cumulative value that only ever increases;
- Gauge: a value that can go up and down;
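The difference between the two types can be shown without any library. The Go sketch below uses two hypothetical types, counter and gauge, that only illustrate the semantics; they are not the client_golang implementations. One deliberate difference: this counter returns an error on a negative increment, whereas the Counter in client_golang panics in that case.

```go
package main

import (
	"errors"
	"fmt"
)

// counter only moves up: negative increments are rejected.
type counter struct{ value float64 }

func (c *counter) Add(delta float64) error {
	if delta < 0 {
		return errors.New("counter cannot decrease")
	}
	c.value += delta
	return nil
}

// gauge can be set to any value and can move in both directions.
type gauge struct{ value float64 }

func (g *gauge) Set(v float64)     { g.value = v }
func (g *gauge) Add(delta float64) { g.value += delta }

func main() {
	requests := &counter{}
	_ = requests.Add(1)
	_ = requests.Add(2)
	err := requests.Add(-1)
	fmt.Println(requests.value, err) // prints: 3 counter cannot decrease

	inFlight := &gauge{}
	inFlight.Set(5)
	inFlight.Add(-2)
	fmt.Println(inFlight.value) // prints: 3
}
```

A total number of handled requests fits a counter; the number of connections currently open fits a gauge.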
3.1 The Counter type
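- By default the Prometheus client also exposes a set of runtime and process metrics. The following example shows a counter with labels and a counter without labels.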
package main

import (
	"fmt"
	"net/http"
	"sync/atomic"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// ConnectionCount is shared by concurrent HTTP handlers, so it is updated atomically.
var ConnectionCount int64

// Counter with dynamic labels.
var cc = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Namespace: "test",
		Name:      "connection_count_with_label",
		Help:      "Number of /hello requests handled, with labels.",
	},
	[]string{"app", "namespace"},
)

// Counter without labels; its value is read from ConnectionCount on every scrape.
var cf = prometheus.NewCounterFunc(prometheus.CounterOpts{
	Namespace: "test",
	Name:      "connection_count",
	Help:      "Number of /hello requests handled.",
}, func() float64 {
	return float64(atomic.LoadInt64(&ConnectionCount))
})

func init() {
	// Register both collectors with the default registry.
	prometheus.MustRegister(cc)
	prometheus.MustRegister(cf)
}

func main() {
	http.HandleFunc("/hello", func(writer http.ResponseWriter, request *http.Request) {
		cc.With(prometheus.Labels{
			"app":       "simple-counter",
			"namespace": "test",
		}).Inc()
		n := atomic.AddInt64(&ConnectionCount, 1)
		fmt.Fprintf(writer, "count: %d\n", n)
	})
	// promhttp.Handler serves the default registry, including the built-in metrics.
	http.Handle("/metrics", promhttp.Handler())
	if err := http.ListenAndServe(":8081", nil); err != nil {
		panic(err)
	}
}
- After starting the program, open http://127.0.0.1:8081/hello in a browser a few times to increment the counters, then open http://127.0.0.1:8081/metrics to see the exposed metrics.
3.2 The Gauge type
- The code is very similar to the counter example; the only difference is that a Gauge can be set directly to a value.
package main

import (
	"fmt"
	"net/http"
	"sync/atomic"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// ConnectionCount is shared by concurrent HTTP handlers, so it is updated atomically.
var ConnectionCount int64

// Gauge with dynamic labels.
var cc = prometheus.NewGaugeVec(
	prometheus.GaugeOpts{
		Namespace: "test",
		Name:      "connection_count_with_label",
		Help:      "Number of /hello requests handled, with labels.",
	},
	[]string{"app", "namespace"},
)

func init() {
	// Without registration the gauge would never appear on /metrics.
	prometheus.MustRegister(cc)
}

func main() {
	http.HandleFunc("/hello", func(writer http.ResponseWriter, request *http.Request) {
		n := atomic.AddInt64(&ConnectionCount, 1)
		// Unlike a counter, a gauge is set to an absolute value.
		cc.With(prometheus.Labels{
			"app":       "simple-counter",
			"namespace": "test",
		}).Set(float64(n))
		fmt.Fprintf(writer, "count: %d\n", n)
	})
	http.Handle("/metrics", promhttp.Handler())
	if err := http.ListenAndServe(":8081", nil); err != nil {
		panic(err)
	}
}
3.3 Removing the default metrics
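- Most of the time only the business metrics should be exposed, which makes the default runtime metrics redundant. The Gauge example is reused here to show how to remove them.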
package main

import (
	"fmt"
	"net/http"
	"sync/atomic"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var (
	// ConnectionCount is shared by concurrent HTTP handlers, so it is updated atomically.
	ConnectionCount int64
	// EmptyRegistry is a fresh registry without the default collectors.
	EmptyRegistry = prometheus.NewRegistry()
)

// Gauge with dynamic labels.
var cc = prometheus.NewGaugeVec(
	prometheus.GaugeOpts{
		Namespace: "test",
		Name:      "connection_count_with_label",
		Help:      "Number of /hello requests handled, with labels.",
	},
	[]string{"app", "namespace"},
)

func init() {
	EmptyRegistry.MustRegister(cc)
}

func main() {
	http.HandleFunc("/hello", func(writer http.ResponseWriter, request *http.Request) {
		n := atomic.AddInt64(&ConnectionCount, 1)
		cc.With(prometheus.Labels{
			"app":       "simple-counter",
			"namespace": "test",
		}).Set(float64(n))
		fmt.Fprintf(writer, "count: %d\n", n)
	})

	// Either of the following two forms works.
	// Form 1:
	// http.HandleFunc("/metrics", func(writer http.ResponseWriter, request *http.Request) {
	// 	promhttp.HandlerFor(EmptyRegistry,
	// 		promhttp.HandlerOpts{ErrorHandling: promhttp.ContinueOnError}).
	// 		ServeHTTP(writer, request)
	// })

	// Form 2:
	http.Handle("/metrics", promhttp.HandlerFor(EmptyRegistry,
		promhttp.HandlerOpts{ErrorHandling: promhttp.ContinueOnError}))

	if err := http.ListenAndServe(":8081", nil); err != nil {
		panic(err)
	}
}
- Open http://127.0.0.1:8081/hello first and then http://127.0.0.1:8081/metrics; only the business metric that we want to expose is listed.
3.4 Writing a Collector
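- Usually there is no need to implement the Collector interface yourself, because the simple metric types provided by the Prometheus client are enough; some complex scenarios, however, call for a hand-written collector.
Try the example below yourself if you need one; the approach is fairly easy to follow.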
package main

import (
	"fmt"
	"net/http"
	"sync"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var (
	counter int
	healthy int
	// lock guards counter and healthy.
	lock          sync.Mutex
	emptyRegistry = prometheus.NewRegistry()
)

func init() {
	emptyRegistry.MustRegister(NewTestCollector())
}

type TestCollector struct {
	Desc []*prometheus.Desc
}

func NewTestCollector() *TestCollector {
	variableLabels := []string{"ns", "app"}
	constLabels := prometheus.Labels{"const_label": "true"}
	return &TestCollector{Desc: []*prometheus.Desc{
		// counter
		prometheus.NewDesc("test_app_connection_count", "connection count",
			variableLabels, constLabels),
		// gauge
		prometheus.NewDesc("test_app_healthy", "health status (1 = healthy, 0 = unhealthy)",
			variableLabels, constLabels),
	}}
}

// Describe sends the descriptors of every metric this collector can produce.
func (c *TestCollector) Describe(ch chan<- *prometheus.Desc) {
	for _, d := range c.Desc {
		ch <- d
	}
}

// Collect is called on every scrape and emits the current values.
func (c *TestCollector) Collect(ch chan<- prometheus.Metric) {
	// Read both values under the lock; the HTTP handlers write them concurrently.
	lock.Lock()
	count, health := counter, healthy
	lock.Unlock()

	// MustNewConstMetric panics on a label mismatch, like the explicit panic it replaces.
	ch <- prometheus.MustNewConstMetric(c.Desc[0], prometheus.CounterValue,
		float64(count), "test", "test-app")
	ch <- prometheus.MustNewConstMetric(c.Desc[1], prometheus.GaugeValue,
		float64(health), "test", "test-app")
}

func main() {
	http.HandleFunc("/set-healthy", func(writer http.ResponseWriter, request *http.Request) {
		lock.Lock()
		defer lock.Unlock()
		healthy = 1
		_, _ = fmt.Fprintf(writer, "%d", healthy)
	})
	http.HandleFunc("/set-unhealthy", func(writer http.ResponseWriter, request *http.Request) {
		lock.Lock()
		defer lock.Unlock()
		healthy = 0
		_, _ = fmt.Fprintf(writer, "%d", healthy)
	})
	http.HandleFunc("/hello", func(writer http.ResponseWriter, request *http.Request) {
		lock.Lock()
		defer lock.Unlock()
		counter++
		_, _ = fmt.Fprintf(writer, "counter: %d", counter)
	})
	http.Handle("/metrics", promhttp.HandlerFor(emptyRegistry,
		promhttp.HandlerOpts{ErrorHandling: promhttp.ContinueOnError}))
	if err := http.ListenAndServe(":8081", nil); err != nil {
		panic(err)
	}
}
4. Deploying a service and configuring Prometheus auto-discovery
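- Once the application is written and needs to be deployed to k8s, how can a Prometheus instance running outside the cluster discover it automatically and pull its metrics?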
- First deploy a Deployment and a Service; pay attention to the annotations on the Service.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prodmetrics
  namespace: default
spec:
  selector:
    matchLabels:
      app: prodmetrics
  replicas: 1
  template:
    metadata:
      labels:
        app: prodmetrics
    spec:
      # for easy testing, pin the pod to one node; that node's IP is 192.168.0.3
      nodeName: k8s-01
      containers:
        - name: prodmetrics
          image: alpine:3.12
          imagePullPolicy: IfNotPresent
          workingDir: /app
          command: ["./prodmetrics"]
          volumeMounts:
            - name: app
              mountPath: /app
          ports:
            - containerPort: 8080
      volumes:
        - name: app
          hostPath:
            path: /opt/code/prometheus/99_monitor_app_test
---
apiVersion: v1
kind: Service
metadata:
  name: prodmetrics
  namespace: default
  annotations:
    # note the following two annotations
    scrape: "true"
    nodeport: "31880"
spec:
  type: NodePort
  ports:
    - port: 80
      targetPort: 8080
      nodePort: 31880
  selector:
    app: prodmetrics
- Edit the Prometheus configuration created earlier, prometheus.yml, and add a new job to it:
  - job_name: 'prod-metrics-auto'
    # keep vs. drop:
    # with action keep, Prometheus discards every target whose source_labels value
    # does NOT match regex;
    # with action drop, it discards every target whose source_labels value DOES match regex.
    metrics_path: /metrics
    kubernetes_sd_configs:
      - api_server: https://192.168.0.3:6443/
        role: service
        bearer_token_file: /config/sa.token
        tls_config:
          ca_file: /config/ca.crt
    relabel_configs:
      # match Service resources that carry the annotation scrape: "true"
      # and keep only those targets
      - source_labels: [ __meta_kubernetes_service_annotation_scrape ]
        regex: 'true'
        action: keep
      # nodeport = 31880
      - source_labels: [ __meta_kubernetes_service_annotation_nodeport ]
        regex: '(.+)'
        replacement: '192.168.0.3:${1}'
        # __address__ is the scrape address
        target_label: __address__
        # rewrite prodmetrics.default.svc:80 -> 192.168.0.3:31880
        action: replace
      # add a namespace label and give it the value of __meta_kubernetes_namespace
      - source_labels: [ __meta_kubernetes_namespace ]
        action: replace
        target_label: namespace
      # add a svcname label in the same way
      - source_labels: [ __meta_kubernetes_service_name ]
        action: replace
        target_label: svcname
- Finally, check the targets at http://192.168.0.3:9090/targets; prod-metrics-auto should now be listed.
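The keep and drop actions used in the prod-metrics-auto job are easy to mix up, so here is a small Go sketch of their semantics. The service struct, the matches and filter helpers, and the second sample service are hypothetical and exist only for illustration; the one behavior taken from Prometheus is that the regex is anchored on both ends.

```go
package main

import (
	"fmt"
	"regexp"
)

// service is a hypothetical stand-in for a discovered target:
// its name plus the value of its "scrape" annotation ("" when absent).
type service struct {
	name   string
	scrape string
}

// matches reports whether value fully matches pattern.
func matches(value, pattern string) bool {
	return regexp.MustCompile("^(?:" + pattern + ")$").MatchString(value)
}

// filter returns the names of the targets that survive a keep or drop rule.
func filter(targets []service, action, pattern string) []string {
	var kept []string
	for _, t := range targets {
		m := matches(t.scrape, pattern)
		if (action == "keep" && m) || (action == "drop" && !m) {
			kept = append(kept, t.name)
		}
	}
	return kept
}

func main() {
	targets := []service{
		{name: "prodmetrics", scrape: "true"},
		{name: "other-service", scrape: ""}, // no scrape annotation
	}
	fmt.Println(filter(targets, "keep", "true")) // prints: [prodmetrics]
	fmt.Println(filter(targets, "drop", "true")) // prints: [other-service]
}
```

With keep, a Service has to opt in through its annotation, which is why only prodmetrics ends up as a target of this job.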