Implementing custom-metrics HPA with prometheus-adapter - SRE回忆录

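By default, Kubernetes offers CPU and memory as the metrics for HPA autoscaling. For more complex scenarios, such as scaling automatically on the QPS handled by each replica, prometheus-adapter can be used to scale Pods on custom metrics. An earlier article on HPA used prometheus-adapter without describing it in much detail, so this post covers it on its own.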

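When an HPA uses custom metrics, its queries go through prometheus-adapter, which converts Prometheus metrics data into a format that the Kubernetes API can understand. Because prometheus-adapter is a custom API service, it also has to be registered with the main API server through the Kubernetes aggregator so that it can be reached directly under /apis/.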

[root@k8s-master ~]# kubectl get apiservice | grep moni
v1.monitoring.coreos.com               Local                                     True        3h27m
v1beta1.custom.metrics.k8s.io          monitoring/prometheus-adapter             True        45h

[root@k8s-master ~]# kubectl get apiservice v1beta1.custom.metrics.k8s.io -o yaml
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1beta1.custom.metrics.k8s.io
spec:
  group: custom.metrics.k8s.io
  groupPriorityMinimum: 100
  insecureSkipTLSVerify: true
  service:
    name: prometheus-adapter
    namespace: monitoring
    port: 443
  version: v1beta1
  versionPriority: 100

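The Kubernetes apiserver exposes three APIs for working with monitoring metrics:

resource metrics API: designed to provide monitoring metrics to the core Kubernetes components
custom metrics API: designed to provide metrics to the HPA controller
external metrics API: designed for scaling on external metrics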

prometheus-adapter repository: https://github.com/kubernetes-sigs/prometheus-adapter.git

Adapter configuration

[root@k8s-master custom-metrics-api]# cat custom-metrics-configmap.yaml
####### omitted
    - seriesQuery: '{__name__=~"^container_.*",container!="POD",namespace!="",pod!=""}'
      seriesFilters:
      - isNot: ^container_.*_seconds_total$
      resources:
        overrides:
          namespace:
            resource: namespace
          pod:
            resource: pod
      name:
        matches: ^container_(.*)_total$
        as: ""
      metricsQuery: sum(rate(<<.Series>>{<<.LabelMatchers>>,container!="POD"}[1m])) by (<<.GroupBy>>)
####### omitted

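An adapter rule currently consists of four main parts:

discovery: specifies which Prometheus metrics the adapter should process
- seriesQuery: selects the set of matching metrics
- seriesFilters: filters that set
  - is: <regex>, keeps the metrics whose names match the regular expression
  - isNot: <regex>, keeps the metrics whose names do not match the regular expression
association: maps metrics to Kubernetes resources
- resources
naming: converts Prometheus metric names into the metric names used by the custom metrics API
- name
querying: produces the value of a metric requested through the custom metrics API; this value is what the HPA ultimately uses for scaling
- metricsQuery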
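
To make the discovery and naming parts concrete, here is a minimal Python sketch (not the adapter's actual Go implementation) that applies the isNot filter and the name rule from the ConfigMap excerpt above to a few series names. The helper exposed_name and the sample series names are hypothetical, and the sketch assumes that an empty as falls back to the first capture group of matches.

```python
import re

# Patterns copied from the ConfigMap excerpt.
IS_NOT = re.compile(r"^container_.*_seconds_total$")
MATCHES = re.compile(r"^container_(.*)_total$")


def exposed_name(series):
    """Return the custom metrics API name for a series, or None if it is skipped.

    Hypothetical helper: it mimics seriesFilters (isNot) and name (matches/as),
    assuming an empty `as` falls back to the first capture group.
    """
    if IS_NOT.search(series):
        return None  # dropped by the isNot filter
    m = MATCHES.search(series)
    if m is None:
        return None  # this sketch treats a non-matching name as not covered by the rule
    return m.group(1)


# Made-up series names, for illustration only.
for s in ["container_cpu_usage_seconds_total",
          "container_network_receive_bytes_total",
          "container_memory_usage_bytes"]:
    print(s, "->", exposed_name(s))
```

Only the second name survives both steps, and it would be exposed without the container_ prefix and the _total suffix.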

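The metricsQuery field is a Go template that turns a URL request into a Prometheus query. The adapter takes the fields of the custom metrics API request, splits them into the metric name, the group-resource, and one or more objects of that group-resource, and exposes them as the following fields:

Series: the metric name
LabelMatchers: a comma-separated list of label matchers for the requested objects; currently this is the label for the specific group-resource, plus the namespace label if the group-resource is namespaced
GroupBy: a comma-separated list of labels to group by; currently this is the group-resource label used in LabelMatchers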

Suppose the Prometheus series http_requests_total (exposed through the API as http_requests_per_second) looks like this:

http_requests_total{pod="pod1",service="nginx1",namespace="somens"}
http_requests_total{pod="pod2",service="nginx2",namespace="somens"}

When kubectl get --raw "/apis/{APIService-name}/v1beta1/namespaces/somens/pods/*/http_requests_per_second" is called, the fields of the metricsQuery template take the following values:

Series: "http_requests_total"
LabelMatchers: pod=~"pod1|pod2",namespace="somens"
GroupBy: pod
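
As a rough illustration of how these three fields end up in a PromQL query, the following Python sketch substitutes them into a metricsQuery string. Plain string replacement stands in for the adapter's Go template engine, the function render_metrics_query is hypothetical, and the template is an assumed rule written in the style of the ConfigMap excerpt rather than one taken from a real configuration.

```python
def render_metrics_query(template, series, label_matchers, group_by):
    """Substitute the three template fields into a metricsQuery string.

    Plain string replacement stands in for the adapter's Go template engine.
    """
    return (template
            .replace("<<.Series>>", series)
            .replace("<<.LabelMatchers>>", label_matchers)
            .replace("<<.GroupBy>>", group_by))


# Assumed metricsQuery for this metric, in the style of the ConfigMap excerpt.
TEMPLATE = "sum(rate(<<.Series>>{<<.LabelMatchers>>}[1m])) by (<<.GroupBy>>)"

query = render_metrics_query(
    TEMPLATE,
    series="http_requests_total",
    label_matchers='pod=~"pod1|pod2",namespace="somens"',
    group_by="pod",
)
print(query)
```

The printed query is what would be sent to Prometheus; grouping by pod returns one value per pod, which is the shape a Pods metric needs.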

Metric types supported by HPA

Depending on the metric type, the HPA pulls metrics from the resource paths of the aggregated APIs (metrics.k8s.io, custom.metrics.k8s.io, external.metrics.k8s.io). There are currently four types:

resource

Currently only cpu and memory are supported. The target can be an absolute value (targetAverageValue) or a percentage (targetAverageUtilization).

pods

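These are custom metrics that describe pods; the HPA fetches them from custom.metrics.k8s.io (resource metrics, by contrast, come from metrics.k8s.io). The target only supports an absolute value (targetAverageValue), which is compared with the average of the metric across all relevant pods.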

type: Pods
pods:
  metric:
    name: packets-per-second
  target:
    type: AverageValue
    averageValue: 1k
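
The averageValue above is written as a Kubernetes quantity, so 1k means 1000. The Python sketch below shows how a few common decimal suffixes could be interpreted; parse_quantity is a hypothetical helper that covers only the m, k, M and G suffixes and is not the parser Kubernetes itself uses.

```python
from fractions import Fraction

# Decimal suffixes only; binary suffixes such as Ki or Mi are left out of this sketch.
SUFFIXES = {"m": Fraction(1, 1000), "k": 1000, "M": 10**6, "G": 10**9}


def parse_quantity(text):
    """Convert a quantity string such as "1k" or "500m" to an exact number."""
    if text and text[-1] in SUFFIXES:
        return Fraction(text[:-1]) * SUFFIXES[text[-1]]
    return Fraction(text)


print(parse_quantity("1k"))    # target used in the Pods example
print(parse_quantity("500m"))  # half a unit
print(parse_quantity("30"))    # plain number, no suffix
```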

object

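These metrics describe a non-pod object in the same namespace. The target supports Value and AverageValue: with Value, the metric is compared directly with the target; with AverageValue, the metric divided by the number of relevant pods is compared with the target.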

type: Object
object:
  metric:
    name: requests-per-second
  describedObject:
    apiVersion: extensions/v1beta1
    kind: Ingress
    name: main-route
  target:
    type: Value
    value: 2k
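
The difference between the two target types is easier to see with numbers. The Python sketch below follows the replica formula given in the Kubernetes HPA documentation, desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue). It is a simplification under stated assumptions: tolerance, minReplicas/maxReplicas and stabilization are ignored, and the function names and sample numbers are made up for illustration.

```python
import math


def desired_replicas_value(current_replicas, metric_value, target_value):
    """Value target: the object's metric is compared directly with the target."""
    return math.ceil(current_replicas * metric_value / target_value)


def desired_replicas_average_value(metric_value, target_average):
    """AverageValue target: metric / pod count is compared with the target,
    so the workload is sized until each pod carries at most the target."""
    return math.ceil(metric_value / target_average)


# Illustrative numbers: 3 pods, and the Ingress reports 5000 requests per second.
print(desired_replicas_value(3, 5000, 2000))        # target: value 2k
print(desired_replicas_average_value(5000, 2000))   # target: averageValue 2k
```

With a Value target the result depends on the current replica count, whereas with an AverageValue target it depends only on the metric and the per-pod target.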

external

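A feature supported since Kubernetes 1.10. Prometheus can usually scrape metrics directly from RabbitMQ. Unfortunately, RabbitMQ's metrics endpoint does not include a queue length metric, so we use RabbitMQ Exporter to collect that data. Once the exporter is connected to RabbitMQ, we have a large number of RabbitMQ metrics that can serve as a basis for scaling; Prometheus scrapes these metrics and stores them in its time-series database. Any Prometheus metric whose name starts with rabbitmq_queue can then be served through the new external.metrics.k8s.io API as a rate over a 1-minute interval. (In other words, any pod can scale on these values, even when the two workloads are completely unrelated, provided the metric has been registered in api-resources.)

custom.metrics.k8s.io: scaling is based only on the metrics of the pod itself
external.metrics.k8s.io: pods of another workload can scale on the value (for example, nginx metrics can be used to scale mysql, and mysql_exporter metrics can also be used to scale mysql)

For example, the connection count from mongo_exporter can be used to scale nginx; the HPA fetches external metrics from external.metrics.k8s.io.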

apiVersion: apps/v1
kind: Deployment
metadata:
  name: deployment-first
spec:
  replicas: 2
  # apps/v1 requires a selector that matches the pod template labels
  selector:
    matchLabels:
      app: deployment-first
  template:
    metadata:
      labels:
        app: deployment-first
    spec:
      containers:
        - name: deployment-first
          image: nginx
          imagePullPolicy: Always
          ports:
            - containerPort: 80
              protocol: TCP
          resources:
            requests:
              cpu: "1m"
            limits:
              cpu: "100m"
---
# The metric/target layout below belongs to autoscaling/v2beta2, not v2beta1
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: app-server-mongo-conn-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: deployment-first
  minReplicas: 2
  maxReplicas: 5
  metrics:
  - type: External
    external:
      metric:
        name: mongodb_current_connection
        selector:
          matchLabels:
            queue: "worker_tasks"
      target:
        type: AverageValue
        averageValue: 30

Example external metric rules

externalRules:
- seriesQuery: '{__name__=~"^.*_queue_(length|size)$",namespace!=""}'
  resources:
    overrides:
      namespace:
        resource: namespace
  name:
    matches: ^.*_queue_(length|size)$
    as: "$0"
  metricsQuery: max(<<.Series>>{<<.LabelMatchers>>})
- seriesQuery: '{__name__=~"^.*_queue$",namespace!=""}'
  resources:
    overrides:
      namespace:
        resource: namespace
  name:
    matches: ^.*_queue$
    as: "$0"
  metricsQuery: max(<<.Series>>{<<.LabelMatchers>>})

Each metric matched by an external rule is registered as a new API resource:

[root@k8s-master ~]# kubectl api-resources | grep external
###### omitted
node_cpu_core_throttles_total                        external.metrics.k8s.io        true         ExternalMetricValueList
node_network_transmit_queue_length                   external.metrics.k8s.io        true         ExternalMetricValueList
prometheus_notifications_queue_length                external.metrics.k8s.io        true         ExternalMetricValueList
###### omitted

Fetching Kubernetes metrics

Assume the registered APIService is custom.metrics.k8s.io/v1beta1. Once the APIService is registered, the HorizontalPodAutoscaler controller fetches metrics from paths rooted at /apis/custom.metrics.k8s.io/v1beta1. Metric API paths fall into two kinds, namespaced and non-namespaced. The commands below verify whether the HPA can obtain a metric.

namespaced

Fetch a metric for an object of a given type and name in a given namespace:

kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/{namespace-name}/{object-type}/{object-name}/{metric-name...}" | jq .

For example, fetch the start_time_seconds metric of the pod named grafana in the monitor namespace:

kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/monitor/pods/grafana/start_time_seconds" | jq .

Fetch a metric for all objects of a given type in a given namespace:

kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/{namespace-name}/pods/*/{metric-name...}" | jq .

For example, fetch the start_time_seconds metric of all pods in the monitor namespace:

kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/monitor/pods/*/start_time_seconds" | jq .

Use labelSelector to select the objects that carry a particular label:

kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/{namespace-name}/{object-type}/{object-name}/{metric-name...}?labelSelector={label-name}" | jq .   
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/{namespace-name}/pods/*/{metric-name...}?labelSelector={label-name}" | jq .

non-namespaced

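Non-namespaced metrics work much like namespaced ones; the main objects are node, namespace, PersistentVolume and so on. Access to non-namespaced metrics differs in some respects from what the custom metrics API describes.

Metrics whose object is a namespace are accessed as follows: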

kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/{namespace-name}/metrics/{metric-name...}" | jq .   
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/*/metrics/{metric-name...}" | jq .

Metrics for a node are accessed as follows:

kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/nodes/{node-name}/{metric-name...}" | jq .
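
The URL patterns above can be assembled programmatically, for example when scripting these checks. The Python sketch below only builds the path strings to pass to kubectl get --raw; the helpers metric_path and namespace_metric_path are hypothetical, as are the node name, the node metric name, and the app=grafana label in the usage lines.

```python
from urllib.parse import urlencode

BASE = "/apis/custom.metrics.k8s.io/v1beta1"


def metric_path(resource, name, metric, namespace=None, label_selector=None):
    """Build a custom metrics API path following the URL patterns shown above.

    Pass name="*" to address all objects of the resource type.
    """
    prefix = f"{BASE}/namespaces/{namespace}" if namespace else BASE
    path = f"{prefix}/{resource}/{name}/{metric}"
    if label_selector:
        # URL-encode the selector so characters such as "=" are safe in the query
        path += "?" + urlencode({"labelSelector": label_selector})
    return path


def namespace_metric_path(namespace, metric):
    """Path for a metric that describes the namespace object itself."""
    return f"{BASE}/namespaces/{namespace}/metrics/{metric}"


print(metric_path("pods", "grafana", "start_time_seconds", namespace="monitor"))
print(metric_path("pods", "*", "start_time_seconds", namespace="monitor",
                  label_selector="app=grafana"))
print(metric_path("nodes", "k8s-master", "example_node_metric"))
print(namespace_metric_path("monitor", "example_namespace_metric"))
```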

Example HPA manifest

[root@k8s-master hpa]# cat hpa-v3.yml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
  namespace: default
spec:
  # The object the HPA scales; the HPA adjusts this object's pod count dynamically
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  # Minimum and maximum number of pods for the HPA
  minReplicas: 1
  maxReplicas: 10
  # Array of monitored metrics; several metric types can coexist
  metrics:
  # Object-type metric
  - type: Object
    object:
      metric:
        # Metric name
        name: requests-per-second
      # The object being monitored; the metric data comes from this object
      describedObject:
        apiVersion: networking.k8s.io/v1beta1
        kind: Ingress
        name: main-route
      # Target of type Value; Object metrics only support Value and AverageValue targets
      target:
        type: Value
        value: 10k
  # Resource-type metric
  - type: Resource
    resource:
      name: cpu
      # Target of type Utilization; Resource metrics only support Utilization and AverageValue targets
      target:
        type: Utilization
        averageUtilization: 50
  # Pods-type metric
  - type: Pods
    pods:
      metric:
        name: packets-per-second
      # Target of type AverageValue; Pods metrics only support AverageValue targets
      target:
        type: AverageValue
        averageValue: 1k
  # External-type metric
  - type: External
    external:
      metric:
        name: queue_messages_ready
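        # This field is matched against the labels of the third-party metric (the official documentation is wrong here; the correct form is shown below)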
        selector:
          matchLabels:
            env: "stage"
            app: "myapp"
      # External metrics only support Value and AverageValue targets
      target:
        type: AverageValue
        averageValue: 30