k8s Pod Health Check Strategies

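Can the service inside a Pod serve traffic the moment the Pod starts? Usually not. When we deploy a Java application, for example, the Pod status switches to Running very quickly, but the application cannot serve requests yet because it is still loading beans or initializing data. If the Pod is attached to a Service at this point, requests will fail. We therefore need a check that detects the state of the service first and only then decides whether the Pod should be attached to the Service.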

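Health check strategies

k8s currently provides the following three kinds of health checks for Pods:

  • Liveness: liveness probe. Determines whether the process in the container is still running normally. Typically used for self-healing (deciding whether the container needs to be restarted automatically).
  • Readiness: readiness probe. Determines whether the service is ready to serve. While it is not ready, the Pod is removed from the Service endpoints. Typically used to decide whether the application can be exposed to clients.
  • Startup: startup probe. Some applications need a long initialization time at startup. A longer wait is expected only during this period, so the liveness probe that follows can use a shorter interval. The liveness probe takes over only after the startup probe has succeeded, which allows different probe settings for the two phases.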
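The handoff from the startup probe to the liveness probe can be sketched as a manifest. This is a minimal sketch rather than a configuration from a real deployment: the Pod name `startup-demo`, the file name `startup-probe.yaml`, the `nginx:latest` image, and every threshold value are assumptions chosen for illustration. With `failureThreshold: 30` and `periodSeconds: 10`, the application gets roughly 30 × 10 = 300 seconds to finish starting. Once the startup probe has succeeded, the liveness probe runs with its own, tighter interval.

```shell
# Write a hypothetical manifest that combines a startup probe with a liveness probe.
cat > startup-probe.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: startup-demo
  labels:
    test: startup-demo
spec:
  containers:
  - name: app
    image: nginx:latest
    ports:
    - containerPort: 80
    startupProbe:
      httpGet:
        path: /
        port: 80
      # Startup budget: failureThreshold * periodSeconds = 30 * 10 = 300s
      failureThreshold: 30
      periodSeconds: 10
    livenessProbe:
      httpGet:
        path: /
        port: 80
      # Runs only after the startup probe has succeeded
      periodSeconds: 5
      failureThreshold: 3
EOF

# To try it on a cluster (not executed here):
# kubectl apply -f startup-probe.yaml
```

Keeping the long wait in the startup probe means the liveness probe does not need a large `initialDelaySeconds`, so a hung application can still be detected quickly after startup.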

View the description of the livenessProbe field

[root@master ~]# kubectl explain pod.spec.containers.livenessProbe
KIND:     Pod
VERSION:  v1

RESOURCE: livenessProbe <Object>

DESCRIPTION:
     Periodic probe of container liveness. Container will be restarted if the
     probe fails. Cannot be updated. More info:
     https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#container-probes

     Probe describes a health check to be performed against a container to
     determine whether it is alive or ready to receive traffic.

FIELDS:
   exec	<Object>
     One and only one of the following should be specified. Exec specifies the
     action to take.

   failureThreshold	<integer>
     Minimum consecutive failures for the probe to be considered failed after
     having succeeded. Defaults to 3. Minimum value is 1.

   httpGet	<Object>
     HTTPGet specifies the http request to perform.

   initialDelaySeconds	<integer>
     Number of seconds after the container has started before liveness probes
     are initiated. More info:
     https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#container-probes

   periodSeconds	<integer>
     How often (in seconds) to perform the probe. Default to 10 seconds. Minimum
     value is 1.

   successThreshold	<integer>
     Minimum consecutive successes for the probe to be considered successful
     after having failed. Defaults to 1. Must be 1 for liveness and startup.
     Minimum value is 1.

   tcpSocket	<Object>
     TCPSocket specifies an action involving a TCP port. TCP hooks not yet
     supported

   timeoutSeconds	<integer>
     Number of seconds after which the probe times out. Defaults to 1 second.
     Minimum value is 1. More info:
     https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#container-probes

View the description of the readinessProbe field

[root@master ~]# kubectl explain pod.spec.containers.readinessProbe
KIND:     Pod
VERSION:  v1

RESOURCE: readinessProbe <Object>

DESCRIPTION:
     Periodic probe of container service readiness. Container will be removed
     from service endpoints if the probe fails. Cannot be updated. More info:
     https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#container-probes

     Probe describes a health check to be performed against a container to
     determine whether it is alive or ready to receive traffic.

FIELDS:
   exec	<Object>
     One and only one of the following should be specified. Exec specifies the
     action to take.

   failureThreshold	<integer>
     Minimum consecutive failures for the probe to be considered failed after
     having succeeded. Defaults to 3. Minimum value is 1.

   httpGet	<Object>
     HTTPGet specifies the http request to perform.

   initialDelaySeconds	<integer>
     Number of seconds after the container has started before liveness probes
     are initiated. More info:
     https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#container-probes

   periodSeconds	<integer>
     How often (in seconds) to perform the probe. Default to 10 seconds. Minimum
     value is 1.

   successThreshold	<integer>
     Minimum consecutive successes for the probe to be considered successful
     after having failed. Defaults to 1. Must be 1 for liveness and startup.
     Minimum value is 1.

   tcpSocket	<Object>
     TCPSocket specifies an action involving a TCP port. TCP hooks not yet
     supported

   timeoutSeconds	<integer>
     Number of seconds after which the probe times out. Defaults to 1 second.
     Minimum value is 1. More info:
     https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#container-probes

View the description of the startupProbe field

[root@master health-check]# kubectl explain pod.spec.containers.startupProbe
KIND:     Pod
VERSION:  v1

RESOURCE: startupProbe <Object>

DESCRIPTION:
     StartupProbe indicates that the Pod has successfully initialized. If
     specified, no other probes are executed until this completes successfully.
     If this probe fails, the Pod will be restarted, just as if the
     livenessProbe failed. This can be used to provide different probe
     parameters at the beginning of a Pod's lifecycle, when it might take a long
     time to load data or warm a cache, than during steady-state operation. This
     cannot be updated. This is a beta feature enabled by the StartupProbe
     feature flag. More info:
     https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#container-probes

     Probe describes a health check to be performed against a container to
     determine whether it is alive or ready to receive traffic.

FIELDS:
   exec	<Object>
     One and only one of the following should be specified. Exec specifies the
     action to take.

   failureThreshold	<integer>
     Minimum consecutive failures for the probe to be considered failed after
     having succeeded. Defaults to 3. Minimum value is 1.

   httpGet	<Object>
     HTTPGet specifies the http request to perform.

   initialDelaySeconds	<integer>
     Number of seconds after the container has started before liveness probes
     are initiated. More info:
     https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#container-probes

   periodSeconds	<integer>
     How often (in seconds) to perform the probe. Default to 10 seconds. Minimum
     value is 1.

   successThreshold	<integer>
     Minimum consecutive successes for the probe to be considered successful
     after having failed. Defaults to 1. Must be 1 for liveness and startup.
     Minimum value is 1.

   tcpSocket	<Object>
     TCPSocket specifies an action involving a TCP port. TCP hooks not yet
     supported

   timeoutSeconds	<integer>
     Number of seconds after which the probe times out. Defaults to 1 second.
     Minimum value is 1. More info:
     https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#container-probes

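As the output above shows, livenessProbe, readinessProbe, and startupProbe share the same fields:

  • exec: probe by running a command inside the container
  • httpGet: probe with an HTTP GET request
  • tcpSocket: probe by opening a TCP socket
  • failureThreshold: how many consecutive failures Kubernetes tolerates before giving up. For a liveness probe, giving up means restarting the container. For a readiness probe, giving up means the Pod is marked as not ready. Defaults to 3. Minimum value is 1
  • initialDelaySeconds: how many seconds to wait after the container starts before probes are initiated. Defaults to 0 seconds. Minimum value is 0
  • periodSeconds: how often (in seconds) to perform the probe. Defaults to 10 seconds. Minimum value is 1
  • successThreshold: minimum number of consecutive successes for the probe to be considered successful after having failed. Defaults to 1. Must be 1 for liveness and startup probes. Minimum value is 1
  • timeoutSeconds: number of seconds after which a probe attempt times out. Defaults to 1 second. Minimum value is 1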
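To see how these fields combine, here is a rough calculation of the longest time a container that never passes its probe can keep running before Kubernetes gives up. It is a simplified sketch: the function name `probe_window` is hypothetical, and the formula ignores `timeoutSeconds` and kubelet scheduling jitter, so treat the result as an approximate upper bound rather than a guarantee.

```shell
# probe_window INITIAL_DELAY PERIOD FAILURE_THRESHOLD
# Approximate upper bound, in seconds from container start, before a probe
# that fails every time is considered failed:
#   initialDelaySeconds + periodSeconds * failureThreshold
probe_window() {
  initial_delay=$1
  period=$2
  failure_threshold=$3
  echo $(( initial_delay + period * failure_threshold ))
}

# Kubernetes defaults: initialDelaySeconds=0, periodSeconds=10, failureThreshold=3
echo "defaults: $(probe_window 0 10 3)s"          # prints "defaults: 30s"

# A tighter configuration: initialDelaySeconds=10, periodSeconds=5, failureThreshold=3
echo "tighter:  $(probe_window 10 5 3)s"          # prints "tighter:  25s"
```

Lowering `periodSeconds` shortens detection time at the cost of more frequent probe traffic. Raising `failureThreshold` makes the probe more tolerant of short hiccups.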

Probe tests

exec probe

[root@master health-check]# cat >liveness-exec.yaml<<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: liveness-exec
  labels:
    test: liveness-exec
spec:
  restartPolicy: OnFailure
  containers:
  - name: liveness-exec
    image: busybox:latest
    args:
    - /bin/sh
    - -c
    - touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 120
    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy

      # Delay before the first probe: 10s
      initialDelaySeconds: 10

      # Probe interval: 5s. By default the container is killed and restarted after 3 consecutive failures
      periodSeconds: 5
EOF

[root@master health-check]# kubectl apply -f liveness-exec.yaml 
pod/liveness-exec created

[root@master health-check]# kubectl get pods -l test=liveness-exec -w -o wide
NAME            READY   STATUS    RESTARTS   AGE    IP               NODE    NOMINATED NODE   READINESS GATES
liveness-exec   1/1     Running   3          4m4s   10.100.166.135   node1   <none>           <none>
liveness-exec   1/1     Running   4          5m8s   10.100.166.135   node1   <none>           <none>
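The RESTARTS column keeps growing because the container deletes /tmp/healthy 30 seconds after it starts. From then on `cat /tmp/healthy` exits with a non-zero status, and after three consecutive failures the kubelet restarts the container. An exec probe succeeds only when the command exits with status 0. The sketch below emulates that rule locally, without a cluster; the temporary file stands in for /tmp/healthy and the helper name `check` is hypothetical.

```shell
# Emulate the exec probe locally: exit status 0 means success,
# any other exit status means failure.
healthy_file=$(mktemp)          # stands in for /tmp/healthy inside the container

check() {
  if cat "$healthy_file" >/dev/null 2>&1; then
    echo success
  else
    echo failure
  fi
}

before=$(check)                 # the file exists, so cat exits with 0
rm -f "$healthy_file"
after=$(check)                  # the file is gone, so cat exits with non-zero

echo "before removal: $before"
echo "after removal:  $after"
```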

httpGet probe

[root@master health-check]# cat >http-get.yaml<<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: httpget-nginx
  labels:
    app: httpget-nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: httpget-nginx
  template:
    metadata:
      labels:
        app: httpget-nginx
    spec:
      initContainers:
      - name: init-container
        image: busybox:latest
        imagePullPolicy: IfNotPresent
        command: ["sh"]
        env:
#        - name: MY_POD_NAME
#          valueFrom:
#            fieldRef:
#              fieldPath: metadata.name
         - name: MY_POD_IP
           valueFrom:
             fieldRef:
               fieldPath: status.podIP
        args: 
          [
            "-c",
            "echo ${HOSTNAME} ${MY_POD_IP} > /wwwroot/index.html",
          ]
        volumeMounts:
        - name: wwwroot
          mountPath: "/wwwroot"
      containers:
      - name: nginx
        image: nginx:latest
        ports:
        - containerPort: 80
          protocol: TCP
        volumeMounts:
        - name: wwwroot
          mountPath: /usr/share/nginx/html/index.html
          subPath: index.html
        livenessProbe:

          # Any status code greater than or equal to 200 and less than 400 indicates success; any other code indicates failure
          httpGet:
            path: /
            port: 80
            scheme: HTTP
          initialDelaySeconds: 10
          timeoutSeconds: 2
          periodSeconds: 3

          # successThreshold must be 1 for startup and liveness probes
          successThreshold: 1
          failureThreshold: 3
        readinessProbe:

          # Any status code greater than or equal to 200 and less than 400 indicates success; any other code indicates failure
          httpGet:
            path: /
            port: 80
            scheme: HTTP
          initialDelaySeconds: 10
          timeoutSeconds: 2
          periodSeconds: 3
          successThreshold: 3
          failureThreshold: 3
      volumes:
        - name: wwwroot
          emptyDir: {}
EOF
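The status-code rule quoted in the manifest comments can be written down as a small shell function. This is only an illustration of the rule, not how the kubelet is implemented; the function name `http_probe_result` is hypothetical.

```shell
# http_probe_result STATUS_CODE
# Applies the httpGet rule: 200 <= code < 400 is a success, anything else a failure.
http_probe_result() {
  if [ "$1" -ge 200 ] && [ "$1" -lt 400 ]; then
    echo success
  else
    echo failure
  fi
}

for code in 200 301 404 503; do
  echo "$code -> $(http_probe_result "$code")"
done
```

In the Deployment above, the readiness probe sets `successThreshold: 3`, so a Pod is added to the Service endpoints only after three consecutive successful responses.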

tcpSocket probe

[root@master health-check]# cat >tcp-socket.yaml<<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: goproxy
  labels:
    app: goproxy
spec:
  containers:
  - name: goproxy
    image: k8s.gcr.io/goproxy:0.1
    ports:
    - containerPort: 8080
    readinessProbe:
      tcpSocket:
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 10
    livenessProbe:
      tcpSocket:
        port: 8080
      initialDelaySeconds: 15
      periodSeconds: 20
EOF
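A tcpSocket probe passes when a TCP connection to the given port can be established and fails otherwise. The sketch below imitates that check from a shell. It assumes bash, because `/dev/tcp/HOST/PORT` is a bash redirection feature and not a real device file, and the function name `tcp_probe` is hypothetical. The kubelet performs its own connection attempt from the node, so this is only a way to reason about the behavior by hand.

```shell
# Requires bash: /dev/tcp/HOST/PORT is handled by bash itself.
# tcp_probe HOST PORT prints "success" if a TCP connection can be opened,
# otherwise "failure".
tcp_probe() {
  if (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null; then
    echo success
  else
    echo failure
  fi
}

# Port 1 on localhost is assumed to be closed, so this should report a failure.
tcp_probe 127.0.0.1 1
```

Inside a running Pod the same function could be pointed at port 8080 to check by hand what the probe sees, provided the container image ships bash.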

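Copyright notice: original article from this site, published by xadocker on 2021-03-13, 7692 characters in total.
Reprint notice: unless otherwise stated, all articles on this site are released under the CC-4.0 license. Please credit the source when reprinting.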