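Can the service inside a Pod handle requests the moment the Pod starts? Typically, when we deploy a Java application, the Pod status changes to Running very quickly, yet the application cannot serve requests yet: it is still loading beans or initializing data. If the Pod is attached to a Service at this point, requests will fail. We therefore need a check that probes the state of the service before deciding whether to attach the Pod to the Service.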
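Health check strategies
Kubernetes currently provides the following three health checks for Pods:
- Liveness: checks whether the process in the container is still alive and healthy. Typically used for self-healing: when the probe fails, the container is restarted automatically.
- Readiness: checks whether the service is ready to serve; while it is not ready, the Pod is removed from the Service. Typically used to decide whether the application can accept external traffic.
- Startup: some applications need a long initialization time when they start. Only this phase is expected to be slow, so the liveness probe that follows can use a shorter interval. The liveness probe takes over only after the startup probe has succeeded, which lets the two phases use different probing policies.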
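The hand-off from startup probing to liveness probing described above can be sketched as a manifest. This is a minimal illustration, not taken from the original article: the Pod name `startup-demo`, the file name `startup-probe.yaml`, and the use of `nginx:latest` as a stand-in for a slow-starting application are all assumptions, and the cluster is assumed to have startupProbe available.

```shell
# Write a hypothetical manifest that pairs a lenient startupProbe
# with a stricter livenessProbe on the same endpoint.
cat > startup-probe.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: startup-demo            # hypothetical name
  labels:
    app: startup-demo
spec:
  containers:
  - name: web
    image: nginx:latest         # stand-in for a slow-starting application
    ports:
    - containerPort: 80
    startupProbe:
      httpGet:
        path: /
        port: 80
      # Up to 30 attempts, 10s apart: roughly 300s to finish starting
      periodSeconds: 10
      failureThreshold: 30
    livenessProbe:
      httpGet:
        path: /
        port: 80
      # Runs only after the startup probe has succeeded
      periodSeconds: 5
      failureThreshold: 3
EOF
```

With these values the application gets roughly failureThreshold × periodSeconds = 300 seconds to start, while a failure during steady-state operation is caught by the liveness probe in a much shorter window. The file could then be applied with `kubectl apply -f startup-probe.yaml`.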
View the livenessProbe field description
[root@master ~]# kubectl explain pod.spec.containers.livenessProbe
KIND: Pod
VERSION: v1
RESOURCE: livenessProbe <Object>
DESCRIPTION:
Periodic probe of container liveness. Container will be restarted if the
probe fails. Cannot be updated. More info:
https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#container-probes
Probe describes a health check to be performed against a container to
determine whether it is alive or ready to receive traffic.
FIELDS:
exec <Object>
One and only one of the following should be specified. Exec specifies the
action to take.
failureThreshold <integer>
Minimum consecutive failures for the probe to be considered failed after
having succeeded. Defaults to 3. Minimum value is 1.
httpGet <Object>
HTTPGet specifies the http request to perform.
initialDelaySeconds <integer>
Number of seconds after the container has started before liveness probes
are initiated. More info:
https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#container-probes
periodSeconds <integer>
How often (in seconds) to perform the probe. Default to 10 seconds. Minimum
value is 1.
successThreshold <integer>
Minimum consecutive successes for the probe to be considered successful
after having failed. Defaults to 1. Must be 1 for liveness and startup.
Minimum value is 1.
tcpSocket <Object>
TCPSocket specifies an action involving a TCP port. TCP hooks not yet
supported
timeoutSeconds <integer>
Number of seconds after which the probe times out. Defaults to 1 second.
Minimum value is 1. More info:
https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#container-probes
View the readinessProbe field description
[root@master ~]# kubectl explain pod.spec.containers.readinessProbe
KIND: Pod
VERSION: v1
RESOURCE: readinessProbe <Object>
DESCRIPTION:
Periodic probe of container service readiness. Container will be removed
from service endpoints if the probe fails. Cannot be updated. More info:
https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#container-probes
Probe describes a health check to be performed against a container to
determine whether it is alive or ready to receive traffic.
FIELDS:
exec <Object>
One and only one of the following should be specified. Exec specifies the
action to take.
failureThreshold <integer>
Minimum consecutive failures for the probe to be considered failed after
having succeeded. Defaults to 3. Minimum value is 1.
httpGet <Object>
HTTPGet specifies the http request to perform.
initialDelaySeconds <integer>
Number of seconds after the container has started before liveness probes
are initiated. More info:
https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#container-probes
periodSeconds <integer>
How often (in seconds) to perform the probe. Default to 10 seconds. Minimum
value is 1.
successThreshold <integer>
Minimum consecutive successes for the probe to be considered successful
after having failed. Defaults to 1. Must be 1 for liveness and startup.
Minimum value is 1.
tcpSocket <Object>
TCPSocket specifies an action involving a TCP port. TCP hooks not yet
supported
timeoutSeconds <integer>
Number of seconds after which the probe times out. Defaults to 1 second.
Minimum value is 1. More info:
https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#container-probes
View the startupProbe field description
[root@master health-check]# kubectl explain pod.spec.containers.startupProbe
KIND: Pod
VERSION: v1
RESOURCE: startupProbe <Object>
DESCRIPTION:
StartupProbe indicates that the Pod has successfully initialized. If
specified, no other probes are executed until this completes successfully.
If this probe fails, the Pod will be restarted, just as if the
livenessProbe failed. This can be used to provide different probe
parameters at the beginning of a Pod's lifecycle, when it might take a long
time to load data or warm a cache, than during steady-state operation. This
cannot be updated. This is a beta feature enabled by the StartupProbe
feature flag. More info:
https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#container-probes
Probe describes a health check to be performed against a container to
determine whether it is alive or ready to receive traffic.
FIELDS:
exec <Object>
One and only one of the following should be specified. Exec specifies the
action to take.
failureThreshold <integer>
Minimum consecutive failures for the probe to be considered failed after
having succeeded. Defaults to 3. Minimum value is 1.
httpGet <Object>
HTTPGet specifies the http request to perform.
initialDelaySeconds <integer>
Number of seconds after the container has started before liveness probes
are initiated. More info:
https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#container-probes
periodSeconds <integer>
How often (in seconds) to perform the probe. Default to 10 seconds. Minimum
value is 1.
successThreshold <integer>
Minimum consecutive successes for the probe to be considered successful
after having failed. Defaults to 1. Must be 1 for liveness and startup.
Minimum value is 1.
tcpSocket <Object>
TCPSocket specifies an action involving a TCP port. TCP hooks not yet
supported
timeoutSeconds <integer>
Number of seconds after which the probe times out. Defaults to 1 second.
Minimum value is 1. More info:
https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#container-probes
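As the output above shows, livenessProbe, readinessProbe, and startupProbe share the same fields:
- exec: probe by running a command inside the container
- httpGet: probe with an HTTP GET request
- tcpSocket: probe by opening a TCP socket
- failureThreshold: the number of consecutive failures Kubernetes tolerates before giving up. For a liveness probe, giving up means restarting the container. For a readiness probe, giving up means the Pod is marked as not ready. Defaults to 3. Minimum value is 1
- initialDelaySeconds: the number of seconds to wait after the container has started before probes are initiated. Defaults to 0 seconds. Minimum value is 0
- periodSeconds: how often (in seconds) to perform the probe. Defaults to 10 seconds. Minimum value is 1
- successThreshold: the minimum number of consecutive successes for the probe to be considered successful after having failed. Defaults to 1. Must be 1 for liveness and startup probes. Minimum value is 1
- timeoutSeconds: the number of seconds after which the probe times out. Defaults to 1 second. Minimum value is 1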
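To see how these fields combine, here is a small arithmetic sketch. It estimates the worst-case time between a container becoming unhealthy and the kubelet registering failureThreshold consecutive failures, as periodSeconds × failureThreshold + timeoutSeconds. The function name `detection_window` is made up for illustration, and the formula is only an approximation: real kubelet timing includes jitter and the time needed to restart the container.

```shell
# Approximate worst-case detection window in seconds.
# Arguments: periodSeconds failureThreshold timeoutSeconds
detection_window() {
  dw_period=$1
  dw_failures=$2
  dw_timeout=$3
  echo $(( dw_period * dw_failures + dw_timeout ))
}

# periodSeconds=5, failureThreshold=3 (default), timeoutSeconds=1 (default)
detection_window 5 3 1    # prints 16

# periodSeconds=3, failureThreshold=3, timeoutSeconds=2
detection_window 3 3 2    # prints 11
```

A shorter periodSeconds shrinks the window but increases probe load on the application, so the two need to be balanced.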
Probe tests
exec probe
[root@master health-check]# cat >liveness-exec.yaml<<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: liveness-exec
  labels:
    test: liveness-exec
spec:
  restartPolicy: OnFailure
  containers:
  - name: liveness-exec
    image: busybox:latest
    args:
    - /bin/sh
    - -c
    - touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 120
    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      # Delay before the first probe: 10s
      initialDelaySeconds: 10
      # Probe interval: 5s. By default, 3 consecutive failures kill and restart the container
      periodSeconds: 5
EOF
[root@master health-check]# kubectl apply -f liveness-exec.yaml
pod/liveness-exec created
[root@master health-check]# kubectl get pods -l test=liveness-exec -w -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
liveness-exec 1/1 Running 3 4m4s 10.100.166.135 node1 <none> <none>
liveness-exec 1/1 Running 4 5m8s 10.100.166.135 node1 <none> <none>
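The RESTARTS column above shows that the container keeps being restarted. To confirm that the liveness probe is the cause, the restart count and the Pod's events can be queried directly. The two helper functions below are a sketch whose names (`restart_count`, `probe_events`) are invented for illustration; the kubectl calls they wrap are standard.

```shell
# Print how many times the first container of a Pod has been restarted.
restart_count() {
  kubectl get pod "$1" -o jsonpath='{.status.containerStatuses[0].restartCount}'
}

# List the events recorded for a Pod; failed probes normally appear here.
probe_events() {
  kubectl get events --field-selector "involvedObject.name=$1"
}
```

For example, `restart_count liveness-exec` prints the current restart count, and `probe_events liveness-exec` should list the probe failures, since `cat /tmp/healthy` fails once the file has been removed.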
httpGet probe
[root@master health-check]# cat >http-get.yaml<<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: httpget-nginx
  labels:
    app: httpget-nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: httpget-nginx
  template:
    metadata:
      labels:
        app: httpget-nginx
    spec:
      initContainers:
      - name: init-container
        image: busybox:latest
        imagePullPolicy: IfNotPresent
        command: ["sh"]
        env:
        # - name: MY_POD_NAME
        #   valueFrom:
        #     fieldRef:
        #       fieldPath: metadata.name
        - name: MY_POD_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
        args:
          [
            "-c",
            "echo ${HOSTNAME} ${MY_POD_IP} > /wwwroot/index.html",
          ]
        volumeMounts:
        - name: wwwroot
          mountPath: "/wwwroot"
      containers:
      - name: nginx
        image: nginx:latest
        ports:
        - containerPort: 80
          protocol: TCP
        volumeMounts:
        - name: wwwroot
          mountPath: /usr/share/nginx/html/index.html
          subPath: index.html
        livenessProbe:
          # Any status code >= 200 and < 400 indicates success; any other code indicates failure
          httpGet:
            path: /
            port: 80
            scheme: HTTP
          initialDelaySeconds: 10
          timeoutSeconds: 2
          periodSeconds: 3
          # successThreshold must be 1 for startup and liveness probes
          successThreshold: 1
          failureThreshold: 3
        readinessProbe:
          # Any status code >= 200 and < 400 indicates success; any other code indicates failure
          httpGet:
            path: /
            port: 80
            scheme: HTTP
          initialDelaySeconds: 10
          timeoutSeconds: 2
          periodSeconds: 3
          successThreshold: 3
          failureThreshold: 3
      volumes:
      - name: wwwroot
        emptyDir: {}
EOF
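The article does not show how this Deployment was verified. One possible check, sketched below, lists each Pod with the ready flag that the readiness probe controls. The helper name `ready_status` is an assumption made for illustration; it wraps a standard `kubectl get pods` call with a jsonpath template.

```shell
# Print each Pod matching a label selector together with the ready flag
# of its first container (true once the readiness probe has passed).
ready_status() {
  kubectl get pods -l "$1" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[0].ready}{"\n"}{end}'
}
```

After `kubectl apply -f http-get.yaml`, running `ready_status app=httpget-nginx` a few times should show the flags change from false to true once the readiness probe has succeeded three times in a row, as required by successThreshold: 3.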
tcpSocket probe
[root@master health-check]# cat >tcp-socket.yaml<<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: goproxy
  labels:
    app: goproxy
spec:
  containers:
  - name: goproxy
    image: k8s.gcr.io/goproxy:0.1
    ports:
    - containerPort: 8080
    readinessProbe:
      tcpSocket:
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 10
    livenessProbe:
      tcpSocket:
        port: 8080
      initialDelaySeconds: 15
      periodSeconds: 20
EOF