Introduction
Affinity and anti-affinity rules let you spread the Pods of one workload across nodes, improving service availability. They can also carve the cluster up at the architecture level, dedicating groups of nodes to different parts of the business.
[root@k8s-master ~]# kubectl explain deploy.spec.template.spec.affinity
KIND: Deployment
VERSION: apps/v1
RESOURCE: affinity <Object>
DESCRIPTION:
If specified, the pod's scheduling constraints
Affinity is a group of affinity scheduling rules.
FIELDS:
nodeAffinity <Object>
Describes node affinity scheduling rules for the pod.
podAffinity <Object>
Describes pod affinity scheduling rules (e.g. co-locate this pod in the
same node, zone, etc. as some other pod(s)).
podAntiAffinity <Object>
Describes pod anti-affinity scheduling rules (e.g. avoid putting this pod
in the same node, zone, etc. as some other pod(s)).
nodeAffinity
[root@k8s-master ~]# kubectl explain deploy.spec.template.spec.affinity.nodeAffinity
KIND: Deployment
VERSION: apps/v1
RESOURCE: nodeAffinity <Object>
DESCRIPTION:
Describes node affinity scheduling rules for the pod.
Node affinity is a group of node affinity scheduling rules.
FIELDS:
preferredDuringSchedulingIgnoredDuringExecution <[]Object>
The scheduler will prefer to schedule pods to nodes that satisfy the
affinity expressions specified by this field, but it may choose a node that
violates one or more of the expressions. The node that is most preferred is
the one with the greatest sum of weights, i.e. for each node that meets all
of the scheduling requirements (resource request, requiredDuringScheduling
affinity expressions, etc.), compute a sum by iterating through the
elements of this field and adding "weight" to the sum if the node matches
the corresponding matchExpressions; the node(s) with the highest sum are
the most preferred.
requiredDuringSchedulingIgnoredDuringExecution <Object>
If the affinity requirements specified by this field are not met at
scheduling time, the pod will not be scheduled onto the node. If the
affinity requirements specified by this field cease to be met at some point
during pod execution (e.g. due to an update), the system may or may not try
to eventually evict the pod from its node.
podAffinity
[root@k8s-master ~]# kubectl explain deploy.spec.template.spec.affinity.podAffinity
KIND: Deployment
VERSION: apps/v1
RESOURCE: podAffinity <Object>
DESCRIPTION:
Describes pod affinity scheduling rules (e.g. co-locate this pod in the
same node, zone, etc. as some other pod(s)).
Pod affinity is a group of inter pod affinity scheduling rules.
FIELDS:
preferredDuringSchedulingIgnoredDuringExecution <[]Object>
The scheduler will prefer to schedule pods to nodes that satisfy the
affinity expressions specified by this field, but it may choose a node that
violates one or more of the expressions. The node that is most preferred is
the one with the greatest sum of weights, i.e. for each node that meets all
of the scheduling requirements (resource request, requiredDuringScheduling
affinity expressions, etc.), compute a sum by iterating through the
elements of this field and adding "weight" to the sum if the node has pods
which matches the corresponding podAffinityTerm; the node(s) with the
highest sum are the most preferred.
requiredDuringSchedulingIgnoredDuringExecution <[]Object>
If the affinity requirements specified by this field are not met at
scheduling time, the pod will not be scheduled onto the node. If the
affinity requirements specified by this field cease to be met at some point
during pod execution (e.g. due to a pod label update), the system may or
may not try to eventually evict the pod from its node. When there are
multiple elements, the lists of nodes corresponding to each podAffinityTerm
are intersected, i.e. all terms must be satisfied.
podAntiAffinity
[root@k8s-master ~]# kubectl explain deploy.spec.template.spec.affinity.podAntiAffinity
KIND: Deployment
VERSION: apps/v1
RESOURCE: podAntiAffinity <Object>
DESCRIPTION:
Describes pod anti-affinity scheduling rules (e.g. avoid putting this pod
in the same node, zone, etc. as some other pod(s)).
Pod anti affinity is a group of inter pod anti affinity scheduling rules.
FIELDS:
preferredDuringSchedulingIgnoredDuringExecution <[]Object>
The scheduler will prefer to schedule pods to nodes that satisfy the
anti-affinity expressions specified by this field, but it may choose a node
that violates one or more of the expressions. The node that is most
preferred is the one with the greatest sum of weights, i.e. for each node
that meets all of the scheduling requirements (resource request,
requiredDuringScheduling anti-affinity expressions, etc.), compute a sum by
iterating through the elements of this field and adding "weight" to the sum
if the node has pods which matches the corresponding podAffinityTerm; the
node(s) with the highest sum are the most preferred.
requiredDuringSchedulingIgnoredDuringExecution <[]Object>
If the anti-affinity requirements specified by this field are not met at
scheduling time, the pod will not be scheduled onto the node. If the
anti-affinity requirements specified by this field cease to be met at some
point during pod execution (e.g. due to a pod label update), the system may
or may not try to eventually evict the pod from its node. When there are
multiple elements, the lists of nodes corresponding to each podAffinityTerm
are intersected, i.e. all terms must be satisfied.
As the output above shows, all three rule types share the same pair of fields:
- preferredDuringSchedulingIgnoredDuringExecution: soft (preferred) affinity
- requiredDuringSchedulingIgnoredDuringExecution: hard (required) affinity
Soft affinity: combined with "operator: NotIn", it means the scheduler tries to keep the Pod off matching nodes, but if no non-matching node is available the Pod may still be placed on a matching one.
Hard affinity: combined with "operator: In", the Pod must be scheduled onto a node satisfying the condition; otherwise it sits in Pending.
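For orientation, a minimal skeleton of where these fields sit in a Deployment manifest (the ... values are elided; the field paths are the ones shown by kubectl explain above):
spec:
  template:
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution: ...  # hard: must match, or the Pod stays Pending
          preferredDuringSchedulingIgnoredDuringExecution: ... # soft: best effort, weighted
        podAffinity: ...
        podAntiAffinity: ...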
| Scheduling policy | Matches labels of | Operators | Topology domain support | Scheduling target |
| --- | --- | --- | --- | --- |
| nodeAffinity | node | In, NotIn, Exists, DoesNotExist, Gt, Lt | No | a specific node |
| podAffinity | pod | In, NotIn, Exists, DoesNotExist | Yes | same topology domain as the specified pods |
| podAntiAffinity | pod | In, NotIn, Exists, DoesNotExist | Yes | a different topology domain from the specified pods |
Operators (illustrated in the sketch after this list)
- In: the label's value is in the given list
- NotIn: the label's value is not in the given list
- Gt: the label's value is greater than the given value
- Lt: the label's value is less than the given value
- Exists: the label exists
- DoesNotExist: the label does not exist
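A sketch of how the less common operators look inside a nodeAffinity matchExpressions list; the label keys gpu and cpu-count are hypothetical:
- matchExpressions:
  - key: gpu        # Exists: the node carries a gpu label, value irrelevant (no values list allowed)
    operator: Exists
  - key: cpu-count  # Gt: the label value, interpreted as an integer, is greater than 8
    operator: Gt
    values:
    - "8"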
Note: every kind of affinity depends on labels, so check what is available first.
[root@k8s-master ~]# kubectl get nodes --show-labels
NAME STATUS ROLES AGE VERSION LABELS
k8s-master Ready compute,master 109d v1.18.9 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-master,kubernetes.io/os=linux,node-role.kubernetes.io/compute=dedicated-middleware,node-role.kubernetes.io/master=
k8s-node-01 Ready <none> 4h18m v1.18.9 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-node-01,kubernetes.io/os=linux
k8s-node-02 Ready <none> 4h16m v1.18.9 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-node-02,kubernetes.io/os=linux
[root@k8s-master ~]# kubectl get pods -n kube-system --show-labels
NAME READY STATUS RESTARTS AGE LABELS
calico-kube-controllers-5b8b769fcd-8hlzn 1/1 Running 23 109d k8s-app=calico-kube-controllers,pod-template-hash=5b8b769fcd
calico-node-fwcss 1/1 Running 23 109d controller-revision-hash=b9dd4bd9f,k8s-app=calico-node,pod-template-generation=1
calico-node-m84rz 1/1 Running 0 4h17m controller-revision-hash=b9dd4bd9f,k8s-app=calico-node,pod-template-generation=1
calico-node-tvs89 1/1 Running 0 4h19m controller-revision-hash=b9dd4bd9f,k8s-app=calico-node,pod-template-generation=1
coredns-65556b4c97-dhkz4 1/1 Running 5 24d k8s-app=kube-dns,pod-template-hash=65556b4c97
etcd-k8s-master 1/1 Running 23 109d component=etcd,tier=control-plane
kube-apiserver-k8s-master 1/1 Running 23 109d component=kube-apiserver,tier=control-plane
kube-controller-manager-k8s-master 1/1 Running 24 109d component=kube-controller-manager,tier=control-plane
kube-proxy-9b84w 1/1 Running 0 4h19m controller-revision-hash=949786769,k8s-app=kube-proxy,pod-template-generation=1
kube-proxy-hftdw 1/1 Running 23 109d controller-revision-hash=949786769,k8s-app=kube-proxy,pod-template-generation=1
kube-proxy-x4lnq 1/1 Running 0 4h17m controller-revision-hash=949786769,k8s-app=kube-proxy,pod-template-generation=1
kube-scheduler-k8s-master 1/1 Running 23 109d component=kube-scheduler,tier=control-plane
metrics-server-86499f7fd8-pdw6d 1/1 Running 3 9d k8s-app=metrics-server,pod-template-hash=86499f7fd8
nfs-client-provisioner-df46b8d64-jwgd4 1/1 Running 23 109d app=nfs-client-provisioner,pod-template-hash=df46b8d64
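The soft-affinity example later in this post assumes custom node labels such as departement=dep-a and ssd=true, which do not exist above. To reproduce it, they can be added and removed with kubectl label:
[root@k8s-master ~]# kubectl label nodes k8s-node-01 departement=dep-a
[root@k8s-master ~]# kubectl label nodes k8s-node-02 ssd=true
# a trailing dash removes the label again
[root@k8s-master ~]# kubectl label nodes k8s-node-02 ssd-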
Affinity
Hard affinity
Hard affinity with nodeAffinity
[root@k8s-master nginx]# cat nginx-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx1
  labels:
    app: nginx1
spec:
  replicas: 10
  selector:
    matchLabels:
      app: nginx1
  template:
    metadata:
      labels:
        app: nginx1
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/hostname # the node label to match
                operator: NotIn # keep the Pod off nodes whose kubernetes.io/hostname value is in the values list
                values:
                - k8s-node-01
      initContainers:
      - name: init-container
        image: busybox:latest
        imagePullPolicy: IfNotPresent
        command: ["sh"]
        env:
        # - name: MY_POD_NAME
        #   valueFrom:
        #     fieldRef:
        #       fieldPath: metadata.name
        - name: MY_POD_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
        args:
          [
            "-c",
            "echo ${HOSTNAME} ${MY_POD_IP} > /wwwroot/index.html",
          ]
        volumeMounts:
        - name: wwwroot
          mountPath: "/wwwroot"
      containers:
      - name: nginx
        image: nginx:latest
        ports:
        - containerPort: 80
          protocol: TCP
        volumeMounts:
        - name: wwwroot
          mountPath: /usr/share/nginx/html/index.html
          subPath: index.html
      volumes:
      - name: wwwroot
        emptyDir: {}
With this rule in place, the scheduler keeps every Pod off that node and picks other suitable nodes instead.
# as expected, every pod is scheduled onto a node other than k8s-node-01
[root@k8s-master nginx]# kubectl get pods -o wide | grep nginx1
nginx1-664f458845-2bbcw 1/1 Running 0 2m38s 10.100.44.197 k8s-node-02 <none> <none>
nginx1-664f458845-64rg2 1/1 Running 0 2m39s 10.100.44.198 k8s-node-02 <none> <none>
nginx1-664f458845-8tbxt 1/1 Running 0 2m39s 10.100.44.200 k8s-node-02 <none> <none>
nginx1-664f458845-bz672 1/1 Running 0 2m38s 10.100.44.195 k8s-node-02 <none> <none>
nginx1-664f458845-ft5c9 1/1 Running 0 2m38s 10.100.44.201 k8s-node-02 <none> <none>
nginx1-664f458845-jp8tz 1/1 Running 0 2m39s 10.100.44.196 k8s-node-02 <none> <none>
nginx1-664f458845-lf5k7 1/1 Running 0 2m38s 10.100.44.202 k8s-node-02 <none> <none>
nginx1-664f458845-sn9c5 1/1 Running 0 2m39s 10.100.44.199 k8s-node-02 <none> <none>
nginx1-664f458845-t8nrb 1/1 Running 0 2m39s 10.100.44.194 k8s-node-02 <none> <none>
nginx1-664f458845-vbwxn 1/1 Running 0 2m39s 10.100.44.193 k8s-node-02 <none> <none>
I had made the master schedulable, yet with this policy applied not a single pod was placed on it; only after excluding node-02 as well did pods land on the master. Apparently the master is chosen only as a last resort, when no other node is available.
[root@k8s-master nginx]# cat nginx-deployment.yaml
####### snipped
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/hostname # the node label to match
                operator: NotIn # keep the Pod off nodes whose kubernetes.io/hostname value is in the values list
                values:
                - k8s-node-01
                - k8s-node-02
####### snipped
[root@k8s-master nginx]# kubectl get pods -o wide | grep nginx1
nginx1-6cb8f796f7-25f5l 1/1 Running 0 15m 10.100.235.219 k8s-master <none> <none>
nginx1-6cb8f796f7-2qfcv 1/1 Running 0 16m 10.100.235.194 k8s-master <none> <none>
nginx1-6cb8f796f7-78t2r 1/1 Running 0 15m 10.100.235.242 k8s-master <none> <none>
nginx1-6cb8f796f7-b9c5n 1/1 Running 0 16m 10.100.235.236 k8s-master <none> <none>
nginx1-6cb8f796f7-ccpd8 1/1 Running 0 16m 10.100.235.254 k8s-master <none> <none>
nginx1-6cb8f796f7-jhp2k 1/1 Running 0 15m 10.100.235.213 k8s-master <none> <none>
nginx1-6cb8f796f7-l6s9h 1/1 Running 0 16m 10.100.235.231 k8s-master <none> <none>
nginx1-6cb8f796f7-qnjvg 1/1 Running 0 15m 10.100.235.199 k8s-master <none> <none>
nginx1-6cb8f796f7-tgx6l 1/1 Running 0 16m 10.100.235.202 k8s-master <none> <none>
nginx1-6cb8f796f7-twhxq 1/1 Running 0 15m 10.100.235.226 k8s-master <none> <none>
Now add the master to the exclusion list too, and the new pods of the rolling update stay stuck in Pending:
[root@k8s-master nginx]# kubectl get pods -o wide | grep nginx1
nginx1-6cb8f796f7-2qfcv 1/1 Running 0 20m 10.100.235.194 k8s-master <none> <none>
nginx1-6cb8f796f7-b9c5n 1/1 Running 0 20m 10.100.235.236 k8s-master <none> <none>
nginx1-6cb8f796f7-ccpd8 1/1 Running 0 20m 10.100.235.254 k8s-master <none> <none>
nginx1-6cb8f796f7-jhp2k 1/1 Running 0 19m 10.100.235.213 k8s-master <none> <none>
nginx1-6cb8f796f7-l6s9h 1/1 Running 0 20m 10.100.235.231 k8s-master <none> <none>
nginx1-6cb8f796f7-qnjvg 1/1 Running 0 19m 10.100.235.199 k8s-master <none> <none>
nginx1-6cb8f796f7-tgx6l 1/1 Running 0 20m 10.100.235.202 k8s-master <none> <none>
nginx1-6cb8f796f7-twhxq 1/1 Running 0 19m 10.100.235.226 k8s-master <none> <none>
nginx1-9f7cb6d58-mvhwr 0/1 Pending 0 2m54s <none> <none> <none> <none>
nginx1-9f7cb6d58-pz6vn 0/1 Pending 0 2m53s <none> <none> <none> <none>
nginx1-9f7cb6d58-rkbm5 0/1 Pending 0 2m53s <none> <none> <none> <none>
nginx1-9f7cb6d58-tsc7z 0/1 Pending 0 2m54s <none> <none> <none> <none>
nginx1-9f7cb6d58-x4c8w 0/1 Pending 0 2m54s <none> <none> <none> <none>
[root@k8s-master nginx]# kubectl describe pods nginx1-9f7cb6d58-mvhwr
Name: nginx1-9f7cb6d58-mvhwr
Namespace: default
Priority: 0
Node: <none>
Labels: app=nginx1
pod-template-hash=9f7cb6d58
Annotations: <none>
Status: Pending
IP:
IPs: <none>
Controlled By: ReplicaSet/nginx1-9f7cb6d58
Init Containers:
init-container:
Image: busybox:latest
Port: <none>
Host Port: <none>
Command:
sh
Args:
-c
echo ${HOSTNAME} ${MY_POD_IP} > /wwwroot/index.html
Environment:
MY_POD_IP: (v1:status.podIP)
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-scngg (ro)
/wwwroot from wwwroot (rw)
Containers:
nginx:
Image: nginx:latest
Port: 80/TCP
Host Port: 0/TCP
Environment: <none>
Mounts:
/usr/share/nginx/html/index.html from wwwroot (rw,path="index.html")
/var/run/secrets/kubernetes.io/serviceaccount from default-token-scngg (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
wwwroot:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
default-token-scngg:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-scngg
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 3m42s default-scheduler 0/3 nodes are available: 3 node(s) didn't match node selector.
Warning FailedScheduling 3m42s default-scheduler 0/3 nodes are available: 3 node(s) didn't match node selector.
Hard affinity with podAffinity
Create two Deployments:
- Deployment A uses nodeAffinity to pin its pods to k8s-node-01 and labels them group=a
- Deployment B uses podAffinity to land on whichever nodes run pods labeled group=a
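topologyKey defines what counts as "the same place": with kubernetes.io/hostname every node is its own topology domain. As a sketch (not used below), zone-wide co-location would instead use the standard zone label:
podAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
  - labelSelector:
      matchLabels:
        group: a
    topologyKey: topology.kubernetes.io/zone # any node in the same zone as a group=a pod qualifies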
Deployment A
[root@k8s-master nginx]# cat nginx-a-deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-a
  labels:
    app: nginx-a
spec:
  replicas: 10
  selector:
    matchLabels:
      app: nginx-a
  template:
    metadata:
      labels:
        app: nginx-a
        group: a # the label Deployment B's podAffinity selects on (visible in the pod listing below)
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/hostname
                operator: In
                values:
                - k8s-node-01
      initContainers:
      - name: init-container
        image: busybox:latest
        imagePullPolicy: IfNotPresent
        command: ["sh"]
        env:
        # - name: MY_POD_NAME
        #   valueFrom:
        #     fieldRef:
        #       fieldPath: metadata.name
        - name: MY_POD_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
        args:
          [
            "-c",
            "echo ${HOSTNAME} ${MY_POD_IP} > /wwwroot/index.html",
          ]
        volumeMounts:
        - name: wwwroot
          mountPath: "/wwwroot"
      containers:
      - name: nginx
        image: nginx:latest
        ports:
        - containerPort: 80
          protocol: TCP
        volumeMounts:
        - name: wwwroot
          mountPath: /usr/share/nginx/html/index.html
          subPath: index.html
      volumes:
      - name: wwwroot
        emptyDir: {}
[root@k8s-master nginx]# kubectl get pods -o wide --show-labels | grep nginx-a
nginx-a-55c8c877d5-29smq 1/1 Running 0 2m35s 10.100.154.212 k8s-node-01 <none> <none> app=nginx-a,group=a,pod-template-hash=55c8c877d5
nginx-a-55c8c877d5-5s92q 1/1 Running 0 2m35s 10.100.154.206 k8s-node-01 <none> <none> app=nginx-a,group=a,pod-template-hash=55c8c877d5
nginx-a-55c8c877d5-5tbf8 1/1 Running 0 2m35s 10.100.154.203 k8s-node-01 <none> <none> app=nginx-a,group=a,pod-template-hash=55c8c877d5
nginx-a-55c8c877d5-6qzdp 1/1 Running 0 2m35s 10.100.154.210 k8s-node-01 <none> <none> app=nginx-a,group=a,pod-template-hash=55c8c877d5
nginx-a-55c8c877d5-7zr2b 1/1 Running 0 2m35s 10.100.154.208 k8s-node-01 <none> <none> app=nginx-a,group=a,pod-template-hash=55c8c877d5
nginx-a-55c8c877d5-bqnvw 1/1 Running 0 2m35s 10.100.154.207 k8s-node-01 <none> <none> app=nginx-a,group=a,pod-template-hash=55c8c877d5
nginx-a-55c8c877d5-s7fjn 1/1 Running 0 2m35s 10.100.154.209 k8s-node-01 <none> <none> app=nginx-a,group=a,pod-template-hash=55c8c877d5
nginx-a-55c8c877d5-w7nsq 1/1 Running 0 2m35s 10.100.154.211 k8s-node-01 <none> <none> app=nginx-a,group=a,pod-template-hash=55c8c877d5
nginx-a-55c8c877d5-wkss5 1/1 Running 0 2m35s 10.100.154.204 k8s-node-01 <none> <none> app=nginx-a,group=a,pod-template-hash=55c8c877d5
nginx-a-55c8c877d5-z4q2w 1/1 Running 0 2m35s 10.100.154.205 k8s-node-01 <none> <none> app=nginx-a,group=a,pod-template-hash=55c8c877d5
Deployment B
[root@k8s-master nginx]# cat nginx-b-deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-b
  labels:
    app: nginx-b
spec:
  replicas: 10
  selector:
    matchLabels:
      app: nginx-b
  template:
    metadata:
      labels:
        app: nginx-b
    spec:
      affinity:
        podAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: group
                operator: In
                values:
                - a
            topologyKey: kubernetes.io/hostname
      initContainers:
      - name: init-container
        image: busybox:latest
        imagePullPolicy: IfNotPresent
        command: ["sh"]
        env:
        # - name: MY_POD_NAME
        #   valueFrom:
        #     fieldRef:
        #       fieldPath: metadata.name
        - name: MY_POD_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
        args:
          [
            "-c",
            "echo ${HOSTNAME} ${MY_POD_IP} > /wwwroot/index.html",
          ]
        volumeMounts:
        - name: wwwroot
          mountPath: "/wwwroot"
      containers:
      - name: nginx
        image: nginx:latest
        ports:
        - containerPort: 80
          protocol: TCP
        volumeMounts:
        - name: wwwroot
          mountPath: /usr/share/nginx/html/index.html
          subPath: index.html
      volumes:
      - name: wwwroot
        emptyDir: {}
[root@k8s-master nginx]# kubectl get pods -o wide | grep nginx
nginx-a-55c8c877d5-29smq 1/1 Running 0 21m 10.100.154.212 k8s-node-01 <none> <none>
nginx-a-55c8c877d5-5s92q 1/1 Running 0 21m 10.100.154.206 k8s-node-01 <none> <none>
nginx-a-55c8c877d5-5tbf8 1/1 Running 0 21m 10.100.154.203 k8s-node-01 <none> <none>
nginx-a-55c8c877d5-6qzdp 1/1 Running 0 21m 10.100.154.210 k8s-node-01 <none> <none>
nginx-a-55c8c877d5-7zr2b 1/1 Running 0 21m 10.100.154.208 k8s-node-01 <none> <none>
nginx-a-55c8c877d5-bqnvw 1/1 Running 0 21m 10.100.154.207 k8s-node-01 <none> <none>
nginx-a-55c8c877d5-s7fjn 1/1 Running 0 21m 10.100.154.209 k8s-node-01 <none> <none>
nginx-a-55c8c877d5-w7nsq 1/1 Running 0 21m 10.100.154.211 k8s-node-01 <none> <none>
nginx-a-55c8c877d5-wkss5 1/1 Running 0 21m 10.100.154.204 k8s-node-01 <none> <none>
nginx-a-55c8c877d5-z4q2w 1/1 Running 0 21m 10.100.154.205 k8s-node-01 <none> <none>
nginx-b-7bfbc47b99-4ds8b 0/1 PodInitializing 0 44s 10.100.154.228 k8s-node-01 <none> <none>
nginx-b-7bfbc47b99-5w6w2 1/1 Running 0 44s 10.100.154.223 k8s-node-01 <none> <none>
nginx-b-7bfbc47b99-fs6qk 1/1 Running 0 44s 10.100.154.232 k8s-node-01 <none> <none>
nginx-b-7bfbc47b99-jwb5d 0/1 PodInitializing 0 44s 10.100.154.229 k8s-node-01 <none> <none>
nginx-b-7bfbc47b99-pgt9l 0/1 PodInitializing 0 44s 10.100.154.226 k8s-node-01 <none> <none>
nginx-b-7bfbc47b99-q5fmc 0/1 PodInitializing 0 44s 10.100.154.231 k8s-node-01 <none> <none>
nginx-b-7bfbc47b99-rnd55 1/1 Running 0 44s 10.100.154.224 k8s-node-01 <none> <none>
nginx-b-7bfbc47b99-sgljk 1/1 Running 0 44s 10.100.154.225 k8s-node-01 <none> <none>
nginx-b-7bfbc47b99-vz7js 1/1 Running 0 44s 10.100.154.227 k8s-node-01 <none> <none>
nginx-b-7bfbc47b99-wj68d 1/1 Running 0 44s 10.100.154.230 k8s-node-01 <none> <none>
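All ten nginx-b pods land on k8s-node-01 next to the group=a pods, exactly as the required podAffinity demands.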
Soft affinity
Soft affinity is best-effort control: the Pod should preferably be placed on a node satisfying the affinity condition, but when no node does, it still accepts being scheduled onto a node that does not. When several soft conditions coexist, each can carry a weight attribute (range 1-100) to rank its priority; the higher the number, the more strongly the scheduler prefers nodes matching that condition.
Soft affinity with nodeAffinity
The spec.affinity.nodeAffinity.preferredDuringSchedulingIgnoredDuringExecution field of a Pod spec defines preferred Pod-to-node affinity. Each entry nests the preference and weight fields:
- weight: the priority of this soft condition, range 1-100; higher numbers win
- preference: a node selector object supporting the matchExpressions and matchFields mechanisms, used exactly as in required affinity
Adding a soft-affinity configuration such as:
affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 60
      preference:
        matchExpressions:
        - key: departement
          operator: In
          values:
          - dep-a
    - weight: 30
      preference:
        matchExpressions:
        - key: ssd
          operator: In
          values:
          - "true" # matchExpressions values are strings, so the boolean-looking value must be quoted
This example defines two soft conditions: the first selects nodes labeled departement=dep-a with weight 60, the second selects nodes labeled ssd=true with weight 30. The cluster's nodes then fall into four classes:
- nodes with both departement=dep-a and ssd=true: highest score, 60 + 30 = 90
- nodes with only departement=dep-a: score 60
- nodes with only ssd=true: score 30
- nodes with neither label: score 0
The Pod prefers the first class and works its way down the list; the last class still scores 0 yet remains eligible, so unlike hard affinity the Pod never ends up Pending for lack of a match.
Soft affinity with podAffinity
Preferred inter-pod affinity is expressed in the spec.affinity.podAffinity.preferredDuringSchedulingIgnoredDuringExecution field, whose value is a list of objects nesting the weight and podAffinityTerm fields:
- weight: a number in the range 1-100 defining the weight of this soft condition
- podAffinityTerm: the pod selector, nesting the labelSelector, namespaces, and topologyKey fields
Adding a policy such as:
affinity:
  podAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 80
      podAffinityTerm:
        labelSelector:
          matchExpressions:
          - key: group
            operator: In
            values:
            - a
        topologyKey: kubernetes.io/hostname
    - weight: 40
      podAffinityTerm:
        labelSelector:
          matchExpressions:
          - key: group
            operator: NotIn
            values:
            - b
        topologyKey: kubernetes.io/hostname
In this example the most preferred nodes are those already running pods labeled group=a while running none labeled group=b; the worst choice is the opposite: a node without group=a pods that does host group=b pods.
Anti-affinity
This mechanism lets the pods of one workload spread evenly across nodes, reducing the chance of them piling up on a few nodes.
Soft anti-affinity with podAntiAffinity
The scheduler tries not to put mutually exclusive Pods in the same topology domain, but when the constraint cannot be satisfied it still co-locates them instead of leaving the Pod in Pending. In the run below, the nine replicas end up spread across all three nodes, though not perfectly evenly.
[root@k8s-master nginx]# cat nginx-d-deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-d
  labels:
    app: nginx-d
spec:
  replicas: 9
  selector:
    matchLabels:
      app: nginx-d
      tier: backend
  template:
    metadata:
      labels:
        app: nginx-d
        tier: backend
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 60
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: tier
                  operator: In
                  values:
                  - backend
              topologyKey: kubernetes.io/hostname
      initContainers:
      - name: init-container
        image: busybox:latest
        imagePullPolicy: IfNotPresent
        command: ["sh"]
        env:
        # - name: MY_POD_NAME
        #   valueFrom:
        #     fieldRef:
        #       fieldPath: metadata.name
        - name: MY_POD_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
        args:
          [
            "-c",
            "echo ${HOSTNAME} ${MY_POD_IP} > /wwwroot/index.html",
          ]
        volumeMounts:
        - name: wwwroot
          mountPath: "/wwwroot"
      containers:
      - name: nginx
        image: nginx:latest
        ports:
        - containerPort: 80
          protocol: TCP
        volumeMounts:
        - name: wwwroot
          mountPath: /usr/share/nginx/html/index.html
          subPath: index.html
      volumes:
      - name: wwwroot
        emptyDir: {}
[root@k8s-master nginx]# kubectl get pods -o wide --show-labels | grep nginx-d
nginx-d-6985c677dd-26pl9 1/1 Running 0 56s 10.100.235.222 k8s-master <none> <none> app=nginx-d,pod-template-hash=6985c677dd,tier=backend
nginx-d-6985c677dd-b2x5z 1/1 Running 0 56s 10.100.154.235 k8s-node-01 <none> <none> app=nginx-d,pod-template-hash=6985c677dd,tier=backend
nginx-d-6985c677dd-cv74q 1/1 Running 0 56s 10.100.235.221 k8s-master <none> <none> app=nginx-d,pod-template-hash=6985c677dd,tier=backend
nginx-d-6985c677dd-hf5k5 1/1 Running 0 56s 10.100.44.205 k8s-node-02 <none> <none> app=nginx-d,pod-template-hash=6985c677dd,tier=backend
nginx-d-6985c677dd-hjmm2 1/1 Running 0 56s 10.100.154.234 k8s-node-01 <none> <none> app=nginx-d,pod-template-hash=6985c677dd,tier=backend
nginx-d-6985c677dd-hs6pw 1/1 Running 0 56s 10.100.44.207 k8s-node-02 <none> <none> app=nginx-d,pod-template-hash=6985c677dd,tier=backend
nginx-d-6985c677dd-nfmfm 1/1 Running 0 56s 10.100.44.204 k8s-node-02 <none> <none> app=nginx-d,pod-template-hash=6985c677dd,tier=backend
nginx-d-6985c677dd-tcfxr 1/1 Running 0 56s 10.100.44.206 k8s-node-02 <none> <none> app=nginx-d,pod-template-hash=6985c677dd,tier=backend
nginx-d-6985c677dd-thxlt 1/1 Running 0 56s 10.100.44.208 k8s-node-02 <none> <none> app=nginx-d,pod-template-hash=6985c677dd,tier=backend
Hard anti-affinity with podAntiAffinity
The scheduler never puts mutually exclusive Pods in the same topology domain; when the constraint cannot be satisfied, the Pod stays Pending. With three schedulable nodes and topologyKey: kubernetes.io/hostname, at most one tier=frontend Pod fits per node, so only 3 of the 9 replicas below ever run.
[root@k8s-master nginx]# cat nginx-c-deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-c
  labels:
    app: nginx-c
spec:
  replicas: 9
  selector:
    matchLabels:
      app: nginx-c
      tier: frontend
  template:
    metadata:
      labels:
        app: nginx-c
        tier: frontend
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: tier
                operator: In
                values:
                - frontend
            topologyKey: kubernetes.io/hostname
      initContainers:
      - name: init-container
        image: busybox:latest
        imagePullPolicy: IfNotPresent
        command: ["sh"]
        env:
        # - name: MY_POD_NAME
        #   valueFrom:
        #     fieldRef:
        #       fieldPath: metadata.name
        - name: MY_POD_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
        args:
          [
            "-c",
            "echo ${HOSTNAME} ${MY_POD_IP} > /wwwroot/index.html",
          ]
        volumeMounts:
        - name: wwwroot
          mountPath: "/wwwroot"
      containers:
      - name: nginx
        image: nginx:latest
        ports:
        - containerPort: 80
          protocol: TCP
        volumeMounts:
        - name: wwwroot
          mountPath: /usr/share/nginx/html/index.html
          subPath: index.html
      volumes:
      - name: wwwroot
        emptyDir: {}
[root@k8s-master nginx]# kubectl get pod -o wide --show-labels | grep nginx-c
nginx-c-5c57748846-5xf25 0/1 Pending 0 2m13s <none> <none> <none> <none> app=nginx-c,pod-template-hash=5c57748846,tier=frontend
nginx-c-5c57748846-68sr2 0/1 Pending 0 2m13s <none> <none> <none> <none> app=nginx-c,pod-template-hash=5c57748846,tier=frontend
nginx-c-5c57748846-7z24h 0/1 Pending 0 2m13s <none> <none> <none> <none> app=nginx-c,pod-template-hash=5c57748846,tier=frontend
nginx-c-5c57748846-85fcw 1/1 Running 0 3m17s 10.100.235.244 k8s-master <none> <none> app=nginx-c,pod-template-hash=5c57748846,tier=frontend
nginx-c-5c57748846-pdr4r 1/1 Running 0 3m17s 10.100.154.233 k8s-node-01 <none> <none> app=nginx-c,pod-template-hash=5c57748846,tier=frontend
nginx-c-5c57748846-pmrn9 0/1 Pending 0 2m13s <none> <none> <none> <none> app=nginx-c,pod-template-hash=5c57748846,tier=frontend
nginx-c-5c57748846-qms5r 0/1 Pending 0 2m13s <none> <none> <none> <none> app=nginx-c,pod-template-hash=5c57748846,tier=frontend
nginx-c-5c57748846-sxfkw 0/1 Pending 0 2m13s <none> <none> <none> <none> app=nginx-c,pod-template-hash=5c57748846,tier=frontend
nginx-c-5c57748846-tbs2g 1/1 Running 0 3m17s 10.100.44.203 k8s-node-02 <none> <none> app=nginx-c,pod-template-hash=5c57748846,tier=frontend
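To clean up the demo Deployments created throughout this post:
[root@k8s-master nginx]# kubectl delete deployment nginx1 nginx-a nginx-b nginx-c nginx-d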