@lijiang


Volcano: a high-performance engine

A Kubernetes-native batch scheduling engine for containers


Introduction

Volcano is a Kubernetes-based container batch scheduling engine for high-performance workloads.

Use cases:

  1. Machine learning and deep learning

  2. Bioinformatics and genomics computing

  3. Big data applications

Concepts

Queue

A queue that holds a set of PodGroups.

apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: distcc
spec:
  weight: 1
  reclaimable: false
  capability:
    cpu: 50

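Fields:

weight -> The queue's relative share when cluster resources are divided. The queue is entitled to (weight / total-weight) * total-resource. This is a soft constraint.

capability -> The upper limit on the total resources used by all PodGroups in the queue. This is a hard constraint.

reclaimable -> Whether other queues may reclaim the excess resources when this queue uses more than its entitled share.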
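The weight formula above can be worked through with a small calculation. This is only a sketch of the arithmetic (weight / total-weight) * total-resource, not Volcano's scheduler code; the function name `deserved_share` and the sample weights on a 4-core cluster are chosen here for illustration.

```python
def deserved_share(weights, total_resource):
    """Split total_resource across queues in proportion to their weights."""
    total_weight = sum(weights.values())
    return {
        name: total_resource * weight / total_weight
        for name, weight in weights.items()
    }


# Two queues with weights 1 and 3 sharing a 4-core cluster.
shares = deserved_share({"test1": 1, "test2": 3}, total_resource=4)
print(shares)  # {'test1': 1.0, 'test2': 3.0}
```

Because weight is a soft constraint, a queue may temporarily use more than this share while other queues are idle; the number only says what each queue is entitled to when resources are contended.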

Usage scenarios

Total cluster CPUs = 4 cores

---
apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: test1
spec:
  weight: 1
  
---
apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: test2
spec:
  weight: 3

# Create PodGroups p1 and p2 in queues test1 and test2 respectively, then submit job1 to p1 and job2 to p2, requesting 1C and 3C. Both jobs run normally.
---
apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: test1
spec:
  weight: 1

# First create queue test1 and PodGroup p1, then create job1 and job2 in p1, requesting 1C and 3C. Both jobs run normally.

---
apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: test2
spec:
  weight: 3

# Create queue test2, create PodGroup p2 in it, and create job3 in p2 requesting 3C. Because test2 has weight=3, job2 is evicted and the 3C held by test1 is returned to test2.
---
apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: test1
spec:
  capability:
    cpu: 2

# Create queue test1 with capability set to 2, so it can use at most 2C. Create PodGroup p1, and in p1 create job1 and job2 requesting 1C and 3C. job1 runs normally; job2 stays in the Pending state.
---
apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: test1
spec:
  weight: 1
  reclaimable: false
---
apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: test2
spec:
  weight: 1

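# Create queue test1 with reclaimable set to false, so the queue does not give back resources it uses beyond its share. Create p1 and p2 in queues test1 and test2. Create job1 in p1 requesting 3C; with weights of 1:1 each queue is entitled to 2C, so test1 now over-uses 1C. Create job2 in p2 requesting 2C; because test1 does not return the excess, only 1C is free and job2 stays in the Pending state.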
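The capability scenario above (a 2C cap with jobs requesting 1C and 3C) can be mimicked with a toy model. This is a deliberately simplified sketch: it admits jobs greedily in submission order against a single CPU number, whereas the real scheduler works on individual tasks and nodes through its plugins. The function name `admit` is hypothetical.

```python
def admit(requests, capability):
    """Admit jobs in order while the queue's total stays within capability."""
    used = 0
    states = {}
    for job, cpu in requests:
        if used + cpu <= capability:
            used += cpu
            states[job] = "Running"
        else:
            # Admitting this job would exceed the hard cap, so it waits.
            states[job] = "Pending"
    return states


states = admit([("job1", 1), ("job2", 3)], capability=2)
print(states)  # {'job1': 'Running', 'job2': 'Pending'}
```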

PodGroup

A PodGroup is a set of strongly related pods, used for batch workloads.

apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: distcc
spec:
  capability:
    cpu: 50
  reclaimable: false
  weight: 1
status:
  running: 1
  state: Open
---
apiVersion: scheduling.volcano.sh/v1beta1
kind: PodGroup
metadata:
  labels:
    volcano.sh/job-type: COMPILER
  name: distcc
  namespace: eth
  ownerReferences:
  - apiVersion: batch.volcano.sh/v1alpha1
    blockOwnerDeletion: true
    controller: true
    kind: Job
    name: distcc
    uid: 871a0e89-9478-4369-88ae-9b9105910965
  resourceVersion: "211427823"
  uid: 6a610eec-e135-496c-8e4c-84e66dd15b21
spec:
  minMember: 6
  minResources:
    cpu: "4"
  queue: distcc
status:
  conditions:
  - lastTransitionTime: "2021-06-02T05:38:05Z"
    message: '6/0 tasks in gang unschedulable: pod group is not ready, 6 minAvailable.'
    reason: NotEnoughResources
    status: "True"
    transitionID: 2c741e36-4b2d-4fba-a35f-0be5f5d454eb
    type: Unschedulable
  - lastTransitionTime: "2021-06-02T05:40:30Z"
    reason: tasks in gang are ready to be scheduled
    status: "True"
    transitionID: 1d85da8b-2a6a-4969-9aa3-8d41dab98dd7
    type: Scheduled
  phase: Running
  running: 11

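minMember -> The minimum number of pods or tasks that must run in this PodGroup. If the cluster cannot satisfy the resource needs of minMember tasks, the scheduler will not schedule any task in the PodGroup.

queue -> The queue this PodGroup belongs to.

minResources -> The minimum resources required to run this PodGroup.

If no PodGroup is specified when a volcanoJob is created, a PodGroup is created automatically with the same name as the volcanoJob.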
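The all-or-nothing behaviour of minMember can be illustrated with a short sketch. It assumes a single pool of free CPU and ignores per-node placement, so it is only a toy model of gang scheduling, not Volcano's implementation; `gang_schedule` and its parameters are names invented for this example.

```python
def gang_schedule(pod_cpu_requests, free_cpu, min_member):
    """Return the indices of pods to start, or [] if fewer than min_member fit."""
    placed = []
    for index, cpu in enumerate(pod_cpu_requests):
        if cpu <= free_cpu:
            free_cpu -= cpu
            placed.append(index)
    # Gang constraint: start nothing unless at least min_member pods fit.
    return placed if len(placed) >= min_member else []


# Six 1-CPU pods with minMember 6: 4 free CPUs are not enough, 8 are.
print(gang_schedule([1] * 6, free_cpu=4, min_member=6))  # []
print(gang_schedule([1] * 6, free_cpu=8, min_member=6))  # [0, 1, 2, 3, 4, 5]
```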

volcanoJob

The custom Job type defined by Volcano.

apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: distcc
  labels:
    "volcano.sh/job-type": "COMPILER"
spec:
  minAvailable: 6
  schedulerName: volcano
  queue: distcc
  plugins:
    svc: []
    env: []
  policies:
    - event: PodEvicted
      action: RestartJob
  volumes:
    - mountPath: "/src"
      volumeClaim:
        accessModes: ["ReadWriteOnce"]
        storageClassName: "managed-nfs-storage"
        resources:
          requests:
            storage: 1Gi
  tasks:
    - replicas: 1
      name: master
      policies:
        - event: TaskCompleted
          action: CompleteJob
      template:
        spec:
          containers:
            - command:
                - tail
                - -f
                - /dev/null
              image: 192.168.1.114:5000/distcc:k8s-2021-06-02
              name: master
              resources:
                requests:
                  cpu: 4
                limits:
                  cpu: 4
          restartPolicy: OnFailure
    - replicas: 10
      name: worker
      policies:
        - event: TaskCompleted
          action: CompleteJob
      template:
        spec:
          containers:
            - image: 192.168.1.114:5000/distcc:k8s-2021-06-02
              name: worker
          restartPolicy: OnFailure

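schedulerName -> The scheduler used by this job; defaults to volcano.

minAvailable -> The minimum number of pods that must run for this job.

volumes -> The volumes mounted by this job, following the Kubernetes volumes configuration requirements.

tasks.replicas -> The number of pod replicas for a task.

tasks.template -> The pod definition for a task.

tasks.policies -> The lifecycle policies for a task.

plugins -> The plugins this job uses during scheduling.

queue -> The queue this job belongs to.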
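One relationship between these fields is easy to check by hand: minAvailable only makes sense if the tasks define at least that many pods in total. The sketch below encodes the relevant parts of the distcc job above as a plain dictionary and performs that check; the helper names are made up for this example, and whether Volcano itself rejects such a job is not something this snippet demonstrates.

```python
# The relevant fields of the distcc Job spec, written as a plain dict.
job_spec = {
    "minAvailable": 6,
    "tasks": [
        {"name": "master", "replicas": 1},
        {"name": "worker", "replicas": 10},
    ],
}


def total_replicas(spec):
    """Sum the replicas of every task in the job."""
    return sum(task["replicas"] for task in spec["tasks"])


def gang_is_satisfiable(spec):
    """minAvailable can only be met if the tasks define enough pods."""
    return spec["minAvailable"] <= total_replicas(spec)


print(total_replicas(job_spec))       # 11
print(gang_is_satisfiable(job_spec))  # True
```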

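Distributed compilation

Create a distcc queue and use distcc to compile MPICH in a distributed fashion. This requires one master that drives the distcc build and several distcc worker nodes that accept compile requests.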

# Dockerfile
FROM ubuntu:20.04
LABEL maintainer="[email protected]"

RUN apt-get -y update && apt-get -y upgrade
RUN apt-get install -y g++ gcc clang distcc build-essential
RUN apt-get -y -q autoremove && apt-get -y -q clean

EXPOSE 3632

CMD distccd --jobs $(nproc) --log-stderr --no-detach --daemon --allow 10.0.0.0/8 --log-level info

# Queue
apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: distcc
spec:
  weight: 1

Create the Job

apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: distcc
  labels:
    "volcano.sh/job-type": "COMPILER"
spec:
  minAvailable: 6
  schedulerName: volcano
  queue: distcc
  plugins:
    svc: []
    env: []
  policies:
    - event: PodEvicted
      action: RestartJob
  volumes:
    - mountPath: "/src"
      volumeClaimName: distcc-data
      #volumeClaim:
      #  accessModes: ["ReadWriteOnce"]
      #  storageClassName: "managed-nfs-storage"
      #  resources:
      #    requests:
      #      storage: 1Gi
  tasks:
    - replicas: 1
      name: master
      policies:
        - event: TaskCompleted
          action: CompleteJob
      template:
        spec:
          containers:
            - command:
                - /bin/sh
                - -c
                - |
                  cd /tmp;
                  echo "start...";
                  cp -v /src/mpich-3.3.2.tar.gz ./
                  tar xf mpich-3.3.2.tar.gz;
                  cd mpich-3.3.2;
                  export DISTCC_HOSTS="$(cat /etc/volcano/worker.host | tr '\n' ' ')";
                  CC=distcc CXX=distcc ./configure --disable-fortran;
                  make -j50;
                  mkdir -pv /src/mpich;
                  make install DESTDIR=/src/mpich;
              image: 192.168.1.114:5000/distcc:k8s-2021-06-02
              name: master
              resources:
                requests:
                  cpu: 4
                limits:
                  cpu: 4
          restartPolicy: OnFailure
    - replicas: 10
      name: worker
      policies:
        - event: TaskCompleted
          action: CompleteJob
      template:
        spec:
          containers:
            - image: 192.168.1.114:5000/distcc:k8s-2021-06-02
              name: worker
          restartPolicy: OnFailure

Once compilation finishes, the installed MPICH binaries can be found under /src/mpich.
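The master's script builds DISTCC_HOSTS by joining the lines of /etc/volcano/worker.host with spaces. The sketch below shows the same transformation in Python so the resulting value is easier to see. The host names in the sample are assumptions made for illustration; the actual contents of the file depend on the cluster.

```python
def distcc_hosts(host_file_text):
    """Join newline-separated host names into one space-separated string."""
    # split() with no arguments drops blank lines and surrounding whitespace,
    # so unlike `tr '\n' ' '` the result has no trailing space.
    return " ".join(host_file_text.split())


# Hypothetical file contents with two worker host names.
sample = "distcc-worker-0.distcc\ndistcc-worker-1.distcc\n"
print(distcc_hosts(sample))  # distcc-worker-0.distcc distcc-worker-1.distcc
```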
