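Volcano high-performance engine
Kubernetes-native container batch scheduling engine
Introduction
Volcano is a Kubernetes-based container batch scheduling engine for high-performance workloads.
Use cases: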
- Machine learning and deep learning
- Bioinformatics and genomics computing
- Big data applications
Concepts
Queue
A queue that holds a group of PodGroups.
apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: distcc
spec:
  weight: 1
  reclaimable: false
  capability:
    cpu: 50
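Fields:
weight -> The queue's relative share of cluster resources. The share allotted to the queue is (weight / total-weight) * total-resource. This is a soft constraint.
capability -> The upper bound on the total resources used by all PodGroups in the queue. This is a hard constraint.
reclaimable -> Whether other queues may reclaim the excess resources when this queue uses more than its limit.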
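The weight formula above can be worked through with a small sketch. This is only the arithmetic from the formula, not Volcano's scheduler code; the function name `weighted_share` and the sample queue names are illustrative assumptions.

```python
def weighted_share(weights, total_resource):
    """Split total_resource across queues in proportion to their weights.

    Implements (weight / total-weight) * total-resource for each queue.
    """
    total_weight = sum(weights.values())
    return {
        name: weight / total_weight * total_resource
        for name, weight in weights.items()
    }


# Two hypothetical queues with weights 1 and 3 sharing a 4-core cluster.
shares = weighted_share({"test1": 1, "test2": 3}, total_resource=4)
print(shares)  # {'test1': 1.0, 'test2': 3.0}
```

Because weight is a soft constraint, a queue may temporarily use more than this share while other queues are idle.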
Usage scenarios
Total cluster CPUs = 4 cores
---
apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: test1
spec:
  weight: 1
---
apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: test2
spec:
  weight: 3
# Create PodGroups p1 and p2 in queues test1 and test2 respectively, then submit job1 to p1 and job2 to p2, requesting 1C and 3C respectively. Both jobs run normally.
---
apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: test1
spec:
  weight: 1
# First create queue test1 and PodGroup p1, then create job1 and job2 in p1, requesting 1C and 3C respectively. Both jobs run normally.
---
apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: test2
spec:
  weight: 3
# Create queue test2, create PodGroup p2 in it, and create job3 in p2 requesting 3C. Because test2 has weight 3, job2 is evicted and the 3C held by test1 are returned to test2.
---
apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: test1
spec:
  capability:
    cpu: 2
# Create queue test1 with capability set to 2, so the queue can use at most 2C. Create PodGroup p1 and, in p1, job1 and job2 requesting 1C and 3C respectively. job1 runs normally and job2 stays Pending.
---
apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: test1
spec:
  weight: 1
  reclaimable: false
---
apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: test2
spec:
  weight: 1
# Create queue test1 with reclaimable set to false, so the queue does not give back resources it uses beyond its share. Create p1 in test1 and p2 in test2. Create job1 in p1 requesting 3C; with a 1:1 weight ratio, test1 now uses 1C more than its share. Create job2 in p2 requesting 2C; because test1 does not return the excess, job2 stays Pending.
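The reclaimable scenario can be checked by hand with a simplified model. The sketch below is not Volcano's scheduling algorithm: it assumes a job can start only if its request fits into the free CPU plus whatever reclaimable queues hold above their weighted share. The function names and the `allocated` field are illustrative assumptions.

```python
def weighted_share(queues, total_cpu):
    """Weighted share per queue: (weight / total-weight) * total-resource."""
    total_weight = sum(q["weight"] for q in queues.values())
    return {name: q["weight"] / total_weight * total_cpu
            for name, q in queues.items()}


def can_schedule(target, request, queues, total_cpu):
    """Simplified check: does `request` fit into free plus reclaimable CPU?"""
    shares = weighted_share(queues, total_cpu)
    free = total_cpu - sum(q["allocated"] for q in queues.values())
    reclaimable = 0.0
    for name, q in queues.items():
        if name == target or not q["reclaimable"]:
            continue  # never reclaim from the target or from protected queues
        reclaimable += max(0.0, q["allocated"] - shares[name])
    return request <= free + reclaimable


# 4-core cluster; job1 already holds 3C in test1, job2 asks for 2C in test2.
queues = {
    "test1": {"weight": 1, "allocated": 3, "reclaimable": False},
    "test2": {"weight": 1, "allocated": 0, "reclaimable": True},
}
print(can_schedule("test2", 2, queues, total_cpu=4))  # False: job2 stays Pending

queues["test1"]["reclaimable"] = True
print(can_schedule("test2", 2, queues, total_cpu=4))  # True: 1C can be reclaimed
```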
PodGroup
A PodGroup is a set of strongly related pods, used for batch workloads.
apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: distcc
spec:
  capability:
    cpu: 50
  reclaimable: false
  weight: 1
status:
  running: 1
  state: Open
---
apiVersion: scheduling.volcano.sh/v1beta1
kind: PodGroup
metadata:
  labels:
    volcano.sh/job-type: COMPILER
  name: distcc
  namespace: eth
  ownerReferences:
    - apiVersion: batch.volcano.sh/v1alpha1
      blockOwnerDeletion: true
      controller: true
      kind: Job
      name: distcc
      uid: 871a0e89-9478-4369-88ae-9b9105910965
  resourceVersion: "211427823"
  uid: 6a610eec-e135-496c-8e4c-84e66dd15b21
spec:
  minMember: 6
  minResources:
    cpu: "4"
  queue: distcc
status:
  conditions:
    - lastTransitionTime: "2021-06-02T05:38:05Z"
      message: '6/0 tasks in gang unschedulable: pod group is not ready, 6 minAvailable.'
      reason: NotEnoughResources
      status: "True"
      transitionID: 2c741e36-4b2d-4fba-a35f-0be5f5d454eb
      type: Unschedulable
    - lastTransitionTime: "2021-06-02T05:40:30Z"
      reason: tasks in gang are ready to be scheduled
      status: "True"
      transitionID: 1d85da8b-2a6a-4969-9aa3-8d41dab98dd7
      type: Scheduled
  phase: Running
  running: 11
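minMember -> The minimum number of pods or tasks in the PodGroup that must run. If the cluster cannot satisfy the resource needs of minMember tasks, the scheduler does not schedule any task in the PodGroup.
queue -> The queue the PodGroup belongs to.
minResources -> The minimum resources required to run the PodGroup.
If a volcanoJob is created without specifying a PodGroup, a PodGroup is created automatically with the same name as the volcanoJob.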
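The all-or-nothing behaviour of minMember can be illustrated with a short sketch. It is a simplified model under stated assumptions (a single pool of free CPU, pods considered in order), not Volcano's gang plugin; `pods_to_start` is a hypothetical helper.

```python
def pods_to_start(min_member, pod_cpu_requests, free_cpu):
    """Return how many pods may start under an all-or-nothing rule.

    Pods are fitted greedily into free_cpu. If fewer than min_member
    pods fit, nothing is started and the function returns 0.
    """
    fitting = 0
    for cpu in pod_cpu_requests:
        if cpu <= free_cpu:
            free_cpu -= cpu
            fitting += 1
    return fitting if fitting >= min_member else 0


# Eleven 1C pods with minMember = 6.
print(pods_to_start(6, [1] * 11, free_cpu=4))  # 0: only 4 fit, so none start
print(pods_to_start(6, [1] * 11, free_cpu=8))  # 8: the gang threshold is met
```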
volcanoJob
The custom Job type defined by Volcano.
apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: distcc
  labels:
    "volcano.sh/job-type": "COMPILER"
spec:
  minAvailable: 6
  schedulerName: volcano
  queue: distcc
  plugins:
    svc: []
    env: []
  policies:
    - event: PodEvicted
      action: RestartJob
  volumes:
    - mountPath: "/src"
      volumeClaim:
        accessModes: ["ReadWriteOnce"]
        storageClassName: "managed-nfs-storage"
        resources:
          requests:
            storage: 1Gi
  tasks:
    - replicas: 1
      name: master
      policies:
        - event: TaskCompleted
          action: CompleteJob
      template:
        spec:
          containers:
            - command:
                - tail
                - -f
                - /dev/null
              image: 192.168.1.114:5000/distcc:k8s-2021-06-02
              name: master
              resources:
                requests:
                  cpu: 4
                limits:
                  cpu: 4
          restartPolicy: OnFailure
    - replicas: 10
      name: worker
      policies:
        - event: TaskCompleted
          action: CompleteJob
      template:
        spec:
          containers:
            - image: 192.168.1.114:5000/distcc:k8s-2021-06-02
              name: worker
          restartPolicy: OnFailure
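schedulerName -> The scheduler used by the job. Defaults to volcano.
minAvailable -> The minimum number of pods the job must run.
volumes -> The volumes mounted by the job, following the Kubernetes volume configuration requirements.
tasks.replicas -> The number of pod replicas for a task.
tasks.template -> The pod configuration for a task.
tasks.policies -> The lifecycle policies for a task's pods.
plugins -> The plugins the job uses during scheduling.
queue -> The queue the job belongs to.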
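As a quick sanity check on how minAvailable relates to tasks.replicas, the sketch below sums the replicas of the distcc job and compares the total with minAvailable. The dictionary mirrors only the relevant fields of the manifest; `total_replicas` and `min_available_is_satisfiable` are illustrative helpers, and the assumption that minAvailable should not exceed the total replica count is stated here rather than taken from the text above.

```python
# Only the fields needed for the check, copied from the distcc job manifest.
job = {
    "spec": {
        "minAvailable": 6,
        "tasks": [
            {"name": "master", "replicas": 1},
            {"name": "worker", "replicas": 10},
        ],
    }
}


def total_replicas(job):
    """Sum the replicas of every task in the job."""
    return sum(task["replicas"] for task in job["spec"]["tasks"])


def min_available_is_satisfiable(job):
    """Assumed rule: minAvailable must not exceed the total replica count."""
    return job["spec"]["minAvailable"] <= total_replicas(job)


print(total_replicas(job))                # 11
print(min_available_is_satisfiable(job))  # True
```

The total of 11 pods matches the `running: 11` value in the PodGroup status shown earlier.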
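Distributed compilation
Create the distcc queue and use distcc to compile MPICH in a distributed fashion. We need one master to launch the distcc build and several distcc worker nodes to accept compile requests.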
# Dockerfile
FROM ubuntu:20.04
LABEL maintainer="[email protected]"
RUN apt-get -y update && apt-get -y upgrade
RUN apt-get install -y g++ gcc clang distcc build-essential
RUN apt-get -y -q autoremove && apt-get -y -q clean
EXPOSE 3632
CMD distccd --jobs $(nproc) --log-stderr --no-detach --daemon --allow 10.0.0.0/8 --log-level info
apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: distcc
spec:
  weight: 1
Create the Job
apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: distcc
  labels:
    "volcano.sh/job-type": "COMPILER"
spec:
  minAvailable: 6
  schedulerName: volcano
  queue: distcc
  plugins:
    svc: []
    env: []
  policies:
    - event: PodEvicted
      action: RestartJob
  volumes:
    - mountPath: "/src"
      volumeClaimName: distcc-data
      #volumeClaim:
      #  accessModes: ["ReadWriteOnce"]
      #  storageClassName: "managed-nfs-storage"
      #  resources:
      #    requests:
      #      storage: 1Gi
  tasks:
    - replicas: 1
      name: master
      policies:
        - event: TaskCompleted
          action: CompleteJob
      template:
        spec:
          containers:
            - command:
                - /bin/sh
                - -c
                - |
                  cd /tmp;
                  echo "start...";
                  cp -v /src/mpich-3.3.2.tar.gz ./
                  tar xf mpich-3.3.2.tar.gz;
                  cd mpich-3.3.2;
                  export DISTCC_HOSTS="$(cat /etc/volcano/worker.host | tr '\n' ' ')";
                  CC=distcc CXX=distcc ./configure --disable-fortran;
                  make -j50;
                  mkdir -pv /src/mpich;
                  make install DESTDIR=/src/mpich;
              image: 192.168.1.114:5000/distcc:k8s-2021-06-02
              name: master
              resources:
                requests:
                  cpu: 4
                limits:
                  cpu: 4
          restartPolicy: OnFailure
    - replicas: 10
      name: worker
      policies:
        - event: TaskCompleted
          action: CompleteJob
      template:
        spec:
          containers:
            - image: 192.168.1.114:5000/distcc:k8s-2021-06-02
              name: worker
          restartPolicy: OnFailure
After the build completes, the installed MPICH binaries can be found under /src/mpich.
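The master's script builds DISTCC_HOSTS by reading /etc/volcano/worker.host and replacing newlines with spaces. The sketch below reproduces that transformation in Python on sample text, assuming the file lists one worker hostname per line. The hostnames are made up for illustration, and `distcc_hosts` is a hypothetical helper.

```python
def distcc_hosts(host_file_text):
    """Join one-hostname-per-line text into a space-separated host list.

    Roughly equivalent to `cat worker.host | tr '\n' ' '`, except that
    blank lines and the trailing space are dropped.
    """
    return " ".join(host_file_text.split())


# Hypothetical file contents; real hostnames depend on the cluster.
sample = "distcc-worker-0.distcc\ndistcc-worker-1.distcc\n"
print(distcc_hosts(sample))  # distcc-worker-0.distcc distcc-worker-1.distcc
```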