@lijiang

Sculpting in time

Do one thing and do it well.
Every story has a beginning and an end.

TF-MPI-Distributed-Training

TensorFlow MPI-Based Distributed Training

6-Minute Read

I have previously written an article on how to build a distributed training cluster on Raspberry Pi 4 using the distributed training system that ships with TF2.0. However, that approach has a drawback: the training program must be started on each node separately, and distributed training only begins once all the nodes are up. MPI is mainly used in the field of supercomputing. Building an MPI cluster on Raspberry Pi serves two purposes: first, it lets us learn the distributed computing used on supercomputers, and second, it can…
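As a hedged illustration of the single-launch workflow MPI enables, here is a minimal TF2 training sketch that one mpirun command (e.g. mpirun -np 4 -hostfile hosts python train.py) starts on every node at once; Horovod as the MPI bridge, the hostfile, and the toy model are assumptions for illustration, not necessarily the article's stack:

    # Minimal sketch: a single mpirun launches every rank; no per-node startup.
    # Horovod here is an assumption, not confirmed as the article's approach.
    import tensorflow as tf
    import horovod.tensorflow as hvd

    hvd.init()                                    # join the MPI world: rank/size per process
    model = tf.keras.Sequential([tf.keras.layers.Dense(10)])
    opt = tf.optimizers.SGD(0.01 * hvd.size())    # scale learning rate with worker count
    loss_fn = tf.losses.SparseCategoricalCrossentropy(from_logits=True)

    @tf.function
    def train_step(x, y, first_batch):
        with tf.GradientTape() as tape:
            loss = loss_fn(y, model(x, training=True))
        tape = hvd.DistributedGradientTape(tape)  # allreduce gradients over MPI
        grads = tape.gradient(loss, model.trainable_variables)
        opt.apply_gradients(zip(grads, model.trainable_variables))
        if first_batch:                           # sync initial weights from rank 0
            hvd.broadcast_variables(model.variables, root_rank=0)
            hvd.broadcast_variables(opt.variables(), root_rank=0)
        return loss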

3-Minute Read

Continuing from the previous article on NFS file systems, the current requirement is to be able to control the applications running in the cluster. For example, to run the parallel cross-node program A, we need to schedule and stop the currently running program B. This calls for building a message-driven system that carries out operational tasks in response to incoming messages.
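A minimal sketch of such a message-driven control loop, assuming a Redis pub/sub channel; the channel name, head-node address, and JSON message schema below are hypothetical, not taken from the article:

    # Minimal message-driven control loop; Redis, the "cluster-control"
    # channel, and the command format are hypothetical illustrations.
    import json
    import subprocess
    import redis

    r = redis.Redis(host="192.168.1.10")          # hypothetical head-node address
    sub = r.pubsub()
    sub.subscribe("cluster-control")

    for msg in sub.listen():                      # block until a command arrives
        if msg["type"] != "message":
            continue
        cmd = json.loads(msg["data"])             # e.g. {"action": "stop", "target": "progB"}
        if cmd["action"] == "stop":
            subprocess.run(["pkill", "-f", cmd["target"]])   # stop running program B
        elif cmd["action"] == "start":
            subprocess.Popen(cmd["argv"])                    # launch cross-node program A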

3-Minute Read

I am currently learning how to run GROMACS + MPICH molecular dynamics simulations on a Raspberry Pi 4 cluster, which requires setting up a distributed storage system. Given the Pi's limited performance, deploying OpenEBS would waste compute resources, so I ultimately settled on lightweight NFS for shared file storage.
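A minimal sketch of the lightweight NFS setup; the export path, subnet, and server IP below are assumptions for illustration, not the article's actual values:

    # On the server Pi: export a shared directory (path and subnet are assumptions)
    echo '/srv/nfs 192.168.1.0/24(rw,sync,no_subtree_check)' | sudo tee -a /etc/exports
    sudo exportfs -ra
    # On each client Pi: mount the share (server IP is hypothetical)
    sudo mount -t nfs 192.168.1.10:/srv/nfs /mnt/nfs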
