Sculpting in time

Do one thing and do it well.
Every story has a beginning and an end.


TensorFlow MPI-Based Distributed Training

6-Minute Read

I previously wrote an article about building a distributed training cluster on Raspberry Pi 4 using the distributed training support built into TF2.0. That approach has a drawback: the training program must be started on each node by hand, and distributed training only begins after every node is up. MPI is widely used in the field of supercomputing. Building an MPI cluster on Raspberry Pi is useful, firstly, for learning the distributed computing techniques used on supercomputers, and secondly, it can…
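The key difference the excerpt points at is the launch step: with MPI, one command starts the program on every node at once, instead of logging into each node separately. A minimal sketch, assuming an OpenMPI installation and a hypothetical `train.py` and `hosts` file (node names are placeholders, not from the post):

```shell
# hosts file: one Raspberry Pi per line (hypothetical hostnames)
#   pi-node1 slots=1
#   pi-node2 slots=1

# Launch train.py as 2 ranks across the nodes in the hostfile.
# mpirun starts every rank for us -- no per-node manual startup.
mpirun -np 2 --hostfile hosts python3 train.py
```

Each rank can then discover its own identity (e.g. via `MPI_Comm_rank`, or the `OMPI_COMM_WORLD_RANK` environment variable under OpenMPI) and take the corresponding role in training.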

3-Minute Read

Continuing from the previous article on NFS file systems, the next requirement is to control the applications running in the cluster. For example, to run a cross-node parallel program A, we need to schedule it and stop the currently running program B. This calls for a messaging system that can carry out operational tasks driven by messages.
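The message-driven control idea can be sketched in plain Python (all names here, such as `start_job` and the message format, are hypothetical illustrations, not from the post): a dispatcher maps message types to handler functions, so a "stop B" message followed by a "start A" message performs exactly the scheduling step described above.

```python
# Minimal message-driven task controller: a dispatcher maps message
# types to handler functions. In a real cluster the messages would
# arrive over a broker; here a plain list stands in for the queue.

running = set()  # names of programs currently running on the cluster


def start_job(name):
    # Placeholder for launching a cross-node program.
    running.add(name)
    return f"started {name}"


def stop_job(name):
    # Placeholder for stopping a running program.
    running.discard(name)
    return f"stopped {name}"


HANDLERS = {"start": start_job, "stop": stop_job}


def dispatch(message):
    """Route a message like {'type': 'start', 'target': 'A'} to its handler."""
    handler = HANDLERS[message["type"]]
    return handler(message["target"])


# Scheduling scenario from the post: stop program B, then start program A.
queue = [{"type": "stop", "target": "B"}, {"type": "start", "target": "A"}]
results = [dispatch(msg) for msg in queue]
```

New operational tasks are added by registering another handler in `HANDLERS`, which keeps the control logic decoupled from how messages are delivered.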





