@lijiang

Sculpting in time

Do one thing and do it well.
Every story has a beginning and an end.

Against COVID-19

Building a Rosetta protein folding compute cluster with Raspberry Pi to fight COVID-19

2020-5-16

About distributed computing projects

I first encountered distributed computing projects around the time I graduated from high school, when I ran the BOINC distributed computing platform on my laptop. The most famous BOINC project is [seti@home](https://setiathome.berkeley.edu), the project for which the BOINC platform was originally developed.

The idea behind distributed computing is that a master server splits a task into many small tasks and assigns them to clients. Each client computes its small task and sends the result back, and the master server aggregates these results to obtain the result of the whole task.
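The split, compute, and aggregate cycle can be illustrated with a minimal Python sketch. It is not how BOINC is implemented: the "clients" here are just local threads, and the function names (`split_task`, `client_compute`, `run_master`) and the sum-of-squares workload are made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def split_task(numbers, chunk_size):
    """Master side: split one large task into small work units."""
    return [numbers[i:i + chunk_size] for i in range(0, len(numbers), chunk_size)]


def client_compute(work_unit):
    """Client side: compute the result of a single work unit."""
    return sum(x * x for x in work_unit)


def run_master(numbers, chunk_size=4, clients=3):
    """Hand work units to the clients, then aggregate their partial results."""
    work_units = split_task(numbers, chunk_size)
    with ThreadPoolExecutor(max_workers=clients) as pool:
        partial_results = list(pool.map(client_compute, work_units))
    return sum(partial_results)


if __name__ == "__main__":
    print(run_master(list(range(10))))
```

In a real volunteer computing project the work units travel over the network and the clients are untrusted, so the server typically also validates results before aggregating them.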

Computing on supercomputers is expensive, and in some areas of predictive computing a run may still end in failure. The alternative is to use the idle computing resources of individual computers by connecting them into a distributed grid system, through which all access to these resources takes place. This has led to a number of distributed computing projects.

About Rosetta Project

The rosetta@home project is mainly used to predict the three-dimensional folded structure of proteins. According to the official website, when Rosetta predicts the shape of a specific protein, it is actually looking for the fold with the lowest energy. I find this interesting because it relates to energy calculations in mathematical physics: nature itself seems to sustain life and the reproduction of species at the lowest possible energy cost. Because of my sudden curiosity about protein folding computation, and because DeepMind's [AlphaFold](https://deepmind.com/blog/article/AlphaFold-Using-AI-for-scientific-discovery) uses artificial intelligence to predict the three-dimensional structure of proteins, I bought two books on life sciences to study, and I plan to follow up with a slightly more specialized article.

To quote the original English:

proteins are truly amazing machines: before they do their work, they assemble themselves.

Returning to Rosetta’s calculations, the program predicts the folded structure of a protein as follows:

  1. the program starts with a chain without any folding
  2. it moves a part of the chain to produce a new shape and calculates the energy of that shape
  3. it accepts or rejects the move based on the change in energy
  4. it repeats steps 2 and 3 until each part of the chain has been moved a sufficient number of times

Finally, the program obtains the lowest-energy shape predicted for the current chain sequence. Because the prediction moves parts of the chain at random, the amount of computation is quite large. In the distributed computation, each client finds the lowest-energy state it can for a chain and sends its result to the server. The researchers then compare all the lowest-energy structures collected from the clients for that chain to find the overall lowest-energy state, which completes the prediction of that protein fold.
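The four steps and the final comparison on the server can be sketched as a toy Monte Carlo search in Python. This is only an illustration under loud assumptions: the chain is reduced to a list of angles, `toy_energy` is a made-up energy function rather than Rosetta's, and the accept/reject rule used here is the Metropolis criterion, which the text above does not specify.

```python
import math
import random


def toy_energy(angles):
    """Hypothetical energy function: lowest (0.0) when every angle is 0."""
    return sum(1.0 - math.cos(a) for a in angles)


def fold_once(n_parts, n_moves, temperature, rng):
    """One client trajectory: random moves, accepted or rejected by energy change."""
    # Step 1: start from a fully extended chain (the highest toy energy).
    angles = [math.pi] * n_parts
    energy = toy_energy(angles)
    best_energy, best_angles = energy, list(angles)
    for _ in range(n_moves):
        # Step 2: move one part of the chain and recompute the energy.
        i = rng.randrange(n_parts)
        old = angles[i]
        angles[i] = old + rng.gauss(0.0, 0.5)
        new_energy = toy_energy(angles)
        delta = new_energy - energy
        # Step 3: always accept downhill moves; accept uphill moves
        # with Boltzmann probability (Metropolis criterion, an assumption).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            energy = new_energy
            if energy < best_energy:
                best_energy, best_angles = energy, list(angles)
        else:
            angles[i] = old  # reject: undo the move
    # Step 4 is the loop itself; report the lowest-energy shape seen.
    return best_energy, best_angles


def server_pick_lowest(results):
    """Server side: keep the lowest-energy structure among all client results."""
    return min(results, key=lambda r: r[0])


if __name__ == "__main__":
    clients = [fold_once(8, 2000, 0.1, random.Random(seed)) for seed in range(4)]
    best_energy, _ = server_pick_lowest(clients)
    print(f"lowest energy found: {best_energy:.4f}")
```

Each call to `fold_once` stands in for one client's work unit with its own random seed, which is why many independent trajectories are worth collecting: different seeds explore different parts of the energy landscape.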

The first book is “Structural Biology: From Atoms to Life”, whose original title is “Textbook of Structural Biology”.

The second book is “Protein Physics”.

Folding@home is a protein folding project similar to Rosetta and is also a distributed computing project. The difference between the two is that Rosetta aims to predict the final structure of a protein, while **Folding@home** calculates the process by which a protein folds. I still need to read the relevant professional books in biology before I can analyze the difference in detail, and a related article will be published later.

Building a Raspberry Pi Distributed Computing Cluster

A total of 12 ARM development boards are currently contributing to Rosetta computing, two of which are Jetson Nano boards. My current area of focus is artificial intelligence, and I would like to understand more about the biological domain, because the original artificial neural network (ANN) was abstracted from a computational model of the brain's neurons.

Cluster management and application orchestration currently use Kubernetes. The operating system is Ubuntu Server for Raspberry Pi, and the Kubernetes implementation is **MicroK8s**. I will consider using k3s to manage more nodes, for example when expanding the cluster to 20 Raspberry Pi boards.

Update: the cluster has now grown to 20 Raspberry Pi nodes and is managed with k3s. After the Rosetta project, these small machines will be used for distributed learning projects with TensorFlow and MXNet.

The current cluster architecture is as follows: 3 * (ETCD + Master) + HAProxy + 17 * Worker
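As a quick sanity check on this layout, the sketch below works out the node count and the etcd quorum for three control-plane members. The majority rule (`members // 2 + 1`) is the general rule for Raft-based stores such as etcd; treating HAProxy as not occupying an extra Raspberry Pi is my assumption, made so that the total matches the 20 nodes mentioned above.

```python
def quorum_size(members):
    """Smallest majority of an etcd-style (Raft) cluster."""
    return members // 2 + 1


def tolerated_failures(members):
    """How many members can be lost while a majority still remains."""
    return members - quorum_size(members)


if __name__ == "__main__":
    control_plane = 3   # 3 * (ETCD + Master)
    workers = 17        # 17 * Worker
    print("Raspberry Pi nodes:", control_plane + workers)
    print("etcd quorum:", quorum_size(control_plane))
    print("control-plane failures tolerated:", tolerated_failures(control_plane))
```

With three members the cluster keeps working if one control-plane board fails, which is the usual reason for choosing an odd number of etcd members.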

BOINC is deployed on every development board in the cluster as a Kubernetes DaemonSet, so each node runs exactly one BOINC client pod.

boinc-ds.yaml

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: boinc
  namespace: science
  labels:
    app: boinc
spec:
  selector:
    matchLabels:
      app: boinc
  template:
    metadata:
      labels:
        app: boinc
    spec:
      hostNetwork: true
      containers:
      - name: boinc
        image: localhost:32000/boinc/client:arm64v8
        env:
          - name: BOINC_CMD_LINE_OPTIONS
            value: "--allow_remote_gui_rpc"
        ports:
          - containerPort: 31416
            name: app-port
        volumeMounts:
          - name: boinc-database
            mountPath: /var/lib/boinc
      volumes:
        - name: boinc-database
          hostPath:
            path: /var/lib/boinc-k8s
            type: DirectoryOrCreate

ARM64v8 Docker Image

Dockerfile.arm64v8

Boinc Docker Github

FROM ubuntu:20.04

LABEL maintainer="BOINC" \
      description="A lightweight BOINC client on ARMv8 64-bit architecture."

# Global environment settings
ENV BOINC_GUI_RPC_PASSWORD="123" \
    BOINC_REMOTE_HOST="127.0.0.1" \
    BOINC_CMD_LINE_OPTIONS="" \
    DEBIAN_FRONTEND=noninteractive

# Copy files
COPY bin/ /usr/bin/

# Configure
WORKDIR /var/lib/boinc

# BOINC RPC port
EXPOSE 31416

# Install the time zone database and the BOINC client, then clean up
RUN apt-get update && apt-get install -y --no-install-recommends \
        tzdata \
        boinc-client && \
    apt-get autoremove -y && \
    rm -rf /var/lib/apt/lists/*

CMD ["start-boinc.sh"]

Monitoring system

The monitoring system uses Prometheus, Grafana, and node-exporter to monitor the system data of each node.

Rosetta project tasks are monitored through the Prometheus Pushgateway: a script pushes labeled task data to the Pushgateway, and Prometheus then scrapes it from there.

rosettaMonitor.py

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Thu May  7 16:04:55 2020
pip install prometheus_client
@author: alexchen
"""

from prometheus_client import CollectorRegistry, Gauge, pushadd_to_gateway
from subprocess import check_output
import time


class rosettaTask():
    def __init__(self,servers=None):
        self.projinfo = {}
        self.finishedinfo = {}
        self.projrunninginfo = {}
        self.servers = servers
        self.pushgateway = "http://192.168.1.188:9091"

    def get_task_info(self,server):
        # Grab information about the tasks on this server with shell commands
        projcmd = "boinccmd  --host %s --passwd 123 --get_tasks | grep  WU | awk -F ':' '{print $2}'" % server
        statcmd = "boinccmd  --host %s --passwd 123 --get_tasks | grep scheduler  | awk -F ':' '{print $2}'" % server
        fractionstatcmd = "boinccmd  --host %s --passwd 123 --get_tasks | grep  fraction | awk -F ':' '{print $2}'" % server

        projs = check_output(projcmd,shell=True)
        status  = check_output(statcmd, shell=True)
        fractionstat = check_output(fractionstatcmd, shell=True)
        projs = projs.decode('utf-8').replace(' ','').split('\n')
        status = status.decode('utf-8').replace(' ','').split('\n')
        fractionstat = fractionstat.decode('utf-8').replace(' ','').split('\n')
        temp = []
        for i in range(len(projs)):
            if status[i] != '':
                self.projinfo[projs[i]] = status[i]
            if status[i] != "uninitialized" and status[i] != '':
                # Skip tasks that are not running
                temp.append(projs[i])
        for i in range(len(temp)):
            self.projrunninginfo[temp[i]] = fractionstat[i]


    def runningTaskMonitor(self):
        registry = CollectorRegistry()
        # Two metrics: rosetta_tasks for per-task progress and rosetta_jobnum for the number of tasks
        g = Gauge("rosetta_tasks",'rosetta tasks monitor',['task','server'],registry=registry)
        h = Gauge("rosetta_jobnum", 'rosetta job numbers',['server'], registry=registry)
        for i in self.servers:
            # Reset the task information for this server
            self.projinfo = {}
            self.finishedinfo = {}
            self.projrunninginfo = {}

            # Fill the task information via get_task_info
            self.get_task_info(i)
            print(i)
            print(self.projrunninginfo)
            for key in self.projrunninginfo:
                task = key
                fractionstat = self.projrunninginfo[key]
                server = i
                print(task)
                print(fractionstat)
                # Label the metric with task and server; the value is the fraction done
                g.labels(task, server).set(fractionstat)
            print("job nums")
            print(len(self.projrunninginfo))
            # Label the metric with server; the value is the number of running tasks
            h.labels(i).set(len(self.projrunninginfo))
        pushadd_to_gateway(self.pushgateway,job='rosettainfo',registry=registry,timeout=20)



if __name__ == "__main__":
    # BOINC client nodes to poll
    servers = [
        "192.168.1.108",
        "192.168.1.112",
        "192.168.1.113",
        "192.168.1.114",
        "192.168.1.115",
        "192.168.1.116",
        "192.168.1.117",
        "192.168.1.120",
        "192.168.1.121",
        "192.168.1.122",
        "192.168.1.123",
        "192.168.1.124",
        "192.168.1.125",
        "192.168.1.126",
        "192.168.1.127",
        "192.168.1.129",
        "192.168.1.130",
        "192.168.1.131",
        "192.168.1.132",
        "192.168.1.107",
        "192.168.1.118",
    ]
    while True:
        task = rosettaTask(servers=servers)
        task.runningTaskMonitor()
        print("loop...")
        # Push fresh metrics every 5 minutes
        time.sleep(300)

Monitoring Chart Presentation

Monitor the number of running tasks and the progress of each node

Monitor the I/O, swap usage, system load, and Raspberry Pi temperature of each node, and send an alert when the CPU temperature exceeds 60 degrees Celsius.
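The temperature rule can be illustrated with a small standalone Python sketch. It is not the Prometheus and Grafana alerting used in this cluster; it only shows the threshold check. The sysfs path is an assumption (it is where Linux on a Raspberry Pi commonly exposes the CPU temperature in millidegrees Celsius), and the function names are made up for illustration.

```python
from pathlib import Path

ALERT_THRESHOLD_C = 60.0  # threshold taken from the text above
# Assumed sysfs path; the value is expected in millidegrees Celsius.
THERMAL_ZONE = Path("/sys/class/thermal/thermal_zone0/temp")


def parse_millidegrees(raw):
    """Convert a raw sysfs reading such as '48312' to degrees Celsius."""
    return int(raw.strip()) / 1000.0


def should_alert(temp_c, threshold_c=ALERT_THRESHOLD_C):
    """Alert only when the temperature exceeds the threshold."""
    return temp_c > threshold_c


def read_cpu_temp(path=THERMAL_ZONE):
    """Read the current CPU temperature, or None if it cannot be read."""
    try:
        return parse_millidegrees(path.read_text())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    temp = read_cpu_temp()
    if temp is None:
        print("temperature sensor not available on this machine")
    else:
        print(f"{temp:.1f} C, alert={should_alert(temp)}")
```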

Node UPS Backup Power Supply

To prevent the loss of Rosetta task data due to a power failure, I purchased an APC (Schneider) BK650-CH UPS. I chose this model mainly because of its moderate price and because it has a data interface. The UPS can be connected to a Raspberry Pi and managed by the apcupsd daemon. apcupsd also supports a networked setup: one master is configured, and all other slave nodes connect to the master to get the power status so that they can respond after a power outage.

master apcupsd.conf configuration (master and UPS are connected via data cable):

UPSNAME rosetta
UPSCABLE usb
UPSTYPE usb
DEVICE
POLLTIME 10
LOCKFILE /var/lock
SCRIPTDIR /etc/apcupsd
PWRFAILDIR /etc/apcupsd
NOLOGINDIR /etc
ONBATTERYDELAY 6
BATTERYLEVEL 15
MINUTES 3
TIMEOUT 0
ANNOY 300
ANNOYDELAY 160
NOLOGON disable
KILLDELAY 60
NETSERVER on
NISIP 0.0.0.0
NISPORT 3551
EVENTSFILE /var/log/apcupsd.events
EVENTSFILEMAX 10
UPSCLASS standalone
UPSMODE disable
STATTIME 0
STATFILE /var/log/apcupsd.status
LOGSTATS off
DATATIME 0

slave apcupsd.conf config:

UPSNAME rosetta
UPSCABLE ether
UPSTYPE net
DEVICE 192.168.1.106 ## master ip addr ##
POLLTIME 10
LOCKFILE /var/lock
SCRIPTDIR /etc/apcupsd
PWRFAILDIR /etc/apcupsd
NOLOGINDIR /etc
ONBATTERYDELAY 6
BATTERYLEVEL 25
MINUTES 4
TIMEOUT 0
ANNOY 300
ANNOYDELAY 30
NOLOGON disable
KILLDELAY 10
NETSERVER on
NISIP 0.0.0.0
NISPORT 3551
EVENTSFILE /var/log/apcupsd.events
EVENTSFILEMAX 10
UPSCLASS standalone
UPSMODE disable
STATTIME 0
STATFILE /var/log/apcupsd.status
LOGSTATS off
DATATIME 0

To prevent Rosetta’s task data from being lost, a slave node starts its shutdown task when the battery has 25% charge left or when 4 minutes of runtime remain: it stops the node’s BOINC tasks, and then the node shuts down. The event-handling scripts for apcupsd are configured in the /etc/apcupsd/apccontrol file.
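The shutdown rule for a slave node can be written out as a small Python sketch. It only mirrors the BATTERYLEVEL 25 and MINUTES 4 thresholds from the slave configuration; the real decision is made inside apcupsd. The status text below is an invented sample whose field names (STATUS, BCHARGE, TIMELEFT) are modeled on apcupsd status reports, and the function names are hypothetical.

```python
def parse_status(text):
    """Parse 'KEY : value' lines into a dict of stripped strings."""
    status = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            status[key.strip()] = value.strip()
    return status


def should_shut_down(status, battery_level=25.0, minutes=4.0):
    """Mirror the slave thresholds; only act while running on battery."""
    if "ONBATT" not in status.get("STATUS", ""):
        return False
    charge = float(status["BCHARGE"].split()[0])
    time_left = float(status["TIMELEFT"].split()[0])
    # Either condition alone is enough to start the shutdown.
    return charge <= battery_level or time_left <= minutes


# Invented sample, not captured from a real UPS.
SAMPLE = """\
STATUS   : ONBATT
BCHARGE  : 24.0 Percent
TIMELEFT : 9.5 Minutes
"""

if __name__ == "__main__":
    print(should_shut_down(parse_status(SAMPLE)))
```

The slave thresholds (25%, 4 minutes) are higher than the master's (15%, 3 minutes), so the worker nodes stop their BOINC tasks and power down before the master does.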

UPS Monitoring Chart

About the design of proteins

Rosetta@home is a computing project initiated by the University of Washington. The latest news about Rosetta's work against COVID-19 is available on the project's official website. The following content and my understanding of it come from that website, and interested readers can go there directly for more detailed information.

Rosetta’s role in fighting coronavirus

We are happy to report that the Rosetta molecular modeling suite was recently used to accurately predict the atomic-scale structure of an important coronavirus protein weeks before it could be measured in the lab. Knowledge gained from studying this viral protein is now being used to guide the design of novel vaccines and antiviral drugs.

Before the structure was precisely determined in the laboratory, Rosetta's molecular modeling accurately predicted the atomic-scale structure of an important coronavirus protein. Analyzing the proteins carried by the virus helps guide the development of vaccines and antiviral drugs.

Importantly, structural biologists are quickly gaining insights into what the proteins that make up this virus look like and how they function.

Structural biologists are rapidly discovering the structure of proteins in coronaviruses and the function of these proteins.

One viral protein in particular — the spike protein — allows SARS-CoV-2 to fuse its membrane with those on human cells, leading to infection. Researchers at UT Austin this week used cryo-electron microscopy to create the first 3D atomic-scale map of the SARS-CoV-2 spike protein in its prefusion state

A specific protein in the virus, the spike protein, allows the virus to invade and infect human cells, and the researchers have mapped the three-dimensional structure of the spike protein in its prefusion state using cryo-electron microscopy. Rosetta’s prediction of the three-dimensional structure of the spike protein is shown in the figure, and it matches the structure observed in the laboratory by cryo-electron microscopy.

images from www.ipd.uw.edu

Coronavirus spike proteins — like the proteins found in your body — ‘fold up’ in order to function.

Just like the proteins inside your body, the proteins of the coronavirus must first be folded in order to serve a specific purpose.

Robetta is an online service platform for academic research that uses Rosetta to predict protein structures. As early as February 2020, the research team had predicted the 3D structure of the spike protein, and the predicted results were very close to those obtained later in the laboratory.

Three-dimensional structure of the spike protein

References:

  1. Breakthrough in Coronavirus Research Results in New Map to Support Vaccine Design
  2. Cryo-EM structure of the 2019-nCoV spike in the prefusion conformation
  3. Massively parallel de novo protein design for targeted therapeutics

With this knowledge in hand, researchers at the Institute for Protein Design are now working to create new proteins to neutralize coronavirus. If successful, these antiviral proteins would stick to the SARS-CoV-2 spike protein and thereby prevent viral particles from infecting healthy cells

Researchers at the Institute for Protein Design are creating new proteins to neutralize the coronavirus. If the work is successful, these antiviral proteins will attach to the SARS-CoV-2 spike protein and prevent the virus from entering healthy cells.

These new drug candidates — a type of molecule we call ‘mini-protein binders’ — seek to combine the specificity of antibodies with the high stability and manufacturability of small molecule drugs. Mini-protein binders are custom-designed on the computer to adhere only to specific targets, such as specific grooves on the SARS-CoV-2 spike protein.

We call this type of drug candidate a mini-protein binder. It seeks to combine the specificity of antibodies with the high stability and manufacturability of small-molecule drugs. Mini-protein binders are designed on a computer to attach only to specific targets, such as the surface of the SARS-CoV-2 spike protein. The figure below shows a designed mini-protein binder (pink) binding to the spike protein. Details can be found in [Building 20,000 new drug candidates](https://www.bakerlab.org/index.php/2017/09/28/new-article-nature-designing-testing-20,000-new-protein-drug-candidates/)

image from www.ipd.uw.edu

Our researchers are now designing on the computer tens of thousands of anti-coronavirus mini-protein binders. In the coming weeks we hope to produce these mini-proteins in the lab and measure their ability to bind to spike protein. Following this, much more laboratory testing would still be needed to evaluate the safety and efficacy of these experimental coronavirus drugs.

Researchers are now designing tens of thousands of antiviral mini-protein binders on the computer, and in the coming weeks they hope to produce these proteins in the laboratory and measure their ability to bind to the spike protein. Much more testing is still needed to determine the safety and efficacy of these experimental drugs.

Designing coronavirus vaccines

image from www.ipd.uw.edu

This experimental SARS-CoV-2 vaccine was made by fusing multiple copies of the coronavirus spike protein (red) to the outside of a designed protein nanoparticle (orange and gray).

The experimental SARS-CoV-2 vaccine was made by fusing multiple copies of the coronavirus spike protein to the outside of a designed protein nanoparticle.

Volunteers rally to Rosetta@Home to stop COVID-19

Frost Science is contributing computing resources to the fight against the 2019 coronavirus.

image from Frost Science

RosettaCommons organization

[RosettaCommons](https://www.rosettacommons.org) is primarily an organization that develops the Rosetta protein modeling program, which is used in scientific and commercial projects. For more information, see: With an emergency meeting, RosettaCommons aims to accelerate COVID-19 research

Computer Generated Vaccines

Rosetta is a powerful tool for rapid vaccine development that has been shown to automate the precise design of custom immunogens.

Paper 1 Paper 2

Design of antiviral drugs

The Fleishman lab at the Weizmann Institute of Science in Israel is working to automate the design of certain antivirals. Graduate student Jonathan Weinstein shared an update on their efforts to automatically design anti-coronavirus nanobodies. These natural proteins resemble antibodies, but are much smaller, potentially making them easier and cheaper to produce.

Researchers are working to automate the design of nanobodies against coronaviruses. Nanobodies are natural proteins that resemble antibodies but are much smaller, which could make them easier and cheaper to manufacture.

Using Machine Learning

AlphaFold, developed by DeepMind, uses machine learning to predict protein structure, and there are related articles worth reading.

In a later article I will explore how machine learning is used to predict protein structure.

Conclusion

If you are interested in rosetta@home and want to make a small contribution, you can download BOINC, select rosetta@home in the add-project dialog, and create an account. BOINC will then load the main Rosetta program according to your configuration, download the resource package, and start its computing journey.

Attached is a picture of the Rosetta@home project
