Science

DeepMind UCL Deep Learning Online Course 1 Summary

Intro to Machine Learning & AI. Course address. Course content: Solving intelligence; AlphaGo & AlphaZero; Learning to play Capture the Flag; Folding proteins with AlphaFold; Overview of the lecture series. On the question of what intelligence is, the course cites a formula taken from the paper A Definition of Machine Intelligence. AlphaGo and AlphaZero train themselves with reinforcement learning: an initially untrained neural network plays against itself, learns from its mistakes, and keeps adjusting its own parameters so that it wins more reward through constant self-play. ...
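Assuming the paper meant here is Legg and Hutter's "Universal Intelligence: A Definition of Machine Intelligence", the formula in question is presumably their universal intelligence measure. The excerpt itself does not show it, so the following is a reconstruction from that paper rather than a quote from the course:

```latex
\Upsilon(\pi) := \sum_{\mu \in E} 2^{-K(\mu)} \, V_{\mu}^{\pi}
```

Here \pi is the agent, E is the set of computable environments, K(\mu) is the Kolmogorov complexity of environment \mu, and V_{\mu}^{\pi} is the expected total reward the agent collects in \mu. Simple environments get more weight, and an agent scores highly only if it earns reward across many of them.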

Raspberry Pi TensorFlow Distributed Training 1

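Why
I have bought 35 Raspberry Pi 4 boards (4 cores, 4 GB RAM), which currently run Rosetta@home. Once COVID-19 has passed, I will use these boards to test TF projects. Because the TF ecosystem is complete, I would consider TF as the technology for a final product in industry. The trend is toward edge computing: autonomous driving, smart homes, home medical assistance systems, agricultural production, quality inspection of manufactured parts, wear detection for industrial machinery, and so on. All of these must take into account real-time data reception and transmission as well as computing cost. If machine learning for such projects relies on a cloud platform, then the network between the site and the service center, bandwidth and latency, data security, and real-time computation all become concerns. With edge computing, the model is deployed on an industrial microcontroller and TensorFlow Lite runs inference there. This solves availability locally, keeps the data off the public network, and reduces bandwidth consumption, which lowers the computing cost.
Building TensorFlow from source on aarch64
Install the dependencies:
    apt-get install libatlas3-base libopenblas-dev libopenblas-base libblas-dev gcc gfortran python3-dev libgfortran5 g++ libhdf5-dev libfreetype-dev build-essential openjdk-11-jdk zip unzip python3-h5py python3-numpy python3-pip
    sudo pip3 install keras_preprocessing keras_applications
Install Bazel (install-compile-bootstrap-unix)
Bazel does not officially provide binaries for the arm64 architecture, so it has to be compiled by hand. Download and unpack bazel-2.0.0-dist.zip, then run:
    EXTRA_BAZEL_ARGS="--host_javabase=@local_jdk//:jdk" bash ./compile.sh
Copy output/bazel to /usr/local/bin/bazel.
Build TensorFlow
The Raspberry Pi 4B runs Ubuntu 20.04 (ARM64) with Python 3.8. TensorFlow does not officially provide a Python whl package for this combination, so it has to be built from source by hand:
    git clone https://github.com/tensorflow/tensorflow.git
    cd tensorflow
    git checkout v2.2.0
    ./configure
After choosing the configuration options and before compiling, add swap space: 4 GB of system memory is nowhere near enough for the build, and 6 GB of swap is recommended. Ideally, attach a separate USB 3 to SATA external drive dedicated to swap.
    fallocate -l 6G /swapfile
    chmod 0600 /swapfile
    mkswap /swapfile
    swapon /swapfile
Run the build:
    bazel build --config=noaws --config=nogcp --config=nohdfs --config=nonccl --config=monolithic --config=v2 --local_cpu_resources=3 //tensorflow/tools/pip_package:build_pip_package
Because the build runs directly on a 4-core, 4 GB Raspberry Pi, patience is required: compilation takes roughly 15 to 25 hours ☕️☕️☕️
When compilation has finished, build the pip package:
    bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
This writes tensorflow-2.2.0-cp38-cp38-linux_aarch64.whl to /tmp/tensorflow_pkg. Finally, install it from that directory:
    pip install tensorflow-2.2.0-cp38-cp38-linux_aarch64.whl
TF test
The test uses the code from the official minimal MNIST tutorial ...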
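Since the build depends on having enough RAM plus swap, it can help to check the totals before starting a 15 to 25 hour compile. The following is a minimal Python sketch, not part of the original post. The helper names, the sample values, and the 9.5 GiB threshold are assumptions. The threshold is chosen to sit just under 4 GB of RAM plus the recommended 6 GB of swap, because the kernel reports slightly less than the nominal RAM size.

```python
# Sample /proc/meminfo excerpt (hypothetical values for a 4 GB Pi with 6 GiB swap).
SAMPLE_MEMINFO = """MemTotal:        3884396 kB
MemFree:          512000 kB
SwapTotal:       6291452 kB
SwapFree:        6291452 kB
"""


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field: size in kB}."""
    values = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[key.strip()] = int(parts[0])
    return values


def ram_plus_swap_gib(meminfo):
    """Return MemTotal + SwapTotal in GiB (meminfo sizes are in kB)."""
    total_kb = meminfo.get("MemTotal", 0) + meminfo.get("SwapTotal", 0)
    return total_kb / (1024 ** 2)


def ready_to_build(meminfo, required_gib=9.5):
    """True if RAM plus swap reaches the (assumed) threshold for the build."""
    return ram_plus_swap_gib(meminfo) >= required_gib


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as handle:
            info = parse_meminfo(handle.read())
    except OSError:
        # Not on Linux: fall back to the sample so the sketch still runs.
        info = parse_meminfo(SAMPLE_MEMINFO)
    print("RAM + swap: %.2f GiB" % ram_plus_swap_gib(info))
    print("Ready to build:", ready_to_build(info))
```

Running the script on the Pi after swapon shows whether the swap file is actually active before bazel build is started.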

Protein Folding Basics - 1

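Composition of proteins
A protein is essentially a molecular machine and plays a key role in sustaining life. Proteins are macromolecular polymers: polypeptide chains made of amino acid residues. For a protein to do its work, its three-dimensional structure, that is, the structure of its polypeptide chain, has to be folded in a strictly determined way. Protein structure is described at four levels, from primary to quaternary. The amino acid sequence of the polypeptide chain determines the protein's three-dimensional structure; in other words, the primary structure determines the three-dimensional structure.
How proteins are produced
The basic building block of proteins: amino acids
Amino acids on Wikipedia. Names, structures, and formulas of the 20 basic amino acids: The basic structure of an amino acid can be divided into an amino group (NH2), a carboxyl group (CO2H), and a side chain (R).
Protein conformation
Rosetta
Protein folding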
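The primary structure mentioned above is simply the ordered sequence of amino acid residues. As a small illustration, the Python sketch below lists the 20 standard amino acids with their one-letter and three-letter codes and spells out a sequence residue by residue. The function name and the example sequence are made up for illustration and do not come from the post.

```python
# One-letter code -> (three-letter code, name) for the 20 standard amino acids.
AMINO_ACIDS = {
    "A": ("Ala", "Alanine"),
    "R": ("Arg", "Arginine"),
    "N": ("Asn", "Asparagine"),
    "D": ("Asp", "Aspartic acid"),
    "C": ("Cys", "Cysteine"),
    "E": ("Glu", "Glutamic acid"),
    "Q": ("Gln", "Glutamine"),
    "G": ("Gly", "Glycine"),
    "H": ("His", "Histidine"),
    "I": ("Ile", "Isoleucine"),
    "L": ("Leu", "Leucine"),
    "K": ("Lys", "Lysine"),
    "M": ("Met", "Methionine"),
    "F": ("Phe", "Phenylalanine"),
    "P": ("Pro", "Proline"),
    "S": ("Ser", "Serine"),
    "T": ("Thr", "Threonine"),
    "W": ("Trp", "Tryptophan"),
    "Y": ("Tyr", "Tyrosine"),
    "V": ("Val", "Valine"),
}


def to_three_letter(sequence):
    """Spell out a one-letter primary structure using three-letter codes."""
    codes = []
    for residue in sequence.upper():
        if residue not in AMINO_ACIDS:
            raise ValueError("not a standard amino acid code: %r" % residue)
        codes.append(AMINO_ACIDS[residue][0])
    return "-".join(codes)


if __name__ == "__main__":
    # Hypothetical short peptide, used only to show the notation.
    print(to_three_letter("MKV"))
```

Each residue shares the same amino and carboxyl backbone and differs only in the side chain R, which is why a string of one-letter codes is enough to write down a primary structure.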

Backpropagation in Artificial Neural Networks - The Gradient Descent Algorithm

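About artificial neural networks
Recognizing handwritten digits with an artificial neural network model
We first build an artificial neural network by hand in Mathematica, then train it, and finally use the trained model to tell which handwritten digit an image shows.
    trainingData = ResourceData["MNIST", "TrainingData"];
    testData = ResourceData["MNIST", "TestData"];
    net = NetInitialize@NetChain[{
        FlattenLayer[],
        LinearLayer[500], ElementwiseLayer["Sigmoid"],
        LinearLayer[100], ElementwiseLayer["Sigmoid"],
        LinearLayer[10], SoftmaxLayer[]},
      "Input" -> NetEncoder[{"Image", {12, 12}, "ColorSpace" -> "Grayscale"}],
      "Output" -> NetDecoder[{"Class", Range[0, 9]}]]
NetEncoder declares that the input is a single-channel 12*12 image. NetDecoder declares that the network's final output is a class, one of the digits 0-9. The layers of the model are:
FlattenLayer[]: the pixel data of the image, a two-dimensional array, is flattened into one-dimensional input data
LinearLayer[500]: first linear layer, 500 outputs, Sigmoid activation
LinearLayer[100]: second linear layer, 100 outputs, Sigmoid activation
LinearLayer[10]: final output layer, set to 10 because there are 10 classes, Softmax activation
The network layers can be pictured as follows:
Labeled training data, with the actual handwritten digit on the left and the corresponding label on the right:
    RandomSample[trainingData, 10]
Before the model is trained, its weights are random, so its predictions are essentially guesses and the error rate is very high (about 90% would be expected by chance with ten classes).
Train the model:
    trained = NetTrain[net, trainingData, ValidationSet -> testData, MaxTrainingRounds -> 50]
From the training chart we can see that the accuracy is already close to 97%. Predicting the image above again, the model now correctly identifies the digit as 6.
    measurements = ClassifierMeasurements[trained, testData]
Running measurements["Accuracy"] gives a model accuracy of 98.02% ...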
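NetTrain hides the update rule, so a small stand-alone illustration of gradient descent may be useful here. The Python sketch below is not the Mathematica network above: it trains a single sigmoid neuron on the logical AND function, which is a deliberately tiny stand-in. The learning rate, epoch count, and function names are assumptions chosen for illustration. The gradient comes from the chain rule: for a sigmoid output with cross-entropy loss, the derivative of the loss with respect to the pre-activation is simply prediction minus target.

```python
import math


def sigmoid(z):
    """Logistic sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, inputs):
    """Forward pass of one neuron: weighted sum followed by sigmoid."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)


def train_neuron(samples, learning_rate=1.0, epochs=5000):
    """Full-batch gradient descent on the mean cross-entropy loss."""
    n_inputs = len(samples[0][0])
    weights = [0.0] * n_inputs
    bias = 0.0
    count = len(samples)
    for _ in range(epochs):
        grad_w = [0.0] * n_inputs
        grad_b = 0.0
        for inputs, target in samples:
            # Chain rule: dLoss/dz = prediction - target for sigmoid + cross-entropy.
            delta = predict(weights, bias, inputs) - target
            for i, x in enumerate(inputs):
                grad_w[i] += delta * x  # dz/dw_i = x_i
            grad_b += delta             # dz/db = 1
        # Step against the averaged gradient.
        weights = [w - learning_rate * g / count for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / count
    return weights, bias


# Truth table of logical AND: (inputs, target).
AND_SAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

if __name__ == "__main__":
    w, b = train_neuron(AND_SAMPLES)
    for inputs, target in AND_SAMPLES:
        print(inputs, target, round(predict(w, b, inputs), 3))
```

In a multi-layer network such as the one above, backpropagation applies the same chain rule layer by layer, passing the error term backward from the output to compute the gradient for every weight.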

Epidemic Modeling

SIR Modeling
The classical SIR model is used mainly in the study of infectious diseases to predict how the number of infections will develop. S[t] denotes the susceptible population, I[t] the infected population, and R[t] the recovered population. Reference: MathWorld: SIR Model, Kermack-McKendrick Model.
Kermack-McKendrick Model
The original Kermack-McKendrick model was designed to explain how the number of infected people changes over time in epidemics such as the plague of 1665-1666 and the cholera outbreak of 1865. The model assumes that the total population is fixed, that the incubation period is zero, that the infectious period lasts as long as the disease itself, and that the population is homogeneous, with no differences by gender or race. ...
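Under the assumptions listed above, the model is usually written as three coupled equations: dS/dt = -beta*S*I, dI/dt = beta*S*I - gamma*I, and dR/dt = gamma*I. The Python sketch below integrates them with a simple forward Euler step. The symbols beta (infection rate) and gamma (recovery rate) follow the common textbook form. The parameter values, step size, and function name are assumptions for illustration and are not taken from the post.

```python
def simulate_sir(s0, i0, r0, beta, gamma, dt, steps):
    """Integrate the SIR equations with forward Euler.

    Returns a list of (S, I, R) tuples, one per time step, starting
    with the initial state.
    """
    s, i, r = float(s0), float(i0), float(r0)
    history = [(s, i, r)]
    for _ in range(steps):
        new_infections = beta * s * i * dt   # flow S -> I
        new_recoveries = gamma * i * dt      # flow I -> R
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history


if __name__ == "__main__":
    # Hypothetical outbreak: 1000 people, 10 initially infected.
    run = simulate_sir(s0=990, i0=10, r0=0, beta=0.0005, gamma=0.1,
                       dt=0.1, steps=1000)
    peak_infected = max(i for _, i, _ in run)
    print("peak infected: %.1f" % peak_infected)
    print("final state: S=%.1f I=%.1f R=%.1f" % run[-1])
```

Every person leaving one compartment enters another, so S + I + R stays constant throughout the run, which matches the fixed-population assumption.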