Model Quantization with TensorFlow Lite

TFLite Components

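TensorFlow Lite Interpreter

Runs inference on the deployment hardware, such as mobile phones, microcontrollers, and embedded devices.

TensorFlow Lite Converter

Converts a model into a form that is smaller and faster at inference time.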
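The sketch below shows the two components working together. It converts a tiny model and then runs it with `tf.lite.Interpreter` following the load, allocate, set input, invoke, read output sequence. The `TinyModel` class and its weights are hypothetical. The sketch uses `TFLiteConverter.from_concrete_functions` rather than the Keras path shown later, which keeps it self-contained. It also assumes a TensorFlow 2.x release that still ships `tf.lite.Interpreter`, because newer releases point users to the separate LiteRT package.

```python
import numpy as np
import tensorflow as tf


class TinyModel(tf.Module):
    """Hypothetical stand-in model computing y = x @ w + b."""

    def __init__(self):
        super().__init__()
        self.w = tf.Variable([[1.0, -1.0], [2.0, 0.5], [0.0, 1.0]])
        self.b = tf.Variable([0.5, -0.5])

    @tf.function(input_signature=[tf.TensorSpec([1, 3], tf.float32)])
    def __call__(self, x):
        return tf.matmul(x, self.w) + self.b


def to_tflite(module):
    # Converter: trace the function and serialize it as a .tflite FlatBuffer
    concrete = module.__call__.get_concrete_function()
    converter = tf.lite.TFLiteConverter.from_concrete_functions([concrete], module)
    return converter.convert()


def run_tflite(model_bytes, x):
    # Interpreter: load the model, allocate tensors, feed input, run, read output
    interpreter = tf.lite.Interpreter(model_content=model_bytes)
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]
    out = interpreter.get_output_details()[0]
    interpreter.set_tensor(inp["index"], x.astype(inp["dtype"]))
    interpreter.invoke()
    return interpreter.get_tensor(out["index"])


model_bytes = to_tflite(TinyModel())
x = np.array([[1.0, 2.0, 3.0]], dtype=np.float32)
y = run_tflite(model_bytes, x)
print(y)
```

On a phone or microcontroller, the platform's TFLite runtime performs the same steps in place of Python.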

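TFLite Converter

The TensorFlow Lite Converter converts a model into the FlatBuffers format. FlatBuffers is a cross-platform serialization library that stores structured data in binary form. Because that data can be read in place without a separate unpacking step, it saves memory on microcontrollers.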

Python Keras Converter:

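import tensorflow as tf

# `model` is an existing tf.keras model
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

# Write the serialized FlatBuffer to disk
with tf.io.gfile.GFile('model.tflite', 'wb') as f:
    f.write(tflite_model)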

Command-line converter:

tflite_convert --saved_model_dir=$modelDir --output_file=model.tflite
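The binary FlatBuffer format described above can be seen directly in the file header. A FlatBuffer starts with a 4-byte offset to its root table, and the bytes after it can hold a 4-byte file identifier. The TFLite schema declares that identifier as `"TFL3"`. The helper `has_tflite_identifier` is a hypothetical name, and the headers in the demo are synthetic bytes rather than real model files.

```python
def has_tflite_identifier(data: bytes) -> bool:
    """Return True if `data` begins like a TFLite FlatBuffer.

    Bytes 0-3: little-endian offset to the root table.
    Bytes 4-7: file identifier, "TFL3" in the TFLite schema.
    """
    return len(data) >= 8 and data[4:8] == b"TFL3"


# Synthetic headers for illustration only (not real model files)
good_header = (28).to_bytes(4, "little") + b"TFL3"
bad_header = bytes(8)

print(has_tflite_identifier(good_header))  # True
print(has_tflite_identifier(bad_header))   # False
```

To check a converted file, read its first eight bytes, for example `open('model.tflite', 'rb').read(8)`, and pass them to the helper.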

TFLite Model Quantization

Quantizing models for CPU model size

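Quantizing the weights from 32-bit floating point down to 8-bit integers shrinks them to roughly a quarter of their original size and can speed up inference. With only `tf.lite.Optimize.DEFAULT` set, just the weights are stored as 8-bit integers, and activations are kept in floating point.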

Python:

# Enable the default optimization, which quantizes the weights to 8 bits
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_quant_model = converter.convert()
with tf.io.gfile.GFile('model-default-quant.tflite', 'wb') as f:
    f.write(tflite_quant_model)
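The arithmetic behind the 32-bit to 8-bit reduction can be sketched in plain Python. This is a simplified sketch of symmetric per-tensor int8 quantization: each weight becomes `round(w / scale)` with `scale = max|w| / 127`. The converter's actual scheme differs in details, for example per-channel scales for some ops. The helper names and weight values are hypothetical.

```python
from array import array


def quantize_weights(weights):
    """Symmetric per-tensor int8 quantization (simplified sketch)."""
    max_abs = max(abs(w) for w in weights) or 1.0  # avoid a zero scale
    scale = max_abs / 127
    q = array("b", (round(w / scale) for w in weights))  # signed 8-bit storage
    return q, scale


def dequantize(q, scale):
    """Map int8 values back to approximate float weights."""
    return [v * scale for v in q]


# Hypothetical float32 weights
weights = array("f", [0.8, -0.31, 0.05, -1.27, 0.6, 0.0, 0.99, -0.44])
q, scale = quantize_weights(weights)
restored = dequantize(q, scale)

print(len(weights) * weights.itemsize, "bytes as float32")  # 32 bytes as float32
print(len(q) * q.itemsize, "bytes as int8")                  # 8 bytes as int8
```

Storage drops by 4x, and each restored weight stays within `scale / 2` of the original. That rounding error is the accuracy cost of quantization.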

Full integer quantization of weights and activations

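Restricting all computation to integers shrinks the model further and speeds up inference even more. Weight-only quantization leaves activations in floating point, but this mode quantizes the activations as well. The converter therefore needs a representative dataset to calibrate the activations' value ranges.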
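The following sketch shows what "all computation in integers" means for a single dense layer. The int8 inputs and weights are multiplied and accumulated in a wide integer. The result is then rescaled to an int8 output with an integer multiply and shift. Floats appear only in the offline step that prepares scales and zero points. The layer values and ranges are hypothetical. Real TFLite kernels encode the rescaling factor as a 32-bit fixed-point multiplier plus a shift, which this sketch simplifies to `m / 2**16`.

```python
def quantize(values, scale, zero_point):
    """Affine int8 quantization: q = clamp(round(v / scale) + zero_point)."""
    return [max(-128, min(127, round(v / scale) + zero_point)) for v in values]


def int_dense(x_q, z_x, w_q, b_q, m, shift, z_y):
    """Integer-only dense layer (hypothetical helper; weights have zero point 0)."""
    out = []
    for row, b in zip(w_q, b_q):
        # Wide integer accumulator, as int32 would be on device
        acc = b + sum((xi - z_x) * wi for xi, wi in zip(x_q, row))
        # Rescale by m / 2**shift with round-half-up, then add output zero point
        y = ((acc * m + (1 << (shift - 1))) >> shift) + z_y
        out.append(max(-128, min(127, y)))
    return out


# Offline preparation (floats allowed here); all values are hypothetical
x = [0.5, -1.0, 0.25, 2.0]
W = [[0.1, -0.2, 0.3, 0.05], [-0.4, 0.25, 0.0, 0.15]]
b = [0.01, -0.02]
s_x, z_x = 3.0 / 255, -43   # activation range [-1, 2], asymmetric
s_w = 0.4 / 127             # weights: symmetric, zero point 0
s_y, z_y = 1.0 / 127, 0     # output range [-1, 1]

x_q = quantize(x, s_x, z_x)
w_q = [quantize(row, s_w, 0) for row in W]
b_q = [round(v / (s_x * s_w)) for v in b]   # bias stored at scale s_x * s_w
shift = 16
m = round(s_x * s_w / s_y * (1 << shift))   # rescaling factor as an integer

# Inference: integers only
y_q = int_dense(x_q, z_x, w_q, b_q, m, shift, z_y)

# Compare against the float reference
y_float = [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(W, b)]
y_deq = [(q - z_y) * s_y for q in y_q]
print(y_q, y_deq, y_float)
```

The dequantized outputs track the float reference to within a few output quantization steps. Hardware that lacks a floating-point unit can run this kind of integer-only path.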

The figure below compares the efficiency of the different quantization methods on the same model.
