TensorFlow Lite Quantization
Model Optimization
TFLite Components
- TensorFlow Lite Interpreter
  - Runs inference on the deployment hardware, which can include cell phones, microcontrollers, and embedded devices (a minimal sketch follows this list).
- TensorFlow Lite Converter
  - Converts models to make them smaller and faster at inference.
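As a sketch of the interpreter side, the Python tf.lite.Interpreter can load a converted model and run one inference. The file name model.tflite and the zero-valued dummy input are assumptions for illustration:

import numpy as np
import tensorflow as tf

# Load the converted FlatBuffers model and allocate its tensor buffers.
interpreter = tf.lite.Interpreter(model_path='model.tflite')
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Feed a dummy input with the shape and dtype the model expects.
dummy_input = np.zeros(input_details[0]['shape'], dtype=input_details[0]['dtype'])
interpreter.set_tensor(input_details[0]['index'], dummy_input)
interpreter.invoke()
prediction = interpreter.get_tensor(output_details[0]['index'])

On microcontrollers the same model would instead be run by the C++ TensorFlow Lite Micro interpreter.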
TFLite Converter
The TensorFlow Lite Converter converts models into the FlatBuffers format. FlatBuffers is a cross-platform serialization library that stores structured data in binary form; on microcontrollers this saves memory and improves performance.
Python Keras Converter:
import tensorflow as tf

# Convert an in-memory Keras model to the TFLite FlatBuffers format.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

# Write the serialized model to disk.
with tf.io.gfile.GFile('model.tflite', 'wb') as f:
    f.write(tflite_model)
Command-line Converter:
tflite_convert --saved_model_dir=$modelDir --output_file=model.tflite
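The same SavedModel conversion is also available from Python, assuming saved_model_dir points at an exported SavedModel directory:

# Convert a SavedModel directory instead of an in-memory Keras model.
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
tflite_model = converter.convert()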
TFLite Model Quantization
Quantizing models to reduce size on CPU (dynamic range quantization)
The quantized weights are reduced from 32-bit floats to 8-bit integers, which shrinks the model to roughly a quarter of its size and speeds up inference.
Python:
# Enable the default post-training optimization (dynamic range quantization).
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_quantized_model = converter.convert()

with tf.io.gfile.GFile('model-default-quant.tflite', 'wb') as f:
    f.write(tflite_quantized_model)
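To confirm the reduction, the two files written above can be compared on disk (the file names follow the earlier examples):

import os

# The dynamic-range-quantized file should be roughly 4x smaller,
# since the weights shrink from 32 bits to 8 bits.
print(os.path.getsize('model.tflite'))
print(os.path.getsize('model-default-quant.tflite'))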
Full integer quantization of weights and activations
Restricting all calculations to integers further reduces the size of the model and speeds up inference. Full integer quantization requires a representative dataset so the converter can calibrate the value ranges of the activations.
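A minimal sketch of full integer quantization: a representative dataset is supplied for calibration, and the input and output types are forced to int8. The calibration_images array and the output file name are hypothetical:

import numpy as np
import tensorflow as tf

def representative_dataset():
    # Yield a few hundred typical inputs so the converter can
    # estimate the value ranges of the activations.
    for sample in calibration_images:  # hypothetical calibration data
        yield [sample.astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
# Fail conversion if any op cannot be expressed with int8 kernels.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
tflite_int8_model = converter.convert()

with tf.io.gfile.GFile('model-int8-quant.tflite', 'wb') as f:
    f.write(tflite_int8_model)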
The figure below compares the efficiency of the different quantization methods applied to the same model.