TensorFlow Lite Quantization
Model Optimization
TFLite Components
- TensorFlow Lite Interpreter
  - Runs inference on the deployment hardware, which can include cell phones, microcontrollers, and embedded devices (a minimal sketch follows this list).
- TensorFlow Lite Converter
  - Converts models to make them smaller and faster at inference.
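As a sketch of the interpreter side, the Python tf.lite.Interpreter can load a converted model and run one inference. The file name model.tflite and the zero-valued dummy input are assumptions for illustration:

import numpy as np
import tensorflow as tf

# Load the converted FlatBuffers model and allocate its tensor buffers.
interpreter = tf.lite.Interpreter(model_path='model.tflite')
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Feed a dummy input with the shape and dtype the model expects.
dummy_input = np.zeros(input_details[0]['shape'], dtype=input_details[0]['dtype'])
interpreter.set_tensor(input_details[0]['index'], dummy_input)
interpreter.invoke()
prediction = interpreter.get_tensor(output_details[0]['index'])

On microcontrollers the same model would instead be run by the C++ TensorFlow Lite Micro interpreter.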
TFLite Converter
The TensorFlow Lite Converter converts models into the FlatBuffers format. FlatBuffers is a cross-platform serialization library that stores structured data in binary form; on microcontrollers this saves memory and improves performance.
Python Keras Converter:
import tensorflow as tf

# Convert an in-memory Keras model to the TFLite FlatBuffers format.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

# Write the serialized model to disk.
with tf.io.gfile.GFile('model.tflite', 'wb') as f:
    f.write(tflite_model)
Command-line Converter:
tflite_convert --saved_model_dir=$modelDir --output_file=model.tflite
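The same SavedModel conversion is also available from Python, assuming saved_model_dir points at an exported SavedModel directory:

# Convert a SavedModel directory instead of an in-memory Keras model.
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
tflite_model = converter.convert()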
TFLite Model Quantization
Quantizing models to reduce size on CPU (dynamic range quantization)
The quantized weights are reduced from 32-bit floats to 8-bit integers, which shrinks the model to roughly a quarter of its size and speeds up inference.
Python:
# Enable the default post-training optimization (dynamic range quantization).
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_quantized_model = converter.convert()

with tf.io.gfile.GFile('model-default-quant.tflite', 'wb') as f:
    f.write(tflite_quantized_model)
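To confirm the reduction, the two files written above can be compared on disk (the file names follow the earlier examples):

import os

# The dynamic-range-quantized file should be roughly 4x smaller,
# since the weights shrink from 32 bits to 8 bits.
print(os.path.getsize('model.tflite'))
print(os.path.getsize('model-default-quant.tflite'))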
Full integer quantization of weights and activations
Restricting all calculations to integers further reduces the size of the model and speeds up inference. Full integer quantization requires a representative dataset so the converter can calibrate the value ranges of the activations.
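A minimal sketch of full integer quantization: a representative dataset is supplied for calibration, and the input and output types are forced to int8. The calibration_images array and the output file name are hypothetical:

import numpy as np
import tensorflow as tf

def representative_dataset():
    # Yield a few hundred typical inputs so the converter can
    # estimate the value ranges of the activations.
    for sample in calibration_images:  # hypothetical calibration data
        yield [sample.astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
# Fail conversion if any op cannot be expressed with int8 kernels.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
tflite_int8_model = converter.convert()

with tf.io.gfile.GFile('model-int8-quant.tflite', 'wb') as f:
    f.write(tflite_int8_model)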
The figure below compares the efficiency of the different quantization methods applied to the same model.