
TensorFlow Lite Quantization

Model Optimization


TFLite Components

  1. TensorFlow Lite Interpreter
  • Runs inference on deployed hardware such as cell phones, microcontrollers, and embedded devices (a usage sketch follows this list).
  2. TensorFlow Lite Converter
  • Converts models to make them smaller and faster at inference.
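
As a minimal sketch of the Interpreter side (assuming 'model.tflite' was produced by the Converter as shown later), the snippet below loads the file and runs one inference on a dummy input:

import numpy as np
import tensorflow as tf

# Load the converted model and allocate its tensors
interpreter = tf.lite.Interpreter(model_path='model.tflite')
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Feed a dummy input that matches the model's expected shape and dtype
dummy = np.zeros(input_details[0]['shape'], dtype=input_details[0]['dtype'])
interpreter.set_tensor(input_details[0]['index'], dummy)
interpreter.invoke()

result = interpreter.get_tensor(output_details[0]['index'])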

TFLite Converter

The TensorFlow Lite Converter converts models to the FlatBuffers format. FlatBuffers is a cross-platform serialization library that stores structured data in binary form, which saves memory and improves performance on microcontrollers.

Python Keras Converter:

import tensorflow as tf

# Convert an in-memory Keras model to the FlatBuffers format
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

# Write the serialized model to disk
with tf.io.gfile.GFile('model.tflite', 'wb') as f:
    f.write(tflite_model)

Command-line Converter:

tflite_convert --saved_model_dir=$modelDir --output_file=model.tflite
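
The same conversion can also be done from Python with from_saved_model; a minimal sketch, where '/path/to/saved_model' stands in for the SavedModel directory:

import tensorflow as tf

# Load a SavedModel directory and convert it, mirroring the CLI call above
converter = tf.lite.TFLiteConverter.from_saved_model('/path/to/saved_model')
tflite_model = converter.convert()

with tf.io.gfile.GFile('model.tflite', 'wb') as f:
    f.write(tflite_model)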

TFLite Model Quantization

Quantizing models to reduce CPU model size

Quantization reduces the weights from 32-bit floats to 8-bit integers, which shrinks the model to roughly a quarter of its size and speeds up inference.

Python:

# Enable the default optimizations, which quantize the weights to 8 bits
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_quantized_model = converter.convert()
with tf.io.gfile.GFile('model-default-quant.tflite', 'wb') as f:
    f.write(tflite_quantized_model)

Full integer quantization of weights and activations

Restricting all computations to integer arithmetic further reduces model size and speeds up inference, and it lets the model run on integer-only hardware such as microcontrollers. This requires calibrating the activation ranges with a representative dataset, as in the sketch below.
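
A minimal sketch of full integer quantization, assuming a Keras model and a hypothetical calibration_samples array drawn from the training data (not part of the original post):

import numpy as np
import tensorflow as tf

def representative_dataset():
    # Yield ~100 samples representative of the real input distribution
    # ('calibration_samples' is a hypothetical array used for illustration)
    for sample in calibration_samples[:100]:
        yield [np.expand_dims(sample, 0).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
# Require integer-only ops so both weights and activations are int8
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

tflite_int8_model = converter.convert()
with tf.io.gfile.GFile('model-int8-quant.tflite', 'wb') as f:
    f.write(tflite_int8_model)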

The figure below compares the size and latency of the same model under the different quantization methods.
