TensorFlow has added a Quantization Aware Training (QAT) API to its TensorFlow Model Optimization Toolkit. QAT lets developers train and deploy models that gain the size and speed benefits of quantization while keeping accuracy close to that of the original model. The TensorFlow team has been on a mission to develop smaller and faster machine learning models, and QAT is its latest answer.
Quantization is the traditional approach to transforming machine learning models into smaller, lower-precision representations, for example by storing parameters as 8-bit integers in place of 32-bit floats. The trade-off is that the parameters and computations in the quantized model are less precise than in the original. Accordingly, quantization is lossy (i.e., information is lost). QAT looks to minimize this loss.
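To make the lossy step concrete, here is a minimal sketch in plain Python of one common scheme, affine quantization to signed 8-bit integers. The helper names `quantize` and `dequantize` and the sample weights are hypothetical, and this is not necessarily how TensorFlow implements it internally.

```python
def quantize(values, num_bits=8):
    """Map floats to signed integers using a scale and a zero point."""
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    # The representable range is widened to include 0.0 so zero maps exactly.
    lo, hi = min(min(values), 0.0), max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # fall back to 1.0 if all values are 0
    zero_point = round(qmin - lo / scale)
    q = [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point


def dequantize(q, scale, zero_point):
    """Map the integers back to approximate float values."""
    return [(x - zero_point) * scale for x in q]


weights = [-1.27, -0.4, 0.003, 0.5, 1.9]  # made-up example weights
q, scale, zero_point = quantize(weights)
restored = dequantize(q, scale, zero_point)

# The round trip does not recover the original values exactly.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q)
print(max_error)
```

Each restored weight differs from the original by up to half a quantization step, which is the information loss the article refers to.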
QAT treats the information lost during quantization as noise in the training process. Because that noise becomes part of the overall loss, the underlying optimization algorithm works to minimize it. In turn, the model learns parameters that are more robust to quantization.
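The "noise" view can be illustrated with a small sketch of fake quantization, where each weight is quantized and immediately dequantized in the forward pass. The function names, the grid step of 0.25, and the numbers below are assumptions chosen for illustration, not values taken from the toolkit.

```python
def fake_quantize(w, step=0.25):
    """Quantize then dequantize: returns a float snapped to the nearest grid point."""
    return round(w / step) * step


def forward(weights, inputs, quantized):
    """A one-neuron forward pass, optionally using fake-quantized weights."""
    ws = [fake_quantize(w) for w in weights] if quantized else weights
    return sum(w * x for w, x in zip(ws, inputs))


weights = [0.73, -0.41, 0.10]  # made-up float weights
inputs = [1.0, 2.0, 3.0]

# The quantization error on each weight acts like noise added to that weight.
noise = [fake_quantize(w) - w for w in weights]

y_float = forward(weights, inputs, quantized=False)
y_quant = forward(weights, inputs, quantized=True)

# The change in the output is exactly the noise propagated through the layer.
noise_effect = sum(n * x for n, x in zip(noise, inputs))
print(y_quant - y_float, noise_effect)
```

During training, the loss is computed from the quantized output, so the optimizer sees the effect of this noise. Implementations typically keep the float weights for the update and treat the rounding as the identity when computing gradients (the straight-through estimator).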
The TensorFlow team has run tests to see how QAT performs compared to traditional models. The positive results can be seen in the blog post announcing the release. The QAT API offers a flexible way to quantize TensorFlow models, bringing "quantization awareness" to machine learning models.
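For readers who want to try the API, the following is a minimal sketch of wrapping a Keras model with `tfmot.quantization.keras.quantize_model`. It assumes the `tensorflow` and `tensorflow_model_optimization` packages are installed in versions whose `tf.keras` is compatible with the toolkit. The function name `build_qat_model` and the toy layer sizes are hypothetical.

```python
def build_qat_model():
    """Build a small Keras model and make it quantization aware."""
    # Imported here so the sketch can be loaded without TensorFlow installed.
    import tensorflow as tf
    import tensorflow_model_optimization as tfmot

    # A toy model; the layer sizes are arbitrary and only for illustration.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(16, activation="relu", input_shape=(8,)),
        tf.keras.layers.Dense(1),
    ])

    # Wrap the whole model so training simulates quantized inference.
    qat_model = tfmot.quantization.keras.quantize_model(model)
    qat_model.compile(optimizer="adam", loss="mse")
    return qat_model
```

The returned model is trained with the usual `fit` call. Converting it for deployment, for example with the TensorFlow Lite converter, is a separate step not shown here.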