DevOps for ML Part 1: Boosting Model Performance with LEIP Optimize

by Richard McCarthy | Posted Aug 04, 2023
Latent AI Efficient Inference Platform (LEIP) Build, Optimize, Deploy

The Latent AI Efficient Inference Platform (LEIP) creates specialized DevOps processes for machine learning (ML) that produce ultra-efficient optimized models ready for scalable deployment as executable files.

But how does it work? How does AI actually go from development to deployment to a device? In this series of blog posts, we’ll walk you through the ML life cycle and show you how LEIP can take you to market faster. 

LEIP Optimize

Boosting model performance for the edge begins with LEIP Optimize, one of the core modules of LEIP and the first step in the LEIP software development kit (SDK) end-to-end workflow. LEIP Optimize takes a developer’s pre-trained neural network model and produces an optimized model, quantized to the precision and calibration settings the developer specifies. The result is a Latent AI Runtime Environment (LRE) object that packages those specifications along with executable code built for the developer’s target hardware.

There are two phases to LEIP Optimize: LEIP Compress and LEIP Compile.

LEIP Compress

To understand how LEIP Compress works, we first need a basic understanding of neural networks. In its broadest sense, a neural network is a technology built to simulate the activity of the human brain, specifically pattern recognition and the passage of input through layers of simulated neural connections. Neural networks have an input layer, an output layer, and at least one hidden layer in between. The phrase “deep learning” is also used to describe neural networks, as deep learning is a form of machine learning in which information is classified and organized in ways that go beyond simple input/output protocols. Deep neural networks rely on very large numbers of parameters, which drives up their memory and compute requirements, and that makes most models too big to run on edge devices.

LEIP Compress applies quantization to shrink the size of the model, making it much easier to deploy to edge devices. The main idea of quantization is to analyze the distribution of the model’s floating point values and map them to integer values while minimizing the loss in overall accuracy.
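
As a rough illustration (not Latent AI’s specific algorithm), the NumPy sketch below shows the core idea of post-training quantization: derive a scale and zero point from the observed float range, map each value onto the 8-bit integer range, and measure how much accuracy is lost in the round trip.

```python
import numpy as np

def quantize_int8(values: np.ndarray):
    """Affine (asymmetric) quantization of float32 values to int8."""
    v_min, v_max = values.min(), values.max()
    scale = (v_max - v_min) / 255.0            # map the float range onto 256 integer levels
    zero_point = int(round(-128 - v_min / scale))
    q = np.clip(np.round(values / scale) + zero_point, -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate floats; the difference from the originals is the quantization error."""
    return (q.astype(np.float32) - zero_point) * scale

weights = np.random.randn(64, 128).astype(np.float32)   # stand-in for a layer's weights
q, scale, zp = quantize_int8(weights)
error = np.abs(dequantize(q, scale, zp) - weights).mean()
print(f"stored as int8 (4x smaller than float32), mean absolute error: {error:.6f}")
```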

There are two other optimizations that can be used with LEIP Compress: Tensor Splitting and Bias Correction. Tensors can be “split” to allow for a different, more favorable compression ratio. The Tensor Splitting algorithm provides a flow that automatically determines which layers’ tensors should be split. Depending on the size of the model, the Tensor Splitting optimization may take several minutes to complete.
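
Latent AI does not publish the internals of the Tensor Splitting flow, but the intuition can be sketched: when a tensor contains a few outlier values, a single quantization scale wastes resolution on the rest of the tensor, and splitting the tensor so each piece gets its own scale reduces the error. The toy example below assumes simple symmetric int8 quantization.

```python
import numpy as np

def quant_error(t: np.ndarray) -> float:
    """Mean absolute error after symmetric int8 quantization with a single scale."""
    scale = np.abs(t).max() / 127.0
    q = np.clip(np.round(t / scale), -127, 127)
    return float(np.abs(q * scale - t).mean())

# A weight tensor in which one output channel has a much larger range (an outlier).
tensor = np.random.randn(8, 256).astype(np.float32)
tensor[0] *= 50.0

whole_error = quant_error(tensor)
# Split the tensor so the outlier channel gets its own quantization scale.
parts = (tensor[:1], tensor[1:])
split_error = sum(quant_error(p) * p.size for p in parts) / tensor.size
print(f"single scale: {whole_error:.4f}, after splitting: {split_error:.4f}")
```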

Bias Correction is the other optimization that can be used with LEIP Compress. Quantization can introduce a systematic error, or bias, in a layer’s outputs. Bias Correction measures this error during calibration and adjusts the model to cancel it out, which can significantly improve the quantized model’s accuracy.
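
A minimal sketch of the general idea, assuming a toy fully connected layer and a calibration batch (this follows the standard bias-correction recipe from the quantization literature, not necessarily LEIP’s exact implementation):

```python
import numpy as np

def fake_quant(w: np.ndarray) -> np.ndarray:
    """Quantize and immediately dequantize weights to expose the quantization error."""
    scale = np.abs(w).max() / 127.0
    return np.clip(np.round(w / scale), -127, 127) * scale

rng = np.random.default_rng(0)
W = rng.normal(size=(128, 64)).astype(np.float32)        # toy layer: y = x @ W + b
b = np.zeros(64, dtype=np.float32)
calib = rng.normal(size=(512, 128)).astype(np.float32)   # calibration inputs

Wq = fake_quant(W)
# Quantizing the weights shifts the mean of the layer's outputs; measure that shift ...
output_bias = (calib @ Wq - calib @ W).mean(axis=0)
# ... and fold it into the bias so the corrected layer matches the float layer on average.
b_corrected = b - output_bias
```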

Figure 1. Bias Correction. Regularly spaced coefficients (left) and interpolated bias field (right). (Illustration from Wikipedia)

LEIP Compile

After a model has been compressed, LEIP Compile takes the optimized model from the previous step (represented as a computational graph) as input and creates a binary image of it for the hardware target specified by the user. This binary image is a shared object file that can be loaded by a runtime for execution.
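
What a runtime does with that shared object can be illustrated generically. The file name and exported symbol below are placeholders for illustration only, not the actual LRE artifact layout or API; the LEIP documentation describes the real interface.

```python
import ctypes
import numpy as np

# Placeholder path and symbol, assuming the compiled library exported a C function:
#   void run_inference(const float* input, float* output, int input_len);
lib = ctypes.CDLL("./modelLibrary.so")
lib.run_inference.argtypes = [
    ctypes.POINTER(ctypes.c_float),
    ctypes.POINTER(ctypes.c_float),
    ctypes.c_int,
]

x = np.zeros(224 * 224 * 3, dtype=np.float32)   # example input buffer
y = np.zeros(1000, dtype=np.float32)            # example output buffer
lib.run_inference(
    x.ctypes.data_as(ctypes.POINTER(ctypes.c_float)),
    y.ctypes.data_as(ctypes.POINTER(ctypes.c_float)),
    int(x.size),
)
```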

LEIP Compile can also perform several optimizations by manipulating the computational graph that represents the neural network. Because these graph transformations can use a lot of resources, only the standard optimizations are performed by default.
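
One common example of such a graph transformation is constant folding, where subgraphs whose inputs are all constants are evaluated once at compile time instead of on every inference. The toy graph below is only an illustration of the idea, not LEIP’s internal representation.

```python
# Toy expression graph: each node is (op, inputs), ("const", value), or ("input", None).
graph = {
    "a": ("const", 2.0),
    "b": ("const", 3.0),
    "c": ("mul", ["a", "b"]),   # both inputs are constants, so this node is foldable
    "x": ("input", None),
    "y": ("add", ["x", "c"]),   # depends on a runtime input, so it stays
}

def fold_constants(graph: dict) -> dict:
    """Replace ops whose inputs are all constants with precomputed constant nodes."""
    folded = dict(graph)
    changed = True
    while changed:
        changed = False
        for name, (op, args) in list(folded.items()):
            if op in ("const", "input"):
                continue
            if all(folded[a][0] == "const" for a in args):
                vals = [folded[a][1] for a in args]
                folded[name] = ("const", vals[0] * vals[1] if op == "mul" else sum(vals))
                changed = True
    return folded

print(fold_constants(graph)["c"])   # ('const', 6.0)
```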

LEIP Compile can generate binaries for multiple processors. Latent AI currently supports processors based on the x86, NVIDIA, and ARM architectures. The compiler can generate binaries that use 32-bit floating point, 8-bit integer, and mixed data types, and it selects the data type best matched to the hardware capabilities of the target architecture.

Runtime for Edge Inference: LEIP Optimize

Once a model is optimized, the next step will be to evaluate its accuracy throughout the different stages of the LEIP process. LEIP Evaluate provides the means to do this. 

In our next post, we’ll discuss LEIP Evaluate and how you can use it to evaluate the accuracy of your model in a consistent way across the entire ML development life cycle. 

 

For documentation, including the LEIP SDK User Guide and LEIP Recipes, visit our Resource Center. For more information, contact us at info@latentai.com.
