
DevOps for ML Part 2: Testing Model Accuracy with LEIP Evaluate

by Kimberly Eng | Posted Aug 24, 2023

Part 1: Optimizing Your Model with LEIP Optimize

The Latent AI Efficient Inference Platform (LEIP) SDK creates dedicated DevOps processes for ML. With LEIP, you can produce secure models optimized for memory, power, and compute that can be delivered as an executable ready to deploy at scale. But how does it work? How do you actually go from design to deployment? 

The previous post in our ML Lifecycle series walked through the first step of producing a model with LEIP Optimize, one of the LEIP SDK’s core modules. It optimizes models for memory, power, and compute while also compiling them into an executable package for easy deployment. However, models must also be evaluated for accuracy and authenticity after they have been trained, exported, and optimized. 


LEIP Evaluate

LEIP Evaluate allows you to verify the accuracy of a model on your desired target devices. Whether you are developing your model on-premise or in the cloud, LEIP Evaluate lets you run an evaluation pipeline in your SDK container, while performing remote inference on any number of remote edge devices.

LEIP Evaluate can assess the accuracy of a compiled and optimized model at several stages in the LEIP toolchain, including before and after running LEIP Compile and LEIP Optimize, using the following evaluation methods:

  • Test and Evaluation (T&E) – When the model is being developed/trained. 
  • Verification and Validation (V&V) – When the model is being updated/refined during deployment.

T&E builds trust and V&V maintains trust in the deployed AI. Both the developer and the operator need that trust, and it rests on establishing clear data and model provenance.

LEIP Evaluate takes two inputs to begin the process: a path to a model folder or file, and a path to a testset file. The model is loaded into the specified runtime and performs inference on the input files listed in the testset. LEIP Evaluate then generates two outputs: an accuracy metric and the number of inferences per second. The accuracy metric shows how close the evaluated model is to the original uncompiled model, while the inferences-per-second figure measures how many predictions the model can make in one second. Together, they indicate the performance and efficiency of the model.
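
As a rough illustration, an evaluation run inside the SDK container could be scripted along the following lines. This is a minimal sketch only: the paths are placeholders, and the CLI flag names (--input_path, --test_path) are assumptions for illustration, so check the LEIP SDK User Guide for the exact interface of your SDK version.

    import subprocess

    # Hypothetical sketch: invoke LEIP Evaluate on an optimized model.
    # Flag names below are illustrative assumptions, not the confirmed CLI.
    model_path = "workspace/optimized_model"      # first input: model folder or file
    testset_path = "workspace/testset/index.txt"  # second input: testset file

    result = subprocess.run(
        [
            "leip", "evaluate",
            "--input_path", model_path,
            "--test_path", testset_path,
        ],
        capture_output=True,
        text=True,
        check=True,
    )

    # The report includes the accuracy metric and inferences per second.
    print(result.stdout)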

LEIP Evaluate can run entirely within the SDK container or perform evaluation using remote inference. Evaluation within the SDK is limited to the hardware architecture of the host computer, whereas remote inference lets you evaluate against the hardware architecture of the intended remote edge device. For remote inference, a process called the Latent AI Object Runner (LOR) is installed on the device under test. The compiled and optimized model is passed directly to LEIP Evaluate when the function is executed, and LEIP Evaluate communicates with the LOR to copy over the model, send samples for inference, and receive the results for post-processing and scoring.
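
A remote evaluation might then point LEIP Evaluate at the LOR running on the target device, along the lines of the sketch below. The device address, port, and option names (--host, --port) are assumptions for illustration only; the actual remote-inference options and LOR configuration are documented in the LEIP SDK User Guide.

    import subprocess

    # Hypothetical sketch of remote evaluation against an edge device
    # running the Latent AI Object Runner (LOR). Option names are
    # illustrative assumptions; consult the LEIP SDK User Guide.
    device_ip = "192.168.1.42"  # example address of the device under test

    subprocess.run(
        [
            "leip", "evaluate",
            "--input_path", "workspace/optimized_model",
            "--test_path", "workspace/testset/index.txt",
            "--host", device_ip,   # LOR endpoint on the remote edge device
            "--port", "50051",     # example port; depends on your LOR setup
        ],
        check=True,
    )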

LEIP is an end-to-end SDK for training, optimizing, and deploying secure neural network (NN) runtimes within a single workflow. Optimization makes those runtimes more efficient and robust, and they can also be integrated with security protocols that protect against reverse engineering.

Now that we’ve introduced LEIP Optimize and LEIP Evaluate, the next module in our series is LEIP Pipeline, which lets you chain tasks such as optimization and evaluation into a single sequential workflow.

For documentation, including the LEIP SDK User Guide and LEIP Recipes, visit our Resource Center. For more information, contact us at info@latentai.com.
