At Enabled Intelligence (EI), we excel at taking raw, messy data and transforming it into actionable, decision-ready intelligence for government and commercial clients.
EI is often tasked with helping clients improve the reliability and accuracy of their AI models. Core to completing these tasks is developing a labeled training data set at near-ground-truth accuracy. Investing in data curation and annotation up front ultimately reduces the total cost of AI model development by improving model performance and user trust and by reducing the amount of rework required from the data science team. The following case study explains how we do this – and the value we can create for our customers.
Over a year ago, senior program managers from a US intelligence agency asked Enabled Intelligence to troubleshoot an underperforming object-detection model.
Their data science team was responsible for developing an AI model that could analyze raw video, and still images extracted from video, to identify objects of interest to their staff in the field.
However, after six years and millions of dollars spent on the project, the AI model was still unable to reliably identify these objects – a failure that could have disastrous implications for the people on the ground who needed accurate intelligence. Enabled Intelligence was asked to help solve the problem.
The Results
Within five months, and at a cost of about 10% of what the Intelligence Community (IC) customer had spent to date, Enabled Intelligence transformed raw, unstructured data into a fully deployed rare-object AI detection model that makes highly accurate and reliable predictions. EI’s model was so effective that the IC customer expanded EI’s scope of work to include additional rare objects. Today, EI continues to develop and expand the capabilities of this solution for the IC customer, and the solution now supports intelligence-gathering efforts at scores of customer field operation sites globally.
The outcome was fantastic for the customer, who has now come to see EI as a key provider of AI solutions.
How did we do this?
Quality Data “In” = Accurate AI Analysis “Out”
Most people understand that quality data in equals accurate AI analysis out.
However, customers often fail to grasp the magnitude of error that can result from even small changes in data annotation. All AI models rely on annotated “training data” so that they can “learn” to make reliable predictions in real-world operational environments. The quality and precision of data curation and annotation are important for all AI models, whether one is talking about a Large Language Model (LLM) such as ChatGPT, a computer vision model used for semi-autonomous driving, or a model used to identify military action in satellite images.
But in military, intelligence, or other “rare” object or event use cases, the relative quality and precision of each data label carries even more weight. Unlike the training data sets behind LLMs or other commercial AI models, those for national security AI are much smaller. For example, if you are building a model to identify stop signs, there are millions of publicly available images of stop signs to use, so making a few mistakes or being less precise on individual annotations is less costly, because you have a huge sample size to draw from.
This is NOT the case if you are a defense or intelligence customer working with rare objects (e.g., North Korean rocket launchers, Chinese fishing craft, land mines in Ukraine) or with unusual data types such as synthetic aperture radar (SAR) or hyperspectral imagery.
In these scenarios, each piece of data carries a greater relative weight in training the model and must be annotated precisely and accurately. And because that data is “rare”, it takes highly trained experts and professional labelers to prepare the training sets.
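To make the arithmetic behind this point concrete, here is a minimal sketch in Python. The data set sizes and error counts are purely illustrative assumptions, not figures from the case study, and error_share is a hypothetical helper name. It simply compares how much of a training set is affected when the same number of labeling mistakes lands in a large data set versus a small one.

```python
def error_share(num_bad_labels: int, dataset_size: int) -> float:
    """Return the fraction of a training set that carries a wrong label."""
    if dataset_size <= 0:
        raise ValueError("dataset_size must be positive")
    return num_bad_labels / dataset_size


# Illustrative numbers only (assumptions, not from the case study):
# the same 50 labeling mistakes in a very large and a very small data set.
common_object = error_share(num_bad_labels=50, dataset_size=1_000_000)
rare_object = error_share(num_bad_labels=50, dataset_size=500)

print(f"Common object (e.g., stop signs): {common_object:.4%} of labels are wrong")
print(f"Rare object: {rare_object:.4%} of labels are wrong")
print(f"The rare-object set is affected {rare_object / common_object:,.0f} times more")
```

Under these assumed numbers, the same 50 mistakes touch a tiny fraction of the large data set but one in ten examples in the small one, which is why each individual label matters so much more for rare objects.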
Enabled Intelligence’s Data to Decision Engine
In the case of the intelligence agency customer, EI’s data science team attributed the poor model performance to sub-par data labeling. Starting from scratch, EI assessed the customer’s raw data, used in-house subject matter experts to train full-time staff in accurate analysis and data annotation, and used an intensive quality assurance process to ensure greater than 95% accuracy across all data labels. By investing in quality upfront with expert-in-the-loop data annotation, EI developed a new and better-performing model within months, at a fraction of the cost.
EI’s team of expert data annotators did such an outstanding job of annotating training data the FIRST TIME that the AI model was able to learn from real-world data far more efficiently than before. A smaller quantity of higher-quality labeled data also allowed EI’s data science team to develop a reliable AI model in days, using less powerful (and less expensive) compute and less data overall.
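As an illustration of how a label accuracy target like the 95% figure above could be checked, here is a minimal sketch in Python. It is not a description of EI’s actual quality assurance process. It assumes that a reviewer re-checks a random sample of labels, and it uses a standard Wilson score interval to ask whether the true accuracy is plausibly above the target. The function names, sample sizes, and counts are all hypothetical.

```python
import math


def wilson_lower_bound(correct: int, sampled: int, z: float = 1.96) -> float:
    """Lower end of the Wilson score interval for a proportion.

    z = 1.96 corresponds to roughly 95% confidence.
    """
    if sampled <= 0:
        raise ValueError("sampled must be positive")
    p = correct / sampled
    denom = 1 + z**2 / sampled
    centre = p + z**2 / (2 * sampled)
    margin = z * math.sqrt(p * (1 - p) / sampled + z**2 / (4 * sampled**2))
    return (centre - margin) / denom


def passes_audit(correct: int, sampled: int, target: float = 0.95) -> bool:
    """True if even the pessimistic accuracy estimate clears the target."""
    return wilson_lower_bound(correct, sampled) > target


# Hypothetical audits of 500 re-checked labels each (assumed numbers).
for correct in (490, 480):
    bound = wilson_lower_bound(correct, 500)
    verdict = "pass" if passes_audit(correct, 500) else "needs more review"
    print(f"{correct}/500 correct, lower bound {bound:.3f}: {verdict}")
```

In this sketch, an audit with 480 of 500 labels correct has an observed accuracy of 96%, yet it does not pass, because a sample of that size cannot rule out a true accuracy below 95%. Auditing against the lower bound rather than the raw percentage is one conservative design choice; other acceptance rules are equally possible.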
The Bottom Line: Investing in High Quality AI Training Data is Critical
Too often, organizations working to create and use emerging AI technologies spend significant resources on high-end compute and novel machine learning techniques, but undervalue the development of training data. The best way to lower overall AI development costs and improve AI performance is to train models on accurately labeled, representative data – which can save millions in the longer term and more quickly transform data into reliable and accurate decisions.