In just a few months, the popularity of Large Language Models (LLMs) such as OpenAI’s ChatGPT and Google Gemini has exploded. LLMs have evolved from an online curiosity into an essential tool for many businesses. LLM-powered chatbots are now indispensable for online retailers, travel services, and consumer financial services, while a growing number of businesses rely on LLMs to generate routine corporate communications, summarize reports, or perform basic web-based research. In a recent survey of large enterprises, 67% reported they are either testing or deploying LLMs in their workflows, leveraging their capabilities to optimize operations and drive innovation. We expect the rapid adoption of LLM-based technologies to continue, but we see potential storm clouds ahead if LLMs cannot prove reliable for focused, specific business use cases.
But we also see a number of companies still trying to figure out how to apply LLMs to their specific businesses. Tech giants such as IBM are learning how to use fine-tuned, smaller LLM “modules” to address their clients’ needs more efficiently and reliably. Meanwhile, government users, including those in the U.S. Defense and Intelligence Community, are also assessing LLMs. We are proponents of LLMs, provided they have been fine-tuned to meet the unique domain expertise and security requirements of national security users. In this blog, we outline some of the drawbacks of using “off-the-shelf” LLMs and explain how fine-tuning can reduce or eliminate these problems.
General Purpose and Off-The-Shelf
One reason for the success of LLMs to date has been their versatility and ease of use. The most popular and well-known LLMs, also known as foundation models, are incredibly flexible: they can write software code, solve math problems, or produce a history of the Holy Roman Empire, all from a single prompt. They are built for general use by anyone, anywhere.
But many enterprise and government customers are learning the hard way that general-purpose, off-the-shelf LLMs do not work equally well for all business applications, much less for the mission-critical requirements of national security users in the U.S. Defense and Intelligence Community. It is one thing to generate a short history report on the Civil War or summarize a Teams call; it is quite another to provide a reliable assessment of Houthi forces currently deployed in Yemen.
In short, some organizations have invested significant time and money in LLMs, only to find they have purchased a solution that generates hallucinations (wildly incorrect answers) or that simply cannot handle their specific data or security requirements.
Fine-tuning off-the-shelf LLMs on an organization’s datasets, business documentation, and technology- and industry-specific publications reduces contextual errors and hallucinations.
The Need for Relevant Domain Expertise
Why do off-the-shelf LLMs struggle with these requirements? One of their biggest drawbacks, especially in mission-critical applications, is limited domain-specific knowledge. Because LLMs are trained largely on publicly available internet data, they can struggle with specialized terminology and industry jargon, for example interpreting “BP” as “British Petroleum” when “blood pressure” was intended. Furthermore, general-purpose LLMs get “feedback” on which answers are “better” from everyone, everywhere. This works when words mean the same thing in all contexts, but it can be disastrous when words such as “fires”, “effects”, or “autonomy” carry meanings in a military context that differ from their everyday usage.
We have found that LLM fine-tuning can dramatically reduce these problems for our government customers. In one recent project, we fine-tuned an off-the-shelf LLM on an organization’s internal datasets, business documentation, reports, employee manuals, and technology-specific publications, significantly reducing contextual errors and hallucinations while increasing the relevancy and auditability of the LLM’s outputs.
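As a rough illustration of what such a run can look like, below is a minimal fine-tuning sketch using the open-source Hugging Face transformers, datasets, and peft libraries. The base model name, the internal_docs.jsonl corpus, and all hyperparameters are placeholder assumptions for illustration, not a description of any specific customer project.

# Sketch: LoRA fine-tuning of a small open model on internal documents.
# Model name, data path, and hyperparameters are illustrative placeholders.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

base = "meta-llama/Llama-2-7b-hf"  # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base)

# Wrap the base model with low-rank adapters; only a small fraction
# of the weights are actually trained.
model = get_peft_model(model, LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM"))

# "internal_docs.jsonl" is a hypothetical corpus of reports and manuals,
# one {"text": ...} record per line.
data = load_dataset("json", data_files="internal_docs.jsonl")["train"]
data = data.map(lambda x: tokenizer(x["text"], truncation=True, max_length=1024),
                remove_columns=data.column_names)

Trainer(
    model=model,
    args=TrainingArguments(output_dir="tuned-model", num_train_epochs=3,
                           per_device_train_batch_size=4, learning_rate=2e-4),
    train_dataset=data,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
).train()
model.save_pretrained("tuned-model")  # saves only the adapter weights

Because LoRA trains only a small set of adapter weights rather than the full model, domain knowledge can be added on modest hardware, and the saved adapter is compact enough to distribute separately from the base model.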
Improving Data Security, Lowering Compute Costs
Another concern with off-the-shelf LLMs is data security. Many of these models are hosted on shared, cloud-based infrastructure, which poses significant risks to data privacy and regulatory compliance, and, for Defense and IC customers, to operational security. Organizations can reduce these risks by developing proprietary LLMs or by fine-tuning custom models that can be deployed on private, secure servers.
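To make the deployment point concrete, here is a minimal sketch of fully on-premises inference, reusing the hypothetical adapter from the previous example; the prompt is likewise hypothetical. Nothing in this flow calls an external API, so prompts, documents, and outputs never leave the organization’s own servers.

# Sketch: serving the fine-tuned model entirely on-premises, assuming the
# adapter from the previous example was saved to "tuned-model".
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "meta-llama/Llama-2-7b-hf"  # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(base)
model = PeftModel.from_pretrained(
    AutoModelForCausalLM.from_pretrained(base), "tuned-model")

prompt = "Summarize the attached readiness report:"  # hypothetical query
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(output[0], skip_special_tokens=True))
# No data leaves the host: weights, prompt, and output all stay local.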
General-purpose LLMs can cost billions of dollars to develop and hundreds of millions of dollars to run. Part of the reason is sheer scale: these foundation models carry enormous parameter counts, and every user prompt must be processed through all of those weights, so both storage and compute costs are very high.
But fine-tuning can lower compute costs and make it easier to deploy AI at the edge. By adapting a pre-trained LLM to a specific task or domain with a smaller dataset, fine-tuning reduces the amount of context that must be sent with each prompt, minimizing API calls and processing tokens and thus lowering costs. It also allows the use of smaller, more specialized models that are more cost-effective for specific tasks. This is especially important for Defense and Intelligence Community users who want to get AI technology into the hands of operational users in the field.
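A back-of-envelope sketch of this cost effect follows, with purely illustrative token counts and prices that do not reflect any particular vendor: a fine-tuned model carries domain knowledge in its weights, so reference documents need not be stuffed into every prompt, and per-query token volume drops sharply.

# Back-of-envelope sketch of why a smaller fine-tuned model costs less per
# query. All prices and token counts are illustrative assumptions.
QUERIES_PER_DAY = 10_000

# General-purpose model: long context stuffed with reference documents.
general = {"tokens_per_query": 6_000, "usd_per_1k_tokens": 0.01}
# Fine-tuned model: domain knowledge is in the weights, so prompts stay short.
tuned = {"tokens_per_query": 500, "usd_per_1k_tokens": 0.002}

for name, m in [("general-purpose", general), ("fine-tuned", tuned)]:
    daily = QUERIES_PER_DAY * m["tokens_per_query"] / 1_000 * m["usd_per_1k_tokens"]
    print(f"{name}: ${daily:,.2f}/day")
# general-purpose: $600.00/day
# fine-tuned: $10.00/day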
Conclusion
While general-purpose, foundation LLMs can work “off-the-shelf” for some business applications, they are not designed to support mission-critical national security applications that demand deep domain expertise and strong data security. By fine-tuning an LLM on data relevant to Defense and Intelligence Community users, these models can become more powerful, more secure, and more cost-effective to operate.