ChatGPT, Bing, Bard… From relative obscurity, these platforms have burst into the public consciousness over the last few months.
Each is a specific product developed by different companies, but they are all built on top of the same class of technologies called Large Language Models (LLMs).
What is an LLM? And what makes it “large”? An LLM is a neural network model architecture based on a specific component called a “transformer.” The transformer gives the LLM the ability to identify how words relate to each other in their context and produce unique answers to questions, rather than “looking up” responses. A vast number of “neurons” make up this neural network, and the connections between these neurons are called “parameters.” These parameters define the strength of the signal between neurons.
To put the size of LLMs into perspective, one of the models behind ChatGPT has 175 billion parameters. Clearly, these models can be incredibly large, and can therefore offer impressive performance capabilities for the enterprise. The flip side is that they can also become very complex and costly.
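To get a rough sense of that scale, here is a back-of-the-envelope sketch (an estimate, not a benchmark) of the memory needed just to hold a model's weights, assuming half-precision storage:

```python
def model_memory_gb(num_parameters: int, bytes_per_parameter: int = 2) -> float:
    """Rough memory needed to hold a model's weights in GB.

    Assumes half-precision (2 bytes per parameter); serving a model in
    practice needs more, for activations, caches, and overhead.
    """
    return num_parameters * bytes_per_parameter / 1e9

# The 175-billion-parameter model mentioned above:
print(f"{model_memory_gb(175_000_000_000):.0f} GB")  # weights alone

# A smaller open-source model of, say, 7 billion parameters:
print(f"{model_memory_gb(7_000_000_000):.0f} GB")
```

Hundreds of gigabytes just for the weights is what makes the largest models costly to host, and why the build-vs-buy question below matters.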
Therefore, considering the size and capability of an LLM is key to deciding how best to use it. So, what are the options?
Using LLMs in the Enterprise
There are two ways you can utilise an LLM in the enterprise beyond the simple web interface.
1. You can make an API call to a model provided as-a-service
These services are generally offered by companies like OpenAI, Amazon Web Services and Microsoft Azure, which provide public APIs that you can connect to your software. This approach has several advantages, including:
Low barrier to entry — calling an API is a straightforward task that a junior developer can do in a matter of minutes
Higher sophistication — you can leverage some of the largest and most sophisticated models available, providing more accurate responses on a wide range of topics
Speed — generally these models provide quick responses, so you can use them in real-time
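To illustrate how low the barrier to entry is, here is a minimal sketch of assembling a chat-style request, assuming an OpenAI-style completions endpoint (the URL and model name are illustrative; consult your provider's documentation for the actual interface):

```python
import json

API_URL = "https://api.openai.com/v1/chat/completions"  # illustrative endpoint

def build_request(question: str, model: str = "gpt-3.5-turbo") -> dict:
    """Assemble the JSON body for a chat-style completion request."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": question}],
    }

payload = build_request("Summarise our returns policy in two sentences.")
print(json.dumps(payload, indent=2))

# Sending it is a single authenticated HTTP call, e.g. with `requests`:
#   requests.post(API_URL, json=payload,
#                 headers={"Authorization": f"Bearer {api_key}"})
```

The entire integration is a JSON body and one HTTP POST, which is why a junior developer can wire it up in minutes.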
But while convenient and powerful, these public models are not suited to certain enterprise applications. Because the models are public, data from a query may be retained and used for further development of the model. Enterprises therefore need to verify that the architecture respects their data residency and privacy obligations for their use case.
Additionally, there’s potential to accrue high costs, as most public APIs have a fee structure that charges according to the number of queries and the length of processed text. You can usually get cost estimates and use smaller/cheaper models for narrower tasks. Finally, though rare, the provider of an API can choose to stop the service at any moment; it’s risky to depend on a pipeline whose flow you don’t control.
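Those usage-based fees are easy to model up front. A minimal sketch, using hypothetical per-token prices (check your provider's actual rate card):

```python
def monthly_api_cost(queries_per_month: int,
                     avg_tokens_per_query: int,
                     price_per_1k_tokens: float) -> float:
    """Estimate monthly spend when pricing is per 1,000 processed tokens."""
    total_tokens = queries_per_month * avg_tokens_per_query
    return total_tokens / 1000 * price_per_1k_tokens

# Hypothetical rates: a large model at $0.03 per 1K tokens
# versus a smaller, narrower one at $0.002 per 1K tokens.
large = monthly_api_cost(100_000, 500, 0.03)
small = monthly_api_cost(100_000, 500, 0.002)
print(f"large model: ${large:,.0f}/month, smaller model: ${small:,.0f}/month")
```

Even with made-up rates, the shape of the result is the point: at high query volumes the price gap between a very large model and a smaller one compounds quickly, which is why matching model size to the task matters.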
2. You can download and run an open-source model in an environment you manage
Given some of the limitations of harnessing a public model via an API, companies may be better off creating and running an open-source model themselves.
There’s a whole range of open-source models available, each characterised by strengths and weaknesses that will be more or less suited to a company’s needs. A smaller model — while limited in application — can often deliver the desired performance on a specific use case at a far lower cost than a very large model. Moreover, by running and maintaining open-source models themselves, organisations are not dependent on a third-party API service.
But this approach might not be every organisation’s cup of tea. First, it can involve a high level of complexity: setting up and maintaining your own LLM requires data science and engineering expertise beyond what simpler models demand. Companies need to evaluate honestly whether they have the expertise and time to build and maintain such a model over the long term.
Second, there is the issue of narrower performance. Open-source community models are smaller and more focused in their application, whereas the huge models on offer via public APIs can cover an astonishing breadth and variety of topics.
Choosing an Approach
Taking into account the tradeoffs of each approach, you might ask: does one outweigh the other? In simple terms, no. There is no one-size-fits-all approach that works enterprise-wide. Even within a single company, the choice of model and architecture should be made on a use-case-by-use-case basis.
Both options allow you to choose from smaller and larger models, with tradeoffs in terms of the breadth of their potential applications, the sophistication of the language generated, and the cost and complexity of using the model. For many enterprises, either method may be suited for different use cases at different times, depending on fluctuating budgets, capacity, and resources.
The companies that will have the greatest success using LLMs are those that can take an agile approach that allows them to opt for the right model for any given application. Innovation with LLMs is rapidly advancing. Companies that can be flexible and adapt to these changes will reap the greatest benefits from them.
About the Author
Kurt Muehmel is Everyday AI Strategic Director at Dataiku. Dataiku is the platform for Everyday AI, systemizing the use of data for exceptional business results. Organizations that use Dataiku elevate their people (whether technical and working in code or on the business side and low- or no-code) to extraordinary, arming them with the ability to make better day-to-day decisions with data. More than 500 companies worldwide use Dataiku to systemize their use of data, analytics, and AI, driving diverse use cases from fraud detection to customer churn prevention, predictive maintenance to supply chain optimization, and everything in between.