LLMs, a brief history and their use cases

John Jacob, Shantanu Nair · 8 min read
What is a Large Language Model (LLM)?
An LLM, or Large Language Model, is a machine learning model trained on a massive text corpus; these models have shown exceptional capability in language understanding.
A Brief History of LLMs
The concept of an LLM, or Large Language Model, can be traced back to 2017, when Google researchers released the paper introducing the original Transformer model. A Transformer is a neural network that identifies patterns in sequential data, and it became hugely popular after beating several models custom-tuned for individual NLP tasks. In 2018, Google released BERT (Bidirectional Encoder Representations from Transformers), whose large variant has 340 million parameters, along with an open-source implementation and several pretrained models. The popularity BERT gathered pushed large companies to create their own powerful Transformer architectures and training techniques, which eventually led to the growth of LLMs.
In 2020, OpenAI launched GPT-3, its third-generation Generative Pre-trained Transformer, trained on a dataset of hundreds of billions of words, with 175 billion model parameters. GPT-3 is a predictive language model that can handle a wide range of use cases and has been applied across industries, from gaming and software development to creative image generation and human-like writing.
In 2021, Google released LaMDA with 137 billion parameters. Microsoft and NVIDIA introduced a Transformer-based model, the Megatron-Turing Natural Language Generation model (MT-NLG), said to be the largest generative language model to date, with 530 billion parameters. OpenAI's GPT-4 has been rumored to have as many as 100 trillion parameters, though OpenAI has not confirmed any such figure.
Over time, large language models (LLMs) have evolved to serve use cases ranging from generative art with OpenAI’s DALL-E and empathetic chatbots with Meta’s BlenderBot, to teaching robots how to operate in the real world.
LLMs' advanced language modeling skills have even transferred over into speech-related abilities. Companies like Cohere, OpenAI, Forefront.ai, and AI21 Labs are delivering innovations at unprecedented speed. Their advanced Natural Language Understanding (NLU) can provide human-like insights from unstructured data and perform tasks such as text generation, summarization, and question answering. Some use a clever technique known as Prompt Engineering to give the models context about the task at hand in a natural, no-code manner.
Large Language Model (LLM) Functionality
LLMs are trained on vast language datasets and can solve many problems associated with language. The problem categories that LLMs can address are:
- Natural Language Generation
- Language Translation
- Conversation Management
These categories can help users identify the language technology their application requires.
Natural Language Generation
Natural Language Generation (NLG) uses Artificial Intelligence (AI) to generate text from structured data. Organizations such as Cohere, OpenAI, AI21 Labs, EleutherAI, GooseAI, and the BigScience project (with its open BLOOM model) offer NLG models. Two strategies for working with these models are prompt engineering and fine-tuning, both described below:
- Prompt Engineering
One of the main use cases of LLMs is generating text, as they are pre-trained on a large collection of text data. The text used to prompt the model consists of a collection of short examples of the task at hand. The model reads the prompt, identifies the pattern in it, and proceeds to generate text that continues the pattern. This popular technique is called Prompt Engineering.
The ability of a model to perform a given task, depending on the number of examples provided in the prompt, can be categorized into several operating modes.
- Few-shot learning is the concept of classifying new data by feeding the pretrained model with a few examples of labeled data of each class to nudge prompt completion in the right direction.
- One-shot learning is the concept of feeding the model a single example per class.
- Zero-shot learning is the concept of classifying unseen classes without any training examples, relying on the model's pretraining to perform well on unseen tasks.
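As a minimal sketch of the few-shot idea, the snippet below assembles a prompt from labeled examples and leaves the final line for the model to complete. The reviews, labels, and helper name are invented for illustration; the resulting string would be sent to whichever completion API you use.

```python
# Invented labeled examples for a sentiment task.
EXAMPLES = [
    ("The battery lasts all day, love it.", "positive"),
    ("Screen cracked within a week.", "negative"),
]

def build_few_shot_prompt(examples, query):
    """Concatenate labeled examples so the model can infer the task
    and complete the final, unlabeled line."""
    lines = ["Classify the sentiment of each review."]
    for text, label in examples:
        lines.append(f"Review: {text}\nSentiment: {label}")
    # The query is left without a label; the model fills it in.
    lines.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(lines)

print(build_few_shot_prompt(EXAMPLES, "Shipping was fast and painless."))
```

Dropping the examples entirely (keeping only the task description and the query) turns the same prompt into a zero-shot one.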
- Text Classification
- Rewriting Text
- Semantic Search and Sentence Similarity
- Text Extraction
- Generate Endpoint (Summarization and Paraphrasing)
Text Classification

Text Classification is the process of assigning a predefined label to a given piece of text. Since the classes or clusters are known in advance, text classification can be considered supervised learning. For example: “Classify this review as positive or negative.” These are the ways to prompt data for a task:
- Task description prompt: give the large language model an adequate description of the task to perform. For example: “Generate a summary.”
- Examples in the prompt: give the large language model multiple examples of the task being performed correctly.
- Varying numbers of examples in the prompt: try different numbers of examples to find how many the task needs.
Text Classification is one of the use cases in language processing, where you can train classifiers and automate language-oriented tasks based on your requirement.
Rewriting Text

Rewriting text can be done with prompt engineering, where the task is to rewrite or rephrase a piece of text. An interesting use of this is correcting transcriptions: you can prompt the model with task context, along with a few examples of incorrect transcriptions and their corrections. Other places rewriting can be used include making text corrections, paraphrasing or rephrasing text, redacting personal information, etc.
Semantic Search and Sentence Similarity
Using NLP (Natural Language Processing), Semantic Search is the process of understanding the context of a search query so as to return the correct results. The idea is that all your sentences or documents are pushed into a vector space as embeddings; at search time, the query embedding is projected into the same space, and the closest embeddings, those with high semantic overlap, are returned.
Sentence Similarity is the process of identifying how similar two texts are. The idea is to convert the input texts into embeddings and calculate a similarity score between them, which can then be used for ranking and search.
A few use cases are:
- Identifying similar text in documents, or ranking documents by relevance
- Retrieving relevant information from a document
- Intelligent context aware search
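A minimal sketch of the vector-space idea, using hard-coded toy embeddings; a real system would obtain the vectors from an embedding model or API, and the document names and numbers here are invented:

```python
import math

def cosine_similarity(a, b):
    """Similarity between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def search(query_vec, doc_vecs):
    """Rank documents by similarity of their embedding to the query's."""
    scored = [(cosine_similarity(query_vec, v), doc) for doc, v in doc_vecs.items()]
    return sorted(scored, reverse=True)

# Toy 3-dimensional "embeddings"; real ones have hundreds of dimensions.
docs = {
    "refund policy": [0.9, 0.1, 0.0],
    "shipping times": [0.1, 0.8, 0.2],
}
query = [0.85, 0.15, 0.05]  # imagined embedding of "how do I get my money back"
print(search(query, docs)[0][1])  # → refund policy
```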
Text Extraction

Text Extraction is the process of extracting key information from a given piece of text. Using embeddings, you can process and extract insights from a given text. You may prepare the prompt by providing some context about the task and a few examples of extracted text.
A few use cases are:
- Extraction of key terms, phrases, and named entities
- Generating tags for blogs
- Identifying Personally Identifiable Information (PII)
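As a simple illustration of the extraction idea, the sketch below pulls two PII patterns out of raw text with regular expressions. An LLM-based extractor would instead be prompted with examples of text and the fields to pull out, but the shape is the same: raw text in, key items out. The patterns and sample text are invented.

```python
import re

# Two deliberately simple PII patterns: emails and US-style phone numbers.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "phone": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}

def extract_pii(text):
    """Return every match for each named pattern found in the text."""
    return {name: pat.findall(text) for name, pat in PATTERNS.items()}

sample = "Contact jane.doe@example.com or call 555-867-5309."
print(extract_pii(sample))
```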
Text Generation

Text generation produces human-like text output when given a prompt input. Models are often built to learn the stylistic and artistic demeanor of a given prompt and build on it to generate creative, compelling stories or conversations.
Summarization and Paraphrasing
LLMs can summarize and paraphrase the text of a paragraph, an article, or even an entire document using prompt engineering. Feed the model a few examples of a text document alongside its summary, and the LLM will start summarizing new text. With clever prompt inputs, certain language models can generate summaries or paraphrases for a wide variety of language content.
A few examples where LLM summarization can be used:
- Meeting Transcription
- Published Papers
- Customer Support Chats
- Phone calls
Fine-tuning

Fine-tuning is the process of customizing a pre-trained LLM (Large Language Model) to perform better at a particular task. Fine-tuning lets you create a custom model trained on your dataset, which can then perform better at a unique task and with increased knowledge of a particular domain. Fine-tuning builds on a pre-trained baseline model and creates a custom, fine-tuned model with better output.
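A minimal sketch of preparing fine-tuning data, assuming the common JSON-lines prompt/completion upload format; field names vary by provider, and the support-ticket records below are invented:

```python
import json

# Invented training examples for a support-ticket classifier.
records = [
    {"prompt": "App crashes on launch ->", "completion": " bug"},
    {"prompt": "Please add dark mode ->", "completion": " feature-request"},
]

def to_jsonl(rows):
    """Serialize one JSON object per line, as fine-tuning uploads commonly expect."""
    return "\n".join(json.dumps(r) for r in rows)

print(to_jsonl(records))
```

The resulting file would be uploaded to the provider's fine-tuning endpoint, which trains a private copy of the base model on these pairs.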
Question Answering

Question-Answering (QA) models are ML models that, given some context (or none at all), can generate answers to the questions asked. Answers are based on the dataset the model was trained on for a specific problem. The models have a semantic understanding of both the context and the question. Tasks such as answering frequently asked questions and customer queries can be automated with QA.
Types of QA variants:
- Extractive QA: This QA model extracts the answer from a given context.
- Open Generative QA: This QA model generates text based only on the given context.
- Closed Generative QA: This QA model generates an answer without any provided context, relying on knowledge stored in its parameters.
There are two QA domain types: closed and open. Closed-domain models are restricted to a specific domain, while open-domain models are not restricted to any particular domain.
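To make the extractive idea concrete, here is a deliberately naive sketch that picks the context sentence with the most word overlap with the question. Real extractive QA models predict an exact answer span, but the contract is the same: context in, answer text out. The context and question are invented.

```python
def extractive_answer(question, context):
    """Return the context sentence sharing the most words with the question."""
    q_words = set(question.lower().split())
    sentences = [s.strip() for s in context.split(".") if s.strip()]
    return max(sentences, key=lambda s: len(q_words & set(s.lower().split())))

ctx = ("The Transformer was introduced in 2017. "
       "BERT was released by Google in 2018. "
       "GPT-3 has 175 billion parameters.")
print(extractive_answer("When was BERT released", ctx))
# → BERT was released by Google in 2018
```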
Language Translation

In recent years, there has been rapid improvement in Natural Language Translation (NLT) with the progress of advanced pre-trained language models. With a pre-trained model, you can build language translation with minimal additional training and expect extraordinary results. Earlier, Long Short-Term Memory (LSTM) networks were common, but since the rise of transformer-based neural network models, the skill and accuracy of machine-generated translation has risen markedly.
Conversation Management

With advancements in conversational AI and chatbots, organizations benefit from the ease of doing business with their customers, helping them have better conversations and provide better, more timely support. Conversation Management uses technologies like NLP (Natural Language Processing) and NLU (Natural Language Understanding) to understand sophisticated conversations, infer their intent, correct errors in them, and respond accordingly.
In recent times, LLMs have given rise to the next generation of chatbots. Large companies like Meta have released conversational AI prototypes such as ‘BlenderBot 3’, in which the bot can learn from live conversations and is trained to have empathetic conversations with users. LLM-based “open-language” chatbots are the next step for conversational AI that companies are looking to commercialize.
Experience LLMs via these Platforms:
While some of these platforms give you direct access to interact with a large language model (some via prompt building, others doing specialized prompt building for you under the hood), the rest use LLMs to provide a specific service or product, without you even being aware that an LLM is behind it all.
Warp - Shell command generation
Viable, Enterpret, Cohere, & Anecdote - organize and summarize product feedback from users (e.g. support tickets, surveys, analytics) into actionable insights for future product development
Monterey - Generates Product Requirement Documents
Oogway - Personal decision making
GODEL(Microsoft), DialoGPT (Microsoft), Blender Bot (Meta AI) - For Conversational AI
LLMs have risen to become one of the most powerful tools we can leverage for text analysis. The next step is to seamlessly integrate these Large Language Models (LLMs) into your own application using a simple API, with access to all providers. More on this soon.
Check out our other blogs for more interesting content.