April 14, 2023

What is required to apply Large Language Models successfully in the patent industry

Sakari Arvela and Jan C. Scholtes focus their attention on Large Language Models, the limitations of GPT, and how to improve today's models.

Why is human language so hard for computer programs to understand?

Human language is difficult for computer programs to understand because it is inherently complex, ambiguous, and context-dependent. Unlike computer languages, which are based on strict rules and syntax, human language is nuanced and can vary greatly based on the speaker, the situation, and the cultural context. As a result, building computer programs that can accurately understand and interpret human language is complex and has been an ongoing challenge for artificial intelligence researchers. This is exactly why it took us so long to create reliable computer programs that can deal with human language.

In addition, for many different reasons, early language models took shortcuts, and none of them addressed all of these linguistic challenges. That changed when Google introduced the Transformer model in 2017 in the ground-breaking paper “Attention is all you need”. A full encoder-decoder model, built from multiple layers of self-attention, proved capable of handling almost all of the linguistic challenges. It soon outperformed all other models on various linguistic tasks such as translation, question answering, classification, and text analytics.
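To make the self-attention idea a bit more concrete, here is a minimal sketch of the scaled dot-product attention operation that Transformer layers are built from (toy dimensions, random weights, and plain numpy; a simplification of the real multi-head, multi-layer mechanism):

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention: every token attends to every other token."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])          # relevance of token j for token i
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the sequence
    return weights @ V                               # each output mixes information from all tokens

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                          # 4 toy "tokens" with 8-dimensional embeddings
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)           # (4, 8): one contextualised vector per token
```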

The zoo of transformer models: BERT and GPT

As full encoder-decoder models such as the T5 model are very large and hard to train due to a lack of aligned training data, a variety of cut-down models (also called a zoo of transformer models) have been created. The two best-known models are BERT and GPT.

  • BERT is a pre-trained (encoder-only) transformer-based neural network model designed for solving various NLP tasks such as Part-of-Speech tagging, Named Entity Recognition, or sentiment analysis. BERT is commonly used for classification tasks.
  • GPT, on the other hand, is a language model that is specifically designed for text generation tasks. It uses a decoder-only transformer architecture. GPT is trained on large amounts of text data and can generate coherent, human-like text in response to a prompt. GPT is commonly used for tasks such as text completion and text generation.
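To make the difference tangible, here is a minimal sketch using the open-source Hugging Face transformers library: an encoder-only (BERT-family) model classifies a sentence, while a decoder-only (GPT-family) model continues a prompt. The model names are just commonly used public checkpoints, not the specific models discussed in this post.

```python
from transformers import pipeline

# Encoder-only (BERT-family): understand a given text, here sentiment classification.
classifier = pipeline("sentiment-analysis")  # defaults to a DistilBERT model fine-tuned on SST-2
print(classifier("The cited prior art does not disclose the claimed feature."))

# Decoder-only (GPT-family): generate new text that continues the prompt.
generator = pipeline("text-generation", model="gpt2")
print(generator("A vehicle platooning system comprising", max_new_tokens=30))
```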

The much-debated ChatGPT is an extension of GPT. It is based on GPT version 3.5 and has been fine-tuned for human-computer dialogue using reinforcement learning. In addition, it has several additional mechanisms in place that aim at helping it stick to human ethical values. Just a month ago, an even more powerful GPT-4, which has some of these mechanisms incorporated directly into the core model, was released. Despite the early glitches and biases in the models, these capabilities are major achievements!

The core reason the GPT family is so good is that transformers are the first computational models to take almost all linguistic phenomena seriously. Building on Google’s Transformer, OpenAI (with the help of Microsoft) has shaken up the world by introducing models that generate language that can no longer be distinguished from human language.

GPT’s limitations

But not even GPT-4 is the all-knowing Artificial General Intelligence some people think it is. This is mainly due to the decoder-only architecture. ChatGPT is great for “chatting”, and to some extent for reasoning, but one cannot control its factuality. This is due to the lack of an encoder mechanism. The longer the chat, the higher the odds that ChatGPT will get off track or start “hallucinating”. For a statistical process, this is a logical consequence: longer sequences are harder to control or predict than shorter ones.

Using GPT on its own is great for casual chit-chat, and it can also be used for compiling helpful answers to relatively challenging questions and – to some degree – for data analysis and manipulation. There are, however, definite risks that need to be recognized for any serious use. For example, if you ask ChatGPT to “List 5 patents that describe autonomous vehicle platoon control systems”, it generates a good-looking list with patent numbers and plausible summaries of the patents – but both the numbers and the summaries are completely arbitrary! Without knowing how ChatGPT works and the limitations of the approach, one would take the answer at face value. Using it for legal or medical advice without human validation of the factuality of such advice is simply dangerous (or should we say foolish).

How to improve large language models

The AI research community is aware of this, and there are a number of ongoing approaches to improving today’s models:

  1. Larger models: so far, larger models have always been better. However, there are drawbacks: energy consumption grows exponentially, and larger models are harder to understand and more vulnerable to adversarial attacks¹.
  2. Build models optimized for certain vertical applications, such as legal and medical, or co-pilots for specific tasks such as searching, programming, document drafting, eDiscovery, and information governance. Currently, ChatGPT and GPT-4 are trained on general data from the internet (Wikipedia, various blogs, websites, etc.). By training them on legal or medical data, quality will improve dramatically. This is what Harvey.ai did with data provided by Allen & Overy. It is also what Stanford University did with BioMedLM, a 2.7B-parameter language model trained on biomedical literature that delivers an improved state of the art for medical question answering. Just recently, Bloomberg announced BloombergGPT, a model fine-tuned with financial data.
  3. More reinforcement learning. Sometimes this is also referred to as active learning. Both in the AlphaGo success and in ChatGPT’s, reinforcement learning methods made a big difference. For AlphaGo, the program learned to outperform humans by playing millions of games against itself. For ChatGPT, the models learned to behave better and hold human-like dialogue by chatting for months with humans. This is a true human-in-the-loop form of machine learning. It can be done with humans, but if annotated data sets are available, it can also be done automatically. Companies such as Snorkel provide advanced methods to create such high-quality annotated data sets with a minimal amount of human effort.
  4. But the main increase in performance and control of factuality is expected to come from more controlled dialogues and prompt generation. We need encoders to drive the decoders. Instead of just taking some arbitrary section of text as a prompt without really understanding its meaning (as is done in the initial Bing integration), we can analyze the text so that we understand the semantic roles and relations between the words, and then use that to generate better prompts and to control the text generation. There are many solutions for this in the world of Artificial Intelligence, such as knowledge graphs and semantic networks. But simple named-entity recognition in combination with relation extraction can already make a big difference. Did you know that there is a special database derived from Wikipedia, DBpedia, that holds many such relations and is a great source for controlling factuality? A small sketch of this idea follows below.
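As a minimal sketch of that last idea, the snippet below uses DBpedia's public SPARQL endpoint (via the SPARQLWrapper package) to fetch verified facts about an entity found in a user's question, and puts those facts into the prompt instead of letting the decoder improvise them. The entity, helper name, and prompt wording are invented for this illustration.

```python
from SPARQLWrapper import SPARQLWrapper, JSON

ENDPOINT = "https://dbpedia.org/sparql"

def dbpedia_abstract(resource: str, lang: str = "en") -> str:
    """Fetch the curated abstract of a DBpedia resource, e.g. 'Patent'."""
    sparql = SPARQLWrapper(ENDPOINT)
    sparql.setReturnFormat(JSON)
    sparql.setQuery(f"""
        PREFIX dbr: <http://dbpedia.org/resource/>
        PREFIX dbo: <http://dbpedia.org/ontology/>
        SELECT ?abstract WHERE {{
            dbr:{resource} dbo:abstract ?abstract .
            FILTER (lang(?abstract) = "{lang}")
        }}
    """)
    rows = sparql.query().convert()["results"]["bindings"]
    return rows[0]["abstract"]["value"] if rows else ""

# An entity recognized in the user's question is grounded with a verified
# description before the language model is prompted.
facts = dbpedia_abstract("Patent")
prompt = f"Using only the following verified facts:\n{facts}\n\nExplain what a patent is."
print(prompt)
```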

We at IPRally know how to convert the text of a patent document into a knowledge graph. Combined with our recent breakthroughs with Graph Transformers powering our patent search engine, we are in a unique position to use these to control a conversational dialogue when searching, or to generate the text of a patent application, to mention just one example use case.
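IPRally's actual graph construction is considerably more sophisticated and is not shown here; purely as a toy illustration of the general idea of turning claim language into nodes and relations, one could do something like the following with the open-source spaCy and networkx libraries (the claim sentence and the extraction rules are invented for this sketch):

```python
import spacy                      # requires: pip install spacy && python -m spacy download en_core_web_sm
import networkx as nx

nlp = spacy.load("en_core_web_sm")
claim = ("A lead vehicle controller transmits speed commands "
         "to a following vehicle over a wireless link.")

doc = nlp(claim)
graph = nx.DiGraph()

# Noun phrases become nodes of the graph.
for chunk in doc.noun_chunks:
    graph.add_node(chunk.root.lemma_, text=chunk.text)

# Verbs linking a subject to its objects become labelled edges.
for token in doc:
    if token.pos_ != "VERB":
        continue
    subjects = [c for c in token.children if c.dep_ in ("nsubj", "nsubjpass")]
    objects = [c for c in token.children if c.dep_ in ("dobj", "obj")]
    for prep in (c for c in token.children if c.dep_ == "prep"):
        objects += [c for c in prep.children if c.dep_ == "pobj"]
    for s in subjects:
        for o in objects:
            graph.add_edge(s.lemma_, o.lemma_, relation=token.lemma_)

print(graph.edges(data=True))
# e.g. [('controller', 'command', {'relation': 'transmit'}), ('controller', 'vehicle', ...), ...]
```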

We closely monitor all of the above-mentioned routes for improvement, not only at OpenAI, but also at Google (the original inventor of the Transformer architecture) and other relevant players.

In addition, one can also expect integrations with other forms of human perception: vision and speech. You may not know that OpenAI is also the creator of Whisper, a state-of-the-art speech recognition model covering close to a hundred languages, and of DALL-E 2, the well-known image generator.

So, what do you expect is next for the patent industry? We will answer that in an upcoming blog post.

_________________________________________________________________________

¹ Textual adversarial attacks are a type of cyber-attack that involves modifying or manipulating textual data in order to deceive or mislead machine learning models.

In natural language processing (NLP), models are trained to perform a variety of tasks, such as sentiment analysis, text classification, or language translation. Adversarial attacks aim to create input data that will cause the model to produce incorrect or unintended outputs, by exploiting vulnerabilities in the model's training data or architecture.

Textual adversarial attacks can take many forms, such as adding or removing words or punctuation, changing word order or synonyms, or introducing subtle changes that are difficult for humans to detect but can drastically alter the model's output. The goal of these attacks is to exploit the weaknesses in the model's decision-making process, in order to produce results that are advantageous to the attacker. For example, an attacker might attempt to deceive a spam filter by inserting specific keywords that trigger a false positive, or to manipulate an NLP model used in financial markets to generate misleading data that can be used for fraudulent purposes.

References

Vaswani, Ashish, et al. "Attention is all you need." Advances in neural information processing systems 30 (2017).

Devlin, Jacob, et al. "BERT: Pre-training of deep bidirectional transformers for language understanding." arXiv preprint arXiv:1810.04805 (2018).

Tenney, Ian, Dipanjan Das, and Ellie Pavlick. "BERT rediscovers the classical NLP pipeline." arXiv preprint arXiv:1905.05950 (2019).

Radford, Alec, et al. "Language models are unsupervised multitask learners." (2019). GPT-2.

Brown, Tom B., et al. "Language models are few-shot learners." arXiv preprint arXiv:2005.14165 (2020). GPT-3.

Ouyang, Long, et al. "Training language models to follow instructions with human feedback." arXiv preprint arXiv:2203.02155 (2022).



Written by

Sakari Arvela and Jan C. Scholtes

Sakari Arvela is CEO and Co-founder at IPRally. Jan C. Scholtes is a board member at IPRally and Full Professor of Text Mining at Maastricht University.