Human language is difficult for computer programs to understand because it is inherently complex, ambiguous, and context-dependent. Unlike computer languages, which are based on strict rules and syntax, human language is nuanced and can vary greatly with the speaker, the situation, and the cultural context. As a result, building computer programs that accurately understand and interpret human language is a complex task and has been an ongoing challenge for artificial intelligence researchers, which is exactly why it took us so long to create reliable computer programs that deal with human language.
In addition, for many different reasons, early language models took shortcuts, and none of them addressed all linguistic challenges. This remained the case until Google introduced the Transformer model in 2017 in the ground-breaking paper “Attention is all you need”. There, a full encoder-decoder model using multiple layers of self-attention proved capable of handling almost all of these linguistic challenges. The model soon outperformed all others on various linguistic tasks such as translation, question answering, classification, and text analytics.
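The self-attention mechanism at the heart of the Transformer can be sketched in a few lines of NumPy. This is an illustrative simplification with a single head and random toy weights, not the full multi-head architecture from the paper:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence of token vectors X."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv           # project tokens to queries/keys/values
    scores = Q @ K.T / np.sqrt(K.shape[-1])    # similarity of every token to every other token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the sequence
    return weights @ V                          # each output vector mixes all token values

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                     # 4 tokens, 8-dim embeddings (toy data)
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)             # shape (4, 8): one context-aware vector per token
```

Because every output position attends to every input position, the model can relate words regardless of distance, which is what lets it capture long-range linguistic structure.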
As encoder-decoder models such as the T5 model are very large and hard to train due to a lack of aligned training data, a variety of cut-down models (a veritable zoo of Transformer models) have been created. The two best-known models are BERT and GPT.
The much-debated ChatGPT is an extension of GPT. It is based on GPT version 3.5 and has been fine-tuned for human-computer dialog using reinforcement learning. In addition, it has several extra mechanisms in place that aim to help it stick to human ethical values. Just a month ago, an even more powerful GPT-4, which has some of these mechanisms incorporated directly into the core model, was released. Despite the early glitches and biases in the models, these capabilities are major achievements!
The core reason the GPT family is so good is that Transformers are the first computational models to take almost all linguistic phenomena seriously. Building on Google’s Transformers, OpenAI (with the help of Microsoft) has shaken up the world by introducing models that generate language that can no longer be distinguished from human language.
But not even GPT-4 is the all-knowing artificial general intelligence some people think it is. This is mainly due to the decoder-only architecture. ChatGPT is great for “chatting”, and to some extent reasoning, but one cannot control its factuality. This is due to the lack of an encoder mechanism. The longer the chat, the higher the odds that ChatGPT will get off track or start “hallucinating”. For a statistical process this is a logical consequence: longer sequences are harder to control or predict than shorter ones.
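A back-of-the-envelope calculation illustrates why length matters. Assume, purely for the sake of the example, that each generated token independently has a small chance of derailing the generation (the 1% rate below is invented, and real token errors are not independent):

```python
def p_on_track(per_token_error, n_tokens):
    """Probability a generation stays on-track if each token independently
    has a small chance of derailing it (a toy independence assumption)."""
    return (1.0 - per_token_error) ** n_tokens

# With a hypothetical 1% per-token error rate:
short = p_on_track(0.01, 20)    # ~0.82 for a short reply
long = p_on_track(0.01, 500)    # ~0.0066 for a long conversation
```

The on-track probability decays geometrically with length, which matches the everyday observation that long chats drift more than short ones.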
Using GPT on its own is great for casual chit-chatting, and it can also be used for compiling helpful answers to relatively challenging questions and, to some degree, for data analysis and manipulation. There are, however, definite risks that need to be recognized for any serious use. For example, if you ask ChatGPT to “List 5 patents that describe autonomous vehicle platoon control systems”, it generates a good-looking list with patent numbers and plausible summaries of the patents, but both the numbers and the summaries are completely arbitrary! Without knowing how ChatGPT works and the limitations of the approach, one would take the answer at face value. Using it for legal or medical advice without human validation of the factuality of such advice is simply dangerous (or dare we say foolish).
The AI research community is aware of this, and there are a number of ongoing approaches to improving today’s models:
We at IPRally know how to convert the text of a patent document into a knowledge graph. Combined with our recent breakthroughs with Graph Transformers, which power our patent search engine, we are uniquely positioned to use these to control a conversational dialog when searching, or to generate the text of a patent application, to mention just one example use case.
We closely monitor all of the above-mentioned routes for improvement, not only from OpenAI, but also from Google (the original inventor of the Transformer architecture) and other relevant players.
In addition, one can also expect integrations with other forms of human perception: vision and speech. You may not know that OpenAI is also the creator of Whisper, the state-of-the-art speech recognizer for hundreds of languages, and DALL-E 2, the well-known image generator.
So, what do you expect comes next for the patent industry? We will answer that in an upcoming blog post.
1 Textual adversarial attacks are a type of cyber-attack that involves modifying or manipulating textual data in order to deceive or mislead machine learning models.
In natural language processing (NLP), models are trained to perform a variety of tasks, such as sentiment analysis, text classification, or language translation. Adversarial attacks aim to create input data that will cause the model to produce incorrect or unintended outputs, by exploiting vulnerabilities in the model's training data or architecture.
Textual adversarial attacks can take many forms, such as adding or removing words or punctuation, changing word order or synonyms, or introducing subtle changes that are difficult for humans to detect but can drastically alter the model’s output. The goal of these attacks is to exploit the weaknesses in the model’s decision-making process, in order to produce results that are advantageous to the attacker. For example, an attacker might attempt to deceive a spam filter by inserting specific keywords that trigger a false positive, or to manipulate an NLP model used in financial markets to generate misleading data that can be used for fraudulent purposes.
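A minimal sketch of the spam-filter example, using a deliberately naive keyword filter and a homoglyph perturbation. Both the filter and the trick are toy constructions for illustration; real attacks target learned models, not keyword lists:

```python
# A toy keyword-based spam filter and a character-level perturbation
# that is invisible to a human reader but slips past the filter.
SPAM_WORDS = {"free", "winner", "prize"}

def is_spam(text):
    # Flag the message if any whitespace-separated token is a known spam word.
    return any(word in SPAM_WORDS for word in text.lower().split())

def perturb(text):
    # Swap every Latin 'e' for the visually identical Cyrillic 'е' (U+0435),
    # so "free" no longer matches the keyword list byte-for-byte.
    return text.replace("e", "\u0435")

msg = "You are a winner, claim your free prize"
assert is_spam(msg)                # the filter catches the plain message
assert not is_spam(perturb(msg))   # the homoglyph version sails through
```

Neural models are less brittle than a keyword list, but the same principle of tiny, human-imperceptible input changes flipping the output applies to them as well.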
Vaswani, Ashish, et al. "Attention is all you need." Advances in neural information processing systems 30 (2017).
Devlin, Jacob, et al. "BERT: Pre-training of deep bidirectional transformers for language understanding." arXiv preprint arXiv:1810.04805 (2018).
Tenney, Ian, Dipanjan Das, and Ellie Pavlick. "BERT rediscovers the classical NLP pipeline." arXiv preprint arXiv:1905.05950 (2019).
Radford, Alec, Jeff Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. "Language Models are Unsupervised Multitask Learners." (2019). GPT-2.
Brown, Tom B., et al. "Language Models are Few-Shot Learners." arXiv preprint arXiv:2005.14165 (2020). GPT-3.
Ouyang, Long, et al. "Training language models to follow instructions with human feedback." arXiv preprint arXiv:2203.02155 (2022).
Copyright © 2022 IPRally Technologies Oy.