Transfer learning has been a big factor in the recent success of machine learning in new tools and products. Transfer learning means taking a model trained to solve one task and reusing it for another task, usually by fine-tuning the original model or by training a new model on top of it. Commonly used starting points are, for instance, models trained on ImageNet (in computer vision) and the BERT model (in natural language processing). At IPRally we also leverage transfer learning by training GloVe word embeddings on patent data and using these as inputs to our Tree-LSTM model.
Now that we've built a model that performs very well as a patent search engine (for finding prior art for inventions), we're constantly thinking about whether the model could solve other problems as well. One idea that came to mind quickly was patent classification, i.e. automatically assigning inventions to the correct patent classes. Patent classification is a challenging multi-label classification task because the number of available classes is very large: on the most specific subgroup level of the CPC classification system there are more than 200k classes a patent can belong to. Automating this process is thus very challenging.
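To make the CPC hierarchy concrete, here is a small illustrative sketch (not IPRally code) that splits a CPC symbol such as "A01B 1/02" into its levels. It assumes the common "section + class + subclass, then group/subgroup" layout of CPC symbols; the function name is hypothetical.

```python
def cpc_levels(symbol: str) -> dict:
    """Split a CPC symbol like 'A01B 1/02' into its hierarchy levels."""
    subclass_part, _, group_part = symbol.partition(" ")
    levels = {
        "section": subclass_part[0],   # e.g. 'A'
        "class": subclass_part[:3],    # e.g. 'A01'
        "subclass": subclass_part[:4], # e.g. 'A01B'
    }
    if group_part:
        main_group, _, _subgroup = group_part.partition("/")
        levels["main_group"] = f"{levels['subclass']} {main_group}"  # e.g. 'A01B 1'
        levels["subgroup"] = symbol                                  # e.g. 'A01B 1/02'
    return levels

print(cpc_levels("A01B 1/02"))
```

The subclass level (A01B) has a few hundred classes in total, while the subgroup level explodes to the 200k+ classes mentioned above.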
We decided to test whether our patent search model could also serve as a patent classification model, using a simple nearest-neighbor approach. In other words, we perform a search with our engine using the invention data (in our tests, the first claim of the patent or application) and predict the classes from the top results of the search. The patent search model was not fine-tuned for this purpose, and no new model was trained on top of it. To choose how many top results to consider, we used 15k randomly selected patent applications from the USPTO, EPO and WIPO. We then used 5k other patents from the same offices to calculate the actual metrics.
The results we achieve with this simple approach are surprisingly good. On the subclass level of the CPC classification (e.g. A01B) we achieve an F1 score of 70.83, and even on the most specific subgroup level (e.g. A01B 1/02) the F1 score is 29.95. When predicting only the top class, we achieve a precision of 81.5% on the subclass level and 47.2% on the subgroup level. In other words, even on the most specific classification level, the top predicted class is actually one of the patent's true classes almost half of the time. For comparison, in the PatentBERT paper, where a BERT model was fine-tuned to perform patent classification, the F1 score on the subclass level is slightly worse, 66.83. The scores are not directly comparable, though, since we do not use the same test set (PatentBERT uses only US patents).
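For readers unfamiliar with multi-label evaluation, the two metrics quoted above can be computed roughly as follows. This is a hedged sketch with toy data, assuming micro-averaged F1 (true/false positives pooled over all samples) and top-1 precision (how often the single top-ranked class is among the true classes); it is not our exact evaluation code.

```python
def micro_f1(preds, trues):
    """Micro-averaged F1 over sets of predicted and true labels."""
    tp = sum(len(p & t) for p, t in zip(preds, trues))
    fp = sum(len(p - t) for p, t in zip(preds, trues))
    fn = sum(len(t - p) for p, t in zip(preds, trues))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

def top1_precision(top_preds, trues):
    """Fraction of samples whose top predicted class is a true class."""
    return sum(1 for p, t in zip(top_preds, trues) if p in t) / len(trues)

# Toy data: two patents with predicted and true label sets.
preds = [{"A01B"}, {"A01B", "B25J"}]
trues = [{"A01B", "A01C"}, {"B25J"}]
print(micro_f1(preds, trues))  # tp=2, fp=1, fn=1 -> 0.666...
print(top1_precision(["A01B", "A01B"], trues))  # 1 hit of 2 -> 0.5
```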
In the near future we will add statistics about the patent classes of the result list to the application, and also show the predicted patent class for every search. Join our newsletter to stay tuned!