Patent search engines are traditionally optimized for metrics like recall. You get what you measure, and focusing on recall gives you a decent search engine. We are about to take a leap beyond mere retrieval: towards a new metric that more directly measures understanding.
We have a patent search engine that PTOs, law firms and some of the biggest companies in the world use, mainly because of the quality of the search results. When we attacked the patent search problem, we believed that a graph-based approach would lead to something unique, something more than incrementally better than the old tools.
Solving the same problem better leads to incremental improvements. That is why we need to solve a different problem: not just information retrieval, but deep understanding of the technology inside the documents.
Old patent search metrics – recall
Traditionally, search engines have been measured with recall: how many of the known relevant documents are found in the result set. In prior art searching, the ground truth has been patent examiner citations. If all the citations appear among the top results, the search is approaching patent examiner level.
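The recall metric described above can be sketched in a few lines. The document identifiers here are hypothetical, purely for illustration:

```python
def recall(retrieved, cited):
    """Fraction of the examiner's citations that appear in the result set."""
    cited = set(cited)
    if not cited:
        return 0.0
    return len(cited & set(retrieved)) / len(cited)

# Hypothetical example: 3 of the 4 examiner citations were retrieved.
print(recall(["US1", "US2", "US3", "EP9"], ["US1", "US2", "EP9", "WO5"]))  # 0.75
```

Note that the denominator is the citation set, not the result set: returning more documents can only help recall, which is one reason it rewards coverage rather than precision.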
But recall is a problematic metric. First of all, it should not be confused with search quality. When you add more data to the search engine, there will be more false negatives: relevant documents that the examiner never noticed and therefore never cited. Recall goes down even as the search itself improves with better coverage. Second, recall is not ambitious enough. A machine can go through all the data and a human cannot. On the whole-dataset level, we should aim to be better than a human expert with the old tools.
New AI metrics – X vs A citations
We are moving from measuring the search as a whole towards measuring understanding. When a PTO receives a patent application, the examiner starts looking for prior art that could prevent the patent. If a document completely contains the idea, it gets cited as X. If it is relevant but not a complete match, it gets marked as A. And if it is newer than the application but would otherwise be X, it gets marked as A, so we need to do some date filtering. X vs A accuracy gives us a metric without obvious compromises. We can use the X and the A citations from the same application together. As they then come from the same examiner, we avoid incorrect samples almost entirely.
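A minimal sketch of how such an evaluation could be wired up, including the date filtering mentioned above. All names here (`Citation`, `effective_label`, the document IDs) are hypothetical, not IPRally's actual pipeline:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Citation:
    doc_id: str
    label: str        # examiner label: "X" or "A"
    published: date

def effective_label(c: Citation, filing: date) -> str:
    # An X citation published after the filing date cannot be
    # novelty-destroying prior art, so it is treated as A here.
    if c.label == "X" and c.published > filing:
        return "A"
    return c.label

def x_vs_a_accuracy(predictions: dict, citations: list, filing: date) -> float:
    """Share of citations whose predicted label matches the date-filtered examiner label."""
    gold = {c.doc_id: effective_label(c, filing) for c in citations}
    hits = sum(predictions[doc_id] == label for doc_id, label in gold.items())
    return hits / len(gold)

filing = date(2020, 1, 1)
citations = [
    Citation("D1", "X", date(2018, 5, 1)),   # stays X
    Citation("D2", "X", date(2021, 3, 1)),   # published after filing -> treated as A
    Citation("D3", "A", date(2019, 7, 1)),   # stays A
]
predictions = {"D1": "X", "D2": "X", "D3": "A"}
print(x_vs_a_accuracy(predictions, citations, filing))  # 2 of 3 labels match
```

Because X and A labels from one application are compared against each other, the metric is insensitive to how strict or lenient any individual examiner is overall.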
From search to graph AI
Why then does everyone prefer search metrics like recall? Because recall is the best way to measure current search engines: it tells you how well the known relevant documents are covered. But what if we aim for something more ambitious: to show only relevant results, to rank them by relevance, and then to pinpoint the parts that make them relevant, so that the output is transparent and understandable? Recall would not suffice. Having such an AI today just sounds too amazing to be possible; transformers are a step in this direction, but on their own they do not scale far enough. And yet, that is what we are building. Our knowledge graph approach is the secret that makes it possible: to truly understand technology at scale.
Towards examiner-level patent understanding
How far are we? We can currently distinguish X from A with 58% accuracy, and I don't expect anyone to be doing better. This doesn't say much about the search results directly, as the A citations are good and important results too. What it does tell is that the future will bring more changes than you are likely to expect.
What would you get with scalable patent understanding? A rookie would be able to get better search results in minutes than an experienced patent examiner with traditional tools would get in a week. At this level, we shift focus from searching to what's important: building our understanding and making wiser decisions.
Copyright © 2022 IPRally Technologies Oy.