Google’s New SMITH Update Trumps BERT Search Algorithm
The SMITH (Siamese Multi-depth Transformer-based Hierarchical) algorithm is Google’s latest model for understanding long-form documents. This algorithm is an exciting innovation in document matching, which Google promises can outperform its predecessor when processing longer documents and queries. Although SMITH isn’t meant to replace BERT entirely, it does supplement BERT by performing the lengthy work that BERT struggles to accomplish. In some ways, it’s an update to BERT’s capabilities, built into a separate and supplementary algorithm.
How does it work? Algorithms like BERT and SMITH are trained on data sets to contextually predict words within sentences. You might see something similar with predictive text on your phone: as you type a word, the app will offer suggestions based on similar sentences you have entered, or on words popularly used in conjunction with that one.
The algorithms are “pre-trained,” a process by which engineers mask random words within a sentence and check whether the algorithm can predict the missing words accurately. The algorithm then learns which words fit the context and becomes optimized to make fewer mistakes. When released for others to use, the algorithms similarly learn from user input, adapting to the user’s needs over time.
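The masked-word idea can be sketched in miniature. The snippet below is a toy illustration, not Google’s actual training code: instead of a transformer, it simply counts, over a tiny made-up corpus, which word most often appears between each pair of neighboring words, then “predicts” a hidden word from its context.

```python
from collections import Counter, defaultdict

# Toy sketch of masked-word prediction: hide a word, guess it from context.
# Real models like BERT learn this with transformers over huge corpora;
# here we just tally which word most often fills each (left, right) context.

corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat slept on the mat",
]

# For every interior word, record it under its surrounding word pair.
context_counts = defaultdict(Counter)
for sentence in corpus:
    tokens = sentence.split()
    for i in range(1, len(tokens) - 1):
        context = (tokens[i - 1], tokens[i + 1])
        context_counts[context][tokens[i]] += 1

def predict_masked(left, right):
    """Guess the most likely word between `left` and `right`, or None."""
    candidates = context_counts.get((left, right))
    return candidates.most_common(1)[0][0] if candidates else None

print(predict_masked("sat", "the"))  # prints "on"
```

This is the spirit of pre-training: by repeatedly filling in blanks, the model learns which words belong in which contexts.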
BERT, Meet SMITH, Your Replacement
As helpful as it is, BERT is limited to short text: a few sentences, a short paragraph. Simply put, the complexity of prediction rapidly increases with longer text input. So, BERT can predict a few words in a few missing sentences, but if you’re asking it to fill in the blanks on an entire page, it’s going to struggle to fill them in. It’s like putting together a puzzle, really: a small puzzle with a few missing pieces is much easier to manage than a large puzzle with pieces missing at the far corners.
Human readers derive context from content structure. That is, the way a document is laid out and its information is presented helps us understand that information. For algorithms like BERT, the longer the document, the harder it becomes to process the whole thing as one continuous block of text.
SMITH, on the other hand, is designed to do exactly this in ways that BERT cannot. While SMITH uses the same sort of pre-training and learning methods that BERT does, it predicts entire sentences rather than just words within sentences. Its pre-training masks whole sentences in the same way that engineers hide a single word within a sentence for BERT to guess.
The result is an updated self-attention-based text matching model whose maximum input length grows from 512 tokens to 2,048. If BERT looks at input sentence by sentence, SMITH looks at it paragraph by paragraph, and it actually thrives with longer documents. Should SMITH join Google’s continuous core algorithm updates, it could open doors to functions that benefit the entire industry.
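SMITH’s hierarchical design can be caricatured in a few lines. The sketch below is a loose analogy, not the real model: bag-of-words counts stand in for the sentence-level and document-level transformer encoders, and cosine similarity stands in for the Siamese matching score. The point is the shape of the pipeline: split a long document into sentence blocks, encode each block on its own, then combine the block encodings into one document representation that can be compared against another document’s.

```python
import math
from collections import Counter

# Hedged sketch of a hierarchical document matcher. Real SMITH uses
# transformer encoders at both the block level and the document level;
# word counts stand in for them here purely for illustration.

def encode_block(block):
    """Sentence-level 'encoder': a bag-of-words count for one block."""
    return Counter(block.lower().split())

def encode_document(text, block_size=2):
    """Document-level 'encoder': encode sentence blocks, then merge them."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    doc_vec = Counter()
    # Group sentences into fixed-size blocks and encode each separately,
    # so no single encoding step ever sees the whole document at once.
    for i in range(0, len(sentences), block_size):
        block = " ".join(sentences[i:i + block_size])
        doc_vec += encode_block(block)
    return doc_vec

def cosine(u, v):
    """Siamese-style matching score between two document vectors."""
    dot = sum(u[w] * v[w] for w in u)
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(
        sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0

a = encode_document("The cat sat on the mat. It slept all day. The mat was warm.")
b = encode_document("A dog sat on a rug. It napped all day. The rug was soft.")
print(round(cosine(a, b), 3))  # partial overlap: a score between 0 and 1
```

Because each block is encoded independently before the results are merged, the expensive step never has to attend across the full document at once, which is the intuition behind why a hierarchical model can stretch to longer inputs.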
SMITHing a New Search Tool
Why is SMITH important? The algorithm represents a big step forward. By Google’s own statement, semantic matching between long-form documents has many important applications, including news recommendation, related article recommendation, and document clustering, all areas ripe for exploration.
While Google remains cagey about whether SMITH is currently in use, based on its own published research, the algorithm is in a functional state and has delivered reliable results that exceed the performance of other Transformer-based models. What this means for you is that the SMITH algorithm will likely become available as part of Google’s suite. You will be able to use it just as you have been enjoying the benefits of BERT and the other algorithms, delivered alongside other performance enhancements in a future Google update.
Like BERT, SMITH is built to help search software better understand, and thus rank, online documents against a specific query. This is where SMITH’s ability to read entire documents has the potential to completely redefine the SEO game. The benefits are immediately apparent: if BERT ranks results by analyzing only a small section of a page, what happens when SMITH can analyze the entire page?
Think about that for a moment. The applications are every bit as exciting as they are boundless. With SMITH working in tandem with its predecessors, even queries of greater length (and thus complexity) can be met with better matching data.
Preparing for SMITH Updates
What can you do to prepare for SMITH and its unique applications in natural language pattern recognition? Right now, there isn’t much. Until Google openly releases SMITH or puts it into wide-scale use alongside the other semantic matching algorithms, your best bet is to continue with business as usual. Of course, you should keep improving your SEO efforts, keep your website updated, and so on; that way, once SMITH and Google are ready, you will already have some idea of how to take advantage of these new capabilities.
Mr. SMITH is poised to show us all some exciting new possibilities. Keep your eyes peeled for more news from Google, and take the time to educate yourself on what SMITH has to offer.