SMITH vs BERT – A Comparative Study of Google Algorithms

01 Feb / 2021
The BERT algorithm has caused a stir ever since its introduction because of its innovative approach to understanding language. Recently, many SEO experts have pointed out that the SMITH algorithm can outperform BERT.

Wondering what the SMITH algorithm is? Is it already in use? How is it different from (we mean, better than) BERT? Sit back and relax! This blog answers all of these questions.

What is the SMITH algorithm?

SMITH is a new model that tries to comprehend an entire document. Models like BERT, by contrast, are trained to understand words within the context of a sentence.

BERT can predict randomly hidden words from the context within a sentence. SMITH, on the other hand, is also trained to predict hidden blocks of sentences from the rest of the document. This ability helps SMITH understand longer documents better than the BERT algorithm does.

A little background helps in any discussion of these algorithms.

Each algorithm is first trained on a specific data set. This step is called pre-training. Usually, the engineers hide several words within a sentence and the algorithm has to predict the hidden words.
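To make the idea of hiding words concrete, here is a minimal Python sketch of how a training example could be prepared. It is only an illustration: the function name mask_words, the [MASK] placeholder, and the 15% masking rate are assumptions chosen for the example, and words are split on whitespace rather than with a real tokenizer.

```python
import random

MASK = "[MASK]"  # placeholder token; the name is an assumption for this sketch


def mask_words(sentence, mask_fraction=0.15, seed=0):
    """Hide a random subset of words.

    Returns the masked word list and a dict mapping each hidden
    position to the original word the model would have to predict.
    """
    rng = random.Random(seed)
    tokens = sentence.split()  # whitespace split, not a real tokenizer
    n_to_mask = max(1, round(len(tokens) * mask_fraction))
    positions = rng.sample(range(len(tokens)), n_to_mask)
    targets = {i: tokens[i] for i in positions}
    masked = [MASK if i in targets else tok for i, tok in enumerate(tokens)]
    return masked, targets


masked, targets = mask_words("the quick brown fox jumps over the lazy dog")
print(" ".join(masked))  # the sentence with the hidden word(s) replaced
print(targets)           # what the model is asked to recover
```

During pre-training, the model sees only the masked sentence and is scored on how well it recovers the words stored in targets.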

As the training proceeds, the algorithm gets smarter and is optimized to make fewer mistakes on the training data. SMITH goes through the same process: it is pre-trained first and then fine-tuned.

During the pre-training of the SMITH algorithm, the researchers hid randomly selected words as well as sentence blocks. That's because the relations between words within a sentence block and the relations between sentence blocks within a document are equally important for understanding content.
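Hiding sentence blocks can be sketched in the same spirit. The Python example below is a simplified illustration, not the researchers' implementation: the helper names, the [BLOCK_MASK] placeholder, and the choice of grouping a fixed number of sentences per block are all assumptions made for the example.

```python
import random
import re

BLOCK_MASK = "[BLOCK_MASK]"  # hypothetical placeholder for a hidden block


def split_into_blocks(document, sentences_per_block=2):
    """Split a document into blocks of consecutive sentences."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    return [
        " ".join(sentences[i:i + sentences_per_block])
        for i in range(0, len(sentences), sentences_per_block)
    ]


def mask_blocks(blocks, n_masked=1, seed=0):
    """Hide whole blocks; return the masked list and the hidden originals."""
    rng = random.Random(seed)
    hidden = set(rng.sample(range(len(blocks)), min(n_masked, len(blocks))))
    masked = [BLOCK_MASK if i in hidden else b for i, b in enumerate(blocks)]
    targets = {i: blocks[i] for i in hidden}
    return masked, targets


doc = (
    "SMITH reads long documents. It splits them into blocks. "
    "Some blocks are hidden. The model must recover them."
)
blocks = split_into_blocks(doc)
masked, targets = mask_blocks(blocks)
print(masked)
print(targets)
```

To recover a hidden block, the model has to rely on the blocks around it, which is why this kind of training encourages an understanding of how the parts of a document relate to each other.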

They found that SMITH can outperform BERT when it comes to comprehending long content.

Is Google Using the SMITH Algorithm?

You may have noticed that Google rarely reveals which specific algorithms it is using. Though the researchers report that the SMITH algorithm outperforms BERT on long documents, Google hasn't formally stated that it is in use.

So, why should you read this blog? And why should we put in the effort to write it? Because Google's ultimate goal is to enhance the user experience, and the research suggests that SMITH understands long content better than BERT does. Therefore, it's better to have an idea about it and prepare for it.

The following discussion will further the comparative study between BERT and SMITH. So, keep reading.

SMITH vs BERT – An Analysis

Problem of matching long queries to long content

According to the researchers, semantic matching between long document pairs, which has significant applications like related article recommendation, news recommendation, and document clustering, has not yet been explored properly and requires more effort.

The researchers also maintain that the BERT algorithm works well on short documents but is not suited to analysing long-form documents.

Here’s what the research paper Beyond 512 Tokens: Siamese Multi-depth Transformer-based Hierarchical Encoder for Long-Form Document Matching has to say:

“Semantic matching between long texts is a more challenging task due to a few reasons:

1) When both texts are long, matching them requires a more thorough understanding of semantic relations including matching pattern between text fragments with long distance;

2) Long documents contain internal structure like sections, passages and sentences. For human readers, document structure usually plays a key role for content understanding. Similarly, a model also needs to take document structure information into account for better document matching performance;

3) The processing of long texts is more likely to trigger practical issues like out of TPU/GPU memories without careful model design.”

Issues with a long document

SMITH's advantage shows up as documents get longer. A study was conducted on several benchmark datasets for long-form text matching, and it shows that the SMITH model can outperform the earlier models. Compared with BERT-based baselines, it also increases the maximum input text length from 512 to 2048 tokens.

That said, a top SEO agency in India remarks that the SMITH model doesn't really replace BERT. It only extends the potential of BERT by taking on the heavy-duty work that BERT was unable to accomplish.
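A quick calculation shows why the input limit matters. The Python sketch below is a rough illustration under stated assumptions: the function name visible_fraction is made up for the example, tokens are approximated by whitespace-separated words (real models count subword tokens), and anything beyond the limit is assumed to be simply cut off.

```python
def visible_fraction(document, max_tokens):
    """Share of a document that fits within a model's input limit.

    Tokens are approximated by whitespace-separated words, and text
    beyond the limit is assumed to be truncated.
    """
    tokens = document.split()
    if not tokens:
        return 1.0
    return min(len(tokens), max_tokens) / len(tokens)


# A made-up 3,000-word article, used only to illustrate the arithmetic.
article = " ".join(["word"] * 3000)

for limit in (512, 2048):
    share = visible_fraction(article, limit)
    print(f"limit {limit}: {share:.0%} of the article fits")
```

For this hypothetical 3,000-word article, a 512-token limit covers roughly the first sixth of the text, while a 2048-token limit covers about two thirds of it.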

Here’s what the research paper Beyond 512 Tokens: Siamese Multi-depth Transformer-based Hierarchical Encoder for Long-Form Document Matching has to say:

“Our experimental results on several benchmark datasets for long-form document matching show that our proposed SMITH model outperforms the previous state-of-the-art models including hierarchical attention, multi-depth attention-based hierarchical recurrent neural network, and BERT.”

Closing Thoughts

Though Google has not yet explicitly revealed that it uses the SMITH model, researchers found that it has more potential than the BERT algorithm for comprehending long content. That's why it's better to understand it now, in case Google starts to use it in the future.
