01 Feb 2021

SMITH vs BERT – A Comparative Study of Google Algorithms


Posted under: Digital Marketing Services

The BERT algorithm has caused a stir ever since its inception due to its innovative approach to understanding natural language. Recently, many SEO experts have been pointing out that the SMITH algorithm can outperform BERT.

Wondering what the SMITH algorithm is? Is it already in use? How is it different from (we mean, better than) BERT? Sit back and relax! This blog will answer all of these questions.

What is SMITH algorithm?

SMITH is a new algorithm model that tries to comprehend an entire document. Models like BERT, by contrast, understand words within the context of a single sentence.

BERT can predict randomly hidden words from the context of the surrounding sentence. SMITH, on the other hand, can predict entire hidden blocks of sentences. This ability enables SMITH to understand long documents better than the BERT algorithm does.

A little background understanding is important for any discussion on the algorithms.

Each algorithm is trained on a specific data set; this is called pre-training. Typically, the engineers hide several words within a sentence, and the algorithm has to predict those words.

As training proceeds, the algorithm gets smarter and is optimized to make fewer mistakes on the training data. SMITH also goes through this process of model training and fine-tuning.

During pre-training of the SMITH algorithm, the researchers hid randomly selected words as well as sentence blocks. That's because the relations between words within a sentence block, and between sentence blocks within a document, are equally important for understanding content.
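The word-masking step described above can be sketched in a few lines of Python. This is a toy illustration only; the function name, masking rate, and `[MASK]` placeholder mirror common BERT-style conventions, not the paper's exact configuration:

```python
import random

def mask_tokens(tokens, mask_rate=0.15, mask_token="[MASK]"):
    """Hide a random subset of tokens, as in BERT-style pre-training.

    Returns the masked sequence plus the (position, original word)
    pairs that the model would be trained to predict.
    """
    masked, targets = [], []
    for i, tok in enumerate(tokens):
        if random.random() < mask_rate:
            masked.append(mask_token)
            targets.append((i, tok))
        else:
            masked.append(tok)
    return masked, targets

sentence = "the quick brown fox jumps over the lazy dog".split()
masked, targets = mask_tokens(sentence)
print(masked)   # the sentence with some words replaced by [MASK]
print(targets)  # the hidden words the model must recover
```

During training, the model's predictions for the masked positions are compared against the hidden originals, and its parameters are adjusted to reduce the error. SMITH applies the same idea one level up, masking whole sentence blocks instead of single words.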

They found that SMITH can outperform BERT when it comes to comprehending long content.

Is Google Using SMITH Algorithm?

You may have noticed that Google doesn’t reveal which specific algorithm it is using. Though the researchers maintain that SMITH algorithm is more powerful than BERT, Google hasn’t formally stated that it is in use.

So, why should you read this blog? And why should we put in the effort to write it? Because Google's ultimate goal is to enhance user experience, and the SMITH algorithm is clearly more potent than BERT when it comes to offering a superb UX. Therefore, it's better to have an idea about it and prepare for it.

The following discussion will further the comparative study between BERT and SMITH. So, keep reading.

SMITH vs BERT – An Analysis

  • Problem of matching long queries to long content

According to the researchers, semantic matching between pairs of long documents, which has significant applications such as related-article recommendation, news recommendation, and document clustering, has not yet been explored properly and requires more effort.

Moreover, the researchers have maintained that while the BERT algorithm can understand short documents, it is not suited for analysing long-form documents.

Here’s what the research paper Beyond 512 Tokens: Siamese Multi-depth Transformer-based Hierarchical Encoder for Long-Form Document Matching has to say:

“Semantic matching between long texts is a more challenging task due to a few reasons:

1) When both texts are long, matching them requires a more thorough understanding of semantic relations including matching pattern between text fragments with long distance;

2) Long documents contain internal structure like sections, passages and sentences. For human readers, document structure usually plays a key role for content understanding. Similarly, a model also needs to take document structure information into account for better document matching performance;

3) The processing of long texts is more likely to trigger practical issues like out of TPU/GPU memories without careful model design.”

  • Issues with a long document

SMITH performs comparatively better as the document gets longer. A study conducted on several benchmark datasets for long-form text matching shows that the SMITH model can outperform earlier algorithm models. It also increases the maximum input text length from 512 tokens (BERT's limit) to 2048 tokens.
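One way SMITH handles these longer inputs is hierarchical: a document is split into sentence blocks, each block is encoded on its own, and the block representations are then combined. The splitting step might be sketched as follows (the function and the token cap are illustrative assumptions, not the paper's configuration):

```python
def split_into_blocks(sentences, max_block_tokens=32):
    """Greedily pack consecutive sentences into blocks of roughly
    `max_block_tokens` words, so each block fits a short encoder.
    A single sentence longer than the cap forms its own block.
    """
    blocks, current, count = [], [], 0
    for sent in sentences:
        n = len(sent.split())
        if current and count + n > max_block_tokens:
            blocks.append(current)
            current, count = [], 0
        current.append(sent)
        count += n
    if current:
        blocks.append(current)
    return blocks

doc = [
    "SMITH encodes each sentence block independently.",
    "Block representations are then combined into a document representation.",
    "This two-level design lets the model accept longer inputs than a flat encoder.",
]
for block in split_into_blocks(doc, max_block_tokens=12):
    print(block)
```

Because no single encoder pass ever sees more than one block, the document as a whole can be much longer than any flat encoder's input limit.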

That being said, a top SEO agency in India remarks that the SMITH model doesn't really replace BERT. It extends BERT's potential by handling the heavy-duty work that BERT was unable to accomplish.

Here’s what the research paper Beyond 512 Tokens: Siamese Multi-depth Transformer-based Hierarchical Encoder for Long-Form Document Matching has to say:

“Our experimental results on several benchmark datasets for long-form document matching show that our proposed SMITH model outperforms the previous state-of-the-art models including hierarchical attention, multi-depth attention-based hierarchical recurrent neural network, and BERT.”

Closing Thoughts

Though Google has not yet explicitly revealed that it uses the SMITH algorithm model, researchers have found that it has more potential than the BERT algorithm in comprehending long content. That's why it's better to have an understanding of it in case Google starts to use it in the future.

Priyanka Agarwal


Priyanka Agarwal is an experienced digital marketer with vast knowledge in SEO, SMO, etc. She pens articles elucidating the latest trends in digital marketing.


