Attention not needed!
Attention, Attention

For as long as we have been using Transformer models, there has been a question of how to put them into production, because they demand a lot of computing power. It took Google about a year from the introduction of BERT (Bidirectional Encoder Representations from Transformers) to its incorporation into their search algorithm. It was not just a matter of advances in understanding searches with BERT; they also needed new hardware:

“But it’s not just advancements in software that can make this possible: we needed new hardware too. Some of the models we can build with BERT are so complex that they push the limits of what we can do using traditional hardware, so for the first time, we’re using the latest Cloud TPUs to serve search results and get you more relevant information quickly.” (https://blog.google/products/search/search-language-understanding-bert/)

One source of this complexity in BERT is its ‘self-attention’ modules: neural network layers that compute a compatibility score between every pair of tokens in the input sequence.
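To make the pairwise compatibility idea concrete, here is a minimal sketch of single-head scaled dot-product self-attention in NumPy. It is not BERT’s exact implementation (BERT uses multiple heads, learned projections, masking, and more), and all names and dimensions below are illustrative assumptions; the point is that the score matrix is seq_len × seq_len, which is where much of the computational cost comes from.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """Single-head scaled dot-product self-attention (illustrative sketch).

    X: (seq_len, d_model) token embeddings.
    W_q, W_k, W_v: (d_model, d_head) projection matrices.
    """
    Q = X @ W_q  # queries
    K = X @ W_k  # keys
    V = X @ W_v  # values
    d_head = Q.shape[-1]
    # Compatibility scores between every pair of tokens: (seq_len, seq_len).
    scores = Q @ K.T / np.sqrt(d_head)
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V                  # weighted sum of value vectors

# Toy usage with hypothetical sizes: 5 tokens, model dim 8, head dim 4.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))
W_q, W_k, W_v = (rng.normal(size=(8, 4)) for _ in range(3))
out = self_attention(X, W_q, W_k, W_v)
print(out.shape)  # (5, 4)
```

Because the score matrix grows quadratically with the sequence length, serving such models at search scale pushed Google toward specialized hardware like Cloud TPUs, as the quote above describes.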