Attention not needed!
Attention, Attention

For as long as we have been using Transformer models, there has been a question of how to put them into production, because they demand a lot of computing power. It took Google about a year from the introduction of BERT (Bidirectional Encoder Representations from Transformers) to its incorporation into their search algorithm. It was not just a matter of advances in understanding searches with BERT; they also needed new hardware:

“But it’s not just advancements in software that can make this possible: we needed new hardware too. Some of the models we can build with BERT are so complex that they push the limits of what we can do using traditional hardware, so for the first time, we’re using the latest Cloud TPUs to serve search results and get you more relevant information quickly.” (https://blog.google/products/search/search-language-understanding-bert/)

One source of this complexity in BERT is its ‘self-attention’ modules: neural network layers that compute a compatibility score between every pair of tokens in the input sequence.
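To make the pairwise compatibility idea concrete, here is a minimal sketch of single-head scaled dot-product self-attention in NumPy. It is not BERT’s exact implementation (BERT uses multiple heads, learned projections, masking, and more), and all names and dimensions below are illustrative assumptions; the point is that the score matrix is seq_len × seq_len, which is where much of the computational cost comes from.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """Single-head scaled dot-product self-attention (illustrative sketch).

    X: (seq_len, d_model) token embeddings.
    W_q, W_k, W_v: (d_model, d_head) projection matrices.
    """
    Q = X @ W_q  # queries
    K = X @ W_k  # keys
    V = X @ W_v  # values
    d_head = Q.shape[-1]
    # Compatibility scores between every pair of tokens: (seq_len, seq_len).
    scores = Q @ K.T / np.sqrt(d_head)
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V                  # weighted sum of value vectors

# Toy usage with hypothetical sizes: 5 tokens, model dim 8, head dim 4.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))
W_q, W_k, W_v = (rng.normal(size=(8, 4)) for _ in range(3))
out = self_attention(X, W_q, W_k, W_v)
print(out.shape)  # (5, 4)
```

Because the score matrix grows quadratically with the sequence length, serving such models at search scale pushed Google toward specialized hardware like Cloud TPUs, as the quote above describes.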