Petabyte

Word Embeddings - Vectors that represent words

June 06, 2017

Ever since I joined R&D at our company, I have been working on understanding neural networks that generate representations of words and documents. Representing words and documents as vectors allows us to carry out Natural Language Processing tasks mathematically. For example, we can measure how similar two documents are (using cosine similarity), solve quick analogies (using vector operations), and rank documents.

But how do you produce the vectors that represent the words? Well, there are many ways. Some traditional NLP approaches still work very well, such as matrix factorization and other count-based methods (LDA, GloVe). There are also newer methods, many of which use neural networks (Word2Vec). I have been producing document vectors using gensim's Doc2Vec (which builds on Word2Vec), with the hierarchical softmax skip-gram model (the Word2Vec.py neural network). When I was reading this part of the code, I thought that if this is a neural network (shallow, not a deep learning model) w...
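To make the three tasks mentioned above more concrete, here is a minimal sketch in plain Python. It covers cosine similarity, analogies through vector arithmetic, and document ranking. The vectors are hand-made toy values, an assumption for illustration only; in practice they would come from a trained model such as Word2Vec or Doc2Vec. The helpers `cosine_similarity`, `analogy`, `document_vector` and `rank_documents` are hypothetical names and are not part of gensim's API. Documents are represented here by averaging their word vectors. This is a simpler technique than Doc2Vec, swapped in so the example stays self-contained.

```python
import math

# Hand-made 4-d toy vectors (dimensions roughly: royalty, male, female, fruit).
# Illustrative assumption only; real Word2Vec/Doc2Vec vectors are learned
# from text and usually have hundreds of dimensions.
WORD_VECTORS = {
    "king":  [0.9, 0.8, 0.1, 0.0],
    "queen": [0.9, 0.1, 0.8, 0.0],
    "man":   [0.1, 0.9, 0.1, 0.0],
    "woman": [0.1, 0.1, 0.9, 0.0],
    "apple": [0.0, 0.1, 0.1, 0.9],
}


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def analogy(a, b, c, vectors=WORD_VECTORS):
    """Answer 'a is to b as c is to ?' using the vector b - a + c,
    returning the closest word that is not one of the inputs."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine_similarity(target, vectors[w]))


def document_vector(words, vectors=WORD_VECTORS):
    """Average of the word vectors: a simple stand-in for a learned Doc2Vec vector."""
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    for word in words:
        for i, value in enumerate(vectors[word]):
            total[i] += value
    return [value / len(words) for value in total]


def rank_documents(query_words, documents, vectors=WORD_VECTORS):
    """Return document ids sorted from most to least similar to the query."""
    query = document_vector(query_words, vectors)
    scores = {doc_id: cosine_similarity(query, document_vector(words, vectors))
              for doc_id, words in documents.items()}
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    # "man is to king as woman is to ?" -> king - man + woman
    print(analogy("man", "king", "woman"))  # queen
    docs = {
        "royals": ["king", "queen"],
        "people": ["man", "woman"],
        "fruit": ["apple"],
    }
    print(rank_documents(["queen"], docs))  # ['royals', 'people', 'fruit']
```

With a trained gensim model, the analogy step corresponds to querying the model's keyed vectors. For example, `model.wv.most_similar(positive=["king", "woman"], negative=["man"])` returns the nearest words to king - man + woman by cosine similarity.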