Why do we need word2vec?
- Both TF-IDF and BOW fail to store semantic information. TF-IDF only gives importance to uncommon words.
- Their representations are also sparse and high-dimensional, so there is a real chance of overfitting.
What does word2vec do?
In the word2vec model, each word is represented as a dense vector of 32 or more dimensions instead of a single number.
The semantic information and the relations between different words are also preserved.
Word2vec is usually trained on huge amounts of data.
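To see what "relations between words are preserved" means, here is a minimal sketch using cosine similarity on toy vectors. The vectors and words below are made up for illustration; real word2vec vectors are learned from data.

```python
import numpy as np

# Toy 4-dimensional vectors, invented for illustration only.
# Real word2vec vectors are learned and typically have 32+ dimensions.
king = np.array([0.9, 0.8, 0.1, 0.2])
queen = np.array([0.9, 0.1, 0.8, 0.2])
apple = np.array([0.1, 0.2, 0.1, 0.9])

def cosine(a, b):
    # Cosine similarity: close to 1.0 for similar directions,
    # close to 0 for unrelated ones.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(king, queen))  # related words -> higher similarity
print(cosine(king, apple))  # unrelated words -> lower similarity
```

Because related words end up with vectors pointing in similar directions, their cosine similarity is higher, which is exactly the semantic structure TF-IDF and BOW cannot capture.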
How to use?
from gensim.models import Word2Vec
After tokenizing and removing stopwords, write the following code...
model = Word2Vec(sentences, min_count=1)
word_vectors = model.wv
Thank You!