Word2Vec Processor

Modified on Tue, 30 Nov 2021 at 04:35 PM


Computes a Word2Vec model based on an input (text) corpus. Word2Vec is a neural-network-based approach for creating word embeddings (high-dimensional feature vectors) from a given input corpus. It relies on the distributional hypothesis, which states that words occurring in similar contexts (that is, within a given window of surrounding words) tend to have similar meanings.
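To make the idea of a context range more concrete, the following minimal Python sketch lists the (word, context word) pairs that fall inside a fixed window around each word. It only illustrates the concept and is not the processor's implementation: the function name context_pairs, the window parameter, and the whitespace tokenization are assumptions made for this example.

```python
def context_pairs(tokens, window=2):
    """Return (center, context) pairs for all tokens within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        # Clamp the window to the sentence boundaries.
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


# One sentence from the example corpus, lowercased and split on whitespace
# (an assumed, simplistic tokenization).
sentence = "this had been going on for several days".split()
pairs = context_pairs(sentence, window=1)

for center, context in pairs[:4]:
    print(center, "->", context)
```

Word2Vec-style models are trained on pairs like these, so words that repeatedly share context words end up with similar vectors.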

Note: The machine learning algorithm implementations changed in Spark 3, so results may differ slightly from those produced with Spark 2. Overall performance has improved.


As input, the processor takes a table with columns containing a corpus (collection of written texts).



The processor forwards a table containing the extracted words and their respective vectors. Apart from the words column, the number of columns created equals the vector dimension set in the configuration.
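The shape of such an output table can be sketched in a few lines of Python. This is an illustration only: the column names (word, v0, v1, ...), the helper to_output_rows, and the vector values are hypothetical placeholders, not actual processor output.

```python
def to_output_rows(embeddings, dimension):
    """Flatten {word: vector} into a header and rows with one column per vector component."""
    # One words column plus `dimension` vector columns (names are assumed).
    header = ["word"] + [f"v{i}" for i in range(dimension)]
    rows = []
    for word, vector in embeddings.items():
        if len(vector) != dimension:
            raise ValueError(f"vector for {word!r} has length {len(vector)}, expected {dimension}")
        rows.append([word, *vector])
    return header, rows


# Placeholder values for a configured dimension of 3; real vectors come from the trained model.
embeddings = {
    "weather": [0.1, -0.2, 0.3],
    "war": [0.0, 0.5, -0.4],
}
header, rows = to_output_rows(embeddings, dimension=3)

print(header)
for row in rows:
    print(row)
```

With a configured dimension of 3, every row has four entries: the word followed by three vector components.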


Example Input

For this example, we use the following corpus as input:

Trees were swaying, though gently, and their leaves were rustling as if in applause to the change in the weather
This had been going on for several days
The men and women who gauge the climate on television were exultant over the unusual run of good weather as if it was they who had brought it on
how it's supposed to be done " is a trait all of the best deer hunters share
Plus , it's a lot of fun to pull off-the-wall stunts that actually work in special situations
The protesters here certainly know what they don't like: war, globalization, capitalism, drug laws, immigrant detention centers, a high-speed train line and, inexplicably, the Olympic torch
This is a discussion of war, " said Claudio Robba, 25, one of maybe 150 protesters at a piazza


Example Configuration

