Overview
Generates multiple binarized columns from one nominal (text) column that has a limited number of unique values. Each binarized column contains a 1 in every row where the corresponding nominal value is present, and a 0 otherwise.
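The idea can be illustrated with a minimal Python sketch. This is not the processor's implementation: the function name `lexical_binarize`, the `pattern` parameter, and the whitespace default are assumptions chosen for illustration, and the processor's actual separation pattern is set in its configuration.

```python
import re

def lexical_binarize(rows, column, pattern=r"\s+"):
    """Return copies of rows with one 0/1 column per distinct word.

    Hypothetical helper: `pattern` stands in for the processor's
    separation pattern and is an assumption of this sketch.
    """
    # Split each cell into its set of words; empty tokens are dropped.
    tokens = [
        {t for t in re.split(pattern, row[column]) if t}
        for row in rows
    ]
    # The distinct words across all rows become the new column names.
    vocabulary = sorted(set().union(*tokens))
    # Keep the original fields and append a 1/0 flag per word.
    return [
        {**row, **{word: int(word in words) for word in vocabulary}}
        for row, words in zip(rows, tokens)
    ]

rows = [
    {"id": 1, "colors": "red blue"},
    {"id": 2, "colors": "blue"},
    {"id": 3, "colors": "green red"},
]
binarized = lexical_binarize(rows, "colors")
```

Each row keeps its original fields and gains one column per distinct value (`blue`, `green`, `red` here). The sketch does not guard against a word that matches an existing column name.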
Input
The processor operates on a dataset containing at least one column of type string. Only such columns can be selected for binarization.
Configuration
Output
The processor has two output ports:
- Lexical Binarization Output: Contains the input dataset along with the new binarized columns, one column for each distinct "word" of the selected attribute. What counts as a "word" depends on the separation pattern.
- Distinct Summary Output: Contains the count of each "word" in the selected attribute (the sum of the corresponding binarized column) along with the corresponding fraction.
A bar chart of the percentages is also provided within the processor under the "Result" tab.
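As a rough sketch of how the Distinct Summary Output relates to the binarized columns, the following Python snippet sums each 0/1 column and derives a fraction. The function name `distinct_summary` and the output field names are hypothetical. The denominator is also an assumption: the sketch divides by the number of rows, whereas the processor may normalize differently (for example, by the total number of word occurrences).

```python
def distinct_summary(rows, word_columns):
    """Count each word by summing its 0/1 column.

    Assumption: the fraction is count / number of rows; the
    processor's own definition is not stated in this article.
    """
    total_rows = len(rows)
    summary = []
    for word in word_columns:
        count = sum(row[word] for row in rows)
        fraction = count / total_rows if total_rows else 0.0
        summary.append({"word": word, "count": count, "fraction": fraction})
    return summary

# Already-binarized example rows (illustrative data only).
binarized = [
    {"colors": "red blue", "blue": 1, "green": 0, "red": 1},
    {"colors": "blue", "blue": 1, "green": 0, "red": 0},
    {"colors": "green red", "blue": 0, "green": 1, "red": 1},
    {"colors": "red", "blue": 0, "green": 0, "red": 1},
]
summary = distinct_summary(binarized, ["blue", "green", "red"])
```

Multiplying each fraction by 100 gives percentages comparable to what a bar chart of the summary would display.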
Example
Example Input


Example Configuration


Workflow


Result
Lexical Binarization Output
Distinct Summary Output
The previous result is provided by the processor and can be viewed under the "Result" tab.