Lexical Binarization Processor

Modified on Tue, 30 Nov 2021 at 01:29 PM

Overview

Generates binarized (0/1) columns from a single nominal-scaled or text column that has a limited number of unique values. Each binarized column contains a 1 in every row where the corresponding nominal value is present, and a 0 otherwise.
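The idea can be illustrated outside the platform with a minimal Python sketch. This is not the processor's implementation: the function name `lexical_binarize`, the whitespace default for the separation pattern, and the `<column>_<word>` naming of the new columns are all assumptions made for illustration.

```python
import re

def lexical_binarize(rows, column, pattern=r"\s+"):
    """Add one 0/1 column per distinct word found in `column`.

    `rows` is a list of dicts. `pattern` is a regex used to split each
    value into words (assumed default: whitespace). The new column names
    follow a hypothetical "<column>_<word>" scheme.
    """
    # Split every value into its set of non-empty words.
    tokenized = []
    for row in rows:
        value = row.get(column) or ""
        tokenized.append({t for t in re.split(pattern, value) if t})

    # All distinct words across the dataset, in a stable order.
    words = sorted(set().union(*tokenized))

    # Copy each row and append the binarized columns.
    result = []
    for row, tokens in zip(rows, tokenized):
        new_row = dict(row)
        for word in words:
            new_row[f"{column}_{word}"] = 1 if word in tokens else 0
        result.append(new_row)
    return result, words


rows = [
    {"id": 1, "tags": "red blue"},
    {"id": 2, "tags": "blue"},
    {"id": 3, "tags": ""},
]
binarized, words = lexical_binarize(rows, "tags")
for row in binarized:
    print(row)
```

Because every distinct word becomes its own column, the approach is only practical when the number of unique values is small, which matches the restriction stated above.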


Input

The processor operates on a dataset containing at least one column of type string. Only such columns can be selected for binarization.


Configuration


Output

The processor has two output ports:

  • Lexical Binarization Output: Contains the input dataset plus the new binarized columns, one for each distinct "word" of the selected attribute (what counts as a word depends on the separation pattern).
  • Distinct Summary Output: The number of occurrences of each "word" in the selected attribute (the sum of the corresponding binarized column), along with the corresponding fraction.

A bar chart of the percentages is also available in the processor under the "Result" tab.
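The summary output can be approximated with a short Python sketch. It is an illustration only: the function name `distinct_summary` and the whitespace default for the separation pattern are hypothetical, and the fraction is assumed here to be the count divided by the number of rows, which the article does not state explicitly.

```python
import re
from collections import Counter

def distinct_summary(values, pattern=r"\s+"):
    """Count in how many rows each word occurs, plus its fraction.

    Assumption: fraction = count / number of rows. The count equals the
    sum of the word's 0/1 column in the binarized output.
    """
    counts = Counter()
    for value in values:
        # A word is counted at most once per row, as in a 0/1 column.
        counts.update({t for t in re.split(pattern, value or "") if t})

    total_rows = len(values)
    return [
        {"word": word, "count": count, "fraction": count / total_rows}
        for word, count in sorted(counts.items())
    ]


summary = distinct_summary(["red blue", "blue", "", "red blue"])
for entry in summary:
    print(entry)
```

Multiplying each fraction by 100 gives percentages of the kind a bar chart could display.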


Example

Example Input


Example Configuration


Workflow


Result

  • Lexical Binarization Output


  • Distinct Summary Output


The results above are produced by the processor and can be viewed under the "Result" tab.


Related Articles

Lexical Columization Processor
