Columnization Processor

Modified on Tue, 30 Nov 2021 at 03:02 PM

Overview

Generates new columns from the values of an existing column: each distinct value of the selected column becomes a new column.


Input

This processor works on any kind of input dataset.


Configuration


The columns created from the column selected in the second configuration field are named according to the schema columizedColumnName_columizedColumnValue_duplicatedColumnName, where duplicatedColumnName is the name of the aggregated column.
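The naming schema can be illustrated with a small helper. This is a minimal sketch that assumes the three parts are joined with underscores exactly as the schema is written; the function name `columnized_column_name` and the example columns `location` and `price` are hypothetical.

```python
def columnized_column_name(columnized_column: str, value: object, aggregated_column: str) -> str:
    # Follows the schema columizedColumnName_columizedColumnValue_duplicatedColumnName.
    return f"{columnized_column}_{value}_{aggregated_column}"


# Hypothetical example: the values of "location" become columns
# that hold aggregated "price" values.
names = [columnized_column_name("location", loc, "price") for loc in ["Berlin", "Paris"]]
print(names)  # ['location_Berlin_price', 'location_Paris_price']
```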


Warning: At least one of the three aggregation methods (in the blue boxes) must be configured; otherwise, the workflow cannot be executed.


Advanced Configuration

Multiple columns can be selected for aggregation. In that case, enabling the last configuration field (Use Broadcast Joins) is recommended. With this toggle enabled, the columnization is performed separately on each selected column, and the resulting tables are then joined.

The resulting table must not exceed the memory limit of the workers, since a broadcast join ships a complete copy of the broadcast table to every worker.
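The per-column approach can be sketched in plain Python. This is a single-machine sketch under stated assumptions: rows are dictionaries, the first configuration field holds the key columns, every aggregated column uses the same aggregation function and default value, and the helpers `columnize` and `columnize_many` as well as the sample data are hypothetical. The join is modelled as a dictionary lookup on the key columns; a broadcast join in a distributed engine works in a similar spirit, sending a full copy of one table to each worker and joining against it locally.

```python
from collections import defaultdict


def columnize(rows, key_cols, pivot_col, agg_col, agg, default):
    # Collect the aggregated column's values per (key, pivot value) pair.
    groups = defaultdict(list)
    for r in rows:
        key = tuple(r[c] for c in key_cols)
        groups[(key, r[pivot_col])].append(r[agg_col])
    pivot_values = sorted({r[pivot_col] for r in rows})
    keys = sorted({k for k, _ in groups})
    # One entry per key; one output column per distinct pivot value.
    return {
        key: {
            f"{pivot_col}_{v}_{agg_col}": agg(groups[(key, v)]) if (key, v) in groups else default
            for v in pivot_values
        }
        for key in keys
    }


def columnize_many(rows, key_cols, pivot_col, agg_cols, agg, default):
    # Columnize each aggregated column separately, then join the
    # partial tables on the key columns.
    tables = [columnize(rows, key_cols, pivot_col, c, agg, default) for c in agg_cols]
    joined = {}
    for table in tables:
        for key, cols in table.items():
            joined.setdefault(key, {}).update(cols)
    return joined


# Hypothetical sample data.
rows = [
    {"type": "Hotel", "location": "Berlin", "price": 120, "rating": 4},
    {"type": "Hotel", "location": "Berlin", "price": 90, "rating": 3},
    {"type": "Hostel", "location": "Paris", "price": 40, "rating": 2},
]
result = columnize_many(rows, ["type"], "location", ["price", "rating"], min, 0)
print(result[("Hotel",)])
```

Splitting the work per aggregated column keeps each partial table narrow, which is why the memory limit above matters: every partial table that is broadcast has to fit on a single worker.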


Output

The result table contains the columns selected in the first configuration field, together with the new columns created from the values of the column selected in the second configuration field.


Example

In the following example, we want to find the cheapest accommodation in each location for every accommodation type.


Example input



Workflow

Example Configuration


Result

Minimum values are selected for each location and missing values are replaced by the chosen default value.
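The minimum-price case can be sketched in plain Python. The sample rows are hypothetical and are not the article's example input; the column names `type`, `location` and `price`, the helper `columnize_rows`, and the default value 0 are assumptions for illustration. Each output row keeps the key column and adds one column per location, named according to the schema described under Configuration.

```python
from collections import defaultdict


def columnize_rows(rows, key_cols, pivot_col, agg_col, agg, default):
    # Gather the aggregated values for every (key, pivot value) combination.
    cells = defaultdict(list)
    for r in rows:
        key = tuple(r[c] for c in key_cols)
        cells[(key, r[pivot_col])].append(r[agg_col])
    pivot_values = sorted({r[pivot_col] for r in rows})
    keys = sorted({key for key, _ in cells})
    out = []
    for key in keys:
        row = dict(zip(key_cols, key))  # keep the key columns
        for v in pivot_values:
            values = cells.get((key, v))
            # Missing combinations are filled with the default value.
            row[f"{pivot_col}_{v}_{agg_col}"] = agg(values) if values else default
        out.append(row)
    return out


# Hypothetical sample data.
rows = [
    {"type": "Apartment", "location": "Berlin", "price": 80},
    {"type": "Apartment", "location": "Berlin", "price": 65},
    {"type": "Apartment", "location": "Munich", "price": 95},
    {"type": "Hotel", "location": "Munich", "price": 120},
]
result = columnize_rows(rows, ["type"], "location", "price", min, 0)
for row in result:
    print(row)
# {'type': 'Apartment', 'location_Berlin_price': 65, 'location_Munich_price': 95}
# {'type': 'Hotel', 'location_Berlin_price': 0, 'location_Munich_price': 120}
```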

If a second aggregation method is chosen, additional columns are created for it.


Here, the average price is selected as the second aggregation method, with a default value of 250.
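Adding a second aggregation method amounts to running the same columnization with another aggregation function and its own default value. In this sketch, the helper `columnize`, the labels `"min"` and `"avg"`, the default of 0 for the minimum, and the sample rows are hypothetical; only the default of 250 for the average comes from the example above. Because the naming schema does not state how the two column sets are told apart, the sketch keeps them in separate tables keyed by label.

```python
from collections import defaultdict
from statistics import mean


def columnize(rows, key_col, pivot_col, agg_col, agg, default):
    # Collect the aggregated values for each (key, pivot value) pair.
    cells = defaultdict(list)
    for r in rows:
        cells[(r[key_col], r[pivot_col])].append(r[agg_col])
    keys = sorted({r[key_col] for r in rows})
    pivots = sorted({r[pivot_col] for r in rows})
    # Missing (key, pivot value) pairs receive the default value.
    return {
        k: {
            f"{pivot_col}_{p}_{agg_col}": agg(cells[(k, p)]) if (k, p) in cells else default
            for p in pivots
        }
        for k in keys
    }


# Hypothetical sample data.
rows = [
    {"type": "Apartment", "location": "Berlin", "price": 80},
    {"type": "Apartment", "location": "Berlin", "price": 65},
    {"type": "Apartment", "location": "Munich", "price": 95},
    {"type": "Hotel", "location": "Munich", "price": 120},
]

# Hypothetical labels mapped to (aggregation function, default value).
aggregations = {"min": (min, 0), "avg": (mean, 250)}
tables = {
    label: columnize(rows, "type", "location", "price", fn, dflt)
    for label, (fn, dflt) in aggregations.items()
}
print(tables["avg"]["Hotel"])
# {'location_Berlin_price': 250, 'location_Munich_price': 120}
```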


Related Articles

Lexical Columization Processor
