Overview
This processor calculates statistical measures (e.g. mean, median) and visualizes the Data in a boxplot.
Motivation
To gain insights from Data and extract the useful information embedded in it, one must first understand the Data. Applying descriptive analysis to the Data of interest is one way to achieve this.
The Heuristic Summaries Processor helps generate significant statistics from the input Data.
Configuration
The processor requires a Dataset containing at least one numeric (ratio-scaled) column.
This processor is usually connected directly to a Load Processor, which reads the input Data, or placed after processors that apply transformations to this Data.
The configuration menu of the processor is the following:
Compression Size: the higher the value, the higher the precision, but execution time and memory consumption also increase.
Merge Interval: defines the interval for merging tree centroids.
Note that:
- These two configuration fields are experimental.
- The processor does not infer column types: if a column is declared as type "String" but contains only numbers, the processor will NOT take it into account.
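The type rule in the note above can be illustrated with a minimal sketch. It is not the processor's implementation: the schema layout, the column names, the type names other than "String", and the helper summarized_columns are all hypothetical and only show what selecting columns by declared type, rather than by content, means.

```python
# Hypothetical schema: column name -> declared type (illustrative names only).
declared_types = {"age": "Integer", "price": "Double", "zip_code": "String"}

# Assumed set of type names treated as numeric; the processor's real list may differ.
NUMERIC_TYPES = {"Integer", "Double"}


def summarized_columns(schema):
    """Select columns by their declared type only, never by inspecting the values."""
    return [name for name, dtype in schema.items() if dtype in NUMERIC_TYPES]


# "zip_code" holds only digits in practice, but it is declared as String,
# so it is left out of the summary.
print(summarized_columns(declared_types))  # ['age', 'price']
```

In practice this means a numeric column stored as "String" has to be converted to a numeric type before it reaches the processor.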
Output
This processor provides two outputs:
- Within the processor: a boxplot accompanied by several statistical measures (min, max, median ...) for each numeric column of the input Dataset.
- At the output node of the processor: a table with 12 columns: ColumnName, min, max, sum, median, firstQuartile, thirdQuartile, arithmeticMean, geometricMean, lowerWhisker, upperWhisker, numberOfRows.
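To make the meaning of the 12 output columns concrete, the following sketch computes them exactly for a single column using only the Python standard library. It is an illustration under stated assumptions, not the processor's algorithm: the processor's values may be approximations, and the quartile method and the whisker rule (Tukey's 1.5 * IQR convention) used here are assumptions, since the article does not specify them. The function name summarize and the sample values are hypothetical.

```python
import math
import statistics


def summarize(column_name, values):
    """Exact summary of one numeric column (illustration only)."""
    data = sorted(values)
    # Assumed quartile convention; other methods give slightly different results.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # Assumed whisker rule (Tukey): most extreme data points within 1.5 * IQR of the box.
    lower_whisker = min(v for v in data if v >= q1 - 1.5 * iqr)
    upper_whisker = max(v for v in data if v <= q3 + 1.5 * iqr)
    return {
        "ColumnName": column_name,
        "min": data[0],
        "max": data[-1],
        "sum": sum(data),
        "median": median,
        "firstQuartile": q1,
        "thirdQuartile": q3,
        "arithmeticMean": statistics.fmean(data),
        # The geometric mean is only defined for strictly positive values.
        "geometricMean": statistics.geometric_mean(data) if data[0] > 0 else math.nan,
        "lowerWhisker": lower_whisker,
        "upperWhisker": upper_whisker,
        "numberOfRows": len(data),
    }


row = summarize("price", [2, 4, 4, 5, 7, 9, 30])
for key, value in row.items():
    print(f"{key}: {value}")
```

With these sample values the value 30 lies beyond the upper fence, so under the assumed rule the upper whisker stops at 9 while max remains 30.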
Example
In this example, the Heuristic Summary Processor is applied to a simple Dataset to extract statistical measures from it:
Related Articles
Distinct Summary
Distinct Textual Summary