Overview
The Caching Processor caches its input dataset and forwards it unchanged. This can improve performance for large, iterative, or complex calculations on the same dataset, because the cached result is reused instead of recomputing the entire preceding process. Once the workflow has run successfully, it uses the Caching Processor as the starting point for further calculations.
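To illustrate the idea in plain Spark terms, the following is a minimal PySpark sketch: an intermediate result is cached once and then reused by several downstream actions instead of being recomputed each time. The dataset and transformations are illustrative, not the processor's actual implementation.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("caching-sketch").getOrCreate()

# Stand-in for an expensive upstream calculation (illustrative only)
expensive = (
    spark.range(0, 1_000_000)
    .withColumn("bucket", F.col("id") % 100)
    .groupBy("bucket")
    .agg(F.avg("id").alias("avg_id"))
)

expensive.cache()   # mark the result for caching
expensive.count()   # first action materializes the cache

# Both downstream actions now read from the cache instead of
# re-running the aggregation above
expensive.filter(F.col("avg_id") > 400_000).show()
expensive.orderBy("bucket").write.mode("overwrite").parquet("/tmp/buckets")
```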
Input
The processor can have any valid dataset as input.
Configuration
"Dataframe-Based Caching" decides whether the dataframe or the raw Resilient Distributed Datasets (RDD) gets cached. Caching the dataframe saves cache space and may improve subsequent and overall query performance due to more optimization options for Spark. Dataframe caching is switched on by default.
Output
The cached dataset acts as a starting point for further calculations. The actual content of the dataset is not changed by the processor.