Caching Processor

Modified on Tue, 30 Nov 2021 at 03:15 PM


The Caching Processor caches its input dataset and forwards the cached dataset. This can improve performance for large, iterative, or complex calculations on the same dataset, because the cached result is reused instead of the entire upstream process being recalculated. Once the workflow has run successfully, it uses the Caching Processor as the starting point for further calculations.
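The benefit of caching can be sketched in plain Python. This is an illustrative analogy only, not the processor's implementation: `expensive_transform` stands in for a long chain of upstream processors, and the cached variable plays the role of the Caching Processor's stored dataset.

```python
def expensive_transform(rows):
    # Stand-in for a long, costly chain of upstream calculations.
    return [r * 2 for r in rows]

data = list(range(5))

# Without caching: every downstream branch recomputes the whole chain.
branch_a = sum(expensive_transform(data))
branch_b = max(expensive_transform(data))

# With caching: compute once, then start both branches from the cache.
cached = expensive_transform(data)
branch_a_cached = sum(cached)
branch_b_cached = max(cached)

assert branch_a == branch_a_cached
assert branch_b == branch_b_cached
```

Both branches produce identical results either way; the cached version simply runs the expensive chain once instead of once per branch.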


The processor accepts any valid dataset as input.


The "Dataframe-Based Caching" option determines whether the dataframe or the underlying raw Resilient Distributed Dataset (RDD) is cached. Caching the dataframe saves cache space and may improve subsequent and overall query performance, since Spark retains more optimization options. Dataframe caching is enabled by default.


The cached dataset acts as a starting point for further calculations. The actual content of the dataset is not changed by the processor.
