Dataset Save Processor (Deprecated)

Modified on Tue, 30 Nov 2021 at 11:18 AM

This processor has been removed. Please refer to the Data Table Save Processor instead.


Motivation

ONE DATA is all about data, and data delivers value once it is well structured, analysed and exploited.

After running a Workflow, it is often useful to save the output data for further analysis or for creating reports. This is where the Dataset Save Processor can be used.


Overview

This Processor saves a new dataset in one of several storage formats, or modifies an existing dataset.


Configuration

This processor needs valid input data in order to save it, and it can be configured as follows:

The user provides the name of the resulting dataset and selects one of three options: create a new dataset, append the result to an existing dataset, or completely replace an existing dataset.


If the name of an existing dataset is provided, the create option is no longer valid. The user can have a look at that Dataset and choose between appending to it and replacing it.
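
The three save options can be illustrated with a minimal Python sketch. This is not the ONE DATA API: the in-memory `store` and the function `save_dataset` are hypothetical names used only for illustration, and the sketch assumes that appending and replacing both require the dataset to exist already.

```python
from typing import Dict, List

Row = Dict[str, object]

# Hypothetical in-memory stand-in for dataset storage (not the ONE DATA API).
store: Dict[str, List[Row]] = {}


def save_dataset(name: str, rows: List[Row], mode: str) -> None:
    """Save rows under `name` using 'create', 'append' or 'replace'."""
    exists = name in store
    if mode == "create":
        if exists:
            # "create" is not valid when the name belongs to an existing dataset.
            raise ValueError(f"dataset {name!r} already exists")
        store[name] = list(rows)
    elif mode == "append":
        if not exists:
            # Assumption: appending needs an existing dataset.
            raise KeyError(f"dataset {name!r} does not exist")
        store[name].extend(rows)
    elif mode == "replace":
        if not exists:
            # Assumption: replacing needs an existing dataset.
            raise KeyError(f"dataset {name!r} does not exist")
        store[name] = list(rows)
    else:
        raise ValueError(f"unknown mode: {mode!r}")
```

For example, calling `save_dataset("sales", rows, "create")` twice with the same name fails on the second call, while `"append"` keeps the earlier rows and `"replace"` discards them.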

       

Resulting data can be stored in three different formats:

  • CSV file
  • Database format
  • Parquet


Advanced Configuration Options

This processor has three optional toggles:

  • Compute the number of rows written to the Dataset
  • Compute the content length (approximate number of tokens in the Dataset)
  • Enforce Schema Match (ensures that the resulting Dataset has the same schema as the existing Dataset that is appended to or replaced)


If "Enforce Schema Match" is enabled, the new and existing Datasets must have the same schema (column names and types). Otherwise, an error occurs while executing the Workflow.

If this option is disabled, a schema mismatch only generates warnings.
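
The difference between the two settings can be sketched in Python. This is an illustrative approximation, not the processor's actual implementation: `schema_differences` and `check_schema` are hypothetical names, a schema is modelled as a plain mapping from column name to type name, and column order is assumed not to matter.

```python
import warnings
from typing import Dict, List

Schema = Dict[str, str]  # column name -> type name (assumed representation)


def schema_differences(existing: Schema, incoming: Schema) -> List[str]:
    """List every difference in column names or types between two schemas."""
    problems: List[str] = []
    for column in sorted(existing.keys() - incoming.keys()):
        problems.append(f"missing column: {column}")
    for column in sorted(incoming.keys() - existing.keys()):
        problems.append(f"unexpected column: {column}")
    for column in sorted(existing.keys() & incoming.keys()):
        if existing[column] != incoming[column]:
            problems.append(
                f"type mismatch in {column}: "
                f"{existing[column]} vs {incoming[column]}"
            )
    return problems


def check_schema(existing: Schema, incoming: Schema, enforce: bool) -> List[str]:
    """Raise on a mismatch when enforcing; otherwise only emit warnings."""
    problems = schema_differences(existing, incoming)
    if problems and enforce:
        raise ValueError("schema mismatch: " + "; ".join(problems))
    for problem in problems:
        warnings.warn(problem)
    return problems
```

With `enforce=True` the first mismatch report aborts the save, which corresponds to the Workflow error described above. With `enforce=False` the same differences are reported as warnings and the save continues.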


Output

This processor generates a Dataset. Depending on the configuration, either a new Dataset is created or an existing Dataset is modified (i.e. replaced or appended to).

It can be helpful to save only part of the existing data, for example only the required columns, to a new Dataset.
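
Reducing the data to the required columns before saving can be pictured with a short Python sketch. The function `select_columns` and the sample column names are hypothetical. In a Workflow this step would be performed by whatever precedes the save processor, not by code like this.

```python
from typing import Dict, List

Row = Dict[str, object]


def select_columns(rows: List[Row], columns: List[str]) -> List[Row]:
    """Return new rows that contain only the listed columns."""
    return [{column: row[column] for column in columns} for row in rows]


# Hypothetical sample data: keep "id" and "name", drop "internal_note".
rows = [
    {"id": 1, "name": "Ada", "internal_note": "x"},
    {"id": 2, "name": "Linus", "internal_note": "y"},
]
required = select_columns(rows, ["id", "name"])
```

Only the reduced rows in `required` would then be passed on to be saved as the new Dataset.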

After a successful Workflow run, opening the processor shows an "OPEN DATASET" button. Clicking it displays the saved Dataset.


