Overview
The Lag Generation Processor creates lagged (or leading) copies of a selected column, based either on a time interval or on a row offset. Generating lags is often necessary for time series analyses (e.g., ARMA or ARIMA models).
Input
The processor needs a sequentially ordered timestamp variable and a column containing ratio-scaled observations.
Configuration
Column For Lag / Lead Generation
Column for which the lag / lead values should be generated. This column may have any type; no restrictions apply.
Do Simple Row-Based Lag / Lead Generation Without The Need For Equidistant Time-Series
When set, the column selected in "Sorting column" is only used for sorting the input data before lag / lead generation and does not need to be a datetime column, but can be any sortable column.
If no sorting column is given, the input data is assumed to be already ordered. Each lag / lead then refers directly to its preceding / succeeding row(s).
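The row-based mode described above can be sketched in plain Python. This is only an illustration under assumptions, not the processor's implementation: the function name `row_based_lag`, the sample columns `id` and `number`, and the use of `None` for rows without a predecessor are all hypothetical.

```python
def row_based_lag(rows, column, sort_key=None):
    """Return copies of the rows with a '<column>_LAG' value from the preceding row."""
    # Sort only when a sorting column is given; otherwise trust the input order.
    if sort_key is not None:
        ordered = sorted(rows, key=lambda r: r[sort_key])
    else:
        ordered = list(rows)

    result = []
    for i, row in enumerate(ordered):
        # The first row has no predecessor, so its lag is None here (assumption).
        previous = ordered[i - 1][column] if i > 0 else None
        result.append({**row, f"{column}_LAG": previous})
    return result


# Hypothetical input: any sortable column works as the sort key, not only datetimes.
rows = [
    {"id": 3, "number": 30},
    {"id": 1, "number": 10},
    {"id": 2, "number": 20},
]
lagged = row_based_lag(rows, "number", sort_key="id")
for row in lagged:
    print(row)
```

Calling `row_based_lag(rows, "number")` without a sort key keeps the incoming order, which mirrors the assumption that unsorted input is already ordered.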
Sorting Column
This configuration option can be used in three ways:
- Time-based lag / lead generation (row-based option not set; equidistant timestamps are mandatory): the chosen column must contain values of type datetime.
- Row-based lag / lead generation (row-based option set) without a sorting column: the option may be left unset, in which case the incoming data is assumed to be already sorted.
- Row-based lag / lead generation with a sorting column: the dataset is sorted by the chosen column before the lags / leads are generated. The column does not need to be a datetime column; any column with scale type interval or ratio can be used.
Time Interval
Interval Multiplicator
When using time-based lag / lead generation, the chosen interval can be customized further (e.g., with the interval seconds and the value 2, the time lag is 2 seconds). When using row-based lag / lead generation, this option defines the span between a value and its lagged value: e.g., with the value 2 and lag generation, the first lag of a row is taken not from the previous row, but from the row before that.
The default value for this option is 1.
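As a rough sketch of the row-based case, the snippet below computes which rows the lags of a given row come from. It assumes that the k-th lag looks back k times the multiplicator in rows; the article only states this for the first lag, so the behavior for further lags is an assumption. The function names are hypothetical.

```python
def lag_offsets(amount, multiplicator=1):
    """Row offsets for each lag, assuming lag k looks back k * multiplicator rows."""
    return [k * multiplicator for k in range(1, amount + 1)]


def lags_for_row(values, index, amount, multiplicator=1):
    """Collect the lagged values for one row; None marks a lag outside the data."""
    lags = []
    for offset in lag_offsets(amount, multiplicator):
        source = index - offset
        lags.append(values[source] if source >= 0 else None)
    return lags


values = [1, 2, 3, 4, 5, 6]
# Last row (value 6) with the default multiplicator of 1.
print(lags_for_row(values, 5, amount=2, multiplicator=1))  # [5, 4]
# Same row with a multiplicator of 2: the previous row is skipped.
print(lags_for_row(values, 5, amount=2, multiplicator=2))  # [4, 2]
```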
Amount Of Lags / Leads
Defines the number of lags / leads to create.
Lag / Lead Generation
- lag - values for the generated lag columns are taken from the previous rows of the dataset, and the new column names have a “_LAG” suffix,
- lead - values for the generated lead columns are taken from the next rows of the dataset, and the new column names have a “_LEAD” suffix.
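The difference between the two modes can be illustrated with a small sketch. The helper `shift_column` is hypothetical, and the exact column names produced by the processor (beyond the “_LAG” / “_LEAD” suffix) are an assumption.

```python
def shift_column(values, mode="lag", steps=1):
    """Shift values by `steps` rows; None marks rows without a source row."""
    n = len(values)
    if mode == "lag":
        # Row i receives the value of row i - steps (a previous row).
        return [values[i - steps] if i - steps >= 0 else None for i in range(n)]
    if mode == "lead":
        # Row i receives the value of row i + steps (a following row).
        return [values[i + steps] if i + steps < n else None for i in range(n)]
    raise ValueError("mode must be 'lag' or 'lead'")


number = [1, 2, 3, 4, 5]
table = {
    "number": number,
    "number_LAG": shift_column(number, "lag"),
    "number_LEAD": shift_column(number, "lead"),
}
print(table["number_LAG"])   # [None, 1, 2, 3, 4]
print(table["number_LEAD"])  # [2, 3, 4, 5, None]
```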
Exploration
- Delete edge rows - Only keep rows in the result, for which the lags can be calculated.
- Pad with NULL - Keep the edge rows and set their lags to NULL.
- Fill with first / last value - Keep the edge rows and set their lags to the first lag value.
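The three edge strategies can be compared side by side in a short sketch for the lag case. The mode names `"delete"`, `"pad"` and `"fill"` and the function itself are hypothetical; the fill behavior shown here (reusing the first value of the column) is one reading of "first lag value" and should be treated as an assumption.

```python
def lag_with_edge_handling(values, steps=1, edge="pad"):
    """Return (value, lag) pairs for a lag of `steps` rows under one edge strategy."""
    pairs = []
    for i, value in enumerate(values):
        source = i - steps
        if source >= 0:
            pairs.append((value, values[source]))
        elif edge == "delete":
            continue  # drop rows whose lag cannot be calculated
        elif edge == "pad":
            pairs.append((value, None))  # keep the row, lag is NULL
        elif edge == "fill":
            pairs.append((value, values[0]))  # keep the row, reuse the first value
        else:
            raise ValueError("edge must be 'delete', 'pad' or 'fill'")
    return pairs


values = [10, 20, 30, 40]
print(lag_with_edge_handling(values, steps=2, edge="delete"))
# [(30, 10), (40, 20)]
print(lag_with_edge_handling(values, steps=2, edge="pad"))
# [(10, None), (20, None), (30, 10), (40, 20)]
print(lag_with_edge_handling(values, steps=2, edge="fill"))
# [(10, 10), (20, 10), (30, 10), (40, 20)]
```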
Columns For Grouping
Output
The input dataset, extended by new columns containing the lagged / leading observations.
Example
In this example we want to apply a lag to a dataset.
Input
As input we use a small sample dataset (24 rows) that contains a timestamp, a character and a corresponding number in each row. Here is a snippet of it:
time | character | number |
00:00 | a | 1 |
01:00 | b | 2 |
02:00 | c | 3 |
03:00 | d | 4 |
04:00 | e | 5 |
The whole dataset is attached at the end of the article.
Workflow
In this workflow, a Data Table Load Processor loads the dataset, and a Data Type Conversion Processor converts the strings in the "time" column to actual datetime values. The output of the conversion is passed both to a Result Table and to the Lag / Lead Processor, which creates the lags and saves them to a Result Table as well.
Configuration
The example configuration has the following settings:
- Column For Lag / Lead Generation: number
- Sorting Column: time
- Time Interval: "Hour"
- Amount of Lags / Leads: 3
The remaining options keep their default values.
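To give a feel for what such a configuration computes, here is a minimal Python sketch applied to the five-row snippet of the sample data. It is not the processor's implementation: the function `time_based_lags`, the column names `number_LAG_1` to `number_LAG_3`, the arbitrary calendar date, and the use of `None` for missing lags are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def time_based_lags(rows, time_col, value_col, interval, amount):
    """Add `amount` lag columns by looking up the value at time - k * interval."""
    by_time = {row[time_col]: row[value_col] for row in rows}
    result = []
    for row in sorted(rows, key=lambda r: r[time_col]):
        out = dict(row)
        for k in range(1, amount + 1):
            # Hypothetical column naming; a missing timestamp yields None (NULL).
            out[f"{value_col}_LAG_{k}"] = by_time.get(row[time_col] - k * interval)
        result.append(out)
    return result


start = datetime(2024, 1, 1, 0, 0)  # arbitrary date; the sample only lists times of day
rows = [
    {"time": start + timedelta(hours=h), "character": c, "number": n}
    for h, (c, n) in enumerate(zip("abcde", [1, 2, 3, 4, 5]))
]
for row in time_based_lags(rows, "time", "number", timedelta(hours=1), amount=3):
    print(row["time"].strftime("%H:%M"), row["number"],
          row["number_LAG_1"], row["number_LAG_2"], row["number_LAG_3"])
```

In this sketch the lag is found by timestamp rather than by row position, which is why equidistant timestamps matter: a gap in the time column would leave the corresponding lag empty.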
Result