Overview
The Multi-Input Query Processor executes a Spark SQL statement on multiple input datasets, similar to the Double Input Query Processor.
Spark SQL brings native support for SQL to Spark and streamlines the process of querying data stored both in RDDs (Spark’s distributed datasets) and in external sources.
More information about Spark SQL is available in the official Spark documentation.
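To illustrate the underlying mechanism, here is a minimal PySpark sketch of how Spark SQL queries a dataset that has been registered under a name; the session setup, table name, and sample data are illustrative, not part of the processor itself.

```python
from pyspark.sql import SparkSession

# Illustrative session and sample data, not part of the processor itself
spark = SparkSession.builder.appName("spark-sql-demo").getOrCreate()
products = spark.createDataFrame(
    [(1, "Notebook"), (2, "Pen")], ["product_id", "product_name"]
)

# Registering the dataset under a name makes it addressable from plain SQL
products.createOrReplaceTempView("products")

# Spark executes the SQL statement on the distributed dataset
spark.sql("SELECT product_name FROM products WHERE product_id = 1").show()
```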
Input
The processor works with three input datasets that may contain any type of data. If you need fewer input ports for a query, you can either connect an empty input table to one of the processor's ports, or use the Double Input Query Processor instead.
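Conceptually, an unused port behaves like an empty dataset registered under an alias that the query never references. A minimal PySpark sketch of that idea follows; the schema and names are made up for illustration.

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType

spark = SparkSession.builder.getOrCreate()

# Hypothetical stand-in for an unused input port: an empty dataset with a
# dummy schema, registered under an alias the query never references
empty_schema = StructType([StructField("unused", StringType(), True)])
spark.createDataFrame([], empty_schema).createOrReplaceTempView("unused_input")
```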
Configuration
In the processor configuration, you define the SQL statement to execute and the aliases of the input datasets.
Note that the SQL statement has to refer to the input datasets by exactly these aliases, otherwise it cannot use them.
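For example, if the three inputs were given the hypothetical aliases T1, T2, and T3 in the configuration, the statement has to address them under those names. The PySpark sketch below mimics this with temporary views; the data and column names are made up.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical input datasets; in the processor they arrive via the ports
t1 = spark.createDataFrame([(1, "a")], ["id", "value"])
t2 = spark.createDataFrame([(1, "b")], ["id", "value"])
t3 = spark.createDataFrame([(2, "c")], ["id", "value"])

# The configured aliases become the table names visible to the statement
t1.createOrReplaceTempView("T1")
t2.createOrReplaceTempView("T2")
t3.createOrReplaceTempView("T3")

# The statement has to use the same aliases; T3 is registered but unused here
spark.sql("""
    SELECT T1.id, T1.value, T2.value AS value_2
    FROM T1
    JOIN T2 ON T1.id = T2.id
""").show()
```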
Output
The processor output is the result of the SQL statement specified in the configuration.
Example
In this example, we join three tables containing product, customer, and transaction data to see which customer bought which product.
Input
Workflow
In this workflow, three Custom Input Tables provide the input data. The result is stored in a Filterable Result Table.
Configuration
In the configuration, we enter the SQL query that is executed on the input datasets. Once datasets are connected to the processor, the column names of the inputs are also displayed, which makes it easier to write the query.
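The configuration for this example could look like the following PySpark sketch. The table aliases and column names (products, customers, transactions, customer_id, product_id, and so on) are assumptions about the sample data, since the original screenshots are not reproduced here.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical sample data standing in for the three Custom Input Tables;
# the column names are assumptions, since the screenshots are not shown here
products = spark.createDataFrame(
    [(1, "Notebook"), (2, "Pen")], ["product_id", "product_name"]
)
customers = spark.createDataFrame(
    [(10, "Alice"), (11, "Bob")], ["customer_id", "customer_name"]
)
transactions = spark.createDataFrame(
    [(10, 1), (11, 2)], ["customer_id", "product_id"]
)

for name, df in [("products", products), ("customers", customers),
                 ("transactions", transactions)]:
    df.createOrReplaceTempView(name)

# Join the three inputs to see which customer bought which product
spark.sql("""
    SELECT c.customer_name, p.product_name
    FROM transactions t
    JOIN customers c ON t.customer_id = c.customer_id
    JOIN products  p ON t.product_id  = p.product_id
""").show()
```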
Result