Double Input Query Processor

Modified on Tue, 30 Nov 2021 at 02:53 PM

Overview

The Double Input Query processor executes a Spark SQL query statement on its two input datasets.

Spark SQL brings native support for SQL to Spark and streamlines the process of querying data stored both in RDDs (Spark’s distributed datasets) and in external sources. 

More information is available in the Spark SQL documentation.


Input

The Query processor operates on two input datasets containing any type of data.


Configuration


Note that the input tables must be referenced as firstInputTable and secondInputTable, respectively, in the SQL statement.


Supported SQL features can be found in the Spark SQL documentation.
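For example, a minimal statement referencing both inputs might look like the following (the join column id is purely illustrative and must be replaced with columns that actually exist in your datasets):

SELECT f.*, s.*
FROM firstInputTable f
JOIN secondInputTable s ON f.id = s.id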


Output

Once the query has been executed, the result can be viewed in the output table.


Example

Workflow


Input data


firstInputTable



secondInputTable



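The sample input tables are not reproduced here. Judging from the example query below, firstInputTable is assumed to contain customer records (columns such as CustomerID, ContactName, City and Country) and secondInputTable order records (columns such as CustomerID and OrderDate).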
Example Configuration


We use the following SQL statement to query our datasets:

SELECT f.ContactName, f.City, s.OrderDate
FROM firstInputTable f, secondInputTable s
WHERE f.CustomerID = s.CustomerID
  AND f.Country <> 'Sweden'
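
The same result can be obtained with explicit JOIN syntax, which Spark SQL also supports:

SELECT f.ContactName, f.City, s.OrderDate
FROM firstInputTable f
JOIN secondInputTable s ON f.CustomerID = s.CustomerID
WHERE f.Country <> 'Sweden'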

Result



Relevant Articles

Query Processor

Query Helper Processor

Multi-Input Query Processor
