XML / JSON Parsing Processor

Modified on Mon, 04 Apr 2022 at 01:38 PM


Note that versions 1.1 and 1.2 of this Processor are deprecated. For a better experience use the version 1.3
The linked section explains the main upgrades and improvements in version 1.3


1. Processor v1.1


2. Processor v1.2


3. Processor v1.3


Overview

Creates an output dataset (XML/JSON format) from a single text column. The content of the output dataset is given by XPATH /JSONPath expressions, where each expression creates one column.

XPATH (JSONPath) uses a non-XML (non-JSON) syntax to provide a flexible way of addressing different parts of an  XML (JSON) document.


More information about XPATH and JSONPath.



Input

The input table should contain the JSON or XML text to parse. It can have multiple columns. The column to parse is specified by the Processor.


Note that One column should contain the same text format.



Configuration

Output

The output table contains the result corresponding to the XPATH / JSONPATH expressions specified in the Processor configuration. Each expression result is stored in a separate column.


Example

Workflow

Example Input

The input table is composed of two columns:

  • The first column contains two JSON Objects targeted by the first parser.
  • The second column contains an XML format text targeted by the second parser.

The used code is shown bellow:


First JSON Object

{ 
   "title":{ 
      "text":"My Chart"
   },
   "subtitle":{ 
      "text":"My Untitled Chart"
   },
   "chart":{ 
      "type":"line",
      "inverted":false
   },
   "series":[ 
      { 
         "name":"Y"
      },
      { 
         "name":"Z"
      }
   ]
}


Second JSON Object

{ 
   "id":"timeschartkpiTimesElement",
   "type":"highcharts",
   "tempCustomSql":"SELECT * FROM InputTable",
   "source":"casetableDatasource",
   "config":{ 
      "title":{ 
         "text":""
      }
   }
}

XML

<?xml version="1.0" encoding="UTF-8"?>
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Work on issue QA-100</body>
</note>

Example Configuration

As mentioned above we first add the objects above into the Custom Input Table Processor as will be shown. WE used two parsers respectively for both the JSON and XML columns, and added two paths in each parser resulting in two output columns.



JSON Parsing Processor


In this example we set up our Custom Input Table with the two JSON example objects we provide above. the configuration is shown below:

We seek to retrieve all instances under "title" contained in both JSON objects, and so we use the following JSON objects for our parsing processor:


We obtain the following output:



XML Parsing Processor

In this example we again paste the XML object into the Custom Input Table processor as shown below:



We seek to select the recipient of the letter embodied by the XML object (denoted by "to"). We use the following configuration:



We obtain the following output:





XML / JSON Parsing v1.2 Processor

Input Data

For the upcoming examples, we'll be using the following input JSON code

{
   "totalItems":28228,
   "page":0,
   "ipp":50,
   "paginated":true,
   "items":[
      {
         "foo":{
            "foo2":"bar"
         }
      },
      {
         "foo":{
            "foo2":"bar2"
         }
      },
      {
         "foo3":{
            "bar3":"bar4"
         }
      }
   ]
}


Improvements

Remove Unnecessary Quotes

In case the result is a single string, the Processor removes the unwanted quotes.

The following figures show the Processor configuration for the JSON path expression and the results in both versions 1.1 and 1.2


Add Null to Empty Cells

If a JSONPATH query does not match and has no result, the Processor will add NULL into the cell instead of dropping the whole row


This upgrade can help overcome multiple issues faced with the old version of the Processor.

Version 1.1 fails to handle cases where the column containing JSON had missing values (empty strings, NULL). It requires that all XPATH / JSONPATH expressions specified in the Processor configuration must return a non-empty result in every row of the input column.
If this condition is not fulfilled, the Processor fails to execute.

The following example shows a successful case using version v1.2



In this same example, version 1.1 of the Processor fails to execute. 


XML / JSON Parsing v1.3 Processor

Input Data

For the upcoming examples, we'll be using the following input JSON code

Improvements

Don't drop rows when referencing empty arrays

In case you are referencing an empty array with the used jsonpath expressison, the previous versions of the processor dropped the entire row.

The updated version v1.3 no longer drops the entire row, but provides a NULL value as a result of the query.

The follwing figures show the processor configuration for the JSON path expression and the results in both versions 1.1/1.2 and 1.3.



Example Configuration




Output Processor v1.2



Output Processor v1.3




Optional processor result

The creation of the processor result is optional and disabled by default within the updated processor. As the processor result can be in the range of megabytes for broad rows, it its usually not used. Therefore, with disabled processor results, the payload size of the transmitted results are reduced and the execution performance is slightly improved.




Was this article helpful?

That’s Great!

Thank you for your feedback

Sorry! We couldn't be helpful

Thank you for your feedback

Let us know how can we improve this article!

Select atleast one of the reasons

Feedback sent

We appreciate your effort and will try to fix the article