Introduction
It was requested to provide the functionality to only retrieve selected columns from a Data Table. The "/data/content" endpoint allows passing an array of transformations to the request which then returns the respective result.
The following transformations are available:
- SqlQuery
- ColumnsSelection
- Aggregation(count,sum,avg,min,max,group)
- ColumnsSorting
- Filter (Boolean, Integer, DateTime, Double, String)
- Join
The data_transformers parameter is available in the following API methods:
Usage
In general the data_transformers parameter awaits a list of dictionaries. Its elements represent the respective transformations.
### Usage with written-out dictionary
dt: DataTableContent = onedata_api.datatables.get_content(data_id=datatable_id,
data_transformers=[
{
"type": "ColumnsSelection",
"columns": ["Zip"]
},
{
"type": "ColumnsSorting",
"sortConfigs": [
{
"attributeName": "Zip",
"sortOrder": "DESC"
}
]
}
])
Helper Functions
To avoid the cumbersome process of writing these configurations the SDK now provides functions that return the respective dictionary.
Currently, there is no helper function for the join transformation until requested.
### Usage with provided functions
from onedata.datatables.transformers import select, sort, desc
dt: DataTableContent = onedata_api.datatables.get_content(data_id=datatable_id,
data_transformers=[select("Zip"), sort(desc("Zip"))]
)
To be able to use those functions they first have to be imported from onedata.datatables.transformers.
This file offers functions for a guided creation of transformations that can be applied to data tables retrieved by e.g. OneDataDataTableApi.get_content. It allows the user to generate the transformations' parameter assisted by the SDK and makes it more convenient to use. The transformations' parameter in the respective API method expects a dict and therefore all helper methods return formatted dicts for each available transformation.
- select(*columns: str)
- sql(sql: str)
- filter(column_name: str, column_type: Union[RepresentationType, str], search: Union[str, None] = None, values: Union[list, None] = None, range_min=None, range_max=None)
Note 1: column_type either accepts an enum of type RepresentationType or a string value of "bool", "str", "int", "double" or "datetime"
Note 2: If the filter values are set to None or the array is empty, all results will be returned. - sort(*config: dict) (argument position = sort priority)
- asc(column_name: str)
- desc(column_name: str)
- aggregation(*aggr_config: dict)
- sum(column_name: str)
- count(column_name: str)
- avg(column_name: str)
- min(column_name: str)
- max(column_name: str)
- group(column_name: str)
Usage Examples
### sql
dt: DataTableContent = onedata_api.datatables.get_content(data_id=datatable_id,
data_transformers=[sql("SELECT Zip FROM inputTable")])
print(f"dt sql: {dt.records}")
### select
dt: DataTableContent = onedata_api.datatables.get_content(data_id=datatable_id,
data_transformers=[select("Zip", "PDT")])
print(f"dt select 'Zip' and 'PDT': {dt.records}")
### aggregation
dt: DataTableContent = onedata_api.datatables.get_content(data_id=datatable_id,
data_transformers=[aggregation(sum("Zip"))])
print(f"dt SUM 'Zip': {dt.records}")
dt: DataTableContent = onedata_api.datatables.get_content(data_id=datatable_id,
data_transformers=[aggregation(avg("Zip"))])
print(f"dt AVG 'Zip': {dt.records}")
### sort
dt: DataTableContent = onedata_api.datatables.get_content(data_id=datatable_id,
data_transformers=[sort(asc("Zip"))])
print(f"dt sort ASC 'Zip': {dt.records}")
### filter
dt: DataTableContent = onedata_api.datatables.get_content(data_id=datatable_id,
data_transformers=[
filter("Zip", "int", None, [94041, 94063])])
print(f"dt filter 'Zip': {dt.records}")
Was this article helpful?
That’s Great!
Thank you for your feedback
Sorry! We couldn't be helpful
Thank you for your feedback
Feedback sent
We appreciate your effort and will try to fix the article