Provided Methods
- Create a Data Table in a project
- Get Data Table content (by name/id)
- Modify Data Table Content
- Create from pandas dataframe
Method documentation
All the examples expect that there is already a OneDataApi object initialized and assigned to a variable called "onedata_api".
### Creation of the ONE DATA Api ###
onedata_api = OneDataApi(base_url=base_url, username=user, password=pw, verify=False, timeout=999)
create
Description
Creates a data table. Optional request options (request_transformers, response_transformers, timeout, verify, sleep_after_response_millis, deserializer) can be specified.
Returns: UploadedDataResponse
Parameters
Property | Type | Required | Default | Description |
---|---|---|---|---|
name | str | true | - | name of the data table |
headers | [str] | true | - | list of strings, defining the headers. E.g. ["a", "b"] |
content | [dict] | true | - | list of dictionaries, defining the content. Each dictionary must have all attributes (headers). E.g. [{"a": 1, "b": "asd"}] where a and b are headers |
project_id | Union[str, uuid.UUID] | true | - | UUID of the project |
locked | DatasetLock | true | - | defines the lock state of the data table |
storage_format | DataTableStorageType | true | - | defines the storage type of the data table |
create_fallback | bool | false | True | defines whether a fallback data table should be created (contains all malformed entries) |
connection_schema | str | false | None | Database schema which holds the relevant table |
target_project_id | Union[str, uuid.UUID] | false | None | ID of the project where the new data table should be created |
Usage
### Creation of a datatable ###
response: UploadedDataResponse = onedata_api.datatables.create(name="dt via sdk", headers=["a", "b"],
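The parameter table states that every dictionary in content must contain all headers. A minimal sketch of a client-side check that could run before calling create is shown below. The helper name find_incomplete_rows is hypothetical and not part of the SDK; it only inspects plain Python data.

```python
from typing import Dict, List


def find_incomplete_rows(headers: List[str], content: List[dict]) -> Dict[int, List[str]]:
    """Map each row index to the headers that row is missing."""
    problems: Dict[int, List[str]] = {}
    for index, row in enumerate(content):
        missing = [header for header in headers if header not in row]
        if missing:
            problems[index] = missing
    return problems


headers = ["a", "b"]
content = [{"a": 1, "b": "asd"}, {"a": 2}]  # second row lacks "b"
problems = find_incomplete_rows(headers, content)
print(problems)  # {1: ['b']}
```

An empty result means every row carries all headers, so the same headers and content could then be passed to create.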
getContent
Description
Retrieves the paginated content of a DataTable identified either by data id or by frt id (the two are mutually exclusive), or by name and project id. Optional request options (request_transformers, response_transformers, timeout, verify, sleep_after_response_millis, deserializer) can be specified.
To retrieve the content as a pandas.DataFrame, specify the custom deserializer 'ResponseToPandasDataFrameDeserializer'. All officially documented functionality of a pandas.DataFrame can then be used (https://pandas.pydata.org/pandas-docs/stable/reference/frame.html).
Returns: DataTableContent
Parameters
Property | Type | Required | Default | Description |
---|---|---|---|---|
data_id | Union[str, uuid.UUID] | false | None | ID of the data table. Cannot be used together with frt_id |
frt_id | Union[str, uuid.UUID] | false | None | FRT ID of the data table. Cannot be used together with data_id |
name | str | false | None | name of the data table |
project_id | Union[str, uuid.UUID] | false | None | ID of the project the data table belongs to. Ignored when the data table is looked up by ID |
page | int | false | 1 | the page that should be fetched |
pagination_size | int | false | 10000 | the number of entries per page which should be fetched |
data_transformers | [dict] | false | None | list of transformation dicts applied to the retrieved data table |
force_exact_match | bool | false | True | True: the name must match exactly. False: any data table whose name contains the given name matches |
force_distinct | bool | false | True | True: raise an exception if the name is ambiguous. False: use the first match |
Usage
### Retrieval of datatable content ###
datatable: DataTableContent = onedata_api.datatables.get_content(name=datatable_name,
getContentSamples
Description
Retrieves samples from the content of a DataTable with the given data id/frt id or by name and project id. Optional request options (request_transformers, response_transformers, timeout, verify, sleep_after_response_millis, deserializer) can be specified.
Returns: DataTableContent
Parameters
Property | Type | Required | Default | Description |
---|---|---|---|---|
data_id | Union[str, uuid.UUID] | false | None | ID of the data table. Cannot be used together with frt_id |
frt_id | Union[str, uuid.UUID] | false | None | FRT ID of the data table. Cannot be used together with data_id |
name | str | false | None | name of the data table |
project_id | Union[str, uuid.UUID] | false | None | ID of the project the data table belongs to. Ignored when the data table is looked up by ID |
sample_count | int | false | 20 | the number of samples which should be retrieved. The server does not guarantee that the exact number is returned, but it tries to stay as close as possible. |
data_transformers | [dict] | false | None | list of transformation dicts applied to the retrieved data table |
force_exact_match | bool | false | True | True: the name must match exactly. False: any data table whose name contains the given name matches |
force_distinct | bool | false | True | True: raise an exception if the name is ambiguous. False: use the first match |
Usage
### Retrieval of datatable content samples ###
datatable: DataTableContent = onedata_api.datatables.get_content_samples(data_id=datatable_id, sample_count=100)
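The lookup by name is governed by force_exact_match and force_distinct. The following sketch models the behaviour described in the parameter table on a plain list of dictionaries. It is an illustration only, not the SDK's implementation: the function resolve_by_name, the shape of the candidate entries and the exception types are all assumptions.

```python
from typing import List


def resolve_by_name(candidates: List[dict], name: str,
                    force_exact_match: bool = True,
                    force_distinct: bool = True) -> dict:
    """Pick a data table by name following the two flags."""
    if force_exact_match:
        matches = [c for c in candidates if c["name"] == name]
    else:
        # Substring match: the data table name only has to contain `name`.
        matches = [c for c in candidates if name in c["name"]]
    if not matches:
        raise LookupError(f"no data table matches {name!r}")
    if len(matches) > 1 and force_distinct:
        raise ValueError(f"name {name!r} is ambiguous ({len(matches)} matches)")
    return matches[0]  # with force_distinct=False the first match is used


tables = [{"id": "1", "name": "sales"}, {"id": "2", "name": "sales_2020"}]
exact = resolve_by_name(tables, "sales")
first = resolve_by_name(tables, "sales", force_exact_match=False, force_distinct=False)
```

With the defaults, "sales" resolves to exactly one table. Turning off force_exact_match makes both tables match, so the call either raises (force_distinct=True) or falls back to the first match.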
replaceContent
Description
Replaces the entire content of the data table, identified by data id/frt id or by name and project id, with new_data.
The schema of new_data must match the existing one.
Optional request options (request_transformers, response_transformers, timeout, verify, sleep_after_response_millis, deserializer) can be specified.
Returns: None
Parameters
Property | Type | Required | Default | Description |
---|---|---|---|---|
data_id | Union[str, uuid.UUID] | false | None | ID of the data table. Cannot be used together with frt_id |
frt_id | Union[str, uuid.UUID] | false | None | FRT ID of the data table. Cannot be used together with data_id |
name | str | false | None | name of the data table |
project_id | Union[str, uuid.UUID] | false | None | ID of the project the data table belongs to. Ignored when the data table is looked up by ID |
force_exact_match | bool | false | True | True: the name must match exactly. False: any data table whose name contains the given name matches |
force_distinct | bool | false | True | True: raise an exception if the name is ambiguous. False: execute on the first match |
new_data | [dict] | false | None | list of dictionaries, defining the new data. E.g. [{"a": 1, "b": "asd"}] where a and b are headers |
Usage
### Replacement of datatable content ###
onedata_api.datatables.replace_content(data_id=id, new_data=[{"a": 2, "b": "newer text"}])
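Since the schema of new_data has to match the existing one, it can help to compare column names before sending the request. The sketch below is a hypothetical pre-check, not an SDK function. It compares header names only and ignores the __ROW_ID__ column seen in the retrieved records; whether the server also compares column types is not stated in this article.

```python
from typing import List

ROW_ID_KEY = "__ROW_ID__"


def schema_mismatches(existing_headers: List[str], new_data: List[dict]) -> List[int]:
    """Return the indices of rows whose keys differ from the existing headers."""
    expected = set(existing_headers)
    return [index for index, row in enumerate(new_data)
            if set(row) - {ROW_ID_KEY} != expected]


new_data = [{"a": 2, "b": "newer text"}, {"a": 3, "c": "wrong column"}]
bad_rows = schema_mismatches(["a", "b"], new_data)
print(bad_rows)  # [1]
```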
updateContent
Description
Updates the rows given in updated in the data table identified by data id/frt id or by name and project id. Optional request options (request_transformers, response_transformers, timeout, verify, sleep_after_response_millis, deserializer) can be specified.
Returns: None
Parameters
Property | Type | Required | Default | Description |
---|---|---|---|---|
data_id | Union[str, uuid.UUID] | false | None | ID of the data table. Cannot be used together with frt_id |
frt_id | Union[str, uuid.UUID] | false | None | FRT ID of the data table. Cannot be used together with data_id |
name | str | false | None | name of the data table |
project_id | Union[str, uuid.UUID] | false | None | ID of the project the data table belongs to. Ignored when the data table is looked up by ID |
force_exact_match | bool | false | True | True: the name must match exactly. False: any data table whose name contains the given name matches |
force_distinct | bool | false | True | True: raise an exception if the name is ambiguous. False: execute on the first match |
updated | [dict] | false | None | list of dictionaries, defining the rows to update. E.g. [{"a": 1, "b": "asd"}] where a and b are headers |
Usage
content: DataTableContent = onedata_api.datatables.get_content(data_id=dt.id)
print(f"content: {content.records}")
###
# content: [{'a': 1, 'b': 'asd', '__ROW_ID__': '1f088a87-67e9-4244-a7ca-d5159d2f1c8d'}, {'a': 1, 'b': 'asd', '__ROW_ID__': '936e46fd-59c0-4d93-8575-daad2310f398'}]
###
updated_record = content.records[1]
updated_record["b"] = "changed"
### Update of datatable content ###
onedata_api.datatables.update_content(data_id=dt.id, updated=[updated_record])
content: DataTableContent = onedata_api.datatables.get_content(data_id=dt.id)
print(f"content: {content.records}")
###
# content: [{'a': 1, 'b': 'changed', '__ROW_ID__': '936e46fd-59c0-4d93-8575-daad2310f398'}, {'a': 1, 'b': 'asd', '__ROW_ID__': '1f088a87-67e9-4244-a7ca-d5159d2f1c8d'}]
###
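In the usage above, the updated record keeps its __ROW_ID__, which is what ties it to the existing row. The sketch below builds such an updated list for all rows matching a condition, working on copies so the fetched records stay untouched. The helper build_updates and the row IDs are hypothetical placeholders.

```python
import copy
from typing import Callable, List


def build_updates(records: List[dict],
                  should_change: Callable[[dict], bool],
                  changes: dict) -> List[dict]:
    """Return modified copies of the matching records, keeping their __ROW_ID__."""
    updates = []
    for record in records:
        if should_change(record):
            updated = copy.deepcopy(record)  # leave the fetched record as it was
            updated.update(changes)
            updates.append(updated)
    return updates


records = [
    {"a": 1, "b": "asd", "__ROW_ID__": "row-1"},
    {"a": 1, "b": "asd", "__ROW_ID__": "row-2"},
]
updates = build_updates(records, lambda r: r["__ROW_ID__"] == "row-2", {"b": "changed"})
print(updates)  # [{'a': 1, 'b': 'changed', '__ROW_ID__': 'row-2'}]
```

The resulting list has the same shape as the updated argument in the usage example, so it could be passed as update_content(data_id=dt.id, updated=updates).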
deleteContent
Description
Deletes the rows whose IDs are listed in deleted from the data table identified by data id/frt id or by name and project id. Optional request options (request_transformers, response_transformers, timeout, verify, sleep_after_response_millis, deserializer) can be specified.
Returns: None
Parameters
Property | Type | Required | Default | Description |
---|---|---|---|---|
data_id | Union[str, uuid.UUID] | false | None | ID of the data table. Cannot be used together with frt_id |
frt_id | Union[str, uuid.UUID] | false | None | FRT ID of the data table. Cannot be used together with data_id |
name | str | false | None | name of the data table |
project_id | Union[str, uuid.UUID] | false | None | ID of the project the data table belongs to. Ignored when the data table is looked up by ID |
force_exact_match | bool | false | True | True: the name must match exactly. False: any data table whose name contains the given name matches |
force_distinct | bool | false | True | True: raise an exception if the name is ambiguous. False: execute on the first match |
deleted | [Union[str, uuid.UUID]] | false | None | list of row ids, defining the rows that should be deleted |
Usage
content: DataTableContent = onedata_api.datatables.get_content(data_id=dt.id)
print(f"content: {content.records}")
###
# content: [{'a': 1, 'b': 'asd', '__ROW_ID__': 'bf60f922-129d-4de9-95f1-a90b83ed34c1'}, {'a': 1, 'b': 'asd', '__ROW_ID__': 'fd6f7053-5438-463f-9ad5-a3204321678c'}]
###
### Deletion of datatable content ###
onedata_api.datatables.delete_content(data_id=dt.id, deleted=[content.records[0]["__ROW_ID__"]])
content: DataTableContent = onedata_api.datatables.get_content(data_id=dt.id)
print(f"content: {content.records}")
###
# content: [{'a': 1, 'b': 'asd', '__ROW_ID__': 'fd6f7053-5438-463f-9ad5-a3204321678c'}]
###
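The deleted parameter takes row IDs rather than whole rows. A small sketch of collecting the IDs of all rows that satisfy a condition is given below. The helper row_ids_where and the sample IDs are hypothetical.

```python
from typing import Callable, List


def row_ids_where(records: List[dict], predicate: Callable[[dict], bool]) -> List[str]:
    """Collect the __ROW_ID__ of every record for which predicate is true."""
    return [record["__ROW_ID__"] for record in records if predicate(record)]


records = [
    {"a": 1, "b": "asd", "__ROW_ID__": "row-1"},
    {"a": 5, "b": "asd", "__ROW_ID__": "row-2"},
    {"a": 9, "b": "asd", "__ROW_ID__": "row-3"},
]
to_delete = row_ids_where(records, lambda r: r["a"] > 3)
print(to_delete)  # ['row-2', 'row-3']
```

The list could then be handed over as delete_content(data_id=dt.id, deleted=to_delete).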
addContent
Description
Adds the rows given in added to the data table identified by data id/frt id or by name and project id. Optional request options (request_transformers, response_transformers, timeout, verify, sleep_after_response_millis, deserializer) can be specified.
Returns: None
Parameters
Property | Type | Required | Default | Description |
---|---|---|---|---|
data_id | Union[str, uuid.UUID] | false | None | ID of the data table. Cannot be used together with frt_id |
frt_id | Union[str, uuid.UUID] | false | None | FRT ID of the data table. Cannot be used together with data_id |
name | str | false | None | name of the data table |
project_id | Union[str, uuid.UUID] | false | None | ID of the project the data table belongs to. Ignored when the data table is looked up by ID |
force_exact_match | bool | false | True | True: the name must match exactly. False: any data table whose name contains the given name matches |
force_distinct | bool | false | True | True: raise an exception if the name is ambiguous. False: execute on the first match |
added | [dict] | false | None | list of dictionaries, defining the rows that should be added |
Usage
content: DataTableContent = onedata_api.datatables.get_content(data_id=dt.id)
print(f"datatable: {content.records}")
###
# datatable: [{'a': 1, 'b': 'some text', '__ROW_ID__': 'b513b7aa-5e99-43e3-a937-df2c854bf0eb'}]
###
add_row_content = {"a": 100, "b": "added text"}
### Addition of datatable content ###
onedata_api.datatables.add_content(data_id=dt.id, added=[add_row_content])
content: DataTableContent = onedata_api.datatables.get_content(data_id=dt.id)
print(f"content: {content.records}")
###
# content: [{'a': 100, 'b': 'added text', '__ROW_ID__': '791a985e-d6bc-40e7-8681-0fac3229442b'}, {'a': 1, 'b': 'some text', '__ROW_ID__': 'b513b7aa-5e99-43e3-a937-df2c854bf0eb'}]
###
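In the output above, the added row received a fresh __ROW_ID__ although none was supplied, which suggests the server assigns it. When rows fetched from one table are to be added to another, a cautious approach is to drop the existing __ROW_ID__ first. The sketch below does that; it rests on the assumption just stated, and the helper without_row_id is hypothetical. Whether the server would accept or reject rows that still carry a __ROW_ID__ is not covered by this article.

```python
from typing import List

ROW_ID_KEY = "__ROW_ID__"


def without_row_id(rows: List[dict]) -> List[dict]:
    """Return copies of the rows with the row ID column removed."""
    return [{key: value for key, value in row.items() if key != ROW_ID_KEY}
            for row in rows]


existing = [{"a": 1, "b": "some text", "__ROW_ID__": "row-1"}]
to_add = without_row_id(existing)
print(to_add)  # [{'a': 1, 'b': 'some text'}]
```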
patchContent
Description
Updates the rows given as updated, deletes the rows with ids given in deleted and adds new rows given in added for a data table with the given data_id. Optional request options (request_transformers, response_transformers, timeout, verify, sleep_after_response_millis, deserializer) can be specified.
Returns: None
Parameters
Property | Type | Required | Default | Description |
---|---|---|---|---|
data_id | Union[str, uuid.UUID] | false | None | ID of the data table. Cannot be used together with frt_id |
frt_id | Union[str, uuid.UUID] | false | None | FRT ID of the data table. Cannot be used together with data_id |
name | str | false | None | name of the data table |
project_id | Union[str, uuid.UUID] | false | None | ID of the project the data table belongs to. Ignored when the data table is looked up by ID |
force_exact_match | bool | false | True | True: the name must match exactly. False: any data table whose name contains the given name matches |
force_distinct | bool | false | True | True: raise an exception if the name is ambiguous. False: execute on the first match |
deleted | [Union[str, uuid.UUID]] | false | None | list of row ids, defining the rows that should be deleted |
updated | [dict] | false | None | list of dictionaries, defining the rows that should be updated |
added | [dict] | false | None | list of dictionaries, defining the rows that should be added |
Usage
datatable: DataTableContent = onedata_api.datatables.get_content(data_id=dt.id)
# keep the first three rows so their row IDs can be reused in the patch below
row1, row2, row3 = datatable.records[:3]
print(f"row1: {row1}")
print(f"row2: {row2}")
print(f"row3: {row3}")
###
# row1: {'a': 1, 'b': 'some text', '__ROW_ID__': '39eb1921-2d26-4c78-9fa6-592a98cfbe34'}
# row2: {'a': 1, 'b': 'some text', '__ROW_ID__': '69e958dc-6476-4ae1-bf2f-154b99c0d782'}
# row3: {'a': 1, 'b': 'some text', '__ROW_ID__': '425d2403-fd52-4514-b6e5-4802ed1d74a7'}
###
print("-----------DataTablePatchContent-----------")
onedata_api.datatables.patch_content(data_id=dt.id,
added=[{"a": 2, "b": "added"}],
deleted=[row1["__ROW_ID__"]],
updated=[{"a": 3, "b": "updated", "__ROW_ID__": row2["__ROW_ID__"]}])
datatable: DataTableContent = onedata_api.datatables.get_content(data_id=dt.id)
print(f"row1: {datatable.records[0]}")
print(f"row2: {datatable.records[1]}")
print(f"row3: {datatable.records[2]}")
###
# row1: {'a': 1, 'b': 'some text', '__ROW_ID__': '425d2403-fd52-4514-b6e5-4802ed1d74a7'}
# row2: {'a': 2, 'b': 'added', '__ROW_ID__': 'ef7400d9-df9f-4174-b92c-353a8bbf081c'}
# row3: {'a': 3, 'b': 'updated', '__ROW_ID__': '69e958dc-6476-4ae1-bf2f-154b99c0d782'}
###
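patch_content expects three separate lists. When the desired end state is known, those lists can be derived by comparing it with the current records. The sketch below is one possible way to do so, under the assumption that rows without a __ROW_ID__ are new and rows with a known __ROW_ID__ are updates. The function diff_records and the row IDs are hypothetical.

```python
from typing import Dict, List

ROW_ID_KEY = "__ROW_ID__"


def diff_records(current: List[dict], desired: List[dict]) -> Dict[str, list]:
    """Split the difference between two record lists into added/updated/deleted."""
    current_by_id = {row[ROW_ID_KEY]: row for row in current}
    desired_ids = {row[ROW_ID_KEY] for row in desired if ROW_ID_KEY in row}
    # Rows without a row ID do not exist on the server yet.
    added = [row for row in desired if ROW_ID_KEY not in row]
    # Rows with a known row ID whose values changed.
    updated = [row for row in desired
               if row.get(ROW_ID_KEY) in current_by_id
               and row != current_by_id[row[ROW_ID_KEY]]]
    # Row IDs that are present now but absent from the desired state.
    deleted = [row_id for row_id in current_by_id if row_id not in desired_ids]
    return {"added": added, "updated": updated, "deleted": deleted}


current = [
    {"a": 1, "b": "some text", "__ROW_ID__": "row-1"},
    {"a": 1, "b": "some text", "__ROW_ID__": "row-2"},
    {"a": 1, "b": "some text", "__ROW_ID__": "row-3"},
]
desired = [
    {"a": 3, "b": "updated", "__ROW_ID__": "row-2"},
    {"a": 1, "b": "some text", "__ROW_ID__": "row-3"},
    {"a": 2, "b": "added"},
]
patch = diff_records(current, desired)
```

The keys of the result match the parameter names in the table, so the call could look like patch_content(data_id=dt.id, **patch).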
createFromPandas
Description
Creates a DataTable with the given name in the given project from a pandas DataFrame. Optional request options (request_transformers, response_transformers, timeout, verify, sleep_after_response_millis, deserializer) can be specified.
Returns: UploadedDataResponse
Parameters
Property | Type | Required | Default | Description |
---|---|---|---|---|
name | str | true | None | name of the datatable |
project_id | Union[str, uuid.UUID] | true | None | project the datatable is created in |
data | pandas.DataFrame | true | None | pandas dataframe to create a datatable from |
locked | DatasetLock | false | DatasetLock.Open | defines the lock state of the data table |
storage_format | DataTableStorageType | false | DataTableStorageType.PARQUET | defines the storage format of the data table |
create_fallback | bool | false | True | defines whether a fall back data table should be created (contains all malformed entries) |
Usage
d = {'col1': [1, 2], 'col2': [3, 4]}
pandas_dataframe: DataFrame = pandas.DataFrame(data=d)
print(f"pandas_dataframe: {pandas_dataframe}")
###
# pandas_dataframe:
# col1 col2
# 0 1 3
# 1 2 4
###
### Creation of datatable from pandas ###
response = onedata_api.datatables.create_from_pandas(name=name, project_id=project_id, data=pandas_dataframe, locked=locked,
storage_format=storage_format, create_fallback=create_fallback)
content: DataTableContent = onedata_api.datatables.get_content(data_id=response.data[0].id)
print(f"content: {content.records}")
###
# content: [{'col1': 1, 'col2': 3, '__ROW_ID__': '8609f7d5-bf29-4ffa-aba6-a61aa9342156'}, {'col1': 2, 'col2': 4, '__ROW_ID__': '5cb17b5c-ff22-4ad6-b842-790af14a30e4'}]
###
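The example shows that each DataFrame row ends up as one record, with column names as keys. To make that mapping explicit without requiring pandas, the sketch below converts the same column-oriented dictionary into row records in plain Python. The helper columns_to_records is hypothetical; with pandas installed, DataFrame.to_dict(orient="records") yields the equivalent list.

```python
from typing import Dict, List


def columns_to_records(columns: Dict[str, list]) -> List[dict]:
    """Turn {'col': [values...]} into one dictionary per row."""
    lengths = {len(values) for values in columns.values()}
    if len(lengths) > 1:
        raise ValueError("all columns must have the same length")
    names = list(columns)
    return [dict(zip(names, row)) for row in zip(*columns.values())]


d = {"col1": [1, 2], "col2": [3, 4]}
rows = columns_to_records(d)
print(rows)  # [{'col1': 1, 'col2': 3}, {'col1': 2, 'col2': 4}]
```

Apart from the __ROW_ID__ column, which only appears in the retrieved content, this is the shape returned by get_content in the example above.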