TABLE OF CONTENTS
- General Description
- Necessary Rights
- How to Enable Access Logging
- Step-by-step
- Boundaries and current restrictions
This is an advanced feature. To use it efficiently, the user has to be familiar with CSV import.
General Description
To get an overview of which user accessed which Data Tables at what time, logging of data accesses is required. For each access, an entry is written to a log file containing the date and time of access, the user ID, and the Data Table ID. Data access logging thus allows accesses to data sets to be reconstructed, which can be important, for example, for IT security reasons.
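Conceptually, each log entry combines these three fields. The exact line format depends on the configured log encoder; a purely illustrative entry could look like this (separator and field order are assumptions, not the actual format):

2024-01-15 10:23:45;<user ID>;<Data Table ID>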
Necessary Rights
To enable the feature, DevOps support is required. Users can also import the logs back into ONE DATA for further analysis. To enable them to do so, a Super Admin is required.
Users who can enable access logging: DevOps
Users who can enable access log import: Super Admins
How to Enable Access Logging
By default, logging is disabled. It can be turned on by setting the environment variable DATAACCESS_LOGGING_ACTIVATED to true (the string "true" for Kubernetes).
If access logging is activated, all data accesses are written to a log file. Each day produces its own log file, which is archived as a gzip; archived logs are deleted automatically after 14 days.
Configure access logging
Use DATAACCESS_LOGGING_PATTERN to change the file pattern. For example, set yyyy-MM for one file per month, yyyy-ww for one file per calendar week, or yyyy-MM-dd-HH for one file per hour.
Use DATAACCESS_LOGGING_MAXHISTORY to change how long archived logs are kept before deletion. The value is a number of files and therefore depends on the pattern: for example, pattern yyyy-MM with maxHistory 6 keeps logs for 6 months, while pattern yyyy-MM-dd-HH with maxHistory 6 keeps them for only 6 hours.
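As an illustration, with the default daily pattern and a maxHistory of 14, the log folder would hold the current day's file plus up to 14 gzip archives. The data_access file name prefix below is taken from the merge rule used in the step-by-step section; the exact naming may differ on your instance:

data_access.2024-01-01.log.gz
...
data_access.2024-01-14.log.gz
data_access.2024-01-15.log (current day, not yet archived)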
What accesses are actually logged
All accesses to Database Connections, Filesystem Connections, Data Tables (from upload), and Virtual Data Tables are logged. Failed attempts, for example due to missing access rights, are also logged.
The table below shows all verified scenarios. Access to Data Tables via Apps is also covered.
Data Table Type | Open (DT, Statistics, Sample, Apps) | Usage in Workflow | Analysis Authorization Preview
---|---|---|---
Virtual Data Table | yes | yes | not available
Filesystem Connection | yes | yes | yes
Database Connection | yes | yes | yes
Data Tables (from upload) | yes | yes | yes
Step-by-step
In this section, we will explain how to use the functionality step by step.
1. Set environment variables. This step needs to be done by DevOps.
a. Depending on your setup, set the environment variables accordingly:
# Docker
onedata-server:
  [ ... ]
  environment:
    [ ... ]
    - "DATAACCESS_LOGGING_ACTIVATED=true"       # optional, default "false"
    - "DATAACCESS_LOGGING_PATTERN=yyyy-MM-dd"   # optional, default "yyyy-MM-dd"
    - "DATAACCESS_LOGGING_MAXHISTORY=14"        # optional, default "14"

# Kubernetes
onedata:
  version:
    server: random
  [ ... ]
  environment:
    server:
      [ ... ]
      DATAACCESS_LOGGING_ACTIVATED: "true"      # optional, default "false"
      DATAACCESS_LOGGING_PATTERN: "yyyy-MM-dd"  # optional, default "yyyy-MM-dd"
      DATAACCESS_LOGGING_MAXHISTORY: 14         # optional, default "14"
2. Check that logback.xml is up-to-date. This step needs to be done by DevOps.
a. The location and other logging options are configurable in ${onedata.root}/logback.xml.
b. Make sure that the log output folder persists (for example, via a Docker volume mount).
c. There are two possibilities to get the logback.xml:
i. Use the provided logback.xml to replace yours. Make sure you have no custom local changes.
ii. Merge the following appender and logger into your logback.xml:
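A time-based rolling appender that matches the behavior described above (one file per day, gzip archives, deletion after maxHistory files) typically looks like the sketch below. The appender name, logger name, and log path are illustrative placeholders, not the values shipped with ONE DATA; take the actual snippet from the provided logback.xml:

<appender name="DATA_ACCESS" class="ch.qos.logback.core.rolling.RollingFileAppender">
  <rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
    <!-- the .gz suffix makes logback compress rolled-over files as gzip -->
    <fileNamePattern>${onedata.root}/logs/data_access.%d{yyyy-MM-dd}.log.gz</fileNamePattern>
    <!-- number of archived files to keep, cf. DATAACCESS_LOGGING_MAXHISTORY -->
    <maxHistory>14</maxHistory>
  </rollingPolicy>
  <encoder>
    <pattern>%msg%n</pattern>
  </encoder>
</appender>

<logger name="dataAccessLogger" level="INFO" additivity="false">
  <appender-ref ref="DATA_ACCESS" />
</logger>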
4. Import logs via a merged file Connection. This step is done by the user.
a. Create a new file Connection (part of the FILESYSTEM Connection feature).

b. Set the merge rule:
[
{
"mergeRule": "data_access.*",
"fileName": "mergedAccessLog.csv"
}
]
c. Create a Data Table from mergedAccessLog.csv and use it as desired. The merge rule above combines all log files whose names match data_access.* into this single CSV file.
Boundaries and current restrictions
Boundaries
- Logs for the Data Connection Load Processor (reading data from an external database without persisting the data in ONE DATA) are not included for now. This means we can only track that the Connection was used, not which table.
- Because only data accesses within ONE DATA are tracked, REST API calls in the Flexible API Processor are not covered either. These should be covered by other means.
Current restrictions
- Accesses are logged at Data Table level only; access to a concrete line within a Data Table is not recorded.
- Logs are not automatically preserved when the instance is restarted. DevOps has to ensure they are saved by mounting persistent storage into the container (see the sketch below).
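For the Docker case, a minimal sketch of such a mount, assuming logs are written to /onedata/logs inside the container (the actual path depends on your logback.xml):

onedata-server:
  [ ... ]
  volumes:
    # host folder that survives container restarts
    - ./onedata-logs:/onedata/logs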