With increasing volumes of marine data being acquired, time-intensive manual processing and quality control of the data is becoming less practical, is difficult to resource, and introduces significant bottlenecks and quality control issues. To support a broader quality-controlled data and evidence base for national marine programmes, the ability to process data efficiently using new technical capabilities and approaches is needed.
The use of new tools and processes to automate parts of the data processing pipeline allows for much greater efficiency, with optimised data processes allowing scientists to be redirected to higher value-adding activities.
These steps can also result in enhanced quality control being applied to data, identifying outliers or issues earlier than would otherwise be the case. This reduces the need to revise downstream analyses and to make resulting changes to data or information products or services.
The implementation of data processing notebooks, quality checking software and related data extract, transform and load (ETL) tools to process, quality control and standardise the storage of oceanographic, biological and chemistry data would significantly reduce the effort required to process these data, enhance their quality control, and reduce the lead time in making them available for analysis and use.
The optimisation of data processes can also support cross-team, multi-disciplinary work by applying consistent practice across teams and domains, e.g. making oceanographic data available to support chemistry work. This approach can be extended to processing a range of datasets relevant to MSFD, MSP, food safety, climate and ocean renewable energy (ORE) programmes.
This project will:
- Implement data processing notebooks, quality control software and related ETL tools on an operational basis to support more efficient processing, quality assurance and storage of data acquired through new or targeted existing programmes.
- Identify and review in detail at least 5 data processes relevant to marine monitoring and assessment, climate analysis, ocean renewable energy, and / or aquaculture and food safety, with regard to data processing and quality control optimisation, e.g. checking that all expected parameters have been received and have values within sane limits, identifying outliers, marking up data with specific metadata, etc.
- Build QC pipelines using analysis tools and practice (including new tools for outlier detection) to develop trusted models which are responsive and efficient.
- Apply the data processing toolset and process optimisation steps to a number of prioritised data processes, to remove manual overheads and to make the data available much more quickly for analysis and publication.
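The kinds of checks listed above (expected parameters received, values within sane limits, outlier identification) could be sketched in Python roughly as follows. This is a minimal illustration only, not the project's actual tooling: the parameter names, the limits, the flag values and the choice of a median-based modified z-score for outlier detection are all assumptions made for the example.

```python
from statistics import median

# Hypothetical plausible ranges per parameter; real limits would be set by domain experts.
LIMITS = {"temperature_c": (-2.0, 35.0), "salinity_psu": (0.0, 42.0)}

# Illustrative flag values only (not taken from any specific flagging standard).
GOOD, SUSPECT, BAD, MISSING = 1, 3, 4, 9


def missing_parameters(record, expected=LIMITS):
    """Return the expected parameter names that are absent or None in a record."""
    return [name for name in expected if record.get(name) is None]


def range_flag(name, value, limits=LIMITS):
    """Flag a single value against the plausible range for its parameter."""
    if value is None:
        return MISSING
    low, high = limits[name]
    return GOOD if low <= value <= high else BAD


def outlier_flags(values, threshold=3.5):
    """Flag values as SUSPECT using a modified z-score based on the
    median absolute deviation (MAD); an assumed technique for this sketch."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # No spread around the median, so nothing can be ranked as an outlier.
        return [GOOD] * len(values)
    return [
        SUSPECT if 0.6745 * abs(v - med) / mad > threshold else GOOD
        for v in values
    ]
```

In use, `missing_parameters` would run on each incoming record, `range_flag` on each value, and `outlier_flags` on a series for one parameter, with SUSPECT values passed to an expert for review rather than discarded automatically.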
The project will deliver data processing tools for marine observations to support a broader quality-controlled data evidence base for marine monitoring and assessment. These will be deployed to cover marine observations and field sampling data, with support for remote data input, first-level QC and ingestion into secure data storage from a range of data sources, including physical and chemical data.
This project will run for 4 years from 2023 to 2026.
- Improve the efficiency of data quality control, transform and load processes, enabling these to be applied more widely to new data acquisition, with resulting gains in quality, efficiency and responsiveness.
- Make data more readily available for analysis and reporting by reducing processing bottlenecks for data which has been acquired but requires extensive manual processing intervention before it can be analysed or shared.
- Reduce overheads on scientific staff by allowing them to review data quality much more efficiently and to focus on higher value-adding activity.
- Increase the quality of data early in the cycle, reducing rework and the reissue of data or information products or services.
Contact: eoin.ogrady@marine.ie
- Data processing tools, including notebooks, quality control software and related ETL tools, will be implemented and available on an operational basis to support data processing, quality assurance and storage of newly acquired marine monitoring data.
- Data quality control models for efficient and responsive data quality control will be available to apply to new data processes. The data processing toolset will support the visual identification of errors for experts to review and / or the automated quality flagging of data.
- Data loading tools will be available to ensure that data can be loaded into consistent managed data stores, with the resulting data available for analysis and publication.
- The data processing tools and processes will be applied to at least 3 key data processes relevant to MSP, MSFD, DCF, Ocean Renewable Energy, Climate Analysis, and / or aquaculture and food safety.
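As a rough illustration of loading quality-flagged data into a consistent store, the sketch below writes observations and their QC flags into a single standard table. It assumes SQLite purely as a stand-in for whatever managed data store is used in practice; the `observation` table, its columns and the `load_observations` helper are hypothetical names introduced for this example.

```python
import sqlite3


def load_observations(conn, rows):
    """Load (station, time_utc, parameter, value, qc_flag) rows into one
    standard table and return the number of rows now stored."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS observation (
               station   TEXT NOT NULL,
               time_utc  TEXT NOT NULL,
               parameter TEXT NOT NULL,
               value     REAL,
               qc_flag   INTEGER NOT NULL,
               PRIMARY KEY (station, time_utc, parameter)
           )"""
    )
    with conn:  # commits on success, rolls back if an insert fails
        conn.executemany(
            "INSERT OR REPLACE INTO observation VALUES (?, ?, ?, ?, ?)", rows
        )
    return conn.execute("SELECT COUNT(*) FROM observation").fetchone()[0]
```

Keying the table on station, time and parameter means that re-running a load for the same data replaces rows rather than duplicating them, which is one simple way to keep repeated pipeline runs consistent.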