Automating data quality monitoring at scale through LLMs and causal inference
Reference number | |
Coordinator | Validio AB |
Funding from Vinnova | SEK 1 950 000 |
Project duration | May 2024 - March 2025 |
Status | Completed |
Venture | Ground-breaking technology solutions |
Call | Groundbreaking and scalable technology solutions in 2024 |
Important results from the project
The project largely met its goals, delivering significant advances in automated data quality monitoring. Achievements include improved ML models for anomaly detection and streamlined user workflows, such as "one-click" setup, which make data quality checks faster to configure and thereby improve data reliability. Other important results include a major code optimisation that enables deeper root cause analysis, and novel LLM-based methods for outlier detection and forecasting.
Expected long term effects
The project's long-term effects include democratizing data quality management by making it accessible to a more diverse range of users. It promotes sustainability by identifying and removing erroneous and unused data, which reduces storage and compute requirements and the associated carbon emissions. The project also seeks to enhance data reliability and reduce bias in data-driven decisions, contributing to fairer outcomes (UN SDGs 5, 10, 12, 13). Additionally, it aims to strengthen Validio's position as a Nordic and European AI leader in this field.
Approach and implementation
The 10-month Agile project was organised into five work packages: AI R&D, integration, UX, testing, and management. It initially focused on custom LLMs for automatic setup and root cause analysis (RCA), but was adapted based on pilot feedback. Execution then prioritized optimizing the existing ML models for anomaly detection and enhancing user workflows (one-click setup, bulk check creation), while LLM research continued for forecasting and description generation. More resources were allocated in the final months to complete the work packages.