Automating data quality monitoring at scale through LLMs and causal inference

Coordinator	Validio AB
Funding from Vinnova	SEK 1 950 000
Project duration	May 2024 - March 2025
Status	Completed
Venture	Ground-breaking technology solutions
Call	Groundbreaking and scalable technology solutions in 2024

Important results from the project

The project largely met its goals, delivering significant advances in automated data quality monitoring. Achievements include enhanced ML models for anomaly detection and streamlined user workflows ("one-click" setup), which make setting up data quality checks faster and thereby improve data reliability. Other important results include a major optimisation of the code that enables deeper root cause analysis, and novel LLM-based methods for outlier detection and forecasting.

Expected long term effects

The project's long-term effects include democratizing data quality management by making it accessible to a more diverse range of users. It promotes sustainability by identifying and removing incorrect and unused data, which reduces storage and compute requirements and the associated carbon emissions. The project also seeks to enhance data reliability and reduce bias in data-driven decisions, contributing to fairer outcomes (UN SDGs 5, 10, 12, 13). Additionally, it aims to strengthen Validio's position as a Nordic and European AI leader in this field.

Approach and implementation

The 10-month Agile project was planned as five work packages: AI R&D, integration, UX, testing, and management. The plan initially focused on custom LLMs for automatic setup and root cause analysis (RCA), but it was adapted based on pilot feedback. Execution prioritized optimizing the existing ML models for anomaly detection and enhancing user workflows (one-click setup, bulk check creation), while LLM research continued for forecasting and description generation. Resources were increased in the final months to complete the work packages.

The project description has been provided by the project members themselves and the text has not been looked at by our editors.

Last updated 18 April 2025

Reference number 2024-00505
