Glossary

Data Lineage

What is Data Lineage?

Data lineage tracks the movement of data over time, from its source system through the various forms of persistence and transformation to its eventual consumption by an application or analytics model. A visual representation makes that flow transparent, from the source systems through transformation, processing, and aggregation steps and into analysis, and lets data engineers drill down into specific details or check versions and changes over time.
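As a rough illustration of the idea, lineage can be modeled as a directed graph in which each dataset records the step that produced it and the datasets it was derived from. The sketch below is a minimal, generic example and not how any particular product implements lineage; the class, function, and dataset names (LineageNode, trace_upstream, meter_readings, and so on) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class LineageNode:
    """One dataset in the flow, plus the step that produced it (hypothetical model)."""
    name: str
    transformation: str = "source"               # description of the producing step
    inputs: list = field(default_factory=list)   # names of upstream datasets


def trace_upstream(nodes, name):
    """Return every upstream dataset of `name`, nearest first, without duplicates."""
    seen = []
    queue = list(nodes[name].inputs)
    while queue:
        current = queue.pop(0)
        if current not in seen:
            seen.append(current)
            queue.extend(nodes[current].inputs)
    return seen


def source_systems(nodes, name):
    """Return the upstream datasets that have no inputs of their own."""
    return [n for n in trace_upstream(nodes, name) if not nodes[n].inputs]


# Illustrative flow: two sources, a cleaning step, an aggregation, and model features.
nodes = {
    "meter_readings": LineageNode("meter_readings"),
    "customer_records": LineageNode("customer_records"),
    "cleaned_readings": LineageNode(
        "cleaned_readings", "drop invalid rows", ["meter_readings"]
    ),
    "monthly_usage": LineageNode(
        "monthly_usage",
        "aggregate by customer and month",
        ["cleaned_readings", "customer_records"],
    ),
    "churn_features": LineageNode(
        "churn_features", "compute model features", ["monthly_usage"]
    ),
}

upstream = trace_upstream(nodes, "churn_features")
sources = source_systems(nodes, "churn_features")
```

Walking the graph backward from churn_features answers the question "where did this data come from?", while the transformation field on each node records what happened at each step along the way.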

Why is Data Lineage Important?

Data lineage provides visibility into changes in data formats and values. One important outcome is explainability for the business decisions that data may drive. With transparency into the sources and transformations, human experts can gain confidence in machine learning model predictions, which facilitates change management. Data lineage is also useful for testing and debugging models to verify the correctness of their predictions or conclusions, and it provides an audit trail for compliance, especially in regulated businesses.

Data Lineage in the C3 AI Platform

The C3 AI® Type System offers a model-driven architecture that delivers a well-orchestrated flow of data from internal and external data sources through canonical transforms into simple and compound metrics, which then become features for ML model development. The data lineage features within the C3 AI Platform provide a visual interface for tracking the movement of data, drilling down into specific details, viewing the specific expressions used in transformations, and seeing how values change over time. Data lineage also supports the interpretability framework and the creation of evidence packages for compliance and audit activity, which can verify the insights provided by models under development or in deployment.