There's a lot of talk about getting data "in order" before doing AI, including through traditional Data Quality (DQ) initiatives. But from our experience delivering AI-driven Data Products, we need a fundamental shift in how we think about Data Quality in the AI world.
TL;DR: You don't need perfect data to build highly accurate AI models. In fact, we've built an effective AI model for a real client on 75K rows where 30K were garbage.
AI and Data Quality: Flipping the Script
In traditional data management, DQ is about cleaning, structuring, and validating data before it's used. This makes sense for BI, reporting, and systems of record.
But AI works differently. It doesn't need "perfect" data; it just needs enough statistically significant data points to identify patterns.
The key is:
✅ Rapidly identifying the right data (even if it's messy)
✅ Iterating, testing, and refining in real time
✅ Understanding when data needs augmentation, labeling, or restructuring
✅ Having access to raw data, not just "cleansed" data that has lost critical context
Traditional DQ approaches slow AI innovation because they assume we know what data is needed upfront. But in AI, we often don't know until we start working with it.
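As a minimal, synthetic sketch of that claim (numbers chosen to mirror the anecdote above; scikit-learn on generated data, not the actual client dataset):

```python
# Sketch: a model can stay useful even when a large share of the training
# labels are garbage, as long as there is enough statistical signal.
# 75K training rows with 30K corrupted mirrors the anecdote; all data is synthetic.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# 90K synthetic rows: 75K for training, 15K kept clean for evaluation.
X, y = make_classification(n_samples=90_000, n_features=20, n_informative=10,
                           random_state=0)
X_train, y_train = X[:75_000], y[:75_000].copy()
X_test, y_test = X[75_000:], y[75_000:]

# Turn 30K of the 75K training labels into garbage by assigning them at random.
rng = np.random.default_rng(0)
garbage_idx = rng.choice(75_000, size=30_000, replace=False)
y_train[garbage_idx] = rng.integers(0, 2, size=30_000)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("accuracy on clean hold-out:", accuracy_score(y_test, model.predict(X_test)))
# Typically still well above chance: the ~45K good rows carry enough signal
# for the model to recover the underlying pattern despite the noise.
```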
Shifting from "Fix It Upstream" to "Work With It Iteratively"
Instead of spending months cleaning data before testing an AI model, we should:
🔹 Work directly with the use case: start from the business problem (a top-down approach) and find the data that supports it.
🔹 Analyze the data statistically, not row by row: DQ isn't about fixing individual bad records; it's about understanding the dataset's overall patterns (see the sketch after this list).
🔹 Adapt in real time: change the model, get new data, synthesize missing data, or iterate based on what you discover.
🔹 Retain raw data access: AI models need the full picture, including the structure of messy data. Passing data through traditional pipelines often removes valuable context.
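To make "statistical, not row-by-row" concrete, here is a small pandas sketch of dataset-level profiling; the file name and columns (transactions_raw.csv, amount) are hypothetical placeholders, not a prescribed schema:

```python
# Dataset-level profiling: understand the overall shape of the data and decide
# whether to augment, synthesize, relabel, or restructure, rather than fixing rows.
import pandas as pd

df = pd.read_csv("transactions_raw.csv")  # hypothetical raw extract

profile = pd.DataFrame({
    "null_rate": df.isna().mean(),        # how much is missing, per column
    "n_unique": df.nunique(),             # cardinality hints at categoricals / IDs
    "dtype": df.dtypes.astype(str),
})
print(profile.sort_values("null_rate", ascending=False))

# Distribution-level checks on a numeric column, not per-row corrections:
amount = pd.to_numeric(df["amount"], errors="coerce")  # garbage becomes NaN but stays visible
print(amount.describe(percentiles=[0.01, 0.5, 0.99]))  # spot outliers and bad scaling
print("unparseable share:", amount.isna().mean() - df["amount"].isna().mean())
```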
This iterative AI-driven approach has massive implications not just for DQ but for how organizations structure their entire data strategy.
Data Quality Is Contextual
Data Quality isn't about some universal gold standard; it's relative to the job at hand.
What's considered "bad" in one context (e.g., structured reports) might be useful signal in another (e.g., AI models extracting trends from raw text).
As Eddie Short put it:
"For AI, trying to craft high-quality training data is self-defeating, especially with unstructured data."
Or as Andy Mott said:
"Quality is always defined by the consumer of the data. In this case, it's the AI model."
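A toy sketch of that point: the same messy free-text field can fail a BI-style check yet be perfectly usable for training a text model. The field names and thresholds below are illustrative assumptions:

```python
# "Quality" checks defined by the consumer of the data, not by a universal standard.
import pandas as pd

df = pd.DataFrame({
    "product_code": ["A-100", "a100 ", None, "A-100"],
    "notes": ["late delivery, custmr unhappy", "ok", "refund requsted!!", ""],
})

def fit_for_bi_report(frame: pd.DataFrame) -> bool:
    # A BI join/report needs complete, canonical product codes.
    codes = frame["product_code"]
    return bool(codes.notna().all() and codes.str.match(r"^[A-Z]-\d{3}$").all())

def fit_for_text_model(frame: pd.DataFrame) -> bool:
    # A text model tolerates typos; it mainly needs enough non-empty examples.
    return bool((frame["notes"].str.strip().str.len() > 0).mean() >= 0.7)

print("fit for BI report: ", fit_for_bi_report(df))   # False: nulls and bad codes
print("fit for text model:", fit_for_text_model(df))  # True: 3 of 4 notes usable
```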
Final Thought: AI Requires a Data Mindset Shift
📌 Data Quality for AI is about speed, iteration, and adaptability.
📌 Perfect data isn't required, just enough signal to train effective models.
📌 AI and Data Products require a shift from traditional "clean it first" thinking to a more experimental, hypothesis-driven approach.
If organizations fail to embrace this shift, they'll slow down AI innovation and fall behind those who iterate fast.
What's Your Experience?
Are you still trying to "fix" data before using it for AI, or have you moved to a faster, more iterative approach? Get in touch with us at Dataception!
With Dataception's DOGs (Data Object Graphs), AI is just a walk in the park!