In the world of data management, there's a critical yet often overlooked capability whose absence can render even the most modern data platforms obsolete: the ability to easily and quickly retire data use-cases, including the associated models and data. This might seem like a minor detail, but it's a fundamental aspect of effective product management and sustainability in data architecture.
The Essence of Product Management: Life-Cycle
Data professionals often ask me how to define data products and data-as-a-product. The aspect of product management that resonates most readily with data people is the product life-cycle, which covers the entire journey from idea to retirement:
- Idea
- Experiment
- Test (with business)
- Prototype
- Build
- Deploy
- Operate
- Retire
While stages like ideation, experimentation, and deployment get a lot of attention, retirement is frequently overlooked. This oversight can lead to significant issues down the line.
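One way to keep retirement visible is to model the life-cycle explicitly, so that every data product always carries a stage. The following is a minimal sketch rather than a prescribed implementation: the names `Stage`, `next_stage`, and `retire` are hypothetical, and the rule that a product may be retired from any stage (for example, an abandoned experiment) is an assumption, not something the life-cycle list itself dictates.

```python
from enum import IntEnum


class Stage(IntEnum):
    """Life-cycle stages in the order listed above."""
    IDEA = 1
    EXPERIMENT = 2
    TEST = 3
    PROTOTYPE = 4
    BUILD = 5
    DEPLOY = 6
    OPERATE = 7
    RETIRE = 8


def next_stage(stage: Stage) -> Stage:
    """Return the stage that follows `stage` in the linear life-cycle."""
    if stage is Stage.RETIRE:
        raise ValueError("a retired product has no next stage")
    return Stage(stage + 1)


def retire(stage: Stage) -> Stage:
    """Assumption: retirement is reachable from any stage, not only OPERATE."""
    return Stage.RETIRE
```

Making `RETIRE` a first-class stage means a product that never reaches it shows up as an open item instead of silently lingering.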
The Retirement Challenge
A common scenario in data platforms involves legacy data models and ETL jobs that persist long after their initial purpose has faded. Conversations about these often go like this:
- Q: "What is that column?"
- A: "Oh, that's left over from an old system. We're not sure if it's used."
- Q: "Why don't you delete it then?"
- A: "Are you mad? We've seen some 'SELECT *' queries using that table, and since we only manage the data, we don't know if it's still in use."
This situation is a classic example of what I call "use-case debt."
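The uncertainty in that exchange can be narrowed, though not eliminated, by inspecting whatever query history the platform keeps. The sketch below assumes you can export logged SQL statements as plain strings. The function name `classify_queries` and the example table and column names are hypothetical, and the regular expressions are deliberately simple: they are not a SQL parser and will miss views, aliases on the column, and dynamically built SQL.

```python
import re
from typing import Iterable, List, Tuple


def classify_queries(
    queries: Iterable[str], table: str, column: str
) -> Tuple[List[str], List[str]]:
    """Split logged queries on `table` into explicit and wildcard uses of `column`.

    Returns (explicit, wildcard): queries that name the column directly,
    and queries that may read it only through SELECT *.
    """
    table_re = re.compile(rf"\b(?:from|join)\s+{re.escape(table)}\b", re.IGNORECASE)
    column_re = re.compile(rf"\b{re.escape(column)}\b", re.IGNORECASE)
    wildcard_re = re.compile(r"\bselect\s+(?:\w+\.)?\*", re.IGNORECASE)

    explicit, wildcard = [], []
    for query in queries:
        if not table_re.search(query):
            continue  # query does not read from the table at all
        if column_re.search(query):
            explicit.append(query)
        elif wildcard_re.search(query):
            wildcard.append(query)
    return explicit, wildcard


# Hypothetical query log entries.
log = [
    "SELECT id, legacy_code FROM customers",
    "SELECT * FROM customers",
    "SELECT id FROM customers",
    "SELECT * FROM orders",
]
explicit, wildcard = classify_queries(log, "customers", "legacy_code")
```

An empty `explicit` list with a non-empty `wildcard` list is exactly the situation in the dialogue: nobody names the column, yet nobody can prove it is unused.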
Understanding Data Platform Debt
Data platform debt comes in two flavors:
- Tech Debt: The well-known scenario where developers take shortcuts or leave out certain features due to time constraints imposed by the business.
- Use-Case Debt: The accumulation of outdated data structures, processes, and policies that remain because no one wants to risk breaking something potentially important.
While tech debt is widely acknowledged, use-case debt is a silent killer that can render a data platform unwieldy and eventually lead to its replacement.
The Discrete Data Product Approach
One effective way to address this problem is by adopting a discrete data product approach. Here, the "product" is the analytics or data logic that solves a specific business use-case, packaged with its corresponding data model and data. This approach isolates each use-case, making it easier to retire or update without affecting other components.
Shaun Ryan noted that this concept aligns with the original intention of agile: "Data models should get trimmed and finely refined in a good agile project." The problem is that many current agile practices in data management are insufficiently rigorous, leading to the accumulation of use-case debt.
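To make the idea of a discrete data product concrete, here is a minimal sketch of a catalog in which every job and table belongs to exactly one product, so that retiring the product yields a complete and safe list of assets to decommission. The class names `DataProduct` and `Catalog`, their fields, and the single-owner rule are all assumptions made for illustration, not a reference to any particular tool.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DataProduct:
    """One business use-case packaged with its logic (jobs) and data (tables)."""
    name: str
    use_case: str
    jobs: List[str] = field(default_factory=list)
    tables: List[str] = field(default_factory=list)
    retired: bool = False


class Catalog:
    """Registry that enforces a single owning product per asset."""

    def __init__(self) -> None:
        self._products: Dict[str, DataProduct] = {}
        self._owner: Dict[str, str] = {}  # asset name -> owning product name

    def register(self, product: DataProduct) -> None:
        assets = product.jobs + product.tables
        for asset in assets:
            if asset in self._owner:
                raise ValueError(f"{asset} already belongs to {self._owner[asset]}")
        for asset in assets:
            self._owner[asset] = product.name
        self._products[product.name] = product

    def retire(self, name: str) -> List[str]:
        """Remove the product and return every asset that can now be dropped."""
        product = self._products.pop(name)
        product.retired = True
        assets = product.jobs + product.tables
        for asset in assets:
            self._owner.pop(asset, None)
        return assets
```

Because no asset is shared between products in this sketch, the list returned by `retire` can be deleted without asking whether another use-case still depends on it. The cost of that isolation is some duplication of data between products.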
Practical Implementation
Implementing this approach involves:
- Agile Practices: Regularly refining and trimming data models to keep them lean and relevant.
- Component Life-Cycle Management: Treating each data product as a separate entity with its own life-cycle, including retirement.
- Clear Documentation and Monitoring: Keeping detailed records of data use-cases and their dependencies to facilitate safe retirement.
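The documentation and monitoring point can be sketched as a simple pre-retirement check. It assumes two records exist: a map of which products consume which, and the date each product was last accessed. The function `retirement_blockers`, the 90-day quiet period, and the product names are hypothetical choices for illustration only.

```python
from datetime import date
from typing import Dict, List, Set


def retirement_blockers(
    product: str,
    dependents: Dict[str, Set[str]],
    last_access: Dict[str, date],
    today: date,
    quiet_days: int = 90,  # assumed threshold; tune to your own policy
) -> List[str]:
    """Return the reasons `product` cannot be retired yet (empty list = safe)."""
    blockers = []
    for consumer in sorted(dependents.get(product, ())):
        blockers.append(f"still consumed by {consumer}")
    seen = last_access.get(product)
    if seen is None:
        blockers.append("no access records; usage unknown")
    elif (today - seen).days < quiet_days:
        blockers.append(f"accessed {(today - seen).days} days ago")
    return blockers


# Hypothetical records.
dependents = {"churn_scores": {"retention_dashboard"}}
last_access = {
    "churn_scores": date(2024, 5, 1),
    "legacy_segments": date(2023, 1, 1),
}
today = date(2024, 6, 1)

churn_blockers = retirement_blockers("churn_scores", dependents, last_access, today)
legacy_blockers = retirement_blockers("legacy_segments", dependents, last_access, today)
```

Note that a missing access record is treated as a blocker, not as permission: without monitoring, the honest answer remains "we don't know if it's still in use."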
Conclusion
Retiring data use-cases is not just a theoretical exercise; it's a practical necessity for maintaining a healthy and functional data platform. If your platform can't easily and quickly retire data use-cases, it may already be on its way to obsolescence. By adopting a discrete data product approach and focusing on the entire product life-cycle, including retirement, you can ensure that your data platform remains agile, efficient, and relevant.