Mastering the Data Product Life-Cycle: A Key to Building a Successful Modern Data Ecosystem

We often receive questions about data products and the operating models organizations need to truly harness their value. One critical aspect to consider (after deciding what types of data products to build) is their life-cycle: how to build and operate them effectively. Here, we present a high-level view of the data product life-cycle, drawing on OODA loop (Observe, Orient, Decide, Act) principles.

The Three-Part Data Product Life-Cycle

Iteration is essential at every stage of the data product life-cycle, which consists of three main phases: Prototype, Industrialize, and Operate.
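To make the phases and the iteration between them concrete, here is a minimal sketch that models the life-cycle as a small state machine. The class names, the `RETIRED` state, and the exact transition table are illustrative assumptions, not part of the model described here: a product may repeat a phase, loop back to an earlier phase when feedback demands it, or be retired, but it cannot skip Industrialize on its way to Operate.

```python
from enum import Enum


class Phase(Enum):
    PROTOTYPE = "prototype"
    INDUSTRIALIZE = "industrialize"
    OPERATE = "operate"
    RETIRED = "retired"


# Hypothetical transition table: forward progress, staying in a phase to iterate,
# looping back to an earlier phase when feedback demands it, and retiring from any live phase.
TRANSITIONS = {
    Phase.PROTOTYPE: {Phase.PROTOTYPE, Phase.INDUSTRIALIZE, Phase.RETIRED},
    Phase.INDUSTRIALIZE: {Phase.INDUSTRIALIZE, Phase.PROTOTYPE, Phase.OPERATE, Phase.RETIRED},
    Phase.OPERATE: {Phase.OPERATE, Phase.PROTOTYPE, Phase.INDUSTRIALIZE, Phase.RETIRED},
    Phase.RETIRED: set(),  # retirement is terminal in this sketch
}


class DataProduct:
    def __init__(self, name: str) -> None:
        self.name = name
        self.phase = Phase.PROTOTYPE
        self.history = [Phase.PROTOTYPE]  # audit trail of every phase visited

    def move_to(self, target: Phase) -> None:
        # Reject transitions the table does not allow, e.g. skipping Industrialize.
        if target not in TRANSITIONS[self.phase]:
            raise ValueError(
                f"{self.name}: cannot move from {self.phase.value} to {target.value}"
            )
        self.phase = target
        self.history.append(target)


# Usage: a hypothetical product goes live, then loops back to Prototype for a change.
product = DataProduct("churn_score")
product.move_to(Phase.INDUSTRIALIZE)
product.move_to(Phase.OPERATE)
product.move_to(Phase.PROTOTYPE)
```

Keeping the allowed transitions in one table makes the operating model's rules explicit and easy to review; an organization would adjust the table to match its own governance.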

1. Prototype Phase

This phase involves working directly with the business to quickly develop working data product candidates, incorporating actual data, logic, and visualizations. Fast feedback is crucial here.

  • Capture: Rapidly gather data from the data estate, external sources, and the catalogue (data already onboarded) into sandboxes. Incorporate existing data products (metrics/models) as dependencies to build upon.
  • Understand: Collaborate with data owners and data product owners to understand the data and products. Link or join them relationally, semantically, or referentially as necessary. Focus on annotation and markup rather than transforming the data into an intermediate form.
  • Model: Develop data structures specific to the data product, sufficient to support the actual models (metrics/ML). Experiment with and prototype the models and metrics.
  • Visualize and Feedback: Present visualizations and get feedback from users right from the start.
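The Capture and Understand steps can be sketched as a lightweight candidate descriptor: sandboxed sources, other data products reused as dependencies, and annotations that describe how fields link without transforming the data. The `Candidate` structure, the example names, and the use of a topological sort to derive a build order are all illustrative assumptions; the ordering uses Python's standard `graphlib` module (Python 3.9+), which raises `graphlib.CycleError` if dependencies form a loop.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class Candidate:
    name: str
    # Sandbox copies of data-estate, external, or catalogue datasets (hypothetical names).
    sources: list[str] = field(default_factory=list)
    # Other data products (metrics/models) this candidate builds upon.
    depends_on: list[str] = field(default_factory=list)
    # Markup describing relational/semantic/referential links; the data itself is untouched.
    annotations: dict[str, str] = field(default_factory=dict)


def build_order(candidates: list[Candidate]) -> list[str]:
    # Map each product to its predecessors; static_order() yields dependencies first.
    graph = {c.name: set(c.depends_on) for c in candidates}
    return list(TopologicalSorter(graph).static_order())


candidates = [
    Candidate("churn_model", depends_on=["customer_360", "revenue_metric"]),
    Candidate(
        "customer_360",
        sources=["crm.customers"],
        depends_on=["revenue_metric"],
        annotations={"customer_id": "referential link to erp.invoices.customer_id"},
    ),
    Candidate("revenue_metric", sources=["erp.invoices"]),
]
print(build_order(candidates))  # ['revenue_metric', 'customer_360', 'churn_model']
```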

2. Industrialize Phase

Once the business accepts the data product candidates, the pre-production work begins.

  • Engineer: Harden the code and pipelines, implement CI/CD and versioning, and package the data product into run-times for execution on the platform.
  • Test: Conduct model testing with users and production data, and perform system testing.
  • Control: Add data quality rules and policies, and map the data product back to data dictionaries/taxonomies so others can reuse it. Conduct additional testing here.
  • Publish: Package and publish the data products, datasets, and configurations to the ecosystem for reuse.
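As an illustration of the Control step gating Publish, the following minimal sketch expresses data quality rules as named predicates over records and only passes the product when each rule's failure rate stays within a threshold. The rule names, the record shape, and the `max_failure_rate` parameter are hypothetical; production platforms typically use a dedicated data quality tool rather than hand-rolled checks.

```python
from typing import Any, Callable

Record = dict[str, Any]

# Hypothetical rule set: each rule is a name plus a predicate over one record.
RULES: dict[str, Callable[[Record], bool]] = {
    "customer_id_not_null": lambda r: r.get("customer_id") is not None,
    "revenue_non_negative": lambda r: isinstance(r.get("revenue"), (int, float))
    and r["revenue"] >= 0,
}


def check_quality(
    records: list[Record],
    rules: dict[str, Callable[[Record], bool]],
    max_failure_rate: float = 0.0,
) -> tuple[bool, dict[str, int]]:
    # Count failures per rule so the report shows which control was breached.
    failures = {name: 0 for name in rules}
    for record in records:
        for name, rule in rules.items():
            if not rule(record):
                failures[name] += 1
    n = max(len(records), 1)  # avoid dividing by zero on an empty batch
    passed = all(count / n <= max_failure_rate for count in failures.values())
    return passed, failures


# Usage: only publish when every rule is within tolerance.
batch = [
    {"customer_id": 1, "revenue": 10.0},
    {"customer_id": None, "revenue": 5},
    {"customer_id": 3, "revenue": -2},
]
ok, report = check_quality(batch, RULES)
```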

3. Operate Phase

Run the data product as run-times on the ecosystem infrastructure.

  • Execute/Adapt: Run data product run-times on platform clusters, with break-fix processes for emergency fixes.
  • Monitor: Implement data and data product observability to track ML skew/drift/errors, system errors, resource usage, ESG metrics, and actual usage.
  • Retire: Retire data products as necessary to ensure the ecosystem remains a living, breathing infrastructure.
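For the Monitor step, one widely used way to quantify drift between a baseline (for example, training or launch-time data) and current production data is the Population Stability Index (PSI). The model described here does not prescribe a specific drift metric, so PSI is a swapped-in example. This sketch uses equal-width bins over the combined range and a small floor for empty bins; both are simplifying assumptions (baseline quantile bins are another common choice). A frequently quoted rule of thumb treats PSI below 0.1 as stable and above 0.25 as significant drift, but thresholds should be tuned per data product.

```python
import math


def psi(expected: list[float], actual: list[float], bins: int = 10) -> float:
    if not expected or not actual:
        raise ValueError("both samples must be non-empty")
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual))
    width = (hi - lo) / bins or 1.0  # guard against all values being identical

    def shares(values: list[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            # Clamp so the maximum value falls into the last bin.
            idx = min(int((v - lo) / width), bins - 1)
            counts[idx] += 1
        # Small floor avoids log(0) and division by zero for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = shares(expected), shares(actual)
    # PSI = sum over bins of (actual share - expected share) * ln(actual / expected).
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


baseline = [i / 100 for i in range(100)]
shifted = [0.5 + i / 200 for i in range(100)]  # hypothetical drifted feature
drift_score = psi(baseline, shifted)
```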

The Importance of Continuous Iteration

Each stage in the life-cycle is iterative, allowing for continuous improvement and adaptation. This iterative process is critical to ensure data products remain relevant and valuable in a rapidly changing market.

Additional Insights

Broader Perspectives on Data Product Marketplaces: When building data products, consider data exchange platforms that formalize the creation, publication, and distribution of data products. These platforms, which create data marketplaces or data storefronts, are becoming a common component of modern data architectures. They can reduce time to market, improve the ability to respond to fast-changing markets, and raise the level of analytics sophistication.

Building data marketplaces is challenging, especially if you want to avoid time-consuming pre-processing. The decision to build or buy external data solutions involves weighing time, money, resources, and scalability. While building may seem cost-effective initially, the long-term financial and resource costs often make buying the more viable option.

Effective Collaboration Between Business and Data Experts: Collaboration between business experts and data experts is crucial in the prototype phase. Best practice is to capture needs accurately and maintain continuous engagement so that projects stay relevant and aligned with their original intent.

Strategic Planning for Long-Term Success: Product management must take a leading role in the data product life-cycle, ensuring the right things are done at the right time. This involves co-creating with internal and external data product customers, establishing shared innovation processes, and developing data contracts.
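A data contract makes the promise between a data product and its consumers checkable. This minimal sketch assumes a contract consisting of a versioned column schema and a freshness guarantee; the `DataContract` fields and the `violations` helper are hypothetical, and real contracts often also cover semantics, ownership, SLAs, and change policy.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class DataContract:
    product: str
    version: str
    schema: dict[str, type]  # column name -> expected Python type
    max_staleness_hours: float  # freshness promised to consumers


def violations(
    contract: DataContract, rows: list[dict[str, Any]], staleness_hours: float
) -> list[str]:
    problems = []
    # Freshness check: is the published data older than promised?
    if staleness_hours > contract.max_staleness_hours:
        problems.append(f"stale: {staleness_hours}h > {contract.max_staleness_hours}h")
    for i, row in enumerate(rows):
        # Schema check: every contracted column present, with the contracted type.
        missing = contract.schema.keys() - row.keys()
        if missing:
            problems.append(f"row {i}: missing {sorted(missing)}")
        for col, expected in contract.schema.items():
            if col in row and not isinstance(row[col], expected):
                problems.append(
                    f"row {i}: {col} is {type(row[col]).__name__}, expected {expected.__name__}"
                )
    return problems


contract = DataContract("customer_360", "1.0.0", {"customer_id": int, "segment": str}, 24)
```

An empty list means the published batch honours the contract; anything else can block publication or alert the owning team.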

Conclusion

A well-defined data product life-cycle is fundamental to building and operating a successful modern data ecosystem. By following these stages (Prototype, Industrialize, and Operate), organizations can ensure they are continuously delivering value through their data products. This approach not only enhances the ability to respond to market changes but also embeds data and analytics deeply within the business.

Final Thoughts

Organizations that master the data product life-cycle and adopt a robust operating model will be well-positioned to thrive in the modern data ecosystem. By iterating continuously and embedding data and analytics within the business, they can achieve significant competitive advantages.


Notes:

For those considering building out this model or wondering whether it will work for them, feel free to reach out for a deeper dive into our Data Product operating model approach.

If anyone is considering referencing this model in a presentation, feel free to do so with proper credit.