For decades, data scientists (née statisticians) have had sandboxes in which to explore data and find valuable insights. In what seemed like a happy compromise, analysts could quickly load, manipulate, and combine enterprise and industry data in search of new insights and predictions without worrying that they would compromise sensitive data or production workflows. While this accelerated the discovery of new insights, putting them into production was a nightmare. A bevy of custom code and data created in an ungoverned environment had to be converted, quality-controlled, and optimized before deployment. It often took the better part of a year for a business to get value from an insight gleaned in a few weeks.
The specter of big data threatened to make the situation worse—in a big way. Now analysts were using data structures and programming languages foreign to IT, and the volume and complexity of external data sources were exploding. Without a new approach, insights found in a big data sandbox might never make it into production.