Agile development has been all the rage for a while now – extreme programming, scrum, user stories, epics, backlogs, and the like have become the lingua franca of any software development organization worth its salt. And although the notion of agile development hasn’t yet completely penetrated other parts of the enterprise, there is an increasing awareness of its benefits. One of the areas where agile development is starting to gain traction is analytics. As Jim Kobielus noted in a recent post, organizations that are able to quickly learn and iterate through experimentation gain a competitive advantage; analytics vendors like SAS have also been promoting the concept of agile applied to big data analytics.
This development is not surprising, as the values of agile development should resonate with anyone who’s involved in delivering data and insights. At its core, agile values:
- Individuals and interactions over processes and tools
- Working software over comprehensive documentation
- Customer collaboration over contract negotiation
- Responding to change over following a plan
In the realm of analytics, what do these values mean? And more importantly, how can they be put into practice to truly realize the vision of “Agile Analytics”?
Let’s start with the implications of “agile” in the world of Big Data Analytics (and also, Big Data Applications). Based on my experience at Yahoo! (and on what I am starting to see from Greenplum’s customers), there is a kind of evolution of needs that takes place during the lifecycle of Big Data Analytics application development – let’s call this the Analytics Application Development Lifecycle. Before diving into what this lifecycle looks like, let’s first talk about the environment that enterprises with Big Data are dealing with today. In general, I’ve seen the following characteristics, which end up informing what an Agile Big Data environment needs to support:
- Underlying Data Sets are Fast Changing: In this environment, timely analysis of new products and concepts is a competitive advantage. As a result, data processing and analysis systems need to be flexible enough to support underlying changes without requiring a rewrite or a new data model.
- Demand for Analytics is Time Sensitive: In the big data world, the ability to analyze new features that are in production and impact revenue/monetization is critical. Delays in turning around new requests can result in serious financial impact or customer risk.
- Business Questions and Data Needs are Unpredictable: Anyone who is supporting the Business Intelligence (BI) needs of a “Big-Data-Driven” organization will tell you that reporting and analysis needs for new features can’t be anticipated – additional data needs often arise as the result of first-pass analyses. This means that data query and analysis systems must be built for unpredictable demands.
- Volumes of Data and Data Consumers are Extremely Large: Analytics systems need to support deep analysis by data scientists, dashboards and reporting for larger internal user bases, and consumption by operational systems. To complicate things, all of these capabilities need to scale to support massive & growing data sets.
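To make the “fast changing” point concrete, a schema-on-read approach is one common way to absorb change: events are stored as raw JSON and interpreted at analysis time, so new fields appear without a schema migration. A minimal sketch in Python (the event fields and values here are invented for illustration):

```python
import json
from collections import Counter

# Raw event logs: each line is a JSON object. Newer events carry fields
# (e.g. "experiment_id") that older events lack -- no rewrite or new
# data model is needed to keep analyzing them.
raw_events = [
    '{"user": "u1", "action": "click", "ts": 1}',
    '{"user": "u2", "action": "view", "ts": 2}',
    '{"user": "u1", "action": "click", "ts": 3, "experiment_id": "exp42"}',
]

events = [json.loads(line) for line in raw_events]

# An analysis written before "experiment_id" existed keeps working...
actions = Counter(e["action"] for e in events)

# ...and a new analysis can use the new field as soon as it appears.
by_experiment = Counter(e.get("experiment_id", "none") for e in events)

print(actions)        # Counter({'click': 2, 'view': 1})
print(by_experiment)  # Counter({'none': 2, 'exp42': 1})
```

The tradeoff, of course, is that interpretation moves from load time to query time – which is exactly the flexibility a fast-changing product environment demands.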
Given these characteristics, here is what an Agile Big Data environment needs to support:
- Ad-hoc access to “raw” event/user level data
- Data source agnosticism – Hadoop & RDBMS interop
- Data search and discovery
- Analysis- and Developer-friendly environment – SQL, Code
- Lower-than-average cost of change for new data, metrics
- Schedule and publish capabilities for views, tables, insights
- Unified catalog/metadata service
- 3rd Party Tool “friendliness”
- Resource management for ad-hoc & production workloads
- Enterprise features for the entire data system
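The first two capabilities – ad-hoc SQL access to raw, user-level data – can be sketched with an in-memory SQLite table standing in for the real warehouse (the table, events, and query are invented for illustration; in practice this would run against a Hadoop- or RDBMS-backed store):

```python
import sqlite3

# In-memory table of "raw" user-level events; a stand-in for the
# analytics platform's event store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id TEXT, action TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [("u1", "click", 0.0), ("u1", "purchase", 9.99),
     ("u2", "view", 0.0), ("u2", "purchase", 4.01)],
)

# An unpredictable, first-pass question answered directly against raw
# events -- no pre-built report, cube, or schema change required.
rows = conn.execute(
    "SELECT user_id, SUM(revenue) AS rev FROM events "
    "WHERE action = 'purchase' GROUP BY user_id ORDER BY rev DESC"
).fetchall()
print(rows)  # [('u1', 9.99), ('u2', 4.01)]
```

The point is the workflow, not the engine: an analyst-friendly SQL surface over raw data is what lets first-pass answers spawn follow-up questions without a development cycle in between.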
There are plenty of other things to think about as well: do you have the right “Data Scientists” within your organization to leverage this platform? Are you properly instrumenting your products and processes to drive data into your data platform? Are you thinking about closing the loop by building applications and systems that can leverage the insights delivered by your data science team (operationalization, as it were)? All things to keep in mind as you venture into the exciting new world of Big Data, and Big Analytics.