There are many reasons why analytics software projects routinely miss deadlines and overrun their budgets. Some are shared with software projects in general. Others stem from the data preparation phase, especially when leaders kick these projects off without first nailing down the prerequisites. The general software reasons include:
- Many analytics projects begin as software engagements, and software projects are notorious for running late. The development process commonly lacks a specification, source control, and requirements for incremental delivery.
- Developers are human, with all the usual proclivities: they over-promise, overestimate their skills, and give in to pressure from management.
- The inherent nature of development does not help either: bugs can be tough to fix, software libraries fail to link, the implementation must run on a wide variety of computing architectures, and so on.
All too often in an analytics project, significant delays accumulate before the first line of code or the first query has been written. It all starts with data availability and collection. It is not uncommon for companies to start analytics projects out of a belief in the potential business value of analytics, a value that depends on utopian data sets. Poor data quality erodes the value that is actually realized, but it is even worse when the data is not available or accessible to the team before the project starts.
I have been involved in several projects where the initial assumption was that the business users would provide some data and the miracles of analytics would follow. This can happen, but only after a multi-year delay in execution because the data was not available from the beginning. If the business users own the data, making it accessible is not a top priority for them. After all, the business has been running for years without this new analytics nuisance.
After the data has been collected and passed to the analytics team, data quality issues surface and missing data sets are identified. At this point, a back-and-forth starts with the data owners, and the best outcome is that the data becomes ready after a significant delay in the project. While there is a move towards enterprise data warehouses (EDWs) that ease data access, for the foreseeable future corporations will still have spreadsheets holding valuable information that is not accessible to everyone.
Corporations that do this well execute projects more quickly thanks to well-established analytics project leadership. These leaders understand that a project cannot start until the data is ready and of acceptable quality. No matter how high the perceived business value of a project, they will not pull the trigger until all of the prerequisites are met. The project starts only after the data is ready.
This contrast in leadership is especially pronounced when project development is outsourced. The billable hours keep accumulating while the consultants wait for the data or work with outdated (or, even worse, fictitious) data sets. Meanwhile, the consultants also feel the pressure as they watch project deadlines slip. Experienced leaders do not cave in to pressure from upper management or to the allure of a high ROI. They do not officially start a consulting engagement until all of the data is readily available and of acceptable quality. Any remaining delays in execution are then limited to the general peculiarities of software projects.