Calculating Your ETL ROI
How to cut the costs and boost the benefits of your ETL program.
Nov 26, 2019
There are many resources that explain how to calculate the ROI of a data warehousing project, but few isolate just the ETL slice of the pie. Extract, Transform, and Load processes make up a significant portion of any data warehousing operation and are a complex line item on the balance sheet. "Do ETL" is about as glib as "do science."
Companies resolved to become data-driven may assume there's no point in calculating ETL ROI if it's an essential component of data warehousing, but different ETL methods can yield wildly different returns in different business contexts. It's important to find the best solution for you by identifying where -- given your company's size, resources, and ambitions -- your expenses will be greatest and minimizing those costs while maximizing returns.
Always the easier side of the equation to estimate, ETL costs come down to tools and time. Cutting costs involves finding an ETL solution that's easy to use, integrates well with your existing data architecture, runs efficiently, and will scale as you accumulate data and alter your models. Here are six cost components to consider.
Software. There are effectively two different flavors of ETL software: standalone and integrated. Integrated ETL software typically comes as part of an end-to-end BI platform and is the obvious choice if you're already in the market for a business intelligence solution. Standalone options are great if you already have a BI solution or are ETLing your data for purposes other than reporting, such as moving storage locations or database types.
Hardware. If you don't already have them, you will need workstations for the initial ETL programming and servers (either on premises or in the cloud) on which to host your source data and target data repositories. You will also need servers for the ETL processing itself, which is performed more efficiently if your ETL solution comes equipped with change data capture capabilities. Rather than batch-update the entirety of your repository each night, you can selectively update only stale areas of your target database, reducing the resources required and improving data quality. Because updates must often run within a narrow timeframe, change efficiency means not having to pay for more powerful processors.
Consulting. Enterprises often find it necessary to have data architects on their payroll to help manage post-implementation ETL processes, but some may not have the budget and opt instead to hire a consultant or contractor. A discovery call should help you estimate the costs here; consider adding an extra 20 percent to the quote to adjust for optimism bias and hidden fees.
Training. If you're planning to build and maintain your ETL in house (rather than hiring an agency), you will be responsible for training employees in the ETL software you adopt. The more usable the software, the less training it will require to get your developers up to speed. Note that many ETL solutions are designed for multidimensional databases (MDBs) or "cubes," which are nonstandard and typically require ETL architects to learn proprietary querying languages. Adopting an ETL tool that connects to relational databases will almost guarantee reduced training costs.
Development. Even though internal staff will be on the payroll regardless of their assignments, you'll want to account for the opportunity cost of diverting them from other projects. For this reason, it's a good idea to factor salaries into your ROI calculations, accounting for the approximate number of hours each team member is likely to spend on ETL-related tasks.
Maintenance (total cost of ownership). If any of the above costs recur for the life of your ETL processes, factor them into ongoing maintenance costs to find your ETL's total cost of ownership (TCO). Some applications can be had for a one-time licensing fee while others are licensed periodically. Some companies may hire ETL consultants only during the initial planning stages while others may look to contract with an agency long term. Training may be costly initially but become relatively inexpensive once internal staff are skilled enough to train new recruits themselves.
The monetary return of an ETL solution hinges on the data management and reporting efficiency improvements it affords. To estimate the value of these benefits, you'll need to know about how many people in your organization spend how much time performing these data transformations in an ad hoc fashion.
Data management. ETL can be a powerful means of catching operational errors through data quality checks. When validation logic is introduced at the ETL level, any suspicious or irregular records can be flagged for investigation. Sometimes these inquiries uncover recordkeeping mistakes, customer experience issues, fraud, and other anomalies that cost the company. In the absence of ETL, data quality is often monitored manually, making systems more vulnerable to human error and limiting the scope of these data management projects.
Because the monetary yields of a data quality program are highly circumstantial, it's helpful to base your estimates on returns from past and present quality checks. For example, if in an average year your company uncovers $50,000 worth of fraud and $100,000 in customer service issues, it would be fiscally conservative to estimate that ETL might return at least $200,000 per year in data quality checks.
Efficiency. Data manipulation not done using ETL processes is often done on an ad hoc basis by the data analysts responsible for supplying their departments with reports. If considering ETL for internal use, estimating the fiscal benefits of improved efficiency is simply a matter of approximating how much time analysts currently spend combining, filtering, cleansing, formatting, and aggregating data and multiplying that number by their hourly rates.
If you're a SaaS provider or similar vendor adopting ETL as part of a product offering, this efficiency can become a marketable value proposition and contribute to increased sales. Highly refined data that's as approachable as it is accessible plays well on product demos.
A Final Word
Data may be the new oil, but you won't get value out of it unless costs are properly controlled and benefits clearly defined beforehand. Assess your business context and determine where your risks and opportunities are most pronounced, then make the judgement call that's right for your company.
Originally published with TDWI.