BI Roadblock #1: Data Silos
Jun 2, 2020
A data silo is an information repository that is either isolated from stakeholders, other repositories, or both. Though they’re easy to spot and generally known to be problematic, they’re also one of the most common barriers to good data practice.
This is in part because young silos are typically limited in size and scope. It’s not uncommon for business departments to develop their own recordkeeping systems or use specialized applications without considering how the data housed within them may one day become more broadly relevant. A tech startup might, for example, purchase CRM software for its sales department and a ticketing system for its support department. Though the two systems do not integrate with each other, these silos have yet to adversely impact the company, as neither department is interested in the other department’s data.
The CRM and ticketing silos only become problematic with the advent of this startup’s customer success team, which needs access to both repositories and the ability to join the data into one cohesive customer story. This is how data silos take companies by surprise.
Data silos come in many forms (applications, spreadsheets, databases, third-party resources, etc.) but are easily identified by the organizational problems they create. You may be dealing with this BI roadblock if:
- You cannot access a repository of relevant information.
- You cannot easily merge or join data between related repositories.
- You cannot manipulate or analyze a repository’s data until you’ve exported it to a flat file.
- The data you need only exists as a flat file or in another static format.
- Changing a record in one repository does not automatically update related records in another repository, resulting in inconsistencies.
New applications and database management systems are being released at an unprecedented rate, so keeping up with customers’ data integration needs can be tricky. Many SaaS providers offer out-of-box data connectors to partner applications, but sometimes these low-code options aren’t available. Because most companies have little-to-no say in where their SaaS application data is collected, they must make do with the silo until they can migrate the data to a warehouse they control. Such a project can take months or even years depending on the company’s technical resources, the number of siloes being merged, and how much transformation the data will have to undergo. This limbo period is typically where spreadsheets enter the picture.
Spreadsheets are great for small projects but should not be used for long-term data storage or analysis. Because they’re static files, they quickly become outdated. They also don’t typically come with any logging capabilities, so people can make changes without there being a record of it. Unless spreadsheets are hosted in the cloud, only one person at a time may access them. Because they must be updated manually, they’re also especially prone to human error.
Even in the absence of these quality concerns, spreadsheets are cumbersome reporting tools and take longer than relational databases to query using a business intelligence solution.
Silos make it nearly impossible to get a complete picture of your data, and that can result in all kinds of operational inefficiencies.
On an episode of Data Talks titled “Escaping Spreadsheet Hell,” BI consultant and “data whisperer” Jen Stirrup describes some of the silo scenarios she’s encountered in the field. One company she consulted for stored all its employees’ PTO data in a spreadsheet, and human error resulted in everyone’s data getting jumbled. “The spreadsheet had two columns on it,” she recounts, “the name and the number of holidays that the person had taken that year. But they’d sorted one column of the two and not the other. So that mixed up everybody’s holidays.” Another company needed a spreadsheet just to index all its spreadsheets.
Fallout from data silos can include, but is not limited to:
- Compromised data quality due to human error.
- Business decisions being made based on false or incomplete data.
- The inability of individuals and/or departments to coordinate with one another.
- Inconsistencies between data repositories and difficulty reconciling them.
- Lack of trust in an institution or system due to poor transparency.
- Workflow inefficiency.
- Lost revenue due to oversights.
You can eliminate data silos either through storage or through reporting. If you use the storage method, you’ll likely engage in some form of data warehousing. A data warehouse is “a repository of data that has undergone ETL (Extract, Transform, Load) processing, which may include information management and governance, for the purpose of integrating data from diverse sources and making it easier to analyze.” This process is an opportunity not only to unite your data in one repository, but also to reconcile inconsistencies across sources and prepare it for more efficient querying.
The alternative is to access all data sources from one central reporting hub, such as a business intelligence solution. BI tools can typically connect to a wide variety of sources, making it easy to join across repositories with common data. If the data needs some massaging before it can be integrated, check to see if your BI tool can assist you in making those adjustments. Some allow for lightweight data manipulation and other convenient alternatives to data warehousing.
Whatever your method, the goal is to give all stakeholders access to the data pertaining to them and their work. The problems propagated by data silos tend to compound over time, making the transition to a centralized access point worth the investment in the long run.
Originally published with Software Business Growth.