Think Outside the Cube
Jun 10, 2019
These days, most SaaS providers embedding BI in their applications recognize they also need some kind of ETL. (Check out this infographic if you aren't sure what ETL is). In response, many BI platforms are coming up with their own proprietary data preparation solutions. Some have chosen to use multidimensional databases, often referred to as cubes.
But cubes require advanced knowledge of data modeling as well as a number of non-standard querying and programming languages, which vary across platforms, making them unwieldy even for technologists.
There is a better way to prepare and warehouse data for analytics and reporting. We need to think outside the cube.
Let’s take a look at what cubes are, when they serve, and when to consider alternatives.
What are cubes, anyway? And when should you use them?
If you’re reading this, it is likely you have heard of data cubes before, even if you haven’t used one. OLAP (online analytical processing) cubes, also known as multidimensional database (MDDB) engines, were created in the 1980s as a method of segmenting and aggregating data from large databases in order to speed up queries for analysis and reporting. Because the industry never drove to any query language standards for cubes— like the SQL standard(s) in relational databases— cubes are more difficult to work with and never became predominant. Basically, cubes are like a niche foreign film that sounds interesting but you just don’t feel like reading subtitles the whole time.
Cubes have made a comeback in recent years, though, as embedded BI platforms make attempts to modernize the concept with their own nonstandard, proprietary versions.
The problem is that cubes are complex and time-consuming to construct, and when they are the only option, they become roadblocks.
None of this is to say that cubes don’t have their uses. In fact, Exago BI supports OLAP cubes as data sources. Cubes come in handy when you want to store data in a multidimensional view in order to spot trends in large datasets, cache and preprocess data by aggregating it to speed up queries, or blend data from disparate sources for easier reporting. You might also build a cube in order to take a slice of a large database and focus only on the most relevant data for a given case, such as to see how sales of a certain product have progressed over the year.
If you know anything about data prep, you’re probably thinking, “If cubes do all this, it makes sense to use them for ETL.” And you aren’t wrong. The problem is that cubes are complex and time-consuming to construct, and when they are the only option, they become roadblocks.
When and why to think outside the cube
When you want all of the benefits described above without any of the drawbacks, it's time to think outside the cube. Alternatives to cubes include ETL tools based on relational databases, like Exago’s no-code ETL, which take a much smoother route to the same destination.
Here’s a chart to help you compare the differences of using a cube for ETL versus doing ETL in a relational system like Exago BI.
Any cube, OLAP or proprietary, has a steep learning curve and will require full-time analyst or dev resources to build and manage.
Cubes in their most standard form are complex to begin with, and since many BI solutions have put their unique spin on them, the learning curve has actually gotten steeper. Developers, analysts, or power users who are familiar with OLAP or some other type of cube now have to try to apply that knowledge to a new design that comes with its own idiosyncrasies and querying or programming language. Some cases might even call for the creation of an OLAP cube as well as a proprietary cube, which then has to integrate with the OLAP cube. Confusing, right?
As college grads enter the field and the average age of your staff decreases, it’s likely your staff will have no experience with cubes. You will have to decide whether to hire staff for the express purpose of building and managing data cubes, or to train existing staff from the ground up. The former means finding someone with cube experience, who will still have to learn the ins and outs of your BI solution’s proprietary cube, and the latter means teaching your staff SQL, or some other querying language like Microsoft’s MDX, along with cube dimensions, hierarchies, and so on.
If your data isn’t already suited to the cube format, you have to convert it. It is likely you are working with relational or transactional databases, which are more common, transparent, and generally easier to work with, but are a very different way of storing data than a cube.
Cubes must be built incrementally to avoid record fragmentation and other errors. You have to add one or two tables at a time, then run a check, then add more tables, then run a check...you get it. Unlike in a relational database, where you can add or update categories as needed without affecting existing data, adding or modifying a category in a cube means rebuilding the entire thing each time because the accuracy of the data relies on its structure.
Building a cube means coding and querying, which is time consuming to do even when you are fluent in the necessary languages.
A Better Way
So while cubes do have their benefits and may fit some environments better than a relational system, clinging to cubes as the only way to prepare and warehouse data in business intelligence runs counter to the progress the industry has made to democratize analytics. If you want the pros of cube-based ETL without the cons, it’s time to try an alternative route that avoids these roadblocks.
That’s why we developed a no-code, no-cube, ETL solution that saves time and sidesteps the learning curve while delivering squeaky clean data to your customers. If that sounds like a dream come true, fill out the form below to get in touch.