No More Data Models

How Exago BI Manages Joins so Users Don’t Have To

The majority of BI solutions offering ad hoc reporting require that users either build their own data models or select from a list of pre-built data models each and every time they create a report. Which would be totally fine, if admins could predict their users’ needs with perfect accuracy and if all users could be counted on to understand relational databases. But that’s not the world we live in.

In this world, there are people like Bob. Bob is a paper-and-pencil kind of guy, an HR consultant at Fikticious, Inc., where he manages client accounts and logs their payroll, compensation, and benefits information into an application called Emploi. Bob knows his way around Emploi’s simpler features, but he’s squeamish around anything that looks like math or a programming language.

If Emploi’s embedded BI solution requires that Bob build or select a data model in order to create a report, he’s likely to make mistakes. Let’s say a DBA has set up object joins in the database. When Bob selects objects A and B for his report, the join between them will link the key columns identified in the database. But what if the join type (inner, left outer, right outer, full outer) isn’t defined? What if there are multiple foreign keys to choose from, and Bob just picks one at random? Alternatively, what if Bob mistakenly chooses the A-B-C model from a bank of models instead of the A-C-B model, not realizing there’s a difference between the two? What if he needs an A-B-C-D model and can’t find it? Or, let’s suppose he chooses the correct data model (A-B-C) and begins building his report only to discover that he’d like to add fields from object D after all? He would have to scrap his report and begin anew.

These scenarios result in faulty reporting at worst and frustrated employees at best. Embedded BI can be a powerful addition to a SaaS application, but if it doesn’t accommodate a range of users with varying technical proficiencies, adoption rates will suffer. The build-a-model method puts non-technical users at a disadvantage, and the choose-a-model method forces power users to put their projects on hold while they wait for admins to build the specific model they need.

Exago BI does away with static data modeling altogether. Users simply select the objects they think they’ll need on the report and start building. If they decide to add or remove objects later, they can do so easily without scrapping the report. Non-technical users never have to think about joins, and technical users can reconfigure them at will. All this is accomplished through the use of data frameworks.

A data framework is a collection of administrative settings that allow for the dynamic creation of data models. When a user selects a handful of data tables to report on, the framework determines how those objects will join to one another by default. Users granted the appropriate permissions then have the option of deviating from the default if they so choose.

In “A Reference Architecture for Self-Service Analytics,” Eckerson Group advocates a “minimal data modeling” approach as part of their suggested data governance strategy for self-service BI. According to the report, it’s important that users be able to construct “business views based on local requirements” without having to wait on IT to produce new global data models. Exago BI’s data frameworks help stabilize the information supply chain while giving power users the flexibility they need to build genuine ad hoc reports.

“Our goal was to have the default satisfy 98% of use cases,” explains Exago CTO Stewart Meyers, “which meant giving admins conditional control over join paths. That’s why we created the ability to add join weights and multiple data models.”

If there are multiple possible join paths between data objects in a given set, join weights indicate which paths have priority. The higher the weight, the higher the priority.

In the example below, the preferred path between objects is ambiguous. Should the model be A-B-C or A-C-B? If all joins have the same priority, say a priority of 0, then there’s no way to know. Giving the A-B join a weight of 1, however, will prioritize it above the A-C join in cases where both are possible.

If, however, the prefered join path is different for different user groups, the admin can set up data model categories to guide the right people to the right model.

Let’s say, for example, that these data objects pertain to a clothing company’s warehouse stock. The A-B-C path includes items that are in stock but have never been ordered, and the A-C-B path includes ordered products. The company’s shipping department isn’t concerned with the products that haven’t been ordered, so it will be more performant for them to use the A-C-B path. The company’s executive team, however, is interested in all products, including those yet to be ordered, in which case the A-B-C model would be most performant.

Administrators could use the multiple data models feature to create categories of data objects and joins for each user set. In this case, those categories might be labeled “Shipping” and “Executive,” making it easy for employees to find the appropriate data model regardless of their technical training.  

Between data models and data frameworks, frameworks are far harder to come by in the BI space. “Which is interesting,” says Meyers, “because while the data model method may seem more exact, it leaves more room for user error. Frameworks, by contrast, intuit modelling decisions based on the user’s actions, and that keeps error rates low.”


Authored by
Nicole Hitner
Content Strategist
comments powered by Disqus