Data Models That Build Themselves
Jul 16, 2019
Self-service BI is about bridging the knowledge gap that has historically separated business professionals from their data. It’s about doing away with intimate knowledge of information systems as a prerequisite for finding out last quarter’s growth margin. And when it comes to replacing SQL statements with friendly drag-and-drop graphical elements, the majority of BI solutions are doing an outstanding job of bridging that gap.
Why, then, do so many BI solutions still require users to understand data modeling?
Self-service and ad hoc solutions typically approach the thorny issue of data modeling in one of two ways: either the “build-a-model” method or the “choose-a-model” method. Neither method accommodates both basic business users and users with a high reporting aptitude (traditionally business analysts and data scientists) because both methods yield static data models. Static data models are built intentionally and for a specific purpose by either administrators or end users. New ones can be built, but once built, they cannot be altered. This results in a “one size fits all” user experience that, in an attempt to accommodate all user groups, serves none of them well.
Static Modeling Methodologies
In the build-a-model method, users select the tables they want in their report and manually join those tables to one another to form a model. While offering users maximum control over modeling, this method also requires that they understand join types, entity relationship types, primary keys, foreign keys, modeling best practices, and the data’s semantics — both in general and with regard to the data set in question. How BI applications attempt to bridge this knowledge gap — with tooltips, tutorials, documentation, and clever UI/UX design — will vary, but users will still have to learn to build models.
This is a reasonable expectation to place on data analysts, but business users are compelled to choose between getting adequate training, which is an inconvenience, and passing their reporting tasks to a team member, alienating them from the application and slowing their productivity.
One could argue that these non-technical business users are self-service BI’s primary audience. Data analysts and data scientists are often already comfortable using BI applications, and some even write SQL and R. Self-service BI aims to, as PC Magazine explains it, put reporting power in the hands of “the people who really needed to get and understand the business—the company decision makers.”
“The goal for much of today's BI software is to be available and usable by anyone in the organization. Instead of requesting reports or queries through the IT or database departments, executives and decision makers can create their own queries, reports, and data visualizations through self-service models, and connect to disparate data both within and outside the organization through prebuilt connectors.”
It follows, then, that the process of selecting data for a report should be made as intuitive to those users as possible.
Some BI providers cater to less technical users with the choose-a-model method, providing them with a library of premade data models. The downside is that choosing among these models still requires business users to have at least some grasp of the options and their implications.
Let’s suppose we want to report off data tables A, B, and C. An A-B-C model should not be confused with an A-C-B model (or even with an A-B-C model composed of different join types). If, for example, A is a table called Patients and B is a table called Insurers, a left outer join between the two tables would show all patients regardless of whether they have insurance. An inner join between them would effectively filter out all patients without insurance and all insurers without patients in the given medical group.
Two models with just slightly different joins will produce very different metrics. Unless the models are equipped with a detailed summary, it would be easy for a business user to simply select one of the two without understanding how they differ and how those differences will affect the subsequent report, its output, and any ensuing decisions based on that output.
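The Patients and Insurers example can be reproduced with a few rows of made-up data. The sketch below uses Python's built-in sqlite3 module; the table layout, column names, and sample rows are assumptions for illustration, not data from any real system. It counts the rows each join type returns for the same two tables.

```python
import sqlite3

# In-memory database with hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE insurers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE patients (
        id INTEGER PRIMARY KEY,
        name TEXT,
        insurer_id INTEGER REFERENCES insurers(id)
    );
    -- Borealis Mutual has no patients; Cy has no insurer.
    INSERT INTO insurers VALUES (1, 'Acme Health'), (2, 'Borealis Mutual');
    INSERT INTO patients VALUES (1, 'Ana', 1), (2, 'Ben', 1), (3, 'Cy', NULL);
""")

def patient_rows(join_type):
    """Count rows returned when patients are joined to insurers."""
    # join_type comes only from the fixed strings below, never from user input.
    sql = (
        "SELECT COUNT(*) FROM patients p "
        f"{join_type} JOIN insurers i ON p.insurer_id = i.id"
    )
    return conn.execute(sql).fetchone()[0]

left_count = patient_rows("LEFT OUTER")   # keeps the uninsured patient
inner_count = patient_rows("INNER")       # drops the uninsured patient
print(left_count, inner_count)
```

With this sample data, the left outer join returns three rows and the inner join returns two, so a "patient count" metric would differ by a third depending on which model the user happened to pick.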
Moreover, the choose-a-model method creates a bottleneck for those more-advanced users who require custom models. Soliciting data models from IT hamstrings advanced users and runs counter to the core mandate of self-service data analytics solutions.
Thankfully, there is dynamic data modeling. Sophisticated algorithms coupled with admin-defined defaults have the potential to make selecting data intuitive to business users while also giving analysts full control over their own models.
More and more, business intelligence providers are being confronted with the reality that one size does not fit all and maybe never did. As the Eckerson Group’s “A Reference Architecture for Self-Service Analytics” outlines, different classes of end users require different user experiences:
“There are many reasons why it’s difficult to achieve the promise of self-service analytics. One of the biggest is that self-service analytics is not a homogeneous thing that can be universally deployed. Self-service analytics means different things to different people.”
To genuinely accommodate the wide range of users who require self-service analytics, BI providers need to provide a flexible UX design, and that extends to data modeling. Business users who build ad hoc reports (Eckerson calls these users “data explorers”) should be able to simply select the tables they want in their report and have the application determine the optimal join path between them. The model is built dynamically by the application itself as users add and remove tables. They don’t even need to know they’re building models; it just happens in the background.
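One way to picture how an application might determine a join path is as a search over a graph of table relationships. The sketch below is not any vendor's algorithm: it assumes "optimal" simply means the fewest joins and uses a breadth-first search, and every table name other than Patients and Insurers is hypothetical.

```python
from collections import deque

# Hypothetical admin-defined relationships: each pair can be joined directly.
RELATIONSHIPS = [
    ("Patients", "Visits"),
    ("Visits", "Providers"),
    ("Patients", "Insurers"),
    ("Providers", "Clinics"),
]

def build_graph(relationships):
    """Turn a list of joinable table pairs into an undirected adjacency map."""
    graph = {}
    for left, right in relationships:
        graph.setdefault(left, set()).add(right)
        graph.setdefault(right, set()).add(left)
    return graph

def join_path(graph, start, goal):
    """Return the shortest chain of tables linking start to goal, or None."""
    if start not in graph or goal not in graph:
        return None
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        # Sorted so the result is deterministic when several paths tie.
        for neighbor in sorted(graph[path[-1]]):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None

GRAPH = build_graph(RELATIONSHIPS)
path = join_path(GRAPH, "Insurers", "Clinics")
print(" -> ".join(path))
```

A user who selects only Insurers and Clinics never sees the intermediate tables; the application fills in Patients, Visits, and Providers on their behalf. A production system would likely weigh more than path length, such as join types and admin preferences.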
The admin-defined data framework guiding the dynamic models will, like any default, satisfy only the majority of use cases, roughly 80% as a rule of thumb. The other 20% of the time, more advanced users may want to deviate from the default model, making fine-tuned adjustments by hand.
If a particular user group or department must routinely use models that deviate from the default framework, admins may set up alternative frameworks, using labeling to steer users to data objects using the framework best for them. The models they build from there would still be constructed dynamically, just according to a different set of rules.
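The relationship between a default framework, an alternative framework, and a hand-tuned adjustment can be sketched as a simple lookup with an override. This is an illustration under assumed names (DEFAULT_FRAMEWORK, BILLING_FRAMEWORK, and resolve_join are all hypothetical), not a description of how any particular BI product stores its rules.

```python
# Hypothetical frameworks: each maps a table pair to its default join type.
DEFAULT_FRAMEWORK = {("Patients", "Insurers"): "LEFT OUTER"}
BILLING_FRAMEWORK = {("Patients", "Insurers"): "INNER"}

def resolve_join(framework, left, right, overrides=None):
    """Pick a join type: a user's override wins, else the framework default."""
    key = (left, right)
    if overrides and key in overrides:
        return overrides[key]
    return framework[key]

# A business user accepts whatever the default framework dictates.
business_user = resolve_join(DEFAULT_FRAMEWORK, "Patients", "Insurers")

# A department steered to an alternative framework gets different rules.
billing_user = resolve_join(BILLING_FRAMEWORK, "Patients", "Insurers")

# An analyst keeps the default framework but adjusts one join by hand.
analyst = resolve_join(
    DEFAULT_FRAMEWORK, "Patients", "Insurers",
    overrides={("Patients", "Insurers"): "INNER"},
)
print(business_user, billing_user, analyst)
```

In all three cases the model is still assembled by the application; only the source of the rules changes.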
Progressive UX design anticipates users’ needs even when they don’t realize their needs are going unmet. Smart, dynamic algorithms that intuit users’ modeling needs, bridging the knowledge gap for business users while giving ample control to power users, represent the next wave of self-service BI innovation in interaction design.
Original article appeared on Dataversity.