Real-Time Data Analytics Best Practices
Apr 7, 2020
What is real-time data analytics?
Though it sounds straightforward and remains a longstanding priority for data-driven companies, real-time analytics is often poorly understood. Many define it as the ability to access and manipulate data immediately or very soon after it enters a system, but this definition raises an important question: How soon is “soon”? Is it seconds later? Minutes later? Days later?
It depends. Contrary to popular opinion, real-time data analytics isn’t a feature that BI solutions either do or do not have. Rather, it’s a benchmark for how fast BI solutions can deliver insights relative to the demand for those insights. What qualifies as “real time,” therefore, is subject to the use case in question. As such, a more accurate definition of real-time data analytics would be the ability to access and manipulate data when desired.
For a warehouse reporting on which drivers are on premises, “real time” might mean less than the time it takes a truck to get from the warehouse’s front gate to its loading docks. For an HR firm pulling current mailing addresses, it might mean less than the number of hours between the start of today and the close of the last business day. Different people in different situations need their data updated at different intervals.
By this more accurate definition, real-time data analytics is a near-universal requirement. No organization, given an alternative, would opt to reach for a piece of information only to find it unavailable. Data bottlenecks cause operational inefficiencies with real, sometimes severe, consequences. Since real-time analytics isn’t so much a feature as a benchmark, it’s important to understand which processes will have the most impact on your time to insight.
How do I reduce time to insight?
In a BI scenario, the leading factor affecting how long it takes to perform an analytical task is the volume of data being returned from the database. Even with modern hardware innovations, passing information from one server to another takes time. The more data being sent, the longer the transfer will take.
This is why data warehousing is such common practice. A database optimized for data entry looks very different from one optimized for data extraction, so organizations will often maintain two: one for raw, transactional data, and the other for that same data after it’s been prepared for analysis. The process by which the data is extracted from the transactional database and ported into the warehouse is known as ETL, or Extract, Transform, Load, and is generally performed at regular intervals depending on need.
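The ETL pattern described above can be sketched in a few lines. This is a minimal illustration using Python’s built-in sqlite3 module with two in-memory databases standing in for the transactional source and the warehouse; the table and column names are invented for the example.

```python
import sqlite3

# Illustrative transactional source: raw order rows as they were entered.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE orders (id INTEGER, customer TEXT, amount REAL)")
src.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(1, "acme", 120.0), (2, "acme", 80.0), (3, "globex", 50.0)])

# Illustrative warehouse target, shaped for analysis rather than data entry.
wh = sqlite3.connect(":memory:")
wh.execute("CREATE TABLE customer_totals (customer TEXT, total REAL)")

# Extract and Transform: the aggregation happens as the data leaves the source.
rows = src.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer").fetchall()

# Load: the prepared rows land in the warehouse, ready for fast reads.
wh.executemany("INSERT INTO customer_totals VALUES (?, ?)", rows)

totals = dict(wh.execute("SELECT customer, total FROM customer_totals"))
```

In a production pipeline this job would run on a schedule, matching the interval at which the business needs fresh data.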
How you transform your data during ETL can dramatically reduce your analytical time to insight. Denormalizing the data set so as to reduce the number of tables involved in a query has a profound impact on query times, as does introducing new summary tables containing the results of commonly performed calculations. Combining records from multiple data sources in one warehouse also helps cut back on data transfer.
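To make the denormalization point concrete, here is a hedged sketch in Python’s sqlite3: the join and aggregation are paid once when a summary table is built, so the report-time query becomes a single-table scan. Schema and names are illustrative, not drawn from any particular product.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Normalized schema: a report joining these tables pays the join on every run.
db.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT)")
db.execute("CREATE TABLE sales (customer_id INTEGER, amount REAL)")
db.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "east"), (2, "west")])
db.executemany("INSERT INTO sales VALUES (?, ?)",
               [(1, 100.0), (1, 50.0), (2, 75.0)])

# Denormalized summary table: the join and SUM are computed once, during ETL.
db.execute("""
    CREATE TABLE sales_by_region AS
    SELECT c.region AS region, SUM(s.amount) AS total
    FROM sales s JOIN customers c ON c.id = s.customer_id
    GROUP BY c.region
""")

# Report-time query: no join, no aggregation, just a small table read.
summary = dict(db.execute("SELECT region, total FROM sales_by_region"))
```

The trade-off is freshness: the summary table is only as current as the last ETL run, which is why the refresh interval should follow from the use case.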
Data warehouses do take time and resources to build, however, and not all organizations have one ready to hand. If you don’t have a warehouse or find your warehouse unable to meet your real-time needs, there are ways that analytics tools can help bridge the gap.
What should I look for in an analytics solution?
If real-time analytics is a priority for you, look for a BI solution with configuration settings that allow it to pull less data from the database and process that data faster. These types of features will reduce your time to insight whether or not you have a data warehouse. Exago BI can report directly off transactional databases, and below are some of the tools it leverages to boost data retrieval speeds:
Filtering
The simplest way to reduce the number of records being returned from the database is to apply filters. Exago BI not only allows users to apply multiple filters to a report but also permits advanced filtering logic and filter formulas. Report authors may require those consuming the report to select filter values before executing it. Exago BI administrators even have the ability to optimize filtering when reporting off stored procedures.
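The payoff of filtering comes from where the filter runs. This generic sketch (it does not show Exago BI’s internals; names are invented) contrasts filtering in application memory, where every row crosses the wire first, with a WHERE clause that keeps non-matching rows on the database server.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE drivers (name TEXT, on_premises INTEGER)")
db.executemany("INSERT INTO drivers VALUES (?, ?)",
               [("lee", 1), ("kim", 0), ("ada", 1)])

# Unfiltered: every row is transferred, then discarded in application memory.
all_rows = db.execute("SELECT name, on_premises FROM drivers").fetchall()
on_site_slow = [name for name, flag in all_rows if flag]

# Filtered: the WHERE clause discards non-matching rows on the database side,
# so only the rows the report needs are ever transferred.
on_site_fast = [name for (name,) in
                db.execute("SELECT name FROM drivers WHERE on_premises = ?",
                           (1,))]
```

With three rows the difference is invisible; with millions, the filtered query transfers a fraction of the data and returns in a fraction of the time.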
Report Scheduling
If you can predict when you’ll need a report, schedule it to appear in your inbox by the appointed time. Even if the execution takes a little while, it will happen behind the scenes and therefore require no waiting. Exago built its scheduler from scratch to ensure that it would be optimal for reporting and later added its scheduler queue for even more efficient data processing. Exago BI is well equipped to handle recurring reporting tasks so that the data will be ready when you need it.
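The idea behind scheduling is simply to shift the slow execution to a time before the result is needed. A toy sketch using Python’s standard-library sched module (the report function and short delay are stand-ins; a real scheduler would run in a worker process on a cron-like interval):

```python
import sched
import time

inbox = []

def run_report():
    # Stand-in for a slow report execution that ends with a delivered result.
    inbox.append("weekly_report")

# Queue the report to run in the background ahead of when it's needed.
scheduler = sched.scheduler(time.monotonic, time.sleep)
scheduler.enter(0.01, priority=1, action=run_report)
scheduler.run()  # in a real system this loop lives in a background worker
```

By the time the consumer checks their inbox, the expensive work has already happened, so perceived time to insight is effectively zero.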
Report Execution Caching
For reports you’ll need at a moment’s notice but want as up-to-date as possible, there’s execution caching. The report executes behind the scenes at a frequency set by the report author so that, when someone executes the report, the data can be swiftly retrieved from memory rather than pulled from the database.
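The mechanism behind execution caching can be sketched as a small time-to-live cache. This is an illustrative model of the pattern, not Exago BI’s implementation; the class and parameter names are invented.

```python
import time

class ReportCache:
    """Re-runs a report at most once per `ttl` seconds; otherwise serves
    the last result from memory."""

    def __init__(self, run_report, ttl):
        self.run_report = run_report   # callable that actually hits the database
        self.ttl = ttl                 # refresh frequency set by the report author
        self._result = None
        self._stamp = float("-inf")

    def get(self):
        now = time.monotonic()
        if now - self._stamp >= self.ttl:
            self._result = self.run_report()   # slow path: query the database
            self._stamp = now
        return self._result                    # fast path: serve from memory

calls = []
cache = ReportCache(lambda: calls.append(1) or len(calls), ttl=60)
first, second = cache.get(), cache.get()
# Both reads return the cached result; the underlying report ran only once.
```

The author-chosen refresh frequency is what ties this back to the “how soon is soon?” question: the cache’s age bound should match the use case’s tolerance for stale data.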
Programmable Data Objects
Exago BI can connect directly to programmable data objects such as stored procedures and .NET assemblies so that you can do some data manipulation on the fly without a warehouse. Again, the more data preparation you do ahead of time, the quicker your report executions will be.
Grouping and Aggregating in the Database
Grouping and aggregating can sometimes mean pulling more records from the database than will appear in the report output itself. Exago BI circumvents this by allowing you to have grouping and aggregate calculations take place on the database server whenever possible. If enabled, qualifying reports will run more quickly and place less burden on the web server than they would otherwise.
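The difference between aggregating on the web server and pushing the work down to the database can be shown directly. In this generic sqlite3 sketch (illustrative schema, not Exago BI internals), both paths produce the same totals, but the pushed-down version transfers one row per group instead of every detail row.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sales (region TEXT, amount REAL)")
db.executemany("INSERT INTO sales VALUES (?, ?)",
               [("east", 100.0), ("east", 50.0), ("west", 75.0)])

# Aggregating on the web server: every detail row is transferred first.
detail_rows = db.execute("SELECT region, amount FROM sales").fetchall()
totals_app = {}
for region, amount in detail_rows:
    totals_app[region] = totals_app.get(region, 0.0) + amount

# Aggregating in the database: only one row per group crosses the wire.
totals_db = dict(
    db.execute("SELECT region, SUM(amount) FROM sales GROUP BY region"))

rows_transferred = (len(detail_rows), len(totals_db))  # detail vs. grouped
```

Same answer either way; the grouped query simply moves the arithmetic to the server that already holds the data, cutting both transfer volume and web-server load.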
Storing Database Schema
Retrieving metadata and schema information adds to report execution time. Exago BI allows administrators to store this information locally to reduce time to insight.
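Locally stored schema amounts to memoizing metadata lookups so repeat executions skip the round trip. A hedged sketch of that pattern using sqlite3’s table-info pragma (the helper names and cache shape are invented for illustration):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER, customer TEXT, amount REAL)")

lookups = []

def fetch_schema(table):
    """Slow path: asks the database for column metadata."""
    lookups.append(table)
    return [row[1] for row in db.execute(f"PRAGMA table_info({table})")]

schema_cache = {}

def columns(table):
    # Fast path: serve the column list from the local cache when possible.
    if table not in schema_cache:
        schema_cache[table] = fetch_schema(table)
    return schema_cache[table]

cols_first = columns("orders")   # queries the database once
cols_second = columns("orders")  # served from the local store
```

Schema changes rarely, so a local copy stays valid for long stretches and only needs refreshing when the database structure itself changes.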
The above is by no means a complete list of Exago BI’s performance-enhancing features, but it covers those most relevant to real-time data analytics. When browsing the market for a BI solution, be sure to inquire about similar capabilities, as these will offer both speed and flexibility. Multidimensional databases, or “cubes,” are another way to speed up data retrieval, for example, but they are cumbersome to set up and can make it more difficult to access a range of sources. Maintain a healthy skepticism of vendors that claim to have real-time analytics or talk about it as though it were an objective feature. Remember, your use cases dictate what “real time” actually means, so use those requirements to guide your decision-making regarding analytics tools.