by Avon S Puri

Rethinking data management for faster time to value

Opinion
06 Apr 2021 | 5 mins
Big Data | Business Intelligence | Data Management

Enterprises should abandon traditional ETL approaches and strategies and adopt future state architectures that distribute the data transformation burden.


As enterprises move ahead on their digital transformation journeys, the massive trail of data from digital transactions grows steadily, but for many organizations, extracting intelligence from that data remains a pipe dream.

According to analyst firm IDC’s Worldwide Global DataSphere Forecast, 2021–2025, business and consumer data is amassing at a compound annual growth rate (CAGR) of about 23% over the forecast period, with a 28% CAGR attributed to enterprises, and is expected to reach 180 zettabytes by 2025. Data created in the cloud is growing at 36% annually, while data collected at the edge through various IoT and sensing devices is growing at 33% annually and will make up 22% of the total global datasphere by 2025.

For enterprises, the task of making data compute-ready becomes more complex as the amount of data grows, yet companies spend little time and effort developing effective data management processes and platforms to make that data easily actionable. For example, many companies collect massive amounts of digital transaction data pertaining to their customers, orders, product use, install base, service tickets, crash logs, and market intelligence, but have no good way of creating a 360-degree view of each customer or their business, despite having more technology choices than ever for extracting intelligence from data.

Many enterprises have reached a point where it is clear that the sheer amount of data they possess neither provides a lasting competitive advantage nor lets them easily unlock value from it. At the same time, this expanded data ownership raises confidentiality concerns and enforcement costs, and adds to the complexity of their environments.

Toward a better data management strategy

Current state architectures are a result of amassing data without first developing a strategy for effectively and intelligently using that data, implementing a complex mix of technologies and fragmented processes, and relying on data engineering practices that are based on a very weak data foundation. 

For the most part, these foundations are based on the extract, transform, and load (ETL) method: extracting data from a number of sources, transforming it into a specific format on an ETL server, and then loading it into a data warehouse where it can be analyzed and, ideally, presented as business intelligence. However, the transformation step can be complex and compute-intensive, because the raw data must be translated into a format that line-of-business databases can recognize and use. It can also take significant time, since the process involves a lot of I/O activity, string processing, and data parsing.
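To make that concrete, here is a minimal sketch of a traditional ETL flow in Python. The source files, the customer_360 target table, and the warehouse.insert() call are all hypothetical; the point is that every record is parsed and joined on the ETL server before anything reaches the warehouse.

# Minimal ETL sketch: all parsing and joining happens on the ETL server
# before the warehouse sees a single row. Sources, schema, and the
# warehouse client API are hypothetical.
import csv
import json

def extract(order_csv_path, ticket_json_path):
    # Pull raw records out of line-of-business systems.
    with open(order_csv_path) as f:
        orders = list(csv.DictReader(f))
    with open(ticket_json_path) as f:
        tickets = json.load(f)
    return orders, tickets

def transform(orders, tickets):
    # The compute-intensive middle step: string parsing, format
    # normalization, and in-memory joins into the warehouse schema.
    tickets_by_customer = {}
    for t in tickets:
        tickets_by_customer.setdefault(t["customer_id"], []).append(t)
    return [
        {
            "customer_id": o["customer_id"],
            "order_total": float(o["total"]),  # string-to-number parsing
            "open_tickets": len(tickets_by_customer.get(o["customer_id"], [])),
        }
        for o in orders
    ]

def load(rows, warehouse):
    # Only fully transformed rows are ever loaded.
    warehouse.insert("customer_360", rows)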

A better data management strategy starts with shuffling the letters “ETL” a bit and employing a process that begins with the extraction of the data, then loading it into specific data repositories that individually transform the data into a more useful and relevant form. This ELT methodology loads the data into your target system before transforming it, shifting those duties to individual cloud‐based data warehouses.

Instead of using a single ETL engine/server to transform all the structured and unstructured raw data, an ELT approach channels segments of the data to specific cloud data warehouses, where those portions are individually transformed. The result is less I/O time and faster parsing.
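The same pipeline rearranged as ELT might look like the sketch below. The warehouse client and its copy_into() and execute() methods are illustrative stand-ins for whatever your cloud warehouse's SDK actually provides; the essential change is that raw data lands first and the warehouse's own engine does the transformation.

# ELT sketch: load raw data into staging tables as-is, then transform
# it inside the warehouse. The client API shown is illustrative, not
# any specific vendor's SDK.
def run_elt(raw_orders_path, raw_tickets_path, warehouse):
    # 1. Extract + Load: copy the raw files straight into staging tables.
    warehouse.copy_into("raw_orders", raw_orders_path)
    warehouse.copy_into("raw_tickets", raw_tickets_path)

    # 2. Transform: push the parsing and joining down to the warehouse,
    #    which runs it in parallel, close to the data.
    warehouse.execute("""
        CREATE OR REPLACE TABLE customer_360 AS
        SELECT o.customer_id,
               CAST(o.total AS DOUBLE) AS order_total,
               COUNT(t.ticket_id)      AS open_tickets
        FROM raw_orders o
        LEFT JOIN raw_tickets t ON t.customer_id = o.customer_id
        GROUP BY o.customer_id, o.total
    """)

Because the transformation runs where the data already lives, each warehouse can apply only the transformations its consumers need, which is what distributes the transformation burden.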

Less chaos, more intelligence

Future state data architectures, based on an ELT structure, will focus on building a strong data foundation layer and a platform‐based approach to provide an all‐encompassing data management solution for the entire organization. Whether it is IoT data, clickstreams, sales and marketing intelligence, business metrics, or user analytics, future architectures will rely on a cohesive platform to reduce the gap between the acquisition of data and unlocking value. 

Some of the key considerations for the future state architecture are:

  • Implementation of foundation layer capabilities, including connectors, event streaming, source writebacks, and MapReduce. A layer above it will comprise data lifecycle management, data modeling, schema enforcement, data privacy, governance, consents, security, data projects, and stewardship (see the sketch after this list).
  • At the heart of this architecture is a discovery and self‐learning engine that can crawl and retrieve data from various sources in the ecosystem—constantly adapting to changing business needs and ingesting the right amount of compute‐ready data.
  • To meet the realities of complying with data privacy regulations, data structure and persistence abstraction is required to provide solutions for data residency.
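As a rough illustration of how these considerations might hang together, the sketch below describes the layers as a declarative Python structure. Every name in it is hypothetical; it is a way of visualizing the stack, not a product configuration.

# Illustrative (hypothetical) declarative view of the foundation,
# management, and discovery layers named above; names are examples,
# not any product's API.
future_state_platform = {
    "foundation": {
        "connectors": ["crm", "erp", "clickstream", "iot_gateway"],
        "event_streaming": {"topics": ["orders", "telemetry"]},
        "source_writebacks": True,
        "batch_processing": "mapreduce-style",
    },
    "management": {
        "lifecycle": "ingest -> curate -> publish",
        "schema_enforcement": "on-ingest",
        "privacy": {"consents": "per-record", "residency": "region-pinned"},
        "governance": ["lineage", "stewardship", "access-policies"],
    },
    "discovery_engine": {
        "mode": "self-learning",
        "crawl_sources": "auto",          # adapts as business needs change
        "ingest": "compute-ready only",   # avoid amassing raw, unusable data
    },
}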

The end goal of future state architectures is to eliminate long-running queries and joins over business data by acquiring data elements that are already compute-ready, making optimal use of data storage and processing resources. This will not only reduce the amount of data stored to a fraction of what we store today, but also increase the speed at which businesses can unlock useful and actionable business intelligence.