Data lake & streamlined data extraction

March 19, 2024

Key figures

Customer: Globally active manufacturer of energy solutions

Project volume: 40,000 USD, 3 months

Project scope

A manufacturer and designer of energy solutions needed a comprehensive solution to streamline data extraction from various sources and ingest it into a data warehouse. This was a prerequisite for essential operations such as normalization and calculation, ultimately allowing clean data to be used in a data visualization tool. Several critical factors drove the decision to implement this solution:

  1. Diverse Data Sources: Data had to be extracted from multiple sources, including an ERP system, an analytics API, an external partner API, and databases, each requiring specific developer skills. This also incurred additional extraction costs, especially when dealing with diverse APIs and databases.
  2. Data Disparity: The sources delivered data in different formats and with differing identifiers and entity names, so transformation and normalization were required to ensure consistency and coherence across the entire dataset.
  3. Manual Processes: The existing workflow relied on manual interventions to import clean data into the data visualization tool, which was both time-consuming and error-prone.

The overarching goal was to achieve the following improvements:

  • Cost and Effort Optimization: By introducing a more streamlined process, the aim was to reduce both human and machine costs associated with data extraction and manipulation.
  • Automation of Data Processes: The implementation sought to automate the transfer and transformation of data, eliminating the need for manual interventions and enhancing overall efficiency.
  • Accessibility for Non-Technical Users: The objective was to establish a sustainable system that could be effectively managed by non-technical personnel, ensuring long-term viability and ease of operation.

Implementation

The implemented solution adopted an Extract, Load, Transform (ELT) approach, with Fivetran as the data integration tool, Snowflake as the data warehouse, and DBT (Data Build Tool) for data transformation. Fivetran automated the extraction of data from the diverse sources, including the ERP, analytics API, external partner API, and databases, simplifying the process and minimizing manual effort.
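The load step of this ELT pattern can be sketched as follows: raw records are persisted exactly as they arrive, and all cleaning is deferred to the warehouse layer. This is only an illustration of the principle; it uses Python with sqlite3 as a stand-in for Snowflake, and the table and field names are hypothetical. In the actual project this loading is handled entirely by Fivetran connectors rather than custom code, which is what removed the per-source developer effort described above.

```python
import json
import sqlite3

# ELT-style load step: raw records are stored exactly as they arrive
# (one JSON payload per row); all cleaning happens later in the warehouse.
# sqlite3 stands in for Snowflake; names are hypothetical, not the
# customer's actual schema.
records = [
    {"plant": "PLANT-7", "inverter": "INV-001", "kwh": 12.5},
    {"plant": "PLANT-7", "inverter": "INV-002", "kwh": 11.25},
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE raw_partner_api ("
    "loaded_at TEXT DEFAULT CURRENT_TIMESTAMP, "
    "payload TEXT)"  # the source record, kept verbatim as JSON
)
conn.executemany(
    "INSERT INTO raw_partner_api (payload) VALUES (?)",
    [(json.dumps(r),) for r in records],
)
conn.commit()
print(conn.execute("SELECT COUNT(*) FROM raw_partner_api").fetchone()[0])  # 2
```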

The chosen data warehouse, Snowflake, provided a robust platform for storing and managing the extracted data. Its scalability and performance capabilities were instrumental in handling the diverse datasets efficiently. Additionally, the integration of DBT facilitated the necessary transformations, normalizations, and calculations on the raw data, ensuring consistency and coherence across the entire dataset.
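The kind of in-warehouse normalization described here can be sketched as follows: two sources refer to the same plant under different identifiers, and a mapping table resolves them to a single canonical id before aggregation. In the project this logic is expressed as DBT models running on Snowflake; the Python/sqlite3 snippet below and all of its table and column names are assumptions used purely for illustration.

```python
import sqlite3

# Sketch of warehouse-side normalization: two sources use different
# identifiers for the same plant, and a mapping table resolves them to one
# canonical id before aggregation. All names and values are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE raw_erp     (plant_code TEXT, kwh REAL);
CREATE TABLE raw_partner (site_name  TEXT, kwh REAL);
CREATE TABLE plant_map   (source TEXT, source_id TEXT, canonical_id TEXT);

INSERT INTO raw_erp     VALUES ('P-0007', 12.5);
INSERT INTO raw_partner VALUES ('Plant Seven', 11.25);
INSERT INTO plant_map   VALUES ('erp', 'P-0007', 'PLANT_7'),
                               ('partner', 'Plant Seven', 'PLANT_7');
""")

normalized = conn.execute("""
SELECT m.canonical_id, SUM(u.kwh) AS kwh
FROM (
    SELECT 'erp' AS source, plant_code AS source_id, kwh FROM raw_erp
    UNION ALL
    SELECT 'partner', site_name, kwh FROM raw_partner
) AS u
JOIN plant_map AS m
  ON m.source = u.source AND m.source_id = u.source_id
GROUP BY m.canonical_id
""").fetchall()
print(normalized)  # [('PLANT_7', 23.75)]
```

Keeping the identifier mapping in a table rather than hard-coding it in queries is one common way to make such normalization maintainable without code changes.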

For data visualization, Power BI was employed, offering a user-friendly interface for non-technical users to explore and comprehend the insights derived from the transformed and cleaned data. This combination of Fivetran, Snowflake, DBT, and Power BI created a seamless end-to-end solution, optimizing the ELT process, enhancing automation, and empowering non-technical stakeholders to derive meaningful insights from the data.

The implementation of this integrated solution yielded substantial gains across various dimensions of the organization's data management and analysis processes.

Use-Case: Inverter Availability and Data Gap Identification

One prominent use-case within the implemented solution revolves around monitoring inverter availability, a critical aspect of the energy solutions domain. Occasionally, data retrieval is delayed by communication issues, leaving gaps in the data series. To address this challenge, a mechanism was deployed to identify and backfill these data gaps.

The system recognizes instances where inverter data is not available at the scheduled fetch. Leveraging DBT, the solution detects these gaps and triggers an automated process to fetch the missing data from the source. This backfilling mechanism ensures continuity in the data series, mitigating the impact of temporary unavailability and maintaining the integrity of the overall dataset.
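A minimal sketch of the gap-detection logic reads as follows. It assumes readings are expected at a fixed 15-minute interval and flags every window in which readings are missing so it can be re-fetched; the interval, sample timestamps, and helper function are assumptions for illustration, since the deployed implementation expresses this logic in DBT on Snowflake.

```python
from datetime import datetime, timedelta

# Illustrative gap detection for inverter readings expected every 15 minutes.
# The interval and sample data are assumptions for this sketch.
EXPECTED_INTERVAL = timedelta(minutes=15)

def find_gaps(timestamps):
    """Return (start, end) windows in which readings are missing."""
    gaps = []
    ordered = sorted(timestamps)
    for previous, current in zip(ordered, ordered[1:]):
        if current - previous > EXPECTED_INTERVAL:
            # Everything strictly between the two readings needs backfilling.
            gaps.append((previous + EXPECTED_INTERVAL, current))
    return gaps

readings = [
    datetime(2024, 3, 1, 8, 0),
    datetime(2024, 3, 1, 8, 15),
    # 8:30 and 8:45 missing, e.g. due to a communication outage
    datetime(2024, 3, 1, 9, 0),
]

for start, end in find_gaps(readings):
    # In the real pipeline, a detected window triggers a re-fetch from the
    # source and the results are loaded back into the warehouse.
    print(f"backfill window: {start} -> {end}")
```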

By proactively addressing and resolving communication-related delays, this use-case exemplifies the system's adaptability and resilience, ensuring that critical information on inverter availability remains consistently accurate and up-to-date for effective decision-making in the energy solutions ecosystem.

Gains from the Integrated Data Solution

Firstly, the adoption of an ELT approach, facilitated by Fivetran, Snowflake, and DBT, led to significant time and cost savings. The automated data extraction and transformation processes reduced the dependency on manual interventions, optimizing human and machine efforts.

The use of Snowflake as the data warehouse provided scalability and performance improvements, accommodating the diverse datasets seamlessly. This scalability translated into enhanced processing speed and efficiency, further contributing to improved overall system performance.

DBT's contribution was pivotal in ensuring data consistency and reliability. The ability to transform and normalize data effectively streamlined operations, enabling more accurate and reliable insights.

With Power BI as the chosen data visualization tool, non-technical users gained access to a user-friendly interface, empowering them to explore and interpret data insights independently. This not only improved decision-making but also fostered a culture of data-driven decision support across the organization.

In essence, the gains encompassed operational efficiency, cost reduction, improved data reliability, and enhanced accessibility for non-technical users, collectively positioning the organization for more informed and strategic decision-making.