Declined

“Data Lake” Simulation of Data Processing and Analysis with Schema on Read Approach

Authors

1

Muhammad Irfan Zuhri

Institut Teknologi Sepuluh Nopember

2

Ade Irma Rosida Wijaya

3

Dwi Oktavianto Wahyu Nugroho

Abstract

Advances in digital technology have increased the volume, velocity, and variety of data generated by modern systems. Organizations must now manage structured, semi-structured, and unstructured data from diverse sources, which creates the need for a flexible and scalable storage architecture. A data lake is a storage architecture designed to hold data in its original form without imposing a predefined structure, so that structured, semi-structured, and unstructured data can be integrated within a single platform. This study implements a simulation of data processing and analysis using a schema-on-read approach, in which the schema is applied only when the data is accessed or analyzed. The simulation illustrates the flow of ingestion, storage, metadata management, and analytical processes that take advantage of the flexibility of schema-on-read. The result is an overview of how a data lake can support adaptive big data analysis without requiring complex upfront transformations.
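
The schema-on-read idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' simulation: the records, field names, and the `read_with_schema` helper are hypothetical, chosen only to show raw data being stored untouched and a schema being applied at the moment of analysis.

```python
import json

# Hypothetical raw zone: records are kept exactly as ingested,
# with no schema enforced at write time.
RAW_ZONE = [
    '{"id": "1", "temp": "21.5", "site": "A"}',
    '{"id": "2", "temp": "19.0"}',
    '{"id": "3", "temp": "n/a", "site": "B", "extra": true}',
]

# Hypothetical schema, applied only at read time:
# field name -> (converter, default used when the field is missing or invalid).
READ_SCHEMA = {
    "id": (int, None),
    "temp": (float, None),
    "site": (str, "unknown"),
}


def read_with_schema(raw_lines, schema):
    """Parse raw JSON lines and project each record onto the schema on read."""
    rows = []
    for line in raw_lines:
        record = json.loads(line)
        row = {}
        for field, (convert, default) in schema.items():
            try:
                row[field] = convert(record[field])
            except (KeyError, ValueError, TypeError):
                # Missing or unparseable values fall back to the default
                # instead of blocking ingestion.
                row[field] = default
        rows.append(row)
    return rows


# Analysis step: the schema is applied here, not when the data was stored.
rows = read_with_schema(RAW_ZONE, READ_SCHEMA)
valid_temps = [r["temp"] for r in rows if r["temp"] is not None]
mean_temp = sum(valid_temps) / len(valid_temps)
```

A different analysis could define its own schema over the same raw records, for example one that also reads the `extra` field, without rewriting anything in storage.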

Publication Info

Submitted
18 December 2025

Publication History

Submitted

18 Dec 2025

Editorial Decision

24 Dec 2025