The Bronze Data Layer – Building a Resilient Foundation for Raw Data Ingestion

Justyna
PMO Manager at Multishoring

Main Problems

  • The "Data Swamp" Risk
  • Operational Impact
  • Schema Drift
  • Cost & Compliance

Most data projects do not fail because the math is wrong. They fail because the starting data is flawed. When an engineer finds a mistake in the calculations six months later, the ability to fix it depends on one thing: did you keep the original copy?

Executive summary

In the medallion architecture, the bronze layer is the first stop for your data. Some teams treat it like a temporary dumping ground. This is a mistake. This layer is your safety net. It is the only place where the data exists exactly as it came from the source.

Your operational systems, like a CRM or ERP, change constantly. They overwrite old records, update customer statuses, and delete canceled orders. Once that happens in the source, the history is lost.

A proper raw data ingestion strategy prevents this loss. It creates a permanent archive. It captures every record exactly as it arrived. This allows you to “replay” the past if you find a bug or if your reporting needs change. This article explains why the bronze layer is the foundation of a healthy data lakehouse and how to build it correctly.

What is the Bronze Layer in Medallion Architecture?

The Bronze layer, often called the Landing Zone, is the storage area where data arrives from external systems. It holds a precise copy of the information in its original format.

In the data lake's bronze-silver-gold pattern, this layer has one rule: do not touch the data.

You do not correct typos. You do not remove duplicates. You do not convert currency. If the source system sends a file with errors, the Bronze layer accepts those errors. This “As-Is” approach is intentional.

It decouples data delivery from data processing. If the cleaning process fails later, your ingestion process keeps running without interruption.

Key Characteristics

  • Raw Fidelity: The table structure mirrors the source. If your CRM has a column called CUST_NM, your Bronze table has CUST_NM, not CustomerName.
  • Metadata Tags: While you do not change the data, you do add to it. You append technical columns to track context. Common additions include the ingestion timestamp (when it arrived), the source system ID (where it came from), and the file name.
  • Efficient Formats: Engineers typically store this data in compressed, open formats like Parquet or Delta Lake. These formats are cheaper to store and faster to read than standard CSV files or JSON blobs. A short write sketch follows this list.
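
A minimal PySpark sketch of these characteristics, assuming a CSV drop from a CRM and a Delta Lake Bronze table. The paths, the "crm" label, and the metadata column names are illustrative, and the Delta Lake package is assumed to be configured:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import current_timestamp, input_file_name, lit

spark = SparkSession.builder.appName("bronze_ingest").getOrCreate()

# Read the extract exactly as it arrived -- no renames, no filters, no fixes.
raw = (spark.read
       .option("header", "true")
       .csv("s3://landing/crm/customers/2024-06-01/"))

# Add technical metadata columns without touching the source columns.
bronze = (raw
          .withColumn("_ingest_ts", current_timestamp())
          .withColumn("_source_system", lit("crm"))
          .withColumn("_source_file", input_file_name()))

# Store in an efficient open format; append-only keeps history intact.
(bronze.write
 .format("delta")
 .mode("append")
 .save("s3://lakehouse/bronze/crm/customers/"))
```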

Struggling with a chaotic “Data Swamp”?

We specialize in building resilient Data Lakehouse foundations. From low-impact CDC ingestion to cost-optimized Bronze layer architecture, our experts ensure your raw data is safe, audit-ready, and scalable.

EXPLORE DATA SERVICES

Let’s discuss how to secure your data foundation.


Why Storing “Dirty” Data Matters for Business Continuity

A common question from business leaders is: “Why store bad data? Why not clean it immediately?”

You store it because business rules change, but history does not.

Imagine you calculate “Net Profit” by subtracting costs from revenue. You run this calculation during ingestion and discard the raw numbers. Six months later, the finance team decides that “Shipping Costs” should not be part of that specific calculation. If you did not keep the raw data, you cannot fix the past reports. You are stuck with the old numbers.

The Bronze layer acts as a permanent historical data archive. It allows you to reprocess your entire history based on new rules without asking your IT team to query the busy ERP system again.
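
Because the raw figures still sit in Bronze, a rule change like this becomes a reprocessing job, not a data-recovery project. A rough sketch, assuming hypothetical revenue, costs, and shipping_costs columns in a Bronze orders table (Spark session as in the earlier sketch):

```python
# Re-read the full history from Bronze and apply the *new* business rule.
orders = spark.read.format("delta").load("s3://lakehouse/bronze/erp/orders/")

# Old rule: net_profit = revenue - (costs + shipping_costs)
# New rule: shipping costs are excluded, so only `costs` is subtracted.
restated = orders.withColumn("net_profit", orders.revenue - orders.costs)

# Write the restated figures downstream; Bronze itself is never modified.
(restated.write
 .format("delta")
 .mode("overwrite")
 .save("s3://lakehouse/silver/finance/orders_restated/"))
```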

This also supports compliance. Auditors often ask to see exactly what the system received before any changes were made. The Bronze data layer provides this proof. It establishes clear data lineage, showing the exact path from the source to the final report. Because this data is rarely accessed, it can sit in cheaper “cold storage,” keeping costs low while maintaining safety.

[Figure: Medallion Architecture flow diagram highlighting the Bronze Layer. Data moves from source systems into the Bronze Layer, where it is stored as an immutable "As-Is" copy with added metadata tags, before moving on to the Silver and Gold layers. Key benefits shown: business continuity, compliance, and historical archiving.]

Bronze Data Ingestion – ELT Methodology and Change Data Capture

Speed is the priority in the bronze layer. To move fast, modern architectures use the ELT methodology (Extract, Load, Transform).

In the past, teams used ETL. They tried to transform the data before loading it. This created a bottleneck. If the transformation logic broke, the data never landed. With ELT, you load the data first (Bronze) and worry about fixing it later (Silver). This keeps the pipeline flowing.
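
A rough sketch of that ordering, reusing the illustrative paths from above: the load step contains no business logic, so a broken transformation never stops data from landing.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import current_timestamp

spark = SparkSession.builder.getOrCreate()

def load_to_bronze(source_path: str, bronze_path: str) -> None:
    """Extract + Load: land the data as-is, with only technical metadata."""
    raw = spark.read.option("header", "true").csv(source_path)
    (raw.withColumn("_ingest_ts", current_timestamp())
        .write.format("delta").mode("append").save(bronze_path))

def transform_to_silver(bronze_path: str, silver_path: str) -> None:
    """Transform: cleaning and business rules run later, against Bronze."""
    bronze = spark.read.format("delta").load(bronze_path)
    cleaned = bronze.dropDuplicates()  # illustrative transformation only
    cleaned.write.format("delta").mode("overwrite").save(silver_path)

# The two steps run on separate schedules; if transform_to_silver fails,
# load_to_bronze keeps appending and nothing is lost.
```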

The Role of Change Data Capture (CDC)

Loading data efficiently requires smart techniques. You should not copy the entire database every night. That is slow and expensive.

Instead, teams use Change Data Capture.

CDC software watches your source system. It identifies only what changed: new rows, updates to existing rows, or deletions. It grabs these specific changes and appends them to your Bronze layer. This allows for low-latency ingestion, meaning your data platform is only minutes behind the real world, rather than a whole day.
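
Here is what a CDC feed can look like once it reaches Bronze, sketched with generic change-event fields rather than any particular CDC tool's API. The JSON landing path and the assumption that each event carries an operation type ('insert', 'update', 'delete') are illustrative:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import current_timestamp, lit

spark = SparkSession.builder.getOrCreate()

# The CDC tool emits only the rows that changed, each tagged with an
# operation type. We append every event verbatim -- deletes included --
# so the full history of each source row survives in Bronze.
changes = spark.read.json("s3://landing/crm/customers_cdc/")

(changes
 .withColumn("_ingest_ts", current_timestamp())
 .withColumn("_source_system", lit("crm"))
 .write.format("delta")
 .mode("append")
 .save("s3://lakehouse/bronze/crm/customers_changes/"))
```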

Common Bronze Layer Errors – When Should You Transform?

The most common mistake we see in early implementations is “over-eagerness.” Engineers or analysts want to make the data look good immediately. They start renaming columns or filtering out null values before the data lands in Bronze.

Do not do this.

Transformations belong in the silver data layer, paving the way for the business-ready gold layer. Here is why applying them too early causes problems:

  • Renaming Columns: If you change cust_id to CustomerID in Bronze, and the source system adds a new field, your automated scripts might fail because the names no longer match.
  • Filtering Data: If you delete “test” accounts during ingestion, you might accidentally delete real customers who have “test” in their name. Once deleted at this stage, that data is gone.
  • Ignoring Schema Drift: Source systems change. A marketing tool might add a “Twitter Handle” field tomorrow. Your Bronze layer must be flexible enough to accept this new column without crashing (see the sketch after this list).
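
For the schema-drift point, open table formats such as Delta Lake can accept new source columns on write instead of failing the job. A minimal sketch, assuming Delta's mergeSchema option and a hypothetical new_batch DataFrame holding the latest extract:

```python
# If the marketing tool starts sending a new "twitter_handle" field,
# mergeSchema lets the Bronze table grow a column instead of crashing.
(new_batch.write
 .format("delta")
 .mode("append")
 .option("mergeSchema", "true")
 .save("s3://lakehouse/bronze/marketing/contacts/"))
```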

Treat the Bronze layer as a data quality firewall. It captures everything, good or bad, so you can decide how to handle it later in a controlled environment.

How Multishoring Optimizes Your Data Ingestion Strategy

Building a solid Bronze layer looks simple on paper, but the reality involves complex decisions about file formats, partition strategies, and connectivity. A mistake here ripples through your entire company. If the foundation is weak, the reports on the CEO’s desk will be wrong.

Multishoring helps organizations build resilient data architectures. Through our data integration consulting services, we design pipelines that handle high volumes of data and adapt to changes in your source systems. Whether you need custom database development or a migration to a modern cloud platform, we make sure your history is safe and your future is ready.

Is your data foundation solid enough to support AI and advanced data analytics? Stop building on shaky ground. Contact Multishoring today to discuss your integration strategy and ensure your raw data is secure, auditable, and ready for use.
