anjijava16/Databricks_fs

# Databricks_fs


✳️ Different Layers in the Databricks Lakehouse Architecture ✳️

✳️ Landing Layer (Native Format)
✅ This layer is optional and depends on the source systems and data.
✅ Landing is simply a container in the data lake that stores raw source data.
✅ It is the area where data lands from the data source before being processed into the Delta layers.
✅ Data from different external systems is ingested into the data lake in its native format.
✅ Landing holds source-system data as native files (CSV, JSON, XML, Parquet, ...).
✅ Landing data can be structured, semi-structured, or unstructured.
✅ Landing data arrives from different sources through batch or streaming processes.
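
To illustrate what "native format" means in practice, here is a minimal plain-Python sketch that parses CSV and JSON landing files into rows without changing their content. The helper name `read_landing_file` and the sample payloads are hypothetical; on Databricks this step would normally use Spark readers rather than the standard library.

```python
import csv
import io
import json


def read_landing_file(name: str, text: str) -> list[dict]:
    """Parse a landing file based on its extension (hypothetical helper)."""
    if name.endswith(".csv"):
        # CSV values stay as strings: landing data is kept as delivered.
        return list(csv.DictReader(io.StringIO(text)))
    if name.endswith(".json"):
        data = json.loads(text)
        # Accept either a single JSON object or a list of objects.
        return data if isinstance(data, list) else [data]
    raise ValueError(f"unsupported landing format: {name}")


# Hypothetical sample payloads standing in for files in the landing container.
csv_rows = read_landing_file("orders.csv", "id,amount\n1,10.5\n2,7.0\n")
json_rows = read_landing_file("orders.json", '[{"id": 3, "amount": 4.25}]')
```

Note that the CSV values remain strings here: no typing or cleansing is applied at the landing stage.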

✳️ Bronze Layer (Delta Format)
✅ Source data is converted and loaded in Delta format.
✅ Each day's data is appended to the Delta tables.
✅ Bronze tables are partitioned by updated_date/load_Date for better performance.
✅ Data from different external source systems is managed in the bronze layer.
✅ Table structures in this layer match the source system table structures "as-is".
✅ Bronze tables have additional metadata columns that capture the load date/time, process ID, etc.
✅ The focus in this layer is quick Change Data Capture and the ability to provide a historical archive of the source (cold storage).
✅ Bronze can be used for reload scenarios in the future.
✅ All historical data is managed here with audit columns.
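
The bronze pattern of "append as-is, plus audit columns" can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the function `to_bronze`, the column names `_source`, `_process_id`, `_load_ts`, and `load_date`, and the in-memory list standing in for a Delta table are all hypothetical, not part of any Databricks API.

```python
from datetime import datetime, timezone


def to_bronze(rows, source, process_id, loaded_at=None):
    """Return source rows unchanged, with hypothetical audit columns added."""
    loaded_at = loaded_at or datetime.now(timezone.utc)
    return [
        {
            **row,  # source columns are kept "as-is"
            "_source": source,
            "_process_id": process_id,
            "_load_ts": loaded_at.isoformat(),
            "load_date": loaded_at.date().isoformat(),  # partition column
        }
        for row in rows
    ]


# A plain list stands in for an append-only bronze Delta table.
bronze_table = []
batch = [{"id": "1", "amount": "10.5"}]
load_time = datetime(2024, 1, 2, tzinfo=timezone.utc)
bronze_table.extend(to_bronze(batch, "orders_csv", "run-001", load_time))
```

Because every batch is appended rather than overwritten, the table keeps the full load history, which is what makes later reloads possible.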


✳️ Silver Layer (Delta Format)
✅ Uses Delta Lake tables (with SQL table names)
✅ Preserves the grain of the original data (no aggregation)
✅ Eliminates duplicate records
✅ Production schema enforced
✅ Data quality checks passed
✅ Corrupt data quarantined
✅ Data stored to support production workloads
✅ Optimized for long-term retention and ad-hoc queries
✅ Validate data quality and schema
✅ Enrich and transform data
✅ Optimize data layout and storage for downstream queries
✅ Provide a single source of truth for analytics
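
As a rough sketch of the silver-layer steps listed above (schema enforcement, de-duplication, quarantining corrupt records), the plain-Python example below casts each row to a fixed schema, sets aside rows that fail the cast, and keeps the first occurrence of each `id`. The function `to_silver`, the two-column schema, and the "first record wins" rule are assumptions made for illustration only.

```python
def to_silver(bronze_rows):
    """Split rows into clean, de-duplicated records and quarantined records."""
    silver, quarantine, seen_ids = [], [], set()
    for row in bronze_rows:
        try:
            # Enforce the (hypothetical) production schema: id int, amount float.
            clean = {"id": int(row["id"]), "amount": float(row["amount"])}
        except (KeyError, TypeError, ValueError):
            quarantine.append(row)  # corrupt rows are set aside, not dropped
            continue
        if clean["id"] in seen_ids:
            continue  # duplicate: keep only the first occurrence
        seen_ids.add(clean["id"])
        silver.append(clean)
    return silver, quarantine


bronze_rows = [
    {"id": "1", "amount": "10.5"},
    {"id": "1", "amount": "10.5"},  # duplicate record
    {"id": "2", "amount": "oops"},  # fails the schema cast
    {"id": "3", "amount": "7"},
]
silver_rows, quarantined_rows = to_silver(bronze_rows)
```

The grain is preserved: each surviving silver row still corresponds to one source record, with no aggregation.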


✳️ Gold Layer (Delta Format)
✅ Validated, business-level tables.
✅ The lakehouse is typically organized into consumption-ready, "project-specific" databases.
✅ The Gold layer is for reporting and uses more denormalized, read-optimized data models with fewer joins.
✅ The final layer of data transformations and data quality rules is applied here.
✅ The final presentation layer holds each project's business-oriented data models.
✅ Kimball-style star schema data models and Inmon-style data marts commonly fit in this Gold layer of the lakehouse.
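
A gold table is typically a denormalized, aggregated view built from silver data. The sketch below, in plain Python, joins order rows to a customer lookup and aggregates them into one read-optimized row per customer. The names `to_gold`, `orders`, and `customers`, and the chosen metrics, are hypothetical examples rather than a prescribed model.

```python
from collections import defaultdict


def to_gold(orders, customers):
    """Aggregate orders into one denormalized summary row per customer."""
    totals = defaultdict(lambda: {"order_count": 0, "total_amount": 0.0})
    for order in orders:
        summary = totals[order["customer_id"]]
        summary["order_count"] += 1
        summary["total_amount"] += order["amount"]
    # Attach the customer name so reports need no further join.
    return [
        {"customer_id": cid, "customer_name": customers[cid], **agg}
        for cid, agg in sorted(totals.items())
    ]


# Hypothetical silver-level inputs.
orders = [
    {"customer_id": 1, "amount": 10.5},
    {"customer_id": 1, "amount": 4.5},
    {"customer_id": 2, "amount": 7.0},
]
customers = {1: "Acme", 2: "Globex"}
gold_rows = to_gold(orders, customers)
```

Carrying `customer_name` on each row trades some redundancy for fewer joins at query time, which is the read-optimized design the Gold layer aims for.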

✳️ Benefits of Multiple Layers
✅ Simple data model
✅ Easy to understand and implement
✅ Enables incremental ETL
✅ Tables can be recreated from raw data at any time
✅ ACID transactions and time travel
