The Agenda

SQLBits 2024 runs from Tuesday 19th – Saturday 23rd March.
Operations

Advanced Data Engineering with Databricks on the Lakehouse

Description

In this session, you will build upon existing knowledge of Apache Spark™, Structured Streaming and Delta Lake to unlock the full potential of the lakehouse by utilising the suite of tools provided by Databricks. This session places a heavy emphasis on designs favouring incremental data processing, enabling systems optimised to continuously ingest and analyse ever-growing data. The topics in this course help learners work towards the Databricks Certified Data Engineer Professional exam.

Learning Objectives


• Design databases and pipelines optimised for the Databricks Lakehouse Platform
• Implement efficient incremental data processing to validate and enrich data-driven business decisions and applications
• Leverage Databricks-native features for managing access to sensitive data and fulfilling right-to-be-forgotten requests
• Manage error troubleshooting, code promotion, task orchestration and production job monitoring using Databricks tools

Previous Experience

Not all of the experience below is required, but three of the five items are recommended:

• Experience using PySpark APIs to perform advanced data transformations
• Experience using SQL in production data warehouse or data lake implementations
• Experience working in Databricks notebooks and configuring clusters
• Familiarity with creating and manipulating data in Delta Lake tables with SQL
• Ability to use Spark Structured Streaming to incrementally read from a Delta table

Tech Covered

Databricks, Operations