DENG-254: Preparing with Cloudera Data Engineering
Duration: 4 Days (32 Hours)
DENG-254: Preparing with Cloudera Data Engineering Course Overview:
This four-day, hands-on course teaches developers the concepts and skills needed to build high-performance, parallel applications with Apache Spark on the Cloudera Data Platform (CDP).
The course combines theory with practical exercises. Participants learn to develop Spark applications that integrate with core CDP components and to use Spark SQL to query structured data. The curriculum also covers using Hive to ingest and denormalize data, and working with large volumes of “big data” stored in a distributed file system.
Intended Audience:
- This course is designed for developers and data engineers. All students are expected to have basic Linux experience and basic proficiency in either Python or Scala.
Learning Objectives of DENG-254: Preparing with Cloudera Data Engineering:
During this course, you will learn how to:
- Distribute, store, and process data in a CDP cluster
- Write, configure, and deploy Apache Spark applications
- Use the Spark interpreters and Spark applications to explore, process, and analyze distributed data
- Query data using Spark SQL, DataFrames, and Hive tables
- Deploy a Spark application on the Data Engineering Service
HDFS Introduction
- HDFS Overview
- HDFS Components and Interactions
- Additional HDFS Interactions
- Ozone Overview
- Exercise: Working with HDFS
YARN Introduction
- YARN Overview
- YARN Components and Interaction
- Working with YARN
- Exercise: Working with YARN
Working with RDDs
- Resilient Distributed Datasets (RDDs)
- Exercise: Working with RDDs
Working with DataFrames
- Introduction to DataFrames
- Exercise: Introducing DataFrames
- Exercise: Reading and Writing DataFrames
- Exercise: Working with Columns
- Exercise: Working with Complex Types
- Exercise: Combining and Splitting DataFrames
- Exercise: Summarizing and Grouping DataFrames
- Exercise: Working with UDFs
- Exercise: Working with Windows
Introduction to Apache Hive
- About Hive
- Transforming Data with HiveQL
Working with Apache Hive
- Exercise: Working with Partitions
- Exercise: Working with Buckets
- Exercise: Working with Skew
- Exercise: Using SerDes to Ingest Text Data
- Exercise: Using Complex Types to Denormalize Data
Hive and Spark Integration
- Hive and Spark Integration
- Exercise: Spark Integration with Hive
Distributed Processing Challenges
- Shuffle
- Skew
- Order
Spark Distributed Processing
- Spark Distributed Processing
- Exercise: Explore Query Execution Order
Spark Distributed Persistence
- DataFrame and Dataset Persistence
- Persistence Storage Levels
- Viewing Persisted RDDs
- Exercise: Persisting DataFrames
Data Engineering Service
- Create and Trigger Ad-Hoc Spark Jobs
- Orchestrate a Set of Jobs Using Airflow
- Data Lineage Using Atlas
- Auto-scaling in Data Engineering Service
Workload XM
- Optimize Workloads, Performance, and Capacity
- Identify Suboptimal Spark Jobs
DENG-254: Preparing with Cloudera Data Engineering Course Prerequisites
- Basic knowledge of SQL is helpful. Prior knowledge of Spark and Hadoop is not required.
Choose Learning Modality
Live Online
- Convenience
- Cost-effective
- Self-paced learning
- Scalability
Classroom
- Interaction and collaboration
- Networking opportunities
- Real-time feedback
- Personal attention
Onsite
- Familiar environment
- Confidentiality
- Team building
- Immediate application
Training Exclusives
This course comes with the following benefits:
- Practice labs
- Training by certified trainers
- Access to recordings of your class sessions for 90 days
- Digital courseware
- 24/7 learner support