Course Outline
Day 1: Data Processing and Python Essentials
Session 1: Spark DataFrames and Basic Operations
- Working with Spark DataFrames: Implementing Basic Operations
- Groupby and Aggregate Operations
- Handling Timestamps and Dates
- Hands-on Exercise: Data analysis using Spark DataFrames
Session 2: Python Programming for Big Data
- Core Python for Data Handling: Using Variables, Lists, and Functions
- Working with Classes and Files
- Integrating APIs and External Data
- Hands-on Exercise: Building a Python project that processes and analyzes data with PySpark
Day 2: Advanced PySpark and Machine Learning
Session 3: Machine Learning with PySpark
- Implementing Machine Learning with Spark MLlib: Linear and Logistic Regression
- Random Forest Classification Models
- Hands-on Exercise: Building and evaluating machine learning models using PySpark
Session 4: Clustering and Recommender Systems
- K-means Clustering: Theory and Practical Implementation
- Hands-on Exercise: Building a K-means clustering model
- Recommender Systems: Building a recommendation engine with Spark MLlib
- Hands-on Exercise: Recommender system project
Session 5: Spark Streaming and NLP
- Real-Time Data Streaming with Spark: Implementing real-time data processing
- Hands-on Exercise: Streaming data with Spark
- Natural Language Processing (NLP) with PySpark: Implementing basic NLP tasks
- Hands-on Exercise: NLP pipeline using PySpark
Requirements
Python is a high-level programming language known for its clear syntax and code readability. Spark is a data processing engine used for querying, analyzing, and transforming big data. PySpark allows users to interface with Spark from Python.
Target Audience: Intermediate-level professionals in the banking industry familiar with Python and Spark, seeking to deepen their skills in big data processing and machine learning.
Duration: 14 Hours