Data Engineering Internship 2026
The Capstone Project (Weeks 8–12)
From Week 8 onwards, you work individually on the Movie Analytics System: an end-to-end analytics platform that integrates everything from the first seven weeks.
Technical Requirements
Your capstone must demonstrate proficiency across all five technical areas. Refer to the evaluation rubric in Chapter 11.
| Area | What to include |
|---|---|
| Python + Pandas | Classes for data extraction and transformation. Modular, reusable functions. Proper error handling. |
| PySpark | Batch data processing. Schema enforcement. Write to PostgreSQL or Parquet. |
| dbt | Full three-layer project (staging, intermediate, marts). Tests and documentation. source() and ref() used correctly. |
| Airflow | DAG that orchestrates the full pipeline end to end. Scheduled. All tasks have retries. |
| Power BI | Star schema data model. 5+ DAX measures. Dashboard with a clear data story and recommended actions. |
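The Python + Pandas row asks for classes, modular functions, and proper error handling. A minimal sketch, assuming a hypothetical raw movies CSV with `movie_id`, `title`, `release_date`, and `rating` columns (the class names and columns are illustrative, not taken from the training repo), might keep extraction and transformation in separate classes like this:

```python
import io
import logging

import pandas as pd

logger = logging.getLogger(__name__)


class MovieExtractor:
    """Reads raw movie data into a DataFrame and validates its columns."""

    # Hypothetical column names for the raw movies file
    REQUIRED_COLUMNS = {"movie_id", "title", "release_date", "rating"}

    def extract(self, source) -> pd.DataFrame:
        # `source` may be a file path or any file-like object read_csv accepts
        try:
            df = pd.read_csv(source)
        except FileNotFoundError:
            logger.error("Source file not found: %s", source)
            raise
        # Fail early if the file does not have the expected shape
        missing = self.REQUIRED_COLUMNS - set(df.columns)
        if missing:
            raise ValueError(f"Missing required columns: {sorted(missing)}")
        return df


class MovieTransformer:
    """Cleans extracted movie rows without mutating the input DataFrame."""

    def transform(self, df: pd.DataFrame) -> pd.DataFrame:
        out = df.copy()
        out["title"] = out["title"].str.strip()
        # Unparseable dates and ratings become NaT/NaN instead of raising
        out["release_date"] = pd.to_datetime(out["release_date"], errors="coerce")
        out["rating"] = pd.to_numeric(out["rating"], errors="coerce")
        # Drop rows that failed to parse, then derive a year column
        out = out.dropna(subset=["release_date", "rating"]).assign(
            release_year=lambda d: d["release_date"].dt.year
        )
        return out.drop_duplicates(subset="movie_id").reset_index(drop=True)


if __name__ == "__main__":
    raw = io.StringIO(
        "movie_id,title,release_date,rating\n"
        "1, Inception ,2010-07-16,8.8\n"
        "2,Heat,1995-12-15,8.3\n"
        "2,Heat,1995-12-15,8.3\n"
        "3,Broken Row,not-a-date,7.0\n"
    )
    movies = MovieTransformer().transform(MovieExtractor().extract(raw))
    print(movies[["movie_id", "title", "release_year"]])
```

Raising on missing columns stops the pipeline before bad data reaches PostgreSQL or dbt, while `errors="coerce"` treats individual bad values as data-quality problems to filter rather than fatal errors. Which behaviour suits each column is a design decision worth explaining in the code walkthrough.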
Training Repo Structure
| Folder | Contents |
|---|---|
| data/ | Raw datasets and sample data files |
| notebooks/ | Exploratory analysis notebooks for each week |
| scripts/ | Python scripts for extraction and loading |
| dbt_project/ | dbt project starter template |
| airflow/dags/ | Airflow DAG examples |
| docs/ | Week-by-week activity instructions |
Capstone Presentation (Week 12)
The final presentation covers the full system. Prepare all four sections:
| Section | Time | What to cover |
|---|---|---|
| Live demo | 5 min | Walk through the Power BI dashboard. Show it answering at least 3 business questions. |
| Architecture walkthrough | 5 min | Diagram showing the full data flow: source → PySpark → PostgreSQL → dbt → Power BI. |
| Code walkthrough | 5 min | Show 2–3 key pieces of code: a dbt model, a PySpark job, or an Airflow DAG. Explain design decisions. |
| Learnings | 5 min | What was hardest. What you would do differently. What you are taking away from the bootcamp. |
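The data flow in the architecture row (source → PySpark → PostgreSQL → dbt → Power BI) is a linear chain of stages, and the Airflow requirement says every task in that chain needs retries. The sketch below is a hypothetical, standard-library-only illustration of those semantics, not Airflow's API: in the real DAG each stage would be an Airflow task configured with the `retries` (and optionally `retry_delay`) operator arguments. The stage names are assumptions.

```python
import time
from typing import Callable, Dict, List, Tuple

# A stage is a (name, zero-argument callable) pair
Stage = Tuple[str, Callable[[], None]]


def run_pipeline(
    stages: List[Stage], retries: int = 2, retry_delay: float = 0.0
) -> Dict[str, int]:
    """Run stages in order, retrying a failing stage up to `retries` extra times.

    Returns how many attempts each stage needed. If a stage still fails after
    its retries, the exception propagates and later stages never run, much as
    downstream Airflow tasks do not run when an upstream task fails under the
    default trigger rule.
    """
    attempts: Dict[str, int] = {}
    for name, task in stages:
        for attempt in range(1, retries + 2):  # first try plus `retries` retries
            try:
                task()
                attempts[name] = attempt
                break
            except Exception:
                if attempt > retries:
                    raise  # retries exhausted: fail the whole run
                time.sleep(retry_delay)
    return attempts


if __name__ == "__main__":
    calls = {"load_postgres": 0}

    def flaky_load() -> None:
        # Fails on the first call to simulate a transient database error
        calls["load_postgres"] += 1
        if calls["load_postgres"] == 1:
            raise ConnectionError("transient failure")

    # Hypothetical stage names mirroring the architecture diagram
    stages = [
        ("extract_source", lambda: None),
        ("spark_transform", lambda: None),
        ("load_postgres", flaky_load),
        ("dbt_build", lambda: None),
        ("refresh_power_bi", lambda: None),
    ]
    print(run_pipeline(stages, retries=2))
    # {'extract_source': 1, 'spark_transform': 1, 'load_postgres': 2, 'dbt_build': 1, 'refresh_power_bi': 1}
```

Retries absorb transient failures such as a dropped database connection; a stage that fails every time still stops the run, which is the behaviour you want to be able to explain when walking through your DAG.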
Pro Tip
Read the full capstone brief on Day 1 of Week 8 — do not wait until Week 10 to understand the requirements.
Use the training repo notebooks as reference, not as copy-paste code. Write your own implementation.
Start the Power BI dashboard by Week 10 at the latest. Visualisation takes more time than expected.