Data Engineering Internship 2026
Chapter 10
Weekly Milestones Tracker
Self-assess at the end of each week. Bring unchecked items to your next consultation session.
Day 0 | Setup
- WSL2 (Windows) or Terminal (macOS) is working
- Docker Desktop is running and the docker compose version command works
- The PostgreSQL container starts and I can connect to it with psql
- Python 3.11+ is installed and pip works
- dbt --version shows 1.8+
- The Airflow UI is accessible at localhost:8080
- A PySpark SparkSession can be created without errors
- VS Code is installed with all required extensions
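To self-check part of Day 0 from the command line, a minimal Python sketch like the one below could help. It only checks the Python version and whether a few tools are on your PATH. The REQUIRED_TOOLS list and the helper names are assumptions for illustration, not part of the course setup. Note that psql may only be available inside the PostgreSQL container, in which case it will show up as missing here even though the checklist item passes.

```python
import re
import shutil
import sys

# Hypothetical list of command-line tools to look for on PATH; adjust as needed.
REQUIRED_TOOLS = ["docker", "psql", "dbt"]


def version_at_least(version: str, minimum: tuple[int, ...]) -> bool:
    """Return True if a dotted version string like '1.8.2' is >= minimum."""
    numbers = []
    for piece in version.strip().split("."):
        match = re.match(r"\d+", piece)  # keep leading digits, e.g. '0b1' -> 0
        if match is None:
            break
        numbers.append(int(match.group()))
    # Tuple comparison: (1, 8, 2) >= (1, 8) is True, (1, 7, 9) >= (1, 8) is False.
    return tuple(numbers) >= minimum


def missing_tools(tools: list[str]) -> list[str]:
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    py_version = ".".join(str(n) for n in sys.version_info[:3])
    status = "OK" if version_at_least(py_version, (3, 11)) else "TOO OLD"
    print(f"Python {py_version}: {status}")
    missing = missing_tools(REQUIRED_TOOLS)
    print("Missing tools:", ", ".join(missing) if missing else "none")
```

You can also paste the version number printed by dbt --version into version_at_least(..., (1, 8)) to confirm the 1.8+ requirement.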
Week 1 | Database Design
- I can explain the difference between OLTP and OLAP
- I can design a normalized ERD from a business scenario
- I can design a star schema with a fact table and dimension tables
- I understand when to use a star schema vs. a snowflake schema
- I completed Project 1 and presented my ERD and star schema to instructors
- I submitted my design justification document
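As a reference point for the star schema items, here is a minimal sketch of one fact table joined to two dimension tables, followed by a typical analytical (OLAP-style) query that aggregates the fact table and slices it by dimension attributes. It uses Python's built-in sqlite3 module only so it runs anywhere; the table names and sample rows are made up for illustration, and in PostgreSQL you would likely use NUMERIC rather than REAL for money.

```python
import sqlite3

# Star schema: one fact table whose foreign keys point at two dimension tables.
SCHEMA = """
CREATE TABLE dim_date (
    date_key   INTEGER PRIMARY KEY,  -- surrogate key such as 20260105
    full_date  TEXT    NOT NULL,
    month      INTEGER NOT NULL,
    year       INTEGER NOT NULL
);
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,
    product_name TEXT NOT NULL,
    category     TEXT NOT NULL
);
CREATE TABLE fact_sales (
    date_key    INTEGER NOT NULL REFERENCES dim_date (date_key),
    product_key INTEGER NOT NULL REFERENCES dim_product (product_key),
    quantity    INTEGER NOT NULL,  -- additive measure
    revenue     REAL    NOT NULL   -- additive measure
);
"""

# Analytical query: aggregate the fact table, grouped by dimension attributes.
REVENUE_BY_CATEGORY_MONTH = """
SELECT p.category, d.month, SUM(f.revenue) AS total_revenue
FROM fact_sales AS f
JOIN dim_date AS d ON f.date_key = d.date_key
JOIN dim_product AS p ON f.product_key = p.product_key
GROUP BY p.category, d.month
ORDER BY p.category, d.month
"""


def build_star_schema() -> sqlite3.Connection:
    """Create the schema in memory and load a few made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO dim_date VALUES (?, ?, ?, ?)",
        [(20260105, "2026-01-05", 1, 2026), (20260210, "2026-02-10", 2, 2026)],
    )
    conn.executemany(
        "INSERT INTO dim_product VALUES (?, ?, ?)",
        [(1, "Laptop", "Electronics"), (2, "Desk", "Furniture")],
    )
    conn.executemany(
        "INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
        [(20260105, 1, 2, 2000.0), (20260105, 2, 1, 300.0), (20260210, 1, 1, 1000.0)],
    )
    return conn


if __name__ == "__main__":
    for row in build_star_schema().execute(REVENUE_BY_CATEGORY_MONTH):
        print(row)
```

In a snowflake schema, category would move out of dim_product into its own table, adding one more join to the same query.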
Week 2 | Core Tools
- I can write a bash script that downloads a file and handles errors
- I can load a CSV into PostgreSQL using PySpark
- I can clean a DataFrame with Pandas (nulls, deduplication, type casting)
- I can write window functions in SQL (RANK, LAG, LEAD)
- I completed Project 2 and demonstrated the end-to-end pipeline
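For the window function item, the sketch below runs RANK, LAG, and LEAD over a small made-up daily_sales table. Unlike GROUP BY, a window function returns one value per input row, computed over the rows in that row's partition. It uses sqlite3 (window functions need SQLite 3.25 or newer) so it is self-contained; the same SQL should also run in PostgreSQL.

```python
import sqlite3

# RANK orders stores' days by revenue; LAG/LEAD look one row back/forward by date.
WINDOW_QUERY = """
SELECT
    store,
    sale_date,
    revenue,
    RANK() OVER (PARTITION BY store ORDER BY revenue DESC) AS revenue_rank,
    LAG(revenue) OVER (PARTITION BY store ORDER BY sale_date) AS prev_revenue,
    LEAD(revenue) OVER (PARTITION BY store ORDER BY sale_date) AS next_revenue
FROM daily_sales
ORDER BY store, sale_date
"""


def load_sample_sales() -> sqlite3.Connection:
    """Create an in-memory table with a few hypothetical daily sales rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE daily_sales (store TEXT, sale_date TEXT, revenue INTEGER)"
    )
    conn.executemany(
        "INSERT INTO daily_sales VALUES (?, ?, ?)",
        [
            ("A", "2026-01-01", 100),
            ("A", "2026-01-02", 150),
            ("A", "2026-01-03", 120),
            ("B", "2026-01-01", 200),
            ("B", "2026-01-02", 180),
        ],
    )
    return conn


if __name__ == "__main__":
    for row in load_sample_sales().execute(WINDOW_QUERY):
        print(row)
```

LAG returns NULL (None in Python) for the first row in each partition and LEAD returns NULL for the last, which is worth handling explicitly when you compute day-over-day changes.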
Weeks 3–5 | Data Pipelines
- I understand the difference between ETL and ELT
- I can create a dbt project with staging, intermediate, and mart layers
- I use source() for raw tables and ref() for dbt models — never hardcoded table names
- I have unique and not_null tests on all primary keys
- I can build an Airflow DAG that runs a multi-step pipeline
- I completed Project 3 — dbt run and dbt test both pass with zero errors
- I completed Project 3 — Airflow DAG runs end to end with all tasks green
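To see what the unique and not_null tests on a primary key actually verify, the sketch below runs roughly equivalent SQL from Python. dbt's generic tests select the failing rows, and a test passes when that query returns zero rows. This is not a replacement for dbt tests; SQLite stands in for PostgreSQL, and stg_customers and its rows are hypothetical.

```python
import sqlite3

# Each check counts failing rows; zero means the check passes.
# Table and column names are interpolated directly, so pass only trusted
# identifiers, never user input.


def not_null_failures(conn: sqlite3.Connection, table: str, column: str) -> int:
    """Count rows where the column is NULL (the not_null check)."""
    sql = f"SELECT COUNT(*) FROM {table} WHERE {column} IS NULL"
    return conn.execute(sql).fetchone()[0]


def unique_failures(conn: sqlite3.Connection, table: str, column: str) -> int:
    """Count non-NULL values that appear more than once (the unique check)."""
    sql = f"""
        SELECT COUNT(*) FROM (
            SELECT {column}
            FROM {table}
            WHERE {column} IS NOT NULL
            GROUP BY {column}
            HAVING COUNT(*) > 1
        ) AS duplicates
    """
    return conn.execute(sql).fetchone()[0]


def make_sample_table() -> sqlite3.Connection:
    """Hypothetical staging table with one duplicate key and one NULL key."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE stg_customers (customer_id INTEGER, email TEXT)")
    conn.executemany(
        "INSERT INTO stg_customers VALUES (?, ?)",
        [
            (1, "a@example.com"),
            (2, "b@example.com"),
            (2, "c@example.com"),
            (None, "d@example.com"),
        ],
    )
    return conn


if __name__ == "__main__":
    conn = make_sample_table()
    print("not_null failures:", not_null_failures(conn, "stg_customers", "customer_id"))
    print("unique failures:", unique_failures(conn, "stg_customers", "customer_id"))
```

The sample table fails both checks on customer_id, which is exactly the kind of problem these tests are meant to catch before data reaches the mart layer.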
Weeks 6–7 | Power BI
- I can connect Power BI Desktop to a PostgreSQL database
- I can build a star schema data model in Power BI Model view
- I have written at least 5 DAX measures in a dedicated Measures table
- My dashboard answers at least 3 specific business questions
- I have written a data story that includes a recommended action
- I completed Project 4 and presented the dashboard to instructors
Weeks 8–12 | Capstone
- I have read the full capstone brief on Day 1 of Week 8
- I have an architecture diagram showing the full data flow
- Python classes and modular functions are used for extraction and transformation
- PySpark processes the data and writes to PostgreSQL
- The dbt project has staging, intermediate, and mart layers with passing tests
- Airflow DAG orchestrates the full pipeline on a schedule
- Power BI dashboard is connected to the mart tables
- I have submitted my capstone and delivered the final presentation
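For the capstone item on Python classes and modular functions, one possible structure is to separate extraction from transformation so each piece can be tested on its own. The sketch below is a minimal example under stated assumptions: CsvExtractor, OrderTransformer, Order, and run_pipeline are hypothetical names, the source is CSV text rather than a real file or API, and the load step (PySpark writing to PostgreSQL) is left out.

```python
import csv
import io
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Order:
    """A cleaned, typed record produced by the transformation step."""
    order_id: int
    customer: str
    amount: float


class CsvExtractor:
    """Reads raw rows from CSV text (a stand-in for a file, bucket, or API)."""

    def __init__(self, raw_text: str) -> None:
        self.raw_text = raw_text

    def extract(self) -> list[dict[str, str]]:
        return list(csv.DictReader(io.StringIO(self.raw_text)))


class OrderTransformer:
    """Drops rows without an ID, removes duplicate IDs, and casts types."""

    def transform(self, rows: Iterable[dict[str, str]]) -> list[Order]:
        seen: set[int] = set()
        orders: list[Order] = []
        for row in rows:
            raw_id = (row.get("order_id") or "").strip()
            if not raw_id:
                continue  # skip rows without a primary key
            order_id = int(raw_id)
            if order_id in seen:
                continue  # keep only the first occurrence of each order_id
            seen.add(order_id)
            orders.append(
                Order(
                    order_id=order_id,
                    customer=row["customer"].strip(),
                    amount=float(row["amount"]),
                )
            )
        return orders


def run_pipeline(raw_text: str) -> list[Order]:
    """Wire extraction and transformation together; loading would follow."""
    rows = CsvExtractor(raw_text).extract()
    return OrderTransformer().transform(rows)


if __name__ == "__main__":
    sample = "order_id,customer,amount\n1, Alice ,10.5\n2,Bob,20\n2,Bob,20\n,Carol,5\n"
    for order in run_pipeline(sample):
        print(order)
```

Keeping each class small means an Airflow task can call run_pipeline (or each step separately), and each step can be unit tested without a database.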