Stratpoint Engineering

Data Engineering Internship 2026

The Capstone Project (Weeks 8–12)

From Week 8 onwards, you build the Movie Analytics System: an end-to-end analytics platform, built individually, that integrates everything from the first seven weeks.

Technical Requirements

Your capstone must demonstrate proficiency across all five technical areas. Refer to the evaluation rubric in Chapter 11.

| Area | What to include |
| --- | --- |
| Python + Pandas | Classes for data extraction and transformation. Modular, reusable functions. Proper error handling. |
| PySpark | Batch data processing. Schema enforcement. Write to PostgreSQL or Parquet. |
| dbt | Full three-layer project (staging, intermediate, marts). Tests and documentation. source() and ref() used correctly. |
| Airflow | DAG that orchestrates the full pipeline end to end. Scheduled. All tasks have retries. |
| Power BI | Star schema data model. 5+ DAX measures. Dashboard with clear data story and recommended actions. |
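
As a rough illustration of what "classes for extraction, modular functions, proper error handling" could look like, here is a minimal sketch. It uses only the Python standard library so it runs anywhere; in the capstone the same shape would likely wrap Pandas instead. The names `MovieExtractor`, `ExtractionError`, `clean_row`, `transform`, and the column names are hypothetical and not taken from the capstone brief.

```python
import csv
import io
from dataclasses import dataclass


class ExtractionError(Exception):
    """Raised when a source is missing columns the pipeline needs."""


@dataclass
class MovieExtractor:
    """Reads raw movie rows from any text stream containing CSV data."""

    # Hypothetical column names, chosen for illustration only.
    required_columns: tuple = ("title", "year", "rating")

    def extract(self, stream):
        reader = csv.DictReader(stream)
        missing = set(self.required_columns) - set(reader.fieldnames or [])
        if missing:
            raise ExtractionError(f"missing columns: {sorted(missing)}")
        return list(reader)


def clean_row(row):
    """Return a typed copy of one row, or None if it cannot be converted."""
    try:
        return {
            "title": row["title"].strip(),
            "year": int(row["year"]),
            "rating": float(row["rating"]),
        }
    except (KeyError, ValueError, TypeError, AttributeError):
        return None


def transform(rows):
    """Split rows into cleaned records and rejected raw rows."""
    cleaned, rejected = [], []
    for row in rows:
        result = clean_row(row)
        if result is None:
            rejected.append(row)  # keep bad rows for inspection, do not drop silently
        else:
            cleaned.append(result)
    return cleaned, rejected


# Inline sample data so the sketch is self-contained.
SAMPLE = "title,year,rating\nHeat,1995,8.3\nBroken,n/a,7.0\n"
cleaned, rejected = transform(MovieExtractor().extract(io.StringIO(SAMPLE)))
```

Structural problems (a missing column) raise an exception and stop the run, while row-level problems are collected in `rejected` so one bad record does not fail the whole batch. That split is one reasonable design choice, not a requirement of the rubric.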

Training Repo Structure

| Folder | Contents |
| --- | --- |
| data/ | Raw datasets and sample data files |
| notebooks/ | Exploratory analysis notebooks for each week |
| scripts/ | Python scripts for extraction and loading |
| dbt_project/ | dbt project starter template |
| airflow/dags/ | Airflow DAG examples |
| docs/ | Week-by-week activity instructions |

Capstone Presentation (Week 12)

The final presentation covers the full system. Prepare all four sections:

| Section | Time | What to cover |
| --- | --- | --- |
| Live demo | 5 min | Walk through the Power BI dashboard. Show it answering at least 3 business questions. |
| Architecture walkthrough | 5 min | Diagram showing the full data flow: source → PySpark → PostgreSQL → dbt → Power BI. |
| Code walkthrough | 5 min | Show 2-3 key pieces of code: a dbt model, a PySpark job, or an Airflow DAG. Explain design decisions. |
| Learnings | 5 min | What was hardest. What you would do differently. What you are taking away from the bootcamp. |
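
When preparing the architecture diagram, it can help to write the data flow down as a dependency graph first. The sketch below does this in plain Python with the standard-library `graphlib` module (Python 3.9+) and a small retry wrapper. It is not Airflow code: the stage names simply mirror the flow listed above, and `run_with_retries`, `run_pipeline`, and the placeholder tasks are hypothetical helpers for illustration.

```python
import time
from graphlib import TopologicalSorter

# Each stage maps to the set of stages it depends on.
# Stage names mirror the data flow in the architecture walkthrough.
PIPELINE = {
    "source": set(),
    "pyspark": {"source"},
    "postgresql": {"pyspark"},
    "dbt": {"postgresql"},
    "power_bi": {"dbt"},
}


def run_with_retries(task, retries=2, delay_seconds=0.0):
    """Call task(); on failure, retry up to `retries` more times."""
    for attempt in range(retries + 1):
        try:
            return task()
        except Exception:
            if attempt == retries:
                raise  # out of attempts: surface the original error
            time.sleep(delay_seconds)


def run_pipeline(stages, tasks):
    """Run every stage in dependency order and return the order used."""
    order = list(TopologicalSorter(stages).static_order())
    for name in order:
        run_with_retries(tasks[name])
    return order


# Placeholder tasks: a real stage would submit a Spark job, run dbt, and so on.
executed = []
tasks = {name: (lambda n=name: executed.append(n)) for name in PIPELINE}
order = run_pipeline(PIPELINE, tasks)
```

An orchestrator such as Airflow provides the same two ideas (dependency ordering and per-task retries) along with scheduling and monitoring, so a graph like this maps directly onto the boxes and arrows in the diagram.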

Pro Tip

Read the full capstone brief on Day 1 of Week 8. Do not wait until Week 10 to understand the requirements.

Use the training repo notebooks as reference, not as copy-paste code. Write your own implementation.

Start the Power BI dashboard by Week 10 at the latest. Visualisation takes more time than expected.