delete all

Merge pull request #438 from DataTalksClub/de-zoomcamp
Update week 4 project
2024-01-28 00:02:37 +00:00 · 2024-01-28 01:01:11 +01:00 · 2024-01-28 00:00:37 +00:00 · 2024-01-27 23:57:45 +00:00 · 2024-01-27 22:54:09 +00:00 · 2024-01-27 22:55:02 +01:00
161 changed files with 1000 additions and 1283 deletions
--- a/week_1_basics_n_setup/1_terraform_gcp/1_terraform_overview.md
+++ b/week_1_basics_n_setup/1_terraform_gcp/1_terraform_overview.md
--- a/week_1_basics_n_setup/1_terraform_gcp/2_gcp_overview.md
+++ b/week_1_basics_n_setup/1_terraform_gcp/2_gcp_overview.md
--- a/week_1_basics_n_setup/1_terraform_gcp/README.md
+++ b/week_1_basics_n_setup/1_terraform_gcp/README.md
--- a/week_1_basics_n_setup/1_terraform_gcp/terraform/README.md
+++ b/week_1_basics_n_setup/1_terraform_gcp/terraform/README.md
--- a/week_1_basics_n_setup/1_terraform_gcp/terraform/terraform_basic/main.tf
+++ b/week_1_basics_n_setup/1_terraform_gcp/terraform/terraform_basic/main.tf
--- a/week_1_basics_n_setup/1_terraform_gcp/terraform/terraform_with_variables/main.tf
+++ b/week_1_basics_n_setup/1_terraform_gcp/terraform/terraform_with_variables/main.tf
--- a/week_1_basics_n_setup/1_terraform_gcp/terraform/terraform_with_variables/variables.tf
+++ b/week_1_basics_n_setup/1_terraform_gcp/terraform/terraform_with_variables/variables.tf
--- a/week_1_basics_n_setup/1_terraform_gcp/windows.md
+++ b/week_1_basics_n_setup/1_terraform_gcp/windows.md
--- a/week_1_basics_n_setup/2_docker_sql/.gitignore
+++ b/week_1_basics_n_setup/2_docker_sql/.gitignore
--- a/week_1_basics_n_setup/2_docker_sql/Dockerfile
+++ b/week_1_basics_n_setup/2_docker_sql/Dockerfile
--- a/week_1_basics_n_setup/2_docker_sql/README.md
+++ b/week_1_basics_n_setup/2_docker_sql/README.md
--- a/week_1_basics_n_setup/2_docker_sql/docker-compose.yaml
+++ b/week_1_basics_n_setup/2_docker_sql/docker-compose.yaml
--- a/week_1_basics_n_setup/2_docker_sql/ingest_data.py
+++ b/week_1_basics_n_setup/2_docker_sql/ingest_data.py
--- a/week_1_basics_n_setup/2_docker_sql/pg-test-connection.ipynb
+++ b/week_1_basics_n_setup/2_docker_sql/pg-test-connection.ipynb
--- a/week_1_basics_n_setup/2_docker_sql/pipeline.py
+++ b/week_1_basics_n_setup/2_docker_sql/pipeline.py
--- a/week_1_basics_n_setup/2_docker_sql/upload-data.ipynb
+++ b/week_1_basics_n_setup/2_docker_sql/upload-data.ipynb
--- a/01-docker-terraform/README.md
+++ b/01-docker-terraform/README.md
@ -0,0 +1,174 @@
+# Introduction
+
+* [Video](https://www.youtube.com/watch?v=-zpVha7bw5A)
+* [Slides](https://www.slideshare.net/AlexeyGrigorev/data-engineering-zoomcamp-introduction)
+* Overview of [Architecture](https://github.com/DataTalksClub/data-engineering-zoomcamp#overview), [Technologies](https://github.com/DataTalksClub/data-engineering-zoomcamp#technologies) & [Pre-Requisites](https://github.com/DataTalksClub/data-engineering-zoomcamp#prerequisites)
+
+
+We suggest watching videos in the same order as in this document.
+
+The last video (setting up the environment) is optional, but you can watch it earlier 
+if you have trouble setting up the environment or following along with the videos.
+
+
+# Docker + Postgres
+
+[Code](2_docker_sql)
+
+## :movie_camera: [Introduction to Docker](https://www.youtube.com/watch?v=EYNwNlOrpr0&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb)
+
+* Why do we need Docker
+* Creating a simple "data pipeline" in Docker
+
+
+## :movie_camera: [Ingesting NY Taxi Data to Postgres](https://www.youtube.com/watch?v=2JM-ziJt0WI&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb)
+
+* Running Postgres locally with Docker
+* Using `pgcli` for connecting to the database
+* Exploring the NY Taxi dataset
+* Ingesting the data into the database
+* **Note:** if you have problems with `pgcli`, check [this video](https://www.youtube.com/watch?v=3IkfkTwqHx4&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb) for an alternative way to connect to your database
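The ingestion step above loads a large CSV in pieces rather than all at once. The sketch below illustrates only that chunking idea using the Python standard library, with an in-memory SQLite database standing in for Postgres. The function name `ingest_in_chunks`, the table name, and the sample rows are hypothetical; this is not the code used in the video.

```python
import csv
import io
import itertools
import sqlite3


def ingest_in_chunks(csv_file, conn, table, chunk_size):
    """Insert rows from csv_file into table, chunk_size rows at a time.

    Returns the number of chunks written. SQLite stands in for Postgres here.
    """
    reader = csv.reader(csv_file)
    header = next(reader)  # the first row holds the column names
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    # SQLite accepts untyped columns; a real Postgres table would declare types.
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')

    chunks_written = 0
    while True:
        # Pull at most chunk_size rows without reading the whole file into memory.
        chunk = list(itertools.islice(reader, chunk_size))
        if not chunk:
            break
        conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', chunk)
        conn.commit()  # earlier chunks stay committed even if a later one fails
        chunks_written += 1
    return chunks_written


# Usage: a tiny in-memory CSV with three data rows, ingested two rows at a time.
sample = io.StringIO("VendorID,total_amount\n1,10.5\n2,3.0\n1,7.25\n")
conn = sqlite3.connect(":memory:")
chunks_written = ingest_in_chunks(sample, conn, "yellow_taxi_trips", chunk_size=2)
row_count = conn.execute('SELECT COUNT(*) FROM "yellow_taxi_trips"').fetchone()[0]
```

Committing once per chunk keeps memory use bounded by `chunk_size` and means a failure late in a long file does not discard the rows already loaded.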
+
+## :movie_camera: [Connecting pgAdmin and Postgres](https://www.youtube.com/watch?v=hCAIVe9N0ow&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb)
+* The pgAdmin tool
+* Docker networks
+
+
+Note: The UI for pgAdmin 4 has changed. Follow these steps to create a server:
+
+* After logging in to pgAdmin, right-click Servers in the left sidebar.
+* Click on Register.
+* Click on Server.
+* The remaining steps to create a server are the same as in the videos.
+
+
+## :movie_camera: [Putting the ingestion script into Docker](https://www.youtube.com/watch?v=B1WwATwf-vY&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb)
+
+* Converting the Jupyter notebook to a Python script
+* Parametrizing the script with argparse
+* Dockerizing the ingestion script
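A minimal sketch of parametrizing an ingestion script with `argparse`, assuming the script needs Postgres connection details, a target table, and a source URL. The flag names, defaults, and the helper `connection_url` are illustrative assumptions, not necessarily the flags used in the course's `ingest_data.py`.

```python
import argparse


def parse_args(argv=None):
    """Parse connection and source settings for a hypothetical ingestion script."""
    parser = argparse.ArgumentParser(description="Ingest CSV data into Postgres")
    # Connection settings (illustrative flag names).
    parser.add_argument("--user", required=True, help="Postgres user name")
    parser.add_argument("--password", required=True, help="Postgres password")
    parser.add_argument("--host", default="localhost", help="Postgres host")
    parser.add_argument("--port", type=int, default=5432, help="Postgres port")
    parser.add_argument("--db", required=True, help="database name")
    # Target and source; argparse exposes --table-name as args.table_name.
    parser.add_argument("--table-name", required=True, help="table to write to")
    parser.add_argument("--url", required=True, help="URL of the CSV file")
    # argv=None makes argparse read sys.argv; passing a list is handy for testing.
    return parser.parse_args(argv)


def connection_url(args):
    """Build a SQLAlchemy-style Postgres connection URL from the parsed arguments."""
    return f"postgresql://{args.user}:{args.password}@{args.host}:{args.port}/{args.db}"


# Usage: parse an explicit argument list instead of the real command line.
args = parse_args([
    "--user", "root", "--password", "root", "--db", "ny_taxi",
    "--table-name", "yellow_taxi_trips", "--url", "https://example.com/trips.csv",
])
url = connection_url(args)
```

Assuming the Dockerfile sets an `ENTRYPOINT` that runs the script, the same flags can be passed after the image name in `docker run`, because those trailing arguments are appended to the entrypoint command.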
+
+## :movie_camera: [Running Postgres and pgAdmin with Docker-Compose](https://www.youtube.com/watch?v=hKI6PkPhpa0&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb)
+
+* Why do we need Docker-compose
+* Docker-compose YAML file
+* Running multiple containers with `docker-compose up`
+
+## :movie_camera: [SQL refresher](https://www.youtube.com/watch?v=QEcps_iskgg&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb)
+
+* Adding the Zones table
+* Inner joins
+* Basic data quality checks
+* Left, Right and Outer joins
+* Group by
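The join and data-quality topics above can be tried without a database server. This sketch uses Python's built-in `sqlite3` module with an in-memory database instead of Postgres; the two tables are toy data whose column names are loosely modeled on the NY taxi trips and zones tables, and all values are made up.

```python
import sqlite3

# In-memory SQLite database standing in for Postgres; all rows are toy data.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE zones (LocationID INTEGER PRIMARY KEY, Zone TEXT);
    CREATE TABLE trips (trip_id INTEGER, PULocationID INTEGER, total_amount REAL);

    INSERT INTO zones VALUES (1, 'Newark Airport'), (2, 'Jamaica Bay');
    -- Trip 3 points at location 999, which has no matching zone.
    INSERT INTO trips VALUES (1, 1, 10.0), (2, 1, 20.0), (3, 999, 5.0);
    """
)

# Inner join: trip 3 disappears because location 999 has no zone.
inner_rows = conn.execute(
    """
    SELECT t.trip_id, z.Zone
    FROM trips t
    JOIN zones z ON t.PULocationID = z.LocationID
    ORDER BY t.trip_id
    """
).fetchall()

# Left join: every trip is kept; the missing zone comes back as NULL (None).
left_rows = conn.execute(
    """
    SELECT t.trip_id, z.Zone
    FROM trips t
    LEFT JOIN zones z ON t.PULocationID = z.LocationID
    ORDER BY t.trip_id
    """
).fetchall()

# Data quality check: count trips whose pickup location is missing from zones.
orphan_count = conn.execute(
    """
    SELECT COUNT(*) FROM trips
    WHERE PULocationID NOT IN (SELECT LocationID FROM zones)
    """
).fetchone()[0]

# Group by: total amount per pickup zone (inner join, so orphans are excluded).
amount_by_zone = conn.execute(
    """
    SELECT z.Zone, SUM(t.total_amount)
    FROM trips t
    JOIN zones z ON t.PULocationID = z.LocationID
    GROUP BY z.Zone
    """
).fetchall()
```

Comparing the inner and left join results is a quick way to spot rows that reference missing dimension records before they silently drop out of an aggregate.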
+
+## :movie_camera: Optional: Docker Networking and Port Mapping
+
+Optional: If you have problems with Docker networking, check [Port Mapping and Networks in Docker](https://www.youtube.com/watch?v=tOr4hTsHOzU&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb)
+
+* Docker networks
+* Port forwarding to the host environment
+* Communicating between containers in the network
+* `.dockerignore` file
+
+## :movie_camera: Optional: Walk-Through on WSL
+
+Optional: If you want to follow the steps from "Ingesting NY Taxi Data to Postgres" through "Running Postgres and pgAdmin with Docker-Compose" using Windows Subsystem for Linux (WSL), check [Docker Module Walk-Through on WSL](https://www.youtube.com/watch?v=Mv4zFm2AwzQ)
+
+
+# GCP
+
+## :movie_camera: Introduction to GCP (Google Cloud Platform)
+
+[Video](https://www.youtube.com/watch?v=18jIzE41fJ4&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb)
+
+
+# Terraform
+
+[Code](1_terraform_gcp)
+
+## :movie_camera: Introduction to Terraform: Concepts and Overview
+
+* [Video](https://youtu.be/s2bOYDCKl_M)
+* [Companion Notes](1_terraform_gcp)
+
+## :movie_camera: Terraform Basics: Simple One-File Terraform Deployment
+
+* [Video](https://youtu.be/Y2ux7gq3Z0o)
+* [Companion Notes](1_terraform_gcp)
+
+## :movie_camera: Deployment with a Variables File
+
+* [Video](https://youtu.be/PBi0hHjLftk)
+* [Companion Notes](1_terraform_gcp)    
+
+## Configuring Terraform and the GCP SDK on Windows
+
+* [Instructions](1_terraform_gcp/windows.md)
+
+
+# Environment setup 
+
+For the course you'll need:
+
+* Python 3 (e.g. installed with Anaconda)
+* Google Cloud SDK
+* Docker with docker-compose
+* Terraform
+
+If you have problems setting up the environment, you can check these videos:
+
+## :movie_camera: GitHub Codespaces
+
+[Preparing the environment with GitHub Codespaces](https://www.youtube.com/watch?v=XOSUt8Ih3zA&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb)
+
+
+## :movie_camera: GCP Cloud VM 
+
+[Setting up the environment on cloud VM](https://www.youtube.com/watch?v=ae-CV2KfoN0&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb)
+* Generating SSH keys
+* Creating a virtual machine on GCP
+* Connecting to the VM with SSH
+* Installing Anaconda
+* Installing Docker
+* Creating SSH `config` file
+* Accessing the remote machine with VS Code and SSH remote
+* Installing docker-compose
+* Installing pgcli
+* Port-forwarding with VS code: connecting to pgAdmin and Jupyter from the local computer
+* Installing Terraform
+* Using `sftp` for putting the credentials to the remote machine
+* Shutting down and removing the instance
+
+# Homework
+
+* [Homework](../cohorts/2024/01-docker-terraform/homework.md)
+
+
+# Community notes
+
+Did you take notes? You can share them here:
+
+* [Notes from Alvaro Navas](https://github.com/ziritrion/dataeng-zoomcamp/blob/main/notes/1_intro.md)
+* [Notes from Abd](https://itnadigital.notion.site/Week-1-Introduction-f18de7e69eb4453594175d0b1334b2f4)
+* [Notes from Aaron](https://github.com/ABZ-Aaron/DataEngineerZoomCamp/blob/master/week_1_basics_n_setup/README.md)
+* [Notes from Faisal](https://github.com/FaisalMohd/data-engineering-zoomcamp/blob/main/week_1_basics_n_setup/Notes/DE%20Zoomcamp%20Week-1.pdf)
+* [Michael Harty's Notes](https://github.com/mharty3/data_engineering_zoomcamp_2022/tree/main/week01)
+* [Blog post from Isaac Kargar](https://kargarisaac.github.io/blog/data%20engineering/jupyter/2022/01/18/data-engineering-w1.html)
+* [Handwritten Notes By Mahmoud Zaher](https://github.com/zaherweb/DataEngineering/blob/master/week%201.pdf)
+* [Notes from Candace Williams](https://teacherc.github.io/data-engineering/2023/01/18/zoomcamp1.html)
+* [Notes from Marcos Torregrosa](https://www.n4gash.com/2023/data-engineering-zoomcamp-semana-1/)
+* [Notes from Vincenzo Galante](https://binchentso.notion.site/Data-Talks-Club-Data-Engineering-Zoomcamp-8699af8e7ff94ec49e6f9bdec8eb69fd)
+* [Notes from Victor Padilha](https://github.com/padilha/de-zoomcamp/tree/master/week1)
+* [Notes from froukje](https://github.com/froukje/de-zoomcamp/blob/main/week_1_basics_n_setup/notes/notes_week_01.md)
+* [Notes from adamiaonr](https://github.com/adamiaonr/data-engineering-zoomcamp/blob/main/week_1_basics_n_setup/2_docker_sql/NOTES.md)
+* [Notes from Xia He-Bleinagel](https://xiahe-bleinagel.com/2023/01/week-1-data-engineering-zoomcamp-notes/)
+* [Notes from Balaji](https://github.com/Balajirvp/DE-Zoomcamp/blob/main/Week%201/Detailed%20Week%201%20Notes.ipynb)
+* [Notes from Erik](https://twitter.com/ehub96/status/1621351266281730049)
+* [Notes by Alain Boisvert](https://github.com/boisalai/de-zoomcamp-2023/blob/main/week1.md)
+* Notes on [Docker, Docker Compose, and setting up a proper Python environment](https://medium.com/@verazabeida/zoomcamp-2023-week-1-f4f94cb360ae), by Vera
+* [Setting up the development environment on Google Virtual Machine](https://itsadityagupta.hashnode.dev/setting-up-the-development-environment-on-google-virtual-machine), blog post by Aditya Gupta
+* [Notes from Zharko Cekovski](https://www.zharconsulting.com/contents/data/data-engineering-bootcamp-2024/week-1-postgres-docker-and-ingestion-scripts/)
+* [2024 Module Walkthrough video by ellacharmed on YouTube](https://youtu.be/VUZshlVAnk4)
+* [2024 Companion Module Walkthrough slides by ellacharmed](https://github.com/ellacharmed/data-engineering-zoomcamp/blob/ella2024/cohorts/2024/01-docker-terraform/walkthrough-01.pdf)
+* Add your notes here
--- a/02-workflow-orchestration/README.md
+++ b/02-workflow-orchestration/README.md
@ -0,0 +1,151 @@
+> If you're looking for Airflow videos from the 2022 edition,
+> check the [2022 cohort folder](../cohorts/2022/week_2_data_ingestion/). <br>
+> If you're looking for Prefect videos from the 2023 edition,
+> check the [2023 cohort folder](../cohorts/2023/week_2_data_ingestion/).
+
+# Week 2: Workflow Orchestration
+
+Welcome to Week 2 of the Data Engineering Zoomcamp! 🚀😤 This week, we'll be covering workflow orchestration with Mage.
+
+Mage is an open-source, hybrid framework for transforming and integrating data. ✨
+
+This week, you'll learn how to use the Mage platform to author and share _magical_ data pipelines. This will all be covered in the course, but if you'd like to learn a bit more about Mage, check out our docs [here](https://docs.mage.ai/introduction/overview). 
+
+* [2.2.1 - 📯 Intro to Orchestration](#221----intro-to-orchestration)
+* [2.2.2 - 🧙‍♂️ Intro to Mage](#222---%EF%B8%8F-intro-to-mage)
+* [2.2.3 - 🐘 ETL: API to Postgres](#223----etl-api-to-postgres)
+* [2.2.4 - 🤓 ETL: API to GCS](#224----etl-api-to-gcs)
+* [2.2.5 - 🔍 ETL: GCS to BigQuery](#225----etl-gcs-to-bigquery)
+* [2.2.6 - 👨‍💻 Parameterized Execution](#226----parameterized-execution)
+* [2.2.7 - 🤖 Deployment (Optional)](#227----deployment-optional)
+* [2.2.8 - 🗒️ Homework](#228---%EF%B8%8F-homework)
+* [2.2.9 - 👣 Next Steps](#229----next-steps)
+
+## 📕 Course Resources
+
+### 2.2.1 - 📯 Intro to Orchestration
+
+In this section, we'll cover the basics of workflow orchestration. We'll discuss what it is, why it's important, and how it can be used to build data pipelines.
+
+Videos
+- 2.2.1a - [What is Orchestration?](https://www.youtube.com/watch?v=Li8-MWHhTbo&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb)
+
+Resources
+- [Slides](https://docs.google.com/presentation/d/17zSxG5Z-tidmgY-9l7Al1cPmz4Slh4VPK6o2sryFYvw/)
+
+### 2.2.2 - 🧙‍♂️ Intro to Mage
+
+In this section, we'll introduce the Mage platform. We'll cover what makes Mage different from other orchestrators, the fundamental concepts behind Mage, and how to get started. To cap it off, we'll spin Mage up via Docker 🐳 and run a simple pipeline.
+
+Videos
+- 2.2.2a - [What is Mage?](https://www.youtube.com/watch?v=AicKRcK3pa4&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb)
+- 2.2.2b - [Configuring Mage](https://www.youtube.com/watch?v=tNiV7Wp08XE&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb)
+- 2.2.2c - [A Simple Pipeline](https://www.youtube.com/watch?v=stI-gg4QBnI&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb)
+
+Resources
+- [Getting Started Repo](https://github.com/mage-ai/mage-zoomcamp)
+- [Slides](https://docs.google.com/presentation/d/1y_5p3sxr6Xh1RqE6N8o2280gUzAdiic2hPhYUUD6l88/)
+
+### 2.2.3 - 🐘 ETL: API to Postgres
+
+Hooray! Mage is up and running. Now, let's build a _real_ pipeline. In this section, we'll build a simple ETL pipeline that loads data from an API into a Postgres database. Our database will run locally in Docker, but the setup is the same as if it were running in the cloud.
+
+Videos
+- 2.2.3a - [Configuring Postgres](https://www.youtube.com/watch?v=pmhI-ezd3BE&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb)
+- 2.2.3b - [Writing an ETL Pipeline](https://www.youtube.com/watch?v=Maidfe7oKLs&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb)
+
+Resources
+- [Taxi Dataset](https://github.com/DataTalksClub/nyc-tlc-data/releases/download/yellow/yellow_tripdata_2021-01.csv.gz)
+- [Sample loading block](https://github.com/mage-ai/mage-zoomcamp/blob/solutions/magic-zoomcamp/data_loaders/load_nyc_taxi_data.py)
+
+
+### 2.2.4 - 🤓 ETL: API to GCS
+
+Ok, so we've written data _locally_ to a database, but what about the cloud? In this tutorial, we'll walk through the process of using Mage to extract, transform, and load data from an API to Google Cloud Storage (GCS). 
+
+We'll cover both writing _partitioned_ and _unpartitioned_ data to GCS and discuss _why_ you might want to do one over the other. Many data teams start with extracting data from a source and writing it to a data lake _before_ loading it to a structured data source, like a database.
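To make the partitioned vs. unpartitioned distinction concrete, the sketch below only computes which object path each row would be written to; it does not talk to GCS. The `nyc_taxi_data` prefix, the `pickup_date` partition key, and the `data.parquet` file names are assumptions for illustration. Libraries such as PyArrow (`pyarrow.parquet.write_to_dataset` with `partition_cols`) produce a similar `key=value` folder layout.

```python
from collections import defaultdict
from datetime import datetime

# Toy trip records; in the course these would come from the NY taxi dataset.
trips = [
    {"trip_id": 1, "pickup": "2021-01-01 08:15:00"},
    {"trip_id": 2, "pickup": "2021-01-01 23:59:59"},
    {"trip_id": 3, "pickup": "2021-01-02 00:30:00"},
]


def unpartitioned_objects(rows, prefix="nyc_taxi_data"):
    """Unpartitioned: every row lands in a single object."""
    return {f"{prefix}/data.parquet": list(rows)}


def partitioned_objects(rows, prefix="nyc_taxi_data"):
    """Partitioned: one object per pickup date, in Hive-style key=value folders."""
    objects = defaultdict(list)
    for row in rows:
        day = datetime.strptime(row["pickup"], "%Y-%m-%d %H:%M:%S").date()
        objects[f"{prefix}/pickup_date={day.isoformat()}/data.parquet"].append(row)
    return dict(objects)


partitioned = partitioned_objects(trips)
unpartitioned = unpartitioned_objects(trips)
```

With the partitioned layout, a job that needs only one day of data can read a single folder instead of scanning everything; the trade-off is more, smaller objects to manage.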
+
+Videos
+- 2.2.4a - [Configuring GCP](https://www.youtube.com/watch?v=00LP360iYvE&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb)
+- 2.2.4b - [Writing an ETL Pipeline](https://www.youtube.com/watch?v=w0XmcASRUnc&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb)
+
+Resources
+- [DTC Zoomcamp GCP Setup](../week_1_basics_n_setup/1_terraform_gcp/2_gcp_overview.md)
+
+### 2.2.5 - 🔍 ETL: GCS to BigQuery
+
+Now that we've written data to GCS, let's load it into BigQuery. In this section, we'll walk through the process of using Mage to load our data from GCS to BigQuery. This closely mirrors a very common data engineering workflow: loading data from a data lake into a data warehouse.
+
+Videos
+- 2.2.5a - [Writing an ETL Pipeline](https://www.youtube.com/watch?v=JKp_uzM-XsM)
+
+### 2.2.6 - 👨‍💻 Parameterized Execution
+
+By now you're familiar with building pipelines, but what about adding parameters? In this video, we'll discuss some built-in runtime variables that exist in Mage and show you how to define your own! We'll also cover how to use these variables to parameterize your pipelines. Finally, we'll talk about what it means to *backfill* a pipeline and how to do it in Mage.
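Conceptually, a backfill re-runs a pipeline once for each past interval, giving each run its own execution date. The sketch below shows only that idea in plain Python; it does not use Mage's API, and `backfill_dates`, `run_pipeline`, and the output path pattern are hypothetical.

```python
from datetime import date, timedelta


def backfill_dates(start, end, step_days=1):
    """Return every execution date from start to end, inclusive."""
    if end < start:
        raise ValueError("end must not be before start")
    dates = []
    current = start
    while current <= end:
        dates.append(current)
        current += timedelta(days=step_days)
    return dates


def run_pipeline(execution_date):
    """Hypothetical pipeline run: derives the output path from its execution date."""
    # The path pattern is an assumption for illustration only.
    return f"nyc_taxi_data/{execution_date:%Y/%m/%d}/data.parquet"


# Backfill the first three days of January 2021, one run per day.
runs = [run_pipeline(d) for d in backfill_dates(date(2021, 1, 1), date(2021, 1, 3))]
```

Because each run derives its output location from the date it receives, re-running a single day overwrites only that day's output, which is what makes backfills safe to repeat.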
+
+Videos
+- 2.2.6a - [Parameterized Execution](https://www.youtube.com/watch?v=H0hWjWxB-rg&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb)
+- 2.2.6b - [Backfills](https://www.youtube.com/watch?v=ZoeC6Ag5gQc&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb)
+
+Resources
+- [Mage Variables Overview](https://docs.mage.ai/development/variables/overview)
+- [Mage Runtime Variables](https://docs.mage.ai/getting-started/runtime-variable)
+
+### 2.2.7 - 🤖 Deployment (Optional)
+
+In this section, we'll cover deploying Mage using Terraform and Google Cloud. This section is optional: it's not *necessary* for learning Mage, but it might be helpful if you're interested in creating a fully deployed project. If you're using Mage in your final project, you'll need to deploy it to the cloud.
+
+Videos
+- 2.2.7a - [Deployment Prerequisites](https://www.youtube.com/watch?v=zAwAX5sxqsg&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb)
+- 2.2.7b - [Google Cloud Permissions](https://www.youtube.com/watch?v=O_H7DCmq2rA&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb)
+- 2.2.7c - [Deploying to Google Cloud - Part 1](https://www.youtube.com/watch?v=9A872B5hb_0&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb)
+- 2.2.7d - [Deploying to Google Cloud - Part 2](https://www.youtube.com/watch?v=0YExsb2HgLI&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb)
+
+Resources
+- [Installing Terraform](https://developer.hashicorp.com/terraform/tutorials/aws-get-started/install-cli)
+- [Installing `gcloud` CLI](https://cloud.google.com/sdk/docs/install)
+- [Mage Terraform Templates](https://github.com/mage-ai/mage-ai-terraform-templates)
+
+Additional Mage Guides
+- [Terraform](https://docs.mage.ai/production/deploying-to-cloud/using-terraform)
+- [Deploying to GCP with Terraform](https://docs.mage.ai/production/deploying-to-cloud/gcp/setup)
+
+### 2.2.8 - 🗒️ Homework 
+
+We've prepared a short exercise to test you on what you've learned this week. You can find the homework [here](../cohorts/2024/02-workflow-orchestration/homework.md). This follows closely from the contents of the course and shouldn't take more than an hour or two to complete. 😄
+
+### 2.2.9 - 👣 Next Steps
+
+Congratulations! You've completed Week 2 of the Data Engineering Zoomcamp. We hope you've enjoyed learning about Mage and that you're excited to use it in your final project. If you have any questions, feel free to reach out to us on Slack. Be sure to check out our "Next Steps" video for some inspiration for the rest of your journey 😄.
+
+Videos
+- 2.2.9a - [Next Steps](https://www.youtube.com/watch?v=uUtj7N0TleQ)
+
+Resources
+- [Slides](https://docs.google.com/presentation/d/1yN-e22VNwezmPfKrZkgXQVrX5owDb285I2HxHWgmAEQ/edit#slide=id.g262fb0d2905_0_12)
+
+### 📑 Additional Resources
+
+- [Mage Docs](https://docs.mage.ai/)
+- [Mage Guides](https://docs.mage.ai/guides)
+- [Mage Slack](https://www.mage.ai/chat)
+
+
+# Community notes
+
+Did you take notes? You can share them here:
+
+## 2024 notes
+
+* Add your notes above this line
+
+## 2023 notes
+
+See [here](../cohorts/2023/week_2_workflow_orchestration#community-notes)
+
+
+## 2022 notes
+
+See [here](../cohorts/2022/week_2_data_ingestion#community-notes)
--- a/week_3_data_warehouse/README.md
+++ b/week_3_data_warehouse/README.md
@ -1,51 +1,54 @@
-## Data Warehouse and BigQuery
+# Data Warehouse and BigQuery

 - [Slides](https://docs.google.com/presentation/d/1a3ZoBAXFk8-EhUsd7rAZd-5p_HpltkzSeujjRGB2TAI/edit?usp=sharing)  
 - [Big Query basic SQL](big_query.sql)

+# Videos

-### Data Warehouse
+## Data Warehouse

- [Data Warehouse and BigQuery](https://youtu.be/jrHljAoD6nM)
+- [Data Warehouse and BigQuery](https://www.youtube.com/watch?v=jrHljAoD6nM&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb)

-### Partitoning and clustering
+## :movie_camera: Partitioning and clustering

- [Partioning and Clustering](https://youtu.be/jrHljAoD6nM?t=726)  
- [Partioning vs Clustering](https://youtu.be/-CqXf7vhhDs)  
+- [Partitioning and Clustering](https://www.youtube.com/watch?v=jrHljAoD6nM&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb&t=726s)  
+- [Partitioning vs Clustering](https://www.youtube.com/watch?v=-CqXf7vhhDs&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb)

-### Best practices
+## :movie_camera: Best practices

- [BigQuery Best Practices](https://youtu.be/k81mLJVX08w)  
+- [BigQuery Best Practices](https://www.youtube.com/watch?v=k81mLJVX08w&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb)

-### Internals of BigQuery
+## :movie_camera: Internals of BigQuery

- [Internals of Big Query](https://youtu.be/eduHi1inM4s)  
+- [Internals of BigQuery](https://www.youtube.com/watch?v=eduHi1inM4s&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb)

-### Advanced
+## Advanced topics

-#### ML
-[BigQuery Machine Learning](https://youtu.be/B-WtpB0PuG4)  
-[SQL for ML in BigQuery](big_query_ml.sql)
+### :movie_camera: Machine Learning in BigQuery
+
+* [BigQuery Machine Learning](https://www.youtube.com/watch?v=B-WtpB0PuG4&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb)
+* [SQL for ML in BigQuery](big_query_ml.sql)

 **Important links**
+
 - [BigQuery ML Tutorials](https://cloud.google.com/bigquery-ml/docs/tutorials)
 - [BigQuery ML Reference Patterns](https://cloud.google.com/bigquery-ml/docs/analytics-reference-patterns)
 - [Hyper Parameter tuning](https://cloud.google.com/bigquery-ml/docs/reference/standard-sql/bigqueryml-syntax-create-glm)
 - [Feature preprocessing](https://cloud.google.com/bigquery-ml/docs/reference/standard-sql/bigqueryml-syntax-preprocess-overview)

-##### Deploying ML model
+### :movie_camera: Deploying ML model

- [BigQuery Machine Learning Deployment](https://youtu.be/BjARzEWaznU)  
+- [BigQuery Machine Learning Deployment](https://www.youtube.com/watch?v=BjARzEWaznU&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb)
 - [Steps to extract and deploy model with docker](extract_model.md)  



-### Homework
+# Homework

-* [Homework](../cohorts/2023/week_3_data_warehouse/homework.md)
+* [2024 Homework](../cohorts/2024/03-data-warehouse/homework.md)


-## Community notes
+# Community notes

 Did you take notes? You can share them here.

--- a/week_3_data_warehouse/big_query.sql
+++ b/week_3_data_warehouse/big_query.sql
--- a/week_3_data_warehouse/big_query_hw.sql
+++ b/week_3_data_warehouse/big_query_hw.sql
--- a/week_3_data_warehouse/big_query_ml.sql
+++ b/week_3_data_warehouse/big_query_ml.sql
--- a/week_3_data_warehouse/extract_model.md
+++ b/week_3_data_warehouse/extract_model.md
--- a/week_3_data_warehouse/extras/README.md
+++ b/week_3_data_warehouse/extras/README.md
--- a/week_3_data_warehouse/extras/web_to_gcs.py
+++ b/week_3_data_warehouse/extras/web_to_gcs.py
--- a/04-analytics-engineering/README.md
+++ b/04-analytics-engineering/README.md
@ -0,0 +1,141 @@
+# Week 4: Analytics Engineering 
+Goal: Transform the data loaded in the data warehouse (DWH) into analytical views by developing a [dbt project](taxi_rides_ny/README.md).
+
+### Prerequisites
+By this stage of the course, you should already have:
+
+- A running warehouse (BigQuery or Postgres)
+- A set of running pipelines ingesting the project dataset (week 3 completed)
+- The following datasets ingested from the course [Datasets list](https://github.com/DataTalksClub/nyc-tlc-data/):
+  * Yellow taxi data - Years 2019 and 2020
+  * Green taxi data - Years 2019 and 2020
+  * FHV data - Year 2019
+
+Note:
+* A quick hack for loading this data faster has been shared; see the instructions in [week3/extras](../03-data-warehouse/extras)
+* If you receive the error "Permission denied while globbing file pattern." when attempting to run `fact_trips.sql`, this [Video](https://www.youtube.com/watch?v=kL3ZVNL9Y4A&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb) may help you resolve the issue
+
+## Setting up your environment 
+  
+
+### Setting up dbt for using BigQuery (Alternative A - preferred)
+
+1. Open a free developer dbt Cloud account following [this link](https://www.getdbt.com/signup/)
+2. Follow [these instructions](https://docs.getdbt.com/guides/bigquery?step=4) to connect to your BigQuery instance. More detailed instructions are in [dbt_cloud_setup.md](dbt_cloud_setup.md)
+
+_Optional_: If you feel more comfortable developing locally, you can use a local installation of dbt Core. Follow the [official dbt documentation](https://docs.getdbt.com/docs/core/installation-overview) or the [dbt core with BigQuery on Docker](docker_setup/README.md) guide to set up dbt locally on Docker. You will need to install the latest version with the BigQuery adapter (dbt-bigquery).
+
+### Setting up dbt for using Postgres locally (Alternative B)
+
+As an alternative to the cloud setup, which requires a cloud database, you can run the project by installing dbt locally.
+You can follow the [official dbt documentation](https://docs.getdbt.com/dbt-cli/installation) or use a Docker image from the official [dbt repo](https://github.com/dbt-labs/dbt/). You will need to install the latest version with the Postgres adapter (dbt-postgres).
+After the local installation, set up the connection to Postgres in `profiles.yml`; you can find the templates [here](https://docs.getdbt.com/docs/core/connect-data-platform/postgres-setup)
+
+
+## Content
+
+### Introduction to analytics engineering
+
+* What is analytics engineering?
+* ETL vs ELT 
+* Data modeling concepts (fact and dim tables)
+
+ :movie_camera: [Video](https://www.youtube.com/watch?v=uF76d5EmdtU&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb&index=32)
+
+### What is dbt? 
+
+* Intro to dbt 
+
+ :movie_camera: [Video](https://www.youtube.com/watch?v=4eCouvVOJUw&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb&index=33)
+
+## Starting a dbt project
+
+### Alternative A: Using BigQuery + dbt cloud
+* Starting a new project with dbt init (dbt cloud and core)
+* dbt cloud setup
+* `dbt_project.yml`
+
+ :movie_camera: [Video](https://www.youtube.com/watch?v=iMxh6s_wL4Q&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb&index=34)
+ 
+### Alternative B: Using Postgres + dbt core (locally)
+* Starting a new project with dbt init (dbt cloud and core)
+* dbt core local setup
+* `profiles.yml`
+* `dbt_project.yml`
+
+:movie_camera: [Video](https://www.youtube.com/watch?v=1HmL63e-vRs&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb&index=35)
+
+### dbt models
+
+* Anatomy of a dbt model: written code vs compiled Sources
+* Materialisations: table, view, incremental, ephemeral  
+* Seeds, sources and ref  
+* Jinja and Macros 
+* Packages 
+* Variables
+
+:movie_camera: [Video](https://www.youtube.com/watch?v=UVI30Vxzd6c&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb&index=36)
+
+_Note: This video is shown entirely on dbt cloud IDE but the same steps can be followed locally on the IDE of your choice_
+
+### Testing and documenting dbt models
+* Tests  
+* Documentation 
+
+:movie_camera: [Video](https://www.youtube.com/watch?v=UishFmq1hLM&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb&index=37)
+
+_Note: This video is shown entirely on dbt cloud IDE but the same steps can be followed locally on the IDE of your choice_
+
+## Deployment
+
+### Alternative A: Using BigQuery + dbt cloud
+* Deployment: development environment vs production 
+* dbt cloud: scheduler, sources and hosted documentation
+
+:movie_camera: [Video](https://www.youtube.com/watch?v=rjf6yZNGX8I&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb&index=38)
+
+### Alternative B: Using Postgres + dbt core (locally)
+* Deployment: development environment vs production 
+* dbt cloud: scheduler, sources and hosted documentation
+
+:movie_camera: [Video](https://www.youtube.com/watch?v=Cs9Od1pcrzM&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb&index=39)
+
+## Visualising the transformed data
+* :movie_camera: [Google Data Studio Video](https://www.youtube.com/watch?v=39nLTs74A3E&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb&index=42)
+* :movie_camera: [Metabase Video](https://www.youtube.com/watch?v=BnLkrA7a6gM&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb&index=43)
+
+ 
+## Advanced concepts
+
+ * [Make a model Incremental](https://docs.getdbt.com/docs/building-a-dbt-project/building-models/configuring-incremental-models)
+ * [Use of tags](https://docs.getdbt.com/reference/resource-configs/tags)
+ * [Hooks](https://docs.getdbt.com/docs/building-a-dbt-project/hooks-operations)
+ * [Analysis](https://docs.getdbt.com/docs/building-a-dbt-project/analyses)
+ * [Snapshots](https://docs.getdbt.com/docs/building-a-dbt-project/snapshots)
+ * [Exposure](https://docs.getdbt.com/docs/building-a-dbt-project/exposures)
+ * [Metrics](https://docs.getdbt.com/docs/building-a-dbt-project/metrics)
+
+
+## Community notes
+
+Did you take notes? You can share them here.
+
+* [Notes by Alvaro Navas](https://github.com/ziritrion/dataeng-zoomcamp/blob/main/notes/4_analytics.md)
+* [Sandy's DE learning blog](https://learningdataengineering540969211.wordpress.com/2022/02/17/week-4-setting-up-dbt-cloud-with-bigquery/)
+* [Notes by Victor Padilha](https://github.com/padilha/de-zoomcamp/tree/master/week4)
+* [Marcos Torregrosa's blog (Spanish)](https://www.n4gash.com/2023/data-engineering-zoomcamp-semana-4/)
+* [Notes by froukje](https://github.com/froukje/de-zoomcamp/blob/main/week_4_analytics_engineering/notes/notes_week_04.md)
+* [Notes by Alain Boisvert](https://github.com/boisalai/de-zoomcamp-2023/blob/main/week4.md)
+* [Setting up Prefect with dbt by Vera](https://medium.com/@verazabeida/zoomcamp-week-5-5b6a9d53a3a0)
+* [Blog by Xia He-Bleinagel](https://xiahe-bleinagel.com/2023/02/week-4-data-engineering-zoomcamp-notes-analytics-engineering-and-dbt/)
+* [Setting up dbt with BigQuery by Tofag](https://medium.com/@fagbuyit/setting-up-your-dbt-cloud-dej-9-d18e5b7c96ba)
+* [Blog post by Dewi Oktaviani](https://medium.com/@oktavianidewi/de-zoomcamp-2023-learning-week-4-analytics-engineering-with-dbt-53f781803d3e)
+* [Notes from Vincenzo Galante](https://binchentso.notion.site/Data-Talks-Club-Data-Engineering-Zoomcamp-8699af8e7ff94ec49e6f9bdec8eb69fd)
+* [Notes from Balaji](https://github.com/Balajirvp/DE-Zoomcamp/blob/main/Week%204/Data%20Engineering%20Zoomcamp%20Week%204.ipynb)
+ *Add your notes here (above this line)*
+
+## Useful links
+- [Slides used in the videos](https://docs.google.com/presentation/d/1xSll_jv0T8JF4rYZvLHfkJXYqUjPtThA/edit?usp=sharing&ouid=114544032874539580154&rtpof=true&sd=true)
+- [Visualizing data with Metabase course](https://www.metabase.com/learn/visualization/)
+- [dbt free courses](https://courses.getdbt.com/collections)
--- a/week_4_analytics_engineering/dbt_cloud_setup.md
+++ b/week_4_analytics_engineering/dbt_cloud_setup.md
--- a/week_4_analytics_engineering/docker_setup/Dockerfile
+++ b/week_4_analytics_engineering/docker_setup/Dockerfile
--- a/week_4_analytics_engineering/docker_setup/README.md
+++ b/week_4_analytics_engineering/docker_setup/README.md
--- a/week_4_analytics_engineering/docker_setup/docker-compose.yaml
+++ b/week_4_analytics_engineering/docker_setup/docker-compose.yaml
--- a/week_5_batch_processing/.gitignore
+++ b/week_5_batch_processing/.gitignore
--- a/week_5_batch_processing/README.md
+++ b/week_5_batch_processing/README.md
@ -1,12 +1,12 @@
-## Week 5: Batch Processing
+# Week 5: Batch Processing

-### 5.1 Introduction
+## 5.1 Introduction

 * :movie_camera: 5.1.1 [Introduction to Batch Processing](https://youtu.be/dcHe5Fl3MF8?list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb)
 * :movie_camera: 5.1.2 [Introduction to Spark](https://youtu.be/FhaqbEOuQ8U?list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb)


-### 5.2 Installation
+## 5.2 Installation

 Follow [these instructions](setup/) to install Spark:

@ -19,7 +19,7 @@ And follow [this](setup/pyspark.md) to run PySpark in Jupyter
 * :movie_camera: 5.2.1 [(Optional) Installing Spark (Linux)](https://youtu.be/hqUbB9c8sKg?list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb)


-### 5.3 Spark SQL and DataFrames
+## 5.3 Spark SQL and DataFrames

 * :movie_camera: 5.3.1 [First Look at Spark/PySpark](https://youtu.be/r_Sf6fCB40c?list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb) 
 * :movie_camera: 5.3.2 [Spark Dataframes](https://youtu.be/ti3aC1m3rE8?list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb)
@@ -32,19 +32,19 @@ Script to prepare the Dataset [download_data.sh](code/download_data.sh)
 * :movie_camera: 5.3.4 [SQL with Spark](https://www.youtube.com/watch?v=uAlp2VuZZPY&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb)


-### 5.4 Spark Internals
+## 5.4 Spark Internals

 * :movie_camera: 5.4.1 [Anatomy of a Spark Cluster](https://youtu.be/68CipcZt7ZA&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb)
 * :movie_camera: 5.4.2 [GroupBy in Spark](https://youtu.be/9qrDsY_2COo&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb)
 * :movie_camera: 5.4.3 [Joins in Spark](https://youtu.be/lu7TrqAWuH4&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb)

-### 5.5 (Optional) Resilient Distributed Datasets
+## 5.5 (Optional) Resilient Distributed Datasets

 * :movie_camera: 5.5.1 [Operations on Spark RDDs](https://youtu.be/Bdu-xIrF3OM&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb)
 * :movie_camera: 5.5.2 [Spark RDD mapPartition](https://youtu.be/k3uB2K99roI&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb)


-### 5.6 Running Spark in the Cloud
+## 5.6 Running Spark in the Cloud

 * :movie_camera: 5.6.1 [Connecting to Google Cloud Storage ](https://youtu.be/Yyz293hBVcQ&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb)
 * :movie_camera: 5.6.2 [Creating a Local Spark Cluster](https://youtu.be/HXBwSlXo5IA&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb)
@@ -52,13 +52,13 @@ Script to prepare the Dataset [download_data.sh](code/download_data.sh)
 * :movie_camera: 5.6.4 [Connecting Spark to Big Query](https://youtu.be/HIm2BOj8C0Q&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb)


-### Homework
+# Homework


-* [Homework](../cohorts/2023/week_5_batch_processing/homework.md)
+* [2024 Homework](../cohorts/2024)


-## Community notes
+# Community notes

 Did you take notes? You can share them here.

--- a/week_5_batch_processing/code/03_test.ipynb
+++ b/week_5_batch_processing/code/03_test.ipynb
--- a/week_5_batch_processing/code/04_pyspark.ipynb
+++ b/week_5_batch_processing/code/04_pyspark.ipynb
--- a/week_5_batch_processing/code/05_taxi_schema.ipynb
+++ b/week_5_batch_processing/code/05_taxi_schema.ipynb
--- a/week_5_batch_processing/code/06_spark_sql.ipynb
+++ b/week_5_batch_processing/code/06_spark_sql.ipynb
--- a/week_5_batch_processing/code/06_spark_sql.py
+++ b/week_5_batch_processing/code/06_spark_sql.py
--- a/week_5_batch_processing/code/06_spark_sql_big_query.py
+++ b/week_5_batch_processing/code/06_spark_sql_big_query.py
--- a/week_5_batch_processing/code/07_groupby_join.ipynb
+++ b/week_5_batch_processing/code/07_groupby_join.ipynb
--- a/week_5_batch_processing/code/08_rdds.ipynb
+++ b/week_5_batch_processing/code/08_rdds.ipynb
--- a/week_5_batch_processing/code/09_spark_gcs.ipynb
+++ b/week_5_batch_processing/code/09_spark_gcs.ipynb
--- a/week_5_batch_processing/code/cloud.md
+++ b/week_5_batch_processing/code/cloud.md
--- a/week_5_batch_processing/code/download_data.sh
+++ b/week_5_batch_processing/code/download_data.sh
--- a/week_5_batch_processing/code/homework.ipynb
+++ b/week_5_batch_processing/code/homework.ipynb
--- a/week_5_batch_processing/setup/config/core-site.xml
+++ b/week_5_batch_processing/setup/config/core-site.xml
--- a/week_5_batch_processing/setup/config/spark-defaults.conf
+++ b/week_5_batch_processing/setup/config/spark-defaults.conf
--- a/week_5_batch_processing/setup/config/spark.dockerfile
+++ b/week_5_batch_processing/setup/config/spark.dockerfile
--- a/week_5_batch_processing/setup/hadoop-yarn.md
+++ b/week_5_batch_processing/setup/hadoop-yarn.md
--- a/week_5_batch_processing/setup/linux.md
+++ b/week_5_batch_processing/setup/linux.md
--- a/week_5_batch_processing/setup/macos.md
+++ b/week_5_batch_processing/setup/macos.md
--- a/week_5_batch_processing/setup/pyspark.md
+++ b/week_5_batch_processing/setup/pyspark.md
--- a/week_5_batch_processing/setup/windows.md
+++ b/week_5_batch_processing/setup/windows.md
--- a/week_6_stream_processing/.gitignore
+++ b/week_6_stream_processing/.gitignore
--- a/week_6_stream_processing/README.md
+++ b/week_6_stream_processing/README.md
@@ -1,6 +1,6 @@
 # Week 6: Stream Processing

-## Code structure
+# Code structure
 * [Java examples](java)
 * [Python examples](python)
 * [KSQLD examples](ksqldb)
@@ -74,13 +74,7 @@ Please follow the steps described under [pyspark-streaming](python/streams-examp

 ## Homework

-[Form](https://forms.gle/rK7268U92mHJBpmW7)
-
-The homework is mostly theoretical. In the last question you have to provide working code link, please keep in mind that this
-question is not scored.
-
-Deadline: 13 March 2023, 22:00 CET
-
+* [2024 Homework](../cohorts/2024/)

 ## Community notes

--- a/week_6_stream_processing/java/kafka_examples/.gitignore
+++ b/week_6_stream_processing/java/kafka_examples/.gitignore
--- a/week_6_stream_processing/java/kafka_examples/build.gradle
+++ b/week_6_stream_processing/java/kafka_examples/build.gradle
--- a/week_6_stream_processing/java/kafka_examples/build/generated-main-avro-java/schemaregistry/RideRecord.java
+++ b/week_6_stream_processing/java/kafka_examples/build/generated-main-avro-java/schemaregistry/RideRecord.java
--- a/week_6_stream_processing/java/kafka_examples/build/generated-main-avro-java/schemaregistry/RideRecordCompatible.java
+++ b/week_6_stream_processing/java/kafka_examples/build/generated-main-avro-java/schemaregistry/RideRecordCompatible.java
--- a/week_6_stream_processing/java/kafka_examples/build/generated-main-avro-java/schemaregistry/RideRecordNoneCompatible.java
+++ b/week_6_stream_processing/java/kafka_examples/build/generated-main-avro-java/schemaregistry/RideRecordNoneCompatible.java
--- a/week_6_stream_processing/java/kafka_examples/gradle/wrapper/gradle-wrapper.jar
+++ b/week_6_stream_processing/java/kafka_examples/gradle/wrapper/gradle-wrapper.jar
--- a/week_6_stream_processing/java/kafka_examples/gradle/wrapper/gradle-wrapper.properties
+++ b/week_6_stream_processing/java/kafka_examples/gradle/wrapper/gradle-wrapper.properties
--- a/week_6_stream_processing/java/kafka_examples/gradlew
+++ b/week_6_stream_processing/java/kafka_examples/gradlew
--- a/week_6_stream_processing/java/kafka_examples/gradlew.bat
+++ b/week_6_stream_processing/java/kafka_examples/gradlew.bat
--- a/week_6_stream_processing/java/kafka_examples/settings.gradle
+++ b/week_6_stream_processing/java/kafka_examples/settings.gradle
--- a/week_6_stream_processing/java/kafka_examples/src/main/avro/rides.avsc
+++ b/week_6_stream_processing/java/kafka_examples/src/main/avro/rides.avsc
--- a/week_6_stream_processing/java/kafka_examples/src/main/avro/rides_compatible.avsc
+++ b/week_6_stream_processing/java/kafka_examples/src/main/avro/rides_compatible.avsc
--- a/week_6_stream_processing/java/kafka_examples/src/main/avro/rides_non_compatible.avsc
+++ b/week_6_stream_processing/java/kafka_examples/src/main/avro/rides_non_compatible.avsc
--- a/week_6_stream_processing/java/kafka_examples/src/main/java/org/example/AvroProducer.java
+++ b/week_6_stream_processing/java/kafka_examples/src/main/java/org/example/AvroProducer.java
--- a/week_6_stream_processing/java/kafka_examples/src/main/java/org/example/JsonConsumer.java
+++ b/week_6_stream_processing/java/kafka_examples/src/main/java/org/example/JsonConsumer.java
--- a/week_6_stream_processing/java/kafka_examples/src/main/java/org/example/JsonKStream.java
+++ b/week_6_stream_processing/java/kafka_examples/src/main/java/org/example/JsonKStream.java
--- a/week_6_stream_processing/java/kafka_examples/src/main/java/org/example/JsonKStreamJoins.java
+++ b/week_6_stream_processing/java/kafka_examples/src/main/java/org/example/JsonKStreamJoins.java
--- a/week_6_stream_processing/java/kafka_examples/src/main/java/org/example/JsonKStreamWindow.java
+++ b/week_6_stream_processing/java/kafka_examples/src/main/java/org/example/JsonKStreamWindow.java
--- a/week_6_stream_processing/java/kafka_examples/src/main/java/org/example/JsonProducer.java
+++ b/week_6_stream_processing/java/kafka_examples/src/main/java/org/example/JsonProducer.java
--- a/week_6_stream_processing/java/kafka_examples/src/main/java/org/example/JsonProducerPickupLocation.java
+++ b/week_6_stream_processing/java/kafka_examples/src/main/java/org/example/JsonProducerPickupLocation.java
--- a/week_6_stream_processing/java/kafka_examples/src/main/java/org/example/Secrets.java
+++ b/week_6_stream_processing/java/kafka_examples/src/main/java/org/example/Secrets.java
--- a/week_6_stream_processing/java/kafka_examples/src/main/java/org/example/Topics.java
+++ b/week_6_stream_processing/java/kafka_examples/src/main/java/org/example/Topics.java
--- a/week_6_stream_processing/java/kafka_examples/src/main/java/org/example/customserdes/CustomSerdes.java
+++ b/week_6_stream_processing/java/kafka_examples/src/main/java/org/example/customserdes/CustomSerdes.java
--- a/week_6_stream_processing/java/kafka_examples/src/main/java/org/example/data/PickupLocation.java
+++ b/week_6_stream_processing/java/kafka_examples/src/main/java/org/example/data/PickupLocation.java
--- a/week_6_stream_processing/java/kafka_examples/src/main/java/org/example/data/Ride.java
+++ b/week_6_stream_processing/java/kafka_examples/src/main/java/org/example/data/Ride.java
--- a/week_6_stream_processing/java/kafka_examples/src/main/java/org/example/data/VendorInfo.java
+++ b/week_6_stream_processing/java/kafka_examples/src/main/java/org/example/data/VendorInfo.java
--- a/week_6_stream_processing/java/kafka_examples/src/main/resources/rides.csv
+++ b/week_6_stream_processing/java/kafka_examples/src/main/resources/rides.csv
--- a/week_6_stream_processing/java/kafka_examples/src/test/java/org/example/JsonKStreamJoinsTest.java
+++ b/week_6_stream_processing/java/kafka_examples/src/test/java/org/example/JsonKStreamJoinsTest.java
--- a/week_6_stream_processing/java/kafka_examples/src/test/java/org/example/JsonKStreamTest.java
+++ b/week_6_stream_processing/java/kafka_examples/src/test/java/org/example/JsonKStreamTest.java
--- a/week_6_stream_processing/java/kafka_examples/src/test/java/org/example/helper/DataGeneratorHelper.java
+++ b/week_6_stream_processing/java/kafka_examples/src/test/java/org/example/helper/DataGeneratorHelper.java
--- a/week_6_stream_processing/ksqldb/commands.md
+++ b/week_6_stream_processing/ksqldb/commands.md
--- a/week_6_stream_processing/python/README.md
+++ b/week_6_stream_processing/python/README.md
--- a/week_6_stream_processing/python/avro_example/consumer.py
+++ b/week_6_stream_processing/python/avro_example/consumer.py
--- a/week_6_stream_processing/python/avro_example/producer.py
+++ b/week_6_stream_processing/python/avro_example/producer.py
--- a/week_6_stream_processing/python/avro_example/ride_record.py
+++ b/week_6_stream_processing/python/avro_example/ride_record.py
--- a/week_6_stream_processing/python/avro_example/ride_record_key.py
+++ b/week_6_stream_processing/python/avro_example/ride_record_key.py
--- a/week_6_stream_processing/python/avro_example/settings.py
+++ b/week_6_stream_processing/python/avro_example/settings.py
--- a/week_6_stream_processing/python/docker/README.md
+++ b/week_6_stream_processing/python/docker/README.md
--- a/week_6_stream_processing/python/docker/docker-compose.yml
+++ b/week_6_stream_processing/python/docker/docker-compose.yml
--- a/week_6_stream_processing/python/docker/kafka/docker-compose.yml
+++ b/week_6_stream_processing/python/docker/kafka/docker-compose.yml
--- a/week_6_stream_processing/python/docker/spark/build.sh
+++ b/week_6_stream_processing/python/docker/spark/build.sh
--- a/week_6_stream_processing/python/docker/spark/cluster-base.Dockerfile
+++ b/week_6_stream_processing/python/docker/spark/cluster-base.Dockerfile
--- a/week_6_stream_processing/python/docker/spark/docker-compose.yml
+++ b/week_6_stream_processing/python/docker/spark/docker-compose.yml
--- a/week_6_stream_processing/python/docker/spark/jupyterlab.Dockerfile
+++ b/week_6_stream_processing/python/docker/spark/jupyterlab.Dockerfile
--- a/week_6_stream_processing/python/docker/spark/spark-base.Dockerfile
+++ b/week_6_stream_processing/python/docker/spark/spark-base.Dockerfile
--- a/week_6_stream_processing/python/docker/spark/spark-master.Dockerfile
+++ b/week_6_stream_processing/python/docker/spark/spark-master.Dockerfile
Author	SHA1	Message	Date
Victoria Perez Mola	87f33b1b85	delete all	2024-01-28 00:02:37 +00:00
Victoria Perez Mola	2a59822b4a	Merge pull request #438 from DataTalksClub/de-zoomcamp Update week 4 project	2024-01-28 01:01:11 +01:00
Victoria Perez Mola	f8221f25be	add hack for loading initial data	2024-01-28 00:00:37 +00:00
Victoria Perez Mola	9c219f7fdc	update project	2024-01-27 23:57:45 +00:00
Victoria Perez Mola	5703a49efd	update directory	2024-01-27 22:54:09 +00:00
Victoria Perez Mola	7e2c7f94c4	Merge pull request #410 from eltociear/patch-1 Update asking-questions.md	2024-01-27 22:55:02 +01:00
Victoria Perez Mola	20671b4b48	Merge pull request #432 from DarkDesire/patch-1 Update homework.md for HW2. Right link for green taxi dataset	2024-01-27 22:53:39 +01:00
Victoria Perez Mola	1d7f51ffaf	Improve formatting W4 readme	2024-01-27 21:50:37 +01:00
Victoria Perez Mola	43b2104fa9	Update W4 README for cohort 2024.md Update links and content for readability	2024-01-27 21:38:20 +01:00
Alexey Grigorev	b11c9cb1e3	Update README.md	2024-01-27 17:53:10 +01:00
Eldar Dragomir	ee0546ba0a	Update homework.md, right link for green taxi dataset	2024-01-26 14:05:43 +01:00
Alexey Grigorev	1decc32b8d	Update asking-questions.md	2024-01-25 16:55:17 +01:00
Leo Rubiano	178fe94ed8	Update asking-questions.md (#425 )	2024-01-24 18:50:12 +01:00
Alexey Grigorev	a5e008b498	Update README.md	2024-01-24 15:56:30 +01:00
ellacharmed	ebcb10c8ab	Add walkthrough video and pdf links to Notes (#421 )	2024-01-24 15:52:26 +01:00
Alexey Grigorev	cb55908a7c	Update README.md	2024-01-24 10:42:23 +01:00
Magdalena Kuhn	34a63cff05	add star history ;D (#423 ) Co-authored-by: Magdalena Kuhn <magdalena.kuhn@bmw.de>	2024-01-24 08:41:28 +01:00
Michael Shoemaker	3e247158a4	Added Week3 Homework (#419 )	2024-01-23 08:58:50 +01:00
Peter Wagner Sandoval Moreno	11c60f66c7	Update homework.md (#415 ) Fix terraform overview link	2024-01-19 10:56:18 +01:00
Alexey Grigorev	594faf0f32	Update homework.md	2024-01-18 22:25:21 +01:00
Luis Guilherme Sousa de Oliveira	2bb25463ea	Update homework.md (#414 ) Correction of Q5 Header	2024-01-18 21:36:01 +01:00
Alexey Grigorev	bbe191aecc	Update README.md	2024-01-18 17:05:43 +01:00
Alexey Grigorev	fa39a9d342	deadline for hw1	2024-01-17 11:37:19 +01:00
Alexey Grigorev	e4cb817399	cosmetic changes	2024-01-17 11:30:55 +01:00
Alexey Grigorev	5259facfb4	changing the design a bit	2024-01-17 09:59:32 +01:00
Alexey Grigorev	130a508a65	replaced short youtube urls with long	2024-01-17 09:51:12 +01:00
Ikko Eltociear Ashimine	dce01a2794	Update asking-questions.md Guidlines -> Guidelines	2024-01-17 00:14:20 +09:00
Matt	142b9f4ee4	Homework (again) (#403 ) * homework redo * homework redo * hw	2024-01-13 16:47:17 +01:00
Zesky665	d18ceb6044	Update README.md (#404 ) Added my notes to the list.	2024-01-13 16:46:52 +01:00
Matt	0e0aae68b4	Add links for course videos (#402 ) * homework; * homework; * homework; * homework; * update with new videos * update with new videos * updates * updates * updates * updates * updates * polish * polish * polish * polish * polish * polish * polish * polish * polish * polish * polish * polish * Update README.md * Update README.md * Update README.md * Create homework.md * Update README.md * Update README.md * Delete 02-workflow-orchestration/homework.md * Delete cohorts/2024/week_2_workflow_orchestration/homework.md * update homework * Delete 02-workflow-orchestration/homework.md * Create homework.md * Update README.md * Update README.md * Update README.md * Update README.md * update, add videos * add video links * add video links * homework links * Update homework.md * Update README.md --------- Co-authored-by: Alexey Grigorev <alexeygrigorev@users.noreply.github.com>	2024-01-12 22:15:55 +01:00
Alexey Grigorev	468aacb1ef	Update README.md	2024-01-10 13:48:42 +01:00
Matt	860833525a	Update Metadata for Week 2 (#399 ) * homework; * homework; * homework; * homework; * update with new videos * update with new videos * updates * updates * updates * updates * updates * polish * polish * polish * polish * polish * polish * polish * polish * polish * polish * polish * polish * Update README.md * Update README.md * Update README.md * Create homework.md * Update README.md * Update README.md * Delete 02-workflow-orchestration/homework.md * Delete cohorts/2024/week_2_workflow_orchestration/homework.md --------- Co-authored-by: Alexey Grigorev <alexeygrigorev@users.noreply.github.com>	2024-01-09 23:33:55 +01:00
Aditya Tiwari	2418faf718	PR to address change in PgAdmin 4 UI, for creating a server. (#400 ) * PgAdmin UI update note added. * Punctuation Update.	2024-01-09 23:31:27 +01:00
Alexey Grigorev	325131f959	typo	2024-01-08 23:30:47 +01:00
Alexey Grigorev	8c455873fd	workshops	2024-01-08 18:06:07 +01:00
Alexey Grigorev	be68361c40	renaming + syllabus update	2024-01-08 17:51:51 +01:00
Asad Rauf	bfef9aa2fb	add prefect links to cohort 2023 (#391 ) * add prefect links to cohort 2023 * capitalize readme and tidy up notes * add link to prefect in the main orchestration page * clean up week 2 readme	2023-12-30 22:22:38 +01:00
Alexey Grigorev	9847430ca7	Update README.md	2023-12-23 20:14:37 +01:00
Alexey Grigorev	960fed9828	Update README.md	2023-12-21 10:43:40 +01:00
Alexey Grigorev	3f5cefcdd7	Add files via upload	2023-12-21 10:42:49 +01:00
Luis Guilherme Sousa de Oliveira	57c7ce33f8	Adding Module 1 HW (#396 ) * Adding * Changing folder --------- Co-authored-by: Luis Oliveira <luiolive3@publicisgroupe.net>	2023-12-20 19:10:37 +01:00