99 Commits

Author SHA1 Message Date
456ba9a9e1 Merge branch 'main' of https://github.com/DataTalksClub/data-engineering-zoomcamp into de-zoomcamp-videos 2024-02-05 22:22:56 +00:00
d2e59f2350 Update URL in homework 3 (#448)
* Update URL in homework 3

URL was incorrect leading to errors in downloading

* Update homework.md
2024-02-05 18:12:54 +01:00
da6a842ee7 Update dlt.md 2024-02-05 17:43:58 +01:00
d763f07395 Update dlt.md 2024-02-05 16:49:12 +01:00
427d17d012 rearranged notebooks #461 2024-02-05 12:54:11 +01:00
51a9c95b7d Update homework.md (week 2 & 3) (#456)
* Update homework.md (week 2)

Update homework.md to explain beforehand what should be included in the homework repository

* Update homework.md (week 3)

Update homework.md to explain beforehand what should be included in the homework repository
2024-02-05 12:34:02 +01:00
6a2b86d8af Update README.md (#460)
week 3 notes
2024-02-05 12:33:37 +01:00
e659ff26b8 fix location join (#470) 2024-02-05 12:32:17 +01:00
6bc22c63cf Use embedded links in youtube URLs (#471)
Update README.md with markdown formatting from 

- https://markdown-videos-api.jorgenkh.no/docs#/
- https://github.com/orgs/community/discussions/16925
2024-02-05 12:29:51 +01:00
7b7d84e292 fix location join 2024-02-04 22:02:12 +00:00
0f9b564bce Merge pull request #468 from DataTalksClub/de-zoomcamp-videos
De zoomcamp creating the whole project
2024-02-04 22:35:24 +01:00
fe4419866d Merge branch 'main' of https://github.com/DataTalksClub/data-engineering-zoomcamp into de-zoomcamp-videos 2024-02-04 21:34:26 +00:00
53b2676115 complete my whole project 2024-02-04 21:34:12 +00:00
c0c772b8ce Merge pull request #459 from inner-outer-space/patch-1
Update README.md
2024-02-04 22:16:06 +01:00
4117ce9f5d Merge pull request #458 from inner-outer-space/patch-2
Update README.md
2024-02-04 22:15:43 +01:00
b1ad88253c Merge pull request #466 from maria-fisher/patch-3
Update README.md
2024-02-04 22:15:17 +01:00
049dd34c6c fix conflics 2024-02-04 21:06:30 +00:00
1efd2a236c build a whole dbt project 2024-02-04 21:04:29 +00:00
72c4c821dc remove unused files 2024-02-04 20:48:14 +01:00
68e8e1a9cb make dm_monthly_zone_revenue cross-db 2024-02-04 20:47:15 +01:00
261b50d042 Update schema.yml tests 2024-02-04 20:34:52 +01:00
b269844ea3 Update dbt_project.yml variables 2024-02-04 20:32:52 +01:00
35b99817dc Update stg_yellow_tripdata to latest dbt syntax 2024-02-04 19:15:35 +01:00
78a5940578 Update to latest dbt functions naming 2024-02-04 19:11:46 +01:00
13a7752e5e Merge branch 'main' of https://github.com/DataTalksClub/data-engineering-zoomcamp into de-zoomcamp-videos 2024-02-04 17:28:29 +00:00
3af1021228 Update README.md
videos transcript week 3
2024-02-03 17:27:35 +00:00
f641f94a25 Update README.md
week 1 notes
2024-02-01 11:24:28 +01:00
0563fb5ff7 Update README.md
notes for week 2
2024-02-01 11:21:37 +01:00
a64e90ac36 Include logos for RisingWave Workshop (#455)
As per title.
2024-02-01 07:45:08 +01:00
e69c289b40 Update homework.md to explain beforehand what should be included in the homework repository (#447) 2024-01-31 18:57:25 +01:00
69bc9aec1b Update README.md batch [process (#449)
Update README.md batch [process
2024-01-31 18:55:05 +01:00
fe176c1679 Update README.md data streaming notes (#450)
Update README.md data streaming notes
2024-01-31 18:54:53 +01:00
d9cb16e282 Corrected errors in the instructions (#452) 2024-01-31 15:54:13 +01:00
6d2f1aa7e8 Delete Frame 124.jpg 2024-01-31 13:40:52 +03:00
390b2f6994 Add files via upload 2024-01-31 13:15:21 +03:00
ef6791e1cf Update README.md 2024-01-31 10:55:10 +01:00
865849b0ef Update README.md 2024-01-31 10:54:22 +01:00
9249bfba29 Add files via upload 2024-01-31 10:53:20 +01:00
bb43aa52e4 Delete images/architecture/untitled_diagram.drawio__10_.png 2024-01-31 10:48:33 +01:00
9a6d7878fd Delete images/architecture/arch_2.png 2024-01-31 10:48:22 +01:00
fe0b744ffe Update README.md 2024-01-31 10:43:28 +01:00
dbe68cd993 Add files via upload 2024-01-31 10:42:21 +01:00
a00f31fb85 formatting dlthub workshop (#451)
* adding dlt course

* adding dlt course

* improve formatting

* add cta

* add cta

* add links to slack

* visual improvements

* visual improvements

* visual improvements

---------

Co-authored-by: Adrian <Adrian>
2024-01-31 08:46:18 +01:00
9882dd7411 Update homework.md 2024-01-30 10:29:47 +01:00
f46e0044b9 Update homework.md 2024-01-30 10:29:16 +01:00
38087a646d Update homework.md (#429)
I believe the wording for question 2 is misleading or the correct answer isn't listed. When filtering the dataset to only contain records with more than zero passengers or trips longer than zero:

 ```
df = data[(data['passenger_count'] > 0) & (data['trip_distance'] > 0)]
```
the shape of the resulting dataframe is (139370, 20).

When filtering the dataframe based on the actual question:

```
df_2 = data[(data['passenger_count'] == 0) | (data['trip_distance'] == 0)]
```

the resulting shape is (9455, 20).
2024-01-29 23:31:41 +01:00
4617e63ddd Change the 1st homework of cohort 2024 to reduce ambiguity (#409) 2024-01-29 19:31:53 +01:00
738c22f91b Fix typo in JDK install instructions (#430)
Due to the missing extra dash the line yields the following error:
xcode-select: error: invalid argument '-install'
2024-01-29 19:28:48 +01:00
d576cfb1c9 Update README.md (#439)
Added youtube link to 2nd video on module-01 environment setup demo.
2024-01-29 19:27:43 +01:00
af248385c0 Update README.md (#443)
videos transcripts week 2

Co-authored-by: Alexey Grigorev <alexeygrigorev@users.noreply.github.com>
2024-01-29 19:27:31 +01:00
7abbbde00e Update README.md (#444) 2024-01-29 19:26:41 +01:00
dd84d736bc Fix typo in README.md (#446)
seperated -> separated
2024-01-29 19:26:16 +01:00
6ae0b18eea Update homework.md 2024-01-29 19:12:35 +01:00
e9c8748e29 add dlt course content (#445)
* adding dlt course

* adding dlt course

* improve formatting

* add cta

* add cta

* add links to slack

---------

Co-authored-by: Adrian <Adrian>
2024-01-29 18:45:11 +01:00
a6fda6d5ca Update rising-wave.md (#441) 2024-01-29 15:25:03 +01:00
ee88d7f230 Merge branch 'main' of https://github.com/DataTalksClub/data-engineering-zoomcamp into de-zoomcamp-videos 2024-01-28 21:57:02 +00:00
7a251b614b Update homework.md 2024-01-28 22:40:58 +01:00
b6901c05bf init my dbt project! 2024-01-28 00:16:23 +00:00
9e89d9849e delete 2024-01-28 00:14:21 +00:00
2a59822b4a Merge pull request #438 from DataTalksClub/de-zoomcamp
Update week 4 project
2024-01-28 01:01:11 +01:00
f8221f25be add hack for loading initial data 2024-01-28 00:00:37 +00:00
9c219f7fdc update project 2024-01-27 23:57:45 +00:00
5703a49efd update directory 2024-01-27 22:54:09 +00:00
7e2c7f94c4 Merge pull request #410 from eltociear/patch-1
Update asking-questions.md
2024-01-27 22:55:02 +01:00
20671b4b48 Merge pull request #432 from DarkDesire/patch-1
Update homework.md for HW2. Right link for green taxi dataset
2024-01-27 22:53:39 +01:00
1d7f51ffaf Improve formatting W4 readme 2024-01-27 21:50:37 +01:00
43b2104fa9 Update W4 README for cohort 2024.md
Update links and content for readability
2024-01-27 21:38:20 +01:00
b11c9cb1e3 Update README.md 2024-01-27 17:53:10 +01:00
ee0546ba0a Update homework.md, right link for green taxi dataset 2024-01-26 14:05:43 +01:00
1decc32b8d Update asking-questions.md 2024-01-25 16:55:17 +01:00
178fe94ed8 Update asking-questions.md (#425) 2024-01-24 18:50:12 +01:00
a5e008b498 Update README.md 2024-01-24 15:56:30 +01:00
ebcb10c8ab Add walkthrough video and pdf links to Notes (#421) 2024-01-24 15:52:26 +01:00
cb55908a7c Update README.md 2024-01-24 10:42:23 +01:00
34a63cff05 add star history ;D (#423)
Co-authored-by: Magdalena Kuhn <magdalena.kuhn@bmw.de>
2024-01-24 08:41:28 +01:00
3e247158a4 Added Week3 Homework (#419) 2024-01-23 08:58:50 +01:00
11c60f66c7 Update homework.md (#415)
Fix terraform overview link
2024-01-19 10:56:18 +01:00
594faf0f32 Update homework.md 2024-01-18 22:25:21 +01:00
2bb25463ea Update homework.md (#414)
Correction of Q5 Header
2024-01-18 21:36:01 +01:00
bbe191aecc Update README.md 2024-01-18 17:05:43 +01:00
fa39a9d342 deadline for hw1 2024-01-17 11:37:19 +01:00
e4cb817399 cosmetic changes 2024-01-17 11:30:55 +01:00
5259facfb4 changing the design a bit 2024-01-17 09:59:32 +01:00
130a508a65 replaced short youtube urls with long 2024-01-17 09:51:12 +01:00
dce01a2794 Update asking-questions.md
Guidlines -> Guidelines
2024-01-17 00:14:20 +09:00
142b9f4ee4 Homework (again) (#403)
* homework redo

* homework redo

* hw
2024-01-13 16:47:17 +01:00
d18ceb6044 Update README.md (#404)
Added my notes to the list.
2024-01-13 16:46:52 +01:00
0e0aae68b4 Add links for course videos (#402)
* homework;

* homework;

* homework;

* homework;

* update with new videos

* update with new videos

* updates

* updates

* updates

* updates

* updates

* polish

* polish

* polish

* polish

* polish

* polish

* polish

* polish

* polish

* polish

* polish

* polish

* Update README.md

* Update README.md

* Update README.md

* Create homework.md

* Update README.md

* Update README.md

* Delete 02-workflow-orchestration/homework.md

* Delete cohorts/2024/week_2_workflow_orchestration/homework.md

* update homework

* Delete 02-workflow-orchestration/homework.md

* Create homework.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* update, add videos

* add video links

* add video links

* homework links

* Update homework.md

* Update README.md

---------

Co-authored-by: Alexey Grigorev <alexeygrigorev@users.noreply.github.com>
2024-01-12 22:15:55 +01:00
468aacb1ef Update README.md 2024-01-10 13:48:42 +01:00
860833525a Update Metadata for Week 2 (#399)
* homework;

* homework;

* homework;

* homework;

* update with new videos

* update with new videos

* updates

* updates

* updates

* updates

* updates

* polish

* polish

* polish

* polish

* polish

* polish

* polish

* polish

* polish

* polish

* polish

* polish

* Update README.md

* Update README.md

* Update README.md

* Create homework.md

* Update README.md

* Update README.md

* Delete 02-workflow-orchestration/homework.md

* Delete cohorts/2024/week_2_workflow_orchestration/homework.md

---------

Co-authored-by: Alexey Grigorev <alexeygrigorev@users.noreply.github.com>
2024-01-09 23:33:55 +01:00
2418faf718 PR to address change in PgAdmin 4 UI, for creating a server. (#400)
* PgAdmin UI update note added.

* Punctuation Update.
2024-01-09 23:31:27 +01:00
325131f959 typo 2024-01-08 23:30:47 +01:00
8c455873fd workshops 2024-01-08 18:06:07 +01:00
be68361c40 renaming + syllabus update 2024-01-08 17:51:51 +01:00
bfef9aa2fb add prefect links to cohort 2023 (#391)
* add prefect links to cohort 2023

* capitalize readme and tidy up notes

* add link to prefect in the main orchestration page

* clean up week 2 readme
2023-12-30 22:22:38 +01:00
9847430ca7 Update README.md 2023-12-23 20:14:37 +01:00
960fed9828 Update README.md 2023-12-21 10:43:40 +01:00
3f5cefcdd7 Add files via upload 2023-12-21 10:42:49 +01:00
57c7ce33f8 Adding Module 1 HW (#396)
* Adding

* Changing folder

---------

Co-authored-by: Luis Oliveira <luiolive3@publicisgroupe.net>
2023-12-20 19:10:37 +01:00
172 changed files with 17254 additions and 877 deletions

@@ -0,0 +1,210 @@
# Introduction
* [![](https://markdown-videos-api.jorgenkh.no/youtube/AtRhA-NfS24)](https://www.youtube.com/watch?v=AtRhA-NfS24&list=PL3MmuxUbc_hKihpnNQ9qtTmWYy26bPrSb&index=3)
* [Slides](https://www.slideshare.net/AlexeyGrigorev/data-engineering-zoomcamp-introduction)
* Overview of [Architecture](https://github.com/DataTalksClub/data-engineering-zoomcamp#overview), [Technologies](https://github.com/DataTalksClub/data-engineering-zoomcamp#technologies) & [Pre-Requisites](https://github.com/DataTalksClub/data-engineering-zoomcamp#prerequisites)
We suggest watching the videos in the order they appear in this document.
The last video (setting up the environment) is optional, but you can watch it earlier
if you have trouble setting up the environment or following along with the videos.
# Docker + Postgres
[Code](2_docker_sql)
## :movie_camera: Introduction to Docker
[![](https://markdown-videos-api.jorgenkh.no/youtube/EYNwNlOrpr0)](https://youtu.be/EYNwNlOrpr0&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb&index=4)
* Why do we need Docker
* Creating a simple "data pipeline" in Docker
## :movie_camera: Ingesting NY Taxi Data to Postgres
[![](https://markdown-videos-api.jorgenkh.no/youtube/2JM-ziJt0WI)](https://youtu.be/2JM-ziJt0WI&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb&index=5)
* Running Postgres locally with Docker
* Using `pgcli` for connecting to the database
* Exploring the NY Taxi dataset
* Ingesting the data into the database
> [!TIP]
> If you have problems with `pgcli`, check this video for an alternative way to connect to your database using Jupyter Notebook and pandas.
>
> [![](https://markdown-videos-api.jorgenkh.no/youtube/3IkfkTwqHx4)](https://youtu.be/3IkfkTwqHx4&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb&index=6)
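
The ingestion step listed above can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses the Python standard library and an in-memory SQLite database so that it runs anywhere, whereas the video works with pandas and a Postgres container. The table name `yellow_taxi_data`, the three sample columns, and the `ingest_in_chunks` helper are hypothetical.

```python
import csv
import io
import sqlite3
from itertools import islice

# Tiny stand-in for the taxi CSV; the real file has many more columns.
SAMPLE_CSV = """tpep_pickup_datetime,passenger_count,trip_distance
2021-01-01 00:30:10,1,2.10
2021-01-01 00:51:20,1,0.20
2021-01-01 00:43:30,2,14.70
2021-01-01 00:15:48,0,10.60
2021-01-01 00:31:49,1,4.94
"""


def ingest_in_chunks(csv_file, conn, table="yellow_taxi_data", chunk_size=2):
    """Read the CSV in fixed-size chunks and insert each chunk in one transaction."""
    reader = csv.DictReader(csv_file)
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {table} ("
        "tpep_pickup_datetime TEXT, passenger_count INTEGER, trip_distance REAL)"
    )
    total = 0
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            break
        rows = [
            (r["tpep_pickup_datetime"], int(r["passenger_count"]), float(r["trip_distance"]))
            for r in chunk
        ]
        with conn:  # commits on success, rolls back on error
            conn.executemany(f"INSERT INTO {table} VALUES (?, ?, ?)", rows)
        total += len(rows)
    return total


# SQLite is swapped in for Postgres here purely to keep the sketch self-contained.
conn = sqlite3.connect(":memory:")
inserted = ingest_in_chunks(io.StringIO(SAMPLE_CSV), conn)
row_count = conn.execute("SELECT COUNT(*) FROM yellow_taxi_data").fetchone()[0]
print(inserted, row_count)
```

Reading in chunks keeps memory use bounded, which matters because the full taxi files are far larger than this sample.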
## :movie_camera: Connecting pgAdmin and Postgres
[![](https://markdown-videos-api.jorgenkh.no/youtube/hCAIVe9N0ow)](https://youtu.be/hCAIVe9N0ow&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb&index=7)
* The pgAdmin tool
* Docker networks
> [!IMPORTANT]
>The UI for pgAdmin 4 has changed. Please follow the steps below to create a server:
>
>* After logging in to pgAdmin, right-click Servers in the left sidebar.
>* Click Register.
>* Click Server.
>* The remaining steps to create a server are the same as in the videos.
## :movie_camera: Putting the ingestion script into Docker
[![](https://markdown-videos-api.jorgenkh.no/youtube/B1WwATwf-vY)](https://youtu.be/B1WwATwf-vY&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb&index=8)
* Converting the Jupyter notebook to a Python script
* Parametrizing the script with argparse
* Dockerizing the ingestion script
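
As a rough sketch of what parametrizing a script with `argparse` can look like: the flag names, defaults, and the `connection_url` helper below are assumptions chosen for illustration and may differ from the script built in the video.

```python
import argparse


def build_parser():
    """Build the command-line interface for a hypothetical ingestion script."""
    parser = argparse.ArgumentParser(description="Ingest CSV data into Postgres")
    parser.add_argument("--user", required=True, help="database user name")
    parser.add_argument("--password", required=True, help="database password")
    parser.add_argument("--host", default="localhost", help="database host")
    parser.add_argument("--port", type=int, default=5432, help="database port")
    parser.add_argument("--db", required=True, help="database name")
    parser.add_argument("--table-name", required=True, help="destination table")
    parser.add_argument("--url", required=True, help="URL of the CSV file")
    return parser


def connection_url(args):
    """Assemble a SQLAlchemy-style Postgres URL from the parsed arguments."""
    return f"postgresql://{args.user}:{args.password}@{args.host}:{args.port}/{args.db}"


# Parse an explicit argument list here; a real script would call parse_args()
# with no arguments so that sys.argv is used instead.
args = build_parser().parse_args(
    ["--user", "root", "--password", "root", "--db", "ny_taxi",
     "--table-name", "yellow_taxi_trips", "--url", "https://example.com/data.csv"]
)
print(connection_url(args))
print(args.table_name)
```

Once the script takes all of its settings from the command line, the same arguments can be passed to a container at run time, which is what makes the script straightforward to dockerize.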
## :movie_camera: Running Postgres and pgAdmin with Docker-Compose
[![](https://markdown-videos-api.jorgenkh.no/youtube/hKI6PkPhpa0)](https://youtu.be/hKI6PkPhpa0&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb&index=9)
* Why do we need Docker-compose
* Docker-compose YAML file
* Running multiple containers with `docker-compose up`
## :movie_camera: SQL refresher
[![](https://markdown-videos-api.jorgenkh.no/youtube/QEcps_iskgg)](https://youtu.be/QEcps_iskgg&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb&index=10)
* Adding the Zones table
* Inner joins
* Basic data quality checks
* Left, Right and Outer joins
* Group by
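
The join and aggregation topics above can be tried out with a small, self-contained sketch. It uses Python's built-in `sqlite3` module instead of Postgres, and the `zones` and `trips` tables with their columns are simplified, made-up stand-ins for the course tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE zones (location_id INTEGER PRIMARY KEY, zone TEXT);
    CREATE TABLE trips (trip_id INTEGER PRIMARY KEY, pu_location_id INTEGER, total_amount REAL);
    INSERT INTO zones VALUES (1, 'Newark Airport'), (2, 'Jamaica Bay');
    INSERT INTO trips VALUES (10, 1, 25.0), (11, 1, 30.0), (12, 2, 12.5), (13, 99, 8.0);
    """
)

# Inner join: only trips whose pickup location exists in zones.
inner_rows = conn.execute(
    "SELECT t.trip_id, z.zone FROM trips t "
    "JOIN zones z ON t.pu_location_id = z.location_id ORDER BY t.trip_id"
).fetchall()

# Left join: every trip is kept; zone is NULL (None) when there is no match.
left_rows = conn.execute(
    "SELECT t.trip_id, z.zone FROM trips t "
    "LEFT JOIN zones z ON t.pu_location_id = z.location_id ORDER BY t.trip_id"
).fetchall()

# Basic data quality check: trips that reference an unknown location.
orphans = [trip_id for trip_id, zone in left_rows if zone is None]

# Group by: number of trips and revenue per zone.
per_zone = conn.execute(
    "SELECT z.zone, COUNT(*), SUM(t.total_amount) FROM trips t "
    "JOIN zones z ON t.pu_location_id = z.location_id "
    "GROUP BY z.zone ORDER BY z.zone"
).fetchall()

print(inner_rows)
print(orphans)
print(per_zone)
```

Trip 13 points at a location that is missing from `zones`, so it disappears from the inner join but survives the left join with an empty zone. Comparing the two results is a simple way to spot that kind of data quality problem.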
## :movie_camera: Optional: Docker Networking and Port Mapping
> [!TIP]
> Optional: If you have some problems with docker networking, check **Port Mapping and Networks in Docker video**.
[![](https://markdown-videos-api.jorgenkh.no/youtube/tOr4hTsHOzU)](https://youtu.be/tOr4hTsHOzU&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb&index=5)
* Docker networks
* Port forwarding to the host environment
* Communicating between containers in the network
* `.dockerignore` file
## :movie_camera: Optional: Walk-Through on WSL
> [!TIP]
> Optional: If you want to do the steps from "Ingesting NY Taxi Data to Postgres" through "Running Postgres and pgAdmin with Docker-Compose" with Windows Subsystem for Linux, please check **Docker Module Walk-Through on WSL**.
[![](https://markdown-videos-api.jorgenkh.no/youtube/Mv4zFm2AwzQ)](https://youtu.be/Mv4zFm2AwzQ&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb&index=33)
# GCP
## :movie_camera: Introduction to GCP (Google Cloud Platform)
[![](https://markdown-videos-api.jorgenkh.no/youtube/18jIzE41fJ4)](https://youtu.be/18jIzE41fJ4&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb&index=3)
# Terraform
[Code](1_terraform_gcp)
## :movie_camera: Introduction to Terraform: Concepts and Overview, a primer
[![](https://markdown-videos-api.jorgenkh.no/youtube/s2bOYDCKl_M)](https://youtu.be/s2bOYDCKl_M&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb&index=11)
* [Companion Notes](1_terraform_gcp)
## :movie_camera: Terraform Basics: Simple one file Terraform Deployment
[![](https://markdown-videos-api.jorgenkh.no/youtube/Y2ux7gq3Z0o)](https://youtu.be/Y2ux7gq3Z0o&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb&index=12)
* [Companion Notes](1_terraform_gcp)
## :movie_camera: Deployment with a Variables File
[![](https://markdown-videos-api.jorgenkh.no/youtube/PBi0hHjLftk)](https://youtu.be/PBi0hHjLftk&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb&index=13)
* [Companion Notes](1_terraform_gcp)
## Configuring Terraform and GCP SDK on Windows
* [Instructions](1_terraform_gcp/windows.md)
# Environment setup
For the course you'll need:
* Python 3 (e.g. installed with Anaconda)
* Google Cloud SDK
* Docker with docker-compose
* Terraform
* Git account
> [!NOTE]
>If you have problems setting up the environment, you can check these videos.
>
>If you already have a working coding environment on your local machine, these videos are optional, and you only need to pick one of the methods. Still, if you have time to learn them now, they will be helpful if your local environment ever stops working.
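
Before picking a setup method, it can help to see which of the required tools are already installed. The snippet below is a small sketch using only the standard library. The executable names in `REQUIRED_TOOLS` are the usual command names for these tools, but treat the list as an assumption: for example, newer Docker installations provide Compose as `docker compose` rather than a separate `docker-compose` binary.

```python
import shutil
import sys

# Usual command names for the course tools (an assumption; adjust for your system).
REQUIRED_TOOLS = ["gcloud", "docker", "docker-compose", "terraform", "git"]


def check_tools(tools=REQUIRED_TOOLS):
    """Return a mapping of tool name to its resolved path, or None if not on PATH."""
    return {tool: shutil.which(tool) for tool in tools}


status = check_tools()
print(f"python: {sys.version_info.major}.{sys.version_info.minor}")
for tool, path in status.items():
    print(f"{tool}: {path or 'NOT FOUND'}")
```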
## :movie_camera: GCP Cloud VM
### Setting up the environment on cloud VM
[![](https://markdown-videos-api.jorgenkh.no/youtube/ae-CV2KfoN0)](https://youtu.be/ae-CV2KfoN0&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb&index=14)
* Generating SSH keys
* Creating a virtual machine on GCP
* Connecting to the VM with SSH
* Installing Anaconda
* Installing Docker
* Creating SSH `config` file
* Accessing the remote machine with VS Code and SSH remote
* Installing docker-compose
* Installing pgcli
* Port-forwarding with VS Code: connecting to pgAdmin and Jupyter from the local computer
* Installing Terraform
* Using `sftp` for putting the credentials to the remote machine
* Shutting down and removing the instance
## :movie_camera: GitHub Codespaces
### Preparing the environment with GitHub Codespaces
[![](https://markdown-videos-api.jorgenkh.no/youtube/XOSUt8Ih3zA)](https://youtu.be/XOSUt8Ih3zA&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb&index=15)
# Homework
* [Homework](../cohorts/2024/01-docker-terraform/homework.md)
# Community notes
Did you take notes? You can share them here
* [Notes from Alvaro Navas](https://github.com/ziritrion/dataeng-zoomcamp/blob/main/notes/1_intro.md)
* [Notes from Abd](https://itnadigital.notion.site/Week-1-Introduction-f18de7e69eb4453594175d0b1334b2f4)
* [Notes from Aaron](https://github.com/ABZ-Aaron/DataEngineerZoomCamp/blob/master/week_1_basics_n_setup/README.md)
* [Notes from Faisal](https://github.com/FaisalMohd/data-engineering-zoomcamp/blob/main/week_1_basics_n_setup/Notes/DE%20Zoomcamp%20Week-1.pdf)
* [Michael Harty's Notes](https://github.com/mharty3/data_engineering_zoomcamp_2022/tree/main/week01)
* [Blog post from Isaac Kargar](https://kargarisaac.github.io/blog/data%20engineering/jupyter/2022/01/18/data-engineering-w1.html)
* [Handwritten Notes By Mahmoud Zaher](https://github.com/zaherweb/DataEngineering/blob/master/week%201.pdf)
* [Notes from Candace Williams](https://teacherc.github.io/data-engineering/2023/01/18/zoomcamp1.html)
* [Notes from Marcos Torregrosa](https://www.n4gash.com/2023/data-engineering-zoomcamp-semana-1/)
* [Notes from Vincenzo Galante](https://binchentso.notion.site/Data-Talks-Club-Data-Engineering-Zoomcamp-8699af8e7ff94ec49e6f9bdec8eb69fd)
* [Notes from Victor Padilha](https://github.com/padilha/de-zoomcamp/tree/master/week1)
* [Notes from froukje](https://github.com/froukje/de-zoomcamp/blob/main/week_1_basics_n_setup/notes/notes_week_01.md)
* [Notes from adamiaonr](https://github.com/adamiaonr/data-engineering-zoomcamp/blob/main/week_1_basics_n_setup/2_docker_sql/NOTES.md)
* [Notes from Xia He-Bleinagel](https://xiahe-bleinagel.com/2023/01/week-1-data-engineering-zoomcamp-notes/)
* [Notes from Balaji](https://github.com/Balajirvp/DE-Zoomcamp/blob/main/Week%201/Detailed%20Week%201%20Notes.ipynb)
* [Notes from Erik](https://twitter.com/ehub96/status/1621351266281730049)
* [Notes by Alain Boisvert](https://github.com/boisalai/de-zoomcamp-2023/blob/main/week1.md)
* Notes on [Docker, Docker Compose, and setting up a proper Python environment](https://medium.com/@verazabeida/zoomcamp-2023-week-1-f4f94cb360ae), by Vera
* [Setting up the development environment on Google Virtual Machine](https://itsadityagupta.hashnode.dev/setting-up-the-development-environment-on-google-virtual-machine), blog post by Aditya Gupta
* [Notes from Zharko Cekovski](https://www.zharconsulting.com/contents/data/data-engineering-bootcamp-2024/week-1-postgres-docker-and-ingestion-scripts/)
* [2024 Module-01 Walkthrough video by ellacharmed on YouTube](https://youtu.be/VUZshlVAnk4)
* [2024 Companion Module Walkthrough slides by ellacharmed](https://github.com/ellacharmed/data-engineering-zoomcamp/blob/ella2024/cohorts/2024/01-docker-terraform/walkthrough-01.pdf)
* [2024 Module-01 Environment setup video by ellacharmed on YouTube](https://youtu.be/Zce_Hd37NGs)
* [Docker Notes from Linda](https://github.com/inner-outer-space/de-zoomcamp-2024/blob/main/1-basics-n-setup/docker_sql/readme.md) • [Terraform Notes from Linda](https://github.com/inner-outer-space/de-zoomcamp-2024/blob/main/1-basics-n-setup/terraform_gcp/readme.md)
* Add your notes above this line

@@ -0,0 +1,154 @@
> If you're looking for Airflow videos from the 2022 edition,
> check the [2022 cohort folder](../cohorts/2022/week_2_data_ingestion/). <br>
> If you're looking for Prefect videos from the 2023 edition,
> check the [2023 cohort folder](../cohorts/2023/week_2_data_ingestion/).
# Week 2: Workflow Orchestration
Welcome to Week 2 of the Data Engineering Zoomcamp! 🚀😤 This week, we'll be covering workflow orchestration with Mage.
Mage is an open-source, hybrid framework for transforming and integrating data. ✨
This week, you'll learn how to use the Mage platform to author and share _magical_ data pipelines. This will all be covered in the course, but if you'd like to learn a bit more about Mage, check out our docs [here](https://docs.mage.ai/introduction/overview).
* [2.2.1 - 📯 Intro to Orchestration](#221----intro-to-orchestration)
* [2.2.2 - 🧙‍♂️ Intro to Mage](#222---%EF%B8%8F-intro-to-mage)
* [2.2.3 - 🐘 ETL: API to Postgres](#223----etl-api-to-postgres)
* [2.2.4 - 🤓 ETL: API to GCS](#224----etl-api-to-gcs)
* [2.2.5 - 🔍 ETL: GCS to BigQuery](#225----etl-gcs-to-bigquery)
* [2.2.6 - 👨‍💻 Parameterized Execution](#226----parameterized-execution)
* [2.2.7 - 🤖 Deployment (Optional)](#227----deployment-optional)
* [2.2.8 - 🗒️ Homework](#228---%EF%B8%8F-homework)
* [2.2.9 - 👣 Next Steps](#229----next-steps)
## 📕 Course Resources
### 2.2.1 - 📯 Intro to Orchestration
In this section, we'll cover the basics of workflow orchestration. We'll discuss what it is, why it's important, and how it can be used to build data pipelines.
Videos
- 2.2.1a - [What is Orchestration?](https://www.youtube.com/watch?v=Li8-MWHhTbo&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb)
Resources
- [Slides](https://docs.google.com/presentation/d/17zSxG5Z-tidmgY-9l7Al1cPmz4Slh4VPK6o2sryFYvw/)
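
To make the idea of orchestration a bit more concrete, here is a minimal sketch of its core job: running tasks in an order that respects their dependencies. It uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9+). The task names and the `run_task` helper are made up for illustration, and this is not how Mage works internally.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks that must finish before it starts.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"load"},
}

results = {}


def run_task(name):
    """Pretend to run a task; a real orchestrator would also retry, log, and schedule."""
    results[name] = f"{name} done"
    return results[name]


# static_order() yields tasks so that every dependency comes first.
execution_order = list(TopologicalSorter(dependencies).static_order())
for task in execution_order:
    run_task(task)

print(execution_order)
```

A full orchestrator adds scheduling, retries, monitoring, and more on top of this ordering step.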
### 2.2.2 - 🧙‍♂️ Intro to Mage
In this section, we'll introduce the Mage platform. We'll cover what makes Mage different from other orchestrators, the fundamental concepts behind Mage, and how to get started. To cap it off, we'll spin Mage up via Docker 🐳 and run a simple pipeline.
Videos
- 2.2.2a - [What is Mage?](https://www.youtube.com/watch?v=AicKRcK3pa4&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb)
- 2.2.2b - [Configuring Mage](https://www.youtube.com/watch?v=tNiV7Wp08XE&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb)
- 2.2.2c - [A Simple Pipeline](https://www.youtube.com/watch?v=stI-gg4QBnI&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb)
Resources
- [Getting Started Repo](https://github.com/mage-ai/mage-zoomcamp)
- [Slides](https://docs.google.com/presentation/d/1y_5p3sxr6Xh1RqE6N8o2280gUzAdiic2hPhYUUD6l88/)
### 2.2.3 - 🐘 ETL: API to Postgres
Hooray! Mage is up and running. Now, let's build a _real_ pipeline. In this section, we'll build a simple ETL pipeline that loads data from an API into a Postgres database. Our database will be built using Docker. It will run locally, but it behaves the same as if it were running in the cloud.
Videos
- 2.2.3a - [Configuring Postgres](https://www.youtube.com/watch?v=pmhI-ezd3BE&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb)
- 2.2.3b - [Writing an ETL Pipeline](https://www.youtube.com/watch?v=Maidfe7oKLs&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb)
Resources
- [Taxi Dataset](https://github.com/DataTalksClub/nyc-tlc-data/releases/download/yellow/yellow_tripdata_2021-01.csv.gz)
- [Sample loading block](https://github.com/mage-ai/mage-zoomcamp/blob/solutions/magic-zoomcamp/data_loaders/load_nyc_taxi_data.py)
### 2.2.4 - 🤓 ETL: API to GCS
Ok, so we've written data _locally_ to a database, but what about the cloud? In this tutorial, we'll walk through the process of using Mage to extract, transform, and load data from an API to Google Cloud Storage (GCS).
We'll cover both writing _partitioned_ and _unpartitioned_ data to GCS and discuss _why_ you might want to do one over the other. Many data teams start with extracting data from a source and writing it to a data lake _before_ loading it to a structured data source, like a database.
Videos
- 2.2.4a - [Configuring GCP](https://www.youtube.com/watch?v=00LP360iYvE&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb)
- 2.2.4b - [Writing an ETL Pipeline](https://www.youtube.com/watch?v=w0XmcASRUnc&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb)
Resources
- [DTC Zoomcamp GCP Setup](../week_1_basics_n_setup/1_terraform_gcp/2_gcp_overview.md)
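
The difference between partitioned and unpartitioned output can be sketched locally. The example below writes one file per date under `key=value` directories in a temporary folder instead of a GCS bucket. The `write_partitioned` helper, the `pickup_date` column, and the `part-0.csv` file name are assumptions for illustration, not the code used in the video.

```python
import csv
import tempfile
from collections import defaultdict
from pathlib import Path

trips = [
    {"pickup_date": "2021-01-01", "trip_distance": "2.1"},
    {"pickup_date": "2021-01-01", "trip_distance": "0.2"},
    {"pickup_date": "2021-01-02", "trip_distance": "14.7"},
]


def write_partitioned(rows, root, key="pickup_date"):
    """Write one CSV per distinct key value under root/key=value/part-0.csv."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    written = []
    for value, group in sorted(groups.items()):
        part_dir = Path(root) / f"{key}={value}"
        part_dir.mkdir(parents=True, exist_ok=True)
        path = part_dir / "part-0.csv"
        with path.open("w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(group[0].keys()))
            writer.writeheader()
            writer.writerows(group)
        written.append(path.relative_to(root).as_posix())
    return written


# A temporary directory stands in for the bucket so the sketch leaves nothing behind.
with tempfile.TemporaryDirectory() as root:
    partition_files = write_partitioned(trips, root)

print(partition_files)
```

With this layout, a query that needs a single day only has to read one directory, while an unpartitioned file must be scanned in full.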
### 2.2.5 - 🔍 ETL: GCS to BigQuery
Now that we've written data to GCS, let's load it into BigQuery. In this section, we'll walk through the process of using Mage to load our data from GCS to BigQuery. This closely mirrors a very common data engineering workflow: loading data from a data lake into a data warehouse.
Videos
- 2.2.5a - [Writing an ETL Pipeline](https://www.youtube.com/watch?v=JKp_uzM-XsM)
### 2.2.6 - 👨‍💻 Parameterized Execution
By now you're familiar with building pipelines, but what about adding parameters? In this video, we'll discuss some built-in runtime variables that exist in Mage and show you how to define your own! We'll also cover how to use these variables to parameterize your pipelines. Finally, we'll talk about what it means to *backfill* a pipeline and how to do it in Mage.
Videos
- 2.2.6a - [Parameterized Execution](https://www.youtube.com/watch?v=H0hWjWxB-rg&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb)
- 2.2.6b - [Backfills](https://www.youtube.com/watch?v=ZoeC6Ag5gQc&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb)
Resources
- [Mage Variables Overview](https://docs.mage.ai/development/variables/overview)
- [Mage Runtime Variables](https://docs.mage.ai/getting-started/runtime-variable)
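
Conceptually, a backfill reruns a parameterized pipeline once for every past interval. The sketch below shows that idea in plain Python. The `run_pipeline` and `backfill` functions and the output path layout are hypothetical and do not use Mage's API.

```python
from datetime import date, timedelta


def run_pipeline(execution_date):
    """Stand-in for a parameterized pipeline run; returns the output it would write."""
    return f"trips/{execution_date:%Y/%m/%d}/daily.parquet"


def backfill(start, end):
    """Run the pipeline once for each day from start to end, inclusive."""
    outputs = []
    current = start
    while current <= end:
        outputs.append(run_pipeline(current))
        current += timedelta(days=1)
    return outputs


backfilled = backfill(date(2024, 1, 30), date(2024, 2, 2))
print(backfilled)
```

Because the execution date is the only input, every run is reproducible, so missing or incorrect days can be regenerated without touching the others.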
### 2.2.7 - 🤖 Deployment (Optional)
In this section, we'll cover deploying Mage using Terraform and Google Cloud. This section is optional: it's not *necessary* for learning Mage, but it might be helpful if you're interested in creating a fully deployed project. If you're using Mage in your final project, you'll need to deploy it to the cloud.
Videos
- 2.2.7a - [Deployment Prerequisites](https://www.youtube.com/watch?v=zAwAX5sxqsg&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb)
- 2.2.7b - [Google Cloud Permissions](https://www.youtube.com/watch?v=O_H7DCmq2rA&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb)
- 2.2.7c - [Deploying to Google Cloud - Part 1](https://www.youtube.com/watch?v=9A872B5hb_0&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb)
- 2.2.7d - [Deploying to Google Cloud - Part 2](https://www.youtube.com/watch?v=0YExsb2HgLI&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb)
Resources
- [Installing Terraform](https://developer.hashicorp.com/terraform/tutorials/aws-get-started/install-cli)
- [Installing `gcloud` CLI](https://cloud.google.com/sdk/docs/install)
- [Mage Terraform Templates](https://github.com/mage-ai/mage-ai-terraform-templates)
Additional Mage Guides
- [Terraform](https://docs.mage.ai/production/deploying-to-cloud/using-terraform)
- [Deploying to GCP with Terraform](https://docs.mage.ai/production/deploying-to-cloud/gcp/setup)
### 2.2.8 - 🗒️ Homework
We've prepared a short exercise to test you on what you've learned this week. You can find the homework [here](../cohorts/2024/02-workflow-orchestration/homework.md). This follows closely from the contents of the course and shouldn't take more than an hour or two to complete. 😄
### 2.2.9 - 👣 Next Steps
Congratulations! You've completed Week 2 of the Data Engineering Zoomcamp. We hope you've enjoyed learning about Mage and that you're excited to use it in your final project. If you have any questions, feel free to reach out to us on Slack. Be sure to check out our "Next Steps" video for some inspiration for the rest of your journey 😄.
Videos
- 2.2.9a - [Next Steps](https://www.youtube.com/watch?v=uUtj7N0TleQ)
Resources
- [Slides](https://docs.google.com/presentation/d/1yN-e22VNwezmPfKrZkgXQVrX5owDb285I2HxHWgmAEQ/edit#slide=id.g262fb0d2905_0_12)
### 📑 Additional Resources
- [Mage Docs](https://docs.mage.ai/)
- [Mage Guides](https://docs.mage.ai/guides)
- [Mage Slack](https://www.mage.ai/chat)
# Community notes
Did you take notes? You can share them here:
## 2024 notes
* [2024 Videos transcripts week 2](https://drive.google.com/drive/folders/1yxT0uMMYKa6YOxanh91wGqmQUMS7yYW7?usp=sharing) by Maria Fisher
* [Notes from Jonah Oliver](https://www.jonahboliver.com/blog/de-zc-w2)
* [Notes from Linda](https://github.com/inner-outer-space/de-zoomcamp-2024/blob/main/2-workflow-orchestration/readme.md)
* Add your notes above this line
## 2023 notes
See [here](../cohorts/2023/week_2_workflow_orchestration#community-notes)
## 2022 notes
See [here](../cohorts/2022/week_2_data_ingestion#community-notes)

@@ -1,51 +1,54 @@
## Data Warehouse and BigQuery
# Data Warehouse and BigQuery
- [Slides](https://docs.google.com/presentation/d/1a3ZoBAXFk8-EhUsd7rAZd-5p_HpltkzSeujjRGB2TAI/edit?usp=sharing)
- [Big Query basic SQL](big_query.sql)
# Videos
### Data Warehouse
## Data Warehouse
- [Data Warehouse and BigQuery](https://youtu.be/jrHljAoD6nM)
- [Data Warehouse and BigQuery](https://www.youtube.com/watch?v=jrHljAoD6nM&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb)
### Partitioning and clustering
## :movie_camera: Partitioning and clustering
- [Partitioning and Clustering](https://youtu.be/jrHljAoD6nM?t=726)
- [Partitioning vs Clustering](https://youtu.be/-CqXf7vhhDs)
- [Partitioning and Clustering](https://www.youtube.com/watch?v=jrHljAoD6nM&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb)
- [Partitioning vs Clustering](https://www.youtube.com/watch?v=-CqXf7vhhDs&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb)
### Best practices
## :movie_camera: Best practices
- [BigQuery Best Practices](https://youtu.be/k81mLJVX08w)
- [BigQuery Best Practices](https://www.youtube.com/watch?v=k81mLJVX08w&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb)
### Internals of BigQuery
## :movie_camera: Internals of BigQuery
- [Internals of Big Query](https://youtu.be/eduHi1inM4s)
- [Internals of Big Query](https://www.youtube.com/watch?v=eduHi1inM4s&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb)
### Advanced
## Advanced topics
#### ML
[BigQuery Machine Learning](https://youtu.be/B-WtpB0PuG4)
[SQL for ML in BigQuery](big_query_ml.sql)
### :movie_camera: Machine Learning in Big Query
* [BigQuery Machine Learning](https://www.youtube.com/watch?v=B-WtpB0PuG4&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb)
* [SQL for ML in BigQuery](big_query_ml.sql)
**Important links**
- [BigQuery ML Tutorials](https://cloud.google.com/bigquery-ml/docs/tutorials)
- [BigQuery ML Reference Parameter](https://cloud.google.com/bigquery-ml/docs/analytics-reference-patterns)
- [Hyper Parameter tuning](https://cloud.google.com/bigquery-ml/docs/reference/standard-sql/bigqueryml-syntax-create-glm)
- [Feature preprocessing](https://cloud.google.com/bigquery-ml/docs/reference/standard-sql/bigqueryml-syntax-preprocess-overview)
##### Deploying ML model
### :movie_camera: Deploying ML model
- [BigQuery Machine Learning Deployment](https://youtu.be/BjARzEWaznU)
- [BigQuery Machine Learning Deployment](https://www.youtube.com/watch?v=BjARzEWaznU&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb)
- [Steps to extract and deploy model with docker](extract_model.md)
### Homework
# Homework
* [Homework](../cohorts/2023/week_3_data_warehouse/homework.md)
* [2024 Homework](../cohorts/2024/03-data-warehouse/homework.md)
## Community notes
# Community notes
Did you take notes? You can share them here.
@ -58,4 +61,6 @@ Did you take notes? You can share them here.
* [Notes by froukje](https://github.com/froukje/de-zoomcamp/blob/main/week_3_data_warehouse/notes/notes_week_03.md)
* [Notes by Alain Boisvert](https://github.com/boisalai/de-zoomcamp-2023/blob/main/week3.md)
* [Notes from Vincenzo Galante](https://binchentso.notion.site/Data-Talks-Club-Data-Engineering-Zoomcamp-8699af8e7ff94ec49e6f9bdec8eb69fd)
* [2024 videos transcript week3](https://drive.google.com/drive/folders/1quIiwWO-tJCruqvtlqe_Olw8nvYSmmDJ?usp=sharing) by Maria Fisher
* [Notes by Linda](https://github.com/inner-outer-space/de-zoomcamp-2024/blob/main/3-data-warehouse/readme.md)
* Add your notes here (above this line)

@ -0,0 +1,141 @@
# Week 4: Analytics Engineering
Goal: Transform the data loaded into the DWH into analytical views by developing a [dbt project](taxi_rides_ny/README.md).
### Prerequisites
By this stage of the course you should already have:
- A running warehouse (BigQuery or Postgres)
- A set of running pipelines ingesting the project dataset (week 3 completed)
- The following datasets ingested from the course [Datasets list](https://github.com/DataTalksClub/nyc-tlc-data/):
* Yellow taxi data - Years 2019 and 2020
* Green taxi data - Years 2019 and 2020
* fhv data - Year 2019.
Note:
* A quick hack has been shared to load that data faster; check the instructions in [week3/extras](../03-data-warehouse/extras)
* If you receive an error stating "Permission denied while globbing file pattern." when attempting to run fact_trips.sql, this [video](https://www.youtube.com/watch?v=kL3ZVNL9Y4A&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb) may help resolve the issue
## Setting up your environment
### Setting up dbt for using BigQuery (Alternative A - preferred)
1. Open a free developer dbt cloud account following [this link](https://www.getdbt.com/signup/)
2. [Follow these instructions to connect to your BigQuery instance](https://docs.getdbt.com/guides/bigquery?step=4). More detailed instructions are in [dbt_cloud_setup.md](dbt_cloud_setup.md)
_Optional_: If you feel more comfortable developing locally, you can use a local installation of dbt core. Follow the [official dbt documentation](https://docs.getdbt.com/docs/core/installation-overview) or the [dbt core with BigQuery on Docker](docker_setup/README.md) guide to set up dbt locally on Docker. You will need to install the latest version with the BigQuery adapter (dbt-bigquery).
### Setting up dbt for using Postgres locally (Alternative B)
As an alternative to the cloud setup, which requires a cloud database, you can run the project by installing dbt locally.
You can follow the [official dbt documentation](https://docs.getdbt.com/dbt-cli/installation) or use a Docker image from the official [dbt repo](https://github.com/dbt-labs/dbt/). You will need to install the latest version with the Postgres adapter (dbt-postgres).
After the local installation you will have to set up the connection to Postgres in `profiles.yml`. You can find the templates [here](https://docs.getdbt.com/docs/core/connect-data-platform/postgres-setup)
</details>
## Content
### Introduction to analytics engineering
* What is analytics engineering?
* ETL vs ELT
* Data modeling concepts (fact and dim tables)
:movie_camera: [Video](https://www.youtube.com/watch?v=uF76d5EmdtU&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb&index=32)
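To make the fact/dimension idea concrete, here is a minimal sketch using Python's built-in `sqlite3` module rather than the course warehouse. The table names mirror the `fact_trips` and `dim_zones` models of this project, but the columns are trimmed and the trip rows are made up for illustration.

```python
import sqlite3

# Hypothetical miniature of the course data: one dimension table and one fact table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table dim_zones (locationid integer primary key, borough text, zone text);
    create table fact_trips (tripid text, pickup_locationid integer, total_amount real);

    insert into dim_zones values
        (4, 'Manhattan', 'Alphabet City'),
        (7, 'Queens', 'Astoria');
    insert into fact_trips values
        ('a', 4, 10.0),
        ('b', 4, 5.5),
        ('c', 7, 20.0);
""")

# Facts hold the measurements; dimensions hold the attributes used to group them.
rows = conn.execute("""
    select z.borough, count(*) as trips, sum(f.total_amount) as revenue
    from fact_trips as f
    inner join dim_zones as z on f.pickup_locationid = z.locationid
    group by z.borough
    order by z.borough
""").fetchall()

for borough, trips, revenue in rows:
    print(borough, trips, revenue)
```

The join key (`pickup_locationid` to `locationid`) is the only link between the two tables, which is what lets the dimension be reused for dropoff locations as well.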
### What is dbt?
* Intro to dbt
:movie_camera: [Video](https://www.youtube.com/watch?v=4eCouvVOJUw&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb&index=33)
## Starting a dbt project
### Alternative A: Using BigQuery + dbt cloud
* Starting a new project with dbt init (dbt cloud and core)
* dbt cloud setup
* project.yml
:movie_camera: [Video](https://www.youtube.com/watch?v=iMxh6s_wL4Q&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb&index=34)
### Alternative B: Using Postgres + dbt core (locally)
* Starting a new project with dbt init (dbt cloud and core)
* dbt core local setup
* profiles.yml
* project.yml
:movie_camera: [Video](https://www.youtube.com/watch?v=1HmL63e-vRs&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb&index=35)
### dbt models
* Anatomy of a dbt model: written code vs compiled Sources
* Materialisations: table, view, incremental, ephemeral
* Seeds, sources and ref
* Jinja and Macros
* Packages
* Variables
:movie_camera: [Video](https://www.youtube.com/watch?v=UVI30Vxzd6c&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb&index=36)
_Note: This video is shown entirely on dbt cloud IDE but the same steps can be followed locally on the IDE of your choice_
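The difference between the `view` and `table` materialisations can be sketched outside dbt. The example below is an illustration only: it uses Python's built-in `sqlite3` module, and `stg_trips`, `trips_as_view` and `trips_as_table` are hypothetical names. In dbt the objects are rebuilt on every run; here they are created once to show what each one stores.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Upstream data, standing in for a staging model.
conn.execute("create table stg_trips (tripid text, total_amount real)")
conn.execute("insert into stg_trips values ('a', 10.0), ('b', 5.5)")

# Roughly what a view materialisation leaves behind: a stored query.
conn.execute("create view trips_as_view as select * from stg_trips")

# Roughly what a table materialisation leaves behind: a physical copy.
conn.execute("create table trips_as_table as select * from stg_trips")

# New data arrives upstream after both objects were built.
conn.execute("insert into stg_trips values ('c', 20.0)")

view_count = conn.execute("select count(*) from trips_as_view").fetchone()[0]
table_count = conn.execute("select count(*) from trips_as_table").fetchone()[0]

# The view sees the new row; the table keeps its snapshot until it is rebuilt.
print(view_count, table_count)
```

This is why a table model has to be re-run to pick up new upstream data, while a view trades that freshness for query-time cost.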
### Testing and documenting dbt models
* Tests
* Documentation
:movie_camera: [Video](https://www.youtube.com/watch?v=UishFmq1hLM&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb&index=37)
_Note: This video is shown entirely on dbt cloud IDE but the same steps can be followed locally on the IDE of your choice_
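A dbt data test is, in essence, a query that selects failing rows: the test passes when the query returns nothing. The sketch below imitates a `not_null` check with Python's built-in `sqlite3` module. The `failing_rows` helper and the sample rows are assumptions made for illustration, not part of dbt.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table dm_monthly_zone_revenue (
        revenue_zone text,
        revenue_monthly_total_amount real
    );
    insert into dm_monthly_zone_revenue values ('Astoria', 120.5), ('Bayside', null);
""")


def failing_rows(table, column):
    """Return rows where `column` is null; an empty list means the check passes."""
    # Identifiers are interpolated directly, so only pass trusted names.
    query = f"select * from {table} where {column} is null"
    return conn.execute(query).fetchall()


failures = failing_rows("dm_monthly_zone_revenue", "revenue_monthly_total_amount")
print("PASS" if not failures else f"FAIL: {len(failures)} row(s)")
```

With the sample data above the check fails, because one row has no total amount.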
## Deployment
### Alternative A: Using BigQuery + dbt cloud
* Deployment: development environment vs production
* dbt cloud: scheduler, sources and hosted documentation
:movie_camera: [Video](https://www.youtube.com/watch?v=rjf6yZNGX8I&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb&index=38)
### Alternative B: Using Postgres + dbt core (locally)
* Deployment: development environment vs production
* dbt cloud: scheduler, sources and hosted documentation
:movie_camera: [Video](https://www.youtube.com/watch?v=Cs9Od1pcrzM&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb&index=39)
## Visualising the transformed data
:movie_camera: [Google data studio Video](https://www.youtube.com/watch?v=39nLTs74A3E&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb&index=42)
:movie_camera: [Metabase Video](https://www.youtube.com/watch?v=BnLkrA7a6gM&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb&index=43)
## Advanced concepts
* [Make a model Incremental](https://docs.getdbt.com/docs/building-a-dbt-project/building-models/configuring-incremental-models)
* [Use of tags](https://docs.getdbt.com/reference/resource-configs/tags)
* [Hooks](https://docs.getdbt.com/docs/building-a-dbt-project/hooks-operations)
* [Analysis](https://docs.getdbt.com/docs/building-a-dbt-project/analyses)
* [Snapshots](https://docs.getdbt.com/docs/building-a-dbt-project/snapshots)
* [Exposure](https://docs.getdbt.com/docs/building-a-dbt-project/exposures)
* [Metrics](https://docs.getdbt.com/docs/building-a-dbt-project/metrics)
## Community notes
Did you take notes? You can share them here.
* [Notes by Alvaro Navas](https://github.com/ziritrion/dataeng-zoomcamp/blob/main/notes/4_analytics.md)
* [Sandy's DE learning blog](https://learningdataengineering540969211.wordpress.com/2022/02/17/week-4-setting-up-dbt-cloud-with-bigquery/)
* [Notes by Victor Padilha](https://github.com/padilha/de-zoomcamp/tree/master/week4)
* [Marcos Torregrosa's blog (Spanish)](https://www.n4gash.com/2023/data-engineering-zoomcamp-semana-4/)
* [Notes by froukje](https://github.com/froukje/de-zoomcamp/blob/main/week_4_analytics_engineering/notes/notes_week_04.md)
* [Notes by Alain Boisvert](https://github.com/boisalai/de-zoomcamp-2023/blob/main/week4.md)
* [Setting up Prefect with dbt by Vera](https://medium.com/@verazabeida/zoomcamp-week-5-5b6a9d53a3a0)
* [Blog by Xia He-Bleinagel](https://xiahe-bleinagel.com/2023/02/week-4-data-engineering-zoomcamp-notes-analytics-engineering-and-dbt/)
* [Setting up DBT with BigQuery by Tofag](https://medium.com/@fagbuyit/setting-up-your-dbt-cloud-dej-9-d18e5b7c96ba)
* [Blog post by Dewi Oktaviani](https://medium.com/@oktavianidewi/de-zoomcamp-2023-learning-week-4-analytics-engineering-with-dbt-53f781803d3e)
* [Notes from Vincenzo Galante](https://binchentso.notion.site/Data-Talks-Club-Data-Engineering-Zoomcamp-8699af8e7ff94ec49e6f9bdec8eb69fd)
* [Notes from Balaji](https://github.com/Balajirvp/DE-Zoomcamp/blob/main/Week%204/Data%20Engineering%20Zoomcamp%20Week%204.ipynb)
*Add your notes here (above this line)*
## Useful links
- [Slides used in the videos](https://docs.google.com/presentation/d/1xSll_jv0T8JF4rYZvLHfkJXYqUjPtThA/edit?usp=sharing&ouid=114544032874539580154&rtpof=true&sd=true)
- [Visualizing data with Metabase course](https://www.metabase.com/learn/visualization/)
- [dbt free courses](https://courses.getdbt.com/collections)

@ -0,0 +1,5 @@
# you shouldn't commit these into source control
# these are the default directory names, adjust/add to fit your needs
target/
dbt_packages/
logs/

@ -35,4 +35,4 @@ _Alternative: use `$ dbt build` to execute with one command the 3 steps above to
- Check out [Discourse](https://discourse.getdbt.com/) for commonly asked questions and answers
- Join the [chat](http://slack.getdbt.com/) on Slack for live discussions and support
- Find [dbt events](https://events.getdbt.com) near you
- Check out [the blog](https://blog.getdbt.com/) for the latest news on dbt's development and best practices
- Check out [the blog](https://blog.getdbt.com/) for the latest news on dbt's development and best practices

@ -7,13 +7,13 @@ version: '1.0.0'
config-version: 2
# This setting configures which "profile" dbt uses for this project.
profile: 'pg-dbt-workshop'
profile: 'default'
# These configurations specify where dbt should look for different types of files.
# The `source-paths` config, for example, states that models in this project can be
# The `model-paths` config, for example, states that models in this project can be
# found in the "models/" directory. You probably won't need to change these!
model-paths: ["models"]
analysis-paths: ["analysis"]
analysis-paths: ["analyses"]
test-paths: ["tests"]
seed-paths: ["seeds"]
macro-paths: ["macros"]
@ -21,17 +21,20 @@ snapshot-paths: ["snapshots"]
target-path: "target" # directory which will store compiled SQL files
clean-targets: # directories to be removed by `dbt clean`
- "target"
- "dbt_packages"
- "dbt_modules"
- "target"
- "dbt_packages"
# Configuring models
# Full documentation: https://docs.getdbt.com/docs/configuring-models
# In this example config, we tell dbt to build all models in the example/ directory
# as tables. These settings can be overridden in the individual model files
# In dbt, the default materialization for a model is a view. This means, when you run
# dbt run or dbt build, all of your models will be built as a view in your data platform.
# The configuration below will override this setting for models in the example folder to
# instead be materialized as tables. Any models you add to the root of the models folder will
# continue to be built as views. These settings can be overridden in the individual model files
# using the `{{ config(...) }}` macro.
models:
taxi_rides_ny:
# Applies to all files under models/.../
@ -46,4 +49,4 @@ seeds:
taxi_rides_ny:
taxi_zone_lookup:
+column_types:
locationid: numeric
locationid: numeric

@ -1,18 +1,17 @@
{#
{#
This macro returns the description of the payment_type
#}
{% macro get_payment_type_description(payment_type) -%}
case {{ payment_type }}
case {{ dbt.safe_cast("payment_type", api.Column.translate_type("integer")) }}
when 1 then 'Credit card'
when 2 then 'Cash'
when 3 then 'No charge'
when 4 then 'Dispute'
when 5 then 'Unknown'
when 6 then 'Voided trip'
else 'EMPTY'
end
{%- endmacro %}
{%- endmacro %}

@ -1,9 +1,8 @@
{{ config(materialized='table') }}
select
locationid,
borough,
zone,
replace(service_zone,'Boro','Green') as service_zone
replace(service_zone,'Boro','Green') as service_zone
from {{ ref('taxi_zone_lookup') }}

@ -6,8 +6,7 @@ with trips_data as (
select
-- Revenue grouping
pickup_zone as revenue_zone,
date_trunc('month', pickup_datetime) as revenue_month,
--Note: For BQ use instead: date_trunc(pickup_datetime, month) as revenue_month,
{{ dbt.date_trunc("month", "pickup_datetime") }} as revenue_month,
service_type,
@ -20,7 +19,6 @@ with trips_data as (
sum(ehail_fee) as revenue_monthly_ehail_fee,
sum(improvement_surcharge) as revenue_monthly_improvement_surcharge,
sum(total_amount) as revenue_monthly_total_amount,
sum(congestion_surcharge) as revenue_monthly_congestion_surcharge,
-- Additional calculations
count(tripid) as total_monthly_trips,
@ -28,4 +26,4 @@ with trips_data as (
avg(trip_distance) as avg_montly_trip_distance
from trips_data
group by 1,2,3
group by 1,2,3

@ -1,29 +1,29 @@
{{ config(materialized='table') }}
{{
config(
materialized='table'
)
}}
with green_data as (
with green_tripdata as (
select *,
'Green' as service_type
'Green' as service_type
from {{ ref('stg_green_tripdata') }}
),
yellow_data as (
yellow_tripdata as (
select *,
'Yellow' as service_type
from {{ ref('stg_yellow_tripdata') }}
),
trips_unioned as (
select * from green_data
union all
select * from yellow_data
select * from green_tripdata
union all
select * from yellow_tripdata
),
dim_zones as (
select * from {{ ref('dim_zones') }}
where borough != 'Unknown'
)
select
trips_unioned.tripid,
select trips_unioned.tripid,
trips_unioned.vendorid,
trips_unioned.service_type,
trips_unioned.ratecodeid,
@ -48,10 +48,9 @@ select
trips_unioned.improvement_surcharge,
trips_unioned.total_amount,
trips_unioned.payment_type,
trips_unioned.payment_type_description,
trips_unioned.congestion_surcharge
trips_unioned.payment_type_description
from trips_unioned
inner join dim_zones as pickup_zone
on trips_unioned.pickup_locationid = pickup_zone.locationid
inner join dim_zones as dropoff_zone
on trips_unioned.dropoff_locationid = dropoff_zone.locationid
on trips_unioned.dropoff_locationid = dropoff_zone.locationid

@ -0,0 +1,129 @@
version: 2
models:
- name: dim_zones
description: >
List of unique zones identified by locationid.
Includes the service zone they correspond to (Green or yellow).
- name: dm_monthly_zone_revenue
description: >
Aggregated table of all taxi trips corresponding to both service zones (Green and yellow) per pickup zone, month and service.
The table contains monthly sums of the fare elements used to calculate the monthly revenue.
The table also contains monthly indicators like number of trips and average trip distance.
columns:
- name: revenue_monthly_total_amount
description: Monthly sum of the total_amount of the fare charged for the trip per pickup zone, month and service.
tests:
- not_null:
severity: error
- name: fact_trips
description: >
Taxi trips corresponding to both service zones (Green and yellow).
The table contains records where both pickup and dropoff locations are valid and known zones.
Each record corresponds to a trip uniquely identified by tripid.
columns:
- name: tripid
data_type: string
description: "Unique identifier formed from the combination of vendorid and pickup time"
- name: vendorid
data_type: int64
description: ""
- name: service_type
data_type: string
description: ""
- name: ratecodeid
data_type: int64
description: ""
- name: pickup_locationid
data_type: int64
description: ""
- name: pickup_borough
data_type: string
description: ""
- name: pickup_zone
data_type: string
description: ""
- name: dropoff_locationid
data_type: int64
description: ""
- name: dropoff_borough
data_type: string
description: ""
- name: dropoff_zone
data_type: string
description: ""
- name: pickup_datetime
data_type: timestamp
description: ""
- name: dropoff_datetime
data_type: timestamp
description: ""
- name: store_and_fwd_flag
data_type: string
description: ""
- name: passenger_count
data_type: int64
description: ""
- name: trip_distance
data_type: numeric
description: ""
- name: trip_type
data_type: int64
description: ""
- name: fare_amount
data_type: numeric
description: ""
- name: extra
data_type: numeric
description: ""
- name: mta_tax
data_type: numeric
description: ""
- name: tip_amount
data_type: numeric
description: ""
- name: tolls_amount
data_type: numeric
description: ""
- name: ehail_fee
data_type: numeric
description: ""
- name: improvement_surcharge
data_type: numeric
description: ""
- name: total_amount
data_type: numeric
description: ""
- name: payment_type
data_type: int64
description: ""
- name: payment_type_description
data_type: string
description: ""

@ -1,20 +1,16 @@
version: 2
sources:
- name: staging
#For bigquery:
#database: taxi-rides-ny-339813
# For postgres:
database: production
schema: trips_data_all
- name: staging
database: taxi-rides-ny-339813-412521
# For postgres:
#database: production
schema: trips_data_all
# loaded_at_field: record_loaded_at
tables:
- name: green_tripdata
- name: yellow_tripdata
tables:
- name: green_tripdata
- name: yellow_tripdata
# freshness:
# error_after: {count: 6, period: hour}
@ -75,7 +71,7 @@ models:
memory before sending to the vendor, aka “store and forward,”
because the vehicle did not have a connection to the server.
Y= store and forward trip
N= not a store and forward trip
N = not a store and forward trip
- name: Dropoff_longitude
description: Longitude where the meter was disengaged.
- name: Dropoff_latitude
@ -200,4 +196,4 @@ models:
- name: Tolls_amount
description: Total amount of all tolls paid in trip.
- name: Total_amount
description: The total amount charged to passengers. Does not include cash tips.
description: The total amount charged to passengers. Does not include cash tips.

@ -0,0 +1,52 @@
{{
config(
materialized='view'
)
}}
with tripdata as
(
select *,
row_number() over(partition by vendorid, lpep_pickup_datetime) as rn
from {{ source('staging','green_tripdata') }}
where vendorid is not null
)
select
-- identifiers
{{ dbt_utils.generate_surrogate_key(['vendorid', 'lpep_pickup_datetime']) }} as tripid,
{{ dbt.safe_cast("vendorid", api.Column.translate_type("integer")) }} as vendorid,
{{ dbt.safe_cast("ratecodeid", api.Column.translate_type("integer")) }} as ratecodeid,
{{ dbt.safe_cast("pulocationid", api.Column.translate_type("integer")) }} as pickup_locationid,
{{ dbt.safe_cast("dolocationid", api.Column.translate_type("integer")) }} as dropoff_locationid,
-- timestamps
cast(lpep_pickup_datetime as timestamp) as pickup_datetime,
cast(lpep_dropoff_datetime as timestamp) as dropoff_datetime,
-- trip info
store_and_fwd_flag,
{{ dbt.safe_cast("passenger_count", api.Column.translate_type("integer")) }} as passenger_count,
cast(trip_distance as numeric) as trip_distance,
{{ dbt.safe_cast("trip_type", api.Column.translate_type("integer")) }} as trip_type,
-- payment info
cast(fare_amount as numeric) as fare_amount,
cast(extra as numeric) as extra,
cast(mta_tax as numeric) as mta_tax,
cast(tip_amount as numeric) as tip_amount,
cast(tolls_amount as numeric) as tolls_amount,
cast(ehail_fee as numeric) as ehail_fee,
cast(improvement_surcharge as numeric) as improvement_surcharge,
cast(total_amount as numeric) as total_amount,
coalesce({{ dbt.safe_cast("payment_type", api.Column.translate_type("integer")) }},0) as payment_type,
{{ get_payment_type_description("payment_type") }} as payment_type_description
from tripdata
where rn = 1
-- dbt build --select <model_name> --vars '{"is_test_run": false}'
{% if var('is_test_run', default=true) %}
limit 100
{% endif %}

@ -9,19 +9,19 @@ with tripdata as
)
select
-- identifiers
{{ dbt_utils.surrogate_key(['vendorid', 'tpep_pickup_datetime']) }} as tripid,
cast(vendorid as integer) as vendorid,
cast(ratecodeid as integer) as ratecodeid,
cast(pulocationid as integer) as pickup_locationid,
cast(dolocationid as integer) as dropoff_locationid,
{{ dbt_utils.generate_surrogate_key(['vendorid', 'tpep_pickup_datetime']) }} as tripid,
{{ dbt.safe_cast("vendorid", api.Column.translate_type("integer")) }} as vendorid,
{{ dbt.safe_cast("ratecodeid", api.Column.translate_type("integer")) }} as ratecodeid,
{{ dbt.safe_cast("pulocationid", api.Column.translate_type("integer")) }} as pickup_locationid,
{{ dbt.safe_cast("dolocationid", api.Column.translate_type("integer")) }} as dropoff_locationid,
-- timestamps
cast(tpep_pickup_datetime as timestamp) as pickup_datetime,
cast(tpep_dropoff_datetime as timestamp) as dropoff_datetime,
-- trip info
store_and_fwd_flag,
cast(passenger_count as integer) as passenger_count,
{{ dbt.safe_cast("passenger_count", api.Column.translate_type("integer")) }} as passenger_count,
cast(trip_distance as numeric) as trip_distance,
-- yellow cabs are always street-hail
1 as trip_type,
@ -35,16 +35,14 @@ select
cast(0 as numeric) as ehail_fee,
cast(improvement_surcharge as numeric) as improvement_surcharge,
cast(total_amount as numeric) as total_amount,
cast(payment_type as integer) as payment_type,
{{ get_payment_type_description('payment_type') }} as payment_type_description,
cast(congestion_surcharge as numeric) as congestion_surcharge
coalesce({{ dbt.safe_cast("payment_type", api.Column.translate_type("integer")) }},0) as payment_type,
{{ get_payment_type_description('payment_type') }} as payment_type_description
from tripdata
where rn = 1
-- dbt build --m <model.sql> --var 'is_test_run: false'
-- dbt build --select <model.sql> --vars '{"is_test_run": false}'
{% if var('is_test_run', default=true) %}
limit 100
{% endif %}
{% endif %}

@ -0,0 +1,6 @@
packages:
  - package: dbt-labs/dbt_utils
    version: 1.1.1
  - package: dbt-labs/codegen
    version: 0.12.1
sha1_hash: d974113b0f072cce35300077208f38581075ab40

@ -0,0 +1,5 @@
packages:
  - package: dbt-labs/dbt_utils
    version: 1.1.1
  - package: dbt-labs/codegen
    version: 0.12.1

@ -6,5 +6,4 @@ seeds:
Taxi Zones roughly based on NYC Department of City Planning's Neighborhood
Tabulation Areas (NTAs) and are meant to approximate neighborhoods, so you can see which
neighborhood a passenger was picked up in, and which neighborhood they were dropped off in.
Includes associated service_zone (EWR, Boro Zone, Yellow Zone)
Includes associated service_zone (EWR, Boro Zone, Yellow Zone)

@ -1,266 +1,266 @@
"locationid","borough","zone","service_zone"
1,"EWR","Newark Airport","EWR"
2,"Queens","Jamaica Bay","Boro Zone"
3,"Bronx","Allerton/Pelham Gardens","Boro Zone"
4,"Manhattan","Alphabet City","Yellow Zone"
5,"Staten Island","Arden Heights","Boro Zone"
6,"Staten Island","Arrochar/Fort Wadsworth","Boro Zone"
7,"Queens","Astoria","Boro Zone"
8,"Queens","Astoria Park","Boro Zone"
9,"Queens","Auburndale","Boro Zone"
10,"Queens","Baisley Park","Boro Zone"
11,"Brooklyn","Bath Beach","Boro Zone"
12,"Manhattan","Battery Park","Yellow Zone"
13,"Manhattan","Battery Park City","Yellow Zone"
14,"Brooklyn","Bay Ridge","Boro Zone"
15,"Queens","Bay Terrace/Fort Totten","Boro Zone"
16,"Queens","Bayside","Boro Zone"
17,"Brooklyn","Bedford","Boro Zone"
18,"Bronx","Bedford Park","Boro Zone"
19,"Queens","Bellerose","Boro Zone"
20,"Bronx","Belmont","Boro Zone"
21,"Brooklyn","Bensonhurst East","Boro Zone"
22,"Brooklyn","Bensonhurst West","Boro Zone"
23,"Staten Island","Bloomfield/Emerson Hill","Boro Zone"
24,"Manhattan","Bloomingdale","Yellow Zone"
25,"Brooklyn","Boerum Hill","Boro Zone"
26,"Brooklyn","Borough Park","Boro Zone"
27,"Queens","Breezy Point/Fort Tilden/Riis Beach","Boro Zone"
28,"Queens","Briarwood/Jamaica Hills","Boro Zone"
29,"Brooklyn","Brighton Beach","Boro Zone"
30,"Queens","Broad Channel","Boro Zone"
31,"Bronx","Bronx Park","Boro Zone"
32,"Bronx","Bronxdale","Boro Zone"
33,"Brooklyn","Brooklyn Heights","Boro Zone"
34,"Brooklyn","Brooklyn Navy Yard","Boro Zone"
35,"Brooklyn","Brownsville","Boro Zone"
36,"Brooklyn","Bushwick North","Boro Zone"
37,"Brooklyn","Bushwick South","Boro Zone"
38,"Queens","Cambria Heights","Boro Zone"
39,"Brooklyn","Canarsie","Boro Zone"
40,"Brooklyn","Carroll Gardens","Boro Zone"
41,"Manhattan","Central Harlem","Boro Zone"
42,"Manhattan","Central Harlem North","Boro Zone"
43,"Manhattan","Central Park","Yellow Zone"
44,"Staten Island","Charleston/Tottenville","Boro Zone"
45,"Manhattan","Chinatown","Yellow Zone"
46,"Bronx","City Island","Boro Zone"
47,"Bronx","Claremont/Bathgate","Boro Zone"
48,"Manhattan","Clinton East","Yellow Zone"
49,"Brooklyn","Clinton Hill","Boro Zone"
50,"Manhattan","Clinton West","Yellow Zone"
51,"Bronx","Co-Op City","Boro Zone"
52,"Brooklyn","Cobble Hill","Boro Zone"
53,"Queens","College Point","Boro Zone"
54,"Brooklyn","Columbia Street","Boro Zone"
55,"Brooklyn","Coney Island","Boro Zone"
56,"Queens","Corona","Boro Zone"
57,"Queens","Corona","Boro Zone"
58,"Bronx","Country Club","Boro Zone"
59,"Bronx","Crotona Park","Boro Zone"
60,"Bronx","Crotona Park East","Boro Zone"
61,"Brooklyn","Crown Heights North","Boro Zone"
62,"Brooklyn","Crown Heights South","Boro Zone"
63,"Brooklyn","Cypress Hills","Boro Zone"
64,"Queens","Douglaston","Boro Zone"
65,"Brooklyn","Downtown Brooklyn/MetroTech","Boro Zone"
66,"Brooklyn","DUMBO/Vinegar Hill","Boro Zone"
67,"Brooklyn","Dyker Heights","Boro Zone"
68,"Manhattan","East Chelsea","Yellow Zone"
69,"Bronx","East Concourse/Concourse Village","Boro Zone"
70,"Queens","East Elmhurst","Boro Zone"
71,"Brooklyn","East Flatbush/Farragut","Boro Zone"
72,"Brooklyn","East Flatbush/Remsen Village","Boro Zone"
73,"Queens","East Flushing","Boro Zone"
74,"Manhattan","East Harlem North","Boro Zone"
75,"Manhattan","East Harlem South","Boro Zone"
76,"Brooklyn","East New York","Boro Zone"
77,"Brooklyn","East New York/Pennsylvania Avenue","Boro Zone"
78,"Bronx","East Tremont","Boro Zone"
79,"Manhattan","East Village","Yellow Zone"
80,"Brooklyn","East Williamsburg","Boro Zone"
81,"Bronx","Eastchester","Boro Zone"
82,"Queens","Elmhurst","Boro Zone"
83,"Queens","Elmhurst/Maspeth","Boro Zone"
84,"Staten Island","Eltingville/Annadale/Prince's Bay","Boro Zone"
85,"Brooklyn","Erasmus","Boro Zone"
86,"Queens","Far Rockaway","Boro Zone"
87,"Manhattan","Financial District North","Yellow Zone"
88,"Manhattan","Financial District South","Yellow Zone"
89,"Brooklyn","Flatbush/Ditmas Park","Boro Zone"
90,"Manhattan","Flatiron","Yellow Zone"
91,"Brooklyn","Flatlands","Boro Zone"
92,"Queens","Flushing","Boro Zone"
93,"Queens","Flushing Meadows-Corona Park","Boro Zone"
94,"Bronx","Fordham South","Boro Zone"
95,"Queens","Forest Hills","Boro Zone"
96,"Queens","Forest Park/Highland Park","Boro Zone"
97,"Brooklyn","Fort Greene","Boro Zone"
98,"Queens","Fresh Meadows","Boro Zone"
99,"Staten Island","Freshkills Park","Boro Zone"
100,"Manhattan","Garment District","Yellow Zone"
101,"Queens","Glen Oaks","Boro Zone"
102,"Queens","Glendale","Boro Zone"
103,"Manhattan","Governor's Island/Ellis Island/Liberty Island","Yellow Zone"
104,"Manhattan","Governor's Island/Ellis Island/Liberty Island","Yellow Zone"
105,"Manhattan","Governor's Island/Ellis Island/Liberty Island","Yellow Zone"
106,"Brooklyn","Gowanus","Boro Zone"
107,"Manhattan","Gramercy","Yellow Zone"
108,"Brooklyn","Gravesend","Boro Zone"
109,"Staten Island","Great Kills","Boro Zone"
110,"Staten Island","Great Kills Park","Boro Zone"
111,"Brooklyn","Green-Wood Cemetery","Boro Zone"
112,"Brooklyn","Greenpoint","Boro Zone"
113,"Manhattan","Greenwich Village North","Yellow Zone"
114,"Manhattan","Greenwich Village South","Yellow Zone"
115,"Staten Island","Grymes Hill/Clifton","Boro Zone"
116,"Manhattan","Hamilton Heights","Boro Zone"
117,"Queens","Hammels/Arverne","Boro Zone"
118,"Staten Island","Heartland Village/Todt Hill","Boro Zone"
119,"Bronx","Highbridge","Boro Zone"
120,"Manhattan","Highbridge Park","Boro Zone"
121,"Queens","Hillcrest/Pomonok","Boro Zone"
122,"Queens","Hollis","Boro Zone"
123,"Brooklyn","Homecrest","Boro Zone"
124,"Queens","Howard Beach","Boro Zone"
125,"Manhattan","Hudson Sq","Yellow Zone"
126,"Bronx","Hunts Point","Boro Zone"
127,"Manhattan","Inwood","Boro Zone"
128,"Manhattan","Inwood Hill Park","Boro Zone"
129,"Queens","Jackson Heights","Boro Zone"
130,"Queens","Jamaica","Boro Zone"
131,"Queens","Jamaica Estates","Boro Zone"
132,"Queens","JFK Airport","Airports"
133,"Brooklyn","Kensington","Boro Zone"
134,"Queens","Kew Gardens","Boro Zone"
135,"Queens","Kew Gardens Hills","Boro Zone"
136,"Bronx","Kingsbridge Heights","Boro Zone"
137,"Manhattan","Kips Bay","Yellow Zone"
138,"Queens","LaGuardia Airport","Airports"
139,"Queens","Laurelton","Boro Zone"
140,"Manhattan","Lenox Hill East","Yellow Zone"
141,"Manhattan","Lenox Hill West","Yellow Zone"
142,"Manhattan","Lincoln Square East","Yellow Zone"
143,"Manhattan","Lincoln Square West","Yellow Zone"
144,"Manhattan","Little Italy/NoLiTa","Yellow Zone"
145,"Queens","Long Island City/Hunters Point","Boro Zone"
146,"Queens","Long Island City/Queens Plaza","Boro Zone"
147,"Bronx","Longwood","Boro Zone"
148,"Manhattan","Lower East Side","Yellow Zone"
149,"Brooklyn","Madison","Boro Zone"
150,"Brooklyn","Manhattan Beach","Boro Zone"
151,"Manhattan","Manhattan Valley","Yellow Zone"
152,"Manhattan","Manhattanville","Boro Zone"
153,"Manhattan","Marble Hill","Boro Zone"
154,"Brooklyn","Marine Park/Floyd Bennett Field","Boro Zone"
155,"Brooklyn","Marine Park/Mill Basin","Boro Zone"
156,"Staten Island","Mariners Harbor","Boro Zone"
157,"Queens","Maspeth","Boro Zone"
158,"Manhattan","Meatpacking/West Village West","Yellow Zone"
159,"Bronx","Melrose South","Boro Zone"
160,"Queens","Middle Village","Boro Zone"
161,"Manhattan","Midtown Center","Yellow Zone"
162,"Manhattan","Midtown East","Yellow Zone"
163,"Manhattan","Midtown North","Yellow Zone"
164,"Manhattan","Midtown South","Yellow Zone"
165,"Brooklyn","Midwood","Boro Zone"
166,"Manhattan","Morningside Heights","Boro Zone"
167,"Bronx","Morrisania/Melrose","Boro Zone"
168,"Bronx","Mott Haven/Port Morris","Boro Zone"
169,"Bronx","Mount Hope","Boro Zone"
170,"Manhattan","Murray Hill","Yellow Zone"
171,"Queens","Murray Hill-Queens","Boro Zone"
172,"Staten Island","New Dorp/Midland Beach","Boro Zone"
173,"Queens","North Corona","Boro Zone"
174,"Bronx","Norwood","Boro Zone"
175,"Queens","Oakland Gardens","Boro Zone"
176,"Staten Island","Oakwood","Boro Zone"
177,"Brooklyn","Ocean Hill","Boro Zone"
178,"Brooklyn","Ocean Parkway South","Boro Zone"
179,"Queens","Old Astoria","Boro Zone"
180,"Queens","Ozone Park","Boro Zone"
181,"Brooklyn","Park Slope","Boro Zone"
182,"Bronx","Parkchester","Boro Zone"
183,"Bronx","Pelham Bay","Boro Zone"
184,"Bronx","Pelham Bay Park","Boro Zone"
185,"Bronx","Pelham Parkway","Boro Zone"
186,"Manhattan","Penn Station/Madison Sq West","Yellow Zone"
187,"Staten Island","Port Richmond","Boro Zone"
188,"Brooklyn","Prospect-Lefferts Gardens","Boro Zone"
189,"Brooklyn","Prospect Heights","Boro Zone"
190,"Brooklyn","Prospect Park","Boro Zone"
191,"Queens","Queens Village","Boro Zone"
192,"Queens","Queensboro Hill","Boro Zone"
193,"Queens","Queensbridge/Ravenswood","Boro Zone"
194,"Manhattan","Randalls Island","Yellow Zone"
195,"Brooklyn","Red Hook","Boro Zone"
196,"Queens","Rego Park","Boro Zone"
197,"Queens","Richmond Hill","Boro Zone"
198,"Queens","Ridgewood","Boro Zone"
199,"Bronx","Rikers Island","Boro Zone"
200,"Bronx","Riverdale/North Riverdale/Fieldston","Boro Zone"
201,"Queens","Rockaway Park","Boro Zone"
202,"Manhattan","Roosevelt Island","Boro Zone"
203,"Queens","Rosedale","Boro Zone"
204,"Staten Island","Rossville/Woodrow","Boro Zone"
205,"Queens","Saint Albans","Boro Zone"
206,"Staten Island","Saint George/New Brighton","Boro Zone"
207,"Queens","Saint Michaels Cemetery/Woodside","Boro Zone"
208,"Bronx","Schuylerville/Edgewater Park","Boro Zone"
209,"Manhattan","Seaport","Yellow Zone"
210,"Brooklyn","Sheepshead Bay","Boro Zone"
211,"Manhattan","SoHo","Yellow Zone"
212,"Bronx","Soundview/Bruckner","Boro Zone"
213,"Bronx","Soundview/Castle Hill","Boro Zone"
214,"Staten Island","South Beach/Dongan Hills","Boro Zone"
215,"Queens","South Jamaica","Boro Zone"
216,"Queens","South Ozone Park","Boro Zone"
217,"Brooklyn","South Williamsburg","Boro Zone"
218,"Queens","Springfield Gardens North","Boro Zone"
219,"Queens","Springfield Gardens South","Boro Zone"
220,"Bronx","Spuyten Duyvil/Kingsbridge","Boro Zone"
221,"Staten Island","Stapleton","Boro Zone"
222,"Brooklyn","Starrett City","Boro Zone"
223,"Queens","Steinway","Boro Zone"
224,"Manhattan","Stuy Town/Peter Cooper Village","Yellow Zone"
225,"Brooklyn","Stuyvesant Heights","Boro Zone"
226,"Queens","Sunnyside","Boro Zone"
227,"Brooklyn","Sunset Park East","Boro Zone"
228,"Brooklyn","Sunset Park West","Boro Zone"
229,"Manhattan","Sutton Place/Turtle Bay North","Yellow Zone"
230,"Manhattan","Times Sq/Theatre District","Yellow Zone"
231,"Manhattan","TriBeCa/Civic Center","Yellow Zone"
232,"Manhattan","Two Bridges/Seward Park","Yellow Zone"
233,"Manhattan","UN/Turtle Bay South","Yellow Zone"
234,"Manhattan","Union Sq","Yellow Zone"
235,"Bronx","University Heights/Morris Heights","Boro Zone"
236,"Manhattan","Upper East Side North","Yellow Zone"
237,"Manhattan","Upper East Side South","Yellow Zone"
238,"Manhattan","Upper West Side North","Yellow Zone"
239,"Manhattan","Upper West Side South","Yellow Zone"
240,"Bronx","Van Cortlandt Park","Boro Zone"
241,"Bronx","Van Cortlandt Village","Boro Zone"
242,"Bronx","Van Nest/Morris Park","Boro Zone"
243,"Manhattan","Washington Heights North","Boro Zone"
244,"Manhattan","Washington Heights South","Boro Zone"
245,"Staten Island","West Brighton","Boro Zone"
246,"Manhattan","West Chelsea/Hudson Yards","Yellow Zone"
247,"Bronx","West Concourse","Boro Zone"
248,"Bronx","West Farms/Bronx River","Boro Zone"
249,"Manhattan","West Village","Yellow Zone"
250,"Bronx","Westchester Village/Unionport","Boro Zone"
251,"Staten Island","Westerleigh","Boro Zone"
252,"Queens","Whitestone","Boro Zone"
253,"Queens","Willets Point","Boro Zone"
254,"Bronx","Williamsbridge/Olinville","Boro Zone"
255,"Brooklyn","Williamsburg (North Side)","Boro Zone"
256,"Brooklyn","Williamsburg (South Side)","Boro Zone"
257,"Brooklyn","Windsor Terrace","Boro Zone"
258,"Queens","Woodhaven","Boro Zone"
259,"Bronx","Woodlawn/Wakefield","Boro Zone"
260,"Queens","Woodside","Boro Zone"
261,"Manhattan","World Trade Center","Yellow Zone"
262,"Manhattan","Yorkville East","Yellow Zone"
263,"Manhattan","Yorkville West","Yellow Zone"
264,"Unknown","NV","N/A"
265,"Unknown","NA","N/A"
"locationid","borough","zone","service_zone"
1,"EWR","Newark Airport","EWR"
2,"Queens","Jamaica Bay","Boro Zone"
3,"Bronx","Allerton/Pelham Gardens","Boro Zone"
4,"Manhattan","Alphabet City","Yellow Zone"
5,"Staten Island","Arden Heights","Boro Zone"
6,"Staten Island","Arrochar/Fort Wadsworth","Boro Zone"
7,"Queens","Astoria","Boro Zone"
8,"Queens","Astoria Park","Boro Zone"
9,"Queens","Auburndale","Boro Zone"
10,"Queens","Baisley Park","Boro Zone"
11,"Brooklyn","Bath Beach","Boro Zone"
12,"Manhattan","Battery Park","Yellow Zone"
13,"Manhattan","Battery Park City","Yellow Zone"
14,"Brooklyn","Bay Ridge","Boro Zone"
15,"Queens","Bay Terrace/Fort Totten","Boro Zone"
16,"Queens","Bayside","Boro Zone"
17,"Brooklyn","Bedford","Boro Zone"
18,"Bronx","Bedford Park","Boro Zone"
19,"Queens","Bellerose","Boro Zone"
20,"Bronx","Belmont","Boro Zone"
21,"Brooklyn","Bensonhurst East","Boro Zone"
22,"Brooklyn","Bensonhurst West","Boro Zone"
23,"Staten Island","Bloomfield/Emerson Hill","Boro Zone"
24,"Manhattan","Bloomingdale","Yellow Zone"
25,"Brooklyn","Boerum Hill","Boro Zone"
26,"Brooklyn","Borough Park","Boro Zone"
27,"Queens","Breezy Point/Fort Tilden/Riis Beach","Boro Zone"
28,"Queens","Briarwood/Jamaica Hills","Boro Zone"
29,"Brooklyn","Brighton Beach","Boro Zone"
30,"Queens","Broad Channel","Boro Zone"
31,"Bronx","Bronx Park","Boro Zone"
32,"Bronx","Bronxdale","Boro Zone"
33,"Brooklyn","Brooklyn Heights","Boro Zone"
34,"Brooklyn","Brooklyn Navy Yard","Boro Zone"
35,"Brooklyn","Brownsville","Boro Zone"
36,"Brooklyn","Bushwick North","Boro Zone"
37,"Brooklyn","Bushwick South","Boro Zone"
38,"Queens","Cambria Heights","Boro Zone"
39,"Brooklyn","Canarsie","Boro Zone"
40,"Brooklyn","Carroll Gardens","Boro Zone"
41,"Manhattan","Central Harlem","Boro Zone"
42,"Manhattan","Central Harlem North","Boro Zone"
43,"Manhattan","Central Park","Yellow Zone"
44,"Staten Island","Charleston/Tottenville","Boro Zone"
45,"Manhattan","Chinatown","Yellow Zone"
46,"Bronx","City Island","Boro Zone"
47,"Bronx","Claremont/Bathgate","Boro Zone"
48,"Manhattan","Clinton East","Yellow Zone"
49,"Brooklyn","Clinton Hill","Boro Zone"
50,"Manhattan","Clinton West","Yellow Zone"
51,"Bronx","Co-Op City","Boro Zone"
52,"Brooklyn","Cobble Hill","Boro Zone"
53,"Queens","College Point","Boro Zone"
54,"Brooklyn","Columbia Street","Boro Zone"
55,"Brooklyn","Coney Island","Boro Zone"
56,"Queens","Corona","Boro Zone"
57,"Queens","Corona","Boro Zone"
58,"Bronx","Country Club","Boro Zone"
59,"Bronx","Crotona Park","Boro Zone"
60,"Bronx","Crotona Park East","Boro Zone"
61,"Brooklyn","Crown Heights North","Boro Zone"
62,"Brooklyn","Crown Heights South","Boro Zone"
63,"Brooklyn","Cypress Hills","Boro Zone"
64,"Queens","Douglaston","Boro Zone"
65,"Brooklyn","Downtown Brooklyn/MetroTech","Boro Zone"
66,"Brooklyn","DUMBO/Vinegar Hill","Boro Zone"
67,"Brooklyn","Dyker Heights","Boro Zone"
68,"Manhattan","East Chelsea","Yellow Zone"
69,"Bronx","East Concourse/Concourse Village","Boro Zone"
70,"Queens","East Elmhurst","Boro Zone"
71,"Brooklyn","East Flatbush/Farragut","Boro Zone"
72,"Brooklyn","East Flatbush/Remsen Village","Boro Zone"
73,"Queens","East Flushing","Boro Zone"
74,"Manhattan","East Harlem North","Boro Zone"
75,"Manhattan","East Harlem South","Boro Zone"
76,"Brooklyn","East New York","Boro Zone"
77,"Brooklyn","East New York/Pennsylvania Avenue","Boro Zone"
78,"Bronx","East Tremont","Boro Zone"
79,"Manhattan","East Village","Yellow Zone"
80,"Brooklyn","East Williamsburg","Boro Zone"
81,"Bronx","Eastchester","Boro Zone"
82,"Queens","Elmhurst","Boro Zone"
83,"Queens","Elmhurst/Maspeth","Boro Zone"
84,"Staten Island","Eltingville/Annadale/Prince's Bay","Boro Zone"
85,"Brooklyn","Erasmus","Boro Zone"
86,"Queens","Far Rockaway","Boro Zone"
87,"Manhattan","Financial District North","Yellow Zone"
88,"Manhattan","Financial District South","Yellow Zone"
89,"Brooklyn","Flatbush/Ditmas Park","Boro Zone"
90,"Manhattan","Flatiron","Yellow Zone"
91,"Brooklyn","Flatlands","Boro Zone"
92,"Queens","Flushing","Boro Zone"
93,"Queens","Flushing Meadows-Corona Park","Boro Zone"
94,"Bronx","Fordham South","Boro Zone"
95,"Queens","Forest Hills","Boro Zone"
96,"Queens","Forest Park/Highland Park","Boro Zone"
97,"Brooklyn","Fort Greene","Boro Zone"
98,"Queens","Fresh Meadows","Boro Zone"
99,"Staten Island","Freshkills Park","Boro Zone"
100,"Manhattan","Garment District","Yellow Zone"
101,"Queens","Glen Oaks","Boro Zone"
102,"Queens","Glendale","Boro Zone"
103,"Manhattan","Governor's Island/Ellis Island/Liberty Island","Yellow Zone"
104,"Manhattan","Governor's Island/Ellis Island/Liberty Island","Yellow Zone"
105,"Manhattan","Governor's Island/Ellis Island/Liberty Island","Yellow Zone"
106,"Brooklyn","Gowanus","Boro Zone"
107,"Manhattan","Gramercy","Yellow Zone"
108,"Brooklyn","Gravesend","Boro Zone"
109,"Staten Island","Great Kills","Boro Zone"
110,"Staten Island","Great Kills Park","Boro Zone"
111,"Brooklyn","Green-Wood Cemetery","Boro Zone"
112,"Brooklyn","Greenpoint","Boro Zone"
113,"Manhattan","Greenwich Village North","Yellow Zone"
114,"Manhattan","Greenwich Village South","Yellow Zone"
115,"Staten Island","Grymes Hill/Clifton","Boro Zone"
116,"Manhattan","Hamilton Heights","Boro Zone"
117,"Queens","Hammels/Arverne","Boro Zone"
118,"Staten Island","Heartland Village/Todt Hill","Boro Zone"
119,"Bronx","Highbridge","Boro Zone"
120,"Manhattan","Highbridge Park","Boro Zone"
121,"Queens","Hillcrest/Pomonok","Boro Zone"
122,"Queens","Hollis","Boro Zone"
123,"Brooklyn","Homecrest","Boro Zone"
124,"Queens","Howard Beach","Boro Zone"
125,"Manhattan","Hudson Sq","Yellow Zone"
126,"Bronx","Hunts Point","Boro Zone"
127,"Manhattan","Inwood","Boro Zone"
128,"Manhattan","Inwood Hill Park","Boro Zone"
129,"Queens","Jackson Heights","Boro Zone"
130,"Queens","Jamaica","Boro Zone"
131,"Queens","Jamaica Estates","Boro Zone"
132,"Queens","JFK Airport","Airports"
133,"Brooklyn","Kensington","Boro Zone"
134,"Queens","Kew Gardens","Boro Zone"
135,"Queens","Kew Gardens Hills","Boro Zone"
136,"Bronx","Kingsbridge Heights","Boro Zone"
137,"Manhattan","Kips Bay","Yellow Zone"
138,"Queens","LaGuardia Airport","Airports"
139,"Queens","Laurelton","Boro Zone"
140,"Manhattan","Lenox Hill East","Yellow Zone"
141,"Manhattan","Lenox Hill West","Yellow Zone"
142,"Manhattan","Lincoln Square East","Yellow Zone"
143,"Manhattan","Lincoln Square West","Yellow Zone"
144,"Manhattan","Little Italy/NoLiTa","Yellow Zone"
145,"Queens","Long Island City/Hunters Point","Boro Zone"
146,"Queens","Long Island City/Queens Plaza","Boro Zone"
147,"Bronx","Longwood","Boro Zone"
148,"Manhattan","Lower East Side","Yellow Zone"
149,"Brooklyn","Madison","Boro Zone"
150,"Brooklyn","Manhattan Beach","Boro Zone"
151,"Manhattan","Manhattan Valley","Yellow Zone"
152,"Manhattan","Manhattanville","Boro Zone"
153,"Manhattan","Marble Hill","Boro Zone"
154,"Brooklyn","Marine Park/Floyd Bennett Field","Boro Zone"
155,"Brooklyn","Marine Park/Mill Basin","Boro Zone"
156,"Staten Island","Mariners Harbor","Boro Zone"
157,"Queens","Maspeth","Boro Zone"
158,"Manhattan","Meatpacking/West Village West","Yellow Zone"
159,"Bronx","Melrose South","Boro Zone"
160,"Queens","Middle Village","Boro Zone"
161,"Manhattan","Midtown Center","Yellow Zone"
162,"Manhattan","Midtown East","Yellow Zone"
163,"Manhattan","Midtown North","Yellow Zone"
164,"Manhattan","Midtown South","Yellow Zone"
165,"Brooklyn","Midwood","Boro Zone"
166,"Manhattan","Morningside Heights","Boro Zone"
167,"Bronx","Morrisania/Melrose","Boro Zone"
168,"Bronx","Mott Haven/Port Morris","Boro Zone"
169,"Bronx","Mount Hope","Boro Zone"
170,"Manhattan","Murray Hill","Yellow Zone"
171,"Queens","Murray Hill-Queens","Boro Zone"
172,"Staten Island","New Dorp/Midland Beach","Boro Zone"
173,"Queens","North Corona","Boro Zone"
174,"Bronx","Norwood","Boro Zone"
175,"Queens","Oakland Gardens","Boro Zone"
176,"Staten Island","Oakwood","Boro Zone"
177,"Brooklyn","Ocean Hill","Boro Zone"
178,"Brooklyn","Ocean Parkway South","Boro Zone"
179,"Queens","Old Astoria","Boro Zone"
180,"Queens","Ozone Park","Boro Zone"
181,"Brooklyn","Park Slope","Boro Zone"
182,"Bronx","Parkchester","Boro Zone"
183,"Bronx","Pelham Bay","Boro Zone"
184,"Bronx","Pelham Bay Park","Boro Zone"
185,"Bronx","Pelham Parkway","Boro Zone"
186,"Manhattan","Penn Station/Madison Sq West","Yellow Zone"
187,"Staten Island","Port Richmond","Boro Zone"
188,"Brooklyn","Prospect-Lefferts Gardens","Boro Zone"
189,"Brooklyn","Prospect Heights","Boro Zone"
190,"Brooklyn","Prospect Park","Boro Zone"
191,"Queens","Queens Village","Boro Zone"
192,"Queens","Queensboro Hill","Boro Zone"
193,"Queens","Queensbridge/Ravenswood","Boro Zone"
194,"Manhattan","Randalls Island","Yellow Zone"
195,"Brooklyn","Red Hook","Boro Zone"
196,"Queens","Rego Park","Boro Zone"
197,"Queens","Richmond Hill","Boro Zone"
198,"Queens","Ridgewood","Boro Zone"
199,"Bronx","Rikers Island","Boro Zone"
200,"Bronx","Riverdale/North Riverdale/Fieldston","Boro Zone"
201,"Queens","Rockaway Park","Boro Zone"
202,"Manhattan","Roosevelt Island","Boro Zone"
203,"Queens","Rosedale","Boro Zone"
204,"Staten Island","Rossville/Woodrow","Boro Zone"
205,"Queens","Saint Albans","Boro Zone"
206,"Staten Island","Saint George/New Brighton","Boro Zone"
207,"Queens","Saint Michaels Cemetery/Woodside","Boro Zone"
208,"Bronx","Schuylerville/Edgewater Park","Boro Zone"
209,"Manhattan","Seaport","Yellow Zone"
210,"Brooklyn","Sheepshead Bay","Boro Zone"
211,"Manhattan","SoHo","Yellow Zone"
212,"Bronx","Soundview/Bruckner","Boro Zone"
213,"Bronx","Soundview/Castle Hill","Boro Zone"
214,"Staten Island","South Beach/Dongan Hills","Boro Zone"
215,"Queens","South Jamaica","Boro Zone"
216,"Queens","South Ozone Park","Boro Zone"
217,"Brooklyn","South Williamsburg","Boro Zone"
218,"Queens","Springfield Gardens North","Boro Zone"
219,"Queens","Springfield Gardens South","Boro Zone"
220,"Bronx","Spuyten Duyvil/Kingsbridge","Boro Zone"
221,"Staten Island","Stapleton","Boro Zone"
222,"Brooklyn","Starrett City","Boro Zone"
223,"Queens","Steinway","Boro Zone"
224,"Manhattan","Stuy Town/Peter Cooper Village","Yellow Zone"
225,"Brooklyn","Stuyvesant Heights","Boro Zone"
226,"Queens","Sunnyside","Boro Zone"
227,"Brooklyn","Sunset Park East","Boro Zone"
228,"Brooklyn","Sunset Park West","Boro Zone"
229,"Manhattan","Sutton Place/Turtle Bay North","Yellow Zone"
230,"Manhattan","Times Sq/Theatre District","Yellow Zone"
231,"Manhattan","TriBeCa/Civic Center","Yellow Zone"
232,"Manhattan","Two Bridges/Seward Park","Yellow Zone"
233,"Manhattan","UN/Turtle Bay South","Yellow Zone"
234,"Manhattan","Union Sq","Yellow Zone"
235,"Bronx","University Heights/Morris Heights","Boro Zone"
236,"Manhattan","Upper East Side North","Yellow Zone"
237,"Manhattan","Upper East Side South","Yellow Zone"
238,"Manhattan","Upper West Side North","Yellow Zone"
239,"Manhattan","Upper West Side South","Yellow Zone"
240,"Bronx","Van Cortlandt Park","Boro Zone"
241,"Bronx","Van Cortlandt Village","Boro Zone"
242,"Bronx","Van Nest/Morris Park","Boro Zone"
243,"Manhattan","Washington Heights North","Boro Zone"
244,"Manhattan","Washington Heights South","Boro Zone"
245,"Staten Island","West Brighton","Boro Zone"
246,"Manhattan","West Chelsea/Hudson Yards","Yellow Zone"
247,"Bronx","West Concourse","Boro Zone"
248,"Bronx","West Farms/Bronx River","Boro Zone"
249,"Manhattan","West Village","Yellow Zone"
250,"Bronx","Westchester Village/Unionport","Boro Zone"
251,"Staten Island","Westerleigh","Boro Zone"
252,"Queens","Whitestone","Boro Zone"
253,"Queens","Willets Point","Boro Zone"
254,"Bronx","Williamsbridge/Olinville","Boro Zone"
255,"Brooklyn","Williamsburg (North Side)","Boro Zone"
256,"Brooklyn","Williamsburg (South Side)","Boro Zone"
257,"Brooklyn","Windsor Terrace","Boro Zone"
258,"Queens","Woodhaven","Boro Zone"
259,"Bronx","Woodlawn/Wakefield","Boro Zone"
260,"Queens","Woodside","Boro Zone"
261,"Manhattan","World Trade Center","Yellow Zone"
262,"Manhattan","Yorkville East","Yellow Zone"
263,"Manhattan","Yorkville West","Yellow Zone"
264,"Unknown","NV","N/A"
265,"Unknown","NA","N/A"
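
A note on using this lookup table: the `zone` column is not unique (ids 56 and 57 are both "Corona", and ids 103 to 105 share one name), so joins against trip data are safer when keyed on `locationid`. The sketch below is a minimal illustration using only the Python standard library. `SAMPLE` is a handful of rows copied from the file above, and `load_zones` is a hypothetical helper, not part of the repository.

```python
import csv
import io
from collections import defaultdict

# Excerpt copied from the seed file above; the full file has 265 data rows.
SAMPLE = '''"locationid","borough","zone","service_zone"
1,"EWR","Newark Airport","EWR"
56,"Queens","Corona","Boro Zone"
57,"Queens","Corona","Boro Zone"
132,"Queens","JFK Airport","Airports"
264,"Unknown","NV","N/A"
'''


def load_zones(text):
    """Hypothetical helper: map each integer locationid to its CSV row."""
    reader = csv.DictReader(io.StringIO(text))
    return {int(row["locationid"]): row for row in reader}


zones = load_zones(SAMPLE)

# Group ids by zone name to show that a name alone can be ambiguous.
ids_by_zone = defaultdict(list)
for location_id, row in zones.items():
    ids_by_zone[row["zone"]].append(location_id)

duplicates = {name: ids for name, ids in ids_by_zone.items() if len(ids) > 1}

print(zones[132]["zone"])  # JFK Airport
print(duplicates)          # {'Corona': [56, 57]}
```

Keying the dictionary on `locationid` mirrors how a join on the id column would behave; keying on `zone` would silently collapse the duplicate names.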


@@ -1,12 +1,12 @@
-## Week 5: Batch Processing
+# Week 5: Batch Processing
-### 5.1 Introduction
+## 5.1 Introduction
 * :movie_camera: 5.1.1 [Introduction to Batch Processing](https://youtu.be/dcHe5Fl3MF8?list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb)
 * :movie_camera: 5.1.2 [Introduction to Spark](https://youtu.be/FhaqbEOuQ8U?list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb)
-### 5.2 Installation
+## 5.2 Installation
 Follow [these intructions](setup/) to install Spark:
@@ -19,7 +19,7 @@ And follow [this](setup/pyspark.md) to run PySpark in Jupyter
 * :movie_camera: 5.2.1 [(Optional) Installing Spark (Linux)](https://youtu.be/hqUbB9c8sKg?list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb)
-### 5.3 Spark SQL and DataFrames
+## 5.3 Spark SQL and DataFrames
 * :movie_camera: 5.3.1 [First Look at Spark/PySpark](https://youtu.be/r_Sf6fCB40c?list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb)
 * :movie_camera: 5.3.2 [Spark Dataframes](https://youtu.be/ti3aC1m3rE8?list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb)
@@ -32,19 +32,19 @@ Script to prepare the Dataset [download_data.sh](code/download_data.sh)
 * :movie_camera: 5.3.4 [SQL with Spark](https://www.youtube.com/watch?v=uAlp2VuZZPY&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb)
-### 5.4 Spark Internals
+## 5.4 Spark Internals
 * :movie_camera: 5.4.1 [Anatomy of a Spark Cluster](https://youtu.be/68CipcZt7ZA&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb)
 * :movie_camera: 5.4.2 [GroupBy in Spark](https://youtu.be/9qrDsY_2COo&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb)
 * :movie_camera: 5.4.3 [Joins in Spark](https://youtu.be/lu7TrqAWuH4&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb)
-### 5.5 (Optional) Resilient Distributed Datasets
+## 5.5 (Optional) Resilient Distributed Datasets
 * :movie_camera: 5.5.1 [Operations on Spark RDDs](https://youtu.be/Bdu-xIrF3OM&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb)
 * :movie_camera: 5.5.2 [Spark RDD mapPartition](https://youtu.be/k3uB2K99roI&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb)
-### 5.6 Running Spark in the Cloud
+## 5.6 Running Spark in the Cloud
 * :movie_camera: 5.6.1 [Connecting to Google Cloud Storage ](https://youtu.be/Yyz293hBVcQ&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb)
 * :movie_camera: 5.6.2 [Creating a Local Spark Cluster](https://youtu.be/HXBwSlXo5IA&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb)
@@ -52,13 +52,13 @@ Script to prepare the Dataset [download_data.sh](code/download_data.sh)
 * :movie_camera: 5.6.4 [Connecting Spark to Big Query](https://youtu.be/HIm2BOj8C0Q&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb)
-### Homework
+# Homework
 * [Homework](../cohorts/2023/week_5_batch_processing/homework.md)
 * [2024 Homework](../cohorts/2024)
-## Community notes
+# Community notes
 Did you take notes? You can share them here.
@@ -68,4 +68,5 @@ Did you take notes? You can share them here.
 * [Alternative : Using docker-compose to launch spark by rafik](https://gist.github.com/rafik-rahoui/f98df941c4ccced9c46e9ccbdef63a03)
 * [Marcos Torregrosa's blog (spanish)](https://www.n4gash.com/2023/data-engineering-zoomcamp-semana-5-batch-spark)
 * [Notes by Victor Padilha](https://github.com/padilha/de-zoomcamp/tree/master/week5)
+* [Notes by Oscar Garcia](https://github.com/ozkary/Data-Engineering-Bootcamp/tree/main/Step5-Batch-Processing)
 * Add your notes here (above this line)


@@ -10,7 +10,7 @@ for other MacOS versions as well
 Ensure Brew and Java installed in your system:
 ```bash
-xcode-select install
+xcode-select --install
 /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install.sh)"
 brew install java
 ```


@@ -1,6 +1,6 @@
 # Week 6: Stream Processing
-## Code structure
+# Code structure
 * [Java examples](java)
 * [Python examples](python)
 * [KSQLD examples](ksqldb)
@@ -74,13 +74,7 @@ Please follow the steps described under [pyspark-streaming](python/streams-examp
 ## Homework
-[Form](https://forms.gle/rK7268U92mHJBpmW7)
-The homework is mostly theoretical. In the last question you have to provide working code link, please keep in mind that this
-question is not scored.
-Deadline: 13 March 2023, 22:00 CET
+* [2024 Homework](../cohorts/2024/)
 ## Community notes
@@ -88,5 +82,6 @@ Did you take notes? You can share them here.
 * [Notes by Alvaro Navas](https://github.com/ziritrion/dataeng-zoomcamp/blob/main/notes/6_streaming.md )
 * [Marcos Torregrosa's blog (spanish)](https://www.n4gash.com/2023/data-engineering-zoomcamp-semana-6-stream-processing/)
+* [Notes by Oscar Garcia](https://github.com/ozkary/Data-Engineering-Bootcamp/tree/main/Step6-Streaming)
 * Add your notes here (above this line)
