## Running Spark and Kafka Clusters on Docker
### 1. Build Required Images for Running Spark
The details of how the Spark images are built in separate layers are explained in André Perez's blog post on Towards Data Science (Medium).
```bash
# Build Spark images
./build.sh
```
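As a rough illustration of the layered approach described in the blog post, a build script of this kind typically builds a shared base image first and then stacks role-specific images on top of it. The Dockerfile names and Spark version below are illustrative assumptions, not necessarily what `build.sh` actually uses:

```bash
# Sketch of a layered Spark image build (file names and version are assumptions).
SPARK_VERSION=3.3.1

# Shared OS/JDK layer that every other image builds on.
docker build -f cluster-base.Dockerfile -t cluster-base .

# Spark binaries layer on top of the base.
docker build -f spark-base.Dockerfile \
  --build-arg spark_version="${SPARK_VERSION}" \
  -t spark-base .

# Role-specific images for the master and worker nodes.
docker build -f spark-master.Dockerfile -t spark-master .
docker build -f spark-worker.Dockerfile -t spark-worker .
```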
### 2. Create Docker Network & Volume
```bash
# Create network
docker network create kafka-spark-network

# Create volume
docker volume create --name=hadoop-distributed-file-system
```
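To confirm that both resources exist before starting the services, you can inspect them by name (the names come from the commands above); each command fails with a clear error if the resource is missing:

```bash
# Show the network's configuration and any attached containers.
docker network inspect kafka-spark-network

# Show the volume's driver and mount point on the host.
docker volume inspect hadoop-distributed-file-system
```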
### 3. Run Services on Docker
```bash
# Start Docker Compose (run inside both the kafka and spark folders)
docker compose up -d
```
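Once both stacks are up, a quick way to confirm that the containers are running and attached to the shared network (the network name comes from step 2):

```bash
# Show the state of the services defined in the current folder's compose file.
docker compose ps

# List every container attached to the shared network.
docker ps --filter network=kafka-spark-network
```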
For an in-depth explanation of Kafka listeners, see the *Kafka Listeners – Explained* blog post.
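To sanity-check what the broker actually advertises to clients, you can request cluster metadata from the host. This sketch assumes `kcat` (formerly kafkacat) is installed and that a listener is advertised on `localhost:9092`; the port in your compose file may differ:

```bash
# Query broker metadata; the broker address printed here is what clients
# will connect to, so it must be reachable from where this command runs.
kcat -b localhost:9092 -L
```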
### 4. Stop Services on Docker
```bash
# Stop Docker Compose (run inside both the kafka and spark folders)
docker compose down
```
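If you also want to drop the named volumes a stack declared (for example, to reset state), Compose accepts a `-v` flag. Note that external volumes, such as the one created manually in step 2, are left untouched:

```bash
# Stop services and remove containers, networks, and named volumes
# declared in the compose file. Data in those volumes is lost.
docker compose down -v
```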
### 5. Helpful Commands
```bash
# Delete all containers
docker rm -f $(docker ps -a -q)

# Delete all volumes
docker volume rm $(docker volume ls -q)
```
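For a broader cleanup, Docker's built-in prune command removes most unused resources in one go (it asks for confirmation before deleting):

```bash
# Remove stopped containers, unused networks, dangling images, and build cache.
docker system prune

# Add --volumes to also remove unused local volumes (destructive).
docker system prune --volumes
```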