Running PySpark Streaming
Prerequisites
Ensure your Kafka and Spark services are up and running by following the docker setup README. It is important to create the network and volume as described in that document, so verify that both exist:
docker volume ls # should list hadoop-distributed-file-system
docker network ls # should list kafka-spark-network
Running Producer and Consumer
# Run producer
python3 producer.py
# Run consumer with default settings
python3 consumer.py
# Run consumer for specific topic
python3 consumer.py --topic <topic-name>
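The actual producer.py and consumer.py live in this repo; as a rough reference only, a minimal version of the pair could look like the sketch below. It uses the kafka-python library and assumes a broker at localhost:9092 and a default topic named demo_topic, both of which are placeholders that should match your docker setup.

import argparse
import json

from kafka import KafkaConsumer, KafkaProducer

# Placeholder values -- adjust to the broker and topic used in your docker setup
BOOTSTRAP_SERVERS = ["localhost:9092"]
DEFAULT_TOPIC = "demo_topic"


def run_producer(topic: str) -> None:
    # Serialize each message dict as JSON before sending it to Kafka
    producer = KafkaProducer(
        bootstrap_servers=BOOTSTRAP_SERVERS,
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )
    for i in range(10):
        producer.send(topic, value={"message_id": i})
    producer.flush()


def run_consumer(topic: str) -> None:
    # Read from the beginning of the topic and decode JSON payloads
    consumer = KafkaConsumer(
        topic,
        bootstrap_servers=BOOTSTRAP_SERVERS,
        auto_offset_reset="earliest",
        value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    )
    for message in consumer:
        print(message.value)


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--topic", default=DEFAULT_TOPIC)
    args = parser.parse_args()
    run_consumer(args.topic)  # mirrors `python3 consumer.py --topic <topic-name>`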
Running Streaming Script
The spark-submit.sh script ensures the necessary JARs are installed before running streaming.py:
./spark-submit.sh streaming.py
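For orientation, streaming.py is expected to read from Kafka with Spark Structured Streaming. A minimal sketch of such a job is shown below, assuming the same hypothetical broker address and topic name as in the producer/consumer sketch above; the actual script in this repo may do more processing.

from pyspark.sql import SparkSession

# Placeholder broker address and topic; keep them consistent with the producer
KAFKA_BOOTSTRAP_SERVERS = "localhost:9092"
TOPIC = "demo_topic"

spark = SparkSession.builder.appName("streaming-example").getOrCreate()

# Read the Kafka topic as an unbounded streaming DataFrame
df = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", KAFKA_BOOTSTRAP_SERVERS)
    .option("subscribe", TOPIC)
    .option("startingOffsets", "earliest")
    .load()
)

# Kafka delivers key/value as binary, so cast them to strings for display
query = (
    df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
    .writeStream
    .format("console")
    .outputMode("append")
    .start()
)

query.awaitTermination()

Running it through spark-submit.sh rather than plain python3 matters because the Kafka source requires the spark-sql-kafka connector JARs to be on the Spark classpath.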