I have a very strange problem. I wrote a Spark Streaming job that monitors an HDFS directory, reads the newly added files, and sends their content to Kafka.
When submitting the job I got this error:
ImportError: cannot import name KafkaProducer
Although the error itself is simple, it is odd because I can import KafkaProducer in both the python and pyspark shells without any problem.
I tried rebooting the machine, but the situation is still the same.
You can check the code of the spark streaming job from here
Does anyone here have any idea about this problem?
I finally solved the issue, but the solution was odd and I still have no idea what the problem was.
I simply ran the command cat old_script.py > new_script.py, then submitted the job using the new script, and everything worked fine.
This is the second time I have faced this issue with Python scripts. I have no explanation for it and I hope someone can explain it.
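Since cat writes the same bytes into a freshly created file, whatever changed is probably in the file's metadata (permissions, owner) rather than in its content. Here is a minimal sketch for checking that, assuming both scripts are still on disk. file_fingerprint and differing_fields are hypothetical helper names, not part of any library:

```python
import hashlib
import os
import stat


def file_fingerprint(path):
    """Collect the content hash and basic metadata of one file."""
    st = os.stat(path)
    with open(path, "rb") as f:
        data = f.read()
    return {
        "sha256": hashlib.sha256(data).hexdigest(),  # content identity
        "size": st.st_size,
        "mode": stat.filemode(st.st_mode),           # e.g. -rw-r--r--
        "owner_uid": st.st_uid,
    }


def differing_fields(path_a, path_b):
    """Return only the fields whose values differ between the two files."""
    a = file_fingerprint(path_a)
    b = file_fingerprint(path_b)
    return {key: (a[key], b[key]) for key in a if a[key] != b[key]}


# Example call (assumes both files exist in the current directory):
# print(differing_fields("old_script.py", "new_script.py"))
```

An empty result would mean the two files match on all four fields, so the cause would have to be something this sketch does not look at.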
Kevin
Which line are you getting the error on? If it's line 6, I would say that kafka isn't installed correctly on your machine.
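To narrow this down, it can help to print which interpreter the submitted job runs under and where it would load kafka from, then compare that with the output from the shell. A minimal sketch using only the standard library (describe_module is a hypothetical helper name):

```python
import importlib.util
import sys


def describe_module(name):
    """Return the file a top-level module would be imported from, or None if it is not found."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        # find_spec can raise for broken or half-initialised modules
        return None
    if spec is None:
        return None
    # origin can itself be None for namespace packages
    return spec.origin


print("interpreter:", sys.executable)
print("kafka found at:", describe_module("kafka"))
```

If the path points at a file of your own named kafka.py rather than at site-packages, that file is shadowing the library. If the interpreter differs from the one the shell uses, the package may be installed for one Python but not the other.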
If it's line 12, try:
producer = KafkaProducer(bootstrap_servers="Broker_list")
The namespace is unnecessary because you have already imported KafkaProducer directly. If you want to keep the namespace, change line 6 to:
import kafka