Awesome piece of code, thanks for sharing! I wonder if it would be possible to hook up the spark-xml library (from Databricks) in this notebook so it can read XML files as well. In my regular PySpark notebook I am already using it, based on a guide on Microsoft Learn (I am unable to share the link because I am a new user here).

Basically, I had to create a new environment for my workspace, upload the spark-xml_2.12-0.18.0.jar file to it as a custom library, and then configure it in my PySpark notebook with:

%%configure -f
{"conf": {"spark.jars.packages": "com.databricks:spark-xml_2.12:0.18.0"}}

It was a long shot, but I did try to modify your code by adding the line .config("spark.jars.packages", "com.databricks:spark-xml_2.12:0.18.0") to the "builder" section, though of course that had no real effect. Unfortunately, I am not a Fabric guru myself, so I definitely need more help. :-)
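For reference, this is roughly how my working PySpark notebook is set up once the jar is attached to the environment. It is only a sketch of my own usage, not something from the notebook above: the file path "Files/books.xml" and the rowTag value "book" are placeholders, and the read cell assumes the %%configure cell has already run so spark-xml is on the session's classpath:

```python
%%configure -f
{"conf": {"spark.jars.packages": "com.databricks:spark-xml_2.12:0.18.0"}}
```

```python
# Runs in a later cell, after %%configure has restarted the session with
# spark-xml available. Path and rowTag below are placeholder values.
df = (
    spark.read
    .format("com.databricks.spark.xml")   # spark-xml's data source name
    .option("rowTag", "book")             # XML element that becomes one row
    .load("Files/books.xml")
)
df.printSchema()
```

Note that %%configure has to be the first cell that runs, which is (as far as I understand) why adding spark.jars.packages to an already-started session's builder had no effect.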