Comment by Chinedu Ethelbert on "Mastering Data Science: Beyond the Basics - A Comprehensive Guide to ETL Pipelines (PART 2)"

Lovely work, well done! I tried it and was able to download genres_v2.csv.zip.

Could you please check create_tables.py? It does not appear to create the table in the spotify_genre database. The data cleaning runs, but loading the data into the database fails with this error:

cursor.execute(statement, parameters)
sqlalchemy.exc.ProgrammingError: (psycopg2.ProgrammingError) can't adapt type 'dict'
[SQL: INSERT INTO genre_raw_all (danceability, energy, key, loudness, mode, acousticness, instrumentalness, liveness, valence, tempo, type, id, uri, track_href, analysis_url, duration_ms, time_signature, genre) VALUES (%(danceability__0)s, %(energy__0)s, % ... 13612 characters truncated ... (track_href__42)s, %(analysis_url__42)s, %(duration_ms__42)s, %(time_signature__42)s, %(genre__42)s)]....
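As a guess at the cause: psycopg2 raises "can't adapt type 'dict'" when a value bound to an INSERT is a Python dict, so at least one column probably still holds a nested object after cleaning. Below is a minimal sketch of one workaround, assuming the rows are available as a list of dicts. The helper name `serialise_nested` and the sample values are made up for illustration and are not from the repository.

```python
import json


def serialise_nested(rows):
    """Return copies of the rows with dict/list values turned into JSON strings."""
    cleaned = []
    for row in rows:
        cleaned.append({
            column: json.dumps(value, sort_keys=True)
            if isinstance(value, (dict, list))
            else value
            for column, value in row.items()
        })
    return cleaned


# Hypothetical sample rows; the nested "genre" value stands in for whatever
# column actually contains a dict in the real data.
rows = [
    {"danceability": 0.83, "genre": {"name": "Dark Trap"}},
    {"danceability": 0.72, "genre": "Dark Trap"},
]
cleaned_rows = serialise_nested(rows)
```

Plain strings and numbers pass through unchanged, so the helper is safe to run on every row before the load step. If the target column is meant to be JSON, psycopg2's `psycopg2.extras.Json` wrapper is another option.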

Also, the data is printed to the screen (on the command line) during the loading process. That is not good practice, since it can expose the data to anyone who can see the terminal or its logs.
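One way to avoid dumping rows to the terminal is to log a short summary instead of the data itself. This is only a sketch using Python's standard `logging` module. The logger name `etl.load` and the function `log_load_summary` are hypothetical and do not come from the blog's code.

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("etl.load")  # hypothetical logger name


def log_load_summary(rows, table_name):
    """Log how many rows are being loaded without printing the rows themselves."""
    message = f"Loading {len(rows)} rows into {table_name}"
    logger.info(message)
    # Column names only, and only at DEBUG level, so values never reach the screen.
    logger.debug("Columns: %s", sorted(rows[0]) if rows else [])
    return message


summary = log_load_summary([{"genre": "Dark Trap"}, {"genre": "Emo"}], "genre_raw_all")
```

If the output is coming from SQLAlchemy rather than a `print` call, the `echo=True` argument to `create_engine` is a likely source and can simply be left off.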

Furthermore, the project structure is not as specified in the blog, because the raw/data folder is not part of it; the code creates that folder when executed. The code won't run when raw/data is created manually as you suggested. The pipeline_architecture folder also seems redundant; I can't see what it does.
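I don't know how the repository creates the folder, but if it uses a call that fails when the directory already exists, that would explain why creating raw/data by hand breaks the run. A minimal sketch of a creation step that works in both cases is shown here. The function name `ensure_data_dir` is hypothetical, and a temporary directory stands in for the project root.

```python
import tempfile
from pathlib import Path


def ensure_data_dir(base_dir, relative="raw/data"):
    """Create the data folder if it is missing; do nothing if it already exists."""
    target = Path(base_dir) / relative
    target.mkdir(parents=True, exist_ok=True)
    return target


# Calling it twice simulates "folder created manually, then the script runs".
with tempfile.TemporaryDirectory() as project_root:
    first = ensure_data_dir(project_root)
    second = ensure_data_dir(project_root)
    created_ok = first.is_dir() and first == second
```

With `exist_ok=True` the second call is a no-op rather than an error, so users can follow either set of instructions.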

Lastly, it would be good to point your users to where they can get the sample data you used. I got mine from Kaggle (I wanted to add the link, but the comment would not post with it).

Thank you! 👏 👏
