not just you. the 80/20 split you mentioned is real, and if anything 80 is being generous. the dirty secret is that most AI demos look clean because someone spent 3 days cleaning the data before recording, and nobody shows that part. what makes it worse is that data cleaning is also the hardest part to hand off to juniors, because bad cleaning decisions compound. a wrong assumption made at the cleaning stage will silently poison everything downstream, and you won't catch it until the model behaves weirdly in prod.
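
a tiny made-up example of what "silently poison" can look like. it assumes missing readings get filled with 0 during cleaning; the data (`raw`) and the downstream `flag_low` step are hypothetical, just for illustration. nothing errors out, the numbers just quietly change:

```python
from statistics import mean

# hypothetical raw readings; None marks a missing value (made-up data)
raw = [12.0, None, 14.0, None, 13.0, 15.0]

# cleaning choice A: assume "missing" means zero
filled_zero = [x if x is not None else 0.0 for x in raw]

# cleaning choice B: drop the missing readings instead
dropped = [x for x in raw if x is not None]


def flag_low(readings, threshold=10.0):
    # hypothetical downstream step: flag readings below a threshold
    return [x < threshold for x in readings]


print(mean(filled_zero))           # 9.0
print(mean(dropped))               # 13.5
print(sum(flag_low(filled_zero)))  # 2 spurious "low" readings
print(sum(flag_low(dropped)))      # 0
```

both versions run fine and produce plausible-looking output, which is exactly why the bad assumption slips through until something downstream looks off.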
