Assessing Data Quality
One of the biggest difficulties in ML is dealing with messy data, such as mislabeled examples, duplicates, and missing values. It is a common and recurring problem.
CleanLab
CleanLab is a product that uses statistical methods to find and clean up problems in data and labels. I need to read more about exactly how it works.
They have some tutorials on how to use their system to clean up text for processing here.
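To make "statistical methods to clean up labels" a bit more concrete, here is a minimal sketch of one such check. It is not CleanLab's API, and it may not match what the library actually does internally. It assumes you already have a model's out-of-sample predicted probabilities for each example, and it flags an example when the model is confidently predicting a class other than the given label. The function name `flag_suspect_labels`, the per-class threshold rule, and the toy data are all illustrative assumptions.

```python
from collections import defaultdict


def flag_suspect_labels(labels, pred_probs):
    """Return indices of examples whose given label disagrees with a confident prediction.

    labels:     list of int class ids (the given, possibly noisy labels)
    pred_probs: pred_probs[i][k] is the model's probability of class k for example i
                (assumed to come from held-out predictions, e.g. cross-validation)
    """
    # Per-class threshold (an assumption for this sketch): the mean probability
    # the model assigns to class k over the examples that are labeled k.
    sums = defaultdict(float)
    counts = defaultdict(int)
    for y, probs in zip(labels, pred_probs):
        sums[y] += probs[y]
        counts[y] += 1
    thresholds = {k: sums[k] / counts[k] for k in counts}

    suspects = []
    for i, (y, probs) in enumerate(zip(labels, pred_probs)):
        # Classes whose probability clears that class's own threshold.
        confident = [
            k for k, p in enumerate(probs)
            if k in thresholds and p >= thresholds[k]
        ]
        if not confident:
            continue  # the model is not confident about anything; leave the label alone
        best = max(confident, key=lambda k: probs[k])
        if best != y:
            suspects.append(i)
    return suspects


# Toy data: example 2 is labeled 0, but the model strongly prefers class 1.
labels = [0, 0, 0, 1, 1, 1]
pred_probs = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.1, 0.9],
    [0.2, 0.8],
    [0.1, 0.9],
    [0.3, 0.7],
]

suspects = flag_suspect_labels(labels, pred_probs)
print(suspects)  # [2]
```

Flagged examples are candidates for manual review rather than automatic deletion, since a confident model can still be wrong.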