Skip to main content

Assessing Data Quality

One of the biggest difficulties with ML is dealing with messy data. This is a common and reoccurring problem.

CleanLab

CleanLab is a product that attempts to use statistical methods to clean up data and labels. I need to read more about exactly how it works.

They have some tutorials on how to use their system to clean up text for processing here