
There's a fine line between cleaning data and quantizing data. Fixing spelling mistakes, correcting format errors, and enforcing consistent formatting are all "janitorial".

However, categorizing data into groups, removing what you consider "non-essential", or simply rounding off decimal numbers can all have an impact on the analysis downstream.
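To make the rounding point concrete, here is a minimal sketch. The readings are made up purely for illustration, and rounding to the nearest integer stands in for whatever "cleanup" step a pipeline might apply.

```python
from statistics import mean

# Hypothetical measurements, invented for illustration only.
readings = [0.4, 0.4, 0.4, 0.4, 1.6]

# Average the values at full precision.
raw_mean = mean(readings)

# Quantize first (round to the nearest integer), then average.
rounded_mean = mean(round(x) for x in readings)

print(f"mean of raw values:     {raw_mean:.2f}")
print(f"mean of rounded values: {rounded_mean:.2f}")
```

The two means differ because four of the five readings round down to zero, so a step that looks like harmless tidying shifts the statistic the analysis depends on.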

In some ways, data science is pretty much all about learning best practices for quantizing data.


