There's a fine line between cleaning data and quantizing data. Fixing spelling mistakes, correcting format errors, and making formatting consistent are all "janitorial".
However, categorizing data into groups, removing what you consider "non-essential", or simply rounding off decimal numbers can all affect the analysis downstream.
In some ways data science is pretty much all about learning best practices for quantizing data.
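
As a rough illustration of how rounding alone can change a downstream result, here is a minimal sketch. The two groups and the `quantize` helper are made up for this example and are not taken from any real dataset or library.

```python
from statistics import mean


def quantize(values, ndigits=0):
    """Round each value to `ndigits` decimal places (a lossy "cleaning" step)."""
    return [round(v, ndigits) for v in values]


# Hypothetical measurements for two groups (illustrative values only).
group_a = [1.4, 1.4, 1.4]
group_b = [0.6, 0.6, 0.6]

# Difference between group means on the raw data: roughly 0.8.
raw_diff = mean(group_a) - mean(group_b)

# After rounding to whole numbers every value becomes 1.0,
# so the difference between the groups disappears.
rounded_diff = mean(quantize(group_a)) - mean(quantize(group_b))

print(f"difference before rounding: {raw_diff:.2f}")
print(f"difference after rounding:  {rounded_diff:.2f}")
```

The rounding step looks like harmless tidying, but here it erases the only signal that separated the two groups.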