上QQ阅读APP看书,第一时间看更新
Data processing and wrangling
We have performed some statistical analyses on the data. Now what? Most of the time, the data that is collected from several data sources is present in its raw form, which cannot be fed to an ML model, hence the need for further data processing.
But you might ask, why not collect the data in a way so that it gets retrieved with all of the necessary processing done? This is typically not a good practice as it breaks the modularity of the workflow.
This is why to make the data consumable in the later steps in the workflow, we need to clean, transform, and persist it. This includes several things such as data normalization, data standardization, missing value imputation, encoding from one value to another, and outlier treatment. All of these are collectively named data wrangling.