Chapter 2: Data Cleaning

Missing Values

Missing values occur when some data entries are empty or not recorded for one or more fields in a dataset. This means information that should be there is absent, which can affect the accuracy of your analysis.

Example:

Name    Age        Email
John    28         john@example.com
Sara    (blank)    sara@example.com
Ali     35         (blank)

Here, Sara's Age and Ali's Email are missing values.
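
To make the example concrete, the short Python sketch below locates the missing cells in the table above. It assumes each blank cell is represented as None; the helper name find_missing is hypothetical and chosen only for illustration.

```python
# The example table, with None standing in for each blank cell (an assumption).
rows = [
    {"Name": "John", "Age": 28, "Email": "john@example.com"},
    {"Name": "Sara", "Age": None, "Email": "sara@example.com"},
    {"Name": "Ali", "Age": 35, "Email": None},
]


def find_missing(rows):
    """Return a (name, field) pair for every cell whose value is None."""
    return [
        (row["Name"], field)
        for row in rows
        for field, value in row.items()
        if value is None
    ]


missing = find_missing(rows)
print(missing)  # [('Sara', 'Age'), ('Ali', 'Email')]
```

In real files a blank may instead appear as an empty string or a placeholder such as "N/A", so the None check would need to be adapted to the data at hand.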


Why Missing Values Matter

Missing values can distort descriptive statistics (like mean and median), bias your models, or lead to incorrect conclusions if not handled properly.
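
As a small illustration of this distortion, the sketch below computes the mean age from the example table in two ways. The second calculation assumes a hypothetical mistake in which the blank age was stored as 0; nothing in the example data says this happened, it is only there to show the effect.

```python
from statistics import mean

# Ages from the example table; None marks Sara's missing age.
ages = [28, None, 35]

# Option 1: compute the mean over the known values only.
known_ages = [age for age in ages if age is not None]
mean_known = mean(known_ages)

# Option 2 (hypothetical mistake): the blank was stored as 0 and
# silently took part in the calculation.
ages_with_zero = [0 if age is None else age for age in ages]
mean_with_zero = mean(ages_with_zero)

print(mean_known)      # 31.5
print(mean_with_zero)  # 21
```

The same three people yield a mean age of 31.5 or 21 depending on how the blank is treated, which is why the handling choice should be explicit.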


How to Handle Missing Values (Best Practices)

There are several ways to treat missing values depending on context:

✔ Remove Rows/Columns with Missing Data – If only a few values are missing and the dataset is large, removing the affected rows or columns may be the simplest option.
✔ Impute with Mean/Median/Mode – Replace missing numeric values with the mean (or the median if the distribution is skewed), and replace missing categorical values with the most common category.
✔ Predict Values Using Models – More advanced techniques use the other columns to estimate missing values through regression or machine learning.
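
The first two options can be sketched in plain Python as follows. This is a minimal illustration rather than a definitive implementation: it assumes blanks are stored as None, the helper names (drop_incomplete, impute_numeric, impute_mode) are hypothetical, and the colors list is made-up data added only to show mode imputation. Model-based prediction is left out to keep the sketch short.

```python
from collections import Counter
from statistics import mean, median

# The example table, with None standing in for each blank cell (an assumption).
rows = [
    {"Name": "John", "Age": 28, "Email": "john@example.com"},
    {"Name": "Sara", "Age": None, "Email": "sara@example.com"},
    {"Name": "Ali", "Age": 35, "Email": None},
]


def drop_incomplete(rows):
    """Keep only the rows in which no field is None."""
    return [row for row in rows if all(v is not None for v in row.values())]


def impute_numeric(rows, field, strategy=mean):
    """Fill None in a numeric field with strategy(known values)."""
    known = [row[field] for row in rows if row[field] is not None]
    fill = strategy(known)
    return [
        {**row, field: fill if row[field] is None else row[field]}
        for row in rows
    ]


def impute_mode(values):
    """Fill None in a list of categories with the most common category."""
    known = [v for v in values if v is not None]
    fill = Counter(known).most_common(1)[0][0]
    return [fill if v is None else v for v in values]


complete_rows = drop_incomplete(rows)               # only John's row remains
mean_filled = impute_numeric(rows, "Age")           # Sara's Age becomes 31.5
median_filled = impute_numeric(rows, "Age", median)

# Hypothetical categorical column, not part of the example table.
colors = ["red", None, "blue", "red"]
filled_colors = impute_mode(colors)                 # None becomes "red"
```

On this tiny table, dropping incomplete rows discards two of the three records, which shows why removal suits large datasets with few gaps. The imputation helpers return new rows and leave the original data unchanged.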