Prior to performing data analysis, it is necessary to cleanse the data.
This step is crucial and can determine the success or failure of the
data analysis process. It calls for a considered approach, as different
data may require different preprocessing techniques. Data need to be
prepared in such a way that they reflect the real processes and changes.
Typically, data cleansing includes the detection of outliers and
incomplete records. Outliers are values that differ significantly
from typical values. For instance, a parameter describing a
person’s height might have a value of 3 meters. An incomplete
record might be the result of faults in data acquisition systems. For
instance, a temperature sensor might break down and stop collecting
measurements. Once incorrect or incomplete data are discovered, they
should be removed or repaired.
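
The detection step described above can be sketched in a few lines of
Python. This is only an illustrative sketch under stated assumptions: the
text does not prescribe a detection method, so the interquartile-range
(Tukey) rule is used here as one common choice, missing values are assumed
to be stored as None, and the names cleanse, iqr_bounds, is_incomplete,
and height_m are hypothetical.

```python
from statistics import quantiles


def is_incomplete(record):
    """Assumption: a missing measurement is stored as None."""
    return any(value is None for value in record.values())


def iqr_bounds(values, k=1.5):
    """Return the Tukey fences (Q1 - k*IQR, Q3 + k*IQR) for the values."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def cleanse(records, field, k=1.5):
    """Split records into (clean, outliers, incomplete) for one numeric field."""
    incomplete = [r for r in records if is_incomplete(r)]
    complete = [r for r in records if not is_incomplete(r)]
    low, high = iqr_bounds([r[field] for r in complete], k)
    clean = [r for r in complete if low <= r[field] <= high]
    outliers = [r for r in complete if not (low <= r[field] <= high)]
    return clean, outliers, incomplete


# Made-up sample data: one implausible height and one missing reading.
records = [
    {"id": 1, "height_m": 1.62},
    {"id": 2, "height_m": 1.75},
    {"id": 3, "height_m": 1.80},
    {"id": 4, "height_m": 3.00},
    {"id": 5, "height_m": None},
    {"id": 6, "height_m": 1.68},
    {"id": 7, "height_m": 1.71},
]

clean, outliers, incomplete = cleanse(records, "height_m")
print("outliers:", [r["id"] for r in outliers])
print("incomplete:", [r["id"] for r in incomplete])
```

Whether flagged records are then removed or repaired (for example, by
substituting the median of the remaining values) depends on the data and
on the analysis that follows; the sketch only separates them.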







