Basic requirements:
For example, “Google, Apple, Facebook” contains three separate values, but it is treated as a single value.
Personally Identifiable Information (PII) columns (e.g. phone, email, address) are not required.
You can make a dataset more practical by categorising long sentences into separate values.
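As a minimal sketch of leaving PII columns out, assuming the data is loaded into a pandas DataFrame: the column names and values below are hypothetical and only illustrate the idea.

```python
import pandas as pd

# Hypothetical dataset; all column names and values are made up for illustration.
df = pd.DataFrame({
    "phone": ["555-0100", "555-0101"],
    "email": ["a@example.com", "b@example.com"],
    "address": ["1 Main St", "2 High St"],
    "plan": ["basic", "premium"],
    "monthly_spend": [10.0, 25.0],
})

# Assumed list of PII columns; adjust it to match your own dataset.
pii_columns = ["phone", "email", "address"]

# errors="ignore" skips any listed column that is not present in the DataFrame.
cleaned = df.drop(columns=pii_columns, errors="ignore")
```

Here `cleaned` keeps only the non-PII columns (`plan` and `monthly_spend`).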
Intermediate requirements
The following items describe the actions to take on missing data so that your model performs more efficiently.
- Missing Data
Advanced requirements
Technical knowledge is recommended for advanced requirements.
- Data Enrichment
Create new columns:
The quality of a dataset is often enhanced by deriving new columns from existing columns or by correlating different datasets.
For example, you can derive age from a date of birth, or a duration from the start and end dates of a customer subscription or employment period.
Once new columns are created, the source columns that are no longer needed should be excluded from training, as they add no useful information.
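Deriving columns in this way could look roughly like the following sketch, which assumes pandas and uses hypothetical column names (`date_of_birth`, `subscription_start`, `subscription_end`) and a fixed reference date chosen only to keep the result reproducible.

```python
import pandas as pd

# Hypothetical customer data; the column names are assumptions for illustration.
df = pd.DataFrame({
    "date_of_birth": ["1990-05-17", "1985-11-02"],
    "subscription_start": ["2020-01-01", "2021-06-15"],
    "subscription_end": ["2021-01-01", "2021-09-13"],
})

date_cols = ["date_of_birth", "subscription_start", "subscription_end"]
for col in date_cols:
    df[col] = pd.to_datetime(df[col])

# Assumed fixed reference date; in practice this might be today's date.
reference_date = pd.Timestamp("2024-01-01")

# Age in whole years: subtract one year if the birthday has not yet
# occurred in the reference year.
dob = df["date_of_birth"]
had_birthday = (dob.dt.month < reference_date.month) | (
    (dob.dt.month == reference_date.month) & (dob.dt.day <= reference_date.day)
)
df["age"] = reference_date.year - dob.dt.year - (~had_birthday).astype(int)

# Subscription duration in days, from the start and end dates.
df["subscription_days"] = (
    df["subscription_end"] - df["subscription_start"]
).dt.days

# Drop the raw date columns once the derived columns exist.
features = df.drop(columns=date_cols)
```

The `features` frame then contains only the derived `age` and `subscription_days` columns, which are usually easier for a model to use than raw dates.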
Additional columns should be created from comma-separated values. A column whose values are in comma-separated format is treated as one long piece of text instead of as a set of different values.
For example, “Google, Apple, Facebook” contains three separate values but is treated as a single value.
A separate column can be created for each value and filled with 0 or 1, depending on whether that value is present in a particular row.
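One possible way to build these 0/1 columns is sketched below, assuming pandas and a hypothetical `companies` column. It uses `Series.str.get_dummies`, which splits each string on a separator and returns one indicator column per distinct value.

```python
import pandas as pd

# Hypothetical data; "companies" holds comma-separated values.
df = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "companies": ["Google, Apple, Facebook", "Apple", "Google,Facebook"],
})

# Normalise spacing around commas so "Google, Apple" and "Google,Apple"
# split into the same values.
normalised = (
    df["companies"].str.replace(r"\s*,\s*", ",", regex=True).str.strip()
)

# One indicator column per distinct value: 1 if present in the row, else 0.
indicators = normalised.str.get_dummies(sep=",").astype(int)

# Replace the original text column with the indicator columns.
encoded = pd.concat([df.drop(columns="companies"), indicators], axis=1)
```

In `encoded`, each company has its own column (`Apple`, `Facebook`, `Google`), so a model can treat the values independently instead of as one long string.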