Predict missing values

The Predict missing values task predicts the values of empty cells using all the cells in the sheet. See case #1 of the tutorial for a complete example.

This task is useful for recovering data and for predicting valuable, hard-to-obtain information. For example, if a column contains how much each of your clients spends on your product, you can use this task to predict how much a potential new client will spend before they do so. This way, you can better target large future clients.

Use this task as follows:

  1. Make sure you have a column with some missing values. Not all values should be missing. The more non-missing values you have, the better the predictions of the missing values will be. There should also be at least one other column in the sheet. See the formatting page to make sure your sheet is well formatted.

  2. In the “target column” section, select the column with missing values. Note that other columns can also contain missing values.

  3. (Optional, advanced) Remove some source columns. In most cases, leaving all the source columns will work best.

  4. (Optional, advanced) Change the learning algorithm. Gradient Boosted Trees and Random Forests are both excellent on tabular data. The decision tree algorithm is more interpretable.

  5. Click the “Predict” button.

The task adds the following new columns to the sheet:

  • “pred:[target column]” is the predicted value for the target column.

  • “pred:Conf.[target column]” is the confidence (between 0% and 100%) of the prediction. The confidence is only available for classification objectives.

If you change values in the sheet, or if you add new rows or columns, press “Predict” again to update the predicted values.

How are missing values predicted?

To predict missing values, Simple ML trains a machine learning model using the examples (rows) that don’t have missing values in the target column. This model is then used to make predictions for the rows where the target value is missing.
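The split described above can be sketched in a few lines of Python. This is only an illustration, not Simple ML's implementation: it replaces the tree-based learners with a simple k-nearest-neighbor vote so that it runs without any dependencies. The function name `predict_missing`, the sample data, the assumption that source columns are numeric and complete, and the use of the vote fraction as a confidence are all hypothetical choices made for this sketch.

```python
from collections import Counter


def predict_missing(rows, target, k=3):
    """Predict missing `target` values with a k-nearest-neighbor vote.

    Illustrative stand-in for a real learner. Assumes every source
    column is numeric and has no missing values.
    """
    sources = [col for col in rows[0] if col != target]

    # Training examples: rows whose target value is present.
    train = [row for row in rows if row[target] is not None]

    def distance(a, b):
        # Squared Euclidean distance over the source columns.
        return sum((a[col] - b[col]) ** 2 for col in sources)

    result = []
    for row in rows:
        new_row = dict(row)
        if row[target] is None:
            # Vote among the k training rows closest to this row.
            nearest = sorted(train, key=lambda t: distance(row, t))[:k]
            votes = Counter(t[target] for t in nearest)
            label, count = votes.most_common(1)[0]
            new_row["pred:" + target] = label
            # Sketch-only confidence: share of neighbors that agree.
            new_row["pred:Conf." + target] = count / len(nearest)
        else:
            # Sketch choice: leave rows with a known target unpredicted.
            new_row["pred:" + target] = None
            new_row["pred:Conf." + target] = None
        result.append(new_row)
    return result


# Hypothetical sheet: the last two rows are missing the target value.
rows = [
    {"visits": 1, "seats": 2, "segment": "small"},
    {"visits": 2, "seats": 3, "segment": "small"},
    {"visits": 3, "seats": 2, "segment": "small"},
    {"visits": 9, "seats": 40, "segment": "large"},
    {"visits": 8, "seats": 35, "segment": "large"},
    {"visits": 7, "seats": 50, "segment": None},
    {"visits": 2, "seats": 1, "segment": None},
]

filled = predict_missing(rows, "segment", k=3)
for row in filled:
    print(row)
```

The five rows with a known `segment` act as training examples, and the two rows without one receive a prediction. Because the target here is categorical, the sketch also fills a confidence column, mirroring the note that confidence applies only to classification objectives.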