# Predict missing values (Tutorial)

::: {note}

This is the *Predict missing values* tutorial. The *Predict missing values*
documentation is available [here](task_autocomplete_missing_values).

:::

With **Simple ML for Sheets**, also referred to as **Simple ML**, everyone can
use Machine Learning (ML) in Google Sheets without knowing ML, without coding,
and without sharing data with third parties.

This tutorial takes you through the steps of using Simple ML for Sheets to
predict missing values in your data in a spreadsheet.

## Prerequisites

This tutorial is easier to follow if you are familiar with Google Sheets. For
example, make sure you know the basics like how to select columns or set a cell
value.

## What you will learn

*   How to install Simple ML for Sheets.
*   How to use Simple ML to predict missing values in your data.

## Install Simple ML for Sheets

First, install Simple ML for Sheets.

1.  Go to
    [Simple ML for Sheets in Google's Marketplace](https://workspace.google.com/marketplace/app/simple_ml_for_sheets/685936641092)
    and click the **Install** button.

    ![Screenshot of installing the Simple ML for Sheets addon](image/tutorial/install.png)

2.  Validate the permission request. Simple ML stores your machine learning
    models in your Google Drive in the `simple_ml_for_sheets` directory.

3.  Make a copy and open our
    [spreadsheet with example data](https://docs.google.com/spreadsheets/d/1Ve7umK4W6wBje1QZQVbPIH3djCl2uwfwXvezGliCpss/copy).

4.  In the Google Sheets, open *Simple ML for Sheets* in the **Extensions**
    menu. If it is not visible, wait a few seconds and refresh the page. The
    addon can take up to a minute to appear after being installed.

    ![Screenshot of opening the Simple ML for Sheets extension in your
    spreadsheet](image/tutorial/open_extension.png)

## Predict missing values

If you have not done it already, make a copy of the
[tutorial sheet](https://docs.google.com/spreadsheets/d/1Ve7umK4W6wBje1QZQVbPIH3djCl2uwfwXvezGliCpss/copy).

Open the first tab called "Case #1: Predict missing value".

![The tab of the task 1](image/tutorial/open_task_1.png)

### Scenario

You are studying a colony of penguins in Antarctica. This colony is composed of
three penguin species: Chinstrap, Gentoo, and Adelie.

![Image of three penguin types](image/tutorial/penguins.png)

*Image by Allison Horst*

You are studying the differences between these species. Your colleague has
collected physiological measurements (such as size and weight) of approximately
300 penguins. However, they got distracted in the process, and they forgot to
note the species of 30 penguins.

You want to use Simple ML to recover the species of those 30 penguins.

1.  Scroll up and down through the data.

    This sheet contains data about penguins. Each row represents a penguin while
    each column represents a physiological aspect of the penguin.

    The last column called `species` represents the species of the animals. The
    species is missing for the first 30 rows.

2.  Open Simple ML. In the menu, click on **Extensions > Simple ML for Sheets >
    Start**.

    ![Screenshot of opening the Simple ML for Sheets extension in your
    spreadsheet](image/tutorial/open_extension.png)

3.  Wait for the Simple ML side panel to appear (It might take a few seconds).

    ![Screenshot of opening the Simple ML for Sheets extension in your
    spreadsheet](image/tutorial/side_bar_opens.png)

### Predict Missing values

Once Simple ML is loaded, you can use it to predict missing values.

1.  First, notice that some rows are missing values in Column H, `species`.

2.  In the Simple ML for Sheets side panel, make sure to click on
    **Predict missing values**. If you don't see the main menu, click on the
    arrow in the upper left of the sidebar.

    ![Image of the Simple ML for Sheets main menu](image/tutorial/main_menu.png)

3.  In the "Column with empty cells" section, select the column `species`. This
    is the column that contains the missing values you want to predict.

    ![Image of the field in the sidebar for selecting the column whose value is
    to be predicted](image/tutorial/column_with_empty_cells_field.png)

4.  Click the **Predict** button.

    ![Image of the Predict button](image/tutorial/predict_button.png)

After a few seconds, two new columns appear:

*   **Pred:species** are the predicted values for the **species** column.

*   **Pred:Conf.species** are the confidences of the predicted value. In other
    words, this column indicates how confident the model is about the
    predictions in the column **Pred:species**. Confidence is a percentage
    between 0% and 100%, where 100% means that the model is sure of its
    prediction.

![Screenshot of the spreadsheet with new columns for the missing values and
confidence in the predicted
values](image/tutorial/new_columns_for_missing_values.png)

### Some questions for you

Look at the predicted values of species. Do they look reasonable to you?

Feel free to experiment with predicting missing values in other columns.

:::{admonition} For the ML experts

Behind the scenes, the task **Predict missing value** collects rows with
`species` values and uses those rows to train a model. This model is then
applied to the rows with missing `species` values. The model is available in the
**Manage models** task and in the folder `simple_ml_for_sheets` of your Google
Drive.

:::
