End to End Machine Learning with AI First Colab

By Google for Developers

Key Concepts

  • AI First Colab: A Jupyter notebook hosted on Google Cloud, powered by Gemini, designed to simplify and accelerate machine learning workflows.
  • Gemini: An AI companion and coding agent integrated into AI First Colab, capable of understanding code and data state to assist in model building.
  • End-to-End Machine Learning Journey: The complete process from data ingestion to making predictions.
  • Natural Language Prompting: Using plain English to instruct the AI agent to perform complex tasks.
  • Host Country Effect: An influence on medal counts due to a country hosting the Olympic Games.
  • Holdout Test Set: A portion of data reserved for evaluating model performance on unseen data.
  • Mean Absolute Error (MAE): A metric used to measure the average magnitude of errors in a set of predictions.
  • Iterative and Collaborative Experience: Working with the AI agent in a back-and-forth manner to refine results.
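
For reference, MAE has a standard definition, written out below. The symbols are notation introduced here rather than taken from the video: n is the number of countries in the evaluation set, y_i is the actual medal count for country i, and \hat{y}_i is the model's prediction for that country.

```latex
\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|
```

Because the errors are taken in absolute value, over-predictions and under-predictions do not cancel out, and the result is expressed in the same unit as the target (medals).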

AI First Colab: Simplifying Machine Learning Workflows

AI First Colab is presented as a solution to the often tedious task of writing specific syntax for various data manipulation and machine learning libraries. It is a Jupyter notebook environment hosted on Google Cloud, featuring an "authentic collaborator" powered by Gemini. This AI companion is designed to help users tackle complex problems more efficiently.

Core Features and Functionality

  • Browser-Based and Minimal Setup: AI First Colab runs directly in the browser, requiring little to no initial configuration.
  • Collaboration and Sharing: Notebooks can be easily shared with others via Google Drive, fostering collaborative development.
  • Free Computing Resources: Users can leverage powerful computing resources such as GPUs and TPUs at no cost.
  • Gemini's AI Companion: This coding agent operates across the entire notebook, understanding the code and the current state of the data at each step. This enables an iterative and collaborative workflow where the user works with the agent.

The Machine Learning Workflow in AI First Colab

A typical machine learning journey, as outlined in the video, involves several stages:

  1. Data Ingestion and Preparation: Loading and cleaning raw data.
  2. Exploratory Analysis and Feature Engineering: Understanding the data and creating relevant features.
  3. Model Training: Building and training one or more machine learning models.
  4. Model Evaluation: Assessing the performance of the trained models.
  5. Prediction: Using the trained model to make future predictions.

The key advantage highlighted is that AI First Colab allows this entire workflow to be performed autonomously using natural language prompts.

Real-World Example: Predicting Olympic Medal Counts

To demonstrate the capabilities of AI First Colab, the presenter ran a demo using data from past Olympic Games results, sourced from Kaggle. The objective was to predict the total medal count (gold, silver, and bronze combined) for each country in the 2026 Winter Olympics.

Prompt and Initial Setup

The process began with an empty Colab notebook. The user initiated interaction with Gemini by clicking the "What can I help you build?" option in the bottom toolbar. The first step involved uploading two CSV files containing Olympic results and host country information.

A detailed prompt was then provided to Gemini:

"Create a model to predict total medal counts for each country in the 2026 Winter Olympics. Use only data from Winter Olympics since 1992. Use each country's medal counts from the previous couple Winter Olympics to make predictions. Make sure to account for host country effects. Evaluate on a holdout test set of the most recent Winter Olympics. Make 2026 predictions for all countries that participated in 2022. And finally, there are 116 medal events in 2026. So, make sure the country medal predictions add up to three times that number."

This prompt was described as the type of instruction a domain expert would give to a data scientist.
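
The last requirement in the prompt is a normalization constraint: 116 events times 3 medals per event gives a target of 348 medals in total. The video does not say how Gemini enforced this, so the following is only a sketch of one plausible approach, proportional scaling followed by largest-remainder rounding so the integer counts still sum exactly to the target. The function name `scale_to_total` and the four raw predictions are hypothetical and not taken from the demo.

```python
def scale_to_total(raw, total):
    """Rescale raw predictions to non-negative integers that sum to `total`."""
    # A linear model can produce small negative values; treat them as zero.
    clipped = {name: max(value, 0.0) for name, value in raw.items()}
    current_sum = sum(clipped.values())
    if current_sum == 0:
        raise ValueError("all predictions are zero; nothing to scale")

    # Proportional scaling so the (fractional) values sum to the target.
    scaled = {name: value * total / current_sum for name, value in clipped.items()}

    # Round down, then hand the leftover medals to the largest fractional parts.
    result = {name: int(value) for name, value in scaled.items()}
    leftover = total - sum(result.values())
    by_fraction = sorted(scaled, key=lambda name: scaled[name] - result[name], reverse=True)
    for name in by_fraction[:leftover]:
        result[name] += 1
    return result


MEDAL_EVENTS = 116
TARGET_TOTAL = 3 * MEDAL_EVENTS  # gold, silver, and bronze per event

# Made-up raw model outputs for four placeholder countries.
raw_predictions = {"A": 30.2, "B": 24.9, "C": 11.4, "D": -0.3}
adjusted = scale_to_total(raw_predictions, TARGET_TOTAL)
print(adjusted, sum(adjusted.values()))
```

Proportional scaling preserves the ranking of countries, which is why it is a natural first choice here; other adjustments (for example, adding a constant offset) would also satisfy the constraint but distort the relative sizes differently.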

Gemini's Plan and Execution

Gemini responded with a proposed plan, which was displayed in a side panel for easy review. The plan included steps such as:

  • Loading the data.
  • Filtering and preparation.
  • Creating features, including accounting for the host country effect.
  • Training the model.
  • Evaluating the model.
  • Making predictions for 2026, including the requested adjustment.
  • Finishing the process.

Upon accepting the plan, the "auto run" feature was enabled, allowing Gemini to execute all steps autonomously. A checklist of tasks was displayed as they were executed within the notebook.

Step-by-Step Execution and Technical Details

  1. Data Loading: The two CSV files were loaded into memory.
  2. Data Preparation:
    • Data was filtered to include only Winter Olympics since 1992.
    • Host countries were identified.
    • Total medals were aggregated by country for each games.
    • The presenter noted relief from having to recall specific pandas syntax for merging, filtering, and aggregation.
  3. Feature Engineering:
    • Gemini implemented logic to extract medal counts from the previous two Olympic Games, creating features named previous_medal_count and previous_previous_medal_count. These were identified as key predictors.
    • An is_host field was added to account for the host country effect, with Gemini correctly identifying relevant countries.
  4. Model Training:
    • Library Used: scikit-learn was employed.
    • Model Type: Linear Regression was chosen, deemed appropriate for a continuous outcome variable (medal count) and a limited number of predictors.
    • Data Splitting: The 2022 Olympics data was held out from training for evaluation purposes.
  5. Model Evaluation:
    • Evaluation Set: The holdout test set (2022 Olympics) was used.
    • Metric: Mean Absolute Error (MAE) was calculated.
    • Result: The MAE was approximately 1.6 medals per country, considered "not bad" for a simple model without explicit consideration of individual sports or athletes.
  6. Prediction Generation:
    • Data Setup: Country-level data, including previous medal counts, was prepared for prediction.
    • Initial Predictions: Initial medal predictions were generated directly from the model.
    • Adjustment: Predictions were adjusted to ensure the total predicted medal count summed up to 348 (three times the 116 medal events), as per the initial prompt.
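
The preparation, feature engineering, training, and evaluation steps above can be sketched with pandas and scikit-learn. This is not the code Gemini generated in the demo, which the summary does not reproduce; it is a minimal illustration under stated assumptions. The toy medal table, the host mapping, and the column names (`total_medals`, `prev_medals`, `prev_prev_medals`) are all made up for the example.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

# Toy per-country medal totals for five Winter Games (made-up numbers,
# standing in for the aggregated Kaggle data).
years = [2006, 2010, 2014, 2018, 2022]
toy_counts = {
    "A": [20, 22, 25, 24, 27],
    "B": [10, 12, 11, 14, 13],
    "C": [3, 4, 6, 5, 9],
}
hosts = {2006: "B", 2010: "C", 2014: "A", 2018: "B", 2022: "C"}  # hypothetical

rows = [
    {"country": country, "year": year, "total_medals": count}
    for country, counts in toy_counts.items()
    for year, count in zip(years, counts)
]
medals = pd.DataFrame(rows)

# Data preparation: keep only Games since 1992 (all toy rows qualify).
medals = medals[medals["year"] >= 1992].copy()

# Host country flag: 1 when the row's country hosted that year's Games.
medals["is_host"] = (medals["year"].map(hosts) == medals["country"]).astype(int)

# Lag features: each country's totals from the previous two Games.
medals = medals.sort_values(["country", "year"])
per_country = medals.groupby("country")["total_medals"]
medals["prev_medals"] = per_country.shift(1)
medals["prev_prev_medals"] = per_country.shift(2)

# Rows without two prior Games cannot be used for training or testing.
model_data = medals.dropna(subset=["prev_medals", "prev_prev_medals"])

# Hold out the most recent Games (2022) for evaluation.
features = ["prev_medals", "prev_prev_medals", "is_host"]
train = model_data[model_data["year"] < 2022]
test = model_data[model_data["year"] == 2022]

model = LinearRegression()
model.fit(train[features], train["total_medals"])

test_predictions = model.predict(test[features])
mae = mean_absolute_error(test["total_medals"], test_predictions)
print(f"Holdout MAE: {mae:.2f} medals per country")
```

Splitting by year rather than at random matters here: the lag features encode past results, so a random split would let information from later Games leak into training.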

Visualizing the Results

After the plan execution, the presenter requested an interactive plot of the projected 2026 Olympics medal table. Gemini generated the code for this plot using the Altair library. Upon running the code, an interactive plot was displayed.

  • Key Findings from the Plot:
    • Hovering over the bars revealed Norway as projected to have the highest medal count at 31.
    • Other top projected countries included Germany, Canada, the US, and Italy (the host country for 2026).

Conclusion of the Demo

The demo concluded with the presenter highlighting that they started with just a dataset and a high-level idea and, within minutes, obtained a full predicted medal table without writing any code themselves.

User Control and Collaboration

It was emphasized that users have full access to the generated code and output. This allows for checking, modification, and building upon the AI's work. The most effective outcomes are expected from a collaborative effort between the user and the AI agent.

Getting Started with AI First Colab

To begin using AI First Colab:

  1. Open any new or existing notebook in Colab.
  2. Locate the Gemini spark icon in the bottom toolbar.
  3. Start by asking questions, giving commands, or selecting from suggested prompts.

The presenter expressed anticipation for how AI First Colab will transform machine learning workflows.
