
Predictor ML Model

You can train a custom Predictor ML model from your Gainly Dashboard and use it in the Predict endpoint.

What is a Predictor Model?

A Predictor model is a machine learning (ML) model that predicts an outcome based on patterns in your historical structured data.

Types of Outcomes

Predictor models can predict the following types of outcomes:

  • Numeric: Predict a continuous number value (examples: price, temperature)
  • Non-Numeric: Predict a boolean value (examples: yes/no, true/false) or a string value (examples: category, video ID)

Example Use Cases

  • Forecast monthly sales for each product
  • Predict customer lifetime value based on early interactions
  • Estimate how long a project will take based on task characteristics
  • Predict optimal pricing based on market conditions
  • Predict if a customer is likely to churn based on their activity patterns
  • Determine if a transaction is potentially fraudulent
  • Forecast subscription renewal based on engagement metrics
  • Predict equipment failure based on performance metrics
  • Categorize financial transactions based on attributes
  • Classify customer segments based on behavioral metrics
  • Determine product quality grades from manufacturing parameters
  • Categorize risk levels based on financial indicators
  • Recommend products based on user history
  • Recommend personalized content based on user behavior
  • Recommend new features for a user to try
  • Recommend optimal subscription tiers based on usage metrics
  • Predict the most effective marketing channel for each customer segment
  • Determine optimal inventory levels based on seasonal patterns
  • Suggest the best shipping method based on order details
  • Forecast resource allocation needs based on project attributes

Dataset for Training

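To train a Predictor ML model, we recommend at least 1000 samples of your historical structured data. These samples must be:

  • High quality: accurate, with few missing or erroneous values
  • Balanced: no outcome (class) heavily outnumbers the others
  • Representative of your domain: covering the range of cases the model will see when making predictions
  • Consistent: the same formats and units in every row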

Dataset for Numeric vs. Non-Numeric Outcomes

Numeric outcomes:

  • We recommend at least 1000 samples of your historical structured data to train on.

Non-Numeric outcomes:

  • We recommend at least 500 samples per class of your historical structured data to train on.
  • A class refers to a unique value in the label column (see below for more details on the label column).
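You can check these recommended sample counts locally before uploading. The following is a minimal sketch, assuming the label values have already been read from your CSV into a list; `check_sample_counts` is a hypothetical helper for illustration, not part of Gainly.

```python
from collections import Counter

# Recommended minimums from this page (recommendations, not hard limits).
MIN_NUMERIC_SAMPLES = 1000
MIN_SAMPLES_PER_CLASS = 500


def check_sample_counts(labels, numeric):
    """Return warnings when a label column falls short of the recommended
    sample counts. `labels` holds the label column's values; `numeric`
    says whether the outcome is numeric (True) or non-numeric (False)."""
    warnings = []
    if numeric:
        if len(labels) < MIN_NUMERIC_SAMPLES:
            warnings.append(
                f"only {len(labels)} samples; at least {MIN_NUMERIC_SAMPLES} recommended"
            )
    else:
        # Each unique label value is a class; count the samples in each class.
        for cls, count in Counter(labels).items():
            if count < MIN_SAMPLES_PER_CLASS:
                warnings.append(
                    f"class {cls!r} has {count} samples; "
                    f"at least {MIN_SAMPLES_PER_CLASS} recommended"
                )
    return warnings
```

For example, a churn dataset with 600 "yes" rows and 120 "no" rows would produce one warning, for the under-represented "no" class.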

Structured Data

In the context of training a Predictor model, structured data refers to tabular data that is organized with columns and rows and contains the following data types:

Data Type  Format                      Example Column Name  Example Values
String     Text strings (1-100 chars)  customer_tier        "premium"
                                       video_id             "video_192864"
Numeric    Integer or decimal          order_value          29.99, 250
Boolean    Binary (1 or 0)             is_verified          1, 0
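To see how a column maps onto the data types listed above, here is a minimal sketch that classifies a column's raw CSV values. `infer_column_type` is a hypothetical helper, and the rule that a column containing only 0 and 1 counts as boolean rather than numeric is an assumption, not documented Gainly behavior.

```python
def infer_column_type(values):
    """Classify a column's raw CSV values as "boolean", "numeric", or "string".

    Assumption: a column whose values are all "0" or "1" is treated as
    boolean rather than numeric.
    """
    if not values:
        raise ValueError("column has no values")
    if all(v in ("0", "1") for v in values):
        return "boolean"
    try:
        for v in values:
            float(v)  # accepts integers and decimals such as "250" or "29.99"
        return "numeric"
    except ValueError:
        pass
    # Anything else is a string, which must be 1-100 characters long.
    if all(1 <= len(v) <= 100 for v in values):
        return "string"
    raise ValueError("string values must be 1-100 characters long")
```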

The dataset file must be in CSV format with the following column requirements.

Required Columns

Column  Required  Description                                           Constraints
label   Yes       Value (outcome) that the model will learn to predict  • Any supported data type (see above)
                                                                        • At least 2 unique values (outcomes)

Date and Time

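Date and time values are not a supported data type on their own. Represent them as separate numeric columns instead:

  • Date: Split into year, month, and day columns, with a numeric value in each column
  • Time: Split into hour, minute, and second columns, with a numeric value in each column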

Date Example

Convert a transaction_date column containing values such as 2024-12-25 into three separate columns:

  • transaction_date_year: 2024
  • transaction_date_month: 12
  • transaction_date_day: 25
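The date and time split described above can be scripted. This is a minimal sketch, assuming your source values are ISO 8601 strings such as 2024-12-25 or 2024-12-25 14:30:05; the helper name `split_datetime_column` and the `_hour`, `_minute`, and `_second` suffixes for time columns are illustrative assumptions that mirror the date example.

```python
from datetime import datetime


def split_datetime_column(row, column):
    """Replace the ISO 8601 date or datetime string in row[column] with
    separate numeric columns named <column>_year, <column>_month, <column>_day
    and, if the value includes a time, <column>_hour, <column>_minute,
    <column>_second. The row dict is modified in place and returned."""
    raw = row.pop(column)
    parsed = datetime.fromisoformat(raw)
    row[f"{column}_year"] = parsed.year
    row[f"{column}_month"] = parsed.month
    row[f"{column}_day"] = parsed.day
    # A date-only value such as "2024-12-25" is exactly 10 characters long;
    # anything longer also carries a time component.
    if len(raw) > 10:
        row[f"{column}_hour"] = parsed.hour
        row[f"{column}_minute"] = parsed.minute
        row[f"{column}_second"] = parsed.second
    return row
```

To convert a whole file, you could call this on each row produced by `csv.DictReader` and write the results back out with `csv.DictWriter`.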

Long-Form Text

Please note that long-form text is not supported in Predictor models. If your use case requires long-form text columns in addition to structured data columns, please refer to the Mixed Data Types page.

Additional Requirements

  • CSV file must be UTF-8 encoded
  • CSV file must have header row (first row) with column names
  • Column names:
    • Must be unique
    • Must only contain lowercase letters, numbers, and underscores (a-z, 0-9, _)
  • CSV file must include 1-1000 additional columns (excluding the label column). These columns provide the data that the model uses to learn patterns and make predictions about the label column.
  • CSV file must contain (excluding the header row):
    • At least 100 rows of data
    • No more than 1,000,000 rows of data
  • CSV file size must not exceed 200MB
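Before uploading, you can run the requirements listed above against your file locally. The following is a minimal sketch using Python's standard `csv` module; `validate_training_csv` is a hypothetical helper, not a Gainly tool, and reading "200MB" as 200,000,000 bytes is an assumption (the stricter interpretation). Passing these checks does not guarantee the Dashboard will accept the file.

```python
import csv
import os
import re

# Limits from the requirements on this page. 200MB is read as
# 200,000,000 bytes (an assumption; the stricter interpretation).
MAX_FILE_BYTES = 200_000_000
MIN_FEATURE_COLUMNS, MAX_FEATURE_COLUMNS = 1, 1000
MIN_ROWS, MAX_ROWS = 100, 1_000_000
COLUMN_NAME_RE = re.compile(r"[a-z0-9_]+")


def validate_training_csv(path):
    """Return a list of problems found in a Predictor training CSV.
    An empty list means the file passed these local checks."""
    problems = []
    if os.path.getsize(path) > MAX_FILE_BYTES:
        problems.append("file is larger than 200MB")

    try:
        with open(path, encoding="utf-8", newline="") as f:
            reader = csv.reader(f)
            header = next(reader, None)
            if header is None:
                return problems + ["missing header row"]
            label_idx = header.index("label") if "label" in header else None
            label_values = set()
            row_count = 0
            for row in reader:  # stream rows so large files are not held in memory
                if not row:  # skip blank lines
                    continue
                row_count += 1
                if label_idx is not None and label_idx < len(row):
                    label_values.add(row[label_idx])
    except UnicodeDecodeError:
        return problems + ["file is not UTF-8 encoded"]

    # Column names must be unique and use only a-z, 0-9, and _.
    if len(set(header)) != len(header):
        problems.append("column names are not unique")
    bad_names = [name for name in header if not COLUMN_NAME_RE.fullmatch(name)]
    if bad_names:
        problems.append(f"invalid column names: {bad_names}")

    # The label column is required and needs at least 2 unique values.
    if label_idx is None:
        problems.append("missing required 'label' column")
    elif len(label_values) < 2:
        problems.append("label column needs at least 2 unique values")

    # 1-1000 columns besides the label column.
    feature_count = len(header) - (0 if label_idx is None else 1)
    if not MIN_FEATURE_COLUMNS <= feature_count <= MAX_FEATURE_COLUMNS:
        problems.append(f"{feature_count} non-label columns; expected 1-1000")

    # 100-1,000,000 data rows, excluding the header row.
    if not MIN_ROWS <= row_count <= MAX_ROWS:
        problems.append(f"{row_count} data rows; expected 100-1,000,000")

    return problems
```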

Steps for Training

  1. Log in to your Gainly Dashboard.
  2. Go to Settings > Custom Models.
  3. Click the Create Model button.
  4. Select Predictor as the Model Type.
  5. Follow the on-screen instructions to train your model.