Predictor ML Model
You can train a custom Predictor ML model from your Gainly Dashboard and use it in the Predict endpoint.
What is a Predictor Model?
A Predictor model is a machine learning (ML) model that predicts an outcome based on patterns in your historical structured data.
Types of Outcomes
Predictor models can predict the following types of outcomes:
- Numeric: Predict a continuous number value (examples: price, temperature)
- Non-Numeric: Predict a boolean value (examples: yes/no, true/false) or a string value (examples: category, video ID)
Example Use Cases
- Forecast monthly sales for each product
- Predict customer lifetime value based on early interactions
- Estimate how long a project will take based on task characteristics
- Predict optimal pricing based on market conditions
- Predict if a customer is likely to churn based on their activity patterns
- Determine if a transaction is potentially fraudulent
- Forecast subscription renewal based on engagement metrics
- Predict equipment failure based on performance metrics
- Categorize financial transactions based on attributes
- Classify customer segments based on behavioral metrics
- Determine product quality grades from manufacturing parameters
- Categorize risk levels based on financial indicators
- Recommend products based on user history
- Recommend personalized content based on user behavior
- Recommend new features for a user to try
- Recommend optimal subscription tiers based on usage metrics
- Predict the most effective marketing channel for each customer segment
- Determine optimal inventory levels based on seasonal patterns
- Suggest the best shipping method based on order details
- Forecast resource allocation needs based on project attributes
Dataset for Training
To train a Predictor ML model, we recommend at least 1000 samples of your historical structured data to train on. These samples must be:
- High quality
- Balanced
- Representative of your domain
- Consistent
Dataset for Numeric vs. Non-Numeric Outcomes
Numeric outcomes:
- We recommend at least 1000 samples of your historical structured data to train on.
Non-Numeric outcomes:
- We recommend at least 500 samples per class of your historical structured data to train on.
- A class refers to a unique value in the `label` column (see below for more details on the `label` column).
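As a rough pre-check against these recommendations, the sketch below counts samples in a list of label values. `check_sample_counts` is a hypothetical helper, not part of Gainly, and its thresholds simply mirror the recommended minimums above; Gainly may apply its own checks when you train.

```python
from collections import Counter

# Recommended minimums copied from the guidance above (recommendations, not hard limits).
MIN_SAMPLES_NUMERIC = 1000
MIN_SAMPLES_PER_CLASS = 500


def check_sample_counts(labels, numeric_outcome):
    """Return warnings for a sequence of label values (hypothetical helper)."""
    warnings = []
    if numeric_outcome:
        # Numeric outcomes: only the total number of samples matters here.
        if len(labels) < MIN_SAMPLES_NUMERIC:
            warnings.append(
                f"only {len(labels)} samples; at least {MIN_SAMPLES_NUMERIC} recommended"
            )
        return warnings
    # Non-numeric outcomes: each unique label value is a class.
    counts = Counter(labels)
    for cls, n in sorted(counts.items()):
        if n < MIN_SAMPLES_PER_CLASS:
            warnings.append(
                f"class {cls!r} has {n} samples; at least {MIN_SAMPLES_PER_CLASS} recommended"
            )
    return warnings
```

For example, 600 `yes` rows and 120 `no` rows would produce a single warning for the `no` class.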
Structured Data
In the context of training a Predictor model, structured data refers to tabular data organized in rows and columns, where each column contains one of the following data types:
Data Type | Format | Example Column Name | Example Values |
---|---|---|---|
String | Text strings (1-100 chars) | customer_tier, video_id | "premium", "video_192864" |
Numeric | Integer or decimal | order_value | 29.99, 250 |
Boolean | Binary (1 or 0) | is_verified | 1, 0 |
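One possible heuristic for deciding which of these types a column of raw CSV values falls under is sketched below. `infer_column_type` is hypothetical, and its ordering is an assumption: because Boolean values are written as 1 or 0, a column containing only those values could also be read as Numeric, and this sketch resolves that ambiguity in favor of Boolean. It is not a description of how Gainly detects types.

```python
def infer_column_type(values):
    """Classify raw CSV string values as "boolean", "numeric", or "string".

    Hypothetical heuristic: Boolean is checked first because 1/0 values
    would otherwise also parse as numbers.
    """
    if not values:
        raise ValueError("column has no values")
    # Boolean: every value is the literal "1" or "0".
    if all(v in ("0", "1") for v in values):
        return "boolean"
    # Numeric: every value parses with float() (integers and decimals).
    try:
        for v in values:
            float(v)
        return "numeric"
    except ValueError:
        pass
    # String: every value must be 1-100 characters long.
    for v in values:
        if not 1 <= len(v) <= 100:
            raise ValueError(f"string value of length {len(v)} is outside 1-100 chars")
    return "string"
```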
The dataset file must be in CSV format with the following column requirements.
Required Columns
Column | Required | Description | Constraints |
---|---|---|---|
label | Yes | Value (outcome) that the model will learn to predict | Any supported data type (see above); at least 2 unique values (outcomes) |
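A minimal local check for the `label` constraints might look like the following. `check_label_column` is a hypothetical helper built on Python's standard `csv` module; it assumes the column header is exactly `label` and only checks presence and the number of unique values.

```python
import csv
import io


def check_label_column(csv_text):
    """Return an error message, or None if the label column looks valid."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # fieldnames is None when the text has no header row at all.
    if reader.fieldnames is None or "label" not in reader.fieldnames:
        return "missing required column: label"
    unique = {row["label"] for row in reader}
    if len(unique) < 2:
        return f"label column has {len(unique)} unique value(s); at least 2 required"
    return None
```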
Date and Time
- Date: Split into year, month, and day columns and use numeric values in each column
- Time: Split into hour, minute, and second columns and use numeric values in each column
Date Example
Convert a `transaction_date` column containing values such as `2024-12-25` into 3 separate columns:
- `transaction_date_year`: 2024
- `transaction_date_month`: 12
- `transaction_date_day`: 25
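This conversion can be scripted before upload. The sketch below assumes dates arrive as ISO 8601 strings (for example `2024-12-25` or `2024-12-25 08:30:15`) that `datetime.fromisoformat` can parse. `split_datetime_columns` is hypothetical, and the `_hour`, `_minute`, and `_second` suffixes are an assumption that extends the naming used in the date example.

```python
from datetime import datetime


def split_datetime_columns(row, column):
    """Replace row[column] with numeric date (and, if present, time) columns.

    Hypothetical helper; the time-column suffixes are assumed, not documented.
    """
    raw = row.pop(column)
    dt = datetime.fromisoformat(raw)
    row[f"{column}_year"] = dt.year
    row[f"{column}_month"] = dt.month
    row[f"{column}_day"] = dt.day
    # Only add time columns when the original value included a time part.
    if "T" in raw or " " in raw:
        row[f"{column}_hour"] = dt.hour
        row[f"{column}_minute"] = dt.minute
        row[f"{column}_second"] = dt.second
    return row
```

Applied to each row dictionary read with `csv.DictReader`, the converted rows can then be written back out with `csv.DictWriter`.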
Long-Form Text
Please note that long-form text is not supported in Predictor models. If your use case requires long-form text columns in addition to structured data columns, please refer to the Mixed Data Types page.
Additional Requirements
- CSV file must be UTF-8 encoded
- CSV file must have a header row (first row) with column names
- Column names:
  - Must be unique
  - Must only contain lowercase letters, numbers, and underscores (`a-z`, `0-9`, `_`)
- CSV file must include 1-1000 additional columns (excluding the `label` column). These columns provide the data that the model uses to learn patterns and make predictions about the `label` column.
- CSV file must contain (excluding the header row):
  - At least 100 rows of data
  - No more than 1,000,000 rows of data
- CSV file size must not exceed 200MB
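These requirements can be checked locally before uploading. The sketch below is a hypothetical `validate_csv` helper using only the Python standard library. It treats 200MB as 200,000,000 bytes (the stricter reading, an assumption), counts non-blank rows after the header, and reports a UTF-8 problem as soon as an undecodable byte is hit. Gainly's own validation may differ.

```python
import csv
import os
import re

COLUMN_NAME_RE = re.compile(r"[a-z0-9_]+")
MAX_BYTES = 200_000_000  # assumption: 200MB read as 200,000,000 bytes
MIN_ROWS, MAX_ROWS = 100, 1_000_000
MIN_FEATURES, MAX_FEATURES = 1, 1000


def validate_csv(path):
    """Return a list of problems found in the CSV file at `path` (hypothetical)."""
    problems = []
    if os.path.getsize(path) > MAX_BYTES:
        problems.append("file size exceeds 200MB")
    try:
        with open(path, encoding="utf-8", newline="") as f:
            reader = csv.reader(f)
            header = next(reader, None)
            # Count non-blank data rows without loading the whole file.
            n_rows = sum(1 for row in reader if row)
    except UnicodeDecodeError:
        return problems + ["file is not UTF-8 encoded"]
    if not header:
        return problems + ["missing header row"]
    if len(set(header)) != len(header):
        problems.append("column names must be unique")
    for name in header:
        if not COLUMN_NAME_RE.fullmatch(name):
            problems.append(f"invalid column name: {name!r}")
    if "label" not in header:
        problems.append("missing required column: label")
    # Feature columns are every column other than label.
    n_features = len([h for h in header if h != "label"])
    if not MIN_FEATURES <= n_features <= MAX_FEATURES:
        problems.append(f"{n_features} feature columns; 1-1000 required")
    if n_rows < MIN_ROWS:
        problems.append(f"only {n_rows} data rows; at least {MIN_ROWS} required")
    elif n_rows > MAX_ROWS:
        problems.append(f"{n_rows} data rows; at most {MAX_ROWS} allowed")
    return problems
```

An empty list means none of these checks failed; it does not guarantee the file will train well.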
Steps for Training
- Log in to your Gainly Dashboard.
- Go to Settings > Custom Models.
- Click the Create Model button.
- Select Predictor as the Model Type.
- Follow the on-screen instructions to train your model.