Filter
You can use the `filter` parameter in your API requests to specify filter criteria for document attributes and find matching documents.
The Gainly API supports complex filter criteria, including AND, OR, and NOT logic, as well as nested criteria.
The following API requests support the `filter` parameter:
Filter vs. Search
Search involves performing a lexical or semantic (AI-Semantic) text search within documents, specifically targeting the `content` and `title` fields to match the value provided in the `query` parameter.
Filter applies filter criteria (specified via the `filter` parameter) to document attributes without conducting a text search.
Specifying both `query` and `filter` parameters in an API request (that supports both parameters) will apply the filter criteria to narrow down the document set, then perform a text search within those documents.
Use the Filter Documents endpoint to apply filter criteria without performing a text search. This is useful for cases that only require querying document attributes, such as finding stores within 5 miles of the user.
Filter | Search |
---|---|
Filter answers the question "Which documents match my attribute criteria?" | Search answers the question "How well does the text in each document match my search terms?" |
To find exact matches in document attributes. | To search text fields (`title` and `content`) and sort by relevance. |
Syntax
"filter": {
  "must": [ // optional, 1 or more clauses
    {
      CLAUSE
    },
    // additional clauses
  ],
  "must_not": [ // optional, 1 or more clauses
    {
      CLAUSE
    },
    // additional clauses
  ],
  "should": [ // optional, 1 or more clauses
    {
      CLAUSE
    },
    // additional clauses
  ],
  "minimum_should_match": 1 // optional
}
The top-level keys are:
`must`
Only documents that match all clauses under `must` will be included. You can think of it as the AND operator.
`must_not`
Only documents that match none of the clauses under `must_not` will be included. You can think of it as the NOT operator.
`should`
Documents that match at least N clauses under `should` will be included, where N is specified by `minimum_should_match`. When N = 1, you can think of this as the OR operator.
`minimum_should_match`
Specifies the minimum number of `should` clauses that must be matched. If you specify `must` criteria, the default for `minimum_should_match` is `0`; otherwise, it defaults to `1`.
This can be specified using:
Type | Example |
---|---|
integer | "minimum_should_match": 1 |
percentage | "minimum_should_match": "50%" |
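The way `must`, `must_not`, `should`, and the `minimum_should_match` default combine can be sketched locally in Python. This is only an illustration of the rules described above, not Gainly's implementation: the helper names `term_matches` and `passes_filter` are hypothetical, documents are assumed to be flat dictionaries keyed by dotted field names, only simple `term` clauses are handled, and percentage values for `minimum_should_match` are not covered.

```python
from typing import Any


def term_matches(doc: dict[str, Any], clause: dict[str, Any]) -> bool:
    """Check a simple term clause such as {"term": {"metadata.color": "blue"}}."""
    field, expected = next(iter(clause["term"].items()))
    return doc.get(field) == expected


def passes_filter(doc: dict[str, Any], flt: dict[str, Any]) -> bool:
    """Apply must (AND), must_not (NOT) and should (at least N) to one document."""
    must = flt.get("must", [])
    must_not = flt.get("must_not", [])
    should = flt.get("should", [])
    # Default described above: 0 when must criteria are present, otherwise 1.
    minimum = flt.get("minimum_should_match", 0 if must else 1)

    if not all(term_matches(doc, c) for c in must):
        return False
    if any(term_matches(doc, c) for c in must_not):
        return False
    if should:
        matched = sum(term_matches(doc, c) for c in should)
        return matched >= minimum
    return True


# Hypothetical filter and document, for illustration only.
example_filter = {
    "must": [{"term": {"metadata.is_sale": True}}],
    "must_not": [{"term": {"metadata.status": "discontinued"}}],
    "should": [
        {"term": {"metadata.color": "blue"}},
        {"term": {"metadata.color": "red"}},
    ],
    "minimum_should_match": 1,
}
example_doc = {
    "metadata.is_sale": True,
    "metadata.status": "active",
    "metadata.color": "red",
}
print(passes_filter(example_doc, example_filter))
```

Removing `minimum_should_match` from this filter would let a green item through as well, because the presence of `must` lowers the default to 0.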
Example
"filter": {
  "must": [
    {
      "term": {
        "metadata.length": 52
      }
    },
    {
      "term": {
        "metadata.is_sale": true
      }
    },
    {
      "range": {
        "metadata.price": {
          "gte": 50,
          "lte": 100
        }
      }
    }
  ],
  "must_not": [
    {
      "term": {
        "metadata.status": "discontinued"
      }
    }
  ],
  "should": [
    {
      "term": {
        "metadata.color": "blue"
      }
    },
    {
      "term": {
        "metadata.color": "red"
      }
    }
  ],
  "minimum_should_match": 1
}
Supported fields for filter criteria
The following fields (document attributes) can be used to specify the filter criteria:
- `metadata` subfields. Specify using dot notation, for example:
  - `metadata.color`
  - `metadata.sale.sale_start_date`
- `created_at`
- `language`
- `source_uri`
- `tenant_id`
- `updated_at`
Clauses
Clauses are used within the top-level keys `must`, `must_not`, and `should` to define your filter criteria.
Term
Use the `term` clause to match an exact value in a field of any type.
Case-insensitive matches
To perform a case-insensitive match against a `keyword` (string) field, use the extended syntax for `term`, in which the field maps to an object containing `value` and `case_insensitive` keys.
Whether capitalization must match is controlled by the `case_insensitive` parameter, which defaults to `false`.
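The short and extended forms of `term` can be sketched in Python as follows. The extended shape mirrors the one used in this page's case-insensitive example for multiple values; the helper names `term_clause` and `term_value_matches` are hypothetical, and the comparison function is a local illustration of the described behavior rather than the server's actual logic.

```python
from typing import Any


def term_clause(field: str, value: Any, case_insensitive: bool = False) -> dict:
    """Build a term clause, using the extended form only when it is needed."""
    if case_insensitive:
        return {"term": {field: {"value": value, "case_insensitive": True}}}
    return {"term": {field: value}}


def term_value_matches(actual: Any, spec: Any) -> bool:
    """Compare a stored value with either the short or the extended term spec."""
    if isinstance(spec, dict):
        expected = spec["value"]
        ignore_case = spec.get("case_insensitive", False)  # default is false
        if ignore_case and isinstance(actual, str) and isinstance(expected, str):
            return actual.lower() == expected.lower()
        return actual == expected
    return actual == spec


exact = term_clause("metadata.color", "blue")
relaxed = term_clause("metadata.color", "blue", case_insensitive=True)

# "Blue" only matches when case_insensitive is enabled.
print(term_value_matches("Blue", exact["term"]["metadata.color"]))
print(term_value_matches("Blue", relaxed["term"]["metadata.color"]))
```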
Array fields
Matching against array fields is a special case for the `term` clause.
Suppose `metadata.color` is an array of strings. The following clause will match documents that have blue as one of the elements in the `metadata.color` array.
For example, both of the following documents will match the query shown above.
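The array behavior can be pictured with a small local check in Python. The function name `term_matches_field` and the two sample documents are hypothetical stand-ins, not the page's original examples; the sketch only assumes what the text states, namely that a `term` value matches an array field when it equals any one element.

```python
from typing import Any


def term_matches_field(field_value: Any, wanted: Any) -> bool:
    """A term value matches an array field if it equals any element of it."""
    if isinstance(field_value, list):
        return wanted in field_value
    return field_value == wanted


# Hypothetical documents: one with several colors, one with a single color.
doc_a = {"metadata": {"color": ["blue", "green"]}}
doc_b = {"metadata": {"color": ["blue"]}}
doc_c = {"metadata": {"color": ["red", "green"]}}

for doc in (doc_a, doc_b, doc_c):
    print(term_matches_field(doc["metadata"]["color"], "blue"))
```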
Terms
Use the `terms` clause to match any of multiple values in a field.
A document is included in the results if it matches any of the values in the array, with the correct capitalization.
For example, Document-1 and Document-2 will match the query above. However, Document-3 will not match due to capitalization differences.
Case-insensitive Match for Multiple Values
The `terms` clause does not support case-insensitive matches. To perform a case-insensitive match for multiple values in a field, use the `term` clause inside `should`:
"filter": {
  "should": [
    {
      "term": {
        "metadata.color": {
          "value": "blue",
          "case_insensitive": true
        }
      }
    },
    {
      "term": {
        "metadata.color": {
          "value": "red",
          "case_insensitive": true
        }
      }
    }
  ],
  "minimum_should_match": 1
}
This query will match all three example documents shown above.
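When the list of values is long, writing one `term` clause per value by hand gets tedious. A small Python helper can generate the same structure as the filter above; the function name `case_insensitive_any` is hypothetical, and the output is only meant to reproduce the shape already shown on this page.

```python
import json


def case_insensitive_any(field: str, values: list[str]) -> dict:
    """Expand values into should + term clauses with case_insensitive enabled."""
    return {
        "should": [
            {"term": {field: {"value": value, "case_insensitive": True}}}
            for value in values
        ],
        # At least one of the generated clauses has to match (OR behavior).
        "minimum_should_match": 1,
    }


color_filter = case_insensitive_any("metadata.color", ["blue", "red"])
print(json.dumps({"filter": color_filter}, indent=2))
```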
Range
Use the `range` clause to match a range of values in a field.
You can use the following operators:
- `gt`: Greater than
- `lt`: Less than
- `gte`: Greater than or equal to
- `lte`: Less than or equal to
The following field types can be used in the `range` clause:
- `integer`
- `float`
- `date`
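The four operators can be read as ordinary comparisons that must all hold at once. The Python sketch below illustrates that reading for numeric fields; `in_range` is a hypothetical helper, and the bounds dictionary follows the shape of the `metadata.price` range in the example earlier on this page.

```python
import operator

# Map each range operator to the comparison it stands for.
_RANGE_OPS = {
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
}


def in_range(value: float, bounds: dict[str, float]) -> bool:
    """Return True when the value satisfies every bound that was given."""
    return all(_RANGE_OPS[name](value, limit) for name, limit in bounds.items())


price_bounds = {"gte": 50, "lte": 100}
for price in (49.99, 50, 75.5, 100, 100.01):
    print(price, in_range(price, price_bounds))
```

Mixing inclusive and exclusive bounds, for example `{"gt": 50, "lte": 100}`, excludes 50 itself but still includes 100.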
Date Values
When using the `range` clause (as explained above) for a `date` field, you can use the following date values.
Date Formats and Values
Format:
- You can specify the desired date format using the `format` parameter in the API request
  - If `format` is not specified, the default format is ISO 8601 UTC datetime, e.g., `2025-01-12T20:48:57.845Z`
  - If `format` is specified, make sure any absolute date values in your API request match the specified format
- How to specify `format`:
  - `yyyy` - Year (e.g., 2025)
  - `MM` - Month (01-12)
  - `dd` - Day of month (01-31)
  - `HH` - Hour in 24h format (00-23)
  - `mm` - Minutes (00-59)
  - `ss` - Seconds (00-59)
  - Common combinations:
    - `yyyy-MM-dd` - Date only (e.g., 2025-01-12)
    - `yyyy-MM-dd'T'HH:mm:ss'Z'` - Full UTC datetime
    - `yyyy-MM-dd'T'HH:mm:ssXXX` - Datetime with timezone offset (e.g., `yyyy-MM-dd'T'HH:mm:ss-06:00`)
Date Values:
- You can use absolute date values in ISO 8601 format:
  - Full datetime: `2024-11-01T12:34:56Z`
  - Date only: `2024-11-01`
  - With timezone offset: `2024-11-01T12:34:56-07:00`
- You can use relative date values like `now`, `now-1d`, `now-1M`, etc.
  - Supported date math units for relative date values:
    - `y` (years)
    - `M` (months)
    - `w` (weeks)
    - `d` (days)
    - `h` (hours)
    - `m` (minutes)
    - `s` (seconds)
  - Supported date math operators:
    - `+` (add)
    - `-` (subtract)
    - `/` (round down)
  - Examples of round down operator:
    - `now-1d/d` (yesterday at 00:00:00 - rounds down to start of day)
    - `now/d` (today at 00:00:00)
    - `now+1y/y` (start of next year)
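To make the relative date values concrete, here is a minimal Python sketch that resolves an expression such as `now-1d/d` against a given "now". It is a local illustration under stated assumptions, not Gainly's implementation: the function name `resolve` is hypothetical, weeks are assumed to start on Monday, and month arithmetic is assumed to clamp to the last day of a shorter month.

```python
import calendar
import re
from datetime import datetime, timedelta, timezone

# Either "+N<unit>" / "-N<unit>" or "/<unit>" (round down).
_TOKEN = re.compile(r"([+-])(\d+)([yMwdhms])|/([yMwdhms])")
_DELTA_ARGS = {"w": "weeks", "d": "days", "h": "hours", "m": "minutes", "s": "seconds"}


def _shift_months(dt: datetime, months: int) -> datetime:
    """Move by whole months, clamping the day to the target month's length."""
    year, month0 = divmod(dt.year * 12 + (dt.month - 1) + months, 12)
    day = min(dt.day, calendar.monthrange(year, month0 + 1)[1])
    return dt.replace(year=year, month=month0 + 1, day=day)


def _round_down(dt: datetime, unit: str) -> datetime:
    """Truncate to the start of the given unit."""
    if unit == "w":  # assumption: the week starts on Monday
        dt = dt - timedelta(days=dt.weekday())
        unit = "d"
    if unit == "y":
        dt = dt.replace(month=1, day=1)
    elif unit == "M":
        dt = dt.replace(day=1)
    if unit in "yMd":
        return dt.replace(hour=0, minute=0, second=0, microsecond=0)
    if unit == "h":
        return dt.replace(minute=0, second=0, microsecond=0)
    if unit == "m":
        return dt.replace(second=0, microsecond=0)
    return dt.replace(microsecond=0)


def resolve(expr: str, now: datetime) -> datetime:
    """Resolve a relative date value like "now-1d/d" left to right."""
    if not expr.startswith("now"):
        raise ValueError("relative date values must start with 'now'")
    rest, pos, result = expr[3:], 0, now
    while pos < len(rest):
        match = _TOKEN.match(rest, pos)
        if match is None:
            raise ValueError(f"cannot parse {rest[pos:]!r}")
        sign, amount, unit, round_unit = match.groups()
        if round_unit:
            result = _round_down(result, round_unit)
        else:
            n = int(amount) if sign == "+" else -int(amount)
            if unit == "y":
                result = _shift_months(result, 12 * n)
            elif unit == "M":
                result = _shift_months(result, n)
            else:
                result = result + timedelta(**{_DELTA_ARGS[unit]: n})
        pos = match.end()
    return result


fixed_now = datetime(2025, 1, 12, 20, 48, 57, tzinfo=timezone.utc)
for expression in ("now", "now-1d/d", "now/d", "now+1y/y", "now-1M"):
    print(expression, resolve(expression, fixed_now).isoformat())
```

Passing "now" in explicitly keeps the sketch deterministic, which makes it easy to check an expression before putting it into a `range` clause.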
Exists
Use the `exists` clause to match documents that contain `metadata` or a specific `metadata` field.
A document is included in the results if it contains the specified field.
A document contains the specified field if a value has ever been set for that field in the given document.
Geo clauses
You can use the following clauses with `geo_point` fields.
Geo-distance
Use the `geo_distance` clause to match documents with geo-points that are within a specified distance of a given geo-point.
The example clause shown above finds all stores located within 10 miles of the specified lat/lon.
Supported Distance Units:
- `mi` (miles)
- `nmi` (nautical miles)
- `km` (kilometers)
- `m` (meters)
Sorting by distance
The example clause shown above will find stores located within 10 miles of the specified lat/lon.
To display the stores closest to the user first, you also have to sort by geo-distance.
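What "within 10 miles" and "sort by geo-distance" mean can be sketched locally in Python. The haversine (great-circle) formula used here is an assumption for illustration; this page does not state how Gainly computes distance. The names `haversine_km` and `nearest_within`, as well as the store coordinates, are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius
# Conversion factors for the distance units listed above.
UNIT_IN_KM = {"km": 1.0, "m": 0.001, "mi": 1.609344, "nmi": 1.852}


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two lat/lon points, in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_within(origin, points, distance, unit):
    """Keep points within the distance, then order them nearest first."""
    limit_km = distance * UNIT_IN_KM[unit]
    hits = []
    for name, (lat, lon) in points.items():
        d = haversine_km(origin[0], origin[1], lat, lon)
        if d <= limit_km:  # the filtering step
            hits.append((d, name))
    return [name for _, name in sorted(hits)]  # the sorting step


user = (40.0, -74.0)
stores = {
    "far": (41.0, -74.0),
    "mid": (40.1, -74.0),
    "near": (40.05, -74.0),
}
print(nearest_within(user, stores, 10, "mi"))
```

Filtering and sorting are separate steps here, which mirrors the note above: the distance filter alone decides which stores qualify, not the order in which they are returned.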
Nested filter criteria
Bool
Use the `bool` clause to nest your filter criteria.
Each `bool` clause can contain the top-level keys (`must`, `must_not`, `should`, `minimum_should_match`), allowing you to nest your filter criteria to as many levels as needed.
This enables precise control over your filter criteria, using a structured and layered approach.
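Nesting can be illustrated with a short recursive check in Python. The exact request shape of a `bool` clause is an assumption here: the sketch takes it to be an object under a `bool` key that holds the same top-level keys, as the description above suggests. The function names, the sample filter, and the flat dotted-key documents are all hypothetical, and only `term` and `bool` clauses are handled.

```python
from typing import Any


def clause_matches(doc: dict[str, Any], clause: dict[str, Any]) -> bool:
    """Evaluate one clause: a nested bool recurses, a term compares directly."""
    if "bool" in clause:
        return criteria_match(doc, clause["bool"])
    field, expected = next(iter(clause["term"].items()))
    return doc.get(field) == expected


def criteria_match(doc: dict[str, Any], criteria: dict[str, Any]) -> bool:
    """Apply must / must_not / should at the current nesting level."""
    must = criteria.get("must", [])
    should = criteria.get("should", [])
    minimum = criteria.get("minimum_should_match", 0 if must else 1)
    if not all(clause_matches(doc, c) for c in must):
        return False
    if any(clause_matches(doc, c) for c in criteria.get("must_not", [])):
        return False
    if should and sum(clause_matches(doc, c) for c in should) < minimum:
        return False
    return True


# On sale AND ((blue AND size M) OR red), expressed with nested bool clauses.
nested_filter = {
    "must": [
        {"term": {"metadata.is_sale": True}},
        {"bool": {
            "should": [
                {"bool": {"must": [
                    {"term": {"metadata.color": "blue"}},
                    {"term": {"metadata.size": "M"}},
                ]}},
                {"term": {"metadata.color": "red"}},
            ],
            "minimum_should_match": 1,
        }},
    ]
}

blue_m = {"metadata.is_sale": True, "metadata.color": "blue", "metadata.size": "M"}
blue_l = {"metadata.is_sale": True, "metadata.color": "blue", "metadata.size": "L"}
print(criteria_match(blue_m, nested_filter), criteria_match(blue_l, nested_filter))
```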
Common Pitfalls
- Always use dot notation for metadata fields (e.g., `metadata.brand`, not just `brand`)
- For date ranges:
  - Ensure dates are properly formatted when using absolute dates (ISO 8601 format by default)
  - When using custom date formats, specify the format explicitly using the `format` parameter
  - Be careful with timezone handling; UTC is the default
- For term filters:
  - Term matches are case-sensitive by default
  - Use the extended syntax with `case_insensitive: true` if you need case-insensitive matching
- For geo-distance:
  - Ensure the field is of type `geo_point` in your metadata schema
  - Always specify both `lat` and `lon` values
  - Use appropriate distance units (e.g., `km`, `mi`)
- When using `bool` queries:
  - Avoid deeply nested boolean conditions as they can impact performance
  - Use `must_not` instead of negating individual conditions
- General tips:
  - Test your filters with a small dataset first
  - Use the most specific filter type for your use case
  - Consider the performance impact of complex filters on large datasets