# Implement RAG
This guide explains how to implement a RAG system using our Generate Answer API.
## What is RAG?
RAG (Retrieval-Augmented Generation) is a technique that enhances AI language models by:
- First retrieving relevant information from your documents
- Then using that information to generate accurate, contextual answers
Benefits of RAG include:
- More accurate answers grounded in your specific content
- Fewer hallucinations (fabricated information)
- Answers stay up to date as your documents change
- Lower cost than fine-tuning a model
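The retrieve-then-generate flow can be sketched in a few lines of Python. This is a toy illustration, not the Gainly API: the keyword-overlap retriever and the sample documents are invented for the example, and a real system would use a semantic index and send the prompt to a language model.

```python
# Toy sketch of the RAG flow: retrieve relevant text, then ground the
# answer in it. Documents and scoring are invented for illustration.
documents = [
    "Our product offers advanced analytics and custom dashboards.",
    "Support is available 24/7 via chat and email.",
]

def retrieve(query: str, docs: list) -> str:
    """Return the document sharing the most words with the query."""
    query_words = set(query.lower().split())
    return max(docs, key=lambda d: len(query_words & set(d.lower().split())))

def generate_answer(query: str) -> str:
    """Build a prompt that grounds the answer in the retrieved text."""
    context = retrieve(query, documents)
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    return prompt  # in a real system, this prompt is sent to an LLM

print(generate_answer("What analytics does the product offer?"))
```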
## Implementation Steps
### 1. Add Your Documents
First, you'll need to add your documents to your Gainly semantic index. You can do this using our Add Document API.
Here's an example implementation (in Python) of adding a document to your Gainly index:

```python
import requests
from typing import Dict

def add_document(title: str, content: str) -> Dict:
    """Add a single document to the Gainly index."""
    BASE_URL = "https://api.gainly.ai"
    VERSION = "v20241104"
    headers = {
        "Content-Type": "application/json",
        "X-API-Key": "YOUR_API_KEY_HERE",
    }
    response = requests.post(
        f"{BASE_URL}/{VERSION}/documents",
        headers=headers,
        json={"title": title, "content": content},
    )
    if response.status_code == 200:
        return response.json()
    raise Exception(f"Failed to add document: {response.status_code}")

# Example usage
document = {
    "title": "Product Features",
    "content": "Our product offers advanced analytics...",
}
result = add_document(title=document["title"], content=document["content"])
print(f"Added document with ID: {result['id']}")
```
**Adding a Large Number of Documents**

For details about adding a large number of documents, see our Batch Add Documents guide. It covers important details like rate limiting.
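When adding many documents, a common client-side pattern is to send them in small batches with a pause between requests. The helper below is a hedged sketch: the batch size, delay, and the reuse of `add_document` from the example above are assumptions for illustration, not Gainly requirements; see the Batch Add Documents guide for the supported approach.

```python
import time
from typing import Dict, List

def add_documents_in_batches(
    docs: List[Dict], batch_size: int = 10, delay_seconds: float = 1.0
) -> List[List[Dict]]:
    """Split docs into batches and pause between them to respect rate limits.

    Assumes each doc is a {"title": ..., "content": ...} dict, uploaded via
    the add_document() function shown earlier in this guide.
    """
    batches = [docs[i:i + batch_size] for i in range(0, len(docs), batch_size)]
    for n, batch in enumerate(batches):
        for doc in batch:
            # add_document(title=doc["title"], content=doc["content"])
            pass  # the call is commented out so the sketch runs without an API key
        if n < len(batches) - 1:
            time.sleep(delay_seconds)  # crude rate limiting between batches
    return batches
```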
### 2. Implement RAG
Here's an example implementation (in Python) of a RAG system using our Generate Answer API:

```python
import requests
from typing import Dict, Optional

class RAGSystem:
    def __init__(self, api_key: str):
        self.BASE_URL = "https://api.gainly.ai"
        self.VERSION = "v20241104"
        self.headers = {
            "Content-Type": "application/json",
            "X-API-Key": api_key,
        }
        self.conversation_history = []

    def generate_answer(
        self,
        query: str,
        max_output_tokens: int = 512,
        temperature: float = 0.5,
    ) -> Optional[Dict]:
        """
        Generate an answer using RAG.

        Args:
            query: User's question
            max_output_tokens: Maximum length of the generated answer
            temperature: Controls answer creativity (0.0-1.0)

        Returns:
            Dict containing the answer and related information,
            or None if the request failed.
        """
        try:
            response = requests.post(
                f"{self.BASE_URL}/{self.VERSION}/generate-answer",
                headers=self.headers,
                json={
                    "query": query,
                    "max_output_tokens": max_output_tokens,
                    "temperature": temperature,
                    "previous_messages": self.conversation_history,
                },
            )
            if response.status_code == 200:
                result = response.json()
                # Update conversation history for follow-up questions
                if "messages" in result:
                    self.conversation_history = result["messages"]
                return result
            raise Exception(f"API error: {response.status_code}")
        except Exception as e:
            print(f"Error generating answer: {e}")
            return None

    def clear_conversation(self):
        """Clear the conversation history."""
        self.conversation_history = []

# Example usage
rag = RAGSystem(api_key="YOUR_API_KEY_HERE")

# Ask a question
result = rag.generate_answer(
    query="What are the main product features?",
    temperature=0.7,
)
if result:
    print("\nAnswer:", result["data"][0]["answer"])
    print("\nSources:")
    for source in result["data"][0]["sources"]:
        print(f"- {source['title']}")
```
### 3. Conversation Management
The Generate Answer API supports conversational context through the `previous_messages` parameter. This allows for more natural, context-aware conversations.

Here's how conversation management works:

- For the first question, don't include any `previous_messages`
- For follow-up questions, include the `messages` array from the previous API response
- The API maintains the conversation context automatically
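Concretely, the request body for a follow-up question differs from the first request only in its `previous_messages` field. The helper below sketches this; the message shape in the example is an assumption based on the fields used elsewhere in this guide.

```python
from typing import Dict, List, Optional

def build_request_body(
    query: str, previous_messages: Optional[List[Dict]] = None
) -> Dict:
    """Build a Generate Answer request body, adding history for follow-ups."""
    body = {"query": query}
    if previous_messages:  # omit the field entirely on the first question
        body["previous_messages"] = previous_messages
    return body

first = build_request_body("What are the main product features?")
# After the first response, pass its "messages" array back in
# (the message dict below is a hypothetical shape for illustration):
followup = build_request_body(
    "How much does it cost?",
    previous_messages=[
        {"role": "user", "content": "What are the main product features?"}
    ],
)
```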
#### Best Practices for Conversation Management

1. **Message History Length**
   - Default is 4 previous messages
   - Can be adjusted using the `max_messages` parameter
   - More context helps with relevance but uses more tokens

2. **Conversation Reset**
   - Clear conversation history when:
     - Starting a new topic
     - The user explicitly requests to start over
     - The conversation becomes too long
   - Use the `clear_conversation()` method to reset

3. **Token Usage**
   - Monitor token usage as conversations grow
   - Longer conversations use more tokens
   - Consider implementing a maximum conversation length
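One way to cap token growth on the client side is to trim the stored history to the most recent messages before each request. This helper is a sketch: the default of four mirrors the API's default message history length, but the trimming itself is a client-side assumption, not part of the API.

```python
from typing import Dict, List

def trim_history(messages: List[Dict], max_messages: int = 4) -> List[Dict]:
    """Keep only the most recent messages to bound token usage."""
    if len(messages) <= max_messages:
        return messages
    return messages[-max_messages:]
```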
## Additional Options
The Generate Answer API supports several optional parameters:
- `filter`: Filter documents by metadata
- `language`: Specify document language
- `multilingual_search`: Search across languages
- `retrieval_limit`: Control number of retrieved passages
- `ai_search_cutoff_score`: Adjust relevance threshold
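Putting several of these together, a request body might look like the following. The values (and the shape of the `filter` object) are illustrative assumptions, not recommended defaults; check the Generate Answer API Reference for the accepted types and ranges.

```python
# Illustrative request body combining several optional parameters.
# All values below are assumptions chosen for the example.
request_body = {
    "query": "What are the main product features?",
    "filter": {"category": "documentation"},  # metadata filter (assumed shape)
    "language": "en",
    "multilingual_search": True,
    "retrieval_limit": 5,
    "ai_search_cutoff_score": 0.5,
}
```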
For more details about these options, see the Generate Answer API Reference.