
Implement RAG

This guide explains how to implement a RAG system using our Generate Answer API.

What is RAG?

RAG (Retrieval-Augmented Generation) is a technique that enhances AI language models by:

  1. First retrieving relevant information from your documents
  2. Then using that information to generate accurate, contextual answers
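Conceptually, the two steps look like this (a toy sketch only; it uses simple keyword overlap for retrieval and stops at building the prompt, whereas the Generate Answer API handles retrieval and generation for you):

# Toy illustration of the two RAG steps (not Gainly API calls)
documents = [
    "Our product offers advanced analytics and custom dashboards.",
    "Support is available 24/7 via chat and email."
]
question = "What analytics does the product offer?"

# 1. Retrieve: pick passages related to the question
#    (a real system uses semantic search, not keyword overlap)
query_words = set(question.lower().split())
relevant = [d for d in documents if query_words & set(d.lower().split())]

# 2. Generate: send the question plus the retrieved passages to a language
#    model, which answers from that context instead of from memory alone
prompt = f"Answer using only this context:\n{relevant}\n\nQuestion: {question}"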

Benefits of RAG include:

  • More accurate answers based on your specific content
  • Reduced hallucinations (making up information)
  • Up-to-date answers, since the index reflects your latest documents
  • Lower costs compared to fine-tuning

Implementation Steps

1. Add Your Documents

First, you'll need to add your documents to your Gainly semantic index. You can do this using our Add Document API.

Here's an example implementation (in Python) of adding a document to your Gainly index:

import requests
from typing import Dict

def add_document(title: str, content: str) -> Dict:
    """Add a single document to Gainly index"""
    BASE_URL = "https://api.gainly.ai"
    VERSION = "v20241104"
    headers = {
        "Content-Type": "application/json",
        "X-API-Key": "YOUR_API_KEY_HERE"
    }

    response = requests.post(
        f"{BASE_URL}/{VERSION}/documents",
        headers=headers,
        json={
            "title": title,
            "content": content
        }
    )

    # Raise with the status code and response body if the request failed
    if response.status_code == 200:
        return response.json()
    else:
        raise Exception(
            f"Failed to add document: {response.status_code} {response.text}"
        )

# Example usage
document = {
    "title": "Product Features",
    "content": "Our product offers advanced analytics..."
}

result = add_document(title=document['title'], content=document['content'])
print(f"Added document with ID: {result['id']}")

Adding a Large Number of Documents

For details about adding a large number of documents, see our Batch Add Documents guide. It covers important details like rate limiting.
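As a rough sketch, you can reuse the add_document function from step 1 in a simple loop, pausing between requests (the one-second delay is an arbitrary placeholder; see the Batch Add Documents guide for the actual rate limits):

import time

documents = [
    {"title": "Product Features", "content": "Our product offers advanced analytics..."},
    {"title": "Getting Started", "content": "To get started, create an API key..."}
]

for doc in documents:
    result = add_document(title=doc["title"], content=doc["content"])
    print(f"Added document with ID: {result['id']}")
    time.sleep(1)  # placeholder pause between requests to respect rate limits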

2. Implement RAG

Here's an example implementation (in Python) of a RAG system using our Generate Answer API:

import requests
from typing import Dict, List, Optional

class RAGSystem:
    def __init__(self, api_key: str):
        self.BASE_URL = "https://api.gainly.ai"
        self.VERSION = "v20241104"
        self.headers = {
            "Content-Type": "application/json",
            "X-API-Key": api_key
        }
        self.conversation_history: List[Dict] = []

    def generate_answer(self,
                        query: str,
                        max_output_tokens: int = 512,
                        temperature: float = 0.5) -> Optional[Dict]:
        """
        Generate an answer using RAG.

        Args:
            query: User's question
            max_output_tokens: Maximum length of generated answer
            temperature: Controls answer creativity (0.0-1.0)

        Returns:
            Dict containing the answer and related information, or None if the request fails
        """
        try:
            # Build the request payload; include conversation history only
            # for follow-up questions (see Conversation Management below)
            payload = {
                "query": query,
                "max_output_tokens": max_output_tokens,
                "temperature": temperature
            }
            if self.conversation_history:
                payload["previous_messages"] = self.conversation_history

            response = requests.post(
                f"{self.BASE_URL}/{self.VERSION}/generate-answer",
                headers=self.headers,
                json=payload
            )

            if response.status_code == 200:
                result = response.json()

                # Update conversation history
                if "messages" in result:
                    self.conversation_history = result["messages"]

                return result
            else:
                raise Exception(f"API error: {response.status_code} {response.text}")

        except Exception as e:
            print(f"Error generating answer: {e}")
            return None

    def clear_conversation(self):
        """Clear the conversation history"""
        self.conversation_history = []

# Example usage
rag = RAGSystem(api_key="YOUR_API_KEY_HERE")

# Ask a question
result = rag.generate_answer(
    query="What are the main product features?",
    temperature=0.7
)

if result:
    print("\nAnswer:", result["data"][0]["answer"])
    print("\nSources:")
    for source in result["data"][0]["sources"]:
        print(f"- {source['title']}")

3. Conversation Management

The Generate Answer API supports conversational context through the previous_messages parameter. This allows for more natural, context-aware conversations.

Here's how conversation management works:

  1. For the first question, don't include any previous_messages
  2. For follow-up questions, include the messages array from the previous API response
  3. The API maintains the conversation context automatically
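For example, with the RAGSystem class from step 2 (which stores the messages array from each response and sends it back as previous_messages on the next call):

# First question – no conversation history exists yet
first = rag.generate_answer(query="What are the main product features?")

# Follow-up question – the messages array from the first response is sent
# back automatically as previous_messages, so the API sees the context
follow_up = rag.generate_answer(query="Which of those features require a paid plan?")

if follow_up:
    print(follow_up["data"][0]["answer"])

# Start fresh when switching to a new topic
rag.clear_conversation()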

Best Practices for Conversation Management

  1. Message History Length

    • Default is 4 previous messages
    • Can be adjusted using max_messages parameter
    • More context helps with relevance but uses more tokens
  2. Conversation Reset

    • Clear conversation history when:
      • Starting a new topic
      • User explicitly requests to start over
      • Conversation becomes too long
    • Use clear_conversation() method to reset
  3. Token Usage

    • Monitor token usage as conversations grow
    • Longer conversations use more tokens
    • Consider implementing a maximum conversation length
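For example, using the RAGSystem class from step 2, a simple way to bound conversation length is to reset the history once it grows past a threshold (the threshold below is an arbitrary choice for this sketch):

MAX_HISTORY_MESSAGES = 8  # arbitrary cap for this sketch

def ask(rag: RAGSystem, query: str):
    # Reset the conversation before it grows too long and consumes extra tokens
    if len(rag.conversation_history) > MAX_HISTORY_MESSAGES:
        rag.clear_conversation()
    return rag.generate_answer(query=query)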

Additional Options

The Generate Answer API supports several optional parameters:

  • filter: Filter documents by metadata
  • language: Specify document language
  • multilingual_search: Search across languages
  • retrieval_limit: Control number of retrieved passages
  • ai_search_cutoff_score: Adjust relevance threshold

For more details about these options, see the Generate Answer API Reference.
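As a rough sketch, a request combining a few of these options might look like the following (it reuses BASE_URL, VERSION, and headers from the add_document example above; the values, and the metadata field used in filter, are illustrative assumptions, so check the API Reference for the exact formats):

response = requests.post(
    f"{BASE_URL}/{VERSION}/generate-answer",
    headers=headers,
    json={
        "query": "What are the main product features?",
        "filter": {"category": "product-docs"},  # hypothetical metadata field
        "language": "en",                        # illustrative value
        "multilingual_search": False,
        "retrieval_limit": 5,                    # illustrative value
        "ai_search_cutoff_score": 0.5            # illustrative value
    }
)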