Adding Text Files to Your Search Agent

Adding plain text (.txt) documents to your search agent broadens its knowledge base, so it can draw answers from text-based sources. The add_text method handles this step: it loads the text files and integrates their content into the agent’s searchable data.

Function Signature

The add_text function accepts text files in two ways: from a specified directory, or as an explicit list of individual file paths.

Parameters

  • input_dir (Optional[str]): Directory path containing text files to be added. If specified, the function searches this directory for eligible files.
  • input_files (Optional[List]): A list of specific text file paths to add. Takes precedence over input_dir if provided.
  • exclude_hidden (bool): If True, ignores hidden files (those whose names start with a dot) within input_dir.
  • filename_as_id (bool): If True, uses the filename as the unique identifier for each document.
  • recursive (bool): If True, includes files from subdirectories within input_dir.
  • required_exts (Optional[List[str]]): The file extensions to include; files with other extensions are skipped. Defaults to text files (.txt).
  • system_prompt (str): An optional prompt to guide the system in processing text content.
  • query_wrapper_prompt (str): An optional prompt to enhance the relevance of user queries by wrapping them.
  • embed_model (Union[str, EmbedType]): The embedding model used to embed the extracted text, defaulting to a standard model.
  • llm_params (dict): Configuration parameters for integrating Large Language Models, if needed.
  • vector_store_params (dict): Configuration for the vector storage, detailing how and where embeddings are stored.
  • service_context_params (dict): Additional parameters for customizing the service context.
  • query_engine_params (dict): Parameters to customize the behavior of the query engine.
  • retriever_params (dict): Configuration for the document retriever, influencing how documents are retrieved based on queries.
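
The file-selection parameters above (input_dir, input_files, exclude_hidden, recursive, required_exts) interact in a way that is easier to see in code. The following is a minimal sketch of those rules, not the library's actual implementation: select_text_files is a hypothetical helper, the ".txt" default is an assumption, and it assumes that exclude_hidden and required_exts are applied only when scanning input_dir.

```python
from pathlib import Path
from typing import List, Optional


def select_text_files(
    input_dir: Optional[str] = None,
    input_files: Optional[List[str]] = None,
    exclude_hidden: bool = True,
    recursive: bool = False,
    required_exts: Optional[List[str]] = None,
) -> List[Path]:
    """Hypothetical helper that mimics the documented selection rules."""
    # Assumption: ".txt" is the default extension filter.
    exts = {ext.lower() for ext in (required_exts or [".txt"])}

    # An explicit file list takes precedence over the directory.
    if input_files:
        return [Path(f) for f in input_files]

    if input_dir is None:
        raise ValueError("Provide either input_dir or input_files")

    root = Path(input_dir)
    candidates = root.rglob("*") if recursive else root.glob("*")

    selected = []
    for path in candidates:
        if not path.is_file():
            continue
        # A path counts as hidden if the file, or any folder between
        # root and the file, has a name starting with a dot.
        rel_parts = path.relative_to(root).parts
        if exclude_hidden and any(p.startswith(".") for p in rel_parts):
            continue
        if path.suffix.lower() in exts:
            selected.append(path)
    return sorted(selected)
```

Calling select_text_files(input_dir="/path/to/text/files", recursive=True) would return the visible .txt files under that directory tree, which is the set a call to add_text with the same arguments would be expected to ingest under these assumptions.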

Example Usage

Adding Text Files from a Directory

search_agent.add_text(
    input_dir="/path/to/text/files",
    recursive=True
)

This snippet scans the specified directory and, because recursive is True, its subdirectories for text files, then adds them to the search agent’s database.

Adding Specific Text Files

search_agent.add_text(
    input_files=["/path/to/specific_file1.txt", "/path/to/specific_file2.txt"],
)

Here, the listed text files are added directly, without scanning a directory. If filename_as_id is False, the search agent generates a unique identifier for each document.
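
To illustrate the effect of filename_as_id, the sketch below assigns an identifier to each file path. It is only an illustration: assign_doc_ids is a hypothetical helper, the use of the base filename as the ID and of random UUIDs for generated IDs are both assumptions, and the search agent's actual ID scheme may differ.

```python
import uuid
from pathlib import Path
from typing import Dict, List


def assign_doc_ids(paths: List[str], filename_as_id: bool = False) -> Dict[str, str]:
    """Hypothetical helper: map each file path to a document ID."""
    ids: Dict[str, str] = {}
    for p in paths:
        if filename_as_id:
            # Assumption: the ID is the base filename, e.g. "notes.txt".
            ids[p] = Path(p).name
        else:
            # Assumption: generated IDs are random UUID strings.
            ids[p] = str(uuid.uuid4())
    return ids


files = ["/path/to/specific_file1.txt", "/path/to/specific_file2.txt"]
stable_ids = assign_doc_ids(files, filename_as_id=True)
random_ids = assign_doc_ids(files)
```

Under these assumptions, filename-based IDs stay the same across runs, whereas generated IDs change each time. Note that if only the base filename were used, two files with the same name in different directories would collide.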