ChromadbRM
Note
Adapted from documentation provided by https://github.com/animtel
ChromadbRM can use any of the embedding functions outlined in the chromadb embeddings documentation. While different options are available, this example demonstrates how to use OpenAI embeddings specifically.
Setting up the ChromadbRM Client
The constructor initializes an instance of the ChromadbRM class, with the option to use OpenAI's embeddings or any alternative supported by chromadb, as detailed in the official chromadb embeddings documentation.
- collection_name (str): The name of the chromadb collection.
- persist_directory (str): Path to the directory where chromadb data is persisted.
- embedding_function (Optional[EmbeddingFunction[Embeddable]], optional): The function used for embedding documents and queries. Defaults to DefaultEmbeddingFunction() if not specified.
- k (int, optional): The number of top passages to retrieve. Defaults to 7.
Example of the ChromadbRM constructor:
ChromadbRM(
    collection_name: str,
    persist_directory: str,
    embedding_function: Optional[EmbeddingFunction[Embeddable]] = DefaultEmbeddingFunction(),
    k: int = 7,
)
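To make the embedding_function parameter more concrete, here is a rough sketch of the shape such a function can take. It assumes chromadb's convention that an embedding function is a callable whose `__call__(self, input)` receives a list of strings and returns one vector per string. `HashingEmbeddingFunction` and its `dim` argument are hypothetical names invented for this illustration; the sketch is pure Python and does not import chromadb. A real implementation would normally subclass chromadb's `EmbeddingFunction` and call an actual embedding model.

```python
import hashlib
import math
from typing import List


class HashingEmbeddingFunction:
    """Hypothetical toy embedder: hashes each token into a fixed-size vector."""

    def __init__(self, dim: int = 16) -> None:
        self.dim = dim

    def __call__(self, input: List[str]) -> List[List[float]]:
        embeddings = []
        for text in input:
            vector = [0.0] * self.dim
            for token in text.lower().split():
                # Use a stable hash so a token always lands in the same bucket.
                digest = hashlib.md5(token.encode("utf-8")).hexdigest()
                vector[int(digest, 16) % self.dim] += 1.0
            # L2-normalize so the vector's scale is independent of text length.
            norm = math.sqrt(sum(v * v for v in vector)) or 1.0
            embeddings.append([v / norm for v in vector])
        return embeddings


embedder = HashingEmbeddingFunction(dim=16)
vectors = embedder(["quantum computing", "classical computing"])
print(len(vectors), len(vectors[0]))
```

A hashing embedder like this carries no semantic information, so it is only useful for checking that the plumbing works; retrieval quality depends on passing a real embedding model.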
Under the Hood
forward(self, query_or_queries: Union[str, List[str]], k: Optional[int] = None) -> dspy.Prediction
Parameters:
- query_or_queries (Union[str, List[str]]): The query or list of queries to search for.
- k (Optional[int], optional): The number of results to retrieve. If not specified, defaults to the value set during initialization.
Returns:
- dspy.Prediction: Contains the retrieved passages, each represented as a dotdict with a long_text attribute.
Search the chromadb collection for the top k passages matching the given query or queries, using embeddings generated via the specified embedding_function.
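The behavior described above can be illustrated in plain Python. This is not ChromadbRM's actual implementation, since chromadb performs the vector search internally. It is a minimal sketch that assumes cosine similarity over a toy in-memory corpus and returns the top k passages per query. The names `toy_embed`, `Passage`, `search`, and `DEFAULT_K` are hypothetical and exist only for this example.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional, Union

DEFAULT_K = 2  # stands in for the k value set during initialization


@dataclass
class Passage:
    long_text: str  # mirrors the attribute exposed on retrieved passages


def toy_embed(text: str) -> Dict[str, float]:
    """Bag-of-words counts used as a sparse vector (assumption, not a real model)."""
    return {token: float(count) for token, count in Counter(text.lower().split()).items()}


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    dot = sum(weight * b.get(token, 0.0) for token, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(
    corpus: List[str],
    query_or_queries: Union[str, List[str]],
    k: Optional[int] = None,
) -> List[Passage]:
    # Accept a single query or a list of queries.
    queries = [query_or_queries] if isinstance(query_or_queries, str) else query_or_queries
    # Fall back to the initialization-time value when k is not given.
    k = DEFAULT_K if k is None else k
    doc_vectors = [toy_embed(doc) for doc in corpus]
    results: List[Passage] = []
    for query in queries:
        query_vector = toy_embed(query)
        ranked = sorted(
            range(len(corpus)),
            key=lambda i: cosine(query_vector, doc_vectors[i]),
            reverse=True,
        )
        results.extend(Passage(long_text=corpus[i]) for i in ranked[:k])
    return results


corpus = [
    "quantum computing uses qubits",
    "classical computing uses bits",
    "cooking pasta requires boiling water",
]
for passage in search(corpus, "quantum qubits", k=1):
    print("Document:", passage.long_text)
```

The sketch makes two behaviors of forward explicit: a bare string is treated as a one-element list of queries, and an omitted k falls back to the default chosen at construction time.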
Sending Retrieval Requests via ChromadbRM Client
import os

from chromadb.utils.embedding_functions import OpenAIEmbeddingFunction
from dspy.retrieve.chromadb_rm import ChromadbRM

# Build the embedding function; the API key is read from the environment.
embedding_function = OpenAIEmbeddingFunction(
    api_key=os.environ.get('OPENAI_API_KEY'),
    model_name="text-embedding-ada-002"
)

# Point the retriever at the persisted collection.
retriever_model = ChromadbRM(
    'your_collection_name',
    '/path/to/your/db',
    embedding_function=embedding_function,
    k=5
)

# Retrieve the top 5 passages for the query.
results = retriever_model("Explore the significance of quantum computing", k=5)

for result in results:
    print("Document:", result.long_text, "\n")