MyScaleRM
Constructor
Initializes an instance of the MyScaleRM class, which uses MyScaleDB (a ClickHouse fork optimized for vector similarity and full-text search) to retrieve documents based on query embeddings. The class generates embeddings with either a local model or OpenAI's API and handles the database queries needed for retrieval.
Syntax
MyScaleRM(
    client: clickhouse_connect.driver.client.Client,
    table: str,
    database: str = 'default',
    metadata_columns: List[str] = ['text'],
    vector_column: str = 'vector',
    k: int = 3,
    openai_api_key: Optional[str] = None,
    openai_model: Optional[str] = None,
    local_embed_model: Optional[str] = None
)
Parameters for MyScaleRM Constructor
- client (clickhouse_connect.driver.client.Client): A client connection to the MyScaleDB database, used to execute queries and manage interactions with the database.
- table (str): The table within MyScaleDB from which data will be retrieved. This table must have a vector column for similarity searches.
- database (str, optional): The name of the database where the table is located, defaulting to "default".
- metadata_columns (List[str], optional): Columns to include as metadata in the output, defaulting to ["text"].
- vector_column (str, optional): The column that contains vector data, used for similarity searches, defaulting to "vector".
- k (int, optional): The number of closest matches to return for each query, defaulting to 3.
- openai_api_key (str, optional): API key for accessing OpenAI services, necessary if using OpenAI for embedding generation.
- openai_model (str, optional): The specific OpenAI model to use for embeddings, required if an OpenAI API key is provided.
- local_embed_model (str, optional): A local model for embedding generation, used when local computation is preferred.
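The three embedding parameters depend on one another: an OpenAI API key needs a model name alongside it, and a local model is the alternative route. The sketch below expresses that rule as a pre-flight check you might run on your own configuration before constructing the retriever. `resolve_embedding_backend` is a hypothetical helper written for this illustration, not part of MyScaleRM, and rejecting a mixed configuration is an assumption made here rather than documented library behavior.

```python
from typing import Optional


def resolve_embedding_backend(
    openai_api_key: Optional[str] = None,
    openai_model: Optional[str] = None,
    local_embed_model: Optional[str] = None,
) -> str:
    """Return "openai" or "local" for a set of constructor arguments.

    Hypothetical pre-flight check; MyScaleRM itself may behave differently.
    """
    wants_openai = openai_api_key is not None or openai_model is not None
    wants_local = local_embed_model is not None

    if wants_openai and wants_local:
        # Assumption: treat a mixed configuration as a mistake instead of guessing.
        raise ValueError("Pass either OpenAI settings or local_embed_model, not both.")
    if wants_openai:
        # The docs state that the model is required whenever a key is provided.
        if not openai_api_key or not openai_model:
            raise ValueError("openai_api_key and openai_model must be given together.")
        return "openai"
    if wants_local:
        return "local"
    raise ValueError("No embedding backend configured.")


# Placeholder values, matching the ones used elsewhere on this page.
backend = resolve_embedding_backend(
    openai_api_key="sk-***",
    openai_model="embeddings_model",
)
print(backend)
```

Running a check like this first surfaces a missing model name as a clear error before any database or API call is made.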
Methods
forward
Executes a retrieval operation based on a user's query and returns the top k relevant results using the embeddings generated by the specified method.
Syntax
Parameters
- user_query (str): The query to retrieve matching passages.
- k (Optional[int], optional): The number of top matches to retrieve. If not provided, it defaults to the k value set during class initialization.
Returns
dspy.Prediction: Contains the retrieved passages, formatted as a list of dotdict objects. Each entry includes:
- long_text (str): The text content of the retrieved passage.
Description
The forward method uses MyScaleDB's vector search capabilities to find the top k passages that best match the provided query. The query is first embedded with the configured method (either a local model or the OpenAI API), and the resulting vector is compared against the table's vector column to rank passages by semantic similarity.
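To make the top-k search concrete, here is a minimal sketch of how a similarity query for MyScaleDB could be assembled from the constructor parameters. `build_vector_query` is a hypothetical helper, not the code MyScaleRM runs internally; it assumes MyScaleDB's `distance()` SQL function and a metric where a smaller distance means a closer match (for example cosine or L2).

```python
from typing import Sequence


def build_vector_query(
    table: str,
    query_embedding: Sequence[float],
    database: str = "default",
    metadata_columns: Sequence[str] = ("text",),
    vector_column: str = "vector",
    k: int = 3,
) -> str:
    """Build a top-k similarity query string (illustrative helper only)."""
    columns = ", ".join(metadata_columns)
    # Render the embedding as an array literal, e.g. [0.1, 0.2].
    embedding_literal = "[" + ", ".join(repr(float(x)) for x in query_embedding) + "]"
    return (
        f"SELECT {columns}, distance({vector_column}, {embedding_literal}) AS dist "
        f"FROM {database}.{table} "
        f"ORDER BY dist ASC "  # assumption: smaller distance means closer match
        f"LIMIT {int(k)}"
    )


# Hypothetical table name and a toy two-dimensional embedding.
query = build_vector_query("movies", [0.1, 0.2], k=2)
print(query)
```

The identifiers are interpolated directly into the string, so this form is only suitable for trusted table and column names; real code should validate them before building the query.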
Quickstart
This section shows how to instantiate the MyScaleRM class and use it to retrieve data from MyScaleDB with text embeddings.
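import clickhouse_connect
from dspy.retrieve.myscaledb_rm import MyScaleRM

# Connect to MyScaleDB; every connection value below is a placeholder.
client = clickhouse_connect.get_client(
    host="your-host",
    port=443,
    username="your-username",
    password="your-password",
)

MyScale_model = MyScaleRM(client=client,
                          table="table_name",
                          openai_api_key="sk-***",
                          openai_model="embeddings_model",
                          vector_column="vector_column_name",
                          metadata_columns=["add_your_columns_here"],
                          k=6)

# Run the query and keep the returned prediction.
results = MyScale_model("Please suggest me some funny movies")
passages = results.passages

# Loop through each passage and print the 'long_text'
for passage in passages:
    print(passage['long_text'], "\n")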