dspy.LocalLLM
dspy.LocalLLM(
    object=None,
    model='LocalLLM',
    model_type='chat',
    temperature=0.0,
    max_tokens=1000,
    cache=True,
    trace=True,
    **kwargs,
)
Create a local LLM object that you can use as a language model with dspy.
Parameters

| Name | Description | Default |
|------|-------------|---------|
| object | An object of class `llama_cpp.llama.Llama`, as returned by `Llama` from the llama-cpp-python package. | `None` |
| model | A name that you provide to the model. | `'LocalLLM'` |
| model_type | String with the type of model, e.g. `'chat'` or `'responses'`. Currently only tested with type `'chat'`. | `'chat'` |
| temperature | The model temperature to use. | `0.0` |
| max_tokens | Maximum number of tokens to generate. | `1000` |
| cache | Use the cache to avoid recomputing the same LLM call twice. | `True` |
| trace | Boolean indicating whether to log internal calls. | `True` |
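The `cache` flag means that repeated identical calls are answered from a cache rather than by re-running the model. As a rough conceptual sketch only (this is not `localllm`'s actual implementation, and `cached_generate` is a hypothetical stand-in for the model call), memoizing on the call arguments behaves like this:

```python
from functools import lru_cache

# Track how many times the underlying "model" actually runs.
calls = {"count": 0}

@lru_cache(maxsize=None)
def cached_generate(prompt: str, temperature: float = 0.0, max_tokens: int = 1000) -> str:
    # Hypothetical stand-in for an expensive LLM invocation.
    calls["count"] += 1
    return f"response to {prompt!r}"

cached_generate("What is the capital of France")
cached_generate("What is the capital of France")  # identical call: served from cache
print(calls["count"])  # prints 1: the model ran only once
```

A different prompt, temperature, or max_tokens is a cache miss and triggers a fresh model call.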
Returns

| Type | Description |
|------|-------------|
| `dspy.BaseLM` | An object of type `dspy.BaseLM`. |
Examples
>>> import dspy
>>> from llama_cpp import Llama
>>> from localllm import LocalLLM, localllm_download_model
>>> path = localllm_download_model("gemma-3-270m-it-Q8_0", overwrite=True, trace=False)
>>> transformer = Llama(model_path=path, n_gpu_layers=-1, flash_attn=False, n_ctx=32768, n_threads=1, seed=4321, verbose=False)
>>> out = transformer("How much is 4x4")
>>>
>>> dspy.configure(lm=LocalLLM(transformer))
>>> class Go(dspy.Signature):
...     sentence: str = dspy.InputField(desc="A question")
...     answer: str = dspy.OutputField(desc="A city name")
>>>
>>> model = dspy.Predict(Go)
>>> out = model(sentence="What is the capital of France")
>>> out["answer"]
'Paris'