Server-Side LLM Sampling

MCP includes a sampling feature that lets the server ask the client to run an LLM request. This keeps API keys and billing on the client side while giving your EnrichMCP application the ability to generate text or run tool-aware prompts.

EnrichContext.ask_llm() (and its alias sampling()) is the helper used to make these requests. The method mirrors the MCP sampling API and supports a number of tuning parameters.
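
For instance, inside an async resolver you can request a completion with nothing more than a prompt string. This is a minimal sketch; app is your EnrichMCP application and the surrounding resolver is assumed, as in the Example section below:

ctx = app.get_context()
result = await ctx.ask_llm("Write a one-line tagline for a travel app.")
# sampling() is an alias and behaves identically
result = await ctx.sampling("Write a one-line tagline for a travel app.")
print(result.content.text)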

Parameters

messages: Text or SamplingMessage objects to send to the LLM. Strings are converted to user messages automatically.
system_prompt: Optional system prompt that defines overall behavior.
max_tokens: Maximum number of tokens the client should generate. Defaults to 1000.
temperature: Sampling temperature for controlling randomness.
model_preferences: ModelPreferences object describing cost, speed, and intelligence priorities. Use prefer_fast_model(), prefer_medium_model(), or prefer_smart_model() as shortcuts.
allow_tools: Controls which tools the LLM can see: "none", "thisServer", or "allServers".
stop_sequences: Strings that stop generation when encountered.
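
A sketch combining several of these parameters in one call. It assumes ctx was obtained from app.get_context() inside an async resolver; the prompt, system prompt, and stop sequence are illustrative values, not defaults:

result = await ctx.ask_llm(
    "List three risks in this deployment plan.",
    system_prompt="You are a concise reviewer.",
    max_tokens=300,
    temperature=0.2,
    model_preferences=prefer_smart_model(),
    allow_tools="none",
    stop_sequences=["###"],
)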

Model Preferences

ModelPreferences lets the server express whether it cares more about cost, speed, or intelligence when the client chooses an LLM. Three convenience functions are provided:

from enrichmcp import prefer_fast_model, prefer_medium_model, prefer_smart_model

Use prefer_fast_model() when low latency and price are most important. prefer_medium_model() offers balanced quality and cost. Use prefer_smart_model() when you need the best reasoning capability.
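
The helpers return a ModelPreferences object, so you can also construct one yourself for finer control. The sketch below assumes the MCP Python SDK's mcp.types.ModelPreferences with the spec's field names (costPriority, speedPriority, intelligencePriority, each between 0 and 1); verify the import and field names against your SDK version:

from mcp.types import ModelPreferences

# Roughly equivalent in spirit to prefer_smart_model(): intelligence matters most
prefs = ModelPreferences(
    costPriority=0.2,
    speedPriority=0.3,
    intelligencePriority=0.9,
)

result = await ctx.ask_llm("Plan a three-day itinerary.", model_preferences=prefs)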

Tool Access

Set allow_tools to "thisServer" or "allServers" to let the client LLM inspect the available MCP tools. This enables context-aware answers in which the LLM can suggest reading or calling other resources.
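
For example, a prompt that asks the model which resources to consult next can expose this server's tools. A sketch; the prompt is illustrative and ctx comes from app.get_context() as above:

result = await ctx.ask_llm(
    "Which of the available tools should I call next to enrich this customer record?",
    allow_tools="thisServer",  # the model can see tools registered on this server
    max_tokens=150,
)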

Example

from enrichmcp import prefer_fast_model


@app.retrieve
async def summarize(text: str) -> str:
    # Get the request context, which exposes the sampling helpers
    ctx = app.get_context()
    # Ask the client's LLM for a summary, preferring a fast, low-cost model
    result = await ctx.ask_llm(
        f"Summarize this: {text}",
        model_preferences=prefer_fast_model(),
        max_tokens=200,
        allow_tools="thisServer",
    )
    # The generated text comes back as message content
    return result.content.text

MCP sampling gives your server lightweight LLM features without storing API credentials. See the travel planner example for a complete implementation.