{"id":215602,"date":"2026-01-04T00:53:07","date_gmt":"2026-01-04T00:53:07","guid":{"rendered":"https:\/\/www.newsbeep.com\/nz\/215602\/"},"modified":"2026-01-04T00:53:07","modified_gmt":"2026-01-04T00:53:07","slug":"how-to-keep-mcps-useful-in-agentic-pipelines","status":"publish","type":"post","link":"https:\/\/www.newsbeep.com\/nz\/215602\/","title":{"rendered":"How to Keep MCPs Useful in Agentic Pipelines"},"content":{"rendered":"<p>Intro<\/p>\n<p class=\"wp-block-paragraph\"> applications powered by Large Language Models (LLMs) require integration with external services, for example integration with Google Calendar to set up meetings or integration with PostgreSQL to get access to some data.\u00a0<\/p>\n<p>Function calling<\/p>\n<p class=\"wp-block-paragraph\">Initially these kinds of integrations were implemented through function calling: we were building some special functions that can be called by an LLM through some specific tokens (LLM was generating some special tokens to call the function, following patterns we defined), parsing and execution. To make it work we were implementing authorization and API calling methods for each of the tools. Importantly, we had to manage all the instructions for these tools to be called and build internal logic of these functions including default or user-specific parameters. But the hype around \u201cAI\u201d required fast, sometimes brute-force solutions to keep the pace, that is where MCPs were introduced by the Anthropic company.\u00a0<\/p>\n<p>MCPs<\/p>\n<p class=\"wp-block-paragraph\">MCP stands for Model Context Protocol and today it is a standard way of providing tools to the majority of the agentic pipelines. MCPs basically manage both integration functions and LLM instructions to use tools. 
At this point some may argue that Skills and Code Execution, which Anthropic also introduced recently, have killed MCPs, but in fact these features tend to rely on MCPs for integration and instruction management (<a href=\"https:\/\/www.anthropic.com\/engineering\/code-execution-with-mcp\" rel=\"nofollow noopener\" target=\"_blank\">Code execution with MCP \u2014 Anthropic<\/a>). Skills and Code Execution focus on context management and tool orchestration, which is a different problem from the one MCPs are <a href=\"https:\/\/modelcontextprotocol.io\/docs\/getting-started\/intro\" rel=\"nofollow noopener\" target=\"_blank\">focused on<\/a>.<\/p>\n<p class=\"wp-block-paragraph\">MCPs provide a standard way to integrate different services (tools) with LLMs, and they also provide the instructions LLMs use to call those tools. However, there are a couple of problems:\u00a0<\/p>\n<p>The current Model Context Protocol expects all tool-calling parameters to be exposed to the LLM, with all their values generated by the LLM. For example, the LLM has to generate the user id value if a function call requires it. That is an overhead: the application already knows the user id without the LLM generating it, and to make the LLM aware of the value we have to put it into the prompt (there is a \u201chiding arguments\u201d approach in FastMCP from <a href=\"https:\/\/gofastmcp.com\/patterns\/tool-transformation#hiding-arguments\" rel=\"nofollow noopener\" target=\"_blank\">gofastmcp<\/a> that targets exactly this problem, but I haven\u2019t seen it in the original MCP implementation from Anthropic).<\/p>\n<p>No out-of-the-box control over instructions. 
MCPs provide a description for each tool and for each of a tool\u2019s arguments, and these values are used blindly in agentic pipelines as LLM API call parameters.\u00a0Each description is written by the developer of the corresponding MCP server.<\/p>\n<p>System prompt and tools<\/p>\n<p class=\"wp-block-paragraph\">When you call an LLM you usually pass tools to the LLM call as an API parameter. The value of this parameter is retrieved from the MCP\u2019s list_tools function, which returns a JSON schema for the tools it serves.<\/p>\n<p class=\"wp-block-paragraph\">At the same time, this \u201ctools\u201d parameter is used to put additional information into the model\u2019s system prompt. For example, the Qwen3-VL model has a <a href=\"https:\/\/huggingface.co\/Qwen\/Qwen3-VL-8B-Instruct\/blob\/main\/chat_template.json\" rel=\"nofollow noopener\" target=\"_blank\">chat_template<\/a> that manages tool insertion into the system prompt the following way:<\/p>\n<p>\u201c&#8230;You are provided with function signatures within &lt;tools&gt;&lt;\/tools&gt; XML tags:\\\\n\\&#8221; }}\\n\u00a0 \u00a0 {%- for tool in tools %}\\n\u00a0 \u00a0 \u00a0 \u00a0 {{- \\&#8221;\\\\n\\&#8221; }}\\n\u00a0 \u00a0 \u00a0 \u00a0 {{- tool | tojson }}\\n\u00a0 \u00a0 {%- endfor %}&#8230;\u201d<\/p>\n<p class=\"wp-block-paragraph\">So the tool descriptions end up in the system prompt of the LLM you are calling.<\/p>\n<p class=\"wp-block-paragraph\">The first problem is partially solved by the mentioned \u201chiding arguments\u201d approach from FastMCP, but I have still seen solutions where values like the user id were pushed into the model\u2019s system prompt to be used in tool calling: it is simply faster and easier from an engineering point of view (no engineering is required to put a value into the system prompt and rely on the LLM to use it). 
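<\/p>\n<p class=\"wp-block-paragraph\">To make this concrete, here is a small sketch (my own simplification, not Qwen\u2019s actual Jinja template) of how a chat template serialises tool schemas into the system prompt:<\/p>\n

```python
import json

def render_system_prompt(base_prompt, tools):
    # Simplified version of what chat templates like Qwen3-VL's do:
    # each tool's JSON schema is dumped verbatim between <tools> tags,
    # so every word of every description lands in the system prompt.
    newline = chr(10)  # chr(10) avoids a literal escape sequence in the source
    lines = [base_prompt,
             'You are provided with function signatures within <tools></tools> XML tags:',
             '<tools>']
    for tool in tools:
        lines.append(json.dumps(tool))
    lines.append('</tools>')
    return newline.join(lines)

prompt = render_system_prompt(
    'You are a helpful assistant.',
    [{'name': 'search_listings',
      'description': 'Search for Airbnb listings with various filters'}])
```

\n<p class=\"wp-block-paragraph\">Whatever list_tools returns is rendered as-is, which is exactly why unvetted descriptions matter.<\/p>\n<p class=\"wp-block-paragraph\">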
So here I am focused on the second problem.<\/p>\n<p class=\"wp-block-paragraph\">At the same time, I am leaving aside the problems caused by the tons of low-quality MCPs on the market: some of them do not work, and some have generated tool descriptions that can confuse the model. The problem I focus on here is non-standardised tool and parameter descriptions, which can be the reason LLMs misbehave with some tools.<\/p>\n<p>Instead of a conclusion for the introduction:<\/p>\n<p class=\"wp-block-paragraph\">If your agentic LLM-powered pipeline fails with the tools you have, you can:<\/p>\n<p>Just choose a more powerful, modern and expensive LLM API;<\/p>\n<p>Revisit your tools and their instructions overall.<\/p>\n<p class=\"wp-block-paragraph\">Both can work. Make your decision or ask your AI assistant to make a decision for you\u2026<\/p>\n<p>Formal part of the work: research<\/p>\n<p>1. Examples of different descriptions<\/p>\n<p class=\"wp-block-paragraph\">Searching through real MCPs on the market and checking their tool lists and descriptions, I found many examples of the mentioned issue. Here I provide a single example from each of two MCPs in different domains (in real-life cases the list of MCPs a model uses tends to span different domains):<\/p>\n<p class=\"wp-block-paragraph\">Example 1:\u00a0<\/p>\n<p class=\"wp-block-paragraph\">Tool description: \u201cGenerate a area chart to show data trends under continuous independent variables and observe the overall data trend, such as, displacement = velocity (average or instantaneous) \u00d7 time: s = v \u00d7 t. 
If the x-axis is time (t) and the y-axis is velocity (v) at each moment, an area chart allows you to observe the trend of velocity over time and infer the distance traveled by the area\u2019s size.\u201d,<\/p>\n<p class=\"wp-block-paragraph\">\u201cData\u201d property description: \u201cData for area chart, it should be an array of objects, each object contains a `time` field and a `value` field, such as, [{ time: \u20182015\u2019, value: 23 }, { time: \u20182016\u2019, value: 32 }], when stacking is needed for area, the data should contain a `group` field, such as, [{ time: \u20182015\u2019, value: 23, group: \u2018A\u2019 }, { time: \u20182015\u2019, value: 32, group: \u2018B\u2019 }].\u201d<\/p>\n<p class=\"wp-block-paragraph\">Example 2:<\/p>\n<p class=\"wp-block-paragraph\">Tool description: \u201cSearch for Airbnb listings with various filters and pagination. Provide direct links to the user\u201d,<\/p>\n<p class=\"wp-block-paragraph\">\u201cLocation\u201d property description: \u201cLocation to search for (city, state, etc.)\u201d<\/p>\n<p class=\"wp-block-paragraph\">I am not saying that either of these descriptions is incorrect; they are just very different in format and level of detail.<\/p>\n<p>2. Dataset and benchmark<\/p>\n<p class=\"wp-block-paragraph\">To show that different tool descriptions can change a model\u2019s behavior I used NVIDIA\u2019s <a href=\"https:\/\/huggingface.co\/datasets\/nvidia\/When2Call\" rel=\"nofollow noopener\" target=\"_blank\">\u201cWhen2Call\u201d<\/a> dataset. From this dataset I took the test samples that offer multiple tools for the model to choose from, exactly one of which is the correct choice (according to the dataset, calling that specific tool is correct, rather than calling any other tool or providing a text answer without a tool call). The idea of the benchmark is to count correct and incorrect tool calls; I also count \u201cno tool call\u201d cases as incorrect. 
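<\/p>\n<p class=\"wp-block-paragraph\">The scoring just described can be sketched as follows (a simplification with hypothetical field names, not the actual When2Call harness):<\/p>\n

```python
def tool_call_accuracy(samples):
    # Each sample pairs the expected tool with the tool the model actually
    # called; None means the model answered in text without any tool call,
    # which is counted as incorrect, as described above.
    correct = sum(1 for s in samples
                  if s['called'] is not None and s['called'] == s['expected'])
    return correct / len(samples)

accuracy = tool_call_accuracy([
    {'expected': 'get_weather', 'called': 'get_weather'},  # correct tool
    {'expected': 'get_weather', 'called': 'get_news'},     # wrong tool
    {'expected': 'get_weather', 'called': None},           # no tool call
    {'expected': 'get_news', 'called': 'get_news'},        # correct tool
])
# accuracy is 0.5: two of the four calls hit the expected tool
```

\n<p class=\"wp-block-paragraph\">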
For the LLM I selected OpenAI\u2019s \u201cgpt-5-nano\u201d.<\/p>\n<p>3. Data generation<\/p>\n<p class=\"wp-block-paragraph\">The original dataset provides just a single description per tool. To create alternative descriptions for each tool and parameter I used \u201cgpt-5-mini\u201d to generate them from the current ones, with the following instruction to make them more elaborate (after generation there was an additional step of validation, with re-generation when necessary):<\/p>\n<p class=\"wp-block-paragraph\">\u00a0\u201c\u201d\u201dYou will receive the tool definition in JSON format. Your task is to make the tool description more detailed, so it can be used by a weak model.<\/p>\n<p class=\"wp-block-paragraph\">One of the ways to complicate \u2014 insert detailed description of how it works and examples of how to use.<\/p>\n<p class=\"wp-block-paragraph\">Example of detailed descriptions:<\/p>\n<p class=\"wp-block-paragraph\">Tool description: \u201cGenerate a area chart to show data trends under continuous independent variables and observe the overall data trend, such as, displacement = velocity (average or instantaneous) \u00d7 time: s = v \u00d7 t. 
If the x-axis is time (t) and the y-axis is velocity (v) at each moment, an area chart allows you to observe the trend of velocity over time and infer the distance traveled by the area\u2019s size.\u201d,<\/p>\n<p class=\"wp-block-paragraph\">Property description: \u201cData for area chart, it should be an array of objects, each object contains a `time` field and a `value` field, such as, [{ time: \u20182015\u2019, value: 23 }, { time: \u20182016\u2019, value: 32 }], when stacking is needed for area, the data should contain a `group` field, such as, [{ time: \u20182015\u2019, value: 23, group: \u2018A\u2019 }, { time: \u20182015\u2019, value: 32, group: \u2018B\u2019 }].\u201d<\/p>\n<p class=\"wp-block-paragraph\">Return the updated detailed description strictly in JSON format (just change the descriptions, do not change the structure of the inputted JSON). Start your answer with:<\/p>\n<p class=\"wp-block-paragraph\">\u201cNew JSON-formatted: \u2026\u201d<\/p>\n<p class=\"wp-block-paragraph\">\u201c\u201d\u201d<\/p>\n<p>4. 
Experiments<\/p>\n<p class=\"wp-block-paragraph\">To test the hypothesis I ran several experiments, namely:<\/p>\n<p>Measure the baseline model performance on the selected benchmark (Baseline);<\/p>\n<p>Replace the correct tool\u2019s descriptions (both the tool description itself and the parameter descriptions; the same applies in all the experiments) with the generated ones (Correct tool replaced);<\/p>\n<p>Replace the incorrect tools\u2019 descriptions with the generated ones (Incorrect tool replaced);<\/p>\n<p>Replace all tool descriptions with the generated ones (All tools replaced).<\/p>\n<p class=\"wp-block-paragraph\">Here is a table with the results of these experiments (for each experiment 5 evaluations were executed, so the standard deviation (std) of accuracy is provided in addition to the mean):<\/p>\n<table><thead><tr><th>Method<\/th><th>Mean accuracy<\/th><th>Accuracy std<\/th><th>Maximum accuracy over 5 experiments<\/th><\/tr><\/thead><tbody><tr><td>Baseline<\/td><td>76.5%<\/td><td>0.03<\/td><td>79.0%<\/td><\/tr><tr><td>Correct tool replaced<\/td><td>80.5%<\/td><td>0.03<\/td><td>85.2%<\/td><\/tr><tr><td>Incorrect tool replaced<\/td><td>75.1%<\/td><td>0.01<\/td><td>76.5%<\/td><\/tr><tr><td>All tools replaced<\/td><td>75.3%<\/td><td>0.04<\/td><td>82.7%<\/td><\/tr><\/tbody><\/table>\n<p>Table 1. Results of the experiments. Table prepared by the author.<\/p>\n<p>Conclusion<\/p>\n<p class=\"wp-block-paragraph\">From the table above it is evident that complicating tool descriptions introduces a bias: the selected LLM tends to choose the tool with the more detailed description. 
At the same time, we can see that extended descriptions can confuse the model (as in the all-tools-replaced case).<\/p>\n<p class=\"wp-block-paragraph\">The table shows that tool descriptions provide a mechanism to manipulate and significantly adjust the model\u2019s behaviour and accuracy, especially considering that the selected benchmark operates with a small number of tools at each model call: the average number of tools per sample is 4.35.<\/p>\n<p class=\"wp-block-paragraph\">It also clearly indicates that LLMs can have tool biases that could potentially be misused by MCP providers, similar to the <a href=\"https:\/\/medium.com\/towards-artificial-intelligence\/fighting-style-collapse-reinforcement-learning-with-bit-lora-for-llm-style-personalization-46e818f7495e\" rel=\"nofollow noopener\" target=\"_blank\">style biases<\/a> I reported before. Research into these biases and their misuse can be important for further studies.<\/p>\n<p>Engineering a solution<\/p>\n<p class=\"wp-block-paragraph\">I\u2019ve prepared a PoC of tooling to address the mentioned issue in practice: Master-MCP. Master-MCP is a proxy MCP server that can be connected to any number of MCPs and can itself be connected to an agent \/ LLM as a single MCP server (currently an stdio-transport MCP server). The default features of Master-MCP I\u2019ve implemented:<\/p>\n<p>Ignore some parameters. The implemented mechanics exclude every parameter whose name starts with the \u201c_\u201d symbol from the tool\u2019s parameter schema. Such a parameter can later be inserted programmatically or take a default value (if provided).<\/p>\n<p>Tool description adjustments. Master-MCP collects all the tools and their descriptions from the connected MCP servers and gives the user a way to adjust them. 
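<\/p>\n<p class=\"wp-block-paragraph\">The parameter-ignoring feature described above can be sketched like this (a conceptual sketch with hypothetical names, not Master-MCP\u2019s actual code):<\/p>\n

```python
def strip_private_params(schema):
    # Remove every parameter whose name starts with '_' from the JSON schema
    # exposed to the LLM; the hidden values are re-inserted by the proxy.
    public = {k: v for k, v in schema['properties'].items()
              if not k.startswith('_')}
    required = [k for k in schema.get('required', []) if not k.startswith('_')]
    return {'type': 'object', 'properties': public, 'required': required}

def call_tool(tool_fn, llm_args, injected):
    # Merge the LLM-generated arguments with programmatically injected ones
    # before invoking the real tool on the downstream MCP server.
    return tool_fn(**llm_args, **injected)

schema = {'type': 'object',
          'properties': {'query': {'type': 'string'},
                         '_user_id': {'type': 'string'}},
          'required': ['query', '_user_id']}
visible = strip_private_params(schema)
# 'visible' no longer mentions _user_id, so the LLM never sees or generates it
```

\n<p>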
For description adjustments, Master-MCP exposes a method with a simple UI to edit this list (JSON schema), so the user can experiment with different tool descriptions.<\/p>\n<p class=\"wp-block-paragraph\">I invite everyone interested to join the project. With community support, the plans can include extending Master-MCP\u2019s functionality, for example:<\/p>\n<p>Logging and monitoring followed by advanced analytics;<\/p>\n<p>Tool hierarchy and orchestration (including ML-powered) to combine modern context management techniques and smart algorithms.<\/p>\n<p class=\"wp-block-paragraph\">Current GitHub page of the project: <a href=\"https:\/\/github.com\/MarvinRomson\/master-mcp\" rel=\"nofollow noopener\" target=\"_blank\">link<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"Intro applications powered by Large Language Models (LLMs) require integration with external services, for example integration with Google&hellip;\n","protected":false},"author":2,"featured_media":215603,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[20],"tags":[365,21959,363,364,6033,8952,88831,111,139,69,145],"class_list":{"0":"post-215602","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-artificial-intelligence","8":"tag-ai","9":"tag-ai-agent","10":"tag-artificial-intelligence","11":"tag-artificialintelligence","12":"tag-llm","13":"tag-llm-applications","14":"tag-mcp","15":"tag-new-zealand","16":"tag-newzealand","17":"tag-nz","18":"tag-technology"},"_links":{"self":[{"href":"https:\/\/www.newsbeep.com\/nz\/wp-json\/wp\/v2\/posts\/215602","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.newsbeep.com\/nz\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.newsbeep.com\/nz\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/nz\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"
https:\/\/www.newsbeep.com\/nz\/wp-json\/wp\/v2\/comments?post=215602"}],"version-history":[{"count":0,"href":"https:\/\/www.newsbeep.com\/nz\/wp-json\/wp\/v2\/posts\/215602\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/nz\/wp-json\/wp\/v2\/media\/215603"}],"wp:attachment":[{"href":"https:\/\/www.newsbeep.com\/nz\/wp-json\/wp\/v2\/media?parent=215602"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.newsbeep.com\/nz\/wp-json\/wp\/v2\/categories?post=215602"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.newsbeep.com\/nz\/wp-json\/wp\/v2\/tags?post=215602"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}