The agent writes a query and executes it. If the agent does not know how to write a particular type of query, it can use GraphQL introspection. The agent receives only the minimal amount of data requested by the GraphQL query, saving valuable tokens.
It works better!
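Here is a minimal sketch of the two kinds of request involved. It assumes a GraphQL endpoint that accepts the common JSON body with query and variables fields. The __schema, queryType, mutationType, types, name and kind introspection fields come from the GraphQL specification. The users field, the query names and the build_graphql_body helper are hypothetical.

```python
import json
from typing import Optional

# Spec-defined introspection: only type names and kinds, not every field,
# so the agent gets a cheap overview of what the schema contains.
SCHEMA_OVERVIEW_QUERY = """
query SchemaOverview {
  __schema {
    queryType { name }
    mutationType { name }
    types { name kind }
  }
}
"""

# Hypothetical data query: the agent selects only the two fields it needs,
# so the response carries nothing else.
USERS_QUERY = """
query Users($first: Int!) {
  users(first: $first) { id name }
}
"""


def build_graphql_body(query: str, variables: Optional[dict] = None) -> str:
    """Serialize a GraphQL request as a JSON body with query and optional variables."""
    payload = {"query": query}
    if variables:
        payload["variables"] = variables
    return json.dumps(payload)


overview_body = build_graphql_body(SCHEMA_OVERVIEW_QUERY)
users_body = build_graphql_body(USERS_QUERY, {"first": 5})
```

Both bodies would be POSTed to the same endpoint. Headers and authentication are left out here.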
Not only do we not need to load 50+ tools (our entire SDK), but it also solves the N+1 problem you get with traditional REST APIs. You also don't need to fall back to writing code, especially for queries and mutations. If you do need to, the SDK is always available and follows the typed GraphQL schema, which helps agents write better code!
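To make the N+1 point concrete, here is a toy simulation that counts client round trips. All data and method names are hypothetical. Fetching N posts over REST and then fetching each post's author costs 1 + N requests. One nested GraphQL query costs one. The server may still perform N lookups internally unless it batches them (DataLoader is a common tool for that), so this only models the client side.

```python
# Hypothetical in-memory backend data.
POSTS = [
    {"id": 1, "author_id": 10, "title": "Intro"},
    {"id": 2, "author_id": 11, "title": "Schemas"},
    {"id": 3, "author_id": 10, "title": "Agents"},
]
AUTHORS = {10: {"id": 10, "name": "Ada"}, 11: {"id": 11, "name": "Grace"}}


class CountingClient:
    """Fake client where every method call counts as one network round trip."""

    def __init__(self):
        self.round_trips = 0

    def rest_get(self, path: str):
        self.round_trips += 1
        if path == "/posts":
            return [dict(p) for p in POSTS]
        if path.startswith("/authors/"):
            return dict(AUTHORS[int(path.rsplit("/", 1)[1])])
        raise KeyError(path)

    def graphql_posts_with_authors(self):
        # Stands in for: { posts { title author { name } } }
        # The server resolves the nested author field, so the client pays one trip.
        self.round_trips += 1
        return [
            {"title": p["title"], "author": {"name": AUTHORS[p["author_id"]]["name"]}}
            for p in POSTS
        ]


def fetch_via_rest(client: CountingClient):
    posts = client.rest_get("/posts")  # 1 request for the list
    return [
        {
            "title": p["title"],
            # ...plus 1 request per post for its author: the "+N"
            "author": {"name": client.rest_get(f"/authors/{p['author_id']}")["name"]},
        }
        for p in posts
    ]


rest_client, gql_client = CountingClient(), CountingClient()
rest_result = fetch_via_rest(rest_client)
gql_result = gql_client.graphql_posts_with_authors()
print(rest_client.round_trips, gql_client.round_trips)  # 4 1
```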
While I was never a big fan of GraphQL before, considering the state of MCP, I strongly believe it is one of the best technologies for AI agents.
Whoa there, you don't need to be so sadistic toward your team. It's not GraphQL that matters, but having a document that describes how your API works, including types.
I expect you could achieve the same with a comprehensive OpenAPI specification. If you want something a bit stricter I guess SOAP would work too, LLMs love XML after all.
One of my agents is kinda like this too. The only operation is a SPARQL query, and the only accessible state is the graph database.
Since most of the ontologies I'm using are public, I just have to name-drop them in the prompt; no schemas are needed and little structural introspection. At worst, it can just walk and dump triples to figure out the structure; it's all RDF triples and URIs.
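Walking the triples to recover structure can be as simple as grouping predicates by rdf:type. Below is a small sketch over a hypothetical in-memory triple list. The example.org resources are made up. The rdf:type and schema.org URIs are real. Against a real store you would more likely issue the SPARQL query quoted in the docstring.

```python
from collections import defaultdict

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def predicates_by_class(triples):
    """Map each class (object of rdf:type) to the sorted predicates its instances use.

    Roughly the SPARQL query:
      SELECT DISTINCT ?class ?p WHERE { ?s a ?class ; ?p ?o }
    except that rdf:type itself is left out and untyped subjects are grouped
    under "(untyped)".
    """
    classes_of = defaultdict(set)
    for s, p, o in triples:
        if p == RDF_TYPE:
            classes_of[s].add(o)

    summary = defaultdict(set)
    for s, p, _o in triples:
        if p == RDF_TYPE:
            continue
        for cls in classes_of.get(s, {"(untyped)"}):
            summary[cls].add(p)
    return {cls: sorted(preds) for cls, preds in summary.items()}


# Hypothetical data: example.org resources described with schema.org terms.
EX = "http://example.org/"
SDO = "https://schema.org/"
TRIPLES = [
    (EX + "alice", RDF_TYPE, SDO + "Person"),
    (EX + "alice", SDO + "name", "Alice"),
    (EX + "alice", SDO + "knows", EX + "bob"),
    (EX + "bob", SDO + "name", "Bob"),
]
```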
One nice property: using structured outputs, you can constrain outputs of certain queries to only generate valid RDF to avoid syntax errors. Probably can do similar stuff with GraphQL.
Isn't the challenge that introspecting GraphQL will lead to either a) a very long set of definitions consuming many tokens or b) many calls to drill into the introspection?
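Option b) does not have to be expensive. The GraphQL specification defines a __type(name:) introspection field, so after a names-only overview an agent can fetch just the one type it needs. The sketch below assumes the same JSON-over-HTTP convention as usual. The summarize_type helper and its "Type.field: Type" output format are hypothetical choices for keeping the prompt short.

```python
import json

# __type(name:) and the __Type fields used here are defined by the GraphQL spec.
# TypeRef unwraps up to three NON_NULL/LIST wrappers, e.g. [Post!]!.
TYPE_DETAILS_QUERY = """
query TypeDetails($name: String!) {
  __type(name: $name) {
    name
    kind
    fields { name type { ...TypeRef } }
  }
}

fragment TypeRef on __Type {
  name
  kind
  ofType { name kind ofType { name kind ofType { name kind } } }
}
"""


def type_details_body(type_name: str) -> str:
    return json.dumps({"query": TYPE_DETAILS_QUERY, "variables": {"name": type_name}})


def type_ref_to_str(ref: dict) -> str:
    """Render an introspected type reference in SDL notation, e.g. [Post!]!."""
    if ref["kind"] == "NON_NULL":
        return type_ref_to_str(ref["ofType"]) + "!"
    if ref["kind"] == "LIST":
        return "[" + type_ref_to_str(ref["ofType"]) + "]"
    return ref["name"]


def summarize_type(response: dict) -> list:
    """Compress an introspection response into one 'Type.field: Type' line per field."""
    t = response["data"]["__type"]
    # fields is null for scalars, enums, etc., hence the "or []".
    return [f"{t['name']}.{f['name']}: {type_ref_to_str(f['type'])}" for f in t["fields"] or []]
```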
Well, either that or stuff the tool usage examples into the prompt for every single request. If you have only 2-3 tools, GraphQL is certainly not necessary, but it won't blow up the context either. If you have 50+ tools, I honestly don't see any other way unless you create your own tool discovery solution. That is what GraphQL already does really well; the caveat with rolling your own is that whatever you come up with is certainly not natural to these LLMs.
Keep in mind that all LLMs are trained on many GraphQL examples because the technology has existed since 2015. While anything custom might just work, it is certainly not part of the model's training set unless you fine-tune.
So yes, if I need to decide on formats I will go for GraphQL, SQL and Markdown.
That is also the approach we took with Exograph (https://exograph.dev). Here is our reasoning (https://exograph.dev/blog/exograph-now-supports-mcp#comparin...). We found that LLMs do a very good job of crafting GraphQL queries for a given schema. While they do make mistakes, returning good, descriptive error messages makes it easy for them to fix their queries.
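The error-feedback loop described here can be sketched as follows. GraphQL responses report problems in a top-level errors list. Each entry carries a message and optional locations, and that shape comes from the spec. execute and repair are hypothetical hooks standing in for the HTTP call and the LLM call.

```python
def format_errors(response: dict) -> str:
    """Turn a GraphQL errors list into compact lines an LLM can act on."""
    lines = []
    for err in response.get("errors") or []:
        locs = ", ".join(
            f"line {loc['line']}, col {loc['column']}" for loc in err.get("locations") or []
        )
        lines.append(f"- {err['message']}" + (f" ({locs})" if locs else ""))
    return "\n".join(lines)


def run_with_repair(query: str, execute, repair, max_attempts: int = 3) -> dict:
    """Run a query; on errors, hand the messages to the model and retry.

    execute(query) -> response dict and repair(query, feedback) -> new query are
    hypothetical hooks. Any errors entry is treated as failure here, even though
    GraphQL allows partial data alongside errors.
    """
    for attempt in range(1, max_attempts + 1):
        response = execute(query)
        if not response.get("errors"):
            return response
        if attempt < max_attempts:
            query = repair(query, format_errors(response))
    raise RuntimeError("query still failing:\n" + format_errors(response))
```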
IMO the biggest pain points of graphql are authorization/rate limiting, caching, and mutations... But for selective context loading none of those matter actually. Pretty cool!
TLDR but it shows how you could teach an LLM your GraphQL query language to let it selectively load context into what were very small context windows at the time.
After that, the MCP specification came out, which from my vantage point is a poor, half-implemented version of what GraphQL already is.
Your use case is NOT everyone's use case (not working in depth across one codebase or API, but instead sampling dozens of abilities across the web or other systems). That's the thing.
How is that going to work with my use case: do a web search, make a local API call, do a GraphQL search, do an integration with Slack, send a message, etc.?
Does it matter? If it's well defined, each of those would be a node in the graph. Or can you elaborate?
Dozens doesn't seem like that many for a graph where a higher-level node would be Slack, and the agent only loads further if it needs anything related to Slack.
Or I'm not understanding.
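Here is a minimal sketch of that idea using a hypothetical tool catalog shaped as a tree. The agent first sees only top-level groups such as slack and web. It pulls in a group's tools, or one tool's parameters, only when the task needs them. A plain dict stands in for the graph to keep the example self-contained.

```python
# Hypothetical tool catalog: inner nodes group tools, leaves describe one tool.
CATALOG = {
    "slack": {
        "description": "Slack workspace operations",
        "children": {
            "send_message": {"description": "Post a message to a channel",
                             "params": {"channel": "string", "text": "string"}},
            "search_messages": {"description": "Search message history",
                                "params": {"query": "string"}},
        },
    },
    "web": {
        "description": "Web access",
        "children": {
            "search": {"description": "Run a web search", "params": {"query": "string"}},
        },
    },
}


def list_children(path: str = "") -> dict:
    """Return name -> description for the direct children of path only."""
    children = CATALOG
    for part in filter(None, path.split("/")):
        children = children[part].get("children", {})
    return {name: node["description"] for name, node in children.items()}


def describe_tool(path: str) -> dict:
    """Load the full spec of one node, e.g. 'slack/send_message', on demand."""
    node = {"children": CATALOG}
    for part in path.split("/"):
        node = node["children"][part]
    return node
```

The agent's initial context holds only the output of list_children(). Each deeper call costs tokens only when the task actually touches that branch.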
It doesn’t actually require that second part. Every time I’ve used it in a production system, we had an approved list of query shapes that were accepted. If the client wanted to use a new kind of query, it was performance tested and sometimes needed to be optimized before approval for use.
If you open it up for any possible query, then give that to uncontrolled clients, it’s a recipe for disaster.
GQL is an HTTP endpoint. The question is, how are you schematizing, documenting, validating, code-generating, monitoring, etc. the request and response on your HTTP endpoints? (OpenAPI is another good choice.)
Really? Hmm... where in the HTTP spec does it allow for returning an arbitrary subset of any specific request, rather than the whole thing? And where does it ensure all the results are keyed by id so that you can actually build and update a sensible cache around all of it rather than the mess that totally free-form HTTP responses lead to? Oh weird HTTP doesn't have any of that stuff? Maybe we should make a new spec, something which does allow for these patterns and behaviors? And it might be confusing if we use the exact same name as HTTP, since the usage patterns are different and it enables new abilities. If only we could think of such a name...
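For what it's worth, the GraphQL spec itself does not require ids. The caching behaviour described comes from client conventions such as Apollo Client's normalized cache, which keys objects by __typename plus id (for example "User:1"). Here is a minimal sketch of that normalization with hypothetical response data.

```python
def normalize(value, store: dict):
    """Flatten a GraphQL response into store, keyed by "Typename:id".

    Any object carrying both __typename and id is saved (merging with fields
    already cached under that key) and replaced by a {"__ref": key} pointer.
    Objects without an id stay inline.
    """
    if isinstance(value, list):
        return [normalize(item, store) for item in value]
    if not isinstance(value, dict):
        return value
    fields = {k: normalize(v, store) for k, v in value.items()}
    if "__typename" in value and "id" in value:
        key = f"{value['__typename']}:{value['id']}"
        store.setdefault(key, {}).update(fields)
        return {"__ref": key}
    return fields


cache = {}
first = normalize({"viewer": {"__typename": "User", "id": "1", "name": "Ada"}}, cache)
second = normalize(
    {"post": {"__typename": "Post", "id": "9", "title": "Hi",
              "author": {"__typename": "User", "id": "1", "email": "ada@example.com"}}},
    cache,
)
```

Both responses point at the same User:1 entry. The second query therefore enriches the cached user instead of storing a second copy.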
An HTTP Range request asks the server to send parts of a resource back to a client. Range requests are useful for various clients, including media players that support random access, data tools that require only part of a large file, and download managers that let users pause and resume a download.
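For comparison, a Range request is byte-oriented. The client asks for an offset range of one representation, and a server that supports it replies 206 Partial Content with a Content-Range header. It does not select fields. Here is a small standard-library sketch that builds, but does not send, such a request and parses the Content-Range value. The URL is a placeholder.

```python
import re
import urllib.request
from typing import Optional, Tuple


def range_request(url: str, start: int, end: int) -> urllib.request.Request:
    """Build (but do not send) a GET for bytes start..end, both inclusive."""
    return urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})


def parse_content_range(value: str) -> Tuple[int, int, Optional[int]]:
    """Parse a 206 response's 'bytes 0-499/1234' into (start, end, total).

    total is None when the server sends '*' (size unknown).
    """
    m = re.fullmatch(r"bytes (\d+)-(\d+)/(\d+|\*)", value.strip())
    if m is None:
        raise ValueError(f"unsupported Content-Range: {value!r}")
    start, end, total = m.groups()
    return int(start), int(end), None if total == "*" else int(total)


req = range_request("https://example.com/large-file.bin", 0, 499)
# urllib.request.urlopen(req) would send it; a server supporting ranges answers 206.
```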
Because it solves all sorts of other problems, like having a well-defined way to specify the schema of queries and results, and lots of tools built around that.
I would be surprised to see many (or any) GQL endpoints in systems with significant complexity and scale that allow completely arbitrary requests.
Yep, OpenAPI is also a good choice nowadays. That’s typically used with the assumption you’ve chosen a supported subset of queries. With GQL you have to add that on top.
Probably for one of the reasons GraphQL was created in the first place: to accomplish a set of fairly complex operations with one API call rather than a multitude of them. The set can be "everything" or it can be "this well-defined subset".
I think they mean something like (or what I think of as) “RPC calls, but with the flexibility to select a granular subset of the result based on one or more schemas”. This is how I’ve used graphql in the past at least.
> I am wondering why you're using graphql if you are kneecapping it and restricting it to set queries.
Because you never want to expose unbounded unlimited dynamic queries in production. You do want a very small subset that you can monitor, debug, and optimize.
It's not. The fragments you can execute are limited if you do it right. A client isn't allowed to just execute anything it wants, because the valid operations are pre-determined. The client sends a reference which executes a specific pre-planned fragment of code.
In development, you let clients roam free, so you have access to the API in a full manner. Deployments then lock-down the API. If you just let a client execute anything it wants in production, you get into performance-trouble very easily once a given client decides to be adventurous.
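One way to implement that lock-down is a persisted-query allowlist. Approved query shapes are registered ahead of time, and clients send only an id. The sketch below is hypothetical: the registry class, the request shape and the whitespace-collapsing hash are illustrative choices. Apollo's automatic persisted queries use a SHA-256 hash in a similar spirit. A development flag lets arbitrary queries through.

```python
import hashlib
import re


def query_id(query: str) -> str:
    """Stable id: SHA-256 of the query text with runs of whitespace collapsed.

    Naive on purpose: it also collapses whitespace inside string literals.
    """
    normalized = re.sub(r"\s+", " ", query).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class PersistedQueryRegistry:
    """Hypothetical allowlist of approved query shapes."""

    def __init__(self, allow_arbitrary: bool = False):
        # allow_arbitrary=True mimics development, where clients roam free.
        self.allow_arbitrary = allow_arbitrary
        self._approved = {}

    def approve(self, query: str) -> str:
        """Register a reviewed, performance-tested query; returns its id."""
        qid = query_id(query)
        self._approved[qid] = query
        return qid

    def resolve(self, request: dict) -> str:
        """Return the query text to execute for a request, or refuse it."""
        if "id" in request:
            if request["id"] not in self._approved:
                raise PermissionError(f"unknown persisted query id: {request['id']}")
            return self._approved[request["id"]]
        if self.allow_arbitrary and "query" in request:
            return request["query"]
        raise PermissionError("ad-hoc queries are disabled; send an approved query id")


prod = PersistedQueryRegistry()
viewer_id = prod.approve("query Viewer { viewer { id name } }")
```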
GraphQL is an execution semantics. It's very close to a lambda calculus, but I don't think that was by design. I think that came about by accident. A client is really sending a small fragment of code to the server, which the server then executes. The closest thing you have is probably SQL queries: the client sends a query to the server, which the server then executes.
It's fundamental to the idea of GraphQL as well. You want to put power into the hands of the client, because that's what allows a top-down approach to UX design. If you always have to manipulate the server-side whenever a client wants to change call structure, you've lost.
No one exposes SQL to clients, though. I think where GQL differs from SQL is that it’s at a higher level. SQL bleeds performance and data layout (e.g. normalizing, limits); GraphQL does not.
It’s not clear if it’s high enough to abstract knowledge away from storage. In the end, it’s a tension between enabling the client to wander around productively and being a bull in a china shop.
> I strongly believe it is one of the best technologies for AI agents
Do you have any quantitative evidence to support this?
Sincere question. I feel it would add some much-needed credibility in a space where many folks are abusing the hype wave and low-key shilling their products with vibes instead of rigor.
I have thought about this for all of thirty seconds, but it wouldn't shock me if this was the case. The intuition here is about types, and the ability to introspect them. Agents really love automated guardrails. It makes sense to me that this would work better than RESTish stuff, even with OpenAPI.
Same in terms of time spent. The hypothesis that GraphQL is superior passes the basic sniff test. Assuming GraphQL does what it says on the tin (which, based on my work with Ent, I understand it does), the claim that it’s better for tool and API use by agents follows from common sense.
This is a task I think is suited to a small sub-agent. It can take the context beating of querying for relevant tools and return only what is necessary to the main agent thread.
I've seen a similar setup with an LLM loop integrated with Clojure. In Clojure, code is data, so the LLM can query, execute, and modify the program directly.
If you know GraphQL, you may see it immediately: you ask for a specific nested structure of data, which can span many joins across different related collections. That is not the case with a typical REST API or a CLI, for example. Introspection is another good reason.
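To illustrate the nested-selection point, here is a toy executor over hypothetical in-memory collections. The nested dict plays the role of the query { users { name posts { title comments { text } } } }. Each level follows a relation (a join) to another collection, and only the requested fields come back. Real GraphQL servers parse query text and use per-field resolvers; this sketch skips parsing entirely.

```python
# Hypothetical collections linked by foreign keys.
USERS = [
    {"id": 1, "name": "Ada", "email": "ada@example.com"},
    {"id": 2, "name": "Grace", "email": "grace@example.com"},
]
POSTS = [
    {"id": 10, "user_id": 1, "title": "Engines"},
    {"id": 11, "user_id": 2, "title": "Compilers"},
    {"id": 12, "user_id": 1, "title": "Notes"},
]
COMMENTS = [{"id": 100, "post_id": 10, "text": "Nice"}]

# (parent type, field) -> (child type, function returning related rows)
RELATIONS = {
    ("User", "posts"): ("Post", lambda user: [p for p in POSTS if p["user_id"] == user["id"]]),
    ("Post", "comments"): ("Comment", lambda post: [c for c in COMMENTS if c["post_id"] == post["id"]]),
}


def select(type_name: str, row: dict, selection: dict) -> dict:
    """Project row onto selection; nested dicts follow relations to other collections."""
    out = {}
    for field, sub in selection.items():
        if sub is True:  # leaf field: copy the scalar value
            out[field] = row[field]
        else:  # nested selection: join to related rows and recurse
            child_type, related = RELATIONS[(type_name, field)]
            out[field] = [select(child_type, child, sub) for child in related(row)]
    return out


def execute(selection: dict) -> dict:
    """Root resolver: the only root field here is users."""
    return {"users": [select("User", u, selection["users"]) for u in USERS]}


# Equivalent to: { users { name posts { title comments { text } } } }
result = execute({"users": {"name": True, "posts": {"title": True, "comments": {"text": True}}}})
```

The unrequested email field never leaves the "server", and one call returns data drawn from three collections.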
It is called GraphQL.
I wrote more about this here if you are interested: https://chatbotkit.com/reflections/why-graphql-beats-mcp-for...