If you are just prototyping, do the simplest thing that could possibly work. (But that's not a real solution, so I won't count it.)
2) Request Batching
This approach was popularised by Facebook's DataLoader library: https://github.com/facebook/dataloader
Instead of directly querying the database in the resolver, you return a specification of what data you are interested in. When all resolvers on the same level of the query have returned this specification, DataLoader figures out how to batch the queries together as much as possible.
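The core of the idea is small enough to sketch. This is a stripped-down toy, not the real DataLoader API: every `load(key)` made in the same tick is collected, and one batch function runs for all of them.

```javascript
// Toy batching loader (sketch of the DataLoader idea, not its actual API).
// All load() calls made in the same tick are dispatched as a single batch.
class TinyLoader {
  constructor(batchFn) {
    this.batchFn = batchFn; // (keys) => values, in the same order as keys
    this.queue = [];
  }
  load(key) {
    return new Promise((resolve) => {
      // Schedule one dispatch for all loads made during this tick.
      if (this.queue.length === 0) process.nextTick(() => this.dispatch());
      this.queue.push({ key, resolve });
    });
  }
  dispatch() {
    const batch = this.queue.splice(0);
    if (batch.length === 0) return;
    const values = this.batchFn(batch.map((item) => item.key));
    batch.forEach((item, i) => item.resolve(values[i]));
  }
}

// Hypothetical usage: every resolver that calls userLoader.load(id) during
// one tick shares a single "WHERE id IN (...)" query instead of n queries.
// const userLoader = new TinyLoader((ids) => db.fetchUsersByIds(ids));
```

The real library adds per-request caching, promise-based batch functions, and error handling on top of this, but the tick-based collection step is the heart of it.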
3) Big Join
You can inspect the GraphQL query in the top-level resolver and basically perform one giant join up front. You then pass the entire result down your resolver hierarchy, and all the child resolvers have to do is pick out the correct data from the complete result set. Take a look at join-monster if you are interested in this approach.
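A rough sketch of the shape of this, with invented table and column names (join-monster generates the SQL for you; this just shows what the top-level resolver ends up doing):

```javascript
// Sketch of the "big join" approach: one LEFT JOIN up front, then the rows
// are nested so child resolvers only pick values out of prefetched objects.
// Table/column names and the db.query interface are made up for illustration.
function resolveUsersWithPosts(db) {
  const rows = db.query(
    "SELECT u.id AS user_id, u.name, p.id AS post_id, p.title " +
    "FROM users u LEFT JOIN posts p ON p.user_id = u.id"
  );
  // Re-nest the flat join result into the shape the query asked for.
  const byUser = new Map();
  for (const r of rows) {
    if (!byUser.has(r.user_id)) {
      byUser.set(r.user_id, { id: r.user_id, name: r.name, posts: [] });
    }
    if (r.post_id != null) {
      byUser.get(r.user_id).posts.push({ id: r.post_id, title: r.title });
    }
  }
  return [...byUser.values()];
}
```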
The Graphcool hosted backend is serving millions of requests an hour using the Dataloader approach, and we have found this to be more flexible and more performant at scale than generating a big complex join up front.
For most queries we see single digit millisecond response time for a batched query, and the time complexity scales linearly with the depth of the query, which is not the case for joins.
Trying to access fields like `info.operation.selectionSet.selections[0].selectionSet.selections` always feels wrong to me. I wish the graphql library came with some utility functions for this.
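A small helper along these lines is easy to hand-roll. The object shapes below mirror what graphql-js puts in `info` (field nodes with `name.value` and an optional nested `selectionSet`), but this is a sketch, not a library function, and it ignores fragments and aliases:

```javascript
// Walk a graphql-js style selection set and return a tree of field names.
// Hand-rolled sketch; does not handle fragment spreads or aliases.
function selectionNames(selectionSet) {
  if (!selectionSet) return [];
  return selectionSet.selections.map((sel) => ({
    name: sel.name.value,
    children: selectionNames(sel.selectionSet),
  }));
}

// Usage inside a resolver would look something like:
// const tree = selectionNames(info.operation.selectionSet);
```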
Wow! Thank you, I have been looking for something like this. I mostly use NoSQL, which typically doesn't do joins (unless it is a graph database), but this looks really useful for when I do have a SQL database to work with.
Interesting approach for using GraphQL with a graph database.
There is also an integration for Neo4j [1] that works similarly (it translates GraphQL to Cypher), however it also exposes Cypher in GraphQL through a @cypher directive, which maps complex graph traversals/aggregations to a single GraphQL field.
@tylerbainbridge and I work at Conduit (https://conduithq.com), where we are using GraphQL, join-monster, and PostgreSQL in production.
What you describe isn't a limitation at all: some of our queries run 10 levels deep and work just fine. In fact, this method is so powerful that most of our queries start from the `user` object and just JOIN off of it.
It's a library working at a higher level, so problems like n+1 are left for the user to solve, since the implementation details depend heavily on the underlying data store.
I think this is something that graphql implementations are still trying to figure out. At work (samsara.com) we use a time-based batch[0] for our queries to help with n+1 issues in our graphql server.
Separate branches of the GraphQL query are handed to independent goroutines. When batchable RPCs (to the database, or other services) occur, we delay execution by ~1ms and wait for any other RPCs of the same type (and keep delaying, up to ~20ms total). This works pretty well for our more expensive field resolvers. We've also thought about ways to examine and track the execution of the goroutines (similar to the facebook/dataloader approach), but the time-based batches have worked pretty well for us so far.
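The time-window trick is independent of Go; here is a sketch of the same scheduling policy in JavaScript, with invented names and the ~1ms/~20ms numbers taken from the comment above. Each new request pushes the dispatch out by the small window, but never past a hard deadline measured from the first request:

```javascript
// Time-window batcher sketch: each add() delays dispatch by windowMs,
// but dispatch never slips more than maxMs past the first queued request.
// Class and method names are made up for illustration.
class WindowBatcher {
  constructor(batchFn, windowMs = 1, maxMs = 20) {
    this.batchFn = batchFn;
    this.windowMs = windowMs;
    this.maxMs = maxMs;
    this.queue = [];
    this.timer = null;
    this.deadline = 0;
  }
  add(request) {
    const now = Date.now();
    // First request in a batch fixes the hard deadline.
    if (this.queue.length === 0) this.deadline = now + this.maxMs;
    this.queue.push(request);
    // Push the dispatch out by windowMs, clamped to the deadline.
    if (this.timer) clearTimeout(this.timer);
    const wait = Math.max(0, Math.min(this.windowMs, this.deadline - now));
    this.timer = setTimeout(() => this.flush(), wait);
  }
  flush() {
    if (this.timer) { clearTimeout(this.timer); this.timer = null; }
    if (this.queue.length === 0) return;
    this.batchFn(this.queue.splice(0));
  }
}
```

Compared with DataLoader's tick-based collection, this trades a small fixed latency for the ability to batch across concurrent execution branches that don't share a tick.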
We also aggressively cache queries and rpcs so that the same data is not expensive to fetch across different branches of a query.
Old-timers who remember Enterprise JavaBeans 1.x entity beans are probably rolling with laughter right now. EJB had a very similar problem in 1999-2000 that doomed many systems to crappy performance and tarred its reputation forever.
History never fails to repeat itself: beware subsystems that propose to wrap and abstract your database for a new paradigm "automatically". They're leaky and difficult and take a decade to get right. See: ActiveRecord, EJB followed by JPA, .NET's various DB frameworks, Django, etc.
I wrote a little proof of concept in JavaScript showing one way of dealing with the problem [1]. Essentially, as you go through each node in the request, build up a query (say, a SQL query) that represents the fetching of that node by joining it with the query of its parent. Then, stitch the result of running each query back onto the result of running the parent query. Conceptually, this means one resolution function call for each node in the request, rather than one resolution function call for each node in the response.
However, it's rather verbose with repeated boilerplate, so to make the approach work I think you'd want to put a higher-level layer on top -- that's what I've done with a Python implementation [2]. I think the general approach is the valuable part, rather than the specific libraries, and seems more in keeping with how I'd normally write queries than the usual DataLoader approach. In particular, I've found it tends to improve performance when fetching large data sets since you don't need to call a resolution function on every single field that gets returned. (DataLoader implementations batch queries together to avoid the n+1 problem, but they mean that you need to implement a batching layer, and probably make your resolution functions asynchronous, which is especially expensive for large data sets).
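To make the "one query per request node" idea concrete, here is a toy sketch (not the linked libraries — the node shape and field names are invented). Each node's SQL joins onto its parent's SQL, so a users→posts request compiles to exactly two queries no matter how many users come back:

```javascript
// Toy query compiler: one SQL statement per node in the *request*, each
// joining onto its parent's statement. Node shape is invented:
// { table, fields, parentKey?, children? }.
function buildQueries(node, parent) {
  const queries = [];
  if (!parent) {
    node.sql = `SELECT ${node.fields.join(", ")} FROM ${node.table}`;
  } else {
    node.sql =
      `SELECT ${node.fields.map((f) => `${node.table}.${f}`).join(", ")} ` +
      `FROM ${node.table} JOIN (${parent.sql}) AS p ` +
      `ON ${node.table}.${node.parentKey} = p.id`;
  }
  queries.push(node.sql);
  for (const child of node.children || []) {
    queries.push(...buildQueries(child, node));
  }
  return queries;
}
```

The count of generated statements tracks the request tree, not the result set — which is exactly why this avoids calling a resolution function per returned row.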