If you are just prototyping, do the simplest thing that could possibly work. (But that's not a real solution, so I won't count it.)
2) Request Batching
This approach was popularised by Facebook's DataLoader library: https://github.com/facebook/dataloader
Instead of directly querying the database in the resolver, you return a specification of what data you are interested in. When all resolvers on the same level of the query have returned this specification, DataLoader figures out how to batch the queries together as much as possible.
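The core of the idea is small enough to sketch. This is a stripped-down toy, not the real DataLoader API: every `load(key)` made in the same tick is collected, and one batch function runs for all of them.

```javascript
// Toy batching loader (sketch of the DataLoader idea, not its actual API).
// All load() calls made in the same tick are dispatched as a single batch.
class TinyLoader {
  constructor(batchFn) {
    this.batchFn = batchFn; // (keys) => values, in the same order as keys
    this.queue = [];
  }
  load(key) {
    return new Promise((resolve) => {
      // Schedule one dispatch for all loads made during this tick.
      if (this.queue.length === 0) process.nextTick(() => this.dispatch());
      this.queue.push({ key, resolve });
    });
  }
  dispatch() {
    const batch = this.queue.splice(0);
    if (batch.length === 0) return;
    const values = this.batchFn(batch.map((item) => item.key));
    batch.forEach((item, i) => item.resolve(values[i]));
  }
}

// Hypothetical usage: every resolver that calls userLoader.load(id) during
// one tick shares a single "WHERE id IN (...)" query instead of n queries.
// const userLoader = new TinyLoader((ids) => db.fetchUsersByIds(ids));
```

The real library adds per-request caching, promise-based batch functions, and error handling on top of this, but the tick-based collection step is the heart of it.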
3) Big Join
You can inspect the GraphQL query in the top-level resolver and basically perform one giant join up front. You then pass the entire result down your resolver hierarchy, and all the child resolvers have to do is pick out the correct data from the complete result set. Take a look at join-monster if you are interested in this approach.
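A rough sketch of the shape of this, with invented table and column names (join-monster generates the SQL for you; this just shows what the top-level resolver ends up doing):

```javascript
// Sketch of the "big join" approach: one LEFT JOIN up front, then the rows
// are nested so child resolvers only pick values out of prefetched objects.
// Table/column names and the db.query interface are made up for illustration.
function resolveUsersWithPosts(db) {
  const rows = db.query(
    "SELECT u.id AS user_id, u.name, p.id AS post_id, p.title " +
    "FROM users u LEFT JOIN posts p ON p.user_id = u.id"
  );
  // Re-nest the flat join result into the shape the query asked for.
  const byUser = new Map();
  for (const r of rows) {
    if (!byUser.has(r.user_id)) {
      byUser.set(r.user_id, { id: r.user_id, name: r.name, posts: [] });
    }
    if (r.post_id != null) {
      byUser.get(r.user_id).posts.push({ id: r.post_id, title: r.title });
    }
  }
  return [...byUser.values()];
}
```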
The Graphcool hosted backend is serving millions of requests an hour using the Dataloader approach, and we have found this to be more flexible and more performant at scale than generating a big complex join up front.
For most queries we see single digit millisecond response time for a batched query, and the time complexity scales linearly with the depth of the query, which is not the case for joins.
Trying to access fields like `info.operation.selectionSet.selections[0].selectionSet.selections` always feels wrong to me. I wish the graphql library came with some utility functions for this.
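A small helper along these lines is easy to hand-roll. The object shapes below mirror what graphql-js puts in `info` (field nodes with `name.value` and an optional nested `selectionSet`), but this is a sketch, not a library function, and it ignores fragments and aliases:

```javascript
// Walk a graphql-js style selection set and return a tree of field names.
// Hand-rolled sketch; does not handle fragment spreads or aliases.
function selectionNames(selectionSet) {
  if (!selectionSet) return [];
  return selectionSet.selections.map((sel) => ({
    name: sel.name.value,
    children: selectionNames(sel.selectionSet),
  }));
}

// Usage inside a resolver would look something like:
// const tree = selectionNames(info.operation.selectionSet);
```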
Wow! Thank you, I have been looking for something like this. I mostly use NoSQL, which typically doesn't do joins (unless it is a graph database), but this looks really useful for when I do have a SQL database to work with.
Interesting approach for using GraphQL with a graph database.
There is also an integration for Neo4j [1] that works similarly (it translates GraphQL to Cypher), however it also exposes Cypher in GraphQL through a @cypher directive, which maps complex graph traversals/aggregations to a single GraphQL field.
@tylerbainbridge and I work at Conduit (https://conduithq.com), where we are using GraphQL, join-monster, and PostgreSQL in production.
What you describe isn't a limitation at all: some of our queries run 10 levels deep and work just fine. In fact, this method is so powerful that most of our queries start from the `user` object and just JOIN off of it.
It's a library working at a higher level, so problems like n+1 are left for the user to solve, since the implementation details depend heavily on the underlying data store.
I think this is something that graphql implementations are still trying to figure out. At work (samsara.com) we use a time-based batch[0] for our queries to help with n+1 issues in our graphql server.
Separate branches of the GraphQL query are handed to independent goroutines. When batchable RPCs (to the database, or other services) occur, we delay execution by ~1ms and wait for any other RPCs of the same type (and keep delaying, up to ~20ms total). This works pretty well for our more expensive field resolvers. We've also thought about ways to examine and track the execution of the goroutines (similar to the facebook/dataloader approach), but the time-based batches have worked pretty well for us so far.
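The time-window trick is independent of Go; here is a sketch of the same scheduling policy in JavaScript, with invented names and the ~1ms/~20ms numbers taken from the comment above. Each new request pushes the dispatch out by the small window, but never past a hard deadline measured from the first request:

```javascript
// Time-window batcher sketch: each add() delays dispatch by windowMs,
// but dispatch never slips more than maxMs past the first queued request.
// Class and method names are made up for illustration.
class WindowBatcher {
  constructor(batchFn, windowMs = 1, maxMs = 20) {
    this.batchFn = batchFn;
    this.windowMs = windowMs;
    this.maxMs = maxMs;
    this.queue = [];
    this.timer = null;
    this.deadline = 0;
  }
  add(request) {
    const now = Date.now();
    // First request in a batch fixes the hard deadline.
    if (this.queue.length === 0) this.deadline = now + this.maxMs;
    this.queue.push(request);
    // Push the dispatch out by windowMs, clamped to the deadline.
    if (this.timer) clearTimeout(this.timer);
    const wait = Math.max(0, Math.min(this.windowMs, this.deadline - now));
    this.timer = setTimeout(() => this.flush(), wait);
  }
  flush() {
    if (this.timer) { clearTimeout(this.timer); this.timer = null; }
    if (this.queue.length === 0) return;
    this.batchFn(this.queue.splice(0));
  }
}
```

Compared with DataLoader's tick-based collection, this trades a small fixed latency for the ability to batch across concurrent execution branches that don't share a tick.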
We also aggressively cache queries and rpcs so that the same data is not expensive to fetch across different branches of a query.
Old-timers who remember Enterprise JavaBeans 1.x entity beans are probably rolling with laughter right now. EJB had a very similar problem in 1999-2000 that doomed many systems to crappy performance and tarred its reputation forever.
History never fails to repeat itself: beware subsystems that propose to wrap and abstract your database for a new paradigm "automatically". They're leaky and difficult and take a decade to get right. See: ActiveRecord, EJB followed by JPA, .NET's various DB frameworks, Django, etc.
I wrote a little proof of concept in JavaScript showing one way of dealing with the problem [1]. Essentially, as you go through each node in the request, build up a query (say, a SQL query) that represents the fetching of that node by joining it with the query of its parent. Then, stitch the result of running each query back onto the result of running the parent query. Conceptually, this means one resolution function call for each node in the request, rather than one resolution function call for each node in the response.
However, it's rather verbose with repeated boilerplate, so to make the approach work I think you'd want to put a higher-level layer on top -- that's what I've done with a Python implementation [2]. I think the general approach is the valuable part, rather than the specific libraries, and seems more in keeping with how I'd normally write queries than the usual DataLoader approach. In particular, I've found it tends to improve performance when fetching large data sets since you don't need to call a resolution function on every single field that gets returned. (DataLoader implementations batch queries together to avoid the n+1 problem, but they mean that you need to implement a batching layer, and probably make your resolution functions asynchronous, which is especially expensive for large data sets).
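To make the "one query per request node" idea concrete, here is a toy sketch (not the linked libraries — the node shape and field names are invented). Each node's SQL joins onto its parent's SQL, so a users→posts request compiles to exactly two queries no matter how many users come back:

```javascript
// Toy query compiler: one SQL statement per node in the *request*, each
// joining onto its parent's statement. Node shape is invented:
// { table, fields, parentKey?, children? }.
function buildQueries(node, parent) {
  const queries = [];
  if (!parent) {
    node.sql = `SELECT ${node.fields.join(", ")} FROM ${node.table}`;
  } else {
    node.sql =
      `SELECT ${node.fields.map((f) => `${node.table}.${f}`).join(", ")} ` +
      `FROM ${node.table} JOIN (${parent.sql}) AS p ` +
      `ON ${node.table}.${node.parentKey} = p.id`;
  }
  queries.push(node.sql);
  for (const child of node.children || []) {
    queries.push(...buildQueries(child, node));
  }
  return queries;
}
```

The count of generated statements tracks the request tree, not the result set — which is exactly why this avoids calling a resolution function per returned row.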