Pagination implementation for real-life database #94
Recently @chandu0101 created a nice and relatively big example of a relay application which uses MongoDB and sangria (scala GraphQL implementation). If you don't mind looking at some scala code, then I would suggest to check out this function: It creates a relay Even though this example uses sangria-relay, conceptually it is very similar to reference implementation. |
All the solutions I found on the internet (including this one) seem to break if data is inserted into the database during the lifetime of a cursor. Am I wrong? Is there a generic solution? |
I thought about a solution to solve the data insertion:
I have a compliant implementation for a Mongoose database that I can share if anyone is interested. My only concern is how long I should keep the queries in my Redis server. |
@mattecapu this package relay-mongodb-connection creates relay connection from mongodb cursors, it also uses |
Thank you @sibelius and @OlegIlyenko, your links were a good starting place for understanding what was going on in the creation of a ConnectionType response. My solution was to not use Essentially I encode information about the ordering field (i.e. the ID) in the cursor and then use that to query results with ID >= or <= than cursor. The My actual use case was a little more complicated because I'm fetching from multiple tables. Thus after retrieving the minimum required records I'm doing an additional round of sorting and cutting. I'd be happy to provide some code examples if someone needs them. |
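The idea of encoding the ordering field in the cursor can be sketched roughly as follows. This is a minimal illustration, not the commenter's actual code: the helper names (`encodeCursor`, `decodeCursor`, `buildKeysetQuery`), the `id:` prefix, and the `things` table are all assumptions, and it uses strict comparisons so the row the cursor points at is not returned again.

```javascript
// Hypothetical helpers: names are illustrative, not part of graphql-relay.
const encodeCursor = (id) => Buffer.from(`id:${id}`, 'utf8').toString('base64');

const decodeCursor = (cursor) => {
  const decoded = Buffer.from(cursor, 'base64').toString('utf8');
  const match = /^id:(\d+)$/.exec(decoded);
  if (!match) throw new Error(`Invalid cursor: ${cursor}`);
  return Number(match[1]);
};

// Build a parameterized keyset condition from Relay-style arguments.
// Table and column names ("things", "id") are assumptions for this sketch.
function buildKeysetQuery({ after, before, limit }) {
  const conditions = [];
  const params = [];
  if (after) {
    conditions.push('id > ?');
    params.push(decodeCursor(after));
  }
  if (before) {
    conditions.push('id < ?');
    params.push(decodeCursor(before));
  }
  const where = conditions.length ? ` WHERE ${conditions.join(' AND ')}` : '';
  params.push(limit);
  return { sql: `SELECT * FROM things${where} ORDER BY id ASC LIMIT ?`, params };
}
```

Because the cursor carries a value rather than an array offset, rows inserted or deleted elsewhere in the table do not shift the page boundary.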
@mattecapu I think you should provide some code examples to help people better understand how to implement a cursor |
Well the point is, if you implement a ConnectionType for an endpoint you just need to
Cursors can be any string you want. Normally Relay defaults to I used Then a request to my endpoint looks like this:
endpoint(after: "sdflkjsdlfkjslkdf", first: 10) {
# stuff
}
Basically, any request described by the specification is supported. Thus when I get the request I process it in the following way:
Using this "algorithm", only the needed data is fetched. While Notably, I return an object with the shape described above (and in the linked spec) and I don't use the EDIT: The above algorithm produces array with different orders depending on which of
Thanks to @Sytten for bringing this problem to my attention. EDIT2: The above ordering reflects pagination, not chronological order. Hence |
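For readers who want something concrete, the overall flow discussed in this comment (fetch one row more than requested, trim, and reverse backward pages) might look like the sketch below. It is only an illustration under stated assumptions: `TABLE` and `fetchRows` are an in-memory stand-in for a real SQL query, cursors are raw ids for brevity, and pages are returned in ascending id order.

```javascript
// In-memory stand-in for a table; a real resolver would run SQL instead.
const TABLE = Array.from({ length: 7 }, (_, i) => ({ id: i + 1 }));

// Simulates: SELECT * FROM t WHERE id > afterId AND id < beforeId
//            ORDER BY id <direction> LIMIT limit
function fetchRows({ afterId, beforeId, direction, limit }) {
  const rows = TABLE
    .filter((r) => (afterId == null || r.id > afterId) && (beforeId == null || r.id < beforeId))
    .sort((a, b) => (direction === 'ASC' ? a.id - b.id : b.id - a.id));
  return rows.slice(0, limit);
}

function paginate({ first, after, last, before }) {
  if ((first == null) === (last == null)) {
    throw new Error('Provide exactly one of first or last');
  }
  const backward = last != null;
  const size = backward ? last : first;
  // Ask for one extra row to learn whether another page exists.
  const fetched = fetchRows({
    afterId: after,
    beforeId: before,
    direction: backward ? 'DESC' : 'ASC',
    limit: size + 1,
  });
  const hasMore = fetched.length > size;
  const page = fetched.slice(0, size);
  // Backward pages are fetched in descending order; restore ascending order.
  if (backward) page.reverse();
  const edges = page.map((node) => ({ cursor: node.id, node }));
  return {
    edges,
    pageInfo: {
      hasNextPage: backward ? false : hasMore,
      hasPreviousPage: backward ? hasMore : false,
      startCursor: edges.length ? edges[0].cursor : null,
      endCursor: edges.length ? edges[edges.length - 1].cursor : null,
    },
  };
}
```

Only `size + 1` rows ever leave the data source, which is the point of the approach compared with loading every record and slicing in memory.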
That's a really great write-up @mattecapu. I'm going to close this out now but that is exactly the kind of thing that would work well in the documentation (could be something in a code comment, in the |
@mattecapu It seems the algorithm does not work if there was a deletion in the database between two paginated queries and/or if we want to sort by anything other than IDs, right? |
@mattecapu what happens when the ids are not ordered? i.e they are UUIDs, |
Sorry @GuillaumeLeclerc, I lost your question in a notification misunderstanding with GitHub. As laid out in the comment above, my algorithm has both of the limitations you noticed, but we can easily generalize it to overcome them. To support dynamic data getting deleted or added in between queries, paginating with Eventually, we can provide a further generalization by not fixing an order-provider field but letting it be dynamic, effectively allowing a lot of different orders on the data, which can come in handy. This is pretty simple to implement too, but gets complicated once you have to support dynamic data. @wincent I'll see what I can do! Thanks for the appreciation. |
I came across this issue and I wanted to share a Node.js library that I created a while ago. |
@mattecapu Great post! Just wanted to say it helped out a lot. We had some tweaking to do since we accept arguments other than the Relay args spec. |
Everything was working great with @mattecapu's solution. We have one query for events that returns data in a specific order. We have an events query like this:
{
viewer {
events(first: 1, inCountry: Japan) {
edges {
node {
id
name
}
}
}
}
}
But when you introduce |
@Naoto-Ida if I understood correctly and your order is deterministic, |
Our ORDER statement in our SQL would include something like:
We compute it by base64decoding it, and splitting it into So in the end, because there were not that many records and we had time constraints, we cached all the event records in Redis. We fetch them when a query with the same arguments comes in and then, based on the supplied cursor, slice the total records and serve the ones before/after it. |
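The caching workaround described above could be sketched like this. Everything here is an assumption made for illustration: a `Map` stands in for Redis, the record shape has only an `id`, and `getOrderedRecords`, `cacheKey`, and `sliceAfter` are hypothetical names. It only suits result sets small enough to hold in the cache, as the comment notes.

```javascript
// A Map stands in for Redis here; keys and record shapes are assumptions.
const cache = new Map();

// Cache key derived from the non-pagination arguments, so equal queries share one entry.
const cacheKey = (args) => JSON.stringify(Object.keys(args).sort().map((k) => [k, args[k]]));

// Load the fully ordered result once per distinct set of filter arguments.
function getOrderedRecords(filterArgs, loadAll) {
  const key = cacheKey(filterArgs);
  if (!cache.has(key)) cache.set(key, loadAll(filterArgs));
  return cache.get(key);
}

// Slice the cached, already-ordered list relative to a cursor (the record id here).
// An unknown cursor makes findIndex return -1, so the slice restarts from the beginning.
function sliceAfter(records, afterId, first) {
  const start = afterId == null ? 0 : records.findIndex((r) => r.id === afterId) + 1;
  return records.slice(start, start + first);
}
```

Since the cached list keeps whatever order the original query produced, the cursor only has to identify a record, not reproduce the ordering expression.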
Now I see the problem.
Your data will be shattered into several of these atomic operations, which, aggregated together, give you the most recent version of it. So for example if I want to retrieve attribute SELECT new_value FROM mytable WHERE object_id='34' ORDER BY timestamp DESC LIMIT 1 But now, if I want to know which state my DB was in at a given time, I'll simply exclude all updates done after a specific timestamp_
Voilà, I can now run queries against any version of my DB. Speaking for your specific case, @Naoto-Ida, I think you could get away with a far less disruptive change: create a table |
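To make the append-only idea concrete, here is a small in-memory sketch of reading a value "as of" a timestamp. The field names `object_id`, `new_value`, and `timestamp` follow the SQL quoted above; the `attribute` field, the sample rows, and the `valueAt` helper are assumptions added for illustration.

```javascript
// Append-only log of updates; field names mirror the SQL in the comment above,
// except `attribute`, which is an assumed column.
const updates = [
  { object_id: '34', attribute: 'name', new_value: 'Draft', timestamp: 100 },
  { object_id: '34', attribute: 'name', new_value: 'Final', timestamp: 200 },
  { object_id: '34', attribute: 'price', new_value: 10, timestamp: 150 },
];

// Latest value of an attribute as of `asOf` (defaults to "now").
function valueAt(log, objectId, attribute, asOf = Infinity) {
  let latest;
  for (const u of log) {
    if (u.object_id !== objectId || u.attribute !== attribute) continue;
    if (u.timestamp > asOf) continue; // ignore updates made after the snapshot
    if (latest === undefined || u.timestamp > latest.timestamp) latest = u;
  }
  return latest ? latest.new_value : undefined;
}
```

One way to apply this to pagination, under the same assumptions, is to store the snapshot timestamp in the cursor so that every page of a connection is computed against the same version of the data.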
@mattecapu why |
@luckydrq You add |
@luckydrq yeah it basically peeks at the next page to see if there is one. Re-reading my last post, it comes to me there's a less invasive way to support updates, just use an |
I implemented some helper functions (namely paginate) so that SQL support would be easy (ordering and filtering are supported). I followed @mattecapu's suggested approach. Usage:
import {
GraphQLObjectType,
} from 'graphql'
import {
connectionArgs,
connectionDefinitions, // needed by the connectionType helper below
} from 'graphql-relay'
import * as helpers from 'the-gist-linked-above' // not an actual npm lib yet :P
// helper
const connectionType = nodeType => connectionDefinitions({nodeType}).connectionType
const Query = new GraphQLObjectType({
name: 'Query',
fields: () => ({
// ... other queries here
things: {
type: connectionType(Thing),
args: connectionArgs,
resolve: (_, paginationArgs) => {
// you could get `orderBy` from args, but just hard-coded here for simplicity
return helpers.paginate(models.Thing, paginationArgs, {orderBy: [['name', 'ASC']]})
}
},
})
})
It's currently coupled with https://gist.github.com/pcattori/2bb645d587e45c9fdbcabf5cef7a7106 |
We use this in production: https://github.com/entria/graphql-mongoose-loader (it solves pagination and dataloader for Mongo, using Mongoose). We have the same concept for other data sources, such as REST APIs and SQL (Oracle and Postgres), and it is very easy to extend to any data source. |
@mattecapu Thanks for posting your solution here it was very helpful. I implemented the same logic in our project on top of our GraphQL and Relay implementations in PHP. |
@mattecapu I read your guide how to handle GraphQL connections with SQL. It's pretty much what I came up with when I was analyzing it myself (+ some details like how to handle hasNextPage). The problem is that the SQL query gets really complicated really fast when I add an optional sorting argument to the GraphQL connection - especially if the sorting can be a combination of fields. Do you have any tips how to handle that and how to do it efficiently? |
Hi @enumag, what exactly do you mean by 'SQL complexity'? The length in chars? Execution time? Other metrics? |
Primarily execution time and efficient usage of indexes on that table. Secondary is the SQL query length and number of conditions in WHERE clause but I'm already using an SQL builder so I can deal with that myself. |
Thank you @mattecapu. I had the same "algorithm" in my head, but I forgot to reverse the result when using last / before, which makes sense from the UI point of view. If someone is interested, here is my current implementation with Node, Apollo, and Knex on top of MySQL. Constructive feedback is more than welcome! I took some shortcuts that I could improve in the future:
const userSchema = gql`
type User implements Node {
id: ID!
email: String!
...
}
extend type Query {
...
usersPaginated(input: UserPaginatedInput!): UserPaginatedConnection
}
input UserPaginatedInput {
first: Int
after: ID
last: Int
before: ID
}
type UserPaginatedConnection {
pageInfo: PageInfo
edges: [UserPaginatedEdge]
}
type PageInfo {
hasNextPage: Boolean
endCursor: ID
hasPreviousPage: Boolean
startCursor: ID
}
type UserPaginatedEdge {
cursor: ID
node: User
}
...
`
const userResolvers = {
...
usersPaginated: async (parent, args, { db }, info) => {
try {
const { first, after, last, before } = args.input
let [hasNextPage, endCursor, hasPreviousPage, startCursor] = [false, null, false, null]
let res = []
if (!!first && !!last)
throw new ApolloError('Ambiguous query: first and last should not be used together')
if (!first && !last)
throw new ApolloError('Missing first or last argument')
const query = db.from('gql_user')
if (!!first) {
if (!!after)
query.where('id', '>', after)
query.limit(first + 1)
.orderBy('id', 'asc')
res = await query
if (res.length > first) {
hasNextPage = true
res = res.slice(0, first)
}
}
else if (!!last) {
if (!!before)
query.where('id', '<', before)
query.limit(last + 1)
.orderBy('id', 'desc')
res = await query
if (res.length > last) {
hasPreviousPage = true
res = res.slice(0, last)
}
res.reverse()
}
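// Guard against an empty page: res[0] would be undefined
startCursor = res.length > 0 ? res[0].id : null
endCursor = res.length > 0 ? res[res.length - 1].id : null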
const pageInfo = {
hasNextPage,
endCursor,
hasPreviousPage,
startCursor
}
const edges = res.map(row => ({
cursor: row.id,
node: row
}))
return {
pageInfo,
edges
}
} catch (err) {
throw new ApolloError(err.sqlMessage, err.code, err)
}
}
...
} |
Sorry for spamming, but I'd like to thank @mattecapu for his explanation of the algorithm in one of his prior posts. I was misled by these cursors based on static arrays and noticed that there must be a different way to paginate dynamic data, such as data from a database, where rows deleted in between pages or different pages accessed during pagination cause anomalies. The detailed description really helped me come up with a correct solution in my project. |
@mattecapu unless I am mistaken steps 4 and 5 are using the wrong ORDER BY (when using first, it should be ASC and vice-versa). Can you fix so people that find this thread are not more confused than necessary 😅 |
Hi @Sytten, re-reading the specification now it seems that ordering is not dependent on the field, but should be decided by the endpoint. |
@mattecapu Good that you found that too, but that is not really my point :) |
@Sytten I see. It's quite misleading, indeed. I was paginating in reverse chronological order (as most feeds do nowadays) so 'first' to me meant 'first to be displayed'. This explains my choice of ordering, I will edit to clarify. |
Hi @mattecapu, sorry for bringing up an old thread, but I need some help. I am trying to implement Relay pagination with Postgres and I am facing an issue where I want to sort on multiple columns. Let us say I have a products table. I have an ID column, which is globally unique and auto-increments every time something is added. Sorting based on this ID is very easy using the algorithm mentioned above. But what if I want to sort on multiple columns? For example, I want to sort on the Name column (to get products in alphabetical order), and then start executing pagination queries. The ID cursor no longer makes sense, because we are sorting on a different field now. So do I make a name cursor and use that? What if I want to sort based on three or more columns? How would I create a cursor then? I have created a Stack Overflow question as well: https://stackoverflow.com/questions/72011183/how-to-implement-graphql-relay-style-cursor-based-pagination-in-postgres-with-s , I don't need answers specific to Python, but a recommended general approach is appreciated. |
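One common way to approach the multi-column case, sketched here as an assumption rather than an answer given in this thread, is to put the values of all sort columns in the cursor and end the sort with a unique tiebreaker such as the ID. The helper names below are hypothetical, the sketch covers ascending order only, and the column names must come from trusted code, never from user input.

```javascript
// The cursor carries one value per sort column, in ORDER BY order.
const encodeCursor = (values) => Buffer.from(JSON.stringify(values)).toString('base64');
const decodeCursor = (cursor) => JSON.parse(Buffer.from(cursor, 'base64').toString('utf8'));

// columns: trusted column names in ORDER BY order, ending with a unique tiebreaker (e.g. id).
// Produces the expanded form of (c1, c2, ...) > (v1, v2, ...) for ascending order.
function keysetCondition(columns, cursor) {
  const values = decodeCursor(cursor);
  if (values.length !== columns.length) throw new Error('Cursor does not match sort columns');
  const clauses = [];
  const params = [];
  columns.forEach((column, i) => {
    // All earlier columns must be equal; the current column must be strictly greater.
    const equalities = columns.slice(0, i).map((c) => `${c} = ?`);
    clauses.push(`(${[...equalities, `${column} > ?`].join(' AND ')})`);
    params.push(...values.slice(0, i + 1));
  });
  return { sql: clauses.join(' OR '), params };
}
```

For a cursor built from `['Chair', 17]` and columns `['name', 'id']`, this yields `(name > ?) OR (name = ? AND id > ?)`. PostgreSQL also accepts the row-value form `(name, id) > (?, ?)`, which expresses the same condition more compactly.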
This is what I do, using TypeORM and type-graphql.
@Query(returns => UserConnection)
async getUsers(@Args() conn: UserConnectionArgs): Promise<UserConnection> {
const result = new UserConnection()
let userQuery = this.userRepository.createQueryBuilder()
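if (conn.search) {
userQuery = userQuery.andWhere('username LIKE :search', { search: `%${conn.search}%` })
}
if (conn.after) {
// andWhere: a second where() would replace the earlier condition
userQuery = userQuery.andWhere('id > :afterId', { afterId: conn.decodedAfter })
}
if (conn.before) {
// distinct parameter names so after and before cannot overwrite each other
userQuery = userQuery.andWhere('id < :beforeId', { beforeId: conn.decodedBefore })
}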
if (!conn.first && !conn.last) conn.first = 10
if (conn.first || !conn.last) {
userQuery = userQuery.orderBy('id', 'ASC')
conn.first ||= 10
const users = await userQuery.take(conn.first + 1).getMany()
result.edges = users.slice(0, conn.first).map(u => ({ node: u, cursor: btoa(u.id.toString()) }))
result.pageInfo.hasNextPage = users.length > conn.first
result.pageInfo.hasPreviousPage = false
result.pageInfo.startCursor = result.edges[0]?.cursor || null
result.pageInfo.endCursor = result.edges[result.edges.length - 1]?.cursor || null
}
if (conn.last) {
userQuery = userQuery.orderBy('id', 'DESC')
const users = await userQuery.take(conn.last + 1).getMany()
// Drop the look-ahead row (the one farthest from the cursor) before restoring ascending order
result.edges = users.slice(0, conn.last).reverse().map(u => ({ node: u, cursor: btoa(u.id.toString()) }))
result.pageInfo.hasNextPage = false
result.pageInfo.hasPreviousPage = users.length > conn.last
result.pageInfo.startCursor = result.edges[0]?.cursor || null
result.pageInfo.endCursor = result.edges[result.edges.length - 1]?.cursor || null
}
return result
} |
In all the examples I can find, paginated queries are made against a mock database that is just a JS array, so the data is simply passed through connectionFromArray to return the correct paginated result (like the Star Wars example mentioned in the README).
For a real-life database, querying all records and then passing them to connectionFromPromisedArray doesn't seem to be a good solution, as it will easily hurt your performance or crash your server as soon as you're doing anything at (even modest) scale.
So what solutions should you use to avoid insane database fetching?
(I'm using a SQL database, but I think a good solution to this problem applies to pretty much every not-a-JS-array DBMS.)