Hi Everyone. The need to write this article was drawn from the challenges I faced while trying to replicate a slightly complicated SQL query in Knex.js in my Node.js application. Thanks to a couple of StackOverflow answers (https://stackoverflow.com/a/56715360/8168950, https://stackoverflow.com/a/9795768/8168950), I was able to figure this out. And so, to prevent others (maybe future me) from experiencing the same challenges, I would just give an overview of the solution below.
Let's say we have a fictional table of trips with several columns listed below -
id, user_id, driver_id, service_id, total_fare, pickup_address_id, destination_address_id, created_at, complete_at
We also have an address table and a trip_status table. We would be getting the address details of the trip's origin and destination from the address table. As for the trip_status table, we are assuming every trip will have different states such as live
and completed
. The columns on our trip_status table are listed below.
id, status, trip_id, created_at, updated_at, deleted_at
The address table has a full_address column among others
Let's say we want to build an endpoint that returns a list of trips with the latest status of each trip and the addresses attached, we would need to select a distinct on the trip_status table to get the latest status of a trip along with an inner join on the trip_status table and the address table (to get the address details).
Writing the query in raw SQL is shown below
select * from (select distinct on (status.trip_id) pickup.full_address as pickup_address, destination.full_address as destination_address, trips.id, status.status, trips.created_at, trips.user_id, trips.champion_id, trips.service_id, trips.billed_fare, trips.total_fare, trips.is_cash, trips.ordered_at, trips.complete_at, trips.auto_cancel_at, trips.updated_at, trips.deleted_at, trips.meta from go_service.trips as trips inner join address_service.addresses as pickup on pickup.id = trips.pickup_address_id inner join address_service.addresses AS destination on destination.id = trips.destination_address_id inner join go_service.trip_status as status on status.trip_id = trips.id where status.status in ('end_trip', 'start_trip', 'arrived_pickup', 'cancelled', 'auto_cancelled', 'unassigned') and trips.created_at >= '2019-11-01' and trips.created_at <= '2020-01-30' order by status.trip_id, status.created_at DESC
) t order by t.created_at desc offset 10 limit 20
To replicate this in knex.js, we can create a function that handles the logic.
async function getTrips(
startDate: Date | string,
endDate: Date | string,
status: string[],
limit: number,
offset: number
) {
let queryResult = await knex.raw(`select * from (select distinct on (status.trip_id) pickup.full_address as pickup_address, destination.full_address as destination_address, trips.id, status.status, trips.created_at, trips.user_id, trips.champion_id, trips.service_id, trips.billed_fare, trips.total_fare, trips.is_cash, trips.ordered_at, trips.complete_at, trips.auto_cancel_at, trips.updated_at, trips.deleted_at, trips.meta from go_service.trips as trips inner join address_service.addresses as pickup on pickup.id = trips.pickup_address_id inner join address_service.addresses AS destination on destination.id = trips.destination_address_id inner join go_service.trip_status as status on status.trip_id = trips.id where status.status in (${status.map(() => '?').join(',')}) and trips.created_at >= ? and trips.created_at <= ? order by status.trip_id, status.created_at DESC) t order by t.created_at desc offset ? limit ?`, [...status, startOfDay(startDate), endOfDay(endDate), offset, limit]
)
return queryResult.rows
}
To elaborate on this, the most challenging part of this query will be how the variables are passed.
For the status array, we cannot pass an array directly as a variable, and so, we have to loop over each item in the array and assign placeholders for each item.
(${status.map(() => '?').join(',')})
Also, since we would likely need to sort our trips by date, we would have to order by the date the trips where created. To be able to do this, we have to meet a requirement of select distinct on (status.trip_id)... which specifies that the first order by clause has to match status.trip_id
. We can simply do that by adding an order by clause within the select distinct on query. We would also need to order by status.created_at since we want just the latest status on each trip. Absence of this will return multiple rows for a single trip.
Finally, we can then order sort our trips by date created outside the select distinct on query by getting the result of the select distinct on as t
and then sorting on t.created_at
.
Cheers.
Top comments (1)
Well written Adams