Schedule - PGConf NYC 2024

A journey into postgresql logical replication: the next chapter

Date: 2024-10-02
Time: 15:00–15:50
Room: The Forum
Level: Intermediate
Feedback: Leave feedback

Toggl Track engineering team embarked on a journey to develop a Postgres-based near real-time OLAP database system to better serve our reporting needs. Toggl Track has been using a monolithic Postgres database since its beginnings and the goal was to progressively shift away our report queries from the transactional database creating a CDC on top of it, leveraging logical replication by using data-changing events, and applying whatever transformations are required to reach the desired - report-friendly - data model, OLAP.

After overcoming some difficulties in developing our own logical replication client, mostly due to incorrect assumptions on the usage – and track - of logical replication events coming out of Postgres, and after the successful production deployment of the first few use cases, the ETL pipeline supporting our OLAP infrastructure has been used as a cornerstone in our push towards an event-based architecture, becoming the base of some of our largest engineering projects, not all related with OLAP usages, from pre-computed authorization sessions to business intelligence functionalities, passing by user-facing event streams such as WebSockets or mobile notifications.

Our ETL pipeline is – mostly - composed of data-changing events sourced from our transactional Postgres databases through logical replication using our client.

This talk is a continuation of “A journey into postgresql logical replication” presented in PGConf.EU 2023 where we presented the intricacies of our solution and went deep(er) into the way that logical replication events are expected to be used, and the challenges that we faced while developing Toggl Track logical replication client, but also the needs and problems that created the right environment for us to embrace the challenge.

While briefly presenting the road that led us here - months of experimentation, debugging, and workarounds, leading to the conclusion that we were losing data due to the incorrect interpretation of LSN positions - this time we want to share the many quirks and kinks learned while using Postgres not only as a source of data – transactional data – but also to handle the OLAP side of things, from the transformation heavy lifting, feed by ETL, to strategies to handle the load while avoiding locking while keeping up with data changes in near real-time.

Slides

The following slides have been made available for this session:

Speaker

José Neves