
Timothy Spann.   🇺🇦

Originally published at datainmotion.dev

Real-Time Irish Transit Analytics



Apache NiFi, PostgreSQL, GenAI, Apache Kafka, Apache Flink, JSON, GTFS

Let’s hop on a bus in Ireland!

We need to load the static (rarely changing) lookup data. NiFi makes this easy: it builds new PostgreSQL tables and inserts the rows into them.


ChatGPT Authored Introduction:

Unlocking the Future of Transportation: Real-Time Irish Transit Analytics

In the bustling landscape of modern transportation, the ability to harness real-time data is not just a competitive advantage; it’s a necessity. In Ireland, where efficient transit systems are the lifeblood of daily commutes and city connectivity, the fusion of cutting-edge technologies is revolutionizing how we understand and optimize public transportation. This article delves into the world of Real-Time Irish Transit Analytics, where Apache NiFi, PostgreSQL, GenAI, Apache Kafka, Apache Flink, JSON, and GTFS converge to create a dynamic and responsive ecosystem.

Every day, thousands of passengers rely on Ireland’s public transit systems to navigate cities, reach work, or simply explore the beauty of the countryside. Yet, behind the scenes of this seemingly seamless operation lies a complex network of data streams, from vehicle locations to passenger counts, schedules to service updates. Here, Apache NiFi emerges as a pivotal tool, seamlessly orchestrating the flow of data from various sources into a unified pipeline.

PostgreSQL steps in as the reliable database backbone, providing a robust foundation for storing and querying vast amounts of transit data. With the power of GenAI, machine learning algorithms sift through this data trove, uncovering valuable insights into passenger behaviors, traffic patterns, and optimal routes.

But data is only as valuable as its timeliness, and this is where Apache Kafka and Apache Flink shine. Kafka acts as the real-time messaging hub, ensuring that updates from buses, trains, and stations are instantly propagated through the system. Flink’s stream processing capabilities then come into play, analyzing incoming data on the fly to generate actionable intelligence.

In the realm of data interchange, JSON (JavaScript Object Notation) emerges as the lingua franca, facilitating seamless communication between different components of the analytics ecosystem. And anchoring it all is the General Transit Feed Specification (GTFS), a standardized format for public transit schedules and geographic information, ensuring interoperability and accuracy across the board.

Join us on a journey through the intricacies of Real-Time Irish Transit Analytics, where these technologies converge to enhance efficiency, improve passenger experiences, and pave the way for the future of smart transportation.

An important source of data is the set of static GTFS lookup tables, provided as a zip file of CSV files. We can download and parse this automagically in NiFi. There is no need to know the tables in advance or precreate them; NiFi will determine the fields for you.

https://www.transportforireland.ie/transitData/Data/GTFS_Realtime.zip

GTFS Static Data Load

Skip shapes.txt as we aren’t loading those
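
Outside NiFi, the same unzip-and-parse step can be sketched in Python. This is a minimal sketch under stated assumptions, not the NiFi flow itself: `read_gtfs_tables`, `SKIP_FILES`, and `_sample_zip` are hypothetical names, and the sample zip is built in memory with made-up rows so nothing is downloaded.

```python
import csv
import io
import zipfile

# Files to leave out of the load; the article skips shapes.txt.
SKIP_FILES = {"shapes.txt"}


def read_gtfs_tables(zip_bytes, skip=SKIP_FILES):
    """Return {table_name: [row dicts]} for every .txt CSV file in a GTFS zip."""
    tables = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if not name.endswith(".txt") or name in skip:
                continue
            # utf-8-sig drops a byte-order mark if the file starts with one.
            text = archive.read(name).decode("utf-8-sig")
            # The header row supplies the field names; none are hard-coded.
            tables[name[:-4]] = list(csv.DictReader(io.StringIO(text)))
    return tables


def _sample_zip():
    """Build a tiny stand-in for the real feed (made-up rows)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("trips.txt", "route_id,trip_id\nR1,T1\nR1,T2\n")
        archive.writestr("shapes.txt", "shape_id,shape_pt_sequence\nS1,1\n")
    return buffer.getvalue()


tables = read_gtfs_tables(_sample_zip())
print(sorted(tables))        # table names that were loaded
print(tables["trips"][0])    # first parsed row of trips.txt
```

To try it against the real feed, the bytes of the downloaded zip would be passed in place of `_sample_zip()`.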

Set a Default Primary Key

Setting All the Correct Primary Keys for all the Static Files/Tables
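
As a rough illustration of the key assignment, here is a Python lookup from static file name to primary key columns. The key columns follow the GTFS Schedule reference; the subset of files, the `DEFAULT_PRIMARY_KEY` value, and the function name are assumptions for this sketch and are not taken from the NiFi flow.

```python
# Primary key columns per GTFS static file, as listed in the GTFS Schedule reference.
GTFS_PRIMARY_KEYS = {
    "agency": ("agency_id",),
    "stops": ("stop_id",),
    "routes": ("route_id",),
    "trips": ("trip_id",),
    "stop_times": ("trip_id", "stop_sequence"),
    "calendar": ("service_id",),
    "calendar_dates": ("service_id", "date"),
}

# Hypothetical fallback for any file not listed above.
DEFAULT_PRIMARY_KEY = ("id",)


def primary_key_for(filename):
    """Return the primary key columns for a GTFS file such as 'trips.txt'."""
    table = filename.rsplit("/", 1)[-1]
    if table.endswith(".txt"):
        table = table[:-4]
    return GTFS_PRIMARY_KEYS.get(table, DEFAULT_PRIMARY_KEY)


print(primary_key_for("trips.txt"))
print(primary_key_for("stop_times.txt"))
```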

Split Tables into 1,000-Row Chunks to Make It Easier for PostgreSQL

We converted CSV to JSON and split up in 1 step
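
The convert-and-split step can be sketched in Python as follows. This is only an approximation of what the NiFi processor does: `csv_to_json_chunks` is a hypothetical helper, and the 2,500 sample rows are generated rather than taken from the feed.

```python
import csv
import io
import json
from itertools import islice


def csv_to_json_chunks(csv_text, chunk_size=1000):
    """Yield JSON arrays of at most chunk_size records parsed from CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    while True:
        # Pull the next chunk_size rows; an empty list means the input is done.
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            break
        yield json.dumps(chunk)


# Generated sample: a header plus 2,500 made-up stops.
sample_csv = "stop_id,stop_name\n" + "".join(
    f"{i},Stop {i}\n" for i in range(2500)
)

chunks = list(csv_to_json_chunks(sample_csv))
sizes = [len(json.loads(chunk)) for chunk in chunks]
print(sizes)
```

Each yielded string is a self-contained JSON array, so every chunk can be sent to the database as its own batch.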

Loaded Results

Update the SQL Automagically

We do not set field names manually, so there is no SQL injection here.

Send this SQL to the Database
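
The article does not show the generated SQL, so the following Python sketch is only one plausible shape for it. It assumes an upsert keyed on the table's primary key (PostgreSQL `INSERT ... ON CONFLICT`) and `%s` placeholders as used by psycopg-style drivers; `build_upsert` and `_quote` are hypothetical helpers. Column names come from the record, are validated before use, and values travel as parameters rather than being pasted into the statement.

```python
import re

# Only plain identifiers are accepted as table or column names.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _quote(identifier):
    """Validate an identifier and wrap it in double quotes."""
    if not _IDENTIFIER.match(identifier):
        raise ValueError(f"unsafe identifier: {identifier!r}")
    return f'"{identifier}"'


def build_upsert(table, record, key_columns):
    """Return (sql, params) for a parameterized PostgreSQL upsert."""
    columns = list(record)
    col_sql = ", ".join(_quote(c) for c in columns)
    placeholders = ", ".join(["%s"] * len(columns))
    key_sql = ", ".join(_quote(k) for k in key_columns)
    updates = [c for c in columns if c not in key_columns]
    if updates:
        action = "DO UPDATE SET " + ", ".join(
            f"{_quote(c)} = EXCLUDED.{_quote(c)}" for c in updates
        )
    else:
        action = "DO NOTHING"
    sql = (
        f"INSERT INTO {_quote(table)} ({col_sql}) VALUES ({placeholders}) "
        f"ON CONFLICT ({key_sql}) {action}"
    )
    return sql, [record[c] for c in columns]


sql, params = build_upsert(
    "trips", {"trip_id": "T1", "route_id": "R1"}, ("trip_id",)
)
print(sql)
print(params)
```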

A list of Ireland Lookup Trips loaded from trips.txt

Let’s parse the real-time transit information for Ireland.

GTFS Real-Time

Vehicle Positions is the primary API for finding out where the buses are.

API REST TEST

APIs: Details
Discover APIs, learn how to use them, try them out interactively, and sign up to acquire keys.
developer.nationaltransport.ie

GET https://api.nationaltransport.ie/gtfsr/v2/gtfsr?format=json HTTP/1.1

Cache-Control: no-cache

x-api-key: dddddd

Unlike most transit systems we have seen publishing GTFS and GTFS-R feeds, this one offers only two of the three feed types. Service alerts are missing.

[Trip Updates, Vehicle Positions]

The GTFS-R API contains real-time updates for services provided by Dublin Bus, Bus Éireann, and Go-Ahead Ireland.

You have to sign up and subscribe to the API to use this.

x-api-key is the header for our private key
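
The REST test above can be reproduced with the Python standard library. This is a minimal sketch: the URL and header names are the ones shown in the request above, `build_request` and `fetch_feed` are hypothetical helper names, and `dddddd` is the article's placeholder rather than a working key. Nothing is sent until `fetch_feed` is called.

```python
import json
import urllib.request

GTFSR_URL = "https://api.nationaltransport.ie/gtfsr/v2/gtfsr?format=json"


def build_request(api_key, url=GTFSR_URL):
    """Build the GET request with the same headers as the REST test."""
    return urllib.request.Request(
        url,
        headers={"Cache-Control": "no-cache", "x-api-key": api_key},
        method="GET",
    )


def fetch_feed(api_key, timeout=30):
    """Call the API and return the decoded JSON body (needs a subscribed key)."""
    with urllib.request.urlopen(build_request(api_key), timeout=timeout) as response:
        return json.load(response)


# Building the request does not touch the network.
request = build_request("dddddd")
print(request.get_method(), request.full_url)
```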

Example Vehicle Position as JSON

[ {
"recordid" : "V56",
"route_id" : "3924_62692",
"directionid" : "0",
"latitude" : "53.3537788",
"tripid" : "3924_16321",
"starttime" : "22:50:00",
"vehicleid" : "274",
"startdate" : "20240322",
"uuid" : "8a50c084-0aea-496e-b4c3-dbed373e812e",
"longitude" : "-6.40118694",
"timestamp" : "1711150967",
"ts" : "1711167213555"
} ]
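
Every value in this record arrives as a string. A short Python sketch of turning the fields into typed values follows. It assumes, from the number of digits, that `timestamp` is epoch seconds and `ts` is epoch milliseconds; `to_typed` and the output key names are hypothetical.

```python
import json
from datetime import datetime, timezone

raw = """[ {
"recordid" : "V56",
"route_id" : "3924_62692",
"directionid" : "0",
"latitude" : "53.3537788",
"tripid" : "3924_16321",
"starttime" : "22:50:00",
"vehicleid" : "274",
"startdate" : "20240322",
"uuid" : "8a50c084-0aea-496e-b4c3-dbed373e812e",
"longitude" : "-6.40118694",
"timestamp" : "1711150967",
"ts" : "1711167213555"
} ]"""


def to_typed(record):
    """Convert the string fields of one vehicle position into typed values."""
    return {
        "vehicleid": record["vehicleid"],
        "route_id": record["route_id"],
        "tripid": record["tripid"],
        "latitude": float(record["latitude"]),
        "longitude": float(record["longitude"]),
        # Assumption: 'timestamp' is epoch seconds.
        "position_time": datetime.fromtimestamp(
            int(record["timestamp"]), tz=timezone.utc
        ),
        # Assumption: 'ts' is epoch milliseconds.
        "ingest_time": datetime.fromtimestamp(
            int(record["ts"]) / 1000, tz=timezone.utc
        ),
    }


positions = [to_typed(record) for record in json.loads(raw)]
print(positions[0]["position_time"].isoformat())
```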

Vehicle Position Slack Message

Irish Transit Tracking
Direction ${directionid}
Request ${invokehttp.request.url} ${invokehttp.status.message} ${invokehttp.tx.id}
Lat/Long ${latitude}/${longitude}
Vehicle ${vehicleid}
Route ${route_id}
Scheduled? ${scheduled}
Start Date/Time/TS ${startdate} / ${starttime} / ${timestamp}
IDs ${uuid} ${recordid} TripID ${tripid}
Scheduled: ${scheduled}

Trip Updates

Example Trip Update as JSON

{
"triptimestamp" : "1711415067",
"stopsequence" : "10",
"schedulerelationship" : "SCHEDULED",
"tripstarttime" : "21:30:00",
"stopid" : "8530B1520901",
"departuredelay" : "-104",
"tripid" : "3950_45558",
"tripschedulerelationship" : "SCHEDULED",
"tripstartdate" : "20240325",
"uuid" : "46595e37-4fdd-48db-8431-216bcabe4dd7",
"departuretime" : "",
"tripdirectionid" : "0",
"arrivaltime" : "",
"arrivaldelay" : "-104",
"triprouteid" : "3950_62756",
"ts" : "1711476673867",
"route_long_name" : "Dublin - Airport - Cavan - Donegal",
"stop_name" : "Topaz Belleek"
}
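
In GTFS Realtime, a delay is given in seconds, with positive values meaning late and negative values meaning ahead of schedule. A small Python sketch for making the `arrivaldelay` and `departuredelay` fields readable; `describe_delay` is a hypothetical helper, and treating an empty string as "unknown" is an assumption.

```python
def describe_delay(value):
    """Turn a GTFS Realtime delay in seconds into a short label."""
    if value in ("", None):
        return "unknown"
    seconds = int(value)
    if seconds == 0:
        return "on time"
    minutes, secs = divmod(abs(seconds), 60)
    label = "late" if seconds > 0 else "early"
    return f"{minutes}m {secs:02d}s {label}"


print(describe_delay("-104"))
```

For the record above, both delays are -104, so the bus is running 1 minute 44 seconds ahead of schedule at stop sequence 10.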

Trip Update Slack Message

Irish Transit Tracking Trip Updates
Request ${invokehttp.request.url} ${invokehttp.status.message} ${invokehttp.tx.id}
IDs ${uuid}
Arrival Delay / Time: ${arrivaldelay} / ${arrivaltime}
Departure Delay / Time: ${departuredelay} / ${departuretime}
Schedule: ${schedulerelationship} ${tripschedulerelationship}
Stop ID/Sequence: ${stopid} / ${stopsequence}
Trip Direction: ${tripdirectionid} ${tripid}
Trip Route: ${triprouteid}
Trip Start Date / Time / TS: ${tripstartdate} / ${tripstarttime} / ${triptimestamp}

Create Table in Flink

Query Kafka Topic — Flink SQL Table in SSB

Installing CSP Community Edition
You need to access the Downloads Page of Cloudera Stream Processing (CSP) to download the Community Edition version of…
docs.cloudera.com

Send Messages

Lookups From PostgreSQL Table

Finally Send Messages to Slack

NATIONAL ROADS WEATHER STATION

National Roads Weather Station Data - Weather Data - metadata - data.gov.ie
Real-time data from TII's national network of 80+ weather stations. Includes air temperature, precipitation, wind speed…
data.gov.ie

PUBLIC TRANSPORT DATA

GTFS | Transport for Ireland
www.transportforireland.ie

LOOKUP DATA FROM GTFS

Reference - General Transit Feed Specification
This document defines the format and structure of the files that comprise a GTFS dataset. The key words "MUST", "MUST…
gtfs.org

stops.txt: stop_id (Unique ID)

stop_times.txt: Primary key (trip_id, stop_sequence)

DUBLIN BIKES

data.smartdublin.ie

RAILROAD

IRISH STATIONS

{"StationDesc":"Millstreet","StationAlias":null,
"StationLatitude":52.0776,"StationLongitude":-9.06973,
"StationCode":"MLSRT","StationId":24,"ts":"1711496919762",
"uuid":"f6e71a76-41cc-4a8e-8795-323c3b43d62f"}

IRISH TRAIN RECORD

{
"TrainStatus":"R","TrainLatitude":53.4169,
"TrainLongitude":-6.1512,"TrainCode":"P617",
"TrainDate":"27 Mar 2024",
"PublicMessage":"P617\\n16:02 - Drogheda to Dublin Pearse (1 mins late)\\nDeparted Portmarnock next stop Dublin Connolly",
"Direction":"Southbound","ts":"1711557932947",
"uuid":"b485cefb-67e8-482d-86ba-1ca43e0b523a"
}
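
As printed here, `PublicMessage` packs three pieces of text into one string separated by a literal backslash-n sequence. A hedged Python sketch of splitting it apart follows; the regular expression for the lateness is an assumption based on this single sample and may not cover other message shapes.

```python
import json
import re

# Raw string so the doubled backslashes reach the JSON parser unchanged.
raw = r"""{
"TrainStatus":"R","TrainLatitude":53.4169,
"TrainLongitude":-6.1512,"TrainCode":"P617",
"TrainDate":"27 Mar 2024",
"PublicMessage":"P617\\n16:02 - Drogheda to Dublin Pearse (1 mins late)\\nDeparted Portmarnock next stop Dublin Connolly",
"Direction":"Southbound","ts":"1711557932947",
"uuid":"b485cefb-67e8-482d-86ba-1ca43e0b523a"
}"""

record = json.loads(raw)

# After JSON decoding the separator is a backslash followed by 'n',
# not a real newline character.
lines = record["PublicMessage"].split("\\n")

# Assumption: lateness appears as "(N mins late)" or "(N min late)".
match = re.search(r"\((\d+) mins? late\)", record["PublicMessage"])
minutes_late = int(match.group(1)) if match else None

print(lines)
print(minutes_late)
```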

IRISH STATION RECORD

{
"StationDesc":"Midleton",
"StationAlias":null,
"StationLatitude":51.9212,
"StationLongitude":-8.17579,
"StationCode":"MDLTN",
"StationId":68,
"ts":"1711558009615",
"uuid":"1f5ae394-4726-4f3e-8e53-7f50f95ae05e"
}

SOURCE CODE

GitHub - tspannhw/FLaNK-IrelandTransit: Transit in Ireland
Transit in Ireland. Contribute to tspannhw/FLaNK-IrelandTransit development by creating an account on GitHub.
github.com

FLINK SQL KAFKA TABLE

CREATE TABLE `ssb`.`Meetups`.`irelandvehicle` (
`recordid` VARCHAR(2147483647),
`route_id` VARCHAR(2147483647),
`directionid` VARCHAR(2147483647),
`latitude` VARCHAR(2147483647),
`tripid` VARCHAR(2147483647),
`starttime` VARCHAR(2147483647),
`vehicleid` VARCHAR(2147483647),
`startdate` VARCHAR(2147483647),
`uuid` VARCHAR(2147483647),
`longitude` VARCHAR(2147483647),
`timestamp` VARCHAR(2147483647),
`ts` VARCHAR(2147483647),
`route_long_name` VARCHAR(2147483647),
`trip_short_name` VARCHAR(2147483647),
`trip_headsign` VARCHAR(2147483647),
`eventTimeStamp` TIMESTAMP(3) WITH LOCAL TIME ZONE METADATA FROM 'timestamp',
WATERMARK FOR `eventTimeStamp` AS `eventTimeStamp` - INTERVAL '3' SECOND
) WITH (
'scan.startup.mode' = 'group-offsets',
'deserialization.failure.policy' = 'ignore_and_log',
'properties.request.timeout.ms' = '120000',
'properties.auto.offset.reset' = 'earliest',
'format' = 'json',
'properties.bootstrap.servers' = 'kafka:9092',
'connector' = 'kafka',
'properties.transaction.timeout.ms' = '900000',
'topic' = 'irelandvehicle',
'properties.group.id' = 'irelandconsumersbb1'
)

