DEV Community

Khalid🥑💻 for Ola Campus Pune


How Data Science helps you decide to turn while driving - BTS of Turn Identification

We’ve adopted maps and navigation into our lives so effortlessly that, more often than not, we don’t think about the work that goes on behind the scenes.

When you’re on the road, you’re more concerned with getting to your destination in one piece (assuming most of our readers are stuck in traffic in tier-1 cities just like us). A well-designed navigation system ensures you don’t have to think about anything except the destination, and a key contributing factor is turn identification. Let's find out what goes on behind the scenes in turn identification.

Our work in the Indian mobility space has given us a vast databank of anonymized telemetry data from our 2-, 3- and 4-wheelers. This data arrives as location pings, sent at regular intervals and containing geographical coordinates, and it helps us improve our routing system. The raw data is processed by our fantastic data science team.

This blog post walks through a workflow for identifying turns from this telemetry data. We will cover the following:

  • Why is turn identification required?
  • Some terminology
  • Approach for identifying turns
  • Demonstration
  • Conclusion

Why is turn identification required?

With rapid urbanization and the facelift our cities are getting, we must keep updating our routing system with new roads, turn restrictions and other details.

Identifying these turns helps us keep our routing algorithm in step with changing road layouts and street furniture. To understand the workflow, we must be familiar with some basic terminology.

Terminology

Here are the terms you should know before getting into the workflow:

  1. Way ID - Each road segment (a "way" in OpenStreetMap) has a unique way ID; we use OpenStreetMap (OSM) data.
  2. Device ID - A unique ID for each device/cab sending pings (as in the open-source dataset we use for the demonstration).
  3. Bearing angle - The direction of a way (or of travel along it), measured clockwise from true north.
  4. Turn angle - The change in bearing angle when moving from one way ID to the next.
  5. Turn direction - The direction of the turn (right/left), determined by the change in bearing angle.
  6. Turn class - Straight/right/left/U-turn, determined by the magnitude of the turn angle.
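The Chicago dataset used in the demonstration already provides a bearing for every ping. If your pings carry only coordinates, a common way to estimate the bearing of travel between two consecutive pings is the standard spherical initial-bearing (forward azimuth) formula. This is a minimal sketch, assuming latitude and longitude in decimal degrees; the function name initial_bearing is our own and is not part of the original workflow:

```python
import math

def initial_bearing(lat1, lon1, lat2, lon2):
    """Bearing in degrees, clockwise from true north, from point 1 towards point 2.

    Sketch using the spherical initial-bearing formula; inputs are decimal degrees.
    """
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    y = math.sin(dlon) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlon)
    # atan2 returns an angle in (-180, 180]; shift it into [0, 360) to match the bearing convention.
    return (math.degrees(math.atan2(y, x)) + 360.0) % 360.0
```

For example, moving due north gives 0, moving due east along the equator gives 90, and moving due west gives 270.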

Let us go ahead with the approach!

The Mathematical Approach

At a high level, the approach uses the ride count between each pair of way IDs as a signal for understanding turn restrictions. Assessing the angle between the two way IDs helps us understand whether a turn was actually taken or the vehicle merely deviated slightly.
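To make the ride-count idea concrete, here is a minimal standard-library sketch. It assumes each ride has already been matched to an ordered list of way IDs (one per ping). Consecutive pings on the same way are collapsed, and every change of way becomes a (from, to) transition. The input format and the function name count_rides_per_transition are hypothetical and are not taken from the original workflow:

```python
from collections import Counter
from itertools import groupby

def count_rides_per_transition(rides):
    """Count how many distinct rides traverse each (from_way_id, to_way_id) pair.

    `rides` maps a ride/device ID to the way IDs its pings matched, in time order.
    """
    counts = Counter()
    for way_ids in rides.values():
        # Collapse runs of pings on the same way: [A, A, B, B, C] -> [A, B, C].
        sequence = [way for way, _ in groupby(way_ids)]
        # Use a set so a ride that repeats the same transition is counted once.
        counts.update(set(zip(sequence, sequence[1:])))
    return counts

rides = {
    478: [101, 101, 102, 102, 103],
    479: [101, 102, 102, 104],
    480: [105, 103],
}
counts = count_rides_per_transition(rides)
```

Here (101, 102) is used by two rides and every other transition by one. A pair of connected ways that no ride ever uses is one signal, not proof, that the turn may be restricted.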

For any turn, we have two angles: angle 1, the bearing before the turn, and angle 2, the bearing after it, as shown in the diagram below:

Diagram explaining angles

We follow a general mathematical approach that detects the turn from how each angle deviates from true north, that is, from the bearings.

Mathematically:

  1. If |angle 2 - angle 1| < 180:
      • If angle 2 > angle 1, it is a right turn.
      • Otherwise, it is a left turn.
  2. If |angle 2 - angle 1| > 180:
      • If angle 2 > angle 1, it is a left turn.
      • Otherwise, it is a right turn.

The flowchart below illustrates this logic in detail.

Illustration of the maths approach
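The two cases above reduce to one rule if the bearing change is first wrapped into the range [-180, 180). A positive value is a clockwise (right) turn, a negative value is a counter-clockwise (left) turn, and the absolute value is the turn angle. The following is a minimal sketch of that equivalent formulation. The 15° and 165° thresholds for straight and U-turn come from the turn_type function later in this post, while the function names signed_turn and classify_turn are our own:

```python
def signed_turn(angle1, angle2):
    """Bearing change from angle1 to angle2, wrapped into [-180, 180).

    Positive means clockwise (right), negative means counter-clockwise (left).
    """
    return (angle2 - angle1 + 180.0) % 360.0 - 180.0

def classify_turn(angle1, angle2, straight_max=15.0, uturn_min=165.0):
    """Return 'straight', 'right', 'left' or 'U-turn' for a pair of bearings."""
    delta = signed_turn(angle1, angle2)
    magnitude = abs(delta)  # the turn angle
    if magnitude <= straight_max:
        return 'straight'
    if magnitude >= uturn_min:
        return 'U-turn'
    return 'right' if delta > 0 else 'left'
```

For example, classify_turn(350, 20) returns 'right'. Going from 350° to 20° crosses north clockwise, and the two-case rule above also reports this as a right turn.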

The mathematical approach looks simple, but on its own it is not enough for raw pings. A lot of filtering and data management is required, and this is where the data science team takes over. Here is the workflow for identifying turns.

Steps illustration

Let us understand how each step works, with code snippets included. For demonstration purposes, we will use an open-source Chicago taxi mobility dataset.

Imports

These are the imports required to process the data.


import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
import numpy as np
from shapely.geometry import box
import contextily as ctx


Input data (cab pings and the OSM shapefile for Chicago)


# Raw taxi pings (the CSV has no header row) and the OpenStreetMap road network for Chicago
df_gps = pd.read_csv('./data_uic.csv', header=None)
osm_chicago = gpd.read_file('./chicago_osm.shp')


Filtering Data

We only require the following fields:

  1. Device_id
  2. Timestamp
  3. Latitude
  4. Longitude
  5. Speed
  6. Bearing

More specifically, we keep only the pings from a single device ID (478 in this example) and treat them as one journey.


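df_gps.columns = ['device_id', 'timestamp', 'latitude', 'longitude', 'speed', 'bearing']
# Keep the pings of one device (one journey) as a point GeoDataFrame in WGS84 (EPSG:4326)
df_ride = df_gps[df_gps.device_id == 478].copy()
gdf_ride = gpd.GeoDataFrame(df_ride, geometry=gpd.points_from_xy(df_ride.longitude, df_ride.latitude), crs=4326)
gdf_ride['timestamp'] = pd.to_datetime(gdf_ride['timestamp'])
# Pings must be in time order for the bearing comparisons that follow
gdf_ride = gdf_ride.sort_values('timestamp')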


gdf_ride


Output gdf_ride

Input data visualization

We can also visualize the input data on a basemap, labeling every tenth ping with its timestamp.


plt.figure(figsize=(10, 5))
ax = plt.axes()
gdf_ride.plot(ax=ax, color='b')
ctx.add_basemap(ax, crs=4326, source=ctx.providers.OpenStreetMap.Mapnik)

# Label every 10th ping with its timestamp to show the direction of travel
for c, (x, y, label) in enumerate(zip(gdf_ride.geometry.x, gdf_ride.geometry.y, gdf_ride.timestamp)):
    if c % 10 == 0:
        ax.annotate(label, xy=(x, y), xytext=(3, 3), textcoords="offset points", rotation=45, size=8)
plt.show()

Illustration of the output

Similarly, we can visualize the change in bearing angle between consecutive pings.


# Raw difference between consecutive bearings; crossing north shows up as a jump of about ±360°
plt.figure(figsize=(20, 2))
plt.plot(gdf_ride.timestamp.iloc[1:], np.diff(gdf_ride.bearing.values))
plt.show()

visualize the difference/change in the bearing angle.

Next, we implement the mathematical logic as functions:



def angledist(a1, a2):
    """Smallest absolute difference between two bearings, in degrees (0 to 180)."""
    d = abs(a1 - a2) % 360
    return min(d, 360 - d)

def turn_dir(a1, a2):
    """Turn side when the bearing changes from a1 to a2 (degrees clockwise from north)."""
    if abs(a2 - a1) < 180:
        turn_side = 'right' if a2 > a1 else 'left'
    else:
        turn_side = 'left' if a2 > a1 else 'right'
    return turn_side

def turn_type(point_gdf, points_final):
    """Turn magnitude and class at each ping listed in points_final['index'].

    The bearing at each ping is compared with the bearings of the next three pings.
    """
    turn_magnitude = []
    turn_class = []
    k = point_gdf.bearing.values
    for f in points_final['index'].values:
        i = point_gdf.index.get_loc(f)  # position of this ping within the ride
        try:
            # Turn side as seen against the 1st, 2nd and 3rd following pings
            c = [turn_dir(k[i], k[i + 1]), turn_dir(k[i], k[i + 2]), turn_dir(k[i], k[i + 3])]
            if c[0] != c[1]:
                turn = abs(np.diff([angledist(k[i], k[i + 1]), angledist(k[i], k[i + 2])])[0] / 2)
            else:
                turn = np.mean([angledist(k[i], k[i + 1]), angledist(k[i], k[i + 2])])
            # Majority vote over the three turn sides
            sides, counts = np.unique(np.array(c), return_counts=True)
            if turn <= 15:
                tr_class = 'straight'
            elif turn < 165:
                tr_class = sides[np.argmax(counts)]
            else:
                tr_class = 'U-turn'
        except IndexError:
            # Fewer than three pings follow this one, so the turn cannot be classified
            turn, tr_class = np.nan, None
        turn_magnitude.append(turn)
        turn_class.append(tr_class)
    return turn_magnitude, turn_class

Here is the workflow of the turn classification:

workflow of the turn classification

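Now that the functions are ready, we combine the datasets with a spatial join between the OSM ways and the ping data. The join itself is not shown in this post. Its output (join_osm_ping, and join_ping_osm further below) pairs each ping, with its timestamp, with the osm_id of the way it was matched to.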
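As a hedged illustration of what such a join does, here is a simplified stand-in written in plain NumPy rather than GeoPandas. Each ping, given in projected coordinates (metres), is assigned to the nearest way polyline if that way lies within a distance threshold. The 15 m threshold, the dictionary input format and the function names are assumptions made for this sketch. In practice, geopandas.sjoin or geopandas.sjoin_nearest on projected layers does the same job much more efficiently:

```python
import numpy as np

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b (2-D, projected units)."""
    ab = b - a
    denom = float(np.dot(ab, ab))
    # Project p onto the segment and clamp to its end points; guard zero-length segments.
    t = 0.0 if denom == 0.0 else float(np.clip(np.dot(p - a, ab) / denom, 0.0, 1.0))
    return float(np.linalg.norm(p - (a + t * ab)))

def match_pings_to_ways(pings, ways, max_dist=15.0):
    """Return, for each ping, the ID of the nearest way within max_dist, else None.

    pings: (n, 2) array of projected x/y coordinates in metres.
    ways:  dict mapping a way ID to an (m, 2) array of its vertices.
    """
    matched = []
    for p in pings:
        best_id, best_dist = None, max_dist
        for way_id, coords in ways.items():
            # A way is a polyline: check each consecutive pair of vertices.
            for a, b in zip(coords[:-1], coords[1:]):
                dist = point_segment_distance(p, a, b)
                if dist <= best_dist:
                    best_id, best_dist = way_id, dist
        matched.append(best_id)
    return matched

# Two hypothetical ways meeting at a right angle, and five pings
ways = {1: np.array([[0.0, 0.0], [100.0, 0.0]]), 2: np.array([[100.0, 0.0], [100.0, 100.0]])}
pings = np.array([[10.0, 2.0], [60.0, -3.0], [101.0, 40.0], [99.0, 90.0], [300.0, 300.0]])
```

Here match_pings_to_ways(pings, ways) gives [1, 1, 2, 2, None]. The last ping is too far from any way to be matched.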


# Per way: number of matched pings, and the times of the first and last ping on it
osm_ping_count = osm_chicago.merge(join_osm_ping.groupby('osm_id').size().rename('ping_count').reset_index())
osm_ping_count1 = osm_ping_count.merge(join_osm_ping.groupby('osm_id').timestamp.max().rename('leaving_time').reset_index())
osm_ping_count2 = osm_ping_count1.merge(join_osm_ping.groupby('osm_id').timestamp.min().rename('entry_time').reset_index())
# Keep only ways with more than one matched ping
final_way_ids = osm_ping_count2[osm_ping_count2.ping_count > 1]

The join can leave several ways that share the same entry or leaving time. For each such group, we keep only the way with the most pings:


# Among ways sharing a leaving time, keep the one with the most pings
final_way_ids = final_way_ids.sort_values('ping_count').drop_duplicates(subset=['leaving_time'], keep='last')
# Do the same for ways sharing an entry time
final_way_ids = final_way_ids.sort_values('ping_count').drop_duplicates(subset=['entry_time'], keep='last')

Next, we take the last ping on each remaining way ID, in time order. The way that comes after it becomes the destination way (to_osm_id), so the final row, which has no following way, is dropped before the turns are computed.

points_final = (join_ping_osm.reset_index()
                .merge(final_way_ids.osm_id)
                .sort_values('timestamp')
                .drop_duplicates(subset=['osm_id'], keep='last'))
# The destination way of each row is the next way in time order
to_osm_ids = points_final.osm_id.values[1:].tolist()
points_final = points_final.iloc[:-1].copy()
turn_angle, turn_class = turn_type(gdf_ride, points_final)

We then attach the destination way, turn class and turn angle to each row:


points_final['to_osm_id'] = to_osm_ids
points_final['turn_class'] = turn_class
points_final['turn_angle'] = turn_angle

We can also plot the output.


plt.figure(figsize=(20, 20))
ax = plt.axes()

# Matched ways, colored by OSM ID
final_way_ids.plot(column='osm_id', ax=ax, cmap='jet')
# Last ping on each way, labeled with its turn class
points_final.plot(ax=ax, linewidth=5, column='osm_id')
for x, y, label in zip(points_final.geometry.x, points_final.geometry.y, points_final.turn_class):
    ax.annotate(label, xy=(x, y), xytext=(3, 3), textcoords="offset points", rotation=45, size=8)
# Assumes the layers are in EPSG:26916 (NAD83 / UTM zone 16N, which covers Chicago)
ctx.add_basemap(ax, crs=26916, source=ctx.providers.OpenStreetMap.Mapnik)
plt.show()

visualizing the output

Finally, we extract the result of the turn classification:

result = points_final[['longitude', 'latitude', 'osm_id', 'to_osm_id', 'turn_class', 'turn_angle']]
result = result.rename(columns={'osm_id': 'from_way_id', 'to_osm_id': 'to_way_id'})

Result

Visualization of result

Note that scaling this up to large datasets requires distributed processing tools such as PySpark or Apache Sedona.

Conclusion

We rely on maps and navigation systems for almost everything, and with technical advances they have become an integral part of our lives. Even small features like turn identification and turn restrictions require considerable effort and mathematical calculation. Math concepts and visualizations like these apply across location and data analytics.

Data scientists work on enormous datasets to solve both simple and complex problems. Our dedicated team of engineers, data scientists and GIS professionals quietly performs tasks behind the scenes that directly affect the end user.

We at OLA Campus Pune work with many exciting technologies and solve real-world problems like turn identification and turn restrictions. We are building next-gen solutions to existing mobility problems, aiming to make consumers' journeys a lot easier. A big thanks to Aazad Patle, our data science expert, for helping us throughout this article. We look forward to sharing more of our workflows in the coming days.

If you have feedback or found this blog post interesting, do connect with us!

Important Links:

  1. A Peak into data science at OLA
  2. PySpark Documentation
  3. Ola campus Pune
