DEV Community

Filip Haftek


Machine Learning and Elasticsearch empowering great marketplaces

Introduction

Speed is a key factor for every online business today, and buying and selling cars on classified marketplaces is no exception. On Publi24, a general classifieds portal in Romania and part of the Russmedia Equity Partners group, we provide our customers with a used car marketplace. We focus on the quality of posted listings and offer fast, precise search options to help our users find their dream car. Because the number of posted ads is constantly increasing, our product team researched how to improve customer satisfaction by making the prices of car listings more accurate. Our goal is for both sellers and buyers on Publi24 to make great deals.
We decided to give Machine Learning technology a try, but also wanted to experiment with one of the most powerful search engines in the world — Elasticsearch. In this article I am going to describe both approaches and compare the results we have achieved.

Proper data is a must

Every Machine Learning process requires long and careful data pre-processing. After collecting all the data from the last few years, we had to analyse it and remove any information that might make our prediction less precise. When our users want to sell a car on our platform, they must fill in the mandatory fields of the ad placement form (e.g. make, model, registration date, car body type). They can also include optional data, such as A/C or color. A closer look at the data revealed that only the mandatory fields can be considered useful for building the ML model.
We also found some anomalies in the listings, e.g. mileage above 1 million kilometers, so we had to remove all the entries with implausible data.
After the data cleaning, we decided to go for Machine Learning regression algorithms. The idea behind our decision was to support our users in the process of posting a car listing by having our system suggest the best price for the car after the user fills in all the required fields.

Our final feature set includes:

  • car body
  • fuel type
  • horsepower
  • registration date
  • mileage
  • model

Machine Learning for car price prediction

Python Machine Learning libraries provide many different regression algorithms, so once we had clean data we decided to try several of them:

  • LinearRegression
  • DecisionTreeRegressor
  • GradientBoostingRegressor

We first had to remove the make of the car, as every make has unique model names. I will not go into the detailed process of building the model; instead, I recommend reading this great article: predict car prices with ML. We ended up with a Decision Tree Regression model with a regressor score of 0.85. Possible values of this score lie in the interval (-∞, 1], where 1 is the best possible score.
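
To make the score range more tangible, here is a minimal sketch that assumes the regressor score is the coefficient of determination R² (which is what the `score` method of scikit-learn regressors returns). The prices are made up; the point is only to show why the score cannot exceed 1 but can drop below 0.

```python
def r2_score(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # model error
    ss_tot = sum((a - mean) ** 2 for a in actual)  # error of always predicting the mean
    return 1 - ss_res / ss_tot


# Made-up car prices used only for illustration.
prices = [3000.0, 5000.0, 7000.0, 9000.0]

perfect = r2_score(prices, prices)                             # every price predicted exactly
baseline = r2_score(prices, [6000.0] * 4)                      # always predicting the mean price
poor = r2_score(prices, [9000.0, 7000.0, 5000.0, 3000.0])      # worse than the mean baseline
```

A perfect model scores 1, a model that always predicts the mean price scores 0, and a model that does worse than that mean baseline gets a negative score with no lower bound.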

With the model in place, we were able to predict more precisely the right price for a car listing based on the provided parameters. Our product team reviewed the solution and raised another question:

Since we can predict an accurate price, how about going one step further? We could actually help our customers adjust the price, giving them an incentive to post their car listing at a fairer price. This could help them get more views and sell faster.

Elasticsearch advanced features for better data understanding

Elasticsearch is one of the most powerful search engines in the world and is based on Lucene. Among other things, it is well suited for full-text search and for narrowing down search results on websites.
But for us it all comes down to this:

How can it help our customers choose the best price for the car they want to sell on the Publi24.ro platform?

To answer this question we had to look deeper into the capabilities of Elasticsearch. Apart from indexing documents and making them available for different types of searches, it also provides many data aggregation features. Our product team was asking for more than a single-value price recommendation; they asked for price buckets, so that we could suggest defined price ranges to our users and increase the probability of selling the car faster.
We looked into the stats aggregation, which gave us a range of statistics about prices for particular groups of cars, but the feature that met our expectations was the percentiles aggregation.

Percentiles show the point at which a certain percentage of observed values occur.

Apart from the predefined percentiles, Elasticsearch allowed us to define our own 'points' with a query like this:

aggs: {
  price_percentiles: {
    percentiles: {
      field: "price",
      percents: [20.0, 40.0, 60.0, 80.0],
      keyed: false
    }
  }
}
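
For illustration, the aggregation above can be wrapped into a full search body and its response parsed with a short Python sketch. The filter field names (`model`, `fuel_type`) and the sample response values are hypothetical; only the `price` field, the percents and the `keyed` setting come from the query above.

```python
import json

PERCENTS = [20.0, 40.0, 60.0, 80.0]


def build_percentiles_request(filters):
    """Build a search body that aggregates prices for one group of cars.

    `filters` maps (hypothetical) index field names to exact values.
    """
    return {
        "size": 0,  # only the aggregation is needed, not the matching documents
        "query": {
            "bool": {
                "filter": [{"term": {field: value}} for field, value in filters.items()]
            }
        },
        "aggs": {
            "price_percentiles": {
                "percentiles": {"field": "price", "percents": PERCENTS, "keyed": False}
            }
        },
    }


def parse_percentiles(response):
    """Return the percentile values ordered by percent.

    With "keyed": false the values arrive as a list of {"key", "value"} objects.
    """
    values = response["aggregations"]["price_percentiles"]["values"]
    return [item["value"] for item in sorted(values, key=lambda item: item["key"])]


request_body = build_percentiles_request({"model": "golf", "fuel_type": "diesel"})
request_json = json.dumps(request_body)  # what would be sent to the search endpoint

# Made-up response fragment, shaped like a percentiles aggregation result.
sample_response = {
    "aggregations": {
        "price_percentiles": {
            "values": [
                {"key": 20.0, "value": 2500.0},
                {"key": 40.0, "value": 4000.0},
                {"key": 60.0, "value": 5500.0},
                {"key": 80.0, "value": 8000.0},
            ]
        }
    }
}

cut_points = parse_percentiles(sample_response)
```

Setting `size` to 0 keeps the response small, since only the four percentile values are of interest here.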

This approach allowed us to place prices into 5 different ranges, based on the provided data:

  • very cheap,
  • cheap,
  • average,
  • expensive,
  • very expensive.
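
Mapping a price to one of these ranges is a simple lookup once the four percentile values are known. The sketch below is a hypothetical helper, not our production code: the cut points are made up, and treating a price that sits exactly on a cut point as part of the lower range is an assumption.

```python
from bisect import bisect_left

LABELS = ["very cheap", "cheap", "average", "expensive", "very expensive"]


def price_bucket(price, cut_points):
    """Return the range label for `price`.

    `cut_points` holds the 20th, 40th, 60th and 80th price percentiles in
    ascending order. A price equal to a cut point falls into the lower range.
    """
    if len(cut_points) != len(LABELS) - 1:
        raise ValueError("expected exactly four cut points")
    return LABELS[bisect_left(cut_points, price)]


# Made-up percentile values for one group of cars.
example_cut_points = [2500.0, 4000.0, 5500.0, 8000.0]

low = price_bucket(2000.0, example_cut_points)       # below the 20th percentile
boundary = price_bucket(4000.0, example_cut_points)  # exactly on the 40th percentile
middle = price_bucket(4500.0, example_cut_points)    # between the 40th and 60th
high = price_bucket(9000.0, example_cut_points)      # above the 80th percentile
```

Four cut points always produce five ranges, which is why the percents 20, 40, 60 and 80 match the five labels.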

Customer satisfaction first — the approach we have chosen

With two approaches both delivering interesting results, we needed to answer this question:

Which approach would help our customers most? Single recommended value or price ranges?

Our product team conducted research which showed that placing the user's price into a bucket, and allowing them to increase or decrease the price on the website while posting the listing, would result in a better user experience.
With this in mind, our UX team designed a UI which looks like this:
Car price recommendation feature on www.publi24.ro

Our users are now able to adjust the price and choose the right price bucket for their car listing. This affects the number of ad views and influences how fast our users will be able to sell their car.

Design for failure

Following technology best practices is crucial for running a website with high availability and without interruptions for customers. We applied a design for failure approach to the price recommendation feature to ensure that our users are not impacted if our recommendation microservice has any issues. Our team designed it in such a way that if the microservice is not responding, or we have no data for a specific combination of features, the posting process still works properly, just without showing the price recommendation.
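
The fallback behaviour can be sketched as a thin wrapper around the call to the recommendation service. This is an assumed shape, not our actual implementation: the function names, the timeout and the response format are hypothetical.

```python
import json
import urllib.request


def fetch_recommendation(url, timeout_s=0.5):
    """Call the (hypothetical) recommendation endpoint with a short timeout."""
    with urllib.request.urlopen(url, timeout=timeout_s) as response:
        return json.load(response)


def safe_recommendation(fetch):
    """Run `fetch` and return its result, or None when nothing can be shown.

    Any failure (timeout, connection error, malformed response) and any empty
    result is turned into None, so the posting form can simply skip the
    recommendation widget instead of failing.
    """
    try:
        result = fetch()
    except Exception:
        return None
    return result or None


def service_down():
    """Stand-in for a microservice that does not respond."""
    raise TimeoutError("recommendation service did not respond")


unavailable = safe_recommendation(service_down)                        # service not responding
no_data = safe_recommendation(lambda: {})                              # no data for this combination
available = safe_recommendation(lambda: {"cut_points": [2500, 4000, 5500, 8000]})
```

In a real call, `fetch` would be something like `lambda: fetch_recommendation(url)`; passing it in as a function keeps the fallback logic easy to test without a network.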

Summary

Helping our users make good deals is crucial to us. We invested a lot of time in researching the best approach on the market, including building our own machine learning model and diving deep into Elasticsearch capabilities. We cannot simply compare the two solutions because they fulfil different functions: the machine learning model predicts the best price, whilst the Elasticsearch percentiles aggregation groups prices into buckets to help our customers make better informed decisions. As a result of our implementation we see that this solution brings more value to our platform and drives more users to Publi24.
Feel free to reach out with any feedback or suggestions. We would love to hear from you.

You can also try our price recommendation feature on Publi24 or visit the Russmedia Equity Partners website.
Russmedia group is also sharing projects via open source — please visit our open source catalog.

Acknowledgements

  • Romina Popa (Head of Product at Russmedia Digital Romania, co-author)
  • Radu Moldovan (CTO of Russmedia Digital Romania)
  • Claudiu Silaghi (DevOps/Web Developer at Russmedia Digital Romania)
