Detective Angadh
Let's see how Angadh, our Detective Master (an imaginary character), resolved a case in which the website of a small company, ABC, was down for more than a week.
Case Details :
ABC Agency is a small company that runs a campaigns and advertisements business. They built an in-house web application, and it ran smoothly for more than 2 years. Suddenly the site became inaccessible and stayed down for more than 1 week. Initially they had a developer who took care of all the development and deployment of this application. Unfortunately, the developer is no longer with ABC, and the end users have no knowledge of the application's hosting, development or source code. Now they are facing the heat, and their business is completely halted.
Current State :
ABC has no clue how to resolve the issue and bring the website up. Even though they found some subject experts, those experts couldn't help because ABC doesn't have any information about the application code, database or hosting details.
Finally, ABC reached out to our Detective Master Angadh to get this resolved with the information they have. The only available information is the domain name of the in-house app they developed.
Domain Name : digital-agency.abc.com
Investigation :
After reviewing the case, Angadh prepared a list of possible causes and a troubleshooting checklist to figure out what was going on behind the scenes.
Possible Causes for the Website Down : (a typical website with few users)
- DNS Resolution Issues
- DDoS Attacks
- Server Down
- Web Server may be stopped
- Background Process may have crashed
- Licensing related issues triggered by a date
- Server IP Address Changes
Checklist :
Angadh felt the only way to identify the issue was to get the details of the server and access it to troubleshoot further.
Checklist : Identify Server, Hosting and Source Code
- [x] Identify the IP Address of the Website
- [x] Identify where the Application is hosted, i.e. whether the Server is On-Premises or in the Cloud
- [x] Identify the Application Platform (PHP, NodeJS etc)
- [x] Identify whether they used Apache, PHP-FPM or NGINX
- [x] Identify whether they had a source code repository
Angadh used the following tools to work through the above checklist.
A) Where is the Server & its Location ?
- Ping the website domain name from the terminal with ping
- Ping.eu to trace the DNS route
- IP location lookup using iplocation.net
With these tools, Angadh identified the IP address from DNS and found that the application is hosted on a cloud platform (Google Cloud).
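The lookup in step A can also be done entirely from a terminal. The following is a minimal sketch, not the exact procedure Angadh followed: `resolve_ip` is a hypothetical helper name, and it assumes a Linux machine where `getent` is available. The `whois` field names in the comment vary by registry, so treat them as a hint only.

```shell
#!/bin/sh
# Resolve a host name to its first IP address using the system resolver.
# Prints nothing if the name cannot be resolved.
resolve_ip() {
    getent hosts "$1" | awk '{ print $1; exit }'
}

# Example usage (output depends on live DNS, so none is shown here):
#   ip="$(resolve_ip digital-agency.abc.com)"
#   echo "Resolved to: $ip"
#   # The owner of the address range often hints at the hosting provider:
#   whois "$ip" | grep -i -E 'orgname|netname'
```

If the owner of the address range turns out to be a cloud provider, that narrows down where to ask for account access next.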
B) How to get into the Server ?
Angadh asked ABC's System Admin whether they had an account on Google Cloud Platform, and he got read-only access to the GCP project. He saw a couple of servers running there and, using the VM instance's public IP address (gathered from Checklist A), he SSH'ed into the server.
He knows well that, by default, GCP lets you SSH into a Linux instance by injecting keys on the fly for the logged-in user, provided that user has access to the GCP project.
He felt he was very close to the solution and started exploring the files and directories on the server.
C) What is the App Platform & Where is the SourceCode Repository ?
Reviewing the source code can give some hints about the platform the application was built on. Angadh asked the same SysAdmin whether they had any accounts with GitHub, TFS, BitBucket etc. The SysAdmin mentioned he had an account with BitBucket and provided access. Angadh explored the repositories, identified the right project and realised that the application is built on NodeJS.
D) How is this NodeJS Web App served on port 80 ?
/var/www/html is one of the popular directory paths used by the majority of platforms. Angadh changed to that directory but couldn't find anything there. The directory was empty.
It was going to be a nightmare to figure out where the files were located and how they were served. As per the code, the app defaulted to port 8080. But somehow they must have overridden it on the production server to use port 80, or set up a reverse proxy using NGINX or Apache. He thought either these services were stopped or had crashed for some unexpected reason.
Angadh couldn't find any Apache or NGINX process running on the server. So the app was served on port 80 through a different mechanism.
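The process check described above could look something like the sketch below. The helper names `is_running` and `report_web_servers` are hypothetical, and the list of process names is an assumption (Apache is called `apache2` on Debian-style systems and `httpd` on Red Hat-style ones).

```shell
#!/bin/sh
# Succeeds if a process with exactly this name is running.
is_running() {
    pgrep -x "$1" >/dev/null 2>&1
}

# Report the state of the usual suspects, one per line.
report_web_servers() {
    for name in apache2 httpd nginx node; do
        if is_running "$name"; then
            echo "$name: running"
        else
            echo "$name: not running"
        fi
    done
}

# To see which process (if any) is listening on each TCP port, run as root:
#   sudo ss -ltnp
```

A `node` process with no Apache or NGINX next to it is a hint that the app is exposed some other way.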
He even tried the curl command to figure out whether the website was working from inside the server.
$ curl http://digital-agency.abc.com
Could not resolve host: digital-agency.abc.com
$ curl http://localhost:80
Failed to connect to localhost port 80: Connection refused
$ curl http://localhost:8080
<html>
.....
.....
</html>
Hurray! Angadh was happy to see a response from the curl command on port 8080, which meant the app itself was running. Now, what kind of reverse proxy was supposed to be running on the server at port 80?
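The same probing can be wrapped in a small helper when there are several ports to try. This is only a sketch: `port_state` is a hypothetical name, and it treats any curl failure (connection refused, timeout, curl not installed) as "unreachable".

```shell
#!/bin/sh
# Print "responding" if an HTTP request to host:port gets any HTTP reply,
# otherwise "unreachable". $1 = host, $2 = port.
port_state() {
    if curl -s -o /dev/null --max-time 3 "http://$1:$2/"; then
        echo "responding"
    else
        echo "unreachable"
    fi
}

# Example usage on the server:
#   for port in 80 8080; do
#       echo "port $port: $(port_state localhost "$port")"
#   done
```

Without `-f`, curl exits successfully even on an HTTP error status, so "responding" here only means that something answered on that port.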
E) Time Travel - Terminal Command History
Looking at the information gathered so far, Angadh realised the only way to resolve the issue was to figure out the recent commands executed on the server by the old developer.
$ cat /etc/passwd
The above command listed the users on the server. Using su, he switched to each likely user and executed the history command.
$ history > command-history.txt
This one command exported all the commands to a text file. Angadh patiently reviewed each one, and he was surprised to see the last command used in that user's shell.
sudo iptables -t nat -I PREROUTING -p tcp --dport 80 -j REDIRECT --to-ports 8080
When he looked at this one command, he realised that the app is not served through a traditional web server like Apache or Nginx. Instead, it is served directly by the NodeJS Express server, with port 80 traffic re-routed using iptables (the Linux tool for managing packet filtering and NAT rules).
Angadh executed this command again, tried accessing the site, and was able to access it without issues.
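The history search in this section can be partly automated. The sketch below is one possible approach, not what Angadh actually ran: the helper names are hypothetical, it assumes the users' shell is bash with the default `~/.bash_history` file, and reading other users' history files needs root.

```shell
#!/bin/sh
# Print "user:home" for every account whose login shell is not
# nologin/false. $1 = passwd-format file (defaults to /etc/passwd).
login_users() {
    awk -F: '$7 !~ /(nologin|false)$/ { print $1 ":" $6 }' "${1:-/etc/passwd}"
}

# Search each of those users' bash history for a keyword.
# $1 = keyword (defaults to "iptables").
search_histories() {
    keyword="${1:-iptables}"
    login_users | while IFS=: read -r user home; do
        hist="$home/.bash_history"
        if [ -r "$hist" ]; then
            grep -H -n -- "$keyword" "$hist"
        fi
    done
    return 0
}

# Example usage (as root): search_histories iptables
```

Searching for terms such as `iptables`, `nginx`, `pm2` or `node` is usually quicker than reading every line of the exported history.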
F) The Final Part - Why is the Website up now ?
After executing this command, requests to port 80 on this server are redirected to the NodeJS app on port 8080. The only plausible reason for the earlier failure is that the rule was never saved: iptables rules added from the command line live in memory and are lost on reboot unless they are persisted.
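Before re-adding a rule, it helps to confirm whether it is present at all. The following sketch checks rule listings for the redirect; `has_redirect` is a hypothetical helper, and it assumes the rule text has the same shape as the output of `iptables -S` or `iptables-save`.

```shell
#!/bin/sh
# Read iptables rules from stdin and succeed if PREROUTING contains a
# REDIRECT from port $1 to port $2.
has_redirect() {
    grep -Eq -- "-A PREROUTING .*--dport $1 .*-j REDIRECT --to-ports $2( |\$)"
}

# Example usage on the server (needs root):
#   if sudo iptables -t nat -S PREROUTING | has_redirect 80 8080; then
#       echo "redirect 80 -> 8080 is active"
#   else
#       echo "redirect 80 -> 8080 is missing"
#   fi
```

Run right after a reboot, a check like this would have shown the missing rule immediately.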
The command to save the iptables rules to a file:
# https://unix.stackexchange.com/questions/52376/why-do-iptables-rules-disappear-when-restarting-my-debian-system
sudo sh -c "iptables-save > /etc/iptables.rules"
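Saving only writes the rules to a file; something still has to load that file at boot. The sketch below shows one way to do that, under stated assumptions: the server uses Debian-style ifupdown networking, where executable scripts in /etc/network/if-pre-up.d/ run before an interface comes up, and `restore_rules` is a hypothetical helper name. On images that do not use ifupdown, the iptables-persistent package is a common alternative.

```shell
#!/bin/sh
# Sketch of a hook script, e.g. /etc/network/if-pre-up.d/iptables
# (the file name is an assumption; it must be executable).

# Load saved rules if the file exists; never fail the hook otherwise.
# $1 = rules file (defaults to /etc/iptables.rules).
restore_rules() {
    rules_file="${1:-/etc/iptables.rules}"
    if [ -r "$rules_file" ]; then
        iptables-restore < "$rules_file"
    else
        echo "no rules file at $rules_file, skipping" >&2
    fi
}

# In the real hook script the last line would simply be:
#   restore_rules
```

The function skips quietly when the file is missing because a failing pre-up hook can stop the network interface from coming up.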
Angadh suspected that the issue arose from a server restart or from patches applied to the VM instance running in GCP. He checked the last boot time of the VM instance, and it was exactly 9 days ago, matching the time the website was first reported down.
$ last reboot
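To compare the boot time with the outage report, the gap can be computed in whole days. This is a small sketch: `days_between` is a hypothetical helper, and the usage in the comment assumes GNU `date` and a procps version of `uptime` that supports `-s`.

```shell
#!/bin/sh
# Whole days between two Unix timestamps. $1 = earlier, $2 = later.
days_between() {
    echo $(( ($2 - $1) / 86400 ))
}

# Example usage on the server:
#   boot_epoch="$(date -d "$(uptime -s)" +%s)"
#   now_epoch="$(date +%s)"
#   echo "Days since last boot: $(days_between "$boot_epoch" "$now_epoch")"
```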
Finally, Angadh concluded the root cause of the outage and restarted the server once more, as he was sure the iptables rules were saved by the previous command. After the restart the web app worked well. He provided a detailed report to ABC, and the company was very happy that he cracked all this in a short time and got their business running again.
Angadh knows well that the iptables-based solution is not a robust approach. But his focus was to investigate and resolve the issue in the current environment, which had been working well for more than a year.
Hope you enjoyed this detective series with Angadh.