It's time to build a home server! This series covers what I learned while building and configuring a server at home, with the intention of keeping it close to common configuration standards so I can host sites/applications to test attacks on.
Originally I was thinking of using AWS or another cloud-based service, but I want to start with the basics of hosting and then build up. This process will involve installing the OS and then the LAMP software stack on it.
Now, before I dive in, there are a few things I need to cover that I think might be helpful to understand before you take this adventure. This will be a bit cumbersome, so feel free to skip it and come back as needed. I wrote this as I was building the server, so it's mostly here in case you are not sure what something is.
- What is a Server?
- Web Server vs HTTP Server?
- What is Server-less?
- Service Vs Process
- What is LAMP?
- What are Virtual Hosts?
- What is CORS?
- What is a "DNS"?
- Uncomplicated Firewalls?
- What is a Database?
- What is an Active Record?
- What is SSH and SCP?
- What is a CMS and do you really need one?
- TLS/SSL and its Certificates
- IDS, IPS, or SIEM?
Though it doesn't sound complicated upfront, there is a lot that goes into it that we don't see. What's more, there are several types of servers that clients interact with during their time on the Internet, such as database servers, blade servers, cloud servers, dedicated servers, domain name servers, file servers, mail servers, print servers, web servers, game servers, proxy servers, HTTP servers, standalone servers, local network servers, and application servers, to name a few. 1, 2, 31
Consider the common client/server restaurant example where a waiter ("server") provides various functions ("services") to the restaurant patron ("client") on request. The functions that a waiter ("server") can provide cover a range of "services" that they may or may not perform themselves but have access to, such as requesting a drink from the bar "server", requesting a special order from a chef "server", or requesting a bill from a hostess "server".
The same theory translates to computer servers, where you might request an email from a "mail server", a website from an "application server" or "HTTP server", and data for a website from a "database server". The server takes the request from the client (usually a web browser or email program) and becomes responsible for performing the service and returning the information to the client in a timely manner. And, like a restaurant, a computer can host many servers in one space or separate out each server into its own space, or host, to specialize in one service.
When building a server to host something on, it's important to consider the types of servers available and what your needs are. There are a lot of servers available and you won't need all of them for a website or a file sharing server, but you'll probably need more than one because each server performs a specific task or service.
In my setup I'm going to be building a traditionally hosted server stack (a LAMP stack) where everything shares the resources of one computer. In comparison, in a large-scale production environment it's common to separate out each server onto its own computer, or to virtualize each server within one computer with its own virtual resources, to make troubleshooting easier.
When building a server it's important to understand the difference between two commonly mixed up server types: a web server and an application server. These days a lot of websites are dynamic applications and not just static websites. It can be confusing to work out what they need at the server level when the terms "web server" and "app server" are thrown around without much understanding of what the differences really are and the different services that each offers.
An HTTP web server handles communication between a client/browser and a server. Specifically, it uses the HTTP protocol to receive HTTP requests and respond with HTML documents, images, redirects, style sheets, scripts, and text content, to list a few. But despite all that, it can't handle more complex/dynamic requests on its own, such as requests that need Java or C++ code to run. 32 For dynamic requests a web server will act as an intermediary and pass the request to the best program to handle it, and won't provide any functionality beyond an environment for that server-side program to execute and pass back the generated response(s). An HTTP web server works well for static websites that don't process any information, but for processing information an application server is needed.
An application server is a mixed framework of software that can handle the more complex/dynamic processing requests that web servers can't, such as API calls, Java servlets, or C++ processing. It's generally positioned behind a web server and in front of a SQL database, and it handles the business logic to generate the dynamic content that web applications or desktop applications run; "that is, it’s code that transforms data to provide the specialized functionality offered by a business, service, or application." 34
In general, clients/browsers access applications by communicating with a web server via HTTP, while an application server has separate listeners for the HTTP, HTTPS, IIOP, and IIOP/SSL protocols. Though each listener has exclusive use of a specific port number, this doesn't mean that web servers can't deliver dynamic content through CGI (Common Gateway Interface) or plugins that act as an intermediary for a process; nor does it mean that an application server can't serve up a static website. 33 As a rule of thumb, if the dynamic content is generated using Java technologies, an application server is generally in use; if the dynamic content is generated by PHP or Perl, a web server is generally being used.
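To make the split a bit more concrete, here is a minimal sketch using Python's standard-library wsgiref module. The HTTP server half only speaks HTTP, while a separate application function generates the dynamic content. The names app, call_app, and serve, the greeting, and port 8000 are my own assumptions for illustration; this is not the Apache/PHP setup built later in this series.

```python
from urllib.parse import parse_qs
from wsgiref.simple_server import make_server
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    """Application logic: build a dynamic response from the request data."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    name = query.get("name", ["world"])[0]
    body = f"Hello, {name}!".encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def call_app(query_string=""):
    """Call the application directly, without opening a network socket."""
    environ = {}
    setup_testing_defaults(environ)  # fills in the required WSGI keys
    environ["QUERY_STRING"] = query_string
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], body.decode("utf-8")


def serve(port=8000):
    """The HTTP server half: accept connections and hand each request to app()."""
    with make_server("127.0.0.1", port, app) as httpd:
        httpd.serve_forever()
```

Calling call_app("name=Ada") exercises only the application logic, while serve() would put the same function behind a real HTTP listener.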
Serverless is a catchy new way to get applications online, but are there really no servers involved?
- Short answer: no, there are still servers involved.
- Long answer: "Serverless is a cloud computing execution model where the cloud provider dynamically manages the allocation and provisioning of servers." 21
In the traditional hosting style, all parts of an application/web site are hosted on one server. This allows IT staff to have full access to control, install, and configure software/hardware as needed. But it also means that IT staff are needed to monitor it and physically protect it.
In the serverless hosting style, all parts of an application/web site are split up into different, event-driven "services" and/or "processes", and the responsibility for managing them falls on the (cloud) providers that are hosting the specific service you are using. Being event-driven, a serverless application will only start running when there’s a specific condition or input to trigger it.
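As a rough sketch of what "event-driven" means in practice, the function below does nothing until something hands it an event. The handler(event, context) shape mirrors the one commonly used for Python functions on AWS Lambda; the event field "name" and the response layout are assumptions I made up for this example.

```python
import json


def handler(event, context=None):
    """Runs only when the platform delivers an event; there is no server loop here."""
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }


# Locally, we can simulate the trigger by calling the function ourselves.
response = handler({"name": "home server"})
print(response["body"])
```

In a real serverless deployment the provider, not your code, decides when and where this function runs.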
According to Amazon AWS:
"With serverless computing, infrastructure management tasks like capacity provisioning and patching are handled by AWS, so you can focus on only writing code that serves your customers. Serverless services like AWS Lambda come with automatic scaling, built-in high availability, and a pay-for-value billing model... - all without managing any servers." 22
For my purpose I will be building a traditional hosting style. The server will be physically in my house, connected to my network, and I will need to monitor it, like IT staff would, to ensure it has its security updates and monitor its logs for malicious actions that I will be testing on it.
When researching server hosting, the words "service" and "process" get tossed around, but what is a "service" and what is a "process"?
A "process" is an instance of a running program, application, script, or executable (.exe program file) that can be running in the foreground or background of a system. It's one, or several, thread(s) of executable instructions within its own environment, or application environment, that at any given time can be either running, sleeping, or zombie (a completed process waiting for its parent process to pick up its return value).
For example, both the Chrome browser and Notepad++ are processes (applications/executables). However, Chrome is interesting because it separates each tab into its own instance of that process. In contrast, you can have multiple tabs open in Notepad++ but all of them live in one instance of the Notepad++ process.
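To see the "each instance is its own process" idea for yourself, here is a small sketch that starts a second Python interpreter and compares process IDs. It assumes nothing beyond the standard library; the variable names are mine.

```python
import os
import subprocess
import sys

# Start a second Python interpreter; it runs as its own process with its own PID.
child = subprocess.run(
    [sys.executable, "-c", "import os; print(os.getpid())"],
    capture_output=True,
    text=True,
    check=True,
)

parent_pid = os.getpid()
child_pid = int(child.stdout.strip())
print(f"parent PID: {parent_pid}, child PID: {child_pid}")
```

The two numbers differ because the operating system tracks the parent and the child as separate processes, even though they run the same program.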
A "service", also called a "Daemon" in a Linux environment, is a continuously listening process that runs in the background and provides the service to a client on request. Though a service can have an associated process with a user interface, they are usually started by the operating system and will run whether or not the associated process is open in the foreground. They don't interact with the desktop, are not interactive, and have no controlling terminal/interface.
For example, the Apache/httpd web server, SSH server, system logger, antiviruses, and your system clock are all services that run in the background even when the user is not logged in. Sure, you can modify settings of these services but they just run and wait for someone to request information from them.
To see what services and processes are running on your system you can:
On Windows, open the Task Manager. This will give you tabs for viewing running processes (browsers and other programs you are running/interacting with) and services (helpers to the computer or a process).
On Linux, to view services run
service --status-all to list every service and whether it is currently running.
To view processes run
ps for a short list of the processes started from your current terminal session or
top for a continuously updated and more detailed list of running processes.
LAMP (Linux, Apache, MySQL, PHP) is an open source software stack where each component contributes essential capabilities to an application. Though almost any OS, HTTP server, database manager, and data processing software that suits your needs could be used, the LAMP stack "has a classic layered architecture, with Linux at the lowest level followed by Apache, MySQL, and PHP. Although PHP is nominally at the top or presentation layer, the PHP component sits inside Apache." 3 Though a few variations of this software stack have been developed over the years, LAMP remains popular for its proven record of delivering high-performance web applications.
Some variations of the LAMP stack are where MySQL is replaced by PostgreSQL and renamed LAPP, a Windows OS equivalent stack known as WAMP, or sometimes just by keeping the original acronym of LAMP but changing the meaning to Linux / Apache / Middleware (Perl, PHP, Python, Ruby) / PostgreSQL to be more flexible to developers.
"A high-level look at the LAMP stack order of execution shows how the elements interoperate. The process starts when the Apache web server receives requests for web pages from a user’s browser. If the request is for a PHP file, Apache passes the request to PHP, which loads the file and executes the code contained in the file. PHP also communicates with MySQL to fetch any data referenced in the code.
PHP then uses the code in the file and the data from the database to create the HTML that browsers require to display web pages. The LAMP stack is efficient at handling not only static web pages, but also dynamic pages where the content may change each time it is loaded depending on the date, time, user identity and other factors.
After running the file code, PHP then passes the resulting data back to the Apache web server to send to the browser. It can also store this new data in MySQL. And of course, all of these operations are enabled by the Linux operating system running at the base of the stack." 3
Using "Virtual hosts" has become a common method for hosting multiple web sites on a single server or system. This is done because it allows one server to share its resources and doesn't require services to use the same host name.
There are 3 types of virtual hosts 5:
IP-based: meaning that you have a different IP address for every web site.
- For example, 172.20.30.40 and 172.20.30.50
name-based: meaning that you have multiple names running on each IP address.
- For example, blog1.example.com and blog2.example.com or example.com and example2.com
port-based: meaning that you have a different port on one IP address for every web site.
- For example, 172.20.30.40:8081 and 172.20.30.40:8080
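As an illustration of the name-based flavor, here is a minimal sketch of how a server could pick a site from the HTTP Host header. The VIRTUAL_HOSTS mapping, the directory paths, and the document_root_for function are hypothetical names of mine; Apache does this through its own virtual host configuration, not through code like this.

```python
# Hypothetical mapping of host names to document roots (name-based virtual hosts).
VIRTUAL_HOSTS = {
    "blog1.example.com": "/var/www/blog1",
    "blog2.example.com": "/var/www/blog2",
}
DEFAULT_ROOT = "/var/www/html"  # assumed fallback site


def document_root_for(host_header):
    """Pick a site from the Host header, ignoring case and any :port suffix.

    Note: this simple split does not handle IPv6 literals such as [::1]:8080.
    """
    hostname = host_header.split(":", 1)[0].strip().lower()
    return VIRTUAL_HOSTS.get(hostname, DEFAULT_ROOT)


print(document_root_for("Blog1.example.com:80"))
```

Both names can point at the same IP address; the server only tells the sites apart by the name the browser asked for.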
Cross-Origin Resource Sharing (CORS) is a mechanism that allows access to resources across origins. Implemented as an extension of the Same-Origin Policy, CORS lets a server specify which other origins it is willing to share resources with, through a suite of HTTP headers that name the trusted origins and associated properties and that are exchanged between a browser and a resource on another origin. For example, if an API is used by pages on http://example.org, a resource on http://hello-world.example can opt in by sending
Access-Control-Allow-Origin: http://example.org as a response header, which would allow that resource to be fetched cross-origin from http://example.org. 36
CORS is an "opt in" mechanism because user agents (browsers) will, typically, isolate content retrieved from different origins by default, to prevent malicious web site operators from interfering with the operations of benign web sites and to prevent leaking of data. 37
Be Warned: This does NOT mean that CORS is security. CORS != Security.
CORS is a way of easing up on the strict same-origin policy of resource sharing and NOT a mechanism to enforce general security or prevent against a variety of risky scenarios.
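Here is a minimal sketch of the server side of that opt-in, assuming a hypothetical allow-list and a helper I named cors_headers. It only shows how the Access-Control-Allow-Origin header gets chosen; a real application would normally rely on its web server or framework for this.

```python
# Hypothetical allow-list of origins this server trusts.
ALLOWED_ORIGINS = {"http://example.org"}


def cors_headers(request_origin):
    """Return the CORS response headers for a request's Origin header value."""
    if request_origin in ALLOWED_ORIGINS:
        return {
            "Access-Control-Allow-Origin": request_origin,
            # Tell caches that the response differs per Origin.
            "Vary": "Origin",
        }
    # No CORS headers: the browser keeps enforcing the same-origin policy.
    return {}


print(cors_headers("http://example.org"))
print(cors_headers("http://evil.example"))
```

Notice that the untrusted origin still gets a response; it is the browser that refuses to expose it to the calling page, which is one more reason CORS should not be treated as a security control.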
All information on the Internet is accessed by address numbers called "IP addresses", like 192.168.1.2. These IP addresses are similar to a street address in that they specify where the information you are looking for lives. On the Internet there are a lot of addresses, so instead of keeping a list of the IP address of every website you like, we use the Domain Name System (DNS). This saves humans from having to memorize the laundry list of IP addresses we might need to access at some point.
The DNS helps by translating registered domain names, such as "www.examplesite.com," into IP addresses, such as "192.168.1.2," for us, and it maintains records for all other websites on the Internet.
I'm not going to get into a lot of detail here, but it is good to know that there are 4 types of dedicated servers for DNS services that work together to translate the domain name into an IP address: recursive resolvers, root nameservers, TLD (Top Level Domain) nameservers, and authoritative nameservers.
If you want to read more about this process I recommend checking out GeeksforGeeks and Cloudflare, as they have a detailed explanation of each step that is done when requesting a domain from a DNS server.
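If you just want to watch a name turn into an address, this small sketch asks the operating system's resolver through Python's socket module. The resolve helper is a name I made up; "localhost" is used so it works without Internet access, and you can swap in any domain you like.

```python
import socket


def resolve(hostname):
    """Return the sorted, de-duplicated IP addresses the system resolver finds."""
    results = socket.getaddrinfo(hostname, None)
    # Each result is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in results})


print(resolve("localhost"))
```

Behind this one call, the resolver may consult a local hosts file or walk through the recursive resolver, root, TLD, and authoritative nameservers on your behalf.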
Ubuntu's Linux kernel includes the Netfilter subsystem for packet filtering, network address translation (NAT), and other packet mangling. The iptables suite of commands is the traditional userspace interface for manipulating/managing the Netfilter subsystem, and thus provides a "complete firewall solution that is both highly configurable and highly flexible." 10
When a packet reaches the server, it will be handed off to the Netfilter subsystem for acceptance, manipulation, or rejection based on the rules supplied to it from userspace via iptables.
Ubuntu offers a user-friendly frontend for firewall configuration through its default UFW (Uncomplicated Firewall). "ufw provides a framework for managing netfilter, as well as a command-line interface for manipulating the firewall. ufw aims to provide an easy to use interface for people unfamiliar with firewall concepts, while at the same time simplifies complicated iptables commands to help an administrator who knows what he or she is doing." 10
Here are some useful things about UFW:
- The UFW has 4 levels of logging (low|medium|high|full) and logs are located in the
- Be careful with the log level you choose as even a "Medium" level will generate a ton of logs.
- The UFW supports connection rate limiting, which helps protect against brute-force login attacks by letting normal connections in while it will "deny connections if an IP address attempts to initiate 6 or more connections within 30 seconds." 45
- Typical usage is:
sudo ufw limit ssh/tcp
- "Users should consider using this option for services such as SSH." 46
- The ability to test whether a rule will do what you expect with the --dry-run option.
- For example:
sudo ufw --dry-run logging medium
- The UFW has the ability to view reports based on the live system; with the exception of the listening report, these are in raw iptables format.
- Typical usage is:
sudo ufw show [report type] or
sudo ufw show [report type] > [File Name].txt
- Report format types are: raw|builtins|before-rules|user-rules|after-rules|logging-rules|listening|added
- The UFW has the ability to set up an IP blacklist to deny access to open ports/services with its deny from rule.
- Typical usage is:
sudo ufw deny from [IP address here]
- Be careful with this as it can easily escalate to a large list of rules and there are better options for blacklists when needed.
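To get a feel for the rate limiting rule quoted in the list above (6 or more connections within 30 seconds), here is a sketch of the same idea as a sliding window in Python. This is only my illustration of the concept, not how UFW implements it; the RateLimiter class and its method names are made up.

```python
from collections import defaultdict, deque

MAX_ATTEMPTS = 6      # the 6th attempt inside the window is denied
WINDOW_SECONDS = 30


class RateLimiter:
    def __init__(self):
        self._attempts = defaultdict(deque)  # IP address -> recent timestamps

    def allow(self, ip, now):
        """Record a connection attempt at time `now` (in seconds) and decide."""
        recent = self._attempts[ip]
        # Forget attempts that have fallen out of the window.
        while recent and now - recent[0] >= WINDOW_SECONDS:
            recent.popleft()
        recent.append(now)
        return len(recent) < MAX_ATTEMPTS


limiter = RateLimiter()
decisions = [limiter.allow("203.0.113.5", t) for t in range(6)]
print(decisions)
```

Passing the timestamp in as an argument keeps the sketch easy to test; a real limiter would read the clock itself.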
A database is an organized collection of data/structured information. There are many types of databases, but they are all generally stored electronically (though some hard copies can be kept depending on the need) and are usually controlled by a database management system (DBMS).
The types of databases are:
- Relational database
- Object-oriented database
- Distributed database
- Data warehouse
- NoSQL database
- Graph database
- OLTP database
For the context of this project I will be using a Relational Database.
A Relational Database is a database whose data is stored in tables, where each table is "linked" or "related" based on data that's common between the tables. The connection between tables establishes the basis of interactions among these tables and allows quick access to information that might be spread across multiple tables.
Furthermore, "each row in the table is a record with a unique ID called the key. The columns of the table hold attributes of the data, and each record usually has a value for each attribute, making it easy to establish the relationships among data points." 12
When dealing with a Relational Database, an RDBMS (Relational Database Management System) is the standard interface used between users/applications and the database. There are many RDBMSs today, but they all do the same basic things: store data, manage the connections between tables, query information in different tables, and retrieve data stored in a Relational Database. To interact with an RDBMS, SQL (Structured Query Language) is the standard language used for processing data; at the very least, the RDBMS can process SQL statements for requests and database updates.
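Here is a small sketch of two related tables and a query that follows the relationship. It uses SQLite through Python's built-in sqlite3 module purely because it needs no setup; the LAMP stack in this series uses MySQL, and the authors/posts tables and their contents are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE posts (
        id INTEGER PRIMARY KEY,
        author_id INTEGER NOT NULL REFERENCES authors(id),
        title TEXT NOT NULL
    );
    INSERT INTO authors (id, name) VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO posts (author_id, title) VALUES
        (1, 'Building a home server'),
        (2, 'Reading firewall logs');
""")

# The shared key (authors.id = posts.author_id) is what "relates" the tables.
rows = conn.execute("""
    SELECT authors.name, posts.title
    FROM posts
    JOIN authors ON authors.id = posts.author_id
    ORDER BY posts.id
""").fetchall()
print(rows)
```

The JOIN is the part doing the relational work: it pulls one answer out of data that lives in two tables.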
Rails Active Record is the "M" (the model) in MVC and is the interface component that Ruby on Rails gives you between the database and your application that lets you structure your data models in a logical and nearly plain-English way. 25 It is the layer of the system that's responsible for representing business data and logic by facilitating the creation, and use, of business objects whose data requires persistent storage to a database. 26
To break it down broadly, Rails Active Record is the model layer of the MVC (Model-View-Controller) software design pattern and is an implementation of the Active Record pattern, which itself is a description of an ORM (Object Relational Mapping) system.
Now, I'm not going to go into detail on each of these, so check out their links for more information, but here is a little bit on each.
- MVC is a software design pattern that separates an application into three main logical components, the model, the view, and the controller.
- Active Record Pattern is a software architectural pattern that stores in-memory object data in relational databases by wrapping "each database view/table/collection into a class, with instances of that class corresponding to individual "smart" records in each view/table/collection." 27 The interface of an object conforming to this pattern would include functions such as Insert, Update, and Delete, plus properties that correspond more or less directly to the columns in the underlying database table.
- ORM is a programming technique for converting data between incompatible type systems (such as relational databases) using object-oriented programming languages (such as Python, Java, and Ruby).
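To show the shape of the Active Record pattern described in the list above, here is a bare-bones sketch in Python with SQLite. It is not Rails and not a real ORM; the Post class, its posts table, and the save/delete/find methods are hypothetical and only mirror the Insert, Update, and Delete functions the pattern calls for.

```python
import sqlite3


class Post:
    """One instance wraps one row of the posts table."""

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)")

    def __init__(self, title, id=None):
        self.id = id          # maps to the id column
        self.title = title    # maps to the title column

    def save(self):
        """Insert a new row, or update the existing one."""
        if self.id is None:
            cur = self.conn.execute(
                "INSERT INTO posts (title) VALUES (?)", (self.title,))
            self.id = cur.lastrowid
        else:
            self.conn.execute(
                "UPDATE posts SET title = ? WHERE id = ?", (self.title, self.id))
        return self

    def delete(self):
        """Remove this object's row from the table."""
        self.conn.execute("DELETE FROM posts WHERE id = ?", (self.id,))
        self.id = None

    @classmethod
    def find(cls, id):
        """Load one row by primary key, or return None if it does not exist."""
        row = cls.conn.execute(
            "SELECT id, title FROM posts WHERE id = ?", (id,)).fetchone()
        return cls(row[1], id=row[0]) if row else None
```

A usage example would be Post("Draft").save() followed by Post.find(1), which reads the same row back as an object without writing any SQL at the call site.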
SSH (Secure Shell) is a network protocol that provides a secure channel, over an unsecured network, for two computers to communicate. 16 SSH uses port 22 by default and is used to log into a remote machine and execute commands, using a client-server architecture that connects an SSH client application with an SSH server. SSH can also be used to transfer files using the associated SSH File Transfer (SFTP) or Secure Copy (SCP) protocols. 15
Based on the Secure Shell Protocol (SSH), the Secure Copy Protocol (SCP) command is a means of securely transferring files between a local and a remote host where both your authentication information (such as password or passcode) and your data are encrypted.
The protocol works in the client-server model, which means that the SSH client drives the connection setup process to the server, and the client uses public key cryptography to verify the identity of the SSH server. "After the setup phase the SSH protocol uses strong symmetric encryption and hashing algorithms to ensure the privacy and integrity of the data that is exchanged between the client and server." 17
A CMS (Content Management System) is software with a graphical interface that gives you a central repository for developing, editing, managing, and pushing out content. CMSs are powerful, flexible, and have become popular because they allow the creation of a website without in-depth knowledge of website design, development, and user experience.
CMSs are "installed in a Web server's docroot, and content is requested just like any other Web page. A good CMS will handle things like granting users access to content, content administration, theming and even accepting user-submitted content, such as comments and blog posts. Apache accepts a GET request and passes that on to Drupal, which then interprets it using a variety of menu callbacks to determine which content is desired. It's smart enough to check user access controls first to determine whether the requested content is allowed, and then decides to deliver it. The theme engine within the CMS renders HTML around the content and returns an HTML document back to Apache for delivery to the end user." 18
That being said, I disagree that a CMS is needed when hosting a web site. For example, you wouldn't see a Facebook engineer or an AWS infrastructure engineer using a CMS to update the platform. They have engineers that build the CMS that we use. However, yes, if you are a small mom-and-pop company or a new entrepreneur that needs a website, then a CMS probably makes sense because it allows for easy What-You-See-Is-What-You-Get (WYSIWYG) creation and customization of a website. This is why companies like WordPress and Squarespace were created: to aid in the creation of simple websites with dynamic content abilities. For example, WYSIWYG is like creating a PowerPoint presentation where you can click and drag content from your computer into the presentation and add text where desired.
If you decide you want a CMS for your home server, there are a lot of options, so be sure to do your research and find one that fits your needs. And check out link 20 for some good information on some options.
TLS/SSL (Transport Layer Security/Secure Sockets Layer) are two network protocols that provide a secure channel, over an unsecured network, for a server and a client/browser to communicate over. They are most commonly seen via the HTTPS protocol, which uses port 443 by default, as a way to ensure that only the valid parties can access the data that’s being transferred.
To communicate over HTTPS a client and a server need to establish a secure connection. To do so, a digital document called a "Certificate" is used to bind an identity to a set of cryptographic keys called a "public key" and a "private key". A publicly trusted third party, a company or organization called a Certificate Authority (CA), signs the Certificate to verify that a key is owned by its named entity, to indicate certain expected usages of its public key for communications (usually following the X.509 or EMV standard), and so that it will be implicitly trusted by client software, such as web browsers and operating systems. 29, 38
The certificate reassures users visiting an HTTPS website of three things: Authenticity, that the server presenting the certificate is in possession of the private key that matches the public key in the certificate; Integrity, that any documents (e.g. web pages) signed by the certificate have not been altered in transit by a man in the middle; and Encryption of the data that is being transmitted over the network. 28
It should be noted that:
- SSL has been deprecated and TLS is the standard now, but the terms are often interchanged or combined. Read more about it through its associated RFC.
- SSL and a certificate do not mean that a web user's security is absolute. This is a trust relationship for users (to trust browsers) and CAs (to protect their security). 40
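For a peek at what "implicitly trusted by client software" looks like from the client side, here is a short sketch using Python's standard ssl module. It only builds a client configuration and inspects it, so no network connection is made; pinning the minimum version to TLS 1.2 is my own choice for the example.

```python
import ssl

# The default client context loads the system's trusted CA certificates.
context = ssl.create_default_context()

# Refuse the deprecated SSL versions and early TLS versions (assumed policy).
context.minimum_version = ssl.TLSVersion.TLSv1_2

# The server must present a certificate that chains to a trusted CA...
print(context.verify_mode == ssl.CERT_REQUIRED)
# ...and the certificate must match the host name we asked for.
print(context.check_hostname)
```

A client would then wrap its socket with this context, and the handshake fails if the certificate does not check out.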
Activity on a server generates a large amount of data that is usually collected and stored. Depending on the need, it can then be passed off to a system or service that will do something with the data. There are a lot of options for such systems, ranging from simple alerts to fully automated actions, and there is no one solution, as the one you pick will depend on the level of control and automation needed.
For example, a lot of home computers have a general system that logs events built into them, but without any extra programs the operating system won't do anything with this data except collect it for a set amount of time. In contrast, if that home computer has an anti-virus program on it, that program is able to access and analyze the logs and perform specific actions related to the program, such as showing alerts to the user or allowing the user to accept/deny the potential risks. 41
The same concept applies to a server, but on a larger scale; because a server might collect data from multiple systems/programs across its infrastructure it needs more than a software solution like an anti-virus program. A server needs an IDS, IPS, or SIEM.
One option is an Intrusion Detection System (IDS) that will watch the logged data and alert IT staff if anything looks suspect.
This is a good option for a lot of companies because it gives IT staff control over what they want to do with potentially malicious traffic on their system and doesn't block traffic on a false positive.
Another option is the Intrusion Prevention System (IPS). This system generally has the same capabilities as the IDS, but takes it a step further and allows IT staff to set rules that establish automated actions to be performed on all traffic flows that enter the network.
An IPS can be a great solution as it is relatively low maintenance (it requires its ruleset to be updated regularly), is able to perform real-time packet inspection, and can terminate malicious activity as it is attempting to happen. However, an IPS can also cause a lot of actions to be performed on false positives, and it can potentially block all traffic if it stops working or the rules are not configured correctly.
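The difference between detecting and preventing can be boiled down to a toy sketch like the one below. The review function, the (ip, outcome) event format, and the threshold of 3 failed logins are all assumptions of mine; real IDS/IPS products work on live traffic with far richer rules.

```python
from collections import Counter

THRESHOLD = 3  # hypothetical: failed logins from one IP before we react


def review(events, prevent=False):
    """events: iterable of (ip, outcome) tuples, where outcome is 'ok' or 'failed'.

    Returns (alerts, blocked). With prevent=False this behaves like an IDS
    (alerts only); with prevent=True it behaves like an IPS (alerts and blocks).
    """
    failures = Counter(ip for ip, outcome in events if outcome == "failed")
    suspects = sorted(ip for ip, count in failures.items() if count >= THRESHOLD)
    alerts = [f"suspicious activity from {ip}" for ip in suspects]
    blocked = suspects if prevent else []
    return alerts, blocked


events = [("198.51.100.7", "failed")] * 3 + [
    ("192.0.2.10", "ok"),
    ("192.0.2.10", "failed"),
]
print(review(events))                # IDS style: alert, block nothing
print(review(events, prevent=True))  # IPS style: alert and block
```

The same suspicious IP shows up in both modes; the only difference is whether a human or the system takes the next step, which is also where false positives start to hurt.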
Enter the SIEM. A SIEM (Security Information and Event Management) is centralized security management that combines the long-term storage, analysis, and reporting of log data by the Security Information Management (SIM) with the real-time monitoring, correlation of events, notifications, and console views from the Security Event Manager (SEM).
A SIEM collects relevant information from multiple sources throughout an organization’s infrastructure. It then assembles and analyzes the data to identify any deviations from the norm so that it can take appropriate action, perform forensic investigation on past security incidents, and/or prepare audits for compliance purposes. 42
These are not the only options available, but a general overview of a few popular options.
Time to build this thing! Check out my next blog in this series to see how to set it all up.