DEV Community

Cover image for SQL101: Introduction to SQL for Data Analysis
Hassie Mike Perekamoyo
Hassie Mike Perekamoyo

Posted on

SQL101: Introduction to SQL for Data Analysis

Introduction

SQL(Structured Query Language) is a programming language used to manage and manipulate relational databases.it is used to create, modify, and retrieve data from databases, also to define and manage the structure of the databases. SQL is a standard language that is widely used by developers and analysts to interact with databases.

SQL allows users to create tables and define the relationships between the tables, insert data into those tables, and query the data to access the information. SQL also provides the ability to filter, sort, group, and aggregate data, also to join multiple tables together to create more complex queries.

Most common operations that can be performed using this language include creating databases and tables, inserting, updating and deleting data, and querying data to get specific information. This language is used in a variety of applications and industries, including finance, healthcare, retail and others.

In this article, I will introduce you to SQL for data analysis. It will cover the basics of SQL, how to install MySQL, including the structure of a database, creating tables, querying data, and basic data analysis.

Installing MySQL

installation of MySQL may depend on the operating system and version of MySQL you are installing. However, here are the steps to install MySQL on a Windows system.

  1. Download the MySQL installer from the MySQL website.
  2. Run the installer executable file.
  3. Choose the setup type, usually the Developer default option. This will install all of the necessary components for developing and testing applications with MySQL.
  4. Choose the configure type, which is usually the Standalone MySQL Server option. This will install the MySQL Server as a service on the system.
  5. Configure the MySQL Server options, such as the root password, port number, and other settings. For most settings you can usually accept the default options.
  6. Complete the installation process and wait for it to finish.
  7. Once the installation is complete, you can open MySQL command-line client to verify that the Server is running and to start working with databases.

Note that the above steps are a general guide for installing MySQL on a windows system, and you may need to make adjustments on the steps based on your specific system configuration and version of MySQL you are installing. https://www.mysql.com/

Structure of a Database

A database is usually organized into tables, which contain rows of data and columns that define the type of data stored in each row. The structure of database, also called the database schema, is defined by several key components, including:

  1. Tables: The tables in a database are used to store data. Each table consists of rows and columns, with each row representing a record and each column representing a field.
  2. Fields: Fields are the individual elements of data within a table. For example, a student table might have fields for first name, last name, address, phone number and registration number.
  3. Records: A record is a single row in a table that contains data about a specific entity. For example, a student record might contain information about a single student, like name, address, phone number and registration number.
  4. Keys: Keys are used to link different tables together in a database. Primary keys are unique identifiers for each record in a table, while foreign keys are used to link records in one table to records in another table.
  5. Relationships: Relationship define how different tables in a database are related to each other. For example, a student table might have a relationship with an Exam result table, when each student can have multiple results.
  6. Indexes: Indexes are used to speed up database queries by allowing the database to quickly find specific records in a table. An index is created on one or more fields in a table can significantly improve query performance.
  7. Views: Views are virtual tables that are based on the data in one or more tables in a database. Views can be used to simplify complex queries, or to provide specific view of the data to different users.
  8. Stored Procedures: Stored procedures are precompiled database programs that can be called by other programs or users to perform a specific database operation. Stored procedures can improve database performance and security.

SQL Data Types for MySQL

The data type of a column defines what value the column can hold: integer, character, money, date and time, binary, and so on.

An SQL developer must decide what type of data that will be stored inside each column when creating a table. The data type is a guideline for SQL to understand what type of data is expected inside of each column, and it also identifies how SQL will interact with the stored data.

In MySQL there are three main data types: string, numeric, and date and time.
String Data Types

  1. CHAR(size): FIXED length string (can contain letters, numbers, and special characters). The size parameter specifies the column length in characters - can be from 0 to 255. Default is 1
  2. VARCHAR(size): A VARIABLE length string (can contain letters, numbers, and special characters). The size parameter specifies the maximum string length in characters - can be from 0 to 65535
  3. BINARY(size): Equal to CHAR(), but stores binary byte strings. The size parameter specifies the column length in bytes. Default is 1
  4. VARBINARY(size): Equal to VARCHAR(), but stores binary byte strings. The size parameter specifies the maximum column length in bytes.
  5. TINYBLOB: For BLOBs (Binary Large Objects). Max length: 255 bytes
  6. TINYTEXT: Holds a string with a maximum length of 255 characters
  7. TEXT(size): Holds a string with a maximum length of 65,535 bytes
  8. BLOB(size): For BLOBs (Binary Large Objects). Holds up to 65,535 bytes of data

Numeric Data Types

  1. BIT(size): A bit-value type. The number of bits per value is specified in size. The size parameter can hold a value from 1 to 64. The default value for size is 1.
  2. TINYINT(size): A very small integer. Signed range is from -128 to 127. Unsigned range is from 0 to 255. The size parameter specifies the maximum display width (which is 255)
  3. BOOL: Zero is considered as false, nonzero values are considered as true.
  4. BOOLEAN: Equal to BOOL
  5. SMALLINT(size):A small integer. Signed range is from -32768 to 32767. Unsigned range is from 0 to 65535. The size parameter specifies the maximum display width (which is 255)
  6. MEDIUMINT(size): A medium integer. Signed range is from -8388608 to 8388607. Unsigned range is from 0 to 16777215. The size parameter specifies the maximum display width (which is 255)
  7. INT(size): A medium integer. Signed range is from -2147483648 to 2147483647. Unsigned range is from 0 to 4294967295. The size parameter specifies the maximum display width (which is 255)
  8. INTEGER(size): Equal to INT(size)

Date and Time Data Types

  1. DATE: A date. Format: YYYY-MM-DD. The supported range is from '1000-01-01' to '9999-12-31'
  2. DATETIME(fsp): A date and time combination. Format: YYYY-MM-DD hh:mm:ss. The supported range is from '1000-01-01 00:00:00' to '9999-12-31 23:59:59'. Adding DEFAULT and ON UPDATE in the column definition to get automatic initialization and updating to the current date and time
  3. TIMESTAMP(fsp): A timestamp. TIMESTAMP values are stored as the number of seconds since the Unix epoch ('1970-01-01 00:00:00' UTC). Format: YYYY-MM-DD hh:mm:ss. The supported range is from '1970-01-01 00:00:01' UTC to '2038-01-09 03:14:07' UTC. Automatic initialization and updating to the current date and time can be specified using DEFAULT CURRENT_TIMESTAMP and ON UPDATE CURRENT_TIMESTAMP in the column definition 4: TIME(fsp): A time. Format: hh:mm:ss. The supported range is from '-838:59:59' to '838:59:59'
  4. YEAR: A year in four-digit format. Values allowed in four-digit format: 1901 to 2155, and 0000.

Creating Tables

Before you can store data in a database, you need to create a table. The CREATE TABLE statement is used to create a new

table in the database. Here is an example:

CREATE TABLE customers ( customer_id INT, first_name VARCHAR(50), last_name VARCHAR(50), email VARCHAR(50), phone VARCHAR(20) ); 
Enter fullscreen mode Exit fullscreen mode

In this example, we're creating a table called "customers" with five columns: customer_id, first_name, last_name, email, and phone. The customer_id column is an integer, while the other columns are strings of varying lengths.

Inserting Data

Once you've created a table, you can insert data into it using the INSERT INTO statement. Here is an example.

INSERT INTO customers (customer_id, first_name, last_name, email, phone) VALUES (1, 'Hunter', 'Vee', 'HunterVee@example.com', '555-555-5555'); 
Enter fullscreen mode Exit fullscreen mode

In this example, we're inserting a single record into the customers table. The values in the parentheses represent the data being inserted into the table.

Querying Data

Now that we have data in our database, we can start querying it to extract information. The SELECT statement is used to query data from a table. Here is an example:

SELECT * FROM customers; 
Enter fullscreen mode Exit fullscreen mode

In this example, we're selecting all columns from the customers table. The * is a wildcard character that represents all columns.

You can also specify which columns you want to select, like this:

SELECT first_name, last_name FROM customers; 

In this example, we're only selecting the first_name and last_name columns from the customers table.

Enter fullscreen mode Exit fullscreen mode

Filtering Data

To filter data in SQL, you can use the WHERE clause. Here is an example:

SELECT * FROM customers WHERE first_name = 'Hunter'; 
Enter fullscreen mode Exit fullscreen mode

In this example, we're selecting all columns from the customers table where the first_name is 'Hunter'. This will return all records where the first name is Hunter.

You can also use other comparison operators, such as <, >, <=, >=, and !=. Here's an example using the < operator:

SELECT * FROM customers WHERE 

customer_id < 10; 
Enter fullscreen mode Exit fullscreen mode

In this example, we're selecting all columns from the customers table where the customer_id is less than 10.

Sorting Data

To sort data in SQL, you can use the ORDER BY clause. Here's an example:

SELECT * FROM customers ORDER BY last_name; 
Enter fullscreen mode Exit fullscreen mode

In this example, we're selecting all columns from the customers table and ordering them by last_name. This will return all records sorted in alphabetical order by last name.

You can also sort by multiple columns by separating them with a comma, like this:

SELECT * FROM customers ORDER BY last_name, first
Enter fullscreen mode Exit fullscreen mode

Creating relationships between entities

Cardinality refers to the quantity of elements that interact between two related tables. Identifying the cardinality helps make sure you’ve divided the data into tables most efficiently.

Each entity can potentially have a relationship with every other one, but those relationships are typically one of three types:

One-to-one relationships
When there’s only one instance of Entity A for every instance of Entity B, they are said to have a one-to-one relationship (often written 1:1). You can indicate this kind of relationship in an ER diagram with a line with a dash on each end:

To implement a 1:M relationship as you set up a database, simply add the primary key from the “one” side of the relationship as an attribute in the other table. When a primary key is listed in another table in this manner, it’s called a foreign key. The table on the “1” side of the relationship is a considered a parent table to the child table on the other side.

Many-to-many relationships
When multiple entities from a table can be associated with multiple entities in another table, they are said to have a many-to-many (M:N) relationship. This might happen in the case of students and classes, since a student can take many classes and a class can have many students.
In an ER diagram, these relationships are portrayed with these lines:

Unfortunately, it’s not directly possible to implement this kind of relationship in a database. Instead, you have to break it up into two one-to-many relationships.

To do so, create a new entity between those two tables. If the M:N relationship exists between sales and products, you might call that new entity “sold_products,” since it would show the contents of each sale. Both the sales and products tables would have a 1:M relationship with sold_products. This kind of go-between entity is called a link table, associative entity, or junction table in various models.

Each record in the link table would match together two of the entities in the neighboring tables (it may include supplemental information as well). For instance, a link table between students and classes might look like this:
Mandatory or not?
Another way to analyze relationships is to consider which side of the relationship has to exist for the other to exist. The non-mandatory side can be marked with a circle on the line where a dash would be. For instance, a country has to exist for it to have a representative in the United Nations, but the opposite is not true:

mandatory or not
Two entities can be mutually dependent (one could not exist without the other).https://www.tech-recipes.com/database/one-to-one-one-to-many-table-relationships-in-sql-server/

Recursive relationships
Sometimes a table points back to itself. For example, a table of employees might have an attribute “manager” that refers to another individual in that same table. This is called a recursive relationship.

Redundant relationships
A redundant relationship is one that is expressed more than once. Typically, you can remove one of the relationships without losing any important information. For instance, if an entity “students” has a direct relationship with another called “teachers” but also has a relationship with teachers indirectly through “classes,” you’d want to remove the relationship between “students” and “teachers.” It’s better to delete that relationship because the only way that students are assigned to teachers is through classes.

Top comments (1)

Collapse
 
yankho817 profile image
Yankho Chimpesa

Amazing piece.