As a data analyst ,one uses loads of data in order to make informed decisions .Often, data stays in an SQL database ,follow the link below to get an introduction to SQL (https://dev.to/emme_42/introduction-to-sql-for-data-analysis-3fj7)
Since data is often stored in an SQL database, one ought to understand the SQL query commands. This article will take you through the essential SQL commands for data science.
**Data definition**
Used to create (define) data structures such as tables, indexes, clusters i.e.,
• CREATE databases, tables
• ALTER databases, tables
• DROP tables
Although most of the time the client gives you data in a database(already created ),it is essential to know how to create databases. Databases are created using the create(); query
for example in MYSQL,
create database database_name;
create table department(
list the columns and their data types );
Drop tables -used to delete a certain table if it is not being used / not needed for analysis.
Data Manipulation
The data manipulation language is used to access and update data; it is not important for representing the data. (Of course, the data manipulation language must be aware of how data is represented, and reflects this in the constructs that
it supports i.e.
• SELECT - extracts data from databases - to get all the content from a specific table in the database
_select *
from table_name; _
• UPDATE- updates data in a database-
update table_name
set column1=value1;
• DELETE- deletes data from tables
_delete from table name; _
• INSERT INTO - inserts data into tables
insert into table_name (
column1,colum 2
values(value 1,value two);
Alter table - used to add, delete, or modify columns in an existing table, also used to add and drop various constraints on an existing table.
ALTER TABLE table_name
ADD column_name datatype;
Sorting on some attribute/ Data retrieval with simple conditions**
- WHERE This is used to retrieve specific entries that meet specific conditions. example in a dataset with employees ,we would want to know which employees earn more than 50,000 , use, select employee_salary from employee where employee_salary >=50,000;
2.ORDER BY
Used to sort records in order. Sorts the records in ascending order by default. To sort the records in descending order, use the DESC keyword.
_select employee_salary
from employee
where employee_salary >=50,000
order by emoloyee_salary;
3.Limit
Used to give a limited no of entries.
select employee_salary
from employee
where employee_salary >=50,000
order by emoloyee_salary
limit 10;
-gives only first 10 entries
*AGGREGATIONS *
Used to get a summary of the dataset to get insights.
- Group by The GROUP BY statement groups rows that have the same values into summary rows syntax select sum(column_name) from table_name where (condition) group by column_name;
2.Count
it returns the number of rows that matches a specified criterion.
select count(column_name)
from table_name
where condition;
JOINS
This command is used to combine data from two or more tables in a database.
Examples
Inner join
It returns only the rows where there is a match between columns in both tables.
SELECT column_name(s)
FROM table1
INNER JOIN table2
ON table1.column_name = table2.column_name;Left join
It returns all the rows from the left table and matching rows from the right table. If there is no match in the right table ,the result will have null values.
SELECT column_name(s)
FROM table1
LEFT JOIN table2
ON table1.column_name = table2.column_name;Right join
it returns all records from the right table , and the matching records from the left table .If there is no match in the left table ,the result will have null values.
SELECT column_name(s)
FROM table1
RIGHT JOIN table2
ON table1.column_name = table2.column_name;
- Outer join
Used to return all the rows from one or both tables.
SELECT column_name(s)
FROM table1
FULL OUTER JOIN table2
ON table1.column_name = table2.column_name
WHERE condition;
AVG function
It returns the average value of a numeric column.
SELECT AVG(column_name)
FROM table_name
WHERE condition;Having function
The HAVING clause was added to SQL because the WHERE keyword cannot be used with aggregate functions.
SELECT column_name(s)
FROM table_name
WHERE condition
GROUP BY column_name(s)
HAVING condition
ORDER BY column_name(s);Sum function
It returns the total sum of a numeric column.
SELECT SUM(column_name)
FROM table_name
WHERE condition;
CHANGING DATA TYPES
Cast
it converts a value (of any type) into a specified datatype.
CAST(expression AS datatype(length))Round
It rounds a number to a specified number of decimal places.
ROUND(number, decimals, operation)
WINDOW FUNCTIONS
A window function performs a calculation across a set of table rows that are somehow related to the current row.
Here are some examples ,
1.Row number()
This is a function assigns a unique sequential number to each row within a partition.
ROW_NUMBER() OVER (
[PARTITION BY expr1, expr2,...]
ORDER BY expr1 [ASC | DESC], expr2,...
)
The window functions are a bit complex and we urge you to do more research about these commands.
These commands are used in all data analysis processes hence if you want to perfect your analysis practice these commands using the open source databases.
Top comments (0)