The GROUP BY clause is mostly used with aggregate functions in sql. Aggregate functions in sql are different from single row functions in one main way: Single row functions return a single result row for EVERY row of a queried table or view. Aggregate functions performs a calculation on a set of values in a column and returns one single value. The 5 most common aggregate functions in sql are: COUNT(), MAX(), MIN(), AVG(),SUM(). These do exactly what they sound like they do, for example:
select max(supply) from fruit_imports;
The other main difference between aggregate functions and single row functions is that while you use the "WHERE" clause to filter information in a single row function, you would use the "HAVING" clause to filter information in an aggregate function query (examples and screenshots coming up, don't worry)
In the graphic above, we see the anatomy of a GROUP BY query.
If you are using a GROUP BY to group by a particular column, you must specify that column in the GROUP BY clause. Any non-aggregate (not a MIN,MAX,COUNT,AVERAGE or SUM) column must be mentioned in the GROUP BY clause. Please see the screenshot below:
In the screenshot above, there are two columns, the sum_supply column and the state column. The sum_supply column is the aggregate column and the state is the non-aggregate column. The state column is the one that is included in the GROUP BY clause because it is non-aggregated column.
Having is used to filter group data. If you are filtering data with the HAVING clause, it must come after the GROUP BY clause. HAVING is used on aggregate functions in lieu of the WHERE clause.
SELECT first_name, COUNT(*) FROM employees GROUP BY first_name HAVING COUNT(*)>2;
The results in the screenshot show all the employee names that are mentioned more than 2 times. That means in the full list of employee first names, there are 3 Roslyns and 3 Billies.
Thanks for reading! Hope this informative. Please feel free to comment, clarify or add to this :)