DEV Community

Cover image for 5 Essential SQL Skills Learned from a Decade of Experience
David Au Yeung
David Au Yeung

Posted on

5 Essential SQL Skills Learned from a Decade of Experience

In my ten years of working with SQL, I've honed several critical skills that significantly enhance database management and data manipulation. Here’s a detailed tutorial on these skills, complete with practical examples.

Exercise Setup

--Your Preparation
CREATE TABLE Customers (
    CustomerUID         UNIQUEIDENTIFIER DEFAULT (NEWID()) NOT NULL,
    CustomerNumber      BIGINT IDENTITY(1,1) NOT NULL,
    LastName            NVARCHAR(100)    NOT NULL,
    FirstName           NVARCHAR(100)    NOT NULL,
    DOB                 DATE             NOT NULL,
    IsDeleted           BIT              NOT NULL DEFAULT 0,
    CreateBy            NVARCHAR(100)    NOT NULL DEFAULT 'SYSTEM',
    CreateDate          DATETIME         NOT NULL DEFAULT GETDATE(),
    ModifyBy            NVARCHAR(100)    NOT NULL DEFAULT 'SYSTEM',
    ModifyDate          DATETIME         NOT NULL DEFAULT GETDATE(),  
    CHECK (YEAR(DOB) >= 1900),
    PRIMARY KEY (CustomerUID) 
);

CREATE TABLE Products (
    ProductUID          UNIQUEIDENTIFIER DEFAULT (NEWID()) NOT NULL,
    ProductName         NVARCHAR(1000)   NOT NULL,
    ProductCode         NVARCHAR(1000)   NOT NULL,
    AvailableQuantity   INT              NOT NULL,
    IsDeleted           BIT              NOT NULL DEFAULT 0,
    CreateBy            NVARCHAR(100)    NOT NULL DEFAULT 'SYSTEM',
    CreateDate          DATETIME         NOT NULL DEFAULT GETDATE(),
    ModifyBy            NVARCHAR(100)    NOT NULL DEFAULT 'SYSTEM',
    ModifyDate          DATETIME         NOT NULL DEFAULT GETDATE(),
    CHECK (AvailableQuantity >= 0),
    PRIMARY KEY (ProductUID)
);

CREATE TABLE Orders (
    OrderUID            UNIQUEIDENTIFIER DEFAULT (NEWID()) NOT NULL,
    CustomerUID         UNIQUEIDENTIFIER,
    OrderNumber         NVARCHAR(1000)   NOT NULL,
    OrderDate           DATETIME         NOT NULL,
    IsDeleted           BIT              NOT NULL DEFAULT 0,
    CreateBy            NVARCHAR(100)    NOT NULL DEFAULT 'SYSTEM',
    CreateDate          DATETIME         NOT NULL DEFAULT GETDATE(),
    ModifyBy            NVARCHAR(100)    NOT NULL DEFAULT 'SYSTEM',
    ModifyDate          DATETIME         NOT NULL DEFAULT GETDATE(),
    PRIMARY KEY (OrderUID),
    FOREIGN KEY (CustomerUID) REFERENCES Customers(CustomerUID)
);

CREATE TABLE OrderItems (
    OrderItemUID        UNIQUEIDENTIFIER DEFAULT (NEWID()) NOT NULL,
    OrderUID            UNIQUEIDENTIFIER,
    ProductUID          UNIQUEIDENTIFIER,
    Quantity            INT              NOT NULL,
    IsDeleted           BIT              NOT NULL DEFAULT 0,
    CreateBy            NVARCHAR(100)    NOT NULL DEFAULT 'SYSTEM',
    CreateDate          DATETIME         NOT NULL DEFAULT GETDATE(),
    ModifyBy            NVARCHAR(100)    NOT NULL DEFAULT 'SYSTEM',
    ModifyDate          DATETIME         NOT NULL DEFAULT GETDATE(),
    PRIMARY KEY (OrderItemUID),
    FOREIGN KEY (OrderUID) REFERENCES Orders(OrderUID),
    FOREIGN KEY (ProductUID) REFERENCES Products(ProductUID),
);

--Create customers
INSERT INTO Customers (LastName, FirstName, DOB) VALUES
('Au Yeung', 'David', '19801231')
, ('Chan', 'Peter', '19820115')

--Create products
INSERT INTO Products (ProductName, ProductCode, AvailableQuantity) VALUES
('Android Phone', 'A0001', 100)
, ('iPhone', 'I0001', 100)

--David bought 10 iPhone
INSERT INTO Orders (CustomerUID, OrderNumber, OrderDate) VALUES
((SELECT TOP 1 CustomerUID FROM Customers WHERE FirstName = 'David' AND IsDeleted = 0)
, 'ORD0001'
, GETDATE())

INSERT INTO OrderItems (OrderUID, ProductUID, Quantity) VALUES
((SELECT TOP 1 OrderUID FROM Orders WHERE OrderNumber = 'ORD0001' AND IsDeleted = 0)
, (SELECT TOP 1 ProductUID FROM Products WHERE ProductCode = 'I0001' AND IsDeleted = 0)
, 10)

SELECT * FROM Customers
SELECT * FROM Products
SELECT * FROM Orders
SELECT * FROM OrderItems
Enter fullscreen mode Exit fullscreen mode

Skill #1: Finding Customers Without Orders Using LEFT JOIN

One common task is identifying customers who have not placed any orders. This can be efficiently done using LEFT JOIN combined with a check for NULL values.

Example:

SELECT c.*
FROM Customers c 
LEFT JOIN Orders o ON o.CustomerUID = c.CustomerUID AND o.IsDeleted = 0
WHERE o.OrderUID IS NULL;
Enter fullscreen mode Exit fullscreen mode

This query retrieves all customers who do not have associated orders, allowing you to target them for marketing or engagement strategies.

Skill #2: Avoiding Duplicates with NOT EXISTS

When inserting new records, especially in batch operations, ensuring that duplicates do not occur is crucial. Using NOT EXISTS can prevent this effectively.

Example:

IF NOT EXISTS (SELECT 1 FROM Products WHERE ProductName = 'iPhone') 
    INSERT INTO Products (ProductName, ProductCode) VALUES ('iPhone', 'I0001')
ELSE
    PRINT 'Duplicate Product Name!';
Enter fullscreen mode Exit fullscreen mode

This query checks if the product already exists before attempting to insert it, thus maintaining data integrity.

Skill #3: Enhancing Readability with Temporary Tables

Using temporary tables can simplify complex queries, especially when dealing with subqueries. This improves readability and maintainability of your SQL code.

Example:

SELECT ProductUID
INTO #BestSeller
FROM OrderItems
WHERE IsDeleted = 0
GROUP BY ProductUID
HAVING SUM(Quantity) > 5;

SELECT * FROM Products WHERE ProductUID IN (SELECT * FROM #BestSeller);

DROP TABLE IF EXISTS #BestSeller;
Enter fullscreen mode Exit fullscreen mode

Here, we create a temporary table to hold order IDs of best-selling items, making the subsequent query clearer.

Skill #4: Utilizing Common Table Expressions (CTEs) for Sequential Queries

CTEs are beneficial for creating more readable and organized queries, especially when dealing with sequential data or hierarchical relationships.

Example:

;WITH cte AS (
    SELECT *, ROW_NUMBER() OVER(PARTITION BY CustomerUID ORDER BY CreateDate DESC) AS rn
    FROM Orders
)
SELECT * 
FROM cte
WHERE rn = 1;
Enter fullscreen mode Exit fullscreen mode

This CTE retrieves the most recent order for each customer, showcasing how CTEs can simplify complex logic.

Skill #5: Using Transactions for Data Integrity

When performing updates, especially those that could potentially affect large portions of your data, wrapping your operations in a transaction is essential. This practice allows you to ensure data integrity by either committing or rolling back changes.

Example:

BEGIN TRAN;

UPDATE Products
SET AvailableQuantity = 0
WHERE ProductCode = 'I0001' 
AND IsDeleted = 0;

-- Check the results before COMMIT
SELECT * FROM Products WHERE ProductCode = 'I0001';

-- Uncomment to commit or rollback
-- COMMIT;
-- ROLLBACK;
Enter fullscreen mode Exit fullscreen mode

This transaction ensures that your updates are only applied if you are satisfied with the results, helping to avoid unintended consequences.

Conclusion

These five skills—using LEFT JOIN to find unmatched records, preventing duplicates with NOT EXISTS, enhancing query readability with temporary tables, utilizing CTEs for complex queries, and ensuring data integrity with transactions—are invaluable in SQL. Mastery of these techniques can significantly improve your efficiency and effectiveness in database management.

Feel free to share your experiences or ask questions in the comments below!

Top comments (24)

Collapse
 
aaronre16397861 profile image
Aaron Reese

Thanks for taking the time to craft a post on SQL. I have to call out a couple of things.
1) I'm not sure which dialect of SQL you are using but it is generally not recommended to use GUIDs or UUIDs as identity keys. There are many reasons but the primary one is index fragmentation.
2) on #2 (if exists) you say it is a good way to prevent duplicates when doing batch updates however your code leads to individual transactions for each update which will be magnitudes slower than a set-based update. In a batch situation this should be a left joint to the target table and only insert records where the joined table identifier is null. Ideally duplicate records would be prevented by a Unique Key.
3) I know code examples are simplified but your temp table would be better as a CTE. By using a temp table you lose any indexing optimisations and table locking as the underlying source data could change after building the temp table.
4) your transaction example is valid, however it is more relevant where you need to update multiple tables or records where system integrity demands all succeed or all fail. Examples would be a} update order status to shipped AND reduction qtys from product availability, b) debit my account and credit your account in a bank transfer.
In your example no other process can update the product table (and possibly read from it) until you choose to commit or rollback.

Collapse
 
auyeungdavid_2847435260 profile image
David Au Yeung

Hi Aaron, thanks for joining the discussion.

Regarding your first point, while using BIGINT can be sufficient for most situations when considering performance, there are cases where GUIDs can be beneficial. In complex scenarios that involve multiple tables, databases, or servers, GUIDs can provide a significant advantage. Additionally, many replication scenarios often require the use of GUIDs to ensure uniqueness across different systems.

Collapse
 
aaronre16397861 profile image
Aaron Reese

I'm not saying don't include UUIDs, just don't use them as a primary key. In most scenarios the vast majority of records you are interested in are the most recent and so you want them grouped together in the index to make page writes faster and generate fewer pages splits (and have a higher page full percentage to save space). UUIDs by their nature are spread evenly over the pages so you need more of them, less full and the entire index has to be traversed to get records with a common profile (normally temporal)
If you need database unique identifier I would normally prefer TIMESTAMP to UUID.

Thread Thread
 
auyeungdavid_2847435260 profile image
David Au Yeung

Yes Aaron, you are right!

Thread Thread
 
bd9000 profile image
David Nuss

Agreed. Using UUIDs and GUIDs as indexes absolutely destroys performance.

Thread Thread
 
auyeungdavid_2847435260 profile image
David Au Yeung

Yes, David, I appreciate your feedback. I will incorporate better practices in my next example. Thank you!

Collapse
 
dan_maroff profile image
Dan Maroff

I think what Aaron is saying is that using GUID types as a primary key will create a fragmented clustered index. Primary keys should be a sequential type (INT, BIGINT) otherwise your joins will result in inefficient table scans. With that said, GUIDs as you mentioned are a great way to uniquely identify a records across different systems or environments. But it’s more efficient to put those GUIDs in a separate column and keep a sequential type as the primary key.

Thread Thread
 
auyeungdavid_2847435260 profile image
David Au Yeung

Thanks for your clarification, I got the point now.

Collapse
 
auyeungdavid_2847435260 profile image
David Au Yeung

Point 2 is particularly useful in real-world scenarios. Imagine receiving a ton of unstructured information that needs to be organized for various ad hoc promotional events. In such cases, you might find yourself willing to trade off some performance in exchange for improved data accuracy :)

Collapse
 
nh profile image
Neil Hoskins

Really useful article and well written.

How about putting your transaction in a TRY/CATCH? Also, use the SQL Query Analyzer and now, ask your AI friend how to maximize efficiency. I've learnt more from doing this than any book I've read (I've been working with MS SQL Server since v7).

Everyone has a different opinion on how to do things (just read the comments here). The key for me is readability (#3), it's like coding software, use comments, format your code, just remember that someone else will inevitably have to maintain your code for you at some point.

Collapse
 
auyeungdavid_2847435260 profile image
David Au Yeung

Well said, Neil. Using TRY...CATCH with transactions is indeed common in stored procedures, especially for handling issues like division by zero. However, my main point in this article is to emphasize that using BEGIN TRAN is a best practice for everyone. It helps prevent human errors during data manipulation, particularly when working long hours 😉

Collapse
 
auyeungdavid_2847435260 profile image
David Au Yeung

And what kind of SQL analyzer you recommend? I usually only use execution plan for analysis.

Collapse
 
nh profile image
Neil Hoskins

I use 'Estimated Plan' in Azure Data Studio, so same. There are other tools on the market though, I tend to stick with what I know works for me.

Collapse
 
auyeungdavid_2847435260 profile image
David Au Yeung

If you want to learn more about SQL, please comment here :)

Collapse
 
ecstatic_niya profile image
Niya

Interested

Collapse
 
ladetunji profile image
Ladetunji OSIBANJO

I'd love to

Collapse
 
donsmak profile image
donsmak

damn thats so interesting thanks alottt

Collapse
 
auyeungdavid_2847435260 profile image
David Au Yeung

I will share more about SQL later, thanks for your support ;)

Collapse
 
nasma_agencywork_0b81c971 profile image
Comment deleted
Thread Thread
 
nasma_agencywork_0b81c971 profile image
Comment deleted
Thread Thread
Collapse
 
dev_olatunji profile image
Oladipupo Isaac Tunji

This is amazing. Thanks for sharing.

Collapse
 
aniruddhaadak profile image
ANIRUDDHA ADAK

wow amazing .

Collapse
 
dirktay profile image
Dirk Taylor

Thanks!

Collapse
 
jeremymoorecom profile image
Jeremy Moore

Thank you for sharing

Some comments may only be visible to logged-in visitors. Sign in to view all comments.