DEV Community

Cover image for Practicing String Manipulation in SQL
StrataScratch
StrataScratch

Posted on • Originally published at stratascratch.com

Practicing String Manipulation in SQL

A detailed walkthrough of the solution for a Google interview question to practice SQL String Manipulation.

With the wealth of data being captured by companies, not all of them will be structured and numerical. So today, our focus is to hone your skill in manipulating strings in SQL by introducing several advanced functions.

Interview Question Example to Practice SQL String Manipulation

Let’s dive into an example question from an interview at Google to practice SQL string manipulation. The question is entitled ‘File Contents Shuffle’. It asks:

Practicing String Manipulation in SQL

Link to the question: https://platform.stratascratch.com/coding/9818-file-contents-shuffle

Video Solution:

To understand the question a bit better, let’s have a look at the dataset we’re working with.

1. Exploring the Dataset

Practicing String Manipulation in SQL

The table google_file_store provides a list of text files with the filename as one column and its contents on the other. Both columns are string data.

The question asks us specifically to look at the record where the filename is final.txt’. Notice that there are punctuation marks and duplication of some words like ‘the’, ‘and’, and ‘a’.

When dealing with strings, always remember that data may not be ‘clean’. Watch out for punctuation marks, numbers, a mix of upper and lower cases, double spaces, and duplication of words. State how you’d like to deal with these scenarios or clarify this with your interviewer. For today, we will ignore them first.

The contents of the ‘final.txt’ need to be sorted alphabetically, returned in lowercase with a new filename called ‘wacky.txt’.

2. Writing Out the Approach

Practicing String Manipulation in SQL

Once you’ve fully understood the requirements of the question, formulate a plan of how you’ll build the solution. Oftentimes, you already have an idea of what this is but I strongly suggest writing this out step-by-step. This forces you to identify any gaps in your thinking or errors that you may have missed otherwise.

From the instructions alone, you could easily write out these steps:

  1. Filter the table where the filename is ‘final.txt’
  2. Sort its contents alphabetically
  3. Convert the words into lowercase
  4. Return the contents with ‘wacky.txt’ as the filename column

While this sounds simple at the start, there are several important steps missed. To avoid this, I would also encourage you to think about the input and output at each step.

For example, the output of Step 1 is:
SELECT * FROM google_file_store
WHERE filename = 'final.txt'

Practicing String Manipulation in SQL

The contents are in string format and are encoded in one row only so we cannot immediately sort out the words alphabetically. If, instead, each word has its own row, we can do the usual sort through the ORDER BY() clause.

So we need to prepare the data first so we can manipulate them more easily later on. Let’s call this the data preparation step with the aim of having a string convert to the column of words. This is how we will do it:

Data preparation:

  1. Convert the string into an array by splitting the text using a space to identify the individual words
  2. Explode the array column-wise so that each element in the array becomes its own row

This will allow us to proceed to Step 2 where we can sort the new column alphabetically and turn it into lower case.

Then, we would like to return the result as a string like this:

Practicing String Manipulation in SQL

And we cannot do that directly with the current format so another data transformation is required. This time, it has to be the reverse of Step 2 where the aim is to summarize the contents of a column into an array and stitch these elements together in a string format.

Data reformatting:

  1. Aggregate column into an array
  2. Combine elements of the array using a space, returning this as a string

Therefore, our full approach follows:

  1. Filter the table where the filename is ‘final.txt’
  2. Data preparation: a) Convert the string into an array using space as a delimiter b) Explode the array column-wise
  3. Sort its contents alphabetically
  4. Convert the words into lowercase
  5. Data reformatting: a) Aggregate column into an array b) Combine elements of the array using a space, returning this as a string
  6. Return the contents with ‘wacky.txt’ as the filename column

Don’t you feel more confident about tackling the question now that you have the steps written out? This will also provide you a good reference point if you ever feel stuck in the interview.

3. Coding the solution

Practicing String Manipulation in SQL
Let’s code up the query.

1). Filter the table
First, let’s only look at the file ‘final.txt’. We can do this through using an equality condition in the WHERE() clause since we know the exact filename we are looking for.

SELECT * FROM google_file_store
WHERE filename = 'final.txt'

However, if we only knew it started with ‘final’, we can use the LIKE() or ILIKE() function. These two functions are used to match strings based on a given pattern. The only difference is that LIKE() is case sensitive and ILIKE() is not.

Here, we can use the ILIKE()function with the wildcard operator, %, representing zero or more characters. This allows us to retrieve the records where the filename starts with ‘final’.

SELECT * FROM google_file_store
WHERE filename ILIKE 'final%'

Practicing String Manipulation in SQL

2). Data preparation
Next, let’s prepare the data for manipulation. We will use the STRING_TO_ARRAY() function which takes in a string and converts this to an array (or a list). The elements in this array are based on the delimiter we specify. So if we use a space as a delimiter, it creates an individual element whenever it sees a space. Essentially, it will break up our text into words like this:

SELECT STRING_TO_ARRAY(contents, ' ') AS word
FROM google_file_store
WHERE filename ILIKE 'final%'

As you can see, arrays provide a lot of information at one go but we cannot access or analyze its contents easily so a common manipulation done on arrays is ‘exploding’ them. We can do this with the UNNEST() function, which will take an array as an input and output a column where each array element becomes accessible as a separate row. Imagine this as a row-to-column transformation.

SELECT UNNEST (STRING_TO_ARRAY(contents, ' ')) AS word
FROM google_file_store
WHERE filename ILIKE 'final%'

String Manipulation in SQL

3). Sort the contents alphabetically
Having transformed our data earlier makes the sorting straightforward using the ORDER BY() function.

SELECT UNNEST (STRING_TO_ARRAY(CONTENTS, ' ')) AS word
FROM google_file_store
WHERE filename ILIKE 'final%'
ORDER BY word

String Manipulation in SQL

4). Convert the words into lowercase using LOWER()

SELECT LOWER(word) AS CONTENTS
FROM
(SELECT UNNEST (STRING_TO_ARRAY(contents, ' ')) AS word
FROM google_file_store
WHERE filename ILIKE 'final%'
ORDER BY word) base

String Manipulation

5). Data reformatting
Finally, to return the contents in a string format, we’ll do the reverse of the steps earlier.

First, we will aggregate the rows of the contents column into an array using the ARRAY_AGG()function. ARRAY_AGG() is an aggregate function so like your SUM() and AVG(), it will take a column and output a single row summarizing the set of values. But here, instead of performing a calculation, it will return an array listing all the values of the column.

SELECT ARRAY_AGG(LOWER(word)) AS contents
FROM
(SELECT UNNEST (STRING_TO_ARRAY(contents, ' ')) AS word
FROM google_file_store
WHERE filename ILIKE 'final%'
ORDER BY word) base

Then, we can return this as a text by combining these individual words. The ARRAY_TO_STRING() takes in an array, combines the individual elements using a specified delimiter like a space, and returns the output as a string.

In the same query, we’ll hardcode the filename as ‘wacky.txt’ so our final solution looks like:

SELECT 'wacky.txt' AS filename,
ARRAY_TO_STRING(ARRAY_AGG(LOWER(word)), ' ') AS contents
FROM
(SELECT UNNEST (STRING_TO_ARRAY(contents, ' ')) AS word
FROM google_file_store
WHERE filename ILIKE 'final%'
ORDER BY word) base

Bonus

For more advanced users of SQL, you may be familiar with the REGEX_SPLIT_TO_TABLE() function which gives the same output as the UNNEST(STRING_TO_ARRAY()) combination we used earlier.

REGEX_SPLIT_TO_TABLE() will take in a string, separate these by a delimiter and return a table with each element in a separate row.

This is helpful for more complex manipulations where the use of regex is required. In this example, however, the delimiter is simply a space so the code is:

SELECT regexp_split_to_table(contents, ' ') AS word
FROM google_file_store
WHERE filename ILIKE 'final%'

SQL String Manipulation

And this gives us the same result as we had in Step 2!

Conclusion

This was an interesting example to level up your SQL string manipulation skills and I hope you learned something new today.

If you ever find yourself stuck doing SQL string manipulation, remember you can transform the data in another format first if that makes the next steps easier. Converting strings to arrays is now one of the tricks up your sleeve to impress your interviewer.

Practice more SQL interview questions and test your new skills on our coding platform where you can look specifically for string-related questions.

Top comments (0)