Originally published on my blog: https://www.kevinpeters.net/how-to-query-with-postgre-sql-wildcards-like-a-pro/
How to filter with any characters or wildcards in PostgreSQL
# Get all entries like Melissa, Melli.
# Not depending on how many characters are wildcarded.
SELECT * FROM example_table WHERE example_column LIKE 'Mel%';
# Get all entries like Meli, Melo.
# Depending on how many characters are wildcarded.
SELECT * FROM example_table WHERE example_column LIKE 'Mel_';
# Or for multiple characters, here 2
SELECT * FROM example_table WHERE example_column LIKE 'Mel__';
# Get all entries which include mel, not case-sensitive.
# To match: Melissa, Emelio, Mel
SELECT * FROM example_table WHERE LOWER(example_column) LIKE '%mel%';
# With an ILIKE expression
SELECT * FROM example_table WHERE example_column ILIKE '%mel%';
# With a regular expression (Regex)
SELECT * FROM example_table WHERE example_column ~* 'mel';
Quicklinks
- Basic filtering in PostgreSQL or SQL
- The
%
wildcard operator in a nutshell - The
_
wildcard operator in a nutshell - Search with dynamic cases or full text search
- Conclusion
Basic filtering in PostgreSQL or SQL
PostgreSQL wildcards are an important tool for searching in SQL databases. Most of the time text in PostgreSQL databases is saved in columns as TEXT
or VARCHAR
type. The general recommendation is to use TEXT
though. It is no problem though if something like the VARCHAR
type is used. All the solutions work the same here in PostgreSQL.
Searching for an exact text in a column is quite easy with a statement like:
SELECT * FROM example_table WHERE example_column='Mel';
From a table with a lot of entries of fictional person data, it will filter all rows which have the column example_column
with the value Mel
.
id | ... | example_column | ... |
---|---|---|---|
... | ... | ... | ... |
203 | ... | Max | ... |
204 | ... | Michael | ... |
205 | ... | Mel | ... |
206 | ... | Melissa | ... |
207 | ... | Mel | ... |
208 | ... | Taylor | ... |
209 | ... | Emelio | ... |
... | ... | ... | ... |
After executing the query it would filter out all non-highlighted rows above, resulting in the following table:
id | ... | example_column | ... |
---|---|---|---|
205 | ... | Mel | ... |
207 | ... | Mel | ... |
Exactly what we wanted to achieve. But for more advanced use cases like filtering all names which start with Mel
, we would need to use wildcards.
The %
wildcard operator in a nutshell
In our previous dataset, we filtered all values which start with Mel
in the exact casing. Now we filter all rows which have the value like Mel out of our original dataset. This can be achieved with a wildcard character. In this case, here it is not important how many characters are wildcarded. This basically means it could be anything like Melissa
, Melloremipsum
or Mel
. For getting this to work, PostgreSQL offers the character %
. With this, we can build a query in theory which can match all of the words above. There is one nuance though, instead of using the WHERE ... = '...'
filter, we need to use a statement called LIKE
. The statement will look like SELECT * FROM ... WHERE ... LIKE '...'
.
id | ... | example_column | ... |
---|---|---|---|
... | ... | ... | ... |
203 | ... | Max | ... |
204 | ... | Michael | ... |
205 | ... | Mel | ... |
206 | ... | Melissa | ... |
207 | ... | Mel | ... |
208 | ... | Taylor | ... |
209 | ... | Emelio | ... |
... | ... | ... | ... |
To get these results the query could look like:
SELECT * FROM example_table WHERE example_column LIKE 'Mel%';
And it would result in the following table as the SQL query result.
id | ... | example_column | ... |
---|---|---|---|
205 | ... | Mel | ... |
206 | ... | Melissa | ... |
207 | ... | Mel | ... |
The %
sign in the query acts as a match for any character. But not even just a match. It is a match for zero to infinite characters. That is why this operator is matching Mel
and Melissa
. A lot of use cases result out of this and that is why this is the most used wildcard in SQL or PostgreSQL overall.
The _
wildcard operator in a nutshell
In the last search, we have defined infinite wildcards with the %
which match strings with zero to an infinite amount of characters and other filters. Sometimes we have use-cases to just wildcard a specific character in some text. For example, there could be types with or without a single number attached. These types could be SALE
but also SALE1
or SALE8
. In this example, our matches look really similar, but if we know that there is just one number in the matching string, we could use the wildcard operator _
to match exactly the cases we want. For our sale types, we could use SALE_
.
In our example table, we could have different results based on this. To make a good search though, we could find all the names in the table which have three characters but must start with M
. In the example table, there are three entries that will match the condition.
id | ... | example_column | ... |
---|---|---|---|
... | ... | ... | ... |
203 | ... | Max | ... |
204 | ... | Michael | ... |
205 | ... | Mel | ... |
206 | ... | Melissa | ... |
207 | ... | Mel | ... |
208 | ... | Taylor | ... |
209 | ... | Emelio | ... |
... | ... | ... | ... |
The highlighted results in this table should be queried for with a PostgreSQL wildcard. The query is looking quite easy again.
SELECT * FROM example_table WHERE example_column LIKE 'M__';
An important point here is that the query is using two PostgreSQL wildcard characters which is a simple underscore _
. The reason for this is that one underscore _
will replace one character in the queried value.
After executing the query with the PostgreSQL wildcard, it will output the following result:
id | ... | example_column | ... |
---|---|---|---|
203 | ... | Max | ... |
205 | ... | Mel | ... |
207 | ... | Mel | ... |
This result is not a surprise and resulted in our expected output. It includes the rows for the example_column
s Max
and Mel
, where for Mel
there exist two rows.
Search with dynamic cases or full-text search
Another problem most people are facing is to search in different cases with PostgreSQL wildcards. This functionality in most programming languages is called include
or similar. In JavaScript, the includes
prototype function is existing for arrays. This can be used to check if an array includes a specific value. This works relatively well for primitive data types.
const numbers = [1, 4, 9];
const includesFour = numbers.includes(4);
console.log(includesFour);
The above script will output true
to the console since the list includes the number 4
. The includes
functionality can also be used for texts like:
const name = "Emilio";
const includesPattern = name.includes("mel");
console.log(includesPattern);
This script also outputs true
to the console. A problem here is that names like Melissa
would not be matched with the same script but just exchanging the name because the functionality is case-sensitive.
const name = "Melissa";
const includesPattern = name.includes("mel");
console.log(includesPattern);
// Outputs false
A simple trick is existing in almost all programming languages to make the incoming value, which is the name in this case, lowercase. Basically it will transform all letters of a word to the small version of the letter. Melissa
would get to melissa
.
In JavaScript this is done in a way like this:
const name = "Melissa";
const includesPattern = name.toLowerCase().includes("mel");
console.log(includesPattern);
// Outputs true
The includes
functionality is not existing in PostgreSQL or SQL in general. But for these cases, the PostgreSQL wildcards exist. As in the section earlier mentioned, there is an operator %
which matches zero to an infinite amount of random characters in a word. We can use this operator to match our desired rows.
id | ... | example_column | ... |
---|---|---|---|
... | ... | ... | ... |
203 | ... | Max | ... |
204 | ... | Michael | ... |
205 | ... | Mel | ... |
206 | ... | Melissa | ... |
207 | ... | Mel | ... |
208 | ... | Taylor | ... |
209 | ... | Emelio | ... |
... | ... | ... | ... |
All of the highlighted rows have Mel
or mel
of the value of the example_column
included. A simple approach here is to write a query which includes both like:
SELECT * FROM example_table
WHERE example_column LIKE '%mel%'
OR example_column LIKE '%Mel%';
id | ... | example_column | ... |
---|---|---|---|
205 | ... | Mel | ... |
206 | ... | Melissa | ... |
207 | ... | Mel | ... |
209 | ... | Emelio | ... |
This is quite tedious, if there are more cases then just the first letter changing. The more dynamic approach to this would be to convert the example_column
value to lower-case to work in a simpler way with it. To enable this feature and adjust the query, we just have to use the LOWER
functionality of PostgreSQL. The functionality should be applied to the example_column
column. The SQL query for this scenario would look similar to this:
SELECT * FROM example_table WHERE LOWER(example_column) LIKE '%mel%';
It would result in the exact same table output as before but the query looks far more condense and understandable.
id | ... | example_column | ... |
---|---|---|---|
205 | ... | Mel | ... |
206 | ... | Melissa | ... |
207 | ... | Mel | ... |
209 | ... | Emelio | ... |
Another approach would be to use the ILIKE
keyword. This keyword is supported in a large amount of PostgreSQL versions. It is not in the SQL standard which means that it might be not transferrable to other SQL dialects. Our SQL query from before would look like the following snippet:
SELECT * FROM example_table WHERE example_column ILIKE '%mel%';
This looks a bit cleaner, but it is good to mention that this is a bit slower than the LOWER
conversion which we have done before. But it is a good alternative for case-insensitive SQL queries.
Another alternative would be to use a Regular expression (also called Regex).
This could be realized with an SQL query like the following.
SELECT * FROM example_table WHERE example_column ~* 'mel';
This query is even slower, but I would not recommend using Regular Expressions in these cases because they are hard to understand for a lot of people, even though the expressions are much more flexible. Sometimes you need to have the flexibilities and then regular expressions are exactly what you want to use.
All of these approaches here work in simple solutions for full-text searches. But when performance is important for your project it might be worth it to look into to_tsvector
and to_tsquery
and proper indexing of your PostgreSQL tables. You can find more information about the text search in tables in the PostgreSQL documentation here: https://www.postgresql.org/docs/current/textsearch-tables.html.
Conclusion
PostgreSQL wildcards are an amazing tool to extend queries. Queries are adjusted to include a wide range of dynamic filtering which is important for systems with big data. Most of the time this data is unstructured on the value-base. A good example is the sale types mentioned in an earlier example which could be SALE1
or SALE2
. For some sub-systems, this difference might be important but for some systems, it is not important. Also, the data should rather be structured to include a detailed_...
column for these cases to achieve SQL normalization. With the wildcards, we can search for these "unnormalized" values in an easy way.
Top comments (4)
Since you're running on Postgres, you can use
ILIKE
for a case-insensitiveLIKE
instead of multiple conditions! There are also a couple different flavors of regular expression which are even more flexible.The leading wildcard makes your "full-text search" non-sargable and therefore likely to perform poorly on larger tables or text fields. For actual full-text search you'll want to use tsvectors and to_tsquery().
Thanks, knew about
ILIKE
but this is just an extension for PostgreSQL. Regular expressions are of course more flexible. Also thanks for mentioning the full-text search. I will extend the article to include these points.Funny, I was just thinking about naming my next child Melloremipsum.
Great article! I always forget that you can use underscore in a LIKE.