Last week I found myself wondering about wildcard searches in SQL and Python and, to my own surprise, wishing for a SQL feature in Python! For a Python project that we worked on for the academy, we had to use wildcard searches in both languages, and I actually preferred the SQL style over the options available in Python.

SQL

Since the SQL functionality is so simple, we will look at it first. The LIKE operator tests whether a string matches a pattern.

SELECT ex_column
FROM ex_table
WHERE ex_column LIKE 'pattern';

This statement selects all values from ex_column that match the pattern. With the pattern 'a', for example, only values that are exactly a lowercase a are returned (whether an uppercase A also matches depends on the database and its collation). The wildcard '%' represents zero or more characters, so the pattern '%a%' returns all values from ex_column that contain a lowercase a. To match exactly one character, use the underscore wildcard: the pattern '_a_' returns any value that consists of exactly one character, then a lowercase a, and then exactly one more character.
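To try these three patterns without setting up a database server, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The sample values and the helper name select_like are my own assumptions for illustration. Note that SQLite's LIKE is case-insensitive for ASCII letters by default, so the sample data is kept lowercase.

```python
import sqlite3

# In-memory database with the example table and a few made-up values.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ex_table (ex_column TEXT)")
conn.executemany(
    "INSERT INTO ex_table (ex_column) VALUES (?)",
    [("a",), ("bab",), ("babb",), ("bbbba",), ("bbbb",)],
)


def select_like(pattern):
    """Return all values of ex_column that match the LIKE pattern, sorted."""
    rows = conn.execute(
        "SELECT ex_column FROM ex_table "
        "WHERE ex_column LIKE ? ORDER BY ex_column",
        (pattern,),
    )
    return [row[0] for row in rows]


print(select_like("a"))    # only the value that is exactly 'a'
print(select_like("%a%"))  # every value that contains an 'a'
print(select_like("_a_"))  # one character, an 'a', one character
```

Passing the pattern as a parameter (the ? placeholder) keeps the query text fixed while you experiment with different patterns.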

Python

Most functionality in Python for wildcard searches relies on modules, as this Stack Overflow question confirms. The accepted answer uses the fnmatch module. The method that I was most familiar with is the re module, which provides regular expressions. Of course, Python is known for offering a lot of functionality through modules. Both of these modules are part of the standard library and are perfectly acceptable ways to do a wildcard search, but they involve some extra steps.
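For comparison, this is roughly what the two module-based options look like. It is a sketch with made-up sample values, not the code from the Stack Overflow answer. fnmatch uses shell-style wildcards, so '*' plays the role of '%' and '?' plays the role of '_'; with re the same idea is written as '.*' and '.'.

```python
import fnmatch
import re

values = ["a", "bab", "babb", "bbbba", "bbbb", "banana"]

# fnmatchcase compares case-sensitively; '*' is zero or more characters.
contains_a = [v for v in values if fnmatch.fnmatchcase(v, "*a*")]

# '?' is exactly one character, like '_' in SQL.
one_a_one = [v for v in values if fnmatch.fnmatchcase(v, "?a?")]

# Regex version of the SQL pattern '_a%a%a'. fullmatch requires the whole
# string to match; DOTALL lets '.' also match a newline character.
a_three_times = [
    v for v in values if re.fullmatch(r".a.*a.*a", v, flags=re.DOTALL)
]

print(contains_a)
print(one_a_one)
print(a_three_times)
```

One thing to keep in mind is that fnmatch also treats square brackets as a character set, so it is not a drop-in replacement for LIKE when the data can contain '[' or ']'.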

What if we want to achieve a wildcard search with pure Python code? I know that searching for a string in a string is easily done in Python.

print('a' in 'a')         #This will print True
print('a' in 'bbbba')     #This will print True
print('a' in 'bbbbabbbb') #This will print True
print('a' in 'bbbbbbbbb') #This will print False

So if the first string (here the lowercase a) occurs anywhere in the second string, the expression evaluates to True. In that sense we can compare the simple in operator with the SQL pattern '%a%'. What if we want the equivalent of '_a_'?

print('bab'[1] == 'a')  # This will print True
print('babb'[1] == 'a') # This will print True as well

The second statement evaluates to True as well, so we also need to check the length to make sure that the whole string satisfies the pattern.

print('bab'[1] == 'a' and len('bab') == 3)   # This will print True
print('babb'[1] == 'a' and len('babb') == 3) # This will print False

As we can see, this gets quite messy compared to SQL's LIKE. The difference becomes even more striking if we need to check something like '_a%a%a'. How do we tell Python to search for the middle lowercase a? We do not know in advance at which position it occurs in the string. We would have to check that the second character and the last character are a lowercase a and that at least one more a occurs somewhere in between. There are string methods such as find() and count() that help with this, but in my humble opinion they do not compare to SQL's LIKE.
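One way to avoid writing a separate check for every pattern is a small general matcher. The sketch below is my own illustration (the function name like is hypothetical) and uses no modules. It supports only '%' and '_', has no escape character, and is case-sensitive. Whenever a match attempt fails, it goes back to the most recent '%' and lets that wildcard absorb one more character.

```python
def like(text, pattern):
    """Return True if text matches a LIKE-style pattern ('%' and '_' only)."""
    t = p = 0    # current positions in text and pattern
    star = -1    # position of the most recent '%' in the pattern
    mark = 0     # text position where that '%' started matching
    while t < len(text):
        if p < len(pattern) and pattern[p] == '%':
            # Remember the wildcard and first try to match zero characters.
            star, mark = p, t
            p += 1
        elif p < len(pattern) and pattern[p] in ('_', text[t]):
            # '_' matches any single character; otherwise it must be equal.
            t += 1
            p += 1
        elif star != -1:
            # Mismatch: let the last '%' absorb one more character and retry.
            mark += 1
            t = mark
            p = star + 1
        else:
            return False
    # The text is used up; any pattern left over may only consist of '%'.
    return all(c == '%' for c in pattern[p:])


print(like('bab', '_a_'))        # True
print(like('babb', '_a_'))       # False
print(like('banana', '_a%a%a'))  # True
print(like('baa', '_a%a%a'))     # False, there are only two a's
```

It is still a lot more code than a single LIKE clause, which rather supports the point above.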

What do you think? Do you have a better way to achieve wildcard searches in Python without using a module?
