Originally published on February 22, 2022 at https://rivea0.github.io/blog
When working with strings, there may come a time when you want to check whether a string starts or ends with another given string. Luckily, JavaScript and Python both have built-in methods for the job, aptly named startsWith() and endsWith() in JavaScript, and startswith() and endswith() in Python. We don't need to reinvent the wheel, but let's say we want to implement them our own way anyway. Because, why not?
Negative Indexing
One concept that will help before we start is negative indexing. In some languages (not all), the last character of a string can be accessed with the index -1, the second-to-last with -2, and so on. Python allows negative indexes for strings (and for other sequences such as lists and tuples), and JavaScript's slice method also accepts negative indexes. These will come in handy.
Python example:
name = 'David'
name[-1] # d
name[-2] # i
We cannot access a character directly with a negative index in JavaScript, as it will return undefined, but we can use slice (note that slice returns everything from that index to the end of the string):
let name = 'David';
name[-1] // undefined
name.slice(-1) // d
name.slice(-2) // id
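To see what negative indexing actually does, it helps to note that index -k is just shorthand for len(s) - k. Here is a small Python sketch; the helper to_positive_index is a hypothetical name used only for illustration:

```python
def to_positive_index(s, k):
    """Map a negative index k (from -len(s) to -1) to the equivalent non-negative index."""
    return len(s) + k


name = 'David'
for k in (-1, -2, -5):
    # Both lookups refer to the same character.
    print(k, name[k], name[to_positive_index(name, k)])

# Slicing works the same way: name[-2:] starts at index len(name) - 2.
print(name[-2:] == name[len(name) - 2:])  # True
```

The same arithmetic explains JavaScript's name.slice(-2): slice treats a negative start as the string's length plus that value.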
Implementing endsWith
Now, let's check whether a string ends with another given string. Since negative indexes count from the end of the string, we can try something like this:
Python example:
name = 'David'
target = 'vid'
name[-len(target):] == target # True
JavaScript example:
let name = 'David';
let target = 'vid';
name.slice(-target.length) === target // true
Let's go through it step by step to make it clearer. First, we get target's length, which in our example is 3 (the length of 'vid'). Then, using negative indexing, we start from index -3 of the original string and compare the result with target. name.slice(-target.length) takes everything from index -3 of name to the end of the string, which is 'vid', and voilà! They're the same.
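One edge case is worth checking: an empty target. Since -len('') is 0, the slice name[-0:] is the same as name[0:], the whole string rather than an empty one, so the one-liner reports False even though 'David'.endswith('') is True. (JavaScript behaves the same way: 'David'.slice(-0) returns the whole string.) A minimal Python sketch of one way around it, computing the start index as len(string) - len(target); the name ends_with_safe is hypothetical:

```python
def ends_with_safe(string, target):
    # Compute the start index explicitly instead of using -len(target),
    # so an empty target slices from len(string) and yields ''.
    start = len(string) - len(target)
    # A negative start means target is longer than string, so it can't match.
    return start >= 0 and string[start:] == target


name = 'David'
print(name[-len(''):])              # -0 == 0, so this is the whole string: David
print(name[-len(''):] == '')        # False: the one-liner's answer
print(ends_with_safe(name, ''))     # True, matching name.endswith('')
print(ends_with_safe(name, 'vid'))  # True
```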
It is a nice one-liner. Now let's try our hand at startsWith, which will be even easier.
Implementing startsWith
We'll use the same components: slicing and the target string's length. Let's do it.
Python example:
name = 'David'
target = 'Dav'
name[:len(target)] == target # True
JavaScript example:
let name = 'David';
let target = 'Dav';
name.slice(0, target.length) === target // true
Slicing the original string from its start up to the length of the target string gives us a substring of the same length as target. So, in this case, name.slice(0, target.length) starts at the beginning of the string and goes up to, but not including, index 3 (the length of 'Dav'). Then we only need to check whether the two strings are the same, and that's it.
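The same one-liner also copes with a target longer than the string, because slicing clamps out-of-range bounds instead of raising an error (Python's [:n] and JavaScript's slice(0, n) both behave this way; 'Da'.slice(0, 3) is just 'Da'). A short Python sketch:

```python
name = 'Da'
target = 'Dav'

# len(target) is 3, which is past the end of name; the slice simply stops
# at the end of the string instead of raising an error.
prefix = name[:len(target)]
print(prefix)            # Da
print(prefix == target)  # False: 'Da' does not start with 'Dav'

# An empty target gives an empty slice, so it correctly counts as a prefix.
print('David'[:len('')] == '')  # True

# Plain indexing past the end, by contrast, does raise an error.
try:
    name[len(target)]
except IndexError as err:
    print('IndexError:', err)
```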
Dissecting the Implementations
We have written great one-liners and implemented our own versions of startsWith and endsWith. Here they are as functions (let's write the function names in snake case so as not to confuse them with the built-in ones):
In Python:
def starts_with(string, target):
    return string[:len(target)] == target

def ends_with(string, target):
    return string[-len(target):] == target
In JavaScript:
function starts_with(string, target) {
  return string.slice(0, target.length) === target;
}

function ends_with(string, target) {
  return string.slice(-target.length) === target;
}
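A cheap way to gain confidence in home-made versions is to compare them with the built-ins on every small input. The Python sketch below (the helpers all_strings and find_mismatches are hypothetical names) tries every pair of strings over the alphabet 'ab' up to length 3, including the empty string:

```python
from itertools import product


def starts_with(string, target):
    return string[:len(target)] == target


def ends_with(string, target):
    return string[-len(target):] == target


def all_strings(alphabet, max_len):
    """Yield every string over `alphabet` up to length `max_len`, including ''."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            yield ''.join(chars)


def find_mismatches(max_len=3):
    """Compare our functions with str.startswith/str.endswith on small inputs."""
    words = list(all_strings('ab', max_len))
    mismatches = []
    for string in words:
        for target in words:
            if starts_with(string, target) != string.startswith(target):
                mismatches.append(('starts_with', string, target))
            if ends_with(string, target) != string.endswith(target):
                mismatches.append(('ends_with', string, target))
    return mismatches


for func, string, target in find_mismatches():
    print(func, repr(string), repr(target))
```

Run as written, it reports only ends_with with an empty target on a non-empty string: -len('') is 0, so string[-0:] is the whole string rather than ''. starts_with agrees with the built-in in every case.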
These are fine, but what about implementing the same logic another way, perhaps in another language? Preferably one that will help us think at a lower level.
My initial thought was that it would be something like this in C (spoiler: it was naive):
#include <stdio.h>
#include <stdbool.h>
#include <string.h>

bool starts_with(char *string, char *target) {
    int target_length = strlen(target);
    for (int i = 0; i < target_length; i++) {
        if (string[i] != target[i]) {
            return false;
        }
    }
    return true;
}

bool ends_with(char *string, char *target) {
    int target_length = strlen(target);
    int starting_index = strlen(string) - target_length;
    for (int i = 0; i < target_length; i++) {
        if (string[starting_index + i] != target[i]) {
            return false;
        }
    }
    return true;
}
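However, a reader pointed out that this is indeed problematic. The lengths should be size_t rather than int, and strings we only read should be passed as char const *. Calling strlen() up front is more work than necessary, since we can simply iterate until we hit the terminating '\0' byte (which is what strlen() does anyway). Worst of all, ends_with doesn't handle the case where target is longer than string: starting_index becomes negative, and the loop reads memory before the start of string, which is undefined behavior and may well crash the program.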
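The ends_with problem is easiest to see with a target longer than the string. Here is a Python sketch that mirrors the naive index arithmetic; naive_starting_index and ends_with_checked are hypothetical names, and the second one simply adds the missing length check to the same loop:

```python
def naive_starting_index(string, target):
    # Mirrors the naive C code: strlen(string) - strlen(target).
    return len(string) - len(target)


string, target = 'id', 'vid'
start = naive_starting_index(string, target)
print(start)  # -1: target is longer than string

# In C, string[-1] reads memory before the array (undefined behavior).
# Python doesn't crash, but string[-1] silently means "last character",
# which is not what the loop intended either.
print(string[start])  # d


def ends_with_checked(string, target):
    """The naive loop plus the missing length check."""
    start = len(string) - len(target)
    if start < 0:
        return False  # target longer than string: it cannot be a suffix
    for i in range(len(target)):
        if string[start + i] != target[i]:
            return False
    return True
```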
Here are simpler, correct versions of starts_with and ends_with:
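bool starts_with(char const *string, char const *target) {
    // Advance both pointers while target hasn't ended and the characters match.
    for ( ; *target != '\0' && *target == *string; ++target, ++string );
    // If we reached the end of target, every character matched.
    return *target == '\0';
}

bool ends_with(char const *string, char const *target) {
    char const *const t0 = target;  // remember where target starts
    // Walk both strings together; if string ends first, target is longer.
    for ( ; *target != '\0'; ++string, ++target ) {
        if ( *string == '\0' ) return false;
    }
    // Move string to its terminating '\0'.
    for ( ; *string != '\0'; ++string );
    size_t const t_len = (size_t)(target - t0);  // length of target
    // Compare the last t_len characters of string with target.
    return strcmp( string - t_len, t0 ) == 0;
}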
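What we do in starts_with is the same idea as before, except that we compare the original string and target one character at a time, stopping when target ends or when the characters differ. This also handles the case where target is longer than string: the '\0' at the end of string never matches a character of target, so the loop stops early and the function returns false.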
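For comparison, here is a Python sketch of the same character-by-character walk. Python strings have no '\0' terminator, so an explicit i < len(string) bound stands in for it; starts_with_walk is a hypothetical name:

```python
def starts_with_walk(string, target):
    """Character-by-character walk, mirroring the C starts_with."""
    i = 0
    # Advance while target has characters left and they match string.
    # The i < len(string) check plays the role of C's '\0': once string
    # runs out, the characters can no longer match.
    while i < len(target) and i < len(string) and target[i] == string[i]:
        i += 1
    # Like `*target == '\0'` in C: success only if we consumed all of target.
    return i == len(target)


print(starts_with_walk('David', 'Dav'))  # True
print(starts_with_walk('Da', 'Dav'))     # False: string ran out first
```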
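In ends_with, we walk through string and target together. This finds target's length and, at the same time, checks whether target is longer than string: if string reaches its '\0' first, we immediately return false. Then we move string forward to its terminating '\0', step back t_len characters (target's length), and use strcmp to compare those last t_len characters of string with our target string (t0).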
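The same three phases can be sketched in Python with indices in place of pointers; ends_with_walk is a hypothetical name, and len(string) stands in for the position of C's terminating '\0':

```python
def ends_with_walk(string, target):
    """Mirrors the C ends_with: lockstep walk, scan to the end, compare the tail."""
    s = t = 0
    # Phase 1: walk both strings together until target is used up.
    # If string runs out first, target is longer and cannot be a suffix.
    while t < len(target):
        if s == len(string):
            return False
        s += 1
        t += 1
    # t is now target's length (t_len in the C code).
    # Phase 2: keep going from where we are to the end of string,
    # like: for ( ; *string != '\0'; ++string );
    end = s
    while end < len(string):
        end += 1
    # Phase 3: compare the last t characters of string with target.
    return string[end - t:] == target


print(ends_with_walk('David', 'vid'))  # True
print(ends_with_walk('id', 'vid'))     # False
```

Because phase 2 continues from the current position rather than starting over, string is scanned only once in total, which is the point of walking the two strings in lockstep.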
Here's the whole code:
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

// Function prototypes
bool starts_with(char const *string, char const *target);
bool ends_with(char const *string, char const *target);

int main(void) {
    char const *str = "David";
    char const *target_end = "vid";
    char const *target_start = "D";

    // prints "true"
    printf("%s\n", starts_with(str, target_start) ? "true" : "false");
    // prints "true"
    printf("%s\n", ends_with(str, target_end) ? "true" : "false");
}

bool starts_with(char const *string, char const *target) {
    for ( ; *target != '\0' && *target == *string; ++target, ++string );
    return *target == '\0';
}

bool ends_with(char const *string, char const *target) {
    char const *const t0 = target;
    for ( ; *target != '\0'; ++string, ++target ) {
        if ( *string == '\0' ) return false;
    }
    for ( ; *string != '\0'; ++string );
    size_t const t_len = (size_t)(target - t0);
    return strcmp( string - t_len, t0 ) == 0;
}
And now, time for some introspection.
Did we reinvent the wheel? Maybe.
Was it a problem that had already been solved? Yes, it was.
But did we have some fun along the way? Well, that depends on you, but I certainly did.