
What happens when you IndexOf an empty string?

James Turner ・2 min read

IndexOf seems like a fairly straightforward method: it looks for a character (or a sequence of characters, i.e. a string) and tells you the index where it first appears. If you look for something that isn't there, it usually returns -1. (Usually, because PHP's strpos returns false instead.)

What if we call IndexOf with an empty string? What do you think will happen?

//C#
"Hello World!".IndexOf(""); 


//JavaScript
"Hello World!".indexOf("");

Let's see if you guessed right...

.

.

.

It turns out if we do that, we get 0.

Uhhh, what?

Now for .NET, Microsoft does actually explain it a little:

Character sets include ignorable characters, which are characters that are not considered when performing a linguistic or culture-sensitive comparison. In a culture-sensitive search, if value contains an ignorable character, the result is equivalent to searching with that character removed. If value consists only of one or more ignorable characters, the IndexOf(String) method always returns 0 (zero) to indicate that the match is found at the beginning of the current instance.

This doesn't explain why you would want it to return 0 for an empty string, though, just that it does.

With JavaScript, MDN briefly mentions that it does this but doesn't actually explain why. What makes it more confusing is that the ECMAScript standards linked from the MDN page don't even mention it, yet Chrome, Firefox, Internet Explorer and Edge all exhibit this behaviour.

I asked this on Twitter and Mateus mentioned a possible problem with it behaving this way.

While you might not often look up the index of a value and then compare that value against charAt(0), it shows how this index-finding behaviour can lead to weird results.

Looking up the index of an empty string seems more like a "divide-by-zero" type problem to me. Since C# and JavaScript (and probably other languages too) both do this, though, there must be a valid reason for it.
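The original tweet isn't reproduced here, so the following is only a guess at the kind of mismatch being described: look up an index, read the character at that index, and compare it with what was searched for. The variable names are made up for illustration.

```javascript
// Assumed scenario (not the original tweet): the character found at the
// reported index does not match what we searched for.
const text = "Hello World!";
const needle = "";

const index = text.indexOf(needle); // 0, even though nothing was searched for
const found = text.charAt(index);   // "H"

console.log(index);            // 0
console.log(found === needle); // false
```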

What are your thoughts about it - should it return -1, 0 or maybe even throw an exception?

Do you know why it behaves this way? Let me know below!

Discussion

Curtis Fenner

I'd assume that the contract of s.indexOf(b) is to return the smallest integer r such that s.substring(r, r + b.length) == b.

When b is "", that result is 0.
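As a rough sketch, that contract can be written out directly. indexOfByContract is a hypothetical helper for illustration, not how any real engine implements indexOf.

```javascript
// Hypothetical helper: returns the smallest r such that
// s.substring(r, r + b.length) === b, or -1 if no such r exists.
function indexOfByContract(s, b) {
  for (let r = 0; r + b.length <= s.length; r++) {
    if (s.substring(r, r + b.length) === b) {
      return r;
    }
  }
  return -1;
}

console.log(indexOfByContract("Hello World!", "World")); // 6
console.log(indexOfByContract("Hello World!", ""));      // 0
console.log(indexOfByContract("Hello World!", "xyz"));   // -1
```

With an empty b, s.substring(0, 0) is "" on the very first iteration, so the contract is satisfied at r = 0.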

In my opinion, most string functions like this shouldn't allow "" as an argument, because they are at least somewhat ambiguous. However, the uniformity of allowing any string as the needle to search for is also nice, and in this particular case returning 0 is not so strange. I have more distaste for .replace("", "x") and .split(""), which have truly ambiguous meanings.

James Turner Author

You're right, .replace("", "x") and .split("") definitely have ambiguous meanings.

One thought I just had, though: what if replace used indexOf internally? Without a special case for an empty string, it could easily get stuck replacing at index 0, because indexOf keeps saying it found a match there.
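To make that worry concrete, here is a minimal sketch of a naive replace-all built on indexOf. naiveReplaceAll and its maxSteps cap are invented for this illustration (the cap only exists so the runaway case can be observed safely); real replace implementations do not work this way.

```javascript
// Hypothetical naive replace-all built on indexOf, with an iteration cap
// so the runaway case stops instead of looping forever.
function naiveReplaceAll(text, needle, replacement, maxSteps = 5) {
  let result = text;
  let steps = 0;
  let index = result.indexOf(needle);
  while (index !== -1 && steps < maxSteps) {
    result =
      result.slice(0, index) + replacement + result.slice(index + needle.length);
    // Searching again from the start: with needle "", indexOf keeps answering 0.
    index = result.indexOf(needle);
    steps++;
  }
  return { result, steps };
}

console.log(naiveReplaceAll("abc", "b", "x")); // { result: 'axc', steps: 1 }
console.log(naiveReplaceAll("abc", "", "x"));  // { result: 'xxxxxabc', steps: 5 }
```

The second call only stops because of the cap; without it the loop would keep inserting at index 0.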

Vesa Piittinen

You can give a position as the second parameter of indexOf. This gives a plausible little case where you can figure out the length of a string without using length:

'abc'.indexOf('', Infinity)

The most likely use case I can think of for this is code golfing.

Other than that, I think the current functionality is perfectly fine. It is often a silly case, but I don't see a reason to block it. It does give a valid match to positions between and around characters.
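A small sketch of those positions in JavaScript: searching for "" from each start position reports a match at that position, and out-of-range positions are clamped to the string's bounds. emptyMatchPositions is a hypothetical helper name.

```javascript
// Hypothetical helper: collects where "" is reported for each start position.
function emptyMatchPositions(s) {
  const positions = [];
  for (let start = 0; start <= s.length; start++) {
    positions.push(s.indexOf("", start));
  }
  return positions;
}

console.log(emptyMatchPositions("abc"));  // [ 0, 1, 2, 3 ]
console.log("abc".indexOf("", Infinity)); // 3 (clamped to the length)
console.log("abc".indexOf("", -5));       // 0 (clamped to the start)
```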

James Turner Author

Interesting idea using it to find out the length, though that trick only works in JavaScript. In C#, you get an ArgumentOutOfRangeException unless the start position you specify is within the length of the string.

I'm not sure how useful it would be for finding the position between characters as it can only tell you what the start position was.

Austin S. Hemmelgarn

I suspect it originated as an implementation detail or performance optimization, and is staying around mostly for backwards compatibility.

I can't comment on C# (I've never used C# before), but in JavaScript, String.prototype.indexOf() performs a rather fast substring search for a fixed string (usually faster than the same search made with a regex in fact).

Given that behavior, it follows logically that an empty string should return index 0, because that's the first place in the string being searched that you can find an empty string (you can technically find one at every index between 0 and the length of the string). If we assume that the original implementors just thought you would never call it with an empty string (which actually makes some sense, the only case I can think of for this happening is if you're passing user input directly to indexOf()), then it would make sense that they wouldn't implement any checks for that case (and that would explain why it isn't well documented, it was thought to either be a use case that would never come up, or was thought to be self-evident).

At this point, it doesn't matter though, because it's been a language 'feature' for long enough that it can't be changed without significant risk of breaking something.

James Turner Author

If you were writing an IndexOf function, say with a for-loop, you'd be comparing characters between the two strings. Comparing an empty string to any other value would be false. It would likely mean getting to the end, not finding anything and then returning.

That said, I could see it returning 0 if they implemented it with substring inside IndexOf, as it might take a substring the length of the input and then compare two empty strings...

Still, it just feels wrong but I agree, even if people thought it was a problem, I doubt it will be fixed because of backwards compatibility.
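Whether a for-loop version ends up at -1 or 0 depends on how it is written. Here is one possible character-by-character sketch (loopIndexOf is hypothetical, not taken from any real runtime) where the inner loop has nothing to compare for an empty needle, so nothing ever disproves the match at the first position.

```javascript
// Hypothetical character-by-character search; not from any real engine.
function loopIndexOf(haystack, needle) {
  for (let start = 0; start + needle.length <= haystack.length; start++) {
    let matched = true;
    // For an empty needle this loop body never runs, so matched stays true.
    for (let j = 0; j < needle.length; j++) {
      if (haystack[start + j] !== needle[j]) {
        matched = false;
        break;
      }
    }
    if (matched) {
      return start;
    }
  }
  return -1;
}

console.log(loopIndexOf("Hello World!", "World")); // 6
console.log(loopIndexOf("Hello World!", ""));      // 0
console.log(loopIndexOf("Hello World!", "xyz"));   // -1
```

A version that starts from "not found" and only flips to "found" after at least one character matches would return -1 instead, which is closer to the loop described above.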

Lawrence

I think in order to have an opinion, I'd have to understand why I'd be passing an empty string into IndexOf. The only reason I can think you might encounter that is if you are working with dynamic input either from a user or from an external source, but I can't think of a reason you'd be using IndexOf for that scenario. Can you give a real world example in which you would ever do this?

James Turner Author

I'm thinking of things like a basic text search function, where you might want to say not only that the text was found but also where in the document it was found. Returning 0 gives the false impression that it was found in the string.

The only time I feel like it could return 0 is if the string being searched was itself empty. Even then, it is still searching for nothing and finding an index.
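For that kind of search function, one option is to treat an empty query as "nothing to search for" before indexOf ever sees it. This is just a sketch; findInDocument and its return shape are made up for illustration.

```javascript
// Hypothetical search helper that refuses to "find" an empty query.
function findInDocument(documentText, query) {
  if (query === "") {
    // Assumption: an empty query counts as not found rather than a match at 0.
    return { found: false, index: -1 };
  }
  const index = documentText.indexOf(query);
  return { found: index !== -1, index };
}

console.log(findInDocument("Hello World!", "World")); // { found: true, index: 6 }
console.log(findInDocument("Hello World!", ""));      // { found: false, index: -1 }
```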