DEV Community

A Weird Way to Substring in C++

darkmage on August 09, 2019

/* author: mike bell twitter: @therealdarkmage date: Fri Aug 9 5:12 PM website: https://mikebell.xyz I was playing around and discovered a w...
 
Jason C. McDonald (codemouse92)

The second use, ss, isn't particularly weird, actually. std::string is basically a wrapper around a c-string (char array).

A few things to consider:

  • C-strings are a linear data structure, stored in adjacent memory (as opposed to a linked list).

  • C-strings have to end with the null terminator \0, which marks the end of the string. If a c-string lacked this, there would be no way to know when to stop, because the length of a c-string is not stored anywhere. (A std::string does store its length, which is why size() runs in constant time. Since C++11 it also keeps a \0 after the last character, so c_str() can return a valid c-string.)

  • Applying the [] operator to a pointer is just performing pointer arithmetic.

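  • & before an expression returns the memory address of whatever that expression refers to. Postfix [] binds more tightly than unary &, so &s[2] means &(s[2]). s[2] calls std::string::operator[], which returns a reference to the character at index 2, and & takes that character's address. The result is a pointer into the string's internal char array. (The address of the std::string object itself, &s, is not guaranteed to be where the characters live.)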
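
Here is a minimal sketch of the first three points in plain C++. The names c_string_length and is_pointer_arithmetic are hypothetical helpers for illustration only. Real code would call std::strlen, or size() on a std::string.

```cpp
#include <cstddef>

// Hypothetical helper: walks a c-string until the null terminator, the way
// strlen() has to, because a c-string does not store its own length.
std::size_t c_string_length(const char* p) {
    std::size_t n = 0;
    while (p[n] != '\0') {  // p[n] is exactly *(p + n)
        ++n;
    }
    return n;
}

// Hypothetical helper: checks that subscripting a pointer is the same as
// adding the index and dereferencing.
bool is_pointer_arithmetic(const char* p, std::size_t i) {
    return &p[i] == p + i && p[i] == *(p + i);
}
```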

The "magic" is all happening on string ss = &s[2];...

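  1. s[2] calls std::string::operator[], which returns a reference to the third character of the internal c-string.

  2. & takes that character's address. Because the characters are stored contiguously, this is the same pointer as s.c_str() + 2. That means you're pointing to the third character in memory now.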

  3. The c-string is read starting at that pointer, and goes until it encounters the \0.

  4. Said c-string is passed to the constructor for std::string, and is used to create ss.
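
The steps above can be spelled out one at a time. This is a sketch that assumes C++11 or later. It also assumes s.size() >= 2, and the function names are made up for illustration.

```cpp
#include <string>

// Spells out what `std::string ss = &s[2];` does. Assumes s.size() >= 2.
std::string suffix_via_address(const std::string& s) {
    const char& third = s[2];  // 1. operator[] returns a reference to the third character
    const char* p = &third;    // 2. & turns that reference into a pointer into s's buffer
    // 3./4. std::string(const char*) copies from p up to the terminating '\0',
    //       which C++11 and later guarantee follows the last character.
    return std::string(p);
}

// The pointer from step 2 is the same one the documented c_str() API gives you.
bool same_as_c_str(const std::string& s) {
    return &s[2] == s.c_str() + 2;
}
```

With s holding "World", as in the original post, suffix_via_address(s) returns "rld".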

My only caution with this method is that it relies on how std::string stores its characters. Since C++11 the standard guarantees they are contiguous and followed by a \0, so std::string itself is fine. If you use another string class, though, it might not behave the same.
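
To make that caution concrete, here is a sketch that uses std::string_view (C++17) as the "other string class". Unlike std::string, a view is not required to be followed by a \0, so the same trick can run past the end of the view. Both function names are hypothetical.

```cpp
#include <string>
#include <string_view>

// The &s[2] trick applied to a view: std::string(const char*) keeps reading
// until the next '\0' in memory, which may lie beyond the end of sv.
std::string pointer_trick(std::string_view sv) {
    return std::string(&sv[2]);
}

// The view-aware version copies exactly sv.size() - 2 characters.
std::string view_aware(std::string_view sv) {
    return std::string(sv.substr(2));
}
```

Here is an example. Take a view of just "Hello" inside the literal "Hello, World". view_aware gives "llo", but pointer_trick gives "llo, World", because the only \0 is at the end of the whole literal.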

The safer way to do the same thing is to get the pointer to the c-string inside of s via s.c_str(), and then work with that pointer directly. Even then, you're still not as safe as if you just used std::string's own member functions (here, s.substr(2)), because if you get your pointer arithmetic wrong, you're going to have memory errors.
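
Here is a sketch of both options, with the bounds check made explicit in the pointer version. The function names are hypothetical. The one library behaviour relied on is that std::string::substr(pos) throws std::out_of_range when pos > size().

```cpp
#include <stdexcept>
#include <string>

// Preferred: let std::string check the bounds (substr throws if pos > size()).
std::string suffix_with_substr(const std::string& s, std::string::size_type pos) {
    return s.substr(pos);
}

// Pointer version via c_str(), with the bounds check written out by hand.
std::string suffix_with_c_str(const std::string& s, std::string::size_type pos) {
    if (pos > s.size()) {
        throw std::out_of_range("suffix_with_c_str: pos > size()");
    }
    const char* p = s.c_str() + pos;  // pos == size() is fine: p points at the '\0'
    return std::string(p);            // stops at the first '\0', even an embedded one
}
```

The two only differ when s contains an embedded \0. In that case substr copies it and everything after it, while the pointer version stops there.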

 
Barney Wilks (barney_wilks)

A "fun" side effect of the C++ (and C) subscript operator being just pointer arithmetic is that

const char* array[] = { "Well", "This", "Is", "Strange" };

// value contains "Strange"
const char* value = (1 + 1) [array + 1];

is actually valid C++ 😆.

Because x [y] is the same as *(x + y), that means that

const char* x = (1 + 1) [array + 1];

is just the same as

const char* x = *(1 + 1 + array + 1);
// Or
const char* x = *(array + 3); // array [3]
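
All of these forms can be checked side by side. In this sketch the array is renamed to words so it can't clash with std::array, and the function names are made up for illustration.

```cpp
// Same array as above, renamed to `words` for this sketch.
const char* words[] = { "Well", "This", "Is", "Strange" };

// (1 + 1)[words + 1] is *((1 + 1) + (words + 1)), i.e. *(words + 3).
const char* commuted() {
    return (1 + 1)[words + 1];
}

// Every spelling yields the same element, words[3] ("Strange").
bool all_forms_agree() {
    return commuted() == words[3]
        && *(1 + 1 + words + 1) == words[3]
        && 3[words] == words[3];  // the "backwards" spelling of words[3]
}
```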

It's worth noting that this only works where the subscript operator is the built-in one that does pointer arithmetic (C-style arrays and pointers), and not where the [] operator is overloaded, so you can't do

std::string v = "Hello World!";
char second_letter = 1 [v];
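
A sketch of the usual workaround follows, assuming C++11 or later. The function name second_letter is just for illustration. 1[v] fails because v[1] is a call to the member std::string::operator[], and nothing matches the operands (int, std::string). Going through data(), which returns a const char*, brings back the built-in operator, and with it the commuted form.

```cpp
#include <string>

// 1[v] does not compile for std::string, but 1[v.data()] does:
// one operand is now a plain pointer, so the built-in [] applies.
char second_letter(const std::string& v) {
    return 1[v.data()];  // same as v.data()[1], same as v[1]
}
```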

So I guess that means the original example of

string s = &"Hello, World"[7];
string ss = &s[2];

can be rewritten as

std::string ss = & (2 * 3 + 1) ["Hello, World" + 2]; // "rld", the same as ss above

if you were so inclined 😆.
Sometimes I worry about C++ - it doesn't exactly help itself at times... 😄

Obligatory godbolt for this:
godbolt.org/z/nHJOgs
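
To check that a one-liner like this really matches the two-line original, here is a sketch that compares them. It uses an in-bounds variant, (2 * 3 + 1)["Hello, World" + 2], which lands on index 7 + 2 = 9. It stays inside the literal on purpose: forming a pointer before its first character (for example "Hello, World" - 1) is undefined behaviour even if the offset is added back afterwards. The function names are hypothetical.

```cpp
#include <string>

// The two-line version from the original post.
std::string two_line() {
    std::string s = &"Hello, World"[7];  // "World"
    std::string ss = &s[2];              // "rld"
    return ss;
}

// One line: (2 * 3 + 1)["Hello, World" + 2] is *(7 + "Hello, World" + 2),
// the character at index 9; & takes its address and std::string copies to '\0'.
std::string one_line() {
    return & (2 * 3 + 1) ["Hello, World" + 2];
}
```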

 
Jason C. McDonald (codemouse92)

Sometimes I worry about C++ - it doesn't exactly help itself at times... 😄

Yes, but at least we get to play with esoteric hackery without needing a different language.

...and sometimes, ever so rarely, if the stars are aligned, that hackery comes in handy.

 
darkmage (therealdarkmage)

I knew about the pointer arithmetic aspect from college, but I had no idea you could also do it with string constants! Thank you for the explanation!