Coral Kashri

Posted on • Originally published at cppsenioreas.wordpress.com on

The View of The String

Communication is one of humanity's most powerful tools. It lets us transfer ideas and thoughts from one person to another. There are many ways to communicate, and one of them is words. To communicate with words, we need to understand each word in a sentence and the way the words connect to each other. Word analysis has become a major topic in software development, and many AI tools attempt it (NLP techniques and more). However, strings can be painful to work with in older C++ standards (C++11/14). The recognition that the language needed better tools for handling strings was addressed in C++17 with std::string_view.

std::string

Before getting into the details of why std::string_view is so important, we first need to discuss the abilities of std::string.

std::string is basically a friendly wrapper around char*/char[]. It lets us store chars in a contiguous memory allocation, modify them, iterate over them, and eventually display them. For example:


std::string str = "My str";
std::string prefix = "My ";
if (str.compare(0, prefix.size(), prefix) == 0) {
    std::cout << str.substr(prefix.size()); // "str"
}

The compare call looks a bit C-like, but there is another pitfall here. std::string::substr costs us another std::string construction (and a heap allocation whenever the result is too long for the small-string optimization). Because it doesn't modify the original string instance, it returns a new string instance, which isn't really needed here. To avoid that, we have to do something like this:


for (size_t i = prefix.size(); i < str.size(); ++i) {
    std::cout << str[i];
}
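
To make the cost of substr concrete, here is a minimal sketch (the helper name substr_makes_copy is hypothetical, not part of the original post). It checks that the string returned by substr lives in its own storage instead of pointing into the source buffer:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper for illustration: returns true when the string produced
// by substr() has its own storage, separate from the source buffer.
// Assumes pos <= source.size(); substr() throws std::out_of_range otherwise.
bool substr_makes_copy(const std::string& source, std::size_t pos) {
    const std::string tail = source.substr(pos);  // constructs a new std::string
    return tail.data() != source.data() + pos;    // the characters were copied
}
```

Calling substr_makes_copy("My str", 3) reports true: the "str" part was copied into a second object, which is exactly the work the character-by-character loop avoids.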

Let’s see another usage example:


bool validate(const std::string& str) {
    std::string start = "lstart", stop = "lstop";
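    // Guard against inputs shorter than the suffix; compare() would throw std::out_of_range.
    return str.size() >= stop.size()
        && str.compare(0, start.size(), start) == 0
        && str.compare(str.size() - stop.size(), stop.size(), stop) == 0;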
}

No copies are made here, but it's easy to cause one by forgetting the & in the function's signature. And again, we are stuck with the C-like compare syntax.

std::string_view (C++17)

Since C++17 we can use a std::string_view instance to observe a contiguous block of chars that has already been allocated. That means we can get a view of a substring, which supports iteration and comparison, without allocating a new std::string instance and without the C-like syntax.


std::string str = "My str";
std::string prefix = "My ";
std::string_view str_v = str; // no allocation performed
if (str_v.substr(0, prefix.size()) == prefix) { // no allocation
    std::cout << str_v.substr(prefix.size()); // no allocation
}

This means that the validate function can now simply take a std::string_view by value, with no need for the const & specifiers:


bool validate(std::string_view str) {
    std::string start = "lstart", stop = "lstop";
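    // Guard against inputs shorter than the suffix; substr() would throw std::out_of_range.
    return str.size() >= stop.size()
        && str.substr(0, start.size()) == start
        && str.substr(str.size() - stop.size()) == stop;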
}

How does it work?

std::string_view is essentially a structure that contains a pointer to the start of the char buffer and a size. These are passed to the constructors, and the substr function computes a new pointer and size for the instance it returns. When constructing a std::string_view instance from a std::string instance, we are actually calling std::string::operator basic_string_view and then constructing a std::string_view from the resulting std::string_view.
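
The pointer-plus-size idea can be sketched in a few lines. MiniView and view_of below are hypothetical names used only for illustration; the real std::string_view has a much richer interface and bounds checking in substr:

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical, simplified stand-in for std::string_view: a pointer plus a length.
struct MiniView {
    const char* data;
    std::size_t size;

    // A sub-view is just a new (pointer, size) pair into the same buffer; nothing is copied.
    // Assumes pos <= size; the real substr throws std::out_of_range otherwise.
    MiniView substr(std::size_t pos, std::size_t count) const {
        return MiniView{data + pos, std::min(count, size - pos)};
    }
};

// Builds a view over a std::string's buffer without copying the characters.
// The string must outlive the returned view.
MiniView view_of(const std::string& s) {
    return MiniView{s.data(), s.size()};
}
```

For a std::string owner holding "My str", view_of(owner).substr(3, 10) yields a pointer equal to owner.data() + 3 and a size of 3, so the sub-view shares the owner's buffer.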

One step forward

std::string_view can also be constructed from a char* (const char*), or from a char* and a size_t. That means that when we only need to observe and analyse compile-time strings (string literals are stored in the binary, so their addresses stay valid for the whole run of the program), we can assign them directly to a std::string_view instance instead of constructing a std::string first.


std::string_view str = "My str"; // no string allocation
std::string_view prefix = "My ";
if (str.substr(0, prefix.size()) == prefix) {
    std::cout << str.substr(prefix.size());
}
bool validate(std::string_view str) {
    std::string_view start = "lstart", stop = "lstop"; // no string allocations
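    // Guard against inputs shorter than the suffix; substr() would throw std::out_of_range.
    return str.size() >= stop.size()
        && str.substr(0, start.size()) == start
        && str.substr(str.size() - stop.size()) == stop;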
}

* Important to know: when constructing a std::string_view instance from a char* without specifying the length, the length is determined by the first null character. Use this constructor carefully. We'll discuss this issue further below.
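
A small sketch of the difference between the two constructors (the function names scanned_length and explicit_length are hypothetical):

```cpp
#include <cstddef>
#include <string_view>

// Length discovered by scanning for the first null character.
std::size_t scanned_length(const char* text) {
    return std::string_view(text).size();
}

// Length supplied by the caller; embedded null characters are part of the view.
std::size_t explicit_length(const char* text, std::size_t count) {
    return std::string_view(text, count).size();
}
```

Given const char raw[] = "ab\0cd", scanned_length(raw) returns 2 because the scan stops at the embedded null, while explicit_length(raw, sizeof(raw) - 1) returns 5.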

C++20/23 extensions

Newer standards brought useful new features to both std::string_view and std::string. In C++20 we got two new member functions, starts_with and ends_with (which are perfect for the examples above), and since C++23 we also have the contains member function:


std::string_view str = "My str";
std::string_view prefix = "My ";
if (str.starts_with(prefix)) {
    std::cout << str.substr(prefix.size());
}
bool validate(std::string_view str) {
    return str.starts_with("lstart") && str.ends_with("lstop");
}
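
For code bases that are still on C++17, roughly equivalent checks can be written as free functions. This is only a sketch: has_prefix, has_suffix and has_substring are hypothetical names, not standard library functions.

```cpp
#include <string_view>

// Hypothetical C++17 stand-in for the C++20 member starts_with.
constexpr bool has_prefix(std::string_view text, std::string_view prefix) {
    return text.size() >= prefix.size() &&
           text.compare(0, prefix.size(), prefix) == 0;
}

// Hypothetical C++17 stand-in for the C++20 member ends_with.
constexpr bool has_suffix(std::string_view text, std::string_view suffix) {
    return text.size() >= suffix.size() &&
           text.compare(text.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Hypothetical C++17 stand-in for the C++23 member contains.
constexpr bool has_substring(std::string_view text, std::string_view part) {
    return text.find(part) != std::string_view::npos;
}
```

The size checks come first so that compare is never called with a position past the end of the view.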

Constexpr

All of the above functions can be declared constexpr or used within a constexpr context. Since std::string_view doesn't allocate any new data, it opens a window to compile-time programming.


constexpr std::string_view str = "My str";
constexpr std::string_view prefix = "My ";
if (str.starts_with(prefix)) {
    std::cout << str.substr(prefix.size());
}
constexpr bool validate(std::string_view str) {
    return str.starts_with("lstart") && str.ends_with("lstop");
}
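
To show that the check really can run at compile time, here is a sketch that sticks to C++17 members, so it does not depend on starts_with or ends_with. The name validate17 is hypothetical, and it assumes the same "lstart"/"lstop" markers used above:

```cpp
#include <string_view>

// C++17-compatible variant of validate(), written with substr and ==.
constexpr bool validate17(std::string_view str) {
    constexpr std::string_view start = "lstart", stop = "lstop";
    return str.size() >= stop.size() &&           // keeps substr() in range
           str.substr(0, start.size()) == start &&
           str.substr(str.size() - stop.size()) == stop;
}

// Evaluated by the compiler: a failing check stops the build.
static_assert(validate17("lstart payload lstop"));
static_assert(!validate17("payload only"));
```

If either static_assert fails, the program does not compile, which means the whole string analysis happened before the program ever ran.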

Bad practices & potential issues

std::string_view is designed to allow better performance when analysing strings. However, there is always a trade-off between performance and safety, and that trade-off plays a major role when dealing with std::string_view.

Rule #1: Never return a std::string_view


std::string_view func() {
    std::string str;
    std::cin >> str;
    return str;
}

This innocent-looking function leads to unsafe memory access. str owns the storage for the input characters (on the heap, or inside the object itself for short strings). The function returns a std::string_view of that storage, and then the destructor of str releases it. That means the returned std::string_view now points to released memory.

That being said, returning a std::string_view won't always cause unsafe memory access. Cases where the returned std::string_view points to static memory, or to memory that remains accessible outside the function, are still valid. However, they can become invalid after future code changes, so the safe approach is to forbid returning a std::string_view in all cases.
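
One way to follow this rule is to return the owning std::string itself. The sketch below is an assumption about how func could be reshaped, not the original author's code: read_word takes the stream as a parameter (instead of using std::cin directly), and first_word is a hypothetical convenience wrapper.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Safer variant of func(): the caller receives its own std::string,
// so nothing refers to the local variable after the function returns.
std::string read_word(std::istream& in) {
    std::string word;
    in >> word;
    return word;  // moved (or elided) out, not viewed
}

// Hypothetical wrapper: extracts the first whitespace-delimited word of a line.
std::string first_word(const std::string& line) {
    std::istringstream in(line);
    return read_word(in);
}
```

Returning by value typically costs a move rather than a copy, and the result stays valid for as long as the caller keeps it.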

Rule #2: Careful with null terminator

As mentioned before, relying on the null terminator is strongly discouraged, and this should always be kept in mind when using std::string_view.


std::string_view str = "my cool str";
str.remove_prefix(str.find(" ") + 1); // +1 also drops the space, leaving "cool str"
str.remove_suffix(str.size() - str.rfind(" "));
std::cout << str; // "cool" - OK
std::cout << str.data(); // "cool str"

remove_prefix and remove_suffix only change the start and end points of the view. That means remove_suffix doesn't insert a null terminator at the new end, so printing the underlying data is not affected by it. To fix this we can either modify the owner string (if one exists) or construct a std::string from the view, which adds the terminator for us (without modifying the original string).


{ // Modifying owner
    std::string str = "cool str";
    std::string_view str_v = str;
    str_v.remove_suffix(4);
    str[4] = '\0';

    std::cout << str << "\n"; // "cool\0str"
    std::cout << str_v << "\n"; // "cool"
    std::cout << str_v.data(); // "cool"
}

{ // Allocating a new string
    std::string_view str = "cool str";
    str.remove_suffix(4);
    std::string modified_str(str);
    std::string_view mstr_v = modified_str;

    std::cout << str << "\n"; // "cool"
    std::cout << str.data() << "\n"; // "cool str"
    std::cout << modified_str << "\n"; // "cool"
    std::cout << mstr_v << "\n"; // "cool"
    std::cout << mstr_v.data(); // "cool"
}
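
When the view has to go through a printf-style API, a third option is to pass the length explicitly with the %.*s precision specifier, which needs neither a terminator nor a new std::string for the view itself. A minimal sketch, where format_bounded is a hypothetical helper and the 64-byte buffer is an arbitrary choice for the example:

```cpp
#include <cstdio>
#include <string>
#include <string_view>

// Formats at most view.size() characters through a printf-style API,
// without relying on a null terminator and without modifying the owner.
// Assumes view.data() is not null; output longer than the buffer is truncated.
std::string format_bounded(std::string_view view) {
    char buffer[64] = {};
    std::snprintf(buffer, sizeof(buffer), "[%.*s]",
                  static_cast<int>(view.size()), view.data());
    return buffer;
}
```

For a view of "cool str" after remove_suffix(4), format_bounded returns "[cool]", even though data() still points at the full "cool str" buffer.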

Rule #3: Don’t lose ownership

It’s important to remember that std::string_view doesn’t own the string it refers to, and therefore won’t protect or release it. In addition to UB or illegal memory access, this can in some cases cause a memory leak (for example, when converting existing code that uses std::string to use std::string_view):


const char* get() { return new char[]{"my new str"}; }

{
    std::string_view str = get();
    // Here we can still call: delete[] str.data()
    str.remove_prefix(1); // Memory leak!
    // delete[] str.data() // Invalid call here. The pointer no longer points to the head of the allocation.
}
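
A possible way out is to keep an owner next to the view, so that the allocation head is never lost no matter how the view is trimmed. The sketch below assumes we are free to change get; get_owned and trimmed_length are hypothetical names.

```cpp
#include <cstddef>
#include <cstring>
#include <memory>
#include <string_view>

// Hypothetical owning variant of get(): the unique_ptr releases the buffer.
std::unique_ptr<char[]> get_owned() {
    const char text[] = "my new str";
    auto buffer = std::make_unique<char[]>(sizeof(text));
    std::memcpy(buffer.get(), text, sizeof(text));  // copies the terminator too
    return buffer;
}

// The view may shrink freely; the owner still knows where the allocation starts.
std::size_t trimmed_length() {
    const std::unique_ptr<char[]> owner = get_owned();
    std::string_view str = owner.get();
    str.remove_prefix(1);
    return str.size();  // owner frees the buffer when it goes out of scope
}
```

Here the view is only ever used while owner is alive, and delete[] is called exactly once on the original pointer by the unique_ptr.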

Conclusion

std::string_view can be used to improve both performance and readability in code that handles strings. However, every use comes with the added responsibility of using it correctly and avoiding unwanted behavior (especially as the code scales and modifications enter the picture). It is another example of great power being followed by great responsibility.
