str_view - null-termination-aware string-view class for C++

Adam Sawicki. Originally published at asawicki.info.

tl;dr I've written a small library, which I called "str_view - null-termination-aware string-view class for C++". You can find the code and documentation in the following GitHub repository. Read on to see the full story behind it...

sawickiap / str_view

str_view

Null-termination-aware string-view class for C++.

Author: Adam Sawicki - http://asawicki.info
Version: 2.0.0-beta.1, 2020-04-26
License: MIT

Documentation: see below and comments in the code of str_view.hpp file.

Introduction

str_view is a small library for C++. It offers a convenient and optimized class that represents a view into a character string. It has the form of a single header file: str_view.hpp, which you can just add to your project. All the members are defined as inline, so no compilation of additional CPP files or linking with additional libraries is required.

str_view depends only on the standard C and C++ libraries. It has been developed and tested under Windows using Microsoft Visual Studio 2019, but it should work with other compilers and platforms as well. If you find any compatibility issues, please let me know. It works in both 32-bit and 64-bit code.

The class is defined as str_view_template, because it's…

Let me disclose my controversial beliefs: I like the C++ STL. I think that any programming language needs to provide some built-in strings and containers to be called modern and suitable for developing large programs. But of course I'm aware that careless use of classes like std::list or std::map can make a program very slow due to the large number of dynamic allocations.

What I value the most is RAII - the concept that memory is automatically freed whenever an object referenced by value is destroyed. That's why I use std::unique_ptr all over the place in my personal code. Whenever I create and own an array, I use std::vector, but when I just pass it to some other code for reading, I pass a raw pointer and the number of elements - myVec.data() and myVec.size(). Similarly, whenever I own and build a string, I use std::string (or rather std::wstring - I like Unicode), but when I pass it somewhere for reading, I use a raw pointer.
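As a rough illustration of that convention, here is a minimal sketch. The function names sum_values and make_and_sum are made up for this example; only myVec.data() and myVec.size() come from the text above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reader: does not own anything, it only looks at `count` elements
// starting at `data`.
int sum_values(const int* data, std::size_t count) {
    assert(data != nullptr || count == 0);
    int sum = 0;
    for (std::size_t i = 0; i < count; ++i)
        sum += data[i];
    return sum;
}

// Owner: the std::vector releases its memory automatically when it
// goes out of scope, so no manual delete is needed here.
int make_and_sum() {
    std::vector<int> myVec = {1, 2, 3, 4};
    return sum_values(myVec.data(), myVec.size());
}
```

The reader stays independent of the container type: the same function would accept a plain array or a part of a larger buffer.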

There are multiple ways a string can be passed. One is a pointer to the first character and the number of characters. Another one is a pointer to the first character and a pointer to the next after the last character - a pair of iterators, also called a range. These two can be trivially converted between each other. Out of these, I prefer pointer + length, because I think that the number of characters is slightly more often needed than the pointer past the end.
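The conversion between the two forms is a single addition or subtraction. A small sketch, with hypothetical struct and function names (PtrLen, PtrRange, to_range, to_ptr_len) introduced only for illustration:

```cpp
#include <cassert>
#include <cstddef>

// Pointer to the first character + number of characters.
struct PtrLen {
    const char* ptr;
    std::size_t len;
};

// Pointer to the first character + pointer one past the last character.
struct PtrRange {
    const char* begin;
    const char* end;
};

inline PtrRange to_range(PtrLen s) {
    return { s.ptr, s.ptr + s.len };
}

inline PtrLen to_ptr_len(PtrRange r) {
    assert(r.begin <= r.end);
    return { r.begin, static_cast<std::size_t>(r.end - r.begin) };
}
```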

But there is another way of passing strings common in C and C++ programs - just one pointer to a string that needs to be null-terminated. I think that null-terminated strings are one of the worst and the most stupid inventions in computer science. Not only does it limit the set of characters available to be used in string content by excluding '\0', but it also makes calculating the string length an O(n) operation. It also creates opportunities for security bugs. Still, we have to deal with it because that's the format that most libraries expect.
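Both drawbacks are easy to see in a few lines. This is only a sketch; the helper names c_style_length and make_with_embedded_null are invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Length as seen by a C-style reader: std::strlen scans the buffer
// until it meets the first '\0', so the cost grows with the length.
std::size_t c_style_length(const char* s) {
    assert(s != nullptr);
    return std::strlen(s);
}

// std::string stores its length, so '\0' can be ordinary content.
std::string make_with_embedded_null() {
    return std::string("ab\0cd", 5);  // 5 characters: 'a' 'b' '\0' 'c' 'd'
}
```

Calling size() on the returned string gives 5, while c_style_length on its c_str() stops at the embedded terminator and gives 2.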

I came up with an idea for a class that would encapsulate a reference to an externally-owned, immutable string, or a piece thereof. Objects of such a class could be used to pass strings to library functions instead of e.g. a pointer to a null-terminated string or a pair of iterators. They can then be queried for length(), indexed to access individual characters etc., as well as asked for a null-terminated copy using the c_str() method - similar to std::string.

Code like this already exists, e.g. C++17 introduces the class std::string_view. But my implementation has a twist that I'm quite happy with, which made me call my class "null-termination-aware". My str_view class remembers not only the pointer and length of the referred string, but also the way it was created, to avoid unnecessary operations and lazily evaluate those that are requested.

  • If it was created from a null-terminated string:
    • c_str() trivially returns pointer to the original string.
    • Length is unknown and it is calculated upon first call to length().
  • On the other hand, if it was created from a string that is not null-terminated:
    • Length is explicitly known, so length() trivially returns it.
    • c_str() creates a local, null-terminated copy of the string upon first call.
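To make the two cases above concrete, here is a simplified sketch of the idea. It is not the code from str_view.hpp: the class name LazyStrView, its members and its constructors are all assumptions made for illustration, and the sketch is neither copyable nor thread-safe.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>

// Hypothetical illustration of a null-termination-aware view.
class LazyStrView {
public:
    // Created from a null-terminated string: the length is not computed yet.
    explicit LazyStrView(const char* nullTerminated)
        : m_begin(nullTerminated), m_length(kUnknown), m_nullTerminated(true) {
        assert(nullTerminated != nullptr);
    }

    // Created from pointer + length: no terminator is assumed after the data.
    LazyStrView(const char* begin, std::size_t length)
        : m_begin(begin), m_length(length), m_nullTerminated(false) {
        assert(begin != nullptr || length == 0);
    }

    std::size_t length() const {
        if (m_length == kUnknown)
            m_length = std::strlen(m_begin);  // computed once, on first call
        return m_length;
    }

    const char* c_str() const {
        if (m_nullTerminated)
            return m_begin;                   // the original is already usable
        if (!m_copy) {                        // copy made once, on first call
            m_copy.reset(new char[m_length + 1]);
            if (m_length != 0)
                std::memcpy(m_copy.get(), m_begin, m_length);
            m_copy[m_length] = '\0';
        }
        return m_copy.get();
    }

    char operator[](std::size_t i) const { return m_begin[i]; }

private:
    static constexpr std::size_t kUnknown = static_cast<std::size_t>(-1);

    const char* m_begin;
    mutable std::size_t m_length;             // kUnknown until first needed
    bool m_nullTerminated;
    mutable std::unique_ptr<char[]> m_copy;   // lazily created terminated copy
};
```

With this sketch, LazyStrView("hello").c_str() returns the original pointer without any copy, while LazyStrView("hello world", 5).c_str() allocates a 6-byte buffer on the first call and reuses it afterwards.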

If you consider such a class useful in your C++ code, see the sawickiap/str_view project on GitHub for the code (it's just a single header file), documentation, and an extensive set of tests. I share this code for free, under the MIT license. Feel free to contact me if you find any bugs or have any suggestions regarding this library.

Discussion

hey! just wanted to share that you can embed github repos with this syntax:

{% github https://github.com/sawickiap/str_view %}

Wow, I didn't know that, thanks!