DEV Community

Alexandra Kochetkova
Alexandra Kochetkova

Posted on

How to find job for Rescue Rangers: analyzing Godot Engine

Developing and playing games can be an incredibly engaging, addictive, and satisfying activity. But nothing spoils the gameplay experience like a tricky hidden bug. So, today we'll be looking at an open-source game engine—Godot Engine. Let's see how good it is, and if it's ready to give us unforgettable emotions of creating and playing games.

Godot

Godot is a versatile 2D and 3D game engine designed to power all kinds of projects. You can use it to create games or applications and then release them on desktop, mobile, or web platforms.

The engine is still young, but it's already gaining traction among those who appreciate open-source code, freeware, and good extensibility. Godot is quite beginner-friendly, so it's popular with novice developers.

Games like 1000 Days to Escape, City Game Studio: Your Game Dev Adventure Begins, Precipice, and others have been developed using the engine.

The version of Godot Engine that we used for the analysis is 4.2.2.

By the way, we've already checked Godot Engine in 2018. You can read the previous article here.

Results of checking project using PVS-Studio

The first thing we'd like to start with after looking at the report is the macro issues in the project. Parameters aren't enclosed in parentheses. Let's look at a few examples of how this can shoot you in the foot.

Fragments N1-N2

#define HAS_WARNING(flag) (warning_flags & flag)
Enter fullscreen mode Exit fullscreen mode

We need this macro to check whether a certain warning flag is set or not.

The warning_flags variable is a bitmask of the uint_32t type. This means that its value is 32 bits, where each bit is 1 if the flag is set and 0 if it isn't. The macro is used in conditional statements, where it's implicitly converted to the bool type. To understand how it works, we'll take a simplified version with 8 bits instead of 32.

Let's say we have an X flag that corresponds to the 4th bit in the mask and it's currently set. Then, the value of the warning_flags variable in the binary system looks like this:

00001000
Enter fullscreen mode Exit fullscreen mode

Now let's say that we decided to use our macro to check if the X flag is set.

We pass the flag variable with the 00001000 value to the macro, and the bitwise AND results in a non-zero value that is converted to bool with a value to true.

Now let's say we want to check the Y flag that corresponds to the third bit with the same value of the warning_flags variable*. We pass the variable with the *00000100 value to the macro, and the bitwise AND results in a zero value that is converted to bool with the value false.

Everything seems great. What can possibly go wrong? But what if someone wants to check if one of those flags is set? They can call the macro as follows:

if (HAS_WARNING(flags::X | flags::Y)) ....
Enter fullscreen mode Exit fullscreen mode

Then, the result of the operation will always be true, even if none of the flags is set. Why does this happen? Let's pretend we're a preprocessor and just substitute the expression passed to the macro:

if (warning_flags & flags::X | flags::Y) ....
Enter fullscreen mode Exit fullscreen mode

Now we'll look at the operator precedence table:

Precedence Operator Description Associativity
.... .... .... ....
11 a & b Bitwise AND Left-to-right
.... .... .... ....
13 \ Bitwise OR

Operators are listed from top to bottom in the descending precedence order. This means that the expression is processed as follows:

if (( warning_flags & flags::X ) | flags::Y) ....
Enter fullscreen mode Exit fullscreen mode

Let's say that the X and Y flags we're interested in aren't set in warning_flags. Then, the first bitwise AND operation returns 0, and the Y flag is set to it. We get an always true check.

So, the analyzer issues the following warning for this macro:

V1003 The macro 'HAS_WARNING' is a dangerous expression. The parameter 'flag' must be surrounded by parentheses. shader_language.cpp 40

As the message says, to fix it we just need to enclose the macro parameter in parentheses:

#define HAS_WARNING(flag) (warning_flags & (flag))
Enter fullscreen mode Exit fullscreen mode

Here's another example of a dangerous macro:

#define IS_SAME_ROW(i, row) (i / current_columns == row)
Enter fullscreen mode Exit fullscreen mode

The analyzer warning:

V1003 The macro 'IS_SAME_ROW' is a dangerous expression. The parameters 'i', 'row' must be surrounded by parentheses. item_list.cpp 643

If we pass an expression to the macro instead of a single variable, like this:

IS_SAME_ROW(current + 1, row)
Enter fullscreen mode Exit fullscreen mode

Then, as a result of preprocessing substitution, we get the following:

(current + 1 / current_columns == row)
Enter fullscreen mode Exit fullscreen mode

The calculation order here isn't what you'd expect.

To protect against such situations, simply wrap the macro parameters in parentheses:

#define IS_SAME_ROW(i, row) ((i) / current_columns == (row))
Enter fullscreen mode Exit fullscreen mode

Fragment N3

Now let's look at the following condition:

const auto hint_r = ShaderLanguage::ShaderNode::Uniform::HINT_ROUGHNESS_R;
const auto hint_gray = ShaderLanguage::ShaderNode::Uniform::HINT_ROUGHNESS_GRAY;

if (tex->detect_roughness_callback
    && (   p_texture_uniforms[i].hint >= hint_r
        || p_texture_uniforms[i].hint <= hint_gray))
{
  ....
}
Enter fullscreen mode Exit fullscreen mode

This condition is always true (except when the tex->detect_roughness_callback pointer is null).

To understand why this is the case, we need to look at enum Hint in the Uniform structure:

struct Uniform
{
  ....
  enum Hint 
  {
    HINT_NONE,
    HINT_RANGE,
    HINT_SOURCE_COLOR,
    HINT_NORMAL,
    HINT_ROUGHNESS_NORMAL,
    HINT_ROUGHNESS_R,
    HINT_ROUGHNESS_G,
    HINT_ROUGHNESS_B,
    HINT_ROUGHNESS_A,
    HINT_ROUGHNESS_GRAY,
    HINT_DEFAULT_BLACK,
    HINT_DEFAULT_WHITE,
    HINT_DEFAULT_TRANSPARENT,
    HINT_ANISOTROPY,
    HINT_SCREEN_TEXTURE,
    HINT_NORMAL_ROUGHNESS_TEXTURE,
    HINT_DEPTH_TEXTURE,
    HINT_MAX
  };
  ....
};
Enter fullscreen mode Exit fullscreen mode

Under the hood of such enum is an integer type, and the HINT_ROUGHNESS_R and HINT_ROUGHNESS_GRAY values correspond to the 5 and 9 numbers.

Based on the above, the condition checks that p_texture_uniforms[i].hint >= 5 or p_texture_uniforms[i].hint <= 9. This means that any value of p_texture_uniforms[i].hint passes these checks. This is what the PVS-Studio analyzer warns about:

V547 Expression is always true. material_storage.cpp 929

In reality, the programmer wanted to check if p_texture_uniforms[i].hint was in the range of 5 to 9. To do this, they had to use the logical "AND":

if (tex->detect_roughness_callback
    && (   p_texture_uniforms[i].hint >= hint_r
        && p_texture_uniforms[i].hint <= hint_gray))
{
  ....
}
Enter fullscreen mode Exit fullscreen mode

Here's a similar warning:

  • V547 Expression is always true. material_storage.cpp 1003

Fragment N4

Try to find the error here:

Error FontFile::load_bitmap_font(const String &p_path)
{
  if (kpk.x >= 0x80 && kpk.x <= 0xFF)
  {
    kpk.x = _oem_to_unicode[encoding][kpk.x - 0x80];
  } else if (kpk.x > 0xFF){
    WARN_PRINT(vformat("Invalid BMFont OEM character %x
                        (should be 0x00-0xFF).", kpk.x));
    kpk.x = 0x00;
  }

  if (kpk.y >= 0x80 && kpk.y <= 0xFF) 
  {
    kpk.y = _oem_to_unicode[encoding][kpk.y - 0x80];
  } else if (kpk.y > 0xFF){
    WARN_PRINT(vformat("Invalid BMFont OEM character %x
                        (should be 0x00-0xFF).", kpk.x));
    kpk.y = 0x00;
  }
  ....
}
Enter fullscreen mode Exit fullscreen mode

The analyzer warning:

V778 Two similar code fragments were found. Perhaps, this is a typo and 'y' variable should be used instead of 'x'. font.cpp 1970

So, PVS-Studio has found an error that occurred when copying a code fragment. Let's take a closer look at conditional blocks. They're identical except that in the first case all operations are performed on kpk.x, while in the second case they are performed on kpk.y.

However, the second condition got an error because of a copy-paste operation. Take a look at the WARN_PRINT call: if kpk.y > 0xFF, the kpk.x character is displayed when the warning is issued, not kpk.y. Searching for an error based on logs is more difficult :)

P.S.: the code really shouldn't have been multiplied like that. You can clearly see that the two code blocks differ only in the applied field. Putting the code in a function and calling it twice for different fields would be a better option:

Error FontFile::load_bitmap_font(const String &p_path)
{
  constexpr auto check = [](auto &ch)
  {
    if (ch >= 0x80 && ch <= 0xFF)
    {
      auto res = _oem_to_unicode[encoding][ch - 0x80];
      ch = res;
    }
    else if (ch > 0xFF)
    {
      WARN_PRINT(vformat("Invalid BMFont OEM character %x
                              (should be 0x00-0xFF).",ch));
      ch = 0x00;
    }
  };

  check(kpk.x);
  check(kpk.y);
  ....
}
Enter fullscreen mode Exit fullscreen mode

Fragment N5

Here are more conditions, but they're nested:

void GridMapEditor::_mesh_library_palette_input(const Ref<InputEvent> &p_ie) 
{
  const Ref<InputEventMouseButton> mb = p_ie;
  // Zoom in/out using Ctrl + mouse wheel
  if (mb.is_valid() && mb->is_pressed() && mb->is_command_or_control_pressed()) 
  {
    if (mb->is_pressed() && mb->get_button_index() == MouseButton::WHEEL_UP) 
    {
      size_slider->set_value(size_slider->get_value() + 0.2);
    }
    ....
  }
}
Enter fullscreen mode Exit fullscreen mode

The analyzer warning:

V571 Recurring check. The 'mb->is_pressed()' condition was already verified in line 837. grid_map_editor_plugin.cpp 838

This fragment contains a redundant check in the nested if statement. The mb->is_pressed() expression has already been checked above. Maybe it's a double check (it's common in GUIs), but then the developers should've added a comment about it. Or maybe something else should've been checked.

Here are similar warnings:

  • V571 Recurring check. The '!r_state.floor' condition was already verified in line 1711. physics_body_3d.cpp 1713
  • V571 Recurring check. The '!wd_window.is_popup' condition was already verified in line 2012. display_server_x11.cpp 2013
  • V571 Recurring check. The 'member.variable->initializer' condition was already verified in line 946. gdscript_analyzer.cpp 949

Fragment N6

And this is a classic case of dereferencing the pointer before checking it:

void GridMapEditor::_update_cursor_transform()
{
  cursor_transform = Transform3D();
  cursor_transform.origin = cursor_origin;
  cursor_transform.basis = node->get_basis_with_orthogonal_index(cursor_rot);
  cursor_transform.basis *= node->get_cell_scale();
  cursor_transform = node->get_global_transform() * cursor_transform;

  if (selected_palette >= 0)
  {
    if (node && !node->get_mesh_library().is_null())
    {
      cursor_transform *= node->get_mesh_library()
                              ->get_item_mesh_transform(selected_palette);
    }
  }
  ....
}
Enter fullscreen mode Exit fullscreen mode

The analyzer warning:

V595 The 'node' pointer was utilized before it was verified against nullptr. Check lines: 246, 251. grid_map_editor_plugin.cpp 246

It's rather strange to dereference a pointer and then check it a few lines down. Perhaps the dereferencing appeared in the code later than the check, and the developer didn't notice the check below.

Similar warnings:

  • V595 The 'p_ternary_op->true_expr' pointer was utilized before it was verified against nullptr. Check lines: 4518, 4525. gdscript_analyzer.cpp 4518
  • V595 The 'p_parent' pointer was utilized before it was verified against nullptr. Check lines: 4100, 4104. node_3d_editor_plugin.cpp 4100
  • V595 The 'item' pointer was utilized before it was verified against nullptr. Check lines: 950, 951. project_export.cpp 950
  • V595 The 'title_bar' pointer was utilized before it was verified against nullptr. Check lines: 1153, 1163. editor_node.cpp 1153
  • V595 The 'render_target' pointer was utilized before it was verified against nullptr. Check lines: 2121, 2132. rasterizer_canvas_gles3.cpp 2121
  • V595 The '_p' pointer was utilized before it was verified against nullptr. Check lines: 228, 231. dictionary.cpp 228
  • V595 The 'class_doc' pointer was utilized before it was verified against nullptr. Check lines: 1215, 1231. extension_api_dump.cpp 1215

Fragment N7

template <class T, class U = uint32_t,
          bool force_trivial = false, bool tight = false>
class LocalVector
{
  ....
public:
  operator Vector<T>() const
  {
    Vector<T> ret;
    ret.resize(size());
    T *w = ret.ptrw();
    memcpy(w, data, sizeof(T) * count);
    return ret;
  }
  ....
};
Enter fullscreen mode Exit fullscreen mode

The analyzer warning:

V780 Instantiation of LocalVector < AnimationCompressionDataState >: The object 'w' of a non-passive (non-PDS) type cannot be copied using the memcpy function. local_vector.h 280

This is an interesting fragment. The LocalVector class template has a conversion operator to the Vector class. In this conversion, the contents of the current vector need to be copied to the new one. To do this, the devs used the memcpy function.

It's all good as long as the T template type is trivially copyable. However, the analyzer has detected various specializations of LocalVector, which has this property violated. As an example, let's look at the LocalVector specialization:

struct AnimationCompressionDataState
{
  uint32_t components = 3;
  LocalVector<uint8_t> data; // Committed packets.
  struct PacketData
  {
    int32_t data[3] = { 0, 0, 0 };
    uint32_t frame = 0;
  };

  float split_tolerance = 1.5;

  LocalVector<PacketData> temp_packets;

  // used for rollback if the new frame does not fit
  int32_t validated_packet_count = -1;
  ....
};
Enter fullscreen mode Exit fullscreen mode

The AnimationCompressionDataState class* contains a LocalVector* that is not trivially copyable.

For this case, there's a note in the memcpy documentation: "If the objects are potentially-overlapping or not TriviallyCopyable, the behavior of memcpy is not specified and may be undefined".

We can easily fix the code by replacing the memcpy call with std::uninitialized_copy:

operator Vector<T>() const
{
  Vector<T> ret;
  ret.resize(size());
  T *w = ret.ptrw();
  std::uninitialized_copy(data, data + count, w);
  return ret;
}
Enter fullscreen mode Exit fullscreen mode

The PVS-Studio analyzer has detected 38 more dangerous specializations, but I won't give the full list, of course:

  • V780 Instantiation of LocalVector < AnimationCompressionDataState >: The object 'w' of a non-passive (non-PDS) type cannot be copied using the memcpy function. local_vector.h 280
  • V780 Instantiation of LocalVector < LocalVector >: The object 'w' of a non-passive (non-PDS) type cannot be copied using the memcpy function. local_vector.h 280
  • V780 Instantiation of LocalVector < Mapping, uint32_t, bool, bool >: The object 'w' of a non-passive (non-PDS) type cannot be copied using the memcpy function. local_vector.h 280
  • V780 Instantiation of LocalVector < OAHashMap < uint64_t, Specialization > >: The object 'w' of a non-passive (non-PDS) type cannot be copied using the memcpy function. local_vector.h 280
  • V780 Instantiation of LocalVector < Pair < StringName, StringName >, uint32_t, bool, bool >: The object 'w' of a non-passive (non-PDS) type cannot be copied using the memcpy function. local_vector.h 280
  • ...

Fragment N8

Here's a possible violation of the program logic:

Dictionary GDScriptSyntaxHighlighter::_get_line_syntax_highlighting_impl
                                                             (int p_line)
{
  const String &str = text_edit->get_line(p_line);
  ....
  if (   is_digit(str[non_op])
      || (   str[non_op] == '.' 
          && non_op < line_length 
          && is_digit(str[non_op + 1]) ) )
  {
    in_number = true;
  }
  ....
}
Enter fullscreen mode Exit fullscreen mode

The analyzer warning:

V781 The value of the 'non_op' index is checked after it was used. Perhaps there is a mistake in program logic. gdscript_highlighter.cpp 370

The non_op value is first used as an index when accessing characters in the string. Only then the value is checked if it's less than the length.

Take a look at the access to the string after the check. If non_op < line_length, it doesn't mean that (non_op + 1) < line_length. So, an index out of bounds can happen in str[non_op + 1]. Especially since there are no null-terminated strings under the String hood.

A correct check should look like this:

if (   is_digit(str[non_op])
    || (   str[non_op] == '.' 
        && non_op + 1 < line_length 
        && is_digit(str[non_op + 1]) ) )
{
  in_number = true;
}
Enter fullscreen mode Exit fullscreen mode

Fragment N9

struct Particles
{
  ....
  int amount = 0;
  ....
};

void ParticlesStorage::_particles_update_instance_buffer(
  Particles *particles,
  const Vector3 &p_axis,
  const Vector3 &p_up_axis)
{
  ....
  uint32_t lifetime_split = ....;
  // Offset VBO so you render starting at the newest particle.
  if (particles->amount - lifetime_split > 0)
  {
    ....
  }
  ....
}
Enter fullscreen mode Exit fullscreen mode

The analyzer warning:

V555 The expression 'particles->amount - lifetime_split > 0' will work as 'particles->amount != lifetime_split'. particles_storage.cpp 959

The example is interesting because, despite the not-so-correct warning issued by the analyzer, it shows what the developers should pay attention to.

If the difference between two unsigned variables is greater than zero, then the expression is semantically equal to particles->amount != lifetime_split. The condition is considered false only if the variables are equal. If the left operand is less than the right one, a wrap-around occurs, and the resulting expression will be greater than zero. If the left operand is greater than the right one, the difference will be greater than zero.

However, the other thing to note here is that both variables have the same rank, but different signedness. The standard requires a compiler to perform implicit conversions before performing the subtraction. In this case, the unsigned 32-bit int is the common type. This can also add a few surprises if the left operand has a negative number in it.

The most correct check when signed and unsigned type expressions of the same rank are involved looks as follows:

if (particles->amount >= 0 && particles->amount > lifetime_split)
Enter fullscreen mode Exit fullscreen mode

In fact, we've reinvented std::cmp_greater, which was introduced in C++20. Starting from this version of the standard, we can write concise code:

if (std::cmp_greater(particles->amount, lifetime_split))
Enter fullscreen mode Exit fullscreen mode

Fragment N10

void AnimationNodeStateMachineEditor::_delete_tree_draw()
{
  TreeItem *item = delete_tree->get_next_selected(nullptr);
  while (item) 
  {
    delete_window->get_cancel_button()->set_disabled(false);
    return;
  }
  delete_window->get_cancel_button()->set_disabled(true);
}
Enter fullscreen mode Exit fullscreen mode

The analyzer warning:

V1044 Loop break conditions do not depend on the number of iterations. animation_state_machine_editor.cpp 693

The while loop takes exactly one iteration. It looks very much like a pattern where we only need to take the first element from the container. We can do this using the for loop:

for (auto &&item : items)
{
  DoSomething(item);
  break;
}
Enter fullscreen mode Exit fullscreen mode

So, we don't need to check if the container contains the first element. IMHO, the code is rather confusing because you expect an unknown finite number of iterations from loops.

In the fragment above, however, the while loop makes absolutely no sense. The simple if statement would've been enough:

void AnimationNodeStateMachineEditor::_delete_tree_draw()
{
  TreeItem *item = delete_tree->get_next_selected(nullptr);
  if (item)
  { 
    delete_window->get_cancel_button()->set_disabled(false);
    return;
  }

  delete_window->get_cancel_button()->set_disabled(true);
}
Enter fullscreen mode Exit fullscreen mode

Fragment N11

static const char *script_list[][2] = {
  ....
  { "Myanmar / Burmese", "Mymr" },
  { "​Nag Mundari", "Nagm" },
  { "Nandinagari", "Nand" },
  ....
}
Enter fullscreen mode Exit fullscreen mode

The reader may ask, "What's wrong here?" We wouldn't have figured it out ourselves if it weren't for the V1076 diagnostic rule. Interestingly enough, this is the first warning we've written out. The diagnostic rule checks the program code for invisible characters. Such characters are like backdoors that a programmer may not see because of the text display settings in the development environment, but the compiler sees and correctly processes them.

The analyzer warning:

V1076 Code contains invisible characters that may alter its logic. Consider enabling the display of invisible characters in the code editor. locales.h 1114

Let's take a closer look at the next line:

{ "​Nag Mundari", "Nagm" },
Enter fullscreen mode Exit fullscreen mode

It contains the backdoors with an invisible character. If we use a hex editor, we can see the following:

There are 3 bytes sandwiched between the double quotation mark and the N character: E2, 80, and 8B. They correspond to the ZERO WIDTH SPACE (U+200B) Unicode character.

The strings from the script_list array that contains the "infected" string literal go into the TranslationServer::script_map hash table. The key of the hash table is the second string literal of the pair, and the value is the first one. So, the backdoored string literal goes into the hash table as a value, and the hash search isn't broken.

Next, we can look at where the value from the hash table could potentially go. I've found a few places:

  1. The value can go into the string returned by the TranslationServer::get_locale_name function. By analyzing the caller functions, we can see that this string gets into the GUI ([1], [2], [3], [4]) one way or another.
  2. The value is returned from the TranslationServer::get_script_name function. By analyzing the caller functions, we can also see that the string will get into the GUI ([1], [2]).

Most likely, the backdoor was added accidentally by copying the name from some website. We can simply delete this character from the string literal.

Fragment N12

void MeshStorage::update_mesh_instances() 
{
  ....
  uint64_t mask = RS::ARRAY_FORMAT_VERTEX | RS::ARRAY_FORMAT_NORMAL 
                | RS::ARRAY_FORMAT_VERTEX;
  ....
}
Enter fullscreen mode Exit fullscreen mode

The analyzer warning:

  • V501 There are identical sub-expressions 'RenderingServer::ARRAY_FORMAT_VERTEX' to the left and to the right of the '|' operator. mesh_storage.cpp 1414
  • V578 An odd bitwise operation detected. Consider verifying it. mesh_storage.cpp 1414

This is a strange bitmask initialization. RS::ARRAY_FORMAT_VERTEX is written twice there, although the developers may have wanted to write a different flag.

Here's a similar warning:

  • V501 There are identical sub-expressions 'RenderingServer::ARRAY_FORMAT_VERTEX' to the left and to the right of the '|' operator. mesh_storage.cpp 1300
  • V578 An odd bitwise operation detected. Consider verifying it. mesh_storage.cpp 1300

Fragment N13

void Image::initialize_data(int p_width, int p_height, bool p_use_mipmaps,
                            Format p_format, const Vector<uint8_t> &p_data)
{
  ....
  ERR_FAIL_COND_MSG(p_width > MAX_WIDTH, "The Image width specified (" + 
                                         itos(p_width) +
                                         " pixels) cannot be greater than " +
                                         itos(MAX_WIDTH) +
                                         " pixels.");

  ERR_FAIL_COND_MSG(p_height > MAX_HEIGHT, "The Image height specified (" +
                                           itos(p_height) +
                                           " pixels) cannot be greater than " +
                                           itos(MAX_HEIGHT) +
                                           " pixels.");

  ERR_FAIL_COND_MSG(p_width * p_height > MAX_PIXELS,
                   "Too many pixels for image, maximum is " + itos(MAX_PIXELS));
  ....
}
Enter fullscreen mode Exit fullscreen mode

The analyzer warning:

V1083 Signed integer overflow is possible in 'p_width * p_height' arithmetic expression. This leads to undefined behavior. Left operand is in range '[0x1..0x1000000]', right operand is in range '[0x1..0x1000000]'. image.cpp 2200

So, we have the p_width and p_height variables of the int type. The maximum value that a 4-byte int can store is 2'147'483'647.

First, it is checked that p_width <= MAX_WIDTH, where MAX_WIDTH == 16'777'216. Then, it is checked that p_height <= MAX_HEIGHT, where MAX_HEIGHT == 16,777,216. The third check is that the product of p_width * p_height <= MAX_PIXELS.

Let's look at the case when p_width == p_height && p_width == 16'777'216. The result of multiplying these numbers is 281'474'976'710'656. To display the result, an 8-byte number is required, so there's a sign overflow. As we know, this leads to undefined behavior in C and C++.

If there are no helper functions that check for an overflow, the simplest fix may look like this:

ERR_FAIL_COND_MSG((int64_t) p_width * (int64_t) p_height > (int64_t) MAX_PIXELS,
                  "Too many pixels for image, maximum is " + itos(MAX_PIXELS));
Enter fullscreen mode Exit fullscreen mode

Fragment N14

void RemoteDebugger::debug(....)
{
  ....
  mutex.lock();
  while (is_peer_connected())
  {
    mutex.unlock();
    ....
  }

  send_message("debug_exit", Array());
  if (Thread::get_caller_id() == Thread::get_main_id())
  {
    if (mouse_mode != Input::MOUSE_MODE_VISIBLE)
    {
      Input::get_singleton()->set_mouse_mode(mouse_mode);
    }
  } 
  else 
  {
    MutexLock mutex_lock(mutex);
    messages.erase(Thread::get_caller_id());
  }
}
Enter fullscreen mode Exit fullscreen mode

V1020 The function exited without calling the 'mutex.unlock' function. Check lines: 556, 438. remote_debugger.cpp 556

This is an extremely interesting code fragment with multithreading. The PVS-Studio analyzer has detected that a mutex may not be unlocked on some execution paths. Let's get into it.

We'll start by seeing what type of mutex is being used:

class RemoteDebugger : public EngineDebugger
{
  ....
private:
  // Make handlers and send_message thread safe.
  Mutex mutex;
  ....
};
Enter fullscreen mode Exit fullscreen mode

We'll dig a little deeper into what this Mutex is:

template <class StdMutexT>
class MutexImpl
{
  friend class MutexLock<MutexImpl<StdMutexT>>;
  using StdMutexType = StdMutexT; 
  mutable StdMutexT mutex;
public:
  _ALWAYS_INLINE_ void lock() const { mutex.lock(); }

  _ALWAYS_INLINE_ void unlock() const { mutex.unlock(); }

  _ALWAYS_INLINE_ bool try_lock() const { return mutex.try_lock(); }
};

// Recursive, for general use
using Mutex = MutexImpl<THREADING_NAMESPACE::recursive_mutex>;
Enter fullscreen mode Exit fullscreen mode

So, we actually have a recursive mutex, not a regular one. It's used with a custom RAII wrapper:

template <class MutexT>
class MutexLock
{
  friend class ConditionVariable;

  std::unique_lock<typename MutexT::StdMutexType> lock;

public:
  _ALWAYS_INLINE_ explicit MutexLock(const MutexT &p_mutex)
    : lock(p_mutex.mutex) {}
};
Enter fullscreen mode Exit fullscreen mode

Developers use RemoteDebugger::mutex with RAII wrappers almost all the time. Here's just a couple of fragments: [1], [2], [3], .....

However, at one point something went wrong. The analyzer pointed to a fragment where the mutex is handled manually. So, we have several code execution options:

  1. The mutex is locked and the loop isn't executed once (N == 0). As a result, the control flow leaves the RemoteDebugger::debug function with the capture counter incremented by 1.
  2. The mutex is locked, and the loop is executed N == 1 times. Everything's fine in this case: the capture count of the recursive mutex is incremented and decremented by the same number.
  3. The mutex is locked, and the loop is executed N > 1 times. As a result, the recursive mutex will have its capture count decreased by N – 1 relative to the time before it was manually locked, which can lead to undefined behavior.

If we examine calls to the is_peer_connected function in the code base ([1], [2], [3], ....), then we can see that they occur locked by RemoteDebugger::mutex in all cases. Apparently, in this case, the programmer needed locking as well, but they implemented it manually.

Based on these assumptions, we can fix the code as follows:

void RemoteDebugger::debug(....)
{
  ....
  const auto is_peer_connected_sync = [this]
  {
    MutexLock _ { mutex };
    return is_peer_connected();
  };

  while (is_peer_connected_sync())
  {
    ....
  }
  ....
}
Enter fullscreen mode Exit fullscreen mode

I can't guarantee that the fix is absolutely correct, as only the Godot developers know what should happen here. Although, at least we got rid of the potential undefined behavior associated with unlocking the mutex on every loop iteration.

Conclusion

Errors in code vary from simple to complex, from visible to invisible. In order not to ruin the enjoyment and experience of the product, it must be constantly cleaned of bugs and errors. Static and dynamic analyzers handle this task well.

Getting started with such solutions is easier than you might think. For example, you can get a trial version of the PVS-Studio analyzer here. There are also a number of terms and conditions for its free use.

Thanks for reading and have a great day!

Top comments (0)