Riccardo Bernardini

Posted on May 25, 2019 • Edited on Jun 17, 2020

My first experience with SPARK-Ada

#ada #spark #formalchecking #bufferoverflow

I program in Ada and I like it.

Yes, I know, it sounds like a "coming out"... 😉

In the world of Ada there is a thing that I always wanted to try: it is a reduced subset of Ada called SPARK (nothing to do with Apache) that it allows for formal checking, that is, proving some properties of your code such as: absence of buffer overflow and other runtime errors, no uninitialized variable read, and so on. I like the idea to have my code (or, at least, part of it) armored by a mathematical proof 😊.

Curious about SPARK? Check out this Introduction to SPARK

Few days ago I decided to give it a try using an old function of mine: a Ruby-like split to split strings at a separator. It is small piece of code that I wrote some time ago and used safely in many programs of mine: the right "guinea pig" for the experiment.

If you are curious, the final SPARK code is on github, be my guest! 😊

The first thing to do was to replace the use of Vectors (a kind of dynamic array, similar to its C++ counterpart) with normal arrays. You see, as you can imagine, Vectors use pointers (access types in Ada jargon) and they are prohibited in SPARK. The fact is that pointers would allow for aliasing of objects and this would make the formal check impossible.

It is not as bad as it sounds: in Ada the necessity of using pointers is much reduced with respect to C or C++. The only real need is when you want to dynamically allocate structures.

Therefore, I wrote my own pseudo-Vector type with the minimal set of needed functionality. The idea was to keep the extracted pieces in a fixed array (allocated to the stack, so no explicit pointers) large enough to keep the pieces (if you split a string with length N you get at most N+1 pieces). Maybe not very memory efficient, but fine for my needs.

The resulting code (slightly pruned with respect to the original, for "didactic" purposes...) follows. It seems long, but it is just that I added few comments to explain the more Ada-esque syntax.

First the spec part (very roughly the equivalent of a *.h file)

with Ada.Strings.Unbounded;     use Ada.Strings.Unbounded;

package Token_Lists 
   with SPARK_Mode => On -- This specs is in SPARK
is

   subtype List_Length is Integer range 1 .. Integer'Last;

   -- "Tagged types" correspond roughly to what other 
   -- languages call "class".  Token_List is the desired
   -- pseudo-vector for holding the pieces.  
   --
   -- The funny (<>) means that the type is parameterized, but the
   -- parameter is not public.  This forces you to initialize 
   -- any variable of type Token_List with some kind of constructor.
   --
   -- Finally, "private" means that the internal details are not
   -- visible and are described later, in the private part. 
   --
   type Token_List (<>) is tagged private;

   function Create (N : List_Length) return Token_List
     with
       Post =>
         Create'Result.Capacity = N and Create'Result.Length = 0;
    --
    -- Post is an "attribute" and it specifies a Boolean expression
    -- that will be true when the function returns (contract). 
    --
    -- The post-condition says that the created list will have 
    -- room for N entries and it will be empty
    --

   function Capacity (Item : Token_List) return Positive;

   function Length (Item : Token_List) return Natural
     with Post => Length'Result <= Item.Capacity;
   --
   -- Number of the pieces currently in the list.  Of course 
   -- it cannot be larger than the capacity, if it happens 
   -- there is something wrong somewhere...
   ---

   procedure Append (List : in out Token_List;
                     What : String)
     with
       Pre'Class =>
         List.Length  < List.Capacity,

       Post =>
         List.Length = List.Length'Old + 1
     and List.Capacity = List.Capacity'Old;
    -- 
    -- The precondition (attribute "Pre") says that before calling
    -- the procedure there must be some room in the list; the
    -- post condition says that after the call there is a new entry,
    -- but the capacity is unchanged (it is obvious to you, not to
    -- SPARK).  SPARK will analyze the body of Append and it 
    -- will try to prove that the contract is respected, that is, 
    -- that the postcondition follows from the precondition.
    --
    -- If you arm correctly the code that calls Append, SPARK will
    -- try to prove that the precondition is always verified. If it  
    -- succeeds you know you'll never have an overflow!
    --

private
   --
   -- Some privacy needed... :-)
   --


   type Token_Array is array (Positive range <>) of Unbounded_String;
   --
   -- The funny "(Positive range <>)" says that the index of 
   -- indexes of a variable of type Token_Array is a range of
   -- positive integers.  
   -- 
   -- Yes, in Ada the array indexes do not *necessarily* start
   -- from zero... They can start from whatever you want... ;-D
   --

   type Token_List (Length : List_Length) is tagged
      record
         Tokens     : Token_Array (1 .. Length) := (others => Null_Unbounded_String);
         First_Free : Positive := 1;
      end record
     with 
       Predicate => 
         Token_List.First_Free <= Integer (Token_List.Length) + 1
     and Token_List.Tokens'Length = Token_List.Length;
    -- 
    -- This is the full definition of Token_List anticipated 
    -- in the public part above.  In Ada it is not possible 
    -- to have some field public and some private (like in C++
    -- or Ruby).  If you put the full definition in the public
    -- part, everything is public (typically considered bad practice),
    -- otherwise everything is private.  Honestly, I never felt the
    -- need of the hybrid solution.
    -- 
    -- "Predicate" is another attribute.  It specifies a condition
    -- that a variable of type Token_List must always satisfy.
    --
    -- If you ask to the compiler to check assertions, the 
    -- compiler will produce code that checks the predicate
    -- at the end of methods ("primitive procedures" in Ada 
    -- jargon) that modify the variable and if the condition 
    -- is not satisfied, an exception is raised, pointing an
    -- accusing finger against the culprit.  
    --
    -- A powerful bug trap!
    --



   function Create (N : List_Length) return Token_List
   is (Token_List'(Length     => N,
                   Tokens     => (others => Null_Unbounded_String),
                   First_Free => 1));

   function Capacity (Item : Token_List) return Positive
   is (Item.Tokens'Last);

   function Length (Item : Token_List) return Natural
   is (Item.First_Free - 1);
end Token_Lists;

Now the body (more or less the equivalent of a *.c)

pragma Ada_2012;
package body Token_Lists 
   with SPARK_Mode => On -- This body is in SPARK
is

   ------------
   -- Append --
   ------------

   procedure Append
     (List : in out Token_List;
      What : String)
   is
   begin
      --
      -- If you look at the definition of Token_List 
      -- you'll see that List.Tokens is an array. 
      -- List.Tokens'Last is the last index of Lists.Tokens
      -- 
      if List.First_Free > List.Tokens'Last then
         raise Constraint_Error;
      end if;

      List.Tokens (List.First_Free) := To_Unbounded_String (What);
      List.First_Free := List.First_Free + 1;
   end Append;

end Token_Lists;

Even if you do not know Ada, I think you should be able to understand the code.

As you can see, Append is very simple and obviously correct, right? I mean, the if at the beginning stops every tentative of writing beyond the array List.Tokens, right?

Well, SPARK complained saying that it was not able to prove that an overflow would not happen. My first reaction was something like "Are you crazy? Don't you see the check 4 lines above? What is wrong with you?"

Anyway, next to the message error there was a "magnifying glass" icon. I clicked on it and I got, SPARK's courtesy, a counterexample

 [Counterexample] List = (Length => ?, First_Free => 2147483647) 
 and List.Tokens'First = 1 and List.Tokens'Last = 2147483647

What the... ? Ah! ... Yeah, OK, you are right...

In case you did not recognize it at once, 2147483647 is 2^31 -1. If I created a Token_List with 2 Giga entries (doubtful if it would fit on the stack... anyway...) and I filled it completely, then I would actually have an overflow.

Now I had two choices: the first one was to say that I will never use 2 Giga entries and ignore the case; the second one was to make it official that you can have at most 2^31 -2 entries. This was really easy to do: it sufficed to change in the specs file the line

subtype List_Length is Integer range 1 .. Integer'Last;

that defines the type used for the list length to

subtype List_Length is Integer range 1 .. Integer'Last-1;

Notice the -1 at the end. Since this type is used in the constructor

function Create (N : List_Length) return Token_List;

this guarantees that you'll never call Create with N larger than 2^31 -2 (if you try to do it, an exception is raised). This placated SPARK.

After catching this border-line-condition bug I moved to SPARK-ing the rest of the code, catching another couple of border-line cases.

Overall, I can say that original code was fine since the bugs were triggered by some very extreme conditions; nevertheless it is nice to know that everything is clean now...

Top comments (4)

David Weller • May 26 '19

Emacs supports Ada syntax highlighting (I once made some code changes to the Lisp). I'm not sure if others have...I think Vim has it.

Oh, also VSCode has an Ada syntax highlighter (this is my current editor of choice, actually).

P.S. - Congrats on doing Ada stuff. I was one of the contributors to Ada 95, but I haven't used it since 2000.

Riccardo Bernardini • May 26 '19

I know that emacs and many other editors (gedit and jedit, for example) have Ada syntax highlighting. I was referring to dev.to visualization engine that shows Ada code in simple black-and-white.

Riccardo Bernardini • Jun 17 '20

I reply to my own reply to say that I discovered that the language name used with the code construct must be lowercase, that is ada and not Ada.

David Weller • May 26 '19

oops! :)