I’m still uncertain about the language declaration syntax, where in declarations, syntax is used that mimics the use of the variables being declared. It is one of the things that draws strong criticism, but it has a certain logic to it.
— Dennis M. Ritchie
I consider the C declarator syntax an experiment that failed.
— Bjarne Stroustrup
Cdecl (see-deh-kull) is a program for composing and deciphering C (or C++) type declarations or casts, aka “gibberish.” Granted, you don’t need such a program that often, but, when you do, it’s pretty handy. It can be used interactively on a terminal or accept input from either the command line or standard input. Some examples (where cdecl> is its interactive prompt):
cdecl> declare p as pointer to const pointer to const int
const int *const *p;
cdecl> explain int *const (*p)[4]
declare p as pointer to array 4 of const pointer to int
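To convince yourself that the two declarations above mean what cdecl says, here is a small C sketch that puts them to use. The second declaration is renamed q so both can coexist, and every other name (value, inner, row, read_through_p, set_through_q) is made up for illustration.

```c
#include <stddef.h>

/* "declare p as pointer to const pointer to const int" */
static const int value = 42;
static const int *const inner = &value;  /* const pointer to const int */
static const int *const *p = &inner;     /* pointer to the above */

/* "declare q as pointer to array 4 of const pointer to int" */
static int a, b, c, d;
static int *const row[4] = { &a, &b, &c, &d };
static int *const (*q)[4] = &row;

/* Read through both levels of indirection of p. */
static int read_through_p(void) {
  return **p;
}

/* Write through q: the pointers in the array are const, the ints are not. */
static void set_through_q(size_t i, int v) {
  *(*q)[i] = v;
}
```

Trying to assign to `**p` or to `(*q)[i]` itself is rejected by the compiler, which is exactly what the const placement in the English form predicts.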
I don’t recall the first time I ever used cdecl, but it was probably early on in my career (when both cdecl and I were young) some 30+ years ago. I’ve used it occasionally ever since.
My most recent use was in early 2017. It was then I realized that cdecl was really showing its age. (Its last major update was in 1996.) In the meantime, the C and C++ languages have evolved quite a bit and cdecl didn’t support any of the new keywords or syntax. That aside, cdecl also had the barest possible error reporting (syntax error was all you ever got with no further explanation about where the error was or why).
It was then I decided to grab a copy of the source code and give it some TLC, bringing it up to date to support C through C11 and C++ through C++14 as well as improving its error reporting. The result is the all-new cdecl 3.0.
Cdecl’s Original Source Code
There wasn’t much to cdecl’s source code, just three files: a lex file, a yacc grammar, and a single .c file:
$ wc -l cdlex.l cdgram.y cdecl.c
76 cdlex.l
946 cdgram.y
1321 cdecl.c
2343 total
The way the original cdecl worked was by parsing its input and synthesizing its output (either a C declaration for declare or pseudo-English for explain) by concatenating string fragments together in just the right way while reducing rules in the grammar. That deserves credit for a minimal and clever solution, but it was difficult to follow and modify.
Cdecl’s New Source Code
Using an Abstract Syntax Tree
The new cdecl now parses its input into a proper data structure: an abstract syntax tree (AST). From there, other code can walk the tree not only to generate either a C declaration or pseudo-English, but also to perform semantic checks. (The original cdecl performed a few semantic checks, but that code was enmeshed in the grammar rules.)
The AST is in the pseudo-English form because it’s easiest to think about it that way. Hence, while it’s easy to parse pseudo-English into an AST and to generate it back from one, parsing and generating gibberish is a lot trickier: the AST can’t be walked in one order (pre vs. post) or one direction (down vs. up), in particular for arrays and functions, which always have to have their size or arguments last. After a lot of hacking, spending time in lldb, using cdecl’s --debug option (that dumps the AST in a JSON-like format), and 500+ unit tests, it’s done.
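To give a feel for the easy direction, here is a minimal sketch of an AST and a pre-order walk that emits pseudo-English. It is not cdecl’s actual code: node_t, kind_t, append, to_english, and explain_example are hypothetical names, and only three node kinds are modeled.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical node kinds; a real tool needs many more. */
typedef enum { K_BUILTIN, K_POINTER, K_ARRAY } kind_t;

typedef struct node {
  kind_t             kind;
  const char        *type_name;   /* K_BUILTIN only, e.g. "int" */
  int                is_const;    /* const qualifier on this node */
  unsigned           array_size;  /* K_ARRAY only */
  const struct node *of;          /* pointed-to or element type */
} node_t;

/* Append s to the NUL-terminated string in buf without overflowing cap. */
static void append(char *buf, size_t cap, const char *s) {
  size_t len = strlen(buf);
  if (len < cap)
    snprintf(buf + len, cap - len, "%s", s);
}

/* Pre-order walk: describe this node, then descend into what it refers to. */
static void to_english(const node_t *n, char *buf, size_t cap) {
  char tmp[32];
  if (n->is_const)
    append(buf, cap, "const ");
  switch (n->kind) {
    case K_BUILTIN:
      append(buf, cap, n->type_name);
      break;
    case K_POINTER:
      append(buf, cap, "pointer to ");
      to_english(n->of, buf, cap);
      break;
    case K_ARRAY:
      snprintf(tmp, sizeof tmp, "array %u of ", n->array_size);
      append(buf, cap, tmp);
      to_english(n->of, buf, cap);
      break;
  }
}

/* Hand-build the tree for "int *const (*p)[4]" and render it into buf. */
static void explain_example(char *buf, size_t cap) {
  static const node_t int_node  = { K_BUILTIN, "int", 0, 0, NULL };
  static const node_t cptr_node = { K_POINTER, NULL, 1, 0, &int_node };
  static const node_t arr_node  = { K_ARRAY, NULL, 0, 4, &cptr_node };
  static const node_t ptr_node  = { K_POINTER, NULL, 0, 0, &arr_node };
  buf[0] = '\0';
  append(buf, cap, "declare p as ");
  to_english(&ptr_node, buf, cap);
}
```

The English output falls out of a single top-down pass because each phrase reads in the same order as the tree nests. Emitting the C declaration from the same tree has no such luck: the base type goes on the far left while array sizes go on the far right, so one node contributes text on both sides of the name.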
Much Better Error Messages
The other major improvement is much better error messages, e.g.:
cdecl> declare f as pointer function (char) returning pointer to void
                            ^
21: syntax error: "function": "to" expected
including a ^ pointing to the position of the error, the column number (21), what token caused the error, what token was expected, and all in color (customizable, of course, via the CDECL_COLORS environment variable similar to the way gcc does it). (Note that the colors above are not representative of the actual default colors used by cdecl. They’re just whatever the Markdown code-block parser chooses to use for this web site.)
Keeping track of token locations is made fairly easy because bison directly supports it via locations. Each node in the AST has a YYLTYPE data member (bison’s location struct).
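As a rough sketch of how a stored location can turn into the caret and message shown earlier: the struct below has the same four fields as bison’s default YYLTYPE, but is redeclared as loc_t so the example compiles without a generated parser. The helpers format_caret and format_syntax_error are hypothetical, and the column is assumed to be zero-based, as in the example output.

```c
#include <stdio.h>

/* Same fields as bison's default YYLTYPE, redeclared for this sketch. */
typedef struct {
  int first_line, first_column;
  int last_line,  last_column;
} loc_t;

/* Write a line of spaces ending in '^' under the token's first column.
   prompt_len accounts for the "cdecl> " prompt before the echoed input.
   Returns the value snprintf returns. */
static int format_caret(char *buf, size_t cap, int prompt_len,
                        const loc_t *loc) {
  return snprintf(buf, cap, "%*s^", prompt_len + loc->first_column, "");
}

/* Write "<column>: syntax error: "<token>": "<expected>" expected". */
static int format_syntax_error(char *buf, size_t cap, const loc_t *loc,
                               const char *token, const char *expected) {
  return snprintf(buf, cap, "%d: syntax error: \"%s\": \"%s\" expected",
                  loc->first_column, token, expected);
}
```

Because the location travels with each AST node, the same helpers would work for errors found later during semantic checks, not just for errors the parser reports.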
Better C and C++ Languages Support
When cdecl was young, the first ANSI C (C89) and C++ were both new things. In 1988, Tony Hansen added (a pre-standard version of) C++ support. Cdecl’s language support hasn’t changed much since. Cdecl 3.0 distinguishes among K&R (pre-ANSI) C, C89, C95, C99, C11, C++98, C++03, C++11, and C++14.
Most of the semantic checking of which tokens are supported in which languages is table-driven (one of Tony’s contributions that I just expanded upon). For example:
static c_type_info_t const C_QUALIFIER_INFO[] = {
  { T_ATOMIC,   L__ATOMIC,  LANG_MIN(C_11)                 },
  { T_CONST,    L_CONST,    LANG_MIN(C_89)                 },
  { T_RESTRICT, L_RESTRICT, LANG_MIN(C_89) & ~LANG_CPP_ALL },
  { T_VOLATILE, L_VOLATILE, LANG_MIN(C_89)                 },
};
specifies which qualifier types (T_) that have the given token literals (L_) are valid in the given language(s); and:
static c_lang_t const OK_STORAGE_LANGS[][ ARRAY_SIZE( C_STORAGE_INFO ) ] = {
  /*                   a  b  e  r  s  tl td  ce fi fr in mu nr o  v  pv */
  /* auto         */ { __,__,__,__,__,__,__, __,__,__,__,__,__,__,__,__ },
  /* block        */ { __,__,__,__,__,__,__, __,__,__,__,__,__,__,__,__ },
  /* extern       */ { XX,__,__,__,__,__,__, __,__,__,__,__,__,__,__,__ },
  /* register     */ { XX,__,XX,__,__,__,__, __,__,__,__,__,__,__,__,__ },
  /* static       */ { XX,XX,XX,XX,__,__,__, __,__,__,__,__,__,__,__,__ },
  /* thread_local */ { XX,E1,E1,XX,E1,E1,__, __,__,__,__,__,__,__,__,__ },
  /* typedef      */ { XX,__,XX,XX,XX,XX,__, __,__,__,__,__,__,__,__,__ },
  /* constexpr    */ { P1,P1,P1,XX,P1,XX,XX, P1,__,__,__,__,__,__,__,__ },
  /* final        */ { XX,XX,XX,XX,XX,XX,XX, XX,P1,__,__,__,__,__,__,__ },
  /* friend       */ { XX,XX,XX,XX,XX,XX,XX, P1,XX,PP,__,__,__,__,__,__ },
  /* inline       */ { XX,XX,C9,XX,C9,XX,XX, P1,P1,PP,C9,__,__,__,__,__ },
  /* mutable      */ { XX,XX,XX,XX,XX,XX,XX, XX,XX,XX,XX,P3,__,__,__,__ },
  /* noreturn     */ { XX,XX,C1,XX,C1,XX,XX, XX,XX,XX,C1,XX,C1,__,__,__ },
  /* override     */ { XX,XX,XX,XX,XX,XX,XX, XX,P1,XX,C1,XX,XX,P1,__,__ },
  /* virtual      */ { XX,XX,XX,XX,XX,XX,XX, XX,P1,XX,PP,XX,XX,P1,PP,__ },
  /* pure virtual */ { XX,XX,XX,XX,XX,XX,XX, XX,XX,XX,PP,XX,XX,P1,PP,PP },
};
specifies which storage-classes are valid in combination with which other storage-classes in which language(s) (where __ is all, XX is none, C1 is C11, C9 is C99, PP is any C++, P1 is C++11 or later, P3 is C++03 or later, and E1 is either C11 or C++11 or later). The two-character abbreviations are macros (not shown) to make the table fit in 80-column lines.
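Both tables boil down to bit-set lookups. The following is a minimal sketch of how that might work, not cdecl’s real definitions: the bit layout, the LANG_MIN implementation, the cut-down four-entry storage table, and the names qualifier_ok and storage_pair_ok are all assumptions. In particular, C9 is read here as “C99 or later,” and OK stands in for __.

```c
#include <stddef.h>
#include <string.h>

typedef unsigned lang_t;

/* Assumed layout: one bit per language, oldest first, C before C++. */
#define LANG_C_KNR   (1u << 0)
#define LANG_C_89    (1u << 1)
#define LANG_C_95    (1u << 2)
#define LANG_C_99    (1u << 3)
#define LANG_C_11    (1u << 4)
#define LANG_CPP_98  (1u << 5)
#define LANG_CPP_03  (1u << 6)
#define LANG_CPP_11  (1u << 7)
#define LANG_CPP_14  (1u << 8)
#define LANG_CPP_ALL (LANG_CPP_98 | LANG_CPP_03 | LANG_CPP_11 | LANG_CPP_14)

/* Every language whose bit is at or above L's bit. */
#define LANG_MIN(L)  (~(LANG_##L - 1u))

typedef struct {
  const char *literal;
  lang_t      ok_langs;
} qualifier_info_t;

static const qualifier_info_t QUALIFIER_INFO[] = {
  { "_Atomic",  LANG_MIN(C_11)                 },
  { "const",    LANG_MIN(C_89)                 },
  { "restrict", LANG_MIN(C_89) & ~LANG_CPP_ALL },
  { "volatile", LANG_MIN(C_89)                 },
};

/* Return 1 if literal is a known qualifier allowed in lang, else 0. */
static int qualifier_ok(const char *literal, lang_t lang) {
  size_t i;
  for (i = 0; i < sizeof QUALIFIER_INFO / sizeof QUALIFIER_INFO[0]; ++i)
    if (strcmp(QUALIFIER_INFO[i].literal, literal) == 0)
      return (QUALIFIER_INFO[i].ok_langs & lang) != 0;
  return 0;
}

enum { SC_AUTO, SC_EXTERN, SC_STATIC, SC_INLINE, SC_COUNT };

#define OK (~0u)           /* all languages (the article's __) */
#define XX 0u              /* no language */
#define C9 LANG_MIN(C_99)  /* assumption: C99 or later */

/* Only the lower triangle is meaningful; the rest is filler. */
static const lang_t OK_LANGS[SC_COUNT][SC_COUNT] = {
  /*             a   e   s   in */
  /* auto   */ { OK, OK, OK, OK },
  /* extern */ { XX, OK, OK, OK },
  /* static */ { XX, XX, OK, OK },
  /* inline */ { XX, C9, C9, C9 },
};

/* Return 1 if storage classes a and b may be combined in lang, else 0. */
static int storage_pair_ok(int a, int b, lang_t lang) {
  int row = a > b ? a : b;  /* always index into the lower triangle */
  int col = a > b ? b : a;
  return (OK_LANGS[row][col] & lang) != 0;
}
```

Since the combination relation is symmetric, only half the matrix needs filling in, and sorting the two indices before the lookup keeps callers from having to care about order. The diagonal doubles as “is this storage class valid at all in this language.”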
A few other semantic checks are done by walking the AST, for example: checking that void, when given for a function argument list, is the only thing given; or checking that ... (varargs) cannot be the only argument and must be last. (The original cdecl didn’t catch these errors.)
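The two argument-list checks just described could look something like this. It is a simplified sketch rather than cdecl’s code: the argument list is flattened to an array of hypothetical arg_kind_t values instead of AST nodes, and check_args and its messages are invented for illustration.

```c
#include <stddef.h>

/* Hypothetical classification of each function argument. */
typedef enum {
  ARG_NORMAL,    /* any ordinary argument, e.g. "int" or "pointer to void" */
  ARG_VOID,      /* a bare "void" */
  ARG_VARIADIC   /* "..." */
} arg_kind_t;

/* Return NULL if the argument list is acceptable, else an error message. */
static const char *check_args(const arg_kind_t *args, size_t n) {
  size_t i;
  for (i = 0; i < n; ++i) {
    switch (args[i]) {
      case ARG_VOID:
        if (n != 1)
          return "\"void\" must be the only argument";
        break;
      case ARG_VARIADIC:
        if (n == 1)
          return "\"...\" cannot be the only argument";
        if (i != n - 1)
          return "\"...\" must be last";
        break;
      case ARG_NORMAL:
        break;
    }
  }
  return NULL;
}
```

Checks like these are awkward to express as grammar rules but trivial once the arguments exist as data, which is much of the argument for building the AST first.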
Final Thoughts
Cdecl 3.0 was a fun project to hack on. I’ll probably keep hacking on it every now and again. The new source code is profusely commented — have a look!
I’d like to get cdecl 3.0 into software (Linux, BSD, etc.) distributions, especially to replace older versions. If you’d like to help, please do.