Article::Article
I cleaned up the project and compiled it to Wasm using emscripten. That's nice, but something is missing: tools to develop in my language. Who would want to use a language without a highlighter and auto-completion? Even the Brainfuck language has some VSCode extensions to highlight the code!
But this issue is already a thing of the past, because I created a VSCode extension providing auto-completion and semantic highlighting.
Why semantic highlighting instead of syntax highlighting?
VSCode offers two APIs to provide highlighting: syntactic and semantic. With the syntactic one, I need to provide a TextMate grammar, and it looks like the opposite of fun. With the semantic one, I need to provide a list of tokens with their meaning. It may seem more complicated, but I already have a complete parser, so it will be a piece of cake.
The information VSCode needs
For the highlighting
As I said, for the semantic highlighting VSCode needs a list of tokens. Each token must have the following information:
- the line where it starts
- the column where it starts
- the length
- the type
VSCode already provides some types by default:
- variable: for PainPerdu references
- function: for PainPerdu labels
- comment: (does it really need explaining?)
- string: for PainPerdu file inclusion (because it is between two double quotes)
- number: for integers (there is no decimal in PainPerdu)
- operator: for symbols
For the auto-completion
VSCode will only need the list of labels and references. That's it.
The C++ implementation
Define the token
The token is defined with the following structure:
struct Token
{
enum class Type
{
REFERENCE,
LABEL,
COMMENT,
STRING,
NUMBER,
OPERATOR
};
bool operator==(const Token&) const = default;
Type type;
std::size_t line;
std::size_t start_column;
std::size_t length;
};
I won't linger on this code; it is a very simple structure containing the information stated in the previous section.
Get the tokens
I only need the tokens, so I don't have to build a whole parse tree. I will reuse the callback system that I already used to handle errors in a previous article, this time to fill a vector of tokens.
For the Move right operator >, it would look like this:
template <>
struct ToTokenAction<operators::MoveRight>
{
template <typename ParseInput>
static void apply(const ParseInput& in, std::vector<Token>& tokens)
{
tokens.push_back(
Token
{
.type = Token::Type::OPERATOR,
.line = in.position().line,
.start_column = in.position().column,
.length = in.size()
});
}
};
But because I'm lazy and I don't want to write almost the same code twenty times, I will use the following macro:
#define DefineAction(Match, TokenType) \
template <> \
struct ToTokenAction<Match> \
{ \
template <typename ParseInput> \
static void apply(const ParseInput& in, std::vector<Token>& tokens) \
{ \
tokens.push_back( \
Token \
{ \
.type = TokenType, \
.line = in.position().line, \
.start_column = in.position().column, \
.length = in.size() \
}); \
} \
};
// You can then use it like this:
DefineAction(operators::MoveRight, Token::Type::OPERATOR)
There is still one issue: I need to differentiate the labels and the references, because for now they are only considered as identifiers.
I know that a label must be preceded by some specific operators, and the same goes for references. Each time the parser sees one of these operators, it knows that the next identifier will be either a reference or a label.
I can update the macro to reflect this and then correctly handle the labels and references:
// Little helper struct to centralize some information
struct ToTokenState
{
enum NextIdentifierType
{
NONE, // If there is an identifier with NONE, this is a bug
LABEL,
REFERENCE
};
std::vector<Token> tokens;
NextIdentifierType next_identifier_type = NONE; // No operator seen yet
};
// A little helper function
inline Token::Type get_token_type(ToTokenState::NextIdentifierType next_identifier_type)
{
switch (next_identifier_type)
{
case ToTokenState::NextIdentifierType::LABEL :
{
return Token::Type::LABEL;
}
case ToTokenState::NextIdentifierType::REFERENCE :
{
return Token::Type::REFERENCE;
}
default:
{
throw std::runtime_error("Someone did a shitty code trololol");
}
}
}
// Empty default action
template<typename Rule>
struct ToTokenAction {};
// The macro has one more argument
#define DefineAction(Match, TokenType, IdentifierType) \
template <> \
struct ToTokenAction<Match> \
{ \
template <typename ParseInput> \
static void apply(const ParseInput& in, ToTokenState& state) \
{ \
state.tokens.push_back( \
Token \
{ \
.type = TokenType, \
.line = in.position().line, \
.start_column = in.position().column, \
.length = in.size() \
}); \
state.next_identifier_type = IdentifierType; /* Update accordingly */ \
} \
};
// Now use the macro
DefineAction(operators::DefineLabel, Token::Type::OPERATOR, ToTokenState::NextIdentifierType::LABEL)
DefineAction(operators::MoveRight, Token::Type::OPERATOR, ToTokenState::NextIdentifierType::REFERENCE)
DefineAction(operators::MoveLeft, Token::Type::OPERATOR, ToTokenState::NextIdentifierType::REFERENCE)
DefineAction(operators::Increment, Token::Type::OPERATOR, ToTokenState::NextIdentifierType::REFERENCE)
DefineAction(operators::Decrement, Token::Type::OPERATOR, ToTokenState::NextIdentifierType::REFERENCE)
DefineAction(ResetCase, Token::Type::OPERATOR, ToTokenState::NextIdentifierType::NONE)
DefineAction(operators::DefineReference, Token::Type::OPERATOR, ToTokenState::NextIdentifierType::REFERENCE)
DefineAction(operators::UndefineReference, Token::Type::OPERATOR, ToTokenState::NextIdentifierType::REFERENCE)
DefineAction(operators::MoveToReference, Token::Type::OPERATOR, ToTokenState::NextIdentifierType::REFERENCE)
DefineAction(operators::GoToLabel, Token::Type::OPERATOR, ToTokenState::NextIdentifierType::LABEL)
DefineAction(operators::Rewind, Token::Type::OPERATOR, ToTokenState::NextIdentifierType::LABEL)
DefineAction(operators::IfCurrentValueDifferent, Token::Type::OPERATOR, ToTokenState::NextIdentifierType::REFERENCE)
DefineAction(operators::IfCursorIsAtReference, Token::Type::OPERATOR, ToTokenState::NextIdentifierType::REFERENCE)
DefineAction(operators::IfReferenceExists, Token::Type::OPERATOR, ToTokenState::NextIdentifierType::REFERENCE)
DefineAction(GetChar, Token::Type::OPERATOR, ToTokenState::NextIdentifierType::NONE)
DefineAction(PutChar, Token::Type::OPERATOR, ToTokenState::NextIdentifierType::NONE)
DefineAction(Comment, Token::Type::COMMENT, ToTokenState::NextIdentifierType::NONE)
DefineAction(ReadFile, Token::Type::STRING, ToTokenState::NextIdentifierType::NONE)
DefineAction(Integer, Token::Type::NUMBER, ToTokenState::NextIdentifierType::NONE)
// Handle the identifier
template <>
struct ToTokenAction<Identifier>
{
template <typename ParseInput>
static void apply(const ParseInput& in, ToTokenState& state)
{
if (state.next_identifier_type != ToTokenState::NextIdentifierType::NONE)
{
state.tokens.push_back(
Token
{
.type = get_token_type(state.next_identifier_type),
.line = in.position().line,
.start_column = in.position().column,
.length = in.size()
});
state.next_identifier_type = ToTokenState::NextIdentifierType::NONE;
}
}
};
// Because I'm not that dirty, I undef my macro since I don't need it anymore
#undef DefineAction
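To see the "remember what the next identifier means" trick in isolation, here is a minimal sketch that does not use PEGTL at all. Everything in it is an assumption made for illustration: SimpleToken, NextIdentifier and classify are hypothetical names, and the tiny hand-written scanner only knows three operators (':' for a label, '#' and '>' for a reference), which is far from the real PainPerdu grammar.

```cpp
#include <cctype>
#include <cstddef>
#include <string_view>
#include <vector>

// Simplified stand-in for the article's Token (no comment, string or number)
enum class TokenType { REFERENCE, LABEL, OPERATOR };

struct SimpleToken
{
    TokenType type;
    std::size_t line;         // 1-based, like the positions used in the article
    std::size_t start_column; // 1-based
    std::size_t length;
};

enum class NextIdentifier { NONE, LABEL, REFERENCE };

inline bool is_identifier_start(char c)
{
    return std::isalpha(static_cast<unsigned char>(c)) != 0 || c == '_';
}

inline bool is_identifier_char(char c)
{
    return std::isalnum(static_cast<unsigned char>(c)) != 0 || c == '_';
}

// Hypothetical mini-scanner: ':' announces a label, '#' and '>' announce a reference
inline std::vector<SimpleToken> classify(std::string_view input)
{
    std::vector<SimpleToken> tokens;
    NextIdentifier next = NextIdentifier::NONE;
    std::size_t line = 1;
    std::size_t column = 1;

    for (std::size_t i = 0; i < input.size();)
    {
        const char c = input[i];
        if (c == '\n')
        {
            ++line;
            column = 1;
            ++i;
        }
        else if (c == ':' || c == '#' || c == '>')
        {
            tokens.push_back({TokenType::OPERATOR, line, column, 1});
            // Remember what the next identifier will mean
            next = (c == ':') ? NextIdentifier::LABEL : NextIdentifier::REFERENCE;
            ++column;
            ++i;
        }
        else if (is_identifier_start(c))
        {
            std::size_t length = 1;
            while (i + length < input.size() && is_identifier_char(input[i + length]))
            {
                ++length;
            }
            // An identifier that no operator announced is ignored
            if (next != NextIdentifier::NONE)
            {
                const TokenType type = (next == NextIdentifier::LABEL) ? TokenType::LABEL : TokenType::REFERENCE;
                tokens.push_back({type, line, column, length});
                next = NextIdentifier::NONE;
            }
            column += length;
            i += length;
        }
        else
        {
            // Digits reset the state (like the Integer action); the rest is skipped
            if (std::isdigit(static_cast<unsigned char>(c)) != 0)
            {
                next = NextIdentifier::NONE;
            }
            ++column;
            ++i;
        }
    }
    return tokens;
}
```

Calling classify(":loop\n#x >x") yields six tokens: each operator is followed by one identifier, classified as a label after ':' and as a reference after '#' or '>'.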
And finally, here is the code that uses this machinery to get all the tokens:
std::vector<Token> Parser::get_tokens(std::string_view input)
{
ToTokenState token_state;
pegtl::memory_input mem_input(input.data(), input.size(), "");
pegtl::parse<Grammar, ToTokenAction>(mem_input, token_state);
return token_state.tokens;
}
Get the labels and references
Now that I am able to get the tokens, this part will be very easy.
The only way to define a label is like this: :my_label, and the only way to define a reference is like this: #my_reference. I will detect these patterns to get the desired lists.
There are two ways to do it:
- Make a parse tree with a custom Selector to keep only the labels/references with their identifiers
- Use the same technique as the tokens
Both are really simple, but I chose the second one because the code is really compact. Here's the code for the references; the code for the labels is almost the same:
template<typename Rule>
struct GetDefinedReferencesAction {};
template<>
struct GetDefinedReferencesAction<operators::DefineReference>
{
// This is called when a # is encountered
template <typename ParseInput>
static void apply(const ParseInput&, std::vector<std::string>&, bool& is_next_reference)
{
is_next_reference = true;
}
};
template<>
struct GetDefinedReferencesAction<Identifier>
{
// This is called for all identifiers
template <typename ParseInput>
static void apply(const ParseInput& in, std::vector<std::string>& references, bool& is_next_reference)
{
// If the previous character was a #
if (is_next_reference)
{
// Add it in the list
references.push_back(in.string());
is_next_reference = false;
}
}
};
That's it!
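For completeness, the label version could look like the sketch below. It is only a guess that mirrors the reference code: the name GetDefinedLabelsAction is an assumption, and Identifier, operators::DefineLabel and FakeInput are empty stand-ins declared here so that the snippet compiles without PEGTL. In the real project the grammar rules and the parse input would be used instead.

```cpp
#include <string>
#include <vector>

// Stand-ins for the grammar rules, only here so the sketch compiles on its own
struct Identifier {};
namespace operators { struct DefineLabel {}; }

// Stand-in for a PEGTL input: it only offers the string() member used below
struct FakeInput
{
    std::string text;
    std::string string() const { return text; }
};

// Empty default action, as in the reference version
template <typename Rule>
struct GetDefinedLabelsAction {};

template <>
struct GetDefinedLabelsAction<operators::DefineLabel>
{
    // This is called when a : is encountered
    template <typename ParseInput>
    static void apply(const ParseInput&, std::vector<std::string>&, bool& is_next_label)
    {
        is_next_label = true;
    }
};

template <>
struct GetDefinedLabelsAction<Identifier>
{
    // This is called for all identifiers
    template <typename ParseInput>
    static void apply(const ParseInput& in, std::vector<std::string>& labels, bool& is_next_label)
    {
        // Only keep the identifier that directly follows a :
        if (is_next_label)
        {
            labels.push_back(in.string());
            is_next_label = false;
        }
    }
};
```

Feeding it by hand with an identifier, then a ':' and a second identifier only stores the second identifier, which is the behaviour expected from the parser callbacks.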
Create the bindings
Now, to create the bindings, I need a new CMake target. It is set up the same way as the web interpreter, with one difference: the Node.js shipped with VSCode does not use the flag --experimental-wasm-eh. This means I won't use the flag -fwasm-exceptions in the CMakeLists.txt:
# Create the target
add_executable(ExtentionBindings ExtentionBindings.cpp)
# Basic options
target_compile_options(ExtentionBindings PRIVATE -Wextra -Wall -Wsign-conversion -Wfloat-equal -pedantic -Wredundant-decls -Wshadow -Wpointer-arith -O3)
# Don't forget to link with --bind
target_link_options(ExtentionBindings PRIVATE --bind)
# Obviously I need my PainPerdu library
target_link_libraries(ExtentionBindings PRIVATE PainPerdu)
# Put the output files directly in the good directory
set_target_properties(ExtentionBindings PROPERTIES RUNTIME_OUTPUT_DIRECTORY "${CMAKE_CURRENT_LIST_DIR}/../../vscode_extention/painperdu-bakery/generated")
The same goes for the C++ files: they are very similar to what I did in my previous article about emscripten.
First, I add some getter functions to the Token structure:
struct Token
{
enum class Type
{
REFERENCE,
LABEL,
COMMENT,
STRING,
NUMBER,
OPERATOR
};
using TypeIntType = std::underlying_type_t<Type>;
bool operator==(const Token&) const = default;
Type type;
std::size_t line;
std::size_t start_column;
std::size_t length;
// Getter for the bindings
Type get_type() const { return type; }
// Little spoiler, I added this function because VSCode will only need an index :)
TypeIntType get_type_index() const { return static_cast<TypeIntType>(type); }
std::size_t get_line() const { return line; }
std::size_t get_start_column() const { return start_column; }
std::size_t get_length() const { return length; }
};
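Since get_type_index exposes the raw position of the enumerator, the order of Token::Type has to match the order of the legend given to VSCode later in this article (variable, function, comment, string, number, operator). A small compile-time guard can catch an accidental reordering. This is only a sketch: Type is a local copy of the enum, and legend_names, type_index and legend_name are hypothetical helpers that do not exist in the project.

```cpp
#include <array>
#include <cstddef>
#include <string_view>
#include <type_traits>

// Same enumerators, in the same order, as Token::Type
enum class Type { REFERENCE, LABEL, COMMENT, STRING, NUMBER, OPERATOR };

// The legend used by the extension, in the order VSCode will receive it
inline constexpr std::array<std::string_view, 6> legend_names{
    "variable", "function", "comment", "string", "number", "operator"};

// Position of the enumerator, which is what get_type_index returns
constexpr std::size_t type_index(Type type)
{
    return static_cast<std::size_t>(static_cast<std::underlying_type_t<Type>>(type));
}

constexpr std::string_view legend_name(Type type)
{
    return legend_names[type_index(type)];
}

// Compile-time guards: reordering the enum without the legend breaks the build
static_assert(type_index(Type::OPERATOR) + 1 == legend_names.size());
static_assert(legend_name(Type::REFERENCE) == "variable");
static_assert(legend_name(Type::LABEL) == "function");
static_assert(legend_name(Type::OPERATOR) == "operator");
```

The guard costs nothing at runtime, and it documents the implicit contract between the C++ enum and the JavaScript legend.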
Then I can actually create the bindings:
#include <vector>
#include <string>
// Do not forget this include
#include <emscripten/bind.h>
#include <PainPerdu/PainPerdu.hpp>
using namespace emscripten;
// Little helper functions
std::vector<PainPerdu::parser::Token> get_tokens(const std::string& input)
{
return PainPerdu::Parser().get_tokens(input);
}
std::vector<std::string> get_defined_labels(const std::string& input)
{
return PainPerdu::Parser().get_defined_labels(input);
}
std::vector<std::string> get_defined_references(const std::string& input)
{
return PainPerdu::Parser().get_defined_references(input);
}
// The module
EMSCRIPTEN_BINDINGS(PainPerduParserModule) {
// I need to declare the vectors
register_vector<std::string>("VectorString");
// And the Token class with all its getters
class_<PainPerdu::parser::Token>("PainPerduToken")
.function("get_type_index", &PainPerdu::parser::Token::get_type_index)
.function("get_line", &PainPerdu::parser::Token::get_line)
.function("get_start_column", &PainPerdu::parser::Token::get_start_column)
.function("get_length", &PainPerdu::parser::Token::get_length);
// Then declare what a vector of Token is
register_vector<PainPerdu::parser::Token>("VectorToken");
// Last step, declare functions I will use in javascript
function("get_tokens", &get_tokens);
function("get_defined_labels", &get_defined_labels);
function("get_defined_references", &get_defined_references);
}
The JavaScript implementation
The setup
I just followed the VSCode tutorial. To be honest, if I went any deeper I would either be paraphrasing it or be less clear than the tutorial itself, which is clear and well written.
Basically, I just did:
npm install -g yo generator-code
yo code
It generated the structure of the extension. Then I created a folder to hold my wasm code with its JS glue code (generated from C++). In the rest of this article, all my code will be in a file named extension.js, and at the beginning it looks like this:
// Include vscode stuff
const vscode = require('vscode');
// Use the bindings to my PainPerdu library
const bindings = require('./generated/ExtentionBindings')
/**
* @param {vscode.ExtensionContext} context
*/
function activate(context) {
}
function deactivate() { }
module.exports = {
activate,
deactivate
}
The semantic highlighting
As I said earlier, I need to provide a list of tokens to VSCode, and it provides utility classes to build it: SemanticTokensBuilder and SemanticTokensLegend.
The SemanticTokensLegend is just a little class describing the types of tokens that will be provided. There are a lot of predefined token types, like variable or function. You can see the list of predefined token types, or how to create your own, in the documentation. The SemanticTokensLegend is created from a list of strings, and I will use this legend from now on:
const tokenTypeStrings = [
'variable',
'function',
'comment',
'string',
'number',
'operator'
];
const legend = new vscode.SemanticTokensLegend(tokenTypeStrings);
The SemanticTokensBuilder can then be used with the legend previously created:
const builder = new vscode.SemanticTokensBuilder(legend);
builder.push(start_line, start_column, length, index_of_the_type);
return builder.build();
Note that VSCode considers that the lines and columns of a document start at 0. Also, for the type it uses an index: the index of variable is 0, function is 1, etc.
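To give an idea of why VSCode wants 0-based positions and a plain index, here is a rough C++ sketch of the flat format that, as far as I understand the semantic tokens API, the builder produces: five integers per token (line delta, start delta, length, type index, modifiers), with positions stored relative to the previous token. ZeroBasedToken and encode are hypothetical names, the tokens are assumed to be already sorted by position, and no modifier is used.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical token with 0-based positions, already sorted by position
struct ZeroBasedToken
{
    std::uint32_t line;
    std::uint32_t start_column;
    std::uint32_t length;
    std::uint32_t type_index; // index in the legend: 0 = variable, 1 = function, ...
};

// Five integers per token: deltaLine, deltaStart, length, tokenType, tokenModifiers
inline std::vector<std::uint32_t> encode(const std::vector<ZeroBasedToken>& tokens)
{
    std::vector<std::uint32_t> data;
    data.reserve(tokens.size() * 5);
    std::uint32_t previous_line = 0;
    std::uint32_t previous_column = 0;
    for (const ZeroBasedToken& token : tokens)
    {
        const std::uint32_t delta_line = token.line - previous_line;
        // The column is relative to the previous token only when both share a line
        const std::uint32_t delta_start =
            (delta_line == 0) ? token.start_column - previous_column : token.start_column;
        data.insert(data.end(), {delta_line, delta_start, token.length, token.type_index, 0u});
        previous_line = token.line;
        previous_column = token.start_column;
    }
    return data;
}
```

In the extension itself none of this has to be written by hand: the SemanticTokensBuilder takes absolute 0-based positions and takes care of the encoding.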
Now I need a DocumentSemanticTokensProvider. When using TypeScript, it is possible to implement the interface vscode.DocumentSemanticTokensProvider. It must have a method named provideDocumentSemanticTokens that returns the list of tokens. I used it like this:
class DocumentSemanticTokensProvider {
async provideDocumentSemanticTokens(document) {
// Use the C++ code to create the list of tokens
const allTokens = bindings.get_tokens(document.getText());
// Create the builder using the legend
const builder = new vscode.SemanticTokensBuilder(legend);
// Convert my list of tokens into one that VSCode understands
for (let i = 0; i < allTokens.size(); ++i) {
const token = allTokens.get(i);
// In my C++ code, lines and columns start at 1, but in VSCode they start at 0
builder.push(token.get_line() - 1, token.get_start_column() - 1, token.get_length(), token.get_type_index());
// Embind handles are not garbage collected, so free the copy returned by get()
token.delete();
}
// Same for the vector itself
allTokens.delete();
return builder.build();
}
}
And the last step is to make VSCode use the DocumentSemanticTokensProvider:
function activate(context) {
// I only want to use it for my language
const selector = { language: 'painperdu' };
// Some boilerplate code
context.subscriptions.push(vscode.languages.registerDocumentSemanticTokensProvider(selector, new DocumentSemanticTokensProvider(), legend));
}
Here's what it can look like with some code from my Brainfuck interpreter written in PainPerdu:
The auto-completion
Auto-completion works almost the same way, except that instead of a list of tokens I need to return a list of elements with their kind: in this case either a variable (for a PainPerdu reference) or a function (for a PainPerdu label).
I will create two GoCompletionItemProvider classes, one giving the list of references and the other the list of labels:
class GoCompletionItemProviderLabels {
provideCompletionItems(document) {
const labels = bindings.get_defined_labels(document.getText());
let result = [];
for (var i = 0; i < labels.size(); ++i) {
let label = labels.get(i);
let completionItem = new vscode.CompletionItem();
completionItem.label = label;
completionItem.kind = vscode.CompletionItemKind.Function;
result.push(completionItem);
}
// The embind vector is not garbage collected, so free it explicitly
labels.delete();
return result;
}
}
class GoCompletionItemProviderReference {
provideCompletionItems(document) {
const refs = bindings.get_defined_references(document.getText());
let result = [];
for (var i = 0; i < refs.size(); ++i) {
let ref = refs.get(i);
let completionItem = new vscode.CompletionItem();
completionItem.label = ref;
completionItem.kind = vscode.CompletionItemKind.Variable;
result.push(completionItem);
}
// The embind vector is not garbage collected, so free it explicitly
refs.delete();
return result;
}
}
And now I just need VSCode to use them:
function activate(context) {
const selector = { language: 'painperdu' };
context.subscriptions.push(vscode.languages.registerDocumentSemanticTokensProvider(selector, new DocumentSemanticTokensProvider(), legend));
// These characters can only be followed by a label
context.subscriptions.push(vscode.languages.registerCompletionItemProvider(selector, new GoCompletionItemProviderLabels(), '.', '*', '&'));
// These characters can only be followed by a reference
context.subscriptions.push(vscode.languages.registerCompletionItemProvider(selector, new GoCompletionItemProviderReference(), '#', '@', '$', '>', '<', '+', '-', '?', '!'));
}
And voilà! It was that simple!
Article::~Article
It is my very first VSCode extension and, even if it may seem dumb, I'm proud of it. Don't hesitate to point out errors or things I could have done better: I'm not very comfortable with JavaScript and I don't know the VSCode API well.
The extension could be improved, but I doubt that anybody will really use this extension. The whole project around this language is just an excuse to try some tools and have some fun.
The extension is named PainPerdu Bakery and you can find it on the VSCode Marketplace.