As a developer of traditional business application, i face often requirements to validate, parse and evaluate expression written by end users. It seems a tough question when i started coding. It's pretty easy now thanks to open source framework.
So this time, the requirements was to manipulate a small amount of data (less than 50 sheets with 100 lines each in an excel like).
After reviewing the requirements, it appears
- the end user despite having basic computer skills doesn't have any knowledge of any data manipulation language
- a simple SQL like (with significantly less keywords) seems to fit the end user needs
- a data manipulation through Linq can fit the needs, but cannot be easily expressed, and can raise issue if dynamically evaluated
In order to start, we need to list all supported characters and all supported keywords to get the structure of the expression to be parsed.
For a SQL Select, we can have
- Comma ","
- single quote "'"
- Dot "."
- Star "*"
- WHERE etc ..
For example, the sql syntax can be found here for SQL Server !
Once we know what shoud be parsed, we can list each character that will occurs.
This list will feed an enumeration used when reading each character of the expression building a list of token.
Once we have the list of token, we can parse it.
When we parse, we define the first expression that will lead to other smaller expression and so on till we arrive at the smallest and simplest one.
Defining an expression is simple with Superpower:
public static TokenListParser<SqlToken, SelectClause> SelectClause = from keyword in Token.EqualToValue(SqlToken.Keyword, "select") from columns in Expression.ManyDelimitedBy(Token.EqualTo(SqlToken.Comma)) select new SelectClause(columns); }
This expression defines that we need a token keyword (a string) of value "select", followed by 0 or multiple columns defined by another expression and so on.
public static TokenListParser<SqlToken, Expression> Constant = Token.EqualTo(SqlToken.Number) .Apply(Numerics.IntegerInt32) .Select(n => (Expression) new ConstantExpression(n));
At the end of the parsing, we have a tree of objects that defines the expression.
var tokenizer = new SqlTokenizer(); var tokens = tokenizer.Tokenize("select 1 + 23, 456"); Console.WriteLine("Tokens:"); foreach (var token in tokens) Console.WriteLine(token); Console.WriteLine(); var result = SqlParser.SelectClause.Parse(tokens);
The logic for computing the result can either be defined in the expression if the logic is simple as in the example.
For our case, we had to extract externally because we have to:
- get all the restriction defined in the WHERE clause
- get the name of the data source that need to be joined or selected
- get the data we need from sources
- eval the linq expression