I think writing code to parse stuff is fun.
Sometimes it's nice to get away from regex hell and make our own fun hell.
I want to note: I have no real idea what the best way to do any of this is. There's way smarter people than me that made much better parsers than me.
I'm doing this just because it's fun. And why can't we have a little fun once in a while?
A simple INI file
INI files are pretty simple. Let's parse one in javascript.
INIs are really just a vague concept so we can do whatever we want!
The file to parse
key=value
hello=world
;comment here
[section]
foo=bar
What we want our object to look like after parsing
{
key: "value",
hello: "world",
section: {
foo: "bar"
}
}
We are going to go over the file character by character.
Depending on what state we are in we will do different things.
In the default state let's trim whitespace until we get to a comment, section or key.
function isWhiteSpace(at){
if(at === "\t" || at === " "){
return true;
}
}
let state = "default";
while(index < file.length){
let at = file[index];
if(state === "default"){
if(at === "\r" && file[index+1] === "\n"){
index+=2;
}else if(at === ";"){
state = "comment";
} else if(at === "["){
state = "section";
} else if(isWhiteSpace(at){
//we have to negate the index++ so we don't eat the first character of the key
index--;
state = "key";
}
index++;
}
Parse some keys
We want to keep track of the key name. And we will keep parsing for a key name until a character =
. (We could allow it to be escaped eg \=
but I don't want to.)
//...
let keyName = "";
while(/*...*/) {
// ...
else if(state === "key"){
if(at !== "\n" && at !== "="){
keyName += at;
} else if(at === "\n"){
state = "default";
keyName = "";
} else {
state = "value";
}
index++;
}
}
So now we have a key. You may have notice we just ignore a line that doesn't have an =
at all. You could throw an error or make it an empty string. But I don't want to.
Parse a value
The process for getting a value is much the same as getting a key. We just go until the end of the line. But we also want to set the key to the value when we get to the end of the line.
//...
const result = {};
//at first the section is just the root of result
let section = result;
let value = ""
while(/*...*/){
//... All that other code...
else if(state = "value"){
if(at !== "\n"){
value += at;
} else {
//End of the line let's do this
section[keyName] = value;
keyName = "";
state = "default";
}
index++;
}
}
You'll notice how I use section
instead of result there because we also ...
Parse a section
When we get to a section we need to nest. That is why we want a final result
as well as a reference to the current section
we are in
//...
let sectionName = "";
//while(){
//...other stuff
else if(state == "section") {
if(at === "]" || at === "\n"){
result[sectionName] = {};
section = result[sectionName];
sectionName = "";
state = "default"
} else {
sectionName += at;
}
index++;
}
//}
So now we have sections for this. We have no end of section so it's the current section until a new one comes along.
Parsing comments
Comments are easy we just do nothing until the new line.
else if(state === "comment") {
if(at === "\n"){
state = "default"
}
index++
}
The entire parser
Okay I broke it down to show you and so here's the entire thing.
const file = `
key=value
hello=world
;comment here
[section]
foo=bar
`
function isWhiteSpace(at){
if(at === "\t" || at === " "){
return true;
}
}
let index = 0;
const result = {};
let section = result;
let state = "default";
let keyName = "";
let value = "";
let sectionName = "";
while(index < file.length){
let at = file[index];
if(state === "default"){
if(at === ";"){
state = "comment";
} else if(at === "["){
state = "section";
} else if(!isWhiteSpace(at)){
state = "key";
index--;
}
index++;
} else if(state === "key") {
if(at !== "\n" && at !== "="){
keyName += at;
} else if(at === "\n"){
state = "default";
keyName = "";
} else {
state = "value";
}
index++;
} else if(state === "value") {
if(at !== "\n"){
value += at;
} else {
//End of the line let's do this
section[keyName] = value;
keyName = "";
value = "";
state = "default";
}
index++;
} else if(state == "section") {
if(at === "]" || at === "\n"){
result[sectionName] = {};
section = result[sectionName];
sectionName = "";
state = "default"
} else {
sectionName += at;
}
index++;
} else if(state === "comment") {
if(at === "\n"){
state = "default"
}
index++
}
}
So parsing stuff can be fun. Try it out here. Give it some other input and see where it breaks. I bet I made a lot of stupid mistakes. But that's part of the fun.
Let me know if you have any questions and checkout out my project: DropConfig
Top comments (0)