This week I worked on adding a TOML config file parsing feature to a tool called Scrappy.
KrinsKumar / Scrappy
Scrappy is a command line tool that will convert any website that can be scraped into Markdown.
How to use scrappy
- Download the repo.
- Run the following commands, replacing `<PATH>` with the location of the repo; run with `sudo` if there is a permission issue.

```
npm i
chmod +x /<PATH>/Scrappy/src/args/command.js
ln -s /<PATH>/Scrappy/src/args/command.js /usr/local/bin/scrappy
```
- You will need a Groq API key to convert a page to Markdown. Once you obtain your key, run the following command to save the key on your system:

```
scrappy --api-key <YOUR_API_KEY>
```

or

```
scrappy --a <YOUR_API_KEY>
```
Config
To set default options and arguments, you can create a `.scrappy.toml` file in your home directory (`~/`) with the following config options:
```toml
url = "some_url"
inputFile = "some_input_file"
outputFile = "some_output_file"
tokenUsage = true | false
stream = true | false
```
Features
- Input: The main feature is that you can convert any…
The tool is authored by my classmate KrinsKumar, or Krins for short.
The Programming
The options I added support for in the config file were: an input URL, an input file, an output file, toggling streaming, and toggling token usage data.
I also updated the README with information on how to use the config file.
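For example, a filled-in `~/.scrappy.toml` could look like this (the values here are my own illustration, not from the repo):

```toml
url = "https://example.com"
outputFile = "example.md"
tokenUsage = false
stream = true
```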
feat: add support for `~/.scrappy.toml` config file #9
Fixes #8. Enables parsing of a config file at `~/.scrappy.toml` with support for the following keys:
- `url`
- `inputFile`
- `outputFile`
- `tokenUsage`
- `stream`
- Installed npm package `smol-toml`
- Added an async function `parseConfig(configFilePath)` to handle parsing the config file:

```js
import fsP from "node:fs/promises";
import * as TOML from "smol-toml";

/**
 * Parses a TOML config file for options
 * @param {string} configFilePath Path to .toml config file
 * @returns {Promise<{ url: string | undefined, inputFile: string | undefined, outputFile: string | undefined, tokenUsage: boolean | undefined, stream: boolean | undefined }>} An object containing the parsed options
 */
export async function parseConfig(configFilePath) {
  const configFileContent = await fsP.readFile(configFilePath, {
    encoding: "utf8",
  });
  const configOptions = TOML.parse(configFileContent);
  return configOptions;
}
```
- Made `validateArgs()` an async function so it can call the above function. Added a try/catch around the call, which catches parsing errors but lets the program continue if the file doesn't exist.
- Modified the argument/config parsing flow:
  - Use URL CLI option if present
  - If missing, use URL config option if present
  - If missing, use input file argument if present
  - If missing, use input file config option if present
  - If missing, exit
- Added comments explaining the above flow
- Added config section to README
I used the module `fs/promises` in order to take advantage of async/await syntax, while the rest of the program uses `fs`. Let me know if you'd like this changed.
[View on GitHub](https://github.com/KrinsKumar/Scrappy/pull/9)
The hardest part was understanding the control flow for parsing arguments, since Krins's program handled argument parsing manually with conditionals rather than using a library like yargs or commander. I wrote it out to help me understand, and added comments explaining it.
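To make that flow concrete, here's a rough sketch of the logic as I understand it. This is not Scrappy's actual code; the variable names, the import path, the `ENOENT` check, and the error-handling details are my own illustration:

```js
import os from "node:os";
import path from "node:path";
import { parseConfig } from "./config.js"; // hypothetical module path

// Sketch only: the real validateArgs() in Scrappy is structured differently.
async function validateArgs(cliArgs) {
  let configOptions = {};
  try {
    configOptions = await parseConfig(path.join(os.homedir(), ".scrappy.toml"));
  } catch (err) {
    // A missing config file is fine; anything else (e.g. a TOML syntax
    // error) gets reported.
    if (err.code !== "ENOENT") {
      console.error(`Could not parse config file: ${err.message}`);
    }
  }

  // CLI options take precedence over config options, and a URL takes
  // precedence over an input file.
  const url = cliArgs.url ?? configOptions.url;
  const inputFile = cliArgs.inputFile ?? configOptions.inputFile;
  if (url) return { url };
  if (inputFile) return { inputFile };
  console.error("No URL or input file provided");
  process.exit(1);
}
```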
Another thing I ran into trouble with was the `fs` module in Node. I tried using async/await syntax with it because I remembered doing the same for my project, but it wouldn't work. Turns out I was confusing it with the `fs/promises` module. Although Scrappy already imported `fs`, I went ahead and imported `fs/promises` anyway because it's easier to work with, and left a note in the pull request about the new import.
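A quick illustration of the difference that tripped me up (the file path here is just an example):

```js
import fs from "node:fs";
import fsP from "node:fs/promises";

// The classic fs API is callback-based; fs.readFile expects a callback and
// throws if you omit it, so this doesn't work:
// const data = await fs.readFile("/home/user/.scrappy.toml", "utf8");

// fs/promises returns Promises, so async/await works as expected.
const data = await fsP.readFile("/home/user/.scrappy.toml", "utf8");
console.log(data);
```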
I had a decent idea of what I needed to do because somebody else had already added the same feature to my own project. For example, since the person contributing to my project used `fs/promises` too (because my code already used it), I knew what error code would be thrown if the config file wasn't found.
I also used a JSDoc-style comment for the function I added. I saw them in another assignment I've been working on and wanted to learn how they work so I could comment my code better. JSDoc let me add a description and specify parameter and return value types, which are recognized by VS Code's IntelliSense, making the function easier to work with. I think it's super helpful to be able to view how a function works by hovering over it in the editor, so adding that capability to code I worked on felt good.
Using `git remote`
I also received a pull request from my classmate Harshil to add the feature to my project.
This week, I learned how to use `git remote add` to add the contributor's fork as a remote in my local repository. In the past, when working with other people's forks, I'd always clone them into a separate folder, but when I tried this method while reviewing the pull request, it turned out to be much faster and fairly straightforward. I'll definitely be using it in the future; it should save quite a bit of time.
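For reference, the workflow looks something like this (the remote name, fork URL, and branch name are placeholders, not the actual ones from the PR):

```
git remote add harshil https://github.com/<FORK_OWNER>/<REPO>.git
git fetch harshil
git checkout -b review-branch harshil/<BRANCH>
```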
That's it for this post, thanks for reading!