
Discussion on: What is a project, you are really proud of?

Cesar Aguirre
  1. Parsinator
  2. It's a .NET Standard library to parse PDF and plain-text files
  3. Personal
  4. Kind of. There's a slightly similar version in production, in a project from a previous job
mayank joshi Author

So, from what I read on GitHub, what I understood is that it parses PDF and plain-text files and generates a machine-parsable XML file.

Parsinator allows you to extract relevant information from any text-based file.

By this line, do you mean that it parses only the main content, the actual information, and ignores unnecessary details such as dates and page numbers?
If that is the case, there must be a lot of special cases you had to take care of.

Tell me if my understanding is wrong.

I need to dig deeper now, as it sounds interesting.

Cesar Aguirre

Yes, you're right. The idea was to extract the relevant information from a PDF or plain-text file using a set of composable rules, or "parsers". It was heavily inspired by parser combinators from Haskell. The main use case was: given a PDF of a multi-page invoice, create an XML document to feed a REST API.
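
To make the "composable rules" idea a bit more concrete, here is a minimal sketch in Python. It is not Parsinator's actual API (the library itself targets .NET Standard). Every name here (`field`, `all_of`, `first_of`, `to_xml`, the sample invoice text and its labels) is a hypothetical chosen for illustration, and it assumes the PDF text has already been extracted to a plain string.

```python
import re
import xml.etree.ElementTree as ET
from typing import Callable, Dict

# A "parser" here is just a function from text to extracted fields.
# Hypothetical type for illustration, not Parsinator's real interface.
Parser = Callable[[str], Dict[str, str]]


def field(name: str, pattern: str) -> Parser:
    """Build a parser that stores the first regex group under `name`."""
    regex = re.compile(pattern, re.MULTILINE)

    def parse(text: str) -> Dict[str, str]:
        match = regex.search(text)
        return {name: match.group(1).strip()} if match else {}

    return parse


def all_of(*parsers: Parser) -> Parser:
    """Combinator: run every parser and merge whatever each one finds."""
    def parse(text: str) -> Dict[str, str]:
        result: Dict[str, str] = {}
        for parser in parsers:
            result.update(parser(text))
        return result

    return parse


def first_of(*parsers: Parser) -> Parser:
    """Combinator: return the result of the first parser that matches."""
    def parse(text: str) -> Dict[str, str]:
        for parser in parsers:
            found = parser(text)
            if found:
                return found
        return {}

    return parse


def to_xml(root_tag: str, fields: Dict[str, str]) -> str:
    """Serialize the extracted fields as a flat XML element."""
    root = ET.Element(root_tag)
    for key, value in fields.items():
        ET.SubElement(root, key).text = value
    return ET.tostring(root, encoding="unicode")


# Small rules composed into one invoice parser. Lines that no rule
# matches (company name, page numbers) are simply ignored.
invoice_parser = all_of(
    field("number", r"^Invoice No\.?:\s*(\S+)"),
    first_of(
        field("total", r"^Total due:\s*([\d.,]+)"),
        field("total", r"^Total:\s*([\d.,]+)"),
    ),
)

sample = "ACME Corp\nInvoice No: INV-001\nPage 1 of 2\nTotal: 150.00\n"
xml = to_xml("invoice", invoice_parser(sample))
print(xml)
```

Each rule only knows about one piece of information, and the combinators decide how rules fit together. That is why irrelevant lines need no special handling in this sketch: nothing is extracted unless some rule asks for it.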