The 4-minute figure isn't for processing a single commit. It came from running git blame on the repository as it stands today, so there is a lot of processing per file as git traces the origin of each line.
What if by default you go back in time only a limited amount? Usually people don't need all the statistics since the beginning of time. Maybe your tool could go back 30 days of commits (or 2 months, or 3, some arbitrary default) and only if the user specifies a different range do you go back to the dawn of the ages. This might save startup time as well (which is already a little slow with Python).
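A minimal sketch of that idea, assuming the tool shells out to git (the function names here are made up for illustration; `--since` is a real `git rev-list`/`git log` option):

```python
import subprocess
from datetime import date, timedelta

DEFAULT_DAYS = 30  # arbitrary default window, per the suggestion above

def since_date(days=DEFAULT_DAYS, today=None):
    """Cutoff date in ISO format, suitable for git's --since option."""
    today = today or date.today()
    return (today - timedelta(days=days)).isoformat()

def recent_commits(repo_path, days=DEFAULT_DAYS):
    """Hashes of commits within the window; pass days=None for full history."""
    cmd = ["git", "-C", repo_path, "rev-list", "HEAD"]
    if days is not None:
        cmd.append(f"--since={since_date(days)}")
    out = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return out.stdout.split()
```

The user-supplied range would just override `days`, falling back to the full history when they ask for it.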
Now the implementation runs blame on all the changed files, but it should be possible to use the diff information to run blame only on the changed parts of each file.
Good idea! According to the docs, git blame can take a range of lines with -L, so you might be able to use that:
```
➜ devto git:(master) git blame -L 10,20 README.md
243c44e2 (Mac Siri 2018-08-08 10:36:32 -0400 10) </div>
243c44e2 (Mac Siri 2018-08-08 10:36:32 -0400 11) <br/>
^301c608 (Mac Siri 2018-02-28 16:11:08 -0500 12) <p align="center">
^301c608 (Mac Siri 2018-02-28 16:11:08 -0500 13) <a href="https://www.ruby-lang.org/en/">
0725b85e (Vinicius Stock 2019-01-09 18:59:38 -0200 14) <img src="https://img.shields.io/badge/Ruby-v2.6.0-green.svg"alt="ruby version"/>
^301c608 (Mac Siri 2018-02-28 16:11:08 -0500 15) </a>
^301c608 (Mac Siri 2018-02-28 16:11:08 -0500 16) <a href="http://rubyonrails.org/">
14551ea8 (Mac Siri 2018-07-12 13:19:13 -0400 17) <img src="https://img.shields.io/badge/Rails-v5.1.6-brightgreen.svg"alt="rails version"/>
^301c608 (Mac Siri 2018-02-28 16:11:08 -0500 18) </a>
65110550 (Mac Siri 2018-08-08 12:07:00 -0400 19) <a href="https://travis-ci.com/thepracticaldev/dev.to">
65110550 (Mac Siri 2018-08-08 12:07:00 -0400 20) <img src="https://travis-ci.com/thepracticaldev/dev.to.svg?branch=master"alt="Travis Status for thepracticaldev/dev.to"/>
```
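Combining the two ideas, one could parse the hunk headers of `git diff -U0` to find the changed line ranges and then blame only those. A rough sketch (function names are made up; `-U0` and repeated `-L` are real git options, since git blame accepts `-L` multiple times):

```python
import re
import subprocess

# Unified-diff hunk header, e.g. "@@ -10,2 +10,3 @@"; we want the "+" side.
HUNK_RE = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,(\d+))? @@")

def changed_ranges(diff_text):
    """(start, end) line ranges on the new side of a unified diff."""
    ranges = []
    for line in diff_text.splitlines():
        m = HUNK_RE.match(line)
        if m:
            start = int(m.group(1))
            count = int(m.group(2) or 1)  # omitted count means 1 line
            if count:  # count 0 is a pure deletion, nothing left to blame
                ranges.append((start, start + count - 1))
    return ranges

def blame_changed(repo, path, rev="HEAD"):
    """Run git blame only on the hunks that `rev` changed in `path`."""
    diff = subprocess.run(
        ["git", "-C", repo, "diff", "-U0", f"{rev}~1", rev, "--", path],
        capture_output=True, text=True, check=True).stdout
    args = ["git", "-C", repo, "blame", rev]
    for start, end in changed_ranges(diff):
        args += ["-L", f"{start},{end}"]
    args += ["--", path]
    return subprocess.run(args, capture_output=True, text=True,
                          check=True).stdout
```

That skips the unchanged bulk of each file, which is where the per-file cost above comes from.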
It would be nice to allow the use of other databases.
I was saying that mainly because it would be so much easier to ship a self-contained command line tool, instead of asking people to install PostgreSQL to use it. SQLite is present in many standalone apps for that very reason.
The Postgres requirement is because some columns use the ARRAY type.
Got it. A possible alternative is SQLite's JSON1 extension (JSON stored as text, queryable with SQL functions); you might be able to get something out of it.
I'm a generalist developer, preferring some skill across a variety of areas over deep expertise in only a few. I need to see how a technology solves real problems to really understand it.
A drawback of using SQLite and writing to the DB is that it doesn't behave very well with lots of concurrency: only one writer can have the database at a time.
That's a very good idea!
Thanks, a JSON array could work as an ARRAY replacement. I'll have a look.
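The round trip is straightforward with a JSON-encoded TEXT column standing in for Postgres's ARRAY. A sketch (the table and column names here are made up for illustration):

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE commits (sha TEXT, files TEXT)")  # files = JSON array
conn.execute("INSERT INTO commits VALUES (?, ?)",
             ("243c44e2", json.dumps(["README.md", "app.rb"])))

sha, files_json = conn.execute("SELECT sha, files FROM commits").fetchone()
files = json.loads(files_json)  # back to a Python list
```

If the bundled SQLite was built with the JSON1 extension, functions like `json_each` can even expand the array inside SQL queries instead of in Python.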