DEV Community

Discussion on: Introducing Git Hammer: Statistics for Git Repositories

rhymes

The 4 minute time for processing a single commit doesn't actually happen. That was for running git blame on the repository as it is currently, so there will be a lot of processing per file as git tries to find the origin of each line.

What if by default you go back in time only a default amount? Usually people don't need all the statistics since the beginning of time. Maybe your tool could go back 30 days of commits (or 2 or 3 months; the exact default is arbitrary) and only go further back, all the way to the dawn of the ages if needed, when the user specifies a different range. This might save startup time as well (which is already a little slow with Python).
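To make the idea concrete, here is a rough sketch of how the commit list could be limited to a time window. It relies on `git rev-list --since`, which is a real Git option; the function names and the 30-day default are just assumptions for illustration, not anything taken from Git Hammer itself.

```python
import subprocess
from typing import List, Optional

# Arbitrary default window, as suggested above (an assumption, not Git Hammer's setting).
DEFAULT_WINDOW = "30 days ago"


def rev_list_args(since: Optional[str] = DEFAULT_WINDOW, rev: str = "HEAD") -> List[str]:
    """Build a `git rev-list` command limited to a time window.

    Passing since=None asks for the full history.
    """
    args = ["git", "rev-list", "--reverse"]  # --reverse lists the oldest commit first
    if since is not None:
        args.append(f"--since={since}")
    args.append(rev)
    return args


def commits_to_process(repo_dir: str, since: Optional[str] = DEFAULT_WINDOW) -> List[str]:
    """Return the commit hashes inside the window, oldest first."""
    result = subprocess.run(
        rev_list_args(since),
        cwd=repo_dir,
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.split()
```

Calling `commits_to_process(".", since=None)` would walk the whole history, while the default only looks at the last 30 days. Presumably the oldest commit in the window would still need one full blame as a baseline, so the saving comes from skipping everything before it.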

Now the implementation runs blame on all the changed files, but it should be possible to use the diff information to run blame only on the changed parts of each file.

Good idea! According to the docs, git blame can take a range of lines with the -L option, so you might be able to use that:

➜  devto git:(master) git blame -L 10,20 README.md
243c44e2 (Mac Siri       2018-08-08 10:36:32 -0400 10) </div>
243c44e2 (Mac Siri       2018-08-08 10:36:32 -0400 11) <br/>
^301c608 (Mac Siri       2018-02-28 16:11:08 -0500 12) <p align="center">
^301c608 (Mac Siri       2018-02-28 16:11:08 -0500 13)   <a href="https://www.ruby-lang.org/en/">
0725b85e (Vinicius Stock 2019-01-09 18:59:38 -0200 14)     <img src="https://img.shields.io/badge/Ruby-v2.6.0-green.svg" alt="ruby version"/>
^301c608 (Mac Siri       2018-02-28 16:11:08 -0500 15)   </a>
^301c608 (Mac Siri       2018-02-28 16:11:08 -0500 16)   <a href="http://rubyonrails.org/">
14551ea8 (Mac Siri       2018-07-12 13:19:13 -0400 17)     <img src="https://img.shields.io/badge/Rails-v5.1.6-brightgreen.svg" alt="rails version"/>
^301c608 (Mac Siri       2018-02-28 16:11:08 -0500 18)   </a>
65110550 (Mac Siri       2018-08-08 12:07:00 -0400 19)   <a href="https://travis-ci.com/thepracticaldev/dev.to">
65110550 (Mac Siri       2018-08-08 12:07:00 -0400 20)     <img src="https://travis-ci.com/thepracticaldev/dev.to.svg?branch=master" alt="Travis Status for thepracticaldev/dev.to"/>
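As a rough sketch of how the diff information could feed into that, the hunk headers of a unified diff already carry the line ranges of the new file, so they can be turned into `-L` arguments. The function names below are made up for illustration, and I'm assuming the diff was produced with something like `git diff -U0` so that each hunk covers only changed lines and no context.

```python
import re
from typing import List, Tuple

# Hunk header of a unified diff: @@ -old_start[,old_count] +new_start[,new_count] @@
HUNK_RE = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,(\d+))? @@")


def changed_ranges(diff_text: str) -> List[Tuple[int, int]]:
    """Return the (start, end) line ranges of the new file touched by each hunk."""
    ranges = []
    for line in diff_text.splitlines():
        match = HUNK_RE.match(line)
        if not match:
            continue
        start = int(match.group(1))
        # A missing count means the hunk covers exactly one line.
        count = int(match.group(2)) if match.group(2) is not None else 1
        if count > 0:  # a count of 0 means the hunk only deleted lines
            ranges.append((start, start + count - 1))
    return ranges


def blame_args(path: str, ranges: List[Tuple[int, int]], rev: str = "HEAD") -> List[str]:
    """Build a `git blame` command restricted to the given line ranges."""
    args = ["git", "blame"]
    for start, end in ranges:
        args += ["-L", f"{start},{end}"]  # -L may be given more than once
    args += [rev, "--", path]
    return args
```

For example, a hunk header `@@ -10,2 +10,3 @@` yields the range `(10, 12)`, which becomes `git blame -L 10,12 HEAD -- README.md`. Lines that only moved up or down because of the change keep their old blame, so the tool would still have to shift their line numbers itself.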

It would be nice to allow use of other databases.

I was saying that mainly because it would be so much easier to ship a self-contained command line tool, instead of asking people to install PostgreSQL to use it. SQLite is embedded in many standalone apps for that very reason.

The Postgres requirement is because some columns use the ARRAY type

Got it. A possible alternative is to use the JSON support in SQLite (the JSON functions, with the value stored in a TEXT column); you might be able to get something out of it.
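Here is a minimal sketch of what that could look like with Python's built-in `sqlite3` module. The `commits` table and its `parents` column are hypothetical (I don't know which Git Hammer columns actually use ARRAY), and the `json_each` query assumes the SQLite build includes the JSON functions, which is the case for most recent builds.

```python
import json
import sqlite3
from typing import List


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    # Hypothetical schema: the list lives in a TEXT column as a JSON array.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS commits ("
        "hash TEXT PRIMARY KEY, parents TEXT NOT NULL)"
    )
    return conn


def add_commit(conn: sqlite3.Connection, commit_hash: str, parents: List[str]) -> None:
    # Serialize the Python list to JSON text before storing it.
    conn.execute(
        "INSERT INTO commits (hash, parents) VALUES (?, ?)",
        (commit_hash, json.dumps(parents)),
    )


def get_parents(conn: sqlite3.Connection, commit_hash: str) -> List[str]:
    row = conn.execute(
        "SELECT parents FROM commits WHERE hash = ?", (commit_hash,)
    ).fetchone()
    return json.loads(row[0]) if row else []


def children_of(conn: sqlite3.Connection, parent_hash: str) -> List[str]:
    # json_each() expands the stored array into rows, so it can be filtered in SQL.
    rows = conn.execute(
        "SELECT commits.hash FROM commits, json_each(commits.parents) "
        "WHERE json_each.value = ? ORDER BY commits.hash",
        (parent_hash,),
    ).fetchall()
    return [row[0] for row in rows]
```

Simple reads and writes only need `json.dumps` and `json.loads`; `json_each` is only required when the array contents have to be searched on the database side, which is roughly what an ARRAY containment query does in Postgres.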

A drawback of using SQLite and writing to the DB is that it doesn't behave very well with lots of concurrency.

Jaakko Kangasharju

What if by default you go back in time only a default amount? Usually people don't need all the statistics since the beginning of time.

That's a very good idea!

Got it. A possible alternative is to use the JSON support in SQLite (the JSON functions, with the value stored in a TEXT column); you might be able to get something out of it.

Thanks, a JSON array could work as an ARRAY replacement. I'll have a look.