Gitstats – the easiest way to see stats for a git repository.
This will be a shortish take on an application called gitstats, which I found a few months back.
First of all, apologies for not communicating enough. I have been behind with work and other things, so I haven't had dedicated time to write about and share things I liked. Some months back I found gitstats and discovered it was already packaged in Debian, as aptitude shows:
$ aptitude show gitstats
Automatically installed: no
Maintainer: Vincent Fourmond
Uncompressed Size: 66.6 k
Depends: python (>= 2.4.4), git (>= 1:1.7) | git-core (>= 1:18.104.22.168), gnuplot-nox
Description: statistics generator for git repositories
GitStats is a statistics generator for git repositories. It examines the repository and produces some interesting statistics from the history.
Currently it outputs only HTML.
It is the equivalent of statcvs and statsvn for git repositories.
The dependency list is misleading: in practice you need the full set of gnuplot binaries as well as groff, gnuplot for all the graphs and groff for manipulating the text.
Once it's installed, usage is simple. You need a directory containing a .git repository for gitstats to analyse, and a directory to write the results to.
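The two requirements above can be wrapped in a small shell function that checks for the .git entry and creates the output directory before calling gitstats. This is a minimal sketch: run_gitstats is a hypothetical name, not part of gitstats, and the only gitstats usage it assumes is the `gitstats <git repo> <output dir>` form used in this post.

```shell
# run_gitstats REPO OUTDIR: hypothetical wrapper, not part of gitstats.
# Checks the two inputs gitstats needs, then runs it.
run_gitstats() {
    if [ "$#" -ne 2 ]; then
        echo "usage: run_gitstats REPO OUTDIR" >&2
        return 2
    fi
    # REPO should be a git checkout, i.e. contain a .git entry
    if [ ! -e "$1/.git" ]; then
        echo "run_gitstats: $1 has no .git, is it a git checkout?" >&2
        return 1
    fi
    # -p: no error if OUTDIR already exists, e.g. when re-running
    mkdir -p "$2" || return 1
    gitstats "$1" "$2"
}
```

For the run described below, that would be `run_gitstats gtimelog gtimelog-stats` from inside ~/games.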
In my case I chose gtimelog, as it's a project I love and have contributed to by triaging bugs and filing wishlist bugs upstream.
The first thing I did was to move to the directory just above the one where gtimelog resides:
$ cd games
Then I created a directory called gtimelog-stats:
~/games$ mkdir gtimelog-stats
Once this was done, I ran the gitstats command:
~/games$ gitstats gtimelog gtimelog-stats
gitstats first checks the gnuplot version and, once that is satisfied, starts doing its job:
[0.42837] >> gnuplot --version
Output path: /home/shirish/games/gtimelog-stats
Git path: gtimelog
[0.12687] >> git shortlog -s HEAD | wc -l
[0.00215] >> git show-ref --tags
[0.00195] >> git log "10b085c83ba4364166070cd29807b87b76f0e21d" --pretty=format:"%at %aN" -n 1
[0.00197] >> git log "93cb6872d38f262e6e8406189ab5204d75f25ce0" --pretty=format:"%at %aN" -n 1
[0.00413] >> git shortlog -s "0.2.3"
[0.00240] >> git shortlog -s "0.2.4" "^0.2.3"
[0.01125] >> git rev-list --pretty=format:"%at %ai %aN <%aE>" HEAD | grep -v ^commit
[0.01027] >> git rev-list --pretty=format:"%at %T" HEAD | grep -v ^commit
[0.01751] >> git ls-tree -r --name-only "77d0e807e3b34bce91a3fb1c0daca054594a84ef" | wc -l
[0.01930] >> git ls-tree -r --name-only "7a4523d4502a78ef9f9a10289268d66f35e04edd" | wc -l
[0.01581] >> git cat-file blob 70df8f5d862abf89bd7a6edefd44bf3720bbbdc7 | wc -l
[0.02148] >> git cat-file blob 9455aa72f02699414905dd3d95084a87f1779a53 | wc -l
[0.20930] >> git log --shortstat --first-parent -m --pretty=format:"%at %aN" HEAD
[0.15780] >> git log --shortstat --date-order --pretty=format:"%at %aN" HEAD
[0.00298] >> git --version
[0.01352] >> gnuplot --version
[0.08315] >> gnuplot "/home/shirish/games/gtimelog-stats/lines_of_code_by_author.plot"
[0.01980] >> gnuplot "/home/shirish/games/gtimelog-stats/day_of_week.plot"
[0.02148] >> gnuplot "/home/shirish/games/gtimelog-stats/domains.plot"
[0.02053] >> gnuplot "/home/shirish/games/gtimelog-stats/commits_by_year.plot"
[0.01883] >> gnuplot "/home/shirish/games/gtimelog-stats/hour_of_day.plot"
[0.02043] >> gnuplot "/home/shirish/games/gtimelog-stats/commits_by_year_month.plot"
[0.02094] >> gnuplot "/home/shirish/games/gtimelog-stats/files_by_date.plot"
[0.02112] >> gnuplot "/home/shirish/games/gtimelog-stats/lines_of_code.plot"
[0.01984] >> gnuplot "/home/shirish/games/gtimelog-stats/month_of_year.plot"
[0.04385] >> gnuplot "/home/shirish/games/gtimelog-stats/commits_by_author.plot"
Execution time 2.65777 secs, 1.33573 secs (50.26 %) in external commands)
You may now run:
As can be seen, gitstats goes through the shortlog, each release tag and each commit. At the very end, after checking the git and gnuplot versions again (why it wants to know the gnuplot version a second time is beyond me), it generates the graphs by running gnuplot on one .plot file per graph.
Before we look at the generated HTML pages (courtesy groff), let's see what is in gtimelog-stats:
HTML pages: activity.html authors.html files.html index.html lines.html tags.html
Data files: commits_by_author.dat commits_by_year.dat commits_by_year_month.dat day_of_week.dat domains.dat files_by_date.dat hour_of_day.dat lines_of_code.dat lines_of_code_by_author.dat month_of_year.dat
gnuplot scripts: commits_by_author.plot commits_by_year.plot commits_by_year_month.plot day_of_week.plot domains.plot files_by_date.plot hour_of_day.plot lines_of_code.plot lines_of_code_by_author.plot month_of_year.plot
Graphs: commits_by_author.png commits_by_year.png commits_by_year_month.png day_of_week.png domains.png files_by_date.png hour_of_day.png lines_of_code.png lines_of_code_by_author.png month_of_year.png
Assets and cache: arrow-down.gif arrow-none.gif arrow-up.gif gitstats.cache gitstats.css sortable.js
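To get a feel for what these steps compute, the hour-of-day graph can be approximated with plain git and coreutils. This is a rough sketch, not gitstats' own code: commits_by_hour is a hypothetical name, and it reads the hour from each commit's author date in the author's own timezone, whereas gitstats works from the %at Unix timestamps seen in the log above, so its numbers may differ.

```shell
# commits_by_hour REPO: hypothetical helper, not gitstats' own code.
# Prints "count hour" lines for every hour (00-23) that has commits.
# %ai is the author date, e.g. "2015-10-04 13:05:12 +0530", so characters
# 12-13 are the hour in the author's timezone. tformat: ends every entry,
# including the last one, with a newline.
commits_by_hour() {
    (cd "$1" && git log --pretty=tformat:%ai HEAD) |
        cut -c12-13 |
        sort |
        uniq -c
}
```

For example, `commits_by_hour ~/games/gtimelog` would print one line per hour in which commits were authored.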
This is what you get when you view the stats in a browser:
The best page, though, is the Activity tab:
So as can be seen, while the project had a long life, it really came into its own around 2011 and then had a massive jump in 2015. This was in part due to the fact that yours truly needed a time-keeping/time-logging mechanism for some of my clients. I had looked at various applications, including Project Hamster, Kimai, Task Coach, Rachota, dotProject, Kontact and Timeslot Tracker, and rejected each of them for one reason or another. For example, Timeslot Tracker uses Java, which is a bit heavy for a simple tool like this. Similarly, Kontact is too big and unwieldy if I'm just looking for a single-user time-tracking solution. When none of the above appealed to me, I found gtimelog. It was simple and easy to work with (documentation was and still is lacking, but I guess that is the bane of almost every free software project). Anyway, I worked with the developer, doing simple QA and discussing features, and today it is as good a tool as any of those mentioned above.
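Growth like the 2011 rise and the 2015 jump can also be cross-checked from the raw history without gitstats. A minimal sketch using plain git, where commits_by_year is a hypothetical name and the year is taken from each commit's author date:

```shell
# commits_by_year REPO: hypothetical helper, prints "count year" lines.
# %ai starts with YYYY-MM-DD, so cut keeps the first four characters
# (the year); tformat: ends every entry with a newline.
commits_by_year() {
    (cd "$1" && git log --pretty=tformat:%ai HEAD) |
        cut -c1-4 |
        sort |
        uniq -c
}
```

Run as `commits_by_year ~/games/gtimelog`, it prints the same kind of per-year counts that commits_by_year.dat feeds to gnuplot, although gitstats may bucket commits slightly differently.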
Last but not least, I have to share the Authors page, as it shows who participated in the project.
I was shocked to find my name in the same project as Martin Pitt (a renowned Ubuntu and Debian Developer) and Barry Warsaw (a Python guru who has written many a PEP and is one of the principal authors of GNU Mailman). My only commit also came about because of Marius, the principal author of the tool, who felt I had contributed to the application and that it would be good if I made a commit adding my name, so now you can see it in the credits.
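The author count comes from the `git shortlog -s HEAD | wc -l` pipeline visible in the log earlier, and the same command with `-n` gives a quick ranked list of contributors. A minimal sketch, where author_summary is a hypothetical name:

```shell
# author_summary REPO: hypothetical helper. Prints the number of authors
# (the pipeline gitstats runs), then authors sorted by commit count.
author_summary() {
    (
        cd "$1" || exit 1
        # -s: one "count<TAB>name" summary line per author;
        # tr strips the padding some wc implementations add
        printf 'authors: %s\n' "$(git shortlog -s HEAD | wc -l | tr -d ' ')"
        # -n: sort authors by number of commits, highest first
        git shortlog -s -n HEAD
    )
}
```

Note that shortlog groups by author name (after any .mailmap mapping), so one person committing under two spellings is counted twice.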
Anyhow, the tool itself is fantastic, and I am sure that after reading this people will jump headlong into both gtimelog and gitstats. To all those college students who keep asking what project they should work on: this is a good example of a tool that can bring out a lot of analysis about the community around a project and how the project has evolved over time. Even if you are not good at programming but are good at analysis, every project churns out quite a bit of metadata, and you could use these stats to build an overview of one or more projects that helps with decision-making. There is also the possibility of some glory and fame for whoever does the hard work of bringing such stats to the fore.
Till another time, adieu 🙂
Update – 04/10/2015 – The Debian maintainer updated gitstats to the latest upstream version; see http://bugs.debian.org/800731