DEV Community

Discussion on: Enforcing the Use of GitHub?

Ian Kirker

It's notoriously difficult to change the workflow of researchers, even when the change is widely acknowledged as being an improvement by others.

So, there are three ideas that may be helpful to consider:

  1. You want to make the new workflow as easy as possible to adopt. Adding extra unautomated steps, even ones that pay off in the long run, will make it harder to convince people to take it up. One way to offset the cost of extra steps is to add things that make people feel positive about them. Automated testing (regression tests, doctests and, where possible, unit tests) is a good example: if you're not already using Travis or something like it, try to start, and have it test branches and same-repo PRs as well. Also, where possible, include the git steps in the documented usage instructions.

  2. To a certain extent, you can raise the flag of Research Integrity. It's perfectly reasonable in a research environment to build useful tooling that essentially requires you to be working with a version that knows where it came from (e.g. by git ref), and that fails very noisily if it doesn't. (How you do this will obviously depend on your working language.) Carry that metadata through into all outputs, and point out when any outputs don't include it. This may mean pointing at your colleague's output and saying, "Well, how do I know what code this was made with? How could I reproduce this?"

  3. On GitHub, you can set up a repo so that it won't accept direct pushes or unreviewed PRs to specific branches (see Protected Branches under the Settings tab). You could set this up on your "production" branches and your main/master branch: the ones you use to produce data for papers and the like, and the ones you intend to keep long-term. (Versions used for production should probably also be tagged.)
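For the second idea, here is a minimal Python sketch, assuming the analysis code lives in a git checkout and that `git` is on the PATH. The names `git_provenance`, `write_result` and `ProvenanceError` are hypothetical, and JSON is just one possible output format. It records the commit hash, refuses loudly to run on uncommitted changes to tracked files, and writes that hash into every result file.

```python
import json
import subprocess
from pathlib import Path


class ProvenanceError(RuntimeError):
    """Raised when we can't say exactly which code produced an output."""


def git_provenance(repo_dir="."):
    """Return the current commit of repo_dir, failing loudly on uncommitted changes."""

    def git(*args):
        # check=True turns a non-zero git exit status into CalledProcessError.
        return subprocess.run(
            ["git", *args], cwd=repo_dir,
            capture_output=True, text=True, check=True,
        ).stdout.strip()

    try:
        commit = git("rev-parse", "HEAD")
        # Only tracked files count; untracked output files won't mark the tree dirty.
        changes = git("status", "--porcelain", "--untracked-files=no")
    except (subprocess.CalledProcessError, FileNotFoundError) as exc:
        # Not a repository, no commits yet, missing directory, or git isn't installed.
        raise ProvenanceError(f"cannot determine git commit for {repo_dir!r}") from exc
    if changes:
        raise ProvenanceError(
            "uncommitted changes to tracked files; commit them before producing outputs"
        )
    return {"git_commit": commit}


def write_result(path, data, repo_dir="."):
    """Write data to path as JSON, carrying the provenance metadata along with it."""
    record = {"provenance": git_provenance(repo_dir), "data": data}
    Path(path).write_text(json.dumps(record, indent=2))
```

From a clean checkout, `write_result("results.json", {"mean": 1.5})` writes the data alongside the commit it came from; with uncommitted edits it raises instead of producing an untraceable file. Untracked files are deliberately ignored by the dirty check so that output files written into the repository don't trip it; a stricter group might drop `--untracked-files=no`.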

Keeping code manageable and useful long-term in academia is a real problem. It's easy for researchers to work on code alone and untidily, and for it to become a spaghettified mess that each successive grad student finds more painful to modify. Code review, and the various automated tools around it, can help, and can often fit into the group meeting practices many research groups already have. Do read up on good code review practices before trying this for the first time, though: code review can very easily become an antagonistic, hostile affair, and some people will take it that way no matter what you do.

One minor terminology point: git is not GitHub. Using git doesn't automatically mean using GitHub, and, to a lesser extent, vice versa.

Also, if you're adding doctests, bear in mind that they're not a great choice for your main form of testing: they can easily become unwieldy and get in the way. Edge cases, which are exactly what you'd want to test, often don't make great examples, and doctests are best for showing examples.
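A small Python sketch of that split, using a hypothetical `normalise` helper: the docstring carries one readable example that doubles as a doctest, while the awkward edge cases (empty input, a zero sum) live in ordinary `unittest` tests. The `load_tests` function is the standard `unittest` hook for pulling the module's doctests into the same run, so a CI service such as Travis only has to run one command.

```python
import doctest
import math
import unittest


def normalise(values):
    """Scale a list of numbers so that they sum to 1.

    The typical case makes a good example, so it lives here:

    >>> normalise([1, 1, 2])
    [0.25, 0.25, 0.5]
    """
    total = sum(values)
    if total == 0:
        raise ValueError("cannot normalise values that sum to zero")
    return [v / total for v in values]


class NormaliseEdgeCases(unittest.TestCase):
    """Edge cases go here rather than cluttering the docstring."""

    def test_empty_input_raises(self):
        with self.assertRaises(ValueError):
            normalise([])

    def test_zero_sum_raises(self):
        with self.assertRaises(ValueError):
            normalise([1, -1])

    def test_result_sums_to_one(self):
        self.assertTrue(math.isclose(sum(normalise([0.1, 0.2, 0.3])), 1.0))


def load_tests(loader, tests, pattern):
    # unittest calls this hook; with no argument, DocTestSuite collects
    # the doctests from the module that calls it (this one).
    tests.addTests(doctest.DocTestSuite())
    return tests
```

Saved as, say, `normalise_example.py`, running `python -m unittest -v normalise_example` runs the doctest and the three edge-case tests together.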