Jimmy Tang

Yet more random mutterings

Git-subtree Is Now in Mainline Git

It seems that git-subtree has been merged into the mainline git project. git-subtree is one of those really useful wrappers for manipulating git repositories if you are an integrator or just don't like git-submodule, and it is certainly a nicer alternative to it. Because the subproject's files are committed into the main repository, there is less chance of someone cloning the repository and ending up with a broken checkout because they forgot to initialise the submodules. There is also the added advantage that you don't need to fiddle around scripting up git-archive commands to export the full source tree for a release. Actually, git-subtree is just plain useful.
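To make the release point concrete, here is a minimal Python sketch that only assembles the git commands involved. The prefix vendor/libfoo, the repository URL, the branch and the tag are all hypothetical, and the helpers just return argument lists that you could hand to subprocess.run inside a real checkout. Because git subtree commits the subproject's files into the main repository, one git archive of a tag already contains them, whereas git archive leaves submodule contents out.

```python
import subprocess
from typing import List


def subtree_add(prefix: str, repo: str, ref: str, squash: bool = True) -> List[str]:
    """Build a `git subtree add` command importing `repo` at `ref` under `prefix`."""
    cmd = ["git", "subtree", "add", f"--prefix={prefix}"]
    if squash:
        cmd.append("--squash")  # import as one squashed commit rather than the full history
    return cmd + [repo, ref]


def subtree_pull(prefix: str, repo: str, ref: str, squash: bool = True) -> List[str]:
    """Build a `git subtree pull` command merging upstream changes into `prefix`."""
    cmd = ["git", "subtree", "pull", f"--prefix={prefix}"]
    if squash:
        cmd.append("--squash")
    return cmd + [repo, ref]


def release_archive(tag: str, name: str) -> List[str]:
    """Build a `git archive` command; subtree files are ordinary commits, so they are included."""
    return ["git", "archive", "--format=tar.gz", f"--prefix={name}/",
            "-o", f"{name}.tar.gz", tag]


def run(cmd: List[str], cwd: str = ".") -> None:
    """Execute a built command inside a checkout, raising CalledProcessError on failure."""
    subprocess.run(cmd, cwd=cwd, check=True)


if __name__ == "__main__":
    # Hypothetical subproject and release tag, for illustration only.
    print(" ".join(subtree_add("vendor/libfoo", "https://example.com/libfoo.git", "master")))
    print(" ".join(release_archive("v1.0", "myproject-1.0")))
```

With submodules, the equivalent release step would also have to walk each submodule and archive it separately, which is exactly the scripting the subtree approach avoids.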

Slurm User Group Meeting 2012

This time the user group meeting is at BSC (the Barcelona Supercomputing Center) in Spain, which is a cool place; I've been to BSC for a few conferences and meetings before. I wish I were doing some HPC right now, and I'm very tempted to submit an abstract or two on the last two Slurm projects I was involved with.

The php-slurm package needs some love; it's somewhat in use at work, and I just wish I had more time to fix its memory leaks. There is also slurm-bank, which is seeing good use at work, though it could do with a rewrite in C.

MOSH - an Alternative to SSH

This tool hasn't been around for long, but it has been around long enough to make it into quite a few distros. It looks like a good alternative to the autossh and tmux combination I have been using for the past few years. As with many new tools, if it isn't installed everywhere it isn't as useful as it could be.

My only minor criticism of the application is that the mosh command itself is written in Perl; it would be nicer if the wrapper were written in C/C++ like the rest of the application, which would certainly make it more portable. I guess I should shut up or write my own replacement for the wrapper.

Other than that, I've got it installed on one or two machines and it seems nice enough so far. I now need to test it in the field on a bad connection when I'm away at a team meeting. I'm still not entirely sure how it compares to SSH in terms of security.

Prototyping and Testing Systems

One of the risks of dogfooding your own projects to accelerate development is a lack of control over, and feedback from, the specification and requirements process. To mitigate this, automated testing should be done: specification, feature and behavioural testing. Call it what you will, the basic idea is for the stakeholder, project owner and developer to reach a common understanding of what is being built, and to write automated tests together to ensure that it is delivered. This might be a narrow view of the whole area, but I'm just taking what works for me and using it to deliver the project.

There are specification/feature/behavioural testing tools for almost any language you can think of, so use what works for you and your team. The testing process not only ensures that the prototype works the way you intend, it is also an opportunity to write documentation at the same time. That documentation can serve as an initial proposal to the stakeholder, setting out what you think they want when there are no clear specifications or requirements in place.
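As a rough idea of what a specification-style test looks like, here is a sketch using Python's unittest rather than the JavaScript tooling our prototype actually uses. The Record class and its rules are hypothetical stand-ins for part of a prototype; the point is that each test name reads as a sentence of the spec, so a verbose test run doubles as a first draft of the documentation.

```python
import unittest


class Record:
    """Hypothetical object under test, standing in for part of a prototype."""

    def __init__(self, title: str) -> None:
        if not title.strip():
            raise ValueError("a record needs a title")
        self.title = title
        self.status = "draft"

    def submit(self) -> None:
        if self.status != "draft":
            raise RuntimeError("only drafts can be submitted")
        self.status = "submitted"


class RecordSubmissionSpec(unittest.TestCase):
    """Specification: submitting a record."""

    def test_a_new_record_starts_as_a_draft(self):
        self.assertEqual(Record("Annual report").status, "draft")

    def test_submitting_a_draft_marks_it_as_submitted(self):
        # Given a draft record
        record = Record("Annual report")
        # When it is submitted
        record.submit()
        # Then its status is "submitted"
        self.assertEqual(record.status, "submitted")

    def test_a_record_cannot_be_submitted_twice(self):
        record = Record("Annual report")
        record.submit()
        with self.assertRaises(RuntimeError):
            record.submit()

    def test_a_record_without_a_title_is_rejected(self):
        with self.assertRaises(ValueError):
            Record("   ")


if __name__ == "__main__":
    # verbosity=2 prints each test name, which reads like a list of requirements.
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(RecordSubmissionSpec)
    unittest.TextTestRunner(verbosity=2).run(suite)
```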

The interns and I have been working on a small prototype system for a bigger project, and the benefits of writing tests are beginning to show. It has become apparent to the interns that testing is a good thing, especially when it is automated. We're not quite doing TDD or BDD but something in between, and we're getting there with a tiered set of tests.

We're finding that roughly 50% of the team's time is spent on refactoring, writing tests and documentation. With testing combined with the automated builder/tester, the team is writing code smarter and better instead of just churning out masses of code that isn't well tested or documented. Given the choice, and based on experience, I would prefer code that is tested and documented over lots of cool but half-working, half-tested features.

The testing process has been a fantastic way for me to steer the interns, given how little JavaScript expertise I have. The tests let me learn how the interns have put the prototype together; they also let me fuzz the tests to check that things really work, and write new tests to communicate what I think is needed when appropriate. We've effectively folded a minimal amount of QA into the development and testing process.
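One cheap way of fuzzing a test is to throw many randomly generated inputs at the code and check properties that must always hold. A minimal Python sketch, assuming a hypothetical paginate function that is not part of our prototype; the fixed seed keeps any failure reproducible.

```python
import random
import unittest
from typing import List, TypeVar

T = TypeVar("T")


def paginate(items: List[T], page_size: int) -> List[List[T]]:
    """Hypothetical function under test: split items into pages of page_size."""
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    return [items[i:i + page_size] for i in range(0, len(items), page_size)]


class PaginateFuzz(unittest.TestCase):
    def test_random_inputs_keep_every_item_in_order(self):
        rng = random.Random(1234)  # fixed seed: a failing case can be replayed exactly
        for _ in range(500):
            items = [rng.randint(-100, 100) for _ in range(rng.randint(0, 50))]
            size = rng.randint(1, 10)
            pages = paginate(items, size)
            # Flattening the pages gives back the original list, in order...
            self.assertEqual([x for page in pages for x in page], items)
            # ...no page is empty or over-full...
            self.assertTrue(all(1 <= len(page) <= size for page in pages))
            # ...and only the last page may be short.
            self.assertTrue(all(len(page) == size for page in pages[:-1]))

    def test_non_positive_page_size_is_rejected(self):
        with self.assertRaises(ValueError):
            paginate([1, 2, 3], 0)


if __name__ == "__main__":
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(PaginateFuzz)
    unittest.TextTestRunner(verbosity=2).run(suite)
```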

In the end we hope to have a functional prototype that does one thing (one set of workflows) well, with plenty of documentation and tests to back it up and prove that it works. An implementation is great to show a potential stakeholder, but having documentation and tests as well puts us in an even stronger position.

Gitbuilder Aggregator

We use git and gitbuilder at work for a large number of projects, and we try to test things as much as we can. I first noticed that someone had written an aggregator for gitbuilder on the Ceph gitbuilders. This seemed like a great idea (and it is), except that the aggregator at the time wasn't very fast and needed some AJAX magic.

I asked the Ceph developers for a copy of the aggregator script; as they said, it was really just a Perl hack, but it works. Since we had some students here on an internship to learn new things, I got one of the interns to write an AJAX'd-up version of the aggregator.
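The core job of an aggregator is to poll a number of gitbuilder instances and combine their status into one page. The Python sketch below is neither the Ceph script nor our AJAX version: the builder URLs and status text are hypothetical, and each builder is fetched in its own thread so that one slow builder doesn't hold up the rest (issuing a separate AJAX request per builder gives the browser roughly the same effect).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List
from urllib.request import urlopen


def fetch_url(url: str) -> str:
    """Fetch a builder's page over HTTP (hypothetical builder URLs, 10 second timeout)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def aggregate(builders: List[str], fetch: Callable[[str], str] = fetch_url,
              max_workers: int = 8) -> Dict[str, str]:
    """Fetch every builder concurrently and map each URL to its result."""

    def safe_fetch(url: str) -> str:
        # An unreachable builder is reported in the summary instead of aborting it.
        try:
            return fetch(url)
        except Exception as exc:
            return f"unreachable ({type(exc).__name__})"

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as `builders`.
        return dict(zip(builders, pool.map(safe_fetch, builders)))
```

Passing fetch as a parameter keeps the aggregation logic testable without a network, since a stub that returns canned pages can stand in for fetch_url.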

After a few weeks' worth of use and minor changes, it's ready enough to share with everyone: the AJAX'd-up version of the aggregator can be found on my GitHub account in the develop branch. For fun, I also updated the main gitbuilder CGI scripts to use Twitter Bootstrap and to add a link to the errcache file that gitbuilder generates.

We found that with large builds the logs would swamp the errors and warnings, and having access to the errcache helped a lot in narrowing down where to look for problems, hence the link to it.
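The idea behind linking the errcache can be illustrated with a small Python filter that pulls error and warning lines, with their line numbers, out of a long build log. This is not how gitbuilder produces its errcache; the regular expression and the limit are assumptions of the sketch.

```python
import re
from typing import Iterable, List

# Matches "error" or "warning" as whole words, case-insensitively (an assumed heuristic).
PROBLEM = re.compile(r"\b(error|warning)\b", re.IGNORECASE)


def summarise_log(lines: Iterable[str], limit: int = 50) -> List[str]:
    """Return up to `limit` problem lines, each prefixed with its 1-based line number."""
    hits: List[str] = []
    for number, line in enumerate(lines, start=1):
        if PROBLEM.search(line):
            hits.append(f"{number}: {line.rstrip()}")
            if len(hits) >= limit:
                break
    return hits


if __name__ == "__main__":
    log = [
        "gcc -c foo.c",
        "foo.c:3: warning: unused variable 'x'",
        "gcc -c bar.c",
        "bar.o: error: undefined reference to 'baz'",
        "make: *** [all] Error 1",
    ]
    for hit in summarise_log(log):
        print(hit)
```

Keeping the original line numbers matters: the short summary tells you where to jump to in the full log rather than replacing it.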

At some point it might be worth re-implementing the gitbuilder scripts in a single language, in a generic way, so that they work with other DVCSs that have a bisect feature.
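The core of a bisect feature is not tied to any particular DVCS. Given an ordered list of revisions where the oldest is known to be good and the newest is known to be broken, a binary search finds the first broken revision in about log2(n) builds. Here is a Python sketch, assuming a linear history and a hypothetical is_good check (for example "does this revision build and pass its tests?"). git bisect itself also copes with merge commits, which this does not.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def first_bad(revisions: Sequence[T], is_good: Callable[[T], bool]) -> T:
    """Binary-search a linear history for the first bad revision.

    Assumes revisions run oldest to newest, revisions[0] is good, revisions[-1]
    is bad, and once a revision is bad every later one is bad too.
    """
    if len(revisions) < 2:
        raise ValueError("need at least one good and one bad revision")
    lo, hi = 0, len(revisions) - 1  # invariant: revisions[lo] good, revisions[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(revisions[mid]):
            lo = mid
        else:
            hi = mid
    return revisions[hi]


if __name__ == "__main__":
    history = ["r1", "r2", "r3", "r4", "r5", "r6"]
    print(first_bad(history, lambda rev: rev < "r4"))  # r4
```

A DVCS-neutral builder would only need each backend to supply the ordered revision list; the search and the build/test step stay the same.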

Dogfooding Your Own Project to Accelerate Development

Should you dogfood the project you are developing? The answer is probably yes, especially on a greenfield project where the stakeholder has given you no clear-cut requirements. There is a lot to be said for having a working implementation that can be presented and refined.

Sometimes the project you are working on won't have clear requirements, so you should probably take some basic assumed use cases and run with them. Starting early to see what works and what doesn't is a pragmatic approach that the waterfall crowd might not like, but hey, an implementation speaks for itself.

If you don't use what you develop, it is very hard to relate to the customer or end user in the long run. On the same note, best practice can sometimes cost too much in time and money, and good-enough practice may be all you need to deliver a functioning product.

Accelerating development by dogfooding your own work and using good-enough practices should increase throughput, though not necessarily quality. Again, on a greenfield project where the schedule means there aren't many requirements yet, it's worth taking this approach until hard requirements are delivered.

Git Rerere for Long Lived Feature Branches

I had turned this feature on for a few of my git repos but had completely forgotten about it. As far as I recall, it has been around for a few years now. It can be turned on globally with

git config --global rerere.enabled 1

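Rerere records how you resolve a conflict and replays that resolution when the same conflict shows up again, which pretty much automates the repeated conflicts you hit with long-lived branches. I've been lazy recently and have just been doing merges instead of rebasing, which led me to rediscover git rerere.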
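To show what rerere ("reuse recorded resolution") buys you, here is a toy Python model of the idea: remember how a conflict was resolved, keyed by the conflicting text, and hand back the same resolution when that exact conflict turns up again. Real git keeps its recorded resolutions under .git/rr-cache and works on conflict hunks inside files; the ResolutionCache class and its key scheme here are purely illustrative.

```python
from typing import Dict, Optional, Tuple


def conflict_key(ours: str, theirs: str) -> Tuple[str, str]:
    """Key a conflict by its two sides, ignoring which one is 'ours' (a toy assumption)."""
    a, b = sorted((ours, theirs))
    return (a, b)


class ResolutionCache:
    """Toy stand-in for a rerere cache: conflict -> resolution chosen by hand earlier."""

    def __init__(self) -> None:
        self._resolutions: Dict[Tuple[str, str], str] = {}

    def record(self, ours: str, theirs: str, resolution: str) -> None:
        # Called once you have resolved a conflict manually.
        self._resolutions[conflict_key(ours, theirs)] = resolution

    def replay(self, ours: str, theirs: str) -> Optional[str]:
        # Called when a merge conflicts; returns the earlier resolution if seen before.
        return self._resolutions.get(conflict_key(ours, theirs))


if __name__ == "__main__":
    cache = ResolutionCache()
    cache.record("timeout = 30", "timeout = 60", "timeout = 45")
    # The next merge of the long-lived branch hits the same conflict again.
    print(cache.replay("timeout = 30", "timeout = 60"))
```

The payoff grows with every repeated merge of the same long-lived branch: you resolve each distinct conflict once, and later merges reuse the answer.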

Enabling Latent Semantic Indexing for Octopress

This is for those who care about having related posts on their Octopress blog. It's nice to have and useful, but it isn't enabled by default in Octopress, even though turning it on is quite easy.

The feature already exists in Jekyll, so enabling it in Octopress is a trivial task.

First, add this to your _config.yml file:

lsi: true

Then create a file such as source/_includes/custom/asides/related.html with the following content, and add it to one of the aside lists in _config.yml (for example, add custom/asides/related.html to default_asides):

<section>
  <h1>Related Posts</h1>
  <ul class="posts">
    {% for post in site.related_posts limit:5 %}
    <li class="related">
      <a href="{{ root_url }}{{ post.url }}">{{ post.title }}</a>
    </li>
    {% endfor %}
  </ul>
</section>

You can style the list however you like, but here I have kept the same style as the recent posts.

Probable issues with enabling LSI

There are some issues with enabling LSI in Jekyll/Octopress, the main one being performance: the default implementation is slow if you have lots of posts to classify. I'd recommend installing rb-gsl to accelerate the classification process.
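For the curious, here is roughly what LSI does to find related posts, and why it gets slow. Jekyll hands this job to a Ruby classifier library; the Python below is not that code but a toy sketch of the technique. It builds a term-by-post count matrix A, takes a truncated singular value decomposition (here via power iteration on A^T A), and ranks posts by cosine similarity in the reduced space. The decomposition is the expensive step as posts and terms pile up, which is presumably the part rb-gsl speeds up by using the GNU Scientific Library.

```python
import math
import re
from collections import Counter
from typing import List, Tuple

Matrix = List[List[float]]


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z]+", text.lower())


def term_document_matrix(docs: List[str]) -> Matrix:
    """Rows are terms, columns are posts, entries are raw term counts."""
    counts = [Counter(tokenize(doc)) for doc in docs]
    vocab = sorted(set().union(*counts))
    return [[float(c[term]) for c in counts] for term in vocab]


def matvec(m: Matrix, v: List[float]) -> List[float]:
    return [sum(x * y for x, y in zip(row, v)) for row in m]


def top_eigenpairs(m: Matrix, k: int, iters: int = 300) -> List[Tuple[float, List[float]]]:
    """Largest k eigenpairs of a symmetric PSD matrix via power iteration with deflation."""
    n = len(m)
    m = [row[:] for row in m]
    pairs = []
    for _ in range(min(k, n)):
        v = [float(i + 1) for i in range(n)]  # deterministic, non-uniform start vector
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = matvec(m, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        lam = sum(x * y for x, y in zip(v, matvec(m, v)))  # Rayleigh quotient
        if lam < 1e-9:
            break
        pairs.append((lam, v))
        # Deflate so the next pass converges to the next-largest eigenvector.
        for i in range(n):
            for j in range(n):
                m[i][j] -= lam * v[i] * v[j]
    return pairs


def lsi_vectors(docs: List[str], k: int = 2) -> Matrix:
    """Project each post into k latent dimensions (rows of V_k times S_k)."""
    a = term_document_matrix(docs)
    n = len(docs)
    # A^T A holds post-to-post dot products; its eigenvectors are A's right singular vectors.
    gram = [[sum(row[i] * row[j] for row in a) for j in range(n)] for i in range(n)]
    pairs = top_eigenpairs(gram, k)
    return [[math.sqrt(lam) * vec[i] for lam, vec in pairs] for i in range(n)]


def cosine(u: List[float], v: List[float]) -> float:
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0 or nv == 0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (nu * nv)


def related_posts(docs: List[str], index: int, k: int = 2, limit: int = 5) -> List[int]:
    """Indices of the posts most similar to docs[index] in the latent space."""
    vectors = lsi_vectors(docs, k)
    others = [j for j in range(len(docs)) if j != index]
    others.sort(key=lambda j: (-cosine(vectors[index], vectors[j]), j))
    return others[:limit]


if __name__ == "__main__":
    posts = [
        "git subtree merge git",
        "git rerere merge conflict",
        "slurm cluster hpc",
        "slurm hpc scheduler",
    ]
    print(related_posts(posts, 0))
```

Even in this toy, every extra post adds a row and column to A^T A and every extra word adds a row to A, so the cost climbs quickly on a blog with a long archive.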

Planet DRI - a News Aggregator for Digital Humanities and Digital Preservation Projects in Ireland

This is just a temporary solution until something better or more appropriate comes along: an ikiwiki news aggregator for websites and projects related to Digital Preservation, Digital Humanities and other related bits and pieces.

I pretty much set it up for myself to keep up to date with the latest happenings, as I found myself falling behind on all things related to Digital Repositories Ireland.

I hope that my teammates will appreciate and contribute to this little effort. I don't think I've come across a Digital Humanities and Digital Preservation aggregator before.