Alfresco: Simple File Diff

I’ve heard asked many times by customers and community members if there was a way to diff files in Alfresco and alas there isn’t an OTB way to do this.  A month ago the discussion came up again internally.  And I thought it might be fun to tackle this as side project just to see if/what was possible.  So I took an evening and hammered out a simple Java class that did a comparison between two text files.  Once I saw that I had at least the basics (annotate the differences between two files) and had gotten the question of basic possibility/difficulty out of the way I moved on to other projects.

Today almost the entire family is sick so I thought I’d pick up the project again, moving the Java class to a Java Backed web script.

The web script is a simple GET that takes the nodeRef of two files, or two versions of the same file and outputs a simple HTML page that highlights the differences between the two.  There are no complex algorithms that take into account shifts in blocks or identifies just the text in a line that has changed.  It is a simple line by line comparison of two pieces of content. It is not integrated in to Share or Explorer at this time.  I might take that as a separate sick day project (or accept any code contributions to add that).

I’ll admit right off that the code is ugly and repetitive.  But this is more of a Proof of Concept than a full production ready implementation (though it could definitely be used as such to provide a quick view of differences).

I’ve also probably bored you with the above so let’s just jump right in before I completely lose you… ;-)

Using The Web Script

The web script is called by the following URL:

alfresco/service/file/{protocol_1}/{identifier_1}/{id_1}/diff/{protocol_2}/{identifier_2}/{id_2}

For real world examples we’ll first look at comparing two files

alfresco/service/file/workspace/SpacesStore/dd83c9f6-81b7-462b-8a1a-1e9a2af251dd/diff/workspace/SpacesStore/ca65d129-6c2c-4ba0-936d-d7626f94f23a

Second comparing two versions of the same file

alfresco/service/file/workspace/SpacesStore/dd83c9f6-81b7-462b-8a1a-1e9a2af251dd/diff/versionStore/version2Store/ca65d129-6c2c-4ba0-936d-d7626f94f23a

What is returned is, as stated above, a simple HTML highlighting the differences

Each line that is different is highlighted in blue.  Simple and to the point.

The Code

This is just a little Declarative Web Script that reads the content line by line and then compares the hash of each line to see differences.  When a difference is found it is wrapped in HTML to annotate the difference so that when displayed, CSS can take care of highlighting the differences.

A couple of things that I think are important to note:

  • File length: When comparing two files there is always the possibility that one is longer/shorter than the other.  To simplify the comparison, I just append lines with a single space to the shorter file, simplifying any computational work needed for the comparison caused by the difference in length.
    • I mentioned above that the appended line contains a single space.  This is done so the that the line appears in the output.  <div> tags with no content can be ignored by some browsers. The annotation/presentation uses a combination of <div> and <pre> tags.  The space is maintained in a <pre> tag forces the div element to be visible.
  • **Special Characters: **Because the output for the comparison is targeted for HTML, it is important to escape all characters/strings that could be interpreted by the browser as presentation elements.  Apache Commons (included with Alfresco) has classes to help do this.
  • Gotcha!: When I was initially testing the code, the file content kept appending files to the previous request.  So remember when defining a Collection as a class scoped variable to call clear() on the List to make sure it is empty before it gets reused.

This extension is available as an AMP. The source is available in the Google Code Project.