From the section about how the qualitative difference between the two algorithms was found:<p>"In the second step, we conducted a manual comparison between two diff outputs produced by Myers and Histogram algorithms from all files in the sample. The first two authors of this paper were involved to independently annotate the diff outputs that makes the result is expected to be more reliable. ... The comparison results between two authors from 377 files were subsequently computed to find the kappa agreement. We obtained 70.82%, which is categorized into ‘substantial agreement’ (Viera and Garrett 2005). This means, the statistic result of our manual study is acceptable."<p>Even though I'm inclined to agree with the example given in the paper, and a lot of work clearly went into the qualitative evaluation, this feels like a very weak way to perform a qualitative analysis. Specifically:<p>- the annotator pool is two academics who chose to co-author a paper about the quality of different diffing algorithms, i.e., a very small and skewed sample.<p>- there is no mention of any blinding in the labeling process, so any preconceptions about the relative quality of the diffing algorithms may have influenced the grading, or they may not have! 
We don't even know.<p>- there is no clear description of how the representative sample was chosen, or of which factors determined what counts as a representative sample of changes. Without that, reviewers and other researchers cannot deliberately make different choices in future work and draw informed comparisons with this one.<p>To sum up: in my admittedly not at all authoritative opinion, this portion of the paper cannot support a stronger conclusion than something like "further study is warranted on this topic, with a far better controlled and far larger sample, and clearer explanations of the methodological choices".<p>Regardless of that, it was an interesting read and not something previously on my radar as worth experimenting with at all! Kudos to the authors for drawing attention to it and for the more quantitative aspects of the paper (which I examined less closely and charitably assume are top-notch).
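<p>P.S. For anyone wondering what the 70.82% figure measures: it is presumably Cohen's kappa (0.7082), which compares the observed agreement between the two annotators with the agreement expected by chance from each annotator's label frequencies. On the Viera and Garrett scale, 0.61 to 0.80 counts as "substantial". A minimal sketch, assuming each file gets one label saying which diff output the annotator preferred (the label names "myers", "histogram" and "same" and the sample data below are hypothetical, not from the paper):

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who each labelled the same items once."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(rater_a)
    # Observed agreement: fraction of items both raters labelled identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal frequency per label.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1:
        # Degenerate case: both raters used one identical label throughout.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical labels: which diff output each annotator preferred per file.
annotator_1 = ["histogram", "histogram", "myers", "same", "histogram",
               "same", "myers", "histogram", "same", "histogram"]
annotator_2 = ["histogram", "myers", "myers", "same", "histogram",
               "same", "histogram", "histogram", "same", "histogram"]
print(round(cohens_kappa(annotator_1, annotator_2), 2))
```

Here the annotators agree on 8 of 10 files, but chance alone would produce 38% agreement, so kappa comes out at about 0.68, "substantial" on the same scale. Note that kappa only measures how consistently the two raters label things, not whether the labels are right: two unblinded annotators who share the same preconception can agree substantially while both being biased.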