For context, this is getting released just after the city council committee on technology had a public hearing about a bill that would require any software used by the government to prosecute folks for crimes to be open source:<p><a href="https://www.fwdeveryone.com/t/FrtKpHJ2T0mCic3j_FSzPQ/nyc-software-transparency-bill-1696" rel="nofollow">https://www.fwdeveryone.com/t/FrtKpHJ2T0mCic3j_FSzPQ/nyc-sof...</a><p>This bill would set the standard for the entire country, so the testimony is actually pretty interesting and worth watching, especially the opening speech by councilman Vacca.<p><a href="https://councilnyc.viebit.com/player.php?hash=xHL4m7vQXCM9" rel="nofollow">https://councilnyc.viebit.com/player.php?hash=xHL4m7vQXCM9</a><p>I also testify at 56:00, albeit hadn't prepared remarks in advance so wasn't as polished as some of the other folks.
Code in "Comparison.cs" on line 1307<p><pre><code> // if we don't have frequencies, use the default (this should never happen, but here it is)
if (tblFreq == null || tblFreq.Rows.Count == 0)
{
a = (float)0.02;
b = (float)0.02;
}
</code></pre>
is kind of fishy. Why not bomb out instead of failing silently for things that are never supposed to happen? Not saying this is inherently wrong - I'm certainly not capable of actually understanding the details of the analysis
I haven't had a chance to dig into the analysis part of the code yet, but here's an amusing variable assignment that jumped out at me in `FST.Common/ComparisonData.cs`.<p><pre><code> compareMethodIDSerializationBackupDoNotMessWithThisVariableBecauseItsAnUglyHackAroundACrappyLimitiationOfTheFCLsCrappyXMLSerializer = comparisonID;</code></pre>
On first glance it uses very old frameworks (for example, ASP.NET WebForms) and there are 0 unit tests.<p>A concerning thing is that a dev database username and password was committed to multiple files, like this: <a href="https://github.com/propublica/nyc-dna-software/blob/master/FSTServiceConsoleHost/app.config#L17" rel="nofollow">https://github.com/propublica/nyc-dna-software/blob/master/F...</a>.
The accompanying article/blog post linked in the Readme (<a href="https://www.propublica.org/article/federal-judge-unseals-new-york-crime-labs-software-for-analyzing-dna-evidence" rel="nofollow">https://www.propublica.org/article/federal-judge-unseals-new...</a>) has good background & information about the history of this, as well as potential impact moving forward.
I work in DNA analysis and stuff like this makes me happy specifically because I know just how easy it is to have flawed analysis techniques written in software.
Is there a large dataset of anonymized DNA samples that we could try this tool against? Does 23andme do something like that? I hope they don't but it would be very useful for just this one case!
Nested ternaries, ungodly amounts of repetition, ZERO tests, hardcoded sensitive info... this code wouldn't pass code review anywhere. I don't care when it was written, any day it is used to send people to jail it should be the best possible code humans can write. And it is so far from that.