Logging in large mathematical models

70 points by pablobaz about 7 years ago

8 comments

meuk about 7 years ago
For my master's thesis, I implemented a new and fancy algorithm. The code seemed to be fine and dandy for the usual, simple test cases. After using more elaborate test cases, I found cases that didn't work well.

After contacting the author, who indicated he didn't have such problems, I literally spent months trying to debug the code. When I finally gave up, re-wrote my implementation basically from scratch, and found the same problems, I contacted the author again. He then indicated that indeed there was a problem with the method for these cases, that he understood the problem and had found a way to fix it. In hindsight, the problem was not hard to understand (but still, the claims in the paper were unwarranted IMO).

Conclusion? I wish I was a math prodigy, then I would have spotted the problem instantly. Also, be wary of claims made in papers.
petercooper about 7 years ago
I know it's a bit of a tangent, but proactive, large-scale logging of models like this (such as those used in machine learning) may become desirable to meet the requirements of GDPR. If you have to be able to explain how an algorithm made a decision, you need to be able to pull up data like this somehow.
arethuza about 7 years ago
A while back I was working on a system doing fairly complex engineering calculations and I implemented detailed logging of both the values used and the actual calculations performed.

This allowed me to generate a spreadsheet (with the values and calculations in place) that could show a non-developer exactly how the outputs had been calculated (you could use the features of Excel to add visual annotations of precedents and dependencies).

I was pretty pleased with that approach.
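A minimal sketch of that idea, for illustration only: it is not arethuza's actual system, and the `CalcLog` class, its method names, and the example quantities are all hypothetical. Each input and each calculation is logged under a name, and the log is then written out as CSV in which every calculated cell contains a live formula pointing at the cells it depends on.

```python
import csv
import io


class CalcLog:
    """Hypothetical helper: logs named values and the formulas that produced them."""

    def __init__(self):
        self.rows = []     # (name, value, formula or None), in logging order
        self.values = {}   # name -> numeric value
        self.cell_of = {}  # name -> spreadsheet cell holding it, e.g. "B2"

    def _add(self, name, value, formula):
        self.rows.append((name, value, formula))
        self.values[name] = value
        # Row 1 is the header, so the n-th logged entry lands in row n + 1.
        self.cell_of[name] = f"B{len(self.rows) + 1}"
        return value

    def input(self, name, value):
        """Log a raw input value."""
        return self._add(name, value, None)

    def calc(self, name, template, func, *operands):
        """Log a calculation; template uses {0}, {1}, ... for the operand names."""
        args = [self.values[op] for op in operands]
        cells = [self.cell_of[op] for op in operands]
        return self._add(name, func(*args), "=" + template.format(*cells))

    def to_csv(self):
        """Column B holds the input or the formula; column C holds the logged result."""
        buf = io.StringIO()
        writer = csv.writer(buf, lineterminator="\n")
        writer.writerow(["name", "value_or_formula", "logged_value"])
        for name, value, formula in self.rows:
            writer.writerow([name, formula if formula else value, value])
        return buf.getvalue()


# Usage with made-up quantities
log = CalcLog()
log.input("force_N", 1200.0)
log.input("area_m2", 0.25)
stress = log.calc("stress_Pa", "{0}/{1}", lambda f, a: f / a, "force_N", "area_m2")
print(log.to_csv())
```

Opened in a spreadsheet, the `stress_Pa` row contains `=B2/B3`, so the built-in precedent and dependent tracing works on it, and the separate logged-value column lets a reader check that the spreadsheet agrees with what the program computed.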
no_identd about 7 years ago
If you want IMMENSELY powerful logging, take a look at how the trace logging of Racket's Medic Debugger works, an absolutely ingenious solution:

https://docs.racket-lang.org/medic/index.html

Highly interestingly, albeit a bit off topic, the authors of the paper from which Medic originated very recently took this technique and cranked it to 11:

https://conf.researchr.org/event/sle-2017/sle-2017-papers-debugging-with-domain-specific-events

…which won them a distinguished paper award!
dmichulke about 7 years ago
This looks to me like a standard logging toolchain where you just have programmatic access to the logs.

This is like claiming you save and load a json object (instead of its serialization) in some hashmap / DB for fast lookup.

Am I missing something?
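For readers unsure what "programmatic access to the logs" could look like in practice, here is a rough sketch using only Python's standard `logging` module. It is not the article's implementation; the `ListHandler` class, the `model_diag` logger name, and the residual values are made up for illustration.

```python
import logging


class ListHandler(logging.Handler):
    """Hypothetical handler: keeps LogRecord objects in memory instead of formatting them."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)


logger = logging.getLogger("model_diag")  # made-up logger name
logger.setLevel(logging.DEBUG)
logger.propagate = False                  # keep the demo output off the root logger
handler = ListHandler()
logger.addHandler(handler)

# Keys passed via `extra` become attributes on the resulting LogRecord.
for step, residual in enumerate([1.0, 0.1, 5.0, 0.01]):
    logger.debug("residual", extra={"step": step, "residual": residual})

# Query the log as data rather than grepping text.
bad_steps = [r.step for r in handler.records if r.residual > 1.0]
print(bad_steps)
```

The query at the end is the whole point: the records are filtered as objects with typed fields, which is roughly the difference between storing a value and storing its serialization.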
vog about 7 years ago
That's a great approach to logging/debugging complex models on large datasets.

I'm pretty sure this can be applied outside of math, e.g. to systems with complex business rules over large datasets.
trextrex about 7 years ago
I implemented a library that does exactly this -- https://github.com/IGITUGraz/SimRecorder (in case anyone finds it useful). It supports storing data in both hdf5 and redis (although I wouldn't recommend redis for storing large numpy arrays).
cocoablazing about 7 years ago
What's the advantage of the file system/repo/bespoke diag database over storing the numpy arrays in the existing database infrastructure?

Doesn't implementing this system with HDF5 cause headaches for concurrency in either direction?