I think the test cases shipped with packages could give an attacker an advantage: URLs or other strings can be disguised there as benign dummy test data.<p>This would be especially easy with the string sampling technique that the author mentions. I could choose a "Lorem ipsum"-like text as dummy data, but ensure that the first letters of the words, taken in order, spell the domain name of a server used to download a second malicious payload.
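To make the first-letter idea concrete from a reviewer's point of view, here is a minimal sketch of how one might check whether a block of dummy text spells something out. The function name `first_letters` and the sample sentence are my own illustrative assumptions, not anything from the paper, and the sentence only encodes the harmless word "example".

```python
def first_letters(text: str) -> str:
    """Join the first character of every whitespace-separated word."""
    return "".join(word[0] for word in text.split())


# Hypothetical dummy test data; its word initials spell a harmless string.
dummy = "every xylophone amplifies melodic phrases like echoes"
hidden = first_letters(dummy)
print(hidden)
```

A scanner could run a check like this over test fixtures and flag any result that looks like a hostname, though real sampling schemes can use arbitrary offsets, so this would only catch the simplest case.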
This is why we designed TUF and in-toto: to detect MitM attacks anywhere in the software supply chain between developers and end users, and to provide E2E compromise resilience.<p>It's strange that the paper doesn't mention us, given that we have considerable expertise in this very area.<p><a href="https://www.datadoghq.com/blog/engineering/secure-publication-of-datadog-agent-integrations-with-tuf-and-in-toto/" rel="nofollow">https://www.datadoghq.com/blog/engineering/secure-publicatio...</a>
How effective is obfuscation in practice? My understanding is that existing dynamic analysis tools can usually defeat obfuscated code within O(1 day).
The paper mentions the dataset they collected several times, but I couldn't find the actual data. Is that typical for this type of research?