The author's problem is pretty simple: the test repo is required for pre-merge tests to pass, but it can be updated independently, without having pre-merge tests pass.<p>And the answer is pretty simple: pin the specific test repo version! Use lockfiles, or git submodules, or put "cd tests && git checkout 3e524575cc61" in your CI config file _and keep that config in the same repo as the source code_ (that part is very important!).<p>This solves all of the author's problems:<p>> new test case is added to the conformance test suite, but that test happens to fail. Suddenly nobody can submit any changes anymore.<p>The conformance test suite is pinned, so the new test is not run. A separate PR has to update the conformance suite version/revision, and it must go through the regular driver PR process and therefore must pass. In practice, that's a PR with two changes: update the pin and disable the new test.<p>> are you going to remember to update that exclusion list?<p>That's why you use an "expect fail" list (not an exclusion list) and keep it in the driver's directory. As you submit your PR you might see a failure saying: "congrats, test X which was expect-fail is now passing! Please remove it from the list." You'll need to make one more PR revision, but then you get working tests.<p>> allowing tests to be marked as "expected to fail". But they typically also assume that the TB can be changed in lockstep with the SUT and fall on their face when that isn't the case.<p>And if your TB cannot be changed in lockstep with the SUT, you are going to have a truly miserable time. You cannot even reproduce the problems of the past!
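The expect-fail bookkeeping above is simple enough to sketch. This is a minimal, hypothetical version (test names and the list itself are made up, not from any real driver): CI fails both on an unexpected failure and on a test that is still on the list but now passes, which forces the list to stay current.

```python
# Hypothetical sketch of an "expect fail" list kept in the driver's directory.
# Names below are invented for illustration.

EXPECTED_FAILURES = {"test_vertex_clipping", "test_sparse_binding"}

def reconcile(results):
    """results: dict of test name -> True (passed) / False (failed).

    Returns (unexpected_failures, newly_passing). CI should fail on either:
    the first set is a real regression, the second means the expect-fail
    list is stale and the entry must be removed in the PR.
    """
    unexpected_failures = {name for name, ok in results.items()
                           if not ok and name not in EXPECTED_FAILURES}
    newly_passing = {name for name, ok in results.items()
                     if ok and name in EXPECTED_FAILURES}
    return unexpected_failures, newly_passing

# Example run: an expect-fail test now passes, so CI asks you to drop it.
bad, stale = reconcile({"test_vertex_clipping": True, "test_basic_draw": True})
# bad == set(), stale == {"test_vertex_clipping"}
```

Note this is stricter than a plain exclusion list: an exclusion list silently skips the test forever, while an expect-fail list still runs it and tells you the moment it starts passing.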
So make sure your kernel version is known or at least recorded, and your repos are pinned. Ideally the whole machine image, packages and all, is archived somehow -- maybe via docker, a raw disk image, or some sort of ostree system.<p>> Problem #2 is that good test coverage means that tests take a very long time to run.<p>The described system sounds very nice, and I would love to have something like it. I suspect it will be non-trivial to get working, however. Meanwhile, there is a manual solution: have more than one test suite. "Pre-merge" tests run before each merge and contain a small subset of the testing. A bigger "continuous" suite (if you use physical machines) or "every X hours" suite (if you use some sort of auto-scaling cloud) runs a bigger set of tests, and can be triggered manually on PRs if a developer suspects a PR is especially risky.<p>You can even have multiple levels (pre-merge, once per hour, 4 times per day), but this is often more trouble than it's worth.<p>And of course it is absolutely critical to have reproducible tests first -- if you come in to work and find a bunch of continuous failures, you want to be able to re-run with extra debugging or bisect what happened.
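The tiered-suite idea can be sketched in a few lines. This is a made-up illustration (tier and test names are invented): each test declares the cheapest tier that runs it, and a more expensive tier also runs everything from the tiers below, so "continuous" is always a superset of "pre-merge".

```python
# Hypothetical tiered test selection; all names here are invented.
TIERS = ["pre-merge", "hourly", "daily"]  # ordered cheap -> expensive

TESTS = {
    "test_smoke": "pre-merge",          # fast sanity check on every merge
    "test_full_conformance": "hourly",  # big suite, too slow for pre-merge
    "test_fuzz_long": "daily",          # very expensive, runs a few times a day
}

def select(tier):
    """Return the tests to run at the given tier, including all cheaper tiers."""
    budget = TIERS.index(tier)
    return sorted(name for name, t in TESTS.items()
                  if TIERS.index(t) <= budget)

# select("pre-merge") -> ["test_smoke"]
# select("hourly")    -> ["test_full_conformance", "test_smoke"]
```

The key design choice is that a test lives in exactly one place and the tiers nest, so there is no separate list to keep in sync when a slow test is promoted or demoted.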