"If we have a CUDA kernel that continuously runs for 10 seconds but only uses 1 SM, on an H100, this would register 100% utilization, but the SM efficiency would be 1 / 132 = 0.7%."<p>does this situation register 100% utilization?
BTW, SM occupancy is also a metric you need to care about if you are concerned with kernel efficiency.
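As a rough illustration of what occupancy measures: it is the ratio of warps resident on an SM to the maximum the SM can hold. The sketch below assumes a limit of 64 warps per SM (the figure for compute capability 9.0, i.e. H100) and takes the number of resident blocks as a given input. In practice that number depends on register and shared-memory usage, so treat the helper as hypothetical rather than a replacement for the CUDA occupancy API or a profiler.

```python
MAX_WARPS_PER_SM = 64  # assumed limit for H100 (compute capability 9.0)
WARP_SIZE = 32


def theoretical_occupancy(threads_per_block: int, resident_blocks_per_sm: int) -> float:
    """Active warps on one SM divided by the maximum warps the SM can hold."""
    # A partially filled warp still takes a full warp slot, so round up.
    warps_per_block = -(-threads_per_block // WARP_SIZE)
    active_warps = min(warps_per_block * resident_blocks_per_sm, MAX_WARPS_PER_SM)
    return active_warps / MAX_WARPS_PER_SM


# One 128-thread block per SM keeps only 4 of the 64 warp slots filled.
print(theoretical_occupancy(128, 1))
# Eight 256-thread blocks per SM fill all 64 warp slots.
print(theoretical_occupancy(256, 8))
```

So a kernel can show 100% utilization and touch every SM, yet still run at low occupancy if each SM holds only a few warps.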