TechEcho

Full screen triangle optimization

115 points by rck about 2 years ago

9 comments

ddren about 2 years ago
I wonder how this is implemented in the GPU. From my time working on a 3D renderer a long time ago, triangles with offscreen vertices would be clipped into smaller triangles, so in the end you would still be rendering multiple triangles anyway. I imagine it would also be possible to clip the scanlines instead.
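A minimal sketch of the "clip the scanlines instead" idea. It assumes a toy software rasterizer, not the behavior of any particular GPU. There is no geometric clipping: the triangle's pixel bounding box is clamped to the viewport, and each pixel center is tested against the three edge functions. This is close in spirit to what is usually called guard-band clipping. The names `edge` and `rasterize_clamped` are hypothetical.

```python
import math


def edge(ax, ay, bx, by, px, py):
    # Twice the signed area of triangle (a, b, p); the sign says which side of a->b p is on.
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def rasterize_clamped(tri, width, height):
    """Return the set of (x, y) pixels whose centers lie inside `tri`.

    `tri` is three (x, y) vertices in pixel coordinates. Vertices may lie far
    outside the [0, width) x [0, height) viewport: instead of clipping the
    triangle into smaller ones, only the bounding box is clamped.
    """
    (x0, y0), (x1, y1), (x2, y2) = tri
    area = edge(x0, y0, x1, y1, x2, y2)
    if area == 0:
        return set()  # degenerate triangle covers no pixels
    # Clamp the bounding box to the viewport; offscreen vertices cost nothing more.
    min_x = max(0, math.floor(min(x0, x1, x2)))
    max_x = min(width - 1, math.ceil(max(x0, x1, x2)))
    min_y = max(0, math.floor(min(y0, y1, y2)))
    max_y = min(height - 1, math.ceil(max(y0, y1, y2)))
    covered = set()
    for y in range(min_y, max_y + 1):
        for x in range(min_x, max_x + 1):
            px, py = x + 0.5, y + 0.5  # sample at the pixel center
            w0 = edge(x1, y1, x2, y2, px, py)
            w1 = edge(x2, y2, x0, y0, px, py)
            w2 = edge(x0, y0, x1, y1, px, py)
            # Inside if every edge function has the same sign as the triangle's area.
            if area > 0 and w0 >= 0 and w1 >= 0 and w2 >= 0:
                covered.add((x, y))
            elif area < 0 and w0 <= 0 and w1 <= 0 and w2 <= 0:
                covered.add((x, y))
    return covered
```

For a full-screen triangle such as `[(0, 0), (2 * width, 0), (0, 2 * height)]`, the offscreen corners only widen the unclamped bounding box. The pixel loop never leaves the viewport and touches each visible pixel once.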
londons_explore about 2 years ago
A bigger reason to do this is that on some (shoddy) hardware, the user sees a tear line along the diagonal of the triangles.

It's as if sometimes one triangle was rendered before the vsync, while the other was rendered after it.
obl about 2 years ago
> In actual hardware shading is done 32 or 64 pixels at a time, not four. The problem above just got worse.

While it's true that there is "wasted" execution in 2x2 quads for derivative computation, it's absolutely not the case that all lanes of a hardware thread (warp / wavefront) have to come from the same triangle. That would be insanely inefficient.

I don't think it's publicly documented how the rasterizer packs quads into lanes on modern GPUs. I'd guess something opportunistic (maybe per tile) that takes advantage of the general spatial coherency of triangles in mesh order.
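Here is a rough sketch that makes the 2x2-quad cost concrete. It uses a deliberately simplified model: pixel-center sampling with inclusive edges, and every 2x2 quad a triangle touches costs four shader invocations for that triangle. It ignores how quads are packed into warps or wavefronts, which, as noted above, is not publicly documented. The sketch compares one oversized triangle with two triangles split along the diagonal. The helper names are hypothetical.

```python
def inside(tri, px, py):
    """True if (px, py) is inside triangle `tri` (either winding, edges inclusive)."""
    (x0, y0), (x1, y1), (x2, y2) = tri
    d0 = (x1 - x0) * (py - y0) - (y1 - y0) * (px - x0)
    d1 = (x2 - x1) * (py - y1) - (y2 - y1) * (px - x1)
    d2 = (x0 - x2) * (py - y2) - (y0 - y2) * (px - x2)
    has_neg = d0 < 0 or d1 < 0 or d2 < 0
    has_pos = d0 > 0 or d1 > 0 or d2 > 0
    return not (has_neg and has_pos)


def quad_invocations(tris, width, height):
    """Shader invocations if each 2x2 quad a triangle touches costs 4 lanes for it."""
    total = 0
    for tri in tris:
        quads = set()
        for y in range(height):
            for x in range(width):
                if inside(tri, x + 0.5, y + 0.5):
                    quads.add((x // 2, y // 2))  # quad that owns this pixel
        total += 4 * len(quads)
    return total


full = [(0, 0), (16, 0), (0, 16)]  # one triangle covering the 8x8 screen
split = [[(0, 0), (8, 0), (0, 8)], [(8, 0), (8, 8), (0, 8)]]  # two-triangle quad
print(quad_invocations([full], 8, 8), quad_invocations(split, 8, 8))
```

In this 8x8 toy case the split version costs 80 invocations versus 64, because the 4 quads straddling the diagonal are shaded once per triangle. Only a band of quads along the diagonal is affected, so the relative overhead shrinks as the resolution grows.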
ttoinou about 2 years ago
Why didn't they ever implement a rectangle primitive to draw instead of a triangle? Anyway, the performance impact here is negligible.
nsajko about 2 years ago
> In my microbenchmark[1] the single triangle approach was 0.2% faster than two.

Sounds like something that would be within the margin of error? It seems especially meaningless because it's just the *average* of the timings, instead of something that would visualize the distribution, like a histogram or KDE.
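One way to check whether a 0.2% gap is inside the noise is to bootstrap a confidence interval for the relative difference in mean frame time. This is a sketch under assumptions: you have per-frame timings for both variants (the article's raw data isn't shown here), and frames are treated as independent samples, which ignores any autocorrelation between consecutive frames. The function name and parameters are hypothetical.

```python
import random
import statistics


def bootstrap_rel_diff(a, b, n_resamples=2000, seed=0):
    """Relative difference mean(b) vs mean(a), plus a 95% percentile-bootstrap CI.

    `a` and `b` are per-frame timings (e.g. milliseconds) for the two variants.
    If the interval contains 0, the difference is not distinguishable from
    run-to-run noise at this sample size.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    point = (statistics.fmean(b) - statistics.fmean(a)) / statistics.fmean(a)
    diffs = []
    for _ in range(n_resamples):
        ra = rng.choices(a, k=len(a))  # resample each variant with replacement
        rb = rng.choices(b, k=len(b))
        ma = statistics.fmean(ra)
        diffs.append((statistics.fmean(rb) - ma) / ma)
    diffs.sort()
    lo = diffs[int(0.025 * n_resamples)]
    hi = diffs[int(0.975 * n_resamples) - 1]
    return point, lo, hi
```

Plotting a histogram of the raw timings (or of `diffs`) gives the distributional view the comment asks for. The interval answers the narrower "is 0.2% real?" question.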
lukko about 2 years ago
This is interesting, but wouldn't the texture mapping / UVs also be more confusing, and possibly outweigh the benefit of the micro-optimisation?

The good thing about having 4 vertices is that you can just put a vertex position and a set of texture coordinates (x, y) on each one, and the texture maps exactly.
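This sketch shows one common way the single-triangle version handles UVs. It assumes OpenGL-style clip space ([-1, 1] on both axes) and a [0, 1] texture range. Positions (-1, -1), (3, -1), (-1, 3) and UVs (0, 0), (2, 0), (0, 2) can be derived from the vertex index alone. UV is an affine function of position, so interpolation produces exactly (0, 0) to (1, 1) across the visible screen. The code is written in Python for checking; a vertex shader would do the same arithmetic on its built-in vertex index (e.g. `gl_VertexID` in GLSL). APIs whose texture origin is top-left may need v flipped. The function names are hypothetical.

```python
def fullscreen_vertex(vertex_id):
    """Clip-space position and UV for vertex 0, 1 or 2 of a full-screen triangle."""
    # Bits of the index pick the UV corner: 0 -> (0, 0), 1 -> (2, 0), 2 -> (0, 2).
    u = float((vertex_id << 1) & 2)
    v = float(vertex_id & 2)
    # Map UV [0, 2] to clip space [-1, 3], so UV [0, 1] lands on the screen [-1, 1].
    return (u * 2.0 - 1.0, v * 2.0 - 1.0), (u, v)


def interpolate_uv(x, y):
    """Barycentrically interpolate the UV at clip-space point (x, y)."""
    (p0, t0), (p1, t1), (p2, t2) = (fullscreen_vertex(i) for i in range(3))
    det = (p1[0] - p0[0]) * (p2[1] - p0[1]) - (p2[0] - p0[0]) * (p1[1] - p0[1])
    w1 = ((x - p0[0]) * (p2[1] - p0[1]) - (p2[0] - p0[0]) * (y - p0[1])) / det
    w2 = ((p1[0] - p0[0]) * (y - p0[1]) - (x - p0[0]) * (p1[1] - p0[1])) / det
    w0 = 1.0 - w1 - w2
    u = w0 * t0[0] + w1 * t1[0] + w2 * t2[0]
    v = w0 * t0[1] + w1 * t1[1] + w2 * t2[1]
    return u, v
```

UVs outside [0, 1] only occur on the part of the triangle that falls outside the screen. On the visible rectangle the mapping is the same as with a four-vertex quad.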
teucris about 2 years ago
> In my microbenchmark[1] the single triangle approach was 0.2% faster than two. We are definitely deep into micro-optimization territory here :)

In the 3D graphics space, this kind of knuckle-shaving is deeply revered!
ladon86 about 2 years ago
Would this still be true on a tiled rendering GPU, i.e. mobile?

If not, is there any possibility that dividing a full-screen quad into _more_ triangles would actually end up being faster?
ww520 about 2 years ago
That's a pretty neat trick: letting the GPU exclude the out-of-bounds regions of the enlarged triangle and render only the visible rectangle.