
Well, sure; but the problem of font rendering specifically is an "embarrassingly parallel" one, isn't it? If you've got 1000 glyphs at a specific visual size to pre-cache into alpha-mask textures; and you've got 1000 GPU shader cores to compute those glyphs on; then each shader core only needs to compute one glyph once.

Can a CPU really be so much faster than these cores that it can run this Turing-complete font rendering program (which, to be clear, is already an abstract machine run through an interpreter either way, whether implemented on the CPU or the GPU) consisting of O(N) interpreted instructions, O(N) times, for a total of O(N^2) serial CPU computation steps, in less than the time it takes the O(N) GPU cores to run only O(N) serial computation steps each? Especially on a modern low-power system (e.g. a cheap phone), where you might only have 2-4 slow CPU cores, but still have a bounty of (equally slow) GPU cores sitting there doing mostly nothing? If so, CPUs are pretty amazing.

But even if it were true that it'd be faster in some sense (time to first pixel, where the first rendered glyph becomes available?) to render on the CPU — accelerators don't just exist to make things faster, they also exist to offload problems so the CPU can focus on things that are its comparative advantage.

Analogies:

- An apprentice tradesperson doesn't have to be better at a delegated task than their mentor is; they only need to be good enough at the task to free up some time for the mentor to focus on getting something higher-priority done, that the mentor can do and the apprentice (currently) cannot. For example, the apprentices working for master oil painters did the backgrounds, so the master could focus on portrait details + anatomy. The master could have done the backgrounds faster! But then that time would be time not spent working on the foreground.

- Ethernet cards. CPUs are fast enough to "bit bang" even 10GbE down a wire just fine; but except in very specific situations (e.g. dedicated network switches where the CPU wants to process every packet synchronously as it comes in), it's better that they don't, leaving the (slower!) Ethernet MCU to parse Ethernet frames, discard L2-misdirected ones, and DMA the rest into kernel ring-buffer memory.

- Audio processors in old game consoles like the SNES's S-SMP and the C64's SID — yes, the CPU could do everything these could do, and faster; but if the CPU had to keep music samples playing in realtime, it wouldn't have much time to do things like gameplay (which usually goes together with playing music samples!)

Offloading font (or generalized implicit-shape) rendering to the GPU might not make sense if you're just computing letterforms for billboard textures in a static 3D scene (rather the opposite!) but in a game that wants to do things like physics and AI on the CPU, load times can likely be shorter with the GPU tasked with the font rendering, no? Especially since the rendered glyph-textures then don't have to be loaded into VRAM, because they're already there.



Having a queue of 1,000 independent work items doesn't mean something is "embarrassingly parallel". Operating systems are a classic example of something that's hard to parallelize, and they have 1,000 independent processes to schedule and manage. Heterogeneous tasks make parallelism hard!

Cores in GPUs do not operate independently, they have hierarchies of memory and command structure. They are good at sharing some parts and terrible at sharing other parts.

Exploiting the parallelism of a GPU in the context of curve rasterization is still an active research problem (Raph Levien, who has posted elsewhere in this thread, is one of the people doing the research), and it's not easy.

I refrained from commenting on the specifics of how curves are rasterized, but if you want to imagine it, think about a letter, maybe a large "g", think about the points that make up its outline, and then come up with an algorithm to determine whether a specific point is inside or outside that outline. What you'll quickly realize is that there's no local solution, only global ones: you have to test for intersections against all the curves to know whether a given pixel is inside or outside the outline, and that sort of problem is serial.
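To make the "global, not local" point concrete, here's a minimal sketch of the classic even-odd (crossing-count) inside/outside test, with the glyph's curves flattened to straight segments for brevity (real glyphs use quadratic/cubic Béziers, but the shape of the problem is the same). Note that every segment of the outline must be examined for every query point; there's no way to answer from a pixel's local neighborhood alone.

```python
def point_in_outline(px, py, segments):
    """Even-odd test: is (px, py) inside the closed outline?

    segments: list of ((x0, y0), (x1, y1)) pairs forming closed contours.
    Casts a horizontal ray to the right and counts crossings; an odd
    count means the point is inside. Every segment must be tested.
    """
    crossings = 0
    for (x0, y0), (x1, y1) in segments:
        # Does this segment straddle the ray's scanline at y = py?
        if (y0 > py) != (y1 > py):
            # x-coordinate where the segment meets the scanline
            x_at_py = x0 + (py - y0) * (x1 - x0) / (y1 - y0)
            if x_at_py > px:
                crossings += 1
    return crossings % 2 == 1


# A unit square as four segments:
square = [((0, 0), (1, 0)), ((1, 0), (1, 1)),
          ((1, 1), (0, 1)), ((0, 1), (0, 0))]
```

Production rasterizers use the nonzero winding rule and handle curve monotonicity and edge cases carefully, but the O(segments) cost per query is the same.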

The work division you want (do a bit of work for each curve), is exactly backwards from the work division a normal GPU might give you (do a bit of work for each pixel), pushing you towards things like compute shaders.

I could go on, but this comment thread is already too deep.


That's super interesting, actually!

> The work division you want (do a bit of work for each curve), is exactly backwards from the work division a normal GPU might give you (do a bit of work for each pixel)

Doesn't this mean that you could:

1. entirely "offline", at typeface creation time:

1a. break glyphs into their component "convex curved region tiles" (where each region is either full, empty, or defined by a curve with zero inflection points)

1b. deduplicate those tiles (anneal glyph boundaries to minimize distinct tiles; take advantage of symmetries), to form a minimal set of such curve-tiles, and assign those sequence numbers, forming a "distinct curves table" for the typeface;

1c. restate each glyph as a grid of paint-by-numbers references (a "name table", to borrow the term from tile-based consoles) where each grid position references its tile + any applied rotation+reflection+inversion

2. Then, at scene-load time,

2a. take each distinct curve from the typeface's distinct-curves table, at the chosen size;

2b. generate a (rather large, but helpfully at most 8bpp) texture as follows: for all distinct-curve tiles (U pos), for all potential angled-vector-line intersections (V pos), copy the distinct-curve tile and serialize the intersection data into pixels beside it;

2c. run a compute shader to operate concurrently over the workload tiles in this texture to generate an output texture of the same dimensions, that encodes, for each workload, the alpha-mask for the painted curve for the specified angle, iff the intersection test was good (otherwise generating a blank alpha-mask output);

2d. (this is the part I don't know whether GPUs can do) parallel-reduce the UxV tilemap into a Ux1 tilemap, by taking each horizontal strip, and running a pixel-shader that ORs the tiles together (where, if step 2c is done correctly, at most one tile should be non-zero per strip!)

2e. treat this Ux1 output texture as a texture atlas, and each typeface nametable as a UV map for said texture atlas, and render the glyphs.
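Step 2d's reduction is the easy part, at least conceptually. Here's a minimal CPU sketch of it in plain Python (standing in for the pixel-shader passes; the tile layout and names are my assumptions, not anything from the scheme above): each strip's V tiles are bitwise-ORed together, which works as a "pick the non-blank tile" operation precisely because step 2c is supposed to leave at most one non-zero tile per strip.

```python
def or_reduce_strips(workload):
    """Collapse a UxV tilemap into a Ux1 tilemap by ORing along V.

    workload[u][v] is one tile as a flat list of 8-bit alpha values.
    If at most one tile per strip is non-zero (step 2c's contract),
    OR-ing is equivalent to selecting the winning tile.
    """
    strips = []
    for strip in workload:
        acc = [0] * len(strip[0])
        for tile in strip:
            acc = [a | t for a, t in zip(acc, tile)]
        strips.append(acc)
    return strips
</antml_code_fence_placeholder>```

On a real GPU this would be a log2(V)-pass pairwise reduction rather than a serial loop, but the OR trick is the same; GPUs handle this kind of reduction fine.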

To be clear, I'm not expecting that I came up with an off-the-cuff solution to an active "independent research problem" here; I'm just curious why it doesn't work :)


If you allow yourself to do this work offline, that's one thing, but keep in mind that 2D realtime graphics are a requirement. People still need to render SVGs, HTML5 canvas, the CSS drawing model, etc. Grid fitting might eventually fall out of favor for fonts, but while it's around it means you need different outlines for different sizes of the same font. See Behdad's excellent document on the difficulties of text subpixel rendering and layout [0]. There are also things like variable fonts which we might want to support.

Breaking an outline into region tiles such that each tile contains at most one region might be too fine-grained (think about tiger.svg), and is probably comparable in cost to rasterizing on the CPU, so there's not much of a gain there. That said, tiled approaches are very popular, so you're definitely on to something, though tiles often contain multiple elements.

Down this path lie ideas like Pathfinder 3, Massively Parallel Vector Graphics (Ganacim et al.), and my personal favorite, the work of adamjsimmons. I have to read this comment [1] a bit between the lines, but I think the idea is basically that a quadtree or other form of BVH containing which curves touch which parts of the glyph is computed on the CPU, and then the pixel shader only evaluates the curves it knows are relevant for its pixel. Similar in a lot of ways to Behdad's GLyphy.
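For readers wanting the flavor of that CPU-side acceleration structure: here's a hedged sketch of the binning step, using a flat uniform grid in place of a quadtree/BVH (all names here are illustrative, not anyone's actual implementation). Each curve's bounding box is dropped into every grid cell it overlaps; the per-pixel shader then only runs the inside/outside test against the curves listed for its own cell instead of the whole outline.

```python
def bin_curves(curves, grid_w, grid_h, cell):
    """Assign curves to grid cells by bounding box overlap.

    curves: list of (min_x, min_y, max_x, max_y) bounding boxes.
    Returns a dict mapping (cell_x, cell_y) -> list of curve indices,
    so a pixel shader for that cell can test only those curves.
    """
    bins = {}
    for i, (x0, y0, x1, y1) in enumerate(curves):
        for cy in range(int(y0 // cell), int(y1 // cell) + 1):
            for cx in range(int(x0 // cell), int(x1 // cell) + 1):
                if 0 <= cx < grid_w and 0 <= cy < grid_h:
                    bins.setdefault((cx, cy), []).append(i)
    return bins
```

A quadtree refines this by subdividing only where curves are dense, but the payoff is the same: the per-pixel work drops from "all curves in the glyph" to "curves near this pixel".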

I have my own ideas I eventually want to try on top of this as well, but I think using a BVH is my preferred way to solve this problem.

[0] https://docs.google.com/document/d/1wpzgGMqXgit6FBVaO76epnnF... [1] https://news.ycombinator.com/item?id=18260138

EDIT: You changed this comment between when I was writing and when I posted it, so it's not a reply to the new scheme. The new scheme doesn't seem particularly helpful for me. If you want to talk about this further to learn why, contact information is in my HN profile.


> If you've got 1000 glyphs at a specific visual size to pre-cache into alpha-mask textures;

How often does that happen? There are definitely languages where that's a plausible scenario (e.g. Chinese), but for the majority of written languages you have well under 100 commonly used glyphs for any given font style.

And as you noted, you cache these to an alpha texture, so you'd need all 1,000 of those glyphs to show up in the same frame for the pre-cache to even matter.

> Especially on a modern low-power system (e.g. a cheap phone), where you might only have 2-4 slow CPU cores, but still have a bounty of (equally slow) GPU cores sitting there doing mostly nothing?

But the GPU isn't doing nothing. It's already doing all the things it's actually good at like texturing from that alpha texture glyph cache to the hundreds of quads across the screen, filling solid colors, and blitting images.

Rather, typically it's the CPU that is consistently under-utilized. Low end phones still tend to have 6 cores (even up to 10 cores), and apps are still generally bad at utilizing them. You could throw an entire CPU core at doing nothing but font rendering and you probably wouldn't even miss it.

The places where GPU rendering of fonts becomes interesting are when glyphs get huge, or for things like smoothly animating across font sizes (especially with things like variable fonts). High-end hero features, basically. For the simple task of text as used on e.g. this site? Simple CPU-rendered glyphs to an alpha texture are easily implemented and plenty fast.
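The "CPU renders, GPU samples" pattern described here is simple enough to sketch. This is a minimal illustration, not any real renderer's code: a cache keyed by (glyph, size) that rasterizes on miss and packs the result into a shared alpha texture with a naive shelf packer; `rasterize` is a placeholder for the real CPU rasterizer (e.g. FreeType).

```python
class GlyphAtlas:
    """Cache of CPU-rasterized glyphs packed into one alpha texture."""

    def __init__(self, width, height, rasterize):
        self.width, self.height = width, height
        self.rasterize = rasterize      # (glyph_id, size) -> (w, h, alpha bytes)
        self.cache = {}                 # (glyph_id, size) -> (x, y, w, h) rect
        self.pen_x = self.pen_y = self.row_h = 0

    def get(self, glyph_id, size):
        key = (glyph_id, size)
        if key in self.cache:           # hot path: glyph already on the texture
            return self.cache[key]
        w, h, _alpha = self.rasterize(glyph_id, size)
        if self.pen_x + w > self.width:  # shelf full: start a new row
            self.pen_x, self.pen_y = 0, self.pen_y + self.row_h
            self.row_h = 0
        rect = (self.pen_x, self.pen_y, w, h)
        # (real code would upload `_alpha` into the texture region here)
        self.pen_x += w
        self.row_h = max(self.row_h, h)
        self.cache[key] = rect
        return rect
```

After the miss path runs once per (glyph, size), every subsequent frame is just the GPU texturing quads from the atlas, which is exactly what it's good at.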



