Question for those smarter than me: What is an application for an int128 type anyways? I've never personally needed it, and I laughed at RISC-V for emphasizing that early on rather than... standardizing packed SIMD.
Cryptography would be one application. Many crypto libraries use an arbitrary-size `bigint` type, but the algorithms typically use modular arithmetic on fixed-width types (128-bit, 256-bit, 512-bit, or something in between like 384 bits).
They're typically implemented with arrays of 64-bit or 32-bit unsigned integers, but if 128-bit integers were available in hardware, we could get a performance boost. Any arbitrary-precision integer library would benefit from 128-bit hardware integers.
SIMD is for performing parallel operations on many smaller types. It can help with some cryptography, but it doesn't necessarily help when performing single arithmetic operations on larger types. It does, however, help when performing logic and shift operations on larger types.
If we were performing 128-bit arithmetic in parallel over many values, then a SIMD implementation may help, but without a SIMD equivalent of `addcarry`, there's a limit to how much it can help.
Something like this could potentially be added to AVX-512 for example by utilizing the `k` mask registers for the carries.
The best we have currently is `adcx` and `adox` which let us use two interleaved addcarry chains, where one utilizes the carry flag and the other utilizes the overflow flag, which improves ILP. These instructions are quite niche but are used in bigint libraries to improve performance.
> but it doesn't necessarily help when performing single arithmetic operations on larger types.
For the curious, AFAIU the problem is the dependency chains. For example, for simple bignum addition you can't just naively perform all the adds on each limb in parallel and then apply the carries in parallel; the addition of each limb depends on the carries from the previous limbs. Working around these issues with masking and other tricks typically ends up adding too many additional operations, resulting in lower throughput than non-SIMD approaches.
There are quite a few papers on using SIMD to accelerate bignum arithmetic for single operations, but they all seem quite complicated and heavily qualified. The threshold for eking out any gain is quite high, e.g. a minimum of 512-bit numbers or much larger, depending. And they tend to target complex or specialized operations (not straight addition, multiplication, etc.) where clever algebraic rearrangements can profitably reorder dependency chains for SIMD specifically.
You can find a lot of motivation for 128-bit integers in that paper, such as fixed-point operations, implementing 128-bit (decimal) floating-point, financial calculations, cryptography, etc. However, the proposal has been superseded by P3666, which aims to bring C23's _BitInt type to C++, which wouldn't just allow for 128-bit integers (as _BitInt(128)) but for any other width as well.
I implemented a rational number library for media timestamps (think CMTime, AVRational, etc.) that uses 64-bit numerators and denominators. It uses 128-bit integers for intermediate operations when adding, subtracting, multiplying, etc. It even uses 128-bit floats (represented as 2 doubles and using double-double arithmetic[1]) for some approximation operations, and even 192-bit integers in one spot (IIRC it's multiplying a 128-bit int by a 64-bit int where I just want the high bits, so it shifts back down to 128 bits immediately after the multiplication).
I keep meaning to see if work will let me open source it.
The last time I used one I wanted UNIX timestamps + fractional seconds. Since there was no difference between adding 1 bit or 64, I just gave it 32 bits for the fraction and 32 more bits for the integral part.
I made a time sync library over local network that had to be more precise than NTP and used i128 to make sure the i64 math I was doing couldn't overflow.
i32 didn't cover enough time span, and f64 has edge cases from the nature of floats. This was for Windows (MSVC, not GCC), so I had to roll my own i128.
We use them for exact predicates in our mesh booleans library. To really handle every degenerate case, we even have to go quite a bit higher than 128-bit in 3D.