Windows 9x can run 16-bit realmode (V86), 16-bit protected mode, and 32-bit protected mode code in the same process by using different segment descriptors. Too bad amd64 wasn't compatible with that model, nor the virtualisation features that came afterwards, or Intel could've made ARM32/64-mode segments a reality if they decided to add an ARM decoder to their microarchitecture.
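For anyone wondering how "different segment descriptors" pick the code width: a rough sketch, assuming the standard x86 descriptor layout where bit 53 is the L (long) flag and bit 54 is the D (default size) flag. The function name and the sample descriptor values are illustrative, not taken from any particular OS. Note that V86 mode itself is selected by the VM flag in EFLAGS rather than by a descriptor, so this only covers the protected-mode cases.

```python
# Flag positions inside a 64-bit x86 code-segment descriptor.
L_BIT = 1 << 53  # long flag: 64-bit code segment (only meaningful in long mode)
D_BIT = 1 << 54  # default operand size: 1 = 32-bit, 0 = 16-bit


def code_segment_width(descriptor: int) -> int:
    """Return the default code width (16, 32, or 64) a descriptor selects."""
    long_flag = bool(descriptor & L_BIT)
    default_32 = bool(descriptor & D_BIT)
    if long_flag and default_32:
        # L=1 together with D=1 is a reserved combination.
        raise ValueError("L=1 with D=1 is reserved")
    if long_flag:
        return 64
    return 32 if default_32 else 16


# Illustrative flat-model descriptors (hypothetical sample values).
FLAT_CODE32 = 0x00CF9A000000FFFF  # G=1, D=1, L=0
FLAT_CODE64 = 0x00AF9A000000FFFF  # G=1, D=0, L=1
CODE16 = 0x00009A000000FFFF       # G=0, D=0, L=0

widths = [code_segment_width(d) for d in (CODE16, FLAT_CODE32, FLAT_CODE64)]
print(widths)
```

A far jump or call to a selector that points at one of these descriptors is what switches the decoder between widths, which is why one process could mix them.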
> ... 16-bit realmode (V86), 16-bit protected mode, and 32-bit protected mode code in the same process by using different segment descriptors...
> ...Intel could've made ARM32/64-mode segments a reality...
While I myself admire this particular breed of masochism, the direction that Intel currently wants to take is apparently quite the opposite.
In May last year, they proposed X86S[1][2][3], which tosses out 16-bit support completely, along with 32-bit kernel mode (i.e. the CPU boots directly into 64-bit mode, and 32-bit code is only supported in ring 3).
The proposal trims a lot of historical baggage, including fancy segmentation/TSS shenanigans, privilege rings 1 & 2, I/O port access from ring 3, non-flat memory models, etc... limiting the CPU to 64 bit kernel mode, and 64 or 32 bit x86 user mode. With the requirement for 64 bit kernel mode, it effectively also removes un-paged memory access.
The TSS was always one of the most obnoxious aspects of the 80286 that stuck around much longer than it should have. On the 386 or anything newer, hardware task switching through it was _slower_ than doing the switch in software, yet you still needed a TSS for things like task gates and the stack switch on exceptions and interrupts.
If anyone actually has a serious need to use ancient 16 bit software, emulators like 86Box work very well. Software that old doesn’t really need performance faster than, say, a Pentium 90, which 86Box has no trouble achieving on my M1 (ARM) MacBook.
You can also use winevdm[1] on modern 64 bit Windows operating systems. I have this in production use for a niche case where someone can’t give up a particular 16 bit app, and I didn’t want to tangle with a VM for them.
The technical details of making sure a modern CPU still functions exactly like an 80386, which in turn made sure it functioned like an 80286, when you fire up a 16 bit task on, say, 32-bit Windows 10 (or 64-bit with something like winevdm[1]) sound like a nightmare for a microcode engineer or QA tester.
Oh, it doesn't; AMD and Intel gave up on that a while back. V8086 mode might still work, but I'd guess it has quite a bit of errata. Everything else has most certainly changed. CPUs don't support the A20 gate, for example. Nor do they truly support real mode (they boot in 'unreal mode' now). If you want a 386-compatible, you're looking at ALi or DM&P CPUs that are basically Pentium/486/386 clones.
I'd argue the break started with the Pentium Pro, at that point things shifted architecturally.
The 80286 and 80386 never had special support for the "A20 gate". That was provided by (often slow) external circuitry.
Some later CPUs (I cannot remember which) integrated the A20 gate on-chip to improve performance.
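To make the A20 point concrete, here is a small sketch of what the gate actually does to a real-mode address. It assumes the usual segment*16+offset calculation and models the gate as simply forcing address line 20 low; the function name is made up for illustration.

```python
def physical_address(segment: int, offset: int, a20_enabled: bool) -> int:
    """Real-mode address calculation with an optional A20 mask."""
    # A 16-bit segment shifted left by 4 plus a 16-bit offset can reach 0x10FFEF,
    # which needs a 21st address line (A20).
    linear = ((segment & 0xFFFF) << 4) + (offset & 0xFFFF)
    if not a20_enabled:
        # Gate closed: A20 is forced to 0, so addresses wrap at 1 MiB like on an 8086.
        linear &= ~(1 << 20)
    return linear


# FFFF:0010 is the first byte past 1 MiB.
wrapped = physical_address(0xFFFF, 0x0010, a20_enabled=False)
high = physical_address(0xFFFF, 0x0010, a20_enabled=True)
print(hex(wrapped), hex(high))
```

With the gate closed the access lands back at address 0, which is the 8086 behaviour some old software relied on; with it open you get the extra ~64 KiB above 1 MiB.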
The P6 was a complete implementation of the 80286 and 80386, Virtual 8086 mode, TSS, and all - you could boot DOS or an 80286 operating system on a P6 without any problems, although the design was not optimised to improve performance of 16-bit software. This was enough of a problem that they rolled back that design by the Celeron era because there were still a lot of people using 16-bit apps.
Actually it did get used. Linux and Windows used the x86 TSS for process context-switching for years.
During that time, Linux had a limit on the number of processes, which was due to the maximum number of TSS entries that fit in the x86 GDT.
Eventually the Linux kernel was changed to the more versatile context-switch method it uses today. Among other things, this change was important for thread performance, as thread context switches can skip the TLB flush. Same for kernel mode tasks. Software task switching also greatly increased the number of processes and threads that can be launched, from about 8000 (across all CPU cores) to millions.
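The "about 8000" figure falls out of the GDT size limit. A back-of-the-envelope sketch, assuming a 16-bit GDTR limit field and 8-byte descriptors; the `entries_per_task` and `reserved_entries` parameters are hypothetical knobs for illustration, not numbers taken from any specific kernel version.

```python
GDTR_LIMIT_BITS = 16  # the GDTR limit field is 16 bits wide
DESCRIPTOR_BYTES = 8  # legacy (32-bit mode) descriptors are 8 bytes each

max_gdt_bytes = 1 << GDTR_LIMIT_BITS                 # 65536 bytes
max_gdt_entries = max_gdt_bytes // DESCRIPTOR_BYTES  # 8192 descriptors


def max_tasks(entries_per_task: int, reserved_entries: int) -> int:
    """Upper bound on tasks when each task needs its own GDT entries.

    Both parameters are assumptions: a kernel reserves some entries for its
    own segments and may spend more than one entry per task.
    """
    return (max_gdt_entries - reserved_entries) // entries_per_task


print(max_gdt_entries)
print(max_tasks(entries_per_task=1, reserved_entries=0))
```

If a kernel spends two entries per task (say, a TSS descriptor plus an LDT descriptor) the ceiling halves again, so the practical limit depends on the layout the kernel picks.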
> the direction that Intel currently wants to take is apparently quite the opposite.
It's not just Intel. It's clear that ARM is also going in the same direction, by allowing newer cores to be 64-bit (AArch64) only, dropping compatibility with the older 32-bit ARM ISA (actually three ISAs: traditional 32-bit ARM, Thumb, and Thumb2), and IIRC some manufacturers of ARM-based chips are already doing that.
Allegedly there are already off-list SKUs from both AMD and Intel that don't support 16/32-bit code and boot up without the legacy bits. How far did they go with that? I don't know. I'd hope they removed the LDT etc. and reduced the GDT to just FS and GS (or just used FS base and GS base MSRs).
A tiny amount of die area, a huge amount of engineering and validation effort. If segmentation issues can cause the register renamer to lose track of who owns a physical register, that's the sort of issue that's terrible to find and debug but which also can't be allowed in a real device. Intel has traditionally been able to just throw more engineers at the problem than their competitors, but I'm not sure that'll be the case going forwards.
Mainline OSes have been 64-bit for about 15-20 years by this point; the point is to trim the parts of x86 that aren't used when running a 64-bit OS.
Notice that only 32bit kernel/R-0 is removed, but not usermode/R-3 so even when reducing this your 64bit Windows will still run clean 32bit software built for Win95 from the 90s.
Even today you need to run a virtualized 32-bit OS to run old 16-bit software. The downside is that under X86S a 32-bit guest OS would have to be emulated instead of hardware-virtualized, assuming the virtualization solution allows that at all.
> Intel apparently forgot what made them worth choosing over competitors like ARM
People (myself and others I know) choose ARM chips because they don't absolutely mandate the purchase of sanctioned chipsets/other supporting components you don't have access to, impossible-to-obtain specs, etc.