I'd love to see what is possible with a tiny runtime/OS in the kilobyte size and running a microservice written in a native language off of each core, everything out of the L3 cache.
I imagine the throughput would be amazing. Single thread per core, e.g. cooperative multitasking. Do this for stream orientated workflows, or even for processing data that is in reasonable sized chunks, it might be screaming fast!
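To make the "single thread per core, cooperative multitasking" idea concrete, here is a minimal sketch using Python generators as stand-ins for tasks. The names `worker` and `run_round_robin` are made up for illustration, and a real bare-metal runtime would do this in a native language, but the control flow is the same: each task runs until it voluntarily yields, so there is no preemption and no locking.

```python
from collections import deque

def worker(name, chunks, log):
    """A task that handles one chunk, then voluntarily gives up the core."""
    for chunk in chunks:
        log.append((name, chunk))   # stand-in for real per-chunk work
        yield                       # cooperative yield point

def run_round_robin(tasks):
    """Run generator-based tasks on a single thread until all have finished."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)              # run until the task's next yield
        except StopIteration:
            continue                # task finished, drop it
        ready.append(task)          # otherwise requeue at the back

log = []
run_round_robin([worker("a", [1, 2], log), worker("b", [10], log)])
print(log)  # [('a', 1), ('b', 10), ('a', 2)]
```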
When I was younger, I was very pleased when I stumbled across the trick of putting all of a DOS system on a RAM-disk, which made the system fly. Then I got to college and discovered in my CS classes that CPUs now had multi-megabyte on-CPU cache, which immediately led me to wonder how hard it would be to make a DOS system that ran completely out of cache. I'm not convinced that it would be practical for any number of reasons, but it still blows me away every time I look at what modern CPUs have in-package:)
Coreboot does something similar to this. Very early in the boot process the RAM has not yet been initialized and so cannot be used. You need to execute some code to set up the RAM before you can use it. One approach to handling this is to write that initialization code in assembly language and make sure it only uses registers and doesn't touch any memory.
But that's inconvenient. You'd much rather be able to write your code in something like C. Any C code compiled by an ordinary compiler is going to access memory (at least for the stack). Writing and maintaining a custom C compiler for this is a lot of work and still comes with a lot of limitations on the C code that it will accept.
So they set up the cache in a way that it will never try to write its contents out to RAM. Then you can use the output of an ordinary C compiler that uses memory accesses and those accesses will all be served by the cache and never touch RAM. They call this "Cache as RAM".
With some work you could probably boot into DOS like this.
Is there a way to mess with the cache or MMU to prevent it from _trying_ to write out to DRAM, and so be able to run a minimal system purely from cache with the DRAM slots completely empty?
Interesting that no one to my knowledge has built an architecture where sram is ‘memory’ such that it would be transparent to C code but no slower dram is used. 32MB is more than my 1998 computer had.
This is effectively what you do with some of the smaller MCUs -- you're running out of a megabyte or two of on-die static RAM. This is more to eliminate latency in RTOSes than it is to improve throughput but it's the same idea.
More than eliminating latency, I guess it is done to eliminate latency variance, which is inherent in the relatively unpredictable behaviour of caches, and which has to be bounded in order to characterize worst-case scheduling of realtime systems.
Yeah, I've worked on projects with Toshiba MCUs with tens of kB of RAM, and the last truly embedded project I was on used a safety-critical MCU from TI with about a megabyte of internal ECC RAM.
> I'm not convinced that it would be practical for any number of reasons, but it still blows me away every time I look at what modern CPUs have in-package:)
The optimism is premature. A CPU may flush its cache even when it keeps accessing exactly the same page. Modern CPUs have too much completely unfathomable prefetch and cache coherence logic.
Ha. I still use ram disk for local training of machine learning models. You can get a SOLID throughput boost by just loading up the dataset in RAM... My work desktop has something like 192Gb, so can handle a pretty sizeable dataset this way.
There is an interesting argument to be made for DSP like workloads for these chips. With x8 PCIe 4.0 lanes to carry inter-socket traffic.
If I had infinite money to spend I would build a hypercube out of these bad boys. Each "atom" would have 6 x8 PCIe connectors (up, down, left, right, front, back), with corner nodes dedicating a x16 PCIe port for I/O in and out of the cube.
Immerse that bad boy in Fluorinert and contemplate the deepest secrets of the universe :-)
Nominally hypercube architecture is 'n' dimensional[1], which relates to the number of channels at each vertex. Assuming you took the front and back plane of this thing and connected them together it would qualify as having 6 dimensions (6 channels per vertex). The first such system I had a chance to play with was the Intel one[2] which was order 6 using Intel's iAPX432 processors.
Oh, I see. If your cube is exactly 4x4x4, and you loop it in each dimension, then it's topologically equivalent to an order-6 hypercube with a node on each vertex.
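The 4x4x4 equivalence can be checked mechanically. A minimal sketch, assuming each axis position is relabelled with a 2-bit Gray code (my choice of mapping, not something stated in the thread): a wrapped ring of 4 nodes is then a 2-dimensional hypercube, and three of them together give 6 bits. The function names are made up for illustration.

```python
from itertools import product

GRAY = [0b00, 0b01, 0b11, 0b10]  # 2-bit Gray code: ring neighbours differ in one bit

def torus_to_hypercube(x, y, z):
    """Map a 4x4x4 torus coordinate to a 6-bit hypercube label."""
    return (GRAY[x] << 4) | (GRAY[y] << 2) | GRAY[z]

def torus_neighbours(x, y, z, n=4):
    """The six wrap-around neighbours (left/right, up/down, front/back)."""
    for axis in range(3):
        for step in (-1, 1):
            c = [x, y, z]
            c[axis] = (c[axis] + step) % n
            yield tuple(c)

def is_hypercube_embedding():
    """True if every torus link joins two labels that differ in exactly one bit."""
    labels = set()
    for node in product(range(4), repeat=3):
        a = torus_to_hypercube(*node)
        labels.add(a)
        for nb in torus_neighbours(*node):
            b = torus_to_hypercube(*nb)
            if bin(a ^ b).count("1") != 1:
                return False
    return len(labels) == 64  # all 64 six-bit labels are used exactly once

print(is_hypercube_embedding())  # True
```

Every node has six links and each flips a different bit, so the looped 4x4x4 grid has exactly the wiring of an order-6 hypercube.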
In the general case, you need log2(N) connections in each direction to turn an NxN or NxNxN topology into a hypercube, so it's not as simple as "left/right". (For example, for N=8, you might connect each position to position xor 1, position xor 2, and position xor 4.)
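The "position xor 1, xor 2, xor 4" rule above can be sketched in a few lines. The function name is made up, and it assumes N is a power of two:

```python
def hypercube_neighbours(position, n):
    """Neighbours of `position` along one axis of length n (n a power of two)."""
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two, at least 2")
    dims = n.bit_length() - 1                  # log2(n) links per direction
    return [position ^ (1 << k) for k in range(dims)]

print(hypercube_neighbours(5, 8))  # [4, 7, 1]
```

For N=8 that is three links per axis instead of the two ("left/right") a plain grid or ring gives you.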
OP's referring to an old architecture called "the hypercube", which involved a number (I can't remember if it was 16, 32, 64 or what) of microprocessors connected to one another in a certainly-not-2D topology.
Edit: oh a bit of searching and it was in the order of thousands of microprocessors. It was called "Connection Machine", you can read about it in Wikipedia[0].
I learned from an electrical engineer that works on systems for the power grid that running the whole OS in CPU cache is a requirement for some of those systems. I guess that makes sense when you are working with things that move at the speed of electricity.
No, the system is probably using tightly coupled memory; the CPU has single cycle access to all of system memory, effectively making a cache redundant.
And the reason is likely "we are building something realtime, we can't afford variable latency".
For the electricity grid in today's world, that's over-engineering things. The 60Hz electricity grid doesn't really need anything being changed at more than 600Hz, and considering your CPU core runs at 3,000,000,000Hz, you can totally waste a lot of cycles before you start missing deadlines.
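For scale, the cycle budget implied by those numbers (3 GHz core, 600 Hz update rate, 60 Hz grid) works out as follows. This is just the arithmetic from the comment, not a claim about any real grid controller:

```python
CPU_HZ = 3_000_000_000   # core clock from the comment
CONTROL_HZ = 600         # fastest rate at which anything needs to change
GRID_HZ = 60             # mains frequency

cycles_per_update = CPU_HZ // CONTROL_HZ        # cycles available per control update
updates_per_grid_cycle = CONTROL_HZ // GRID_HZ  # control updates per mains cycle
update_period_ms = 1000 / CONTROL_HZ            # wall-clock time per update

print(cycles_per_update)           # 5000000
print(updates_per_grid_cycle)      # 10
print(round(update_period_ms, 2))  # 1.67
```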
"This meeting is to announce we've settled on an architecture for the next gen grid optimization program: Electron UI with a Ruby on Rails app underneath the hood that calls to AWS Lambda for anything that's cpu-heavy. State is persisted to Firebase as serialized YAML, which should scale really well. And to our normal crowd of performance zealots, don't even start with your normal "latency" and "outages" griping. We're comfortable with this stack and besides, we can waste a _lot_ of cycles before we start missing deadlines!"
Even if your CPU cycle budget is large, designing a realtime system requires proving that it won't be exceeded, by adding up the worst-case running time of the worst-case sequence of events.
Avoiding high latency memory gives a few orders of magnitude of margin, which could be needed to afford good functionality on a reasonably cheap CPU; it's not a mere useless improvement from "fast enough" to "as fast as possible".
In hard realtime systems, the problem is not "we have enough time to do the computation" but "we can prove the computation will run in the available time".
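A minimal sketch of that "adding up the worst case" argument. All the task names and cycle counts below are hypothetical, picked only to show how a worst-case memory stall term can turn a comfortable budget into a missed deadline; real WCET analysis is far more involved.

```python
def fits_budget(worst_case_cycles, cpu_hz, rate_hz):
    """Return (ok, total, budget): does the summed worst case fit in one period?"""
    budget = cpu_hz // rate_hz                # cycles available per period
    total = sum(worst_case_cycles.values())   # worst case of every step, added up
    return total <= budget, total, budget

# Hypothetical worst-case cycle counts for one control update.
from_fast_memory = {"read_sensors": 200_000, "control_law": 1_500_000, "actuate": 300_000}

# Same work, plus a hypothetical bound on stalls if every access can miss to DRAM.
from_dram = dict(from_fast_memory, memory_stalls=4_000_000)

print(fits_budget(from_fast_memory, 3_000_000_000, 600))  # (True, 2000000, 5000000)
print(fits_budget(from_dram, 3_000_000_000, 600))         # (False, 6000000, 5000000)
```

The average case might look identical in both setups; it is the provable upper bound that differs.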
The complexity in modern multi-user operating systems (many processes, a run queue, spin locks) along with modern processors (multi-tier memory architecture, microcode, branch prediction, long stalls for memory access) means it's very hard to prove that anything will run within a bounded time.
IMO the terms "hard realtime" and "soft realtime" are unnecessarily confusing to beginners. I think that "deterministically bounded latency" and "probabilistically bounded latency" are more descriptive. But they're long and difficult to type, so we'd inevitably call them DBL and PBL, and be back at confusion.
And yet, so much software on modern CPUs takes seconds rather than milliseconds to perform and output simple calculations. Especially with the mantra "developer time is more expensive than computing power", few developers nowadays have recent experience of optimizing programs for sub-ms reaction times. You could probably make it work with standard hardware. But since developers have to be retrained for that field anyway, why not use hardware that's battle-tested and gives you more leeway?
In addition, the advantage of optimizing at every point is that you can use hardware not designed at the limit of what's possible, likely giving you lower error rates.
Would KolibriOS fit the constraints to do something like this? Per the description it "requires only a few megabyte disk space and 8MB of RAM to run." but has a recognizably usable GUI and office suite.
kdb+ [1] used to advertise already about a decade ago that their database code fits into L3 cache, making it extremely fast in execution. With 16MB, you can probably fit the core of a pretty much fully featured database in there, accelerating data driven jobs that GPUs are suited for.
I'm thinking of simplicity. No VMs, no containers, just a single binary.
The majority of microservices I've written would fit into a system like this just fine.
NodeJS is built around the idea that a single thread of execution is enough for a stupidly large % of backend code. On top of that is ton of infrastructure to deploy slews of 100 line JS microservices that each do one thing.
So, back to basics. No VM, no disk at runtime, a single statically linked binary running on top of nearly bare metal.
Available memory is whatever space the program code doesn't take up.
Ignoring L3 being unaddressable, I wonder if UEFI applications are powerful enough to pull something like image deployment off.
This is probably all less efficient than the current Russian doll system of VMs and containers in use right now, but it is fun to imagine!
I thought they all use custom hardware by now? But it's probably useful for applications that go beyond simple algorithms, work on time series data could be much faster.
It'd have to be a special setup for sure, a board with no memory even hanging off of it.
In theory all you'd need is some tight integration with the NIC (or a few of em!) and tiny driver that knows how to DMA to the NIC.
Larger problem of course is that L3 isn't addressable. I wonder how much money a cloud provider would have to pony up to AMD to get an exception to that? :)
A huge # of workloads would fit in 16MB with no storage hanging off. Heck I think every microservice I've ever written could manage that.
There have been some efforts of late to make a cloud first OS, honestly once you remove everything except for "talk to network controller" and "parse JSON", well so long as you don't mind writing in a language that isn't JS, a few MB is plenty for lots of workloads.
Apparently there are ways to treat the cache as a static scratch pad. It is used by BIOSes during early startup, but I'm not sure whether the magic incantations are publicly known. It is second hand knowledge though, so it might just be folklore.
I'm not sure why you are getting downvoted, because kdb+/q would be a perfect candidate. Too bad kOS is no longer a thing (or maybe it never was).
What’s interesting and surprising to me is that the new Epyc 2 chips by AMD have about the same cost per double precision teraflop as a GPU, even the ones with good floating point hardware support. I assume the cost for these accelerators is probably a little cheaper for high performance computing folk able to buy in bulk, but still I was very surprised. I expected there to be an order of magnitude difference in cost per flop, even with doubles. Once AMD introduces AVX512, the cost per flop should improve even more.
Also, in the same vein, I was surprised that double and single floating point cost per flop has stayed fairly stagnant the last couple years as NVIDIA seems focused on improving lower precision performance (ie for machine learning inference).
This is probably because you're comparing them to consumer GPUs, which are designed this way - favoring integer and single-precision functional units on the SM cores. That's a sort of a marketing/tiering strategy by nVIDIA. The Teslas have good double-precision performance - but they are priced waaaay higher than the consumer cards.
> The Teslas have good double-precision performance - but they are priced waaaay higher than the consumer cards.
Yup - the Tesla V100 is 7 TFLOPs of double-precision at around ~$9000
A huge split between consumer & HPC happened in the aftermath of the Fermi (2010) architecture. Fermi was really bad in the consumer space from all the wasted die spent on unused double-precision. It was late, hot, and loud. And barely even faster than the competition.
With Maxwell, Nvidia basically removed all the FP64 support from the architecture itself ( https://www.anandtech.com/show/9059/the-nvidia-geforce-gtx-t... - native FP64 rate is 1/32 of the FP32 rate) - the result was a huge boost to gaming performance. But it also meant that HPC users who want double-precision had to use Tesla cards. The actual architectures between GeForce & Tesla are different now, it's not "just" a lockout anymore.
Exactly. About 780 Mflops/$ for the Tesla V100 and 460 Mflops/$ (peak) for the Epyc 7742.
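Working the quoted numbers through: 7 TFLOPS of double precision at about $9000 comes to roughly 780 megaflops per dollar, not gigaflops. The sketch below takes the V100 figures from the comment above and the 460 figure for the Epyc 7742 as quoted (I have not re-derived it), then compares cost per Gflop.

```python
V100_FLOPS = 7e12              # double precision, as quoted in the thread
V100_PRICE_USD = 9000          # approximate price, as quoted in the thread
EPYC_MFLOPS_PER_USD = 460      # quoted peak figure for the Epyc 7742

v100_mflops_per_usd = V100_FLOPS / V100_PRICE_USD / 1e6

# Cost per Gflop is the inverse of flops per dollar.
v100_usd_per_gflop = 1000 / v100_mflops_per_usd
epyc_usd_per_gflop = 1000 / EPYC_MFLOPS_PER_USD
v100_saving = 1 - v100_usd_per_gflop / epyc_usd_per_gflop

print(round(v100_mflops_per_usd))     # 778
print(round(v100_usd_per_gflop, 2))   # 1.29
print(round(epyc_usd_per_gflop, 2))   # 2.17
print(round(v100_saving * 100))       # 41 (percent cheaper per Gflop)
```

So the accelerator is cheaper per double-precision flop, but by well under a factor of two rather than an order of magnitude.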
I was very surprised it was this close. I thought the accelerator would be an order of magnitude cheaper per double Gflop. And AMD isn't using AVX512, yet.
A different question is how easy it is to use those FLOPS. The memory bandwidth on a GPU is much higher than on a CPU. Higher FLOP count is useless if the CPU is stalled on memory.
This is a good point. One place I worked tried to move their numerical simulations to GPGPU, and found they had to move from a recursive algorithm to one more suited to GPU architecture.
They did the rewrite, but found that the results were 10% less precise in the convergence on a solution.
So the immense parallel nature of a GPU is only useful if your algorithms are the right shape, such as a fixed number of matrix multiplies.
Imagine the very near future when AMD starts using TSMC's 5nm process, which has approximately double the transistor density of the current 7nm process used for the EPYC 7002 series.
They could go to DDR5, PCIe 5, AVX 512, and still have a transistor budget left over for whatever they like.
The 'whatever' is the interesting part. What exactly does a GPU do that a CPU doesn't?
Typical GPUs have crazy high memory bandwidths and good latency hiding by using many (thousands) of threads.
So if AMD does something like increase the number of memory channels and implement 4-way SMT, they're poised to upset NVIDIA in the HPC space in a big way.
Many people would much rather program for a general-purpose processor than the CUDA platform with all of its quirks and limitations...
The memory bandwidth gap between CPUs and GPUs is absolutely ludicrously massive. GPUs already crossed the 1TB/s mark. Epyc Rome is only 204GB/s with DDR4-3200.
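The 204GB/s figure falls straight out of the channel count and transfer rate. A quick sketch, assuming the usual 64-bit (8-byte) data path per DDR4 channel; the function name is made up:

```python
def peak_bandwidth_gb_s(channels, mega_transfers_per_s, bus_bytes=8):
    """Theoretical peak: channels x transfers per second x bytes per transfer."""
    return channels * mega_transfers_per_s * bus_bytes / 1000

rome = peak_bandwidth_gb_s(channels=8, mega_transfers_per_s=3200)
gap = 1000 / rome   # how far a 1 TB/s GPU is ahead

print(rome)            # 204.8
print(round(gap, 1))   # 4.9
```

These are theoretical peaks on both sides; sustained numbers are lower.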
It's been like this for a decade at least, I don't expect that gap to shrink anytime soon.
But realize also the Tesla V100 is still on TSMC 12nm. If Nvidia is moving these they are obviously also going to make 7nm and eventually 5nm variants. Which will also benefit from 2x+ density.
300GB/s x 1.5 for more channels x 2.5 for DDR5 = 1.1 TB/s.
Not too shabby! As you said, it would likely be eclipsed by the next-gen NVIDIA accelerator, but... damn, over a terabyte per second for general-purpose compute is just nuts.
In principle, AMD could go even higher if they really tried to optimise the platform for this one metric, but server CPUs tend to be "balanced", so I doubt this will happen. One can dream...
No it isn't, it's ~1.5x the speed of DDR4: 4800 vs. 3200.
The 8400 is a hypothetical module that they _plan_ to make, not one that they've actually managed to make. And the first generation of CPUs with DDR5 support is unlikely to immediately support the maximum that the DDR5 spec plans to achieve. Just like CPUs only very recently officially supported DDR4-3200, despite that being on the market for years and years (the 9900K only officially supports up to DDR4-2666 even).
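Redoing the earlier back-of-envelope estimate with the DDR5-4800 ratio instead of 2.5x shows how much the projection depends on that one factor. The 300GB/s baseline and the 1.5x for extra channels are taken from the estimate upthread as-is, not endorsed:

```python
BASELINE_GB_S = 300          # starting point used in the upthread estimate
CHANNEL_FACTOR = 1.5         # "more channels" factor from the same estimate

optimistic_ddr5 = 2.5        # factor assumed in the upthread estimate
first_gen_ddr5 = 4800 / 3200 # DDR5-4800 vs DDR4-3200

upthread_estimate = BASELINE_GB_S * CHANNEL_FACTOR * optimistic_ddr5
revised_estimate = BASELINE_GB_S * CHANNEL_FACTOR * first_gen_ddr5

print(first_gen_ddr5)      # 1.5
print(upthread_estimate)   # 1125.0
print(revised_estimate)    # 675.0
```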
> AMD could add extra memory channels, a 50% increase is reasonable.
Say what now? A 50% increase is reasonable? You're expecting 12-channel memory? The 8 channels in Epyc Rome are already the most of any CPU on the market. I don't see any chance at all that this jumps to 12 in a single generation.
That's not going to happen. Extra memory channels are very expensive die-wise. Nvidia and AMD achieve these rates with hbm, which has very wide buses (4096 bits) and short traces from stacking. I can't see any way CPU memory will compete until they move to hbm. Keep in mind gddr6 is available in GPUs now, and is faster than ddr5, but much slower than hbm.
So can Intel, but they don't. Hbm would likely require them to sell a fixed-size memory amount, which can be severely limiting for server applications. Not to mention it's extremely power hungry compared to ddr, so you won't get anywhere near the amounts ddr gives without making power consumption go way up.
EPYC is already a modular architecture, literally nothing stops AMD replacing a couple of "compute" dies with HBM2 stacks. They could release CPUs that don't require DIMM sockets at all. E.g.: instead of 2 sockets + a bunch of DIMM sockets, the same motherboard space could be used for 4 sockets with embedded memory.
They could but then you're cutting your FLOPS down to get your memory bandwidth up. And HBM2 doesn't get you much capacity. The 7nm Instinct MI50 has 4 stacks of HBM2 to achieve 32GB in capacity. So other than as a joke toy, what would you do with a 32-core / 64-thread CPU with 32GB of RAM? That's what you'd end up with if you swapped out 4 compute dies for 4 HBM2 stacks.
Assume that in 1-2 years HBM capacity doubles, and it's a quad-socket motherboard. You'd have 64GB per socket, or 256GB total.
Remind me how much memory an NVIDIA accelerator has?
To play Devil's advocate, putting HBM2 in the package doesn't magically solve everything. The intra-socket bandwidth could be enormous, but the inter-socket bandwidth would still be whatever it is now, and would be difficult to increase.
Epyc doesn't do quad sockets. Is this just another hypothetical "what if" at this point with no basis in reality?
Because sure, a hypothetical non-existent Epyc re-designed to compete in the double precision floating point space favoring memory bandwidth above all else could be really cool. Then again, so could anything else custom designed exclusively for that use case.
> but the inter-socket bandwidth would still be whatever it is now, and would be difficult to increase.
64 PCI-E 4.0 lanes form the CPU-CPU interconnect currently.
Since we're making up stuff why not assume that's doubled next generation along with being PCI-E 5.0? So that'd be 500GB/s give or take.
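The "500GB/s give or take" holds up if you treat the socket-to-socket links as plain PCIe, which is a simplification on my part (the real interconnect may run at its own rate). A sketch using the standard per-lane rates, 16 GT/s for PCIe 4.0 and 32 GT/s for PCIe 5.0, both with 128b/130b encoding:

```python
def pcie_gb_s(lanes, giga_transfers_per_s):
    """Peak payload bandwidth per direction, in GB/s, with 128b/130b encoding."""
    return lanes * giga_transfers_per_s * (128 / 130) / 8

current = pcie_gb_s(lanes=64, giga_transfers_per_s=16)     # 64 lanes of PCIe 4.0
imagined = pcie_gb_s(lanes=128, giga_transfers_per_s=32)   # doubled lanes, PCIe 5.0

print(round(current))   # 126
print(round(imagined))  # 504
```

Both figures are per direction and ignore protocol overhead beyond line encoding.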
That would be great, but to date hbm is always part of the board. I'm all for selling motherboards with the ram already on it if it means higher bandwidth, but it's just never happened before.
That's not exactly true once you have to deal with vector extensions like AVX-512. It's quite a pain to write by hand (C intrinsics) and many of the ways to abstract it away end up giving you a GPU-like programming model (eg. Intel ISPC).
Plus, this has largely been tried before with Xeon Phi and it didn't end so well.
Huge vector units like AVX-512 are mainly useful for workloads that need huge amounts of RAM that you just can't get with a GPU, or for workloads that are very latency sensitive and incompatible with GPU task scheduling because they are in some other CPU-bound code.
>Huge vector units like AVX-512 are mainly useful for workloads that need huge amounts of RAM that you just can't get with a GPU, or for workloads that are very latency sensitive and incompatible with GPU task scheduling because they are in some other CPU-bound code.
There are a lot of tasks that a GPU can do faster than a CPU but it would require batching a large amount of work before you can gain a speedup. EPYC CPUs do not suffer that limitation. If all you have is an array with 4 elements you can straight up run the vector instructions and then immediately switch back to scalar code. Meanwhile with a GPU you probably need at least an array with 10000 elements or more.
>The first observation is that in modern compilers, the resulting performance from auto-vectorization optimization is still far from the architectural peak performance.
https://dl.acm.org/doi/fullHtml/10.1145/3356842
Maybe we can get more benefits if we invest more resources in optimizing compilers than in inventing yet another Javascript framework?
That is a problematic metric. You see, you don't buy individual GPUs or CPUs, you buy systems. And typically, you can stick multiple GPUs on the same system (more than CPUs). There's also the question of what those servers cost, and how many rack units each one takes up etc.
And then - it might make more sense to measure power consumption and maintenance costs than up-front price.
I looked at both. Recent consumer cards have terrible double performance (with exception of Titan series) of course, but the extreme cost of the non-consumer cards offsets their double flop performance to be approximately no better[1] than Epyc 2 per dollar.
[1]EDIT: Tesla v100s are still about 40% lower cost per Gflop than Epyc 7742, but still definitely the same order of magnitude, unlike my expectation.
The rise of AMD has been like a dream come true. I only wish Intel could get their cards straight and compete in the prosumer space.
It used to be that Intel was a good choice if money wasn’t really an object, but as time has gone on it has become harder to justify Intel chips regardless of the pricing.
VIA is out of the western market for the foreseeable future. They are currently manufacturing x86 CPUs for the Chinese government to break the government's reliance on western technology.
They do still make CPUs though. Just incredibly slow ones currently: https://www.youtube.com/watch?v=-DanhnASClQ (although with guaranteed government funding TBD how much that changes)
I thought I read a few months ago that Intel still does well in the video editing space (especially with 4k), because their chips have onboard decoding hardware. Is this still true?
Intel's mainstream consumer CPUs have integrated graphics that includes video transcoding. Their server platforms don't have that, and neither do the workstation and high-end desktop processors based on the same silicon. Or to put it another way: if the CPU offers more than one 16-lane PCIe link, it doesn't have integrated graphics.
AMD's platforms are similarly split, except that their mainstream desktop processors with integrated graphics are lagging behind the CPU-only processors by a generation.
My understanding is that GPU rendering for h26{4,5} is a lot faster, but the image quality is nowhere close to CPU based rendering. Given that Premiere Pro isn't in the business of real time rendering or transcoding, the business case isn't there for full rendering support. If you really need the speed, maybe render into a fast but large lossless format, then use ffmpeg gpu transcoding?
It's not that Premiere Pro cares about image quality above all else, because it does use GPU-accelerated decoding. It's that Premiere only uses Intel's integrated GPU for decoding and not with Nvidia's or AMD's GPU decoder.
Quick Sync's quality is worse than NVENC's these days so it's likely more along the lines of Intel contributed the patches than Adobe not wanting it out of purity reasons.
That's true, but turning on hardware encoding didn't make a big difference in encoding times for me. I'm planning on upgrading to a Ryzen 9 3900X to make everything else smoother
But that's just the thing - for a while, you could spend extra in Intel for a premium system. It wasn't a good $/performance tradeoff, but at the upper end it was there. What's wild about this moment in time is that Intel has lost even the high end; there isn't an Intel premium, because they are not the best product at all.
For how long -- Apple switched from IBM to Intel due to performance reasons. Their mobile chips use ARM. Perhaps it's only a matter of time before Apple offers MacBooks with AMD chips.
If Intel sells $5 CPUs, will you change your mind? I don't understand the obsessive praise of AMD. Keep in mind that Jim Keller, an Intel ex-chip architect, led the AMD Zen platform when he was hired in 2013 (after working at Apple on the A4/A5 SOCs). It is the same dude competing against himself. Btw, he is back at Intel since 2018 to help Intel out.
I'm not sure how this relates to the parent comment. They stated that no matter what your budget is, Intel is always the worse option currently. Their chips cost more and do less when they used to cost more and do more.
I got the fastest CPU my motherboard could handle. But looking at the performance charts, there were AMD CPUs with 25% and 50% more performance costing hundreds of dollars less than what I spent.
I could have spent that money on a motherboard and some extra RAM, but didn't feel like doing the work. Meaning this was an upgrade from an under powered CPU to a newer chip that was not available when I did the build originally.
The vast majority of users would find a 50% performance improvement and a huge price reduction to be worth the extra effort of replacing the mobo, especially since they already have to replace the cpu and likely put the mobo in to start with.
>> If Intel sells $5 CPUs, will you change your mind?
Sure, if they were selling i9s for $5.
But the OP's point is the opposite. "Regardless of the pricing" means that even if you ignored the huge costs of Intel's best CPUs, they were generally the highest performance option given your unlimited budget.
That's not really the case anymore, except for very specific workloads.
> I don't understand the obsessive praise of AMD.
"obsessive" seems unnecessary. People are praising AMD because they are finally bringing performance competition to the CPU market, which is great for consumers.
I mean, it's nice to see that after years of stagnation, AMD is now able to hit best IPC, more cores/chip, best perf/watt, and close enough singlethreaded perf. And AMD $/perf is better too. An Intel chip at the right price is probably fine -- although, they've lost a lot of perf to mitigate security issues, so I dunno. Intel has also been pretty stagnated the last several years (what are we on, the 5th respin of Skylake?)
Jim Keller's work at Intel should probably start showing up in 2021 or 2022... Could be pretty exciting, or Intel corporate politics could bury it all.
I don't know how Jim Keller is going to help Intel, since Intel has trouble with its fabs and process nodes, not with the architecture. If Intel were using TSMC or Samsung, it would be at least on par with AMD on IPC and core count.
Don't forget Keller's earlier work on the DEC Alpha, which was a real engineering marvel for its time, finding its way into the Cray T3D and T3E supercomputers.
Based on Keller's work with the DEC Alpha, I was looking forward to investing in PASemi, but Apple acquired PASemi to design mobile chips before PASemi IPO'd.
So after years of an effective Intel monopoly I'm really glad to see AMD is back in a way that I don't think I've seen since the Athlon64/Opteron days. Back then it was AMD who pushed the x86-64 instruction set when Intel was claiming EPIC was the future (ha).
At this point, Intel's move to 10nm processes is an embarrassment. I'm sure it's a difficult problem but historically Intel has been reasonably good at planning process advancements but in the case of 10nm they've been off by years. I believe the original goal was 2017? And we're still not there yet.
I would dearly love to see an honest postmortem of this and see what went wrong. Who made promises they'd miss by so much, why, what the issues were and so on.
The last PC I built (because apparently I still do that, even though it annoys me no end) has an Intel 9700 in it. At the time that was probably the best choice. 6 months later and it would no doubt have been a Ryzen.
I hope AMD keeps this up as we need the competition.
They got comfortable with Moore's Law, which in fairness, had held for a long time, and continued to use the model that increasing feature density quadratically was a linear problem. Now, it turns out that once you get near the 10nm gate size range, the difficulty diverges from linear to exponential (and perhaps even higher eventually as some hard limit is approached).
That, and Intel isn't an innovative company anymore. Now they are a process company riding on their manufacturing dominance and x86 market share. It looks a lot like Apple under Tim Cook, except add another decade since there was innovative leadership (Andy Grove). They are a few consultants removed from IBM at this point.
> It looks a lot like Apple under Tim Cook, except add another decade since there was innovative leadership (Andy Grove)
Ehh this is debatable. Apple has put out a few products that have completely changed the market under Cook's tenure. AirPods have introduced a new headphone paradigm. Apple Watch is waaaay ahead of the competition. The iPhone X made full screen phones mainstream and introduced UI gestures that were copied by Android.
Sure there have been some missteps cough butterfly keyboard cough but I’d say overall, they are still producing interesting products that define a large part of the consumer tech market
What I think Cook misses that Jobs got, and made for more exciting releases, is the idea of a totally integrated service. The iPod's victory was also a victory for iTunes. The iPhone was also the App Store. And when Jobs left, those kinds of distinct pairings did too. They are hard to conceive of, and to execute on.
In contrast, the AirPods and Apple Watch are more straightforward "make 'em smaller" incremental moves. The engineering work is leading in many respects, but it doesn't upend a market.
And Intel does have a history that was like Apple's in parts. A big part of their advantage as the PC market heated up was in marketing an entire nomenclature of what the platform could be and to provide comprehensive path-of-least-resistance solutions around that, ensuring that the industry fell in line around their technical lead rather than IBM or some competitor.
Those bones are still there in parts of the company - Intel chipsets are pretty well regarded for dependability(seeing Windows crash because of Intel drivers is a very rare event) and they've been good at getting the corporate office to standardize on them - but increasingly the platform is getting defined around mobile and server needs, which are a more competitive space generally. Intel doesn't get to call the shots on 5G, for example - and huge data center customers are in the business of optimizing the system end-to-end to provide the most efficient general computing resource possible; everything they touch commoditizes, and they will put their foot down if they smell enterprise contract crap.
I think you're simplifying the AirPods and Apple Watch; while glamorising other Apple products. The iPod wasn't the first MP3 player. It was a MP3 player that worked well. The iPhone was actually not the first smartphone; it was the first smartphone that worked well thanks to its multitouch screen.
Do you remember the first generation iPad? I owned it, and let me tell you. It was, almost literally, 9 iPhones stuck together.
The AirPods are more than just "make it smaller". People praise their convenience, and their innovation is in skipping the cumbersome Bluetooth pairing process.
I'm not really sure what AirPods introduced. We had wireless headphones before. They might be the nicest ones and the most popular but I'm not sure how they created a new headphone paradigm.
To be fair I haven't tried too many high end wireless earbuds, but switching between the noise cancellation and transparent mode on the AirPods pro was my first "wow" moment using a piece of technology in a long time.
Latency! Lack thereof, actually. I bought an iPad just to be able to use them. My second Apple product. The first was a Macintosh SE. That says something.
So much that got copied from Android in the meantime, the innovation of the iPhone is a long time ago :)
I have never seen someone use UI gestures.
Not sure what he is doing, but the only thing I see is updating iPhone and raising the price. In the meantime, losing market share in their most important market.
I'm pretty sure the iPhone's market share will drop sharply with the current Covid situation worldwide. Not a good position while trying to get people onboard the digital services.
It depends how you define "Original". If it was the initial Tick Tock roadmap, 22 and 14 were Late, otherwise it was 2015. But then since both were 6 months late it was expected to be 2016, counting from initial 14nm launch.
We are now 4 years later and 10nm is barely working and yielding. Although there were lots of promises during investor meetings of more 10nm products this year, it seems Intel wants to move past 10nm as early as possible to 7nm and regain their lead by 2023 with 5nm. But judging by Intel's recent record I am a little skeptical of their claims.
For those who missed the joke, EPIC was Intel/HP's name for the Itanium ("Itanic") style of instruction set that attempts to move most of the work for parallel instruction scheduling out of silicon and into the compiler. (And, HP later asserted that EPIC refers specifically to the Itanium instruction set, not just Itanium-like instruction sets.)
It wasn't a terrible idea at a high level, but the implementation was terribly complex and power-hungry, needing huge caches to compensate for its low instruction density. They also bet heavily on compiler advances that never materialized or materialized later than expected.
I can imagine an alternative history where Intel took the EPIC idea, but went more conservatively and focused on minimizing complexity and the total number of transistors, designing in such a way that allowed for complex power-hungry HPC optimizations later on, but didn't depend on them for the initial roll-out. This would have resulted in a lot of the potential of the idea for HPC being left on the table, but may have allowed them to have more initial success in the server market and would potentially have allowed them to scale down to cell phones more easily than trying to scale Atom down to cell phones.
Intel has been boring for the last 15 years. They incrementally boost things but rarely do any innovation. AMD, lb for lb, has been whooping Intel in most aspects since the 90's. Sure, you can buy Intel at 500 dollars a chip or buy AMD at 200 a chip that is slightly less powerful.
Hell, Intel's i7 has finally caught up to the AMD FX octo 12 years later. Meanwhile the Ryzen 8-16 core absolutely crushes anything Intel has to offer regardless of price.
Intel may have been boring for the past 5 years, but suggesting they have not innovated since 2005 completely ignores the majority of the tick-tock execution they achieved.
Oh in 2005 or so they made rdram a verryyyyyyy slow thing
Bang for buck you can go back to 386 dx,
AMD has always been killing Intel.
Let's see I could buy an intel chip for a couple Grand and it cracks 10k on CPU cpubenchmark
Or buy a thousand dollar amd chip and get 40 cpu bench mark
I can't understand why anyone would ever buy intel
And as for stock prices intel has definitely been boring. Amd is kicking the shit out of intel
It's extremely predictable. Amd loses because of stupid shorttraders
And they get burned. Meanwhile amd had risen 2000+ percent in five years. Intel not so much
Edit for the downvoters: please prove me wrong on anything. AMD is the winner; Intel has been unimpressive for decades. Pick your horse, because AMD is three laps ahead.
One of the most impressive things about these Epyc 2 chips is the very high PCIe bandwidth. A LOT of lanes, plus support for PCIe 4 (doubles the per-lane bandwidth). Interesting options for extreme SSD storage speed & capacity (in a single node) if you combine it with a PCIe expansion box. (And potentially other single-node performance metrics for accelerator cards supporting PCIe 4, which NVIDIA doesn’t yet.)
The lanes come from the CPU you put into the board, not the board itself. Although yes they are still cheap, at least if you go with something like the EPYC 7252 at ~$500 (which still has the full 128 PCI-E 4.0 lanes)
That said I have no idea how you would actually feed that many PCI-E lanes with an EPYC 7252, but if you can pull it off it's an insane $/lane value.
That's right. You can also use a PCIe expansion chassis (there are already ones supporting PCIe 4.0), giving you plenty of space for dual-slot-width cards.
Sure but I don't think the 8 core epyc would actually keep up with that many NVME drives. At least not if you tried to actually hit 24+ of them at once.
Linus tech tips tried this and had to upgrade the CPU from the 24 core epyc to the 32 core to get performance up to what they wanted. https://youtu.be/xWjOh0Ph8uM
Maybe just a bad deployment but there is overhead in filesystems. Especially with checksums and compression and redundancy and etc...
It's possible to bypass the CPU in some cases using NVMe over an RDMA layer with Infiniband. PCIe 4.0 dual-port 200Gbps Infiniband/Ethernet adapters exist[1] which are compatible with this approach: https://store.mellanox.com/products/mellanox-mcx653106a-hdat...
[1] Although you can't saturate both of them through even a 16 lane PCIe 4.0 port, which has ~250Gbps of throughput each way... Which to me means that PCIe 4.0 is not at all too soon.
Also if you calculate the USD per 3.0 lane value you will find you can go much, much higher in CPU prices. If you look at various combos you will find it very rare for the server CPU+board price divided by the number of 3.0 lanes or equivalent to be below 10USD.
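For anyone who wants to check the ~250Gbps figure, here is a rough back-of-the-envelope sketch. It assumes the published signalling rates (8 GT/s per lane for PCIe 3.0, 16 GT/s for 4.0) and 128b/130b line encoding, and it ignores packet and protocol overhead, so real throughput will be somewhat lower. The function name `pcie_gbps` is made up for illustration.

```python
# Rough PCIe line-rate estimate, one direction, ignoring protocol overhead.
# Assumption: 8 GT/s (gen 3) and 16 GT/s (gen 4) per lane, 128b/130b encoding.
GT_PER_LANE = {3: 8.0, 4: 16.0}
ENCODING_EFFICIENCY = 128 / 130


def pcie_gbps(gen: int, lanes: int) -> float:
    """Approximate usable Gbit/s in one direction for a link of `lanes` lanes."""
    return GT_PER_LANE[gen] * ENCODING_EFFICIENCY * lanes


x16_gen4 = pcie_gbps(4, 16)
dual_port_demand = 2 * 200.0  # two 200Gbps ports on the adapter mentioned above

print(f"PCIe 4.0 x16: about {x16_gen4:.0f} Gbps each way")
print(f"Dual-port line rate: {dual_port_demand:.0f} Gbps")
print(f"The slot covers about {x16_gen4 / dual_port_demand:.0%} of that")
```

The same function also shows the "a 4.0 lane is worth two 3.0 lanes" bandwidth equivalence: `pcie_gbps(4, 8)` and `pcie_gbps(3, 16)` come out the same.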
PCIe lanes don't work like that, lanes are the unit of allocation, a lane is a lane regardless of the speed it runs at.
But yes, you can put more bandwidth down a 4.0 lane... if your device supports it. Most of the devices you will be putting on a budget home system don't support it.
It would, hypothetically, be more desirable to have 160 PCIe 3.0 lanes than 80 4.0 lanes. Of course there is no system with that many, but I'd take 128 3.0 lanes over 80 4.0 lanes for sure.
> PCIe lanes don't work like that, lanes are the unit of allocation, a lane is a lane regardless of the speed it runs at.
There's no need to be pedantic here. Just about nothing uses a single 3.0 lane, especially not in a system where you care about having a big count. For anything that was using 2-16 lanes, doubling the speed is basically the same as doubling the number of lanes. Except for the extra benefit that the max allocation goes up.
> I'd take 128 3.0 lanes over 80 4.0 lanes for sure
Maybe you'd take that today. In a few years when more devices support 4.0 that's not a great tradeoff. Especially when you can put switch chips in front of your 3.0 devices to keep all your lanes saturated.
It's just bizarre to watch how once unassailable Intel is totally floundering in multiple aspects of their main business. I wanted to upgrade my aging Core i7 workstation and looked into the current Intel HEDT lineup. Only 14nm, and even without Spectre/Meltdown mitigations the chips are way slower unless you can use AVX512. Ended up buying Threadripper 3970X with a quad-GPU capable board, even though the CPU is _more_ expensive than anything HEDT that Intel currently sells.
>It's just bizarre to watch how once unassailable Intel is totally floundering in multiple aspects of their main business.
Isn't this just history repeating itself though? We could easily replace "Intel" with any number of previous market leaders that have fallen by the wayside.
I don't remember any company flubbing their unassailable lead quite this badly. I sense there might still be some complacency behind it. Sales are probably doing well enough to not worry about it quite yet. But it's much like C19: if AMD gets the mindshare (which it is in the process of acquiring), with some lag those sales will start to die, and it'll be too late to do much about them then. Any countermeasures have to be preemptive, and I just don't see anything exciting being announced by Intel until at least 2021, whereas AMD keeps releasing bombshell products every quarter like clockwork.
> I don't remember any company flubbing their unassailable lead quite this badly.
I do.
Heck, among other examples, I remember the company being Intel, the market being x86 general purpose desktop/laptop processors, and the firm they blew their long-established unassailable lead to being AMD. I also remember AMD turning around much quicker and flubbing it back...
Actually, unless I'm mistaken, that happened twice before, the first time being the reason the now-universal standard for 64-bit x86 is what used to be “AMD64”.
> AMD gets the mindshare (which it is in the process of acquiring), with some lag those sales will start to die, and it'll be too late to do much about them then
AMD had the mindshare for quite a while before, but Intel was able to do enough about it that people apparently forget that it even happened. The market is fickle, and AMD is at least as good at flubbing advantaged positions as Intel, judging from history.
> "I don't remember any company flubbing their unassailable lead quite this badly."
Sun Microsystems comes to mind. From revenue just behind Microsoft during the peak of the dot com era to a footnote in history in just two decades. Even IBM didn't drop the ball that badly.
People are already forgetting about the embarrassing P4/Netburst era. AMD was just getting some market share when Intel released Conroe and crushed the competition.
I don't know. Many said the same about Apple & Microsoft at various times. Never discount what deep pocketbooks and competent leadership can do to right a flailing ship.
They did. AMD had a very strong position against the Pentium IV era offerings from Intel. They took it on the chin for a couple years and then they came out with Core2 and pretty solidly handled AMD for a while.
AMD doesn't look nearly as one dimensional as they did then, power consumption wasn't as interesting and Intel came out with a low power play; this time AMD seems to have offerings in every category that are compelling. It's really hard to bet against Intel with their long history though. I wouldn't be surprised if they come out strong when they get their process stuff sorted.
> AMD doesn't look nearly as one dimensional as they did then, power consumption wasn't as interesting and Intel came out with a low power play; this time AMD seems to have offerings in every category that are compelling. It's really hard to bet against Intel with their long history though. I wouldn't be surprised if they come out strong when they get their process stuff sorted.
Yeah, it's primarily a problem of node here. AMD shrunk and Intel has been struggling to get their node up and going. If Intel had a working 10nm-class node the picture would be very different. They have a whole bunch of new architectures in the pipe that get back to making substantial IPC improvements, they simply can't manufacture them yet. Even if they could simply port Skylake to 10nm it would do OK.
TSMC are kind of the real star behind AMD's success. AMD is benefitting hugely from Apple and Qualcomm and others who sink a lot of money into TSMC, while Intel has to get it running all by themselves. TSMC has substantially outrun every other foundry on the planet, the situation would be equally bad if Intel were stuck with GloFo or Samsung or IBM, right now you're either on TSMC or you're not competitive.
The one part that AMD got right is the chiplet design. Being able to manufacture server processors out of chiplets that are a fraction the size of a monolithic laptop processor and have them lose effectively no performance from scaling like this lets them use TSMC even if yields might not be fantastic on an equivalent monolithic chip.
Part of the reason they have laptop processors running a year behind the desktop/server chips is, those are monolithic processors, not chiplet, and they're bigger and yield worse than chiplets. In this segment, Intel beat AMD to market substantially - Ice Lake has been in the market since like September, the first Renoir laptops are just shipping like sometime this month. I was looking at laptops at Costco before Thanksgiving and just under half the laptops there had Ice Lake, so it's been available in substantial numbers for a while. Renoir is still better, but it is a leapfrogging dynamic unlike, say, the server market where AMD is just better. Ice Lake actually still outperforms Renoir in per-thread performance, just not iGPU performance and has fewer cores overall, so Intel's uarch isn't terribly uncompetitive when they can actually manufacture it. Zen3 will probably match Intel and then Tiger Lake will leapfrog AMD again a bit.
I have my doubts that giant monolithic Ice Lake-SP will ever be manufacturable at any competitive cost. The lack of consumer laptop/desktop 8C Ice Lake speaks against this as well, if you can't yield an 8C at competitive prices how are you supposed to yield a 38 core processor? But, Intel seems to be plowing forward with the launch anyway this year, so maybe it is, who knows.
To make a short story long, Intel really needs to get its node situation straightened out, and probably needs to transition to a chiplet style layout to make that happen, especially for the server stuff. Obviously it is not trivial to get chiplets to scale well in terms of performance. But Intel is not behind AMD so much as they're behind TSMC, and once they can actually manufacture their products on a competitive node then they'll be back in the game.
>TSMC are kind of the real star behind AMD's success. AMD is benefitting hugely from Apple and Qualcomm and others who sink a lot of money into TSMC, while Intel has to get it running all by themselves.
The best bet for Intel would be to sell the fabs like AMD did, or keep them but start attracting other customers to spread costs when optimizing for a particular node.
Considering that all of the reports for the past year have been about how Intel hasn't been able to improve their process, how would they be able to sell or spin off their fabs? Who would want it?
For someone considering their next build with a usecase of:
- programming, docker, golang
- gaming
can anyone recommend a resource for determining the relative performance of processors? With all the news of how well AMD is doing, I’m still not sure how to look at a given task, and determine which processor would perform better.
Unless you are compiling massive projects then your best bet will likely be a 3900x. You get more cores than you can likely use to handle all the programming multitasking while also having a cpu that’s 5-10% off of the best gaming cpu available. All while keeping within a reasonable budget.
Without question, get the 3900x. It's a bit behind Intel in single-threaded performance, but only barely, and the embarrassment of cores you get compared to the i9-9900K more than makes up for it. Microcenter currently has it on sale for $379 if you buy it together with a compatible mobo, which is the deal of the century.
There is some confusion in their naming. Zen is the architecture name, and so far we've had Zen, Zen+, and Zen 2. The consumer processor line is branded Ryzen (and Ryzen Mobile for laptop parts). The HEDT processor line is branded Threadripper, and the server line is branded Epyc.
Ryzen and Threadripper 1000-series are Zen.
Ryzen Mobile 2000-series is Zen.
Epyc 7001-series is Zen.
Ryzen and Threadripper 2000-series are Zen+
Ryzen Mobile 3000-series is Zen+.
Ryzen and Threadripper 3000-series are Zen 2.
Ryzen Mobile 4000-series is Zen 2.
Epyc 7002-series is Zen 2 (Epyc skipped Zen+).
Zen 3 is expected in 2020, based on AMD guidance. Assuming they follow their part numbering scheme, we should expect this to appear in Ryzen and Threadripper 4000-series, Ryzen Mobile 5000-series, and Epyc 7003-series.
For most programming tasks I would focus on the base of the pyramid: Make the storage low-latency for small files, then up the memory size and bandwidth, and then use a CPU appropriate to the workload (ideally, it can go wide and parallelize - otherwise you're back to single-threaded perf). This mitigates the worst case of poorly optimized builds that need to frequently return to storage, and it improves all factors of the operating system when it chooses to swap to virtual memory (Windows has become a very aggressive swapper since Win10 launched and they added a new page file system on top of the old virtual memory).
You can spend time gazing at the benchmarks for each CPU, but I would not pinpoint it as the bottleneck for a responsive and pleasant programming environment. The new AMD chips are good all around. The new Intels are still OK, but the top end is probably too hot and loud to recommend.
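If it helps to keep the naming straight, the list above can be written down as a small lookup table. This is only a sketch that transcribes the released parts listed above; the names `ZEN_GENERATION` and `architecture` are made up for illustration, and the expected Zen 3 parts are deliberately left out since they are a guess about the future.

```python
# (product line, series) -> architecture, transcribed from the list above.
# Only released parts are included; anything else maps to "unknown".
ZEN_GENERATION = {
    ("Ryzen", "1000"): "Zen",
    ("Threadripper", "1000"): "Zen",
    ("Ryzen Mobile", "2000"): "Zen",
    ("Epyc", "7001"): "Zen",
    ("Ryzen", "2000"): "Zen+",
    ("Threadripper", "2000"): "Zen+",
    ("Ryzen Mobile", "3000"): "Zen+",
    ("Ryzen", "3000"): "Zen 2",
    ("Threadripper", "3000"): "Zen 2",
    ("Ryzen Mobile", "4000"): "Zen 2",
    ("Epyc", "7002"): "Zen 2",
}


def architecture(line: str, series: str) -> str:
    """Return the architecture name for a product line and series."""
    return ZEN_GENERATION.get((line, series), "unknown")


# The mobile parts lag one series number behind the desktop parts.
print(architecture("Ryzen", "3000"))         # Zen 2
print(architecture("Ryzen Mobile", "3000"))  # Zen+
```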
For gaming, high clocks/high single-threaded IPC remain the primary factors. Games are mostly designed towards a certain number of cores and speed of I/O. Fast disk and memory will reduce sources of stutter but this is dependent on how often the game tries to load something.
Lots of gaming benchmarks out there, but sadly as of yet there are really no dedicated hardware reviews for the software engineering space.
Others have mentioned Phoronix (linux benchmarking) but I would also recommend Level1Techs. They are the only reviewers I've found that actually (occasionally) dig into what kind of performance actually matters to developers.
Recently, they did a video on using the Threadripper 3990X for Unreal Engine game development[0]. Some useful insights came out of that, such as:
- Compilers favor large cache sizes (which favors AMD)
- Usually the biggest performance bottleneck isn't compiling the code, but running automated tests (especially when it involves running a bunch of VMs).
Compilation benchmarks are fine, and many review sources are adding them, but there's still a pretty big gulf between "compiling the Linux kernel/chromium" and a typical developer workload.
> I guess there are more gaming benchmarks because there are much more gamers than software engineers.
That may be true, but I don't think that's a sufficient explanation. There are plenty of benchmarks for other professional workloads, such as CAD, 3D rendering, video editing, and mathematical modeling for financial or scientific applications.
It might not help judge performance for individual tasks, but PassMark's CPU listings (https://www.cpubenchmark.net/cpu_list.php) should help with relative performance. The results have a multi- and single- core score which should help comparing different workloads.
In general the 3900X is really nice, and if I did it over again, I would probably pick that over the 3950X. About my only complaint is 32gb ram modules aren't available in 3000+ frequencies, so 64gb is about the max prudent amount you'll get in AM4.
It depends on what you're shoving into docker, or how much you're building in golang...
As far as gaming, all the Ryzen 3000 series perform within a couple percent of any Intel CPU close to the price point... The 10900X might do another 3-5% better for gaming, but your electricity use will nearly double for that very minor increase, not worth it imo.
You can get fast and good latency 32 GB DIMMs now. G.Skill has 4x32GB 3600 18-22-22-42, timings not the best, but tighter timings always get harder at higher speed and chips. The 3200 kits are 16-18-18-38.
Edit: Corsair also offers 3200 kits I'm pretty sure, too lazy to look up their exact specs
I don't own either, but in researching these processors, my guess is that price, or rather value would be the biggest factor.
One of the main value propositions (for a programmer) of going from a Ryzen 5 or Ryzen 7 to a Ryzen 9 is that you get double the amount of cache (and four times the amount of an Intel Core i9-9900K).
Both the 3900X and the 3950X have that extra cache. Aside from that, the extra 4 cores are going to have diminishing returns for most programmers, and the bump in boost clocks is minimal (though the ability to achieve those clocks with the same TDP as the 3900X is impressive).
Personal experience - 2700x with an SSD and 64gb of ram. Blazing fast working with large Java heaps etc, which is a lot of what I do. Gaming is a non issue as long as you have a decent GPU. Have a 1070ti right now.
That CPU is a bit outdated now, but just a nice benchmark to show the 3*x is definitely enough.
2700X, same 64GB but a 2080 and yes it's so damn fast for programming and gaming I haven't been able to justify upgrading to a 3900/3950 other than "ooh shiny - want it" which isn't enough on its own.
If you're at all interested in a Hackintosh build (modern performance for a lot less money than what Apple charges), you have to go with AMD graphics cards.
I built a high-end gaming machine about a year ago (Intel 8700k, Nvidia 2080TI). While it's an extremely good machine, I find that I absolutely detest Windows, even for gaming. It's unusable for me for doing development work.
The hardware I picked, with the exception of the 2080TI, was oriented towards being Hackintosh compatible. That mainly includes the motherboard and cpu. I'm now contemplating purchasing a top of the line AMD gpu and moving ahead with the Hackintosh project. Another benefit to doing that switch is that the AMD products are a hell of a lot cheaper than Nvidia's while offering very competitive performance.
When I was looking for gaming benchmarks the problem I ran into was the games being benchmarked were never actual popular games so they mostly seemed irrelevant.
the "actual popular games" at a certain point in time are usually games that were released very recently. I suspect anandtech doesn't update the game benchmarks too often so you can more easily compare against parts that were reviewed before the newest games were released. they usually have at least one major title for each genre, so you can get a sense of how the part will perform in your favorite game. you can find benchmarks for newer games in YouTube reviews (I personally find jayztwocents to be useful for this).
I run one such resource. https://www.pc-kombo.com/us/benchmark - a really rather big collection of single benchmarks, used to put all current and many older consumer processors into one ranked order. You can compare individual processors and if there are fitting benchmarks for a direct comparison, those will be shown (benchmarks comparing different generations of processors are rare). There is one variant for games, and one (with less data sadly) for general applications.
But the others are right, you should be looking for a Ryzen 3000, depending on budget 3600, 3700X, 3900X or even 3950X.
I recently built a system with a 3950x and a 2070 super and it has been great. This is the parts list I ended up with: https://pcpartpicker.com/list/4cMMXv . The system handles AAA and competitive games without any issues. I dual boot with windows 10 and Arch linux and it's been a great experience.
The specific issue is how userbenchmark weights CPUs seems to be very coupled to Intel's specific ideas on how many CPU cores there should be, or at least very firmly stuck in the 5 years ago. So last year the weights were 40% for single-core performance, 58% for quad-core, and 2% for "multi-core".
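To see why that weighting is a problem, here is a toy calculation using the 40/58/2 split quoted above. The per-test scores are invented for illustration (they are not real userbenchmark numbers), and `effective_speed` is just my name for the weighted sum.

```python
# Weights as quoted above: 40% single-core, 58% quad-core, 2% multi-core.
WEIGHTS = {"single": 0.40, "quad": 0.58, "multi": 0.02}


def effective_speed(scores: dict) -> float:
    """Weighted composite of the three sub-scores."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical sub-scores, NOT measured data:
# a quad-core with slightly higher clocks vs a 16-core that is 4x faster
# when every core is loaded.
quad_core_chip = {"single": 110, "quad": 105, "multi": 100}
sixteen_core_chip = {"single": 100, "quad": 100, "multi": 400}

print(f"quad-core:  {effective_speed(quad_core_chip):.1f}")
print(f"16-core:    {effective_speed(sixteen_core_chip):.1f}")
```

With these made-up inputs the quad-core still comes out ahead, because a 4x lead in the multi-core test only moves the composite by a few points.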
It looks like Userbench has since adjusted to weight up to 8 threads of performance? Which is maaaaybe less trash if all you care about is gaming. But the Core i5 series still tops charts on userbench despite reviewers no longer recommending the i5's due to performance inconsistency.
They even make ludicrous claims like that the 9100F is perfectly fine for gaming, and is even 10% better than a 2700X. They seem to be basing this decision entirely on older games or games specifically built for as broad a userbase as possible (eg, CSGO, Fortnite & Overwatch). Meanwhile actual reviews say things like "The quad-core Core i3-9100F was hopeless in Battlefield V, pretty bad in Assassin’s Creed: Odyssey, fairly useless in The Division 2, and weak in Shadow of the Tomb Raider." https://www.techspot.com/review/1983-intel-vs-amd-budget-cpu...
So even if you're an Intel fan, userbench is still a terrible way to pick a CPU.
HWUB is not known for being particularly even-handed in their editorial positions. They tend to 'beg the question' by picking game suites that produce the outcome they want to discuss, and tend to over-reach on the conclusions.
"fairly useless" here is over 60 fps average in the heavy titles and 90-110 fps in the multiplayer titles, with a similar ratio of minimums as the 1600 AF (so no more or less prone to stutter). And that's with them loading the dice by picking the absolute most thread-heavy games they could find, most games the 9100F does comparatively much better than that.
And the reality is that Zen1 and Zen+ actually are pretty weak in gaming. Zen2 made a ~30% improvement over Zen1 in gaming performance (much better than the "average" gains for other workloads), and it's still 10-15% behind the fastest Intel processors. Zen1 especially was hot garbage in gaming, those thread-heavy titles aren't representative of its average performance. About all you can say is that it aged better than the 4Cs that Intel had on the consumer platform at the time (or the 8100/9100F/etc that followed), an OC'd 8700K lays a smackdown on it and an OC'd 5820K remains extremely viable even today.
I'm not going to defend userbenchmark's composite scores, but gaming performance does heavily depend on per-core performance even today. Having 8 faster cores is still more desirable for gaming than 16 slower cores. And single-core performance is a good analogue of "per-core performance" so this number remains very relevant.
> "fairly useless" here is over 60 fps average in the heavy titles and 90-110 fps in the multiplayer titles
You missed the point. The point was UB claimed the 9100F was 10% faster than a 2700X. In reality the 2700X absolutely massacres the 9100F in gaming performance. Higher average FPS, higher min FPS, etc...
Even the 1600AF trivially beats the 9100F.
> with a similar ratio of minimums as the 1600 AF (so no more or less prone to stutter).
1600 AF in Battlefield V: 126 average, 91 1% lows
9100F in Battlefield V: 116 average, 49 1% lows
That's not a similar ratio at all.
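Spelling out the arithmetic with the Battlefield V numbers quoted above (the helper name `low_ratio` is mine):

```python
def low_ratio(average_fps: float, one_percent_low_fps: float) -> float:
    """Fraction of the average frame rate that the 1% lows retain."""
    return one_percent_low_fps / average_fps


# (average FPS, 1% low FPS) in Battlefield V, as quoted above.
results = {"1600 AF": (126, 91), "9100F": (116, 49)}

for cpu, (avg, low) in results.items():
    print(f"{cpu}: 1% lows are {low_ratio(avg, low):.0%} of the average")
# 1600 AF: 1% lows are 72% of the average
# 9100F: 1% lows are 42% of the average
```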
> Zen2 made a ~30% improvement over Zen1 in gaming performance
No it didn't. You're massively misrepresenting (or mis-remembering) Zen1's gaming performance.
> Zen1 especially was hot garbage in gaming, those thread-heavy titles aren't representative of its average performance.
No it wasn't. It lost to the equivalent Intel CPU, but it was far from bad. You could easily pair a Zen1 CPU with just about any GPU and never see a significant bottleneck. The exception being the absolute top-end. And, critically, if you had an older Intel quad core, like a 7600K, the Zen1/Zen+ CPUs were still an upgrade in gaming performance.
> Having 8 faster cores is still more desirable for gaming than 16 slower cores.
Of course, but you still need enough cores to avoid stuttering. Which means...
> And single-core performance is a good analogue of "per-core performance" so this number remains very relevant.
Is not correct at all. Single-core performance isn't an analogue of anything these days. You need a minimum number of cores and good single-core performance.
And it's not just HWUB with these conclusions that an i5 is no longer sufficient. Gamersnexus has the same recommendations: "In more games each year, we’re noticing the cut-down Core i5 exhibiting high frametime variability that counteracts its fleeting performance superiority with unreliable, stuttery behavior. The AMD R5 3600 is more reliable and consistent in its performance across all games we’ve tested, making it the better gaming option." ( https://www.gamersnexus.net/guides/3533-best-cpus-of-2019-ro... )
It's really specific to Far Cry 5, and it produces some quite strange results that people are putting way too much weight on.
0.1% lows really tank on FC5 on processors without SMT, for example a 5.2 GHz 9600K has less than half the 0.1% FPS as a stock Pentium G5600 2C4T processor. In other words it's stuttering on the 9600K but running ok on the G5600.
4C4T processors (R3 1200) do OK, but that one is AMD, so it isn't clear whether it's specifically something the engine is doing wrong around Intel processors, or if there's some hardcoded assumption that if there are 6+ cores then SMT must be available, or what.
But I mean, this specific game is not evidence that "6C6T is no longer sufficient for gaming", it's just a badly programmed game that has something going wrong under the hood on 6C6T processors.
The actual scores themselves are useful though. When it says "quad core: x% faster" or "multi core FP: y% faster" that is actually fairly accurate (as accurate as a synthetic can be). People just don't like the way userbenchmark weights these numbers in the composite score ("effective speed").
It's still a very useful site for comparing niche hardware that will never get a true review - how does a J5005 compare to a i5 750? How does a Xeon E5-1650 compare to a Ryzen 1600? Probably not going to ever be directly tested. The only alternatives are things like Passmark that are much less accurate. UserBenchmark lets you compare against all kinds of niche or rare hardware at will, that's an incredibly valuable resource. Some people are just so butthurt about the "effective speed" composite scores that they can't bring themselves to scroll past a single line, which is a little ridiculous.
Generally r/AMD constantly gets their panties in a bunch about something or other, it's constant conspiracies about how this or that is a NVIDIA or Intel backed conspiracy. Don't take them too seriously.
At times they have sent death threats because they didn't like the conclusion of a review. After the initial Ryzen launch they decided that Steve from GamersNexus (among others) was an Intel shill and started threatening his family. iirc there have been other "incidents" as well.
(most of those "removed" posts are people justifying it because Steve is an Intel shill who put out a "biased review")
They really take the whole fanboy thing to a whole new level. It is practically a uniquely toxic subreddit, even among other "brand" subreddits, more like a sports team sub or something.
I don't think him jumping back and forth is a bad thing. I'd prefer if Intel and AMD (and others) were at each other's throats constantly. We're the winners in that scenario, with more and faster and cheaper chips to choose from.
It already sounds like the environment in Intel isn't great, and a lot of the staff dynamics are very driven by cost-cutting. Great video on youtube posted in the last week with a bunch of leaks from Intel employees: https://youtu.be/agxSclh27uo
Pretty standard fare for monopoly/cartel in all segments in America. Pump stock, hit options, golden parachute.
I will say the last time AMD had a brief lead on Intel with Athlon, they rested on their laurels and started milking customers in record time. I think that was Hector Ruiz.
The last time, it was pretty clear from the mobile processors that the core engineering talent was somewhere in the company, I think the Core processor was from an Israeli team rather than the one pushing out the high-frequency pipeline-stalls-be-damned stuff.
But I get the feeling with the stunning, STUNNING process lead collapse that the engineering talent is fundamentally gone.
Intel had a two or three year lead it was thought at one point.
If I were an EE I’d wait it out, at least for the CPU biz. AMD has been feast or famine, so a bit too unpredictable, though of late they have been crushing it.
Feast or famine works really well if you’re selling stock options as soon as they vest. Well, assuming the cycle time is fast enough and you’re staying through multiple cycles.
It’s like x every year vs cycling between 0 and 4x.
There's no carrier silicon and I'm not sure what distinction you're making between binning and fusing. The only way to get 256 MB of cache is to also have 64 cores, the only way to get 192 MB is to have 48, etc.
not per core, but per ccx. a ccx is a group of cores. if you need something that isn't in your ccx's chunk of l3 cache you have to hop through the memory controller and pay a latency penalty.
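To put rough numbers on that, here's a quick sketch of the arithmetic. It assumes the Zen 2 layout of 4 cores and 16 MB of L3 per CCX with every core enabled; the constant and function names are made up for illustration, and real SKUs can disable cores while keeping the cache.

```python
# Assumed Zen 2 layout: 4 cores and 16 MB of L3 per CCX, all cores enabled.
CORES_PER_CCX = 4
L3_MB_PER_CCX = 16


def l3_layout(cores):
    """Return (ccx_count, total_l3_mb) for a part with `cores` cores."""
    if cores % CORES_PER_CCX:
        raise ValueError("core count must be a multiple of CORES_PER_CCX")
    ccx_count = cores // CORES_PER_CCX
    return ccx_count, ccx_count * L3_MB_PER_CCX


if __name__ == "__main__":
    for cores in (64, 48, 32):
        ccx_count, total_mb = l3_layout(cores)
        print(f"{cores} cores: {ccx_count} CCXs, {total_mb} MB L3 in total, "
              f"but only {L3_MB_PER_CCX} MB local to any one CCX")
```

So under those assumptions the headline 256 MB is really sixteen separate 16 MB slices, and a single thread only gets the low-latency path to its own slice.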
Install an old version of Linux, limit it to 256MB with a kernel code hack or kernel command line option, and you'd be running entirely in L3.
I also think there might be a niche for unikernels here to compile number crunching and other CPU-heavy tasks down to sizes that would run almost 100% from L3 cache. Wow.
I thought it was worth pointing out that Intel already sells a HEDT CPU that is faster and cheaper than its AMD counterpart: the 10980XE, with 18 cores and a higher clock speed.
All it would take is for Intel to bin them with ECC memory support and rename them Xeon to compete.
And it seems AMD is in no hurry to release Zen 3, giving the market plenty of time to digest Zen 2. I just hope their enterprise and server sales department does better, because right now, while on paper and in benchmarks they are doing great, their sales figures aren't showing all the enthusiasm that many sites and comments are claiming.
BigCos move slowly. It'll take some time before sales cycles close. I'd give a quarter or two lag between now and really promising sales numbers to account for how slowly things in a large data-center change. A human, somewhere, has to rack each of those things :P
I'd love to see what is possible with a tiny runtime/OS in the kilobyte size and running a microservice written in a native language off of each core, everything out of the L3 cache.
I imagine the throughput would be amazing. Single thread per core, e.g. cooperative multitasking. Do this for stream orientated workflows, or even for processing data that is in reasonable sized chunks, it might be screaming fast!