AMD Claims World’s Fastest Per-Core Performance with New EPYC Rome 7Fx2 CPUs (tomshardware.com)
523 points by ItsTotallyOn on April 14, 2020 | hide | past | favorite | 246 comments


The 7F52 has 16MB of cache per core.

I'd love to see what is possible with a tiny runtime/OS in the kilobyte size and running a microservice written in a native language off of each core, everything out of the L3 cache.

I imagine the throughput would be amazing. A single thread per core, i.e. cooperative multitasking. Do this for stream-oriented workloads, or even for processing data in reasonably sized chunks, and it might be screaming fast!


When I was younger, I was very pleased when I stumbled across the trick of putting all of a DOS system on a RAM-disk, which made the system fly. Then I got to college and discovered in my CS classes that CPUs now had multi-megabyte on-CPU cache, which immediately led me to wonder how hard it would be to make a DOS system that ran completely out of cache. I'm not convinced that it would be practical for any number of reasons, but it still blows me away every time I look at what modern CPUs have in-package:)


Coreboot does something similar to this. Very early in the boot process the RAM has not yet been initialized and so cannot be used. You need to execute some code to set up the RAM before you can use it. One approach to handling this is to write that initialization code in assembly language and make sure it only uses registers and doesn't touch any memory.

But that's inconvenient. You'd much rather be able to write your code in something like C. Any C code compiled by an ordinary compiler is going to access memory (at least for the stack). Writing and maintaining a custom C compiler for this is a lot of work and still comes with a lot of limitations on the C code that it will accept.

So they set up the cache in a way that it will never try to write its contents out to RAM. Then you can use the output of an ordinary C compiler that uses memory accesses and those accesses will all be served by the cache and never touch RAM. They call this "Cache as RAM".

With some work you could probably boot into DOS like this.

https://www.coreboot.org/data/yhlu/cache_as_ram_lb_09142006....


Not just coreboot, pretty much any BIOS uses cache-as-RAM.


Wow that is so cool. Most of us never have to dig that deep and so weird stuff like that usually flies completely under the radar...

Thank you for sharing!


This is a common approach for most bootloaders; in fact, for several microprocessors (the IBM 440 comes to mind) it is the only way to boot.


Is there a way to mess with the cache or MMU and prevent it from _trying_ to write out to DRAM, and be able run a minimal system purely from cache with the DRAM slots completely empty?


Interesting that no one, to my knowledge, has built an architecture where SRAM is 'memory', such that it would be transparent to C code, with no slower DRAM used at all. 32MB is more than my 1998 computer had.


Many embedded systems have a block of SRAM directly addressable.


modify tcc maybe


Run windows 95 in L3 cache and use this to store the install: https://github.com/prsyahmi/GpuRamDrive


This is effectively what you do with some of the smaller MCUs -- you're running out of a megabyte or two of on-die static RAM. This is more to eliminate latency in RTOSes than it is to improve throughput but it's the same idea.


More than eliminating latency, I'd guess it's done to eliminate latency variance, which is inherent in the relatively unpredictable behaviour of caches, and which has to be bounded to characterize worst-case scheduling of realtime systems.


Megabyte of RAM? That's a very big MCU for most of embedded programmers. 20KB is plenty for most, and some even go to zero RAM (ATtiny11).


I work with embedded systems. Different MCUs' memory ranges from bytes to gigabytes. What's plenty for one project isn't for another.


Yeah I've worked on projects with Toshiba MCUs with 10's of kB of RAM, and the last truly embedded project I was on used a safety-critical MCU from TI with about a megabyte of internal ECC RAM.


> I'm not convinced that it would be practical for any number of reasons, but it still blows me away every time I look at what modern CPUs have in-package:)

The optimism is premature. A CPU may flush the cache even when it touches exactly the same page. Modern CPUs have far too much completely unfathomable prefetch and cache-coherence logic.


But they're also configurable via various config registers, and can be set to not touch RAM, as most early-boot firmware uses.


No, all tricks to control cache behaviour are not 100% reliable on modern CPUs, or at least intel ones


Ha. I still use a RAM disk for local training of machine learning models. You can get a SOLID throughput boost by just loading the dataset into RAM... My work desktop has something like 192GB, so it can handle a pretty sizeable dataset this way.


Isn't this what 5-eyes have been doing with MiniX3? etc


There is an interesting argument to be made for DSP like workloads for these chips. With x8 PCIe 4.0 lanes to carry inter-socket traffic.

If I had infinite money to spend I would build a hypercube out of these bad boys. each "atom" having 6 x8 PCIe connectors (up,down,left,right,front,back) with corner nodes dedicating a x16 PCIe port for I/O in and out of the cube.

Immerse that bad boy in Fluorinert and contemplate the deepest secrets of the universe :-)


That would just be a cube, right?


Nominally, a hypercube architecture is 'n'-dimensional[1], which relates to the number of channels at each vertex. Assuming you took the front and back planes of this thing and connected them together, it would qualify as having 6 dimensions (6 channels per vertex). The first such system I had a chance to play with was the Intel one[2], which was order 6 using Intel's iAPX432 processors.

[1] The (original?) Oak Ridge National Labs paper -- https://www.osti.gov/biblio/6487986-parallel-computing-hyper...

[2] https://www.nature.com/articles/313616b0.pdf?proof=true


Oh, I see. If your cube is exactly 4x4x4, and you loop it in each dimension, then it's topologically equivalent to an order-6 hypercube with a node on each vertex.

In the general case, you need log2(N) connections in each direction to turn an NxN or NxNxN topology into a hypercube, so it's not as simple as "left/right". (For example, for N=8, you might connect each position to position xor 1, position xor 2, and position xor 4.)
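That XOR rule is easy to sketch (a toy helper of my own, not anything from the papers cited here):

```python
# Neighbors of a node in an order-d hypercube: flip each of the d address
# bits in turn. For d = 3 (8 nodes), node n connects to n^1, n^2, n^4.
def hypercube_neighbors(node, dims):
    return [node ^ (1 << i) for i in range(dims)]

# Node 5 (binary 101) in an order-3 cube:
print(hypercube_neighbors(5, 3))  # [4, 7, 1]
```

The links are symmetric (flipping the same bit twice gets you back), which is part of what makes routing on a hypercube so simple.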


OP's referring to an old architecture which was called "the hypercube", which involved a number (I can't remember if 16, 32, 64, or what) of microprocessors connected to one another in a certainly-not-2D topology.

Edit: oh a bit of searching and it was in the order of thousands of microprocessors. It was called "Connection Machine", you can read about it in Wikipedia[0].

[0] https://en.wikipedia.org/wiki/Connection_Machine


One of my favorite essays ever is on the subject of the Connection Machine and Richard Feynman: http://longnow.org/essays/richard-feynman-connection-machine...


Thank you for this, that was a really excellent read. Nothing better than old computer architecture and some early ML!


TMC was one of the coolest companies of all time; I really wish they were talked about more often.


I learned from an electrical engineer that works on systems for the power grid that running the whole OS in CPU cache is a requirement for some of those systems. I guess that makes sense when you are working with things that move at the speed of electricity.


No, the system is probably using tightly coupled memory; the CPU has single cycle access to all of system memory, effectively making a cache redundant.


And the reason is likely "we are building something realtime, we can't afford variable latency".

For the electricity grid in today's world, that's over-engineering. A 60Hz electricity grid doesn't really need anything changing faster than 600Hz, and considering your CPU core runs at 3,000,000,000Hz, you can waste a lot of cycles before you start missing deadlines.
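Back-of-the-envelope, using the figures above (illustrative numbers only):

```python
# Cycle budget for a 600 Hz control loop on a 3 GHz core,
# using the figures from the comment above.
core_hz = 3_000_000_000
loop_hz = 600

cycles_per_deadline = core_hz // loop_hz
print(f"{cycles_per_deadline:,} cycles between deadlines")  # 5,000,000
```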


"This meeting is to announce we've settled on an architecture for the next gen grid optimization program: Electron UI with a Ruby on Rails app underneath the hood that calls to AWS Lambda for anything that's cpu-heavy. State is persisted to Firebase as serialized YAML, which should scale really well. And to our normal crowd of performance zealots, don't even start with your normal "latency" and "outages" griping. We're comfortable with this stack and besides, we can waste a _lot_ of cycles before we start missing deadlines!"


Even if your CPU cycle budget is large, designing a realtime system requires proving that it won't be exceeded, by adding up the worst-case running times of the worst-case sequence of events. Avoiding high-latency memory gives a few orders of magnitude of margin, which could be needed to afford good functionality on a reasonably cheap CPU; it's not a mere useless improvement from "fast enough" to "as fast as possible".


In hard realtime systems, the problem is not "we have enough time to do the computation" but "we can prove the computation will run in the available time".

The complexity in modern multi-user operating systems (many processes, a run queue, spin locks) along with modern processors (multi-tier memory architecture, microcode, branch prediction, long stalls for memory access) means it's very hard to prove that anything will run in bounded time.


IMO the terms "hard realtime" and "soft realtime" are unnecessarily confusing to beginners. I think that "deterministically bounded latency" and "probabilistically bounded latency" are more descriptive. But they're long and difficult to type, so we'd inevitably call them DBL and PBL, and be back at confusion.


And yet, so much software on modern CPUs takes seconds rather than milliseconds to perform and output simple calculations. Especially with the mantra "developer time is more expensive than computing power", few developers nowadays have recent experience of optimizing programs for sub-ms reaction times. You could probably make it work with standard hardware. But since developers have to be retrained for that field anyway, why not use hardware that's battle-tested and gives you more leeway?

In addition, the advantage of optimizing at every point is that you can use hardware not designed at the limit of what's possible, likely giving you lower error rates.


Would KolibriOS fit the constraints to do something like this? Per the description it "requires only a few megabyte disk space and 8MB of RAM to run." but has a recognizably usable GUI and office suite.

https://kolibrios.org/en/


Better yet, the original MenuetOS, which now supports 64-bit, though it is no longer open source.


Unikernel as a c++ header would be a good candidate: https://github.com/includeos/IncludeOS


kdb+ [1] was already advertising about a decade ago that their database code fits into L3 cache, making it extremely fast in execution. With 16MB, you can probably fit the core of a pretty much fully featured database in there, accelerating the kind of data-driven jobs GPUs are suited for.

[1] https://kx.com/


This is kind of what the SPUs of the PS3 were like, except with way way less RAM.


Besides cryptographic heavy services, I'm having trouble imagining a microservice that would benefit from this.

Disk, Network, or in-memory artifact reference would seem to be much bigger bottlenecks.


I'm thinking of simplicity. No VMs, no containers, just a single binary.

The majority of microservices I've written would fit into a system like this just fine.

NodeJS is built around the idea that a single thread of execution is enough for a stupidly large % of backend code. On top of that is ton of infrastructure to deploy slews of 100 line JS microservices that each do one thing.

So, back to basics. No VM, no disk at runtime, a single statically linked binary running on top of nearly bare metal.

Available memory is whatever space the program code doesn't take up.

Ignoring l3 being unaddressable, I wonder if UEFI applications are powerful enough to pull something like image deployment off.

This is probably all less efficient than the current Russian doll system of VMs and containers in use right now, but it is fun to imagine!


> Ignoring l3 being unaddressable

If you set the right config registers, you can map the cache to a linear address space and use it like RAM.

I think to do that you lose access to actual RAM though, so it isn't awfully practical.


Basically unikernels on the Jailhouse hypervisor.


I can see this being used in hft houses


I thought they all use custom hardware by now? But it's probably useful for applications that go beyond simple algorithms, work on time series data could be much faster.


They use custom hardware for the basic kind of latency arb but software is used for more complicated stuff.


My eyes read that as

>I can see this being used to lift houses.

My evening has been made that much more interesting.


I don't understand why you would want to run tiny systems for that. Properly configured Linux processes will do the same.


I'm not sure with cache aliasing you'd be guaranteed that it would stay all in L3.


It'd have to be a special setup for sure, a board with no memory even hanging off of it.

In theory all you'd need is some tight integration with the NIC (or a few of 'em!) and a tiny driver that knows how to DMA to the NIC.

Larger problem of course is that L3 isn't addressable. I wonder how much money a cloud provider would have to pony up to AMD to get an exception to that? :)

A huge # of workloads would fit in 16MB with no storage hanging off. Heck I think every microservice I've ever written could manage that.

There have been some efforts of late to make a cloud-first OS. Honestly, once you remove everything except "talk to the network controller" and "parse JSON", then so long as you don't mind writing in a language that isn't JS, a few MB is plenty for lots of workloads.


Apparently there are ways to treat the cache as a static scratch pad. It is used by BIOSes during early startup, but I'm not sure whether the magic incantations are publicly known. It is second hand knowledge though, so it might just be folklore.


If you allocate one big block it would fit even in a 1-way cache. Modern chips appear to have 16-way L3, so you'd have to be trying to spill.
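A quick sketch of why one contiguous block can't conflict with itself even in a direct-mapped cache (hypothetical 16MB cache with 64-byte lines; real L3s use hashed/sliced indexing, so this is only the first-order picture):

```python
# In a 1-way (direct-mapped) cache, the line address mod the number of sets
# picks the set. Consecutive lines hit consecutive sets, so a contiguous
# buffer no larger than the cache never evicts itself.
LINE = 64
CACHE_BYTES = 16 * 1024 * 1024
NUM_SETS = CACHE_BYTES // LINE  # 262,144 sets when 1-way

def cache_set(addr):
    return (addr // LINE) % NUM_SETS

base = 0x1000  # any line-aligned start address
sets = {cache_set(base + off) for off in range(0, CACHE_BYTES, LINE)}
print(len(sets) == NUM_SETS)  # True: every set used exactly once
```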


For most general purpose work, you'd probably lose throughput. Chances are you'd rather have the heap in the cache than the code segments.


kdb


I'm not sure why you are getting downvoted, because kdb+/q would be a perfect candidate. Too bad kOS is no longer still a thing (or maybe it never was).


>640k ought to be enough for anybody


What's interesting and surprising to me is that the new Epyc 2 chips by AMD have about the same cost per double-precision teraflop as a GPU, even the ones with good floating-point hardware support. I assume the cost for these accelerators is probably a little cheaper for high-performance computing folk able to buy in bulk, but still I was very surprised. I expected there to be an order of magnitude difference in cost per flop, even with doubles. Once AMD introduces AVX512, the cost per flop should improve even more.

Also, in the same vein, I was surprised that double and single floating point cost per flop has stayed fairly stagnant the last couple years as NVIDIA seems focused on improving lower precision performance (ie for machine learning inference).


This is probably because you're comparing them to consumer GPUs, which are designed this way - favoring integer and single-precision functional units on the SM cores. That's a sort of a marketing/tiering strategy by nVIDIA. The Teslas have good double-precision performance - but they are priced waaaay higher than the consumer cards.


> The Teslas have good double-precision performance - but they are priced waaaay higher than the consumer cards.

Yup - the Tesla V100 is 7 TFLOPs of double-precision at around ~$9000

A huge split between consumer & HPC happened in the aftermath of the Fermi (2010) architecture. Fermi was really bad in the consumer space from all the wasted die spent on unused double-precision. It was late, hot, and loud. And barely even faster than the competition.

With Maxwell Nvidia basically removed all the FP64 support from the architecture itself ( https://www.anandtech.com/show/9059/the-nvidia-geforce-gtx-t... - native FP64 rate is 1/32'd the FP32 rate) - the result was a huge boost to gaming performance. But it also meant that HPC users who want double-precision had to use Tesla cards. The actual architectures between GeForce & Tesla are different now, it's not "just" a lockout anymore.


Exactly. About 780 Mflops/$ for the Tesla V100 and 460 Mflops/$ (peak) for the Epyc 7742.

I was very surprised it was this close. I thought the accelerator would be an order of magnitude cheaper per double-precision Gflop. And AMD isn't using AVX512 yet.
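For reference, here's how figures in that ballpark fall out of the numbers quoted in this thread (the Epyc clock, FLOPs/cycle, and street prices are my assumptions, not official specs):

```python
# Rough FP64 throughput per dollar. V100: 7 FP64 TFLOPS at ~$9000 (quoted
# upthread). Epyc 7742: 64 cores x 2.25 GHz x 16 FP64 FLOPs/cycle (two
# 256-bit FMA pipes), at an assumed ~$5000 street price.
def mflops_per_dollar(gflops, price_usd):
    return gflops * 1000 / price_usd

v100 = mflops_per_dollar(7_000, 9_000)
epyc = mflops_per_dollar(64 * 2.25 * 16, 5_000)
print(round(v100), round(epyc))  # 778 461
```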


A different question is how easy it is to use those FLOPS. The memory bandwidth on a GPU is much higher than on a CPU. Higher FLOP count is useless if the CPU is stalled on memory.


However, CPUs have much much better branching performance, and are much easier to fully utilize.


This is a good point. One place I worked tried to move their numerical simulations to GPGPU, and found they had to move from a recursive algorithm to one more suited to GPU architecture.

They did the rewrite, but found that the results were 10% less precise in the convergence on a solution.

So the immense parallel nature of a GPU is only useful if your algorithms are the right shape, such as a fixed number of matrix multiplies.


Imagine the very near future when AMD starts using TSMC's 5nm process, which has approximately double the transistor density of the current 7nm process used for the EPYC 7002 series.

They could go to DDR5, PCIe 5, AVX 512, and still have a transistor budget left over for whatever they like.

The 'whatever' is the interesting part. What exactly does a GPU do that a CPU doesn't?

Typical GPUs have crazy high memory bandwidths and good latency hiding by using many (thousands) of threads.

So if AMD does something like increase the number of memory channels and implement 4-way SMT, they're poised to upset NVIDIA in the HPC space in a big way.

Many people would much rather program for a general-purpose processor than the CUDA platform with all of its quirks and limitations...


The memory bandwidth gap between CPUs and GPUs is absolutely ludicrously massive. GPUs already crossed the 1TB/s mark. Epyc Rome is only 204GB/s with DDR4-3200.

It's been like this for a decade at least, I don't expect that gap to shrink anytime soon.

But realize also the Tesla V100 is still on TSMC 12nm. If Nvidia is moving these they are obviously also going to make 7nm and eventually 5nm variants. Which will also benefit from 2x+ density.


Let's do some maths for my hypothetical EPYC 3:

1) DDR5 is about 2.5x the speed of DDR4: https://www.anandtech.com/show/15699/sk-hynix-ddr5-8400

2) Dual socket roughly doubles the bandwidth. Measurements are showing something like 300GB/s in practice: https://www.anandtech.com/show/14694/amd-rome-epyc-2nd-gen/6

3) AMD could add extra memory channels; a 50% increase is reasonable.

300GB/s x 1.5 for more channels x 2.5 for DDR5 = 1.1 TB/s.

Not too shabby! As you said, it would likely be eclipsed by the next-gen NVIDIA accelerator, but... damn, over a terabyte per second for general-purpose compute is just nuts.

In principle, AMD could go even higher if they really tried to optimise the platform for this one metric, but server CPUs tend to be "balanced", so I doubt this will happen. One can dream...


> DDR5 is about 2.5x the speed of DDR4: https://www.anandtech.com/show/15699/sk-hynix-ddr5-8400

No it isn't, it's ~1.5x the speed of DDR4: 4800 vs. 3200.

The 8400 is a hypothetical module that they _plan_ to make not that they've actually managed to make. And the first generation of CPUs with DDR5 support are unlikely to immediately support the maximum DDR5's spec plans to achieve. Just like CPUs only very recently officially supported DDR4-3200, despite that being on the market for years and years (the 9900K only officially supports up to DDR4-2666 even).

> AMD could add extra memory channels, a 50% increase is reasonable.

Say what now? A 50% increase is reasonable? You're expecting 12-channel memory? The 8-channels in Epyc Rome is already the most of any CPU on the market. I don't see any chance at all that this jumps to 12 in a single generation?

12 channel starts to become a physical packaging problem. dual-socket with 8-channels already is basically the maximum width of a board: https://www.supermicro.com/a_images/products/Aplus/MB/H12DSU...

So you'd have to do a max of 12 dimms per CPU instead of the current 16 dimms, which cuts your max realistic capacity down by a lot.


That's not going to happen. Extra memory channels are very expensive die-wise. Nvidia and AMD achieve these rates with HBM, which has very wide buses (4096 bits) and short traces from stacking. I can't see any way CPU memory will compete until they move to HBM. Keep in mind GDDR6 is available in GPUs now, and is faster than DDR5, but much slower than HBM.


The i/o die is 14nm right now. Once that shrinks, it leaves room for more channels.

EPYC could use HBM just fine if the advantage is pressing enough.


So can Intel, but they don't. HBM would likely require them to sell a fixed memory capacity, which can be severely limiting for server applications. Not to mention it's extremely power-hungry compared to DDR, so you won't get anywhere near the capacities DDR gives without making power consumption go way up.


EPYC is already a modular architecture, literally nothing stops AMD replacing a couple of "compute" dies with HBM2 stacks. They could release CPUs that don't require DIMM sockets at all. E.g.: instead of 2 sockets + a bunch of DIMM sockets, the same motherboard space could be used for 4 sockets with embedded memory.


They could but then you're cutting your FLOPS down to get your memory bandwidth up. And HBM2 doesn't get you much capacity. The 7nm Instinct MI50 has 4 stacks of HBM2 to achieve 32GB in capacity. So other than as a joke toy, what would you do with a 32-core / 64-thread CPU with 32GB of RAM? That's what you'd end up with if you swapped out 4 compute dies for 4 HBM2 stacks.


That's 32GB per socket, with current technology.

Assume that in 1-2 years HBM capacity doubles, and it's a quad-socket motherboard. You'd have 64GB per socket, or 256GB total.

Remind me how much memory an NVIDIA accelerator has?

To play Devil's advocate, putting HBM2 in the package doesn't magically solve everything. The intra-socket bandwidth could be enormous, but the inter-socket bandwidth would still be whatever it is now, and would be difficult to increase.


> and it's a quad-socket motherboard

Epyc doesn't do quad sockets. Is this just another hypothetical "what if" at this point with no basis in reality?

Because sure, a hypothetical non-existent Epyc re-designed to compete in the double precision floating point space favoring memory bandwidth above all else could be really cool. Then again, so could anything else custom designed exclusively for that use case.

> but the inter-socket bandwidth would still be whatever it is now, and would be difficult to increase.

64 PCI-E 4.0 lanes form the CPU-CPU interconnect currently.

Since we're making up stuff why not assume that's doubled next generation along with being PCI-E 5.0? So that'd be 500GB/s give or take.


That would be great, but to date hbm is always part of the board. I'm all for selling motherboards with the ram already on it if it means higher bandwidth, but it's just never happened before.


AFAIK HBM has always been on the interposer which is no different from the AMD chiplet approach.


That's not exactly true once you have to deal with vector extensions like AVX-512. It's quite a pain to write by hand (C intrinsics) and many of the ways to abstract it away end up giving you a GPU-like programming model (eg. Intel ISPC).

Plus, this has largely been tried before with Xeon Phi and it didn't end so well.

Huge vector units like AVX-512 are mainly useful for workloads that need huge amounts of RAM that you just can't get with a GPU, or for workloads that are very latency sensitive and incompatible with GPU task scheduling because they are in some other CPU-bound code.


>Huge vector units like AVX-512 are mainly useful for workloads that need huge amounts of RAM that you just can't get with a GPU, or for workloads that are very latency sensitive and incompatible with GPU task scheduling because they are in some other CPU-bound code.

There are a lot of tasks that a GPU can do faster than a CPU, but it would require batching a large amount of work before you can gain a speedup. EPYC CPUs do not suffer that limitation. If all you have is an array with 4 elements, you can straight up run the vector instructions and then immediately switch back to scalar code. Meanwhile, with a GPU you probably need an array with 10,000 elements or more.


> It's quite a pain to write by hand

And we all know that autovectorisation is hit-and-miss at best.

I wonder if there will be a new C-like language that has portable SIMD-like capabilities in the same sense that "C is a portable assembly language".


> The first observation is that in modern compilers, the resulting performance from auto-vectorization optimization is still far from the architectural peak performance.

https://dl.acm.org/doi/fullHtml/10.1145/3356842

Maybe we could get more benefit if we invested more resources in optimizing compilers than in inventing yet another JavaScript framework?


Another one is SYCL (https://www.khronos.org/sycl/). The SIMD part is OpenCL; SYCL provides scheduling/memory transfer on top.


There is: the Intel SPMD Program Compiler. https://ispc.github.io/


But does that work on AMD?


That is a problematic metric. You see, you don't buy individual GPUs or CPUs, you buy systems. And typically, you can stick multiple GPUs on the same system (more than CPUs). There's also the question of what those servers cost, how many rack units each one takes up, etc.

And then - it might make more sense to measure power consumption and maintenance costs than up-front price.


I looked at both. Recent consumer cards have terrible double-precision performance (with the exception of the Titan series), of course, but the extreme cost of the non-consumer cards offsets their double-flop performance to be approximately no better[1] than Epyc 2 per dollar.

[1] EDIT: Tesla V100s are still about 40% lower cost per Gflop than the Epyc 7742, but definitely the same order of magnitude, contrary to my expectation.


The rise of AMD has been like a dream come true. I only wish Intel could get their act together and compete in the prosumer space.

It used to be that Intel was a good choice if money wasn't really an object, but as time has gone on it has become harder to justify Intel chips regardless of the pricing.


I also hope that VIA and other x86 license holders step up their game and start to compete like in early '90s.


VIA is out of the western market for the foreseeable future. They are currently manufacturing x86 CPUs for the Chinese government to break the government's reliance on western technology.

They do still make CPUs though. Just incredibly slow ones currently: https://www.youtube.com/watch?v=-DanhnASClQ (although with guaranteed government funding TBD how much that changes)


I thought I read a few months ago that Intel still does well in the video editing space (especially with 4k), because their chips have onboard decoding hardware. Is this still true?


Intel's mainstream consumer CPUs have integrated graphics that includes video transcoding. Their server platforms don't have that, and neither do the workstation and high-end desktop processors based on the same silicon. Or to put it another way: if the CPU offers more than one 16-lane PCIe link, it doesn't have integrated graphics.

AMD's platforms are similarly split, except that their mainstream desktop processors with integrated graphics are lagging behind the CPU-only processors by a generation.


Benchmarks have shown AMD outperforming this dedicated hardware through brute force. They simply have enough spare cores to run it faster in software.


Discrete GPUs have the same hardware although there may be some driver pickiness with certain apps.


Premiere Pro apparently doesn't use the GPU for h.264/5, so sadly if you want that, you have to use an Intel CPU: https://www.pugetsystems.com/recommended/Recommended-Systems...


My understanding is that GPU rendering for h26{4,5} is a lot faster, but the image quality is nowhere close to CPU based rendering. Given that Premiere Pro isn't in the business of real time rendering or transcoding, the business case isn't there for full rendering support. If you really need the speed, maybe render into a fast but large lossless format, then use ffmpeg gpu transcoding?


It's not that Premiere Pro cares about image quality above all else, because it does use GPU-accelerated decoding. It's that Premiere only uses Intel's integrated GPU for decoding and not with Nvidia's or AMD's GPU decoder.

Quick Sync's quality is worse than NVENC's these days so it's likely more along the lines of Intel contributed the patches than Adobe not wanting it out of purity reasons.


Does premiere pro have its own encoders or do they just shell out to ffmpeg or something?


That's true, but turning on hardware encoding didn't make a big difference in encoding times for me. I'm planning on upgrading to a Ryzen 9 3900X to make everything else smoother


Fair point, in these days of economic turmoil, intel premium will feel pointless


But that's just the thing - for a while, you could spend extra in Intel for a premium system. It wasn't a good $/performance tradeoff, but at the upper end it was there. What's wild about this moment in time is that Intel has lost even the high end; there isn't an Intel premium, because they are not the best product at all.


Not performance wise, but I've heard they have better tooling. Hopefully that's just a matter of time.


And for hackintoshes


For how long? Apple switched from IBM to Intel for performance reasons. Their mobile chips use ARM. Perhaps it's only a matter of time before Apple offers MacBooks with AMD chips.


> regardless of the pricing

If Intel sells $5 CPUs, will you change your mind? I don't understand the obsessive praise of AMD. Keep in mind that Jim Keller, an Intel ex-chip architect, led the AMD Zen platform when he was hired in 2013 (after working at Apple on the A4/A5 SoCs). It is the same dude competing against himself. Btw, he has been back at Intel since 2018 to help Intel out.


I'm not sure how this relates to the parent comment. They stated that no matter what your budget is, Intel is always the worse option currently. Their chips cost more and do less when they used to cost more and do more.


I got the fastest CPU my motherboard could handle. But looking at the performance charts, there were AMD CPUs with 25% and 50% more performance costing hundreds of dollars less than what I spent.

I could have spent that money on a motherboard and some extra RAM, but didn't feel like doing the work. Meaning this was an upgrade from an under powered CPU to a newer chip that was not available when I did the build originally.


The vast majority of users would find a 50% performance improvement and a huge price reduction to be worth the extra effort of replacing the mobo, especially since they already have to replace the cpu and likely put the mobo in to start with.


> regardless of the pricing

>> If Intel sells $5 CPUs, will you change your mind?

Sure, if they were selling i9s for $5.

But the OP's point is the opposite. "Regardless of the pricing" means even that if you ignored the huge costs of Intel's best CPUs they were generally the highest performance option given your unlimited budget.

That's not really the case anymore, except for very specific workloads.

> I don't understand the obsessive praise of AMD.

"obsessive" seems unnecessary. People are praising AMD because they are finally bringing performance competition to the CPU market, which is great for consumers.


I mean, it's nice to see that after years of stagnation, AMD is now able to hit best IPC, more cores/chip, best perf/watt, and close enough singlethreaded perf. And AMD $/perf is better too. An Intel chip at the right price is probably fine -- although, they've lost a lot of perf to mitigate security issues, so I dunno. Intel has also been pretty stagnated the last several years (what are we on, the 5th respin of Skylake?)

Jim Keller's work at Intel should probably start showing up in 2021 or 2022... Could be pretty exciting, or Intel corporate politics could bury it all.


It's not that AMD did better, it's Intel doing worse.


AMD had perfect execution/timing the last 3-5 years.


"Keep in mind that Jim Keller, an Intel ex-chip architect lead the AMD Zen platform when he was hired in 2013"

Jim's wikipedia page doesn't say he was ex-Intel employee when he joined AMD in 2013 to work on Zen.

https://en.wikipedia.org/wiki/Jim_Keller_(engineer)


So Wikipedia is an authoritative source now?


Is it wrong in this case?


> Jim's wikipedia page doesn't say he was ex-Intel employee when he joined AMD in 2013 to work on Zen.

It doesn't say what his favourite meal is, but he probably has one...


I don't know how Jim Keller is going to help Intel, since Intel has trouble with its fabs and process nodes, not with the architecture. If Intel were using TSMC or Samsung, it would be at least on par with AMD in IPC and core count.


Don't forget Keller's earlier work on the DEC Alpha, which was a real engineering marvel for its time, finding its way into the Cray T3D and T3E supercomputers.

Based on Keller's work with the DEC Alpha, I was looking forward to investing in PASemi, but Apple acquired PASemi to design mobile chips before PASemi IPO'd.


When did he work at Intel before?


So after years of an effective Intel monopoly I'm really glad to see AMD is back in a way that I don't think I've seen since the Athlon64/Opteron days. Back then it was AMD who pushed the x86-64 instruction set when Intel was claiming EPIC was the future (ha).

At this point, Intel's move to 10nm processes is an embarrassment. I'm sure it's a difficult problem but historically Intel has been reasonably good at planning process advancements but in the case of 10nm they've been off by years. I believe the original goal was 2017? And we're still not there yet.

I would dearly love to see an honest postmortem of this and see what went wrong. Who made promises they'd miss by so much, why, what the issues were and so on.

The last PC I built (because apparently I still do that, even though it annoys me no end) has an Intel 9700 in it. At the time that was probably the best choice. 6 months later and it would no doubt have been a Ryzen.

I hope AMD keeps this up as we need the competition.


They got comfortable with Moore's Law, which in fairness, had held for a long time, and continued to use the model that increasing feature density quadratically was a linear problem. Now, it turns out that once you get near the 10nm gate size range, the difficulty diverges from linear to exponential (and perhaps even higher eventually as some hard limit is approached).

That, and Intel isn't an innovative company anymore. Now they are a process company riding on their manufacturing dominance and x86 market share. It looks a lot like Apple under Tim Cook, except add another decade since there was innovative leadership (Andy Grove). They are a few consultants removed from IBM at this point.


> It looks a lot like Apple under Tim Cook, except add another decade since there was innovative leadership (Andy Grove)

Ehh, this is debatable. Apple has put out a few products that have completely changed the market under Cook's tenure. AirPods have introduced a new headphone paradigm. Apple Watch is waaaay ahead of the competition. The iPhone X made full-screen phones mainstream and introduced UI gestures that were copied by Android.

Sure there have been some missteps cough butterfly keyboard cough but I’d say overall, they are still producing interesting products that define a large part of the consumer tech market


What I think Cook misses that Jobs got, and made for more exciting releases, is the idea of a totally integrated service. The iPod's victory was also a victory for iTunes. The iPhone was also the App Store. And when Jobs left, those kinds of distinct pairings did too. They are hard to conceive of, and to execute on.

In contrast, the AirPods and Apple Watch are more straightforward "make 'em smaller" incremental moves. The engineering work is leading in many respects, but it doesn't upend a market.

And Intel does have a history that was like Apple's in parts. A big part of their advantage as the PC market heated up was in marketing an entire nomenclature of what the platform could be and to provide comprehensive path-of-least-resistance solutions around that, ensuring that the industry fell in line around their technical lead rather than IBM or some competitor.

Those bones are still there in parts of the company - Intel chipsets are pretty well regarded for dependability (seeing Windows crash because of Intel drivers is a very rare event) and they've been good at getting the corporate office to standardize on them - but increasingly the platform is getting defined around mobile and server needs, which are a more competitive space generally. Intel doesn't get to call the shots on 5G, for example - and huge data center customers are in the business of optimizing the system end-to-end to provide the most efficient general computing resource possible; everything they touch commoditizes, and they will put their foot down if they smell enterprise contract crap.


I think you're simplifying the AirPods and Apple Watch while glamorising other Apple products. The iPod wasn't the first MP3 player. It was an MP3 player that worked well. The iPhone was actually not the first smartphone; it was the first smartphone that worked well, thanks to its multitouch screen.

Do you remember the first generation iPad? I owned it, and let me tell you. It was, almost literally, 9 iPhones stuck together.

The AirPods are more than just "make it smaller". People praise their convenience, and their innovation is in skipping the cumbersome Bluetooth pairing process.


To be fair, how often do people pair their headphones? I think I paired my headphones only twice in the past month, and that took about 45 seconds.


I own an iPhone, Macbook Pro, Work Macbook Pro, Surface, and a desktop gaming PC.

Switching your bluetooth headphones between 5 devices can be... a chore.


Apple Watch ate the entire smartwatch market.


I'm not really sure what AirPods introduced. We had wireless headphones before. They might be the nicest ones and the most popular, but I'm not sure how they created a new headphone paradigm.


To be fair I haven't tried too many high end wireless earbuds, but switching between the noise cancellation and transparent mode on the AirPods pro was my first "wow" moment using a piece of technology in a long time.


That's a feature that is available on most high-end headphones. I get that the AirPods might be quite refined, but they are nothing brand new.


The transparent mode is? That's definitely (good) news to me, I have an android phone so I'd like to avoid airpods


Latency! Lack thereof, actually. I bought an iPad just to be able to use them. My second Apple product. The first was a Macintosh SE. That says something.


Honestly what paradigm did the airpods create/shift?


Seeing lots of people with wireless headphones and another device to charge at the end of the day??


Disabling the use of cheap wired headphones, which also don't need charging?


That was pre-Apple. The biggest contribution Apple made was making them white and a status symbol for yuppies.


Airpods by Dr. Dre

So much that got copied from Android in the meantime, the innovation of the iPhone is a long time ago :)

I have never seen someone use UI gestures.

Not sure what he is doing, but the only thing I see is updating iPhone and raising the price. In the meantime, losing market share in their most important market.

I'm pretty sure the iPhone's market share will drop sharply with the current Covid situation worldwide. Not a good position while trying to get people on board with the digital services.

As will the expensive gadgets.


>I believe the original goal was 2017?

It depends how you define "original". On the initial tick-tock roadmap (on which 22nm and 14nm were themselves late), 10nm was due in 2015. But since both of those slipped by about 6 months, the expectation became 2016, counting from the actual 14nm launch.

We are now 4 years later and 10nm is barely working and yielding. Although there were lots of promises during investor meetings of more 10nm products this year, it seems Intel wants to move past 10nm to 7nm as early as possible and regain their lead by 2023 with 5nm. But judging by Intel's recent record I am a little skeptical of their claims.


I wonder if AMD came up with the Epyc name just to troll Intel.


For those who missed the joke, EPIC was Intel/HP's name for the Itanium ("Itanic") style of instruction set that attempts to move most of the work for parallel instruction scheduling out of silicon and into the compiler. (And, HP later asserted that EPIC refers specifically to the Itanium instruction set, not just Itanium-like instruction sets.)

It wasn't a terrible idea at a high level, but the implementation was terribly complex and power-hungry, needing huge caches to compensate for its low instruction density. They also bet heavily on compiler advances that never materialized or materialized later than expected.

I can imagine an alternative history where Intel took the EPIC idea but went more conservatively, focusing on minimizing complexity and the total number of transistors, designing in such a way that allowed for complex power-hungry HPC optimizations later on but didn't depend on them for the initial roll-out. This would have left a lot of the idea's HPC potential on the table, but might have allowed them more initial success in the server market and would potentially have allowed them to scale down to cell phones more easily than trying to scale Atom down to cell phones.


Intel has been boring for the last 15 years. They incrementally boost things but rarely do any innovation. AMD, pound for pound, has been whooping Intel in most respects since the '90s. Sure, you can buy Intel at $500 a chip, or buy AMD at $200 for a chip that is slightly less powerful.

Hell, Intel's i7 has finally caught up to the AMD FX octa-core 12 years later. Meanwhile the Ryzen 8-16 core absolutely crushes anything Intel has to offer regardless of price.


>Intel has been boring for the last 15 years.

Intel may have been boring for the past 5 years, but suggesting they have not innovated since 2005 completely ignores the tick-tock execution they achieved for most of that period.


Name one innovation that has actually mattered.

Oh, in 2005 or so they made RDRAM a verryyyyyyy slow thing.

Bang for buck you can go back to the 386DX; AMD has always been killing Intel.

Let's see, I could buy an Intel chip for a couple grand and it cracks 10k on cpubenchmark.

Or buy a thousand-dollar AMD chip and get 40k on the same benchmark.

I can't understand why anyone would ever buy Intel.

And as for stock prices, Intel has definitely been boring. AMD is kicking the shit out of Intel.

It's extremely predictable. AMD loses because of stupid short traders.

And they get burned. Meanwhile AMD has risen 2000+ percent in five years. Intel, not so much.

Edit for the downvoters: please prove me wrong on anything. AMD is the winner; Intel has been unimpressive for decades. Pick your horse, because AMD is three laps ahead.


ServeTheHome goes into much greater detail when it comes to reviewing server oriented hardware, its position on the market and enterprise specifics:

https://www.servethehome.com/amd-epyc-7f52-benchmarks-review...


One of the most impressive things about these Epyc 2 chips is the very high PCIe bandwidth. A LOT of lanes, plus support for PCIe 4 (doubles the per-lane bandwidth). Interesting options for extreme SSD storage speed & capacity (in a single node) if you combine it with a PCIe expansion box. (And potentially other single-node performance metrics for accelerator cards supporting PCIe 4, which NVIDIA doesn't yet.)


And the lanes are cheap.

https://www.avadirect.com/Tomcat-HX-S8030-S8030GM2NE-AMD-SoC... this is a $400 standard ATX board with 80 PCIe lanes (+ 16 more via risers). That's the equivalent of 160 3.0 PCIe lanes.


The lanes come from the CPU you put into the board, not the board itself. Although yes they are still cheap, at least if you go with something like the EPYC 7252 at ~$500 (which still has the full 128 PCI-E 4.0 lanes)

That said I have no idea how you would actually feed that many PCI-E lanes with an EPYC 7252, but if you can pull it off it's an insane $/lane value.


I know it's the CPU but the board I linked is a very rare standard ATX board, almost all boards are proprietary.

I presume you could build an insane fast fileserver with a real lot of M.2 disks and multiple 100GbE ports?


That's right. You can also use a PCIe expansion chassis (there are already ones supporting PCIe 4.0), giving you plenty of space for dual-slot-width cards.


Sure but I don't think the 8 core epyc would actually keep up with that many NVME drives. At least not if you tried to actually hit 24+ of them at once.

Linus tech tips tried this and had to upgrade the CPU from the 24 core epyc to the 32 core to get performance up to what they wanted. https://youtu.be/xWjOh0Ph8uM

Maybe just a bad deployment but there is overhead in filesystems. Especially with checksums and compression and redundancy and etc...


It's possible to bypass the CPU in some cases using NVMe over an RDMA layer with Infiniband. PCIe 4.0 dual-port 200Gbps Infiniband/Ethernet adapters exist[1] which are compatible with this approach: https://store.mellanox.com/products/mellanox-mcx653106a-hdat...

[1] Although you can't saturate both of them through even a 16-lane PCIe 4.0 port, which has ~250 Gbps of throughput each way... which to me means that PCIe 4.0 is not at all too soon.
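The arithmetic behind that ~250 Gbps figure, as a quick sketch (nominal spec line rates; real throughput is a bit lower once protocol overhead beyond line encoding is counted):

```python
# Nominal PCIe throughput arithmetic (spec line rates only).
GT_PER_LANE = {3: 8.0, 4: 16.0}   # GT/s per lane, each direction
ENCODING_EFF = 128 / 130          # 128b/130b line encoding (PCIe 3.0 and 4.0)

def lane_gbps(gen: int) -> float:
    """Usable Gbps per lane, one direction."""
    return GT_PER_LANE[gen] * ENCODING_EFF

x16_gen4 = 16 * lane_gbps(4)   # ~252 Gbps each way for a 16-lane PCIe 4.0 slot
dual_port = 2 * 200            # 400 Gbps needed to saturate two 200 Gbps ports
print(f"x16 gen4 ~{x16_gen4:.0f} Gbps; dual 200GbE wants {dual_port} Gbps")
```

So a single x16 gen4 slot carries one 200 Gbps port comfortably but can't feed both at once, which is the footnote's point.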


Also, if you calculate the USD per 3.0 lane value, you will find you can go much, much higher in CPU prices. If you look at various combos, you will find it very rare for the server CPU+board price divided by the number of 3.0 lanes (or equivalent) to be below $10.
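As a rough illustration of that $/lane math, using the approximate prices quoted upthread (~$500 EPYC 7252 with 128 gen4 lanes, ~$400 board; ballpark figures, not quotes):

```python
# Back-of-envelope $/lane for the CPU+board combo mentioned in this thread.
cpu_price = 500     # EPYC 7252, approx. (full 128 PCIe 4.0 lanes)
board_price = 400   # Tyan S8030-style ATX board, approx.
gen4_lanes = 128

per_gen4_lane = (cpu_price + board_price) / gen4_lanes  # ~$7.0 per 4.0 lane
per_gen3_equiv = per_gen4_lane / 2                      # ~$3.5, crediting 2x bandwidth
print(f"${per_gen4_lane:.2f} per 4.0 lane, ${per_gen3_equiv:.2f} per 3.0-equivalent")
```

Well under the ~$10 per 3.0-equivalent lane that typical server combos land at.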


> That's the equivalent of 160 3.0 PCIe lanes

PCIe lanes don't work like that, lanes are the unit of allocation, a lane is a lane regardless of the speed it runs at.

But yes, you can put more bandwidth down a 4.0 lane... if your device supports it. Most of the devices you will be putting on a budget home system don't support it.

It would, hypothetically, be more desirable to have 160 PCIe 3.0 lanes than 80 4.0 lanes. Of course there is no system with that many, but I'd take 128 3.0 lanes over 80 4.0 lanes for sure.


> PCIe lanes don't work like that, lanes are the unit of allocation, a lane is a lane regardless of the speed it runs at.

There's no need to be pedantic here. Just about nothing uses a single 3.0 lane, especially not in a system where you care about having a big count. For anything that was using 2-16 lanes, doubling the speed is basically the same as doubling the number of lanes. Except for the extra benefit that the max allocation goes up.

> I'd take 128 3.0 lanes over 80 4.0 lanes for sure

Maybe you'd take that today. In a few years when more devices support 4.0 that's not a great tradeoff. Especially when you can put switch chips in front of your 3.0 devices to keep all your lanes saturated.


Dual socket Epyc 2 systems provide up to 160 PCIe lanes (and they’re PCIe gen4, too).


It’s just some copper and fiberglass. It’s the sheer number of transistors necessary for that many SerDes that costs $$$.


It's just bizarre to watch how once unassailable Intel is totally floundering in multiple aspects of their main business. I wanted to upgrade my aging Core i7 workstation and looked into the current Intel HEDT lineup. Only 14nm, and even without Spectre/Meltdown mitigations the chips are way slower unless you can use AVX512. Ended up buying Threadripper 3970X with a quad-GPU capable board, even though the CPU is _more_ expensive than anything HEDT that Intel currently sells.


>It's just bizarre to watch how once unassailable Intel is totally floundering in multiple aspects of their main business.

Isn't this just history repeating itself though? We could easily replace "Intel" with any number of previous market leaders that have fallen by the wayside.


I don't remember any company flubbing their unassailable lead quite this badly. I sense there might still be some complacency behind it. Sales are probably doing well enough to not worry about it quite yet. But it's much like C19: if AMD gets the mindshare (which it is in the process of acquiring), with some lag those sales will start to die, and it'll be too late to do much about them then. Any countermeasures have to be preemptive, and I just don't see anything exciting being announced by Intel until at least 2021, whereas AMD keeps releasing bombshell products every quarter like clockwork.


> I don't remember any company flubbing their unassailable lead quite this badly.

I do.

Heck, among other examples, I remember the company being Intel, the market being x86 general purpose desktop/laptop processors, and the firm they blew their long-established unassailable lead to being AMD. I also remember AMD turning around much quicker and flubbing it back...

Actually, unless I'm mistaken, that happened twice before, the first time being the reason the now-universal standard for 64-bit x86 is what used to be “AMD64”.

> AMD gets the mindshare (which it is in the process of acquiring), with some lag those sales will start to die, and it'll be too late to do much about them then

AMD had the mindshare for quite a while before, but Intel was able to do enough about it that people apparently forget that it even happened. The market is fickle, and AMD is at least as good at flubbing advantaged positions as Intel, judging from history.


...almost as if every time Jim Keller takes the reins at AMD, they pull away from the competition...


He's at Intel now. :-)


> "I don't remember any company flubbing their unassailable lead quite this badly."

Sun Microsystems comes to mind. From revenue just behind Microsoft during the peak of the dot com era to a footnote in history in just two decades. Even IBM didn't drop the ball that badly.


> I don't remember any company flubbing their unassailable lead quite this badly. I sense there might still be some complacency behind it.

IBM deciding that the home PC market wasn't a big deal?


People are already forgetting about the embarrassing P4/Netburst era. AMD was just getting some market share when Intel released Conroe and crushed the competition.


> I don't remember any company flubbing their unassailable lead quite this badly.

GM's Electro Motive Division losing to General Electric (in railroad locomotives) is probably about at the same level.


I don't know. Many said the same about Apple & Microsoft at various times. Never discount what deep pocketbooks and competent leadership can do to right a flailing ship.


Sears is another famous example


Didn't Intel flub their lead once before -- when AMD led with x86-64...?


They did. AMD had a very strong position against the Pentium IV era offerings from Intel. They took it on the chin for a couple years, and then they came out with Core 2 and pretty solidly handled AMD for a while.

AMD doesn't look nearly as one dimensional as they did then, power consumption wasn't as interesting and Intel came out with a low power play; this time AMD seems to have offerings in every category that are compelling. It's really hard to bet against Intel with their long history though. I wouldn't be surprised if they come out strong when they get their process stuff sorted.


> AMD doesn't look nearly as one dimensional as they did then, power consumption wasn't as interesting and Intel came out with a low power play; this time AMD seems to have offerings in every category that are compelling. It's really hard to bet against Intel with their long history though. I wouldn't be surprised if they come out strong when they get their process stuff sorted.

Yeah, it's primarily a problem of node here. AMD shrunk and Intel has been struggling to get their node up and going. If Intel had a working 10nm-class node the picture would be very different. They have a whole bunch of new architectures in the pipe that get back to making substantial IPC improvements, they simply can't manufacture them yet. Even if they could simply port Skylake to 10nm it would do OK.

TSMC are kind of the real star behind AMD's success. AMD is benefitting hugely from Apple and Qualcomm and others who sink a lot of money into TSMC, while Intel has to get it running all by themselves. TSMC has substantially outrun every other foundry on the planet, the situation would be equally bad if Intel were stuck with GloFo or Samsung or IBM, right now you're either on TSMC or you're not competitive.

The one part that AMD got right is the chiplet design. Being able to manufacture server processors out of chiplets that are a fraction the size of a monolithic laptop processor and have them lose effectively no performance from scaling like this lets them use TSMC even if yields might not be fantastic on an equivalent monolithic chip.

Part of the reason they have laptop processors running a year behind the desktop/server chips is, those are monolithic processors, not chiplet, and they're bigger and yield worse than chiplets. In this segment, Intel beat AMD to market substantially - Ice Lake has been in the market since like September, the first Renoir laptops are just shipping like sometime this month. I was looking at laptops at Costco before Thanksgiving and just under half the laptops there had Ice Lake, so it's been available in substantial numbers for a while. Renoir is still better, but it is a leapfrogging dynamic unlike, say, the server market where AMD is just better. Ice Lake actually still outperforms Renoir in per-thread performance, just not iGPU performance, and has fewer cores overall, so Intel's uarch isn't terribly uncompetitive when they can actually manufacture it. Zen3 will probably match Intel and then Tiger Lake will leapfrog AMD again a bit.

I have my doubts that giant monolithic Ice Lake-SP will ever be manufacturable at any competitive cost. The lack of consumer laptop/desktop 8C Ice Lake speaks against this as well, if you can't yield an 8C at competitive prices how are you supposed to yield a 38 core processor? But, Intel seems to be plowing forward with the launch anyway this year, so maybe it is, who knows.

To make a short story long, Intel really needs to get its node situation straightened out, and probably needs to transition to a chiplet style layout to make that happen, especially for the server stuff. Obviously it is not trivial to get chiplets to scale well in terms of performance. But Intel is not behind AMD so much as they're behind TSMC, and once they can actually manufacture their products on a competitive node then they'll be back in the game.


>TSMC are kind of the real star behind AMD's success. AMD is benefitting hugely from Apple and Qualcomm and others who sink a lot of money into TSMC, while Intel has to get it running all by themselves.

The best bet for Intel would be to sell the fabs like AMD did, or keep them but start attracting other customers to spread costs when optimizing for a particular node.


Considering that all of the reports for the past year have been about how Intel hasn't been able to improve their process, how would they be able to sell or spin off their fabs? Who would want it?


For someone considering their next build with a usecase of:

- programming, docker, golang

- gaming

can anyone recommend a resource for determining the relative performance of processors? With all the news of how well AMD is doing, I’m still not sure how to look at a given task and determine which processor would perform better.

Does anyone know of such a source?


Unless you are compiling massive projects, then your best bet will likely be a 3900X. You get more cores than you can likely use to handle all the programming multitasking, while also having a CPU that’s 5-10% off of the best gaming CPU available. All while keeping within a reasonable budget.


Without question, get the 3900x. It's a bit behind Intel in single-threaded performance, but only barely, and the embarrassment of cores you get compared to the i9-9900K more than makes up for it. Microcenter currently has it on sale for $379 if you buy it together with a compatible mobo, which is the deal of the century.


3900X is a fantastic developer desktop processor, great value.

Just get some nice fast DDR4-3600 CL16 RAM to go with it.

I recently built a gaming/developer PC, here is my part list (prices are NZD if you don’t want to have a heart attack):

https://pcpartpicker.com/list/2x4kb8

Handles any game I throw at it, and I can do 4K gaming for quite a few, though with 60Hz monitors you can see I’m not a hardcore gamer.


Plus, I think the latest security mitigations have reduced the i9-9900K's performance more than the R9 3900X's.


IMO the 2500k deal beats it. Still running mine


I think Ryzen 3 is coming this year, so it might be better to wait a bit.


It's out and the darling of most tech news.

There is some confusion in their naming. Zen is the architecture name, and so far we've had Zen, Zen+, and Zen 2. The consumer processor line is branded Ryzen (and Ryzen Mobile for laptop parts). The HEDT processor line is branded Threadripper, and the server line is branded Epyc.

Ryzen and Threadripper 1000-series are Zen.

Ryzen Mobile 2000-series is Zen.

Epyc 7001-series is Zen.

Ryzen and Threadripper 2000-series are Zen+

Ryzen Mobile 3000-series is Zen+.

Ryzen and Threadripper 3000-series are Zen 2.

Ryzen Mobile 4000-series is Zen 2.

Epyc 7002-series is Zen 2 (Epyc skipped Zen+).

Zen 3 is expected in 2020, based on AMD guidance. Assuming they follow their part numbering scheme, we should expect this to appear in Ryzen and Threadripper 4000-series, Ryzen Mobile 5000-series, and Epyc 7003-series.
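The decoder list above, collapsed into a lookup table for reference (just a restating of the mapping as given; series numbers as listed):

```python
# (brand, series) -> Zen architecture generation, per the list above.
ZEN_GEN = {
    ("Ryzen", 1000): "Zen",     ("Threadripper", 1000): "Zen",
    ("Ryzen Mobile", 2000): "Zen",
    ("Epyc", 7001): "Zen",
    ("Ryzen", 2000): "Zen+",    ("Threadripper", 2000): "Zen+",
    ("Ryzen Mobile", 3000): "Zen+",
    ("Ryzen", 3000): "Zen 2",   ("Threadripper", 3000): "Zen 2",
    ("Ryzen Mobile", 4000): "Zen 2",
    ("Epyc", 7002): "Zen 2",    # Epyc skipped Zen+
}
print(ZEN_GEN[("Ryzen Mobile", 4000)])  # -> Zen 2
```

Note how the Mobile parts are consistently one series number ahead of the desktop parts on the same architecture, which is where most of the confusion comes from.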


Just built a new computer with this, 3900x is a great deal. For GPU the 2070 Super is great as well


For most programming tasks I would focus on the base of the pyramid: Make the storage low-latency for small files, then up the memory size and bandwidth, and then use a CPU appropriate to the workload (ideally, it can go wide and parallelize - otherwise you're back to single-threaded perf). This mitigates the worst case of poorly optimized builds that need to frequently return to storage, and it improves all factors of the operating system when it chooses to swap to virtual memory (Windows has become a very aggressive swapper since Win10 launched and they added a new page file system on top of the old virtual memory).

You can spend time gazing at the benchmarks for each CPU, but I would not pinpoint it as the bottleneck for a responsive and pleasant programming environment. The new AMD chips are good all around. The new Intels are still OK, but the top end is probably too hot and loud to recommend.

For gaming, high clocks/high single-threaded IPC remain the primary factors. Games are mostly designed towards a certain number of cores and speed of I/O. Fast disk and memory will reduce sources of stutter but this is dependent on how often the game tries to load something.


Lots of gaming benchmarks out there, but sadly as of yet there is really no dedicated hardware reviews for the software engineering space.

Others have mentioned Phoronix (linux benchmarking) but I would also recommend Level1Techs. They are the only reviewers I've found that actually (occasionally) dig into what kind of performance actually matters to developers.

Recently, they did a video on using the Threadripper 3990X for Unreal Engine game development[0]. Some useful insights came out of that, such as:

- Compilers favor large cache sizes (which favors AMD)

- Usually the biggest performance bottleneck isn't compiling the code, but running automated tests (especially when it involves running a bunch of VMs).

[0] https://www.youtube.com/watch?v=VQa6r6Ci1jg


Here's a compilation benchmark: https://www.anandtech.com/bench/CPU-2019/2224

I guess there are more gaming benchmarks because there are much more gamers than software engineers.


Compilation benchmarks are fine, and many review sources are adding them, but there's still a pretty big gulf between "compiling the Linux kernel/chromium" and a typical developer workload.

> I guess there are more gaming benchmarks because there are much more gamers than software engineers.

That may be true, but I don't think that's a sufficient explanation. There are plenty of benchmarks for other professional workloads, such as CAD, 3D rendering, video editing, and mathematical modeling for financial or scientific applications.


>can anyone recommend a resource for determining the relative performance of processors?

Anandtech has a section of benchmarks with many workloads. I used that for years to assess CPU relative performance. https://www.anandtech.com/bench/CPU-2019/2224

You can also look at https://hwbot.org/benchmarks , but it's harder to use.

Another good resource would be: https://www.notebookcheck.net/Benchmarks-Tech.123.0.html


It might not help judge performance for individual tasks, but PassMark's CPU listings (https://www.cpubenchmark.net/cpu_list.php) should help with relative performance. The results have a multi- and single- core score which should help comparing different workloads.


In general the 3900X is really nice, and if I did it over again, would probably do that over the 3950X.. About my only complaint is 32gb ram modules aren't available in 3000+ frequencies, so 64gb is about the max prudent amount you'll get in AM4.

It depends on what you're shoving into docker, or how much you're building in golang...

As far as gaming, all the Ryzen 3000 series perform within a couple percent of any Intel CPU close to the price point... The 10900X might do another 3-5% better for gaming, but your electricity use will nearly double for that very minor increase; not worth it imo.


You can get fast and good latency 32 GB DIMMs now. G.Skill has 4x32GB 3600 18-22-22-42, timings not the best, but tighter timings always get harder at higher speed and chips. The 3200 kits are 16-18-18-38.

Edit: Corsair also offers 3200 kits I'm pretty sure, too lazy to look up their exact specs


Yeah you can but the prices though. In my area ram prices increased about 40% since December.


Other than price, why the recommendation for the 3900x over 3950x?


I don't own either, but in researching these processors, my guess is that price, or rather value would be the biggest factor.

One of the main value propositions (for a programmer) of going from a Ryzen 5 or Ryzen 7 to a Ryzen 9 is that you get double the amount of cache (and four times the amount of an Intel Core i9-9900K).

Both the 3900X and the 3950X have that extra cache. Aside from that, the extra 4 cores are going to have diminishing returns for most programmers, and the bump in boost clocks is minimal (though the ability to achieve those clocks with the same TDP as the 3900X is impressive).


Personal experience - 2700x with an SSD and 64gb of ram. Blazing fast working with large Java heaps etc, which is a lot of what I do. Gaming is a non issue as long as you have a decent GPU. Have a 1070ti right now.

That CPU is a bit outdated now, but it's a nice benchmark showing that the 3000 series is definitely enough.


2700X, same 64GB but a 2080, and yes, it's so damn fast for programming and gaming that I haven't been able to justify upgrading to a 3900/3950 other than "ooh shiny - want it", which isn't enough on its own.


If you're at all interested in a Hackintosh build (modern performance for a lot less money than what Apple charges), you have to go with AMD graphics cards.

I built a high-end gaming machine about a year ago (Intel 8700K, Nvidia 2080 Ti). While it's an extremely good machine, I find that I absolutely detest Windows, even for gaming. It's unusable for me for doing development work.

The hardware I picked, with the exception of the 2080 Ti, was oriented towards being Hackintosh compatible. That mainly includes the motherboard and CPU. I'm now contemplating purchasing a top-of-the-line AMD GPU and moving ahead with the Hackintosh project. Another benefit of making that switch is that the AMD products are a hell of a lot cheaper than Nvidia's while offering very competitive performance.


If the rumors that Apple is moving their Mac line to AMD are true, then it would be a home run.


Almost every hardware website measures gaming performance. Anandtech is good.

Programming workloads are more difficult to benchmark. You might want to check Phoronix; they have some programming-related workloads, like compiling the Linux kernel.

Also, you might want to check Geekbench. It consists of several real-world usage tests, like AES, etc.


When I was looking for gaming benchmarks the problem I ran into was the games being benchmarked were never actual popular games so they mostly seemed irrelevant.


the "actual popular games" at a certain point in time are usually games that were released very recently. I suspect anandtech doesn't update the game benchmarks too often so you can more easily compare against parts that were reviewed before the newest games were released. they usually have at least one major title for each genre, so you can get a sense of how the part will perform in your favorite game. you can find benchmarks for newer games in YouTube reviews (I personally find jayztwocents to be useful for this).


I run one such resource: https://www.pc-kombo.com/us/benchmark - a rather big collection of individual benchmarks, used to put all current and many older consumer processors into one ranked order. You can compare individual processors, and if there are fitting benchmarks for a direct comparison, those will be shown (benchmarks comparing different generations of processors are rare). There is one variant for games, and one (with less data, sadly) for general applications.

But the others are right, you should be looking for a Ryzen 3000, depending on budget 3600, 3700X, 3900X or even 3950X.


I recently built a system with a 3950x and a 2070 super and it has been great. This is the parts list I ended up with: https://pcpartpicker.com/list/4cMMXv . The system handles AAA and competitive games without any issues. I dual boot with windows 10 and Arch linux and it's been a great experience.


I don't know about specific tasks, but I've been using https://cpu.userbenchmark.com/ to compare CPUs.


If you go to /r/amd you will see that basically everybody over there hates userbenchmark.com because it's intel-biased garbage.

Just one of many threads from the last few weeks:

https://www.reddit.com/r/Amd/comments/fyhl1g/tim_from_hardwa...


The specific issue is that how UserBenchmark weights CPUs seems to be very coupled to Intel's particular ideas about how many cores a CPU should have, or at least firmly stuck five years in the past. So last year the weights were 40% for single-core performance, 58% for quad-core, and 2% for "multi-core".
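A sketch of what a composite with those weights would look like, using the 40/58/2 split quoted above (the linear combination itself is only a guess at how UserBenchmark combines its scores, which isn't public):

```python
def effective_speed(single, quad, multi, weights=(0.40, 0.58, 0.02)):
    """Weighted composite of single-, quad-, and 'multi'-core scores.

    The 40/58/2 split mirrors the weights quoted above; the linear
    combination is illustrative, not UserBenchmark's actual formula.
    """
    w_single, w_quad, w_multi = weights
    return single * w_single + quad * w_quad + multi * w_multi

# With a 2% weight, doubling multi-core throughput moves the
# composite from 100 to only 102:
baseline = effective_speed(100, 100, 100)
double_multi = effective_speed(100, 100, 200)
```

With those weights, a 16-core chip gets almost no credit for its extra cores, which is the complaint being made here.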

The idea was this was supposed to be what games care about, but it isn't. Modern games have issues with even 6c/6t CPUs, such as the horrible 1% lows on the 9600K in Far Cry 5: https://www.gamersnexus.net/hwreviews/3407-intel-i5-9600k-cp...

It looks like Userbench has since adjusted to weight up to 8 threads of performance? Which is maaaaybe less trash if all you care about is gaming. But the Core i5 series still tops charts on userbench despite reviewers no longer recommending the i5's due to performance inconsistency.

They even make ludicrous claims like that the 9100F is perfectly fine for gaming, and is even 10% better than a 2700X. They seem to be basing this decision entirely on older games or games specifically built for as broad a userbase as possible (eg, CSGO, Fortnite & Overwatch). Meanwhile actual reviews say things like "The quad-core Core i3-9100F was hopeless in Battlefield V, pretty bad in Assassin’s Creed: Odyssey, fairly useless in The Division 2, and weak in Shadow of the Tomb Raider." https://www.techspot.com/review/1983-intel-vs-amd-budget-cpu...

So even if you're an Intel fan, userbench is still a terrible way to pick a CPU.


HWUB is not known for being particularly even-handed in their editorial positions. They tend to 'beg the question' by picking game suites that produce the outcome they want to discuss, and tend to over-reach on the conclusions.

"fairly useless" here is over 60 fps average in the heavy titles and 90-110 fps in the multiplayer titles, with a similar ratio of minimums as the 1600 AF (so no more or less prone to stutter). And that's with them loading the dice by picking the absolute most thread-heavy games they could find, most games the 9100F does comparatively much better than that.

And the reality is that Zen1 and Zen+ actually are pretty weak in gaming. Zen2 made a ~30% improvement over Zen1 in gaming performance (much better than the "average" gains for other workloads), and it's still 10-15% behind the fastest Intel processors. Zen1 especially was hot garbage in gaming, those thread-heavy titles aren't representative of its average performance. About all you can say is that it aged better than the 4Cs that Intel had on the consumer platform at the time (or the 8100/9100F/etc that followed), an OC'd 8700K lays a smackdown on it and an OC'd 5820K remains extremely viable even today.

I'm not going to defend userbenchmark's composite scores, but gaming performance does heavily depend on per-core performance even today. Having 8 faster cores is still more desirable for gaming than 16 slower cores. And single-core performance is a good analogue of "per-core performance" so this number remains very relevant.


> "fairly useless" here is over 60 fps average in the heavy titles and 90-110 fps in the multiplayer titles

You missed the point. The point was UB claimed the 9100F was 10% faster than a 2700X. In reality the 2700X absolutely massacres the 9100F in gaming performance. Higher average FPS, higher min FPS, etc...

Even the 1600AF trivially beats the 9100F.

> with a similar ratio of minimums as the 1600 AF (so no more or less prone to stutter).

1600 AF in Battlefield V: 126 average, 91 1% lows

9100F in Battlefield V: 116 average, 49 1% lows

That's not a similar ratio at all.
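Plugging in the Battlefield V numbers quoted above makes the difference concrete (a quick arithmetic check on the quoted figures, not a benchmark):

```python
def low_ratio(avg_fps, one_percent_low):
    """Ratio of 1% lows to average FPS; closer to 1.0 means less stutter."""
    return one_percent_low / avg_fps

# Battlefield V figures quoted above:
ratio_1600af = low_ratio(126, 91)  # ~0.72 for the 1600 AF
ratio_9100f = low_ratio(116, 49)   # ~0.42 for the 9100F
```

The 1600 AF holds its 1% lows at roughly 72% of its average, the 9100F at only about 42%, so the averages hide very different stutter behavior.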

> Zen2 made a ~30% improvement over Zen1 in gaming performance

No it didn't. You're massively misrepresenting (or mis-remembering) Zen1's gaming performance.

https://tpucdn.com/review/amd-ryzen-7-3700x/images/relative-...

3.6ghz/4.4ghz boost 3700X is ~11% faster than the 3.6ghz/4ghz boost 1800X in 1080p gaming.

Even at 720p it's a 15% gap between those two, not 30% https://tpucdn.com/review/amd-ryzen-7-3700x/images/relative-...

> Zen1 especially was hot garbage in gaming, those thread-heavy titles aren't representative of its average performance.

No it wasn't. It lost to the equivalent Intel CPU, but it was far from bad. You could easily pair a Zen1 CPU with just about any GPU and never see a significant bottleneck. The exception being the absolute top-end. And, critically, if you had an older Intel quad core, like a 7600K, the Zen1/Zen+ CPUs were still an upgrade in gaming performance.

See for example at 1440p the gap between Zen1 & Zen2 & Intel being almost nonexistent even with a 2080 Ti: https://tpucdn.com/review/amd-ryzen-7-3700x/images/relative-...

> Having 8 faster cores is still more desirable for gaming than 16 slower cores.

Of course, but you still need enough cores to avoid stuttering. Which means...

> And single-core performance is a good analogue of "per-core performance" so this number remains very relevant.

Is not correct at all. Single-core performance isn't an analogue of anything these days. You need a minimum number of cores and good single-core performance.

And it's not just HWUB with these conclusions that an i5 is no longer sufficient. Gamersnexus has the same recommendations: "In more games each year, we’re noticing the cut-down Core i5 exhibiting high frametime variability that counteracts its fleeting performance superiority with unreliable, stuttery behavior. The AMD R5 3600 is more reliable and consistent in its performance across all games we’ve tested, making it the better gaming option." ( https://www.gamersnexus.net/guides/3533-best-cpus-of-2019-ro... )


> So last year the weights were 40% for single-core performance, 58% for quad-core, and 2% for "multi-core".

And for reference, they used to have it at 30% single core, 60% quad core, 10% multi core. But that didn't advantage Intel enough, or something.


> Modern games have issues with even 6c/6t CPUs

Any idea why?


It's really specific to Far Cry 5, and it produces some quite strange results that people are putting way too much weight on.

0.1% lows really tank on FC5 on processors without SMT, for example a 5.2 GHz 9600K has less than half the 0.1% FPS as a stock Pentium G5600 2C4T processor. In other words it's stuttering on the 9600K but running ok on the G5600.

https://www.gamersnexus.net/images/media/2018/cpus/2600k/int...

4C4T processors (R3 1200) do OK, but that one is AMD, so it isn't clear whether it's specifically something the engine is doing wrong around Intel processors, or if there's some hardcoded assumption that if there are 6+ cores then SMT must be available, or what.

But I mean, this specific game is not evidence that "6C6T is no longer sufficient for gaming", it's just a badly programmed game that has something going wrong under the hood on 6C6T processors.


The actual scores themselves are useful though. When it says "quad core: x% faster" or "multi core FP: y% faster" that is actually fairly accurate (as accurate as a synthetic can be). People just don't like the way userbenchmark weights these numbers in the composite score ("effective speed").

It's still a very useful site for comparing niche hardware that will never get a true review - how does a J5005 compare to a i5 750? How does a Xeon E5-1650 compare to a Ryzen 1600? Probably not going to ever be directly tested. The only alternatives are things like Passmark that are much less accurate. UserBenchmark lets you compare against all kinds of niche or rare hardware at will, that's an incredibly valuable resource. Some people are just so butthurt about the "effective speed" composite scores that they can't bring themselves to scroll past a single line, which is a little ridiculous.

Generally r/AMD constantly gets their panties in a bunch about something or other, it's constant conspiracies about how this or that is a NVIDIA or Intel backed conspiracy. Don't take them too seriously.

At times they have sent death threats because they didn't like the conclusion of a review. After the initial Ryzen launch they decided that Steve from GamersNexus (among others) was an Intel shill and started threatening his family. iirc there have been other "incidents" as well.

https://www.reddit.com/r/Amd/comments/5xkw1b/gamersnexus_rec...

(most of those "removed" posts are people justifying it because Steve is an Intel shill who put out a "biased review")

They really take the whole fanboy thing to a whole new level. It is practically a uniquely toxic subreddit, even among other "brand" subreddits, more like a sports team sub or something.


Interesting, I was not aware of this.

As it happens I am awaiting delivery of a Ryzen 5 3600, good to know it's likely to be even better than userbenchmark suggested!


beware of submitting data to UB:

UB stores passwords in plaintext, and emails them around.

https://plaintextoffenders.com/post/183587319928/userbenchma...

https://github.com/plaintextoffenders/plaintextoffenders/blo...


Has the rise of AMD led to a shift in talent going their way as well? Not sure how the loyalty dynamics are in the chip engineering industry.


Check out https://en.wikipedia.org/wiki/Jim_Keller_(engineer) to answer your question about loyalty. He's had his hand in all kinds of chips you use.


I don't think him jumping back and forth is a bad thing. I'd prefer if Intel and AMD (and others) were at each others throats constantly. We're the winners in that scenario, with more and faster and cheaper chips to choose from.


It already sounds like the environment in Intel isn't great, and a lot of the staff dynamics are very driven by cost-cutting. Great video on youtube posted in the last week with a bunch of leaks from Intel employees: https://youtu.be/agxSclh27uo


Pretty standard fare for monopoly/cartel in all segments in America. Pump stock, hit options, golden parachute.

I will say that the last time AMD had a brief lead on Intel, with Athlon, they rested on their laurels and started milking customers in record time. I think that was Hector Ruiz.

The last time, it was pretty clear from the mobile processors that the core engineering talent was still somewhere in the company; I think the Core processors came from an Israeli team rather than the one pushing out the high-frequency, pipeline-stalls-be-damned stuff.

But I get the feeling with the stunning, STUNNING process lead collapse that the engineering talent is fundamentally gone.

At one point, Intel was thought to have a two- or three-year lead.


If I were an EE I’d wait it out, at least for the CPU biz. AMD has been feast or famine, so a bit too unpredictable, though of late they have been crushing it.


Feast or famine works really well if you’re selling stock options as soon as they vest. Well assuming the cycle time is fast enough and you’re staying through multiple cycles.

It’s like x every year vs cycling between 0 and 4x.


AMD/Intel make incredible products, but there's mostly bitching about workplace environment at those places on Blind.


256MB of L3 cache is incredible. There must be useful classes of application that can fit entirely within that, OS and data included.


Remember that with Ryzen that cache is split up between chunks of cores, so it's not entirely flexible.

That gets better with the upcoming generation, though; it's more unified.


Can you disable cores to get more cache per (actually running) core?


Yes. That is what AMD is doing with these chips; e.g. the 32-core version has 64 cores with half disabled and you can disable more if you want.


I thought the chiplets were binned before getting placed onto the carrier silicon, so they wouldn't need to do the core fusing that Intel does?


There's no carrier silicon and I'm not sure what distinction you're making between binning and fusing. The only way to get 256 MB of cache is to also have 64 cores, the only way to get 192 MB is to have 48, etc.


Is this possible on first gen Ryzen? I have the 1950x, and I don't see any option to deactivate cores in my BIOS.



Wow, I did not realize this was done through `msconfig` on Windows. Thank you very much for the links.


The L3 is split per core? I thought it was only typically L1 cache that was per core, with L2 and L3 being shared across the entire die.


not per core, but per ccx. a ccx is a group of cores. if you need something that isn't in your ccx's chunk of l3 cache you have to hop through the memory controller and pay a latency penalty.


Install an old version of Linux, limit it to 256MB with a kernel code hack or kernel command line option, and you'd be running entirely in L3.

I also think there might be a niche for unikernels here to compile number crunching and other CPU-heavy tasks down to sizes that would run almost 100% from L3 cache. Wow.
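The kernel command-line option alluded to above is `mem=`, which caps the amount of RAM the kernel will use. For example, via GRUB (the file path and regeneration command vary by distro, and the kernel and userspace still have to share that 256 MB):

```shell
# /etc/default/grub -- cap visible RAM so the whole system stays L3-sized
GRUB_CMDLINE_LINUX="mem=256M"
# then regenerate the bootloader config, e.g.:
#   update-grub
```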


My website runs contemporary Debian on a 128MB VM. You don't need special measures to run in 256MB.


If the kernel sees 32 GB of RAM at boot time, you do need special measures to make sure the kernel doesn't utilize all of the visible RAM.


Why an old version? I would expect them to run with much less if you remove stuff that you don't need like a bloated GUI or a browser...



Could someone comment on how this compares with Ryzen Threadripper (e.g. 3990X)?


These are still lower clocked and lower power than Threadripper but have more cache and more memory channels.


More Ram and PCIe, lower clocks/heat.


I thought it worth pointing out that Intel already sells an HEDT CPU that is faster and cheaper than these AMD counterparts: the 10980XE, with 18 cores and higher clock speeds.

All it takes is for Intel to bin them with ECC memory support and rename them as Xeons to compete.

And it seems AMD is in no hurry to release their Zen 3, giving the market plenty of time to digest Zen 2. I just hope their enterprise and server sales departments do better, because right now, while on paper and in benchmarks they are doing great, their sales figures aren't showing all the enthusiasm many sites and comments are claiming.

And that is speaking as an AMD shareholder.


BigCos move slowly. It'll take some time before sales cycles close. I'd give it a quarter or two of lag between now and really promising sales numbers, to account for how slowly things change in a large data center. A human, somewhere, has to rack each of those things :P


The 10980XE uses an incredible amount of power though.


The appropriate comparison here is probably either the recent Xeon Gold refresh, or Xeon W 3200 series.


Yet, there’s no official, ready to use BLAS/LAPACK.



