AMD’s MI350 Is A Massive Iterative Advance


At their Advancing AI conference, AMD announced the MI350/355 GPUs along with a few specs. SemiAccurate has a technical piece coming but for now we’ll just cover the high level specs.

AMD MI350 specs

More power leads to more performance

The obvious question is what is the difference between the MI350 and MI355? The answer is simple: MI350 is air cooled, MI355 is direct liquid cooled (DLC). Because of this the MI350 is capped at 1000W and the MI355 will sip a mere 1400W. According to the raw AMD numbers, the performance difference is about 10% on a raw FLOPS basis but the real world delta is closer to 20% at the system level. Depending on how AMD prices things, the MI355 looks to be the no-brainer choice.
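As a rough illustration, the figures quoted above can be turned into relative performance per watt. This is a back-of-the-envelope sketch, assuming the 1000W and 1400W caps stand in for actual power draw and taking the ~10% and ~20% deltas at face value; the function name is invented for this example.

```python
def relative_perf_per_watt(perf_gain, base_watts, new_watts):
    """Perf/W of the faster part relative to the baseline part (1.0 = equal)."""
    return (1.0 + perf_gain) / (new_watts / base_watts)

MI350_WATTS = 1000  # air cooled cap quoted in the article
MI355_WATTS = 1400  # direct liquid cooled cap quoted in the article

# Assumption: power caps are used as a proxy for real power draw.
raw = relative_perf_per_watt(0.10, MI350_WATTS, MI355_WATTS)     # raw FLOPS delta
system = relative_perf_per_watt(0.20, MI350_WATTS, MI355_WATTS)  # system level delta

print(f"raw FLOPS: {raw:.2f}x perf/W, system level: {system:.2f}x perf/W")
```

Under these assumptions the MI355 trades some performance per watt for more performance per device, which is why pricing matters to the comparison.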

AMD MI350 construction

MI350 construction is similar but different

MI35x itself is a ground up new architecture for the device but the platform itself stays pretty close to the current MI300 variants. The same can’t be said at the silicon level where there was one big architectural change, the base/IO die. MI300 had four IODs, one per compute die. This worked out fine and brought a lot of flexibility to the family. You could mix and match CPU and GPU tiles on a whim and AMD did just that. Unfortunately the market didn’t care, or at least didn’t care enough to warrant future heterogeneous CPU/GPU hybrids.

Without the need for this level of flexibility, and with packaging advances, AMD moved to a two IOD architecture. Each IOD has two CCDs/XCDs on top which still allows them to mix and match CPU and GPU but only two at a time. That said we don’t expect to see a CPU bearing MI35x device this generation. This change also leads to some more interesting memory choices but that is a topic for another article.

AMD MI350 CU performance

AMD’s per-CU performance comparison

At a low level, the per-CU performance is broadly the same on MI35x as it was on MI300, at least for traditional data types. On the newer and far more widely used AI datatypes such as BF16 and lower INT widths, the MI35x doubles the performance of its predecessor. Do note this is per CU, not per device. The Accelerator Complex Dies (XCDs) carry 256 active CUs in total per device, which is down from 304 in the MI300 generation, but net performance goes way up.
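The per-CU versus per-device distinction can be made concrete with a little arithmetic. The sketch below treats the 256 and 304 CU counts as device totals and assumes equal clocks on both generations, which is a simplification not taken from AMD's numbers; the function name is invented for this example.

```python
def device_speedup(per_cu_gain, new_cus, old_cus):
    """Device-level speedup implied by a per-CU gain and a change in CU count."""
    return per_cu_gain * new_cus / old_cus

MI35X_CUS = 256  # active CUs quoted in the article
MI300_CUS = 304  # active CUs quoted in the article

# Assumption: identical clock speeds, so only CU count and per-CU rate matter.
ai_types = device_speedup(2.0, MI35X_CUS, MI300_CUS)     # per-CU rate doubled
traditional = device_speedup(1.0, MI35X_CUS, MI300_CUS)  # per-CU rate unchanged

print(f"AI datatypes: {ai_types:.2f}x, traditional datatypes: {traditional:.2f}x")
```

On these assumptions the doubled per-CU rate more than offsets the lower CU count for the AI datatypes, while clock speed, memory, and software changes are left out of the estimate entirely.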

AMD MI350 training performance

If you want to train Llama 3 yourself...

AMD is claiming ~3x performance gains over MI300 in Llama 3 training but benchmarks at this scale can be a little murky. This isn’t a criticism, it’s just not easy to compare things at this scale independently for obvious reasons. In any case the advances that MI35x brings to the table are real and come from a mix of low level hardware advances, device advances, memory, IO, and software changes. In short the end user visible performance gains should be more than 2x and exceed that with some software work.

Overall the MI35x family isn’t a sea change over the MI300 generation but it brings some serious advances to the table. Hardware isn’t the most important factor in AI purchases for many customers, software often is. AMD hasn’t ignored this front either with the release of ROCm 7 and various other tools and frameworks. The overall result is a claimed jump in training performance and a large perf/dollar advantage over Nvidia. It will be interesting to see how it all plays out in the market. S|A


Charlie Demerjian is the founder of Stone Arch Networking Services and SemiAccurate.com. SemiAccurate.com is a technology news site addressing hardware design, software selection, customization, securing and maintenance, with over one million views per month. He is a technologist and analyst specializing in semiconductors, system and network architecture. As head writer of SemiAccurate.com, he regularly advises writers, analysts, and industry executives on technical matters and long lead industry trends. Charlie is also available through Guidepoint and Mosaic.
