SPONSORED POST

Artificial Intelligence (AI) is no longer just a topic for the future – companies that implement AI successfully are securing decisive competitive advantages today. Whether it’s automating processes, increasing productivity, or accelerating innovation, AI is transforming the way organisations work, make decisions, and grow.

However, implementing Artificial Intelligence in companies comes with complex challenges. A powerful AI infrastructure is essential to process large volumes of data efficiently and to meet the increasing demands on data centre computing, energy efficiency, and data management. At the same time, investments must be strategically planned and aligned with business objectives such as higher productivity and faster innovation.

Anyone looking to implement AI in a company therefore needs more than just the right software – it requires a modern, scalable infrastructure that combines growth, security, and sustainability. Only in this way can the risks of costly misinvestments be avoided and the full potential of Artificial Intelligence be realised.

The 3 major challenges in implementing AI

Companies face three primary hurdles when implementing AI:

1. Lack of AI readiness in existing infrastructure

The limited capabilities of existing IT and data centre infrastructures are among the biggest hurdles to implementing and scaling Artificial Intelligence (AI) in companies. AI workloads demand enormous computing power and often create bottlenecks, as many systems are already operating at capacity. Energy requirements, CPU resources, as well as storage and network bandwidth are frequently fully utilised. Integrating additional AI-capable hardware usually requires higher density and more space than the existing data centre architecture allows. At the same time, continuing to use outdated infrastructure incurs high opportunity costs, tying up valuable resources that would be better invested in AI innovation and efficiency improvements.

2. AI increases the demand for confidential computing

As Artificial Intelligence (AI) increasingly becomes a “killer application” affecting nearly every aspect of business and personal data, trust in data security, confidentiality, and integrity is crucial. At the same time, high-performance AI relies on a heterogeneous AI infrastructure – consisting of CPUs, GPUs, and specialised accelerators – distributed across multiple nodes and often even across different locations. Therefore, companies must integrate confidential computing capabilities into their AI workflows to protect sensitive data during processing. This includes, for example, securely encrypted virtualisations that are scalable across dozens or even hundreds of nodes, creating a trusted and secure AI execution model.
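On Linux hosts, a quick way to see whether a CPU advertises these capabilities is to look for the SEV-related feature flags (`sev`, `sev_es`, `sev_snp`) in `/proc/cpuinfo`. The snippet below is a minimal illustration of that check, not part of any AMD tooling; the helper name `sev_features` is our own.

```python
# Minimal sketch: extract SEV-related CPU feature flags from /proc/cpuinfo.
# Assumes the Linux cpuinfo layout; the function name is illustrative only.
def sev_features(cpuinfo_text: str) -> set:
    """Return the SEV-related feature flags (sev, sev_es, sev_snp, ...)
    advertised in the given /proc/cpuinfo text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            _, _, value = line.partition(":")
            flags.update(value.split())
    return {flag for flag in flags if flag.startswith("sev")}

# Example with a shortened cpuinfo excerpt:
sample = "processor\t: 0\nflags\t\t: fpu vme sme sev sev_es sev_snp\n"
print(sorted(sev_features(sample)))  # ['sev', 'sev_es', 'sev_snp']
```

On a real host you would pass `open("/proc/cpuinfo").read()`; an empty result means the CPU (or the firmware configuration) does not expose SEV.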


3. Striking a balance between decisive investment and flexibility

AI initiatives can essentially be divided into two categories: productivity enhancements, achieved through automation or more efficient workflows, and innovations that unlock new revenue opportunities or enable groundbreaking insights. While productivity-focused workloads are typically inference workloads with lower training requirements and can often be handled easily by the latest generation of CPUs, innovation projects come with significantly higher training demands and may therefore require investment in new GPU resources. Consequently, maintaining a balanced investment portfolio that addresses both objectives is a central strategic challenge.


AMD EPYC™ processors: The leading CPU for AI

Processors from the AMD EPYC™ 9005 series can match the integer performance of existing hardware while requiring up to 86% fewer racks [3] and can deliver up to three times the machine learning throughput of 64-core Intel Xeon 8592+ processors [4]. Hardware-level encryption, such as secure encrypted virtualisation, also ensures that models and data remain protected during training or inference, keeping information confidential even in multi-tenant environments.

Discover more!

Key infrastructure areas for AI modernisation

AI enables businesses to achieve results faster than ever before. Realising this, however, requires a series of targeted data centre modernisations that address the challenges outlined above.

 

1. Server consolidation as a gateway to AI expansion

Modern CPUs now offer dozens, if not hundreds, of cores per socket, enabling server consolidation that saves both space and energy. A single modern server can replace up to seven existing machines [1]. At the same time, AMD EPYC™ processors are the leading CPUs for AI [2], delivering the integer performance of existing hardware with up to 86% fewer racks [3] and up to three times the machine learning throughput compared to 64-core Intel Xeon 8592+ processors [4]. In addition, the next generation of CPU architectures provides higher performance per watt, reducing energy consumption and ongoing operating costs. Hardware-level encryption, such as secure encrypted virtualisation, ensures that models and data remain confidential even in multi-tenant environments.
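The consolidation arithmetic behind such claims can be sketched from the SPECrate®2017_int_base scores quoted in the claims section below (391 for a legacy 2P Xeon Platinum 8280 server, 3000 for a 2P EPYC 9965 server). The snippet is a rough back-of-envelope illustration only, not a substitute for the AMD TCO estimator on which the claims are based.

```python
import math

def servers_needed(target_perf: float, per_server_score: float) -> int:
    """Servers required to reach a target aggregate SPECrate2017_int_base score."""
    return math.ceil(target_perf / per_server_score)

# Scores quoted in the claims section (claim [1]):
legacy_score = 391     # 2P Intel Xeon Platinum 8280
modern_score = 3000    # 2P AMD EPYC 9965
target = 391_000       # aggregate performance target used in the claim

legacy_count = servers_needed(target, legacy_score)   # 1000 servers
modern_count = servers_needed(target, modern_score)   # 131 servers
print(legacy_count, modern_count, round(legacy_count / modern_count, 1))
```

The resulting ratio of roughly 7.6:1 reflects only the raw per-server performance gap; the rack and energy savings in claims [1] and [3] additionally factor in density, power, and cooling.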

 

2. Flexible infrastructure for faster AI scaling

Companies can simplify AI expansion by maintaining x86 architectures (as opposed to ARM-based options), allowing existing x86 applications to continue running. Since deep learning is extremely data-intensive, it requires parallel processing via GPUs and greater memory bandwidth. By combining the strengths of CPUs with the parallel processing power of GPUs, even the largest models can be handled efficiently to meet growing AI demands. AMD Instinct™ MI325X accelerators match the competition with up to eight times the GPU training performance [5] while achieving up to four times higher inference performance [6].

 

3. Greater network efficiency and optimised management

High throughput and performance-sensitive AI workloads put significant pressure on data centre networks, making network interface cards (NICs) and data processing units (DPUs) essential to relieve strain on the backend network. DPUs, with their programmable cores, can handle networking, security, or storage services, significantly reducing CPU load. The AMD Pensando™ Salina 400 DPU, for instance, delivers twice the bandwidth, connections per second, packets per second, and storage operations compared to previous generations [7].

 

4. AI beyond the data centre

AI is no longer confined to servers; AI PCs offer end users numerous everyday business benefits by processing data locally instead of relying on cloud resources. This enables real-time performance even under suboptimal network conditions, while keeping personal or sensitive data on the device—greatly enhancing data privacy. Processors from the AMD Ryzen™ AI 300 series are leading the AI PC era. These systems are now available in Microsoft Copilot+ PC laptops, delivering up to 1.4× higher multithreaded performance compared to competing products [8].

Photo: AMD

AMD Ryzen™ processors lead the AI era

Processors from the AMD Ryzen™ AI 300 series support hundreds of different AI experiences and are now available in a wide range of Microsoft Copilot+ PC laptops. These systems deliver up to 1.4× higher multithreaded performance compared to competing products [8] and up to 23 hours of battery life for multiple days of use [9].

Discover more!

Conclusion: Sustainable success with AMD

AMD's end-to-end portfolio offers companies a way to drive scaling and significantly shorten the time to results from AI adoption. At the same time, the outstanding performance per watt of AMD EPYC CPUs enables precise control over ongoing operating costs, reducing footprint, energy consumption, and licensing costs for sustainable success. In addition, AMD supports an open ecosystem that helps enterprise partners keep pace with the rapid evolution of the AI landscape. By partnering with AMD, enterprise customers work with a stable technology leader that invests continuously in research and development and has a long track record of reliable product delivery. This allows companies to plan confidently with AI and achieve scalable, sustainable success [10].

Choose AMD as your trusted technology partner – for scalable AI solutions that deliver today and tomorrow

Marco Matthias Marcone


Head of Marketing, RNT Rausch GmbH


Your AI. Your Server. Your Control.

Imagine your AI running faster, more securely, and completely independently – right within your own network. No cloud. No data leaks. No waiting times.

Implementing your own AI has never been easier: with Yeren® Local AI, you can step into the future of intelligent automation – GDPR-compliant, made in Europe, and now available at an exclusive special price.

Discover more!

Claims

1.

9xx5TCO-002A: This scenario contains many assumptions and estimates and, while based on AMD internal research and best approximations, should be considered an example for information purposes only, and not used as a basis for decision making over actual testing. The AMD Server & Greenhouse Gas Emissions TCO (total cost of ownership) Estimator Tool – version 1.12, compares the selected AMD EPYC™ and Intel® Xeon® CPU based server solutions required to deliver a TOTAL_PERFORMANCE of 391000 units of SPECrate2017_int_base performance as of October 10, 2024. This estimation compares a legacy 2P Intel Xeon 28 core Platinum_8280 based server with a score of 391 versus 2P EPYC 9965 (192C) powered server with a score of 3000 (https://www.spec.org/cpu2017/results/res2024q4/cpu2017-20240923-44837.pdf) along with a comparison upgrade to a 2P Intel Xeon Platinum 8592+ (64C) based server with a score of 1130 (https://spec.org/cpu2017/results/res2024q3/cpu2017-20240701-43948.pdf). Actual SPECrate®2017_int_base score for 2P
EPYC 9965 will vary based on OEM publications. Environmental impact estimates made leveraging this data, using the Country / Region specific electricity factors from the 2024 International Country Specific Electricity Factors 10 – July 2024, and the United States Environmental Protection Agency ‘Greenhouse Gas Equivalencies Calculator’. For additional details, see https://www.amd.com/en/legal/claims/epyc.html#q=SP9xxTCO-002A.

2.

9xx5-012: TPCxAI @SF30 Multi-Instance 32C Instance Size throughput results based on AMD internal testing as of 09/05/2024 running multiple VM instances. The aggregate end-to-end AI throughput test is derived from the TPCx-AI benchmark and as such is not comparable to published TPCx-AI results, as the end-to-end
AI throughput test results do not comply with the TPCx-AI Specification. 2P AMD EPYC 9965 (384 Total Cores), 12 32C instances, NPS1, 1.5TB 24x64GB DDR5-6400 (at 6000 MT/s),
1DPC, 1.0 Gbps NetXtreme BCM5720 Gigabit Ethernet PCIe, 3.5 TB Samsung MZWLO3T8HCLS-00A07 NVMe®, Ubuntu® 22.04.4 LTS, 6.8.0-40-generic (tuned-adm profile throughput-performance, ulimit -l 198096812, ulimit -n 1024, ulimit -s 8192), BIOS RVOT1000C (SMT=off, Determinism=Power, Turbo Boost=Enabled)
2P AMD EPYC 9755 (256 Total Cores), 8 32C instances, NPS1, 1.5TB 24x64GB DDR5-6400 (at 6000 MT/s), 1DPC, 1.0 Gbps NetXtreme BCM5720 Gigabit Ethernet PCIe, 3.5 TB Samsung MZWLO3T8HCLS-00A07 NVMe®, Ubuntu 22.04.4 LTS, 6.8.0-40-generic (tuned-adm profile throughput-performance, ulimit -l 198096812, ulimit -n 1024, ulimit -s 8192), BIOS RVOT0090F (SMT=off, Determinism=Power, Turbo Boost=Enabled)
2P AMD EPYC 9654 (192 Total cores) 6 32C instances, NPS1, 1.5TB 24x64GB DDR5-4800, 1DPC, 2 x 1.92 TB Samsung MZQL21T9HCJR-00A07 NVMe, Ubuntu 22.04.3 LTS, BIOS 1006C (SMT=off, Determinism=Power) Versus 2P Xeon Platinum 8592+ (128 Total Cores), 4 32C instances, AMX On, 1TB 16x64GB DDR5-5600, 1DPC, 1.0 Gbps NetXtreme BCM5719 Gigabit Ethernet PCIe, 3.84 TB KIOXIA KCMYXRUG3T84 NVMe, Ubuntu 22.04.4 LTS, 6.5.0-35 generic (tuned-adm profile throughput-performance, ulimit -l 132065548, ulimit -n
1024, ulimit -s 8192), BIOS ESE122V (SMT=off, Determinism=Power, Turbo Boost = Enabled)

Results:
CPU Median Relative Generational
Turin 192C, 12 Inst 6067.531 3.775 2.278
Turin 128C, 8 Inst 4091.85 2.546 1.536
Genoa 96C, 6 Inst 2663.14 1.657 1
EMR 64C, 4 Inst 1607.417 1 NA
Results may vary due to factors including system configurations, software versions and BIOS settings. TPC, TPC Benchmark and TPC-C are trademarks of the Transaction Processing Performance Council.

3.

9xx5TCO-001B: This scenario contains many assumptions and estimates and, while based on AMD internal research and best approximations, should be considered an example for information purposes only, and not used as a basis for decision making over actual testing. The AMD Server & Greenhouse Gas Emissions TCO (total cost of ownership) Estimator Tool – version 1.12, compares the selected AMD EPYC™ and Intel® Xeon® CPU based server solutions required to deliver a TOTAL_PERFORMANCE of 39100 units of SPECrate2017_int_base performance as of October 10, 2024. This scenario compares a legacy 2P Intel Xeon 28 core Platinum_8280 based server with a score of 391 versus 2P EPYC 9965 (192C) powered server with a score of 3000 (https://www.spec.org/cpu2017/results/res2024q4/cpu2017-20240923-44837.pdf) along with a comparison upgrade to a 2P Intel Xeon Platinum 8592+ (64C) based server with a score of 1130 (https://spec.org/cpu2017/results/res2024q3/cpu2017-20240701-43948.pdf). Actual SPECrate®2017_int_base score for 2P EPYC 9965 will vary based on OEM publications. Environmental impact estimates made leveraging this data, using the Country / Region specific electricity factors from the 2024 International Country Specific Electricity Factors 10 – July 2024, and the United States Environmental Protection Agency ‘Greenhouse Gas Equivalencies Calculator’.

4.

9xx5-040A: XGBoost (Runs/Hour) throughput results based on AMD internal testing as of 09/05/2024. XGBoost Configurations: v2.2.1, Higgs Data Set, 32 Core Instances, FP32
2P AMD EPYC 9965 (384 Total Cores), 12 x 32 core instances, 1.5TB 24x64GB DDR5-6400 (at 6000 MT/s), 1.0 Gbps NetXtreme BCM5720 Gigabit Ethernet PCIe, 3.5 TB Samsung MZWLO3T8HCLS-00A07 NVMe®, Ubuntu® 22.04.4 LTS, 6.8.0-45-generic (tuned-adm profile throughput-performance, ulimit -l 198078840, ulimit -n 1024, ulimit -s 8192), BIOS RVOT1000C (SMT=off, Determinism=Power, Turbo Boost=Enabled), NPS=1

2P AMD EPYC 9755 (256 Total Cores), 1.5TB 24x64GB DDR5-6400 (at 6000 MT/s), 1DPC, 1.0 Gbps NetXtreme BCM5720 Gigabit Ethernet PCIe, 3.5 TB Samsung MZWLO3T8HCLS-00A07 NVMe®, Ubuntu 22.04.4 LTS, 6.8.0-40-generic (tuned-adm profile throughput-performance, ulimit -l 198094956, ulimit -n 1024, ulimit -s 8192), BIOS RVOT0090F (SMT=off, Determinism=Power, Turbo Boost=Enabled), NPS=1

2P AMD EPYC 9654 (192 Total cores), 1.5TB 24x64GB DDR5-4800, 1DPC, 2 x 1.92 TB Samsung MZQL21T9HCJR- 00A07 NVMe®, Ubuntu 22.04.4 LTS, 6.8.0-40-generic (tuned-adm profile throughput-performance, ulimit -l 198120988, ulimit -n 1024, ulimit -s 8192), BIOS TTI100BA (SMT=off, Determinism=Power), NPS=1 Versus 2P Xeon Platinum 8592+ (128 Total Cores), AMX On, 1TB 16x64GB DDR5-5600, 1DPC, 1.0 Gbps NetXtreme BCM5719 Gigabit Ethernet PCIe, 3.84 TB KIOXIA KCMYXRUG3T84 NVMe®, Ubuntu 22.04.4 LTS,
6.5.0-35 generic (tuned-adm profile throughput-performance, ulimit -l 132065548, ulimit -n 1024, ulimit -s 8192), BIOS ESE122V (SMT=off, Determinism=Power, Turbo Boost = Enabled)
Results:
CPU Run 1 Run 2 Run 3 Median Relative Throughput Generational
2P Turin 192C, NPS1 1565.217 1537.367 1553.957 1553.957 3 2.41
2P Turin 128C, NPS1 1103.448 1138.34 1111.969 1111.969 2.147 1.725
2P Genoa 96C, NPS1 662.577 644.776 640.95 644.776 1.245 1
2P EMR 64C 517.986 421.053 553.846 517.986 1 NA
Results may vary due to factors including system configurations, software versions and BIOS settings.

5. 

MI325-012: Overall GPU-normalized Training Throughput (processed tokens per second) for text generation using the Llama2-7b chat model running Megatron-LM v0.12 (BF16) when using a maximum sequence length of 4096 tokens comparison based on AMD internal testing as of 10/4/2024. Batch size according to largest micro-batch that fits in GPU memory for each system. AMD Instinct batch size 8, Nvidia batch size 2.
Configurations:
AMD Development system: 1P AMD Ryzen 9 7950X (16-core), 1x AMD Instinct™ MI325X (256GB, 1000W) GPU,128 GiB memory, ROCm 6.3.0 (pre-release), Ubuntu 22.04.2 LTS with Linux kernel 5.15.0-72-generic, PyTorch 2.4.0.
Vs.
An Nvidia DGX H200 with 2x Intel Xeon Platinum 8468 Processors, 1x Nvidia H200 (141GB, 700W) GPUs, 2 TiB (32 DIMMs, 64 GiB/DIMM), CUDA 12.6.37-1, 560.35.03, Ubuntu 22.04.5, PyTorch 2.5.0a0+872d972e41.nv24.8. MI325X system median: 12509.82 tokens/second/GPU; H200 system median: 11824.09 tokens/second/GPU. Server manufacturers may vary configurations, yielding different results. Performance may vary based on use of latest drivers and optimizations. MI325-012

6.

MI325-004: Based on testing completed on 9/28/2024 by AMD performance lab measuring text generated throughput for Mixtral-8x7B model using FP16 datatype. Test was performed using input length of 128 tokens and an output length of 4096 tokens for the following configurations of AMD Instinct™ MI325X GPU accelerator and NVIDIA H200 SXM GPU accelerator. 1x MI325X at 1000W with vLLM performance: 4598 (Output tokens / sec) Vs. 1x H200 at 700W with TensorRT-LLM: 2700.7 (Output tokens / sec)

Configurations:
AMD Instinct™ MI325X reference platform:
1x AMD Ryzen™ 9 7950X CPU, 1x AMD Instinct MI325X (256GiB, 1000W) GPU, Ubuntu® 22.04, and ROCm™ 6.3 pre-release
Vs
NVIDIA H200 HGX platform: Supermicro SuperServer with 2x Intel Xeon® Platinum 8468 Processors, 8x Nvidia H200 (140GB, 700W) GPUs [only 1 GPU was used in this test], Ubuntu 22.04, CUDA® 12.6. Server manufacturers may vary configurations, yielding different results. Performance may vary based on use
of latest drivers and optimizations.

7.

PEN-012: Measurements conducted by AMD Performance Labs as of Aug 27, 2024 on the current specification for the AMD Pensando™ Salina DPU accelerator designed with AMD Pensando™ 5nm process technology, projected to result in delivering 400Gb/s line-rate estimated performance. Estimated delivered results calculated for AMD Pensando™ Elba DPU designed with AMD Pensando 7nm process technology resulted in 200Gb/s line-rate performance. Actual results based on production silicon may vary.
Salina projected performance:
Bandwidth: 400Gbps
Connections per second: 10M
Packets per Second: 100MPPS
Encryption Offloads: 400 Gbps
Storage IOPS: 4 Million
Actual results and specifications may vary based on production silicon.

8.

STXP-12: Testing as of Sept 2024 by AMD performance labs on an HP EliteBook X G1a (14in) (40W) with AMD Ryzen AI 9 HX PRO 375 processor, Radeon™ 890M graphics, 32GB of RAM, 512GB SSD, VBS=ON, Windows 11 Pro vs. a Dell Latitude 7450 with an Intel Core Ultra 7 165H processor (vPro enabled), Intel Arc Graphics, VBS=ON, 16GB RAM, 512GB NVMe SSD, Microsoft Windows 11 Pro in the application(s) (Best Performance Mode): Cinebench R24 nT. Laptop manufacturers may vary configurations yielding different results. STXP-12.

9.

STXP-32: Based on internal testing by AMD as of 9/23/24. Battery life results evaluated by operation of a nine-participant Microsoft Teams video conference on battery. Test configuration for AMD and Intel systems run from power level 90% > 45% @150nits brightness and power mode set to “best power efficiency.” System config: HP EliteBook X G1a (14in) with an AMD Ryzen AI 9 HX PRO 375 processor (40W), Radeon™ 890M graphics, 32GB RAM, 512GB SSD, VBS=ON, Windows 11 Pro. System config: Apple MacBook Pro 14 with M3 Pro 12-core processor, Apple integrated graphics, 36GB RAM, 1TB SSD, MacOS 15.0. System Config: Dell Latitude 7450 with an Intel Core Ultra 7 165H processor (28W) (vPro enabled), Intel Arc Graphics, VBS=ON, 16GB RAM, 512GB NVMe SSD, Windows 11 Pro. Manufacturers may vary configurations yielding different results. Performance may also vary based on use of latest drivers. STXP-32.

10.

9xx5TCO-001C: This scenario contains many assumptions and estimates and, while based on AMD internal research and best approximations, should be considered an example for information purposes only, and not used as a basis for decision making over actual testing. The AMD Server & Greenhouse Gas Emissions TCO (total cost of ownership) Estimator Tool – version 1.12, compares the selected AMD EPYC™ and Intel® Xeon® CPU based server solutions required to deliver a TOTAL_PERFORMANCE of 39100 units of SPECrate2017_int_base performance as of October 10, 2024. This scenario compares a legacy 2P Intel Xeon 28 core Platinum_8280 based server with a score of 391 versus 2P EPYC 9965 (192C) powered server with a score of 3000 (https://www.spec.org/cpu2017/results/res2024q4/cpu2017-20240923-44837.pdf) along with a comparison upgrade to a 2P Intel Xeon Platinum 8592+ (64C) based server with a score of 1130 (https://spec.org/cpu2017/results/res2024q3/cpu2017-20240701-43948.pdf). Actual SPECrate®2017_int_base score for 2P EPYC 9965 will vary based on OEM publications. Environmental impact estimates made leveraging this data, using the Country / Region specific electricity factors from the 2024 International Country Specific Electricity Factors 10 – July 2024, and the United States Environmental Protection Agency ‘Greenhouse Gas Equivalencies Calculator’. For additional details, see https://www.amd.com/en/legal/claims/epyc.html#q=epyc5#9xx5TCO-001B
