Apple Silicon: Running Massive AI Models

### Why DeepSeek Is a Game Changer for Apple

*Credit to alexocheema on Twitter.*

Apple might not be the first name you think of when it comes to AI hardware, but their silicon is surprisingly well-suited for running massive models like DeepSeek V3 and R1—at a fraction of the cost of traditional GPUs.

#### **How Apple’s M2 Ultra Stacks Up Against the Competition**
Here’s a look at some of the top AI chips on the market and how they compare in memory capacity, bandwidth, and price per GB:

- **NVIDIA H100**: 80GB @ 3TB/s → **$25,000** ($312.50 per GB)
- **AMD MI300X**: 192GB @ 5.3TB/s → **$20,000** ($104.17 per GB)
- **Apple M2 Ultra**: 192GB @ 800GB/s → **$5,000** ($26.04 per GB!)

Apple’s M2 Ultra, launched in mid-2023, is *4x cheaper* per GB than AMD's MI300X and a *staggering 12x cheaper* than NVIDIA's H100. That’s a huge advantage, especially for models that require large amounts of memory.
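
The price-per-GB figures above are simple division, but they're worth sanity-checking (a short Python check using the list prices quoted above; real street prices vary):

```python
# Sanity check of the $/GB figures above, using the quoted prices.
chips = {
    "NVIDIA H100":    (80,  25_000),   # (memory in GB, approx. price in USD)
    "AMD MI300X":     (192, 20_000),
    "Apple M2 Ultra": (192, 5_000),
}

for name, (gb, usd) in chips.items():
    print(f"{name}: ${usd / gb:,.2f} per GB")  # 312.50, 104.17, 26.04
```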

#### **Why Does This Matter for DeepSeek?**
DeepSeek V3 and R1 are *Mixture of Experts (MoE)* models with a total of **671 billion parameters**—but here’s the trick: only **37 billion** of those parameters are active at any given time when generating a token. The challenge? We don’t know which 37B will be needed in advance, so they must all be ready in **high-speed GPU memory**.
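
To see why memory capacity dominates here, consider a rough weight-only footprint calculation (it ignores the KV cache and activations, which add more on top):

```python
# Weight-only memory footprint of DeepSeek V3/R1 at common quantizations.
TOTAL_PARAMS = 671e9   # all experts must be resident
ACTIVE_PARAMS = 37e9   # experts actually used per token

for bits in (16, 8, 4):
    total_gb = TOTAL_PARAMS * bits / 8 / 1e9
    active_gb = ACTIVE_PARAMS * bits / 8 / 1e9
    print(f"{bits}-bit: {total_gb:,.0f} GB of weights, "
          f"~{active_gb:.1f} GB streamed per token")
```

Even at 4-bit, the full model needs roughly 335 GB of weights: more than any single GPU or M2 Ultra holds, which is why the setups discussed below cluster multiple machines.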

In practice, system RAM is too slow (under 1 token per second), while enough GPU memory is prohibitively expensive. Apple Silicon, however, plays a different game. Instead of going all-in on raw speed, it offers a sweet spot: a large pool of medium-fast unified memory at a much lower cost.
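
This intuition can be made concrete: at batch size 1, decoding is roughly memory-bandwidth-bound, because every token requires streaming the active 4-bit weights once. A back-of-envelope sketch follows (the DDR5 figure is a typical desktop assumption, and these are theoretical ceilings; real throughput, like the sub-1-token/s system-RAM case above, lands well below them):

```python
# Theoretical decode ceiling: bandwidth / bytes of active weights per token.
ACTIVE_PARAMS = 37e9
BITS = 4
bytes_per_token = ACTIVE_PARAMS * BITS / 8   # ~18.5 GB streamed per token

memories = [
    ("Dual-channel DDR5 system RAM", 90),    # GB/s, typical desktop figure
    ("Apple M2 Ultra unified memory", 800),
    ("NVIDIA H100 HBM3", 3000),
]
for name, bw in memories:
    print(f"{name}: ~{bw * 1e9 / bytes_per_token:.0f} tokens/s ceiling")
```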

#### **What Makes Apple Silicon Special?**
Apple has two key technologies that make this possible:

- **Unified Memory** – Apple Silicon uses a single shared memory pool instead of keeping separate memory pools for the CPU and GPU (as NVIDIA and AMD do). This eliminates slow, costly data transfers between CPU and GPU (see the sketch after this list).
- **UltraFusion** – Apple’s proprietary interconnect technology that links two chip dies together at **2.5TB/s** of bandwidth. The M2 Ultra is literally two M2 Max chips fused into one, giving it **192GB** of unified memory with **800GB/s** bandwidth.
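
Here's a minimal MLX sketch of what unified memory buys you: the same array is visible to both CPU and GPU, so each op can run on either device with no explicit copies (`mx.gpu` and `mx.cpu` are MLX's device streams; the sizes here are arbitrary):

```python
import mlx.core as mx

# Arrays live in unified memory: no .to(device) or cudaMemcpy equivalents.
a = mx.random.normal((4096, 4096))
b = mx.random.normal((4096, 4096))

c = mx.matmul(a, b, stream=mx.gpu)  # heavy matmul on the GPU
d = mx.sum(c, stream=mx.cpu)        # reduction on the CPU, same buffer
mx.eval(d)                          # MLX is lazy; force evaluation
print(d)
```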

Now, if the rumors about the **M4 Ultra** are true, Apple will take this even further by fusing two **M4 Max** chips, resulting in:
✅ **256GB of unified memory**
✅ **1146GB/s bandwidth**
✅ **57 tokens per second** for DeepSeek V3/R1 (4-bit) with **two M4 Ultras** (sanity-checked below)
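
That throughput claim is consistent with the bandwidth ceiling from earlier (the 1146GB/s figure is the rumored spec, and real-world numbers would sit below the ceiling due to compute and interconnect overhead):

```python
# Bandwidth ceiling per M4 Ultra for DeepSeek's 37B active params at 4-bit.
bytes_per_token = 37e9 * 4 / 8       # ~18.5 GB of active weights per token
print(f"~{1146e9 / bytes_per_token:.0f} tokens/s ceiling")  # ~62 tokens/s
```

So the quoted 57 tokens/s sits plausibly just under one device's theoretical ceiling.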

This means Apple users could have **serious AI compute power** at a fraction of the cost of traditional GPU setups.

#### **More Than Just Hardware**
Apple is not just delivering great hardware—it’s also making **big strides in AI software** (which is rare for them).
- **MLX** has made it easy to run machine learning workloads on Apple Silicon (a short usage sketch follows this list).
- **ExoLabs** has successfully **clustered multiple Apple devices** to run large AI models, including DeepSeek R1 (671B) on **7 Mac Minis**.
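
As a taste of how low the software barrier has become, here is a hedged sketch using the `mlx-lm` package (`pip install mlx-lm`). The model name below is purely illustrative: the full DeepSeek R1 won't fit on one machine, so swap in any 4-bit MLX conversion that fits your unified memory:

```python
from mlx_lm import load, generate

# Illustrative 4-bit community conversion; pick any model that fits
# your machine's unified memory (full DeepSeek R1 needs a cluster).
model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.3-4bit")

text = generate(
    model, tokenizer,
    prompt="Explain unified memory in one sentence.",
    max_tokens=128,
)
print(text)
```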

While it's still unclear which company will dominate AI model development, **one thing seems certain: AI will likely run on American hardware and Apple Silicon.**
