The recent RAM price hikes have pushed GPU prices up as well. The only systems that have not yet been affected to the same extent are Macs and high-end GPUs (RTX 6000 Pro and above). However, I would already classify GPUs as out of reach: running coding models with sufficiently large context windows would require one or two RTX 6000 Pro cards, or three to six RTX 5090s.
Let’s take a look at Macs instead.
| Price | Mem Bandwidth | CPU | RAM | NVMe | CPU | GPU | Model | Price/Bandwidth | Price/Memory |
|---|---|---|---|---|---|---|---|---|---|
| 2.500 EUR | 273 GB/s | M4 Pro | 64 GB | 1 TB | 12C | 16C | Mac Mini | 9,15 EUR/GB/s | 39,06 EUR/GB |
| 4.027 EUR | 546 GB/s | M4 Max | 128 GB | 1 TB | 16C | 40C | Mac Studio | 7,14 EUR/GB/s | 31,43 EUR/GB |
| 4.200 EUR | 800 GB/s | M3 Ultra | 96 GB | 1 TB | 28C | 60C | Mac Studio | 5,52 EUR/GB/s | 43,75 EUR/GB |
| 6.720 EUR | 800 GB/s | M3 Ultra | 256 GB | 2 TB | 28C | 60C | Mac Studio | 8,40 EUR/GB/s | 26,25 EUR/GB |
| 11.900 EUR | 800 GB/s | M3 Ultra | 512 GB | 4 TB | 32C | 80C | Mac Studio | 14,87 EUR/GB/s | 23,24 EUR/GB |
Mac Studios with the M3 Ultra currently offer the best overall value. The 96 GB version is fast, and its RAM capacity is sufficient to run many general-purpose models using quantization. The 256 GB version can handle coding models at reasonable quantization levels with 64k context, which is just about sufficient for running coding agents 1. The 512 GB model, however, exceeds the €10k, which I consider a hard upper limit for what can reasonably be called an “affordable” system.
The price difference between the 96 GB and 256 GB Mac Studio models is surprisingly close to current RAM market prices. For comparison, 256 GB of DDR5-6000 RAM currently costs around €3,430, or €13.39 per GB. The price delta between the 96 GB and 256 GB Mac Studio, and between the 256 GB and 512 GB versions, is approximately €12.88 per GB and €18.50 per GB, respectively. If Apple were to return to its usual memory pricing premiums, the 256 GB configuration would likely end up much closer to €10,000.
Hosted inference is both cheaper and more capable, but it cannot be used in all scenarios. In particular, freelancers and small companies are often required to rely on local systems when working for larger clients. The M3 Ultra, launched in March 2025, together with improved models, quantization techniques, and REAP, has effectively kicked off an era of affordable local coding models. That era may now enter a pause: RAM prices are expected to rise through 2026 and are likely to remain elevated until 2027 or 2028 2.
Based on benchmark results shared on r/localllama ↩︎