Hardware for local coding models is still affordable. For how long?

The recent RAM price hikes have pushed GPU prices up as well. The only systems that have not yet been affected to the same extent are Macs and high-end GPUs (RTX 6000 Pro and above). However, I would already classify GPUs as out of reach: running coding models with sufficiently large context windows would require one or two RTX 6000 Pro cards, or three to six RTX 5090s.

Let’s take a look at Macs instead.

Price	Mem Bandwidth	CPU	RAM	NVMe	CPU	GPU	Model	Price/Bandwidth	Price/Memory
2.500 EUR	273 GB/s	M4 Pro	64 GB	1 TB	12C	16C	Mac Mini	9,15 EUR/GB/s	39,06 EUR/GB
4.027 EUR	546 GB/s	M4 Max	128 GB	1 TB	16C	40C	Mac Studio	7,14 EUR/GB/s	31,43 EUR/GB
4.200 EUR	800 GB/s	M3 Ultra	96 GB	1 TB	28C	60C	Mac Studio	5,52 EUR/GB/s	43,75 EUR/GB
6.720 EUR	800 GB/s	M3 Ultra	256 GB	2 TB	28C	60C	Mac Studio	8,40 EUR/GB/s	26,25 EUR/GB
11.900 EUR	800 GB/s	M3 Ultra	512 GB	4 TB	32C	80C	Mac Studio	14,87 EUR/GB/s	23,24 EUR/GB

Mac Studios with the M3 Ultra currently offer the best overall value. The 96 GB version is fast, and its RAM capacity is sufficient to run many general-purpose models using quantization. The 256 GB version can handle coding models at reasonable quantization levels with 64k context, which is just about sufficient for running coding agents ¹. The 512 GB model, however, exceeds the €10k, which I consider a hard upper limit for what can reasonably be called an “affordable” system.

The price difference between the 96 GB and 256 GB Mac Studio models is surprisingly close to current RAM market prices. For comparison, 256 GB of DDR5-6000 RAM currently costs around €3,430, or €13.39 per GB. The price delta between the 96 GB and 256 GB Mac Studio, and between the 256 GB and 512 GB versions, is approximately €12.88 per GB and €18.50 per GB, respectively. If Apple were to return to its usual memory pricing premiums, the 256 GB configuration would likely end up much closer to €10,000.

Hosted inference is both cheaper and more capable, but it cannot be used in all scenarios. In particular, freelancers and small companies are often required to rely on local systems when working for larger clients. The M3 Ultra, launched in March 2025, together with improved models, quantization techniques, and REAP, has effectively kicked off an era of affordable local coding models. That era may now enter a pause: RAM prices are expected to rise through 2026 and are likely to remain elevated until 2027 or 2028 ².

Based on benchmark results shared on r/localllama ↩︎
Based on an analysis from December 2025 on wccftech ↩︎