Apple Fusion Architecture: The Structural Evolution of Apple Silicon 

Apple’s M‑series processors have historically followed a monolithic design philosophy. Each generation from M1 through M4 relied on a single die that integrated CPU cores, GPU cores, memory controllers, and specialized accelerators into a unified system on chip. With the introduction of the M5 Pro and M5 Max processors, Apple has fundamentally altered this approach.

In 2026, the company introduced a modular design methodology known as Fusion Architecture.

Fusion Architecture represents the first structural redesign of Apple Silicon since the debut of the M1 in 2020. Instead of manufacturing one large die, Apple now constructs high‑end processors from multiple silicon dies bonded together into a single logical system. This shift reflects broader trends in semiconductor engineering driven by manufacturing limits, cost constraints, and the rapidly growing computational requirements of artificial intelligence workloads.

Historical Context: The Monolithic Apple Silicon Strategy

When Apple introduced the M1 processor in 2020, the company redefined personal computing processor architecture. The M1 integrated CPU, GPU, Neural Engine, and memory controllers onto a single die while introducing the unified memory architecture. This architecture allowed all compute components to access a shared memory pool without copying data between discrete subsystems.

This design delivered several advantages including reduced memory latency, improved energy efficiency, higher effective bandwidth between compute units, and simplified software optimization. The M1 architecture quickly proved successful and subsequent generations including M2, M3, and M4 followed the same structural model while incrementally improving process nodes, core counts, and bandwidth.

However, this design philosophy carried a significant limitation. As workloads increased, particularly those related to large language models and machine learning inference, chip complexity and die size began to scale rapidly. Larger dies are significantly harder to manufacture reliably because even a single defect renders the entire chip unusable.

The Semiconductor Industry Shift Toward Chiplets

Apple is not alone in confronting the physical limits of monolithic chips. The semiconductor industry has broadly transitioned toward chiplet architectures where processors are composed of several smaller dies interconnected within a single package.

Major vendors have already adopted this strategy. AMD employs chiplet designs in Ryzen and EPYC processors. Intel uses advanced packaging techniques such as embedded multi‑die interconnect bridge and Foveros stacking. NVIDIA constructs its largest AI accelerators using multi‑die packaging.

The economic rationale behind chiplets is straightforward. Manufacturing several smaller dies is more cost efficient than producing one extremely large die because yield rates improve significantly. Industry analyses indicate that modular chiplet designs can deliver comparable computational capability at dramatically lower manufacturing cost.
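The yield argument can be made concrete with a simple defect-density model. The sketch below assumes a Poisson yield model and purely illustrative numbers (the defect density and die areas are not real process figures); the point is that bad chiplets can be discarded individually, so less silicon is wasted per good product:

```python
import math

def poisson_yield(area_cm2, defect_density):
    """Fraction of good dies under a simple Poisson defect model."""
    return math.exp(-area_cm2 * defect_density)

D0 = 0.1  # defects per cm^2 (illustrative, not a real process figure)

# One 8 cm^2 monolithic die vs. two known-good 4 cm^2 chiplets.
mono_cost    = 8.0 / poisson_yield(8.0, D0)       # silicon spent per good chip
chiplet_cost = 2 * (4.0 / poisson_yield(4.0, D0)) # bad chiplets discarded one by one

print(f"silicon per good monolithic chip: {mono_cost:.1f} cm^2")
print(f"silicon per good chiplet pair:    {chiplet_cost:.1f} cm^2")
```

With these numbers the monolithic design burns roughly 50% more wafer area per good chip, and the gap widens as dies grow or defect density rises.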

This transition marks the gradual decline of the traditional monolithic processor model.

Apple’s Approach: Fusion Architecture

Apple’s response to these constraints is Fusion Architecture.

Rather than simply replicating existing dies and connecting them together, Apple has designed a modular structure where individual dies perform distinct functional roles. These dies are physically bonded using high bandwidth interconnect technology and presented to the operating system as a single logical processor.

The critical design requirement Apple preserved is unified memory. Even though the processor now spans multiple dies, Apple maintains a shared memory architecture that allows all compute units to operate on the same dataset without explicit data transfers.

While Apple has not publicly disclosed the full technical implementation of cross-die memory coherence, the company claims the architecture preserves the same software model as earlier M series chips. From the perspective of applications and operating systems, the processor behaves as a single unified system.

Structural Design of the M5 Pro and M5 Max

The first processors implementing Fusion Architecture are the M5 Pro and M5 Max. Both chips consist of two separate dies connected through high speed packaging technology. The first die is identical in both processors and contains the majority of the system control components.

Primary Die

The first die includes an 18 core CPU cluster, a 16 core Neural Engine, the SSD controller, and Thunderbolt I/O controllers. This die effectively functions as the computational and system management foundation of the processor.

Secondary Die

The second die differentiates the two processors.

The M5 Pro configuration includes up to 20 GPU cores, a single media engine, and a memory controller delivering up to 307 GB per second bandwidth. The M5 Max configuration includes up to 40 GPU cores, dual media engines, and a memory controller delivering up to 614 GB per second bandwidth.

This design enables Apple to scale GPU and media performance independently from the CPU subsystem. In principle, additional GPU focused dies could be added in future designs to extend compute capacity without redesigning the entire processor.
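The two-die split can be pictured as one shared primary die paired with a SKU-specific secondary die. A hypothetical sketch of that composition, using only the figures quoted in this article (the die objects and the summing rule are my illustration, not Apple documentation):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Die:
    name: str
    cpu_cores: int = 0
    gpu_cores: int = 0
    media_engines: int = 0
    bandwidth_gbps: float = 0.0  # memory controller bandwidth, GB/s

# Shared primary die: CPU cluster, Neural Engine, SSD and Thunderbolt I/O.
primary = Die("primary", cpu_cores=18)

# Secondary dies differentiate the SKUs (figures as quoted in the article).
pro_secondary = Die("pro-secondary", gpu_cores=20, media_engines=1, bandwidth_gbps=307)
max_secondary = Die("max-secondary", gpu_cores=40, media_engines=2, bandwidth_gbps=614)

def package(*dies: Die) -> dict:
    """Present bonded dies as one logical processor by summing resources."""
    return {
        "cpu_cores": sum(d.cpu_cores for d in dies),
        "gpu_cores": sum(d.gpu_cores for d in dies),
        "media_engines": sum(d.media_engines for d in dies),
        "bandwidth_gbps": sum(d.bandwidth_gbps for d in dies),
    }

m5_pro = package(primary, pro_secondary)
m5_max = package(primary, max_secondary)
print(m5_pro)
print(m5_max)
```

Scaling in a future design would then mean adding another secondary die to the `package` call rather than redesigning either existing die.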

Architectural Changes in CPU Design

The CPU configuration of the M5 generation introduces another major structural change. Earlier M series chips relied on a hybrid architecture combining performance cores with efficiency cores.

The M5 Pro and M5 Max abandon efficiency cores entirely and instead implement a two tier high performance structure.

The CPU cluster consists of six super cores optimized for peak single thread performance and twelve performance cores optimized for high multithread throughput. This creates an all performance architecture designed for sustained computational workloads rather than energy optimized background processing.

The naming scheme has also evolved. What were previously called performance cores in earlier M series chips are now referred to as super cores. The new performance cores represent an intermediate tier that prioritizes throughput while maintaining strong efficiency characteristics.

This structure closely resembles the strategy AMD employs with its Zen 5 and Zen 5c core architectures.
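One way to picture the two-tier cluster is a scheduler that places latency-sensitive threads on super cores first and spills throughput work onto performance cores. The toy placement policy below is my illustration of the idea, not Apple's actual scheduler:

```python
SUPER_CORES, PERF_CORES = 6, 12  # M5 Pro / M5 Max CPU cluster per this article

def place(threads):
    """Toy placement: latency-critical threads take super cores first,
    everything else fills the performance-core tier."""
    supers, perfs = [], []
    for name, latency_critical in threads:
        if latency_critical and len(supers) < SUPER_CORES:
            supers.append(name)
        elif len(perfs) < PERF_CORES:
            perfs.append(name)
    return supers, perfs

threads = [("ui", True), ("compile-1", False), ("compile-2", False), ("audio", True)]
supers, perfs = place(threads)
print("super cores:", supers)
print("performance cores:", perfs)
```

Note what is missing compared to earlier M series chips: there is no efficiency tier to park background work on, so everything lands on a performance-class core.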

GPU Evolution and AI Acceleration

Another significant development is the integration of neural accelerators within each GPU core.

Although the GPU core counts remain unchanged from the previous generation, each core now includes dedicated hardware for machine learning computation. This allows the GPU to perform both graphics processing and AI inference tasks.

Apple claims this architecture enables up to four times the AI compute capability without increasing the overall GPU core count.

This reflects a broader shift in processor design. GPUs are evolving into general purpose parallel compute engines where graphics workloads represent only one category of computation.
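The "four times" claim is per-core rather than per-chip scaling: total AI throughput is cores times per-core rate, so quadrupling the per-core rate quadruples the total at a fixed core count. A back-of-envelope sketch, where the per-core throughput numbers are arbitrary placeholders rather than measured figures:

```python
gpu_cores = 40  # M5 Max GPU core count quoted in this article

# Illustrative per-core matmul throughput, in arbitrary units.
per_core_prev = 1.0                  # previous generation, shader ALUs only
per_core_m5   = 4.0 * per_core_prev  # with an in-core neural accelerator

prev_total = gpu_cores * per_core_prev
m5_total   = gpu_cores * per_core_m5

print(f"speedup at equal core count: {m5_total / prev_total:.0f}x")
```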

Memory Bandwidth Scaling

Large AI models require extremely high memory bandwidth to deliver acceptable inference performance. Apple has continued to increase bandwidth across successive M series generations. The M5 generation extends this trend. The M5 Pro reaches 307 GB per second memory bandwidth while the M5 Max reaches 614 GB per second. Both figures represent improvements over the M4 generation.

Bandwidth scaling is particularly important for local inference of large language models. High bandwidth allows large parameter sets to be accessed efficiently by GPU and neural compute units. This suggests Apple is designing these processors with the expectation that laptops will increasingly run advanced AI models locally rather than relying solely on cloud infrastructure.
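Why bandwidth dominates local inference can be shown with a standard back-of-envelope calculation: in decoder-style LLM generation, every token must stream essentially all model weights through the memory system, so bandwidth sets a hard ceiling on tokens per second. The model size and quantization below are illustrative assumptions, not a benchmark:

```python
def max_tokens_per_second(bandwidth_gb_s, params_billion, bytes_per_param):
    """Bandwidth-bound ceiling: every generated token reads all weights once."""
    model_gb = params_billion * bytes_per_param  # weight footprint in GB
    return bandwidth_gb_s / model_gb

# An 8B-parameter model, 4-bit quantized (0.5 bytes/param) ~= 4 GB of weights.
for name, bw in [("M5 Pro", 307), ("M5 Max", 614)]:
    ceiling = max_tokens_per_second(bw, 8, 0.5)
    print(f"{name}: at most ~{ceiling:.1f} tokens/s")
```

Real throughput lands below this ceiling (compute, KV cache, and activation traffic all cost something), but the linear relationship explains why doubling bandwidth from the Pro to the Max roughly doubles the achievable generation rate.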

Strategic Implications of Fusion Architecture

Fusion Architecture is not revolutionary in the sense that multi die packaging already exists across the semiconductor industry. However, it represents a critical strategic transition for Apple Silicon. The key significance lies in scalability. By demonstrating that unified memory and high performance interconnects can function across multiple dies, Apple removes the traditional constraint of die size. Future processors can scale horizontally by combining additional specialized dies rather than enlarging a single monolithic chip.

This opens several potential directions for future development including additional GPU dies for AI acceleration, specialized machine learning dies, and advanced multi package configurations for workstation and server workloads. The packaging technology used to bond these dies is similar to the interconnect technologies used in modern AI servers. Apple has effectively brought data center class packaging techniques into consumer laptop processors.

So..

The M1 generation introduced a radical rethinking of personal computing processors through unified memory and system level integration. Subsequent generations refined that architecture while maintaining the monolithic design.

Fusion Architecture represents the next phase of the Apple Silicon (r)evolution. The reader should remember 2020 and how revolutionary the original M1 Apple Silicon architecture actually was.

Instead of competing with the physical limits of monolithic chips, Apple is adopting a modular strategy that preserves its core architectural principles while enabling future scalability. Multi die packaging allows the company to expand computational capability without incurring the manufacturing penalties associated with extremely large silicon dies.

The immediate performance gains of the M5 generation are important but the more significant development is architectural. Fusion Architecture establishes the structural foundation upon which future Apple processors will be built.

In practical terms, the question is no longer how large a single Apple Silicon chip can become. The real question is how many modular components Apple can connect together while maintaining the unified architecture that has defined the platform since the M1.

Categories:  AI | Apple | Articles
Thursday, 05 March 2026 08:02:09 (Central Europe Standard Time, UTC+01:00)  #    Comments

 

 On Bosons and Morons 

It finally happened. Pfeew!

Article link: On Bosons and Morons

Categories:  AI | Articles | Parody
Saturday, 10 February 2024 12:10:30 (Central Europe Standard Time, UTC+01:00)  #    Comments

 

 Paradigm Shift is Happening in AI Space and Rarely One Notices 

Microsoft Copilot (AI Based, going to all Office, Bing and communication products)

Currently named Copilot for Work..

Categories:  AI
Sunday, 19 March 2023 15:28:44 (Central Europe Standard Time, UTC+01:00)  #    Comments

 

 M3GAN 

M3GAN is nice and cute.

IMDB link.

Categories:  AI
Friday, 27 January 2023 17:51:52 (Central Europe Standard Time, UTC+01:00)  #    Comments

 

 Go on, then.. #AI 

Source link: medRxiv

Quite scary.. think.

Categories:  AI
Tuesday, 17 January 2023 08:04:12 (Central Europe Standard Time, UTC+01:00)  #    Comments

 

 A Little Late Night Conversation with @OpenAI 

Copy and paste from my conversation with https://chat.openai.com.

Response time was under 20 milliseconds. Handling context with no issues. Fascinating.


> So, how are you OpenAI?
I'm doing great! OpenAI is a research laboratory dedicated to finding ways to use artificial intelligence to benefit humanity. We're making advances in machine learning, robotics, and other AI technologies, and we're always looking for new opportunities to make the world a better place.

> What's a garbage collector?
A garbage collector is a program in computer science that is responsible for managing the memory allocation of a computer system in order to ensure that it runs efficiently. It frees up memory that is no longer in use and reclaims memory that is being used inefficiently.

> Should we use it?
Yes, garbage collection is important for the efficient use of memory and can help improve the performance of a computer system. If a computer system is not using garbage collection, it’s likely that it will suffer from memory leaks, which can lead to reduced performance and system instability.

> Hmm, I should have asked differently...
No worries, we're here to help. What other questions do you have about garbage collection?

> What I meant to ask was, should we use it manually or should we leave garbage collection to platforms we use?
It depends on the platform and the application you are developing. If the platform has a built-in garbage collection system, you can usually leave it to the platform to manage memory allocation and reclaim memory. However, in some cases, it might be necessary to manually implement garbage collection in order to optimize the performance of the system.

> Could you give me an example of that?
Sure. For example, if you are developing a game, it might be beneficial to manually implement garbage collection in order to free up memory that is no longer in use and reclaim memory that is being used inefficiently. This can help keep the game running smoothly and improve the overall user experience.

> Any other examples?
Yes, garbage collection can also be used in other areas such as web development and database management. Garbage collection can help reduce the amount of memory that is used by a web application and can help ensure that a database is running efficiently.

> Any other examples when we should trust the platforms for garbage collection?
In many cases, it is best to trust the platforms for garbage collection. For example, if you are developing an application that runs on a popular platform such as Android or iOS, you can usually trust the garbage collection system that is built into the platform. This will ensure that memory is being managed efficiently and that the application is running smoothly.

> What do you mean by 'managed efficiently'?
When garbage collection is managed efficiently, it means that memory is being allocated and reclaimed in a way that maximizes efficiency and minimizes memory leaks. This helps to ensure that the application runs smoothly and does not suffer from performance issues due to memory leaks.

> You are quite knowledgeable about the subject.
Thank you for your kind words!

> What happens if memory is not reclaimed?
If memory is not reclaimed, it can lead to memory leaks and decrease the performance of the application. Memory leaks can also cause the application to crash or become unstable. To prevent these issues, it is important to ensure that memory is being reclaimed efficiently with the help of garbage collection.

> So, what did programmers do when there was no automatic garbage collection?
Before the advent of automatic garbage collection, programmers had to manually manage memory allocation and reclaim memory when necessary. This could be a tedious and time consuming process, and errors in memory management could lead to memory leaks and other performance issues.

> What do they do now in C or Lisp or Fortran?
Nowadays, many programming languages such as C, Lisp, and Fortran have built-in garbage collection systems. This allows programmers to focus more on the application logic, rather than having to manually manage memory allocation and reclaim memory.

> Which platforms or languages do have automatic garbage collection?
Many popular platforms and languages such as Java, JavaScript, Python, and more have built-in garbage collection systems. Additionally, many modern operating systems such as Windows and macOS also have garbage collection systems.

> Does .NET have it?
Yes, .NET has a built-in garbage collection system. The .NET garbage collection system is responsible for managing memory allocation, reclaiming memory when appropriate, and ensuring that the application runs efficiently.

> So.. What are generations in terms of .NET garbage collection?
Generations are a concept used in .NET garbage collection to help improve the performance of the garbage collector. The .NET garbage collector divides objects into three generations based on their lifetime, and objects that have been alive for longer are placed into higher generations. This helps the garbage collector to identify which objects are the most likely to be reclaimed, allowing it to reclaim memory more efficiently.

> Tell me about latency models in this case.
Latency models are used in .NET garbage collection to help reduce the impact that garbage collection has on the performance of the application. The latency model estimates the time it will take for the garbage collector to reclaim memory, and then it schedules garbage collection at times that will have the least impact on the application. This helps to ensure that the application runs smoothly, even when garbage collection is running.

> Any thoughts on weak references for GC?
Weak references are a type of reference that can help the garbage collector reclaim memory more efficiently. Weak references are references to objects that are no longer strongly referenced, and they allow the garbage collector to reclaim the objects that they refer to even if they are not strongly referenced. This can help reduce memory leaks and improve the performance of the application.

> You might just pass a Turing test. What is it?
The Turing test is a test of a machine’s ability to exhibit intelligent behavior that is indistinguishable from that of a human. It was proposed by Alan Turing in 1950, and it is still used today to evaluate the performance of AI systems.

> Could you pass it?
My understanding is that I am not the one who is being tested in the Turing test - it is the machine that is being tested. However, it is possible that I could pass the Turing test if I was able to exhibit intelligent behavior that is indistinguishable from that of a human.

> So coming back to GC.. Could you explain the concept of compaction?
Compaction is a process used by the garbage collector to reclaim memory more efficiently. The garbage collector moves objects around in memory to free up memory that is no longer in use and to reduce the amount of memory fragmentation. This helps to reduce memory leaks and improve the performance of the application.

> No, GC as in garbage collection...
Ah, of course! Garbage collection is a process used in computer science to manage the memory allocation of a computer system in order to ensure that it runs efficiently. It frees up memory that is no longer in use and reclaims memory that is being used inefficiently.

> Giving you an option to select between a forced and optimized garbage collection. When would you choose one or the other?
The choice between forced and optimized garbage collection depends on the application and the expected workload of the system. Forced garbage collection is used when the application needs to reclaim memory immediately and the system can tolerate a slight decrease in performance. Optimized garbage collection is used when the application can wait for the garbage collector to reclaim memory and the system's performance needs to be maximized.

> Quite good.
Thank you!

> Could we change topics?
Of course! What would you like to talk about?

> How much power does a 12kW power outlet generate in 12 hours?
A 12kW power outlet will generate 144 kWh (12 kW x 12 hours) of electricity in 12 hours.

> What about 120 hours?
For 120 hours, the 12 kW power outlet will generate 1,440 kWh (12 kW x 120 hours) of electricity.

> So, a 700MW power plant generates how many times more power that this power outlet?
A 700 MW power plant will generate 4,800,000 kWh of electricity in 12 hours (700 MW x 12 hours), which is 33,333 times more power than the 12 kW power outlet.

> What are typical examples of such a powerful power plants? Do they run on coal?
Yes, many large power plants are powered by coal. These power plants are typically used to generate electricity for large cities and regions. Examples of such power plants include the Taichung Power Plant in Taiwan, which has a capacity of 5,500 MW, and the Four Corners Power Plant in the US, which has a capacity of 3,520 MW.

> What about nuclear, which one generates the most power?
The Kashiwazaki-Kariwa Nuclear Power Plant in Japan is the largest nuclear power plant in the world, with a capacity of 7,965 MW. It is followed by the Fukushima Daiichi Nuclear Power Plant in Japan (4,711 MW) and the Bruce Nuclear Generating Station in Canada (4,400 MW).

> And in Europe?
In Europe, the largest nuclear power plant is the Kola Nuclear Power Plant in Russia, with a capacity of 4,800 MW. It is followed by the Cattenom Nuclear Power Plant in France (3,960 MW) and the Olkiluoto Nuclear Power Plant in Finland (2,800 MW).

> And in Slovenia?
In Slovenia, the only nuclear power plant is the Krško Nuclear Power Plant, with a capacity of 688 MW.

> Austria?
In Austria, there are no nuclear power plants.

> Ukraine?
In Ukraine, the largest nuclear power plant is the Zaporizhia Nuclear Power Plant, with a capacity of 6,000 MW.

> You just said Kola nuclear power plant is the largest in Europe. What gives?
The Kola Nuclear Power Plant in Russia is the largest nuclear power plant in Europe in terms of capacity, but it is not located in Europe itself. It is located in the Murmansk region of Russia, which is east of Finland and Norway.

> So, how many reactors does Zaporizhia consist of?
The Zaporizhia Nuclear Power Plant consists of six nuclear reactors.

> Each generating how many megawatts?
Each of the six reactors at the Zaporizhia Nuclear Power Plant generates 1,000 MW of electricity.

> Thank you. You've been splendid.
You're welcome! It was my pleasure to help.

Categories:  AI
Wednesday, 07 December 2022 01:52:40 (Central Europe Standard Time, UTC+01:00)  #    Comments

 

Copyright © 2003-2026, Matevž Gačnik
The opinions expressed herein are my own personal opinions and do not represent my company's view in any way.

My views often change.

This blog is just a collection of bytes.
