Apple's ML Compute Framework Accelerates TensorFlow Training


As part of the recent macOS Big Sur release, Apple has included the ML Compute framework. ML Compute provides optimized mathematical libraries to improve training on CPU and GPU on both Intel and M1-based Macs, with up to a 7x improvement in training times using the TensorFlow deep-learning library.

Apple's Machine Learning blog gave a high-level overview of the ML Compute framework. ML Compute improves the performance of compute-graph-based deep-learning libraries such as TensorFlow by optimizing the graph itself and executing its primitives via accelerated libraries: BNNS for CPU training and Metal Performance Shaders for the GPU. To take full advantage of ML Compute, Apple has provided a version of the TensorFlow binaries targeted for the platform. Tests of the optimized TensorFlow library on several popular neural-network benchmarks show "dramatically faster" training times compared to the standard code, with up to a 7x improvement for the optimized library on Apple's new M1 hardware.

Most deep-learning systems use a compute-graph-based framework such as TensorFlow or PyTorch. These systems describe a neural network as a series of linear algebra operations on multidimensional numeric arrays, or tensors. These operations can often be sped up markedly by the use of low-level, hardware-specific implementations. Even more performance gains are available by using GPU hardware to perform the operations, as GPUs are designed specifically for large-scale linear algebra.
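As an illustration (using plain NumPy rather than any accelerated backend), a single dense layer reduces to a matrix multiply, a bias add, and an elementwise activation. These are exactly the kinds of graph primitives that a framework hands off to hardware-specific libraries; the shapes below are arbitrary:

```python
import numpy as np

# Illustrative shapes; any values would do.
batch, in_features, out_features = 32, 128, 64

x = np.random.rand(batch, in_features).astype(np.float32)         # input tensor
W = np.random.rand(in_features, out_features).astype(np.float32)  # weight tensor
b = np.zeros(out_features, dtype=np.float32)                      # bias tensor

# One dense layer: a matrix multiply, a bias add, and an elementwise
# ReLU. A compute graph is a chain of such tensor primitives, each of
# which can be dispatched to an optimized low-level implementation.
y = np.maximum(x @ W + b, 0.0)

print(y.shape)  # (32, 64)
```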

However, GPU support in the deep-learning frameworks must be coded for a specific hardware platform's acceleration libraries. The two most popular deep-learning frameworks, TensorFlow and PyTorch, support NVIDIA's GPUs for acceleration via the CUDA toolkit. This poses a problem for deep-learning development on Macs. Apple has used Intel hardware for the integrated GPUs in the Mac since 2010, and while macOS does support an external GPU (eGPU), Apple's official documentation only recommends AMD-based hardware. Support for NVIDIA hardware is spotty, with reports of bugs and slow performance after OS upgrades. There are other solutions for deep-learning acceleration on the Mac, including PlaidML, but these have drawbacks such as difficult setup or lack of support for low-level TensorFlow APIs.

Recently Apple released the new M1 "system on a chip," which contains not only a built-in GPU but also a 16-core "Neural Engine" capable of 11 trillion operations per second. Apple claims the Neural Engine enables up to a 15x improvement in ML computation. Around the same time, Apple released Big Sur, the latest version of macOS, which includes the ML Compute framework. ML Compute wraps several low-level APIs for performing neural-network operations: BNNS on the CPU and Metal Performance Shaders on the GPU.
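Apple's fork exposes the backend selection through a small `mlcompute` module, per the fork's documentation. The sketch below wraps that call in a guard so it degrades gracefully when the fork (which only ships for macOS) is absent; the helper name `select_mlc_device` is my own, not part of Apple's API:

```python
def select_mlc_device(device_name="any"):
    """Try to route TensorFlow ops through Apple's ML Compute backend.

    device_name may be "cpu", "gpu", or "any" (let ML Compute choose).
    Returns True if the backend was configured, False if Apple's
    tensorflow_macos fork is not installed (e.g. on non-macOS systems).
    """
    try:
        # This module exists only in Apple's tensorflow_macos fork,
        # not in mainline TensorFlow.
        from tensorflow.python.compiler.mlcompute import mlcompute
        mlcompute.set_mlc_device(device_name=device_name)
        return True
    except ImportError:
        return False
```

With the fork installed, passing `"gpu"` would direct training to Metal Performance Shaders and `"cpu"` to BNNS; on a stock TensorFlow install the function simply returns False.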

Using a version of the TensorFlow framework compiled to target these APIs, Apple trained several common neural-network models and compared training times to those obtained using a standard version of TensorFlow. Experiments were conducted on an Intel-based 2019 Mac Pro with an AMD GPU, an Intel-based 2020 MacBook Pro with an Intel GPU, and an M1-based 2020 MacBook Pro. Apple did not provide raw numbers, but claims training is "up to 7x faster" on the M1 compared to the Intel MacBook Pro, and the accompanying graphs show a speedup of roughly an order of magnitude for some networks on the 2019 Mac Pro.

Users on Hacker News welcomed the announcement, noting the difficulty with workarounds such as PlaidML. On Twitter, machine-learning consultant Stephen Purpura reported good performance with the new M1 hardware, comparing it to NVIDIA's GPUs:

On my models currently in test, it’s like using a 1080 or 1080 TI. Given the benchmarks, it should be a little faster, so I will take a look at what might be causing the slowdown.

Apple's binaries for the accelerated TensorFlow library are available on GitHub. Although Apple has not yet released the source code, the TensorFlow blog mentions plans to integrate Apple's fork into the TensorFlow open-source mainline. The PyTorch team has not announced any plans for support of ML Compute or the M1 chip, although users have suggested this via GitHub issues.


Community comments

  • Yeah but what IS the Neural Engine, really?

    by Bas Groot

    Cool that the "Neural Engine" on the M1 chip can do 11 TFlops, but can somebody, instead of obediently repeating the roaring phrases of Apple's Marketing Department, explain to us what the Neural Engine actually -is-?

    Well I started googling, and I found this very readable bunch of pages by someone who dug into it:

    In short, a GPU-like processor that is good at 16-bit float math in huge bulk, especially matrix operations. You can only use it through the Apple CoreML library. And Apple shrouds the rest of it all in mystery and secrets, because that's what Apple does.

  • Re: Yeah but what IS the Neural Engine, really?

    by Bas Groot

    I can't edit it anymore, but the way I wrote it can be read as criticizing the author, and that is not what I mean: I'm referring to the dozens of articles I found while googling what the Neural Engine is, all of which turned out to repeat the marketing, except this one.

  • Re: Yeah but what IS the Neural Engine, really?

    by Anthony Alford

    That's a good point: I could have done more to explain what the neural engine is. My assumption was indeed that it was what you posted (a matrix arithmetic accelerator) but I could have made that explicit.

    Several other manufacturers are including something similar (for example, the article you shared mentions Google's TPU).
