Building extensions for Python 3.5 Part Two

This was never meant to become a series, but unfortunately after posting what is now part one, we found a serious regression that led us to release Python 3.5.0rc4.

If you haven’t already, I recommend going back and reading part one up until the point where it says to come back here. That will fill you in on the background.

This is the point where I ought to make the “seriously, I’ll wait” joke, knowing full well that nobody is going to read the old post. Instead, I’m going to write one sentence, and if you understand the words in it, you have enough background to continue reading.

MSVC 14.0 (and later) and the UCRT give us independence from compiler versions, provided you build with /MT and force ucrtbase.dll to be dynamically linked.

If you scratched your head about any part of that, go and read part one. You’ll thank me in about two seconds.

The /MT Problem

Previously, I discussed some of the problems that arise when compiling with the /MT option. Mostly, because the option needs to be specified throughout all your code, it was going to cause issues with static libraries that had already been built.

Well, we found one more problem.


I want to stop for a second to thank Christoph Golhke for all his help through the 3.5.0 RCs.

For those who don’t know, Christoph maintains an epic collection of wheels for Windows. Every time you have trouble installing a package with pip, it’s worth visiting his site to see if he has a wheel available for it. After downloading the wheel, you can pass its path to pip install and you’ll get your package.

As we kept making changes to the build process, Christoph kept updating his own build steps and testing with literally hundreds of packages. His feedback has directly led to these changes to Python 3.5 that will make it much easier for everyone to have solid, future-proof builds of Python and extension modules.


Ready for the gory technical details of this new issue? Here we go.

Previously, we were statically linking functions from vcruntime140.dll into each extension module built for Python 3.5. This included, among other things, the DLL initialisation routines.

On the positive side, each extension module is now completely isolated from any others with respect to initialization, locale, debugging handlers, and other state.

On the negative side, it turns out initialization is a limited resource.

The CRT has a number of features that are thread-safe, such as errno. Each thread has its own errno, which means you do not need to perform locking around every operation that may set or use it.

To implement this per-thread state, the CRT uses fiber-local storage. On Windows, fibers are a form of cooperative multithreading that work within threads (one process may contain many threads, one thread may contain many fibers), so the CRT uses fiber-local storage to ensure the finest-grain handling. If it used thread-local storage, different fibers would see the same values, and if it used a normal global, all threads would see the same value.

Fiber-local storage slots are allocated using FlsAlloc(). If you read that doc carefully, you’ll see that a potential return value (error) is FLS_OUT_OF_INDEXES, which means you have exhausted the current process’s supply of fiber-local storage.

The CRT doesn’t like it if you’ve run out of fiber-local storage, because that means a lot of its stateful functionality will be broken. So it aborts.

How many slots do we get? It isn’t documented (so it could change at any time), but here’s how we can test it:


>>> import ctypes
>>> FlsAlloc = ctypes.windll.kernel32.FlsAlloc
>>> list(iter(lambda: FlsAlloc(None), -1))
[7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127]
>>> len(_)
121

On my machine, I can call FlsAlloc 121 times after starting the interpreter before it runs out of indices.

When each extension module has its own isolated initialization routine, each one will call FlsAlloc. Which means once I’ve loaded 121 extension modules, the 122nd will fail.

That sucks.

Going back to /MD

The only reasonable fix for this is to switch back to building both Python and extensions using /MD. So we’ve done that.

That means that vcruntime140.dll is now a runtime dependency and must be included with Python 3.5. So we’ve done that.

That means that every extension module for Python 3.5 has to depend on vcruntime140.dll and therefore must always be build with MSVC 14.0.

Well, I don’t think so.

Initialization is no longer isolated. You can load as many extension modules as you like and they’ll all use the same fiber-local storage index because vcruntime140.dll is already initialized.

But what happens when someone comes along with MSVC 15.0 (note: not a real version number) and builds an extension that depends on vcruntime150.dll? When they publish a wheel of that extension and someone installs in on Python 3.5.0? That version of Python only includes vcruntime140.dll, so there will be an unresolved dependency and the extension will fail to load.

We’ve also made a change to fix this. The hint was in that last sentence: the extension has an unresolved dependency.

Extension Dependencies

It is no longer (and technically never was) Python’s responsibility to provide dependencies required by extension modules. If you release an extension module, you are responsible for including any dependencies, or instructing your users on how to get them.

But simultaneously, we really don’t want to break python setup.py bdist_wheel upload (despite upload being insecure and you should be using twine or another uploader). If someone has MSVC 15.0 and builds a wheel using that version, somehow their users need to get vcruntime150.dll onto their machines.

So we fixed distutils.

When you build an extension using distutils (which is how most extensions are built, even if you think you’re using another package), we know which compiler version is being used. This means we know which version of the CRT you are using and which version of vcruntime###.dll your extension depends on.

We also know which versions of vcruntime###.dll shipped with the target version of Python. Currently, this is a set that contains only vcruntime140.dll (it’s also an internal implementation detail, so don’t start depending on it), and for the lifetime of Python 3.5 it will only contain that value.

When you build an extension with MSVC, we find the redistributable vcruntime###.dll in your compiler install and if it is not in the set of included files, it is copied into your build.

Currently it’s hard to see this in action, but once we have a newer MSVC to play with you’ll be able to see it. Whenever you build an extension module, it will get the new vcruntime alongside it.

Because of how DLL loading works, if that version of the DLL has not been loaded yet then the one adjacent to the extension module will be used. If it has been loaded, the one that is currently in memory will be used.

So really, we only need to include it once. The trick is making sure that it is loaded first. Ultimately, the only reliable way to do this is to include it everywhere.

Python 3.5 is always going to include vcruntime140.dll at this stage (though it may move from the top-level directory into DLLs at some point), so extensions built with MSVC 14.0 will always have it available.

Extensions built with a newer version of the compiler will include their own dependencies. As they should.

Perhaps Python 3.6 will be built with a newer version of the compiler. We can still include vcruntime140.dll, so that extensions can be built with MSVC 14.0, as well as including the newer version. We simply add the new one to the list in distutils and builds stop including it.

In this way, we can ensure that, at least for simple extensions (the majority), the easiest path will remain compiler-version independent.

Overriding the build process

One of the neat things about making the fix in the build process is that there are ways to override it. We can also make changes and improvements in bugfix releases if needed. It’s also fairly easy for developers to patch their own system if they need to customize things further. However, a few simple customisations are available without having to touch any code.

There are three stages in the build process with respect to copying vcruntime (henceforth, “the DLL”).

The first stage is locating the redistributable DLL. For regular installs of MSVC, these are in a known location, so once we’ve found vcvarsall.bat we find the DLL relative to that path.

The second stage is updating build options. If we didn’t find the redistributable DLL, we will statically link it exactly as described previously. This is not recommended, but it is still allowed and supported (and there is an opt-in described below).

The third stage is copying the DLL to your build directory.

Here are a few tricks you can use to customize your build. These are current as of Python 3.5.0rc4, but could change.

Use a different DLL

To specify your own path to the redistributable DLL, use a complete VS environment (such as the developer command prompt), set the environment variable DISTUTILS_USE_SDK=1, and then set environment variable PY_VCRUNTIME_REDIST to the full path of the DLL to use.


C:\package> "C:\Program Files\Microsoft Visual Studio 14.0\VC\vcvarsall.bat"

C:\package> set DISTUTILS_USE_SDK=1

C:\package> set PY_VCRUNTIME_REDIST=C:\Path\To\My\vcruntime140.dll

C:\package> python setup.py bdist_wheel

Note that this doesn’t override any of the following steps, so if your version of Python already includes vcruntime140.dll, then it won’t be copied. However, there’s no reason you can’t specify a different name here, and then it will be copied (but then you have to deal with a different name file, and might be missing actual dependencies… not really a great idea, but you can do it).

Always statically link

The build options are updated to statically link the DLL if PY_VCRUNTIME_REDIST is empty.

If you set DISTUTILS_USE_SDK but not PY_VCRUNTIME_REDIST, you will get statically linked vcruntime140.dll. This is probably the biggest surprise from the change.

Always dynamically link, but don’t copy

Probably the most useful customisation, you can choose to dynamically link the DLL but not copy it to your build output.

Since we check the name already, any file named vcruntime140.dll will never be copied anyway. However, with a future compiler you may already know that you are distributing the correct files and do not need distutils to help.

First, you apply the opposite of the previous customization: build options are updated to dynamically link the DLL if PY_VCRUNTIME_REDIST is not empty.

However, the DLL is only copied to the output directory if os.path.isfile is True.

The tests are deliberately different. It means you can force dynamic linking without the copy by setting PY_VCRUNTIME_REDIST to anything other than a valid file:


C:\package> "C:\Program Files\Microsoft Visual Studio 14.0\VC\vcvarsall.bat"

C:\package> set DISTUTILS_USE_SDK=1

C:\package> set PY_VCRUNTIME_REDIST=No thanks

C:\package> python setup.py bdist_wheel

No customisation

And of course, if you don’t need to customise anything, you can just call setup.py directly and distutils will automatically use the latest available compiler.

Summary

It’s been a bumpy ride, especially over the last week, but we’re on track to release Python 3.5.0 in a good state.

Using MSVC 14.0 and the UCRT, we have independence from the compiler version.

When a new version of MSVC is available, extensions built for Python versions that may not include the default dependencies will bundle them in their own package. (There are still some uninstallation concerns here to resolve, but we have time for those.)

In 10 years time it should still be possible to build extensions that work with Python 3.5.0 using the latest tools available at that time. Everyone who builds for Python 2.7 today should be excited by that prospect.

Thank you all again for everyone who has contributed testing, feedback, fixes and suggestions. We hope you will love Python 3.5.

8 thoughts on “Building extensions for Python 3.5 Part Two”

  1. The post makes it clear what the build problem was and why the new system makes that work, but now you’ve got multiple copies of the runtime in the process. A few words as to why you don’t get conflicts about allocation of global resources when main() starts up would be welcome. I’m guessing from rereading part 1 that the answer is
    “””
    The basic strategy hasn’t changed. Each vcruntimeXXX.dll will still delegate management of those resources to ucrtbase.dll, which will still be dynamically linked (via ucrt.lib).
    “””

    1. That’s basically it. ucrtbase.dll does the same fiber-local storage allocation on load, but we’re only ever going to have one of those and that’s where all the important state is kept (my examples above – errno and locale – are mostly in ucrtbase.dll). vcruntime’s state is mostly about error handling and is unlikely to conflict with other extensions. Having multiple different versions of the CRT are okay provided you don’t want state to be shared, but one of the ways that is achieved doesn’t scale to having many (120+) different versions of the CRT.

      Windows’s handling of DLLs is very similar to Python’s imports. If something has been loaded (imported) already, you get a reference to the existing module (module) and initialization does not reinitialize anything. It’s also under a loader lock (importer lock) to avoid conflicts due to concurrency.

      The static linking approach makes each static copy of vcruntime look like a separate DLL, so they are all loaded and initialized independently. They weren’t sharing state – and now they are (though AFAICT we don’t rely on or care about it within most Python code) – and the UCRT part was already shared.

      So I’m not expecting new conflicts as a result of this – I’m far more worried about files needing to resolve vcruntime and not being able to (when for rc3 and earlier they didn’t need to. FWIW, no problems yet 🙂 )

  2. Why is the C Runtime Library conflating “I’m initializing a freshly-loaded DLL” with “I’m initializing a thread”? Surely these are orthogonal operations.

    1. They are, but fiber-local storage (or thread-local storage, for that matter) is best allocated before starting any threads. Otherwise you have to synchronise to initialise your synchronisation mechanism 🙂

      For the CRT, it has one opportunity to initialise FLS before those pesky users come in and start running their arbitrary code, and that’s while the DLL is still fresh from the OS’s loader.

  3. I’m running into an issue building an extension for Python (Pillow). Historically, this has been linked statically with libtiff, jpeg, etc. Presently, without setting any options, distutils uses /MD to compile the source code of Pillow, which then presents linker problems (conflicts with LIBCMT, unresolved symbols). I have tried to simply set the DISTUTILS_USE_SDK variable, which changes distutils to using /MT, but now I’ve lost my include paths to the UCRT stuff (specifically, io.h). Is this a known issue, or something I’m doing wrong? It feels very nasty to specify include paths to io.h manually, since there are already 2 different version folders in my include folder.

    1. Setting DISTUTILS_USE_SDK also implies that you need to start from a full environment, such as the Developer Command Prompt for VS 2015. If you’ve set up the environment some other way then they may be missing.

      Alternatively, there could be some issue in resolving the Win10 SDK. VS 2015 split up the UCRT headers from the C++ headers and the Windows SDK headers, so it’s entirely possible that some configurations may hit new problems.

      The /MT setting only applies to the CRT though. If you’re building the libs then you can build them with /MD, and if not then you’re probably risking cross-version incompatibilities. This setting doesn’t affect whether libraries other than the CRT are statically or dynamically linked.

  4. Probably worth noting here for future reference: for GPL-licensed extensions built using a hypothetical future version of visual studio, the automatic runtime copying described above will silently produce wheels that can’t be legally redistributed (because they will contain both GPL code and the proprietary vcruntime150.dll). This isn’t a problem for extensions that can take advantage of vcruntime140.dll, because the gpl has an exception for stuff that’s already distributed as part of the system, and for a python extension that exception extends to everything that’s distributed as part of the standard cpython downloads. But it does mean that in ten years some people could still conceivably be stuck using old versions of visual studio to build extensions targeting 3.5. (Not that it’s clear that anyone will care about 3.5 ten years from now :-).)

    If you wanted to get really fancy I guess bdist_wheel could check the license metadata and warn or require some override before copying in the runtime.

    1. Very good point. I certainly don’t want to embed that sort of knowledge into distutils itself, but it’s probably worth noting in the building for Windows docs that this may be incompatible with certain licences. (That entire doc needs an update though… it’s somewhere on my list of things to do.)

Leave a Reply

Your email address will not be published. Required fields are marked *

Are you human? Click the Grapes...