Thursday, July 13, 2017

Dealing with library version mismatch

 Note : this article was initially written as an answer to David Jud's comment, but it became long enough to be worth converting into a full blog entry.
In the previous article, I attempted to introduce a few challenges related to designing an extensible API.

In this one, I'll cover a related but more specific topic : how to handle a library version mismatch.

Version mismatch is a more acute problem in a DLL scenario. In a static linking scenario, the programmer has several advantages :
  • The compiler will catch errors (if a type, or a prototype, has changed for example). This gives time to fix these errors. Of course, the application maintainer will prefer that a library update doesn't require any change in existing code, but in the worst case, most errors should be trapped before shipping the product.
  • The compiler will automatically adapt to ABI changes : if an updated type is larger or smaller than in the previous version, the new size will be applied throughout the code base. The same thing happens with enum changes : new enum values are automatically picked up by the compiler.
  • The library is available during compilation, which means the programmer has a local copy that can be updated (or not) according to the project's requirements.

Well, this last property is not always true : in larger organisations, the library might belong to a "validated" pool, which cannot be easily adapted for a specific project. In which case, the user program will either have to host its own local copy, or adapt to the one selected by its organisation.

But you get the idea : problematic version mismatches are likely to be trapped or automatically fixed by the compiler, and therefore should be corrected before shipping a binary. Of course, the fewer changes, the better. Program maintainers will appreciate a library update that is as transparent as possible.

For a dynamic library though, the topic is a lot harder.
To begin with, the user program typically does not have direct control over the library version deployed on the target system. So it could be anything. The library could be more recent, or older, than expected during program development.

Now these two types of mismatches are quite different, and call for different solutions :

Case 1 - library version is higher than expected

This one can, and should, be solved by the library itself.

It's relatively "easy" : never stop supporting older entry points.
This is of course easier said than done, but to be fair, it's first and foremost a question of respecting a policy, and therefore is not out of reach.
Zstandard tries to achieve that by guaranteeing that any prototype reaching "stable" status will be there "forever". For example, ZSTD_getDecompressedSize(), which has recently been superseded by ZSTD_getFrameContentSize(), will nonetheless remain an accessible entry point in future releases, because it's labelled "stable".

A more subtle problem is ABI preservation, in particular structure size and alignment.
Suppose, for example, that version v1.1 defines a structure of size 40 bytes.
But v1.2 adds some new capabilities, and now the structure has a size of 64 bytes.
All previous fields from v1.1 are still there, at their expected place, but there are now more fields.

The user program, expecting v1.1, would allocate the 40-byte version, and pass that as an argument to a function expecting a 64-byte object. You can guess what will follow : the library reads or writes past the end of the caller's allocation.
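To make the numbers concrete, here is a minimal sketch of that situation. The two structures, `params_v1_1` and `params_v1_2`, and the helper `overrunBytes()` are hypothetical names invented for illustration; they do not come from any real library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout published by library v1.1 : 5 x 8 = 40 bytes */
typedef struct {
    uint64_t fields[5];
} params_v1_1;

/* Hypothetical layout published by library v1.2 :
 * the first 40 bytes are unchanged, 24 more bytes are appended => 64 bytes */
typedef struct {
    uint64_t fields[5];
    uint64_t addedFields[3];
} params_v1_2;

/* Number of bytes a v1.2 library would access beyond the end of an object
 * allocated by a program compiled against the v1.1 header */
size_t overrunBytes(void)
{
    assert(sizeof(params_v1_2) >= sizeof(params_v1_1));  /* v1.2 only appends fields */
    return sizeof(params_v1_2) - sizeof(params_v1_1);
}
```

Nothing in the caller's code reveals the problem : the program compiles and links cleanly, and the overrun only happens once the newer library starts touching the appended fields.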

This could be "manually" worked around by inserting a "version" field and dynamically interpreting the object with the appropriate parser. But such manipulation is a recipe for complexity and errors.
That's why structures are pretty dangerous. For best safety, structure definition must remain identical "forever", like the approved "stable" prototypes.

In order to avoid such issues, it's recommended to use incomplete types. This forces the creation of the underlying structure through a process entirely controlled by the current library, whatever its version, thus avoiding any kind of size/alignment mismatch.
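As a rough sketch of the incomplete-type pattern, the example below uses a hypothetical `mylib_` prefix. The function names and the fields of the structure are assumptions made for illustration, not an actual library interface. The public part only declares the type; its definition stays on the library side, so its size can change between versions without the application noticing.

```c
#include <assert.h>
#include <stdlib.h>

/* ---- public part, as it would appear in a (hypothetical) mylib.h ---- */
typedef struct mylib_ctx_s mylib_ctx;      /* incomplete type : size unknown to the user */
mylib_ctx* mylib_createCtx(void);
void       mylib_freeCtx(mylib_ctx* ctx);
int        mylib_setLevel(mylib_ctx* ctx, int level);
int        mylib_getLevel(const mylib_ctx* ctx);

/* ---- private part, compiled inside the library only ---- */
struct mylib_ctx_s {
    int level;
    int addedInLaterVersion;   /* new fields can be appended freely */
};

/* Allocation is done by the library, using its own definition of the structure */
mylib_ctx* mylib_createCtx(void)
{
    return calloc(1, sizeof(mylib_ctx));
}

void mylib_freeCtx(mylib_ctx* ctx)
{
    free(ctx);   /* free(NULL) is a valid no-op */
}

int mylib_setLevel(mylib_ctx* ctx, int level)
{
    assert(ctx != NULL);
    ctx->level = level;
    return 0;    /* 0 means success in this sketch */
}

int mylib_getLevel(const mylib_ctx* ctx)
{
    assert(ctx != NULL);
    return ctx->level;
}
```

The user program can only hold a `mylib_ctx*` : it can neither declare the object on the stack nor apply `sizeof` to it, so it never embeds an outdated size.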

When the above conditions are correctly met, the library is "safe" to use by applications expecting an older version : all entry points are still there, behaving as expected.

Whenever these conditions cannot be respected anymore, an accepted work-around is to increase the major number of the version, indicating a breaking change.


Case 2 - library version is lower than expected

This one is more problematic.
Basically, responsibility is now on the application side. It's up to the application to detect the mismatch and act accordingly.

In David Jud's comment, he describes a pretty simple solution : if the library is not at the expected version, the application just stops there.
Indeed, that's one way to safely handle the matter.

But it's not always desirable. An application can have multiple library dependencies, and not all of them might be critical.
For example, maybe the user program accesses several libraries offering similar services (encryption for example). If one of them is not at the expected version, and cannot be made to work, it's not always a good reason to terminate the program : maybe there are already plenty of capabilities available without this specific library, and the program can run, just with fewer options.

Even inside a critical library dependency, some new functionality might be optional, or there might be several ways to get one job done.
Dealing with this case requires writing some "version dependent" code.
This is not an uncommon situation by the way. Gracefully handling potential version mismatches is one thing highly deployed programs tend to do well.

Here is how it can be made to work : presuming the user application wants access to a prototype which is only available in version v1.5+, it first tests the version number. If the condition matches, then the program can invoke the target prototype as expected. If not, a backup scenario is triggered, be it an error, or a different way to get the same job done.

Problem is, this test must be done statically.
For example, in Zstandard, it's possible to ask for the library version at runtime, using ZSTD_versionNumber(). But unfortunately, by then it's already too late.
Any invocation of a new function, such as ZSTD_getFrameContentSize(), which only exists since v1.3.0, leaves an unresolved symbol when the library is older, and therefore triggers an error at link time, even if the invocation itself is protected by a prior check with ZSTD_versionNumber().

What's required is to selectively remove any reference to such a prototype from the compilation and linking stages, which means this code must not exist in the binary. It can be excluded through the preprocessor.
So the correct method is to use a macro definition, in this case, ZSTD_VERSION_NUMBER.

Example :
#if ZSTD_VERSION_NUMBER >= 10300
size = ZSTD_getFrameContentSize(src, srcSize);
#else
size = ZSTD_getDecompressedSize(src, srcSize);
/* here, 0-size answer can be mistaken for "error", so add some code to mitigate the risk */
#endif

That works, but requires compiling the binary with the correct version of the zstd.h header file.
When the program is compiled on the target system, it's a reasonable expectation : if libzstd is present, zstd.h is also supposed to be accessible. And it's reasonable to expect them to be synchronised. There can be some corner cases where this does not work, but let's say that in general, it's acceptable.

The detection can also be done through a ./configure script, in order to avoid an #include error during compilation, should the relevant header not even be present on the target system, as sometimes the library is effectively optional to the program.

But compilation on the server side is a different topic. While it is highly perilous to pre-compile a binary using dynamic libraries and then deploy it, this is nonetheless the method selected by many package repositories, such as those behind aptitude, in order to save deployment time. In which case, the binary is compiled for "system-provided libraries", whose minimum version is known, and the repository can solve dependencies. Hence, by construction, the case "library has a lower version than expected" is not supposed to happen. Case closed.

So, as we can see, the situation is solved either by local compilation and clever usage of preprocessing statements, or by dependency solving through repository rules.

Another possibility exists, and is, basically, the one proposed in the ZSTD_CCtx_setParameter() API : the parameter to set is selected through an enum value, and if it doesn't exist, because the local library version is too old to support it, the return value signals an error.

Using this API safely feels a lot like the previous example, except that now, it becomes possible to check the library version at runtime :

if (ZSTD_versionNumber() >= 10500) {
   return ZSTD_CCtx_setParameter(cctx, ZSTD_p_someNewParam, value);
} else {
   return ERROR(capability_not_present);
}

This time, there is no need to be in sync with the correct header.h version. As the version number comes directly at runtime from the library itself, it's necessarily correct.
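The mechanism can be sketched end to end with a small stand-in library. Everything prefixed `lib_`, `LIB_` or `app_` below is hypothetical and only mimics the shape of such an API : the application is compiled against a v1.5.0 header that knows `lib_p_newParam`, while the deployed implementation behaves like v1.4.0 and has never heard of it.

```c
#include <assert.h>
#include <stddef.h>

/* ---- header seen by the application (hypothetical v1.5.0) ---- */
typedef struct { int level; int window; } lib_ctx;
typedef enum {
    lib_p_level    = 100,
    lib_p_window   = 101,
    lib_p_newParam = 102    /* only supported by v1.5.0+ */
} lib_param;
#define LIB_ERROR_UNSUPPORTED        (-1)
#define APP_ERROR_CAPABILITY_MISSING (-2)

/* ---- stand-in for the deployed, older library (behaves like v1.4.0) ---- */
unsigned lib_versionNumber(void) { return 10400; }

int lib_setParameter(lib_ctx* ctx, int param, int value)
{
    assert(ctx != NULL);
    switch (param) {
        case lib_p_level:  ctx->level  = value; return 0;
        case lib_p_window: ctx->window = value; return 0;
        default: return LIB_ERROR_UNSUPPORTED;   /* unknown enum value : reported, not a crash */
    }
}

/* ---- application side ---- */
int app_setNewParam(lib_ctx* ctx, int value)
{
    if (lib_versionNumber() >= 10500) {
        return lib_setParameter(ctx, lib_p_newParam, value);
    }
    return APP_ERROR_CAPABILITY_MISSING;   /* backup scenario */
}
```

Note that the program links fine against the old library, since `lib_setParameter()` exists in both versions. Only the enum value is new, and even without the version check, the old library would answer with an error code rather than fail to load.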

Note however that ZSTD_CCtx_setParameter() only covers the topic of "new parameters". It cannot cover the topic of "new prototypes", which still requires using exclusion through macro detection.

So, which approach is best ?

Well, that's a good question to ask. There's a reason the new advanced API is currently in "experimental" mode : it needs to be field tested, to experience its strengths and weaknesses. There are pros and cons to both methods.
And now, the matter is to select the better one...

5 comments:

  1. Citing your answer for old thread: "I don't think it's a good idea to support both (except maybe for a limited experimentation period). To ensure long-term maintenance, it will have to coalesce onto one implementation."

    I hope that later ZSTD will have thousands of users, i.e. developers using this library. This makes the situation very different from CLS/CELS, which have only a handful of users.

    So, improved error detection of individual functions should be really helpful to your users and I think that you should implement it.

    As for generic API, it should depend on real needs. Perhaps provide GitHub Issue for it, so anyone who needs it, can vote up and write why. I think that such API may be useful for generic code, and may simplify other language bindings - in particular improve their forward compatibility, since binding for 1.10.0 will still work for 1.11.0, requiring only a few extra constants defined.

    I don't see any unusual problems for maintenance if you will end up supporting both.

  2. A few more random notes:

    Most developers overall, and hence most of your users, are probably using Windows. On Windows, it's usual to perform static linking, so in most cases ZSTD will be statically linked. Moreover, these users prefer simplicity and reliability to the advanced features we are discussing here.

    When I use DLLs, I just call GetProcAddress(), which makes it possible to dynamically check whether the required function is present in the DLL. In this case, the application can accommodate an earlier library release. But the automatic-linking style you are assuming is probably much more widespread.

    Using the library version to decide which parameters are available will break once alternative implementations arrive. In this case, the generic API may be used instead (as an alternative to providing preprocessor constants corresponding to each parameter). Moreover, other languages, such as Java, don't have access to the C preprocessor, so each binding will either need to provide its own way to check available features, or you should provide a generic function-based way to do that.

    Bottom line: most of your audience will be happy with individual functions, although can live with generic ones. There are lot of unusual cases when generic APIs are preferable, but we may ignore them since they are pretty rare. But the case of automatic linking with DLL you mentioned, is really important.

    This means that generic API is required to allow DLL users to automatically link to various versions of library. Without generic API, DLL users will have to manually link to functions they need via GetProcAddress() or its equivalents.

    At this point, I think that you should provide both APIs, since you have important audience for both ones. But if you need to choose, i vote for generic API - this slightly beats single-version users, but anyway they are better to check return code of each set* call.

  3. > Most of developers overall and hence most your users are probably using Windows.

    This used to be true a decade ago, but these days, I'm pretty sure that most of my users are working on Unix-related platforms.
    Note that my users are not typical GUI end-users, but rather programmers, and system administrators.

    My understanding from the feedback : individual functions have an advantage when it comes to detecting errors/mismatches at compile time.
    The generic function has an advantage when it comes to linking to an arbitrary library version, although version mismatch is not magic and still requires runtime checks.

    I'm likely to propose both interfaces in the `dev` branch soon, to continue the test.
    An important point will be to determine which one feels easier to use.

  4. Thinking about it a bit more, I now see the point of using the 'generic' api for advanced scenarios where a program wants to support multiple versions of the library. Very good explanation of the issues (I am not a C programmer).

    About supporting both, is my understanding correct that most programs will use either one (the specific) or the other (the generic) api? Mixing the apis seems to have no real use case, as soon as you call one specific api you risk a linker error.

    If I am a developer in a team that uses the generic API, it will always use it as a matter of policy, and I want to have a way to disable the specific api (i.e. by putting it behind an #ifndef) such that I can be sure that I (or my teammates :-P) never call one of the specific apis by mistake. Or maybe even put the two APIs into different headers?

    1. These are pretty good points David,
      maybe a #define could help to disable declaration of one API
