Memory Safety and Object Pascal
The U.S. government recently appealed to the software development industry to stop using C and C++, and to favor “memory safe” languages instead. Object Pascal is recognized by the NSA as a memory safe language, but there does seem to be some misunderstanding in the community about what makes a language memory safe to begin with. In this article, I’d like to discuss what I believe this all means, and where Delphi / Object Pascal fits into the picture.
The Request
Many popular tech blogs and news sites have shared this information, which I first heard about via Tom’s Hardware.
https://www.tomshardware.com/software/security-software/white-house-urges-developers-to-avoid-c-and-c-use-memory-safe-programming-languages
This is essentially an appeal to the development community to move away from C and C++ as programming languages, because memory safety problems in those languages lead to vulnerabilities. The NSA lists Delphi / Object Pascal among its examples of memory safe languages. So what exactly is the problem with C and C++, and why is Pascal considered safe?
Well, of course I don’t have any insider access to the government conversations, but I believe I understand the problem, because it has been known about for quite a long time. The language features of C and C++ are quite low level, and allow you to work with memory in ways that can lead to something known as a “buffer overrun” (also called a buffer overflow), a particular group of vulnerabilities in software. I’m not certain just how large a share of all software vulnerabilities buffer overruns account for. I’d guess that they don’t make up the largest share, but they are a concern nonetheless, and they’re a really easy target for bad actors.
The Buffer Overrun
So what is a buffer overrun? Well, most (if not all) modern processors support a feature called the “stack.”
The stack is a piece of memory reserved for storing relatively short lived, and generally small pieces of data that are used by a program as it runs. When a program calls a function or method, the parameters to the call are put into this stack memory, the function makes use of them, and it places any results that it returns into the stack memory for the calling code to make use of. This might include small buffers, such as strings of input taken from the user for instance.
Critically, one piece of information that is stored into the stack memory is the “instruction pointer.”
The instruction pointer is a CPU register which keeps track of which instruction is being executed at any given time as a program runs. When a program calls a function, along with the parameters to that function, it places a copy of the instruction pointer (often called the “return address”) onto the stack. When the function is complete, the CPU pulls that saved instruction pointer back off of the stack and uses it to jump back to the next instruction after the call was made. This is essentially how a program remembers what it was doing before it called a function, and is therefore able to resume what it was doing when the function returns.
If someone were able to alter the content of the stack while a program is running, they could alter a stored instruction pointer. When a function returns, if its instruction pointer has been altered to reflect some other memory location, the CPU will happily resume executing code at that new location. Essentially, new code can be injected into a running program. Unfortunately, the stack is not particularly well guarded with any programming language, and it is fairly easy for programmers to leave an “open door” to this kind of attack.
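To make that concrete, here’s a minimal sketch in C that models the idea rather than performing a real attack. The SimulatedFrame struct, its field sizes, and the function names are all my own assumptions for illustration; a real stack frame’s layout depends on the CPU, the compiler, and the calling convention.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of one stack frame: a small local buffer that sits
   directly below the saved instruction pointer (return address). */
typedef struct {
    unsigned char buffer[8];   /* local buffer inside the function */
    uintptr_t return_address;  /* where execution resumes on return */
} SimulatedFrame;

/* Copies len bytes of input to the start of the frame without checking len
   against the buffer size, the way an unchecked copy into a stack buffer
   behaves. Writing through an unsigned char pointer to the whole struct
   keeps this demonstration well defined. */
static void unchecked_copy(SimulatedFrame *frame,
                           const unsigned char *input, size_t len)
{
    assert(frame != NULL && input != NULL);
    if (len > sizeof *frame) {
        len = sizeof *frame;   /* keep the demonstration inside the model */
    }
    memcpy((unsigned char *)frame, input, len);
}

/* Returns 1 if the saved return address no longer matches the original. */
static int return_address_changed(const SimulatedFrame *frame,
                                  uintptr_t original)
{
    assert(frame != NULL);
    return frame->return_address != original;
}
```

Input that fits within the 8 byte buffer leaves return_address alone. Input that is longer spills into it, and in a real program that spilled value is where the CPU would resume executing.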
As an example.
I’ve modified several original Microsoft Xbox games consoles to use them for various tasks over the years. Microsoft went to considerable lengths in order to secure the Xbox against copyright theft. On a hardware level, the hard disk was locked to prevent side-loading of malicious code, and the operating system was designed to only allow the running of “digitally signed” code, for which only Microsoft and their game development partners would have access to the key. Despite their efforts, some clever engineers were able to hack into the Xbox and make it possible to run any software, signed or otherwise, and they were able to do it using a buffer overrun attack. Essentially they found that, for some games, a save file loaded from USB storage was read into a stack-based buffer, and by simply increasing the size of that file, they were able to inject their own code, which would then open the door for replacing the operating system of the console. With the operating system replaced, there was no longer a check for digitally signed code, and you could run any software you’d like on the machine.
Buffer overrun vulnerabilities, however, go far beyond copyright theft. Being able to run custom code inside of any software means potentially being able to claim control of the machine that it runs on. Be it the computer of a non-savvy home user, a web server, or a computer involved in infrastructure, the idea of a machine being “owned” by a third party bad actor is always a concern. It’s understandable that the U.S. government would have an interest in preventing this from a cyber security standpoint.
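The missing piece in a case like that is a length check before the copy. Here’s a minimal sketch in C of what such a check might look like. The function name load_save_name and the SAVE_NAME_CAPACITY constant are hypothetical, made up for this example, and have nothing to do with the actual Xbox code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SAVE_NAME_CAPACITY 32  /* hypothetical size of the fixed buffer */

/* Copies a name field from untrusted save data into a fixed-size buffer.
   Returns 0 on success, or -1 if the data does not fit, in which case
   dest is left as an empty string. */
static int load_save_name(char dest[SAVE_NAME_CAPACITY],
                          const char *data, size_t data_len)
{
    assert(dest != NULL);
    dest[0] = '\0';
    if (data == NULL || data_len >= SAVE_NAME_CAPACITY) {
        return -1;  /* reject oversized input rather than overrun the buffer */
    }
    memcpy(dest, data, data_len);
    dest[data_len] = '\0';  /* always terminate the string */
    return 0;
}
```

The point is that the size of the destination, not the size of the file, decides how much gets copied. An oversized file is refused instead of trusted.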
Are C and C++ really all that bad?!
Actually no, but developers can be.
For instance, the popular GNU GCC compiler has command line switches that can be used to enable protection of the stack. Its -fstack-protector family of options places a guard value (a “canary”) between a function’s local buffers and its stored instruction pointer, and checks that value before the function returns, so that an overrun which reaches the instruction pointer is detected and the program is stopped. A related approach, available in Clang as SafeStack (-fsanitize=safe-stack), is to implement not one, but two regions of stack memory. One of those regions is used for execution flow information (instruction pointers), and the other region is used for buffer allocations, so that an overrun in a buffer has no direct route to the call stack. Neither approach makes a buffer overrun impossible, but both make one much harder to exploit. The problem is that these features are not enabled by default in every toolchain, and they are not always used! In our day to day development, it’s often easy to overlook details, and it seems that these command line switches are either not very well known about, or simply very frequently overlooked.
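One widely used form of stack protection, which GCC exposes through its -fstack-protector options, works by placing a guard value next to the saved instruction pointer and checking it before a function returns. The sketch below models that idea in plain C. It is not how the compiler implements it. The GuardedFrame struct, the GUARD_VALUE constant, and the function names are assumptions of mine for illustration, and real guard values are randomized rather than fixed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed guard value; real implementations randomize it. */
#define GUARD_VALUE ((uintptr_t)0xC0FFEE)

/* Hypothetical model of a protected frame: the guard sits between the
   local buffer and the saved return address. */
typedef struct {
    unsigned char buffer[8];
    uintptr_t guard;
    uintptr_t return_address;
} GuardedFrame;

/* Sets up the frame as a function prologue would. */
static void frame_init(GuardedFrame *frame, uintptr_t return_address)
{
    assert(frame != NULL);
    memset(frame, 0, sizeof *frame);
    frame->guard = GUARD_VALUE;
    frame->return_address = return_address;
}

/* Copies input to the start of the frame without checking it against the
   buffer size, to model an overrun. The copy is clamped to the model. */
static void overrun_copy(GuardedFrame *frame,
                         const unsigned char *input, size_t len)
{
    assert(frame != NULL && input != NULL);
    if (len > sizeof *frame) {
        len = sizeof *frame;
    }
    memcpy((unsigned char *)frame, input, len);
}

/* The check made before returning: 1 if the guard is untouched. */
static int frame_is_intact(const GuardedFrame *frame)
{
    assert(frame != NULL);
    return frame->guard == GUARD_VALUE;
}
```

Because the guard sits between the buffer and the return address, a linear overrun has to trample the guard before it can reach the address, and the check catches it.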
Frankly, a large step towards making C and C++ memory safe would be to implement this feature as the default state in all compilers, but it is currently not the default everywhere. Now, Object Pascal doesn’t implement this double stack feature as a default either, so why is it considered safer than C and C++?
Well, if the stack protection option were the only concern, we could fix the problem very easily. Unfortunately, stack protection is not the only concern. C, and subsequently C++, are very flexible but also quite low level languages. They do not have many of the syntax features which help to avoid memory safety issues. For instance, although a pointer in these languages does carry a type, that type is trivially cast away, pointer arithmetic is unchecked, and an array passed to a function becomes a pointer which carries no length. There isn’t the same strict concept of a “typed pointer” as there is in Pascal. While there are ways to perform bounds-checking in C and C++, they aren’t syntax features, and are also often overlooked. It’s just too easy to do bad things with pointers.
Object Pascal is different
Even before Pascal became object oriented, the language had typed pointers. Niklaus Wirth himself (inventor of Pascal) had a disdain for using pointers, and designed the Pascal language specifically to avoid the direct use of raw pointers. Typed pointers are better than raw pointers for memory safety because their “type” indicates the size of the item that they point to in memory. This provides the compiler with more information, allowing the compiler to prevent writing beyond the end of the data type for example, and thus protecting that memory from accidental (or malicious) overwriting.
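The value of a pointer knowing its own type can be shown even in C. In this minimal sketch, the Record struct and both function names are hypothetical. The first function takes a typed pointer, so the size comes from the type itself. The second takes an untyped pointer, so it has to trust whatever size the caller hands over.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record type for the example. */
typedef struct {
    int id;
    char name[16];
} Record;

/* Typed pointer: the number of bytes to clear is derived from the
   pointee type, so the caller cannot get it wrong. */
static void clear_record(Record *record)
{
    assert(record != NULL);
    memset(record, 0, sizeof *record);
}

/* Untyped pointer: the function knows nothing about what it points to
   and must trust the size argument. A wrong size here is an overrun. */
static void clear_bytes(void *memory, size_t size)
{
    assert(memory != NULL);
    memset(memory, 0, size);
}
```

Calling clear_record(&r) is always right for a Record. Calling clear_bytes(&r, n) is only right if n happens to be correct, and nothing checks that it is.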
Pascal’s features don’t end there, as having both fixed and dynamic array data types means, again, that the compiler always knows the size of the item in memory that it’s working with. The compiler can reject a constant index that falls outside the bounds of a statically sized array at compile time and, with range checking enabled, can inject code to catch the same mistake at runtime for computed indexes and dynamic arrays. It’s quite a common practice in C or C++ to iterate over an array using a pointer to the array elements, but because of the stronger language features of Pascal, this is not a common practice. In fact, you have to go a little out of your way in Pascal to convince the compiler to let you do this.
It is not impossible to work with raw pointers in Pascal, or to overwrite memory that shouldn’t be overwritten, but it’s more difficult to do. Out of the box, Pascal is very concerned with data types and the sizes of buffer allocations, and it generally obscures direct manipulation of items on the stack. Best of all, these are the considerations of a Pascal compiler by default, simply because they are part of the syntax. The “stack protection” command line option of the GCC compiler is available to Pascal compilers too, but should you forget to use it, you’re still in far better shape from a memory protection standpoint in Pascal than you are in C or C++.
Fun side note: I once read a blog post by a former Borland employee, which unfortunately I can’t cite because it was many years ago and I don’t even recall the name of the author (please let me know, so that I can solidify this claim). The blog post claimed that Borland had released Turbo Pascal to compete with Turbo C++ with memory safety as one of its leading concerns. That is to say, this problem of memory safety in C and C++ was well known back in the mid 80s, and the leading compiler vendor of the time had released a Pascal compiler with memory safety as a driving concern. The successor to that compiler today is Delphi, the most popular commercial Object Pascal compiler.
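To show the contrast from the C side, here’s a minimal sketch with hypothetical names (IntSlice, sum_by_pointer, slice_get). The first function walks an array by pointer and is only as correct as the count it is given. The second bundles the length with the data and checks every access by hand, which is roughly the work a Pascal compiler can do for you when range checking is on.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical array-with-length type, standing in for the size
   information that a Pascal array carries as part of the language. */
typedef struct {
    const int *items;
    size_t length;
} IntSlice;

/* Common C idiom: iterate by pointer. Nothing here can tell whether
   count really matches the array behind the pointer. */
static long sum_by_pointer(const int *first, size_t count)
{
    assert(first != NULL || count == 0);
    long total = 0;
    for (size_t i = 0; i < count; i++) {
        total += first[i];
    }
    return total;
}

/* Checked read: stores the element in *out and returns 1, or returns 0
   without touching *out when the index is out of range. */
static int slice_get(IntSlice slice, size_t index, int *out)
{
    assert(out != NULL);
    if (slice.items == NULL || index >= slice.length) {
        return 0;
    }
    *out = slice.items[index];
    return 1;
}
```

In C this checking is a convention that every caller has to remember to follow. In Pascal the array type itself carries the bounds.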
Would Pascal be a good alternative to C/C++ today?
Yes!
If you ask a die hard C or C++ programmer, of course they’re going to feel like Pascal is not an alternative. The same is true the other way around: many Delphi developers have never ventured into the C/C++ world, and show no interest. It’s true that while many developers know many languages, there are those that are devoted to a particular tool. I’m not here to try to convince you to switch languages, but rather, to look at the feature sets of these languages and ask, do they compete?
The two most well known and used Pascal compilers today are Delphi and FreePascal. Delphi is the leading commercial Pascal compiler, and FreePascal (Lazarus) is its open source alternative. While these two compilers do not have precise feature parity, they are very close. The widest differences between them are in fact more to do with their IDEs than the compilers themselves. The main point to consider between them, however, is that while they are general purpose compilers that can be put to just about any development task, their primary uses are in application development.
In terms of application development, there are few products in the market place which are actually at parity with Delphi and Lazarus. The current king in the application development space is, by and large, C# with its .NET framework and Visual Studio IDE. What’s interesting about this is that Delphi was once competing on the favorable side with Visual C++ and was certainly ahead of Visual Basic. It was C#, if nothing else, which stole market share from Delphi in the early 2000s, yet Delphi has remained a viable alternative all along. In fact, if I put on my conspiracy hat for a moment, given that Anders Hejlsberg (lead architect of C#) was the same man responsible for creating Delphi in the first place, could it be that Microsoft chose him precisely because he was the architect at their strong competitor Borland? It’s not too far fetched.
If Delphi is to be compared, in terms of features and cross platform support, to any other product, then it’s Visual Studio, yet without the difficulties of being sand-boxed into a managed environment with garbage collection etc. Both of the mentioned Object Pascal compilers remain true native compilers, generating binary code for the target CPU, not byte-code, not interpreted, but true native Intel or Arm binary bits and bytes. If you can do it in C or C++, then you can do the same with one of these modern pascal compilers.
Frankly, if you’re a C or C++ developer willing to branch out, then you can’t go too wrong with exploring Pascal. Syntactically Pascal is different, and you’d certainly find C# more familiar, but if you choose that route you’ll find that the syntax similarities are only skin deep. C and C++ developers are binary native developers, those that understand the inner workings of the machine they’re targeting, that understand interoperability with third party code, and that crave the lower level control to just get stuff done. When such a developer finds themselves in the C# world, with its hand-holding and abstraction from the low level, they’re faced with something quite unfamiliar. Those willing to switch to Pascal, on the other hand, will find that while the syntax and language features are less familiar, once over those hurdles they are faced with a compiler quite comparable to the C and C++ compilers that they are used to.
Is it commercially viable?
This is something of a sore spot for Delphi. The answer is ultimately yes, but the water is a little murkier than such a straight answer suggests. The good news is that Delphi is backed by a company that has been in business for decades, and is constantly maintained and updated. In terms of licensing, it’s very comparable to a Visual Studio license at the same feature level. Delphi also has a community edition which may be used for free, even for commercial purposes, but only up to a revenue cap of $5k (which compares incredibly poorly to the Visual Studio Community threshold, which I believe is $1 million in annual revenue for organizations). Unfortunately, while the spirit of this license is honest, Embarcadero (the manufacturer of Delphi) can be a little aggressive in chasing down license violations. It’s not too surprising, given that while Embarcadero is a decent sized enterprise, backed by its parent company Idera, they’re not Microsoft in scale. Coupled with a history of Delphi being pirated quite heavily, they have good reason to be defensive of their investment.
There’s a tension in the Delphi market too. Delphi is not nearly as popular as it once was, but there are many companies still maintaining Delphi products that have been written over the decades. Delphi’s backwards compatibility is excellent, going even beyond version 1 back into the Turbo Pascal predecessor. There is a healthy market place of applications as old as 40 years, still maintained in Delphi today. With Delphi being lower in popularity than it once was, organizations maintaining older Delphi products can find it difficult to find developers that are skilled in the product. On the opposite side of this tension is the Delphi jobs market. As much as those organizations would like to find developers, when they do, they keep them. Delphi developers don’t like to move around much. When they find a position they’ll sit in it as their beards grow ever less colorful, and so those that do have the skills and are seeking a position have to compete for open seats. The perceived cost of entry into Delphi doesn’t make it attractive for new developers seeking the skills to get hired.
Since I was let go from Embarcadero, due to downsizing, I’ve bounced around a handful of companies as a Delphi developer. Various forces, including the COVID outbreak, have made my employment a bit of a bumpy ride. While this might sound a little unfortunate, I’ve never found myself out of work for more than a few weeks. Why? Well, I’m fortunate enough to already be highly experienced with Delphi. I’m not exactly famous, but somewhat known in the community, and I hold an MVP title from Embarcadero. My experience and to some degree reputation have tickled open some doors when seeking work. However, I do now work almost entirely in software maintenance, and finding green field opportunities is not easy.
Ultimately however, markets can swing at a dramatic pace, and like all other existing Delphi developers out there, I constantly hope for a swing back in Delphi’s popularity. You see, corporate world aside, the technology does speak for itself, and Delphi speaks loudly and confidently. It covers everything from native mobile applications, to desktop, to back-end server, and (with minor caveats) web applications too. Delphi is what the new kids call “full stack.” It’s more than that. In a world in which new compilers are specialized to a particular task, Delphi remains a general purpose compiler with a wide spectrum of capabilities, and the maturity to handle them all with ease.
I’ve skirted around one consideration here. Is Free Pascal, with its Lazarus IDE, commercially viable? Well, this is somewhat murky too. You see, legally, according to the developers of these products, the answer is yes. That said, both the compiler and the IDE are licensed under the copyleft GPL license. Since day one, there have been exceptions to the GPL, enabling commercial development using these tools, and I’m sure the developers of them would be happy to see them in commercial use. Some companies have already used them for commercial products too.
Unfortunately those three letters “G”, “P”, and “L” breed fear in corporations. Look, GPL fans, you might not want to admit it, but the GPL was conceived as an anti-corporate license. Richard Stallman said as much himself. If a corporation were to use the Free Pascal compiler or Lazarus IDE for commercial purposes, they’d have to take great care to ensure that they are complying with the LGPL exception terms, which can be more difficult than you might think. For instance, if one of the component classes of the library were insufficient and needed to be modified, could they modify it safely? The LGPL terms activate upon distribution, but the GPL activates at link-time. If you modify GPL source code, it becomes GPL, and when you link to it, your product becomes GPL. It’s not that this can’t be avoided, but rather that, for a corporate entity, it’s a considerable risk. Coupled with the fact that the Free Pascal compiler is not offered with corporate backing or support, it’s not in a strong position for commercial use.
Conclusion
Object Pascal is very much deserving of its recognition as a memory safe language. Its syntax features lend themselves to memory safe programming as a default, rather than as an option. You can, if you choose to, work around the memory safety should you need to, but you actively have to tell the compiler to do so. You can opt to work with “concrete classes” in which the memory model is essentially ownership-chain driven. In this case, you must allocate and manually free objects, which can lead to access violation errors, but ultimately an access violation is not a vulnerability. You can also design code using its ARC (automatic reference counted) interfaces for allocation safety too. Essentially, you can still shoot your own foot if you choose to, but Object Pascal is going to make that more difficult for you to do accidentally.
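For readers who haven’t met reference counting before, here’s a minimal sketch of the mechanism written out by hand in C. The Shared struct and the shared_create, shared_retain, and shared_release functions are hypothetical names of my own, and this version is not thread safe. With Object Pascal’s reference counted interfaces, the compiler inserts the equivalent bookkeeping for you, which is the whole point.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reference counted object. */
typedef struct {
    int ref_count;  /* number of owners still holding the object */
    int value;      /* the payload */
} Shared;

/* Allocates an object owned by one reference. Returns NULL on failure. */
static Shared *shared_create(int value)
{
    Shared *shared = malloc(sizeof *shared);
    if (shared != NULL) {
        shared->ref_count = 1;
        shared->value = value;
    }
    return shared;
}

/* Registers one more owner and returns the same pointer. */
static Shared *shared_retain(Shared *shared)
{
    assert(shared != NULL && shared->ref_count > 0);
    shared->ref_count++;
    return shared;
}

/* Drops one owner. Frees the object and returns 1 when the last owner
   lets go; otherwise returns 0 and the object stays alive. */
static int shared_release(Shared *shared)
{
    assert(shared != NULL && shared->ref_count > 0);
    shared->ref_count--;
    if (shared->ref_count == 0) {
        free(shared);
        return 1;
    }
    return 0;
}
```

The object is freed exactly once, when the last owner releases it, so no owner is left holding a pointer to memory that another owner already freed. Written by hand like this, every retain and release is a chance to make a mistake, which is why having the compiler do it is safer.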
I like to make the cliche comparison between compilers and wood-working tools, given that compilers are essentially just tooling. In this analogy, C and C++ are a table saw without a back-lash safety guard. A skilled worker can use these tools without losing their fingers, but there’s nothing in the tool trying to keep those digits safe. Pascal is the same table saw with the safety guard, you’d have to interfere with or remove that guard if you want to put your fingers at risk. Both tools can make the same cuts, but with different approaches to safety.
Commercially, Delphi is viable, but it is hungry for more users. If your engineers haven’t simply laughed off the U.S. government dipping their fingers into the software development world, and you would like to consider complying, you could do far worse than to give Delphi a shot. At this moment in time, with so much of the world’s software infrastructure based on C or C++ code, it does seem like the “urge” from government is going against a lot. On the flip side of the same coin, how long will it be before government software contracts are mandated not to use C and C++? There are other alternative compilers which promise memory safety, but do they have the maturity that Delphi does? Do they have the same corporate backing? Do they have the application development pedigree? This is an opportunity for companies to revisit Delphi and in doing so, breed new opportunity into the technology.
As always,
Thanks for Reading!