More and more I’ve been seeing Delphi applications which shoe-horn in a binding to Python, and this has left a lot of Delphi developers bemused as to why. I believe I can explain.
What is Python?
If you’re in software development at all, you can’t have missed mention of Python over the past decade. It has been the fastest growing programming language, in terms of popularity, in the western world. So what exactly is it?
Well, it’s an object-oriented programming language which is interpreted, and which supports some powerful high-level data types and dynamic typing. Being interpreted, it’s possible to perform partial execution of Python programs as they are being developed, much like older development tools such as BASIC or FoxPro. Ask anyone who has used this feature and they’ll tell you that, compared to compiled languages, it makes rapid prototyping much more, well, rapid. Dynamic typing also accelerates prototyping by offering a high degree of code reuse, since code may be written once to perform the same operation across multiple data types. Combine these features with the high-level data types and object-oriented nature of Python, and you have a language which is very well suited to a high-paced development cycle, capable of tasks from powerful scripting through to high volume data processing. It’s certainly enticing.
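To make the code reuse point concrete, here is a minimal sketch. The function name `total` is my own invention for illustration; the only assumption it makes about its items is that they support the `+` operator.

```python
def total(items):
    """Add up any non-empty sequence of items that support the + operator."""
    iterator = iter(items)
    result = next(iterator)      # start from the first item, whatever its type
    for item in iterator:
        result = result + item   # + is resolved at runtime for the actual type
    return result


# The same function body works unchanged across several data types.
print(total([1, 2, 3]))          # integers
print(total([1.5, 2.5]))         # floating point numbers
print(total(["py", "thon"]))     # strings are concatenated
print(total([[1], [2, 3]]))      # lists are concatenated
```

In a statically typed language this would typically need generics, overloads, or one routine per type. Here the decision about what `+` means is deferred until the code runs.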
I’m not Python’s biggest fan.
It’s been a very long time since I used a programming language which is not statically typed, and while many would cite Python as being “readable” code, to my tastes it’s the opposite, or at least the extreme. You see, I’ve become very accustomed to semi-colons and the use of clear logic boundaries using Pascal’s “begin” and “end” or the braces of C-styled languages. Where Python is readable, it is so in so much as it is closer to a natural human language than other programming languages, and while I’m certain that its fans love this fact about Python, I guess I am a little set in my ways. I like to have clear and clean signals about where logic begins and ends, and I don’t really enjoy using computer languages that don’t offer this. Not any longer, at least. I did start out programming in BASIC as a child, so believe me Python fans, I do get where you’re coming from. It’s just not to my tastes.
For the longest time, stricter syntax languages based on C/C++ dominated the development landscape. Though users of these languages would not like to admit it, Pascal has also remained a strong rival. You see, back in the early ’90s, Borland was “where it was at” for programming tools. Sure, you could pick up some Visual Basic from Microsoft or an open source C++ compiler for what were then still very experimental *nix based operating systems, but if you wanted a tool to get business programming done, you wanted Borland’s Turbo C++, Turbo Pascal or in extreme cases Turbo Assembler. Barring the extreme, Turbo Pascal came first, released in 1983, with Turbo C following in 1987 and Turbo C++ in 1990. A little later, their successors came along in the form of C++ Builder and Delphi, of which Delphi (Pascal) was the strongest in the market place for a solid decade or more. I believe it was the development of C#, another strongly typed language with a C-style syntax, which really made the biggest dent in the popularity of Delphi, along with some turmoil in the business that makes the product. Today, C++ Builder and Delphi still sell well for Embarcadero (the current owners of the product lines), but it is only fair to admit that they do not dominate the development world like they once did.
The world of development today doesn’t really have such a dominating product. While some languages are more successful than others, with C/C++/C# and Java taking the high positions, there are dozens of computer languages from a variety of vendors. Programming languages are becoming less general purpose, and more aligned with the specific domains for which they are most adept. Python is one such language, and so if we are to explain its rise in popularity, we must make one or two assumptions. First we must assume that there are one or more programming domains in which it excels, and then we must determine which domain that is. It will probably not come as a huge surprise, but I believe (as I’m sure many do) that the domain is that of machine learning.
Python in machine learning.
In 2014 Google (now a subsidiary of Alphabet Inc.) acquired a four year old British research and development company named DeepMind, which had been developing artificial intelligence applications. The acquired technology was later built into the AlphaGo project, which became famous for defeating a human champion at the game of “Go”. Around that same time, the Google Brain project, founded by Jeff Dean, Greg Corrado and Andrew Ng in 2011, graduated from being a Google X project and went on to release the “TensorFlow” machine learning framework. What both TensorFlow and DeepMind had in common was Python.
** AlphaGo may have migrated away from Python since, however TensorFlow remains predominantly a Python-driven system with C++ acceleration and an NVIDIA CUDA back end.
Let’s take a look at some important dates for TensorFlow. First released as Apache licensed open source late in 2015, TensorFlow reached version 1.0.0 in February 2017 and released version 2.0 in September of 2019. Now, take a look back at the graph of Python popularity which I used to header this post. Note those three dates and you can’t miss it.
Python recovered from a downward trend late in 2015, began a meteoric rise in late 2017, some months after the 1.0.0 release, and made another monumental climb in late 2019. Coincidence? I think not. Python is not only the language behind TensorFlow, but with several machine learning frameworks written in Python, it is THE language for machine learning applications.
There are probably two strong reasons for Python’s use in machine learning, and one of the two is its dynamic typing. When working with large volumes of potentially unclean data, a language which aids in smoothing that data by being flexible about data types does lend itself to the task. However, the real winning feature for Python is not the dynamic typing, but a particular data type, the tensor.
You see, nearly ALL of the machine learning that makes headlines today is based on, in one form or another, artificial neural networks. I’m sure that someone will think of an exception to that statement now that I’ve written it down, but make no mistake, I’m talking about machine learning specifically, not the wider “artificial intelligence” field. While there are many variations on the theme, the fact is that this kind of machine learning is done by “tuning a general function approximation by adjustment of free parameters”, or to put it another way, artificial neural networks.
I’ve thus far been careful to specify “artificial neural networks” rather than simply saying “neural networks”, a convention I may now drop, but I’ve done so for a reason. You see, the other name for a neural network is a brain. Whether insect, mammal or some other animal, all brains in the natural world are networks of a type of cell called a neuron, which is the most basic processing unit of a brain. The most basic processing unit of a computer however is the transistor. Transistors are able to switch hundreds of millions of times per second. By comparison, neurons are incredibly slow, firing at perhaps 7Hz (firings per second). Yet real brains, even the most simple of them, are able to perform computational tasks that for many decades were outside the realm of possibility for a computer. The reason is that within animal brains, neurons are wired together in ‘parallel’, meaning that a real brain may be performing many computations simultaneously. Transistors in a computer are wired in strict arrays, and a single CPU core runs one computation at a time, albeit much more quickly, which allows a computer to ‘appear’ to be performing many calculations at once.
Which leads me to the reason for this diversion into neural networks. Artificial neural networks are very crude simulations of real brains, used in machine learning applications. Essentially, if you want a computer to do something which is computationally complex, so much so that a software developer couldn’t simply write an algorithm to do it, then you need the computer to learn to do it for itself. You need an artificial neural network. That artificial neural network however, being a simulation, is just another piece of software that must be implemented, and the developer of that software faces the problem of simulating a parallel system (a real brain) on a serial system (the CPU core). So how is this achieved?
Well, as I said, a CPU, being transistor based, can perform hundreds of millions of operations per second, and so it can perform the same amount of work that a “neuron” might perform in a fraction of the time. Therefore, a CPU can simulate a large number of neurons in series. Now it’s unfair for me to continue to call an artificial neural network a “simulation” of a real neural network, because the simulation is, as I’ve already stated, very crude. Our artificial neural networks simulate only one, very primitive behavior of real neurons. Because our simulations run as serial processes rather than parallel ones, the simulation becomes even less precise. That, however, is not the point. The point is that these artificial neural networks have repeatedly demonstrated their abilities in performing tasks that we would not have the slightest chance of programming algorithmically, and so they solve practical problems in an ‘inspired by real brains’ way.
Real brains may also contain billions of neurons, and so, computationally speaking, it would still be an enormous task for a CPU to simulate real brains at large scales. A CPU can, however, simulate smaller neural networks to provide useful solutions to problems.
This leads me to one last issue that had to be resolved in order to simulate neural networks, and again, it comes from parallelism. You see, real neurons are connected together in complex ways, in which the inputs to one neuron are the outputs from many other neurons. This introduces a processing problem when simulating neurons in series, because you’d have to perform the work of the neurons in the right order, ensuring that when simulating one neuron, all of its input neurons have been simulated first. If the neurons aren’t simulated in the right order, well then, as the saying goes, “garbage in, garbage out”. Essentially, the system would simply fail to be useful if the data flowing through it were not processed in the right order, and in the same order each time the neural network is used. This is thankfully a fairly simple engineering problem: artificial neural networks are arranged in successive ‘layers’ of simulated neurons. Each layer is arranged such that its neurons take the previous layer’s neurons as their inputs, but never depend on each other, or on neurons in subsequent layers. This more rigid arrangement allows us to break the work of simulating neurons into steps that can be performed in series to keep our transistors happy, but further distances artificial neural networks from their biological inspiration. It also has the advantage, however, of placing our neurons and the connections between them into arrangements that resemble simple vectors and matrices. This fact allows me to answer the final piece of the Python puzzle…
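Before moving on, the layer-by-layer ordering described above can be sketched in a few lines of plain Python. Everything here is hypothetical and for illustration only: the weights are made up, the `tanh` activation is just one common choice, and the names `network`, `simulate_neuron` and `forward` are my own.

```python
import math

# A hypothetical two-layer network. Each layer is a list of neurons,
# and each neuron is a (weights, bias) pair with one weight per input.
network = [
    # layer 1: three neurons, each taking the two network inputs
    [([0.5, -0.2], 0.1), ([0.3, 0.8], -0.4), ([-0.6, 0.1], 0.0)],
    # layer 2: one neuron, taking the three outputs of layer 1
    [([1.0, -1.0, 0.5], 0.2)],
]


def simulate_neuron(weights, bias, inputs):
    """Weighted sum of the inputs plus a bias, squashed by tanh."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return math.tanh(weighted_sum)


def forward(layers, inputs):
    """Simulate the layers strictly in order, one after another."""
    activations = inputs
    for layer in layers:
        # Every neuron in this layer reads only the previous layer's outputs,
        # so the order of neurons within a layer does not matter.
        activations = [simulate_neuron(w, b, activations) for w, b in layer]
    return activations


output = forward(network, [1.0, 2.0])
print(output)
```

Because a layer only ever reads the outputs of the layer before it, a simple loop over the layers is enough to guarantee that every neuron’s inputs are ready before it is simulated.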
What is a tensor?
Okay, so vectors and matrices are essentially arrays of data, where the vector is a simple, single-dimensional array, and the matrix is a two-dimensional array. Arranging data into vectors and matrices has been a useful tool for mathematicians throughout the ages, and many techniques have been developed for performing mathematical operations on these data types. For the reasons I described above, artificial neural network layers and the interactions between them may be simulated as operations on vectors and matrices.
The word “tensor” actually has definitions in physics and mathematics which go beyond the one used in machine learning applications, and so I’ll apologize to mathematicians and physicists for propagating it here, but a tensor from a machine learning standpoint is defined as “an n-dimensional array of arbitrary data.” In other words, the data type which we refer to as a tensor is really a special kind of array, which may have ‘n’ dimensions. It’s an array for which the number of defined dimensions is flexible, which allows a tensor to behave as a vector, a matrix, or an array of matrices. Essentially, a tensor wraps up all of the data types needed to simulate a neural network in a single higher level data type.
Now, there is a library available for Python called “Numerical Python”, or “NumPy” for short. This library provides a data type known as the “NumPy array”, a data type supporting n-dimensional arrays, that is, a tensor.
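As a small sketch of what that looks like in practice, assuming NumPy is installed and imported under its conventional alias `np`, here are a vector, a matrix and an array of matrices. The variable names and values are mine, chosen only for illustration.

```python
import numpy as np

vector = np.array([1.0, 2.0, 3.0])             # one dimension
matrix = np.array([[1.0, 2.0], [3.0, 4.0]])    # two dimensions
stack = np.zeros((3, 2, 2))                    # three dimensions: an array of matrices

for name, tensor in [("vector", vector), ("matrix", matrix), ("stack", stack)]:
    # type(...) is identical for all three; only ndim and shape differ
    print(name, type(tensor).__name__, tensor.ndim, tensor.shape)
```

All three report the same type, `ndarray`. The number of dimensions is simply a property (`ndim`) of the value, not part of its type.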
You may be thinking “<insert chosen programming language here> supports multi-dimensional arrays, what makes Python better?” If you are, you may be slightly missing the point. Delphi Pascal (my own language of choice) supports multi-dimensional dynamic arrays, but I didn’t say that tensors are multi-dimensional arrays. I said that tensors are n-dimensional arrays. Consider this Pascal code snippet.
var
  a1: array[0..4, 0..4] of Single;        // two dimensions, 5x5
  a2: array[0..4, 0..4, 0..4] of Single;  // three dimensions, 5x5x5
In the code snippet, a1 and a2 are both arrays of ‘single’ (that’s a single precision floating point type). The only difference between the declarations of a1 and a2 is that a1 has two dimensions (5x5, indexed 0..4 in each), and a2 has three dimensions (5x5x5). So surely I could easily create vector and matrix data types for processing neural networks? Well yes, I can. However, Delphi Pascal is a strongly typed language, and as such the data types of a1 and a2 are distinct, different, and incompatible.
A tensor is just one data type, which can be configured to have any number of dimensions at runtime. Tensors of differing dimensions remain compatible with each other. They are the same data type, and merely have a different number of dimensions. This is where the power of tensors comes from…
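For contrast with the Pascal declarations above, here is a rough NumPy equivalent. This is a sketch under my own assumptions: the names `a1`, `a2`, `a3` and `dims` simply mirror or extend the Pascal example, and `float32` stands in for Pascal’s single.

```python
import numpy as np

a1 = np.zeros((5, 5), dtype=np.float32)       # 5x5, like the Pascal a1
a2 = np.zeros((5, 5, 5), dtype=np.float32)    # 5x5x5, like the Pascal a2

# Unlike the Pascal arrays, these two values share exactly the same type.
print(type(a1) is type(a2))

# The number of dimensions can even be decided while the program runs.
dims = 4
a3 = np.zeros((5,) * dims, dtype=np.float32)
print(a3.ndim, a3.shape)

# Tensors with different dimension counts can be combined directly:
# NumPy's broadcasting rules add the 5x5 matrix to each 5x5 slice of a2.
combined = a2 + a1
print(combined.shape)
```

The last line relies on NumPy’s broadcasting rules, which line shapes up from the trailing dimension. It works here because the trailing 5x5 of `a2` matches the shape of `a1`.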
The power of tensors.
When simulating neural networks, vectors and matrices are only the beginning.
You see, in order to simulate massively parallel systems in a serial way, we’ve had to make many concessions to engineering constraints, and the artificial neural network is the concept we ended up with. As we’ve done so, we’ve force fed the parallel processing into our CPUs at significant engineering cost. For instance, I’ve talked about CPUs as devices that can process data only in series, yet I’m typing this on a computer with a 32-core CPU. While each of those cores really is still a serial device, the work of simulating neural networks, if arranged correctly, can be spread across those cores. The same is true of graphics hardware, which is often used in modern machine learning research. Graphics hardware often has many “compute units” which can be employed to process a neural network simulation in parallel. So, while the parallelism of modern hardware doesn’t even scratch the surface of comparison with the parallelism of biological brains, it does exist and can be taken advantage of when running artificial neural networks.
Fortunately, there are many techniques in machine learning to use the available hardware efficiently, and one of them is to ‘batch up’ the work. The input to a neural network is simply a vector, a one-dimensional array of data, which is then processed through a series of operations on the neurons and the connections between them, which are represented as vectors and matrices respectively. Now, if instead of placing a vector on the input of a neural network you place a matrix, then precisely the same series of operations can be used to process that matrix. It’s simply treated as an array of vectors by the mathematics involved. Further, the input to the neural network may not actually be a flat vector to start with. It could be a black and white image for example, which can be represented as a matrix of intensities, or it could be a color image, an array of three (red, green, blue) layers of two-dimensional matrices of color intensities. Regardless of the number of dimensions in the input data, the math used in the neural network simulation is the same, and the algorithm therefore remains the same. The only thing that changes is the number of dimensions of the input data type.
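Here is a minimal sketch of that batching idea, again assuming NumPy. The layer is hypothetical: the sizes (four inputs feeding three neurons), the random weights and the `tanh` activation are all my own choices for illustration, as is the function name `layer`.

```python
import numpy as np

rng = np.random.default_rng(0)           # seeded so the sketch is repeatable
weights = rng.normal(size=(4, 3))        # connections: 4 inputs feeding 3 neurons
bias = np.zeros(3)                       # one bias value per neuron


def layer(x):
    """One layer of neurons; the last axis of x must hold the 4 inputs."""
    return np.tanh(x @ weights + bias)


single = rng.normal(size=4)              # one input vector
batch = rng.normal(size=(8, 4))          # a matrix holding 8 input vectors
groups = rng.normal(size=(2, 8, 4))      # 2 groups of 8 input vectors

print(layer(single).shape)               # (3,)
print(layer(batch).shape)                # (8, 3)
print(layer(groups).shape)               # (2, 8, 3)

# Each row of the batched result matches processing that row on its own.
print(np.allclose(layer(batch[0]), layer(batch)[0]))
```

Notice that `layer` never inspects how many dimensions its input has. The `@` operator treats any leading dimensions as a batch, which is what lets a library hand the whole block of work to parallel hardware in one go.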
So then, if a programming language supports a single data type which may be treated as a vector, a matrix, a hyper-matrix, or an array of any other number of dimensions, this language lends itself to the efficient use of hardware when simulating neural networks, and therefore lends itself to machine learning applications.
This answers the original question that I posed, “why Python?”
Because Python is ideally suited to machine learning applications.