Part 1- Continued – Starting on the code.
Watch the video instead!
Source Code
The source code for this series can be found here : https://chapmanworld.com/cwVirtualMachine/
Lets get a start on writing some code.
In this first session I’m using the Lazarus 2.0.10 IDE to create the initial code, however, by part-2 I’ve added projects for several versions of Delphi as well as Lazarus. If you wish to compile the source from this part using Delphi, you can simply open the .lpr file, and copy-paste the source into a new Delphi command-line project.
Start a new command-line application. In Lazarus I chose ‘Simple Program’ because it gives the most straight forwards project template possible, and I’ve saved it as ‘VirtualMachine1’.
program VirtualMachine1;
begin
end.
The very first thing we’ll want to simulate from our model is the clock.
The clock is incredibly simple to simulate because it’s simply a regular pulse, or in pascal terms, a loop, so we’ll start with a repeat..until loop which continues so long as a boolean variable named ‘Running’ remains true. Of course, this means we must also initialize the running variable to true.
program VirtualMachine1;
var
Running: boolean;
begin
// Initialization
Running := True;
// Clock Cycle
repeat
until not Running;
end.
This, of course, is currently an infinite loop, because there is nothing to set Running to false. We’ll use a virtual-CPU instruction to set running to false when our program should end. So lets turn our attention to simulating the CPU and it’s instruction cycle.
Fetch, Perform, Proceed…
When we looked at the first instruction, the load instruction, we broke it down into seven steps. One of the reasons for there being so many steps was that we were discussing the memory device and how it responds, and we were discussing the details of how the CPU interacts with the memory in discreet actions on the address bus, the data bus, and the read-write pin.
As we begin to write our simulation, we already have a working memory system (in the host computer), and so we don’t really need to emulate the memory device directly, nor do we need to emulate the buses. Instead, we’ll break down our CPU instruction cycle into just three steps.
- Fetch an instruction from memory.
- Perform the instruction.
- Update the program counter.
Keep these three stages in mind, we’ll walk through each of them piece-by-piece, until we have a working virtual machine.
In order to fetch an instruction from memory, we need to have some instructions in a memory buffer somewhere, and before we can populate a buffer with instructions, we need to define the instructions that our CPU will understand, it’s ‘Instruction Set’, so lets begin with that.
Instruction Set…
const
opNOP = 00;
opHALT = 01;
The above two constants, opNOP and opHalt, are our CPU’s first two instructions. Don’t worry, we’ll add more, but for now, these will get us started. So what do they each do?
Well opNOP does literally nothing. The only change to the CPU having executed a NOP instruction is that the program counter is incremented. Almost all CPU instruction sets include some form of NOP or No-Operation instruction. Originally, I believe, this instruction was included with CPU’s in order to cause a delay in a program. You see, older CPU’s were much closer to what we’re simulating than modern CPU’s, and they ran at far slower speeds. They were still fast enough to keep us humans entertained, but for example, the MOS6502 8-bit CPU ran at speeds as low as 1Mhz, or 1million clock pulses per second. The NOP instruction consumed two clock cycles. So if you ran the NOP instruction five hundred times in a loop, the 6502 would be delayed by one millisecond. We could compound this to provide accurate timing of events, such as refreshing the screen for animation for example. Modern CPU’s may use NOP for the same purpose, however, there are more modern alternatives to NOP for timing now, yet it remains in most CPU’s.
Our second instruction is opHALT. I’ll give you three guesses what this one does. Simply put, it’ll set our ‘Running’ variable to false to halt our virtual CPU running, ending whatever program it is running.
Now that we have two instructions, we need some memory to store them in.
Program Memory
The memory for our program can be a simple array, however, we need to decide on the size of each element in that array.
Our model computer had only three instructions in it’s instruction set, and our virtual CPU thus far has only two, however, we’re going to need more. While recording the video for this part, I decided to make the instructions two bytes wide to give ample space for a growing instruction set, and we’ll stick with that for now, but I feel we’ll need to expand it further later in the series. So then, lets define a data type to represent a single instruction in memory…
type
TVMInstruction = uint16;
ptrVMInstruction = ^TVMInstruction;
Here I’ve defined both the TVMInstruction type as an alias of uint16 (two bytes), and a pointer type ‘ptrVMInstruction’ to point at an instruction in memory. This pointer type is going to come in handy for the program counter register as we’ll see in a moment.
We’re now ready to reserve some of our host computer memory to serve as memory space for the simulated computer.
(From here on in I’ll refer to the ‘real’ computer/cpu as the host, and the virtual machine either as the ‘VM’ or the guest.)
We’ll reserve enough memory to contain a very simple program…
const
cTestProgram: array[0..1] of TVMInstruction = (
opNOP,
opHALT
);
Well there’s an exciting program! This program has two instructions, first a NOP which does nothing, and then a HALT which ends the program. Okay, not the most exciting program, but it’s a start.
Fetch
We now have almost everything we need to perform the first step of our three step instruction cycle, the fetch step. All that is left is to add a program counter register to our CPU and initialize it to point at the first instruction in our program. The program counter can be of the type ‘ptrVMInstruction’, which we defined above, because it ‘points’ to an instruction in memory.
Lets look at our program now, with all of the pieces put together, and notice where the program counter fits in…
program VirtualMachine1;
const
opNOP = 00;
opHALT = 01;
type
TVMInstruction = uint16;
ptrVMInstruction = ^TVMInstruction;
const
cTestProgram: array[0..1] of TVMInstruction = (
opNOP,
opHALT
);
var
Running: boolean;
ProgramCounter: ptrVMInstruction;
begin
// Initialization
Running := True;
ProgramCounter := @cTestProgram[0];
// Clock Cycle
repeat
until not Running;
end.
We’ve added the program counter as a variable before our main program begins, and then during our initialization we’ve set it to point to element zero of our program in the guest memory. At this point, because we have a pointer to our instruction, we can reduce away the ‘fetch’ step, since we don’t actually need to fetch anything, we’re able to directly access the instruction in memory via the program counter as a typed-pointer. You’ll see this in action in our next step, which is to ‘Execute’ the instruction…
Execute
At this point, if you were following any other tutorial on how to write a virtual machine, you could expect to find a case statement (or equivalent ‘switch’ statement depending on the programming language). I’m going to introduce you to the same statement initially, but you should know that while case statements make sense here for teaching purposes, they are a very bad option for writing a virtual machine. We’ll look at why they are bad, and how to fix the problem later, but for now, as I said, the case statement is good for explaining what comes next.
case ProgramCounter^ of
opNOP: begin end;
opHALT: Running := False;
end;
The above code, when placed inside our repeat-until clock loop, provides both the fetch, and execute steps. As I said above, using the typed pointer to an instruction for the program counter, means that we don’t need to fetch at all, we simply inspect the instruction that the program counter is pointing to. This is what the first line of this code does.
Depending on the result being either opNOP or opHALT we then either do nothing “begin end;” or we set the ‘Running’ variable to false, which would cause our clock loop to end, thus ending the program.
If we were to run our virtual machine right now however, it would appear to hang. Why? Well the program counter points at a NOP instruction, and as we never alter the program counter inside our clock loop, it’ll continue pointing at the NOP instruction. So our virtual CPU would be stuck in a repeating loop, forever doing nothing. That’s not good, we need to implement the third and final stage of our instruction cycle…
Proceed – Increment the program counter.
All we need do to increment the program counter is to add the following instruction towards the end of the clock loop…
inc( ProgramCounter );
The above instruction increments the program counter by one element of the size determined by the pointer type, in this case a TVMInstruction, which is two bytes. For various reasons which come down to personal preference however, I prefer to write this instruction in a different way (which I do in the video for this part), and it looks like this…
{$hints off} ProgramCounter := Pointer( nativeuint(ProgramCounter) + Sizeof(TVMInstruction); {$hints on}
Okay, I know. My preferred way of incrementing the counter is pretty ugly by comparison to the simple ‘inc’ statement above, but I have some reasons for doing it this way. First of all, the ‘inc’ instruction in pascal has different behaviors depending on the item you’re incrementing. If you gave a vanilla pointer for example, it would increment the pointer by one, but if you give a typed pointer as we did above, it increments by two because of the target type being a TVMInstruction. It’s also possible to add a second parameter to the inc instruction, which is the number of increments that you want to have performed.
All of the different behaviors of inc mean that it is not always obvious while reading the code, what the instruction will actually do. If our program becomes long, which I expect it will, and if our virtual machine instruction size changes, which I also expect will happen, the inc instruction will change it’s behavior. At some point in the future, we could come back to the inc instruction and not know what it’s going to do. My ugly way of specifying the increment more explicitly, actually gives us more information about what is going on right here at this line of code.
A second reason, and the real clincher for not using inc in this case, is that we might later decide to make the program counter an integer type rather than a pointer. I’ve done this already for a particular piece of code that we’ll see in later parts, and so the inc instruction would behave incorrectly and without giving any compile-time warning that it’s behavior changed.
The way I’ve written the increment will either function correctly always, or cause the compiler to fail while building the code, alerting us to take care of the change.
So where are we now?
If we run our virtual machine, it’ll happily and dutifully do… well, nothing. It will however, end the program rather than getting stuck in an infinite loop, because having executed the NOP instruction, it’ll update the program counter and then find and execute the HALT instruction. Congratulations, it does nothing useful but you have a working virtual machine.
program VirtualMachine1;
const
opNOP = 00;
opHALT = 01;
type
TVMInstruction = uint16;
ptrVMInstruction = ^TVMInstruction;
const
cTestProgram: array[0..1] of TVMInstruction = (
opNOP,
opHALT
);
var
Running: boolean;
ProgramCounter: ptrVMInstruction;
begin
// Initialization
Running := True;
ProgramCounter := @cTestProgram[0];
// Clock Cycle
repeat
// Fetch
// Execute
case ProgramCounter^ of
opNOP: begin end;
opHALT: Running := False;
end;
// Increment program counter
{$hints off} ProgramCounter := Pointer( nativeuint(ProgramCounter) + Sizeof(TVMInstruction); {$hints on}
until not Running;
end.
But what about the case statement?
Okay, I told you that the case statement is bad for a virtual machine, but I did not explain why.
When our compiler (Delphi or Freepascal) compiles our virtual machine, it generates an assembly language file and feeds that into an assembler. In Delphi this process is a little more obscured because the compiler in use, varies depending on the target platform, and in some of those compilers the assembler is internal to the compiler its self. Under FreePascal however it is always the case that the compiler will generate an assembler file, but having fed that assembler file into an assembler to generate binary code, it then deletes the assembler file. Essentially, you never get to see it. Freepascal does however have a command-line option which tells the compiler not to delete that file, so that we can inspect the assembler code that gets generated.
In the video for this part, I ask the compiler to generate the assembler code for the case statement so that we can take a look. I’m not going to go into that detail here, if you want to know what’s going on under the hood you’ll have to watch the video – but I will explain it in more general terms here.
Essentially the generated assembler unwinds to something which looks like this psudo code…
Let RAX = ProgramCounter^
if RAX = opNOP then begin
end else if RAX = opHALT then begin
Running := False;
end
What is happening here is that the instruction at the program counter is first being tested to see if it’s an opNOP instruction, and then being tested to see if it’s an opHALT instruction. You may wonder just what is wrong with that? Well imagine what would happen if our instruction set were to grow. We could add arithmetic instructions ADD, SUB, DIV, MUL, we could add stack instructions PUSH, POP, and so on. We can quite reasonably expect to end up with around a hundred instructions before we have even a simple instruction set by most virtual machine standards.
So then, if we add many instructions to our instruction set, our virtual machine is going to have to test for each of them, in sequence, until it finds the instruction it cares about. Instructions towards the end of our instruction set could be a hundred, possibly even a thousand tests deep. If our virtual machine is going to perform all of these tests, per instruction cycle, it’s going to be wasting a whole lot of CPU time!
What we need then is some way to simply jump directly to the place in our code which handles the instruction that we have, and thankfully, we can do that, at the expense of making the code just slightly more complex.
Instruction Decoding.
There are a wide variety of mechanisms by which CPU instructions are ‘encoded’ into memory. Our mechanism is the most simple mechanism there is, each instruction has a unique number, and because the range of those numbers is small (zero for the NOP instruction, and one for the HALT instruction), and because those numbers are sequential, we could use some simple pointer arithmetic to jump directly to the code which executes them.
In order to save myself some typing here, I’m going to paste in the finished code from the video, and we’ll talk about how the code works.
program VirtualMachine1;
const
opNop = 00;
opHalt = 01;
opAlert = 02;
type
TVMInstruction = uint16;
ptrVMInstruction = ^TVMInstruction;
TVMInstructionHandler = procedure;
TInstructionSet = array[0..2] of TVMInstructionHandler;
const
cTestProgram: array[0..2] of TVMInstruction = (
opNop,
opAlert,
opHalt
);
var //- State data
Running: boolean;
ProgramCounter: ptrVMInstruction;
procedure NopHandler;
begin
Writeln('Nop');
end;
procedure HaltHandler;
begin
Writeln('Setting Running to false (HALT)');
Running := False;
end;
procedure AlertHandler;
begin
Writeln('Alert!!!!!');
end;
var //- Machinery of VM
CurrentInstruction: TVMInstructionHandler;
InstructionSet: TInstructionSet;
begin
// Initialization
InstructionSet[opNop] := @NopHandler;
InstructionSet[opHalt] := @HaltHandler;
InstructionSet[opAlert] := @AlertHandler;
Running := True;
ProgramCounter := @cTestProgram[0];
// Clock Cycle
Writeln('Starting virtual program');
repeat
// Fetch
CurrentInstruction := InstructionSet[ProgramCounter^];
// Execute
CurrentInstruction();
// Update program counter
{$hints off} ProgramCounter := pointer(nativeuint(ProgramCounter)+sizeof(TVMInstruction)); {$hints on}
until not Running;
Writeln('Ending virtual program');
Readln;
end.
The above code is for a virtual machine which understands one more instruction than we’ve seen so far, the new opALERT instruction simply writes the text ‘ALERT’ to the console. This code also has a few additional ‘Writeln’ statements to let us know what is going on as the program runs. Otherwise however, it does precisely the same as the code we’ve written thus far, but in a slightly different way.
There are three new procedures named ‘NopHandler’, ‘HaltHandler’ and ‘AlertHandler’. The first two procedures do precisely the same as our previous NOP and HALT case switches, and the third, as I’ve said, prints ‘ALERT’ to the screen. Each of these handlers is loaded into an array named ‘InstructionSet’ during the initialization of our virtual machine code, and importantly, they are loaded into the array with indicies that have the same value as the ‘opCode’ constants that we declared earlier.
This allows us to now, to call any of these handlers by it’s index. For instance, opHalt is declared as having the value one, and we load it’s handler into the instruction set array with this line…
InstructionSet[opHalt] := @HaltHandler;
So then, any time we want our virtual machine to perform a halt instruction, we just call the method pointer in “InstructionSet[1]” and our program jumps right into that handler. So the instruction code pointed to by our program counter, is also the index into the instruction set for its handler.
// Fetch
CurrentInstruction := InstructionSet[ProgramCounter^];
// Execute
CurrentInstruction();
You’ll note that we’ve reintroduced our ‘Fetch’ stage to the instruction cycle for the sake of code clarity, though we need not have even used the ‘CurrentInstruction’ callback variable. We could execute instructions on the array directly if we so wished, and save ourselves yet another few instruction cycles on the host.
Now I know, this is something of a trade off. Procedure calls to instruction handlers are more expensive for our host to perform than if they were inline instructions, however, that is a small price to pay to avoid all the branching we were performing before. Our code now performs a ‘look up’ on the instruction set handlers, and jumps to the handler, which will take precisely the same amount of instruction cycles on the host machine if we have a hundred op codes, or a thousand.
And so there you have it, a fully working, though for the time being, quite pointless virtual machine. Fear not, this virtual machine is going to become more capable over the next few parts of the series. Stick around, and if you like the series, please consider sharing links to it on your favorite social media platform!