12 thoughts on “Lockless Multi-Threading in Delphi”

  1. Thank you for sharing a simple approach to safely passing data while using multi-threading. I have attempted this in the past and was not satisfied with the result. You implied that one goal is to avoid using critical sections. Can you help me understand why a Delphi programmer should avoid them? I understand what they are for – they protect some shared memory so that no other thread can collide with writing data to the same block of memory at the same time. But I would like to know why you feel that they should be avoided.

    • In most cases, critical sections should not be avoided. Lock-less threading is the most dangerous tool in your workshop, and should only really be attempted in cases where performance is critical. If mutexes (such as TCriticalSection) can be used to protect your data between threads, it is a far safer option.

      Mutexes have disadvantages, however. The more threads there are trying to enter the critical section, the more contended the mutex becomes, and the more CPU cycles are spent locking and unlocking it. In cases where you need near real-time performance, mutexes can therefore become a bottleneck.
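      For readers who haven't used one, a mutex-protected shared counter is the baseline that lock-free code tries to beat. A minimal sketch in Python for illustration (the article's code is Delphi; `threading.Lock` plays the role of TCriticalSection here, and all names are hypothetical):

```python
import threading

counter = 0
lock = threading.Lock()  # stands in for Delphi's TCriticalSection

def worker(iterations):
    global counter
    for _ in range(iterations):
        with lock:       # acquire/release around every shared write
            counter += 1

threads = [threading.Thread(target=worker, args=(10000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# With the lock, no increments are lost: counter == 40000.
# The cost is that all four threads serialise on the same lock,
# which is exactly the contention described above.
```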

      I wrote this code because I’m working on a video game engine. In a game running at around 60 frames per second, your CPU has a window of roughly 16.7ms to do all of its work: gathering input, updating the simulation, rendering a frame of animation, and handling audio output. To keep as many of those clock cycles as possible available, locks simply aren’t acceptable in most cases.

      I successfully integrated this lock-free technique into my engine, but later added some condition variables to prevent burning CPU cycles while threads are idle.
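      The idea of parking an idle thread on a condition variable, rather than letting it spin, can be sketched like this (Python for illustration; the push/pull_blocking names are hypothetical, not from the video):

```python
import threading
from collections import deque

messages = deque()
cond = threading.Condition()

def push(msg):
    with cond:
        messages.append(msg)
        cond.notify()            # wake the sleeping consumer

def pull_blocking():
    with cond:
        while not messages:      # sleep instead of spinning
            cond.wait()
        return messages.popleft()

results = []
consumer = threading.Thread(target=lambda: results.append(pull_blocking()))
consumer.start()
push("frame-ready")
consumer.join()
# results == ["frame-ready"]
```

      While waiting, the consumer consumes essentially no CPU, which is the point of adding condition variables on top of the lock-free pipe.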

    • Indeed.
      The OmniThread library is far more advanced than what I’ve done here.

      In fact, this video was more about learning and education than providing an actual library.
      I wanted to understand how to do this, and to share the knowledge.

  2. In your video, you stated that an aligned read or write is atomic. Is this still the case with multi-core and hyper-threaded processors? The 80486 was a single-core, single-processor part, so all read and write operations were always serialised. With modern processors, I would guess reads and writes can happen at the same time.

    • Yes, it is. The manual I was referencing, though it says 486 or higher, was for a recent x86_64 architecture. I am not certain why Intel describes these operations as atomic from the 486 up, since you are correct that the 486 had no SMT features; Hyper-Threading did not appear until much later, with the Pentium 4 and its Xeon derivatives. Perhaps the guarantee was originally aimed at multi-processor boards, which did exist in the 486 era. Either way, the answer is yes: from what I see in the manuals for recent multicore and SMT-enabled processors, these operations remain atomic.

        • Even though CPUs back then weren’t SMP (symmetric multi-processing) capable, there were (and still are) other devices connected to the CPU bus, most notably DMA (Direct Memory Access) controllers. These controllers allowed the CPU to set up a block transfer to/from certain I/O devices, and that operation then proceeds without further intervention from the CPU. While it’s not good practice for the CPU to be fiddling with the block of memory involved in a DMA operation, ensuring atomicity will at least keep individual memory operations consistent.

          While the CPUs themselves didn’t directly support SMP, there were systems that used external circuitry to let these CPUs operate that way. They weren’t very efficient, but they did exist. I remember the hubbub when Intel introduced their new CPUs with “glue-less SMP”, meaning that SMP support was built into the CPU itself and didn’t require large, complicated external circuitry.

  3. Another good question on Twitter, from @saeedkgr:

    “If push will be used by only one thread, there’s no need to use a local variable, because a single thread can’t enter one procedure twice simultaneously. Right?”

    You still need to ensure that fPushIndex is updated in one atomic instant and never holds an intermediate value (e.g. one past the end of the array, before the wrap-around), because Pull could read fPushIndex at exactly that moment and get the wrong value. So we prepare the new index in a local variable, and then update fPushIndex with one single atomic write.
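    A sketch of that pattern, assuming a ring-buffer pipe like the one in the video (Python for illustration; the fPushIndex/fPullIndex names mirror the Delphi fields, everything else is hypothetical). The shared index is computed into a local first and published with exactly one write:

```python
SIZE = 8
buffer = [None] * SIZE
fPushIndex = 0   # written only by the pushing thread
fPullIndex = 0   # written only by the pulling thread

def push(value):
    global fPushIndex
    new_index = (fPushIndex + 1) % SIZE    # prepared in a local first
    if new_index == fPullIndex:
        return False                       # pipe is full
    buffer[fPushIndex] = value
    fPushIndex = new_index                 # one single publishing write
    return True

def pull():
    global fPullIndex
    if fPullIndex == fPushIndex:
        return None                        # pipe is empty
    value = buffer[fPullIndex]
    fPullIndex = (fPullIndex + 1) % SIZE   # one single publishing write
    return value
```

    At no point does the reader of fPushIndex ever observe a value outside the valid range, because the only write to it stores the final, already-wrapped index.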

  4. On Twitter, @darianmiller asks a great question.

    “In TMessagePipe.Pull: NewIndex := succ(fPullIndex). There’s some time between that line and fPullIndex := NewIndex. What stops another thread from executing NewIndex := succ(fPullIndex) and overwriting fPullIndex later? Two threads will eventually pull the same index.”

    Each pipe is intended to sit between exactly two threads: one sending and one receiving (a single-producer, single-consumer design). If you have two threads which need to message a third, the target thread should have two incoming pipes, one per sender.
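    The “one pipe per sender” rule can be sketched as follows (Python for illustration; the Pipe class is a hypothetical stand-in for the video's TMessagePipe). Each producer owns its own pipe's push index and the consumer owns every pull index, so no index is ever written by two threads:

```python
import threading

class Pipe:
    """Single-producer, single-consumer ring buffer."""
    def __init__(self, size=16):
        self.buf = [None] * size
        self.size = size
        self.push_index = 0   # written only by the sending thread
        self.pull_index = 0   # written only by the receiving thread

    def push(self, v):
        nxt = (self.push_index + 1) % self.size
        if nxt == self.pull_index:
            return False      # full
        self.buf[self.push_index] = v
        self.push_index = nxt
        return True

    def pull(self):
        if self.pull_index == self.push_index:
            return None       # empty
        v = self.buf[self.pull_index]
        self.pull_index = (self.pull_index + 1) % self.size
        return v

pipes = [Pipe(), Pipe()]      # one dedicated pipe per sending thread
received = []

def producer(pipe, tag):
    for i in range(5):
        while not pipe.push((tag, i)):
            pass              # retry if full

threads = [threading.Thread(target=producer, args=(p, t))
           for p, t in zip(pipes, ("A", "B"))]
for t in threads:
    t.start()
for t in threads:
    t.join()

for p in pipes:               # the single receiving thread drains each pipe
    while (msg := p.pull()) is not None:
        received.append(msg)
```

    Because each producer writes only its own pipe's push_index, the race @darianmiller describes cannot occur; it would only appear if two threads shared one pipe.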
