How I Debugged a Bad Race ConditionPosted: 2010/07/23
The last week and a half I’ve spent searching for race conditions in my database code. A race condition is a bug where two (or more) processes interact in an unforseen way causing bad things to happen.
The symptoms of this bug where:
- Could occur at any time, usually after 2+ hours of testing.
- Appeared to change data: My tests read and write a test string of 1 character repeated many times, sometimes the string would have garbage in the middle like:
- If I reread the data after the error the problem was gone.
- Sometimes the system hardlocked. All potential debugging information would be lost.
- Enabling debugging made the problem so rare I couldn’t produce it in a reasonable time.
I assumed the read error was from forgetting to lock something, I also assumed the hardlocks were an unrelated problem. I began by isolating the simplest test that still exhibited either problem. This consumed days of time, but I learnt hardlocks happened in any tests, read errors only in the two integration tests.
The integration tests ran 32 threads and performed thousands of random selected operations: 0) Make and store data, 1) Replace data with new data, 2) Delete data (which by design moved other data around). After testing I found that both replace and delete had problems, delete producing one earlier, but I couldn’t tell if the two operations were experiencing the same problem.
The First Bug was a Deadlock
By day four I was googling race conditions and deadlocks. On a whim, I googled “SBCL deadlocks” and found: Programming with signal handling in mind:
I had the less trivial scenario in many tests. I often use sb-ext:string-to-octets, sb-sys:with-pinned-objects, and sb-sys:vector-sap to convert a string to a utf8 byte array pointer that I can copy into the database. However, the definition of sb-sys:with-pinned-objects is just:
(def!macro with-pinned-objects ((&rest objects) &body body) (declare (ignore objects)) `(without-gcing ,@body))
I modified the code to pin objects at the last possible moment after all other locks are acquired, and the deadlocks have gone away. The phantom read issue still occurred.
I Finally Stumble Across the Race Condition
I reviewed my collected error messages and noticed each phantom read occurred at exactly byte 564. I knew this was significant but my code didn’t contain 564, not as a magic-number nor a constant. It was not until next morning that I found a variable length on-disk structure that serialized as 1 or more blocks of 142 unsigned 32 bit integers; that would be 568 byte blocks, which was close enough to merit inspection.
My serialization is pretty simple: the first four bytes store the length (number of bytes) of the data, this data is written across a number of *data-size* binary blocks, and *origin-data* points to the start of the current block. The current block will always be in the page buffer. The routine ‘next’ moves *origin-data* to the next block. Next will return a new buffer, and release the old one, if the next block is on another page.
To read, I see the length in the first 4 bytes and make a big byte array. Then I use reposition 4 to set *origin-data* past the saved length. I read
(min length (- *data-length* 4)) bytes and that handles the first block. Subsequent blocks are more data, but there is a trick: After calling next I may be at the first block of the next page or *data-size* bytes ahead. In the latter case I’m 4 bytes ahead and must reposition -4 to align. I decide which case I’m in by comparing nbuffer to buffer, if they are equal I’m still on the same page.
(labels ((read-len () (let ((length (sap-ref-32 *origin-data* 0))) (make-array length :element-type '(unsigned-byte 8)))) (do-read (buffer dest offset len) (if (> len 0) (progn (with-pinned-objects (dest) (system-area-ub8-copy *origin-data* 0 (vector-sap dest) offset (min len *data-size*))) (do-read (next buffer) dest (+ offset *data-size*) (- len *data-size*))) buffer)) (reposition (n) (setf *origin-data* (sap+ *origin-data* n)))) (defun readin (buffer) (let ((bytes (read-len))) (values bytes (if (zerop (length bytes)) (next buffer) (progn (reposition 4) (let ((nbuffer (do-read buffer 0 (min (length bytes) (- *data-size* 4))))) (when (eq nbuffer buffer) (reposition -4)) (let ((already (- *data-size* n))) (do-read nbuffer already (- (length bytes) already))))))))))
Where is the problem? Not here, it’s actually in the definition of ‘start-of-next’ used by ‘next’. Can you spot it?
(defun start-of-next (buffer) (let ((next-sector (link buffer))) (if (zerop next-sector) (progn (release buffer) nil) (progn (release buffer) (let ((nbuffer (acquire next-sector))) (setf *origin-data* (first-block nbuffer)) nbuffer))))) (defun next (buffer) (if (past-end buffer (origin-data)) (start-of-next buffer) (incf *origin-data* *data-size*)))
The problem is releasing the old buffer before acquiring the next one. There is a slim chance the newly acquired next buffer will reuse the space of the just released old buffer. Thus in this case buffer will equal nbuffer, even though we are at the start of the next block. This caused an erroneously reposition -4 before the second block read. That took me over a week to find.