The Data Race Hiding Behind Correct Atomics

go concurrency performance atomics

I thought the metrics collector was clean. The previous post explained why atomic.Pointer was the right choice for the snapshot cache -- lock-free reads, atomic increments, no mutexes on the hot path. I was feeling good about it. Then I ran go test -race while working on something else entirely, and the terminal lit up with data race warnings. Not one. Five.

The Circular Buffer: Time-Windowed Request Counts

I added the metrics page because I wanted to see what was happening on the site in real time. Most of the numbers were straightforward -- uptime, memory, goroutine count -- but I felt like something was missing: a short history of requests per minute, a nice little indicator of ongoing activity. The design is simple: a five-element array of per-minute buckets. Every request increments the current bucket, and once a minute the window rotates forward to the next bucket, giving a small rolling window.

Three Goroutines, One Plain int

Simple enough, right? But there are a few pieces at play. First, we need to actually record the requests. The middleware is registered globally, so every request increments the current bucket as it comes in.

func (c *Collector) IncrementRequests() {
	atomic.AddUint64(&c.requestCount, 1)
	atomic.AddUint64(&c.recentRequestCounter[c.requestIndex], 1)
}

Then we need to pull the data together and shove it onto the cached snapshot. This happens once a second and is visible on the /metrics tab.

func (c *Collector) RecentRequestCount() uint64 {
	var n uint64
	for i := 0; i < len(c.recentRequestCounter); i++ {
		n += c.recentRequestCounter[i]
	}
	return n
}

func (c *Collector) OrderedRecentRequests() []uint64 {
	size := len(c.recentRequestCounter)
	data := make([]uint64, 0, size)
	for i := range size {
		data = append(data, c.recentRequestCounter[(c.requestIndex + i + 1) % size])
	}
	return data
}

And finally, the piece that rotates the window forward every minute and zeros out the new bucket.

func (c *Collector) pruneOldestRequestCount() {
	ticker := time.NewTicker(1 * time.Minute)
	for range ticker.C {
		c.requestIndex = (c.requestIndex + 1) % len(c.recentRequestCounter)
		c.recentRequestCounter[c.requestIndex] = 0
	}
}

At this point, we've introduced the race condition. It might not be immediately apparent, but it's there.

Why It "Works" Despite the Race

This isn't a mission-critical metric on a high-traffic production system. It's a personal blog. When pruneOldestRequestCount bumps the index, nothing protects it -- any number of request handlers might be reading or writing through it at that exact moment. The worst case? A request or two lands in the wrong bucket, maybe gets blown away by the zero-out. Nobody's losing money over it. But it goes against my standards, and it's exactly the kind of thing that bites you when you stop paying attention.

go test -race: Catching What Code Review Missed

Here's where Go's tooling saved me. The race detector watches memory accesses at runtime and flags anything that isn't properly synchronized. Running it is dead simple:

go test -race ./internal/metrics/

The catch is that it only flags races that are actually exercised during the test. You need a test that hammers all the concurrent code paths at the same time. Here's the one that caught it:

func TestConcurrentAllPaths(t *testing.T) {
	c := newTestCollector()
	var wg sync.WaitGroup

	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			c.ConnOpen()
			defer c.ConnClose()
			c.TouchSession(fmt.Sprintf("sess-%d", i%10))
			c.IncrementRequests()
			c.RecordDuration(time.Duration(i+1) * time.Millisecond)
		}(i)
	}

	for i := 0; i < 50; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = c.snapShot()
			_ = c.RecentRequestCount()
			_ = c.OrderedRecentRequests()
			c.ComputeResponseStats()
		}()
	}

	wg.Wait()
}

100 goroutines simulating request handlers, 50 simulating snapshot readers. The race detector immediately flagged requestIndex -- a bare int being read and written from different goroutines with zero synchronization.

The Fix: More Than Just the Write Side

My first instinct was to slap atomic.StoreInt64 on the write in the prune method and call it done:

// First attempt -- I thought this was enough
func (c *Collector) pruneOldestRequestCount() {
	ticker := time.NewTicker(1 * time.Minute)
	for range ticker.C {
		next := (c.requestIndex + 1) % int64(len(c.recentRequestCounter))
		atomic.StoreInt64(&c.requestIndex, next)
		c.recentRequestCounter[c.requestIndex] = 0
	}
}

It wasn't enough. This still has two problems -- the read of c.requestIndex in the modulo calculation is bare (no atomic load), and the zero-out re-reads c.requestIndex instead of just using next. But the bigger issue? Every reader of requestIndex was also doing bare reads. I'd made the write safe and left the reads wide open. An atomic store doesn't help if the other side doesn't do an atomic load. Both sides have to participate.

So how do you stop yourself from making half the fix? You change the type. I switched from a bare int64 with free-function atomics to atomic.Uint64, which makes bare access impossible -- every read goes through .Load(), every write through .Store(). You literally can't forget:

// Before: nothing stops you from bare access
requestIndex  int64
_ = c.requestIndex                        // compiles fine, still a race

// After: the compiler won't let you
requestIndex  atomic.Uint64
_ = c.requestIndex.Load()                 // only way to get the value

The prune method becomes:

func (c *Collector) pruneOldestRequestCount() {
	ticker := time.NewTicker(1 * time.Minute)
	for range ticker.C {
		next := (c.requestIndex.Load() + 1) % uint64(len(c.recentRequestCounter))
		// The bucket itself is read concurrently with atomic loads,
		// so the zero-out has to be an atomic store too.
		atomic.StoreUint64(&c.recentRequestCounter[next], 0)
		c.requestIndex.Store(next)
	}
}

And every reader now goes through .Load():

func (c *Collector) OrderedRecentRequests() []uint64 {
	size := uint64(len(c.recentRequestCounter))
	data := make([]uint64, 0, size)
	curIndex := c.requestIndex.Load()
	for i := range size {
		ix := (curIndex + i + 1) % size
		data = append(data, atomic.LoadUint64(&c.recentRequestCounter[ix]))
	}
	return data
}

Down the Rabbit Hole

I ran the test again. More races. requestIndex was just the one I could spot by reading the code -- the detector found the same pattern hiding in three more places.

snapShot() was reading c.requestCount as a bare field, even though IncrementRequests writes it with atomic.AddUint64. Same deal with the recentRequestCounter slice elements -- written atomically in IncrementRequests, but RecentRequestCount() and OrderedRecentRequests() were just reading them bare. Every one of those needed an atomic.LoadUint64.

The sneakiest one was the durations ring buffer. RecordDuration writes time.Duration values into a fixed-size array on every request, and ComputeResponseStats iterates over that same array to compute averages. Two goroutines, same memory, no synchronization. Since time.Duration is just int64 under the hood, I changed the array to [4096]atomic.Int64 and convert on the way in and out:

// Write
func (c *Collector) RecordDuration(d time.Duration) {
	i := c.durationIndex.Add(1)
	c.durations[i%uint64(len(c.durations))].Store(int64(d))
}

// Read
for i := range c.durations {
	d := time.Duration(c.durations[i].Load())
	if d > 0 {
		buf = append(buf, d)
		sum += d
		cnt++
	}
}

After all of that, go test -race passes clean.

What I Took Away

Every single one of these races had the same shape: an atomic write paired with a bare read. It's easy to think "I made the write atomic, so it's safe" -- but that's only half the handshake. If the reader doesn't participate, it's still a race.
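The half-handshake pattern is small enough to reproduce in a dozen lines. This is a hypothetical standalone program, not code from the collector: run it with go run -race and the detector flags the bare read, while swapping in the atomic load silences it:

```go
package main

import (
	"sync"
	"sync/atomic"
)

// counter boils the post's races down: Add writes atomically, but
// Bare reads without participating in the handshake.
type counter struct{ n uint64 }

func (c *counter) Add() uint64  { return atomic.AddUint64(&c.n, 1) }
func (c *counter) Bare() uint64 { return c.n } // races with a concurrent Add
func (c *counter) Safe() uint64 { return atomic.LoadUint64(&c.n) }

func main() {
	var c counter
	var wg sync.WaitGroup
	wg.Add(1)
	go func() { defer wg.Done(); c.Add() }()
	_ = c.Bare() // `go run -race` flags this line
	wg.Wait()
	_ = c.Safe() // the other half of the handshake: no warning
}
```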

The atomic.Uint64 and atomic.Int64 types exist specifically to make this mistake impossible. When the type itself is atomic, there's no way to do a bare read. The compiler catches what I missed.

I found one race by reading the code. The detector found four more. Write the test first, run it with -race, and let the tooling tell you what you're wrong about. That's its own lesson.

Now the metrics are accurate -- but they disappear every time the server restarts. Persisting them is the next problem to solve, and that's its own post.