Reverse-Engineering Legacy Search with Integration Tests
At a previous employer, we had a complex query mechanism using OpenSearch/Elasticsearch for finding certain documents. It contained dense location logic, semantic search, and years of accumulated business rules. Nobody could point me to the original requirements. The merge requests didn't have enough detail to deduce what the actual behavior was supposed to be. The code was the only source of truth, and the code was hard to read.
We had unit tests, but they could only verify so much. In C#, the OpenSearch client's query objects are effectively opaque, and testing their serialized JSON output is flaky and error-prone. Our tests confirmed that data objects were correct and that the right functions were being called, but I had no way to be sure the logic inside those functions worked as intended. The translated queries could change out from under me and my tests would still pass -- all I could verify was that the translation function was being called, not that the translation was correct.
I needed to test the real thing.
Probing the black box
I pulled down an OpenSearch Docker image and spun up an index that mimicked the production shape. Since it was low effort, I also wrote a quick script to pull the production mapping file locally. Then I started probing.
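The local container can be wired up with a minimal compose file. This is a sketch, not the original setup -- the service name matches the docker-compose command referenced later, but the image tag and settings are assumptions:

```yaml
# Hypothetical docker-compose fragment for a local test instance.
services:
  opensearch-test:
    image: opensearchproject/opensearch:2   # assumed version
    environment:
      - discovery.type=single-node          # no cluster needed for tests
      - plugins.security.disabled=true      # plain http on localhost
    ports:
      - "9200:9200"
```

With security disabled, the tests can hit http://localhost:9200 directly, no credentials required.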
The method was simple: guess what a rule does, write a test that proves or disproves the guess, and document the result. Seed some data, run the existing query builder against a real OpenSearch instance, and assert on what comes back. I can't share the exact code, but the pattern looks like this:
[Fact]
public async Task FilterById()
{
    // Arrange
    await SeedDocuments([
        CreateTestData(id: 1, parentId: 100),
        CreateTestData(id: 2, parentId: 200),
        CreateTestData(id: 3, parentId: 100)
    ]);
    var searchCriteria = new SearchCriteria { parentId = 100 };

    // Act
    var query = new SearchQueryBuilder()
        .SetCriteria(searchCriteria)
        .Build();
    var searchResponse = await Client.SearchAsync<Data>(s => s
        .Index(TestIndexName)
        .Query(_ => query)
    );

    // Assert
    Assert.True(searchResponse.IsValid);
    var foundIds = searchResponse
        .Hits
        .Select(h => h.Source.id)
        .OrderBy(id => id)
        .ToList();
    Assert.Equal(new[] { 1, 3 }, foundIds);
}
Each test became a piece of documentation. "Filtering by parentId returns only matching records." "Location queries include edge cases at boundary conditions." "Semantic search boosts exact title matches." The test suite grew into a specification that the codebase never had.
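A rule like the title boost can be pinned down the same way. The field and criteria names below are my own invention for illustration, not the real code:

```csharp
[Fact]
public async Task ExactTitleMatchRanksFirst()
{
    // Arrange -- hypothetical fields; the real schema differed.
    await SeedDocuments([
        CreateTestData(id: 1, title: "quarterly report"),
        CreateTestData(id: 2, title: "report on quarterly earnings")
    ]);

    // Act
    var query = new SearchQueryBuilder()
        .SetCriteria(new SearchCriteria { searchText = "quarterly report" })
        .Build();
    var searchResponse = await Client.SearchAsync<Data>(s => s
        .Index(TestIndexName)
        .Query(_ => query)
    );

    // Assert -- the exact match should be boosted above the partial match.
    Assert.True(searchResponse.IsValid);
    Assert.Equal(1, searchResponse.Hits.First().Source.id);
}
```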
Over a week, I dissected more and more logic, getting it down to individual components. I also uncovered several bugs along the way -- behavior that felt odd turned out to actually be wrong. Edge cases that returned unexpected results. Filters that silently dropped conditions. Filters that mapped to incorrect values. Missing location logic people were adamant always existed. Each bug became another test, pinning down the correct behavior so it couldn't regress.
This all culminated in a final integration test that used every part of the query structure to return a specific document from a large swath of inputs. 100 tests, all clear and documented, all running in under 3 seconds.
Getting it into CI
At this point I was still only testing locally. I needed these in the pipeline.
I split the test suite so that unit tests could still run without external dependencies. xUnit's Trait attribute makes this straightforward:
[Trait("Category", "Integration")]
public class IntegrationTests : OpenSearchIntegrationTestBase
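The base class handles the connection and index lifecycle. Here is a rough sketch of what it might look like -- the names and setup details are assumptions, except for OPENSEARCH_URL, which the Makefile below sets:

```csharp
// Hypothetical sketch using xUnit's IAsyncLifetime for per-class setup/teardown.
public abstract class OpenSearchIntegrationTestBase : IAsyncLifetime
{
    protected OpenSearchClient Client { get; private set; }
    protected string TestIndexName { get; } = $"test-{Guid.NewGuid():N}";

    public async Task InitializeAsync()
    {
        // Point the client at the local container.
        var url = Environment.GetEnvironmentVariable("OPENSEARCH_URL")
                  ?? "http://localhost:9200";
        Client = new OpenSearchClient(new ConnectionSettings(new Uri(url))
            .DefaultIndex(TestIndexName));

        // Create a throwaway index from the mapping pulled out of production.
        await Client.Indices.CreateAsync(TestIndexName);
    }

    public async Task DisposeAsync()
    {
        // Drop the index so runs stay isolated.
        await Client.Indices.DeleteAsync(TestIndexName);
    }
}
```

A fresh, randomly named index per test class keeps seeded documents from leaking between tests.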
The Makefile handles the rest:
test.unit: build
	dotnet test --filter "Category!=Integration"

#test.integration: @ Runs integration tests (requires OpenSearch via docker-compose)
test.integration: build
	@OPENSEARCH_URL=$${OPENSEARCH_URL:-http://localhost:9200}; \
	echo "Checking if OpenSearch is running at $$OPENSEARCH_URL..."; \
	curl -s "$$OPENSEARCH_URL/_cluster/health" > /dev/null || \
	(echo "ERROR: OpenSearch is not running at $$OPENSEARCH_URL." && \
	echo "Start it with: docker-compose up -d opensearch-test" && \
	exit 1)
	dotnet test --filter "Category=Integration"
CircleCI lets you run service images alongside your tests, so wiring up the integration job was trivial. I ran it in parallel with the existing unit tests -- 900 unit tests and 100 integration tests both complete in ~40s, most of which is setup ceremony in CircleCI. Developers saw no increase in deploy times.
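The job wiring looks roughly like this. The image tags and job name are assumptions; the key idea is that secondary containers listed under docker are reachable from the primary container on localhost:

```yaml
# Hypothetical CircleCI job -- first image runs the tests,
# the second runs as a sidecar on localhost:9200.
integration-tests:
  docker:
    - image: mcr.microsoft.com/dotnet/sdk:8.0   # assumed SDK version
    - image: opensearchproject/opensearch:2     # assumed version
      environment:
        discovery.type: single-node
        plugins.security.disabled: "true"
  steps:
    - checkout
    - run: make test.integration
```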
The payoff
Within a week, I noticed red feature branches: the integration tests were failing. I opened the PR and found that someone was refactoring our location logic under cost and resource pressure. While refactoring, they had unintentionally changed the behavior.
That week of background work saved weeks of silent failures and likely hours to days of debugging. Instead, the test broke quickly and the dev fixed their logic before it ever reached production.
The tests have tradeoffs -- they also break from intentional logic changes, and updating them takes time. They cost a little more to run since we host a container in CircleCI. But a handful of targeted integration tests are worth their weight in gold, especially on a codebase that everyone is afraid to touch.
When I change logic, a test breaks or I need to write a new test. On a well-maintained project, that's table stakes. On a legacy codebase with no documentation, integration tests are how you create the documentation.