Saturday, March 21, 2026

Assessing internal quality while coding with an agent

There’s no shortage of reports on how AI coding assistants, agents, and fleets of agents have written vast amounts of code in a short time, code that reportedly implements the features desired. It’s rare that people talk about non-functional requirements like performance or security in that context, maybe because that’s not a concern in many of the use cases the authors have. And it’s even rarer that people assess the quality of the code generated by the agent. I’d argue, though, that internal quality is crucial for development to continue at a sustainable pace over years, rather than collapse under its own weight.

So, let’s take a closer look at how the AI tooling performs when it comes to internal code quality. We’ll add a feature to an existing application with the help of an agent and look at what’s happening along the way. Of course, this makes it “just” an anecdote. This memo is by no means a study. At the same time, much of what we’ll see falls into patterns and can be extrapolated, at least in my experience.

The feature we’re implementing

We’ll be working with the codebase for CCMenu, a Mac application that shows the status of CI/CD builds in the Mac menu bar. This adds a degree of difficulty to the task because Mac applications are written in Swift, which is a common language, but not quite as common as JavaScript or Python. It’s also a modern programming language with a complex syntax and type system that requires more precision than, again, JavaScript or Python.


CCMenu periodically retrieves the status from the build servers with calls to their APIs. It currently supports servers using a legacy protocol implemented by the likes of Jenkins, and it supports GitHub Actions workflows. The most requested server that’s not currently supported is GitLab. So, that’s our feature: we’ll implement support for GitLab in CCMenu.

The API wrapper

GitHub provides the GitHub Actions API, which is stable and well documented. GitLab has the GitLab API, which is also well documented. Given the nature of the problem space, they’re semantically quite similar. They’re not the same, though, and we’ll see how that affects the task later.

Internally, CCMenu has three GitHub-specific files to retrieve the build status from the API: a feed reader, a response parser, and a file that contains Swift functions that wrap the GitHub API, including functions like the following:

  func requestForAllPublicRepositories(user: String, token: String?) -> URLRequest
  func requestForAllPrivateRepositories(token: String) -> URLRequest
  func requestForWorkflows(owner: String, repository: String, token: String?) -> URLRequest

The functions return URLRequest objects, which are part of Apple’s Foundation framework and are used to make the actual network request. Because these functions are structurally quite similar, they delegate the construction of the URLRequest object to one shared, internal function:

  func makeRequest(method: String = "GET", baseUrl: URL, path: String,
        params: Dictionary<String, String> = [:], token: String? = nil) -> URLRequest

Don’t worry if you’re not familiar with Swift; as long as you recognise the arguments and their types you’re fine.
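To make the delegation concrete, here is a minimal sketch of what such a shared helper and two thin wrappers might look like. It is not CCMenu’s actual code: the GitHubAPI namespace, the body of makeRequest, the per_page parameter, and the Bearer-style Authorization header are my assumptions; the two paths are GitHub REST endpoints for listing a user’s repositories and a repository’s workflows.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // URLRequest lives here on non-Apple platforms
#endif

// Hypothetical sketch, not CCMenu's actual implementation.
enum GitHubAPI {
    static let baseUrl = URL(string: "https://api.github.com")!

    // Each wrapper only knows its endpoint's path and parameters.
    static func requestForAllPublicRepositories(user: String, token: String?) -> URLRequest {
        makeRequest(baseUrl: baseUrl, path: "/users/\(user)/repos",
                    params: ["per_page": "100"], token: token)
    }

    static func requestForWorkflows(owner: String, repository: String, token: String?) -> URLRequest {
        makeRequest(baseUrl: baseUrl, path: "/repos/\(owner)/\(repository)/actions/workflows",
                    token: token)
    }

    // Shared helper: builds the URL, sets the method, adds auth only when a token exists.
    static func makeRequest(method: String = "GET", baseUrl: URL, path: String,
                            params: Dictionary<String, String> = [:], token: String? = nil) -> URLRequest {
        var components = URLComponents(url: baseUrl, resolvingAgainstBaseURL: false)!
        components.path += path
        if !params.isEmpty {
            // Sorted so the resulting URL is deterministic.
            components.queryItems = params.sorted { $0.key < $1.key }
                .map { URLQueryItem(name: $0.key, value: $0.value) }
        }
        var request = URLRequest(url: components.url!)
        request.httpMethod = method
        if let token = token {
            request.setValue("Bearer \(token)", forHTTPHeaderField: "Authorization")
        }
        return request
    }
}
```

The payoff of this design is that details such as authentication and query encoding live in exactly one place.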

Optional tokens

Next, we should look at the token argument in a little more detail. Requests to the APIs can be authenticated. They don’t have to be authenticated, but they can be. This allows applications like CCMenu to access information that’s restricted to certain users. For most APIs, GitHub and GitLab included, the token is simply a long string that needs to be passed in an HTTP header.

In its implementation CCMenu uses an optional string for the token, which in Swift is denoted by a question mark following the type, String? in this case. This is idiomatic use, and Swift forces recipients of such optional values to deal with the optionality in a safe way, avoiding the classic null pointer problems. There are also special language features to make this easier.

Some functions are nonsensical in an unauthenticated context, like requestForAllPrivateRepositories. These declare the token as non-optional, signalling to the caller that a token must be provided.
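A minimal sketch of how the signatures carry that intent. The function bodies, the /user/repos?visibility=private path, the Bearer header, and the privateRepositoriesRequest caller are hypothetical. The compiler forces a caller holding an optional token to unwrap it, here with guard let, before it can ask for private repositories, while the public variant accepts the optional as is.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking
#endif

// Hypothetical stand-ins for the wrapper functions.
enum GitHubAPI {
    static let baseUrl = "https://api.github.com"

    // Optional token: works with or without authentication.
    static func requestForAllPublicRepositories(user: String, token: String?) -> URLRequest {
        request(path: "/users/\(user)/repos", token: token)
    }

    // Non-optional token: the signature itself says "you must be signed in".
    static func requestForAllPrivateRepositories(token: String) -> URLRequest {
        request(path: "/user/repos?visibility=private", token: token)
    }

    private static func request(path: String, token: String?) -> URLRequest {
        var req = URLRequest(url: URL(string: baseUrl + path)!)
        // `if let` unwraps safely; the header is only added when a token exists.
        if let token = token {
            req.setValue("Bearer \(token)", forHTTPHeaderField: "Authorization")
        }
        return req
    }
}

// A caller with an optional token must unwrap it before asking for private data.
func privateRepositoriesRequest(storedToken: String?) -> URLRequest? {
    guard let token = storedToken else { return nil }  // no token, no request
    return GitHubAPI.requestForAllPrivateRepositories(token: token)
}
```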

Let’s go

I’ve tried this experiment a couple of times: during the summer using Windsurf and Sonnet 3.5, and now, recently, with Claude Code and Sonnet 4.5. The approach remained similar: break down the task into smaller chunks. For each of the chunks I asked Windsurf to come up with a plan first before asking for an implementation. With Claude Code I went straight for the implementation, relying on its internal planning, and on Git when something ended up going in the wrong direction.

As a first step I asked the agent, more or less verbatim: “Based on the GitHub files for API, feed reader, and response parser, implement the same functionality for GitLab. Only write the equivalent for these three files. Do not make changes to the UI.”

This sounded like a reasonable request, and by and large it was. Even Windsurf, with the less capable model, picked up on key differences and handled them. For example, it recognised that what GitHub calls a repository is a project in GitLab, and it saw the difference in the JSON response, where GitLab returns the array of runs at the top level whereas GitHub has this array as a property in a top-level object.
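To illustrate the difference in shape, here is a minimal decoding sketch. The structs are hypothetical and keep only two fields. The workflow_runs key matches GitHub’s list-workflow-runs response, while GitLab’s pipeline list is a bare JSON array. The sample payloads are made up.

```swift
import Foundation

// Only the fields this sketch needs; real responses contain many more.
struct GitHubRun: Decodable {
    let id: Int
    let status: String
}

// GitHub wraps the runs in a top-level object...
struct GitHubRunsResponse: Decodable {
    let workflowRuns: [GitHubRun]
    enum CodingKeys: String, CodingKey { case workflowRuns = "workflow_runs" }
}

struct GitLabPipeline: Decodable {
    let id: Int
    let status: String
}

let gitHubJSON = #"{"total_count": 1, "workflow_runs": [{"id": 7, "status": "completed"}]}"#
let gitLabJSON = #"[{"id": 42, "status": "success"}]"#

let decoder = JSONDecoder()
// Unknown keys such as total_count are simply ignored by Decodable.
let gitHubRuns = try! decoder.decode(GitHubRunsResponse.self, from: Data(gitHubJSON.utf8)).workflowRuns
// ...while GitLab returns the array itself, so we decode [GitLabPipeline] directly.
let gitLabPipelines = try! decoder.decode([GitLabPipeline].self, from: Data(gitLabJSON.utf8))
```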

I hadn’t looked at the GitLab API docs myself at this stage, and from a cursory scan of the generated code everything looked pretty okay: the code compiled, and even the complex function types were generated correctly. Or were they?

First surprise

In the next step, I asked the agent to implement the UI to add new pipelines/workflows. I deliberately asked it not to worry about authentication yet, and to only implement the flow for publicly accessible information. The discussion of that step is maybe for another memo, but the new code somehow needs to acknowledge that a token might be present in the future:

  var apiToken: String? = nil

and then it can use the variable in the call to the wrapper function:

  let req = GitLabAPI.requestForGroupProjects(group: name, token: apiToken)
  var projects = await fetchProjects(request: req)

The apiToken variable is correctly declared as an optional String, initialised to nil for now. Later, some code could retrieve the token from another place, depending on whether the user has decided to sign in. This code led to the first compiler error:

Screenshot of Xcode panel showing a compiler error in GitLabProjectList.swift. The error reads "Value of optional type 'String?' must be unwrapped to a value of type 'String'".

What’s going on here? Well, it turns out that the code for the API wrapper in the first step had a subtle problem: it declared the tokens as non-optional in all the wrapper functions, e.g.

  func requestForGroupProjects(group: String, token: String) -> URLRequest

The underlying makeRequest function, for one reason or another, was created correctly, with the token declared as optional.

The code compiled because, the way the functions were written, the wrapper functions definitely have a string, and that can of course be passed to a function that takes an optional string, an argument that may be a string or nothing (nil). But now, in the code above, we have an optional string, and that can’t be passed to a function that needs a (definite) string.
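The asymmetry is easy to reproduce in isolation. Both functions in this sketch are hypothetical stand-ins: passing a String where a String? is expected compiles, because Swift wraps the value implicitly, while the reverse is rejected. The failing call is left commented out so the snippet still compiles.

```swift
// Hypothetical stand-in for makeRequest: accepts "a string or nothing".
func makeRequestSketch(token: String?) -> String {
    if let token = token { return "authenticated with \(token)" }
    return "anonymous"
}

// Hypothetical stand-in for the generated wrapper: demands a definite string.
func wrapperSketch(token: String) -> String {
    // Fine: a String is implicitly wrapped into a String?.
    makeRequestSketch(token: token)
}

let apiToken: String? = nil
// wrapperSketch(token: apiToken)
// ^ error: value of optional type 'String?' must be unwrapped to a value of type 'String'
let direct = makeRequestSketch(token: apiToken)  // fine: String? to String?
```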

The vibe fix

Being lazy, I simply copy-pasted the error message back to Windsurf. (Building a Swift app in anything but Xcode is a whole different story, and I remember an experiment with Cline where it alternated between adding and removing explicit imports, at about 20¢ per iteration.) The fix proposed by the AI for this problem worked: it changed the call site and inserted an empty string as a default value for when no token was present, using Swift’s ?? operator.

  let req = GitLabAPI.requestForGroupProjects(group: name, token: apiToken ?? "")
  var projects = await fetchProjects(request: req)

This compiles, and it kinda works: if there’s no token, an empty string is substituted, which means that the argument passed to the function is either the token or the empty string. It’s always a string and never nil.

So, what’s wrong? The whole point of declaring the token as optional was to signal that the token is optional. The AI ignored this and introduced new semantics: an empty string now signals that no token is available. This is

  • not idiomatic,
  • not self-documenting,
  • unsupported by Swift’s type system.

It also required changes in every place where this function is called.
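One way the empty-string convention can bite, sketched under an assumption: suppose the shared helper decides whether to authenticate by checking for nil, as idiomatic Swift would. Then "" passes through as if it were a real token. GitLab accepts personal access tokens in a PRIVATE-TOKEN header; the two functions here are hypothetical stand-ins that only model the headers a request would carry.

```swift
// Hypothetical stand-in for makeRequest: "no token" is expressed as nil.
func authHeaders(token: String?) -> [String: String] {
    guard let token = token else { return [:] }  // nil: unauthenticated request
    return ["PRIVATE-TOKEN": token]
}

// The generated wrapper, with the token declared non-optional.
func groupProjectsHeaders(group: String, token: String) -> [String: String] {
    authHeaders(token: token)
}

let apiToken: String? = nil
// The "vibe fix" at the call site: "" stands in for "no token".
let vibeFixed = groupProjectsHeaders(group: "example", token: apiToken ?? "")
// "" is not nil, so the request claims to be authenticated, with an empty token.
print(vibeFixed)  // ["PRIVATE-TOKEN": ""]
```

Nothing in the type system flags this; only a reader who knows the new convention would spot it.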

The real fix

Of course, what the agent should’ve done is simply change the declaration of the wrapper function to make the token optional. With that change everything works as expected, the semantics remain intact, and the change is limited to adding a single ? to the function argument’s type, rather than spraying ?? "" all over the code.
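A sketch of the one-character fix, with hypothetical bodies for the wrapper and its helper. The /groups/:id/projects path and the PRIVATE-TOKEN header follow the GitLab API documentation, but CCMenu’s actual code may differ. The call site passes apiToken unchanged, and a nil token simply means no authentication header.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking
#endif

enum GitLabAPI {
    static let baseUrl = "https://gitlab.com/api/v4"

    // The fix: `String?` instead of `String`, matching the helper's signature.
    static func requestForGroupProjects(group: String, token: String?) -> URLRequest {
        makeRequest(path: "/groups/\(group)/projects", token: token)
    }

    // Hypothetical helper: nil keeps its meaning of "not authenticated".
    private static func makeRequest(path: String, token: String?) -> URLRequest {
        var request = URLRequest(url: URL(string: baseUrl + path)!)
        if let token = token {
            request.setValue(token, forHTTPHeaderField: "PRIVATE-TOKEN")
        }
        return request
    }
}

var apiToken: String? = nil
// Call site unchanged: the optional flows straight through, no `?? ""` needed.
let anonymous = GitLabAPI.requestForGroupProjects(group: "example", token: apiToken)

apiToken = "example-token"
let authenticated = GitLabAPI.requestForGroupProjects(group: "example", token: apiToken)
```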

Does it really matter?

You might ask whether I’m splitting hairs here. I don’t think I am. I think this is a clear example where an AI agent left to its own devices would have changed the codebase for the worse, and it took an experienced developer to notice the issue and to direct the agent to the correct implementation.

Also, this is just one of many examples I encountered. At some point the agent wanted to introduce a completely unnecessary cache, and, of course, couldn’t explain why it had even suggested the cache.

It also failed to understand that the user/org overlap in GitHub doesn’t exist in GitLab, and went on to implement some complicated logic to handle a non-existent problem. It took more than nudging the agent towards the right places in the documentation to talk it out of insisting that the logic was needed.

It also “forgot” to use existing functions to construct URLs, replicating such logic in several places, often without implementing all the functionality, e.g. the option to override the base URL for testing purposes using the defaults system on macOS.
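For illustration, a sketch of keeping such an override in one place. The defaults key GitLabBaseURL is hypothetical, not necessarily the one CCMenu uses. On macOS a value like this could be set with the defaults command-line tool or passed as a launch argument such as -GitLabBaseURL. Every URL built through this function honours the override, which is exactly what gets lost when URL construction is copied around.

```swift
import Foundation

enum GitLabAPI {
    // Hypothetical defaults key; CCMenu's real key may be named differently.
    static let baseUrlDefaultsKey = "GitLabBaseURL"
    static let defaultBaseUrl = "https://gitlab.com/api/v4"

    // The one place that decides the base URL, so every request honours an override.
    static func baseUrl(defaults: UserDefaults = .standard) -> String {
        defaults.string(forKey: baseUrlDefaultsKey) ?? defaultBaseUrl
    }

    // Wrapper-specific code only adds its path; it never hard-codes the host.
    static func urlForGroupProjects(group: String, defaults: UserDefaults = .standard) -> URL? {
        URL(string: baseUrl(defaults: defaults) + "/groups/\(group)/projects")
    }
}
```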

So, in these cases, and there were more, the generated code worked. It implemented the required functionality. But the new code would also have added completely unnecessary complexity, and it missed non-obvious functionality, lowering the quality of the codebase and introducing subtle issues.

If working on large software systems has taught me one thing, it’s that investing in the internal quality of the software, the quality of the codebase, is worthwhile. Don’t get overwhelmed by technical debt. Humans and agents both find it harder to work with a complicated codebase. Without careful oversight, though, AI agents seem to have a strong tendency to introduce technical debt, making future development harder for humans and agents alike.

One more thing

If possible, CCMenu shows the avatar of the person/actor that triggered the build. In GitHub the avatar URL is part of the response to the build status API call. GitLab has a “cleaner”, more RESTful design and keeps additional user information out of the build response. The avatar URL must be retrieved with a separate API call to a /user endpoint.

Both Windsurf and Claude Code stumbled over this in a significant way. I remember a longish conversation where Claude Code wanted to convince me that the URL was in the response. (It probably got mixed up because multiple endpoints were described on the same page of the documentation.) In the end I found it easier to implement that functionality without agent help.
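A sketch of that extra lookup, assuming the build response yields the triggering user’s numeric ID. The /users/:id path and the avatar_url field come from GitLab’s Users API; the type and function names, and the sample JSON in the test, are hypothetical.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking
#endif

// Only the field needed from GitLab's user resource; avatar_url may be null.
struct GitLabUser: Decodable {
    let avatarUrl: URL?
    enum CodingKeys: String, CodingKey { case avatarUrl = "avatar_url" }
}

enum GitLabAPI {
    static let baseUrl = "https://gitlab.com/api/v4"

    // Second request, made after the build status has been fetched.
    static func requestForUser(id: Int, token: String?) -> URLRequest {
        var request = URLRequest(url: URL(string: "\(baseUrl)/users/\(id)")!)
        if let token = token {
            request.setValue(token, forHTTPHeaderField: "PRIVATE-TOKEN")
        }
        return request
    }
}

// Returns nil if the body can't be decoded or the user has no avatar.
func avatarUrl(fromUserJSON data: Data) -> URL? {
    (try? JSONDecoder().decode(GitLabUser.self, from: data))?.avatarUrl
}
```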

My conclusions

During the experiments in the summer I was on the fence. The Windsurf / Sonnet 3.5 combo did speed up writing code, but it required careful planning with prompts, and I had to switch back and forth between Windsurf and Xcode (for building, running tests, and debugging), which always felt somewhat disorientating and got tiring quickly. The generated code had significant quality issues, and the agent had a tendency to get stuck trying to fix a problem. So, on the whole it felt like I wasn’t getting much out of using the agent. And I traded doing what I like, writing code, for overseeing an AI with a tendency to write sloppy code.

With Claude Code and Sonnet 4.5 the story is somewhat different. It needs less prompting, and the code is of better quality. It’s by no means high-quality code, but it’s better, requiring less rework and less prompting to improve quality. Also, running a conversation with Claude Code in a terminal window alongside Xcode felt more natural than switching between two IDEs. For me this has tilted the scales enough to use Claude Code regularly.
