
A Week with GitHub Copilot

I’m very skeptical of the way current “AI” utilities are being pushed as “tools”. Everything I’ve seen so far makes it clear that they’re fun toys, but since they don’t really “know” anything, they’re neither authoritative nor trustworthy. What concerns me most is that the companies building this software market it as a tool anyway, and we’re already seeing it abused.

But a few weeks ago, my boss asked the team about using GitHub Copilot. I rolled my eyes, expressed my dissatisfaction, but said I’d give it a try. (I at least want my fury to be informed by facts.) And having finally gotten access to Copilot, that’s exactly what I did.

All my work is in JetBrains Rider (my favorite and only IDE), with the official GitHub Copilot extension. I’m working in an existing codebase, written in C#. (If it works well, I might try using it with some of the Angular front-end code.)

First Impressions

I got access in the afternoon on a Friday, so I didn’t get a lot of time to play with Copilot right away. I was interested in having it write a unit test for me: a simple test to verify a mapping was set up properly and working.

I wasn’t sure what I was expecting, but it failed pretty spectacularly, writing code that wouldn’t even compile (which Rider’s ReSharper plugin flagged right away). It was certainly trying to generate something useful, but it was clearly leaning on the context of some existing tests, which weren’t checking all the properties of the new mapping. Those tests were checking some of the properties, so the generated code wasn’t entirely useless, but it didn’t save me much time once I had to come behind it and clean up. (And considering I have a regular expression I can use to convert the mapping code to test assertions with nearly 100% accuracy, Copilot looks even worse.)
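
For reference, the kind of test I was after is about as simple as unit tests get: something along these lines (type names invented here, and assuming xUnit and an AutoMapper-style mapper for the sake of the sketch).

[Fact]
public void Map_Customer_To_CustomerDto()
{
    // Hypothetical types; the real mapping has more properties than this.
    var customer = new Customer { Id = 42, Name = "Test", Email = "test@example.com" };

    var dto = this.mapper.Map<CustomerDto>(customer);

    // The whole point is to assert every mapped property, which is exactly
    // what the generated tests didn't do.
    Assert.Equal(customer.Id, dto.Id);
    Assert.Equal(customer.Name, dto.Name);
    Assert.Equal(customer.Email, dto.Email);
}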

I tinkered a little more in some other code, but the results didn’t get much better; maybe 20-40% of what it generated was correct.

Obviously, I wasn’t very impressed.

Second Impressions

I came back in on Monday with some new work I needed to do. My first task was pretty basic: an ASP.NET controller method to perform a search, taking a POST body with search parameters. We use MediatR to implement a mediator pattern in our code, but I already have a template for a request and handler, so I wasn’t going to bother having Copilot take a stab at that.
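
For a sense of the shape I was building (all of the names here are invented for illustration; the real request and result types are different), the request and endpoint look something like this:

public class WidgetSearchRequest : IRequest<List<WidgetResult>>
{
    public string? Name { get; set; }
    public DateTime? CreatedAfter { get; set; }
}

[HttpPost("search")]
public async Task<ActionResult<List<WidgetResult>>> Search([FromBody] WidgetSearchRequest request)
{
    // The controller just forwards the request; the MediatR handler owns the actual query.
    var results = await this.mediator.Send(request);
    return this.Ok(results);
}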

I started out creating the search request model, and I could tell Copilot was really trying. As this was an empty class, it didn’t have a lot of context to work with, but it was obviously checking some code in nearby files, because it was generating nearly-accurate names for properties as I added type information. When I pulled up the tool window and had it generate a bunch of options, they were very complete, but also very wrong; generating tons of properties for things that I’m pretty sure don’t show up anywhere in our code. That said, it looked like it was trying to generate a decent search request model, so I guess I’ll give it partial credit.

When adding the code for the MediatR handler and the actual SQL search query, it wasn’t particularly useful. The generated code was typically almost right, but it referenced property names that didn’t exist.

I decided to give it a shot at generating tests for the query, and it finally showed its usefulness. It generated two tests that were actually pretty decent, fully covering the code but missing some edge cases. I did a little cleanup, added a few more tests, and called it done. I hate writing unit tests for SQL queries, but it saved me some time here and did most of the heavy lifting. I did some refactoring afterwards to extract some utility methods, and when I tried generating more tests, it actually started using those new methods (since they were now in use in the file), which is nice to see.

Another thing I’ve found it handles better: documentation. When I’m writing documentation in my code, it’s better at using the surrounding context to come up with a decent comment. (I guess that’s not too surprising: the context gives it everything it should need, and LLMs are great at generating text.)

Something I feel like I need to point out here: it’s slow. Copilot isn’t running locally, so it has to ask the server to generate suggestions, which takes a lot of time. ReSharper runs entirely locally and has knowledge of the entire codebase, so it generates better results faster, and it doesn’t need “AI” to do its job. Unit testing is a similar story: years ago I saw tools that could generate more complete test coverage without “AI”. That said, if you truly don’t know what you’re doing, maybe Copilot can help.

Learning How It Works

I spent some time over the next day or two reading up on how Copilot works. I already knew it’s basically “ChatGPT for code”, and I already understood the basics of how LLMs work (using a bunch of probabilities to “guess” the next word in a sentence). What I didn’t know was how Copilot learns about code it has no access to – our repository isn’t in GitHub.

Turns out the IDE plugin sends a bunch of context along with its requests to Copilot. The context includes the code from the file you’re editing along with code from all your open tabs, so the more tabs you have open with relevant code, the better the suggestions get. Keeping related files open is how I typically work anyway, but now that I knew Copilot was feeding on them, I started keeping even more tabs open while working on code.

Third Impressions

I feel like I hit a plateau through the rest of the week. Copilot was occasionally generating useful code, but it was still painfully obvious it knew nothing about my code; it was just generating text that “looked” like it fit. I used it to build some SQL views and, just like with the C#, it generated SQL that referred to tables and columns that didn’t exist.

I find I have to be more vigilant when working with Copilot. Whereas ReSharper uses context to provide suggestions that actually fit in with existing code, Copilot will often generate complete gibberish.

Final Impressions

I suppose I’m starting to find a place where Copilot can come into play, but I’m not as enthralled as some people seem to be. Context is critical to making it work well at all. On an empty file, Copilot will just generate random text that looks like it might belong, but given enough context (via comments, member names, and code in open tabs), it starts generating somewhat usable code.

I see Copilot advertised as “Your AI pair programmer”, and maybe that’s part of the problem. It’s like working with a junior developer that doesn’t know anything and will never learn. I’ve always hated pair programming because it’s so much slower than how I normally work, and Copilot isn’t really any different. I have to spend a lot of time evaluating what it suggests instead of just writing the code I need written.

That’s part of the problem: code has a strict syntax. It’s not like natural language, where you can get words out of place or skip certain things and still convey the same (or at least a similar) meaning. Copilot doesn’t actually understand code: it doesn’t know what access modifiers or data types are; it doesn’t know what a variable or method or class is. They’re all just words, and all Copilot “knows” is that some words often follow other words.

The place it might work best is when you genuinely don’t know what to do. Maybe you’re working in a new language, and Copilot can help fill in the gaps. Maybe you’re working on a method to perform a simple task, and Copilot can save you some time searching. But I find these are short-term or one-time tasks; the majority of what I do is writing code to solve specific business problems, the kind of thing you can’t just pull from an open-source repository.

And maybe that’s what it comes down to: Copilot is great for poor developers. Those who write repetitive code; those who never innovate or improve. It’s what worries me most about these AI “tools”: they turn everyone into the lowest common denominator, while simultaneously making them think they’re geniuses.


BattleGrid Update – AI

One of the biggest hurdles I’ve had to face with BattleGrid is the AI. How do you teach a computer to play a strategy game? It’s obviously been done plenty of times, but I don’t have the expertise or experience that developers behind big-name RTS games have.

In general, I try to go simple when developing. The simpler the code, the less error-prone it is, and the easier it is to fix problems or add new functionality. I thought my first attempts at AI for BattleGrid were pretty simple. I had narrowed the number of things an AI might do to 5 categories: attack, defend, capture, expand, or extract (claim bitstreams). There was a lot of overlap between the 5 categories, implementing each one was difficult, and they could easily conflict with one another.

I recently decided to try simplifying the AI further. I trimmed the 5 categories down to 3 (expand, extract, and build), and then further to a single goal: build. Once I made the switch, the AI worked almost flawlessly – it never broke and even seemed to be playing intelligently at times. With this single objective, I could focus the AI on the important parts: selecting what and where to build. When something goes wrong (it can’t build in the desired location, or can’t afford the desired structure), it’s easy to abort and start over, letting the AI pick something new to build.

The AI’s thinking goes something like this:

  1. Pick a structure to build.
  2. Pick a location to build the structure.
  3. Wait until I can afford to build the structure.
  4. Build the structure.

It’s a simple process, and the first two steps are the important ones.
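
In code, the loop is barely more complicated than that list. A stripped-down sketch of the idea (all names simplified for illustration):

// Called once per AI "think" tick.
private void Think()
{
    if (this.currentPlan == null)
    {
        // Steps 1 and 2: decide what to build and where to build it.
        var structure = this.SelectStructure();
        var location = this.SelectLocation(structure);
        this.currentPlan = new BuildPlan(structure, location);
    }

    // Step 3: wait until we can afford it.
    if (this.Money < this.currentPlan.Structure.Cost)
    {
        return;
    }

    // Step 4: build it. If anything has gone wrong in the meantime (the
    // location is occupied, the structure no longer makes sense), TryBuild
    // fails, the plan is dropped either way, and the next tick picks
    // something new.
    this.TryBuild(this.currentPlan);
    this.currentPlan = null;
}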

When selecting a structure, the AI uses several factors. First, if there are bitstreams in their territory that don’t have extractors, they choose to build an extractor (as any player should). If not, they look at all the structures they can build and choose one based on their aggression and the effectiveness of each structure against their enemies. If the AI’s aggression is low, they’re more likely to select a defensive structure like shields, factories, or artillery – structures effective when turtling. If it’s high, they’re more likely to choose missile turrets, blaster turrets, or outposts – structures effective for quick expansion and harassment. The second factor – effectiveness – makes an AI more likely to choose the best structure to counter an enemy. If you have shields, they’re more likely to pick factories (units can enter the shield) or blasters (bonus vs shields). If you have artillery, they’re more likely to build a shield. Between the two, the AI can make a fair decision on what would be best to build.
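
A rough sketch of that selection logic (again, simplified; AggressionScore and EffectivenessScore are stand-ins for the real scoring):

private Structure SelectStructure()
{
    // An unclaimed bitstream always wins: build an extractor.
    if (this.Territory.HasUnclaimedBitstream())
    {
        return this.Structures.Extractor;
    }

    // Otherwise, score everything we could build by how well it matches the
    // AI's aggression and how effective it is against what the enemy has,
    // and take the best.
    return this.BuildableStructures()
        .OrderByDescending(s => this.AggressionScore(s) + this.EffectivenessScore(s))
        .First();
}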

When placing a structure, the AI primarily uses one factor – fragility. This is a new value I added that tells the AI how “fragile” the structure is, which basically translates to whether it should be placed on the “inside” of their territory (closer to outposts) or on the edge. It’s probably not the most accurate term, since it’s different from the structure’s health or strength – the blaster, for instance, has low health but a fragility of 0, telling the AI to build it close to the edge of their territory. Shields, factories, and artillery have a higher fragility, telling the AI to put them closer to their outposts or core.
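
And placement, sketched the same way (NormalizedDepth is a made-up helper for this example, and I’m treating fragility as a rough 0-to-1 value here):

private GridCell SelectLocation(Structure structure)
{
    // NormalizedDepth is 0 at the edge of the AI's territory and 1 right next
    // to an outpost or the core, so we just pick the buildable cell whose
    // depth best matches how fragile the structure is.
    return this.Territory.BuildableCells()
        .OrderBy(cell => Math.Abs(cell.NormalizedDepth - structure.Fragility))
        .First();
}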

Between the two, I’ve seen some interesting habits. The AI is pretty good about placement of outposts – they usually go on the edge of their territory to provide maximum expansion. Turrets are placed in about the right ranges to counter effectively – missiles are usually placed outside the range of blasters, for example. They’re also more likely to use shields or factories, which is great. Overall, I’m extremely pleased that such a small change – and more importantly a change to a simpler design – has resulted in a fairly dynamic AI that behaves well and is easy to tweak.

More importantly, I’ve started to lose against my own AI. The harder AI cheats a bit (it gets more money than the player), but I’m having a hard time out-maneuvering it. Sometimes the AI is downright brutal, using outposts to box me in and restrict my resource capacity, limiting what I can build and leaving me to build blasters on the edge of my territory in a feeble attempt to fight back. And then the AI builds a new outpost, neutralizing my blasters and leaving me powerless, waiting for my inevitable defeat. Which is a wonderful thing when I’m trying to develop a decent AI to play against. Of course, that means I need to tweak some settings so a come-from-behind victory is possible – especially now that I’m finding myself in that position.


Game Development

So over the course of developing BattleGrid, I’ve learned a lot about game development – nitty-gritty stuff that’s hard to learn without actually doing the work. I find a lot of this fascinating. I develop applications for a living, and while the basics are all the same, the details of how to do certain things are totally different. Case in point:

I don’t care how quick it is, split it up over multiple frames
There are a lot of things that I could do with a single line of code. I’m a bit of a code snob – I love clean, elegant code that’s easy to read, understand, and maintain. (I have a coworker that doesn’t feel the same way. We don’t get along.) In game development, though, you sometimes have to take a single unit of work and split it up over multiple frames. This means doing a bit of extra work to keep track of your progress each frame, so you can continue on the next frame. There are elegant ways to do this (I think I have a decent method), but it will never be as nice as the one-line beauty that I had before.

An example would probably help. Take targeting. Each turret and unit in BattleGrid has a targeting component that picks out the best target on the map. I’ve simply defined “best” as “closest”, so we just need a range check to find the closest enemy. When I started, I just did the work directly, something like:

public void FindTarget()
{
    // All in one frame: find the closest enemy and make it the current target.
    var target = this.Targeting.FindClosestTarget();
    this.SetTarget(target);
}

This worked great, except when an enemy was destroyed and the 5 turrets targeting it all decided to find new targets at once. Finding a target for an individual turret was easy and quick enough to do in a single frame, but when too many turrets all tried to find a target at the same time, the game started to chug. So, I developed a simple queue:

TaskQueue.Enqueue(this.FindTarget);

All that line does is put the “FindTarget” action onto the task queue. The task queue executes one operation every tick (about 1/60th of a second), so if 5 turrets need targets, they just queue up their “FindTarget” actions, and targets are found over the next 5 ticks. That’s fast enough that a player never notices, but spread out enough that it doesn’t hurt the framerate.
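
The queue itself is nothing clever; a minimal version looks something like this:

using System;
using System.Collections.Generic;

public static class TaskQueue
{
    private static readonly Queue<Action> Pending = new Queue<Action>();

    public static void Enqueue(Action action)
    {
        Pending.Enqueue(action);
    }

    // Called once per tick from the game loop: run exactly one queued action.
    public static void Tick()
    {
        if (Pending.Count > 0)
        {
            Pending.Dequeue().Invoke();
        }
    }
}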

All this talk about targeting loosely relates to my next point:

AI is hard, but rewarding
I’ve never had kids, but I imagine that teaching a kid to do something and then seeing them do it gives about the same warm fuzzy feeling I get from seeing an AI do what I taught it. I’ve done a lot of work on the AI in BattleGrid, and I know there’s still plenty to be done. One of the hardest things to figure out was how to “teach” the AI what to build in a given situation. I had a few really complex ideas, but ended up with something pretty simple – give the AI “effectiveness” values to work with.

This is what players do without thinking too much about it. “Oh, that guy has a blaster turret protecting his base. Missile turrets are longer range and can destroy that turret easily. I’ll build a missile turret.” I stored “effectiveness” values for every structure in the game. For instance, I’ve set it up so blaster turrets are highly effective against all mobile units, and artillery is effective against all stationary structures.
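
In code, the data is little more than a lookup. A simplified sketch (the names, numbers, and exact scoring here are invented for the example):

// Every structure carries a table of effectiveness scores against broad
// target classes, e.g.
//   Blaster:   { MobileUnit: 2.0, Shield: 1.5 }
//   Artillery: { Structure:  2.0 }
// The AI can then score a candidate by summing its effectiveness against
// everything the enemy currently owns (1.0 is neutral).
private float EffectivenessScore(Structure candidate)
{
    return this.Enemy.Units
        .Sum(u => candidate.Effectiveness.TryGetValue(u.TargetClass, out var value) ? value : 1.0f);
}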

Now, I knew I had set these values and taught the AI how to use them. Even knowing that, the first time I saw an AI build a missile turret just out of range of one of my blaster turrets, I was amazed. That’s something a player would do – build a longer-range turret out of range of the shorter-range turret.

I’m still amazed when I see the AI counter my structures with the same things I would. I build artillery, they build shields and tanks. I build tanks, they build blasters. It’s the standard rock-paper-scissors scenario, but it’s fun to see it in action.

Continuing on AI…

Sometimes the AI needs to cheat
One of the things I noticed while playing is that the AI is much more difficult when it can afford more. When I dropped the price of all the structures (for testing), the AI could counter anything I did much more effectively because it had the cash to do it. So, on the more difficult settings, the AI cheats a bit… it gets twice as much income as the player. That gives it significantly more cash, which means it’s not only acting faster (less time spent waiting to afford its next structure), it’s also building more for the player to fight through. Fortunately, the AI isn’t particularly devious or clever, so the player can still beat it, but it takes significantly more work, which is exactly what I wanted for a more difficult AI.

Balance is hard
I’ve struggled to get everything to feel balanced. The more options you give a player, the more difficult it is to make sure those options are balanced. I want BattleGrid to support multiple styles of play and make them all viable (playing offensively or defensively), but that makes it incredibly difficult to make sure each play style gets a fair shake. I don’t really have a solution here; it just takes a lot of testing and tweaking.

There are many more examples, but this is enough for now. Hopefully over time I’ll start to find elegant ways to solve all these problems; ways that I can carry forward into other projects.