
A Month With GitHub Copilot

As a follow-up to my last post, written a week in, I decided to post a small update with my impressions after a little more time. Overall, my opinion hasn’t really changed, but I’ve found a few places where Copilot is particularly handy and thought those worth pointing out.

Where it Shines

Copilot is particularly good when the code you’re writing solves a common problem. For instance, I recently needed to implement a simple base64 string conversion in .NET. That’s typically something I’d go to Google to look up because I never remember the exact calls (it’s a write-once-and-never-look-at-it-again sort of thing).

I wrote my function definition (it’s an extension method):
public static string ToBase64(this string input)
Then I opened the Copilot tool window and had it generate some options. I can’t remember if it was the first suggestion, but one of the options gave me both the ToBase64 and FromBase64 implementations, along the lines of the sketch below. Perfect. It saved me a trip to Google and a few minutes of coding time. Not a huge gain, but it’s something, and it was genuinely convenient. (On a side note, the inline suggestion wasn’t as useful.)
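For reference, the standard implementation looks something like this minimal sketch (assuming UTF-8 encoding, with a class name of my own choosing; I won’t swear this is exactly what Copilot produced):

using System;
using System.Text;

public static class StringExtensions
{
    // Encode the UTF-8 bytes of the string as base64.
    public static string ToBase64(this string input) =>
        Convert.ToBase64String(Encoding.UTF8.GetBytes(input));

    // Decode base64 back into a UTF-8 string.
    public static string FromBase64(this string input) =>
        Encoding.UTF8.GetString(Convert.FromBase64String(input));
}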

In addition, as I mentioned in my previous post, Copilot is pretty good at generating documentation. I can start a sentence and sometimes it’ll give me exactly what I want, though I have to do some editing about half the time, so overall it maybe saves me a few seconds each day.
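As an illustration (a made-up example, not actual Copilot output): start the summary line of an XML doc comment over a method like the ToBase64 one above, and it will usually finish the comment plausibly:

/// <summary>
/// Converts the specified string to its base64-encoded representation.
/// </summary>
/// <param name="input">The string to encode.</param>
/// <returns>The base64-encoded form of <paramref name="input"/>.</returns>
public static string ToBase64(this string input)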

I’ll occasionally use Copilot in unit tests, but I typically find myself writing those by hand. I follow some common patterns for my tests and set up templates for the common parts, leaving just the specifics that change with every test (a skeleton like the sketch below). Copilot isn’t as good at filling in those details. That said, it can sometimes be pretty helpful.
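For illustration, the template amounts to roughly this skeleton (all names here are hypothetical stand-ins, not from any real codebase); only the arrange and assert details change from test to test:

using Xunit;

public class Widget
{
    public string DoTheThing(string input) => input.ToUpperInvariant();
}

public class WidgetTests
{
    [Fact]
    public void DoTheThing_WithValidInput_ReturnsUppercase()
    {
        // Arrange: the per-test specifics live here...
        var sut = new Widget();

        // Act
        var result = sut.DoTheThing("input");

        // Assert: ...and here; everything else is boilerplate.
        Assert.Equal("INPUT", result);
    }
}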

Where it Fails

I’m still not letting this thing generate more than a few lines of code at a time. It’s still obviously following patterns rather than understanding anything about the code, and it will regularly make up property names that fit a pattern but don’t actually exist on the objects involved.

It also doesn’t play nicely with other auto-complete extensions. I may need to find a way to disable the inline suggestions, since they keep conflicting with ReSharper (which I find more helpful in general). Along the same lines, I think I’ll need to change the shortcut used to accept inline suggestions, because I’ve found I’ll sometimes accept a Copilot suggestion when I just meant to insert a tab.

Finally, as I mentioned above, Copilot isn’t good at working from scratch. It can write code, but that code is always modeled on existing code, so anything genuinely “new” is essentially out of reach; it only does well on common problems (the sort you’d look up a solution for online).

Final Thoughts

I think Copilot works in a few very specific scenarios. If you’re learning a new language, Copilot can accelerate that process, but you’ll still need to be mindful of what it generates. If you’re solving a common problem, it can generate that code for you, saving you the time you’d spend searching for the solution. If you’re writing code based on an existing pattern in your codebase, Copilot can do a decent job of mimicking it.

However, as a senior developer, I don’t spend most of my time cranking out code. Most of it goes to designing or refactoring, in languages I’m already familiar with, on “new” or context-specific problems rather than things I could find solutions for online.

If Copilot were a free tool I could turn on when needed and ignore the rest of the time, I’d use it everywhere. But at $10/month, I don’t think I’d ever purchase the individual subscription, except perhaps for a month or two while learning something new. In my day job, I’ll use it if it’s provided, but it’s not something I’d push to have. (Unlike ReSharper, which is basically a requirement wherever I work.)

I’m still annoyed by a lot of the hype around Copilot (and LLMs in general) claiming that developers will be obsolete or replaced by this technology in a few years. I doubt it. Being a developer is a lot more than typing code, and Copilot is only barely able to do even that reliably. It might make some developers more productive, but I don’t think it will be replacing good developers anytime soon.


A Week with GitHub Copilot

I’m very skeptical of the way current “AI” utilities are being pushed as “tools”. Everything I’ve seen so far makes it clear that they’re very fun toys, but since they don’t really “know” anything, they’re not authoritative or even trustworthy. What especially concerns me is that the companies building this software market it as a tool anyway, and we’re already seeing it abused.

But a few weeks ago, my boss asked the team about using GitHub Copilot. I rolled my eyes, expressed my dissatisfaction, but said I’d give it a try. (I at least want to have my fury informed by facts.) And having finally gotten access to Copilot, I did exactly that.

All my work is in JetBrains Rider (my favorite and only IDE), with the official GitHub Copilot extension. I’m working in an existing codebase, written in C#. (If it works well, I might try using it with some of the Angular front-end code.)

First Impressions

I got access in the afternoon on a Friday, so I didn’t get a lot of time to play with Copilot right away. I was interested in having it write a unit test for me: a simple test to verify a mapping was set up properly and working.

I wasn’t sure what I was expecting, but it failed pretty spectacularly, writing code that wouldn’t even compile (which Rider’s ReSharper plugin flagged right away). It was certainly trying to generate something useful, but it was clearly drawing on the context of some existing tests, which weren’t checking all the properties of the new mapping. Those tests were checking some of the properties, so the generated code wasn’t entirely useless, but it didn’t save me much time once I had to come along behind it and clean up. (And considering I have a regular expression I can use to convert the mapping code into test assertions with nearly 100% accuracy, Copilot looks even worse.)
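For context, the sort of test I wanted is shaped roughly like this (assuming an AutoMapper-style setup; the types here are hypothetical stand-ins, not our actual code):

using AutoMapper;
using Xunit;

public class Person
{
    public string FirstName { get; set; } = "";
    public string LastName { get; set; } = "";
}

public class PersonDto
{
    public string FirstName { get; set; } = "";
    public string LastName { get; set; } = "";
}

public class PersonMappingTests
{
    [Fact]
    public void Map_Person_CopiesEveryProperty()
    {
        // Arrange: build the mapping configuration under test.
        var config = new MapperConfiguration(cfg => cfg.CreateMap<Person, PersonDto>());
        var mapper = config.CreateMapper();
        var source = new Person { FirstName = "Ada", LastName = "Lovelace" };

        // Act
        var result = mapper.Map<PersonDto>(source);

        // Assert: one assertion per mapped property.
        Assert.Equal(source.FirstName, result.FirstName);
        Assert.Equal(source.LastName, result.LastName);
    }
}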

I tinkered a little more in some other code, but the results didn’t get much better; maybe 20-40% of what it generated was correct.

Obviously, I wasn’t very impressed.

Second Impressions

I came back in on Monday with some new work I needed to do. My first task was pretty basic: an ASP.NET controller method to perform a search, taking a POST body with search parameters. We use MediatR to implement a mediator pattern in our code, but I already have a template for a request and handler, so I wasn’t going to bother having Copilot take a stab at that.
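For context, the request/handler/controller shape of that template looks roughly like this (all names are hypothetical stand-ins, not our actual code):

using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using MediatR;
using Microsoft.AspNetCore.Mvc;

// The POST body: search parameters bound from JSON.
public class SearchCustomersRequest : IRequest<List<CustomerResult>>
{
    public string? Name { get; set; }
    public string? City { get; set; }
}

public class CustomerResult
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
}

// The MediatR handler; the real one runs the SQL search query.
public class SearchCustomersHandler : IRequestHandler<SearchCustomersRequest, List<CustomerResult>>
{
    public Task<List<CustomerResult>> Handle(SearchCustomersRequest request, CancellationToken cancellationToken)
        => Task.FromResult(new List<CustomerResult>());
}

[ApiController]
[Route("api/customers")]
public class CustomersController : ControllerBase
{
    private readonly IMediator _mediator;

    public CustomersController(IMediator mediator) => _mediator = mediator;

    // The controller does nothing but forward the request through MediatR.
    [HttpPost("search")]
    public async Task<ActionResult<List<CustomerResult>>> Search([FromBody] SearchCustomersRequest request)
        => Ok(await _mediator.Send(request));
}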

I started out creating the search request model, and I could tell Copilot was really trying. As this was an empty class, it didn’t have a lot of context to work with, but it was obviously checking code in nearby files, because it was generating nearly-accurate property names as I added type information. When I pulled up the tool window and had it generate a bunch of options, they were very complete but also very wrong, generating tons of properties for things that I’m pretty sure don’t show up anywhere in our code. That said, it looked like it was trying to produce a decent search request model, so I’ll give it partial credit.

When I added the code for the MediatR handler and the actual SQL search query, Copilot wasn’t particularly useful. The generated code was typically almost right, but it used property names that don’t exist.

I decided to give it a shot at generating tests for the query, and it finally showed its usefulness. It generated two tests that were actually pretty decent, covering the code but missing some edge cases. I did a little cleanup, added a few more tests, and called it done. I hate writing unit tests for SQL queries, so even though it only saved me some time, it did most of the heavy lifting. I did some refactoring afterwards to extract some utility methods, and when I tried generating more tests, Copilot actually started using those new methods (since they were now in use in the file), which was nice to see.

Another thing I’ve found it handles better: documentation. When I’m writing documentation in my code, Copilot is better at using the context to come up with a decent comment. (I guess that’s not too surprising: the context gives it everything it should need, and LLMs are great at generating text.)

Something I feel I need to point out here: it’s slow. Copilot isn’t running locally, so it has to ask a server to generate suggestions, and that round trip takes real time. ReSharper runs entirely locally with knowledge of the whole codebase, so it generates better results faster and doesn’t need “AI” to do its job. Unit testing is a similar story: years ago I saw tools that could generate more complete test coverage without “AI”. That said, if you truly don’t know what you’re doing, maybe it can help.

Learning How It Works

I spent some time over the next day or two reading up on how Copilot works. I already knew it’s basically “ChatGPT for code”, and I already understood the basics of how LLMs work (using a bunch of probabilities to “guess” the next word in a sentence). What I didn’t know was how Copilot learns about code it has no access to: our repository isn’t on GitHub.

It turns out the IDE plugin sends a bunch of context along with each request to Copilot: the code from the file you’re editing, plus code from all your open tabs, so the more tabs you have open with relevant code, the better the suggestions get. Keeping related files open is how I typically work anyway, but once I knew this was what Copilot was using, I started keeping even more tabs open while coding.

Third Impressions

I feel like I hit Copilot’s peak through the rest of the week. It was occasionally generating useful code, but it was still painfully obvious that it knew nothing about my code; it was just generating text that “looked” like it fit. I used it to build some SQL views, and just as with C#, it generated SQL that referred to tables and columns that didn’t exist.

I find I have to be more vigilant when working with Copilot. Whereas ReSharper uses context to provide suggestions that actually fit in with existing code, Copilot will often generate complete gibberish.

Final Impressions

I suppose I’m starting to find a place where Copilot can come into play, but I’m not as enthralled as some people seem to be. Context is critical for it to work well at all. In an empty file, Copilot will just generate random text that looks like it might belong, but given enough context (via comments, member names, and code in open tabs), it starts generating somewhat usable code.

I see Copilot advertised as “Your AI pair programmer”, and maybe that’s part of the problem. It’s like working with a junior developer that doesn’t know anything and will never learn. I’ve always hated pair programming because it’s so much slower than how I normally work, and Copilot isn’t really any different. I have to spend a lot of time evaluating what it suggests instead of just writing the code I need written.

Part of the problem is that code has strict syntax. It’s not like natural language, where words can be out of place or omitted and still carry the same (or at least a similar) meaning. Copilot doesn’t actually understand code: it doesn’t know what access modifiers or data types are; it doesn’t know what a variable or method or class is. They’re all just words, and all Copilot “knows” is that some words often follow other words.

The place it might work best is when you genuinely don’t know what to do. Maybe you’re working in a new language and Copilot can help fill in the gaps. Maybe you’re writing a method to perform a simple, common task and Copilot can save you a search. But those are short-term or one-time situations; the majority of what I do is write code that solves specific business problems, things you can’t fix by pulling code from an open-source repository.

And maybe that’s what it comes down to: Copilot is great for poor developers, the ones who write repetitive code and never innovate or improve. That’s what worries me most about these AI “tools”: they pull everyone toward the lowest common denominator while simultaneously convincing them they’re geniuses.