LLMs and Software Engineering

As my Google AI Pro subscription is coming to an end, there are a few things I would like to share in this post.

There is a great deal of hype surrounding these tools, and most of it is simply noise. My observations are not comprehensive; they reflect my personal experience, though I do not believe they overstate the case.

Things have changed significantly since I last tried frontier models a year ago, both for the better and for the worse.

Things I found particularly helpful:

  • Revived two personal projects that I had had no time to work on: a P2P file transfer tool and a cURL UI wrapper. Both are now fully functional, and I use them every day.
  • Learned the basics of the Box2D API.
  • Revamped this very personal website, primarily by updating the CSS styles (an area in which I have limited proficiency).
  • Generated test cases for real-world projects (property-based and mutation tests).
  • Learned several topics I had long found interesting yet difficult (e.g. Digital Design, Fourier Transforms, and Game Theory), then combined the results with NotebookLM. The experience was genuinely joyful.
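To illustrate the kind of property-based tests mentioned above, here is a minimal sketch using only the standard library (in practice a library such as Hypothesis would do this better). The run-length codec is a hypothetical stand-in, not code from my projects: the property checked is that decoding an encoding always returns the original input.

```python
import random

def encode(data: bytes) -> bytes:
    """Toy run-length encoder: emits (count, byte) pairs, count capped at 255."""
    out = bytearray()
    i = 0
    while i < len(data):
        j = i
        while j < len(data) and data[j] == data[i] and j - i < 255:
            j += 1
        out += bytes([j - i, data[i]])
        i = j
    return bytes(out)

def decode(blob: bytes) -> bytes:
    """Inverse of encode: expands each (count, byte) pair."""
    out = bytearray()
    for k in range(0, len(blob), 2):
        out += bytes([blob[k + 1]]) * blob[k]
    return bytes(out)

def test_roundtrip(trials: int = 200) -> None:
    """Property: decode(encode(x)) == x for random inputs."""
    rng = random.Random(42)  # seeded for reproducibility
    for _ in range(trials):
        data = bytes(rng.randrange(4) for _ in range(rng.randrange(64)))
        assert decode(encode(data)) == data
```

The appeal is that the model only has to state the invariant; the random inputs then probe edge cases (empty input, long runs) that hand-written example tests tend to miss.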

Things I did not like:

  • Antigravity is essentially a VS Code makeover and therefore runs on Electron under the hood. Come on, Google, you have world-class engineers and frontier models. Appeals to speed of delivery, velocity, and maintainability are merely excuses; the company should be capable of shipping a native cross-platform UI.
  • Recurring errors in the middle of sessions are genuinely annoying. I do not know whether this is an intentional tactic to increase token usage.
  • The system can access sensitive information such as API keys, credentials, and any other confidential data stored on my computer. This is a serious privacy concern.
  • Sadly, it does not retain what it learns across sessions. When I repeat the same tasks at intervals of several days, it often produces different and frequently incorrect results.

I am uncertain whether I wish to continue paying for Google’s service. I have no intention of trying Claude Code, let alone Codex (given that their models are deployed to support war efforts).

More fundamentally, I strongly dislike the centralised nature of these tools. I would much prefer to run them locally. While I am aware that capable local models exist, they remain roughly equivalent to last year's frontier models.

There is reason for optimism. Projects such as Tinygrad (still several years from maturity) and certain Chinese models (I understand the latest Qwen runs smoothly on Apple M-series chips) offer promising paths forward.

P.S. I do not use agentic tools, as I am unwilling to grant them full access control over my systems.