Notes from building Cortado

I just launched Cortado, a small app for managing household finances with your partner.

You upload your bank statements and Cortado handles the rest: it keeps a running balance between you and your partner, categorizes every transaction, and turns cryptic bank descriptions into real vendor names. You get a clear picture of where your money goes each month, down to the subscriptions quietly adding up.

I know everyone's building their own app these days, so if you're just curious, there's no need to sign up and import your own transactions to see what this is about. I put together a quick demo that walks through the whole thing with proper, realistic data, so you can get a feel for it in a couple of minutes — access the demo here. And if you decide you want to use it for real, you can always sign up.

Cortado's landing page, with a peek at the shared household dashboard.

Why I bothered

My free time these days goes almost entirely to learning with AI, so I figured I'd point that at another personal time-sink: finally replacing the Google Sheet my partner and I had used for years to manage our finances. This one was more of a challenge than my usual side projects, though. It had to actually be reliable, held to the same bar as anything I'd ship at work. I'm not going to manage my money with a tool I don't trust.

It's pretty easy to one-shot something that looks decent and tricks you into thinking you have a fully fledged app that works reliably. But most people who have tried something like this know that being passable and working reliably are two very different things. I wanted to go the distance and make something real and trustworthy, the same way I do with every product I work on at my day job. The difference is that at work I typically have an engineering team for planning, implementation, and testing. This time I wanted to do all of it myself, and understand the true friction points at every step of making something as trustworthy, reliable, fast, and secure as possible.

In practice, that meant transactions reliably captured, calculated, categorized, and stored, and kept around indefinitely. There's no SOC 2 or formal compliance to answer to on a side project, but I tried to hold it to that bar anyway.

Getting the data in

The first complicated piece was reliably importing bank transactions from manual CSV exports. Every bank, for every type of account, has a slightly different format you need to expect and handle properly, and the import is very sensitive: get it wrong and it can completely break the finance calculations. For now, until I have a more modern way to tackle this with a direct bank connection integration (I'd love to have an API key from my Flinks friends 🙂), I'm building templates on a per-bank basis for the expected CSV format. If your bank doesn't have a template yet, you can set up the format manually.
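To make the per-bank template idea concrete, here is a minimal sketch in Python. It is not Cortado's actual code: the BankTemplate fields, the parse_statement function, and the column names in the usage note are all hypothetical, and real exports have more quirks (split debit/credit columns, preamble rows, odd encodings) than this handles.

```python
import csv
import io
from dataclasses import dataclass
from datetime import date, datetime
from decimal import Decimal


@dataclass(frozen=True)
class BankTemplate:
    """Hypothetical description of one bank's CSV export layout."""
    date_column: str
    description_column: str
    amount_column: str
    date_format: str
    # True when the bank reports spending as positive numbers.
    debits_are_positive: bool = False


@dataclass(frozen=True)
class Transaction:
    posted_on: date
    description: str
    amount: Decimal  # normalized: negative = money out, positive = money in


def parse_statement(csv_text: str, template: BankTemplate) -> list[Transaction]:
    """Parse one CSV export, failing loudly if it does not match the template."""
    reader = csv.DictReader(io.StringIO(csv_text))
    required = {
        template.date_column,
        template.description_column,
        template.amount_column,
    }
    missing = required - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"CSV does not match template, missing: {sorted(missing)}")

    transactions = []
    for row in reader:
        # Strip thousands separators and currency symbols before parsing.
        raw = row[template.amount_column].replace(",", "").replace("$", "").strip()
        amount = Decimal(raw)  # raises on garbage instead of guessing
        if template.debits_are_positive:
            amount = -amount
        posted = datetime.strptime(
            row[template.date_column].strip(), template.date_format
        ).date()
        transactions.append(
            Transaction(posted, row[template.description_column].strip(), amount)
        )
    return transactions
```

A template for a made-up bank would look like BankTemplate("Date", "Description", "Amount", "%m/%d/%Y", debits_are_positive=True). Two choices matter more than the rest: amounts are parsed as Decimal rather than float so cents never drift, and a file that doesn't match the template raises an error instead of being imported half-right.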

Right off the bat, you won't end up with something reliable unless you handle this properly, test it thoroughly, and work from real statements and real bank transactions, so you know the scenarios you actually have to handle. Again, you can probably make something that looks like it works, but without all that care it'll break as soon as the first real usage comes in.

The CSV import wizard: upload, map columns, pick the months, review, done.

Categorizing every transaction

Beyond the monthly accounting of summing and splitting expenses with my partner, which is what I'd been using the Google Sheet for, I wanted to go a little further and categorize the transactions. This kind of feature traditionally takes a machine learning approach with a lot of data and labeling, but I wanted to build an engine powered by an LLM instead, since this was the first true AI feature I wanted to have. It was my chance to dip into evals in a practical way: starting with my own golden dataset as a base, then a harness, a confusion matrix with regression bars, and a rerun loop so I can keep monitoring the engine's performance as I stack up more transactions over the coming months.
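For anyone who hasn't built one, the core of an eval harness like this can be quite small. The sketch below is an illustration rather than Cortado's harness: the function names are hypothetical, I'm reading "regression bars" as a per-category accuracy floor relative to a saved baseline, and the 0.02 tolerance is an arbitrary placeholder. It compares the golden labels against the engine's predictions, builds a confusion matrix, and flags any category that got worse.

```python
from collections import Counter


def confusion_matrix(expected: list[str], predicted: list[str]) -> Counter:
    """Count (expected_category, predicted_category) pairs over the golden set."""
    if len(expected) != len(predicted):
        raise ValueError("expected and predicted must have the same length")
    return Counter(zip(expected, predicted))


def per_category_recall(matrix: Counter) -> dict[str, float]:
    """For each golden category, the share of its transactions labeled correctly."""
    totals: Counter = Counter()
    correct: Counter = Counter()
    for (exp, pred), count in matrix.items():
        totals[exp] += count
        if exp == pred:
            correct[exp] += count
    return {category: correct[category] / totals[category] for category in totals}


def regressions(
    baseline: dict[str, float],
    current: dict[str, float],
    tolerance: float = 0.02,  # assumed placeholder, tune to your dataset size
) -> dict[str, tuple[float, float]]:
    """Categories whose recall fell more than `tolerance` below the baseline."""
    return {
        category: (baseline[category], current.get(category, 0.0))
        for category in baseline
        if current.get(category, 0.0) < baseline[category] - tolerance
    }
```

In a rerun loop, you would store the recall from the last known-good run as the baseline, rerun the engine on the golden dataset after every prompt or model change, and treat a non-empty result from regressions as a failed run. The off-diagonal cells of the matrix then show which categories are being confused with which.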

A month's spending, automatically sorted into categories.

The subscriptions that pile up

After getting proper categorization working, I wanted to add one more thing I'd wished for back when I was on the Google Sheet but never had: visibility over my subscriptions, beyond the basic bills. We have so many different subscriptions now, including all the AI stuff, and I just want to see how much we're spending per month on subscription services. Of course there are plenty of other apps that do this, but since I'm already building this for myself, why not add one more feature I'd wanted anyway?

The subscriptions view: every recurring charge for the month, including all the AI tools I keep signing up for.

Cleaning up cryptic vendor names

And beyond properly categorizing transactions, I decided to also add a vendor registry, so the cryptic descriptions banks print on statements get consolidated into clean vendor names and icons. Recognizable brands are just nicer to scan than something like AMZN MKTP US*JK4F2, but the bigger reason is being able to total your spend per vendor. Every Amazon charge comes through as a slightly different string, so unless you map them all back to one, your Amazon spending is scattered across dozens of "vendors" that are really the same place. Consolidate them, and you can finally see what you actually spend at each one.
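Here is a rough sketch of the consolidation idea, assuming a simple pattern-based registry. The patterns, the resolve_vendor and spend_per_vendor names, and the second Amazon-style string in the usage note are made up for illustration; a real registry needs far more entries and a review step for the long tail.

```python
import re
from collections import defaultdict
from decimal import Decimal

# Hypothetical registry: each pattern maps raw statement text to one clean name.
VENDOR_PATTERNS: list[tuple[re.Pattern[str], str]] = [
    (re.compile(r"^AMZN MKTP|^AMAZON", re.IGNORECASE), "Amazon"),
    (re.compile(r"^NETFLIX", re.IGNORECASE), "Netflix"),
]


def resolve_vendor(description: str) -> str:
    """Return the clean vendor name, or the raw description if nothing matches."""
    cleaned = description.strip()
    for pattern, vendor in VENDOR_PATTERNS:
        if pattern.search(cleaned):
            return vendor
    return cleaned  # unmapped merchants stay visible so they can be added later


def spend_per_vendor(
    transactions: list[tuple[str, Decimal]],
) -> dict[str, Decimal]:
    """Total the amounts of (description, amount) pairs per consolidated vendor."""
    totals: defaultdict[str, Decimal] = defaultdict(Decimal)
    for description, amount in transactions:
        totals[resolve_vendor(description)] += amount
    return dict(totals)
```

With this, a charge described as AMZN MKTP US*JK4F2 and another described as Amazon.com*AB12C both land under "Amazon", so their amounts add up in one place. Falling back to the raw description for unknown merchants keeps the totals complete while the registry catches up.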

The vendor view: cryptic bank descriptions consolidated into clean names and icons, ranked by total spend.

What it actually took

All in all, I'm proud of how the pieces I worked on came together, including:

Actually flexing the muscle by shipping a real AI feature, a categorization engine with an evals harness behind it
An effort to design something that might look a bit more polished than the usual bland, shabby default, with thoughtful UX and layout
Proper rigor, as if this were a traditional SaaS under compliance: real reliability, trust, and security for whoever chooses to use it

This whole thing took a few more weeks than I expected, mostly because it wasn't the one-shotted landing page or app some of my friends assumed it was at first glance. Pushing harder is exactly where you start running into the current limits of coding with AI, and where you feel that speed and task scope have a performance ceiling. A project that's complex enough, one that actually values reliability and has real concerns beyond being passable or prototype-worthy, needs a lot more from you: multiple agents, real division and orchestration of skills/agents, with the work broken into small, specific pieces so the AI doesn't drift.

And dealing with the infrastructure piece means you have to know a bit more about what you're doing, especially getting a proper deployment from development to production. That's where it takes more of your time and attention to detail to get something that actually works and that you can actually trust.

None of this is ever really finished, though. The vendor registry covers a lot, but there's always a long tail of merchants still to map. Keeping categorization accurate is its own ongoing job: I have to watch the evals, especially as the underlying models get upgraded and their behavior shifts. And there are always more design retouches I keep meaning to get to. These are exactly the kinds of gaps users will point out, which tells you what to work on next.

If you got this far, I'd really love your feedback. Poke around the demo, or sign up and use it for real, then tell me what's missing, the one thing that'd make you actually keep using it. Or honestly, just tell me about some little everyday app you've been wanting to build for yourself.

Thank you for reading!