Comparing Experiences: GPT-4o vs. Claude 3.5 Sonnet - Great for Light Coding, A Step Backwards for Text

In the fast-evolving world of artificial intelligence, keeping up with the latest advancements can be both exciting and overwhelming. Recently, Anthropic released its newest model, Claude 3.5 Sonnet, which has generated significant buzz across social media platforms. Many are claiming it to be a major step forward in AI technology.

As a software developer and content creator with over a decade of experience, I’ve been using AI-driven tools for coding and text editing extensively. For the past few months, I’ve relied heavily on GPT-4o as my primary AI assistant. When Claude 3.5 Sonnet was released, I decided to put it to the test to see if it truly lives up to the hype or if it’s merely much ado about nothing.

In this blog post, I will share my experiences comparing Claude 3.5 Sonnet with GPT-4o, particularly focusing on their performance in coding and text editing tasks. Let’s dive into the details and see which AI model comes out on top for different use cases.

Text Editing Performance

I put Claude 3.5 Sonnet to the test on text editing and was disappointed. While no AI platform I’ve used is really up to snuff on writing text, they typically do a very good job on first drafts and an excellent job on proofreading. However, they still don’t produce the sort of high-quality text you’d want for even a casual blog post.

As the self-proclaimed King of Typos™, I routinely have GPT-4o proofread my blog posts and fix any errors. GPT-4o is far from perfect at this and frequently will “correct” something on one pass and then in the next pass tell me that the “correction” is wrong. Still, it’s incredibly quick and fairly accurate.

When I tried having Claude 3.5 Sonnet proofread my blog, it repeatedly made major changes to the text. These changes always resulted in vastly shorter versions of the text, often with key details left out. This is the exact issue I had with GPT-3.5, and one that GPT-4o avoids unless I specifically ask it to condense my prose.

To be specific, my blog post started at 1,035 words and the result from Claude 3.5 Sonnet was 335 words! I would love to find an AI that helped me be more concise, but I’m looking for a little trim to clean up my hairline, not a crew cut!

I tried a few different prompts and eventually was able, through excessive prompting, to have Claude 3.5 Sonnet mostly just clean up my text instead of rewriting it. Still, I kept finding spots where it had condensed things. This means I had to look over every single line carefully to make sure nothing was missing. Why would I use Claude 3.5 Sonnet for this task, knowing I have to carefully double-check everything, when I know GPT-4o will do it and only require a brief look over?
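One way to catch this kind of silent condensing before reading line by line might be a simple word-count guard. The sketch below is illustrative only: the function names and the 90% threshold are hypothetical choices, and a word count cannot tell you which details went missing, only that something probably did.

```python
def shrink_ratio(original: str, edited: str) -> float:
    """Return the edited word count divided by the original word count."""
    original_words = len(original.split())
    edited_words = len(edited.split())
    if original_words == 0:
        raise ValueError("original text is empty")
    return edited_words / original_words


def looks_condensed(original: str, edited: str, min_ratio: float = 0.9) -> bool:
    """Flag an edit whose word count fell below min_ratio of the original.

    min_ratio is an assumed threshold; a light proofread should stay near 1.0.
    """
    return shrink_ratio(original, edited) < min_ratio


if __name__ == "__main__":
    # Stand-in texts with the word counts from the post: 1,035 in, 335 out.
    original = "word " * 1035
    edited = "word " * 335
    print(f"kept {shrink_ratio(original, edited):.0%} of the words")
    print("condensed?", looks_condensed(original, edited))
```

With the numbers above, the edited version keeps roughly a third of the original words, which is far below anything a proofreading pass should produce.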

Claude 3.5 Sonnet’s Recommendations – More Useful for Text Recommendations

After proofreading and before posting a blog post, I often ask AI how I might make the post better. I take maybe one in five recommendations from GPT-4o, but asking allows me to think more deeply on how I might improve the post. I asked both GPT-4o and Claude 3.5 Sonnet to give me recommendations.

Both said I should improve the introduction, and so I did. Oddly, they both said I should have sections with headers, which I already had. They also both gave some specific recommendations on tone.

Claude 3.5 Sonnet, however, went on with some other good suggestions. It recommended I go into more depth explaining certain topics, such as usage limits, the interface, and processing time (which I have). I really liked its recommendation that I talk about what I’d like to see in the next iteration of these LLMs.

In short, Claude 3.5 Sonnet’s recommendations were much more useful than GPT-4o’s. I’m a bit bewildered, because when it decided to rewrite my post instead of doing the proofreading I asked for, it didn’t implement any of these suggestions. Still, the suggestions themselves were better than the ones I get from GPT-4o.

What I’d Like to See Next – Text

I’m looking forward to the day when the AI can reliably implement its own recommendations while still sticking to what I wrote as much as possible. An output that clearly showed what changed and what didn’t would also be very helpful. Sometimes I can get GPT-4o to do this, but not reliably.
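Until the models do this reliably themselves, a change view can be produced outside the model. The following is a minimal sketch using Python's standard difflib module; the function name and the sample sentences are made up for illustration, and it compares line by line, so it works best when each paragraph sits on its own line.

```python
import difflib


def show_changes(original: str, edited: str) -> str:
    """Return a unified diff of two texts, compared line by line."""
    diff = difflib.unified_diff(
        original.splitlines(),
        edited.splitlines(),
        fromfile="original",
        tofile="edited",
        lineterm="",  # input lines carry no newlines, so the headers should not either
    )
    return "\n".join(diff)


if __name__ == "__main__":
    before = "I has a typo in this line.\nThis line stays the same."
    after = "I have a typo in this line.\nThis line stays the same."
    print(show_changes(before, after))
```

Lines starting with "-" were removed, lines starting with "+" were added, and lines starting with a space are unchanged context.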

In the longer term, I would welcome an AI that understands my writing preferences and makes recommendations accordingly. It would also be great if it could look back over my past work and recommend links and segues to concepts I have covered before.

AI is already a great co-writer. I expect it to only get better. For now, I will use Claude 3.5 Sonnet for recommendations but GPT-4o to proofread everything. I’d rather use only one system, but I can’t trust Claude 3.5 Sonnet not to rewrite instead of proofread.

Coding Performance

First Impressions – Coding

My initial impression of Claude 3.5 Sonnet for coding tasks is quite positive. The first task I assigned it was to remove a database call for scraping websites and instead allow manual entry. This sort of simple task is a great place for AI to help as it can do it much more quickly than I can.

GPT-4o handled it well, quickly and easily. However, Sonnet went further, not just removing the database call but all the database-related code. This was a bit too much since the scraped data still needed to be stored in the database. A quick revised prompt later and Sonnet reinstated the necessary database calls, and the code looked error-free. With GPT-4o, about 25% of the time, the code needs tweaking due to missing imports or other minor issues, which wasn’t the case here.

Performance and Usability – Coding

Claude 3.5 Sonnet’s code executed perfectly and performed the task as expected, earning a solid B+ from me. When asked to “clean up the code,” it rewrote the entire thing, making it more class-based and well-structured. Had it rewritten the code without getting rid of vital parts I needed, it likely would have earned an A+. So far, Claude 3.5 Sonnet looks promising, though verifying its full functionality will take some time.

Additionally, I used Sonnet to upgrade my code from truncating to chunking, along with other technical improvements. It performed admirably, and overall I rate it slightly better than GPT-4o. However, a significant drawback is how often I hit usage limits, even with the Pro version. This limitation is a major issue, preventing me from fully relying on it for my work.
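For readers unfamiliar with the distinction between chunking and truncating, here is a generic sketch. It is not the code from my project: it assumes the input is a long string that has to fit a size limit, it counts plain characters rather than tokens, and the function names are hypothetical.

```python
def truncate(text: str, max_chars: int) -> str:
    """Keep only the first max_chars characters; everything after is lost."""
    return text[:max_chars]


def chunk(text: str, max_chars: int) -> list[str]:
    """Split text into consecutive pieces of at most max_chars characters.

    Nothing is discarded: joining the pieces gives back the original text.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]


if __name__ == "__main__":
    text = "abcdefghij"
    print(truncate(text, 4))  # abcd
    print(chunk(text, 4))     # ['abcd', 'efgh', 'ij']
```

Truncating is simpler but throws data away, while chunking keeps everything at the cost of having to process several pieces. A real implementation would likely split on sentence or paragraph boundaries instead of at fixed character offsets.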

Usage Limitations

Right now, the usage limits on Claude 3.5 Sonnet are a little vague. They are based on total system usage by all users and the length of your inputs and outputs. My estimate is that given the size of my requests, which tend to be quite large, I can do about 75 requests in 6 hours. This is the volume I typically do in 2 or 3 hours, meaning that if I use Claude 3.5 Sonnet I will be idle for 3-4 out of every 6 hours.
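The idle-time arithmetic can be spelled out in a few lines. The figures are my own rough estimates from above (about 75 requests per 6-hour window, used up in 2 to 3 hours of work), not official limits, and the names in the sketch are only for illustration.

```python
WINDOW_HOURS = 6          # estimated length of one usage window
REQUESTS_PER_WINDOW = 75  # estimated number of large requests allowed per window


def idle_hours(hours_to_use_quota: float, window_hours: float = WINDOW_HOURS) -> float:
    """Hours left in the window after the request quota has been used up."""
    return max(window_hours - hours_to_use_quota, 0.0)


if __name__ == "__main__":
    for hours in (2, 3):
        pace = REQUESTS_PER_WINDOW / hours
        print(f"{hours} h of work: {pace:.1f} requests/hour, "
              f"idle {idle_hours(hours):.0f} h per window")
    # 2 h of work: 37.5 requests/hour, idle 4 h per window
    # 3 h of work: 25.0 requests/hour, idle 3 h per window
```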

Given these limitations, I will continue using GPT-4o for most things and use Claude 3.5 Sonnet only in cases where GPT-4o is struggling with the code.

What I’d Like to See Next

One big thing lacking from all AI now is interactivity. By this, I mean the Large Language Model (LLM) asking the user clarifying questions. Instead, the LLM makes a bunch of assumptions and does its best to fulfill what it thinks are your wishes. If you are working with a human, this sort of interaction is so common it goes without saying. Before running off and doing something, you make sure the person or group you are working with all have a common understanding of the problem and the proposed solution.

For instance, in the coding example, I would rather the AI had asked “Do you also want me to remove the data being saved to the database?” instead of automatically removing it. With text, it would have been great for it to ask “Do you want me to condense things as well?”

I believe that for AI to progress it must start asking these clarifying questions. The smarter and faster AI gets, the more problematic it will be when it makes its own incorrect decisions about how to proceed without asking for guidance in key areas.

Conclusion

While Claude 3.5 Sonnet shows great potential and delivers quality code, its usage limits are a substantial hindrance. It does a good job with making recommendations for prose but a much poorer job implementing them.

Claude 3.5 Sonnet has found a solid place in my toolkit. I will use it for tough coding problems and for recommendations on text. Once usage limits ease up, I will probably use it more for coding. I will continue to use GPT-4o for proofreading and basic coding tasks.

For coding tasks, Claude 3.5 Sonnet seems to have a slight edge in terms of code quality and structure. However, its usage limits make it impractical for heavy users like myself. GPT-4o remains more reliable for consistent, day-to-day use.

In terms of text editing, GPT-4o is clearly superior for proofreading tasks. Claude 3.5 Sonnet’s tendency to rewrite and condense text makes it less suitable for this purpose. However, its ability to provide insightful recommendations for improving content is useful.

For casual users or those with lighter workloads, Claude 3.5 Sonnet could be an excellent choice, especially for coding tasks and content improvement suggestions. For power users and those who require frequent proofreading, GPT-4o remains the better option.

I’m truly impressed with the continually improving capabilities of these models and can’t wait to see what comes next!

Lowry On Leadership