I recently held a comparison of features of Anthropics new Claude 2 model against ChatGPT with Code Interpreter enabled. Claude 2 is a more recent large language model with a much larger context window size of 100K tokens (equating to about 75K words). It also features file uploads as standard and is currently completely free. That’s a great initial pitch from a new AI model, but even with those features I found it to be lacking in some areas. Here’s a few of the reasons why.
I asked both Claude and ChatGPT to write a poem about Python. Claude 2 did a good job, but ChatGPT seemed to come up with better, more complex sentences, rhymes and examples.
Both ChatGPT’s Code Interpreter and Claude 2 can handle file uploads. However, although Claude 2 has a huge context limit of 100K tokens, it can’t upload zip files - the contents need to be extracted. This will be very noticeable should you be doing any form of data analysis since text-based data csvs compress to be much smaller. Once unzipped they can fall above this 100K limit and you’ll be asked to reduce the data size.
ChatGPT’s code interpreter on the other hand is able to handle zipped files that are under 100MB making it much more versatile, extracting it using Pythons zipfile package. It appears to be a neat little workaround of its own context limits of just 8K, the file itself not being included in it.
Whilst working with Claude, I had to massively truncate the data set in order for it to be readable by Claude 2, which is not exactly ideal if you want your analysis to be thorough.
Claude 2 doesn’t have all the features of ChatGPT’s code interpreter. When asking it to look at some data and explain what it was, it took a very long time to respond and listed the column names along with a description. Since it isn’t executing code, it’s having to step through each row and inspect it.
ChatGPT on the other hand is able to actually read the columns using Pandas and respond explaining what they are very quickly.
Claude really seems to struggle when hallucinating. In my testing I asked “What is langchain?”. Although it was released well within it’s training set and appears on wikipedia, it repeatedly fought me when providing an answer that it’s an open source framework for LLMs. For some reason it was insistent that Anthropic had actually created it….?
Eventually I got so fed up with it, I told it to “do better next time”.
ChatGPT on the other hand responds with that it it has no knowledge of Langchain and that it’s cutoff was in September 21 - this seems far more appropriate when it’s training data does not include information on the subject at all.
Although Claude can write code, it cannot execute it. After suggesting some Python code to determine the average production budget for the movie dataset, it comes up with an answer of $94M. When I actually ran the code it suggested, the result was $92M.
I’ve had to copy the code out to a new file and execute it to determine that the answer it gave in the chat did not match.
Doing this with ChatGPT instead, I’m able to use code interpreter directly within the browser to test our results. If it had for some reason turned out to be wrong it would immediately go away and try to correct itself. Since we’re able to use an un-truncated version of the data including all rows with ChatGPT, the result it gives is completely different at $38M.
Having a huge context window of 100K is obviously a great win in the world of large language models, but as I’ve shown it isn’t everything. Most of the problems I found around data analysis are going to be a real problem for any one who is looking to get reliable information from their data.
ChatGPT’s Code Interpreter features are only found in its premium subscription, but for now I think it makes most sense to be using it, especially if you’re regularly doing data analysis.