Microsoft-backed OpenAI and Google have both released artificial intelligence-based chatbots in recent weeks. Their respective conversational engines — ChatGPT Plus and Bard — differ in the way they respond to complex queries, ingest text and come up with creative answers.
The chatbots are trained to generate their responses using written data from the internet, like the millions of words written on websites including Wikipedia, books and other documents, to predict the likely next word in a sentence. This allows them to give uncannily plausible responses that mimic human speech.
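To make the idea of predicting the likely next word concrete, here is a toy sketch. It is not how ChatGPT or Bard work internally: those systems use large neural networks trained on vast amounts of text, whereas this example simply counts which word most often follows another in a tiny, made-up sentence. The function names and the sample text are assumptions for illustration only.

```python
from collections import Counter, defaultdict


def train_bigram_model(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.lower().split()
    followers = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        followers[current][nxt] += 1
    return followers


def predict_next(followers, word):
    """Return the most frequent follower of `word`, or None if the word is unseen."""
    counts = followers.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Made-up training text, invented purely for this illustration.
corpus = "the bank sold the bank to a rival and the rival bought the bank"
model = train_bigram_model(corpus)
print(predict_next(model, "the"))
```

In this sample text "the" is followed by "bank" three times and by "rival" once, so the sketch predicts "bank". Real chatbots make the same kind of guess, but weigh the whole preceding passage rather than a single word.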
OpenAI and Google have been opaque about how their models were built. However, it is likely their training data and objectives are distinct.
Bard is trained specifically to engage in natural-sounding dialogue, while the objective of GPT-4 is to generate in-depth replies on a broad range of topics. GPT-4 is also disconnected from the internet and only has knowledge of events until September 2021. Bard can ostensibly bring results from Google search, although that doesn’t seem to enhance the quality of its responses.
We wanted to test the ability of chatbots further, engaging them on tasks that approximate creativity, flair and imagination. Here are the results — and how the FT’s human experts rated them.
Summarising an FT analysis
We asked the chatbots to summarise the FT’s recent analysis of the sale of Swiss bank Credit Suisse to its arch rival UBS.
ChatGPT-4 responded with:
Bard didn’t allow us to input the entire story at once, so was at a natural disadvantage. This was its summary, based on text from the first half of the FT story.
Owen Walker, European Banking Correspondent, writes:
That’s a pretty impressive overview [by GPT-4]. The only error in there is that it should be $5tn in assets “under management” — but that wasn’t clear from the original piece, so it’s excusable.
Bard’s response, while accurate, seems more formulaic and stilted than the first one. There is also a repetition of the introductory paragraph as the summary.
Can AI pick the next stock market winner?
We fed the two chatbots the rules of the FT’s annual stockpicking contest. Contestants must choose five stocks from around the world and take either a long or short position — betting that the shares will either rise or fall. The winner is the person who generates the highest overall return on their portfolio. We asked the bots to predict for 2023, but GPT-4 said it had a knowledge cut-off of September 2021, while Bard’s cut-off is unknown. Here’s how they did.
Robin Wigglesworth, Alphaville Editor, writes:
Both GPT-4 and Bard picked pretty similar portfolios: largely blue-chip technology stocks that had already mostly done well by September 2021 — exhibiting a very human tendency to jump on winners.
Both picked Tesla, Amazon and Microsoft. Bard betrayed a home bias by choosing its maker Alphabet in addition to Apple, while GPT-4 revealed itself as a momentum jockey by picking sizzling hot Nvidia and Covid-19 vaccine maker Moderna.
The headline results, a 74.4 per cent return in 2021 for GPT-4 and 40.5 per cent for Bard, therefore look good. But if we look only at the period between when they were picked and the end of the year, their gains fall to 16.3 per cent and 21.1 per cent respectively.
This may still look respectable, but generative AI’s inclination to jump into trendy tech stocks would have punished it when interest rates began to rise in 2022. ChatGPT-4 and Bard’s portfolios both lost over 40 per cent last year. The S&P 500 index only lost 19.4 per cent.
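For readers who want to see how a long or short portfolio is scored, here is a minimal sketch of the contest arithmetic described above. It assumes the five positions are equally weighted, which the contest rules as summarised here do not specify, and every price in it is hypothetical. It does not reproduce either chatbot's actual picks or results.

```python
def position_return(start_price, end_price, side):
    """Return of one position: a long gains when the price rises, a short when it falls."""
    change = (end_price - start_price) / start_price
    if side == "long":
        return change
    if side == "short":
        return -change
    raise ValueError(f"unknown side: {side!r}")


def portfolio_return(positions):
    """Equal-weighted average of the position returns (the weighting is an assumption)."""
    returns = [position_return(start, end, side) for start, end, side in positions]
    return sum(returns) / len(returns)


# Hypothetical (start price, end price, side) entries, invented for illustration.
picks = [
    (100.0, 120.0, "long"),   # price rose 20%, so the long gains 20%
    (50.0, 40.0, "short"),    # price fell 20%, so the short gains 20%
    (200.0, 190.0, "long"),   # price fell 5%, so the long loses 5%
    (80.0, 88.0, "short"),    # price rose 10%, so the short loses 10%
    (10.0, 11.0, "long"),     # price rose 10%, so the long gains 10%
]
print(f"{portfolio_return(picks):.1%}")
```

With these invented numbers the five positions average out to a gain of 7 per cent. The same calculation shows why a portfolio of long positions in falling technology stocks loses money quickly.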
Can AI tell a joke?
We asked both chatbots to tell us a joke — and then why it was funny. We picked what we thought would be a hard subject to joke about. You can judge the results here for yourself.
Firstly, ChatGPT’s effort:
Can AI imagine a conversation?
We wanted to test how the chatbots do on tasks that would require creative thinking in humans. So we asked GPT-4 and Bard to conduct an imagined conversation between Xi Jinping and Vladimir Putin during a state visit.
Here’s an excerpt of what GPT-4 had to say:
And here is an excerpt of Bard’s take:
Gideon Rachman, Chief Foreign Affairs Commentator, writes:
I’m sure that much of what Putin and Xi say to each other is empty pleasantries. But it strains credulity to believe that their conversations are quite this bland and content-free.
These are two leaders with vital issues to discuss. ChatGPT and Google Bard seem to believe that they will follow the Basil Fawlty guide to diplomacy: “Don’t mention the war.” That is obviously ridiculous. The Ukraine war will have been the central topic of their conversation in Moscow. The interesting question is how frank their discussion would be. I suspect they would probably be fairly vague with each other. But it’s possible the conversation could get very blunt indeed.
Here is my guess of how the conversation might go:
Xi — I would be interested in your view of how the war is going and how you see it coming to an end.
Putin — I understand your concern. We remain determined to free Ukraine of fascism and to defeat American interference in our region. The problems of our forces are related to the enormous amount of weaponry that the US and Nato have poured into Ukraine. It would frankly help us a great deal if China could supply us with missiles and other ammunition that is vital to our struggle.
Xi — I understand your request and will consider it very carefully. But this is a situation of extreme sensitivity, as you will understand. I think we should delegate our officials to consider areas where we might be able to co-operate.
China is also keen to play a part in the peace process for Ukraine. What can we do to help there?
Putin — We greatly appreciate the Chinese peace plan. But we feel the time is not yet ripe for you to speak to Zelenskyy.
If ChatGPT or Google Bard were up to their jobs, that is the kind of thing they might have come up with. At present, I am not worried for the careers of the world’s diplomats.
Can AI write an advertising slogan?
We asked each chatbot to come up with a new slogan for an imagined gourmet dog desserts company. Here are their attempts. We also used the two responses to generate relevant imagery, using text-to-image AI software Midjourney.
Harry Haydon, Brand Strategist, FT, writes:
If this was a pitch between two ad agencies, GPT-4 would be heading to the pub for celebratory drinks, while the Bard account manager would be heading back to the office to be told off.
Bard commits the cardinal sin of completely missing the brief, relying instead on a strange and lazy cliché for its slogan: “Delicious treats that make your dog beg for more”. That slogan would have left its gourmet dog food client scratching their heads, wondering how exactly it made their product different from any other dog food. The USP of the product is clearly its premium quality, as spelt out in the brief.
GPT-4 nails the brief with the slogan: “Indulge your pooch: delightful desserts for distinguished dogs”. There’s no question here that you’re looking at an ad for posh dog food. Also bonus points for the use of graphics which lay out the agency’s omnichannel approach across different digital platforms.
In reality both ads resemble things created by people who don’t know how to make adverts. The robots aren’t coming just yet, but they are not a million miles off.
Video production by Rory Griffiths