The Problem with Generative AI
ChatGPT is brilliant and flawed - like a human.
RESPONSIBLE AI
11/3/2023 · 2 min read
I’ve been experimenting with generative AI and my TL;DR conclusion is that I love it. My artistic ability is limited to time-consuming and amateurish pencil drawings. Suddenly, with GenAI, I can add illustrations to my technical papers that convey an idea in a coherent artistic style. Others have said that GenAI helps them turn poorly structured notes into a coherent business document. Personal blind spots are filled. So what’s the problem?
The problem is that GenAI is limited both technically and fundamentally. ChatGPT is relatively cheap, which means there’s little cost to every business experimenting with it; the barrier to democratising GenAI is low. Through experimentation, people will uncover many use cases that GenAI fulfils brilliantly. However, there are some key technical limitations that aren’t well understood. Here are my top four, along with the “so what?”.
1. Pre-trained models like ChatGPT are open in the sense that everyone’s data is used to train them. (They’re emphatically not open in the open-source, free-to-use sense.) That means there are no safeguards against data leakage. What you enter may be used to train the model further and be surfaced to a future user, and once you share your prompt data, there’s no way of retracting it. So don’t use sensitive information like financial, personal, or commercial data in your prompts; at a minimum, redact it first, as in the sketch after this list.
2. The output is a single, descriptive answer. It’s very good, but it doesn’t describe how the AI made its decision. Google Bard tries to address this by adding citations, but that still falls short of explaining a decision-making process. So, fundamentally, you can’t justify its decisions.
3. The training data is essentially anything accessible on the internet; at the last count, somewhere between 5 and 50 billion pages. There’s clearly no human way of controlling what goes into the training data, and no way of excluding content from it, especially after the model has been trained. So there’s no way of knowing whether protected intellectual property or biased data has informed the output.
4. Whilst opinions vary, it’s becoming clearer that the output from GenAI cannot be copyrighted. Lawyers have opined that GenAI output is derivative, produced with minimal human input. At the very least, that opinion will have to be tested in courts of law across the globe. So don’t use GenAI to create anything that you aren’t prepared to share for free and without attribution.
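To make point 1 concrete, here’s a minimal, hypothetical sketch of the kind of redaction step you could run before a prompt ever leaves your machine. The patterns and labels are my own illustrative assumptions, not any vendor’s API, and a real deployment would use a dedicated PII-detection tool rather than a handful of regexes.

```python
import re

# Illustrative patterns for a few common kinds of sensitive data.
# These are assumptions for the sketch, not an exhaustive or robust set.
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "CARD_NUMBER": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "UK_PHONE": re.compile(r"\b(?:\+44|0)\d{9,10}\b"),
}

def redact(prompt: str) -> str:
    """Replace anything matching a sensitive-data pattern with a
    placeholder before the prompt is sent to any GenAI service."""
    for label, pattern in PATTERNS.items():
        prompt = pattern.sub(f"[{label}]", prompt)
    return prompt

if __name__ == "__main__":
    raw = "Chase the invoice with jane.doe@example.com, card 4111 1111 1111 1111."
    print(redact(raw))
    # -> Chase the invoice with [EMAIL], card [CARD_NUMBER].
```

The point isn’t the regexes themselves; it’s that the filtering happens on your side, before the data is shared and becomes irretrievable.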
These are technical issues that will be resolved through engineering creativity. However, there’s a more fundamental issue that cannot be: GenAI is already too much like us humans.
GenAI models are designed to create output that will be accepted, output that meets the user’s expectation of what the right answer should look like, without any consideration of whether it is actually right. Because these answers are so syntactically compelling, users assume they are semantically right too: the answer looks right, so it must be right. This assumption is fundamentally faulty and preys on our affinity bias.
But humans are like that too. We find it difficult to communicate, or even define, the truth at all times to all people (which is why both philosophy and politics exist). But we find it much easier to act on advice that we want to hear.
Perhaps the lesson, then, is that when adopting GenAI professionally we should treat its output like professional advice from people: with healthy critical evaluation, underpinned by ethics and enforced by law.