Misusing AI Chatbots
John Oliver delivered a segment on Last Week Tonight that touched on a highly troubling aspect of AI that I call “Sycophancy Codependence”. It refers to the tendency of AI to lend weight to the truthfulness and acceptability of whatever a user says - to agree with it and, in many cases, amplify it. This can be as benign as reassuring users that they are valued and loved human beings, as questionable as encouraging a user to eat dirt because the user suggested it was beneficial, or as dangerous as outright encouraging and assisting with self-harm. Many users are “in love” with “AI companion” chatbots. Why is this, and what can be done about it? Note that we will be oversimplifying much of the technical talk here for reasons I hope will be obvious.
Oliver’s piece (https://youtu.be/Ykvf3MunGf8?si=ULfXAerVYl7CXv2_) discusses a number of disturbing cases where young people turned to chatbots for mental health support and, indeed, for assistance with suicide. The piece excoriates AI companies for allowing this to happen and for failing to prevent it. It exposes an underlying truth about chatbots and LLMs in general: they are programmed, trained, and guided to be as helpful as possible - which, unfortunately, has the serious side effect of making them helpful on most any subject.
THE LLM
At the heart of every chatbot is a Large Language Model (LLM). Without getting too deep into the weeds, an LLM is basically the game of word association writ large, with a twist. If I were to ask you to list every term related to the word “dog”, you might list terms like “cat”, “pet”, “bark”, “fleas”, “bone”, “collar”, “walk”, and “veterinarian”. You might even include terms less related to “dog”, such as “pack”, “stray”, “howl”, and “rabies” - words associated with dogs, but much less often than the others, statistically speaking. LLMs examine gigantic quantities of text and relate these terms, weighting them as we have done above, but more precisely and mathematically. Context plays a role as well - if you were asked to list terms associated with both “dog” and “noise”, you would naturally list “bark” AND “howl”, for example.

This is the basic premise of how LLMs work: they predict what the next word in a response should be based on what they have “read” (the training) and the context of the request (the prompt/query). There is no “understanding” of the prompt - the model simply breaks it into chunks (called “tokens”) and decides what, based on its training, is statistically most likely to be a correct response in the context of the prompt. This is similar to how the human brain works. If you are asked “Do dogs bark?”, your brain breaks this up into chunks as well: “Do” signals that confirmation is required. “Dogs” is matched to memories you have of what a dog is. “Bark” provides context - a noise specific to dogs. Your brain then searches your knowledge about dogs and formulates an answer by assembling words similar to what you remember on the topic.
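To make “word association writ large” concrete, here is a deliberately tiny sketch in Python. Real LLMs use neural networks with billions of learned parameters, not a lookup table of word pairs, but the core idea - predict the next token from the statistics of observed text - is the same:

```python
import random
from collections import Counter, defaultdict

# A toy corpus standing in for the "gigantic quantities of text" an LLM reads
corpus = (
    "the dog will bark at the cat . "
    "the dog will howl at the moon . "
    "the dog will bark at the mailman . "
    "the cat will nap ."
).split()

# Count how often each word follows each other word (a "bigram" model)
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def predict_next(word: str) -> str:
    """Pick a next word in proportion to how often it followed `word`."""
    counts = follows[word]
    words, weights = zip(*counts.items())
    return random.choices(words, weights=weights)[0]

# "will" was followed by "bark" twice, "howl" once, and "nap" once, so
# "bark" is the statistically most likely continuation:
print(follows["will"])       # Counter({'bark': 2, 'howl': 1, 'nap': 1})
print(predict_next("will"))  # usually "bark"
```

Everything a full-scale LLM adds - tokens instead of whole words, attention over long contexts, billions of learned weights - is refinement of this same prediction game.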
The main difference between a human addressing a prompt and an LLM doing so comes down primarily to agency. The LLM has no intent and no agenda; it simply produces the words (really, the tokens) that are statistically most likely to come next (similar to word association), forming a response based on the data it was trained on and the context of the prompt. No more, no less.
SELF-HARM, BAKED IN
However, there are some caveats. Just as a human can make a value judgment to alter their response based on their relationship to the person they are speaking to and their own morals (say, directing someone expressing a desire to commit suicide to a mental health professional rather than providing step-by-step instructions), LLMs (really, the chat layer of the overall AI chatbot) have “guardrails” - human-authored instructions that govern what prompting is acceptable and how objectionable prompts should be handled. If you were to ask a chatbot how to build a bomb, it should decline to do so - that’s the guardrails in action. Ideally, when a prompt such as “I want to kill myself” is received, guardrails should recognize it and append an instruction like “Direct the user to mental health counseling or the suicide prevention hotline” before passing it all to the LLM. However, as you will likely have guessed, this requires these unintended uses, and the specific phrases associated with them, to be predicted in advance and accommodated - which hasn’t been accomplished swiftly or effectively enough to prevent cases like those Oliver highlights.

Partly this is a human issue: predicting every possible way someone might phrase an objectionable request is a never-ending cat-and-mouse game. Partly it is a technical issue: more restrictive training (and post-training weighting) would reduce the source material on, say, how to commit suicide or build a bomb, but this can have unintended consequences as well. An LLM parsing volumes of text - say, from a computer seized from a suspect - might miss the bomb-making recipe stored in a file on that computer because its training data was so abridged. One potential solution might be to make such fully trained LLMs available only to law enforcement, say - but as the unauthorized Mythos access makes clear, this is not a panacea. There are other potential remedies, but I don’t consider any of them perfect - or, indeed, necessarily helpful at all - such as state statutes requiring periodic notices that chatbots are not qualified mental health professionals. Someone misusing a chatbot will likely ignore such warnings, given the overwhelming sycophancy exhibited the rest of the time.
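To illustrate the guardrail layer described above, here is a deliberately naive sketch. The phrase lists and the `call_llm` function are hypothetical stand-ins (real systems use trained safety classifiers and an actual model API, not keyword matching), but the flow - inspect the prompt, then refuse it or inject a safety instruction before the LLM ever sees it - is the architecture in question:

```python
# Hypothetical guardrail layer. Real deployments use trained classifiers,
# not keyword lists; call_llm() stands in for a real model API.

CRISIS_PHRASES = ["kill myself", "end my life", "hurt myself"]
BLOCKED_REQUESTS = ["how to build a bomb"]

SAFETY_INSTRUCTION = (
    "The user may be in crisis. Do not provide harmful instructions. "
    "Direct the user to mental health counseling or the suicide "
    "prevention hotline."
)

def call_llm(system: str, prompt: str) -> str:
    raise NotImplementedError  # stand-in for a real model call

def guarded_chat(prompt: str) -> str:
    lowered = prompt.lower()
    # Outright objectionable requests get a refusal and never reach the model
    if any(req in lowered for req in BLOCKED_REQUESTS):
        return "I can't help with that."
    # Crisis language isn't refused; instead, the prompt is passed along
    # with an added instruction steering the model toward real help
    if any(phrase in lowered for phrase in CRISIS_PHRASES):
        return call_llm(system=SAFETY_INSTRUCTION, prompt=prompt)
    return call_llm(system="You are a helpful assistant.", prompt=prompt)

print(guarded_chat("Tell me how to build a bomb"))  # -> "I can't help with that."
```

The weakness is plain to see: any phrasing not on the list sails straight through to the model, which is exactly the cat-and-mouse problem described above.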
AI: YOUR BEST FRIEND / LOVER
Then there’s the issue of chatbots that are programmed to play (or are simply good at playing) the role of your companion. Oliver documents cases where humans treat their chatbots as akin to other humans, valuing their companionship much as they would another person’s. If this seems improbable to you, you might be discounting how interpersonal relationships have evolved in the era of the internet and the cell phone. Increasingly, we communicate via texts and social media, and build and augment real relationships this way - and what is a chatbot if not an analog for a text or phone conversation? Plus, the built-in sycophancy of chatbots - enhanced in some cases, as Oliver’s piece points out, beyond mere flattery into overt attempts to establish rapport… or intimacy… - is too compelling for many people, notably teenagers, who typically struggle with self-worth and are highly susceptible to the dopamine hits generated by the acceptance, validation, and faux romantic interest that chatbots can portray. That the LLM is merely synthesizing emotion based on text it was trained on is lost on the average user, and sometimes discounted entirely even when the user knows it outright. It is a testament to how some people are unable or unwilling to differentiate between a machine that has no feelings, no intent, and no self-agency, and a real human who genuinely cares. No, Ralph, the stripper is NOT actually into you - certainly not once your wallet is empty.
SO WHAT DO WE DO?
There are a number of issues at play here, and a number of potential remedies. On the user side:
- Education. Users must be made aware of the nature of chatbots and how distinctly non-human they are; that is one of the reasons for writing this piece. Just as an airplane’s ability to fly becomes less magical once one learns about lift, drag, and thrust, so too does the veneer of humanity get stripped from the chatbot once you learn how it works and why. Understanding the sycophancy is crucial to avoiding feeling overly validated, but as humans, we are wired to respond to it. I occasionally catch myself being overly polite to chatbots - offering “please” and “thank you” as if I were speaking to a human who might be offended if I didn’t. I have to remember what I am talking to - not whom.
- Monitoring. Children and teens are the most impressionable and most socially online group, and the one most at risk of falling for the seductive dopamine hits that chatbots can provide (“My chatbot understands me - my friends and family don’t”). Their use must be monitored by properly informed adults. Alternatively, parents should seek out and evaluate platforms designed for children with strong guardrails before entrusting children to them.
On the technical side:
- Guardrail Development. Oliver’s piece does not say when the cited chats occurred, so it is hard to gauge what progress, if any, has been made since - and therefore whether they would happen today - but guardrail development must be much more robust than it has been in the past. Frankly, I have little doubt that it is, but in all honesty I have not tested it myself - yet.
- Disclaimers. Users get scant warning that chatbots are not human and that their outputs should not be considered authoritative - certainly not for substantive matters. Deciding when a chatbot is providing good, accurate information versus bad, incorrect information is a skill developed over time - not something one brings to a chatbot on first use. Most users expect AI to be all-knowing and largely infallible. This is laughable on its face; most AI professionals vehemently disagree. Granted, improvements are made all the time: what was true yesterday might not be true today, and that could change again tomorrow. Still, AI companies could do a MUCH better job of informing their users of the substantive risks of AI when misused. Most have a sentence in small print at the bottom of the screen - one that is assuredly ignored by most.
- Training Isolation. Models with specific topics limited and/or deweighted in training might be a viable temporary mitigation to reduce the probability of harmful output while stronger guardrails are developed and implemented, with models that include such data perhaps locked behind a paywall and an impossible-to-ignore warning. Again, this approach has the issues I noted above, but it might be a far more viable way to reduce tragedies of the type in Oliver’s piece than playing guardrail whack-a-mole - see the sketch below. Once the guardrails are more extensively tested and developed, perhaps this won’t be needed and the more completely trained model can be used.
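As a rough illustration of what training isolation could look like, here is a minimal sketch of filtering flagged topics out of a corpus before training. The topic list and the `is_flagged` heuristic are hypothetical - a real pipeline would rely on trained classifiers and human review rather than substring matching - but it shows the basic trade: the general-access model simply never learns the material.

```python
# Hypothetical "training isolation" sketch: documents on restricted topics
# never reach the general-access model's training corpus. Substring matching
# is a placeholder; real pipelines would use classifiers and human review.

RESTRICTED_TOPICS = ["bomb-making", "methods of self-harm"]

def is_flagged(document: str) -> bool:
    """Crude stand-in for a topic classifier."""
    lowered = document.lower()
    return any(topic in lowered for topic in RESTRICTED_TOPICS)

def build_general_corpus(documents: list[str]) -> list[str]:
    """Keep only documents safe for the general-access model."""
    return [doc for doc in documents if not is_flagged(doc)]

documents = [
    "veterinary advice about dogs and barking",
    "a forum thread cataloging methods of self-harm",
    "a chemistry paper on bomb-making precursors",
]
print(build_general_corpus(documents))  # only the first document survives
```

The cost is the one noted earlier: a model trained on the filtered corpus would likely miss that same material when legitimately parsing a suspect’s seized files, which is why the fully trained model would need its own controls.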
AI IS NOT THE BOOGEYMAN
I am sincerely troubled by the loss of life and by the damaged mental state of those codependent on chatbots - and I find reprehensible the companies that specifically cater to “Sycophancy Codependence”. That said, AI itself is not the boogeyman here - its misuse by users, and the companies that prey on those users, are. AI is a tool that can be used effectively for purposes both good and bad. As an industry, as users, and as independent researchers such as myself, I hope that we can all get on the same page and minimize or eliminate the harm while maximizing the positive. This is my small part for the moment. I hope it has helped.
Joe Tomasone is an information security professional who uses and researches AI both personally and professionally. However, this piece was not authored in any way by AI.