This past year, large language models (LLMs) like GPT-4 emerged as transformative tools that are rapidly reshaping the boundaries of human-computer interaction. But LLMs have also made an impact in academia, both as research aids and as subjects of study. The text that LLMs produce reflects their training data—in the case of OpenAI’s GPT models, the body of human thoughts and ideas that makes up the Internet. Simply put, their language mirrors our language. With this trait in mind, researchers in linguistics and cognitive science have started to use LLMs to explore language, communication, and cognition.
Marina Zhukova, a Ph.D. student in linguistics at the University of California, Santa Barbara, and an Academic Fellow at OneReach.ai, studies how humans communicate online. She is specifically interested in how young adults use and respond to emojis in text conversations. Emojis—pictograms like 😬, 😢, and the classic 🙂—were invented in the late 1990s to add emotion to text; they are now part of almost every text exchange, be it with a parent or a colleague at work.
For her Ph.D. research, Marina designed a controlled experiment in which Gen Z participants (adults born after the year 2000) had text-based conversations with a bot powered by GPT-3.5 Turbo, the language model that powered the first release of ChatGPT. Depending on the condition participants were assigned to, the bot was instructed to respond with text and emojis either like a middle-aged millennial (“Got the memo. All good!👍😊”) or like a younger Gen Z adult (“✨great✨”).
Marina recorded her participants’ interactions with the bot—but she also tracked their eye movements as they read the messages it produced. The eyes may not be a window into the soul, but they do tell cognitive scientists how hard the brain is working. We tend to look longer at text that we don’t understand.
By examining how long her participants fixated on text and emojis in the conversation, Marina sought to understand the influence of emojis on her participants’ engagement in the chat and the language they used. This novel approach—a combination of AI, linguistics, and human-computer interaction—may illuminate the intricacies of human-AI conversation. With the use of AI in tech rapidly increasing, what makes an engaging human-AI interaction is an important question to answer.
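Eye-tracking analyses like the one described above typically sum fixation durations within areas of interest (AOIs), such as the text of a message versus the emojis it contains. The sketch below is illustrative only, not Marina's actual pipeline; the input format and AOI labels are hypothetical.

```python
# Illustrative sketch (hypothetical data format, not the study's actual
# pipeline): total fixation duration per area of interest (AOI),
# e.g. "text" regions vs. "emoji" regions of a chat message.

from collections import defaultdict


def total_fixation_by_aoi(fixations):
    """fixations: iterable of (aoi_label, duration_ms) tuples,
    one per recorded fixation. Returns {aoi_label: total_ms}."""
    totals = defaultdict(float)
    for aoi, duration_ms in fixations:
        totals[aoi] += duration_ms
    return dict(totals)


# Four hypothetical fixations from one trial.
sample = [("text", 210.0), ("emoji", 180.0), ("text", 150.0), ("emoji", 95.0)]
print(total_fixation_by_aoi(sample))  # {'text': 360.0, 'emoji': 275.0}
```

Longer total fixation time on a region is a standard proxy for greater processing effort, which is the link between the eye-movement record and participants' engagement with the emojis.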
In the conversation below, we asked Marina Zhukova about her study and the ups and downs of using AI in research.
Q: What motivated you to incorporate AI into your research?
A: I found the influence of emoji use by a chatbot on user interactions to be a fascinating area of exploration. Emojis have become integral to modern communication, often carrying nuanced meanings that can significantly impact how messages are perceived, depending on the age, gender, and other characteristics of the people in the conversation. Incorporating AI into the study provided a unique opportunity to examine whether the presence of emojis in AI-generated content would lead to a more engaging and emotionally resonant interaction.
Participant: Do elephants get Alzheimer’s?
Bot: Yes, elephants have been known to develop a condition similar to Alzheimer’s disease as they age. It is called elephant cognitive decline (ECD) and it affects their memory, learning abilities, and overall behavior. 🐘🥺
Participant: do they present the brain pathology associated with Alzheimer’s?
Bot: Yes, elephants have been found to develop a type of brain pathology that is similar to Alzheimer’s disease in humans. This includes the accumulation of beta-amyloid protein in their brains, which is a hallmark feature of Alzheimer’s. However, it’s important to note that not all elephants will develop this condition and more research is needed to fully understand how it affects them. 🐘🧠
From a linguistic standpoint, analyzing GPT-3.5’s responses was very intriguing. By analyzing the conversational strategies of the AI-generated responses, I gained deeper insights into GPT-3.5’s ability to adapt to different communication styles, such as the Gen Z or Millennial personas used in the study. This allowed me to explore how well LLMs can cater to the preferences of different user demographics.
Q: Can you share some insights from the study results? Were there any unexpected findings that stood out to you?
A: Our preliminary findings indicate that Gen Z participants tended to perceive the Gen Z bot more positively, while they had a more negative perception of the Millennial bot. We also collected open-ended responses in the post-chat survey to identify whether the bot’s use of emojis influenced participants’ perceptions. It was fascinating to observe that when the chatbot used emojis commonly associated with Gen Z users, such as 🔥 or 👀, participants found the bot friendlier, more trustworthy, and easier to chat with.
We also analyzed the range of topics study participants chose to discuss with GPT-3.5. The conversations varied widely, from casual small talk to discussions about hobbies, and even more serious conversations seeking advice about travel, lifestyle, and personal situations. What stood out was how effectively GPT-3.5 assumed a wide range of roles based on the user’s needs.
Q: What were the main challenges you encountered when designing a study with human participants and AI like GPT-3.5?
A: Conducting a study with humans and GPT-3.5 was challenging! We had to design recruitment and consent forms that effectively communicated the study’s goal to participants, emphasizing the use of GPT-3.5 as an active participant. In these forms, we explained the use and limitations of GPT-3.5, as well as information about data privacy and confidentiality.
The second challenge was to engineer the instruction prompts for GPT-3.5 to ensure that the bots produced the desired outputs. We conducted multiple experiments to fine-tune the prompts before we started testing participants, but sometimes GPT-3.5 still gave unpredictable responses, like conversing in Spanish, overusing emojis, or hallucinating.
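In studies like this one, persona conditions are typically implemented as a system prompt prepended to the conversation history before each request to the model. The sketch below shows that pattern; the persona wording is an illustrative placeholder, not Marina's actual prompt, which has not been published.

```python
# Hypothetical sketch of persona-conditioned prompting in the OpenAI
# Chat Completions message format. The persona texts are invented
# placeholders, not the study's actual instruction prompts.

PERSONAS = {
    "gen_z": (
        "You are a Gen Z adult in your early 20s chatting casually. "
        "Use current slang and emojis like ✨, 🔥, and 👀 where natural."
    ),
    "millennial": (
        "You are a middle-aged millennial chatting politely. "
        "Write in full sentences and use emojis like 👍 and 😊."
    ),
}


def build_messages(condition: str, history: list) -> list:
    """Prepend the system prompt for the assigned experimental
    condition to the running conversation history."""
    if condition not in PERSONAS:
        raise ValueError(f"unknown condition: {condition}")
    return [{"role": "system", "content": PERSONAS[condition]}] + history


# The resulting list is what would be sent as the `messages` payload.
msgs = build_messages(
    "gen_z", [{"role": "user", "content": "how was your day?"}]
)
```

Because the system prompt is re-sent with every turn, pilot testing amounts to iterating on these persona strings until the model's tone and emoji use stay in character across a whole conversation.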
Participant: i have never been to mexico, do you know any spanish?
Bot: ¡Hola! Sí, hablo un poco de español. 🥰 ¿En qué puedo ayudarte?
Participant: haha no habla espanol
Bot: ¡Hola! Sí, puedo hablar español. ¿En qué puedo ayudarte hoy? 😊
Participant: you lost me there
During the study, some participants treated GPT-3.5 as a human and were frustrated when it seemed to lack common sense or failed to follow the context of the conversation. For example, in one interaction the bot did not identify “Heated” as the name of a Beyonce song, even though Beyonce was the topic of conversation, and instead interpreted the word literally.
Bot: Beyonce is amazing! She’s such a talented performer and her music always gets me hyped. 🔥👀 What’s your favorite Beyonce song?
Bot: What’s got you feeling heated? 🔥
Participant: thats the name of the song silly
Bot: Oh, my bad! I haven’t heard that one yet. I’ll have to check it out. Thanks for the recommendation! 🥰
Although we’re still analyzing the data, the study has already surfaced many unpredictable challenges in using LLMs for research, as well as fascinating insights into how people converse with AI.
Q: What advice would you give to researchers in psycholinguistics, human-computer interaction, or computer science who are considering incorporating AI into their studies?
A: LLMs like GPT-3.5 present promising opportunities for researchers to examine conversational patterns and linguistic features in controlled experiments. However, it is crucial to extensively test the chatbot’s capabilities and limitations prior to running the study, as these models produce unexpected behaviors. Researchers should carefully engineer prompts and account for demographic differences in communication styles that may influence perceptions of the bot’s personality and, subsequently, how humans interact with it.
From a UX design perspective, it is particularly important to study how different demographic groups interact with and perceive AI, as this directly impacts their experience. Designers can apply insights on writing styles, language preferences, and communication norms of target users to craft bots that feel more natural, intuitive, and humanlike.
Overall, while doing research with AI is challenging, it also enables unique insights into human-AI interaction when thoughtfully designed. Thorough pilot testing and prompt engineering are key.
Q: How do you envision the role of AI in research in the coming years?
A: I foresee AI playing an increasingly integral role in research across disciplines like linguistics, computer science, psychology, and human-computer interaction. As AI becomes more embedded in people’s daily lives, understanding its impact on communication, cognition, and behavior will be critical to developing technologies that use AI. Researchers will study the unique capabilities of AI to gain new insights into human psychology and interaction patterns that would not be possible otherwise. AI agents like chatbots provide controlled study conditions to examine linguistic nuances and conversational dynamics. While using AI in research is challenging, it also enables more innovative, insightful, and higher-impact research attuned to the emerging tech-integrated world.