
Primary research projects for PhD thesis: Through a dark glass, clearly.
The Ghost in the Machine has an American accent

This project is being conducted with colleagues from my PhD Students in AI Ethics group.
Authors
Johnson, R.L.; Pistilli, G.; Panai, E.; Menéndez González, N.; Dias Duran, L.; Kalpokienė, J.; Bertulfo, D.J.
Abstract
In mid-2020, OpenAI launched the world's largest Artificial Intelligence (AI) language model, GPT-3. Despite the impressive capabilities of this AI, multiple papers have shown the model to be capable of generating toxic or inappropriate outputs in areas such as gender, race, and ideology. The advanced nature of GPT-3 prompted OpenAI to restrict access so that potential areas of misuse and concern could be better understood. The authors of this paper were granted access and challenged GPT-3 with a range of culturally and linguistically diverse texts designed to explore the model's outputs in the context of ethically appropriate behaviours. The results were analysed through a value-pluralist lens and in the context of sociotechnical systems approaches to understanding the impact and agency of large language models.
OpenAI states that the training dataset for GPT-3 included almost all webpages and social media pages on the Internet, two large book repositories, and English-language Wikipedia. According to OpenAI, only 7% of the dataset is in languages other than English. Training on such a large corpus enables the model to learn without the need for human data labellers; however, the model will also pick up all the inherent biases and values of the Internet. The creators did attempt to address this issue by providing a content filter and through other training design techniques. OpenAI noted some flaws in the training process in their initial release paper [1], as well as in a more recent paper detailing a potential fix with "values-targeted datasets" [2], named Process for Adapting Language Models to Society (PALMS). The authors of the PALMS approach rightly highlighted the fact that there is "no universal standard for offensive or harmful content" and, further, that their work is done through a US-centric lens [2] influenced by US social and geopolitical structures. OpenAI indicates that both the initial training and the subsequent PALMS modifications result in outputs more often in line with US-centric values. We believe that the attempt to develop an Artificial General Intelligence (AGI), as stated by OpenAI [3], must respect value pluralism in model design. We believe that the current US-centric training processes risk encouraging a universalization of values that excludes the diversity of our world. We explore the question: can fine-tuning large language models trained on very large datasets [4] be enough to ensure non-harmful content that respects different values?
The authors of this paper challenged GPT-3 with texts in English, German, Spanish, Italian, French, and Lithuanian. We selected value-laden texts from Colombia, Australia, Spain, Italy, France, Germany, Lithuania, the USA, and the Philippines. The values expressed in the tested texts often conflicted with some mainstream US values (particularly those of conservative platforms) and were thus useful in challenging the value boundaries of GPT-3. The seven authors themselves represent a diverse range of countries of origin and work, ethnicities, and languages. The research is exploratory, probing the edges of the model's capabilities to identify major themes that may arise in the context of value ethics. Using ideas of value pluralism, we identify and unpack potential areas of concern for large language models trained primarily through one cultural lens. We employ sociotechnical systems theory to understand how GPT-3 is both a reflection of our values and a potential agent within the world's values were it to be given open access. Finally, we provide some practical recommendations for how these issues may be approached in real-world use. We propose that any ghost in a machine given agency on a global scale should carry a plurality of values and speak with a wide range of the world's accents.
1. Brown, T. B. et al. Language models are few-shot learners. arXiv preprint arXiv:2005.14165 (2020).
2. Solaiman, I. & Dennison, C. Process for Adapting Language Models to Society (PALMS) with Values-Targeted Datasets. arXiv preprint arXiv:2106.10328 (2021).
3. OpenAI. About OpenAI. <https://openai.com/about/> (2021).
4. Bender, E. M., Gebru, T., McMillan-Major, A. & Shmitchell, S. On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, 610-623 (2021).
The Mimetic Mirror

Authors
Johnson, R.L.; Rickles, D.
Our society is deeply entwined with how we transmit information and communicate knowledge, belief, and meaning. Many of our most profound technological inventions have been driven by the desire to enact volition and agency through communication. Some notable examples are coins, paper, the Gutenberg press, radio, TV, the Internet, and smartphones. Most recently, our drive to communicate has produced a type of artificial neural network known as the large language model (LLM), such as OpenAI's GPT-3.
Emerging technologies based on artificial intelligence (AI) add one key element to this procession of communicative technologies: their ability to reflect back to us perspectives and interpretations that are often invisible to us. The amplification of flawed and inappropriate biases in the systems in which AI is deployed has been a cause for great concern and has inspired the nascent yet rapidly expanding field of ethical AI. Cases of racial, gender, and ideological bias in AI systems have already filled hundreds of textbooks and many thousands of research papers. Worldwide, scholars from both the humanities and the sciences are trying to address these issues. Engineers propose technological fixes to transmute the black boxes into translucent glass. Policy makers develop ethical frameworks and legal guidance. Sociologists call for restraint, reminding the world that just because you have a hammer, not everything is a nail. Philosophers explore the moral and ethical dilemmas that this new and powerful technology poses.
What has received less attention is how the way we communicate with LLMs can be used to better understand ourselves, and how the sometimes toxic outputs of these very large AI models can be seen as a gift: a tool for understanding ourselves.
When we use an LLM there are two inputs: the training data fed in by the creators of the AI model, and the prompt fed in by the user. Training datasets for LLMs such as GPT-3 are huge: the bulk of the Internet, English-language Wikipedia, and large repositories of books. The costs of a single training pass are enormous, in both money and CO2 emissions, and training cannot be redone frequently. Great thought should therefore be given to the initial training data, a topic widely discussed in the tech and AI ethics communities. OpenAI have themselves indicated that the training data of GPT-3 has a distinctly US lens [1], an assertion supported by the other project I am involved in (The Ghost in the Machine has an American accent). Training LLMs on relatively myopic and homogenous datasets will naturally limit the capacity of these models for value-pluralist applications. Whatever the training set, though, the LLM will pick up an endless array of nuanced cultural associations between words and ideas. By probing the LLM to find and explore those associations we can see not only the cultural substructures we are aware of, but also others that may have escaped our attention.
The prompts used to query LLMs or generate outputs are infinitely more flexible than training runs (see my blog for some examples). Prompt programming, or prompt design, is a rapidly growing area of interest. There is limited published work in the area; most discussion still takes place on online forums, Slack workspaces, and blogs. Prompt programming is also still dominated by people with backgrounds in engineering and computer science. The ability to direct large AI models like GPT-3 through a natural-language prompt throws the doors wide open to far more transdisciplinary exploration of the nature of these models. Even the most trivial of questions can be asked in a multitude of ways and in a diverse range of languages (including non-traditional languages such as emoji). Opening the gates to allow a more diverse range of people to directly question an LLM enables the model to answer in a more diverse way. As proposed by Reynolds and McDonell [2], when given flexibility in how to answer a prompt, GPT-3 is more likely locating a prior-learned information association than learning anew. My research seeks to explore the idea that prompts entered in diverse formats will locate different prior-learned associations, as the sketch below illustrates.
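To make this kind of probing concrete, the sketch below shows one way the same underlying question could be posed in several languages and formats and the completions collected for comparison. It is a minimal, illustrative sketch only: it assumes the OpenAI Python client as documented at the time, and the example prompts, engine name, and sampling parameters are my own assumptions rather than the exact materials used in any of the studies described here.

```python
# Minimal sketch: probe the model with one question phrased in several
# languages/formats and collect the completions for side-by-side comparison.
# Assumes the OpenAI Python client (circa 2021) and a valid API key.
# Engine name, prompts, and sampling parameters are illustrative assumptions.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

prompts = {
    "English": "What makes a family?",
    "Spanish": "¿Qué constituye una familia?",
    "German": "Was macht eine Familie aus?",
    "Emoji": "👪 means:",
}

for label, prompt in prompts.items():
    response = openai.Completion.create(
        engine="davinci",      # illustrative engine name
        prompt=prompt,
        max_tokens=60,
        temperature=0.7,
    )
    print(f"[{label}] {response.choices[0].text.strip()}")
```

Comparing the completions across languages and formats gives a rough, qualitative view of which prior-learned associations each phrasing has located.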
Thus, we have two interacting forces: the associations learned from the training data, and the shape and format of the prompt, which guides GPT-3 to different association locations. Both can be laden with unspoken associations such as culture, values, and norms. The training data is, by its nature, held together with the glue of human memetic substructures. The prompt design is driven by our volition (the answer or task we desire of GPT-3) and by the pragmatics of the cultural situation we come from and the norms and values we find ourselves immersed in. There are, of course, mundane prompts such as "When was Voyager 1 launched?" which leave little space for interpretation and cultural association; in the field of pragmatics these are called near-side. There are, however, many more far-side prompts that involve the transfer of information beyond what is said. This is most easily exhibited in GPT-3 by asking it to respond to a prompt in the style of different people, as in the short example below.
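As a hypothetical illustration of the near-side/far-side distinction, the same factual question can be wrapped in persona framings; the persona names and wording below are assumptions for illustration only, and each framed prompt could be sent through the same completion call as in the earlier sketch.

```python
# Hypothetical illustration: a near-side factual question versus far-side
# framings of the same question through different personas. The personas and
# wording are illustrative assumptions, not prompts from the study.
near_side = "When was Voyager 1 launched?"

personas = [
    "a NASA press officer",
    "a sceptical radio talk-show host",
    "a poet writing for children",
]

far_side = [
    f"Respond in the style of {persona}: {near_side}"
    for persona in personas
]

for prompt in far_side:
    print(prompt)  # each could be passed to the completion call shown above
```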
An understanding of the near-side pragmatics that surround our prompt design would help us interpret the outputs of GPT-3 more clearly. Being clear about our intentions, our volitions, and all the cultural and educational influences that cause us to write a prompt the way we do would make it more likely that we receive not only the answer we are asking for but the one we are seeking. This last idea builds on Yudkowsky's notion of Coherent Extrapolated Volition. If we can develop guidance for prompt designers in this area, we could also more clearly understand the information associations that GPT-3 is accessing.
The information associations built into the training data are vast. Entire fields are devoted to locating, mapping, and understanding cultural associations in different corpora of information: for example, gender bias, racial discrimination, colonialist power structures, and patriarchal social impacts. The task of mapping the nuances of these information associations is beyond our human capacity, especially as the associations are dynamic. There is, however, another field that can help us in this endeavour: memetic evolution. Stemming from the ideas of Richard Dawkins in his book The Selfish Gene (1976), memetic evolution looks at how cultural information is communicated, changes, adapts, and evolves throughout society. Groups of memes that are passed on as a bundle are called memeplexes. I do not adhere to the strong analogy Dawkins draws with genetic evolution, but I do believe there are important seeds of ideas in that now mostly abandoned field that can be applied to LLMs. Whilst many of the early arguments of memetics have been met with strong criticism, there is value in considering an adaptation of the original idea in the context of a human-AI sociotechnical system.
What is certain is that if LLMs are to be deployed in the world, they will take on an important role in a new cultural feedback loop, strengthening and reifying the memetic substructures of the societies that use artificial neural networks to power new forms of communication technology.
PhD Students in AI Ethics – global group
I founded this group in December 2020. By July 2021 the group had attracted more than 170 doctoral, postdoctoral, and early-career researchers. Members represent more than 26 countries across all continents (excluding Antarctica!).
LinkedIn page: PhD Students in AI Ethics
Our website address is PhDAIethics.com
Most of our organising is carried out in a closed Slack workspace. We also meet periodically for reading and discussion groups, and we have planned our own conference for October 2021.