Proteus Effect – How an Avatar Influences the User

The relationship between humans and technology has long been a subject of philosophical interest. Over the years, a number of theories have emerged that attempt to explain the reciprocal influence of humans on technology and of technology on individuals or entire societies. Although the debate between determinists (who claim that technology shapes humans) and constructivists (who argue that humans shape technology) will likely never be resolved, this article examines the Proteus effect, a phenomenon that arguably sits closer to the determinist side of that debate.

What is the Proteus effect?

The Proteus effect is a phenomenon first described by Yee and Bailenson in 2007. It is named after the Greek god Proteus, who could change his appearance at will and was said to use this power to conceal his knowledge of past and future events. Yee and Bailenson noted that people using virtual avatars change their behaviour based on the observed traits of the characters they play in the virtual world. The researchers argue that players infer from the appearance and characteristics of their avatars how they should adjust their behaviour and overall attitude to meet the expectations set by their virtual representation. There are also grounds to believe that this effect can extend beyond digital worlds and influence behaviour and attitudes in the real world [1].

Proteus Effect – Example of Occurrence

To illustrate how the Proteus effect works with a real-world example, I will refer to a study in which the authors investigated its presence during matches played with various characters in the popular MOBA game League of Legends. Players are divided into two teams of five, which then do battle on a map. Before starting, each player must choose a so-called champion; League of Legends offers over 140 of them [2], each with a distinct appearance and set of abilities. The authors of this study analysed how players communicate with one another depending on the champion they play.

The presence of the Proteus effect was measured through the game’s chat. The researchers defined indicators such as vocality (“acting more vocal”), toxic behaviour (“acting more toxic”), and positive or negative valence, where valence is a form of sentiment analysis aimed at capturing a player’s emotional state. The analysis confirmed the presence of the Proteus effect, though not for every champion or type of champion; it showed up primarily in the valence and toxicity of speech. The study’s most significant finding was demonstrating that the way players communicated via chat did indeed change with the champion they selected: depending on the chosen character, a player did not necessarily speak more or less, but could exhibit more toxic behaviour and be in a worse mood [3].
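
To give a feel for what a valence indicator can look like in practice, here is a minimal sketch of sentiment scoring over chat logs. This is not the cited study's method; it uses the off-the-shelf VADER analyser, and the champion names and messages are invented:

```python
# pip install vaderSentiment
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

# Invented chat lines, grouped by the champion the writer was playing.
chat_by_champion = {
    "Champion A": ["gg wp everyone", "nice gank, well played"],
    "Champion B": ["report our jungler", "this team is useless"],
}

analyzer = SentimentIntensityAnalyzer()

for champion, messages in chat_by_champion.items():
    # VADER's 'compound' score lies in [-1, 1]: below 0 is negative valence.
    scores = [analyzer.polarity_scores(m)["compound"] for m in messages]
    print(f"{champion}: mean valence = {sum(scores) / len(scores):+.2f}")
```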

Utilising the effect

The Proteus effect is a phenomenon that draws particular attention to the relationship between people and virtual worlds. It clearly demonstrates that technology, in one way or another, exerts a direct influence on us, even altering our behaviour. Some researchers have explored whether this effect can be put to practical use, for example in supporting certain kinds of work. Let’s delve into their studies.

Impact on strength

A group of five German researchers hypothesised that a suitably matched avatar would lead the person controlling it to perform tasks better than if they embodied a nondescript character or themselves. Specifically, they investigated whether an avatar whose appearance suggests greater strength than the participant’s own would lead that participant to exert more effort in physical exercises. In addition to tracking the movements of participants wearing VR equipment, the researchers also measured grip strength.

During the study, participants were assigned avatars matching their gender and were put through a series of physical tasks, such as lifting weights of varying heaviness and squeezing their hand as hard as possible for five seconds. Based on the results, the authors conclude that the study cannot be considered representative: no increase in grip strength was observed in women, though such an increase was evident in men. It can therefore be partially inferred that a more muscular avatar may influence the strength men exert [4].

Stimulating creativity 

The following study examined whether an avatar, as an individual’s representation in the virtual world, can stimulate creativity. As part of the study, creativity sessions were organised during which participants brainstormed while embodying a particular character. Beforehand, the researchers selected several avatars perceived as creative (depicting inventors) and several perceived as neutral. Participants were divided into three groups: a control group brainstorming in the real world, a group using neutral avatars, and a group using the creative inventor avatars. All groups held their sessions in the same rooms: the control group gathered around a round table, while the avatar groups worked in separate cubicles in the same room, seated at a round table recreated in virtual reality.

Figure 2. On the left, the virtual space with a round table and workstations recreated in virtual reality; on the right, its real-world counterpart. [5]

The researchers avoided any contact between participants in the avatar groups before and after the main part of the brainstorming session; the subjects never met each other outside the experiment. A key finding, particularly relevant for the future of remote collaboration, is that the groups using non-creative avatars achieved the same results as those sitting at the table in the real world. The most important result, however, is the demonstration that individuals embodying an inventor avatar consistently achieved better results on every creativity indicator used in the experiment [5].

Assistance in improving communication

Another study explored the potential for training effective communication skills among physicians at the preoperative stage. Communication with patients can be ineffective, partly because doctors may use jargon or phrases from their professional environment. The study used two virtual reality experiences in which participants played the role of a patient, which enabled the researchers to record the impressions and reactions the subjects experienced.

During the experiment, participants experienced a negative or a positive communication style in a scenario where they were about to undergo surgery. Interviews conducted at the next research stage revealed that participants recognised the importance of good communication skills, and overall they adjusted their communication style in their subsequent work. Virtual reality, in which participants embodied a patient in one of the two experiences, proved effective in providing a fully immersive experience; as participants stated, they felt as if they were the patient. It can further be concluded from this study that the Proteus effect is also useful for educational purposes, for improving communication, and for increasing empathy towards others [6].

Summary

In the face of continuous technological development, we keep discovering new phenomena that can shape our future approach to technology. The Proteus effect demonstrates that technology’s impact on us can be much more direct than we tend to assume. Although this phenomenon is largely harmless, it shows how strongly our virtual representation can influence us. People have already begun exploring applications of this effect in various areas, such as enhancing perceived physical strength, supporting creative processes, and improving communication skills. Whether the Proteus effect will become a permanent aspect of our daily lives remains to be seen. It is also worth noting that Microsoft has begun organising international conferences in virtual reality in which attendees participate as avatars, and that Polish entrepreneur Gryń, former owner of Codewise, has established a company in London that scans people for such purposes. At BFirst.Tech, leveraging its expertise in Data Architecture & Management, specifically through its Artificial Intelligence Adaptations product, a project has been completed for the Rehasport clinic network that enables surgeries to be conducted in augmented reality (AR).

References

[1] The Proteus Effect: The Effect of Transformed Self‐Representation on Behavior: https://academic.oup.com/hcr/article-abstract/33/3/271/4210718?redirectedFrom=fulltext&login=false 

[2] Number based on description at: https://www.leagueoflegends.com/en-us/champions/ (accessed 23 June 2024) 

[3] Do players communicate differently depending on the champion played? Exploring the Proteus effect in League of Legends: https://www.sciencedirect.com/science/article/abs/pii/S0040162522000889

[4] Flexing Muscles in Virtual Reality: Effects of Avatars’ Muscular Appearance on Physical Performance: https://www.academia.edu/77237473/Flexing_Muscles_in_Virtual_Reality_Effects_of_Avatars_Muscular_Appearance_on_Physical_Performance 

[5] Avatar-mediated creativity: When embodying inventors makes engineers more creative: https://www.sciencedirect.com/science/article/pii/S0747563216301856 

[6] Patient-embodied virtual reality as a learning tool for therapeutic communication skills among anaesthesiologists: A phenomenological study: https://www.sciencedirect.com/science/article/pii/S0738399123001696 

Problems in historical data and coded bias

Prater & Borden

In 2014, eighteen-year-old Brisha Borden was charged with theft of property worth eighty dollars after she decided to ride a child’s bicycle that had been left abandoned and unsecured. Borden had committed lesser offences in the past as a juvenile.

A year earlier, forty-one-year-old Vernon Prater was caught stealing tools worth a total of $86.35 from a shop. Prater had already been charged with armed robbery, for which he received a five-year prison sentence, as well as with attempted armed robbery.

In the USA at the time, a risk-prediction system was used to assess whether a person would commit further crimes in the future. The system produced a score from 1 to 10, where a higher value meant a higher predicted risk of future offences. Borden, a black teenager, was given a high risk score of 8; Prater, a white adult man, a low risk score of 3. Two years later, Brisha Borden had committed no crime, while Vernon Prater was serving an eight-year prison sentence after breaking into a warehouse and stealing electronics worth several thousand dollars. [1]

Hidden data

Automated machine learning and big data systems play an ever larger role in our daily lives, from algorithms suggesting a series for the user to watch to ones that decide the terms of a mortgage. The moment an algorithm decides an issue this important for a human being, however, dangers begin to emerge. Can we even trust such systems to make important decisions? Computer algorithms give a sense of impartiality and objectivity, but is this really the case?

In a nutshell, machine learning algorithms “learn” to make decisions based on the data they are given. Regardless of the learning method, be it simple decision trees or more sophisticated artificial neural networks, the algorithm is by design meant to extract patterns hidden in the data. The algorithm will therefore only be as objective as its training data. While one might agree that, for example, medical or weather data are objective, since the expected results are not the product of human decisions, decisions about granting credit or employment were historically made by people. Naturally, people are not fully objective: they are guided by a certain worldview and, unfortunately, also by prejudices. These biases find their way into the data in more or less direct ways.

The preparation of data suitable for training machine learning algorithms is a very broad topic; a discussion of possible solutions deserves a separate article.

In this case, since we do not want the algorithm to base its decisions on gender, age or skin colour, can we not simply withhold that data? This naive approach, while seemingly logical, has one big loophole: information about these sensitive attributes can be (and probably is) encoded in other, seemingly unrelated features.
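
To make the loophole concrete, here is a small sketch on invented data: a model trained without the sensitive column can still produce group-disparate decisions when a correlated proxy, such as a postcode, stays in the input.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 10_000

# Sensitive attribute -- never shown to the model.
group = rng.integers(0, 2, size=n)

# A proxy feature that correlates strongly with the sensitive attribute.
postcode = np.where(rng.random(n) < 0.9, group, 1 - group)

# Historical labels biased against group 1.
label = (rng.random(n) < np.where(group == 1, 0.7, 0.3)).astype(int)

# Train on the proxy alone -- the sensitive column has been "removed".
model = LogisticRegression().fit(postcode.reshape(-1, 1), label)
pred = model.predict(postcode.reshape(-1, 1))

for g in (0, 1):
    print(f"group {g}: fraction flagged high-risk = {pred[group == g].mean():.2f}")
# The disparity survives even though 'group' was never an input feature.
```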

Historical data are created by people, and people are unfortunately guided by certain biases. Those biases percolate through the data, and even if a model’s creators deliberately leave race, age, gender and similar attributes out of the input, the information may still get through indirectly, for example via a postcode. Bayesian networks, for instance, can be used to visualise the interconnections between different features; such a tool shows where the data one would not want to base decisions on may be hiding. [2]
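
Besides the Bayesian-network visualisation mentioned above, a cruder but common check (not taken from the cited source) is to test whether the sensitive attribute can be predicted from the features one intends to keep; if it can, the information is still encoded somewhere. A minimal synthetic sketch:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n = 5_000

# Synthetic data: 'group' is the sensitive attribute we intend to drop,
# 'postcode' correlates with it, 'income' is unrelated noise.
group = rng.integers(0, 2, size=n)
postcode = np.where(rng.random(n) < 0.85, group, 1 - group)
income = rng.normal(size=n)

X = np.column_stack([postcode, income])  # features that stay in the model
y = group                                # the attribute we hoped to remove

# Far-above-chance accuracy here means the kept features leak the attribute.
acc = cross_val_score(RandomForestClassifier(n_estimators=50), X, y, cv=5).mean()
print(f"sensitive attribute recoverable with accuracy {acc:.2f} (chance ~ 0.50)")
```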

Judicial risk assessment system in the USA

Reference should again be made to the algorithm used in the US penal system (the COMPAS system). Julia Dressel and Hany Farid [3] set out to investigate how this system works. First, they conducted a survey in which respondents with no background in criminology were given a brief description of an accused person’s crime (including their age and gender, but not their race) and a history of previous prosecutions, and were asked to predict whether the person would be convicted again within the next two years. The survey achieved an accuracy (67%) similar to that of the system used by the US penal system (65.2%). Interestingly, the pattern of false-positive responses, i.e. cases where defendants were incorrectly assigned to the high-risk group, was the same for both: black people, both in the anonymous survey and according to COMPAS, were more likely to be placed in the higher-risk group than white people. As a reminder: survey respondents had no information about the race of the accused.

Other machine learning methods were then tested, including a logistic regression algorithm with only two input features: age and the number of previous charges. The algorithm works by placing the individual records of the training dataset on (in this case) a two-dimensional plane, where each axis corresponds to one feature, and then drawing a straight line that separates the cases of the two categories. Usually no straight line can separate the two categories without error, so the line with the minimal error is chosen. The result is a straight line dividing the plane into two regions: those who were charged again within two years and those who were not (Fig. 1).

Fig. 1 Mode of operation of the logistic regression algorithm.
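
As a rough illustration of the two-feature classifier described above, here is a minimal sketch; the study's real data is not reproduced, so the numbers below are entirely synthetic and the coefficients invented:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 2_000

# Synthetic stand-ins for the two features: age and number of prior charges.
age = rng.integers(18, 70, size=n)
priors = rng.poisson(2.0, size=n)

# Invented ground truth: more priors and lower age raise reoffence probability.
p = 1 / (1 + np.exp(-(-1.0 + 0.35 * priors - 0.04 * (age - 18))))
reoffended = (rng.random(n) < p).astype(int)

X = np.column_stack([age, priors])
X_train, X_test, y_train, y_test = train_test_split(X, reoffended, random_state=0)

# The "straight line" described above is this model's decision boundary
# in the (age, priors) plane.
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"test accuracy: {clf.score(X_test, y_test):.3f}")
```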

This algorithm achieved an accuracy (66.8%) similar to that of COMPAS (65.4%). Here too, a much higher proportion of black people were incorrectly classified as higher risk than white people.
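
The disparity reported here is typically quantified as a gap in false-positive rates between groups. A minimal sketch of that computation (the arrays are placeholders, not data from the study):

```python
import numpy as np

def false_positive_rate(y_true, y_pred):
    """Fraction of actual negatives that were wrongly flagged as positive."""
    negatives = y_true == 0
    return (y_pred[negatives] == 1).mean()

# Placeholder data: true outcomes, model predictions, group membership.
y_true = np.array([0, 0, 1, 0, 1, 0, 0, 1])
y_pred = np.array([1, 0, 1, 1, 1, 0, 1, 0])
group = np.array(["black", "white", "black", "black",
                  "white", "white", "black", "white"])

for g in np.unique(group):
    mask = group == g
    fpr = false_positive_rate(y_true[mask], y_pred[mask])
    print(f"{g}: false positive rate = {fpr:.2f}")
```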

As it turns out, information about race can also seep into the data through arrest rates [2][3]. In the US, for example, black people are arrested for drug possession four times more often than white people [8][9].

Non-functioning models

Sometimes models just do not work.

In 2012, data from a rating system for New York City teachers covering 2007 to 2010 was published. This system gave teachers a rating from 1 to 100, supposedly based on the performance of the teacher’s students. Gary Rubinstein [4] decided to look at the published data. He noted that teachers who had been included in the rating programme for several years had a separate rating for each year. Working from the assumption that a teacher’s rating should not change dramatically from year to year, he decided to see how it changed in reality. Rubinstein plotted the teachers’ ratings, with the first year’s rating on the X-axis and the second year’s rating for the same class on the Y-axis; each dot on the graph represents one teacher (Fig. 2).

Fig. 2 Teacher ratings in two consecutive years. [4]

The logical result would be a near-linear relationship or at least some correlation, since the results of the same class with the same teacher should not change drastically from year to year. Instead, the graph looks more like the output of a random number generator: some classes rated close to 100 scored close to 0 the following year, and vice versa. Such results should not come from a system used to set teachers’ salaries, or even to decide whether to dismiss someone, because this system simply does not work.
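
This sanity check is easy to reproduce for any scoring system: compare each teacher's score in one year with the score in the next and compute the correlation. The ratings below are invented, drawn independently to mimic the published scatter:

```python
import numpy as np

rng = np.random.default_rng(7)
n_teachers = 500

# If the system measured something stable, year 2 should track year 1;
# independent draws mimic the near-random scatter in Fig. 2.
year1 = rng.uniform(0, 100, size=n_teachers)
year2 = rng.uniform(0, 100, size=n_teachers)

r = np.corrcoef(year1, year2)[0, 1]
print(f"year-over-year correlation: r = {r:+.2f}")  # close to 0 here
```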

Face recognition algorithms have a similar problem. Typically, a machine learning algorithm analyses many images of faces and many images depicting something else, and detects patterns characteristic of faces that are not present in the other images. The problem starts when someone has a face that deviates from those present in the training dataset. Whoever creates such an algorithm should aim for as diverse a training dataset as possible; unfortunately, people with darker skin are often under-represented. Training datasets usually mirror the skin-colour distribution of the society from which they are collected: if a dataset consists of images of US and European citizens, the share of each skin colour will resemble US and European demographics, in which light-skinned people predominate (Fig. 3).

Fig. 3 Left: US census data [6]. Right: percentage of races in publicly available datasets [7].

Researchers at MIT [5] investigated the accuracy of facial recognition algorithms by gender and skin colour. They found that the technologies of the most popular companies, such as Amazon and IBM, failed at recognising women with dark skin (Fig. 4). When these technologies are used in products that rely on facial recognition, issues of accessibility and security arise: if accuracy is low even for one specific group, there is a high risk of someone unauthorised gaining access to, for example, a phone. And at a time when facial recognition is being used by police in surveillance cameras, there is a high risk that innocent people will be wrongly identified as wanted persons; such situations have already occurred many times. All of this is due to a malfunctioning algorithm that could quite easily be fixed with the right selection of training datasets.

Fig. 4 Measured accuracy of face recognition technologies. [5]
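
An audit of this kind boils down to computing accuracy separately for each demographic subgroup instead of one aggregate figure. A minimal sketch with placeholder labels and predictions (not the MIT data):

```python
import numpy as np

# Placeholder audit data: true labels, model predictions, and the
# (skin type, gender) subgroup of each test image.
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_pred = np.array([1, 0, 0, 1, 0, 0, 0, 1])
subgroup = np.array(["darker_female", "lighter_male", "darker_female",
                     "lighter_female", "darker_male", "darker_female",
                     "lighter_male", "darker_male"])

# A single overall accuracy can hide a very poor score for one subgroup.
for sg in np.unique(subgroup):
    mask = subgroup == sg
    acc = (y_true[mask] == y_pred[mask]).mean()
    print(f"{sg}: accuracy = {acc:.2f} ({mask.sum()} samples)")
```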

Following the publication of the MIT study, most of the companies improved their algorithms to the point where the disparity in facial recognition accuracy is negligible.

Inclusive code

We cannot place one hundred per cent trust in machine learning algorithms and big data, especially when it comes to deciding human fates.

To create a tool that is effective and does not learn human biases, one has to go down to the data level. It is necessary to analyse the interdependencies of attributes that may reveal race, gender or age, and to select only those genuinely necessary for the algorithm to work correctly. It is then essential to analyse the algorithm itself and its results to ensure that it is indeed objective.

Machine learning models learn by searching for patterns and reproducing them. When unfiltered historical data is provided, no new, more effective tools are actually created; the status quo is merely automated. And when human fates are involved, we as developers cannot afford to repeat old mistakes.

References: