Busting the gender difference myth

Gender difference: are men and women from different planets?

The problematic history of prejudgement

Small differences

Moderate differences

Large differences

Very large differences

Concluding comments

The idea that men and women are very different dominates popular media.

However, a recent paper shows men and women to be more similar than different. See The Gender Similarities Hypothesis by Janet Shibley Hyde, Department of Psychology, University of Wisconsin-Madison. Published in American Psychologist, September 2005, Vol. 60, No. 6.

Hyde's paper is a meta-analysis of gender similarities and differences. The data it provides is the basis for the following discussion.


Gender difference: are men and women from different planets?

The gender difference model is a publishing bonanza.

Men are from Mars, Women are from Venus, by John Gray, made a fortune by suggesting males and females are so different, they might as well be from different planets.

The Female Brain, by Louann Brizendine, states that physical and hormonal brain differences cause females to be highly social, verbal and emotional, whereas males have much less language skill, and are obsessed with sex.

The Essential Difference: Men, Women and the Extreme Male Brain, by Simon Baron-Cohen, said brain difference was the reason females are empathetic, and males are systematic. Autism was an extreme form of the non-empathetic, systematic male brain.

Brain Sex, by Anne Moir and David Jessel, was followed by Why Men Don't Iron, by Anne Moir. Both stated that hormonal and brain differences create fundamental behavioural differences, between males and females, all through their lives.

Why Men Don't Listen and Women Can't Read Maps, by Allan and Barbara Pease, lists many gender differences, and suggests these are due to evolution in roles from prehistoric times.

You Just Don't Understand: Women and Men in Conversation, by Deborah Tannen, details gender differences in communication style, expectation and understanding.

The gender divergence models are picked up, and reinforced without reservation, by mass media, and even research institutions. Because dating and relationships are a big part of our lives, reports that appear to explain perceived gender differences are very popular.

Chart(a), below, shows what a gender difference model might look like - i.e. two mostly separate populations. In this chart, all the Red population is to the left of the mean (peak) of the Blue population, and all the Blue population is to the right of the mean (peak) of the Red population.

Chart(a)

Cohen's 'd' statistic measures the effect size of the difference between two groups. The 'd' is calculated as the difference in means, divided by the average standard deviation.

The 'd' for the chart above is 3.33. In this chart, 0% of each population extends past the mean of the other population.

A bigger 'd' suggests a greater difference between the peaks, compared to the width of the bell curves. A big 'd' suggests the groups really are different.

A small 'd' suggests there is more variation within each population than between them. A small 'd' suggests the groups are more similar than different.
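The calculation described above can be sketched in a few lines of Python. This is only an illustration: the function name and the sample scores are invented, and it follows this article's wording by averaging the two standard deviations. Many textbooks use a pooled standard deviation instead, which gives a slightly different result when the two groups have different spreads.

```python
from statistics import mean, stdev


def cohens_d(group_a, group_b):
    """Difference in means divided by the average of the two standard deviations.

    Positive values mean group_b scores higher on average than group_a.
    """
    average_sd = (stdev(group_a) + stdev(group_b)) / 2
    return (mean(group_b) - mean(group_a)) / average_sd


if __name__ == "__main__":
    # Hypothetical scores, invented purely for illustration.
    red = [48, 52, 55, 60, 45, 50, 58, 47]
    blue = [51, 56, 54, 63, 49, 55, 60, 52]
    print(f"d = {cohens_d(red, blue):.2f}")
```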

Based on the large number, and popularity, of gender difference books and models, we would expect data sets like chart(a) to appear regularly. However, the reason we have to imagine this chart is that no data set actually looks like chart(a).

Unfortunately, for the gender difference models, the actual evidence is much more like chart(b), below. Even this much variation is rare.

Chart(b)

Chart(b) has a 'd' = 0.35, and 36% of each population extends past the mean of the other population, as shown by the shaded areas.

We might be tempted to say Blue is further to the right than Red, but for much of the data this isn't the case. For example, 36% of Reds are more right than the average Blue (red shaded area). And 36% of Blues are less right than the average Red (blue shaded area).

So a statement that Blue is more to the right than Red is very misleading for more than a third of the combined population (36%).
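The shaded-area percentages quoted for the charts can be reproduced from 'd' alone. The sketch below assumes both populations are bell curves (normal distributions) of equal width, which is what the charts appear to show but is an assumption rather than something stated in Hyde's paper. The function name is illustrative.

```python
from statistics import NormalDist


def share_past_other_mean(d):
    """Share of the lower-scoring group that lies above the other group's mean.

    Assumes two normal distributions with equal spread whose means are
    d standard deviations apart.
    """
    return 1 - NormalDist().cdf(d)


if __name__ == "__main__":
    # Values of 'd' that appear in this article.
    for d in (0.10, 0.35, 0.65, 1.00, 2.00, 3.33):
        print(f"d = {d:.2f}: {share_past_other_mean(d):.1%} past the other mean")
```

Under these assumptions, the results should round to the 46%, 36%, 26% and 16% figures used for the category boundaries in this article.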


The problematic history of prejudgement

Say we pick a random Blue and a random Red, from the populations in chart(b). Then we assume, or prejudge, that the Blue will be more to the right than the Red. We repeat this many times to create a lot of pairs, each with 1 Red and 1 Blue.

Then we check each pair to see if our prejudgement is correct. Is each Blue more to the right, than the corresponding Red in the pair?

The results will be mixed. Overall, we are probably wrong for approximately 40% of the pairs, and correct for approximately 60% of the pairs. (For random pairs the error rate is a little higher than the 36% overlap figure, because both members of each pair vary around their own group's average. This assumes bell curves of equal width, as drawn in chart(b).)

This isn't much better than if the populations were identical. Then we would expect to be wrong for half of the pairs (i.e. 50%), like tossing a coin.

Is this accuracy level, of 60% correct and 40% wrong, a problem?

It depends on the outcome. If we are gambling, and we can predict 60% correct, this is great. But if we are administering drugs, and 40% of the patients die, it is probably a bad outcome. When people are involved, the cost of making the wrong judgement is important.
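The pairing experiment described above can be checked with a short simulation. This is a sketch under stated assumptions: both populations are treated as normal distributions with the same spread, and the function names, pair count and seed are arbitrary choices for illustration. Note that the quantity computed here (how often a random Red beats a random Blue) is not the same as the share of Reds beyond the Blue average, so the two percentages differ.

```python
import random
from statistics import NormalDist


def pair_error_rate(d):
    """Exact chance that a random Red outscores a random Blue.

    Assumes equal-width normal curves with Blue's mean d standard
    deviations to the right. The difference between two such draws has
    a standard deviation of sqrt(2), hence the division below.
    """
    return 1 - NormalDist().cdf(d / 2 ** 0.5)


def simulate_pair_error_rate(d, pairs=100_000, seed=0):
    """Estimate the same chance by drawing random Red-Blue pairs."""
    rng = random.Random(seed)
    wrong = 0
    for _ in range(pairs):
        red = rng.gauss(0, 1)
        blue = rng.gauss(d, 1)
        if red >= blue:  # the prejudgement "Blue is further right" fails
            wrong += 1
    return wrong / pairs


if __name__ == "__main__":
    d = 0.35  # the value used for chart(b)
    print(f"exact:     {pair_error_rate(d):.1%} of pairs wrong")
    print(f"simulated: {simulate_pair_error_rate(d):.1%} of pairs wrong")
```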

Prejudice comes from the word prejudge. If we have a negative view of some group, we tend to notice people in that group when they match our negative stereotype. But we don't notice examples that don't match with our stereotype.

For example, a woman complains that men are sexist. She makes regular disapproving comments when a man makes a sexist statement. However, she doesn't notice when a man is not sexist, nor when women say sexist things about men.

Almost everyone is prejudiced. It's instinctive for humans to be more relaxed with members of our own tribes, and to view other groups with more suspicion.

Because it's natural to be prejudiced, prejudging people is now discouraged in the community. This forces us to be more objective and to find other ways to judge people, rather than just by the group to which we assign them.

The greatest risk comes from people who believe they are not prejudiced, and so don't need to question their own judgements. The risk is compounded when regulatory bodies also avoid their responsibility to be objective.

In the past there were attempts to use physical attributes to judge and define people. These are now viewed as contrived and discriminatory. For example:

Royal or aristocratic bloodlines defined who was fit to rule and be given privileges in a feudal system. Other people were limited to being common or peasant stock.

Craniometry, phrenology and physiognomy were all popular during the 18th and 19th centuries. These "sciences" studied and measured skull size, skull shape, skull bumps, facial features, or limb dimensions. Proponents claimed to be able to reveal a subject's personal attributes such as intelligence, character, behaviour and likely criminality. Some of these ideas were revived to justify ethnic racism movements in the 20th century.

Skin colour was developed as a criterion for legal rights. This was used as a justification for the African slave trade, and ownership of slaves. It was revived as justification for the apartheid structure in South Africa during the 20th century.

Women and children were treated as property, with fewer legal rights than men. Women were viewed as the weaker sex and prevented from doing many things.

State-sponsored racism developed in the 20th century, reviving some of the earlier biases against skin colour and physical appearance. Citizenship and legal tests were created to distinguish people, based on ethnicity and race, and to give them different levels of rights.

The way these discriminatory systems developed tells us something important. In most cases, they were developed to meet a need. There was a desire to justify differential treatment, that may have already existed.

The discrimination didn't originate through observing behavioural differences between groups, in the first instance. In fact, whether or not differences actually existed was rarely even tested. The desire for prejudice existed first, and then this was mapped back to physical characteristics, to justify the bias.

We'll come back to these points. First we need to look at how much gender difference, or similarity, there really is.


Small differences

The data set from Hyde's paper is re-tabulated and expanded in this linked document - see Gender Similarity Summary Table. Each line in the table is a meta-analysis, aggregating many separate reports that study a single aspect of behaviour. In effect, Hyde's paper is a meta-analysis of meta-analyses.

Out of 136 meta-studies, 104 (76%) show a gender difference that was categorised as either small or effectively zero.

Effectively zero differences

If 'd' is less than 0.10, the difference effect size is categorised as 'effectively zero'. Overall, 40 meta-studies (29%) show differences of effectively zero.

Meta-studies that show an 'effectively zero' difference include reading, vocabulary, some maths studies, several self-esteem and happiness studies, negotiation attitudes, several leadership studies, physical balance, and a preference for security, challenge or power in a job.

Small differences

Chart(c), below, shows what the populations look like when there is a 'small' difference.

Chart(c)

In chart(c), 'd' = 0.10, and 46% of each population extends past the mean of the other population, as shown by the shaded areas.

A 'small' difference is when 'd' is between 0.10 and 0.35. The size of the shaded area, of each population, that extends past the mean of the other population, ranges from 46% to 36%.

Out of 136 meta-studies, 64 (47%) show a 'small' difference between the genders.

Females are slightly ahead in 4 maths categories, speech production, talking, understanding facial expressions, coping with problems, having anxiety, trust and physical flexibility. Females are slightly more likely to have a care orientation, and to prefer a comfortable physical environment in a job.

Males are slightly ahead in 9 maths categories, interrupting conversations, assertive speech, some self-esteem studies, several aggression categories, helping, sexual activity, openness, jumping, cheating, and using a computer. Males are slightly more likely to have a justice orientation, and to prefer earnings in a job.

However, if we assume any one of the 'small difference' studies is correct, and use it to judge whether a random man or woman will be ahead of a random person of the opposite sex, we are likely to be wrong between 47% and 40% of the time. (When both people are picked at random, the error rate is a little higher than the overlap with the other group's average.)

This means we will be wrong for nearly 1 in 2 pairs, or at best 2 in 5 pairs.

If the populations were identical we would expect to be wrong 50% of the time (1 in 2 pairs). So these 'small differences' don't provide much improvement in assessing people.


Moderate differences

Chart(d), below, shows what the populations look like when there is a 'moderate' difference.

Chart(d)

In chart(d), 'd' = 0.35, and 36% of each population extends past the mean of the other population, as shown by the shaded areas.

A 'moderate' difference is when 'd' is between 0.35 and 0.65. The size of the shaded area, of each population, that extends past the mean of the other population, ranges from 36% to 26%.

Out of 136 meta-studies, 20 (15%) show a 'moderate' difference between the genders.

Females are more likely to be better at spelling and language, and to smile more.

Males are more likely to be better at spatial perception and 1 mental rotation measure, have a more positive body image, be ahead in 8 aggression measures, be more assertive, be more active, run faster, and be more independent when using a computer.

Again, if we assume any one of the 'moderate difference' studies is correct, and use it to judge whether a random man or woman will be ahead of a random person of the opposite sex, we are likely to be wrong between 40% and 32% of the time in a random pairing.

It's getting better, but we are still wrong for 2 in 5 pairs, or about 1 in 3 pairs. That is still too many errors to use to judge people.


Large differences

Chart(e), below, shows what the populations look like when there is a 'large' difference.

Chart(e)

In chart(e), 'd' = 0.65, and 26% of each population extends past the mean of the other population, as shown by the shaded areas.

A 'large' difference is when 'd' is between 0.65 and 1.00. The size of the shaded area, of each population, that extends past the mean of the other population, ranges from 26% to 16%.

Out of 136 meta-studies, 10 (7%) show a 'large' difference between the genders.

Females are better at processing facial expressions when infants (1 measure), have higher indirect aggression (1 measure), and are more tenderminded.

Males are better at mechanical reasoning and 1 mental rotation measure, are more physically aggressive, do more helping when watched, are more attracted to casual sex, and have a stronger grip.

Again, if we assume any one of the 'large difference' studies is correct, and use it to judge whether a random man or woman will be ahead of a random person of the opposite sex, we are likely to be wrong between 32% and 24% of the time in a random pairing.

It's getting better, but we are still wrong for about 1 in 3 pairs, down to about 1 in 4 pairs. Caution is still needed at this stage.


Very large differences

Chart(f), below, shows what the populations look like when there is a 'very large' difference.

Chart(f)

In chart(f), 'd' = 1.00, and 16% of each population extends past the mean of the other population, as shown by the shaded areas.

A 'very large' difference is when 'd' is greater than 1.00. The size of the shaded area, of each population, that extends past the mean of the other population, starts at 16% and gets smaller, as 'd' increases from 1.0.

The overlap, of the other population mean, eventually goes to 0%, when the 'd' statistic increases to 3.33. This is the high difference chart first described in Chart(a) - repeated below for your convenience.

Chart(a) - repeated

Out of 136 meta-studies, only 2 (1%) show a 'very large' difference between the genders.

These 2 studies show that males, up to the age of 20, throw further, and with greater velocity, than females of the same age. That's it!

However, there is still some overlap in the throwing studies. The difference effect size for these studies is 'd' = 2 (approx.), so the overlap of the other population mean is 2%.

Again, if we use this test to judge whether a random adolescent boy or girl will be ahead of a random adolescent of the opposite sex, we are likely to be wrong about 8% of the time (higher than the 2% overlap, because both adolescents in the pair vary). This is a fairly accurate test, as only about 1 pair in 13 is likely to be wrong.

However, a girl who knows she can throw better than the average boy, may still be indignant at being prejudged in this way.

The gender difference in throwing is well beyond the difference for other motor skills. This may be because throwing is more of a learned activity, whereas running and jumping are more natural. The fact that adolescent boys have much higher interest, and participation, in junior baseball and cricket may explain some of the difference.

More participation by girls, or throwing practice, might bring this difference back towards the difference level of the other motor skills. Perhaps the underlying difference is 'moderate' to 'large', rather than 'very large'.
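The category counts quoted in the sections above can be tallied to check that they add up. The sketch below uses the counts and 'd' thresholds given in this article. The function name is illustrative, and the handling of values that fall exactly on a boundary (for example 'd' = 0.35) is an assumption, since the article's ranges touch at their end points.

```python
# Number of meta-studies in each category, as quoted in this article.
CATEGORY_COUNTS = {
    "effectively zero": 40,
    "small": 64,
    "moderate": 20,
    "large": 10,
    "very large": 2,
}


def classify(d):
    """Map an effect size onto the article's categories.

    Assumption: a boundary value belongs to the higher category, except
    that 'very large' requires d strictly greater than 1.00.
    """
    size = abs(d)  # the sign only says which group is ahead
    if size < 0.10:
        return "effectively zero"
    if size < 0.35:
        return "small"
    if size < 0.65:
        return "moderate"
    if size <= 1.00:
        return "large"
    return "very large"


total = sum(CATEGORY_COUNTS.values())
small_or_zero = CATEGORY_COUNTS["effectively zero"] + CATEGORY_COUNTS["small"]

if __name__ == "__main__":
    for name, count in CATEGORY_COUNTS.items():
        print(f"{name:>16}: {count:3d} ({count / total:.0%})")
    print(f"{'total':>16}: {total:3d}")
    print(f"small or effectively zero: {small_or_zero} ({small_or_zero / total:.0%})")
```

The five categories sum to the 136 meta-studies, and the 'effectively zero' and 'small' categories together come to 40 + 64 = 104, which is the 76% figure.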


Concluding comments

Individual differences within the male and female groups are much bigger than the differences between the gender groups.

Differences in behaviour are explained better by Individuality (personality, IQ, EQ, life experience or personal life choices), than they are by Gender, either hard-wired or learned.

Because individual differences within gender groups are so big, individuals within many randomly-selected, male-female pairs will be quite different from each other. However, not much of the difference is explained by gender difference.

Using gender to judge people is a choice. Like other judgement systems based on physical difference, gender isn't a very accurate predictor. We all have the option to not use gender to judge or analyse situations at all.

There is enough difference between genders to suggest we will never see exact 50/50 splits between genders in engineering, nursing, prison incarceration, or full-time parenting. However, there is also enough similarity between genders to suggest any gender-based presumption that either pigeonholes a person, or restricts their life choices, is a problem.

Homosexual relationships, and same-gender friendships, have many of the same problems as heterosexual relationships. Again, this suggests individual difference, rather than gender difference, is more important.

Gender difference models are common because we like them. They allow us to map our own desires, beliefs and prejudices onto an environment where there isn't much actual difference. Research shows we act more like our own gender stereotype in a same-sex group, than we do when we are on our own, or in a mixed group. Gender difference models are popular because many others in our group share the same desires, beliefs and prejudices. As a bonding experience, we can project those prejudices onto individuals in the other group. Gender difference is fun, almost a lingua franca. But unfortunately, gender difference is inaccurate, and its prejudices are damaging.

The need for dating, coupling, and sexual relationships explains some of the increased judging of the opposite gender. We are aware of the gender stereotypes and it's easy to pigeonhole people when they appear to match our bias. However, we are less likely to notice examples that don't match our bias. A gender difference model allows us to decide that we are 'normal', and that someone else needs 'to change', or 'to accept us'. It's much harder to justify these attitudes in an individual difference model.

Applying a rigid gender difference model to a changing individual difference situation, may cause more problems than it solves.

When academics and government agencies accept rigid gender difference models, and use them as the basis for policy, this is likely to be as problematic as using any of the earlier physical difference judgement systems.

People who say that women are more collaborative than men, or that men are more aggressive than women, are following on from predecessors who claimed to be able to tell character by the bumps on someone's head. In fact, the research shows that 30% to 44% of women are more aggressive than the average man, and 40% to 50% of men have a more collaborative leadership or negotiation style than the average woman.

It's worth repeating - Using gender to judge people is a choice. Like other judgement systems based on physical difference, gender isn't a very accurate predictor. If it isn't helping, we have the option to not use gender to judge or analyse situations at all.



Copyright © 2007-2008, Max Rollins. All rights reserved.

www.taxfundedprejudice.com
