Proportion of men and women in different occupational categories in Canada

Results:

[Animated GIF: ProportionWomenMenDetailed]

The gif shows the proportion of men and women in various full-time occupation categories in Canada. The categories are ordered by average real wages calculated over 1997-2013. Despite fluctuations, it appears that the proportion of women has increased in many of the higher-wage occupations. The occupations most dominated by women are those relating to healthcare. The occupations most dominated by men are those relating to manual labour. The balance of men and women in those two broad classes of occupations has remained fairly constant over the time series. Measured across all occupations, the proportion of women in the full-time workforce is lower than that of men, but the balance has moved closer to the 50/50 mark.

Data:

Two data sets from StatsCan were used: one to determine the proportion of men and women in the various occupational categories, the other to determine the average wages for those categories. Links to the StatsCan data can be found here and here. Very slight modifications were made so that the categories have the same spelling in both data sets. Below are Excel versions of the data sets (WordPress does not allow me to post csv files):

cansim2495677681604339814a

cansim7043018236822320353a
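The spelling check mentioned above can be sketched in a few lines of R. This is only an illustration, not the procedure actually used for the post: the category labels and the normalise() helper are invented, and the real StatsCan labels may differ in other ways than whitespace.

```r
# Hypothetical stand-ins for the Occupation columns of the two data sets;
# these labels are invented for illustration only.
occupationLabels <- c("Management occupations", "Health occupations", "Trades and transport")
wageLabels <- c("Management occupations", "Health  occupations", "Trades and transport")

# Hypothetical helper: collapse repeated whitespace and trim the ends, so that
# purely cosmetic differences do not count as mismatches
normalise <- function(x) gsub("\\s+", " ", trimws(x))

# Labels present in one data set but missing from the other, before any clean-up
onlyInOccupation <- setdiff(occupationLabels, wageLabels)
onlyInWage <- setdiff(wageLabels, occupationLabels)

# Labels that still fail to match after normalising both sides
remaining <- union(setdiff(normalise(occupationLabels), normalise(wageLabels)),
                   setdiff(normalise(wageLabels), normalise(occupationLabels)))
```

If remaining is empty, every category in one data set has a counterpart in the other, which is what the later factor reordering relies on.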

Methodology:

The R code below was run to generate the gif. It is very similar to the code found here.

library(ggplot2)
library(reshape)
library(animation)

#import data, specify that "x" values contained in csv should be read as NA
directory <-  "C:/Users/Business/Dropbox/Economics Research/Blog/Occupations/National Occupational Classification (detailed)/"
pData <- read.csv(paste(directory,"cansim2495677681604339814a.csv", sep=""), na.strings="x") #this data contains occupation data from 1987-2013
qData <- read.csv(paste(directory,"cansim7043018236822320353a.csv", sep=""), na.strings="x") #this data contains wage data from 1997 to 2013

#change column names in both data sets
pData <- rename(pData, c("Ref_Date" = "Date", "GEOGRAPHY" = "Geography", "CHARACTERISTICS" = "Type.Of.Work", 
"OCCUPATION" = "Occupation", "SEX" = "Gender"))
qData <- rename(qData, c("Ref_Date" = "Date", "GEOGRAPHY" = "Geography", "CHARACTERISTICS" = "Type.Of.Work", 
"OCCUPATION" = "Occupation", "SEX" = "Gender"))

#since the time range for the occupation data set is much larger than the wage data set, it would be difficult
#to include both data sets in the same graphic.  Instead, the occupation data set will be shown with the categories
#ordered by the average wages measured from 1997-2013.
qDataAvgs <- tapply(qData$Value,qData$Occupation, mean) #first find the average wages for the different occupations
qDataAvgs <- sort(qDataAvgs, decreasing=FALSE) #sort the data in order of the average wages
qDataAvgs <- c(qDataAvgs[15], qDataAvgs[-15]) #the category for "Total Occupations" is element 15; move it to the front so it is separate from the other categories
qDataAvgsNames <- names(qDataAvgs) #now that the averages are ordered properly, we extract the names from the named vector, this will be used to order the factors
pData$Occupation <- factor(pData$Occupation, levels = qDataAvgsNames) #change the order of the occupation factors using the names vector

#create 3 subsets of data from the occupation data
bothGendersData<-pData[pData$Gender=="Both sexes",]
maleData<-pData[pData$Gender=="Males",]
femaleData<-pData[pData$Gender=="Females",]

#using the above 3 subsets of data, determine the percent of workforce for men and women
#for the various occupations
#note: the element-wise division below assumes the male and female rows are in the same order
maleData$ValuePercent <- maleData$Value/(maleData$Value + femaleData$Value)
femaleData$ValuePercent <- femaleData$Value/(maleData$Value + femaleData$Value)

#combine the male and female data into a single, new dataset which will be used to create the graphs
pData2 <- rbind(femaleData,maleData)

#I found the code between the two hash lines at: http://ryouready.wordpress.com/2009/02/17/r-good-practice-adding-footnotes-to-graphics/
#it's a lovely bit of code that creates a custom function to add a footer to the plots
##############################################################################################################
source <- "Source: StatsCan"
author <- "www.posnorm.com"
footnote <- paste(source, format(Sys.time(), "%d %b %Y"),
                  author, sep=" / ")

# default footnote is today's date, size 0.9 (cex) and the color
# is a kind of grey

makeFootnote <- function(footnoteText=
                         format(Sys.time(), "%d %b %Y"),
                         size= 0.9, color= grey(0.6))
{
   require(grid)
   pushViewport(viewport())
   grid.text(label= footnoteText ,
             x = unit(1,"npc") - unit(1, "mm"),
             y= unit(0, "mm"),
             just=c("right", "bottom"),
             gp=gpar(cex= size, col=color))
   popViewport()
}
##############################################################################################################

#I got this function off Stack Overflow. For the axis labels in the graph, it rounds
#the values to a maximum of 2 decimals. I used this because the axis labels looked
#cluttered
fmt <- function() {
  function(x) as.character(round(x, 2))
}

#to make the GIF, we first need to create the series of plots that we wish to loop together.
#it's possible to write a loop that creates the plots and then use a program like ImageMagick
#to combine them all together.
#with the animation package, we can do both of these steps at once. before running the code below,
#it is important to install ImageMagick.
#instead of writing a loop to create the plots, we create a function that creates the plot for a given year.
#we then pass that function, via lapply, to the saveGIF function, which records each plot and creates the GIF.

barplotGIF <- function(t) { #create a function that will create a barplot for a given year t
  graphTitle <- paste("Proportion of men and women in various occupation categories in Canada (full-time):", t, sep=" ") #generates string for graph title
  p <-  ggplot(data=subset(pData2, Date==t & Geography=="Canada")) +  #create a ggplot object with data for year t
        geom_bar(stat="identity",aes(x=Occupation,y=ValuePercent,fill=Gender))+ #add the bar plot geom, the stat=identity uses the values in the data instead of counting the number of values
        theme_bw() +
        geom_hline(yintercept=0.5, linetype="dashed", size=0.7)+ #add a line at the 0.5 mark
        theme(axis.title.x = element_blank()) +   #removes x axis title
        theme(axis.title.y = element_blank()) +    #removes y axis title
        theme(plot.title = element_text(face = "bold")) + #makes the plot title bold
        scale_y_continuous(labels=fmt()) + #specify labels, note that the labels argument accepts a function as its value
        ggtitle(graphTitle) + #add the graph title
        coord_flip() #flip the coordinates, this changes the plot from a vertical bar plot to a horizontal bar plot
  print(p)
  makeFootnote(footnote)  #adds the footnote through the above custom function
}

#this is the line that creates the GIF
#lapply takes the dates, applies them to our barplotGIF function and returns a list
#interval = 1 sets a 1 second delay between plots in the GIF
#movie.name gives the name of the file. NOTE: I found it very difficult to specify the directory path without changing the working directory, so I left it
#the R output shows the location of the files, usually stored in a temporary folder.
#ani.width and ani.height specify the dimensions of the individual png files
saveGIF(lapply(min(pData2$Date):max(pData2$Date),barplotGIF), interval = 1, movie.name = "test.gif", ani.width = 1300, ani.height =750)
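The proportion step in the code above divides the male rows by the female rows position by position, so it depends on both subsets being in the same order. A minimal sketch of an alternative that pairs the rows explicitly with merge() is shown below. The toy data frame is invented, not StatsCan data, and for the real data the Geography column would presumably need to be one of the join keys as well.

```r
# Invented toy data in the same long layout as pData (Date, Occupation, Gender, Value);
# the female rows are deliberately in a different order from the male rows.
toy <- data.frame(
  Date = c(2012, 2013, 2013, 2012),
  Occupation = "Health",
  Gender = c("Males", "Males", "Females", "Females"),
  Value = c(20, 30, 90, 80),
  stringsAsFactors = FALSE
)

males <- toy[toy$Gender == "Males", c("Date", "Occupation", "Value")]
females <- toy[toy$Gender == "Females", c("Date", "Occupation", "Value")]

# Join on the identifying columns so each male count meets its matching female count,
# regardless of the row order in either subset
paired <- merge(males, females, by = c("Date", "Occupation"),
                suffixes = c(".male", ".female"))

# Compute each share from the explicitly paired counts
paired$Total <- paired$Value.male + paired$Value.female
paired$MalePercent <- paired$Value.male / paired$Total
paired$FemalePercent <- paired$Value.female / paired$Total
```

With the rows paired by key, a missing or reordered row in one subset shows up as a missing match rather than silently producing a wrong proportion.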
