Choropleth Maps in R
Choropleth maps provide a simple, easy-to-understand way to visualize a measurement across geographical areas, be they states or countries.
Suppose you had to compare the growth rates of Indian states and present them to a group of people who have 15-20 seconds to look at the result and draw insights from it. What would be the right way? The best way? Would presenting the data in a traditional table make sense? Or would bar graphs look better?
Bar graphs would indeed look better: they present the data in a visually appealing manner and allow a good comparison. But will they make an impact in 15 seconds? I personally doubt they would deliver the desired outcome; moreover, showing data for 36 states and union territories as 36 bars makes the chart cumbersome to scroll up and down. There is a much better alternative to tables and bar charts: choropleth maps.
Choropleth maps are thematic maps in which areas are colored or shaded according to the value of the statistical variable being represented. For example, if we wanted to compare population density across the states of the United States of America in a colorful manner, a choropleth map would be our best bet. To sum up, choropleth maps provide a simple, easy-to-understand visualization of a measurement across geographical areas, be they states or countries.
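The idea of shading areas according to a value can be sketched in a few lines of base R. The region names, density values, class breaks and colors below are made up purely for illustration; they do not come from any dataset used in this article.

```r
# Hypothetical population densities (people per sq. km) for four made-up regions
density <- c(RegionA = 12, RegionB = 95, RegionC = 410, RegionD = 1200)

# Illustrative class breaks and one shade per class (light = low, dark = high)
breaks <- c(0, 50, 500, Inf)
shades <- c("lightyellow", "orange", "darkred")

# cut() assigns each value to a class; the class index then picks the shade
class_index <- cut(density, breaks = breaks, labels = FALSE)
fill <- setNames(shades[class_index], names(density))
print(fill)
```

A plotting library performs the same value-to-color lookup for us, usually on a continuous scale rather than three fixed classes.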
Let’s look at some examples of where choropleth maps come in handy for presenting data.
- Choropleth maps are widely used to represent macroeconomic variables such as GDP growth rate, population density, per-capita income, etc. on a world map and provide a proportional comparison among countries. This can also be done for states within a country.
- These maps can also be used to present nominal data, such as whether a political party gained seats, lost seats, or saw no change in a country's election.
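For the nominal case, a minimal sketch of the gain/loss/no change coloring could look like this. The constituency names, seat changes and color choices are hypothetical and only serve to show the category-to-color lookup.

```r
# Hypothetical change in seats for one party in five made-up constituencies
seat_change <- c(North = 3, South = -2, East = 0, West = 1, Central = -1)

# Turn the numeric change into a nominal category
outcome <- ifelse(seat_change > 0, "gain",
                  ifelse(seat_change < 0, "loss", "no change"))

# One fixed color per category (illustrative choices)
category_colors <- c("gain" = "darkgreen", "loss" = "firebrick", "no change" = "grey70")

# Look up the color of each constituency by its category
fill <- setNames(unname(category_colors[outcome]), names(seat_change))
print(fill)
```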
One limitation of choropleth maps is that they don’t convey totals or absolute values. They are among the best tools for proportional comparison, but they are not the right fit when absolute values need to be presented.
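To see why proportional values suit these maps better than totals, here is a small sketch with two invented regions. All numbers are assumptions chosen for the arithmetic, not real statistics.

```r
# Two made-up regions: one large, one small
regions <- data.frame(
  name       = c("Large region", "Small region"),
  population = c(2000000, 500000),   # absolute totals
  area_sq_km = c(40000, 2500)
)

# Normalize the total by area to get a proportional measure
regions$density <- regions$population / regions$area_sq_km
print(regions)
```

The large region has four times the population but only a quarter of the density, so shading by the total and shading by the density would tell opposite stories.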
Now let us look at a practical implementation of choropleth maps in R. The code that follows works through these objectives:
- Download and import the map shapefile into R
- Create our own dataset and represent it on the map of India
- Merge the dataset and prepare it for visual representation
- Improve the visualization
- Display external data on choropleth maps
- Present multiple maps at once
Download and import the map shapefile into R
There are multiple sites from which you can download shapefiles for free. I used this site (http://www.diva-gis.org/gdata) to download the administrative map of India for further processing. Once you have downloaded the file, unzip it and set your R working directory to the unzipped folder.
We will load all the necessary libraries at once and discuss them one by one as we proceed.
    # Load the required libraries (install any missing ones first with install.packages())
    library(ggplot2)
    library(RColorBrewer)
    library(ggmap)
    library(maps)
    library(rgdal)
    library(scales)
    library(maptools)
    library(gridExtra)
    library(rgeos)
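Since library() fails for any package that is not installed, it can help to check first. The helper below, missing_packages(), is a hypothetical name introduced only for this sketch. Note that rgdal, rgeos and maptools were retired from CRAN in 2023, so on a recent R setup install.packages() may not find them and they may have to come from the CRAN archive.

```r
# Packages used in this article
required <- c("ggplot2", "RColorBrewer", "ggmap", "maps", "rgdal",
              "scales", "maptools", "gridExtra", "rgeos")

# Hypothetical helper: return the packages in `pkgs` that are not installed
missing_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

to_install <- missing_packages(required)
print(to_install)

# Uncomment to install whatever is missing from your configured repository:
# if (length(to_install) > 0) install.packages(to_install)
```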
Set the working directory to the unzipped folder and use the following code to import the shape into R.
    # Set working directory (to the unzipped folder) before running this
    states_shape = readShapeSpatial("IND_adm1.shp")
    class(states_shape)
    names(states_shape)
    print(states_shape$ID_1)
    print(states_shape$NAME_1)
    plot(states_shape, main = "Administrative Map of India")
    > class(states_shape)
    [1] "SpatialPolygonsDataFrame"
    attr(,"package")
    [1] "sp"
    > names(states_shape)
    [1] "ID_0"      "ISO"       "NAME_0"    "ID_1"      "NAME_1"    "TYPE_1"    "ENGTYPE_1" "NL_NAME_1" "VARNAME_1"
    > print(states_shape$ID_1)
     [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
    > print(states_shape$NAME_1)
     [1] Andaman and Nicobar    Andhra Pradesh         Arunachal Pradesh      Assam                  Bihar
     [6] Chandigarh             Chhattisgarh           Dadra and Nagar Haveli Daman and Diu          Delhi
    [11] Goa                    Gujarat                Haryana                Himachal Pradesh       Jammu and Kashmir
    [16] Jharkhand              Karnataka              Kerala                 Lakshadweep            Madhya Pradesh
    [21] Maharashtra            Manipur                Meghalaya              Mizoram                Nagaland
    [26] Orissa                 Puducherry             Punjab                 Rajasthan              Sikkim
    [31] Tamil Nadu             Telangana              Tripura                Uttar Pradesh          Uttaranchal
    [36] West Bengal
    36 Levels: Andaman and Nicobar Andhra Pradesh Arunachal Pradesh Assam Bihar Chandigarh Chhattisgarh ... West Bengal
    > plot(states_shape, main = "Administrative Map of India")
ID_1 provides a unique ID for each of the 36 states and union territories, while NAME_1 provides the name of each. We will mainly use these two fields; the others give the country's name, its code, and other information that distinguishes one country's data from another's.
Alternatively, we can use a function from a different package, readOGR() from rgdal, to import the shapefile into R.
    States_shape2 = readOGR(".", "IND_adm1")
    class(States_shape2)
    names(States_shape2)
    plot(States_shape2)
    > States_shape2<-readOGR(".","IND_adm1")
    OGR data source with driver: ESRI Shapefile
    Source: ".", layer: "IND_adm1"
    with 36 features
    It has 9 fields
    Integer64 fields read as strings:  ID_0 ID_1
    > class(States_shape2)
    [1] "SpatialPolygonsDataFrame"
    attr(,"package")
    [1] "sp"
    > names(States_shape2)
    [1] "ID_0"      "ISO"       "NAME_0"    "ID_1"      "NAME_1"    "TYPE_1"    "ENGTYPE_1" "NL_NAME_1" "VARNAME_1"
    > plot(States_shape2)
In the above call, readOGR(".", "IND_adm1"), the "." means that the shapefile we want to read is in our working directory; otherwise, we would have to give the full path to its folder. Also, the shapefile name must be given without its extension, otherwise readOGR() will throw an error.
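If the shapefile sits somewhere other than the working directory, the two arguments can be derived from its full path. This is a minimal sketch in base R; the path "data/IND_adm/IND_adm1.shp" is a hypothetical location, so replace it with wherever you unzipped the download.

```r
library(tools)  # ships with R; provides file_path_sans_ext()

# Hypothetical full path to the shapefile; replace with your own location
shp_path <- file.path("data", "IND_adm", "IND_adm1.shp")

dsn   <- dirname(shp_path)                       # folder that holds the shapefile
layer <- file_path_sans_ext(basename(shp_path))  # file name without ".shp"

print(dsn)
print(layer)

# These two values are what readOGR() expects as its first two arguments:
# States_shape2 <- readOGR(dsn, layer)
```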
Creating our own dataset and representing it in the map of India
To begin with, we will create our own data for each of the 36 IDs and call it score D, a parameter that represents the dancing talent of each state. (Please note that this score is randomly generated and does not reflect true dancing talent :P).
    # Creating our own dataset
    set.seed(100)
    State_count = length(states_shape$NAME_1)
    score_1 = sample(100:1000, State_count, replace = TRUE)
    score_2 = runif(State_count, 1, 1000)
    score = score_1 + score_2
    State_data = data.frame(id = states_shape$ID_1, NAME_1 = states_shape$NAME_1, score)
    State_data
    > State_data
       id                 NAME_1     score
    1   1    Andaman and Nicobar  558.2268
    2   2         Andhra Pradesh  961.7615
    3   3      Arunachal Pradesh 1586.5746
    4   4                  Assam  281.1586
    5   5                  Bihar  853.3299
    6   6             Chandigarh 1400.2554
    7   7           Chhattisgarh 1608.8069
    8   8 Dadra and Nagar Haveli 1260.4761
    9   9          Daman and Diu 1195.7210
    10 10                  Delhi  744.7406
    11 11                    Goa 1443.5782
    12 12                Gujarat 1778.3428
    13 13                Haryana  560.5062
    14 14       Himachal Pradesh  766.7788
    15 15      Jammu and Kashmir 1118.1993
    16 16              Jharkhand  901.4804
    17 17              Karnataka  520.4586
    18 18                 Kerala  697.6118
    19 19            Lakshadweep 1014.7297
    20 20         Madhya Pradesh  975.1373
    21 21            Maharashtra  706.3637
    22 22                Manipur  970.6760
    23 23              Meghalaya 1182.9777
    24 24                Mizoram  986.1971
    25 25               Nagaland  942.2375
    26 26                 Orissa  901.4541
    27 27             Puducherry 1754.6125
    28 28                 Punjab 1570.7218
    29 29              Rajasthan 1039.7029
    30 30                 Sikkim  708.4160
    31 31             Tamil Nadu  995.2757
    32 32              Telangana 1381.9686
    33 33                Tripura  659.8475
    34 34          Uttar Pradesh 1653.6564
    35 35            Uttaranchal 1138.8248
    36 36            West Bengal 1229.3981
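A quick sanity check on the generated scores can be sketched without the shapefile, assuming 36 rows as in the output above. The exact values you get may differ from the ones printed in this article, because the default behavior of sample() changed in R 3.6.0; the bounds, however, hold in any version.

```r
set.seed(100)
State_count <- 36  # assumed: number of states and union territories in the shapefile

score_1 <- sample(100:1000, State_count, replace = TRUE)  # integers in 100..1000
score_2 <- runif(State_count, 1, 1000)                    # reals between 1 and 1000
score   <- score_1 + score_2

# Every score must lie between 100 + 1 and 1000 + 1000
stopifnot(length(score) == State_count)
stopifnot(all(score >= 101 & score <= 2000))
print(range(score))
```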
Merging dataset and preparing it for visual representation
We will use the fortify() function from the ggplot2 package to convert the shapefile into a data frame, and then merge that data frame with our dataset.
    # Fortify file
    fortify_shape = fortify(states_shape, region = "ID_1")
    class(fortify_shape)
    > fortify_shape = fortify(states_shape, region = "ID_1")
    > class(fortify_shape)
    [1] "data.frame"
    # Merge with the scores, then restore the original vertex order
    Merged_data = merge(fortify_shape, State_data, by = "id", all.x = TRUE)
    Map_plot = Merged_data[order(Merged_data$order), ]
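Why reorder after merging? merge() sorts its result by the key column, which can scramble the sequence in which polygon vertices should be drawn. The toy data frames below are invented stand-ins for fortify_shape and State_data (five vertices, three ids, scores for only two of them) and show both the reordering and the effect of all.x = TRUE.

```r
# Toy stand-in for the fortified shape: one row per vertex, in drawing order
vertices <- data.frame(
  id    = c("2", "2", "1", "1", "3"),
  order = 1:5,
  long  = c(0, 1, 2, 3, 4),
  lat   = c(0, 1, 0, 1, 0),
  stringsAsFactors = FALSE
)

# Toy stand-in for the dataset: no score for id "3"
scores <- data.frame(id = c("1", "2"), score = c(10.5, 20.5),
                     stringsAsFactors = FALSE)

# all.x = TRUE keeps every vertex, filling missing scores with NA
merged <- merge(vertices, scores, by = "id", all.x = TRUE)
print(merged$order)   # no longer 1, 2, 3, 4, 5: merge() sorted by id

# Sorting by the order column restores the drawing sequence
map_toy <- merged[order(merged$order), ]
print(map_toy)
```

Without the final sort, the polygon outline would be traced through the vertices in the wrong sequence and the shapes would come out distorted.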
Now, let’s create a basic visualization and see what our map looks like.
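    # Draw each state as a polygon filled by its score, with black borders
    # (in ggplot2 3.4.0 and later, linewidth replaces size for border width)
    ggplot() +
      geom_polygon(data = Map_plot,
                   aes(x = long, y = lat, group = group, fill = score),
                   color = "black", size = 0.5) +
      coord_map()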