Thursday 20 August 2015

Obesity in the USA (2015)

To test the integration with the c3.js charting library on something more interesting than a basic line or bar chart (not that there’s anything wrong with those), and to return to visualising health data, which is partly the premise of the dissertation, I downloaded the excellent open data set produced by County Health Rankings & Roadmaps for the United States.

An oft-reported problem in the United States is increasing obesity. Reports in the Guardian are backed up by official sources on the internet, including the Centers for Disease Control and Prevention and the National Institute of Diabetes and Digestive and Kidney Diseases, all predicting increasing health problems for those with obesity.

Deciles of obesity, by United States county, for the year 2015. Pan and zoom around the map, and click on a polygon to view the percentage of adult obesity within the county.

The County Health Rankings data includes the Federal Information Processing Standards (FIPS) county code, which uniquely identifies counties within the United States. This makes it a simple process to join the data within QGIS to one of the County Cartographic Boundary Shapefiles from the United States Census Bureau.

Did I say simple? The only problem with numeric codes starting with a zero is that, during the import of the data into QGIS (and another leading GIS provider), the code is converted to a numeric field and the leading zero is lost. No number starts with a zero other than zero itself. The county of Autauga, Alabama now has a code of 1001 instead of 01001, and any attempt to join the data to the GEOID field in the shapefile from the Census Bureau will leave gaps in the resulting data set. There may be other ways to fix or avoid this problem, but the approach I took was to add a new text field with the QGIS Field Calculator which pads the code back to five characters. The imported CSV file and the county boundaries can then be "simply" joined.

For reference, the Field Calculator expression used to achieve the padding is as follows:

lpad(  tostring( "FIPS" ), 5, '0')
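The same padding could also be applied outside QGIS, for example when tidying the CSV before import. A minimal sketch in JavaScript, where padFips is a hypothetical helper name of my own rather than anything from QGIS or the plugin:

```javascript
// Hypothetical helper: restore the leading zero lost when a FIPS code
// has been read as a number.
function padFips(code) {
  // String() copes with numeric or text input; padStart pads on the left
  // with '0' until the result is five characters long.
  return String(code).padStart(5, '0');
}

console.log(padFips(1001)); // Autauga, Alabama: prints "01001"
```

Codes that already have five characters pass through unchanged.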

Classifying the layer by the obesity attribute with the graduated quantile (equal count) method, into 10 classes, splits the data into deciles. A blue and red colour scheme from ColorBrewer makes for a nice-looking map.
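To illustrate what an equal count classification does, here is a rough sketch that sorts the values and takes the top value of each class. The function name quantileBreaks is my own, and QGIS may well place its breaks slightly differently (for example by interpolating between values), so treat this as an approximation of the idea rather than the QGIS algorithm:

```javascript
// Hypothetical helper: return the top value of each of `classes`
// (roughly) equal-count classes.
function quantileBreaks(values, classes) {
  // Sort a copy numerically so the caller's array is left untouched.
  const sorted = values.slice().sort((a, b) => a - b);
  const breaks = [];
  for (let i = 1; i <= classes; i++) {
    // Index of the last value that falls into class i.
    const idx = Math.ceil((i * sorted.length) / classes) - 1;
    breaks.push(sorted[idx]);
  }
  return breaks;
}

// Twenty made-up percentages split into deciles, two values per class.
const sample = [];
for (let v = 1; v <= 20; v++) sample.push(v);
console.log(quantileBreaks(sample, 10));
```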

The final map was created using the plugin, with d3's Albers USA projection, the county and state attributes added to the popup information, and a Gauge chart type. This c3.js chart type expects a single attribute in the data range, and by default expects that attribute to be a percentage; otherwise strange results will be visualised, though this behaviour can be changed. The percentage of adult obesity for each county is already present in the dataset and is therefore used in the export. No field calculations are necessary.
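As a rough illustration of the gauge settings described above, the configuration might look something like the following. The element id, series name and sample value are placeholders of my own and not the plugin's actual output; gauge.min, gauge.max and gauge.units are the c3.js options I would expect to adjust if the attribute were not a percentage.

```javascript
// Hypothetical helper: build a c3.js gauge configuration for one value.
function gaugeConfig(value) {
  return {
    bindto: '#chart', // placeholder element id
    data: {
      // A gauge shows a single series holding a single value.
      columns: [['obesity', value]], // 'obesity' is a placeholder series name
      type: 'gauge'
    },
    gauge: {
      min: 0,   // these limits suit a percentage;
      max: 100, // change them for other kinds of value
      units: '%'
    }
  };
}

const config = gaugeConfig(31.2); // 31.2 is a made-up sample value
console.log(config.data.type);

// In the browser, with c3.js loaded, this would be passed to:
// c3.generate(config);
```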

After the plugin has finished the export, the tooltip template can be tidied to remove the field names and replace the HTML table with a simple div element. Finally, the colour of the gauge can be made to follow the data value, rather than using the default colour, by adding a pattern and threshold to the JavaScript. The pattern replicates the HTML colour codes from ColorBrewer, and the threshold values are the top value for each class in the graduated style used within QGIS:

color: { 
  pattern: ['#053061', '#2166ac', '#4393c3', '#92c5de', '#d1e5f0', '#fddbc7', '#f4a582', '#d6604d', '#b2182b', '#67001f'], 
  threshold: { 
    values: [24.99, 27.99, 28.99, 29.99, 30.99, 31.99, 32.99, 33.99, 35.99, 48] 
  } 
},
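As far as I understand it, c3.js gives a value the colour of the first threshold it falls below, and uses the last colour in the pattern for anything at or above the final threshold. The sketch below mimics that lookup with the pattern and values above so the mapping can be checked by hand; gaugeColour is a name of my own and this is not c3's own code.

```javascript
const pattern = ['#053061', '#2166ac', '#4393c3', '#92c5de', '#d1e5f0',
                 '#fddbc7', '#f4a582', '#d6604d', '#b2182b', '#67001f'];
const thresholds = [24.99, 27.99, 28.99, 29.99, 30.99,
                    31.99, 32.99, 33.99, 35.99, 48];

// Hypothetical helper: pick the colour for a percentage value.
function gaugeColour(value) {
  for (let i = 0; i < thresholds.length; i++) {
    // The first threshold the value falls below decides the colour.
    if (value < thresholds[i]) {
      return pattern[i];
    }
  }
  // At or above the last threshold: fall back to the last colour.
  return pattern[pattern.length - 1];
}

console.log(gaugeColour(20));   // lowest decile: prints "#053061"
console.log(gaugeColour(31.5)); // sixth decile: prints "#fddbc7"
```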

Although the data has not been standardised, the most obese counties can be easily picked out.