Making Faces to Make Sense of Biomedical Data

Can we exploit the extraordinary ability that humans have to read faces to make sense of abstract data -- by rendering data as concrete facial features? That's a question raised by a recent ProgrammableWeb Mashup of the Day, Pubmed Faceoff. According to the Pubmed Faceoff website:

This site applies a simple, photorealistic variant of the Chernoff Faces visualization technique to impact factor data for papers in the PubMed database of biomedical literature.

Basically it allows you to search PubMed and have the results represented as a set of human faces.

You can get more details from a blog post by Euan Adie, the creator of Pubmed Faceoff:

Pubmed Faceoff is a mashup of Pubmed, Carl Bergstrom’s Eigenfactors dataset and Scopus, inspired by something that Pierre Lindenbaum mentioned on Twitter. It renders PubMed results as a set of photorealistic Chernoff Faces whose facial features are determined by the age, citation count and journal impact factor associated with each paper. The idea is that you can tell at a glance which papers are new, exciting and high impact and which are languishing, uncited and unread.


Let's unpack these descriptions to understand exactly what's happening with Pubmed Faceoff. The first thing to know is that the abstract data visualized by Pubmed Faceoff consists of search results from Pubmed, specifically lists of biomedical research papers:

PubMed is a service of the U.S. National Library of Medicine that includes over 18 million citations from MEDLINE and other life science journals for biomedical articles back to the 1950s. PubMed includes links to full text articles and other related resources.

You can try Pubmed yourself to see what's in it; for instance, type in "lung cancer", an example we'll use later on. In addition to the standard Pubmed interface, you can access Pubmed data via the NCBI Entrez API, which is what the Pubmed Faceoff mashup uses. In fact, this API has made possible quite a number of alternative interfaces to Pubmed, including HubMed (see the HubMed search results for lung cancer).
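To give a feel for what "using the Entrez API" involves, here is a minimal Python sketch that builds a PubMed search request for NCBI's E-utilities ESearch endpoint. It only constructs the URL; fetching it is left to any HTTP client. The helper name `pubmed_search_url` and the particular parameters chosen are our own assumptions for illustration, not a description of the calls Pubmed Faceoff actually makes.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint (returns IDs of records matching a query).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def pubmed_search_url(term: str, retmax: int = 20) -> str:
    """Build a hypothetical ESearch URL that asks PubMed for matching IDs as JSON."""
    params = {
        "db": "pubmed",     # search the PubMed database
        "term": term,       # free-text query, e.g. "lung cancer"
        "retmax": retmax,   # maximum number of IDs to return
        "retmode": "json",  # ask for JSON instead of the default XML
    }
    # urlencode escapes the query, turning spaces into "+".
    return f"{ESEARCH_URL}?{urlencode(params)}"

print(pubmed_search_url("lung cancer"))
```

The IDs returned by such a search would then be passed to further E-utilities calls to retrieve titles and publication dates.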

The NCBI Entrez API gives you a list of articles and their publication dates (i.e., the age of each article). When deciding which papers to pay attention to (a pressing issue, since so many papers are being published), researchers want to know the "impact" a given paper is having. It's hard, perhaps impossible, to boil the importance of a paper down to numbers, but impact is often measured by the number of times a paper has been cited by other papers and by the influence of the journal in which it was published. Since the NCBI Entrez API doesn't provide such data, the Pubmed Faceoff mashup makes use of two other sources: citation counts from Scopus and journal-level scores from Carl Bergstrom's Eigenfactor dataset.
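Because no single service supplies all three numbers, a mashup like this has to join per-paper data from PubMed with data keyed in other ways. The sketch below shows one way that join might look, assuming citation counts arrive keyed by PubMed ID and journal scores keyed by journal name. The names `PaperMetrics` and `merge_sources`, and all sample values, are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class PaperMetrics:
    pmid: str
    year: int           # publication year, from PubMed via Entrez
    citations: int      # citation count, e.g. from Scopus
    eigenfactor: float  # journal-level score, e.g. from the Eigenfactor dataset

def merge_sources(pubmed: dict[str, tuple[int, str]],
                  citations: dict[str, int],
                  eigenfactors: dict[str, float]) -> list[PaperMetrics]:
    """Join PubMed records with citation counts (by PMID) and journal scores (by journal name)."""
    merged = []
    for pmid, (year, journal) in pubmed.items():
        merged.append(PaperMetrics(
            pmid=pmid,
            year=year,
            citations=citations.get(pmid, 0),            # missing count treated as zero
            eigenfactor=eigenfactors.get(journal, 0.0),  # unknown journal treated as zero
        ))
    return merged

# Hypothetical example data: two papers, one with no citation record.
pubmed = {"111": (2008, "Journal A"), "222": (2001, "Journal B")}
print(merge_sources(pubmed, {"111": 40}, {"Journal A": 1.5, "Journal B": 0.02}))
```

Treating a missing citation record as zero is a simplifying choice; a real system might instead mark such papers as "unknown" so they aren't drawn with a frown by default.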

Pubmed Faceoff attempts to distill these various factors for a given paper into a single representation: a face!

- The ethnicity and gender of the face are selected at random for visual interest; you can turn this feature off if you so choose.
- The age of a face correlates with the publication date of the paper. Younger faces are more recent papers.
- A smile means that the paper has been cited more times than expected (based on its age). Larger smiles mean more citations.
- A frown means that the paper has been cited far less than you might expect.
- The raised eyebrows correlate with the impact factor (sort of - actually the Eigenfactor) of the journal in which the paper was published.
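The feature mapping described above can be sketched in code. The following Python function turns a paper's age, citation count and journal Eigenfactor into three face parameters. Every constant and formula here (the expected-citations rate, the log scale for the mouth, the Eigenfactor ceiling, the face-age range) is a hypothetical choice for illustration; the post does not say how Pubmed Faceoff actually scales its features.

```python
import math
from dataclasses import dataclass

# Hypothetical constants, not taken from Pubmed Faceoff.
EXPECTED_CITATIONS_PER_YEAR = 5.0
MAX_EIGENFACTOR = 0.5

@dataclass
class FaceParams:
    face_age: float  # apparent age of the face in years
    mouth: float     # -1.0 = deep frown, 0.0 = neutral, +1.0 = broad smile
    eyebrows: float  # 0.0 = relaxed, 1.0 = fully raised

def clamp(x: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, x))

def paper_to_face(paper_age_years: float, citations: int, eigenfactor: float) -> FaceParams:
    # Newer papers get younger faces: 20 plus 2 "face years" per paper year, capped at 80.
    face_age = clamp(20 + 2 * paper_age_years, 20, 80)
    # Compare actual citations with a naive expectation that grows with the paper's age.
    expected = EXPECTED_CITATIONS_PER_YEAR * max(paper_age_years, 1)
    ratio = (citations + 1) / (expected + 1)  # +1 avoids division by zero and log(0)
    # log2 scale: 8x above expectation gives a full smile, 8x below a full frown.
    mouth = clamp(math.log2(ratio) / 3, -1.0, 1.0)
    # Journal influence raises the eyebrows, saturating at MAX_EIGENFACTOR.
    eyebrows = clamp(eigenfactor / MAX_EIGENFACTOR, 0.0, 1.0)
    return FaceParams(face_age, mouth, eyebrows)

print(paper_to_face(1, 47, 0.25))
print(paper_to_face(10, 0, 0.0))
```

Under these assumptions, a one-year-old paper with 47 citations in a journal with an Eigenfactor of 0.25 gets a 22-year-old face with a full smile and half-raised eyebrows, while a ten-year-old uncited paper gets a 40-year-old face with a full frown. The log scale keeps a citation ratio of 2 and a ratio of 1/2 equally far from a neutral mouth.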

A novel aspect of the mashup is its use of photorealistic Chernoff faces instead of the classic 2D cartoon faces. When you look at a list of articles (such as the results for lung cancer), you see something like this:

[Screenshot: Pubmed Faceoff results]

Although this visualization is eye-catching, Euan Adie acknowledges some of its limitations:

I’m quite pleased with how the system turned out although to be honest I still think the usefulness of Chernoff Faces is debatable. Does it actually work? Is the amount of time it takes you to adjust to scanning the faces more than the amount of time it’d take to simply scan a table of data? Or is it just cute?

The gender and ethnicity of each face are picked at random to add a bit of visual interest but personally I find it slightly easier to interpret the faces when they’re all male and European. That I’m rubbish at reading women comes as no surprise but the ethnicity thing is interesting as it fits with research into cross-race facial recognition that suggests we’re each better at recognizing the types of faces that we see every day.

Perhaps there's a way to render our own faces or those of our closest friends and family members. Might that improve the visualization?
