Introductory Seminars for First-Year Students
Riding the Data Wave
Imagine collecting a bit of your saliva and sending it to one of the personalized genomics companies. For very little money, you will get back information about hundreds of thousands of variable sites in your genome. Records of exposure to a variety of chemicals in the areas where you have lived are only a few clicks away on the web, as are thousands of studies and informal reports on the effects of different diets, to which you can compare your own. What does this all mean for you?
Never before in history have humans recorded so much information about themselves and the world that surrounds them, nor have these data been so readily available to the layperson. Expressions such as "data deluge" are used to describe such wealth as well as the loss of proper bearings that it often generates. How to summarize all this information in a useful way? How to boil down millions of numbers to just a meaningful few? How to convey the gist of the story in a picture without misleading oversimplifications?
To answer these questions, we need to consider the use of the data, appreciate the diversity that they represent, and understand how people instinctively interpret numbers and pictures. Each week, we will consider a different data set, to be summarized with a different goal in mind. We will review analyses of similar problems carried out in the past and explore whether and how the same tools can be useful today. We will pay attention to contemporary media (newspapers, blogs, etc.) to identify settings similar to the ones we are examining and critique the displays and summaries presented there. Taking an experimental approach, we will evaluate the effectiveness of different data summaries in conveying the desired information by testing them on subsets of students in the seminar.