Introductory Seminars for First-Year Students

Riding the Data Wave


Imagine collecting a bit of your saliva and sending it to one of the personalized genomics companies. For very little money you will get back information about hundreds of thousands of variable sites in your genome. Records of exposure to a variety of chemicals in the areas where you have lived are only a few clicks away on the web, as are thousands of studies and informal reports on the effects of different diets to which you can compare your own. What does this all mean for you?

Never before in history have humans recorded so much information about themselves and the world that surrounds them, nor have these data been so readily available to the layperson. Expressions such as "data deluge" are used to describe such wealth as well as the loss of proper bearings that it often generates. How to summarize all this information in a useful way? How to boil down millions of numbers to just a meaningful few? How to convey the gist of the story in a picture without misleading oversimplifications?

To answer these questions we need to consider the use of the data, appreciate the diversity that they represent, and understand how people instinctively interpret numbers and pictures. Each week, we will consider a different data set to be summarized with a different goal. We will review analyses of similar problems carried out in the past and explore whether and how the same tools can be useful today. We will pay attention to contemporary media (newspapers, blogs, etc.) to identify settings similar to the ones we are examining and critique the displays and summaries documented there. Taking an experimental approach, we will evaluate the effectiveness of different data summaries in conveying the desired information by testing them on subsets of students in the seminar.


Meet the Instructor(s)

Chiara Sabatti

"I was born and raised in Italy, where I studied statistics and economics at Bocconi University in Milan. I came to Stanford to pursue a Ph.D. in statistics and worked on computer simulation methods until I discovered the power of statistics in genetics during my postdoctoral experience. I was on the faculty at UCLA and, after nine happy years in sunny southern California, I came back north with my family and currently live and work at Stanford. I am a professor in the Departments of Biomedical Data Science and Statistics. My research focuses on statistical methods for the analysis of genetics and genomics data. I particularly enjoy working with first-year students and have been a pre-major advisor as well as a major advisor for the Program in Mathematical and Computational Science."