Bernease Herman

Joseph Hellerstein

Teaching Assistant: Tony Cannistra


Days: Tuesdays

Time: 5:00 - 8:50PM

Bernease’s office hours: 6-7:30PM Mondays, bernease AT

Joe Hellerstein will hold office hours by appointment: jlheller AT

Course Description

Scientists, engineers, and other technical professionals require skills in computing and data analysis to do their jobs. We refer to these as data science skills.

Examples of data science skills abound. Biologists search thousands of genomes for DNA sequences with special characteristics, such as genes that encode non-coding RNA that is “anti-sense” to messenger RNAs. Astronomers search, integrate, and visualize data from many instruments that produce terabytes of complex data. Social scientists do text analytics on massive repositories of social media data to distill patterns in topics and trends in sentiment.

This course teaches graduate students the software engineering skills needed to do research in data science fields and to be successful technical professionals in the 21st century. In particular, this course teaches how to approach computational research with reproducibility in mind: to create shareable and reusable research projects that incorporate both computation and data.

Students will learn the following skills:

The course emphasizes a hands-on learning approach in which class time is often used for problem solving in small groups. The first part of the class teaches the skills described above. The second part is devoted to the class project, in which students create a computational research project of their choosing.

Some prior computing experience is desirable. For example, we expect that, given a CSV file, you can open it and plot the data in a language like MATLAB, IDL, R, or Python. A Software Carpentry bootcamp, a Codecademy course, or a similar MOOC would be an appropriate venue to learn these skills. Lessons include, e.g.: