- David Beck
- Joseph L. Hellerstein
- Bernease Herman
- Days: Monday, Wednesday
- Time: 8:30am - 9:50am
- Place: OUG 136
Scientists, engineers, and other technical professionals require skills in computing and data analysis to do their jobs. We refer to these as data science skills.
Examples of data science skills abound. Biologists search thousands of genomes for DNA sequences with special characteristics, such as genes that encode non-coding RNA that is “anti-sense” to messenger RNAs. Astronomers search, integrate, and visualize data from many instruments that produce terabytes of complex data. Social scientists do text analytics on massive repositories of social media data to distill patterns in topics and trends in sentiment.
This course teaches graduate students the software engineering skills to do research in data science fields and to be successful technical professionals in the 21st century. In particular, this course teaches how to approach computational research with reproducibility in mind: to create sharable and reusable research projects that incorporate both computation and data.
Students will learn the following skills:
- Developing software so that it can be used by others, including documentation, installing packages, automating setup, and running computational studies.
- Creating technical specifications for what a program should do (its use cases) and how this is accomplished (software design).
- Creating, updating, and sharing a project using version control (specifically GitHub).
- Programming in Python using the Python scientific stack, including numpy, pandas, and matplotlib.
- Developing unit tests that validate important aspects of the project implementation, and, more broadly, using test-driven development to build software.
- Searching for, evaluating, and integrating externally developed Python packages into a project, as well as creating your own Python packages.
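
To make the unit-testing skill in the list above concrete, here is a minimal sketch using Python's standard `unittest` module. The function `column_mean` and its test class are hypothetical names chosen for illustration; they are not taken from the course materials. In a test-driven workflow, the tests would typically be written first and the function filled in until they pass.

```python
import unittest


def column_mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers.

    Hypothetical helper used only to illustrate unit testing.
    """
    if len(values) == 0:
        raise ValueError("column_mean requires at least one value")
    return sum(values) / len(values)


class TestColumnMean(unittest.TestCase):
    """Each test checks one important aspect of the implementation."""

    def test_known_values(self):
        # The mean of 1, 2, 3 is 2.
        self.assertAlmostEqual(column_mean([1.0, 2.0, 3.0]), 2.0)

    def test_single_value(self):
        # A single value is its own mean.
        self.assertEqual(column_mean([5]), 5)

    def test_empty_input_raises(self):
        # Empty input is an error rather than a silent NaN or zero.
        with self.assertRaises(ValueError):
            column_mean([])
```

Assuming the code is saved in a file such as `test_column_mean.py` (a hypothetical name), the tests can be run with `python -m unittest test_column_mean`.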
The course emphasizes a hands-on learning approach in which class time is often used for problem solving in small groups. The first part of the class teaches the skills described above. The second part is devoted to the class project, in which students create a computational research project of their choosing.
Some prior computing experience is desirable. For example, we expect that given a CSV file you can open it and plot the data in a language like MATLAB, IDL, R, or Python. A Software Carpentry bootcamp, Codecademy, or a similar MOOC would be an appropriate venue to learn these skills. Relevant lessons include:
- Using the shell (command line): http://swcarpentry.github.io/shell-novice/
- General Python overview: http://swcarpentry.github.io/python-novice-inflammation/
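
As a rough gauge of the expected starting level (opening a CSV file and plotting its data), the following Python sketch shows one way the task might look. It assumes a CSV file with a header row and two numeric columns; the function names, column names, and file names are hypothetical, and matplotlib is assumed to be installed for the plotting step.

```python
import csv


def read_columns(path, x_name, y_name):
    """Read two numeric columns from a CSV file that has a header row."""
    xs, ys = [], []
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle):
            xs.append(float(row[x_name]))
            ys.append(float(row[y_name]))
    return xs, ys


def plot_columns(xs, ys, x_name, y_name, out_path):
    """Save a simple line plot of ys against xs to out_path."""
    # Imported here so read_columns works even without matplotlib installed.
    import matplotlib
    matplotlib.use("Agg")  # non-interactive backend; no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.plot(xs, ys, marker="o")
    ax.set_xlabel(x_name)
    ax.set_ylabel(y_name)
    fig.savefig(out_path)
    plt.close(fig)
```

For a hypothetical file `measurements.csv` with columns `time` and `value`, usage might look like `xs, ys = read_columns("measurements.csv", "time", "value")` followed by `plot_columns(xs, ys, "time", "value", "measurements.png")`.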