This course is designed for graduate students whose programs of study emphasize quantitative methods. The material is pitched at a level appropriate for students in the MPH Applied Biostatistics and the MS and PhD Biostatistics programs, for which it is a requirement, and in the PhD Epidemiology program, for which it meets requirements. In this course, students develop methods for identifying appropriate data analysis strategies, performing varied data analyses, and communicating about their analyses in writing and verbally when presented with a variety of data sets. The course is designed to synthesize the linear modeling framework encountered in the first year of graduate statistics work and to expand on it by introducing a general class of issues that span numerous data analysis techniques. It also teaches the principles of reproducible research for statistics. Example topics include missing data, investigating confounding/effect modification, study design, and correlated data. Classes will consist of group work, lectures on example topics, and individual student presentations on the analysis projects. Short in-class presentations, data analysis projects, and written summaries of the analyses will be used to assist in mastery of the material.
Vittinghoff, E., Glidden, D.V., Shiboski, S.C., McCulloch, C.E. (2012). Regression Methods in Biostatistics, 2nd Ed. Springer. Papers posted on Canvas.
Class materials including lecture notes, data sets and discussion boards will be placed online using Canvas (https://ucdenver.instructure.com/login). We will use the email feature of Canvas to communicate with you. Please make sure that you check the email account associated with your student account on Canvas.
No specific statistical software package is required for this course. However, instruction will generally be provided in SAS and at times in R, and the instructors cannot provide assistance with software other than SAS and R. Statistical software is available on the computers in Ed2 North Room 2201 and RC1 North Room 1309. Although the university provides software, this is an analysis-intensive course, and we strongly suggest that each student invest in a license of their own for use on a home computer; inability to get to campus due to weather will not constitute grounds for extensions. Student licenses for SAS can be purchased through the University (see the end of the syllabus for detailed purchasing information).
At the end of the course, you should be able to exhibit the following competencies:
1) Reframe study/investigator hypotheses (i.e., non-statistical hypotheses) into statistical hypotheses.
2) Identify appropriate statistical data summarizations (tables, plots, summary statistics) relevant to various types of data.
3) Identify and write up appropriate data analysis strategies for a variety of clinical and epidemiologic datasets.
4) Perform an appropriate data analysis for a wide variety of clinical and epidemiologic study designs.
5) Verbally communicate about data, statistical analyses, and statistical results concisely and clearly with clinical/research investigators.
6) Summarize various data analysis plans in written form for both the non-statistician and statistician.
7) Summarize in written form the statistical limitations of various analysis techniques.
8) Identify and understand statistical aspects of data analysis methods as presented in clinical, epidemiologic, and statistical literature.
9) Create records of all processes used to a) prepare data for analysis and b) conduct the analyses consistent with principles of reproducible research.
This course is divided into four modules. The first module has daily homework. There are three projects, each taking approximately 3-4 weeks. Students are encouraged to work collaboratively on homework and the projects. However, the written homework and the three components of each project are expected to be an individual effort. Each data analysis project consists of 3 components, each of which is described in more detail below:
1) Interim data analysis plan
2) Final presentation
3) Final written report
The final presentation should include no more than 6 slides and be no more than 10 minutes in length. Presentations that exceed these limits will receive a lower grade.
The final written report should be written in terminology that a clinical or other investigator (the person who collected the data, for instance) can understand and should be your own work. Equations are likely of limited utility in these reports. The written text should be 4-6 pages (no longer than 6 pages), double spaced, and with 1-inch margins. Tables and figures do not count toward the page limit but should be limited to a total of 4.
Good research practice requires documentation of data sources and code such that another biostatistician could reproduce your analysis if given the data and code you used to produce your report. The last pages of your report (not included in the 6-page limit) should list the directory and file names of the data and statistical code you used to create the report. The top of your statistical code should include a note about the directory location of the data being read into your statistical program. For instance, using the data import function in SAS without copying the code used to import the data does not constitute reproducible research. The submitted code should be limited to that used for justifying and obtaining the results you present in the report and should be clearly commented so that another biostatistician can easily follow your steps. Exploratory work done to justify the final analysis may also be included, but it should be commented out (i.e., it will not run if the code is run) and should include comments on the rationale for the work.
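The layout described above can be sketched as follows. This is a minimal illustration only, written in Python for convenience; the same structure applies to a SAS or R program. The directory, file, and column names (data, project1_raw.csv, sbp) are hypothetical placeholders and not part of any course data set.

```python
# Project: hypothetical example analysis (all names below are placeholders)
# Data location: ./data/project1_raw.csv   (directory of the data read below)
# Code location: ./code/project1_analysis.py
# Purpose: produce the summary statistics reported in the written report

import csv
import statistics
from pathlib import Path

# State the data location once, at the top, so a reader can find the input.
DATA_DIR = Path("data")                     # hypothetical directory
DATA_FILE = DATA_DIR / "project1_raw.csv"   # hypothetical file name


def read_outcome(path, column="sbp"):
    """Read one numeric column from a CSV file, skipping blank values.

    The import is done in code (not through a point-and-click tool),
    so the step is recorded and can be rerun by someone else.
    """
    with open(path, newline="") as handle:
        rows = csv.DictReader(handle)
        return [float(row[column]) for row in rows if row[column].strip() != ""]


def summarize(values):
    """Return the n, mean, and sample SD that would go into the report."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
    }


# Exploratory work, commented out so it does not run with the final analysis.
# Rationale: compared the median with the mean to decide whether a
# transformation was needed before reporting the mean.
# print(statistics.median(read_outcome(DATA_FILE)))

if __name__ == "__main__":
    if DATA_FILE.exists():
        print(summarize(read_outcome(DATA_FILE)))
    else:
        print(f"Data file not found: {DATA_FILE}")
```

The header comment records where the data and code live, the import is an explicit function call that another analyst can rerun, and the exploratory step stays in the file with its rationale but does not execute.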
The statistical appendix is the statistical analysis section written in equations and other language familiar to a statistician. It should include the detailed modeling choices that often get cut from shortened reports. The key requirement is that the models you fitted are stated in statistical language.
Approximate grading scheme: 15% for module 1 homework, 22% for each of projects 2-4 (66% in total), 14% for peer review of projects and statistical code, and 5% for class participation. Class participation includes (but is not limited to) attendance, verbal participation in both class discussions and group work, and peer evaluation of final presentations. Grades for each project are based on:
Letter grades will roughly be assigned according to the following scale: A+ (100-98%), A (97-93%), A- (92-90%), B+ (89-88%), B (87-83%), B- (82-80%), C+ (79-78%), C (77-73%), C- (72-70%), F (69% and below). The scale will never be shifted upwards, but may be shifted downwards to reflect the overall performance of the class. After the 2nd project, the current curved grading scheme will be posted online so that each student may judge their performance up to that point.
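As a rough worked example of how the weights and the letter scale combine, the sketch below computes a weighted course percentage and maps it to a letter. Several things here are assumptions for illustration: each component is assumed to be expressed as a percentage from 0 to 100 (including the peer reviews, which are scored 0-2), the lower bound of each band is treated as a cutoff on the unrounded score, and the optional shift parameter stands in for a downward curve whose size is not specified here.

```python
# Weights from the approximate grading scheme (15 + 3 * 22 + 14 + 5 = 100).
WEIGHTS = {
    "homework": 0.15,
    "project2": 0.22,
    "project3": 0.22,
    "project4": 0.22,
    "peer_review": 0.14,
    "participation": 0.05,
}

# Lower bound of each letter band, highest first (assumed cutoffs).
CUTOFFS = [
    (98, "A+"), (93, "A"), (90, "A-"),
    (88, "B+"), (83, "B"), (80, "B-"),
    (78, "C+"), (73, "C"), (70, "C-"),
]


def weighted_score(scores):
    """Combine component percentages (0-100) into one course percentage."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def letter_grade(score, shift=0.0):
    """Map a percentage to a letter; shift > 0 lowers every cutoff."""
    for cutoff, letter in CUTOFFS:
        if score >= cutoff - shift:
            return letter
    return "F"


if __name__ == "__main__":
    # Hypothetical student, for illustration only.
    example = {
        "homework": 95, "project2": 88, "project3": 91,
        "project4": 84, "peer_review": 100, "participation": 100,
    }
    total = weighted_score(example)
    print(round(total, 2), letter_grade(total))
```

For the hypothetical student above, the weighted score is about 91.1, which falls in the A- band under the unshifted scale.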
For the initial module and for each project, you will be asked to provide feedback on classmates' code. Reviews are graded as 0 (didn't do it), 1 (needs work), or 2 (good review). A good review has three components: 1) you provide constructive and nice feedback; 2) the feedback is not generic (meaning I can tell what project you are giving feedback on); 3) you identify at least one thing that you learned and compliment it. Think about how you would like to receive feedback (i.e., don't be mean). "Nice job!" or "This sucks!" are not detailed enough to be meaningful reviews. The topics section of the class website has other tutorials on providing reviewer feedback.
The instructors will check Canvas email once a day and will try to respond to questions within 24 hours. This means that emails sent the evening before project due dates may not be answered prior to the class period. Email sent over the weekend will be answered on Monday. Office hours are the preferred method of communication.
All students are expected to attend all scheduled class days.
All students are expected to abide by the Honor Code of the Colorado School of Public Health. Unless otherwise instructed, all of your work in this course should represent completely independent work. Students are expected to familiarize themselves with the Student Honor Code, which can be found at http://www.cudenver.edu/Academics/Colleges/PublicHealth/students/StudentAffairs/StudentResources/Pages/index.aspx or in the Student Resources Section of the CSPH website. Any student found to have committed acts of misconduct (including, but not limited to, cheating, plagiarism, misconduct of research, breach of confidentiality, or illegal or unlawful acts) will be subject to the procedures outlined in the CSPH Honor Code.
Students requesting accommodations should contact the Office of Disability Resources and Services. Their staff will assist in determining reasonable accommodations as well as coordinating the approved accommodations. Phone number: (303) 724-5640. Location: Building 500, Room W1103. The physical address is 13001 E. 17th Place.