(Work-in-progress)

There are some very good R style guides out there. If you’re programming in R, choose one and follow it! If you’re using SAS, review the R style guides along with my SAS suggestions below. To quote Hadley Wickham’s guide:

Good coding style is like using correct punctuation. You can manage without it, but it sure makes things easier to read… You don’t have to use my style, but you really should use a consistent style.

All the suggestions in these guides are good, but for now let’s focus on the Syntax and Organization sections of the Advanced R style guide. Their primary message is to use whitespace consistently and extensively. Whitespace, meaning any horizontal or vertical blank space in your code, makes your code easier to read and understand, both for other people and for your future self.

Have you ever taken a break from a project, come back to it, and had to relearn everything you did? Consistent, logical use of whitespace, along with extensive commenting, makes this process much faster. As your analyses grow in number and complexity, coding style becomes an increasingly important part of being an efficient analyst.
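To make the whitespace point concrete, here is a small, hypothetical R function written twice. The names `scale_cramped()` and `scale_spaced()` are made up for illustration; both versions run and return identical results, and only the spacing, line breaks, and indentation differ. The last two lines show that in R, spacing can even change how an expression reads: `x<-3` is an assignment, while `x < -3` is a comparison.

```r
# Hypothetical example: standardize a numeric vector, written two ways.
# Both functions do exactly the same computation; only the whitespace differs.

# Cramped: no spaces around operators or after commas, all on one line
scale_cramped<-function(x,center=TRUE){if(center){x<-x-mean(x)};x/sd(x)}

# Spaced: spaces around operators and after commas, one statement per line,
# and indentation that shows the structure of the function
scale_spaced <- function(x, center = TRUE) {
  if (center) {
    x <- x - mean(x)
  }
  x / sd(x)
}

scores <- c(2, 4, 6, 8)
identical(scale_cramped(scores), scale_spaced(scores))  # TRUE

# Spacing can also decide how an expression is read
x<-3    # no spaces: parsed as an assignment, so x is now 3
x < -3  # with spaces: a comparison of x with -3, which is FALSE here
```

R accepts the cramped version without complaint; the cost is paid by whoever has to read and debug it later.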

Some specific points

Use headers at the top of each of your code files. Here are the header styles I use:

################################################################################
# my_r_script.R
#   This is a one-sentence description of what this file does.
#
# author: Matt Mulvahill
# notes:
#   - this is a note about what this file does and/or how to use it.
#   - some people also include a 'date modified:' field in their headers, along
#     with the 'author:' and 'notes:' fields.  I find date fields unnecessary
#     when using version control, and they are usually inaccurate (since I
#     always forget to update them).
################################################################################

# start of code...

/*******************************************************************************
* my_sas_script.sas  
*   This is a one-sentence description of what this file does.
*
* author: Matt Mulvahill
* notes:
*   - this is a note about what this file does and/or how to use it.
*   - some people also include a 'date modified:' field in their headers, along
*     with the 'author:' and 'notes:' fields.  I find date fields unnecessary
*     when using version control, and they are usually inaccurate (since I
*     always forget to update them).
********************************************************************************/

* start of code...;

An example of spacing (especially around operators such as =, *, and /), line length, and indentation in SAS:

* Comment describing what this new dataset will be used for;
data mynewdata;
  set mydata;
  length y 8.;

  * some random if/else statements;
  if thisvar = 1 then thisvar = thatvar;
  else thisvar = 2;

  * now a loop making some calculations;
  y = 0;
  z = 0;
  do i = 1 to 100 by 1;
    y = y ** 2;
    z = y / i;
  end;

  keep time thisvar thatvar y z;
run;


* An example PROC from one of my projects;
*   Note that SAS is case-insensitive. I use all lower case for keywords
*   (proc, title, etc.). All of the lines are <= 80 characters long, except
*   for titles, since splitting a title string across lines would add spaces
*   and newline characters to its value;
proc mixed data = long;
  title 'Model with nested eye within patient (repeated AR1), with diagnosis interaction';
  class pid eye time_months(ref = '0') diagnosis;

  model log_iop = time_months | diagnosis meds / solution residual cl;
  random pid eye(pid);
  repeated / subject = eye(pid) type = ar(1) r;
  lsmeans time_months time_months * diagnosis / pdiff adjust = tukey;

  ods output LSMeans = diag_meansout 
             SolutionF = diag_estimates 
             Diffs = diag_lsmeans_diffs;
run;