Go back to the index for all the Git and GitHub topics.

Why are we adopting Git (i.e., version control) in this class?

It will help you better document your analysis process and be more reproducible in your work as a data analyst. It will also show you are keeping up with some of the latest trends and tools in the data analysis/data science community.

First, what is version control?

Version control is a method of tracking the changes to a collection of files (called a repository). It replaces workarounds such as creating new folders with different names, putting "version 2", "version 3", etc. at the end of a file name, or adding a date at the end. It is a superior way of achieving the same goal.

Why Git for version control?

Git is one version control system, and it has become the most popular one. For an explanation of how Git does version control, see the book Pro Git. Although Git was developed for source code, it is useful for version control of any type of file. So, in the era of reproducible research, the analytics (and data science) community has recognized it as a valuable tool for analysis projects. This is because analysis projects are made up of a variety of files, including analysis source code, that change over the life of a project. Compared with renaming files by hand, version control is better at tracking what changed, and it makes you write comments about what you intended to change, or actually did change, in your files.
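
To make "tracking what changed" concrete, here is a minimal sketch in Python (an assumption; use whatever language you work in) that compares two versions of a small analysis script and attaches a comment to the change. The script lines, the `describe_change` helper, and the message are all made up for illustration, and this is not how Git works internally. It only shows the kind of record (a message plus the lines added or removed) that version control keeps for you automatically.

```python
import difflib

# Two hypothetical versions of a small analysis script (made up for illustration).
# They are stored as text lines; nothing in them is executed.
version_1 = [
    "data = load('survey.csv')",
    "summary = mean(data)",
]
version_2 = [
    "data = load('survey.csv')",
    "data = drop_missing(data)",
    "summary = mean(data)",
]


def describe_change(old, new, message):
    """Return a commit-style note: a message followed by a line-by-line diff."""
    diff = difflib.unified_diff(
        old,
        new,
        fromfile="analysis (before)",
        tofile="analysis (after)",
        lineterm="",
    )
    return message + "\n" + "\n".join(diff)


note = describe_change(
    version_1,
    version_2,
    "Drop missing values before computing the mean",
)
print(note)
```

In the printed note, lines that begin with `+` were added and lines that begin with `-` were removed. With Git you get this history for every file in the repository without keeping renamed copies around.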

How does Git differ from GitHub?

GitHub is one popular hosting site that allows you to make your work visible to other people, either in a read-only way or for collaboration. The idea is to make the Git repositories you want to share available on the web. Sharing can be totally public (anyone can see the code and copy it, but only people you authorize can change your repository). Sharing can also be private, limited to those you give access to. GitHub reduces the extra work associated with sharing and collaboration to almost nothing. As I have stated before, GitHub is like Dropbox with version control appropriate for projects like ours.

Installing Git (and next a Git client)

Just a bit of nomenclature: Git is the software that does version control. A Git client is a graphical user interface for Git that saves you from working at the command line. I cannot promise that you will never be at the command line when adopting our new approach to working on analysis projects. I can promise that we will help you if needed. Now follow the instructions below for your operating system to install Git. Then you will install GitKraken, the Git client we will use in this class.

Git installation: Mac

Git installation: Windows
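
Once you have followed the instructions for your operating system, one way to confirm that Git is available is to ask it for its version. At the command line this is `git --version`. The sketch below does the same check from Python (an assumption; the helper name `git_version` is made up for illustration) and reports when Git cannot be found instead of raising an error.

```python
import shutil
import subprocess


def git_version():
    """Return the output of `git --version`, or None if Git is not usable."""
    # shutil.which looks for an executable named "git" on the PATH.
    if shutil.which("git") is None:
        return None
    result = subprocess.run(
        ["git", "--version"],
        capture_output=True,
        text=True,
    )
    # A non-zero exit code means the command exists but did not run properly.
    if result.returncode != 0:
        return None
    return result.stdout.strip()


version = git_version()
if version:
    print(version)
else:
    print("Git was not found; revisit the installation instructions for your operating system.")
```

If the check reports that Git was not found right after installing, try opening a new terminal or restarting your session so the updated PATH is picked up.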
