## Planning your data analysis

So, you are ready to start planning your data analysis. It is always a good idea to decide how the data will be organised before you collect it. This will make your life a whole lot easier when you come to do your analyses later!

In this section of the site, we will have a look at how to organise data based on the design you are using. We will think about the format that works best with R (usually .txt or .csv files), and the way to enter the data (what to put in columns and rows, what to label them, and so on).
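As a rough illustration of what this can look like in practice, the sketch below uses one common layout: one row per observation and one column per variable. This layout is an assumption made for the example, since the best arrangement depends on your design. The column names (`subject`, `genotype`, `latency`) and all of the values are invented for illustration and do not come from a real experiment.

```r
# Hypothetical data: one row per observation, one column per variable.
escape <- data.frame(
  subject  = 1:6,
  genotype = c("wildtype", "wildtype", "wildtype", "mutant", "mutant", "mutant"),
  latency  = c(12.1, 10.4, 11.8, 15.3, 16.0, 14.2)
)

# Save to a temporary .csv file so the example is self-contained.
csv_path <- tempfile(fileext = ".csv")
write.csv(escape, csv_path, row.names = FALSE)

# Read the file back in; text columns are converted to factors
# so that 'genotype' is treated as a grouping variable.
dat <- read.csv(csv_path, stringsAsFactors = TRUE)

# Check that each column has the type you expect.
str(dat)
```

With your own data you would replace `csv_path` with the path to your file and skip the `write.csv()` step.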

Once this is clear, we will think about how to start working with a data set in R. From the options below, please select the design that best describes your experiment.

Before you start, make sure that you are clear on the levels of measurement used in statistics. If you are not sure what I mean by 'an interval scale' or 'a binomial response', I would suggest you check the "Types of Measurement" page.

Your experiment has:

1) two **independent groups** (e.g., wild-type vs mutant) and you are collecting continuous data at **one time point** (e.g., escape latency), OR
2) continuous data from **one group of subjects**, collected at **two time points** (e.g., reaction time before and after administration of a drug), OR
3) continuous data from **one group of subjects**, or at **one time point**, that is being **compared against a population** when we have limited information about the population (e.g., we may only know the population mean).

Your experiment has:

1) three or more **independent groups** (e.g., independent animals given a drug at one of three different doses) and continuous data are collected at **one time point** (e.g., reaction time to a salient stimulus), OR
2) continuous data from **one group of subjects**, collected at **three or more different time points** (e.g., baseline, drug-dose 1, drug-dose 2, etc.), OR
3) continuous data from a **mixture of independent groups and repeated measurements** (e.g., wild-type vs mutant scored at baseline, drug-dose 1, drug-dose 2, etc.).

You may also be adding a continuous covariate (e.g., wild-type vs mutant scored at baseline, drug-dose 1, drug-dose 2, etc., controlling for weight).

Your experiment will have continuous predictor variables and one of several different types of response variable. These response variables could be continuous (e.g., weight), count data (e.g., number of pups in a litter) or binary (e.g., win/lose trials or mortality at time point x). So, choose this design if you have any of the following:

1) **one group of subjects** with two continuous variables (e.g., subject's weight and subject's height), OR
2) **two groups of subjects**, each with one continuous variable (e.g., maternal alcohol consumption during pregnancy (units/day) and birth weight of offspring), OR
3) **one or more groups of subjects** with a continuous predictor (e.g., years smoking) and a binary outcome (e.g., mortality at 70 years), OR
4) **one or more groups of subjects** with a continuous predictor (e.g., weight in gestation) and a 'count' outcome (e.g., number of pups in litter).

Finally, you may also have 'random effects' in this design. Random effects are special cases of grouping variables where the 'group' you are measuring may have been drawn at random from a larger population. For example, if you use laboratory animals for your work, and your subjects are housed in groups of three or four (perhaps two treatment and two control animals in one cage), 'cage' could act as a random effect. This means that any variability caused by cage would be accounted for in your statistical model. Repeated measures ANOVAs are a special case of a random effects model, as 'subject' (or whatever your repeated measures are on) is treated as a random effect.

Your experiment will be comparing observed frequencies of a nominal variable with expected frequencies. One example is the study of Mendelian inheritance. Suppose that a cross between two fruit flies gives you 88 offspring, 64 with black eyes and 24 with red eyes, and you are trying to ascertain the genotypes of the parents. Your hypothesis is that the allele for black is dominant and that the parent flies were both heterozygous for this trait. If your hypothesis is true, then the **predicted ratio** of offspring from this cross would be 3:1 (based on Mendel's laws).
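To make the fruit fly example concrete, here is a minimal sketch of one common way to compare observed and expected counts in R: a chi-square goodness-of-fit test using base R's `chisq.test()`. The choice of this test and the variable names (`observed`, `expected_ratio`, `fit`) are assumptions made for illustration. The counts are the ones given above.

```r
# Observed offspring counts from the cross (88 flies in total).
observed <- c(black = 64, red = 24)

# Expected proportions under the 3:1 hypothesis.
expected_ratio <- c(black = 3, red = 1) / 4

# Chi-square goodness-of-fit test of the observed counts
# against the expected proportions.
fit <- chisq.test(observed, p = expected_ratio)

fit$expected   # expected counts under 3:1: 66 black, 22 red
fit$statistic  # chi-square statistic, approximately 0.24
fit$parameter  # degrees of freedom: 1
fit$p.value    # approximately 0.62
```

A p-value this large gives no evidence against the 3:1 ratio, so these counts are consistent with both parents being heterozygous. It does not prove the hypothesis.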