Wednesday, October 30, 2013

The Normal Distribution Assumption and Outliers

So your normal distribution assumption tests are not met for your ANOVA, ANCOVA, MANCOVA, correlation, t test, etc.?

Where to start?

By leaving panic behind and taking it easy. The first thing to remember is that you do not use the same criterion for assumption tests (e.g., normal distribution) as you do for hypothesis tests.

Criterion for assumption tests

Use p = .001 as the criterion. That is to say, only if you get a p value lower than .001 should you worry about violations of assumptions. If you are looking at skewness (skew) and kurtosis (kurt), then look at zskew and zkurt, and if the absolute value of either is higher than 3.29 (p = .001) then you may have a problem.

zskew = Skewness/SEskew

zkurt = Kurtosis/SEkurt

Why this conservative criterion you may ask? Because only when you have a serious violation of an assumption are you likely to run into problems with interpreting the results of your analysis.
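
If you want to check zskew and zkurt by hand, here is a minimal Python sketch. The post does not say how skewness, kurtosis, or their standard errors are calculated, so the formulas below are an assumption: they are the sample-adjusted versions that packages such as SPSS commonly report. The function names are made up for illustration, and you need at least four scores.

```python
import math
import statistics

def se_skew(n):
    # Standard error of skewness for a sample of size n (assumed formula).
    return math.sqrt(6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))

def se_kurt(n):
    # Standard error of (excess) kurtosis for a sample of size n (assumed formula).
    return 2.0 * se_skew(n) * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))

def skew_kurt(values):
    # Sample-adjusted skewness and excess kurtosis.
    n = len(values)
    m = statistics.mean(values)
    s = statistics.stdev(values)  # sample SD (n - 1 in the denominator)
    z3 = sum(((x - m) / s) ** 3 for x in values)
    z4 = sum(((x - m) / s) ** 4 for x in values)
    skew = n / ((n - 1) * (n - 2)) * z3
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))) * z4
    kurt -= 3.0 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return skew, kurt

def z_skew_kurt(values):
    # zskew = Skewness/SEskew and zkurt = Kurtosis/SEkurt
    n = len(values)
    skew, kurt = skew_kurt(values)
    return skew / se_skew(n), kurt / se_kurt(n)

scores = [2, 3, 3, 4, 4, 4, 5, 5, 6, 7, 9, 14]  # made-up data
z_skew, z_kurt = z_skew_kurt(scores)
problem = abs(z_skew) > 3.29 or abs(z_kurt) > 3.29
print(f"zskew = {z_skew:.2f}, zkurt = {z_kurt:.2f}, possible problem: {problem}")
```

Compare the output against the skewness and kurtosis your statistics package reports before relying on it, as packages differ in the formulas they use.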

If you have a violation of the normal distribution assumption then follow the flowchart in Figure 1 and refer to explanations in text.

Criterion for outliers

Outliers are not what SPSS calls "extreme values", just so that is clear. Neither are they what SPSS marks with a circle or an asterisk in its box plots. An outlier is a value looking for its distribution - no sorry that is an attempt at being funny. An outlier is a value that is more than 3.29 standard deviation units away from the mean. Just look at your highest (Xmax) and lowest (Xmin) value for the variable you are having a problem with, and the mean (M) and the standard deviation (SD), then calculate the following.

zdistance = (Xmax - M)/SD


zdistance = (M - Xmin)/SD

If either one of the above gives you zdistance > 3.29 then you have an outlier, so check other scores close to the outlier to see if you have multiple outliers.
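
The two zdistance calculations can be sketched in a few lines of Python. This is only an illustration: the function names are made up, and it assumes the SD is the sample SD (n - 1 in the denominator), which is what most packages report.

```python
import statistics

CRITERION = 3.29  # the outlier criterion used in this post

def z_distances(values):
    # Returns (zdistance for Xmax, zdistance for Xmin).
    m = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD, an assumption
    z_high = (max(values) - m) / sd
    z_low = (m - min(values)) / sd
    return z_high, z_low

def has_outlier(values, criterion=CRITERION):
    z_high, z_low = z_distances(values)
    return z_high > criterion or z_low > criterion

scores = [10] * 29 + [100]  # made-up data with one very high score
z_high, z_low = z_distances(scores)
print(f"high end: {z_high:.2f}, low end: {z_low:.2f}, outlier: {has_outlier(scores)}")
```

If this flags the highest or lowest score, remember to look at the next scores in from that end as well.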

Comparing the results (transformed vs. raw)

You should run your analysis twice: once using the raw data and once with the transformed data. Compare the effect sizes, such as d_raw = 0.50 vs. d_transformed = 0.56. A difference of 20% or less means that it is probably not going to affect the interpretation of the results. In cases like this you would simply report the raw data findings and note that the violation of the normal distribution only had a small effect on the analysis.

If the change is moderate then your reader needs to know this and both sets of findings (i.e. raw and transformed) should be reported. 
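
The 20% comparison is simple arithmetic, sketched here in Python. One assumption to flag: the post does not say which effect size the percentage is taken from, so this sketch expresses the change relative to the raw-data effect size. The function names are made up.

```python
def percent_change(d_raw, d_other):
    # Absolute change in effect size as a percentage of the raw-data effect size.
    return abs(d_other - d_raw) / abs(d_raw) * 100.0

def small_change(d_raw, d_other, limit=20.0):
    # True when the change is at or under the 20% rule of thumb.
    return percent_change(d_raw, d_other) <= limit

change = percent_change(0.50, 0.56)
print(f"{change:.1f}% change, small: {small_change(0.50, 0.56)}")
# prints "12.0% change, small: True"
```

With the example values from this post (0.50 vs. 0.56) the change is 12%, so you would report the raw data findings.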

Comparing the results (with vs. without an outlier)

As above, you should run your analysis twice: once using the raw data and once without the outliers. Compare the effect sizes, such as d_raw = 0.40 vs. d_without outliers = 0.47. A difference of 20% or less means that it is probably not going to affect the interpretation of the results. In cases like this you would simply report the raw data findings and note that the outliers only had a small effect on the interpretation of the findings.

If the effect on your interpretation of the findings is moderate or large then your reader needs to know this and both sets of findings (i.e. raw and without outlier) should be reported.

Figure 1. Flowchart for decision on violations of the normality assumption. 

Tuesday, October 22, 2013

EndNote and Author initials: APA style

When EndNote wants to add author initials to citations and you don't need them there

I have noticed that when students and colleagues use EndNote to handle citations in manuscripts, many of the citations have author initials in them. This problem is caused by EndNote trying to differentiate between what it thinks are two different authors. EndNote thinks that there are two or more different authors because the same author has different initials, such as “D. Kuss” on one article and “D. J. Kuss” on another. You should be able to fix this (eliminate the initials) by adjusting the EndNote settings.

1. Go to EndNote

2. Open your library (I really hope you know how to do that)

3. Edit -> Output styles -> Edit "your output style"

4. Under citations select Ambiguous citations 
4a. Untick "Include the author initials or full name in citation"
4b. Tick "Add more authors until citation is unique"
4c. Untick "Add the title for different works by the same author"
4d. Tick "Add a letter after the year" (2000a,2000b)

5. Again under citations select Author name
5a. First author: Jane Smith
5b. Other authors: John Doe
5c. Capitalization: as is
5d. Initials: Last Name Only
5e. Untick "Use initials only for primary authors with the same name"

Hopefully you can just save over the old output style. If you have to save the output style under a new name then you need to use this new output style. 

6. To select a new output style
6a. Edit -> Output styles -> Edit "your NEW output style" 

If the new output style is not listed do the following

6b. Edit -> Output styles -> Open Style Manager
Find your new style and tick the box next to it and then close the style manager (the red x in right hand corner) but make sure not to close EndNote (x in top right hand corner but further up). 

You should now be able to select your new style from the drop down menu or 
6c. Edit -> Output styles -> Edit "your NEW output style" 

You may also have to select this new style in Word under the EndNote tab.

I found instructions that cover parts of this at
if you want some visuals. 


Friday, October 4, 2013

Some common essay and thesis issues

Rather than give you a ramble here is a list that I have tried to keep clear and concise

  1. Improve the flow of your argument (e.g., link paragraphs). The reader should not be surprised when he/she starts reading a new paragraph. 
  2. Not following APA style (e.g., citations and reference list). Consider using programs like EndNote or Mendeley to look after your citations and reference list.
  3. An essay claiming to review the literature should include all the relevant studies. 
  4. Page numbers are helpful and necessary.
  5. Make sure you have clear links from one paragraph to the next to maintain the 'flow' of your argument. 
  6. Report statistics so that the reader knows the magnitude of a change (e.g., Cohen’s d), difference between groups (e.g., Cohen’s d), strength of an association (e.g., r), risk (e.g., %, OR, RR, RRR, ARR) etc. and remember that absolute values are preferred to relative values.
  7. Gender vs. Sex. Please read: 
APA style guide type issues
  1. Forgetting to put a comma when listing authors in the reference list, see Example 1.
  2. Do not put a shortened version of the journal name, see Example 1.
  3. When citing in text or brackets don’t put a comma before the and or & if only two authors, see Example 2.
  4. Do not report the issue number after the volume. That said I never comment on this error or deduct marks for it as I find knowing the issue number most useful when looking up journals on the Internet as they do not list page numbers and volumes but rather issue numbers and volumes.
Example 1. Reference list.
Incorrect:
Becker, B. J. & Hedges, L. V. (1984). Meta-analysis of cognitive gender differences: A comment on an analysis by Rosenthal and Rubin. J. of Edu. Psych., 76(4), 583-587.

Correct:
Becker, B. J., & Hedges, L. V. (1984). Meta-analysis of cognitive gender differences: A comment on an analysis by Rosenthal and Rubin. Journal of Educational Psychology, 76, 583-587.

Example 2. Citations.
Incorrect:
(Becker, & Hedges, 1984)
(Belina, Fidler, Williams & Cumming, 2005)

Correct (no comma before the & if there are only two authors, which is different from the reference list):
(Becker & Hedges, 1984)
(Belina, Fidler, Williams, & Cumming, 2005)

That is it for now,


Research checklist

Please refer to the APA manual at all times!

(Research checklist: Version 6.6.1)


  1. The best thesis is grounded in theory.
  2. Does your study pass the "So what?" test? Is it adding anything new to what is already known? Replication is needed, yes, but you need to expand on the existing body of research.
  3. Download the ethics application and read through it. Fill in as you go along so that you have most of it ready before you need to submit it. (Check out my ethics blog post)
  4. Are you sampling from more than one population and, if so, is that dealt with in your analysis and text?
  5. Have you got reliable and valid measures for all your concepts? You can start writing your thesis now, reporting on the validity and reliability of the measures, then later add the internal reliability (Cronbach's alpha) from your study.
  6. How easy will it be to get participants?
  7. What will your return rate (response rate) be? 
  8. What statistical analyses will you use?
  9. What is your time frame?
  10. Run a pilot. Yes even if you are doing a survey (e.g., internet, paper-and-pencil) type study. At least then you know if there are any problems with any of the questions and how long it will take to complete them.
  11. Running an experiment? 
    1. Make sure that you test your manipulation in the pilot.
    2. Have you got manipulation checks in place?
  12. Calculate power for all hypotheses. Try using G*Power 
  13. Start formatting your thesis document using the latest APA manual as your guide.
    1. Use the heading 1 etc. to format your headings but modify the default in line with APA style.
    2. Install EndNote to organise your references and citations. 
      1. Get EndNote from
      2. Do not confuse this with Endnote that is already in programs like Word – not the same thing.
      3. Note EndNote and extra author initials solution
      4. Note EndNote hints
  14. Make sure that the IV (the manipulation) is not “contaminated”
    1. For example if you measure depression and fatigue then both concepts may share similar questions thus the correlation between the two will be inflated.
    2. Different levels of different IVs should not overlap.


  1. When you create tables use the insert table function rather than tabs.
  2. Please do NOT use spaces to create tables.
  3. If your table needs to be in landscape then insert a section break before and after the table before you apply the new orientation (landscape) to the table.
  4. Make sure you refer to every table in text before the table appears, and make sure that the table appears on the page where you refer to it, or on the following page if it cannot fit there.
  5. Check out table templates at


  1. Consider using a figure to capture the allocation of participants to different conditions and any drop out and treatment.
  2. Download one document from covering the following:
    1. "Journal Article Reporting Standards (JARS)".
    2. "Meta-Analysis Reporting Standards (MARS)".
    3. "Flow of Participants Through Each Stage of an Experiment or Quasi-Experiment".


This is now covered in a separate blog

Ethics (UNE related)

This is now covered in a separate blog

While gathering data

  1. Are you working on your literature review? 
  2. Are you getting enough data? If not, take steps to get more data. 
  3. If you are conducting an experiment remember to check your manipulation checks to see if the manipulation is working. 
    1. If it is not working at all then back to the drawing board – you should have found this out in your pilot. 
    2. If it is only working on some participants then back to the drawing board or get more participants to meet your power needs.

Data set

  1. Remember to use sensible:
    1. file names based on your surname, three words to capture your thesis project, and a version number (e.g., “Thorsteinsson Socials support and depression v01.sav”). When you make changes to the file use the “save as” function to update the version number to v02 etc., making sure that you keep a copy of older versions. It is not a bad idea to do the same for your thesis word processing document.
    2. variable names
    3. variable labels
    4. value labels.
  2. When you reverse score create a new variable (e.g., “ffs01” reverse scored into “ffs01r”).
  3. When you transform then transform into a new variable (e.g., “ffs01r” into “ffs01rt”). 
  4. Don’t forget to give new variables a clear variable label (including information such as transformed square root, transformed log 10, reverse scored). 
  5. Use syntax if you can so that you know what steps you have already taken.
  6. Make sure that you have individual items from every scale used in the data set, how else will you calculate internal reliability and summarise individual scales.
  7. When you are aggregating (e.g., mean, sum) item scores into a sub-scale/scale score make sure to check how many item scores are needed for you to calculate a scale score. Usually you can have a few items missing before you start thinking that you don't have enough items to calculate a scale score. Having scores missing for 3 items does not seem much if there are another 7 items in the scale, all 7 with scores. 
    1. If you need the sum of scores for the scale in question make sure to calculate the mean and then multiply by the number of items on the scale, as that will give you a better indication of the sum for that scale than if you sum maybe 7 item scores while 3 item scores are missing.
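
The mean-then-multiply approach from the last point can be sketched in Python. The function name and the min_valid parameter are made up for illustration, missing scores are represented as None, and how many valid items you require is your call (check the scale's manual).

```python
def scale_score(item_scores, min_valid, as_sum=False):
    # item_scores: one entry per item on the scale, None for a missing score.
    valid = [s for s in item_scores if s is not None]
    if len(valid) < min_valid:
        return None  # too few items answered to calculate a scale score
    mean_score = sum(valid) / len(valid)
    if as_sum:
        # Mean multiplied by the number of items on the scale,
        # rather than a plain sum of the items that happen to be present.
        return mean_score * len(item_scores)
    return mean_score

# Made-up 10-item scale with 3 missing scores
scores = [3, 4, None, 2, 5, None, 4, 3, None, 4]
print(scale_score(scores, min_valid=7, as_sum=True))
print(sum(s for s in scores if s is not None))  # plain sum, for comparison
```

Here the plain sum of the 7 available scores is 25, while the mean multiplied by 10 items is close to 35.7, which is a fairer estimate of what the full scale sum would have been.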

Analysis/Write up

  1. Do you define key concepts as soon as they appear in text – don’t leave the reader wondering what you are talking about; use examples and/or definitions to explain your concepts.
  2. Do you draw conclusions about cause and effect when you have only done a correlational or a quasi-experimental study? Try not to; some markers are dead against it while other markers are more pragmatic.
  3. Do you use concepts consistently? Don’t start by talking about psychological problems then later talk about psychological issues. If you do, the reader will assume that psychological issues and problems are not the same, or that you are confused and don’t know what you are writing about.
  4. Does your introduction support your hypotheses and the questionnaires employed?
    1. Are there any variables/concepts in your hypotheses not mentioned in the introduction?
    2. Are your hypotheses non-directional or directional as per evidence provided in the introduction?
  5. Results must test all of your hypotheses (test them in the correct order and make it clear to the reader when you are testing them).
  6. Report effect sizes for all statistical analyses reported (e.g., ANOVAs, t tests, Chi-square). 
  7. Is it easy to follow hypothesis testing in the result section?
  8. If you have replaced missing values, adjusted outliers, and/or transformed the data, the reader will wonder if the raw data support the findings you end up reporting. Why not rerun the same analyses using the raw data to see if the results (e.g., the effect size you get and the conclusions you would draw) are similar to those from the ‘transformed’ data? If they are the same then consider reporting the raw data as it has values that make ‘sense’.
  9. If you do not have enough power you need to acknowledge this in your discussion and be careful what you say, as your study has not proved the absence of effects just because it found no statistically significant effects.
  10. If you are looking at sub-samples are they confounded? For example, if you compare males and females and there are only internal students in the male sample but external and internal students in the female sample, then you have a problem because you cannot say whether there is a sex difference or a study mode difference.
  11. Do you treat your r-values correctly? Remember that an r = .20 is the same size as another r = .20. However, one of these may be based on a large N and thus be statistically significant, so you have more confidence in it, while the other may be based on a small N, so you have less confidence in the finding. Just remember that the rs are the same size (same effect size) but they differ in how reliable they are (i.e., can you trust the results to be close to the real association or are they off the mark).
  12. In your Discussion section make sure you discuss your findings in relation to previous findings and theories as covered in the introduction. 
  13. Write the abstract after finishing other parts of your thesis.
  14. Do you have to run a MANOVA before running multiple ANOVAs? Well, I would say no if the ANOVAs are testing your hypothesis/es. Make sure you read and cite Huberty, C. J., & Morris, J. D. (1989). Multivariate analysis versus multiple univariate analyses. Psychological Bulletin, 105, 302-308. doi:10.1037/0033-2909.105.2.302
  15. Which analysis to use? Try Note that this table has links to the relevant sections for SAS, Stata, SPSS, and R with instructions/information for the relevant analysis.
  16. Be consistent when it comes to the number of decimal places. Stick with two decimal places in most cases, as any more implies a level of measurement accuracy that you simply do not have, depending on your area obviously. You can make an exception for the p value, as so many editors hold it in such high esteem, thus it may require three decimal places.
  17. Put a leading zero such as Hedges' g = 0.24 for values where the index in question can take a value lower than -1 or higher than +1. Thus you don't have to put a leading zero for r or p values but you should for F, t, b, and beta. 
  18. Do you need SPSS or will JASP do the job? 
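
To make the point about r-values in the list above concrete, here is a small Python sketch that puts a confidence interval around the same r = .20 for two sample sizes. The post does not prescribe a method, so the Fisher z transformation used here is an assumption on my part (it is the usual textbook approach), and the function name is made up.

```python
import math

def r_confidence_interval(r, n, z_crit=1.96):
    # Approximate 95% CI for a Pearson r using the Fisher z transformation.
    z = math.atanh(r)               # r to Fisher z
    se = 1.0 / math.sqrt(n - 3)     # standard error of Fisher z
    lower = math.tanh(z - z_crit * se)
    upper = math.tanh(z + z_crit * se)
    return lower, upper

for n in (30, 1000):
    lower, upper = r_confidence_interval(0.20, n)
    print(f"r = .20, N = {n}: 95% CI [{lower:.2f}, {upper:.2f}]")
```

The effect size is identical in both cases, but with N = 30 the interval is wide and includes zero, while with N = 1000 it is narrow and does not.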

Materials section

For each scale (and sub-scale) that you use report 
  1. alpha level(s) from your study; 
  2. number of items; 
  3. scoring range and anchors; 
  4. reverse scoring of what items; 
  5. give an item(s) example(s); 
  6. citation for the overall scale (not each sub-scale assuming the citation is the same for them all).

Partial example: “The _____ scale is a 10-item scale, answered on a 4-point Likert scale ranging from 0 (Did not apply to me) to 3 (Applied to me very much, or most of the time) to answer questions such as “I felt that life was meaningless”. In the present study internal reliability was .78.”

For more examples and coverage see the APA manual.

Procedure section

  1. Report ethics approval and number.

Effect sizes

There are other guidelines for what is a small, medium or large effect size but at least these match small, medium, and large for the d effect size when you convert d to r squared etc. These d value ranges here are based on Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, New Jersey: Lawrence Erlbaum.

For Cohen's d or Hedges' g or d (be the values positive or negative) we have

Small: 0.00 to 0.20
Medium: >0.20 and <0.80
Large: 0.80 and higher

For r (be the values positive or negative) we have

Small: .00 to .10
Medium: >.10 and <.37
Large: .37 and higher

For partial eta squared, eta squared, r squared use:

Small: .000 to .010
Medium: >.010 and <.138
Large: .138 and higher
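
To see how the three sets of cut-offs line up, here is a short Python sketch that converts d to r and r squared and labels a d value. The conversion formula assumes two groups of equal size, which the post does not state, and the function names are made up.

```python
import math

def d_to_r(d):
    # Convert d to r, assuming two groups of equal size.
    return d / math.sqrt(d ** 2 + 4.0)

def label_d(d):
    # Label the size of d using the ranges listed above (sign is ignored).
    size = abs(d)
    if size <= 0.20:
        return "small"
    if size < 0.80:
        return "medium"
    return "large"

for d in (0.20, 0.80):
    r = d_to_r(d)
    print(f"d = {d:.2f} ({label_d(d)}): r = {r:.3f}, r squared = {r * r:.3f}")
```

The d cut-offs of 0.20 and 0.80 come out at roughly r = .10 and .37, and r squared = .010 and .138, which are the cut-offs listed for r and for r squared.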

Question 1: When reporting effect sizes, is it necessary to be consistent in the type of effect size reported? 

It is fine to report different types of effect sizes (ES) for different results such as 

Partial eta squared for ANOVA.

Hedges' g, Cohen's d, or d to capture differences between two groups.

Pearson's r is an ES thus no need for additional ES when reporting correlation coefficients. 

When you report Hedges' g, Cohen's d, or d you also need to report the CI and usually you'd report the 95% CI maybe like

"... g = 0.34 [0.24, 0.44] ..." It should be clear from what you have already written what the CI in the square brackets is (e.g., 90%, 95%, 99%), if it is not clear then report it such as "... g = 0.34, 95% CI  [0.24, 0.44] ..." see APA manual 6th ed. pp. 116-117.

Question 2: Where can I find an ES calculator? 

Google the question and you'll find plenty of other options such as spreadsheets, software, and web pages. 
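
If you would rather calculate it yourself, here is a minimal Python sketch for Hedges' g with an approximate 95% CI from group means, SDs, and sample sizes. The formulas (pooled SD, the small-sample correction, and a normal-approximation CI) are the common textbook ones and are my assumption, not something this post specifies; the function name is made up. Check the result against a dedicated calculator before reporting it.

```python
import math

def hedges_g_with_ci(m1, sd1, n1, m2, sd2, n2, z_crit=1.96):
    # Returns (g, lower, upper) for two independent groups.
    df = n1 + n2 - 2
    sd_pooled = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df)
    d = (m1 - m2) / sd_pooled                 # Cohen's d
    j = 1.0 - 3.0 / (4.0 * df - 1.0)          # small-sample correction
    g = j * d                                 # Hedges' g
    var_d = (n1 + n2) / (n1 * n2) + d ** 2 / (2.0 * (n1 + n2))
    se_g = j * math.sqrt(var_d)
    return g, g - z_crit * se_g, g + z_crit * se_g

# Made-up summary statistics for two groups of 50
g, lower, upper = hedges_g_with_ci(10.5, 1.0, 50, 10.0, 1.0, 50)
print(f"g = {g:.2f}, 95% CI [{lower:.2f}, {upper:.2f}]")
# prints "g = 0.50, 95% CI [0.10, 0.89]"
```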

Reference list
  1. Use a reference manager such as Mendeley or EndNote to organise your references and citations.
  2. Remember to italicise volume numbers, journal titles (not the name of the article), and book names. 
  3. Remember NOT to put the issue number unless the journal is paginated by issue (i.e. each issue starts on page 1), the latter would be unusual for psychology journals.

That is all for now.