An Update on Data Distribution and Techniques of Data Transformation
Abstract
In biostatistics, a distribution describes the frequencies with which the values of a given variable occur in a sample. Distributions can be broadly classified as normal or skewed. The normal distribution is a symmetrical, bell-shaped curve: about 68% of values lie within ±1 standard deviation (SD) of the mean and about 95% lie within ±2 SD, and the mean, median, and mode are equal. Parametric tests such as the t test and ANOVA assume that the data follow a normal distribution. In a skewed (asymmetrical) distribution, values cluster toward one side of the curve: in a right-skewed distribution the tail extends to the right, and in a left-skewed distribution the tail extends to the left. Non-parametric tests can be used for skewed data, although parametric tests are generally more powerful when their assumptions are met. The alternative is to transform the numerical variable onto another scale on which the values satisfy the assumptions required by the desired parametric or “normal” statistical methods. Techniques for handling skewed data include logarithmic transformation, generalized linear modelling, and bootstrapping.
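The ideas in the abstract can be illustrated with a minimal Python sketch using only the standard library. It is not the authors' method. The data are hypothetical (500 simulated log-normal values chosen to be right-skewed), and the helper names `normal_coverage`, `sample_skewness`, and `bootstrap_median_ci` are illustrative. The sketch checks the ±1 SD and ±2 SD coverage of the normal curve and shows that right-skewed data have a mean above the median and positive skewness. It then shows that a logarithmic transformation brings the skewness close to zero and that a percentile bootstrap gives a confidence interval for the median without assuming normality.

```python
import math
import random
import statistics


def normal_coverage(k):
    """Proportion of a normal distribution within ±k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


def sample_skewness(values):
    """Adjusted Fisher-Pearson skewness: > 0 means a right tail, < 0 a left tail."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    m3 = sum((x - mean) ** 3 for x in values)
    return n * m3 / ((n - 1) * (n - 2) * sd ** 3)


def bootstrap_median_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the median (no normality assumed)."""
    rng = random.Random(seed)
    # Resample with replacement and record the median of each resample.
    medians = sorted(
        statistics.median(rng.choices(values, k=len(values)))
        for _ in range(n_resamples)
    )
    lower = medians[round(alpha / 2 * n_resamples)]
    upper = medians[round((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical right-skewed data: 500 draws from a log-normal distribution.
rng = random.Random(42)
raw = [rng.lognormvariate(0.0, 1.0) for _ in range(500)]
logged = [math.log(x) for x in raw]  # every value is > 0, so the log is defined

if __name__ == "__main__":
    print(f"within ±1 SD: {normal_coverage(1):.1%}")  # 68.3%
    print(f"within ±2 SD: {normal_coverage(2):.1%}")  # 95.4%
    print(f"raw:    mean={statistics.fmean(raw):.2f} "
          f"median={statistics.median(raw):.2f} skew={sample_skewness(raw):.2f}")
    print(f"logged: mean={statistics.fmean(logged):.2f} "
          f"median={statistics.median(logged):.2f} skew={sample_skewness(logged):.2f}")
    print("95% bootstrap CI for the median of the raw data:", bootstrap_median_ci(raw))
```

On the raw values the mean exceeds the median and the skewness is clearly positive. After log transformation the mean and median nearly coincide and the skewness is near zero. The skewness estimator here is the same adjusted form that `scipy.stats.skew(..., bias=False)` computes, if SciPy is available.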

Copyright (c) 2022 Ahmad Najmi, Avik Ray

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.