Statistics-Normality INSTALLATION To install this module, run the following commands: perl Makefile.PL make make test make install NORMALITY TESTS ####################### # SHAPIRO-WILK TEST # ####################### The L [Shapiro65] is considered to be among the most objective tests of normality [Royston92] and also one of the most powerful ones for detecting non-normality [Chen71]. Its statistic is essentially the roughly best unbiased estimator of population standard deviation to the sample variance [Dagostino71]. The test is mathematically complex and most implementations use several conventional approximations (as we do here), including Blom's formula for the expected value of the order statistics [Harter61] and transformation to standard normal distribution for evaluation, especially for large samples [Royston92]. $pval = shapiro_wilk_test ([0.34, -0.2, 0.8, ...]); ($pval, $w_statistic) = shapiro_wilk_test ([0.34, -0.2, 0.8, ...]); This test may not be the best if there are many repeated values in the test distribution or when the number of points in the test distribution is very large, e.g. more than 5000. The routine will L about the latter, but not the former. This particular implementation of the test also requires at least 6 data points in the sample distribution and will L otherwise. ############################## # D'AGOSTINO K-SQUARE TEST # ############################## The L is a good test against non-normality arising from L and/or L [Dagostino90]. $pval = dagostino_k_square_test ([0.34, -0.2, ...]); ($pval, $ksq_statistic) = dagostino_k_square_test ([0.34, -0.2, ...]); The test statistic depends upon both the sample kurtosis and skewness, as well as the moments of these parameters from a normal population, as quantified by Pearson's coefficients [Pearson31]. These are transformed [Dagostino70,Anscombe83] to expressions that sum to the K-squared statistic, which is essentially chi-square-distributed with 2 degrees of freedom [Dagostino90]. The kurtosis transform, and thus the overall test, generally works best when the sample distribution has at least 20 data points [Anscombe83] and the routine will L otherwise. GENERAL IMPLEMENTATION NOTES FOR STATISTICAL TESTS (1) Use standard Horner's Rule for polynomial evaluations, see e.g. Forsythe, Malcolm, and Moler "Computer Methods for Mathematical Computations" (1977) Prentice-Hall, pp 68. (2) VAGARIES OF THE PERL Statistics::Distributions PACKAGE symmetric This package is implemented in the opposite way that one : | finds tables of the standard normal function presented in /:\ | textbooks, where F(z) = A1 is the area from -infinity to : / : \ | Z (or sometimes from 0 to Z). Instead, the Perl package :/ : \| gives f(z) = A2 as the area from Z to +infinity, i.e. / : \ in the *context of a significance test*. Note the /: : |\ following implications for this package: / : A1 | \ / : : |A2\ f(Z) = F(-Z) F(Z) + f(Z) + 1 __/___:___:___|___\___ -Z 0 Z -1 -1 -1 udistr: Z = f [f(Z)] = f [1 - F(Z)] = f (1 - A1) and the same appears to be true for other distributions in this package, e.g. chi-square, student's T, etc. (3) Tests that should perhaps be implemented in future versions: * Anderson-Darling test * Jarque-Bera test SUPPORT AND DOCUMENTATION After installing, you can find documentation for this module with the perldoc command. perldoc Statistics::Normality You can also look for information at: RT, CPAN's request tracker http://rt.cpan.org/NoAuth/Bugs.html?Dist=Statistics-Normality AnnoCPAN, Annotated CPAN documentation http://annocpan.org/dist/Statistics-Normality CPAN Ratings http://cpanratings.perl.org/d/Statistics-Normality Search CPAN http://search.cpan.org/dist/Statistics-Normality/ COPYRIGHT AND LICENSE Copyright (C) 2012 Mike Wendl This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.