Nonparametric statistics

Nonparametric statistics is the branch of statistics that is not based solely on parametrized families of probability distributions (common examples of parameters are the mean and variance). Nonparametric methods are either distribution-free or assume a specified distribution whose parameters are left unspecified. Nonparametric statistics includes both descriptive statistics and statistical inference. Nonparametric tests are often used when the assumptions of parametric tests are violated.^[1]

Definitions

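The term "nonparametric statistics" has been imprecisely defined in the following two ways, among others.

1. The first meaning covers distribution-free methods, which do not rely on the assumption that the data are drawn from a given parametric family of probability distributions.
2. The second meaning covers techniques that do not assume that the structure of a model is fixed in advance; instead, the structure is determined from the data.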

Applications and purpose

Non-parametric methods are widely used for studying populations whose values have a ranked order (such as movie reviews receiving one to four stars). The use of non-parametric methods may be necessary when data have a ranking but no clear numerical interpretation, such as when assessing preferences. In terms of levels of measurement, such data are ordinal.
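As a rough illustration of working with ranked data, the sketch below converts ratings to ranks, giving tied values the average of the positions they occupy (often called midranks). The function name `midranks` and the star ratings are hypothetical and chosen only for this example.

```python
def midranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for position in range(start, end + 1):
            ranks[order[position]] = average_rank
        start = end + 1
    return ranks


# Hypothetical star ratings (1 to 4 stars) for six films.
stars = [3, 1, 4, 3, 2, 4]
print(midranks(stars))  # [3.5, 1.0, 5.5, 3.5, 2.0, 5.5]

# Relabelling the stars with any increasing scale leaves the ranks unchanged.
relabelled = [s ** 3 for s in stars]
print(midranks(relabelled) == midranks(stars))  # True
```

Because the ranks depend only on the ordering, a procedure built on them gives the same answer however the ordinal categories happen to be coded.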

As non-parametric methods make fewer assumptions, their applicability is much wider than that of the corresponding parametric methods. In particular, they may be applied in situations where less is known about the application in question. Also, due to the reliance on fewer assumptions, non-parametric methods are more robust.
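One common reading of robustness is resistance to a few wild observations. The sketch below, which uses made-up measurements and treats the last value as a recording error, compares the sample mean with the sample median, an order-based summary. It illustrates this one aspect only and is not a general measure of robustness.

```python
import statistics

# Hypothetical measurements; the last value in `contaminated` is a gross recording error.
clean = [4.8, 5.1, 5.0, 4.9, 5.2, 5.0, 5.1]
contaminated = clean[:-1] + [51.0]

for label, sample in (("clean", clean), ("contaminated", contaminated)):
    print(label,
          "mean:", round(statistics.mean(sample), 3),
          "median:", statistics.median(sample))
```

With these numbers the median is the same for both samples, while the mean is pulled far from the bulk of the data by the single erroneous value.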

Another justification for the use of non-parametric methods is simplicity. In certain cases, even when the use of parametric methods is justified, non-parametric methods may be easier to use. Due both to this simplicity and to their greater robustness, non-parametric methods are seen by some statisticians as leaving less room for improper use and misunderstanding.

The wider applicability and increased robustness of non-parametric tests come at a cost: in cases where a parametric test would be appropriate, non-parametric tests have less statistical power. In other words, a larger sample size can be required to draw conclusions with the same degree of confidence.
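The loss of power can be seen in a small simulation. The sketch below assumes normally distributed data with a known standard deviation of 1, so that a z-test is the appropriate parametric test, and compares it with the sign test on the same simulated samples. The function names, the shift of 0.5, the sample size of 20 and the number of trials are all illustrative choices, not values from any source.

```python
import math
import random


def sign_test_p(sample, hypothesised_median=0.0):
    """Exact two-sided sign test p-value for the median (ties are dropped)."""
    positives = sum(x > hypothesised_median for x in sample)
    negatives = sum(x < hypothesised_median for x in sample)
    n = positives + negatives
    if n == 0:
        return 1.0
    tail = min(positives, negatives)
    # P(X <= tail) for X ~ Binomial(n, 1/2), doubled for a two-sided test.
    prob = sum(math.comb(n, k) for k in range(tail + 1)) / 2 ** n
    return min(1.0, 2 * prob)


def z_test_p(sample, sigma=1.0):
    """Two-sided z-test p-value for mean 0, assuming normal data with known sigma."""
    z = (sum(sample) / len(sample)) / (sigma / math.sqrt(len(sample)))
    return math.erfc(abs(z) / math.sqrt(2))


def estimate_power(test, shift, n=20, trials=2000, alpha=0.05, seed=0):
    """Fraction of simulated normal samples (mean=shift, sd=1) on which `test` rejects."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(shift, 1.0) for _ in range(n)]
        if test(sample) < alpha:
            rejections += 1
    return rejections / trials


print("z-test power:   ", estimate_power(z_test_p, shift=0.5))
print("sign test power:", estimate_power(sign_test_p, shift=0.5))
```

Both tests see identical samples because the same seed is used, so the difference in rejection rates reflects the tests rather than sampling luck. When the normality assumption fails, the comparison can come out differently.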

Non-parametric models

Non-parametric models differ from parametric models in that the model structure is not specified a priori but is instead determined from data. The term non-parametric is not meant to imply that such models completely lack parameters but that the number and nature of the parameters are flexible and not fixed in advance.
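To make the idea of parameters that are not fixed in advance concrete, here is a minimal sketch of a k-nearest-neighbours classifier. Its only "parameters" are the stored training points, so the model grows with the data. The function name `knn_classify`, the choice of Euclidean distance and the toy data are assumptions made for this example.

```python
import math
from collections import Counter


def knn_classify(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # math.dist computes the Euclidean distance (Python 3.8+).
    nearest = sorted(range(len(train_points)),
                     key=lambda i: math.dist(train_points[i], query))[:k]
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical two-dimensional training data with two classes.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.3), (4.0, 4.2), (4.1, 3.9), (3.8, 4.0)]
labels = ["a", "a", "a", "b", "b", "b"]

print(knn_classify(points, labels, (1.1, 1.0)))  # a
print(knn_classify(points, labels, (3.9, 4.1)))  # b
```

No functional form for the class boundary is specified anywhere in the code; adding more training points changes the boundary directly.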

A histogram is a simple nonparametric estimate of a probability distribution.
Kernel density estimation typically provides smoother density estimates than a histogram, because it does not depend on the placement of bin edges.
Nonparametric regression and semiparametric regression methods have been developed based on kernels, splines, and wavelets.
Data envelopment analysis provides efficiency coefficients similar to those obtained by multivariate analysis without any distributional assumption.
The k-nearest neighbors (KNN) algorithm classifies an unseen instance based on the K points in the training set that are nearest to it.
A support vector machine (with a Gaussian kernel) is a nonparametric large-margin classifier.
Method of moments with polynomial probability distributions.
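The first two entries in the list above can be sketched in a few lines. The code below evaluates a histogram density estimate and a Gaussian kernel density estimate at a point. The function names, the sample, the bin width and the bandwidth are illustrative assumptions; in practice the bin width and bandwidth would be chosen with more care.

```python
import math


def histogram_density(data, x, bin_width, origin=0.0):
    """Histogram density estimate at x: share of points in x's bin, divided by bin width."""
    bin_index = math.floor((x - origin) / bin_width)
    in_bin = sum(math.floor((d - origin) / bin_width) == bin_index for d in data)
    return in_bin / (len(data) * bin_width)


def gaussian_kde(data, x, bandwidth):
    """Kernel density estimate at x using a Gaussian kernel with the given bandwidth."""
    norm = len(data) * bandwidth * math.sqrt(2 * math.pi)
    return sum(math.exp(-0.5 * ((x - d) / bandwidth) ** 2) for d in data) / norm


# Hypothetical sample.
sample = [1.1, 1.9, 2.3, 2.8, 3.0, 3.4, 4.6]

for x in (1.5, 2.5, 3.5, 4.5):
    print(x,
          round(histogram_density(sample, x, bin_width=1.0), 3),
          round(gaussian_kde(sample, x, bandwidth=0.5), 3))
```

The histogram value jumps when `x` crosses a bin edge, while the kernel estimate changes smoothly with `x`. Both are governed by a single smoothing choice rather than by a fixed family of distributions.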

Methods

Non-parametric (or distribution-free) inferential statistical methods are mathematical procedures for statistical hypothesis testing which, unlike parametric statistics, make no assumptions about the parametric form of the probability distributions of the variables being assessed. Frequently used methods include:

Analysis of similarities
Anderson–Darling test: tests whether a sample is drawn from a given distribution
Statistical bootstrap methods: estimate the accuracy/sampling distribution of a statistic
Cochran's Q: tests whether k treatments in randomized block designs with 0/1 outcomes have identical effects
Cohen's kappa: measures inter-rater agreement for categorical items
Friedman two-way analysis of variance by ranks: tests whether k treatments in randomized block designs have identical effects
Kaplan–Meier: estimates the survival function from lifetime data, modeling censoring
Kendall's tau: measures statistical dependence between two variables
Kendall's W: a measure between 0 and 1 of inter-rater agreement
Kolmogorov–Smirnov test: tests whether a sample is drawn from a given distribution, or whether two samples are drawn from the same distribution
Kruskal–Wallis one-way analysis of variance by ranks: tests whether more than two independent samples are drawn from the same distribution
Kuiper's test: tests whether a sample is drawn from a given distribution, sensitive to cyclic variations such as day of the week
Logrank test: compares survival distributions of two right-skewed, censored samples
Mann–Whitney U or Wilcoxon rank sum test: tests whether two samples are drawn from the same distribution, as compared to a given alternative hypothesis.
McNemar's test: tests whether, in 2 × 2 contingency tables with a dichotomous trait and matched pairs of subjects, row and column marginal frequencies are equal
Median test: tests whether two samples are drawn from distributions with equal medians
Pitman's permutation test: a statistical significance test that yields exact p values by examining all possible rearrangements of labels
Rank products: detects differentially expressed genes in replicated microarray experiments
Siegel–Tukey test: tests for differences in scale between two groups
Sign test: tests whether matched pair samples are drawn from distributions with equal medians
Spearman's rank correlation coefficient: measures statistical dependence between two variables using a monotonic function
Squared ranks test: tests equality of variances in two or more samples
Tukey–Duckworth test: tests equality of two distributions by using ranks
Wald–Wolfowitz runs test: tests whether the elements of a sequence are mutually independent/random
Wilcoxon signed-rank test: tests whether matched pair samples are drawn from populations with different mean ranks
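As a sketch of how one of the listed tests can be carried out, the code below computes the Mann–Whitney U statistic and obtains an exact two-sided p-value by enumerating every way of relabelling the pooled observations, in the spirit of a permutation test. The function names and data are hypothetical. Full enumeration is only practical for small samples, and this is not necessarily how statistical packages implement the test.

```python
from itertools import combinations


def mann_whitney_u(x, y):
    """U statistic for x: number of pairs (xi, yj) with xi > yj, counting ties as 1/2."""
    return sum((xi > yj) + 0.5 * (xi == yj) for xi in x for yj in y)


def exact_permutation_p(x, y):
    """Two-sided exact p-value: relabel the pooled data in every possible way."""
    pooled = list(x) + list(y)
    n, m = len(x), len(y)
    centre = n * m / 2  # expected U when both samples share one distribution
    observed = abs(mann_whitney_u(x, y) - centre)
    extreme = total = 0
    for chosen in combinations(range(n + m), n):
        chosen_set = set(chosen)
        group_x = [pooled[i] for i in chosen]
        group_y = [pooled[i] for i in range(n + m) if i not in chosen_set]
        total += 1
        # Count relabellings at least as far from the centre as the observed U.
        if abs(mann_whitney_u(group_x, group_y) - centre) >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Hypothetical measurements from two independent groups.
group_a = [1.1, 2.3, 1.9, 2.8]
group_b = [3.4, 4.6, 3.0, 5.2]

print(mann_whitney_u(group_a, group_b))       # 0.0, every value in group_a is smaller
print(exact_permutation_p(group_a, group_b))  # 2 of 70 relabellings, i.e. 2/70
```

The p-value uses only the ordering of the pooled observations, so no distributional form is assumed beyond the two samples being exchangeable under the null hypothesis.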

History

Early nonparametric statistics include the median (13th century or earlier; used in estimation by Edward Wright, 1599) and the sign test, which John Arbuthnot applied in 1710 to analyze the human sex ratio at birth.^[2]^[3]

Notes

General references

Bagdonavicius, V.; Kruopis, J.; Nikulin, M.S. (2011). Non-parametric Tests for Complete Data. ISTE & Wiley: London & Hoboken.
Gibbons, Jean Dickinson; Chakraborti, Subhabrata (2003). Nonparametric Statistical Inference, 4th ed. CRC Press.
Hollander, M.; Wolfe, D.A.; Chicken, E. (2014). Nonparametric Statistical Methods. John Wiley & Sons.
Sheskin, David J. (2003). Handbook of Parametric and Nonparametric Statistical Procedures. CRC Press.
Wasserman, Larry (2007). All of Nonparametric Statistics. Springer.

[1] Template:Cite journal

[Conover1999-2] Template:Citation

[Sprent1989-3] Template:Citation
