Nonparametric statistics
Revision as of 9 October 2020, 04:52
Nonparametric statistics is the branch of statistics that is not based solely on parametrized families of probability distributions (common examples of parameters are the mean and variance). Nonparametric statistics is based on either being distribution-free or having a specified distribution whose parameters are left unspecified. Nonparametric statistics includes both descriptive statistics and statistical inference.
Definitions
The term "nonparametric statistics" has been imprecisely defined in the following two ways, among others.
- The first meaning of nonparametric covers techniques that do not rely on data belonging to any particular parametric family of probability distributions.
These include, among others:
- distribution free methods, which do not rely on assumptions that the data are drawn from a given parametric family of probability distributions. As such it is the opposite of parametric statistics.
- nonparametric statistics (a statistic is defined to be a function on a sample; no dependency on a parameter).
Order statistics, which are based on the ranks of observations, is one example of such statistics.
The following discussion is taken from Kendall's.[1]
Statistical hypotheses concern the behavior of observable random variables.... For example, the hypothesis (a) that a normal distribution has a specified mean and variance is statistical; so is the hypothesis (b) that it has a given mean but unspecified variance; so is the hypothesis (c) that a distribution is of normal form with both mean and variance unspecified; finally, so is the hypothesis (d) that two unspecified continuous distributions are identical.
It will have been noticed that in the examples (a) and (b) the distribution underlying the observations was taken to be of a certain form (the normal) and the hypothesis was concerned entirely with the value of one or both of its parameters. Such a hypothesis, for obvious reasons, is called parametric.
Hypothesis (c) was of a different nature, as no parameter values are specified in the statement of the hypothesis; we might reasonably call such a hypothesis non-parametric. Hypothesis (d) is also non-parametric but, in addition, it does not even specify the underlying form of the distribution and may now be reasonably termed distribution-free. Notwithstanding these distinctions, the statistical literature now commonly applies the label "non-parametric" to test procedures that we have just termed "distribution-free", thereby losing a useful classification.
- The second meaning of non-parametric covers techniques that do not assume that the structure of a model is fixed. Typically, the model grows in size to accommodate the complexity of the data. In these techniques, individual variables are typically assumed to belong to parametric distributions, and assumptions about the types of connections among variables are also made. These techniques include, among others:
- non-parametric regression, which is modeling whereby the structure of the relationship between variables is treated non-parametrically, but where nevertheless there may be parametric assumptions about the distribution of model residuals.
- non-parametric hierarchical Bayesian models, such as models based on the Dirichlet process, which allow the number of latent variables to grow as necessary to fit the data, but where individual variables still follow parametric distributions and even the process controlling the rate of growth of latent variables follows a parametric distribution.
Applications and purpose
Non-parametric methods are widely used for studying populations that take on a ranked order (such as movie reviews receiving one to four stars). The use of non-parametric methods may be necessary when data have a ranking but no clear numerical interpretation, such as when assessing preferences. In terms of levels of measurement, non-parametric methods result in ordinal data.
Because non-parametric methods make fewer assumptions, their applicability is much wider than that of the corresponding parametric methods. In particular, they may be applied in situations where less is known about the application in question. Also, because they rely on fewer assumptions, non-parametric methods are more robust.
Another justification for the use of non-parametric methods is simplicity. In certain cases, even when the use of parametric methods is justified, non-parametric methods may be easier to use. Because of both this simplicity and their greater robustness, non-parametric methods are seen by some statisticians as leaving less room for improper use and misunderstanding.
The wider applicability and increased robustness of non-parametric tests come at a cost: in cases where a parametric test would be appropriate, non-parametric tests have less power. In other words, a larger sample size can be required to draw conclusions with the same degree of confidence.
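As a rough illustration of how rank-based methods use only the ordering of the data, the sketch below converts star ratings to midranks and computes the Mann–Whitney U statistics from them. The function names and the two rating samples are hypothetical and chosen only for illustration; the sketch computes the statistics only, not a p-value.

```python
def midranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U_x, U_y) computed from the midranks of the pooled sample."""
    ranks = midranks(list(x) + list(y))
    rank_sum_x = sum(ranks[:len(x)])
    u_x = rank_sum_x - len(x) * (len(x) + 1) / 2
    return u_x, len(x) * len(y) - u_x


# Hypothetical star ratings (1 to 4) for two films.
film_a = [4, 3, 4, 2, 3]
film_b = [2, 1, 3, 2, 1]
u_a, u_b = mann_whitney_u(film_a, film_b)
print(u_a, u_b)
```

Only the relative order of the ratings enters the calculation, so relabeling the stars with any other increasing scale would leave both statistics unchanged.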
Non-parametric models
Non-parametric models differ from parametric models in that the model structure is not specified a priori but is instead determined from the data. The term non-parametric is not meant to imply that such models completely lack parameters, but that the number and nature of the parameters are flexible and not fixed in advance.
- A histogram is a simple nonparametric estimate of a probability distribution.
- Kernel density estimation provides better estimates of the density than histograms.
- Nonparametric regression and semiparametric regression methods have been developed based on kernels, splines, and wavelets.
- Data envelopment analysis provides efficiency coefficients similar to those obtained by multivariate analysis, without any distributional assumption.
- KNNs classify an unseen instance based on the K points in the training set that are nearest to it.
- A support vector machine (with a Gaussian kernel) is a nonparametric large-margin classifier.
- Method of moments (statistics) with polynomial probability distributions.
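The first two items in the list above can be sketched in a few lines. The code below is a minimal illustration that uses only the standard library; the function names, the sample, the bin layout, and the bandwidth are all assumptions made for the example, not recommended defaults.

```python
import math


def histogram_density(data, x, lo, hi, bins):
    """Histogram density estimate at x, using equal-width bins on [lo, hi)."""
    if not lo <= x < hi:
        return 0.0
    width = (hi - lo) / bins
    target = int((x - lo) / width)  # index of the bin that contains x
    count = sum(
        1 for v in data
        if lo <= v < hi and int((v - lo) / width) == target
    )
    return count / (len(data) * width)


def gaussian_kde(data, x, bandwidth):
    """Kernel density estimate at x with a Gaussian kernel."""
    norm = len(data) * bandwidth * math.sqrt(2 * math.pi)
    return sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in data) / norm


# Hypothetical sample, for illustration only.
sample = [1.1, 1.9, 2.0, 2.4, 3.2, 4.8]
print(histogram_density(sample, 2.5, lo=0.0, hi=6.0, bins=3))
print(gaussian_kde(sample, 2.5, bandwidth=0.5))
```

In both estimators every observation contributes to the estimate, so the effective number of parameters grows with the sample rather than being fixed in advance. The histogram is piecewise constant and depends on where the bin edges fall, whereas the kernel estimate is smooth; the bandwidth plays a role similar to the bin width.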
Methods
Non-parametric (or distribution-free) inferential statistical methods are mathematical procedures for statistical hypothesis testing which, unlike parametric statistics, make no assumptions about the probability distributions of the variables being assessed. The most frequently used tests include
- Analysis of similarities
- Anderson–Darling test: tests whether a sample is drawn from a given distribution
- Statistical bootstrap methods: estimates the accuracy/sampling distribution of a statistic
- Cochran's Q: tests whether k treatments in randomized block designs with 0/1 outcomes have identical effects
- Cohen's kappa: measures inter-rater agreement for categorical items
- Friedman two-way analysis of variance by ranks: tests whether k treatments in randomized block designs have identical effects
- Kaplan–Meier: estimates the survival function from lifetime data, modeling censoring
- Kendall's tau: measures statistical dependence between two variables
- Kendall's W: a measure between 0 and 1 of inter-rater agreement
- Kolmogorov–Smirnov test: tests whether a sample is drawn from a given distribution, or whether two samples are drawn from the same distribution
- Kruskal–Wallis one-way analysis of variance by ranks: tests whether > 2 independent samples are drawn from the same distribution
- Kuiper's test: tests whether a sample is drawn from a given distribution, sensitive to cyclic variations such as day of the week
- Logrank test: compares survival distributions of two right-skewed, censored samples
- Mann–Whitney U or Wilcoxon rank sum test: tests whether two samples are drawn from the same distribution, as compared to a given alternative hypothesis.
- McNemar's test: tests whether, in 2 × 2 contingency tables with a dichotomous trait and matched pairs of subjects, row and column marginal frequencies are equal
- Median test: tests whether two samples are drawn from distributions with equal medians
- Pitman's permutation test: a statistical significance test that yields exact p values by examining all possible rearrangements of labels
- Rank products: detects differentially expressed genes in replicated microarray experiments
- Siegel–Tukey test: tests for differences in scale between two groups
- Sign test: tests whether matched pair samples are drawn from distributions with equal medians
- Spearman's rank correlation coefficient: measures statistical dependence between two variables using a monotonic function
- Squared ranks test: tests equality of variances in two or more samples
- Tukey–Duckworth test: tests equality of two distributions by using ranks
- Wald–Wolfowitz runs test: tests whether the elements of a sequence are mutually independent/random
- Wilcoxon signed-rank test: tests whether matched pair samples are drawn from populations with different mean ranks
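To make one entry in the list concrete, here is a minimal sketch of a permutation test that obtains an exact p-value by enumerating every way of relabeling the pooled observations. The choice of the absolute difference in means as the test statistic, the function name, and the sample values are assumptions made for the example; full enumeration is practical only for small samples.

```python
from itertools import combinations


def permutation_test(x, y):
    """Exact two-sided permutation p-value for a difference in means."""
    pooled = list(x) + list(y)
    n_x, n_y = len(x), len(y)
    total = sum(pooled)
    observed = abs(sum(x) / n_x - sum(y) / n_y)

    extreme = 0
    count = 0
    # Each choice of n_x positions is one relabeling of the pooled data.
    for idx in combinations(range(len(pooled)), n_x):
        sum_x = sum(pooled[i] for i in idx)
        diff = abs(sum_x / n_x - (total - sum_x) / n_y)
        # Small tolerance guards against floating-point rounding in the comparison.
        if diff >= observed - 1e-12:
            extreme += 1
        count += 1
    return extreme / count


# Hypothetical measurements for two groups.
group_a = [12, 15, 14]
group_b = [8, 9, 10]
p_value = permutation_test(group_a, group_b)
print(p_value)
```

With three observations per group there are only 20 relabelings, so the smallest two-sided p-value this design can produce is 2/20 = 0.1, which is what the fully separated samples above yield.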
History
Early nonparametric statistics include the median (13th century or earlier, used in estimation by Edward Wright, 1599; see Median § History) and the sign test by John Arbuthnot (1710) in analyzing the human sex ratio at birth (see Sign test § History). [2] [3]
See also
- Parametric statistics
- Resampling (statistics)
- CDF-based nonparametric confidence interval
- Information field theory
Notes
- ^ Stuart A., Ord J.K, Arnold S. (1999), Kendall's Advanced Theory of Statistics: Volume 2A—Classical Inference and the Linear Model, sixth edition, §20.2–20.3 (Arnold).
- ^ Conover, W.J. (1999), "Chapter 3.4: The Sign Test", Practical Nonparametric Statistics (Third ed.), Wiley, pp. 157–176, ISBN 0-471-16068-7
- ^ Sprent, P. (1989), Applied Nonparametric Statistical Methods (Second ed.), Chapman & Hall, ISBN 0-412-44980-3
References
- Bagdonavicius, V.; Kruopis, J.; Nikulin, M. S. (2011). Non-parametric Tests for Complete Data. London & Hoboken: ISTE & Wiley. ISBN 978-1-84821-269-5.
- Corder, G. W.; Foreman, D. I. (2014). Nonparametric Statistics: A Step-by-Step Approach. Wiley. ISBN 978-1118840313.
- Gibbons, Jean Dickinson; Chakraborti, Subhabrata (2003). Nonparametric Statistical Inference (4th ed.). CRC Press. ISBN 0-8247-4052-1.
- Hettmansperger, T. P.; McKean, J. W. (1998). Robust Nonparametric Statistical Methods. Kendall's Library of Statistics. 5 (First ed.). London: Edward Arnold. ISBN 0-340-54937-8. MR 1604954. Also ISBN 0-471-19479-4.
- Hollander, M.; Wolfe, D. A.; Chicken, E. (2014). Nonparametric Statistical Methods. John Wiley & Sons.
- Sheskin, David J. (2003). Handbook of Parametric and Nonparametric Statistical Procedures. CRC Press. ISBN 1-58488-440-1.
- Wasserman, Larry (2007). All of Nonparametric Statistics. Springer. ISBN 0-387-25145-6.