R knows about a lot of probability distributions:
help(Distributions)
For each distribution, there are four functions: dxxx, pxxx, qxxx and rxxx, where xxx is the abbreviated name of the distribution (norm for the normal distribution). Take the normal distribution as an example.
By default, the normal distribution functions use the distribution with mean=0 and standard deviation (sd)=1. You can change these for different variations on the distribution.
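As a small sketch of overriding these defaults, the calls below pass mean and sd explicitly. The values 100 and 15 are arbitrary choices for illustration, not anything prescribed by R.

```r
# Density at the mean of the standard normal (defaults: mean = 0, sd = 1)
d_std <- dnorm(0)

# The same call with the defaults spelled out; the result is identical
d_explicit <- dnorm(0, mean = 0, sd = 1)

# A normal distribution with mean 100 and sd 15 (illustrative values only)
d_shifted <- dnorm(100, mean = 100, sd = 15)

# The density at the mean scales by 1/sd, so d_shifted equals d_std / 15
all.equal(d_shifted, d_std / 15)
```

The same mean and sd arguments are accepted by pnorm, qnorm and rnorm.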
dnorm
First, there is a function to get the density of the probability distribution (the PDF):
vals<-seq(-4, 4, .2) # this makes a vector with a sequence of numbers from -4 to 4 in it
data.frame(vals=vals, pdf=dnorm(vals))
## vals pdf
## 1 -4.0 0.0001338302
## 2 -3.8 0.0002919469
## 3 -3.6 0.0006119019
## 4 -3.4 0.0012322192
## 5 -3.2 0.0023840882
## 6 -3.0 0.0044318484
## 7 -2.8 0.0079154516
## 8 -2.6 0.0135829692
## 9 -2.4 0.0223945303
## 10 -2.2 0.0354745928
## 11 -2.0 0.0539909665
## 12 -1.8 0.0789501583
## 13 -1.6 0.1109208347
## 14 -1.4 0.1497274656
## 15 -1.2 0.1941860550
## 16 -1.0 0.2419707245
## 17 -0.8 0.2896915528
## 18 -0.6 0.3332246029
## 19 -0.4 0.3682701403
## 20 -0.2 0.3910426940
## 21 0.0 0.3989422804
## 22 0.2 0.3910426940
## 23 0.4 0.3682701403
## 24 0.6 0.3332246029
## 25 0.8 0.2896915528
## 26 1.0 0.2419707245
## 27 1.2 0.1941860550
## 28 1.4 0.1497274656
## 29 1.6 0.1109208347
## 30 1.8 0.0789501583
## 31 2.0 0.0539909665
## 32 2.2 0.0354745928
## 33 2.4 0.0223945303
## 34 2.6 0.0135829692
## 35 2.8 0.0079154516
## 36 3.0 0.0044318484
## 37 3.2 0.0023840882
## 38 3.4 0.0012322192
## 39 3.6 0.0006119019
## 40 3.8 0.0002919469
## 41 4.0 0.0001338302
Looking at the data, the PDF peaks at 0 (the mean) and decreases symmetrically away from 0. Plot this:
plot(dnorm(vals) ~ vals, type="l")
It looks like what you expect for the normal distribution. dnorm is most commonly used for drawing the distribution; you don’t usually need to compute the value of the PDF at a specific point otherwise.
pnorm
pnorm tells us how likely it is (the probability) that a random draw from the distribution would be a value less than or equal to the number supplied. This is the area under the PDF curve to the left of that number. The return value will range between 0 and 1:
vals<-seq(-3,3,.2)
data.frame(vals=vals, prob=pnorm(vals))
## vals prob
## 1 -3.0 0.001349898
## 2 -2.8 0.002555130
## 3 -2.6 0.004661188
## 4 -2.4 0.008197536
## 5 -2.2 0.013903448
## 6 -2.0 0.022750132
## 7 -1.8 0.035930319
## 8 -1.6 0.054799292
## 9 -1.4 0.080756659
## 10 -1.2 0.115069670
## 11 -1.0 0.158655254
## 12 -0.8 0.211855399
## 13 -0.6 0.274253118
## 14 -0.4 0.344578258
## 15 -0.2 0.420740291
## 16 0.0 0.500000000
## 17 0.2 0.579259709
## 18 0.4 0.655421742
## 19 0.6 0.725746882
## 20 0.8 0.788144601
## 21 1.0 0.841344746
## 22 1.2 0.884930330
## 23 1.4 0.919243341
## 24 1.6 0.945200708
## 25 1.8 0.964069681
## 26 2.0 0.977249868
## 27 2.2 0.986096552
## 28 2.4 0.991802464
## 29 2.6 0.995338812
## 30 2.8 0.997444870
## 31 3.0 0.998650102
It’s always increasing (since as you move right on the number line you’re always increasing the probability that a random draw would be less than or equal to the value), and pnorm(0) is 0.5 – since there’s a 50/50 chance that a random draw would be to the left of the mean. If you plot the results of pnorm, you get the CDF:
plot(pnorm(vals)~vals, type="l")
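To make the link between the PDF and the CDF concrete, here is a minimal sketch that integrates dnorm numerically with R's integrate function and compares the result with pnorm. The cut-off of 1.6 is an arbitrary value chosen for illustration.

```r
# Area under the standard normal PDF from -Inf up to 1.6, computed numerically
area <- integrate(dnorm, lower = -Inf, upper = 1.6)$value

# The same quantity taken directly from the CDF
cdf <- pnorm(1.6)

# The difference should be very close to zero (numerical integration error only)
abs(area - cdf)
```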
You would use pnorm to look up the probability of getting a value at or below a particular number (such as a test statistic that follows a normal distribution). To get the probability of a random draw being to the right of the specified value instead, set lower.tail=FALSE:
pnorm(1.6)
## [1] 0.9452007
pnorm(1.6, lower.tail=FALSE)
## [1] 0.05479929
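Building on these two calls, the sketch below shows a few common ways to combine pnorm results. The test statistic z = 1.6 is a hypothetical value used only for illustration.

```r
# Probability that a standard normal draw falls between -1 and 1
p_between <- pnorm(1) - pnorm(-1)    # about 0.683

# The upper tail is the complement of the lower tail
upper <- pnorm(1.6, lower.tail = FALSE)
all.equal(upper, 1 - pnorm(1.6))

# Two-sided tail probability for a hypothetical test statistic z
z <- 1.6
p_two_sided <- 2 * pnorm(abs(z), lower.tail = FALSE)    # about 0.11
```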
qnorm
qnorm is the inverse of pnorm: it tells you the value such that a random draw from the distribution has the supplied probability of being less than or equal to (to the left of) that value. For example, to find the value such that there’s a 95% chance that a random draw would be less than or equal to it:
qnorm(0.95)
## [1] 1.644854
You need a value of about 1.64. Random draws from a normal distribution with mean=0 and standard deviation=1 will be less than or equal to this value 95% of the time.
The value you supply to qnorm must be between 0 and 1. There’s no limit on the range of the return value.
Again, pnorm and qnorm are inverses, so applying one to the output of the other returns the original input:
qnorm(pnorm(2))
## [1] 2
pnorm(qnorm(.8))
## [1] 0.8
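A few further qnorm calls, sketched under the assumption that you want cut points for a central 95% region. The mean of 100 and sd of 15 are arbitrary illustrative values.

```r
# Cut points enclosing the central 95% of the standard normal distribution:
# the 2.5% and 97.5% quantiles
bounds <- qnorm(c(0.025, 0.975))    # about -1.96 and 1.96

# Quantiles of a shifted and scaled normal follow mean + sd * qnorm(p)
q_scaled <- qnorm(0.95, mean = 100, sd = 15)
all.equal(q_scaled, 100 + 15 * qnorm(0.95))

# At the extremes of the allowed input range, the result is unbounded
ends <- qnorm(c(0, 1))              # -Inf and Inf
```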
rnorm
The last function generates random draws from the distribution. Tell it how many random draws you want:
rnorm(10)
## [1] 0.1041620 -0.6990046 -1.2360683 -0.9342026 0.1992118 0.4508575
## [7] -0.3115579 -0.2119198 -0.3646142 0.7728898
If you want to make sure you get the same sequence of random numbers each time (or the same sequence as someone else), you can set the seed with any integer value:
set.seed(12345)
rnorm(10)
## [1] 0.5855288 0.7094660 -0.1093033 -0.4534972 0.6058875 -1.8179560
## [7] 0.6300986 -0.2761841 -0.2841597 -0.9193220
This is useful for running simulations when you need to sample from a distribution.
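As a minimal sketch of such a simulation, the code below first checks that resetting the seed reproduces the same draws, then compares the proportion of simulated draws at or below 1.6 with the value from pnorm. The seed, the sample size of 100,000 and the cut-off of 1.6 are all arbitrary choices for illustration.

```r
# Resetting the seed reproduces exactly the same draws
set.seed(12345)
a <- rnorm(10)
set.seed(12345)
b <- rnorm(10)
identical(a, b)

# Draw a large sample from the standard normal distribution
set.seed(12345)
draws <- rnorm(100000)

# The proportion of draws at or below 1.6 should be close to pnorm(1.6)
empirical <- mean(draws <= 1.6)
theoretical <- pnorm(1.6)
abs(empirical - theoretical)

# The sample mean and standard deviation should be close to 0 and 1
c(mean(draws), sd(draws))
```

With larger samples, the simulated proportion tends to move closer to the theoretical probability.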