# Stat 5102 Lecture Slides Deck 2

Post on 24-Jul-2020


Stat 5102 Lecture Slides Deck 2

Charles J. Geyer

School of Statistics

University of Minnesota


Statistical Inference

Statistics is probability done backwards.

In probability theory we give you one probability model, also called a probability distribution. Your job is to say something about expectations, probabilities, quantiles, etc. for that distribution. In short, given a probability model, describe data from that model.

In theoretical statistics, we give you a statistical model, which is a family of probability distributions, and we give you some data assumed to have one of the distributions in the model. Your job is to say something about which distribution that is. In short, given a statistical model and data, infer which distribution in the model is the one for the data.


Statistical Models

A statistical model is a family of probability distributions.

A parametric statistical model is a family of probability distributions specified by a finite set of parameters. Examples: Ber(p), $\mathcal{N}(\mu, \sigma^2)$, and the like.

A nonparametric statistical model is a family of probability distributions too big to be specified by a finite set of parameters. Examples: all probability distributions on R, all continuous symmetric probability distributions on R, all probability distributions on R having second moments, and the like.


Statistical Models and Submodels

If M is a statistical model, it is a family of probability distributions.

A submodel of a statistical model M is a family of probability distributions that is a subset of M.

If M is parametric, then we often specify it by giving the PMF (if the distributions are discrete) or PDF (if the distributions are continuous)

$$\{\, f_\theta : \theta \in \Theta \,\}$$

where Θ is the parameter space of the model.
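
As a minimal sketch of this notation, a parametric family can be written in Python as a function that maps a parameter value to a PMF. The Bernoulli family Ber(p) serves as the example. The helper name `bernoulli_pmf` and the choice of [0, 1] as the parameter space are assumptions made for illustration, not something the slides specify.

```python
def bernoulli_pmf(p):
    """Return the PMF f_p of Ber(p) (hypothetical helper, not from the slides)."""
    # Assumed parameter space: the closed interval [0, 1].
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in the parameter space [0, 1]")

    def f(x):
        # f_p(1) = p, f_p(0) = 1 - p, and zero elsewhere.
        if x == 1:
            return p
        if x == 0:
            return 1.0 - p
        return 0.0

    return f


# One member of the family { f_p : p in [0, 1] }.
f = bernoulli_pmf(0.3)
print(f(0), f(1))
```

Each call to `bernoulli_pmf` picks out one distribution in the model; the model itself is the whole collection of such functions as p ranges over the parameter space.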


Statistical Models and Submodels (cont.)

We can have models and submodels for nonparametric families too.

All probability distributions on R is a statistical model.

All continuous and symmetric probability distributions on R is a submodel of that.

All univariate normal distributions is a submodel of that.

The first two are nonparametric. The last is parametric.


Statistical Models and Submodels (cont.)

Submodels of parametric families are often specified by fixing the values of some parameters.

All univariate normal distributions is a statistical model.

All univariate normal distributions with known variance is a submodel of that. Its only unknown parameter is the mean. Its parameter space is R.

All univariate normal distributions with known mean is a submodel of that. Its only unknown parameter is the variance. Its parameter space is (0,∞).
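
The idea of fixing some parameters to obtain a submodel can be sketched in Python with the standard library class `statistics.NormalDist`. The function names `normal_known_variance` and `normal_known_mean` are hypothetical, chosen only for this illustration.

```python
from statistics import NormalDist


def normal_known_variance(sigma):
    """Submodel with the standard deviation fixed; mu ranges over the real line."""
    def member(mu):
        return NormalDist(mu, sigma)
    return member


def normal_known_mean(mu):
    """Submodel with the mean fixed; the variance ranges over (0, infinity)."""
    def member(variance):
        if variance <= 0:
            raise ValueError("variance must lie in the parameter space (0, infinity)")
        return NormalDist(mu, variance ** 0.5)
    return member


# Two different submodels of the family of all univariate normal distributions.
d1 = normal_known_variance(2.0)(1.5)   # unknown parameter: the mean
d2 = normal_known_mean(0.0)(4.0)       # unknown parameter: the variance
print(d1.mean, d1.stdev, d2.mean, d2.variance)
```

In each case the remaining argument of `member` is the unknown parameter, and the set of values it may take is the parameter space of the submodel.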


Statistical Models and Submodels (cont.)

Thus $\mathcal{N}(\mu, \sigma^2)$ does not, by itself, specify a statistical model. You must say what the parameter space is. Alternatively, you must say which parameters are considered known and which unknown.

The parameter space is the set of all possible values of the unknown parameter.

If there are several unknown parameters, we think of them as components of the unknown parameter vector; the set of all possible values of the unknown parameter vector is the parameter space.


Parameters

The word “parameter” has two closely related meanings in statistics.

• One of a finite set of variables that specifies a probability distribution within a family. Examples: p for Ber(p), and µ and $\sigma^2$ for $\mathcal{N}(\mu, \sigma^2)$.

• A numerical quantity that can be specified for all probability distributions in the family. Examples: mean, median, variance, upper quartile.


Parameters (cont.)

The first applies only to parametric statistical models. The parameters are the parameters of the model. The second applies to nonparametric statistical models too.

Every distribution has a median. If it is not unique, take any unique definition, say G(1/2), where G is the quantile function.

Not every distribution has a mean. But if the family in question is all distributions with first moments, then every distribution in the family has a mean.
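
To illustrate how a quantile function singles out one median, here is a small Python sketch for a discrete distribution. It assumes the common definition G(q) = inf{x : F(x) ≥ q}, where F is the distribution function; the slides do not restate the definition here, so treat that choice, and the helper name `quantile`, as assumptions.

```python
def quantile(pmf, q):
    """G(q): smallest support point x with F(x) >= q (assumed definition).

    `pmf` is a dict mapping each support point to its probability.
    """
    cumulative = 0.0
    for x in sorted(pmf):
        cumulative += pmf[x]
        if cumulative >= q:
            return x
    raise ValueError("probabilities sum to less than q")


# For Ber(1/2) every point of [0, 1] is a median; G(1/2) picks one of them.
fair_coin = {0: 0.5, 1: 0.5}
print(quantile(fair_coin, 0.5))
```

Under this definition the non-unique median of a fair coin flip is resolved to the smallest candidate, which is the kind of "unique definition" the slide asks for.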


Truth

The word “true” has a technical meaning in statistics. In the phrase “true unknown parameter” or “true unknown distribution” it refers to the probability distribution of the data, which is assumed (perhaps incorrectly) to be one of the distributions in the statistical model under discussion.


Statistics

The word “statistic” (singular) has a technical meaning in statistics (plural, meaning the subject).

A statistic is a function of data only, not parameters. Hence a statistic can be calculated from the data for a problem, even though the true parameter values are unknown.

The sample mean $\bar{X}_n$ is a statistic, so is the sample variance $S_n^2$, and so is the sample median $\tilde{X}_n$.
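
A short Python sketch makes the point concrete: each of these three statistics is computed from the data alone. The observations are made up for illustration, and the n − 1 divisor used by `statistics.variance` is an assumption about how the sample variance is defined in this course.

```python
import statistics

# Made-up observations; no parameter values are needed anywhere below.
data = [2.1, 3.4, 1.9, 5.0, 4.2]

x_bar = statistics.mean(data)        # sample mean
s2 = statistics.variance(data)       # sample variance (divisor n - 1, an assumption)
x_tilde = statistics.median(data)    # sample median

print(x_bar, s2, x_tilde)
```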


Statistics (cont.)

All scalar-valued statistics are random variables, but not all random variables are statistics. Example: $(\bar{X}_n - \mu) / (S_n / \sqrt{n})$ is a random variable but not a statistic, because it contains the parameter µ.

Statistics can also be random vectors. Example: $(\bar{X}_n, S_n^2)$ is a two-dimensional random vector.
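
The distinction shows up directly in function signatures. In the hedged Python sketch below, the hypothetical function `t_quantity` cannot be evaluated without a value for µ, whereas `mean_and_variance` needs only the data. Both names are illustrative, and the sample standard deviation is assumed to use the n − 1 divisor.

```python
import math
import statistics


def t_quantity(data, mu):
    """(x_bar - mu) / (s / sqrt(n)): requires mu, so it is not a statistic."""
    n = len(data)
    return (statistics.mean(data) - mu) / (statistics.stdev(data) / math.sqrt(n))


def mean_and_variance(data):
    """A vector-valued statistic: a function of the data only."""
    return (statistics.mean(data), statistics.variance(data))


data = [1.0, 2.0, 3.0, 4.0, 5.0]     # made-up observations
print(mean_and_variance(data))       # computable from the data alone
print(t_quantity(data, mu=2.0))      # computable only if mu is supplied
```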


Estimates

A statistic X is an estimate of the parameter θ if we say so.

The term “estimate” does not indicate that X has any particular properties. It only indicates our intention to use X to say something about the true unknown value of the parameter θ.

There can be many different estimates of a parameter θ. The sample mean $\bar{X}_n$ is an obvious estimate of the population mean µ. The sample median $\tilde{X}_n$ is a less obvious estimate of µ. The sample standard deviation $S_n$ is a silly estimate of µ. The constant random variable X always equal to 42 is another silly estimate of µ.
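
A small simulation can give a feel for how these four estimates of µ behave. The sketch below is only an illustration under stated assumptions: the data are taken to be normal with µ = 10 and σ = 2, the sample size is 25, and average squared error is used as one possible yardstick. None of these choices comes from the slides, and the helper name `estimated_mse` is hypothetical.

```python
import random
import statistics


def estimated_mse(estimator, mu=10.0, sigma=2.0, n=25, reps=2000, seed=5102):
    """Monte Carlo average of (estimate - mu)^2 over simulated normal samples."""
    rng = random.Random(seed)  # same seed, so every estimator sees the same samples
    total = 0.0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        total += (estimator(sample) - mu) ** 2
    return total / reps


estimators = {
    "sample mean": statistics.mean,
    "sample median": statistics.median,
    "sample sd": statistics.stdev,
    "constant 42": lambda sample: 42.0,
}

mse = {name: estimated_mse(est) for name, est in estimators.items()}
for name, value in mse.items():
    print(f"{name}: {value:.3f}")
```

All four functions are legitimate estimates in the sense of the slide, because calling something an estimate only declares an intention. The simulation merely suggests that, in this particular setup, some of them land much closer to µ than others.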


Estimates (cont.)

We often indicate the connection between a statistic and the parameter it estimates by putting a hat on the parameter. If θ is a parameter, we denote the statistic $\hat{\theta}$, or $\hat{\theta}_n$ if we also want to indicate the sample size.

The formal name for the symbol ˆ is “caret” but statisticians always say “hat” and read $\hat{\theta}$ as “theta hat”.


Estimates (cont.)

The conventions are now getting a bit confusing.

Capital lightface roman letters like X, Y, Z denote statistics. Sometimes they are decorated by bars, wiggles, and subscripts, like $\bar{X}_n$ and $\tilde{X}_n$, but they are still statistics.

Parameters are denoted by greek letters like µ, σ, and θ, and, of course, any function of a parameter is a parameter, like $\sigma^2$.

Exception: we and nearly everybody else use p for the parameter of the Ber(p), Bin(n, p), Geo(p), and NegBin(n, p) distributions, perhaps because the greek letter with the “p” sound is π and it is a frozen letter that always means the number 3.1415926535…, so we can’t use that.


Estimates (cont.)

Whatever the reason for the exception, we do have the convention: roman letters for statistics and greek letters for parameters, except for the exceptions.

Now we have a different convention. Greek letters with hats are statistics, not parameters.

θ is the parameter that the statistic $\hat{\theta}_n$ estimates.

µ is the parameter that the statistic $\hat{\mu}_n$ estimates.


Theories of Statistics

There is more than one way to do statistics. We will learn two, called frequentist and Bayesian. There are other theories, but we won’t touch them.


Frequentist Statistics

The frequentist theory of probability can only define probability for an infinite sequence of IID random variables $X_1, X_2, \ldots$. It defines the probability $\Pr(X_i \in A)$, which is the same for all $i$ because the $X_i$ are identically distributed, as what the corresponding expectation for the empirical distribution

$$P_n(A) = \frac{1}{n} \sum_{i=1}^n I_A(X_i)$$

converges to. We know

$$P_n(A) \stackrel{P}{\longrightarrow} \Pr(X_i \in A)$$

by the LLN. But the frequentist theory tries to turn this into a definition rather than a theorem.
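
The convergence asserted by the LLN can be watched numerically. The following Python sketch assumes, purely for illustration, that the X_i are IID standard normal and that A is the interval (−∞, 0], so the limiting probability is 1/2. The helper name `empirical_probability` is hypothetical.

```python
import random


def empirical_probability(n, in_A, seed=5102):
    """P_n(A) = (1/n) * sum of I_A(X_i) for X_1, ..., X_n IID standard normal."""
    # Reseeding means each n uses a prefix of the same simulated sequence.
    rng = random.Random(seed)
    return sum(1 for _ in range(n) if in_A(rng.gauss(0.0, 1.0))) / n


def in_A(x):
    """Indicator of the assumed event A = (-inf, 0]; Pr(X_i in A) = 1/2."""
    return x <= 0.0


for n in (10, 1000, 100000):
    print(n, empirical_probability(n, in_A))
```

A simulation like this illustrates the theorem for one finite stretch of one sequence; it does not, of course, turn the limit into a definition.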


Frequentist Statistics (cont.)

The frequentist theory of probability has some appeal to philosophers but no appeal to mathematicians. The attempt to turn a theorem (one so complicated we didn’t even prove it) into a fundamental definition makes the frequentist theory so difficult that no one uses it.

That’s why everyone uses the formalist theory: if we call it a probability and it obeys the axioms for probability (5101, deck 2, slides 2–4 and 131–140), then it is a probability.


Frequentist Statistics (cont.)

The frequentist theory of statistics is completely different from