Introduction to Sample Size
In statistics, a population is the complete collection of all the elements relevant to a particular study. A sample is a carefully chosen subset of that population.
Sampling is a fundamental basis of statistical analysis. Very often we cannot study the entire population. In such cases we have to draw representative samples from the population and infer its unknown characteristics solely on the basis of those samples.
Inference is one of the most important tasks of statistics, and drawing a valid sample is the first and foremost part of inferring.
Definition of Sample Size
In statistics, sample size is defined as the number of elements that we wish to include in a sample.
Importance of Sample Size
Selecting the size of a sample is a crucial task. One of the pivotal activities in designing a sample survey is deciding on the proper sample size. Estimation of parameters and testing of hypotheses both depend heavily on the size of the sample.
Influencing Factors of Sample Size
The choice of sample size depends on the following factors:
1. Population size
2. Level of significance and power of the test for which the sample is drawn
3. Standard deviation of the population
4. Underlying event rate of the random experiment or population
● Population size plays a role in choosing the number of elements to be drawn in a sample. If we are estimating a characteristic of a very large population, the sample should contain a fairly large number of elements; otherwise it will not be representative. The effect is strongest when the population is small relative to the sample, which is what the finite population correction is for; for very large populations the required sample size levels off.
● The level of significance α (the probability of committing a Type I error) and the power 1 − β of a statistical test are usually chosen by convention, according to how serious each type of error is. Since they are predetermined most of the time, the sample size depends a lot on these two quantities.
● Standard deviation measures the variability within a population. From the value of the population SD we can understand how scattered the values in the population are. The greater the variability, the larger the sample needs to be; lower variability allows a smaller sample.
● The underlying event rate is how often a particular event is observed when a random experiment is performed. This strongly affects the number of items to be included in the sample.
Utility of Sample Size
It is generally
suggested to use large samples whenever resources allow. Large samples have several advantages:
1. Large samples usually increase precision and give more reliable results, because the more elements of the population are included, the more representative the sample becomes.
2. A large sample reduces the sampling error and, for many estimators, the bias in estimation.
3. When the sample is sufficiently large, many useful approximations become available: for example, the normal approximation can be used for non-normal populations, and the laws of large numbers can be applied.
4. Consistent estimators come close to the true parameter value when the sample is large, which makes inference more efficient.
5. Sample size is a very useful factor in constructing confidence intervals with a fixed confidence coefficient: the larger the sample, the narrower and more informative the interval. The degrees of freedom of many statistical tests also depend on the sample size; in a one-sample t-test, for example, they are obtained by subtracting 1 from the sample size. This can also be considered a utility.
Determination of Sample Size
Most of the above-stated influencing
factors play an important role in the determination of sample size, namely the power of the test, the variability of the population, the population size, and so on. Even before starting a survey, we may look into previous surveys to get an idea of a suitable sample size. Usually the distribution required for sample size determination is the distribution of the underlying population from which the sample is to be drawn or has been drawn.
For example, consider a hypothesis test regarding the mean of a univariate normal population. Let {X1, X2, ..., Xn} be a random sample taken from a normal population with unknown mean μ and known variance σ².
The null hypothesis is H0: μ = 0, tested against the simple alternative H1: μ = μ*, where μ* > 0. Suppose we wish to (1) reject H0 with probability at least 1 − β when H1 is true (i.e. achieve a power of at least 1 − β), and (2) reject H0 with probability α when H0 is true (i.e. the probability of a Type I error). If z_α is the upper α point of the standard normal distribution and x̄ is the sample mean, the test rejects H0 when x̄ > z_α σ/√n, since
    P[x̄ > z_α σ/√n | H0 is true] = α.
The power requirement is P[x̄ > z_α σ/√n | H1 is true] = Φ(√n μ*/σ − z_α) ≥ 1 − β, and rearranging gives
    n ≥ ((z_α + Φ⁻¹(1 − β)) / (μ*/σ))²,
where Φ(·) is the cumulative distribution function of the standard normal distribution.
Formula of Sample Size
The conventional formula for calculating
sample size is:
    Sample Size = (Distribution of 50%) / (Margin of Error % / Confidence Level Score)²
Here the confidence level score is the standard normal score for the chosen confidence level (about 1.96 for 95%), and "Distribution of 50%" is usually read as p(1 − p) evaluated at p = 0.5, i.e. 0.25.
● Finite Population Correction:
    True Sample = (Sample Size × Population) / (Sample Size + Population − 1)
Conclusion
No sample is perfect, and the maximum permissible limit of error should be determined by the experimenter.
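The two formulas given above, the power-based bound on n and the margin-of-error formula with the finite population correction, can be tried out numerically. The following is a minimal Python sketch, not a definitive implementation: the function names, the default values of α and β, and the example inputs are assumptions made for illustration, and the confidence level score is assumed to be the two-sided standard normal score.

```python
import math
from statistics import NormalDist

# Standard normal distribution, used for z-scores and the inverse CDF.
STD_NORMAL = NormalDist()


def n_for_power(mu_star, sigma, alpha=0.05, beta=0.20):
    """Smallest integer n with n >= ((z_alpha + Phi^-1(1 - beta)) / (mu_star / sigma))^2.

    Hypothetical helper for the one-sided test of H0: mu = 0 against
    H1: mu = mu_star > 0 with known standard deviation sigma.
    """
    if mu_star <= 0 or sigma <= 0:
        raise ValueError("mu_star and sigma must be positive")
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha)  # upper alpha point
    z_power = STD_NORMAL.inv_cdf(1 - beta)   # Phi^-1(1 - beta)
    return math.ceil(((z_alpha + z_power) / (mu_star / sigma)) ** 2)


def n_for_margin(margin, confidence=0.95, p=0.5):
    """Sample size = p(1 - p) / (margin / z)^2, before any rounding.

    Assumes `margin` is a proportion (0.05 for 5%) and that the
    confidence level score z is the two-sided standard normal score.
    """
    z = STD_NORMAL.inv_cdf(1 - (1 - confidence) / 2)
    return p * (1 - p) / (margin / z) ** 2


def finite_population_correction(n, population):
    """True sample = (n * N) / (n + N - 1) for a population of size N."""
    return n * population / (n + population - 1)


if __name__ == "__main__":
    # Illustrative inputs only: effect mu* = 0.5, sigma = 1, alpha = 5%, power = 80%.
    print(n_for_power(mu_star=0.5, sigma=1.0))

    # 5% margin of error at 95% confidence, then corrected for N = 1000.
    n0 = n_for_margin(margin=0.05, confidence=0.95)
    print(n0, finite_population_correction(n0, population=1000))
```

With these illustrative inputs the power-based bound gives n = 25, and the margin-of-error formula gives roughly 384.1 before correction and roughly 277.7 after correcting for a population of 1000; in practice both would be rounded up. For a very large population the correction barely changes the result, which matches the behaviour of the correction formula as the population grows.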