The aim of statistical mechanics is the derivation of the laws of classical thermodynamics for
macroscopic systems from the properties of their atomic constituents.
In addition to classical thermodynamics, the statistical approach provides information on the nature of statistical
errors and fluctuations of thermodynamic parameters.
Central is the distinction between a macro state, which is characterized e.g. by $T,p,V,N,...$, and its many underlying micro states.

Question: What weight does a micro state have within a macro state?
Principle: Maximize the "degree of uncertainty"
within the restrictions of the macro state.
"Degree of uncertainty" = "thermodynamic entropy"
Event $i$ with probability ${p}_{i}$,
$$0\le {p}_{i}\le 1\quad ,$$  (2.1) 
and
$$\sum _{i}{p}_{i}=1\quad .$$  (2.2) 
The degree of uncertainty is defined via the information content of a statement.
For a function $I$ to serve as the
information $I\left({p}_{i}\right)$ of a statement
with probability ${p}_{i}$
we require several properties: the information of a certain statement vanishes, $I\left(1\right)=0$; the information is non-negative and grows with decreasing probability; and the information of independent statements is additive, $I\left({p}_{i}{p}_{j}\right)=I\left({p}_{i}\right)+I\left({p}_{j}\right)$.
These three properties are fulfilled by the function
$$I\left({p}_{i}\right)=-k\,\ln\left({p}_{i}\right)\quad ,$$  (2.3) 
with $k$ an (arbitrary) unit of measurement for information.
Our information function indeed fulfills these properties.
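The properties can be checked numerically; the following is a minimal Python sketch, with $k=1$ chosen as the unit of information:

```python
import math

def info(p, k=1.0):
    """Information content I(p) = -k ln(p) of a statement with probability p."""
    return -k * math.log(p)

# additivity for independent statements: I(p*q) = I(p) + I(q)
p, q = 0.5, 0.25
assert abs(info(p * q) - (info(p) + info(q))) < 1e-12

# a certain statement (p = 1) carries no information: I(1) = 0
assert info(1.0) == 0.0

# the smaller the probability, the larger the information
assert info(0.01) > info(0.5)
```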
To calculate the average information we must weight the information of each statement with its probability
of occurrence.
Thus we get
$${S}^{\prime}=\sum _{i}{p}_{i}I\left({p}_{i}\right)=-k\sum _{i}{p}_{i}\ln\left({p}_{i}\right)\quad .$$  (2.4) 
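As a numerical illustration of Eq. (2.4), here is a small Python sketch (again with $k=1$; the probabilities are an arbitrary example distribution):

```python
import math

def avg_info(probs, k=1.0):
    """Average information S' = -k * sum_i p_i ln(p_i); a term with p_i = 0 contributes 0."""
    return -k * sum(p * math.log(p) for p in probs if p > 0)

probs = [0.5, 0.25, 0.25]
assert abs(sum(probs) - 1.0) < 1e-12  # normalization, Eq. (2.2)

# S' = 0.5 ln 2 + 2 * 0.25 ln 4 = 1.5 ln 2
assert abs(avg_info(probs) - 1.5 * math.log(2)) < 1e-12
```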
For the equilibrium of a physical system the degree of uncertainty
${S}^{\prime}$ must
be maximized:
The mathematical effort is to find
$$S=\max {S}^{\prime}$$  (2.5) 
within the restrictions of the macro state.
The entropy $S$
is therefore just the maximum average information.
Justification of this principle:
The results describe all experiments: Macro states are dominated by micro states with large
probabilities.
As we will learn in the next sections, for classical particles in an isolated system the maximum of
${S}^{\prime}$ (cf. Eq. (2.4)) is found
if all states $i$ are occupied
with the same probability ${p}_{i}=p$,
i.e.
$$1=\sum _{i=1}^{W}{p}_{i}=Wp\quad\text{i.e.}\quad p=\frac{1}{W}\quad .$$  (2.6) 
Inserting this result into Eq. (2.4) we find the famous equation
$$S=\max {S}^{\prime}=-k\sum _{i=1}^{W}p\,\ln\left(p\right)=-k\,\ln\left(\frac{1}{W}\right)=k\,\ln\left(W\right)\quad .$$  (2.7) 
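That the equal occupation of Eq. (2.6) indeed maximizes ${S}^{\prime}$ can be checked numerically; a minimal Python sketch with $k=1$ and a small, arbitrarily chosen $W$:

```python
import math

def avg_info(probs, k=1.0):
    # S' = -k sum_i p_i ln(p_i), cf. Eq. (2.4); zero probabilities contribute 0
    return -k * sum(p * math.log(p) for p in probs if p > 0)

W = 4
uniform = [1.0 / W] * W
# equal occupation p_i = 1/W gives S = k ln(W), cf. Eq. (2.7)
assert abs(avg_info(uniform) - math.log(W)) < 1e-12

# a non-uniform distribution over the same W states yields a smaller S'
skewed = [0.7, 0.1, 0.1, 0.1]
assert avg_info(skewed) < avg_info(uniform)
```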
The relation between "average information" and "degree of uncertainty" may be somewhat counterintuitive, so we will discuss it with an example:
Let us assume a set of classical particles. All particles shall occupy state 1, i.e. ${p}_{1}=1$ and ${p}_{i}=0$ for $i>1$.
Since
$$\lim_{x\to 0}\left[x\,\ln\left(x\right)\right]=0\quad\text{and}\quad 1\cdot\ln\left(1\right)=0$$  (2.8) 
we find for the average information ${S}^{\prime}=0$. We know "everything" about the occupation of the states $i$; therefore the degree of uncertainty is 0. Maximizing the information about one state therefore minimizes the average information of the ensemble. Since ${p}_{i}$ and $I\left({p}_{i}\right)$ are non-negative numbers, ${S}^{\prime}=0$ is in fact the global minimum of ${S}^{\prime}$.
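This limiting case is easy to reproduce numerically; a short Python sketch (with $k=1$, skipping the vanishing $x\ln(x)$ terms as justified by Eq. (2.8)):

```python
import math

def avg_info(probs, k=1.0):
    # the x ln(x) term vanishes for x -> 0, cf. Eq. (2.8), so zero entries are skipped
    return -k * sum(p * math.log(p) for p in probs if p > 0)

certain = [1.0, 0.0, 0.0, 0.0]   # all particles occupy state 1
assert avg_info(certain) == 0.0  # complete knowledge -> zero degree of uncertainty
```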
A second possible misunderstanding concerns the relation between $W$ and the particle number $N$: $W$ is incomparably larger than $N$. Let us discuss this for a most simple example with only two possible states, e.g. the left half and the right half of a box. Each particle therefore has two possibilities for occupying states, leading to $W={2}^{N}$ possible arrangements, i.e. micro states. Inserting this into Eq. (2.7) we get
$$S=k\,\ln\left({2}^{N}\right)=kN\,\ln\left(2\right)\quad .$$  (2.9) 
Eq. (2.9) demonstrates that the entropy is, of course, an extensive
parameter; it scales with the size of the system. For a thermodynamic system
$N$ is already a large
number, but $W$
is much larger.
For typical thermodynamic systems each particle can occupy many different states, so the
factor of 2 in the above example must typically be replaced by numbers on the order of
$1{0}^{20}$, and
thus $W\approx 1{0}^{20N}$,
which is indeed a huge number.
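The extensivity of Eq. (2.9) can be verified directly; a minimal Python sketch with $k=1$ (the values of $N$ are arbitrary examples, kept small enough that $2^{N}$ stays within floating-point range):

```python
import math

k = 1.0  # arbitrary unit of information
for N in (10, 100, 1000):
    W = 2 ** N               # two states per particle -> 2^N micro states
    S = k * math.log(W)      # Eq. (2.7), evaluated with the exact integer W
    # extensivity, Eq. (2.9): S = k N ln(2) scales linearly with N
    assert abs(S - k * N * math.log(2)) < 1e-9
```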
© J. Carstensen (Stat. Meth.)