It’s well known that lower bound for sorting problem (in general case) is \(\Omega(n \log n)\). The proof I was taught is somewhat involved and is based on paths in “decision” trees. Recently I’ve discovered an information-theoretic approach (or reformulation) to that proof.

First, let’s state the problem: given a set of some objects with an ordering produce elements of that set in that order. For now it’s completely irrelevant what are these objects, so we can assume them to be just numbers from 1 to n, or some permutation. Thus we’ll be interested in sorting permutations.

We’re given an ordering via a comparison function. It tells us if one object preceds (or is smaller) another outputing True or False. Thus each invocation of the comparator gives us 1 bit of information.

Next question is how many bits we need to represent any permutation. It’s just a binary logarithm of number of all possible permutations of \(n\) elements: \(\log_2 n!\). Then we notice that

\[ \log_2 n! = \sum_{k=1}^n \log_2 k \ge \sum_{k=n/2}^{n} \log_2 k \ge \frac{n}{2} \log_2 \frac{n}{2} \]

\[ \log_2 n! = \sum_{k=1}^n \log_2 k \le n \log_2 n \]

(Or just use the Stirling’s approximation formula). Hence \(\log_2 n! = \Theta(n \log n)\)

So what, you may ask. The key point of proof is that sorting is essentially a search for a correct permutation of the input one. Since one needs \(\log_2 n!\) bits to represent any permutation, we **need to get that many bits** of information somehow. Now let’s get back to our comparison function. As we’ve figured out already it’s able to give us only one bit of information per invocation. That implies that we need to call it \(\log n! = \Theta(n \log n)\) times. And that’s exactly the lower-bound for sorting complexity. Q.E.D.

Non-\(n \log n\) sorting algorithms like RadixSort are possible because they use much more bits of information, taking advantage of numbers’ structure.