A linear circuit obeys the principles of proportionality and superposition.
A linear circuit is an ideal approximation of a real-world physical circuit. All physical circuits have some non-linearity. Linear analysis is useful when the real-world non-linearity is small enough that the answers still meet your required accuracy.
Ideal resistors, capacitors, and inductors, both energized and not, are linear. Ideal components are those that conform to their simple mathematical models exactly, and in this case that includes superposition.
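As a quick illustration of proportionality and superposition, here is a minimal Python sketch of a purely resistive network. The topology (two sources feeding one node through `r1` and `r2`, with `r3` to ground) and all component values are assumptions chosen for the example, not taken from any particular circuit.

```python
def node_voltage(v1, v2, r1=1e3, r2=2e3, r3=3e3):
    """Voltage at a node fed by v1 through r1 and v2 through r2, with r3 to ground.

    Hypothetical example network; follows from KCL at the single node.
    """
    return (v1 / r1 + v2 / r2) / (1 / r1 + 1 / r2 + 1 / r3)


both = node_voltage(5.0, 3.0)       # both sources active
only_v1 = node_voltage(5.0, 0.0)    # second source replaced by a short
only_v2 = node_voltage(0.0, 3.0)    # first source replaced by a short

# Superposition: the response to both sources equals the sum of the
# responses to each source acting alone.
print(both, only_v1 + only_v2)

# Proportionality: doubling every source doubles the response.
print(node_voltage(10.0, 6.0), 2 * both)
```

Each pair of printed values should agree to within floating-point rounding, which is what you would expect from ideal resistors.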
When signals are large, circuits tend to show more non-linearity. Linear models can be used for transistors and diodes over a small range with reasonable accuracy. For example, a signal of 1 mV is usually well within the linear range of a transistor. Linear models that take advantage of this are called 'small-signal' models.
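To get a feel for how small "small" has to be, the sketch below linearizes an ideal exponential diode around a bias point and compares the linear prediction with the exact change in current. The Shockley model, the saturation current `I_S`, the value of `N_VT`, and the 0.6 V bias are all assumptions picked for illustration; a real transistor junction behaves similarly but not identically.

```python
import math

I_S = 1e-14      # saturation current in amps (assumed value)
N_VT = 0.02585   # ideality factor times thermal voltage in volts (assumed, n = 1)


def diode_current(v):
    """Ideal exponential (Shockley) diode model."""
    return I_S * (math.exp(v / N_VT) - 1.0)


def small_signal_error(v_bias, dv):
    """Relative error of the linearized current change versus the exact change."""
    g = I_S * math.exp(v_bias / N_VT) / N_VT   # slope dI/dV at the bias point
    exact = diode_current(v_bias + dv) - diode_current(v_bias)
    linear = g * dv
    return abs(linear - exact) / abs(exact)


for dv in (1e-3, 10e-3, 100e-3):
    print(f"{dv * 1e3:6.1f} mV signal: {small_signal_error(0.6, dv):.1%} error")
```

With these assumed values the linear model is off by a few percent at 1 mV and becomes badly wrong at 100 mV, which is the trade-off the small-signal approach accepts.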
To analyze large signals, non-linear transistor and diode models are necessary. For these models, superposition does not apply to the transistor or diode. Linearity still applies to the surrounding linear elements such as resistors.
Resistors are often linear over a huge range of operation. Capacitors can also be linear over a wide range, if they are made from good materials. Air-core inductors have a large linear range, while iron-core inductors have a limited range.
One thing that can be a bit confusing is that ideal circuit elements do not exist in the real world. There are always some deviations from the ideal linear equations. These deviations are required by the physics that governs their operation. If you could isolate and connect ideal components, it would be possible to build devices that would not obey all the laws of physics. Therefore these ideal components do not exist in isolation.
You are forgetting that you can overdrive a base so that it goes into saturation. This is desirable to get the transistor fully turned on.
Your calculation for \$ I_7 \$ is correct:
$$ I_7 = \frac {V}{R_3} = \frac {5 - 0.6}{25k} = 0.176~mA $$
You then calculate the maximum collector current using \$ I_8 = \beta I_7 \$
Let's say \$ \beta \$ = 100, then \$ I_8 = 0.176 \times 100 = 17.6~mA \$.
Now look at R1. The maximum current through it is given by
$$ I_1 = \frac {V}{R_1} = \frac {5 - 0.6}{1k} = 4.4~mA $$
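Clearly the maximum possible value for \$ I_8 \$ is this 4.4 mA plus the 0.176 mA from \$ I_7 \$, about 4.6 mA. That is far below the 17.6 mA that \$ \beta \$ alone would allow, so the transistor is saturated.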
The calculations indicate that we could reduce the base current into Q1. In practice we want to ensure that the drive is adequate to cover variations in \$ \beta \$ due to production spread or even transistor substitution. Drive it hard and make sure it's fully 'on' is the normal approach.
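The same check can be written as a few lines of Python. This is only a sketch using the numbers from the calculations above (5 V supply, 0.6 V drop, 25k base resistor, 1k for R1, \$ \beta \$ = 100); the variable names and the "overdrive" and "minimum \$ \beta \$" figures are my own additions for illustration.

```python
V_SUPPLY = 5.0   # volts
V_DROP = 0.6     # volts, the drop used in both calculations above
R3 = 25e3        # ohms, base resistor
R1 = 1e3         # ohms
BETA = 100       # assumed current gain, as in the answer

i_base = (V_SUPPLY - V_DROP) / R3      # I7
i_beta_limit = BETA * i_base           # current beta alone would allow
i_r1_limit = (V_SUPPLY - V_DROP) / R1  # current R1 actually permits

saturated = i_beta_limit > i_r1_limit
overdrive = i_beta_limit / i_r1_limit  # how much harder we drive than needed
min_beta = i_r1_limit / i_base         # lowest beta that still saturates

print(f"I7 = {i_base * 1e3:.3f} mA")
print(f"beta * I7 = {i_beta_limit * 1e3:.1f} mA, R1 limit = {i_r1_limit * 1e3:.1f} mA")
print(f"saturated: {saturated}, overdrive factor: {overdrive:.1f}, minimum beta: {min_beta:.0f}")
```

The margin between the assumed \$ \beta \$ and the minimum \$ \beta \$ is what protects you against production spread or a substituted transistor.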
Best Answer
I've been pouring my brains out and eventually I've found a nice mathematical approach to prove this and decided to answer my own question.

In such a circuit, solving for any voltage/current across/through any component (I'll call that \$f\$) would always lead you to construct a differential equation that is always linear, with constant coefficients (due to linear properties of passive components) and non-homogeneous (due to the sinusoidal input). Such a differential equation will always take this form:

$$a\frac{d^nf}{dt^n}+b\frac{d^{n-1}f}{dt^{n-1}}+\dots+j\frac{df}{dt}+kf=C\sin{(\omega t+\theta)}$$

where \$a \dots k\$ are constants (combinations of inductance, resistance, etc.), \$n\$ is the order of the differential equation (which reflects the number of energy storage elements in the circuit), and \$C\sin{(\omega t+\theta)}\$ is a generalized sinusoidal function that describes the input.

A general solution to this differential equation will always take this form:

$$f=\text{(general homogeneous solution)}+\text{(particular solution)}$$

where the particular solution \$=A\sin{(\omega t+\theta)}+B\cos{(\omega t+\theta)}\$, which is a sinusoidal function of the same frequency!

Now, in AC circuit analysis, we are always looking at the circuit in steady state, when the homogeneous solution approaches zero (which inevitably happens because of resistances in the circuit).
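For anyone who wants to see this numerically, here is a rough Python sketch for the simplest case, a first-order equation \$\tau \frac{df}{dt} + f = C\sin(\omega t)\$ (think of a series RC circuit with \$\tau = RC\$ and \$\theta = 0\$). The values of `TAU`, `OMEGA`, `AMP` and the initial condition are assumptions for the demo, and the hand-rolled RK4 integrator is just one convenient way to solve it. The coefficients \$A = C/(1+\omega^2\tau^2)\$ and \$B = -C\omega\tau/(1+\omega^2\tau^2)\$ come from substituting the particular solution into this first-order equation.

```python
import math

TAU = 1e-3                     # time constant R*C in seconds (assumed)
OMEGA = 2 * math.pi * 500.0    # drive frequency in rad/s (assumed)
AMP = 1.0                      # drive amplitude C (assumed)


def dfdt(t, f):
    """Right-hand side of TAU * f' + f = AMP * sin(OMEGA * t)."""
    return (AMP * math.sin(OMEGA * t) - f) / TAU


def rk4(f0, t_end, steps):
    """Integrate from t = 0 to t_end with the classic fourth-order Runge-Kutta method."""
    h = t_end / steps
    t, f = 0.0, f0
    for _ in range(steps):
        k1 = dfdt(t, f)
        k2 = dfdt(t + h / 2, f + h * k1 / 2)
        k3 = dfdt(t + h / 2, f + h * k2 / 2)
        k4 = dfdt(t + h, f + h * k3)
        f += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return f


def particular(t):
    """Particular solution A*sin(wt) + B*cos(wt) for the first-order case."""
    wt = OMEGA * TAU
    a = AMP / (1 + wt ** 2)
    b = -AMP * wt / (1 + wt ** 2)
    return a * math.sin(OMEGA * t) + b * math.cos(OMEGA * t)


t_end = 20 * TAU                      # long enough for the transient to die out
numeric = rk4(0.5, t_end, 20000)      # arbitrary non-zero initial condition
print(numeric, particular(t_end))
```

After 20 time constants the homogeneous part has decayed by a factor of about \$e^{-20}\$, so the two printed values should agree very closely regardless of the initial condition you pick.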