Because when analyzing performance you are usually interested in the worst case, knowing the upper bound is sufficient.
When an algorithm runs faster than expected for a given input, that is fine, but it is not the critical point; it is mostly negligible information.
Some algorithms, as @Peter Taylor noted, don't have a tight bound at all. Take quicksort, for example: it is O(n^2) in the worst case but Omega(n log n) in the best case, so no single Theta bound covers all inputs.
Moreover, tight bounds are often more difficult to compute.
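To make the quicksort point concrete, here is a minimal sketch (the function name and structure are my own, not from any source): a naive first-element-pivot quicksort that counts comparisons. On an already-sorted input it does about n^2/2 comparisons, while on a shuffled input it does roughly n log n, so no single tight bound describes both.

```python
import random

def quicksort_count(a):
    # Naive quicksort with a first-element pivot (illustrative only):
    # returns the sorted list and a rough count of partitioning comparisons.
    if len(a) <= 1:
        return a, 0
    pivot, rest = a[0], a[1:]
    less = [x for x in rest if x < pivot]
    more = [x for x in rest if x >= pivot]
    sorted_less, c1 = quicksort_count(less)
    sorted_more, c2 = quicksort_count(more)
    # len(rest) elements were compared against the pivot at this level
    return sorted_less + [pivot] + sorted_more, len(rest) + c1 + c2

n = 500
print(quicksort_count(list(range(n)))[1])              # sorted input: n(n-1)/2 = 124750
print(quicksort_count(random.sample(range(n), n))[1])  # shuffled input: roughly n log n
```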
Your quote is saying that if the range of the integers, k, is linearly proportional to the total number of elements in the collection, n, then a counting sort has both a best-case and a worst-case time complexity of O(n), making it a very efficient sort in that case.
Big Oh (and the others in the Bachmann-Landau family of notations) are just simplifications of the precise complexity of a function, used to illustrate how the function's value increases as n grows large. In programming, Big Oh is usually used in the context of time complexity: for n values, a function f(n) will execute in a time on the order of g(n), and thus we say it has a Big Oh complexity of O(g(n)). However, the mathematical construct of Big Oh isn't so limited. In this particular case, it is being used to describe the general relationship between k and n as numbers.
Simply put, when k (the range of the values of an n-element collection) increases on the order of O(n) as n does, counting sort will be efficient. This holds if, say, the list contains all multiples of 4 between 1 and x (in which case k ~= x and n = x/4, meaning k ~= 4n = O(n)). The condition k = O(n) would be false for, say, the set of all perfect squares from 1 to x^2 (in which case k = x^2 but n = x, so k = n^2 = O(n^2)). There, k grows quadratically (O(n^2)) while n grows only linearly (O(n)), so the counting sort would run in the more complex time, O(n^2), and perform worse than another sort implementation that runs in Theta(n log n).
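As a rough illustration of the above, here is a minimal counting sort sketch (the function name and details are assumptions of mine, not from the quoted text); the comments mark where the n and k terms of the O(n + k) running time come from.

```python
def counting_sort(values):
    # Counting sort for non-negative integers: runs in O(n + k) time,
    # where n = len(values) and k = max(values) + 1 is the key range.
    if not values:
        return []
    k = max(values) + 1
    counts = [0] * k                         # O(k) time and space
    for v in values:                         # O(n): tally each value
        counts[v] += 1
    out = []
    for value, count in enumerate(counts):   # O(k): emit values in order
        out.extend([value] * count)
    return out

# k = O(n): multiples of 4 up to x, so k ~= 4n and the sort is O(n)
print(counting_sort([8, 4, 16, 12]))   # [4, 8, 12, 16]
# k = O(n^2): perfect squares up to x^2, so the sort degrades to O(n^2)
print(counting_sort([16, 1, 9, 4]))    # [1, 4, 9, 16]
```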
Best Answer
Those notations are meant to denote asymptotic growth. Constants do not grow, so it doesn't matter which constant you choose; by convention, you choose 1 to indicate no growth.
I assume this is because you want to simplify the mathematical terms in question: when you have a constant factor, just divide it out, and all that's left of it is 1. This makes comparisons easier.
Example:
O(34 * n^2) = O(1 * n^2) = O(n^2)
and
O(2567.2343 * n^2 / 5) = O(n^2)
See what I mean? As these mathematical terms get more and more complicated, you don't want noisy constants when they're not relevant to the information you're interested in. Why write O(2342.4534675767) when O(1) expresses the same fact more simply and unambiguously?
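As a small sanity check (an assumed example of mine, not from the quote), the snippet below shows why the constant factor is irrelevant: doubling n scales both functions by the same factor of 4, so the leading constant never changes the growth class.

```python
# Hypothetical functions matching the examples above.
def f(n):
    return 34 * n**2             # "O(34 * n^2)"

def g(n):
    return 2567.2343 * n**2 / 5  # "O(2567.2343 * n^2 / 5)"

for n in (100, 200, 400, 800):
    # Both ratios print 4.0: quadratic growth, regardless of the constant.
    print(n, f(2 * n) / f(n), g(2 * n) / g(n))
```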
Further, the Wikipedia article about time complexity also implies it's a convention: