From a theoretical point of view, the maximum capacity of a channel affected by AWGN (Additive White Gaussian Noise) is given by the Shannon–Hartley theorem:
$$ C = B\log_2\left(1+\frac{S}{N_0 B}\right) $$
where \$S\$ is the signal power and \$N_0\$ is the noise power spectral density. This means you can't push more than \$C\$ bits per second through a channel of bandwidth \$B = f_{MAX}-f_{MIN}\$ without making the communication unreliable.
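To get a feel for the numbers, here is a minimal sketch that evaluates the capacity in its standard form, C = B·log2(1 + S/(N0·B)) in bit/s. The function name and the sample values (1 MHz band, linear SNR of 15) are made up for illustration.

```python
import math

def shannon_capacity(signal_power_w, noise_density_w_per_hz, bandwidth_hz):
    """Shannon-Hartley capacity in bit/s for an AWGN channel."""
    # Noise power inside the band is N0 * B, so SNR = S / (N0 * B).
    snr = signal_power_w / (noise_density_w_per_hz * bandwidth_hz)
    return bandwidth_hz * math.log2(1.0 + snr)

# Assumed example values, chosen so that the linear SNR is 15.
bandwidth = 1e6                  # 1 MHz
n0 = 1e-12                       # W/Hz
signal = 15 * n0 * bandwidth     # W

capacity = shannon_capacity(signal, n0, bandwidth)
print(f"capacity: {capacity / 1e6:.2f} Mbit/s")  # about 4 Mbit/s, since log2(16) = 4
```

Doubling the signal power here adds less than 1 bit/s/Hz, which is why capacity grows only slowly with power but linearly with bandwidth at a fixed SNR.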
Then come the modulations: every modulation has a particular spectral efficiency and bit error probability. The more levels you use (QPSK vs. 16-QAM, for example), the more bits each symbol carries (= more efficiency), but also the more symbol errors you get at the same signal-to-noise ratio (with a Gray code the bit error rate behaves similarly, since a symbol error usually corrupts a single bit).
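The trade-off can be illustrated with the usual textbook approximation for the bit error rate of square M-QAM with Gray mapping over AWGN. This is a sketch under those assumptions (nearest-neighbour errors only, so it is an approximation for 16-QAM and above); the function names are mine.

```python
import math

def q_function(x):
    """Tail probability of the standard normal distribution."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def gray_qam_ber(m, ebn0_db):
    """Approximate BER of square M-QAM with Gray mapping over AWGN.

    m must be a square constellation size (4, 16, 64, ...).
    For m = 4 this reduces to the QPSK result Q(sqrt(2 Eb/N0)).
    """
    bits_per_symbol = math.log2(m)
    ebn0 = 10.0 ** (ebn0_db / 10.0)
    arg = math.sqrt(3.0 * bits_per_symbol / (m - 1) * ebn0)
    return (4.0 / bits_per_symbol) * (1.0 - 1.0 / math.sqrt(m)) * q_function(arg)

# Same Eb/N0 for every constellation: more bits per symbol, more errors.
for m in (4, 16, 64):
    print(f"{m:>2}-QAM: {int(math.log2(m))} bit/symbol, BER ~ {gray_qam_ber(m, 10.0):.2e}")
```

At the same Eb/N0 the error rate rises by orders of magnitude as the constellation grows, which is the price paid for the extra bits per symbol.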
The spectrum is directly related to the pulse shape used by the modulation. A very common one is the raised-cosine pulse (because it introduces no inter-symbol interference at the sampling instants), which reduces the efficiency by a factor \$ (1+ \alpha) \$, where \$\alpha\$ is the roll-off factor.
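As a rough sketch of what that factor means in practice, assume an M-ary linear modulation on a carrier with raised-cosine shaping, so the occupied bandwidth is the symbol rate times (1 + alpha). The function names and the example figures are illustrative only.

```python
import math

def spectral_efficiency(m, alpha):
    """Bit/s/Hz for M-ary passband modulation with raised-cosine roll-off alpha."""
    return math.log2(m) / (1.0 + alpha)

def occupied_bandwidth_hz(bit_rate, m, alpha):
    """Bandwidth needed to carry bit_rate (bit/s) under the same assumptions."""
    return bit_rate / spectral_efficiency(m, alpha)

# Assumed example: 10 Mbit/s with a roll-off of 0.25.
for name, m in (("QPSK", 4), ("16-QAM", 16)):
    eff = spectral_efficiency(m, 0.25)
    bw = occupied_bandwidth_hz(10e6, m, 0.25)
    print(f"{name}: {eff:.2f} bit/s/Hz, {bw / 1e6:.3f} MHz")
```

With these numbers QPSK gives 1.6 bit/s/Hz and 16-QAM gives 3.2 bit/s/Hz, so the same 10 Mbit/s stream needs 6.25 MHz or 3.125 MHz respectively.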
Then come the codes, which can give a huge gain, especially with concatenated codes such as Reed-Solomon plus a Viterbi-decoded convolutional code, or with Turbo or LDPC codes.
Every effort is made to approach the Shannon capacity limit.
Bandwidth is, as you say, the difference between the upper and lower frequencies of a spectrum, usually measured at the 3 dB points, where the curve on the graph is 3 dB below its maximum value.
In digital communications, whilst the data is digital (comprising 1's and 0's), it is often not transmitted directly; instead, a carrier signal is modulated in response to the 1's and 0's. The signal actually transmitted is not digital but analogue, with its phase, amplitude or frequency modulated according to the value of each data bit.
Since an analogue signal is being transmitted, the bandwidth concept applies: take the analogue time-domain waveform and represent it in the frequency domain as a spectrum. The question then becomes "how much bandwidth does the digital communications channel occupy?"
Take the radio spectrum as an example. In the FM broadcast band, from 88 to 108 MHz, multiple radio stations each operate on a different carrier frequency, a technique known as frequency-division multiplexing. If you transmit a modulated digital data stream, that stream will occupy a band from left to right on the frequency spectrum, with a lower and an upper frequency. You can transmit multiple digital channels of data, each centred on a different frequency, but you don't want the bands to overlap, as that results in interference and corruption of the data when both channels transmit at the same time. So when designing and implementing digital communication systems, you often need to know the bandwidth of each channel to prevent their spectra from overlapping.
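The overlap check described above can be sketched in a few lines. The channel names, centre frequencies and bandwidths below are invented for the example, and real band planning would also leave guard bands between channels.

```python
def band_edges(center_hz, bandwidth_hz):
    """Return (lower, upper) frequency of a channel."""
    half = bandwidth_hz / 2.0
    return center_hz - half, center_hz + half

def find_overlaps(channels):
    """Return pairs of channel names whose bands overlap.

    channels: list of (name, center_hz, bandwidth_hz) tuples.
    """
    # Sort by lower band edge so each channel is only compared with later ones.
    ordered = sorted(channels, key=lambda c: band_edges(c[1], c[2])[0])
    overlaps = []
    for i, a in enumerate(ordered):
        _, a_high = band_edges(a[1], a[2])
        for b in ordered[i + 1:]:
            b_low, _ = band_edges(b[1], b[2])
            if b_low >= a_high:
                break  # every later channel starts even higher
            overlaps.append((a[0], b[0]))
    return overlaps

# Hypothetical plan: three 200 kHz channels, two of them too close together.
plan = [
    ("A", 100.00e6, 200e3),
    ("B", 100.15e6, 200e3),
    ("C", 100.50e6, 200e3),
]
print(find_overlaps(plan))
```

Here channel A spans 99.9 to 100.1 MHz and B starts at 100.05 MHz, so the pair is reported, while C sits clear of both.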
Generally speaking, the higher the data rate, the wider that band of frequency space the data communications will use. Higher speed data = greater bandwidth when modulated on to the transmission medium. The two are closely linked and evidently some vendors state bandwidth when it's the data rate or speed they're actually quoting.
There is no simple relationship between the data rate, expressed in bits, kilobits or megabits per second, and the amount of spectrum used, expressed in hertz; it very much depends on the modulation scheme used.
For the transmission of data between components in a computer system, most of the time there is no modulation of a signal by the data (Ethernet, Bluetooth and Wi-Fi are exceptions). The transfer between a graphics card and a motherboard, for example, is a simple digital transmission: binary data (1's and 0's) is sent down tracks on a circuit board (a data bus). No carrier is modulated as in radio communication, because it isn't needed, so "bandwidth" isn't strictly the right word to use; "throughput" is.
The words are often used interchangeably these days.
Best Answer
The usefulness of the definitions (there are many, but let's not open a new can of worms) of bandwidth can be understood once you learn how a signal traveling through some medium or system is altered in the process.
You'll learn that not only does a signal have a bandwidth, but a system or a medium also has its own bandwidth (or pass-band). The bandwidth of a system (or medium) is (roughly) the range of frequencies that the system lets pass without modification. Frequencies outside that bandwidth are altered in some way.
In particular, you'll learn that linear time-invariant (LTI) systems can be characterized in the frequency domain by a complex function called the frequency response H(f). This function is important because it tells you how a sinusoidal signal is altered by the system, since an LTI system can modify both the amplitude and the phase of a sinusoidal signal.
Since the system is linear and any signal can be represented by a superposition of a (possibly infinite) number of sinusoidal signals, knowing H(f) lets you compute exactly how any signal is modified by the system.
How is this babble about H(f) related to the bandwidth of the system? The modulus of H(f), i.e. |H(f)|, gives you the information needed to determine the bandwidth of the system, hence the range of frequencies that can pass through the system unaltered.
So, a system (or medium) with a 2 Hz bandwidth can carry any signal with a smaller bandwidth, provided the signal lies entirely in the system's pass-band. If the signal's band and the system's pass-band don't overlap, the signal won't pass; if they overlap only partially, it will be heavily distorted.
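To make the link between |H(f)| and bandwidth concrete, here is a small sketch using a first-order low-pass, H(f) = 1/(1 + j·f/fc), as the system. That particular H(f) is my choice of example, not something from the question, and the search assumes |H(f)| falls monotonically with frequency.

```python
import math

def lowpass_response(f_hz, cutoff_hz):
    """Frequency response of a first-order low-pass: H(f) = 1 / (1 + j f/fc)."""
    return 1.0 / complex(1.0, f_hz / cutoff_hz)

def magnitude_db(h):
    """|H| expressed in decibels."""
    return 20.0 * math.log10(abs(h))

def bandwidth_3db(response, f_max_hz, tol_hz=1e-3):
    """Frequency where |H| has dropped 3 dB below its value at f = 0.

    Uses bisection, so it assumes |H| decreases monotonically on
    [0, f_max_hz] and is already below the target at f_max_hz.
    """
    target = magnitude_db(response(0.0)) - 3.0
    low, high = 0.0, f_max_hz
    while high - low > tol_hz:
        mid = (low + high) / 2.0
        if magnitude_db(response(mid)) > target:
            low = mid   # still inside the pass-band
        else:
            high = mid  # already attenuated by 3 dB or more
    return (low + high) / 2.0

# Assumed example: cutoff parameter fc = 1 kHz.
system = lambda f: lowpass_response(f, 1000.0)
print(f"3 dB bandwidth: {bandwidth_3db(system, 1e5):.1f} Hz")
print(f"|H| at 100 Hz: {magnitude_db(system(100.0)):.3f} dB")
print(f"|H| at 10 kHz: {magnitude_db(system(10e3)):.3f} dB")
```

A 100 Hz sinusoid passes almost untouched, while a 10 kHz one comes out roughly 20 dB smaller: the first lies inside the pass-band, the second well outside it.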
Moreover, it can be shown that a signal's bandwidth is related, in the time domain, to how fast the signal varies. In other words, a signal with a 10 Hz bandwidth will vary much more slowly than a signal with a 5 kHz bandwidth.
Disclaimer: the question you asked is really broad, so I had to simplify many of the subjects I touched upon and I cut some corners. Don't expect extreme rigor in the above, since it would take a text at least ten times longer to put all these things in a more formal framework. You can find more detail in this Wikipedia article on bandwidth.
Moreover, keep in mind that although the concept is the same, it is explained in slightly different ways depending on the specific branch of EE you are studying. Signal theory, control theory, analog filter design and network engineering, for example, are four fields where the concept is employed but which sometimes use slightly different approaches.