Oversampling means sampling at a rate significantly higher than the Nyquist rate.
An ADC generates quantisation noise because the continuous-valued signal has to be translated to discrete output values. If you oversample, this noise power is "spread out" over a larger frequency range, i.e. it has a lower spectral density. So if you apply a digital low-pass filter after the ADC, you can reduce the total noise. Halving the bandwidth reduces the noise by 3 dB, which is equivalent to a 1/2-bit improvement in your ADC. So oversampling by 16x (four octaves) and filtering with a perfect brick-wall LPF would give you an improvement of 4 * 1/2 bit = 2 bits.
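As a rough check of that arithmetic, here is a minimal Python sketch. It assumes white quantisation noise and an ideal brick-wall filter, and the function name `enob_gain_bits` is only illustrative.

```python
import math

def enob_gain_bits(oversampling_ratio):
    """Extra effective bits from plain oversampling plus ideal low-pass filtering.

    Assumes the quantisation noise is white, so each halving of the
    bandwidth removes 3 dB of noise, which is worth half a bit.
    """
    octaves = math.log2(oversampling_ratio)
    return 0.5 * octaves

for ratio in (2, 4, 16, 256):
    print(f"{ratio}x oversampling: about {enob_gain_bits(ratio):.1f} extra bits")
```

A real filter is not a brick wall and real quantisation noise is not perfectly white, so treat these numbers as an upper bound.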
Intuitively, you can see why this works: say the ADC is oversampled by 4, so for one output sample you get the readings 3, 4, 3, 3. The average of these is 3.25, so you have improved the effective number of bits (ENOB) of your ADC reading.
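The same averaging can be sketched in a few lines of Python. The helper `decimate_by_averaging` is a hypothetical name, and a plain block average is only a crude low-pass filter, not the brick-wall filter assumed earlier.

```python
from statistics import mean

# Four consecutive readings taken where one Nyquist-rate sample would be.
readings = [3, 4, 3, 3]

def decimate_by_averaging(samples, ratio):
    """Average each complete group of `ratio` samples into one output value."""
    last_start = len(samples) - ratio
    return [mean(samples[i:i + ratio]) for i in range(0, last_start + 1, ratio)]

averaged = decimate_by_averaging(readings, 4)
print(averaged)
```

The output has a fractional part that no single reading could express, which is where the extra resolution comes from.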
Delta-sigma ADCs shape the quantisation noise, pushing more of it out to higher frequencies, so they can gain 2 or even 3 bits per octave of oversampling, depending on the order of the modulator. This diagram (from EETimes) illustrates the point:
On your point (2) you refer to "multiple cycle sampling" as "means to sample many many cycles (AC sampling)".
Your description is a little confusing, but there are techniques that rely on sampling a repetitive signal over multiple cycles to "fill in" samples that fall in between the normal sample instants. Digital sampling oscilloscopes use this technique. Basically, you sample your signal starting from time 0 and then sample again starting from time T/N (either on stored data or on the next cycle of the input signal), where T is the sample period and N is the oversampling ratio. You then "fill in" the new data between the existing samples, and repeat for each remaining offset up to (N-1)T/N.
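A minimal sketch of that interleaving in Python follows. The 1 kHz sine input, the function names, and the idea that every pass starts at the same point of the repetitive signal (as a scope trigger would ensure) are all assumptions made for illustration.

```python
import math

def signal(t):
    """Hypothetical repetitive input: a 1 kHz sine wave."""
    return math.sin(2 * math.pi * 1000.0 * t)

def equivalent_time_capture(sample_period, n_passes, samples_per_pass):
    """Interleave n_passes captures, each delayed by sample_period / n_passes."""
    fine_step = sample_period / n_passes
    record = [0.0] * (n_passes * samples_per_pass)
    for p in range(n_passes):                # pass p starts at time p * T/N
        for k in range(samples_per_pass):
            t = k * sample_period + p * fine_step
            record[k * n_passes + p] = signal(t)  # slot into the finer time grid
    return fine_step, record

fine_step, record = equivalent_time_capture(
    sample_period=1e-4, n_passes=4, samples_per_pass=10
)
print(f"effective sample period: {fine_step * 1e6:.1f} us, {len(record)} points")
```

The ADC still only runs at 1/T, but the assembled record looks as if it had been sampled N times faster. This only works because the signal repeats.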
EDIT: Based on OP clarification:
"If we want to measure 50 Hz AC signal, we set ADC's sample rate to 1000 Hz, and sample 10 cycles, that is (1000 Hz / 50 Hz) x 10 = 200 samples. "
By sampling the same points in every period, you will get some noise reduction once you average the values as described in my answer, but it will not match the theoretical reduction because the quantisation noise will be correlated with the sampling. Also, you're missing a trick if you do not recognise the point I made in (2). By choosing the sample frequency to be relatively prime with respect to the signal frequency, you would not be sampling the "same points" each period. This gives you more data. If you then choose to average this, you get less noise because the quantisation noise will be de-correlated.
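To make the "same points" argument concrete, this small Python sketch counts how many distinct positions within one signal cycle actually get sampled. The 1001 Hz alternative is just an illustrative choice that shares no common factor with 50 Hz, and `distinct_phases` is a hypothetical helper.

```python
from fractions import Fraction

def distinct_phases(signal_hz, sample_hz, n_samples):
    """Count the distinct positions within one signal cycle that get sampled."""
    cycles_per_sample = Fraction(signal_hz, sample_hz)
    # Exact rational arithmetic avoids float rounding when comparing phases.
    phases = {(n * cycles_per_sample) % 1 for n in range(n_samples)}
    return len(phases)

print(distinct_phases(50, 1000, 200))  # the setup from the question
print(distinct_phases(50, 1001, 200))  # sample rate relatively prime to 50 Hz
```

With 1000 Hz the 200 samples land on only 20 distinct points of the waveform, each one hit 10 times. With 1001 Hz all 200 samples land on different points.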
If you look at Table 2, you'll see that the MD0 and MD1 pins control how the master utilizes the clock input (either 256, 384, or 512 * fs). Once you select one of these options, you can use your crystal frequency to tell you the sampling frequency.
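As a quick illustration of that division, here is a short Python sketch. The 16.384 MHz crystal is an assumed value, and the mapping from MD0/MD1 pin states to each multiplier is left to Table 2 of the datasheet.

```python
# Clock multipliers named in the text: the master clock is 256, 384 or 512 times fs.
MULTIPLIERS = (256, 384, 512)

def sampling_rate_hz(master_clock_hz, multiplier):
    """Sampling frequency fs for a given master clock and selected multiplier."""
    if multiplier not in MULTIPLIERS:
        raise ValueError(f"unsupported multiplier: {multiplier}")
    return master_clock_hz / multiplier

crystal_hz = 16_384_000  # assumption: a 16.384 MHz crystal
for m in MULTIPLIERS:
    print(f"{m} fs -> fs = {sampling_rate_hz(crystal_hz, m) / 1000:.1f} kHz")
```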
Once you know the sampling frequency, you can use the resolution to work out the BCK rate. This particular part has 24-bit resolution. Each "frame" of audio consists of two channels, left and right, and each channel carries a 24-bit value, so each frame is 48 bits of data. If, for example, you are using a 64 kHz sampling rate (by selecting 256fs using the MD0 and MD1 pins), then the minimum BCK rate is:
2 channels * 24 bits/channel * 64,000 frames/sec = 3,072,000 bits/sec
Therefore, BCK must run at 3.072 MHz or faster. Check the datasheet for the actual figure, since in master mode many audio ADCs output a fixed BCK of 64 fs and pad each channel slot to 32 bits. The L/R clock runs at the sampling frequency: it changes state each time the channel switches, which is every 24 bits here, so one full L/R cycle spans both channels and is 48 times slower than the minimum BCK.
The sample rate of most ADCs is completely independent of its bit resolution.
The number of bits returned by an ADC is a function of its hardware.
Basically, a 10 bit ADC is a 10 bit ADC. If you're not using some of the bits, it's still a 10 bit ADC.
As such, the maximum achievable sample rate with the ADC in the ATmega88 at the full 10-bit resolution is 15 kSPS. However, you can run the ADC faster, at the expense of increased noise.
From the ATmega88 docs:
So basically, as the speed at which you clock the ADC increases, the ADC noise increases. As such, if you run the ADC faster than 15 kSPS, you still get 10-bit conversion results, but the least significant bits are not valid, as they are swamped by noise and/or biases in the ADC.
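To see where the 15 kSPS figure comes from, here is a rough Python sketch of the ADC timing. It relies on my reading of the datasheet (13 ADC clock cycles per normal conversion, and an ADC clock of 50 to 200 kHz for full resolution), so check those numbers against your copy. The 8 MHz CPU clock is an assumption.

```python
ADC_CLOCKS_PER_CONVERSION = 13          # normal conversion, per my reading of the datasheet
PRESCALERS = (2, 4, 8, 16, 32, 64, 128) # ADC prescaler division factors
FULL_RES_CLOCK_HZ = (50_000, 200_000)   # ADC clock range quoted for full 10-bit resolution

def adc_sample_rate_hz(cpu_hz, prescaler):
    """Free-running sample rate for a given CPU clock and ADC prescaler."""
    if prescaler not in PRESCALERS:
        raise ValueError(f"unsupported prescaler: {prescaler}")
    return cpu_hz / prescaler / ADC_CLOCKS_PER_CONVERSION

cpu_hz = 8_000_000  # assumption: 8 MHz system clock
for p in PRESCALERS:
    adc_clock = cpu_hz / p
    full_res = FULL_RES_CLOCK_HZ[0] <= adc_clock <= FULL_RES_CLOCK_HZ[1]
    note = "full resolution" if full_res else "outside the full-resolution range"
    print(f"/{p:<3} ADC clock {adc_clock / 1000:7.1f} kHz, "
          f"{adc_sample_rate_hz(cpu_hz, p) / 1000:6.2f} kSPS ({note})")
```

The 200 kHz upper limit divided by 13 cycles gives roughly 15.4 kSPS, which matches the quoted maximum at full resolution. Smaller prescalers give faster conversions with fewer trustworthy bits.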