Note that there are two units involved here:
- Energy (measured in Watt-hours) - there is only a certain amount of energy stored in the batteries - and
- Power (consumption) (measured in Watts) - the rate at which energy is 'used up', i.e. energy per time-interval
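The distinction can be made concrete with a toy calculation. All numbers below are made up for illustration, not measurements of any real battery or board:

```python
# Energy (Wh) is a stored quantity; power (W) is a rate of consumption.
battery_energy_wh = 10.0   # assumed: energy stored in the battery (Watt-hours)
power_draw_w = 2.0         # assumed: average power consumption (Watts)

runtime_h = battery_energy_wh / power_draw_w  # hours until the energy is used up
energy_j = battery_energy_wh * 3600           # the same energy in Joules (1 Wh = 3600 J)
print(runtime_h, energy_j)
```

Halving the power draw doubles the runtime, because the stored energy is fixed.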
The total power requirement of a CPU is basically influenced by the following parts:
- Supply voltage. The higher the supply voltage, the larger the currents flowing through the transistors inside the chip.
- 'Operational' clock frequency of the CPU and/or its sub-modules. What I mean by this is that, naturally, the clock frequency will only affect consumption if and when the clock is actually running and the respective sub-module of the CPU is clocked and running too.
Basically, each 'switch' (transistor) inside a CPU will consume a certain, constant quantum of energy each time it switches (for a given supply voltage).
The higher the clock frequency is, the more "switching" can be done per time-unit. More "switching" also means that more calculations can be done per time-unit. Therefore, performing a given calculation will always consume the same amount of energy, irrespective of the clock's frequency: Double frequency = double power consumption for half the time.
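This "double frequency = double power for half the time" claim can be checked with a toy model. Every constant below is an illustrative assumption, not a figure for a real CPU:

```python
import math

ENERGY_PER_SWITCH_J = 1e-12   # assumed: 1 pJ per transistor switching event
SWITCHES_PER_CYCLE = 1e3      # assumed: switching events per clock cycle
TASK_CYCLES = 1e9             # clock cycles the given calculation needs

def run_task(freq_hz):
    """Return (time, power, total energy) for the task at a given clock rate."""
    time_s = TASK_CYCLES / freq_hz
    power_w = ENERGY_PER_SWITCH_J * SWITCHES_PER_CYCLE * freq_hz
    return time_s, power_w, power_w * time_s

t1, p1, e1 = run_task(700e6)    # base clock
t2, p2, e2 = run_task(1400e6)   # doubled clock

# Doubling the frequency doubles the power and halves the time,
# so the total energy for the task stays the same.
print(p2 / p1, t1 / t2, math.isclose(e1, e2))
```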
So the question actually is: What determines the power consumption when no calculation is going on at a given moment?
Most CPUs have power-saving features which allow them to stop operation (of sub-modules) completely for some time. The CPU core itself, which performs instruction fetching, decoding, etc., may be halted completely when there's nothing to do, basically reducing the consumption of this part by some orders of magnitude.
Theoretically, the consumption of a CPU can be reduced to almost zero during times when it just sits there waiting for "something" to do. This is primarily the case when the CPU has to wait for data from other components, like the SD card, which cannot deliver the data as fast as the CPU could process them.
This is where the software, both the operating system and the applications, comes into play.
The OS should (and usually does) "pause" the CPU when there is currently no task ready to do some calculation. How consistently this is done is subject to some inevitable compromises required when building an OS.
Besides, the applications (tasks) need to be 'cooperative' w.r.t. the OS to allow it to activate power-save mode. "Busy waiting" in a loop in an application, while doing nothing useful, will use significant amounts of power (and energy), and the OS has no means to detect that it could just as well pause the CPU.
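The difference between the two waiting styles can even be seen from user space. A minimal sketch (the 0.2 s wait duration is arbitrary): a busy-wait loop stays runnable and burns CPU time for its whole duration, while a blocking sleep lets the scheduler deschedule the task and idle the core.

```python
import time

def busy_wait(seconds):
    # Spin loop: the task stays runnable, so the OS cannot pause the CPU.
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        pass

def cooperative_wait(seconds):
    # Blocking sleep: the task is descheduled; the core may enter power-save.
    time.sleep(seconds)

start = time.process_time()
busy_wait(0.2)
busy_cpu_s = time.process_time() - start

start = time.process_time()
cooperative_wait(0.2)
sleep_cpu_s = time.process_time() - start

# The busy wait accounts for roughly the full 0.2 s of CPU time,
# the cooperative wait for almost none.
print(busy_cpu_s, sleep_cpu_s)
```

Both calls wait equally long in wall-clock terms; only the CPU time (and hence the energy) differs.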
That said, it should be clear that the clock-frequency limits one may configure are never guaranteed to have any effect one way or the other, just as a cache memory is not guaranteed to speed things up. Yet, in practice, it usually does.
Therefore, in theory, the total energy consumption (for a given task) is unaffected by the frequency. The peak power requirement, however, is linearly dependent on the frequency. (Remember that each "switching" takes the same amount of energy.)
In practice, due in part to compromises in the software, your mileage may vary. Especially so when dynamic, demand-based clock scaling is performed, as is usually done by the OS on the "RPi".
So, in theory the relation between power demand and frequency is purely linear over all frequencies. In practice, one can reasonably expect to save energy with lower clock rates, but it's not guaranteed, and in a complex system of hardware and software it will likely not even be close to linear.
Best Answer
I can help to address the first part of your question. As an abstraction of a single switching device inside a processor, imagine a MOSFET connected to ground with a load resistor to the power supply (in a real CMOS part there would be another transistor rather than a load resistor, but this distinction is not important for the analysis). Connected at the junction of the resistor and the transistor is a capacitor, representing all the input capacitances of the transistors that the transistor under discussion is driving. When the first transistor switches off, this capacitance is charged up through the load resistor. When the first transistor switches back on, the charge stored on the capacitor is discharged through the first transistor.
It can be shown that when a capacitor is charged through a load resistor, half of the energy used in charging the capacitor is lost in the resistor, for a total energy dissipation of \$\frac{1}{2}C{V_s}^2\$, where \$V_s\$ is the supply voltage. When the switch then turns on, assuming the resistance of the switch is much less than the load resistance, that same energy will be dissipated in the switch, for a total energy of \$C{V_s}^2\$. Dividing this by the switching period gives you the dynamic power dissipation of the switch/capacitance combo, \$C{V_s}^2f\$. Shrinking the die reduces the junction capacitances of the MOSFETs, so if you know the supply voltage, switching frequency, number of transistors and the approximate junction capacitances of a certain process, you could calculate a ballpark figure of what kind of power savings a process shrink entails, all other things being equal.
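Plugging some illustrative numbers into \$P = C{V_s}^2f\$ makes the scaling visible. None of the values below come from a real datasheet; they are round numbers chosen to give a plausible order of magnitude:

```python
def dynamic_power_w(c_farads, vs_volts, f_hz):
    # P = C * Vs^2 * f : energy C*Vs^2 dissipated per cycle, f cycles per second
    return c_farads * vs_volts**2 * f_hz

C = 1e-9  # assumed: total switched capacitance per cycle (1 nF, illustrative)

p_full = dynamic_power_w(C, 1.2, 700e6)     # assumed 1.2 V supply at 700 MHz
p_scaled = dynamic_power_w(C, 0.9, 350e6)   # lower voltage and halved clock

print(p_full, p_scaled)
```

Because the supply voltage enters squared, lowering \$V_s\$ along with \$f\$ (as dynamic voltage/frequency scaling does) saves considerably more power than lowering the frequency alone.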