I know exactly where you're coming from. The mistake you're making is assuming that the integral term drops to zero in steady state. It does not, and what actually happens depends heavily on implementation details.
First off, understand that the integral term in a mathematical PID is the integral from the start of time (or, well, of the system), not "error over the last few cycles". Your implementation of PID or PI should not let older contributions to the integral term drop in relative weight. Let me explain. When writing the I term's code, the first instinct is to assume that the term will diverge, exceeding the variable size and overflowing, and people attempt to fix this using moving averages, decaying the weight of older values, and all sorts of strange gimmicks. This should not happen in a properly implemented PI or PID system. Instead, you should simply calculate I as I = I + Ki*Error.
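As a minimal sketch of that accumulation rule (the names here are my own, not from any particular library), the I state is just a running sum that is never decayed or averaged:

```python
# Minimal PI update sketch. The integral term simply accumulates Ki*error
# every cycle -- no moving averages, no decay of older contributions.

def make_pi(kp, ki):
    state = {"i": 0.0}

    def update(error):
        # I = I + Ki*Error: the integral keeps its history forever
        state["i"] += ki * error
        return kp * error + state["i"]

    return update

pi = make_pi(kp=2.0, ki=0.1)
out = pi(1.0)  # P contributes 2.0, I contributes 0.1
```

Note that once the error goes to zero, `update` keeps returning whatever I has accumulated, which is exactly the steady-state baseline discussed below.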
The baseline level required to maintain the system, which you mention in your question, must be provided by the I term. Since you do not know how much this is a priori, you must allow the controller to discover this value for itself. That, in fact, is the job of the I term. The Ki value should be small enough for the controller to converge before I overflows. Some thought about how this works on paper will help. Try to visualize the process, not specific boundary conditions. One thing to keep in mind is that the I term is not constructed from the absolute value of the error; it includes both positive and negative values of error.
Further, imagine the condition where the controller is just reaching the steady state. You will realize that I is not necessarily zero at this point. Indeed, I is actually the baseline control force you mention in the question. If the state actually remains stable, and if error from here on in is continuously zero (or zero averaged over time), the value of I will remain as it is.
Now, when it comes to real implementations, the problem you will face is that even with a small Ki, by the time your system reaches the set point, I may well have saturated. The system will then have to err in the opposite direction for a long time to rid itself of the I term it accumulated while it was still reaching the set point. In fact, I've noticed that PI and PID work best for a single set point, and degrade when you have to keep changing that point by a large value. A big contributor to this is the fact that I has high inertia. Tuning Ki can keep the controller functional, but when the system itself responds to stimulus slowly (say you're heating a block of metal), tuning is often difficult. Instead, what can help is to activate I only when the system is within a certain threshold of the set point. When you change the set point by something greater than this threshold, clear I and disable it (use only P/PD control) until you get close to the new set point. By doing this, you add another tunable parameter (the threshold), but it makes setting both Ki and the threshold easier than setting Ki by itself to be optimal for both situations.
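One way the threshold idea could look in code (this is my own sketch, with hypothetical names, not a standard API): integrate only when the error is inside the threshold, and clear I whenever it is outside.

```python
# Threshold-gated integral sketch: I accumulates only near the set point;
# far from it, I is cleared and the controller is effectively P-only.

class GatedPI:
    def __init__(self, kp, ki, i_threshold):
        self.kp = kp
        self.ki = ki
        self.i_threshold = i_threshold
        self.i = 0.0

    def update(self, error):
        if abs(error) < self.i_threshold:
            # Close to the set point: let I discover the baseline effort
            self.i += self.ki * error
        else:
            # Large error (e.g. after a big set point change): clear I
            self.i = 0.0
        return self.kp * error + self.i
```

A real implementation would also want output clamping (anti-windup), but the gating alone already prevents I from accumulating during the long approach to a new set point.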
Implementing a current loop should not require field oriented control. You should be able to have a PI controller driven by a current feedback (e.g. see here: http://irtfweb.ifa.hawaii.edu/~tcs3/tcs3/0405_Servo_review/onaka_docs/2000_articles_control.pdf and here: http://www.drivetechinc.com/articles/curbldc3.pdf ). I would think you need two independent loops/feedback for a three phase system. What speed are you running your loop at? What are the stability issues you're seeing? You should tune your current loop with all the other loops disabled.
Since you mention "average" current and measuring on one phase, I'm wondering what exactly you're doing. Closed-loop current control traditionally controls the "instantaneous" current and is the highest-frequency/bandwidth, innermost loop in a closed-loop motion control system.
The reason is simply that it is hard to find a process that can benefit from the properties of a D-only controller. Let's review those properties.
The derivative control function can be written as U(t) = Kd × de(t)/dt, which means the control output is proportional to the rate of change of the error.
If there is no change in a process (i.e. error does not change) then the control function is zero (i.e. "do nothing"). This could be useful, however this also means that D-controller alone cannot bring the system to its target (setpoint).
If the error changes, the control function acts to oppose the change. Furthermore, the faster we are approaching our target, the stronger the derivative action opposes the change. This "slow down" command is very useful: if your target is a wall and you are in a car, then you probably want to slow down if you want to stop in time.
So, from the above you can see that a D-controller provides a damping effect when something changes and no effect whatsoever when nothing changes. You can find applications for this, but not many. Unfortunately, the usability of derivative control is further reduced by its other properties.
A D-controller is very sensitive to noise in a system. For example, let's say your GPS sensor has 1 m resolution. In reality, this means that even if you stay still, consecutive position readings can differ by ±1 m (this would be the "noise" in the system). Now, assuming you are moving at 10 km/h and reading position every second (this would be your dt time constant), the distance to your target (the "error") changes by about 2.8 m each second. As you can see, the noise amounts to more than 35% of the de(t) part of the equation. If you attempt to use a D-controller to stabilize your movement in these conditions, it might instead cause instability.
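The arithmetic behind those numbers is straightforward; here it is spelled out (just the figures from the example above, nothing more):

```python
# Noise-to-signal arithmetic for the GPS example: 10 km/h, sampled once
# per second, with +/-1 m resolution on each position reading.

speed_m_per_s = 10 * 1000 / 3600   # 10 km/h ~= 2.78 m/s
dt = 1.0                           # sampling period, seconds
de = speed_m_per_s * dt            # true change in error per sample, ~2.78 m
noise = 1.0                        # worst-case jump from sensor resolution, m

noise_fraction = noise / de        # ~0.36, i.e. more than 35% of de(t)
```

Since the derivative term divides by dt, any scheme that samples faster (smaller dt) makes this ratio even worse: the true de shrinks while the noise stays ±1 m.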
The derivative term is also affected by setpoint changes, something called "derivative kick". If we take the previous example and move your destination by about 100 m mid-run, the consecutive calculated distances (errors) will jump in the middle, e.g. 50, 47, 45, 142, 139, 137, etc. Now, de(t) will look like this: -3, -2, 97, -3, -2. Note the huge jump in the derivative, not only exceeding the magnitude of the position change many times over, but also having the opposite sign.
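Reproducing those numbers directly shows where the kick comes from, since de(t) here is just the difference between consecutive error samples:

```python
# Derivative kick: errors from the example above, with the setpoint
# moved by ~100 m in the middle of the run.

errors = [50, 47, 45, 142, 139, 137]

# de(t) as consecutive differences of the error samples
de = [b - a for a, b in zip(errors, errors[1:])]
# de == [-3, -2, 97, -3, -2]: the 97 is the "derivative kick"
```

This is why many PID implementations take the derivative of the measured value rather than of the error, so a step in the setpoint does not pass through the D term.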
If you can find a process that can be controlled with the logic described above, then you can use a pure derivative controller. I doubt there are many industrial processes like that, however.