The goal is to get ~1 mm accuracy.
Wavelength is the speed of sound divided by the frequency. The speed of sound in air is approximately 340 m/s, so at 40 kHz the wavelength is 340 / 40 000 = 8.5 mm.
So what, you may ask? Any standing-wave pattern you get repeats on the scale of the wavelength (its nodes are half a wavelength, about 4.25 mm, apart), and this could ruin your expected accuracy of 1 mm.
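The arithmetic above can be sketched in a few lines of Python. This assumes c = 340 m/s (as in the text) and a 40 kHz carrier. The helper names are hypothetical, and the sketch also prints how much the round-trip echo delay changes per millimetre of range.

```python
# Back-of-envelope numbers for a 40 kHz ultrasonic ranger in air.
# Assumption: speed of sound c = 340 m/s (roughly room temperature).
C_SOUND = 340.0   # m/s
F_CARRIER = 40e3  # Hz

def wavelength(c, f):
    """Wavelength in metres: lambda = c / f."""
    return c / f

def round_trip_time(distance, c):
    """Echo delay in seconds for a target at `distance` metres (out and back)."""
    return 2.0 * distance / c

lam = wavelength(C_SOUND, F_CARRIER)
print(f"wavelength         = {lam * 1e3:.2f} mm")       # 8.50 mm
print(f"node spacing lam/2 = {lam / 2 * 1e3:.2f} mm")   # 4.25 mm
print(f"carrier period     = {1e6 / F_CARRIER:.1f} us")  # 25.0 us
print(f"delay per 1 mm     = {round_trip_time(1e-3, C_SOUND) * 1e6:.2f} us")  # 5.88 us
```

A 1 mm change in range moves the echo by only about 5.9 µs, less than a quarter of a 25 µs carrier cycle, so an error of a fraction of a cycle in locating the echo already exceeds the 1 mm budget.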
You may then point out that you will drive a pulse into the transducer. The 40 kHz resonators I've come across are very "resonant" (high Q): they keep ringing after the drive stops, so generating a short pulse may not be that easy.
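A rough way to quantify the ringing, assuming the transducer behaves like a simple second-order resonator: after the drive stops, the ringing envelope decays as exp(-π·f0·t/Q), i.e. with time constant Q/(π·f0). The Q of 30 below is a hypothetical value, not taken from any datasheet; substitute your own transducer's figure.

```python
import math

def ringdown_time_constant(q, f0):
    """Time constant (s) of the free-ringing envelope exp(-pi * f0 * t / q)."""
    return q / (math.pi * f0)

F0 = 40e3  # Hz, transducer resonance from the text
Q = 30.0   # hypothetical quality factor, not a datasheet value

tau = ringdown_time_constant(Q, F0)
print(f"envelope time constant: {tau * 1e6:.0f} us ({tau * F0:.1f} carrier cycles)")
# Rough number of carrier cycles before the ringing falls to 1% of its start.
print(f"cycles to 1%: {math.log(100) * tau * F0:.0f}")
```

With these assumed numbers the transducer rings for tens of carrier cycles, far longer than the roughly 6 µs change in round-trip delay that a 1 mm range change produces (2 × 1 mm / 340 m/s).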
I'm saying these things because I think you need to take them into account.
A narrower beam seems logical to me; otherwise several reflections from different objects could come back and obscure your desired distance measurement. Also remember that narrow-beam devices can still produce, and be susceptible to, side-lobe interference.
As for your other questions, I think you need to decide what you want to transmit before you think about signal processing.
I'm not a radar expert by any means, but I think I understand the general concepts well enough to try to answer your questions.
What specific requirements on the peak and average powers and the widths of radar pulses was chirped-radar designed to overcome? Were these purely 'internal' concerns regarding the electronics, or were there external goals and restrictions that were hard to meet otherwise?
The basic problem in radar is to get both enough pulse energy for the desired maximum range and good timing resolution for range resolution. It is hard to build high-power amplifiers for microwave frequencies. You want a lot of energy in each transmitted pulse, but you also want to keep the pulse short. The solution, as you have found in optics, is to stretch the pulse by chirping it, which lets the power amplifier operate at a lower peak power for a longer time while delivering the same pulse energy.
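To make the trade-off concrete, here is a back-of-envelope sketch. The numbers (1 MW for 1 µs versus 10 kW for 100 µs, 1 MHz chirp bandwidth) are hypothetical, and the resolution formulas are the usual approximations: c·T/2 for an uncompressed pulse of length T, and c/(2B) after compressing a pulse of bandwidth B.

```python
# Pulse energy vs. range resolution; all numbers are hypothetical.
C_LIGHT = 3e8  # m/s (approximate)

def pulse_energy(peak_power_w, duration_s):
    """Energy (J) of a rectangular pulse: E = P * T."""
    return peak_power_w * duration_s

def resolution_uncompressed(duration_s):
    """Range resolution (m) of a plain pulse: c * T / 2."""
    return C_LIGHT * duration_s / 2

def resolution_compressed(bandwidth_hz):
    """Approximate range resolution (m) after pulse compression: c / (2 * B)."""
    return C_LIGHT / (2 * bandwidth_hz)

short_e = pulse_energy(1e6, 1e-6)    # short pulse: 1 MW for 1 us
long_e = pulse_energy(1e4, 100e-6)   # chirp: 10 kW for 100 us, 1 MHz sweep

print(short_e, long_e)                  # both about 1 J
print(resolution_uncompressed(1e-6))    # about 150 m
print(resolution_uncompressed(100e-6))  # about 15 km if never compressed
print(resolution_compressed(1e6))       # about 150 m after compression
```

Both pulses carry the same energy, but the long one needs only a hundredth of the peak power; compression then recovers the short pulse's range resolution because both occupy roughly the same 1 MHz bandwidth.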
Now, in radar, there is no need to compress the pulse again before feeding it to the antenna: the chirped pulse works just as well as the compressed pulse for detecting objects.
In fact, you gain additional advantages when the reflections come back. You can amplify the chirped signal in the receiver, which brings some of the same peak-to-average power advantages as in the transmitter amplifier. You can then use a "matched filter" to compress the pulse just before detection, which also rejects many potential interference sources. The narrow pulses coming out of the receiver filter give you the time resolution you need.
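A minimal sketch of the receive-side compression, assuming a sampled complex-baseband linear-FM chirp (the length, sweep, and helper names are illustrative choices). Correlating the received chirp with a copy of the transmitted one, which is what the matched filter computes, collapses a 128-sample pulse into a peak only a sample or two wide.

```python
import cmath
import math

def linear_chirp(n, bandwidth):
    """Complex baseband linear-FM chirp of n samples.

    Instantaneous frequency sweeps linearly from -bandwidth/2 to +bandwidth/2
    (in cycles per sample) across the pulse.
    """
    rate = bandwidth / n  # frequency sweep per sample
    return [cmath.exp(1j * math.pi * rate * (m - n / 2) ** 2) for m in range(n)]

def correlate(x, y, lag):
    """C[lag] = sum_m conj(x[m]) * y[m + lag]: the matched-filter output at one lag."""
    return sum(x[m].conjugate() * y[m + lag]
               for m in range(len(x)) if 0 <= m + lag < len(y))

N = 128
pulse = linear_chirp(N, 0.5)  # time-bandwidth product 128 * 0.5 = 64
lags = list(range(-(N - 1), N))
mag = [abs(correlate(pulse, pulse, k)) for k in lags]

peak = max(mag)
peak_lag = lags[mag.index(peak)]
width = sum(1 for v in mag if v >= peak / math.sqrt(2))  # samples within 3 dB of peak
print(f"{N}-sample pulse -> peak {peak:.1f} at lag {peak_lag}, {width} sample(s) within 3 dB")
```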
Is the name 'chirped pulse amplification' ever used in a radar context?
Generally not, because amplification isn't the only reason that chirping is used.
Is the optics-style CPA (stretch, amplify, compress, and then use the pulse) used at all in radar applications, or in broader electronics fields?
Not to my knowledge, but it would certainly be feasible.
Let's start by getting out of the way the basic expressions and ideas for convolution and correlation.
Convolution
For an input signal \$x(t)\$ going through a system \$h(t)\$, the output \$y(t)\$ is given by
$$y(t) = x(t) * h(t) = \int_{-\infty}^{\infty}x(t - \tau)h(\tau)d\tau = \int_{-\infty}^{\infty}x(\tau)h(t-\tau)d\tau$$
The engineering convention is usually the right-most form. Of course, the two forms are equivalent because convolution is commutative. However, in my opinion the first integral allows a more intuitive explanation when considering signals passing through a system:
Using the linear time-invariant (LTI) concept, it says that for every time shift \$\tau\$, the shifted copy of the input signal \$x(t-\tau)\$ is weighted by the value of the impulse response \$h(\tau)\$, and the integral accumulates these weighted copies. It's important to see that the output of the convolution operation is a function of \$t\$. The variable \$\tau\$ is just a dummy variable of integration and has no meaning outside the integral.
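The two integral forms can be compared with a minimal discrete sketch (function names are illustrative). One builds the output as shifted copies of \$x\$ weighted by samples of \$h\$, matching the intuition above; the other evaluates the right-most form directly.

```python
def convolve_shifted_inputs(x, h):
    """y[n] = sum_k h[k] * x[n - k]: shifted copies of x, each weighted
    by one sample of the impulse response (first integral form)."""
    y = [0] * (len(x) + len(h) - 1)
    for k, weight in enumerate(h):
        for m, sample in enumerate(x):
            y[m + k] += weight * sample  # copy of x delayed by k, scaled by h[k]
    return y

def convolve_direct(x, h):
    """y[n] = sum_k x[k] * h[n - k] (right-most, 'engineering' form)."""
    n_out = len(x) + len(h) - 1
    return [sum(x[k] * h[n - k] for k in range(len(x)) if 0 <= n - k < len(h))
            for n in range(n_out)]

x = [1, 2, 3]
h = [1, 0.5]
print(convolve_shifted_inputs(x, h))  # [1, 2.5, 4.0, 1.5]
print(convolve_direct(x, h))          # [1, 2.5, 4.0, 1.5]
```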
Cross-Correlation
When doing correlation, we want to answer the question "how alike are two signals, \$x(t)\$ and \$h(t)\$, if I shift one of them by some delay \$\tau\$ for all time delays of interest?". This gives us a function of \$\tau\$ given by
$$C(\tau) = \int_{-\infty}^{\infty}x(t)^*h(t + \tau)dt $$
Note that the variable of integration is now \$t\$, whereas for convolution it was \$\tau\$. Here \$t\$ is the dummy variable, because the cross-correlation depends only on the relative time delay \$\tau\$. Nevertheless, the two expressions are extremely similar.
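A minimal discrete sketch of \$C(\tau)\$, assuming finite sampled signals that are zero outside their stored samples. Correlating a short signal against a copy of itself delayed by three samples puts the largest correlation magnitude at lag 3.

```python
def cross_correlation(x, h, lag):
    """C[lag] = sum_t conj(x[t]) * h[t + lag] (discrete version of C(tau))."""
    return sum(x[t].conjugate() * h[t + lag]
               for t in range(len(x)) if 0 <= t + lag < len(h))

x = [1.0, -1.0, 2.0, 0.5]
delay = 3
h = [0.0] * delay + x + [0.0] * 2  # x delayed by 3 samples

lags = range(-(len(x) - 1), len(h))
scores = {lag: cross_correlation(x, h, lag) for lag in lags}
best = max(scores, key=lambda lag: abs(scores[lag]))
print(best)  # 3
```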
If we cross-correlate a signal with itself, the equation becomes
$$R(\tau) = \int_{-\infty}^{\infty}x(t)^*x(t + \tau)dt$$
This gives us the definition of the autocorrelation \$R(\tau)\$ of \$x(t)\$.
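Two properties of \$R(\tau)\$ are easy to check numerically: \$R(0)\$ is the signal energy and no lag exceeds it in magnitude, and \$R(-\tau) = R(\tau)^*\$. The sketch below uses an arbitrary complex test sequence.

```python
def autocorrelation(x, lag):
    """R[lag] = sum_t conj(x[t]) * x[t + lag] (samples outside 0..N-1 are zero)."""
    return sum(x[t].conjugate() * x[t + lag]
               for t in range(len(x)) if 0 <= t + lag < len(x))

x = [1 + 1j, 2 - 1j, -0.5j, 3]
energy = sum(abs(v) ** 2 for v in x)

for lag in range(-(len(x) - 1), len(x)):
    r = autocorrelation(x, lag)
    # R[0] is the signal energy and no other lag exceeds it in magnitude.
    assert abs(r) <= abs(autocorrelation(x, 0)) + 1e-12
    # Hermitian symmetry: R[-lag] = conj(R[lag]).
    assert abs(autocorrelation(x, -lag) - r.conjugate()) < 1e-12

print(autocorrelation(x, 0), energy)
```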
Matched Filter Theory
Matched filter theory shows that, for a known signal \$x(t)\$ in additive white noise, the filter \$h(t)\$ that maximizes the output signal-to-noise ratio (SNR) at some time \$t_0\$ is given by
$$h(t) = x(-t + t_0)^*$$
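A discrete sketch of this result, assuming a finite sampled signal and \$t_0\$ set to the last sample index (the smallest choice that keeps the filter causal). Convolving the signal with its matched filter gives a peak at \$n = t_0\$ whose value is the signal energy.

```python
def matched_filter(x, t0):
    """Taps h[n] = conj(x[t0 - n]) for n = 0..t0: a sampled x(-t + t0)^*.

    Expects t0 = len(x) - 1 so that every sample of x appears exactly once.
    """
    return [x[t0 - n].conjugate() for n in range(t0 + 1)]

def convolve(x, h):
    """Plain discrete convolution y[n] = sum_k x[k] * h[n - k]."""
    return [sum(x[k] * h[n - k] for k in range(len(x)) if 0 <= n - k < len(h))
            for n in range(len(x) + len(h) - 1)]

x = [1j, 2, -1 + 1j, 0.5]
t0 = len(x) - 1
y = convolve(x, matched_filter(x, t0))

peak_index = max(range(len(y)), key=lambda n: abs(y[n]))
print(peak_index, y[peak_index])  # peak at n = t0, value = signal energy
```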
We see that the matched filter is the time-reversed complex conjugate of the input signal, shifted by \$t_0\$, and it achieves the maximum SNR at \$t = t_0\$. In radar applications we're looking for the time delay of the target, so we don't know a priori when the echo will arrive. Because the filter is time-invariant, this is not a problem: an echo delayed by \$d\$ produces the same peak, just shifted to \$t = t_0 + d\$. A single matched filter therefore covers every target delay, and \$t_0\$ only sets where the peak lands relative to the echo.
A convenient choice is \$t_0 = 0\$, so that the peak appears at the echo delay itself. The price is that this filter is non-causal; a real-time implementation adds a fixed delay (for a pulse of finite length \$T\$, \$t_0 = T\$ is enough), which only shifts the output. The new matched filter is then
$$h(t) = x(-t)^*$$
If we use this new \$h(t)\$ in the definition of the convolution integral we get
$$y(t)= x(t) * h(t) = \int_{-\infty}^{\infty}x(\tau)x(\tau - t)^*d\tau$$
Substituting \$\tau \to \tau + t\$ turns this into \$\int_{-\infty}^{\infty}x(\tau)^*x(\tau + t)d\tau\$, which is exactly the autocorrelation \$R(\tau)\$ evaluated at \$\tau = t\$. The matched filter output is the autocorrelation of the signal.
You can now see that computationally the convolution and autocorrelation functions are the same. The difference is the choice for \$h(t)\$, which is now the time-reversed complex conjugate of the signal you wish to receive.
Thinking graphically: the filter \$h(t)\$ is already a time-reversed copy of the signal, so the flip that convolution performs restores its original orientation, and you are actually doing correlation.
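The equivalence can be checked numerically. In this sketch (hypothetical helper names; finite sequences assumed zero outside their samples) the \$t_0 = 0\$ filter \$h[n] = x[-n]^*\$ is evaluated at negative indices, and convolving with it reproduces the autocorrelation lag by lag.

```python
def autocorrelation(x, lag):
    """R[lag] = sum_m conj(x[m]) * x[m + lag] (samples outside 0..N-1 are zero)."""
    n = len(x)
    return sum(x[m].conjugate() * x[m + lag] for m in range(n) if 0 <= m + lag < n)

def matched_filter_output(x, k):
    """y[k] = sum_m x[m] * h[k - m] with the t0 = 0 matched filter h[i] = conj(x[-i])."""
    n = len(x)

    def h(i):
        # Non-causal filter: nonzero only for i = -(n-1) .. 0.
        return x[-i].conjugate() if -(n - 1) <= i <= 0 else 0

    return sum(x[m] * h(k - m) for m in range(n))

x = [1 + 2j, -1j, 3, 0.5 - 0.5j]
for k in range(-(len(x) - 1), len(x)):
    y = matched_filter_output(x, k)
    r = autocorrelation(x, k)
    print(k, abs(y - r) < 1e-12)
```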