Let's start by getting the basic expressions and ideas for convolution and correlation out of the way.
Convolution
For an input signal \$x(t)\$ going through a system \$h(t)\$, the output \$y(t)\$ is given by
$$y(t) = x(t) * h(t) = \int_{-\infty}^{\infty}x(t - \tau)h(\tau)d\tau = \int_{-\infty}^{\infty}x(\tau)h(t-\tau)d\tau$$
The engineering convention is usually the right-most form. Of course, the two forms are equivalent because convolution is commutative. However, in my opinion the left-most form, with \$x(t-\tau)h(\tau)\$, allows a more intuitive explanation when considering signals passing through a system:
Using the linear time-invariant (LTI) concept, this says that for every time shift \$\tau\$, the shifted input signal \$x(t-\tau)\$ is weighted by the impulse response value \$h(\tau)\$, and these weighted copies are accumulated by the integral. It's important to see that the output of the convolution operation is a function of \$t\$. The variable \$\tau\$ is just a dummy variable of integration and has no real meaning.
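To make the two forms concrete, here is a minimal discrete-time sketch, assuming finite sequences that start at index 0 and replacing the integral with a sum. The function names are hypothetical. The first function builds the output from shifted copies of the input weighted by \$h\$; the second uses the "engineering convention" form. Both give the same result, and swapping the arguments shows commutativity.

```python
def convolve_shift_input(x, h):
    """y[n] = sum_k x[n - k] * h[k]: shifted copies of the input, weighted by h[k]."""
    n_out = len(x) + len(h) - 1
    y = [0] * n_out
    for n in range(n_out):
        for k in range(len(h)):
            if 0 <= n - k < len(x):
                y[n] += x[n - k] * h[k]
    return y


def convolve_shift_response(x, h):
    """y[n] = sum_k x[k] * h[n - k]: the right-most (engineering convention) form."""
    n_out = len(x) + len(h) - 1
    y = [0] * n_out
    for n in range(n_out):
        for k in range(len(x)):
            if 0 <= n - k < len(h):
                y[n] += x[k] * h[n - k]
    return y


x = [1, 2, 3]
h = [1, 0, -1]
print(convolve_shift_input(x, h))     # [1, 2, 2, -2, -3]
print(convolve_shift_response(x, h))  # same result
print(convolve_shift_input(h, x))     # same result: convolution is commutative
```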
Cross-Correlation
When doing correlation, we want to answer the question "how alike are two signals, \$x(t)\$ and \$h(t)\$, if I shift one of them by some delay \$\tau\$ for all time delays of interest?". This gives us a function of \$\tau\$ given by
$$C(\tau) = \int_{-\infty}^{\infty}x(t)^*h(t + \tau)dt $$
See how the variable of integration is now \$t\$, whereas for convolution it was \$\tau\$. Here, \$t\$ is the dummy variable: we are only interested in the cross-correlation as a function of the time delay \$\tau\$, which is a relative quantity. Nevertheless, the two expressions are extremely similar.
If we cross-correlate a signal with itself, the equation becomes
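A discrete-time sketch of \$C(\tau)\$, assuming finite sequences indexed from 0 and replacing the integral with a sum over \$n\$; `cross_correlation` is a hypothetical helper. It evaluates every lag where the sequences overlap and picks the lag where a delayed copy lines up best with the original.

```python
def cross_correlation(x, h):
    """C[k] = sum_n conj(x[n]) * h[n + k] for every lag k where the sequences overlap."""
    result = {}
    for k in range(-(len(x) - 1), len(h)):
        total = 0
        for n in range(len(x)):
            if 0 <= n + k < len(h):
                # .conjugate() works for int, float and complex values alike
                total += x[n].conjugate() * h[n + k]
        result[k] = total
    return result


x = [1, 2, 1]
h = [0, 0, 1, 2, 1]  # x delayed by 2 samples
c = cross_correlation(x, h)
best_lag = max(c, key=lambda k: abs(c[k]))
print(best_lag, c[best_lag])  # 2 6
```

The best-matching lag is the 2-sample delay, which is exactly the "how alike are the signals for each shift" question answered numerically.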
$$R(\tau) = \int_{-\infty}^{\infty}x(t)^*x(t + \tau)dt$$
This gives us the definition of the autocorrelation \$R(\tau)\$ of \$x(t)\$.
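A short discrete sketch of \$R(\tau)\$ for an arbitrary complex test sequence (assumed for illustration). It checks two standard properties: the zero-lag value equals the signal energy and is the largest magnitude, and the autocorrelation is Hermitian, \$R(-\tau) = R(\tau)^*\$.

```python
def autocorrelation(x):
    """R[k] = sum_n conj(x[n]) * x[n + k] for lags k = -(N-1) .. N-1."""
    N = len(x)
    return {k: sum(x[n].conjugate() * x[n + k] for n in range(N) if 0 <= n + k < N)
            for k in range(-(N - 1), N)}


x = [1 + 1j, 2, -1j]
R = autocorrelation(x)
energy = sum(abs(v) ** 2 for v in x)

print(R[0], energy)    # zero lag: the energy of x
print(R[1], R[-1])     # conjugate pair (Hermitian symmetry)
```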
Matched Filter Theory
Matched filter theory shows that the optimal filter, let's call it \$h(t)\$, which achieves the maximum signal-to-noise ratio (SNR) for a signal \$x(t)\$ in additive white noise at some delay \$t_0\$, is given by
$$h(t) = x(-t + t_0)^*$$
We see that the matched filter is the time-reversed complex conjugate of the input signal shifted by some delay \$t_0\$. This matched filter achieves the maximum SNR at \$t = t_0\$. In radar applications we're looking for the time delay of the target, so of course we don't know a priori what the delay will be to define the matched filter. It's possible to have multiple matched filters tuned for different \$t_0\$, but this becomes increasingly impractical to implement in a radar system.
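A discrete sketch of \$h(t) = x(-t + t_0)^*\$, assuming a finite complex test signal starting at index 0 and choosing \$n_0 \ge N-1\$ so every tap of the filter falls at a non-negative index. The helper names are hypothetical. Filtering the signal with its matched filter puts the output peak at \$n = n_0\$, where it equals the signal energy.

```python
def matched_filter(x, n0):
    """Hypothetical helper: taps h[n] = conj(x[n0 - n]) for n = 0 .. n0."""
    if n0 < len(x) - 1:
        raise ValueError("n0 must be at least len(x) - 1 to keep all taps at n >= 0")
    return [x[n0 - n].conjugate() if 0 <= n0 - n < len(x) else 0 for n in range(n0 + 1)]


def convolve(x, h):
    """y[n] = sum_k x[k] * h[n - k]."""
    n_out = len(x) + len(h) - 1
    return [sum(x[k] * h[n - k] for k in range(len(x)) if 0 <= n - k < len(h))
            for n in range(n_out)]


x = [1, 2j, 3]          # energy = 1 + 4 + 9 = 14
n0 = 4
h = matched_filter(x, n0)
y = convolve(x, h)
peak = max(range(len(y)), key=lambda n: abs(y[n]))
print(peak, y[peak])    # 4 (14+0j)
```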
A practical choice is to set \$t_0 = 0\$ so that the new matched filter has its maximum SNR at \$t = 0\$. This way we only need to define one matched filter, at the price of potential SNR loss for other values of \$t\$. The new matched filter is then
$$h(t) = x(-t)^*$$
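If we use this new \$h(t)\$ in the convolution integral (the left-most form) and substitute \$\tau \rightarrow -\tau\$, we get
$$y(t)= x(t) * h(t) = \int_{-\infty}^{\infty}x(t - \tau)x(-\tau)^*d\tau = \int_{-\infty}^{\infty}x(\tau)^*x(t + \tau)d\tau$$
If you compare this with \$R(\tau)\$, they are identical: the output of the matched filter at time \$t\$ is the autocorrelation of \$x\$ evaluated at lag \$t\$, i.e. \$y(t) = R(t)\$. If the conjugate is instead placed on the shifted copy, as in \$\int x(\tau)x(t+\tau)^*d\tau\$, the result is \$R(-t) = R(t)^*\$, which only changes the direction of the phasor rotations and is usually of little consequence.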
You can now see that computationally the convolution and autocorrelation functions are the same. The difference is the choice for \$h(t)\$, which is now the time-reversed complex conjugate of the signal you wish to receive.
Thinking graphically: the signal, which in our case is really the system \$h(t)\$, is already time-reversed, so performing convolution flips it back to its original orientation and you are actually doing correlation.
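A numerical check of the \$t_0 = 0\$ case, assuming an arbitrary finite complex test signal. The matched filter \$h[n] = x[-n]^*\$ is non-causal, so it is stored as a plain list whose index 0 corresponds to \$n = -(N-1)\$; that list is simply the reversed, conjugated signal. Convolving with it reproduces the autocorrelation lag by lag.

```python
def convolve(a, b):
    """y[n] = sum_k a[k] * b[n - k]."""
    n_out = len(a) + len(b) - 1
    return [sum(a[k] * b[n - k] for k in range(len(a)) if 0 <= n - k < len(b))
            for n in range(n_out)]


def autocorrelation(x):
    """R[k] = sum_n conj(x[n]) * x[n + k], listed for k = -(N-1) .. N-1."""
    N = len(x)
    return [sum(x[n].conjugate() * x[n + k] for n in range(N) if 0 <= n + k < N)
            for k in range(-(N - 1), N)]


x = [1 + 2j, -1j, 3, 0.5 - 1j]
# Matched filter with t0 = 0: h[n] = conj(x[-n]); list index 0 <-> n = -(N-1).
h = [v.conjugate() for v in reversed(x)]
y = convolve(x, h)      # y[i] is the output at time n = i - (N-1)
R = autocorrelation(x)  # R[i] is the autocorrelation at lag k = i - (N-1)

for yi, ri in zip(y, R):
    print(yi, ri)       # each pair matches
```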
According to Google, there seems to be a fairly large number of papers on pulse compression in medical ultrasound.
The main reason to use pulse compression (i.e., using chirps) is to increase the average transmitted power, and thereby the SNR, without giving up range resolution. It does come with its own set of limitations, such as a larger minimum range (the receiver is blind while the long pulse is being transmitted) and ambiguities in the presence of Doppler.
It is used in radar because the amplifiers that can provide a high-quality output are limited in power (especially semiconductor PAs), and even TWTs can't provide the peak power that magnetrons can. Magnetrons, however, can't provide the signal quality needed for sophisticated beamforming and don't integrate well with modern electronics.
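A minimal sketch of why a chirp helps, assuming complex baseband samples, unit amplitude, \$N = 64\$, and a discrete linear FM chirp \$e^{j\pi n^2/N}\$ (all choices made for illustration). Both pulses have the same length and therefore the same energy, but after matched filtering (autocorrelation) the unmodulated pulse gives a wide triangle while the chirp compresses to a peak only one lag wide above the half-peak level.

```python
import cmath


def aperiodic_autocorrelation(x):
    """R[k] = sum_n conj(x[n]) * x[n + k] for k = -(N-1) .. N-1 (matched filter output)."""
    N = len(x)
    return [sum(x[n].conjugate() * x[n + k] for n in range(N) if 0 <= n + k < N)
            for k in range(-(N - 1), N)]


def compression_stats(pulse):
    """Return (peak magnitude, number of lags whose magnitude exceeds half the peak)."""
    r = [abs(v) for v in aperiodic_autocorrelation(pulse)]
    peak = max(r)
    width = sum(1 for v in r if v > peak / 2)
    return peak, width


N = 64
rect = [1 + 0j] * N                                               # unmodulated long pulse
chirp = [cmath.exp(1j * cmath.pi * n * n / N) for n in range(N)]  # linear FM chirp

for name, pulse in [("rect", rect), ("chirp", chirp)]:
    peak, width = compression_stats(pulse)
    print(name, round(peak, 6), width)
```

The peak height (the collected energy) is the same for both; only the chirp keeps the fine range resolution of a short pulse.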
If the transducers can provide adequate SNR without using compression, there is not much reason to use it.
Integrating over the whole data bit, i.e. using all of the area under the signal and therefore all of its energy, gives the maximum signal relative to the noise. Lower-performing methods use only the peak of the signal, where the peak noise also applies, which gives worse SNR and higher BER. Good integrating demodulators may be harder to build than a single center-of-bit peak sampler, but integrating the entire signal energy over the symbol gives higher SNR and lower BER.
When you get into ADC specs, there are about six different ways to express error when quantifying ADC dynamic performance:
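A small Monte Carlo sketch of the comparison above, assuming antipodal (±A) rectangular bits, 8 samples per bit, and white Gaussian noise with \$\sigma = A\$; all parameters are illustrative. For a rectangular pulse, integrate-and-dump is the matched filter, so it should clearly beat a single sample taken at the center of the bit.

```python
import random


def simulate_ber(num_bits=2000, samples_per_bit=8, amplitude=1.0,
                 noise_sigma=1.0, seed=1):
    """Compare a single center sample against integrate-and-dump for antipodal bits."""
    rng = random.Random(seed)
    errors_center = 0
    errors_integrate = 0
    for _ in range(num_bits):
        bit = rng.randint(0, 1)
        level = amplitude if bit else -amplitude
        # One bit period: constant level plus white Gaussian noise on every sample.
        rx = [level + rng.gauss(0.0, noise_sigma) for _ in range(samples_per_bit)]
        # Decision 1: look only at the sample in the middle of the bit.
        if (rx[samples_per_bit // 2] > 0) != bool(bit):
            errors_center += 1
        # Decision 2: integrate (sum) over the whole bit, then decide.
        if (sum(rx) > 0) != bool(bit):
            errors_integrate += 1
    return errors_center / num_bits, errors_integrate / num_bits


ber_center, ber_integrate = simulate_ber()
print(f"center sample BER: {ber_center:.4f}, integrate-and-dump BER: {ber_integrate:.4f}")
```

Summing \$M\$ samples grows the signal amplitude by \$M\$ but the noise power only by \$M\$, so the SNR improves by a factor of \$M\$ (about 9 dB for \$M = 8\$).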
SINAD (signal-to-noise-and-distortion ratio),
ENOB (effective number of bits),
SNR (signal-to-noise ratio),
THD (total harmonic distortion),
THD + N (total harmonic distortion plus noise),
SFDR (spurious free dynamic range).
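Several of these metrics are linked by standard relationships for a full-scale sine-wave input: an ideal \$N\$-bit converter has a quantization-limited SNR of \$6.02N + 1.76\$ dB, ENOB is obtained by inverting that formula with the measured SINAD, and SINAD combines the noise and distortion powers. A small sketch of those conversions follows; the function names are hypothetical, and THD is assumed to be given as a negative value in dBc.

```python
import math


def ideal_snr_db(bits):
    """Quantization-limited SNR of an ideal N-bit ADC for a full-scale sine input."""
    return 6.02 * bits + 1.76


def enob_from_sinad(sinad_db):
    """Effective number of bits implied by a measured SINAD (full-scale sine input)."""
    return (sinad_db - 1.76) / 6.02


def sinad_from_snr_thd(snr_db, thd_dbc):
    """Combine noise and distortion powers, both relative to the carrier, into SINAD."""
    noise = 10 ** (-snr_db / 10)
    distortion = 10 ** (thd_dbc / 10)
    return -10 * math.log10(noise + distortion)


sinad = sinad_from_snr_thd(74.0, -80.0)
print(f"SINAD = {sinad:.2f} dB, ENOB = {enob_from_sinad(sinad):.2f} bits")
```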