Your unsigned() cast doesn't work because it takes the exact bits of the input and simply reinterprets them as unsigned. For example, 0xFFFC read as a 16-bit signed value is -4, but when cast to unsigned it is 65532.
Instead, since your incoming signal is always 16-bit signed, check whether it is negative (as simple as checking the leading bit). If it is, negate it rather than casting it to unsigned (which gives you the magnitude you actually wanted), multiply by your scale factor, and negate the result again afterwards. I can't think at the moment how to do this in a single step, but this should work for you if you can spare the two extra negations.
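To make the negate, multiply, negate idea concrete, here is a small bit-level model in Python rather than HDL. It is only a sketch: the names `to_signed16`, `scale_via_magnitude` and `scale` are made up for illustration, and the scale factor is assumed to be an unsigned integer.

```python
def to_signed16(bits):
    """Interpret the low 16 bits of an integer as a two's-complement value."""
    bits &= 0xFFFF
    return bits - 0x10000 if bits & 0x8000 else bits


def scale_via_magnitude(sample_bits, scale):
    """Scale a signed 16-bit sample using an unsigned multiply.

    sample_bits: raw 16-bit pattern of the signed input
    scale:       unsigned scale factor (hypothetical name)
    """
    negative = bool(sample_bits & 0x8000)       # the leading bit is the sign
    value = to_signed16(sample_bits)
    magnitude = -value if negative else value   # 0..32768
    product = magnitude * scale                 # unsigned x unsigned
    return -product if negative else product    # restore the sign


# Same bits, two readings: the cast changes the interpretation, not the bits.
print(to_signed16(0xFFFC))             # -4
print(0xFFFC)                          # 65532
print(scale_via_magnitude(0xFFFC, 3))  # -12
```

One thing to watch in hardware: the magnitude of -32768 is 32768, which fits in 16 unsigned bits but not in 16 signed bits, so the negated value has to be held as unsigned (or in 17 signed bits).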
One reason such a function doesn't belong in numeric_std is that, in practice, you may need more control of the details...
Addition is quite straightforward and most tools and technologies implement it well.
But multiplication is difficult enough that FPGA manufacturers devote chunks of FPGA area to providing 18-bit signed multipliers with associated logic. Synthesis tools will use these, but perhaps not optimally. If you need a 32-bit multiply, you might get a badly pipelined (slow!) multiplication that you can improve on by splitting the multiply into four smaller multiplies and summing the partial products yourself. (Synthesis tools are improving, so this may no longer be true.)
Or you may need to round, or dither, instead of truncating the product.
Or one input is a constant, so that a KCM (constant coefficient multiplier) unrolled in hardware yields a more efficient solution.
So multiplication is still not a one-size-fits-all operation, and it certainly wasn't when numeric_std was created. As Martin Thompson says, look at the newer fixed-point library for what is possible now.
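As a rough illustration of the partial-product splitting mentioned above, here is a Python model of a 32x32 multiply built from four smaller multiplies. It is a sketch under simplifying assumptions: both operands are treated as unsigned and split into 16-bit halves for readability, whereas real 18-bit signed multiplier blocks and signed operands need different split points and sign handling. The name `mul32_partial` is made up.

```python
MASK16 = 0xFFFF


def mul32_partial(a, b):
    """32x32 -> 64-bit unsigned multiply from four 16x16 partial products."""
    a_hi, a_lo = (a >> 16) & MASK16, a & MASK16
    b_hi, b_lo = (b >> 16) & MASK16, b & MASK16

    # Four small multiplies; in hardware each could map to one multiplier
    # block and be registered separately for pipelining.
    ll = a_lo * b_lo
    lh = a_lo * b_hi
    hl = a_hi * b_lo
    hh = a_hi * b_hi

    # Sum the partial products, each weighted by its position.
    return ll + ((lh + hl) << 16) + (hh << 32)


a, b = 0x89ABCDEF, 0x12345678
print(mul32_partial(a, b) == a * b)
```

Doing the split by hand is what lets you decide where the pipeline registers go between the multiplies and the adder tree.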
As for performing your own fixed-point scaling and truncation, I find it easier to reason starting at the MSB and working down...
Given your 8-bit Q2.5 format (signed!) numbers,
s_mm.nnnnn * s_mm.nnnnn = ss_mmmm.nn_nnnn_nnnn
just remember that multiplying the sign bits effectively gives you two identical sign bits EXCEPT for the case -4.0*-4.0 (more generally, both inputs equal to the most negative value, -2**m for m integer bits). If you can guarantee this doesn't happen (e.g. you control the filter coefficients) you can simplify handling this case...
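The claim about the duplicated sign bit is easy to check exhaustively for 8-bit inputs. The following Python sketch models Q2.5 values as raw integers (value = raw / 32) and looks for products whose top two bits differ; the helper names are made up for illustration.

```python
FRAC_BITS = 5  # Q2.5: 1 sign bit, 2 integer bits, 5 fraction bits


def q25_to_raw(x):
    """Convert a real number to its raw 8-bit Q2.5 integer."""
    raw = int(round(x * (1 << FRAC_BITS)))
    assert -128 <= raw <= 127, "value does not fit in 8-bit Q2.5"
    return raw


def top_two_bits_equal(product):
    """True if bits 15 and 14 of the 16-bit product are the same."""
    bits = product & 0xFFFF
    return ((bits >> 15) & 1) == ((bits >> 14) & 1)


def find_exceptions():
    """All raw input pairs whose product does NOT have two equal sign bits."""
    return [(a, b)
            for a in range(-128, 128)
            for b in range(-128, 128)
            if not top_two_bits_equal(a * b)]


# The product has 10 fraction bits, so divide by 2**10 to read it back.
p = q25_to_raw(1.5) * q25_to_raw(-2.25)
print(p / (1 << (2 * FRAC_BITS)))  # -3.375
print(find_exceptions())           # [(-128, -128)], i.e. -4.0 * -4.0
```

So for every other input pair the top bit of the 16-bit product is redundant and can be dropped without losing information.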
Best Answer
This isn't particularly different from multiplying two fixed point numbers with the same format. You need to do a multiplication which preserves the most significant bits, then shift the binary point back to the desired output format.
So, do a 16x16 => 32 bit multiplication. The binary point is then at position 13+6 = 19, so you have a Q13.19 format number.
Assuming you want Q10.6 format output, you shift right by 13, optionally check for overflow, then take the lower 16 bits.
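Here is a bit-level sketch of those steps in Python, just to check the arithmetic. It assumes one operand is the Q10.6 sample and the other is a 16-bit signed coefficient with 13 fraction bits (the original question isn't quoted here, so that assignment is a guess), and the function and parameter names are made up.

```python
def scale_q10_6(sample_raw, coeff_raw):
    """Multiply a Q10.6 sample by a coefficient with 13 fraction bits.

    Both inputs are signed 16-bit raw integers (assumed formats).
    Returns (result_raw, overflow) with result_raw in Q10.6.
    """
    product = sample_raw * coeff_raw  # 16x16 => 32 bit, 19 fraction bits
    shifted = product >> 13           # arithmetic shift back to 6 fraction bits
    overflow = not (-32768 <= shifted <= 32767)

    # Take the lower 16 bits and read them as two's complement.
    low = shifted & 0xFFFF
    result = low - 0x10000 if low & 0x8000 else low
    return result, overflow


sample = int(2.5 * 2**6)   # 2.5 in Q10.6 -> 160
coeff = int(1.5 * 2**13)   # 1.5 with 13 fraction bits -> 12288
raw, ovf = scale_q10_6(sample, coeff)
print(raw / 2**6, ovf)     # 3.75 False
```

Note that the plain right shift truncates towards minus infinity; if that bias matters you would add a rounding constant before shifting. When the overflow flag is set, the lower 16 bits have wrapped and should be saturated or otherwise handled.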