First: Do you have a 20 MHz crystal or crystal oscillator? Those are two different things. A crystal oscillator will all on its own generate a 20 MHz clock signal for the PIC and you use the external oscillator option with it.
A quartz crystal, on the other hand, is only the external part of an oscillator: the PIC's internal oscillator circuitry, the crystal and its load capacitors together make a complete oscillator. In that configuration, you use one of the crystal modes. Also take a look at figure 2.2 on page 27 of the datasheet.
Now to set up this part correctly, you need to understand a few things, so I'll quote the datasheet:
When the PIC18F4550 is used for USB connectivity, it must have either
a 6 MHz or 48 MHz clock for USB operation, depending on whether
Low-Speed or Full-Speed mode is being used.
You need to combine the settings so that the USB clock is 48 MHz or 6 MHz, and then set up the microcontroller's operating frequency so that it runs at a suitable speed. Those two clocks may be different.
On page 26 of the datasheet, you have a nice diagram which you should take time to analyze. The USB PLL input expects a 4 MHz signal, which it multiplies up to 96 MHz. The operating frequencies for USB and for the microcontroller are then derived from that 96 MHz.
In your screenshot, the 20 MHz is divided by 5 to get the 4 MHz needed for the USB PLL, which then raises it to 96 MHz, as seen in the PLL prescaler section.
Then you have the system clock postscaler section. It is currently set to use the 96 MHz created by the USB PLL, divided by 2, as the main system clock. You also have other options for the main system clock. I can't remember exactly what they are and I've just formatted my HDD, so mikroC isn't installed yet. They should let you derive the system clock from an internal oscillator, directly from the clock used to generate the 4 MHz for the USB PLL, or, as shown in the screenshot, from the 96 MHz generated by the USB PLL.
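The arithmetic in this chain can be checked with a few lines of C. This is only a sketch meant to run on a PC, not PIC firmware. The function and macro names are made up for illustration; the 4 MHz PLL input, the 96 MHz PLL output and the divide-by-2 for the 48 MHz USB clock are the figures discussed above.

```c
#include <assert.h>

/* Hypothetical helpers for checking the clock arithmetic on a PC.
   All frequencies are in kHz so that integer math stays exact. */

#define PLL_INPUT_KHZ   4000u   /* the USB PLL expects 4 MHz at its input */
#define PLL_MULTIPLIER  24u     /* 4 MHz x 24 = 96 MHz */

/* PLL output in kHz, or 0 if the prescaled crystal is not exactly 4 MHz. */
unsigned pll_output_khz(unsigned crystal_khz, unsigned prescaler)
{
    assert(prescaler != 0);     /* dividing by zero is a caller bug */
    if (crystal_khz % prescaler != 0 ||
        crystal_khz / prescaler != PLL_INPUT_KHZ) {
        return 0;
    }
    return PLL_INPUT_KHZ * PLL_MULTIPLIER;
}

/* Full-speed USB clock: the 96 MHz PLL output divided by 2. */
unsigned usb_full_speed_khz(unsigned pll_out_khz)
{
    return pll_out_khz / 2u;
}
```

With a 20 MHz source and a prescaler of 5, `pll_output_khz(20000, 5)` gives 96000 kHz and the USB clock comes out at 48000 kHz. A prescaler that does not land on 4 MHz is reported as 0.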
The point here is that you can independently select the main clock and the USB clock. For example, if you have a 20 MHz oscillator, you could run the PIC main clock at those 20 MHz and at the same time run the USB clock at needed 48 MHz.
Next you have the oscillator selection part. For real crystal oscillators, you should use the EC options and connect the output of the oscillator to the OSC1/CLKI pin (in your case pin 9). You can then use the 20 MHz oscillator to drive the PIC.
In case you're using a crystal, you need to use the crystal options. They are XT, for low frequency crystals, up to 4 MHz, and HS for high frequency crystals up to 20 MHz, if I remember correctly.
As for which crystal is better, well that depends on a lot of things such as which exact crystal you're using, its characteristics, characteristics of the PLL used in the PIC and so on.
Usually, low frequency crystals drift less over time and produce a cleaner signal, while high frequency crystals often output a harmonic of some lower frequency, and that signal is usually weaker. I myself would use the 4 MHz crystal here.
Also, I forgot the last part of your question: in the "Oscillator frequency" field, you should enter the effective operating frequency of the PIC, that is, the frequency of the "primary clock" in figure 2.1 on page 25 of the datasheet. In your particular case, that would be 48 MHz.
So to sum this up: In the 20 MHz crystal case, you should first set the "oscillator selection" to HSPLL. That will give 20 MHz at the input of "primary oscillator" in the above-mentioned figure 2.1. Next, you should set the PLL prescaler to divide by 5, so you get 4 MHz which are multiplied by 24 to get the 96 MHz for USB. Next set the "USB clock selection" to 96 MHz divided by 2 and set the "System clock postscaler selection" to 96 divided by two. Finally, set the Oscillator frequency to 48 MHz and you're done with this part.
For the 4 MHz crystal, you should first set HSPLL. Set the PLL prescaler to divide by 1, then set the "USB clock selection" to 96 MHz divided by 2 and the "System clock postscaler selection" to 96 MHz divided by 2. Finally, set the Oscillator frequency to 48 MHz and that's it.
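Both cases follow the same rule: pick the PLL prescaler that turns the crystal frequency into 4 MHz. Here is a minimal sketch of that lookup, for a PC rather than the PIC. The function name is hypothetical, and the list of divide ratios (1, 2, 3, 4, 5, 6, 10, 12) is what I recall from the datasheet's clock diagram, so check it against page 26.

```c
#include <assert.h>
#include <stddef.h>

/* Divide ratios offered by the PLL prescaler (assumed from the datasheet). */
static const unsigned pll_dividers[] = { 1, 2, 3, 4, 5, 6, 10, 12 };

/* Returns the prescaler that turns crystal_khz into exactly 4 MHz,
   or 0 if no available ratio does. */
unsigned find_pll_prescaler(unsigned crystal_khz)
{
    assert(crystal_khz > 0);    /* a 0 kHz crystal is a caller bug */
    for (size_t i = 0; i < sizeof pll_dividers / sizeof pll_dividers[0]; ++i) {
        if (pll_dividers[i] * 4000u == crystal_khz) {
            return pll_dividers[i];
        }
    }
    return 0;
}
```

For the two crystals discussed here, `find_pll_prescaler(20000)` returns 5 and `find_pll_prescaler(4000)` returns 1. A crystal that is not a supported multiple of 4 MHz returns 0, meaning it cannot feed the USB PLL directly.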
Best Answer
So there are a few answers here. I will start with the ones that are easiest to explain simply (for me anyway) and move on to the fuzzier ones.
First, the waveform you posted is not "squared" because of a couple of factors, namely parasitic effects on the pin you are probing: capacitance, improper grounding of your scope, series resistance on the trace, parasitic inductance and the like. Basically, all of these parasitic effects limit how fast the pin can change state. A true square wave is impossible in practice, since it would require a circuit to change state instantaneously (if that were possible, our computers would be running MUCH faster than they do). There is also an added "gotcha" with the STM32 microcontrollers, and others, where the output drive strength can be changed. That is, you can dictate in firmware how much current the pin pushes into the rest of the circuit, which changes the speed at which it can change state. The code you posted doesn't set this, so most likely the micro is running at the default 20MHz, which is supported by your scope screenshot (rise time of ~50ns).
Second, you are correct in assuming that the discrepancy in output frequency is due to the internal RC oscillator. From the STM32L152C datasheet, page 75, you can see that the internal RC shows a worst case accuracy of -4% to +3% across the operating temperature range, and +/- 1% at room temperature. So (4.202 - 4.194) / 4.194 * 100 = 0.19% difference, which is within spec.
Lastly, the jitter is most likely due to the piss-poor HAL libraries (I know this is not a great explanation, but I have run into so many problems with this b/s since they stopped supporting the SPL library that we have been writing our own HAL in house for the various STM32s). If you look at the HAL code, more often than not it passes around gigantic, bloated data structures that represent the automatically configured HAL code with every function call. What could normally be done by writing 4 bytes to the appropriate register (a few CPU cycles, if not one) compiles down to many, many, MANY lines of assembly. Combine that with a few ISRs kicking in here and there, and there is going to be some jitter. The main bottleneck here is how long it takes the processor to grab the next few instructions from flash. It's actually a wacky, involved process when the processor is doing pipelined instruction fetches. You can read about it in Section 3 of the manual for the processor.
I would try loading your program onto the dev board so that it executes from RAM and see if the problem goes away. If it doesn't, then I am being an idiot and completely missing something simple. But my recommendation would be to ditch the HAL libraries if you can. Unfortunately, ST hasn't released their SPL for the L1 and L4 series of parts, so you may have to do some digging.
Finally... :) Is this even an issue? Can your computer receive the chars correctly, or can you decode them with your scope?
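To get a feel for how resistance and capacitance slow an edge, you can model the pin as a simple first-order RC low-pass. This is a rough sketch under that assumption only: it ignores inductance and the real output driver, and the function name is made up. For such a circuit the 10% to 90% rise time is about 2.2 * R * C.

```c
#include <assert.h>

/* 10%-90% rise time of a first-order RC low-pass, t_r ~= 2.2 * R * C.
   R is in ohms and C in picofarads; the result is in nanoseconds. */
double rc_rise_time_ns(double r_ohm, double c_pf)
{
    assert(r_ohm >= 0.0 && c_pf >= 0.0);
    /* ohm * pF gives picoseconds; 1e-3 converts picoseconds to nanoseconds */
    return 2.2 * r_ohm * c_pf * 1e-3;
}
```

As an illustration (made-up values, not measurements of your board), 100 ohms driving 100 pF gives a rise time of about 22 ns, so even modest parasitics are enough to round off the edges visibly.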
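The same check can be written as a small helper, which is handy if you measure the frequency at several temperatures. This is just a sketch for a PC; the function names are hypothetical and the tolerance you pass in should come from the datasheet table.

```c
#include <assert.h>

/* Deviation of a measured frequency from its nominal value, in percent. */
double deviation_percent(double measured, double nominal)
{
    assert(nominal != 0.0);
    return (measured - nominal) / nominal * 100.0;
}

/* Returns 1 if the measured value is within +/- tol_percent of nominal. */
int within_tolerance(double measured, double nominal, double tol_percent)
{
    double dev = deviation_percent(measured, nominal);
    if (dev < 0.0) {
        dev = -dev;             /* compare the magnitude of the error */
    }
    return dev <= tol_percent;
}
```

With the numbers above, `deviation_percent(4.202, 4.194)` is roughly 0.19, and `within_tolerance(4.202, 4.194, 1.0)` returns 1, matching the room-temperature spec.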