Is there any other clever solution?
If you are limited to 32 registers, then your random number generator can hold only 32 bits of state. The only way to expand that is to use some combinational logic to derive the 448 bits you need from those 32.
Generating the 32 bits is (as you surmise) probably most efficiently done using an LFSR.
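As a rough illustration, one step of a 32-bit Galois LFSR could look like the C sketch below. The tap mask 0x80200003 (taps 32, 22, 2, 1) is a commonly cited maximal-length choice and is my assumption, not something stated in the question; the name `lfsr32_step` is hypothetical as well. In hardware this amounts to a shift register plus a few XOR gates.

```c
#include <stdint.h>

/* Advance a 32-bit Galois LFSR by one bit.
 * Assumption: taps 32, 22, 2, 1 (mask 0x80200003), a commonly cited
 * maximal-length polynomial. The state must never be zero, because an
 * all-zero LFSR stays at zero forever. */
uint32_t lfsr32_step(uint32_t state)
{
    uint32_t lsb = state & 1u;   /* bit that is shifted out */
    state >>= 1;
    if (lsb)
        state ^= 0x80200003u;    /* feed the output bit back into the taps */
    return state;
}
```

Seed it with any non-zero value and call it once per clock (or 32 times for a fresh word).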
I would suggest that some kind of hash function of those bits is your best bet for letting the least pattern pass through to the 448 output bits, but that might take up too much logic in your destination. It's not clear to me what constraints you have there.
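To make the hashing idea concrete, here is a minimal C sketch that expands one 32-bit state into 14 words (448 bits). The mixer is the 32-bit finalizer from MurmurHash3, picked purely for illustration; the names `mix32` and `expand448` and the per-word offset constant are my assumptions, and a real design would pick whatever hash fits its logic budget.

```c
#include <stdint.h>

#define OUT_WORDS 14  /* 14 * 32 = 448 bits */

/* 32-bit mixing step (the finalizer from MurmurHash3); a stand-in for
 * "some kind of hash function", chosen here only for illustration. */
static uint32_t mix32(uint32_t h)
{
    h ^= h >> 16;
    h *= 0x85EBCA6Bu;
    h ^= h >> 13;
    h *= 0xC2B2AE35u;
    h ^= h >> 16;
    return h;
}

/* Derive 448 output bits from one 32-bit state by hashing the state
 * together with an offset that depends on each output word's index. */
void expand448(uint32_t state, uint32_t out[OUT_WORDS])
{
    for (uint32_t i = 0; i < OUT_WORDS; ++i)
        out[i] = mix32(state + (i + 1u) * 0x9E3779B9u);
}
```

Note that this only hides the pattern: there are still at most 2^32 distinct 448-bit outputs, and the multipliers may be exactly the kind of logic cost mentioned above.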
Okay, I've found a way to do exactly what I asked in the question, without involving the WDT.
It's a bit of a hack, to say the least, and sacrifices a pin (EDIT: I originally wrote two pins, but only one is sacrificed). It requires no external components, though, so if you have an unused pin it will be "for free".
The idea is to use the PSMC (Programmable Switch Mode Control) of the PIC16F1783.
This can be clocked from the INTOSC by connecting the HFINTOSC to the 4x PLL, yielding a 64 MHz clock for the PWM output.
The PWM output can then be routed to another pin on the PCB.
Now, using the external crystal to clock the CPU, the PWM signal can be read in a tight loop. Since the two clocks are not synchronized, there should be jitter between them, so the sampled PWM signal should contain some unpredictable variation.
One way to use this jitter to build random values could be to have a 16-byte checksum array. TMR1 could then be configured to run as fast as possible, and every time the PWM signal changes, the TMR1 value could be written to the start of the array. Then the MD5 sum of the array could be taken and written back to the same array.
By iterating this procedure a few thousand times, a 16 byte MD5 hash could be built, which should be entirely random.
However, an MD5 checksumming algorithm barely fits on the PIC16F1783, so this is more useful on slightly more powerful chips. The same idea could still be used, though, by simply incrementing a byte by the TMR1 value and letting it wrap around a few thousand times before considering it "random enough".
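The byte-accumulator variant could be sketched roughly as follows. To keep the sketch compilable off-target, the hardware access is hidden behind a hypothetical callback (`edge_sample_fn`), which on the PIC would wait for the next PWM edge and return the low byte of TMR1; all names here are my own, and `fake_sample` exists only so the arithmetic can be checked on a PC.

```c
#include <stdint.h>

/* Hypothetical callback: on the PIC it would wait for the next PWM edge
 * and return the low byte of TMR1 at that moment. */
typedef uint8_t (*edge_sample_fn)(void);

/* Add `rounds` timer samples into one byte, letting it wrap modulo 256. */
uint8_t accumulate_samples(edge_sample_fn sample_at_edge, uint16_t rounds)
{
    uint8_t acc = 0;
    while (rounds--)
        acc = (uint8_t)(acc + sample_at_edge());
    return acc;
}

/* Host-side stand-in for testing only: always returns 3, so it is NOT random. */
uint8_t fake_sample(void)
{
    return 3;
}
```

How many rounds count as "random enough" depends on how much jitter each sample really carries, which this sketch does not measure.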
The only way this could fail is if the internal 500 kHz source somehow synced up with the crystal oscillator. I have no idea if that is possible.
Update:
The following code seems to work in practice in my lab:
#include <xc.h> // device register definitions (assumes the XC8 toolchain)

#define RAND_SIZE 255
unsigned char random_data[RAND_SIZE];

void make_random_data(void)
{
    // Used output pin: RC3. Make sure it is unconnected!
    PSMC1PRH = 0x00; // choose a very short period
    PSMC1PRL = 0x03;
    PSMC1DCH = 0x00; // set 50% duty
    PSMC1DCL = 0x02;
    PSMC1PHH = 0;
    PSMC1PHL = 0;
    PSMC1CONbits.PSMC1EN = 1;
    PSMC1CLK = 1; // 64 MHz
    PSMC1STR0bits.P1STRD = 1;
    PSMC1OENbits.P1OED = 1;
    PSMC1PRSbits.P1PRST = 1;
    PSMC1PHSbits.P1PHST = 1;
    PSMC1DCSbits.P1DCST = 1;

    // Zero the buffer, to make sure any previous value does not influence the result.
    for (int i = 0; i < RAND_SIZE; ++i)
        random_data[i] = 0;

    // Generate the new random data: XOR a fresh pin sample into every bit, 1000 times over.
    for (int pass = 0; pass < 1000; ++pass)
    {
        for (int j = 0; j < RAND_SIZE; ++j)
        {
            unsigned char bitmask = 1;
            for (signed char bitnum = 7; bitnum >= 0; --bitnum)
            {
                if (PORTCbits.RC3) // sample the PWM output pin
                    random_data[j] ^= bitmask;
                bitmask <<= 1;
            }
        }
    }
}
Update 2:
Only one pin is sacrificed: since it is possible to read the PWM output pin directly, you don't need to route it to a separate input and read that.
Best Answer
There are a lot of circuits around advertised as "digital dice". The usual approach is to use a counter which runs at high speed, for a time determined by how long the user presses the button. Since a human can't press a button repeatably for the same number of microseconds, this produces acceptably random numbers.
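The counter approach boils down to something like this C sketch. It assumes a fast free-running counter whose value is captured when the button is released; the function name `die_face` and the parameter `ticks` are hypothetical.

```c
#include <stdint.h>

/* Map the value of a fast free-running counter, captured when the button
 * is released, to a die face 1..6. */
uint8_t die_face(uint32_t ticks)
{
    return (uint8_t)(ticks % 6u + 1u);
}
```

As long as the counter runs through many full cycles of six during a single press, the user has no practical way of steering the result.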
The purist approach is to use a genuine noise source such as a silicon junction (diodes, especially Zeners), and amplify it. In a suitably shielded case this will produce "real" noise which you can use for cryptographic purposes.