On the PICs with two 32 kHz timers (TMR1 and TMR3), my recommendation would be to leave one of the timers free-running (never write to it) and use the other to generate wakeup events. If you only have one timer available, getting reliable operation without cumulative errors is going to be very difficult. Different part revisions behave differently when Timer1 is written, so an approach that works with one revision may be broken by a future one. It would have been nice if Microchip had allowed the timer to switch between synchronous and asynchronous modes without losing a count (simple to do: instead of using a multiplexer to switch between sync and async mode, add asynchronous set/clear to the synchronizing latch, and when in async mode use those to force the output to follow the input). None of their parts is, so far as I know, documented as doing any such thing. Consequently, I would expect that switching between synchronous and asynchronous mode may randomly gain or lose a count.
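To illustrate why a free-running timer avoids cumulative error: elapsed time can be taken as the unsigned difference between two reads, which stays correct across a rollover. What follows is only a minimal sketch in C. The names `elapsed_ticks`, `timebase_t` and `timebase_update` are hypothetical, and the raw counter value is passed in as an argument instead of being read from a real TMR1 register, so the logic can be checked off-target.

```c
#include <assert.h>
#include <stdint.h>

/* Ticks elapsed between two reads of a free-running 16-bit counter.
 * Unsigned subtraction wraps modulo 65536, so the result is still
 * correct if the counter rolled over once between the two reads. */
uint16_t elapsed_ticks(uint16_t earlier, uint16_t later)
{
    return (uint16_t)(later - earlier);
}

/* Hypothetical software extension of the hardware counter to 32 bits. */
typedef struct {
    uint16_t last_raw;     /* counter value seen at the previous update */
    uint32_t total_ticks;  /* accumulated ticks since the struct was zeroed */
} timebase_t;

/* Fold a fresh raw reading into the running total. The timer itself is
 * never written, so no counts are lost or gained by the bookkeeping. */
void timebase_update(timebase_t *tb, uint16_t raw_now)
{
    tb->total_ticks += elapsed_ticks(tb->last_raw, raw_now);
    tb->last_raw = raw_now;
}
```

With a 32768 Hz clock and no prescaler, a 16-bit counter wraps every 2 seconds, so `timebase_update` would have to run at least that often for the difference to be unambiguous.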
How accurate does this 30-day period have to be? 30 days is about 2.6 million seconds, so with a crystal accuracy of 20 ppm you could be off by roughly 52 seconds, about one minute, after one month.
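The drift estimate is just the period multiplied by the fractional error. A small sketch in C, in case you want to try other crystal tolerances; the function name `drift_ms` is made up for illustration, and the result is in milliseconds to keep everything in integer arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Worst-case drift, in milliseconds, accumulated over period_s seconds
 * by a clock that is off by `ppm` parts per million:
 *   drift [s]  = period_s * ppm / 1 000 000
 *   drift [ms] = period_s * ppm / 1 000
 * The product is computed in 64 bits to avoid overflow. */
uint32_t drift_ms(uint32_t period_s, uint32_t ppm)
{
    return (uint32_t)(((uint64_t)period_s * ppm) / 1000u);
}
```

For example, `drift_ms(2592000, 20)` gives 51840 ms, i.e. just under 52 seconds over 30 days.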
If a one-minute error is unacceptable, you could use a temperature-controlled oscillator or a better crystal, say 5 ppm. If external aid is allowed, you could use the signal of a DCF77 receiver (Europe; WWVB for North America). These give you a tick per second with atomic clock precision, so all you have to do is count pulses. Note that DCF77 sends only 59 pulses per minute; the omitted pulse marks the start of a new minute. Taking this into account, your 30-day period has elapsed after 2 548 800 pulses (59 \$\times\$ 60 \$\times\$ 24 \$\times\$ 30).
If the PIC has to do it all by itself that shouldn't be a problem either. Clock at 32768 Hz and program a timer to give an interrupt after 32768 clock cycles, that's one second. Count 2 592 000 interrupts (60 \$\times\$ 60 \$\times\$ 24 \$\times\$ 30).
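The counting part could look something like the sketch below. It is plain C with no PIC-specific register access; `on_one_second_tick` and `seconds_elapsed` are hypothetical names, and the function is assumed to be called from whatever once-per-second timer interrupt your part provides.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 60 s * 60 min * 24 h * 30 days = 2 592 000 seconds */
#define SECONDS_PER_30_DAYS (60UL * 60UL * 24UL * 30UL)

/* Needs 32 bits: 2 592 000 does not fit in a 16-bit counter. */
volatile uint32_t seconds_elapsed;

/* Call once per second (e.g. from the timer interrupt).
 * Returns true once the 30-day period has elapsed; the counter
 * saturates at the target instead of wrapping around. */
bool on_one_second_tick(void)
{
    if (seconds_elapsed < SECONDS_PER_30_DAYS) {
        seconds_elapsed++;
    }
    return seconds_elapsed >= SECONDS_PER_30_DAYS;
}
```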
In a month a lot can happen, and you probably want a battery backup in case there's a power outage. If you use the atomic clock signal you can also decode the time code after each minute pulse and compare date and time with your target time. In that case power outages don't even matter.
edit
You don't mention which PIC you're using, and although I have no practical experience with them, I know there are a lot of them. I'll pick the PIC10F200 because, as I understand it, it's one of the least capable PICs, having just one 8-bit timer/counter.
The timer/counter can be clocked internally by the clock/4, and has a selectable prescaler. If you use a 32.768 kHz crystal for the clock, then clock/4 = 8192 Hz. Set the prescaler to \$\div\$32 and the 8-bit timer/counter will overflow once every second.
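As a quick check of the divider arithmetic, here is a small helper in C. The name `overflows_per_second` is hypothetical, and it only models the division chain (timer input frequency, prescaler, 8-bit counter); it does not touch any real register.

```c
#include <assert.h>
#include <stdint.h>

/* Overflows per second of an 8-bit timer that is fed timer_in_hz
 * through a 1:prescaler divider. The counter overflows every 256
 * prescaled counts. Integer division: any fractional part is dropped. */
uint32_t overflows_per_second(uint32_t timer_in_hz, uint32_t prescaler)
{
    return timer_in_hz / (prescaler * 256u);
}
```

With the numbers above, `overflows_per_second(32768 / 4, 32)` comes out as 1.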
edit 2 (re Olin's comment)
Olin points out that the PIC10F200 only has an internal oscillator. That won't have crystal accuracy, but you can clock the timer from an external clock. Connect the output of the 32.768 kHz oscillator to the T0CKI input and set the prescaler to \$\div\$128. Then the 8-bit timer/counter will overflow once every second. As I understand it there's no overflow interrupt, so you'll have to detect this by comparing the timer value to 0x00.
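One caveat with polling for exactly 0x00: at 256 counts per second the timer sits at 0x00 for about 3.9 ms, so a fast loop may see it many times, and a slow loop may miss it altogether. A variation (my assumption, not something taken from the datasheet) is to detect the wrap by noticing that the value has gone down since the last poll. In this sketch `tmr0_wrapped` is a hypothetical helper, and the current timer value is passed in as an argument so the logic can be tested off-target.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true if the 8-bit up-counter has wrapped since the previous
 * call. `prev` holds the value seen at the previous poll and is updated
 * on every call. This works as long as the timer is polled at least
 * once per overflow period (once per second in the setup above). */
bool tmr0_wrapped(uint8_t *prev, uint8_t now)
{
    bool wrapped = now < *prev;  /* an up-counter only decreases on wrap */
    *prev = now;
    return wrapped;
}
```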
edit 3 (re your comment on accuracy)
Allowing a two-day error in one month is what we call very low accuracy: that's 6.7%. The internal oscillator is calibrated to 1% at 25°C and 2% over the full range, so if you want you can use the internal oscillator and do without the external 32.768 kHz crystal. The oscillator is tuned to 4 MHz; \$\div\$4 gives you 1 MHz at the timer's prescaler. If you set the prescaler to \$\div\$64, the 8-bit timer is clocked at 15625 Hz and overflows about 61.04 times per second. Count 61 overflows for every second; even if you ignore the remainder, the counting error is only about 0.06%.
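If you'd rather not ignore the remainder, one option is to accumulate timer counts instead of overflows and emit a second whenever 15625 counts have piled up. This is a variation on the scheme above, not a requirement for the accuracy you asked for. A minimal sketch in C, with hypothetical names (`second_counter_t`, `on_timer_overflow`) and assuming the function is called once per timer overflow:

```c
#include <assert.h>
#include <stdint.h>

#define TIMER_HZ           15625u  /* 4 MHz / 4 / 64 */
#define TICKS_PER_OVERFLOW 256u    /* 8-bit timer */

typedef struct {
    uint16_t acc;      /* timer counts not yet turned into a second */
    uint32_t seconds;  /* whole seconds elapsed */
} second_counter_t;

/* Call once per timer overflow. Each overflow represents 256 timer
 * counts; every time 15625 counts have accumulated, one second has
 * passed. The leftover counts are carried forward, so the counting
 * itself adds no long-term error (the oscillator tolerance remains). */
void on_timer_overflow(second_counter_t *sc)
{
    sc->acc += TICKS_PER_OVERFLOW;
    if (sc->acc >= TIMER_HZ) {
        sc->acc -= TIMER_HZ;
        sc->seconds++;
    }
}
```

Most seconds then take 61 overflows and an occasional one takes 62, which averages out to exactly 15625 counts per second.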
Best Answer
Windell's right, if you're talking about PICs, the datasheet (and hardware) already handle this for you.
If you're not using a PIC, the general method I use is to do this:
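A minimal sketch of such a read loop in C. The register names `TIMER_HI` and `TIMER_LO` are placeholders; here they are ordinary `volatile` variables so that the code compiles off-target, whereas on real hardware they would be the memory-mapped high and low bytes of the timer. The function name `read_timer16` is likewise made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholders for the two 8-bit halves of a 16-bit hardware timer.
 * On a real part these would be memory-mapped registers. */
volatile uint8_t TIMER_HI;
volatile uint8_t TIMER_LO;

/* Read a consistent 16-bit value from a timer that can only be read
 * one byte at a time: read high, read low, then re-read high and retry
 * if it changed (meaning the low byte rolled over in between). */
uint16_t read_timer16(void)
{
    uint8_t hi, lo;

    do {
        hi = TIMER_HI;
        lo = TIMER_LO;
    } while (hi != TIMER_HI);

    return (uint16_t)(((uint16_t)hi << 8) | lo);
}
```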
What this does is read the upper byte followed by the lower byte, and keep doing so until the high byte doesn't change between reads. This readily handles the case where the low byte of the 16-bit value overflows between reads. It assumes, of course, that there are no side effects from reading the TIMER_HI register multiple times. If your particular microprocessor doesn't allow that, it's time to throw it out and use one that isn't quite as braindead. :-)
This method ALSO assumes that your timer isn't changing so rapidly that you run the risk of overflowing the low 8 bits within a processor fetch cycle or two. If you're running a timer so fast (or a microprocessor so slow) then it's time to re-think your implementation.