Electronic – EEPROM read/write errors on dsPIC

eeprom microchip microcontroller pic

I'm running a Microchip dsPIC30F6012A. I have this chip on several PCBs, all running the same software, and I observe the same problem on all of them. This implies a systemic problem, not a one-off production issue. The problem is also reproducible, implying I should be able to kill it if I know where to look. But I'm still having surprising difficulty debugging the application.

The board under test accepts 24V, which gets stepped down to 5V through a V7805. The chip runs on its internal oscillator with a 16x PLL, giving an operating speed of ~29.5 MIPS. The relevant code on this board is very simple: wake up, read data from EEPROM, then enter an infinite loop. Every millisecond an interrupt fires, the code observes some environmental data, and it writes an updated value to EEPROM. There's other stuff going on, but the problem still occurs with the unrelated code commented out, so I can be reasonably certain it isn't relevant to the problem at hand.

In general use, 95% of the time the board wakes up with the correct value in memory, and goes on about its business. The other 5% of the time, though, it wakes up with an incorrect value. Specifically, it wakes up with a bit-flipped version of the data it's supposed to have. It's a four-byte unsigned long that I'm watching, and either the upper or lower word of the long can get flipped. For example, 10 becomes 2^16-10, which later becomes 2^32-10.
I can reproduce the glitch by manually cycling power several dozen times, but that's not very consistent, and my switch finger gets worn out.
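The dsPIC30F data EEPROM is organized in 16-bit words, so a four-byte unsigned long has to be stored as two separate words, each with its own erase/write cycle. (The library's file names, eraseEEWord.s and writeEEWord.s, suggest it works that way, but that is an assumption.) The sketch below uses hypothetical helper names and plain C99 types so it builds on a desktop host; it only illustrates how an update interrupted between the two word writes leaves one half new and the other half old, which would corrupt the upper or the lower word of the long independently.

```c
#include <stdint.h>

/* Hypothetical helpers: split a 32-bit value into the two 16-bit words
 * that data EEPROM stores, and join them back together. */
static uint16_t low_word(uint32_t value)  { return (uint16_t)(value & 0xFFFFu); }
static uint16_t high_word(uint32_t value) { return (uint16_t)(value >> 16); }

static uint32_t join_words(uint16_t high, uint16_t low)
{
    return ((uint32_t)high << 16) | low;
}

/* Models a "torn" update: power is lost after the low word of the new
 * value has been written but before the high word has, so EEPROM holds
 * the new low word next to the old high word. */
static uint32_t torn_update_low_only(uint32_t old_value, uint32_t new_value)
{
    return join_words(high_word(old_value), low_word(new_value));
}
```

For example, if 0x0000FFFF is being replaced by 0x00010000 and power drops after only the low word is written, the stored value reads back as 0x00000000.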

In order to reproduce the problem in a controlled fashion, I built a second board which drives the 24V supply to the board under test. (Another dsPIC driving a Darlington optocoupler.) The tester board turns the 24V off for 1.5 seconds (long enough for the 5V rail to drop to essentially 0 and stay there for one second), then turns the 24V on for some configurable length of time. With an on-time of approximately 520 ms, I can reproduce this EEPROM glitch within five power cycles, every time.
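The tester's sequencing is simple enough to sketch. This is not the actual tester firmware: set_supply() and delay_ms() are hypothetical hooks standing in for the optocoupler output and a delay routine, and the host-side mock exists only so the sequence can be checked off-target.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical hardware hooks for the tester board. */
typedef struct {
    void (*set_supply)(bool on);   /* true = 24V applied to the board under test */
    void (*delay_ms)(uint32_t ms);
} tester_io;

#define OFF_TIME_MS 1500u          /* long enough for the 5V rail to reach ~0V */

/* Runs `cycles` power cycles: 1.5 s off, then `on_time_ms` on.
 * The board under test is left powered at the end. */
static void run_power_cycles(const tester_io *io, uint32_t on_time_ms, uint32_t cycles)
{
    for (uint32_t i = 0; i < cycles; ++i) {
        io->set_supply(false);
        io->delay_ms(OFF_TIME_MS);
        io->set_supply(true);
        io->delay_ms(on_time_ms);
    }
}

/* Host-side mock that records what the sequence did. */
static uint32_t mock_switches, mock_total_ms;
static bool mock_state;
static void mock_set_supply(bool on) { mock_state = on; ++mock_switches; }
static void mock_delay_ms(uint32_t ms) { mock_total_ms += ms; }
```

With a 520 ms on-time, five cycles take 5 × (1500 + 520) = 10100 ms of delay and ten supply transitions.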

The 5V rail is behaving reasonably. It settles at 5V within 1 ms of turn-on, with perhaps 0.4V of overshoot, assuming I can trust my scope. At turn-off it decays to 0V exponentially, reaching 1V within 50 ms. I have no build warnings that seem relevant, just unused variables and missing newlines at the end of files.
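As a rough check of how much time the falling rail leaves for a write in progress, model the turn-off decay as a single exponential starting from 5V. That is an assumption: the real curve under load may differ, especially in the first few milliseconds. The 4.5V and 2.7V thresholds are the ones discussed in the answer.

```latex
\begin{aligned}
V(t) &= V_0\, e^{-t/\tau}, \qquad V_0 = 5\ \text{V} \\
V(50\ \text{ms}) &\approx 1\ \text{V}
  \;\Rightarrow\; \tau \approx \frac{50\ \text{ms}}{\ln 5} \approx 31\ \text{ms} \\
t_{V_a \to V_b} &= \tau \ln\frac{V_a}{V_b} \\
t_{4.5\,\text{V} \to 2.7\,\text{V}} &= \tau \ln\frac{5}{3} \approx 0.51\,\tau \approx 16\ \text{ms}
\end{aligned}
```

If that model holds, the window is several times the 2.6 ms worst-case erase/write time. Because the rail reaches 1V *within* 50 ms, 31 ms is an upper bound on τ; a faster decay shortens the window proportionally.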

I've tried several things:

  • Enabling/disabling the MCLR
  • Enabling/disabling the WDT
  • Enabling/disabling code protection
  • Enabling/disabling/changing the brown-out detect voltage
  • Enabling/disabling/changing the power-on timer
  • Different PLL settings on the main internal oscillator
  • Connecting/disconnecting my PICkit 3 programmer
  • Adding 470 uF of capacitance to the 5V rail
  • Adding/removing 0.1 uF across the 4.7k pullup on my MCLR pin
  • Disabling all interrupts in the code and leaving nothing but EEPROM updates in the main loop
  • Adding a 1.5 second delay to my startup routine before I start reading EEPROM

I've also written separate test code which does nothing but continually write values to EEPROM and then read them back, making sure that the value has not changed. Tens of thousands of iterations gave no errors. All I can conclude is that something goes wrong with EEPROM read or write, specifically at powerup/powerdown.
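A loop like the one described can be sketched as follows, with a hypothetical driver interface (write_word/read_word) standing in for the library routines and a RAM array standing in for EEPROM on a host. Note that nothing in such a loop ever loses power in the middle of a write, which is consistent with it passing while the power-cycle test fails.

```c
#include <stdint.h>

/* Hypothetical driver interface; on the target these would wrap the
 * library's erase/write/read routines for one 16-bit EEPROM word. */
typedef struct {
    void     (*write_word)(uint16_t addr, uint16_t value); /* erase + write */
    uint16_t (*read_word)(uint16_t addr);
} ee_driver;

/* Writes a changing pattern to `addr`, reads it back each time, and
 * returns the number of mismatches seen. */
static uint32_t ee_soak_test(const ee_driver *ee, uint16_t addr, uint32_t iterations)
{
    uint32_t errors = 0;
    for (uint32_t i = 0; i < iterations; ++i) {
        /* 40503 is odd, so consecutive patterns always differ. */
        uint16_t pattern = (uint16_t)(i * 40503u);
        ee->write_word(addr, pattern);
        if (ee->read_word(addr) != pattern) {
            ++errors;
        }
    }
    return errors;
}

/* RAM-backed stand-in for running the loop on a host. */
static uint16_t fake_eeprom[512];
static void     fake_write(uint16_t addr, uint16_t v) { fake_eeprom[addr % 512u] = v; }
static uint16_t fake_read(uint16_t addr)              { return fake_eeprom[addr % 512u]; }
```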

I've been using the same EEPROM libraries since 2007. I've seen occasional glitches, but nothing reproducible. The relevant code can be found here:
http://srange.net/code/eeprom.c
http://srange.net/code/readEEByte.s
http://srange.net/code/eraseEEWord.s
http://srange.net/code/writeEEWord.s

I've seen EEPROM errors before in other applications, but always as one-off glitches, nothing this reproducible or consistent.

Does anyone have any idea what's going on? I'm running out of things to try.

Best Answer

Two things come to mind:

First, according to the data sheet, an erase/write cycle takes at least 0.8 ms and up to 2.6 ms. You say that you have an interrupt every 1 ms, which may lead to a write operation, so a new write can be requested while the previous one is still in progress. I have seen in the code that you disable interrupts for parts of the erase function and for parts of the write function, but you might still get odd interleaving of the function calls. Maybe it helps to disable interrupts for the whole erase-and-write sequence?
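A minimal sketch of that suggestion, assuming the erase and write steps each block until the hardware reports completion: interrupts are held off once, around both steps, rather than separately inside each. All names are hypothetical; on a dsPIC30F, irq_disable()/irq_restore() might raise and restore the CPU priority through the SR IPL bits, and erase_word()/write_word() would wrap the library's assembly routines.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical hooks; names and signatures are assumptions. */
typedef struct {
    uint16_t (*irq_disable)(void);                         /* returns previous state */
    void     (*irq_restore)(uint16_t state);
    void     (*erase_word)(uint16_t addr);                 /* blocks until done */
    void     (*write_word)(uint16_t addr, uint16_t value); /* blocks until done */
} ee_hooks;

/* Erase and write one word with interrupts held off for the whole
 * sequence, so no interrupt can run between the erase and the write. */
static void ee_update_word(const ee_hooks *hw, uint16_t addr, uint16_t value)
{
    uint16_t saved = hw->irq_disable();
    hw->erase_word(addr);
    hw->write_word(addr, value);
    hw->irq_restore(saved);
}

/* Host-side mock that counts steps executed with interrupts off. */
static bool mock_irq_enabled = true;
static int  mock_steps_inside;
static uint16_t mock_disable(void) { uint16_t s = mock_irq_enabled; mock_irq_enabled = false; return s; }
static void mock_restore(uint16_t s) { mock_irq_enabled = (s != 0); }
static void mock_erase(uint16_t a) { (void)a; if (!mock_irq_enabled) ++mock_steps_inside; }
static void mock_write(uint16_t a, uint16_t v) { (void)a; (void)v; if (!mock_irq_enabled) ++mock_steps_inside; }
```

The trade-off is that interrupts stay off for up to 2.6 ms, so one or more 1 ms ticks can be delayed or merged; it is usually cleaner for the interrupt to just record the new value and let the main loop call the update.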

Second, you might be writing while the power goes down, so that an EEPROM write happens exactly at the moment the supply voltage falls below the operating voltage. You can try monitoring the supply voltage and refusing to write when it is below, let's say, 4.5V. This assumes that the supply stays above 2.7V, the minimum operating voltage, long enough for a write that has already started to finish, and that brown-out detection is set to trigger only below that point.
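A sketch of the voltage gate, under several assumptions: the 5V rail is fed to a 12-bit ADC input through a 3:1 resistor divider and measured against a 2.5V reference that does not sag with the rail (measured against its own supply as the reference, the rail would read the same value no matter what it does). The divider ratio, reference and function names are placeholders, not taken from the question.

```c
#include <stdbool.h>
#include <stdint.h>

/* Threshold from the answer: refuse to start a write below ~4.5V. */
#define EE_MIN_SUPPLY_MV 4500u

/* Hypothetical conversion: 12-bit ADC, 2.5V reference, rail divided by 3. */
static uint32_t supply_mv_from_adc(uint16_t raw)
{
    const uint32_t vref_mv = 2500u;   /* reference independent of the 5V rail */
    const uint32_t divider_ratio = 3u;
    return ((uint32_t)raw * vref_mv * divider_ratio) / 4095u;
}

/* Returns true if it seems safe to start an erase/write cycle now. */
static bool ee_write_allowed(uint16_t raw_adc)
{
    return supply_mv_from_adc(raw_adc) >= EE_MIN_SUPPLY_MV;
}
```

The check would be made immediately before each erase/write; a write that has already started still depends on the rail staying above 2.7V long enough to finish.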