What are you going to do with it? Relative to your requirements, the BeagleBoard and BeagleBone are a couple of orders of magnitude of overkill, although the Bone is an amazing value for its power and hackability. I'm not sure why you say it fails requirement 4; it has all of those things except perhaps the nice documentation. The Beagles seem to be pitched as Linux platforms, so you may be largely on your own if you want to bootstrap one yourself; unlike with an MCU, you may not get a bunch of C libraries for working directly with the hardware and peripherals.
Have you considered an ARM Cortex-M3 or Cortex-M4 microcontroller kit? STMicro has a $20 discovery kit for their new Cortex-M4 micros: 192KB RAM, 1MB flash, JTAG, USB, and a pretty awesome array of peripherals. These are targeted at bare-metal development, so you just get some C libraries that let you configure the hardware, and you provide a main function and interrupt service routines. It doesn't have networking, but at that price point, unless you are designing something where that's a make-or-break requirement, it's an amazing value for learning and prototyping. I have one in the mail now, so I can't give a first-hand account, but I have developed on other Cortex MCUs and really like them.
If you do require networking, I have used TI's Stellaris LM3S6965 Ethernet Evaluation Kits and they are great, the docs and libraries are pretty good (I've hit a few stumbling blocks figuring things out but overall a good experience). I've even used lwIP and UIP to build a device with a (very, very, very) simple web server. I'm a little reluctant to recommend the full kits over the BeagleBone though because they are around $70 and vastly less powerful than the Bone, but it all depends on what you want to build or learn.
The first thing I verify on a new board, whether it is using an internal oscillator or an external crystal, is that I have the clock frequency set up correctly. This is important because many of the peripherals, such as UART, SPI, I2C and timers depend on it.
The way I verify it is to write a program with a short loop that turns an LED on and off -- either in assembly language, where I can count the cycles manually, or in C, as long as I can get a disassembly listing and do the same thing. I set up the loop so it executes once a second, run the code, and check that the LED blinks 60 times in a minute.
As far as peripherals go, the best way to check them is to use an oscilloscope if you have one, and look at the RX line for UART, the CLK, MOSI, and chip select lines for SPI, and the SDA and SCL lines for I2C, and check that the lines are toggling and the timing looks correct.
If you don't have an oscilloscope, you can put LEDs on these lines and then enable or disable the peripherals. When disabled, most of the lines will be low (LED off), but some will be high, like the RX lead of the UART (LED on). When a peripheral is enabled, most of the LEDs should dim, since the lines will be toggling. By alternating in a loop (disabled/enabled), it is easier to see the difference between on and dim.
For the UART, you can connect the TX line to the RX line as a loopback. You can also connect them to a UART-to-USB cable, and on the PC run a terminal program like RealTerm. Besides testing out the interface, this will come in handy for other debugging later.
For other pieces of code, I use multiple LEDs as necessary to show that various paths in the code are being executed. If you have the UART working and connected to a PC, you can sprinkle your code with calls to a subroutine that outputs a message showing what point the program has reached (or use printf if you have the standard C libraries available). But as Vladimir Cravero points out in a comment below, this can slow your code down some (at 115,200 baud, not too much, since one character time is under 100 µs). But in ISRs and other time-critical code, just use LEDs.
As Al Bundy points out in a comment below, in-circuit debuggers can also be useful, particularly if one can set multiple breakpoints, and even more so if you can break on a memory location being changed (a watchpoint). Not all debuggers have that feature.
However, I don't use debuggers a lot unless I have to -- for example, to look at bits in a peripheral register, to track down a bug which I can't find by inspection, or to do rudimentary code-coverage analysis. In general I like to run programs at their "normal" speed, since a lot of issues will show up that may not when the program is single-stepped. Most of my programs use interrupts heavily, which interferes with using a debugger.
As you suspect, this is happening because the unsigned int data type is 4 bytes in size. Each

*bss_start_p = 0;

statement actually clears four bytes of the bss area.
The bss memory range needs to be aligned correctly. You could simply define _BSS_START and _BSS_END so that the total size is a multiple of four, but this is usually handled by allowing the linker script to define the start and stop locations.
As an example, the linker script in one of my projects brackets the bss section with ALIGN(4) statements, which take care of things.
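A representative GNU ld fragment of that shape (section and symbol names are illustrative, not taken from any particular project; match them to what your startup code expects):

```
.bss :
{
    . = ALIGN(4);
    _BSS_START = .;
    *(.bss)
    *(COMMON)
    . = ALIGN(4);
    _BSS_END = .;
} > RAM
```

The second ALIGN(4) guarantees that _BSS_END lands on a word boundary, so a word-at-a-time clearing loop stops exactly at the end of the region.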
Also, you may wish to change

while(bss_start_p != bss_end_p)

to

while(bss_start_p < bss_end_p)
This won't prevent the problem (since you might be clearing 1-3 more bytes than you wish), but it could minimize the impact :)