Yes, you will need to enable PAE to see all 4 GB of RAM. A 32-bit CPU can in theory address up to 4 GB of RAM without PAE, but doing so would take every address that 32 bits can express, leaving none for memory-mapped devices such as graphics cards. Since those devices need address ranges below 4 GB too, non-PAE systems end up with less than 4 GB of usable RAM.
Enabling PAE will get round this.
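To put rough numbers on it, here is a small Python sketch of the arithmetic. The 0.6 GB device reservation is only an illustrative assumption; the real figure depends on your graphics card and chipset.

```python
GIB = 1024 ** 3

def usable_ram_without_pae(installed_bytes, mmio_reserved_bytes, address_bits=32):
    """RAM reachable when RAM and memory-mapped devices share one address space."""
    address_space = 2 ** address_bits          # 4 GiB for 32-bit addresses
    room_for_ram = address_space - mmio_reserved_bytes
    return min(installed_bytes, room_for_ram)

installed = 4 * GIB
mmio = int(0.6 * GIB)  # hypothetical size of the device address ranges
usable = usable_ram_without_pae(installed, mmio)
print(f"{usable / GIB:.1f} GiB usable of {installed / GIB:.1f} GiB installed")
```

With less RAM installed than the space left over (say 2 GB), the function simply returns the installed amount, which is why the problem only shows up as you approach 4 GB.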
You have a server with 256M, but you can't use all of that; remember there's some OS overhead. Add to that the fact that you're overcommitting, as other folks have mentioned, and you'll definitely thrash here. 256M is only enough for a small DB, and 20 connections is a lot with what you've got configured.
1) reduce your max connections to 4 (you're using 3 out of 20)
2) optimize your query cache better; 8M is really large, and 64M total is a lot based on your hits/prunes; try a 4/32 combo and see how it goes. Really I think a 2/24 combo would work for you.
3) you have no sorts requiring temp tables, so why is that max_heap_table_size setting in there? Comment it out and use the defaults
4) do you actually have 128 tables? Try cutting that table_cache in half to 64 or 48
5) reduce thread_cache_size to 4
6) optimize those tables to reduce fragmenting
Those are some things to start with. It looks like you threw a bunch of numbers into a config without any actual profiling to know what you needed, and created a mess. If all else fails, go back to the defaults, get rid of your custom settings, and start over using one of the performance tuning guides you can find on Google. Get the output of SHOW VARIABLES and SHOW STATUS, find any one of a bajillion tuning guides, plug your actual, real numbers into their equations, and that'll tell you the exact-ish numbers you need to put in your config file.
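Most of those guides boil down to the same rough worst-case estimate: global buffers plus per-connection buffers multiplied by max_connections. Here is a minimal Python sketch of that equation. The values are placeholders, not your config, and the list of variables is a common simplification rather than an exact accounting; substitute what SHOW VARIABLES reports.

```python
MIB = 1024 ** 2
KIB = 1024

def worst_case_mysql_memory(global_buffers, per_connection_buffers, max_connections):
    """Rough upper bound: every connection allocates all its buffers at once."""
    return (sum(global_buffers.values())
            + max_connections * sum(per_connection_buffers.values()))

# Placeholder values for illustration only; read the real ones from SHOW VARIABLES.
global_buffers = {
    "key_buffer_size": 16 * MIB,
    "query_cache_size": 24 * MIB,
    "innodb_buffer_pool_size": 8 * MIB,
}
per_connection_buffers = {
    "sort_buffer_size": 2 * MIB,
    "read_buffer_size": 128 * KIB,
    "read_rnd_buffer_size": 256 * KIB,
    "join_buffer_size": 128 * KIB,
    "thread_stack": 192 * KIB,
}

for conns in (20, 4):
    total = worst_case_mysql_memory(global_buffers, per_connection_buffers, conns)
    print(f"max_connections={conns}: ~{total / MIB:.0f} MiB worst case")
```

Compare the result against what is actually left of your 256M after the OS takes its share. If the worst case doesn't fit, you're overcommitting.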
Best Answer
Both kernels split the virtual address space into a user portion and a kernel portion. The kernel portion is shared between all processes in the system, and so the kernel is limited to that much directly addressable memory. Each user process in the system has its own user portion of the address space. Classically this split was done in the middle, giving each half 2 GB. Windows can be directed to move the split to 3 GB for user and 1 GB for kernel with the /3GB boot.ini switch. The Linux kernel is rather configurable at build time, and last I checked, the Ubuntu kernels are built with the 3:1 split.
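As a rough illustration of what the split means for addresses, here is a short Python sketch. The function names are made up for this example; the boundaries follow directly from giving the user portion the low 2 GB or 3 GB of a 32-bit address space.

```python
GIB = 1024 ** 3
ADDRESS_SPACE = 2 ** 32  # 4 GiB of virtual addresses on a 32-bit system

def split_boundary(user_gib):
    """First kernel address when user space gets the low user_gib GiB."""
    boundary = user_gib * GIB
    if not 0 < boundary < ADDRESS_SPACE:
        raise ValueError("split must leave room for both user and kernel")
    return boundary

def region(address, user_gib):
    """Say which portion of the address space an address falls in."""
    return "user" if address < split_boundary(user_gib) else "kernel"

for user_gib in (2, 3):
    boundary = split_boundary(user_gib)
    kernel_gib = (ADDRESS_SPACE - boundary) // GIB
    print(f"{user_gib}:{kernel_gib} split -> kernel starts at {boundary:#010x}")
```

The same address can therefore belong to the kernel under a 2:2 split and to user space under a 3:1 split, for example 0x90000000.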
PAE allows 64 GB of physical RAM to be addressed, but any given page table is still limited to 4 GB. Because there is only one kernel portion of that address space, shared between all processes on the system, the kernel is limited to 1 or 2 GB of directly addressable RAM no matter what. Additional physical memory can be used, but only part of it can be mapped into the virtual address space at any given time, and the mappings have to be changed when needed. Because each process has a separate user address space, you can have, for example, 5 different processes that each have 2 GB of their own memory mapped to different parts of the 16 GB of physical RAM you have installed, with the kernel using another 2 GB.
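The numbers in that example can be checked with a few lines of Python. This is only a sketch of the arithmetic: PAE extends physical addresses to 36 bits while virtual addresses stay at 32 bits, and the helper name is invented for illustration.

```python
GIB = 1024 ** 3

PAE_PHYSICAL_BITS = 36   # physical address width with PAE
VIRTUAL_BITS = 32        # virtual address width is unchanged by PAE

max_physical = 2 ** PAE_PHYSICAL_BITS   # 64 GiB
max_virtual = 2 ** VIRTUAL_BITS         # 4 GiB per address space

def fits_in_ram(process_sizes, kernel_bytes, installed_bytes, user_limit):
    """True if every process fits its user portion and the total fits in RAM."""
    if any(size > user_limit for size in process_sizes):
        return False  # no single process can exceed its user address space
    return sum(process_sizes) + kernel_bytes <= installed_bytes

processes = [2 * GIB] * 5  # five processes using 2 GiB each
print(max_physical // GIB, "GiB physical,", max_virtual // GIB, "GiB virtual")
print(fits_in_ram(processes, 2 * GIB, 16 * GIB, user_limit=2 * GIB))
```

The five processes plus the kernel need 12 GB in total, which fits in 16 GB of physical RAM even though no single address space exceeds 4 GB.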
Note that the filesystem cache does not have to keep pages mapped all the time, so it can use plenty more of that physical RAM. The kernel automatically maps bits of it when needed, then unmaps them so it can map other pages. This trickery allows the kernel to use many GB of memory for the cache, and a few hundred MB for other uses, even when the kernel only has 1 GB of virtual address space to play with.
Also worth noting is that in recent versions of Windows, Microsoft has instituted various artificial product licensing limitations. The Windows 7 Pro I am stuck with on my PC at work refuses to use physical RAM addresses above 4 GB even if I enable PAE. As a result it can only use 3.4 GB of the 4.0 GB of RAM installed, since a chunk of the RAM is relocated above the 4 GB mark to leave room below 4 GB for things like the video RAM.