How to diagnose frequent “500 Internal Server Error” on Oracle Apex

oracle oracle10g

I have an Oracle 10g XE database running on OEL 5 on an Amazon EC2 instance. On it I run a public website (actually, 2 web sites) written in Oracle Application Express 4.0.1.

Most of the year, the site is used mainly by people just viewing information, and it works fine: no errors, not much activity. Around this time of year (Mar-Apr), when we start taking applications from people for a sports team, I start seeing frequent errors, almost always when submitting a page (e.g. one that creates or updates a record).

I need help to find the cause of the error.

The actual error message is an Apache error "invalid response from upstream server", which is due to the fact that I have Apache running (port 80) in front of Apex – it ProxyPasses the requests on /apex/ to Apex (port 8080). When I access Apex directly via port 8080, I get 500 Internal Server Error instead. The Apache logs show that's what Apache is getting as well. It always takes 3-5 seconds before the error page is returned.

Sometimes (like right now, when I'm trying to reproduce it) it doesn't happen at all. Other times (perhaps when there are several people on it?) it will take 3-4 goes before the update is accepted. When it's happening, it occurs very frequently – i.e. maybe only 1 in 10 requests will succeed on the first try.

I've seen it crop up in a number of different applications, as well as in the Apex development/admin application itself. The problem is not isolated to any set of pages, and I've seen it when a page should be inserting a record, updating a record, calling a procedure, or even just navigating to another page (although that last one is rare).

Nothing gets added to the alert log. I thought it might be a space issue, but all the tablespaces seem to have sufficient free space. I've tried restarting Apache and restarting the database, with no change. I've run out of things to just "try", and I'd like to nail down the cause of the issue once and for all if possible.

Best Answer

I don't yet know if this is the answer, but during the recent outage at Amazon US-East I noticed I was getting "ORA-00018: maximum number of sessions exceeded" when trying to connect to the database.

I've since increased the processes parameter (which also raises the number of sessions allowed). Since the outage, the error hasn't happened again.
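To show why raising processes also raises the session limit, here is a rough sketch of the arithmetic. It assumes the 10g default derivation, sessions = (1.1 * processes) + 5, and that sessions has not been set explicitly. The function name and the sample values are hypothetical, and the rounding for values that are not multiples of 10 is my assumption. The limits actually in effect can be checked in v$resource_limit.

```python
def derived_sessions(processes: int) -> int:
    """Approximate the default SESSIONS value derived from PROCESSES.

    Assumes the 10g rule (1.1 * PROCESSES) + 5. Integer arithmetic is used
    to avoid float rounding; truncating the fraction is an assumption.
    """
    if processes < 1:
        raise ValueError("processes must be a positive integer")
    return (11 * processes) // 10 + 5


# Hypothetical before/after values for the processes parameter.
for processes in (40, 100):
    print(processes, "processes ->", derived_sessions(processes), "sessions")
```

If the derived limit is only a little above the number of connections seen at peak, a burst of page submits could plausibly exhaust it.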

NOTE: Next time we get a flurry of activity I'll reduce this parameter and see if the problem recurs. Then I'll have a better idea whether this is the solution.
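To compare the errors against the flurry of activity, one option is to count 5xx responses per minute in the Apache access log. This is only a sketch: it assumes the common/combined log format, and the sample lines, addresses and the errors_per_minute name are made up for illustration, not taken from my server.

```python
import re
from collections import Counter

# Captures the timestamp truncated to the minute, and the HTTP status code,
# from a line in Apache common/combined log format.
LOG_LINE = re.compile(
    r'\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}):\d{2} [^\]]+\] "[^"]*" (\d{3}) '
)


def errors_per_minute(lines):
    """Return a Counter mapping 'dd/Mon/yyyy:hh:mm' to the number of 5xx responses."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and match.group(2).startswith("5"):
            counts[match.group(1)] += 1
    return counts


# Hypothetical sample lines; in practice, pass an open access log file instead.
sample = [
    '203.0.113.5 - - [12/Apr/2011:19:02:11 +0000] "POST /apex/wwv_flow.accept HTTP/1.1" 502 412',
    '203.0.113.6 - - [12/Apr/2011:19:02:40 +0000] "POST /apex/wwv_flow.accept HTTP/1.1" 502 412',
    '203.0.113.5 - - [12/Apr/2011:19:03:02 +0000] "GET /apex/f?p=100:1 HTTP/1.1" 200 5120',
]

for minute, count in sorted(errors_per_minute(sample).items()):
    print(minute, count)
```

Minutes with many 5xx responses can then be lined up against the session counts the database reports for the same period.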
