Phil,
Automation can just be hard to maintain, but the more you use your automation for deployment, the more you can leverage it for test setup (and vice versa).
Frankly, it's easier to evolve automation code, factoring it and refactoring it into specific, small units of functionality when using a build tool that isn't
just driving statically-compiled, pre-factored units of functionality, as is the case with NAnt and MSBuild. This is one of the reasons that many people who were relatively early users of toole like NAnt have moved off to Rake. The freedom to treat build code as any other code - to cotinually evolve its content and shape - is greater with Rake. You don't end up with the same stasis in automation artifacts as easily and as quickly with Rake, and it's a lot easier to script in Rake than NAnt or MSBuild.
So, some part of your struggle is inherently bound up in the tools. To keep your automation sensible and maintained, you should be wary of obstructions that static build tools like NAnt and MSBuild impose.
I would suggest that you not couple your test environment boot-strapping from assembly load. That's an inside-out coupling that only serves brief convenience. There's nothing wrong (and, likely everything right) with going to the command line and executing the build task that sets up the environment before running tests either from the IDE or from the command line, or from an interactive console, like the C# REPL from the Mono Project, or from IRB.
Test data setup is simply just a pain in the butt sometimes. It has to be done.
You're going to need a library that you can call to create and clean up database state. You can make those calls right from your test code, but I personally tend to avoid doing this because there is more than one good use of test data or sample data control code.
I drive all sample data control from HTTP. I write controllers with actions specifically for controlling sample data and issue GETs against those actions through Selenium. I use these to create and clean up data. I can compose GETs to these actions to create common scenarios of setup data, and I can pass specific values for data as request parameters (or form parameters if needs be).
I keep these controllers in an area that I usually call "test_support".
My automation for deploying the website does not deploy the test_support area or its routes and mapping. As part of my deployment verification automation, I make sure that the test_support code is not in the production app.
I also use the test_support code to automate control over the entire environment - replacing services with fakes, turning off subsystems to simulate failures and failovers, activating or deactivating authentication and access control for functional testing that isn't concerned with these facets, etc.
There's a great secondary value to controlling your web app's sample data or test data from the web: when demoing the app, or when doing exploratory testing, you can create the data scenarios you need just by issuing some gets against known (or guessable) urls in the test_support area. Really making a disciplined effort to stick to restful routes and resource-orientation here will really pay off.
There's a lot more to this functional automation (including test, deployment, demoing, etc) so the better designed these resources are, the better the time you'll have maintaining them over the long hall, and the more opportunities you'll find to leverage them in unforseen but beneficial ways.
For example, writing domain model code over the semantic model of your web pages will help create much more understandable test code and decrease the brittleness. If you do this well, you can use those same models with a variety of different drivers so that you can leverage them in stress tests and load tests as well as functional test as well as using them from the command line as exploratory tools. By the way, this kind of thing is easier to do when you're not bound to driver types as you are when you use a static language. There's a reason why many leading testing thinkers and doers work in Ruby, and why Watir is written in Ruby. Reuse, composition, and expressiveness is much easier to achieve in Ruby than C# test code. But that's another story.
Let's catch up sometime and talk more about the other 90% of this stuff :)
Ideally it should really be QA who end up writing the tests. The problem with using a programmatic solution is the learning curve involved in getting the QA people up to speed with using the tool. Developers can certainly help with this learning curve and help the process by mentoring, but it still takes time and is a drag on development.
The alternative is to use a simple GUI tool which backs a language (and data scripts) and enables QA to build scripts visually, delving into the finer details of the language only when really necessary - development can also get involved here also.
The most successful attempts I've seen have definitely been with the latter, but setting this up is the hard part. Selenium has worked well for simple web applications and simple threads through the application. JMeter also (for scripted web conversations for web services) has worked well... Another option which is that of in house built test harness - a simple tool over the top of a scripting language (Groovy, Python, Ruby) that allows QA to put test data into the application either via a GUI or via data files. The data files can be simple properties files, or in more complex cases structured (something like YAML or even Excel) data files. That way they can build the basic smoke tests to start, and later expand that into various scenario driven tests.
Finally... I think rich client apps are way more difficult to test in this way, but it depends on the nature of the language and the tools available to you...
Best Answer
"took longer and the benefits were not that apparent"
Always the case. Quality appears to cost money up front.
"We still have testers manually going through test scripts"
And folks claim this is cheaper? How or why?
"approach it and what tools you use?"
Automate everything. It's not negotiable. Management pressure to reduce time invested into testing is something I always translate to "you mean reduce quality so we can spend more time doing rework? I can't. Find someone else who likes to do rework. I hate rework because it's always more expensive and complicated than doing it closer to right up front."