BDD adds a cycle around the TDD cycle.
So you start with a behaviour and let that drive your tests, then let the tests drive the development. Ideally, BDD is driven by some kind of acceptance test, but that's not 100% necessary. As long as you have the expected behaviour defined, you're ok.
So, let's say that you're writing a Login Page.
Start with the happy path:
Given that I am on the login page
When I enter valid details
Then I should be logged into the site
And shown my default page
This Given-When-Then syntax (with And to chain extra clauses) is common in behaviour-driven development. One of its advantages is that it can be read (and, with training, written) by non-developers -- that is, your stakeholders can review the list of behaviours you have defined for successful completion of a task and see whether it matches their expectations long before you release an incomplete product.
There is a scripting language, known as Gherkin, which looks a lot like the above and allows you to hang test code behind the clauses in these behaviours. Look for a Gherkin-based runner for your usual development framework (Cucumber for Ruby, SpecFlow for .NET, behave for Python, and so on); the details are out of the scope of this answer.
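For instance, the happy-path behaviour above could live in a feature file like this (the feature name and step wording are illustrative, not prescribed):

```gherkin
Feature: Login
  Scenario: Logging in with valid details
    Given that I am on the login page
    When I enter valid details
    Then I should be logged into the site
    And shown my default page
```

Each clause then gets bound to a step definition in your runner of choice, which is where the actual test code lives.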
Anyway, back to the behaviour. Your current application doesn't do this yet (if it does then why is someone requesting a change?), so you're failing this test, whether you're using a test runner or simply testing manually.
So now it's time to switch to the TDD cycle to provide that functionality.
Whether you're doing BDD or not, your tests should be named according to a common convention. One of the most common is the "should" style you described.
Write a test: ShouldAcceptValidDetails. Go through the Red-Green-Refactor cycle until you're happy with it. Do you now pass the behaviour test? If not, write another test: ShouldRedirectToUserDefaultPage. Red-Green-Refactor until you're happy. Wash, rinse, repeat until you fulfil the criteria set out in the behaviour.
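As a sketch of what those "should" tests might look like in Python -- the `LoginService` and `FakeAuthenticator` names, and the pages they return, are assumptions for illustration, not a prescribed design:

```python
# Sketch of the Red-Green-Refactor step for the happy path.
# FakeAuthenticator is a test double; a real system would hit a credential store.

class FakeAuthenticator:
    """Test double standing in for a real credential store (assumption)."""
    def __init__(self, users):
        self._users = users  # username -> password

    def is_valid(self, username, password):
        return self._users.get(username) == password


class LoginService:
    """Just enough production code to make the current tests pass."""
    def __init__(self, authenticator):
        self._authenticator = authenticator

    def login(self, username, password):
        if self._authenticator.is_valid(username, password):
            return "user_default_page"   # success: redirect to default page
        return "login_page"              # failure: stay on the login page


def should_accept_valid_details():
    service = LoginService(FakeAuthenticator({"alice": "s3cret"}))
    assert service.login("alice", "s3cret") == "user_default_page"


def should_redirect_to_user_default_page():
    service = LoginService(FakeAuthenticator({"alice": "s3cret"}))
    # Same entry point, but this test pins down the redirect explicitly.
    assert service.login("alice", "s3cret") != "login_page"


should_accept_valid_details()
should_redirect_to_user_default_page()
```

Each function is one trip round Red-Green-Refactor: write it, watch it fail, add just enough code to pass, then tidy up.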
And then we move on to the next behaviour.
Given that I am on the login page
When I enter an incorrect password
Then I should be returned to the login page
And shown the error "Incorrect Password"
Note that you shouldn't have pre-empted this behaviour while making the earlier one pass, so this test should fail at this point. Drop back down to your TDD cycle.
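A sketch of the first failing test for this new behaviour, in Python -- the `LoginService` name and its page/error return pair are assumptions for illustration, not a prescribed API:

```python
# Hypothetical LoginService: just enough code to drive out the new behaviour.
class LoginService:
    def __init__(self, valid_users):
        self._users = valid_users  # username -> password

    def login(self, username, password):
        if self._users.get(username) == password:
            return ("user_default_page", None)       # success: redirect, no error
        return ("login_page", "Incorrect Password")  # failure: back to login


def should_return_to_login_page_on_incorrect_password():
    service = LoginService({"alice": "s3cret"})
    page, error = service.login("alice", "wrong-password")
    assert page == "login_page"
    assert error == "Incorrect Password"


should_return_to_login_page_on_incorrect_password()
```

Before you write the error-handling branch, this test goes red; adding the branch takes it green, and you refactor from there.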
And so on until you have your page.
I highly recommend The RSpec Book for learning more about BDD and TDD, even if you're not a Ruby developer.
There are two distinct cases that might fall under the category of "using the UI" that you mentioned, and different situations justify each of them:
1: Specifying something using UI concepts, language or elements. Specific use cases that justify this are important rendering logic (think of Twitter's link display: e.g. if the entire link is less than 100 characters, display it, otherwise shorten it; or positioning elements on the screen depending on how many links an article has) and critical or high-risk UI-specific workflows (navigation, dynamic menus, etc.). "Critical or high-risk" is the crucial qualifier here, because it's very easy to think in terms of the UI, and one of the biggest problems new teams typically have with BDD-style specs is overdoing them from the UI perspective, describing things that aren't that important in the long term.
2: Automating tests through the UI while the spec still talks about underlying functionality. E.g. the spec talks about free delivery, but the underlying test automation uses Selenium to open a browser, load the web site, purchase books, go to checkout, and so on (similar to what I've described in the Three Layers pattern: http://gojko.net/2010/04/13/how-to-implement-ui-testing-without-shooting-yourself-in-the-foot-2/). Specific use cases that justify this are where the system is designed in such a way that the risk is spread across the user interface, and testing below the UI would not provide enough confidence to stakeholders. This is mostly a legacy-system design issue; I've used this approach when retrofitting BDD-style tests to a legacy system before an important change, or when extending a legacy system that keeps a lot of risk or important logic in the user interface layer. If you have to do this on a new system, that's typically a sign that risk isn't localised, and it might point to problems with the system design.
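The layering behind case 2 can be sketched in Python. All names here are illustrative assumptions: `FakeDriver` stands in for a real Selenium `webdriver`, and the point is only the separation of concerns -- the spec layer talks business language, the workflow layer knows user journeys, and only the technical layer touches the browser:

```python
class FakeDriver:
    """Technical layer. A real suite would use selenium.webdriver here;
    this fake just records what the browser would be told to do."""
    def __init__(self):
        self.visited = []
        self.basket = []

    def open(self, url):
        self.visited.append(url)

    def click_add_to_basket(self, title):
        self.basket.append(title)


class ShopWorkflow:
    """Workflow layer: user journeys expressed as steps over the driver."""
    def __init__(self, driver):
        self.driver = driver

    def purchase_books(self, titles):
        self.driver.open("https://shop.example/books")  # illustrative URL
        for title in titles:
            self.driver.click_add_to_basket(title)


def spec_free_delivery_for_five_or_more_books():
    """Spec layer: talks about free delivery, never about browsers.
    The five-book threshold is an assumed business rule for the example."""
    driver = FakeDriver()
    shop = ShopWorkflow(driver)
    shop.purchase_books(["book-%d" % i for i in range(6)])
    qualifies_for_free_delivery = len(driver.basket) >= 5
    assert qualifies_for_free_delivery


spec_free_delivery_for_five_or_more_books()
```

When the UI changes, only the workflow and technical layers need updating; the spec layer keeps reading like the business conversation it came from.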
Outside the scope of BDD, but related to the tools people often use for it: writing tests (not specs in the specification-by-example sense, just automated regression tests) through the user interface also makes sense where the risk is spread across the different layers, so excluding the UI from the test would not provide enough confidence, or when you want to automate a small number of selected face-saving tests that additionally de-risk critical workflows across all components. As an example, a major media company had roughly ten such tests for their primary web site. These tests were automated using Cucumber, because that is what the developers could maintain easily, and they were very flaky: almost any UI change broke them, but because there were only ten of them the maintenance cost wasn't that high. Those tests were described in UI terms and executed through the UI to ensure the widest possible coverage. Developers used them as a quick check before declaring a piece of work done, in addition to the technical unit tests and BDD-style specs.
As you mentioned, generally it makes a lot of sense to avoid coupling the user interface and business logic (not just for spec or testing purposes, but for maintainability as well - the UI tends to be the most volatile part of the system).
Best Answer
If you can derive the scenarios from the description, you're done.
An anti-pattern that I often see in BDD is people feeling the need to talk through, and write down, every scenario in detail.
Some scenarios are so well understood that it's enough to derive them from a brief description. For instance, if I say, "I'd like the login feature this week," you know what that should look like. You know that there are scenarios for the right password, the wrong password, the wrong username. We don't really need to talk through those or capture them in detail.
Similarly, I might say, "Here's the form for user registration. We need to be able to create new users, let them edit their details, and delete themselves, except that deletion shouldn't actually delete, it should just mark them as deleted so they can recover their accounts if they want to."
And you can ask, "Is account recovery part of this feature?"
"They can be two features if you want."
"Okay, so we have scenarios for create, read, update, delete; that should be easy enough. Let's talk about account recovery; that sounds more interesting."
In general, if the description of behavior is enough for the dev team to derive the scenarios, you don't need to talk through them. You can do so if there's any doubt, but you may just want to capture which scenarios you need to remember, if you capture any at all.
If you've never done it before or you're uncertain, talk through the scenarios.
Focus on the areas which are unusual, particularly if there are features you've never done before. These are fantastic places to have conversations and write down any surprising examples which come up. I usually have two questions I ask, based on the BDD template:
If everyone at the table is looking bored, the feature you're talking through is probably well understood. It's often enough to say, "It should work like X, but with Y instead." This is what Dan North calls the Ginger Cake pattern; it's like the recipe for chocolate cake, but with ginger instead of chocolate.
Even if the business stakeholder is able to derive the scenarios himself, it's really important for the dev team to be able to talk to him, pick up and internalise his language. That language then gets carried into the code, enabling them to have better conversations in the future, and helping newcomers to the project understand what's going on. If the devs don't get to speak the language, they won't use it.
If the business stakeholder or analyst really doesn't want to spend the time capturing things in the session, I'd rather the developers wrote the scenarios down in collaboration with the testers, then asked him to review it. This is more likely to uncover misunderstandings than the other way round.
Sometimes BDD doesn't work.
Another possibility is that you find a scenario the business stakeholder is uncertain about. "Oh, I hadn't thought of that! I'm not sure." Rather than trying to nail the business down and force certainty on them, it may be worth abandoning BDD at that point and trying something simple out, to get some feedback and give the business something over which they can iterate. Keep it easy to change, and write the scenarios once there's a better understanding of what's going on.
BDD done well can really help to uncover places of uncertainty. Since every project worth doing has some aspect of it that's new and has never been done before, there is some uncertainty in there, somewhere. If you focus on using the scenarios to help deliberately discover ignorance, you'll learn faster, and learning is usually a large part of the time spent on a project.
Additionally I've found that the more dev teams collaborate in this way, the more the business are prepared to trust them with uncertainty, and the more innovation starts to occur. Innovative companies, by their very nature, have plenty of uncertainty in their projects.
I wrote a blog post a while back on Cynefin, which I find really helps me understand where the conversations will be most effective. If you read it and understand the four domains, here are the rules I use:
Simple and complicated stuff (known) is often well-understood and you don't need to talk through the scenarios in detail.
Highly complex stuff (unknown) is not understood at all. You may discover this by talking through the scenarios. The lack of certainty means that BDD won't work here, so iterate over something easy to change and get fast feedback instead. Any practice which retains your options, like A-B testing, is also great in this space.
BDD works brilliantly in the space in between (knowable) as a mechanism for passing on knowledge, and to uncover the other two spaces. It's not a hammer, and not everything is a nail. In fact, if you can focus the time spent having conversations on anything, it's not about the examples you can find; it's about finding the examples you can't.