AI Capabilities – Can a Neural Network Provide More Than ‘Yes’ or ‘No’ Answers?

artificial intelligence, machine learning

Every example neural network for image recognition I've read about produces a simple "yes" or "no" answer. One exit node corresponds to "Yes, this is a human face," and one corresponds to "No, this is not a human face."

I understand that this is likely for simplicity of explanation, but I'm wondering how such a neural network could be programmed to give a more specific output. For example, let's say I was classifying animals. Instead of it saying "Animal" or "Not an animal", I would want responses like "Dog", "Fish", "Bird", "Snake", etc., with one final exit node being "Not an animal/I don't recognize this".

I'm sure this must be possible, but I'm having trouble understanding how. It seems that, due to the backpropagation-of-error training algorithm, as you train up one exit node (i.e., "This is a dog") and the weights of the neurons are changed, the ideal state for another exit node that you previously trained (i.e., "This is a bird") will begin to deviate, and vice versa. So training the network to recognize one category would sabotage any training done for another category, thus limiting us to a simple "Yes" or "No" design.

Does this make such a recognizer impossible? Or am I misunderstanding the algorithm? The only two things I can think of are that:

  • Either we could train one neural network for each thing we want classified and somehow use those to construct a greater, super-network (so for example, a network for "dog", a network for "bird", etc., which we somehow add together to create the super-network for "animals"); or,

  • Or we could create some kind of ridiculously complicated training methodology which would require incredibly advanced mathematics and would somehow produce an ideal neuron-weight state for all possible outputs (in other words, insert math magic here).

(Side note 1: I am specifically looking at multilayer perceptrons as a kind of neural network.)

(Side note 2: For the first bulleted "possible solution", having each specific neural network and iterating through them until we receive a "Yes" response is not good enough. I know this could be done fairly easily, but that is simple functional programming rather than machine learning. I want to know if it's possible to have one neural network to feed the information to and receive the appropriate response.)

Best Answer

To answer just your title, yes. Neural nets can give non-boolean answers. For example, neural nets have been used to predict stock market values, which is a numeric answer and thus more than just yes/no. Neural nets are also used in handwriting recognition, in which the output can be one of a whole range of characters - the whole alphabet, the numbers, and punctuation.
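
A rough sketch of the general idea, assuming a plain NumPy multilayer perceptron with one output node per class; the layer sizes, class names and random toy data below are only placeholders to make the example self-contained:

```python
# Sketch: an MLP with one output node per class ("dog", "bird", "unknown"),
# trained with backpropagation. The toy data is random noise purely for
# illustration; nothing here is meant to be learnable.
import numpy as np

rng = np.random.default_rng(0)

X = rng.normal(size=(300, 4))            # 300 samples, 4 input features
y = rng.integers(0, 3, size=300)         # integer class labels 0..2
Y = np.eye(3)[y]                         # one-hot targets, one column per output node

W1 = rng.normal(scale=0.5, size=(4, 16)); b1 = np.zeros(16)
W2 = rng.normal(scale=0.5, size=(16, 3)); b2 = np.zeros(3)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

lr = 0.1
for epoch in range(200):
    # Forward pass: hidden layer (tanh), then softmax over all output nodes.
    H = np.tanh(X @ W1 + b1)
    P = softmax(H @ W2 + b2)

    # Backward pass: the error signal involves every output node at once,
    # so all classes are trained jointly rather than one "yes/no" at a time.
    dZ2 = (P - Y) / len(X)
    dW2 = H.T @ dZ2
    db2 = dZ2.sum(axis=0)
    dH = (dZ2 @ W2.T) * (1 - H**2)
    dW1 = X.T @ dH
    db1 = dH.sum(axis=0)

    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1

# Prediction: the most strongly activated output node names the class.
classes = ["dog", "bird", "unknown"]
print(classes[int(np.argmax(P[0]))])
```

The point of the one-hot target is that every training example pushes all of the output nodes at the same time (the correct one up, the others down), rather than training each class in isolation.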

To focus more on your example - recognising animals - I'd say it's possible. It's mostly an extension of the handwriting recognition example; you're recognising features of a shape and comparing them to "ideal" shapes to see which matches. The issues are technical, rather than theoretical. Handwriting, when run through recognition software, is usually mapped down to a set of lines and curves - nice and simple. Animal faces are harder to recognise, so you'd need image processing logic to extract features like eyes, nose, mouth, rough skull outline etc. Still, you only asked if it's possible, not how, so the answer is yes.

Your best bet is probably to take a look at things like Adaptive Resonance Theory. The general principle is that the sensory input (in this case, metrics on the relative size, shape and spacing of the various facial features) is compared to a "prototype" or template which defines that class of thing. If the difference between the sensory input and the remembered template is below a certain threshold (as defined by a "vigilance parameter"), then the object being observed is assumed to be a member of the group represented by the template; if no match can be found then the system declares it to be a previously unseen type. The nice thing about this sort of net is that when it recognises that an object is, say, a horse, it can learn more about recognising horses so that it can tell the difference between, say, a standing horse and a sleeping horse, but when it sees something new, it can start learning about the new thing until it can say "I don't know what this is, but I know it's the same thing as this other thing I saw previously".
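
A sketch of that matching idea, simplified well beyond real ART; the similarity measure, vigilance value and learning rate below are just placeholders:

```python
# Compare an input against stored prototypes. If the best match clears the
# vigilance threshold, refine that prototype (learn more about the class);
# otherwise commit a brand-new category for the previously unseen input.
import numpy as np

def categorise(x, prototypes, vigilance=0.85, learning_rate=0.2):
    """Return the index of the matched (or newly created) prototype."""
    x = np.asarray(x, dtype=float)
    best_idx, best_match = None, -1.0
    for i, p in enumerate(prototypes):
        # Cosine similarity stands in for ART's match function here.
        match = float(np.dot(x, p) / (np.linalg.norm(x) * np.linalg.norm(p) + 1e-12))
        if match > best_match:
            best_idx, best_match = i, match

    if best_idx is not None and best_match >= vigilance:
        # Resonance: close enough to the remembered template, so nudge the
        # template toward the input.
        prototypes[best_idx] += learning_rate * (x - prototypes[best_idx])
        return best_idx

    # No template is close enough: commit a new category for this novel input
    # ("I don't know what this is, but I'll recognise it next time").
    prototypes.append(x.copy())
    return len(prototypes) - 1
```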

EDIT:

(In the interest of full disclosure: I'm still researching this myself for a project, so my knowledge is still incomplete and possibly a little off in places.)

"How does this tie in with backpropagation setting weights for one output node ruining the weights for another, previously trained node?"

From what I've read so far, the ART paradigm is slightly different; it's split into two sections - one that learns the inputs, and one that learns the outputs for them. This means that when it comes across an input set that doesn't match, an uncommitted neuron is activated and adjusted to match the input, so that that neuron will trigger a match next time. The neurons in this layer are only for recognition. Once this layer finds a match, the inputs are handed to the layer beneath, which is the one that calculates the response. For your situation, this layer would likely be very simple. The system I'm looking at is learning to drive. This is actually two types of learning; one is learning to drive in a variety of situations, and the other is learning to recognise the situation. For example, you have to learn how to drive on a slippery road, but you also have to learn to feel when the road you're driving on is slippery.
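
A sketch of that split, again heavily simplified: a recognition step that either matches a committed prototype or commits a new one, and a separate mapping from the recognised category to a response. The vigilance value, similarity test and responses below are all illustrative:

```python
# Recognition layer: one stored pattern per committed neuron.
# Output side: the learned response for each recognised category.
import numpy as np

prototypes = []
responses = []

def respond(x, response_if_new="I don't know what this is yet", vigilance=0.85):
    x = np.asarray(x, dtype=float)
    # Recognition: look for a committed prototype that matches closely enough.
    for i, p in enumerate(prototypes):
        sim = np.dot(x, p) / (np.linalg.norm(x) * np.linalg.norm(p) + 1e-12)
        if sim >= vigilance:
            return responses[i]          # hand the match to the output side
    # No committed neuron matches: commit a new one for this novel input.
    prototypes.append(x)
    responses.append(response_if_new)
    return responses[-1]

print(respond([1.0, 0.1, 0.0], response_if_new="slippery road: ease off"))
print(respond([0.95, 0.12, 0.02]))   # recognised as the same situation
print(respond([0.0, 1.0, 0.0]))      # novel situation, new category committed
```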

The problem of learning new inputs without ruining previously learned behaviours is known as the stability/plasticity dilemma. A net needs to be stable enough to keep learned behaviour, but plastic enough that it can be taught new things when circumstances change. This is exactly what ART nets are intended to solve.
