Conditionally show questions based on previous answers in a database driven survey

algorithmsdatabase-designdependencies

I'm trying to design survey-type application that uses user responses to determine what the next question should be. For simplicity's sake, question types can be either combo boxes (they have a set of distinct answers to choose from) or images (user selects an image file from their system). I require this to be database-driven, so questions (and any answers they depend on) need to be stored in the database.

Here is a sample question flow:

Sample question flow

There are several use cases in this example. Initially, the user would be presented with question Q1 (no other questions visible at this time). If the user answers "Yes", both Q2 and Q10 become visible. If Q1 is "No", then Q3 becomes visible instead. This is the simplest case.

Another common scenario is shown later on. Q5, Q6, and Q7 are all shown at the same time. If the user answers those specific questions with "Yes", "Dogs", and "Yes", respectively, then Q10 is shown. In all other cases, Q8 is shown instead.

The final scenario is where two branches both lead to the same question. For example, if the user answers Q8 with "Yes", then Q10 is shown. If the answer is "No", then Q9 is shown instead, and Q10 is only revealed once the user has answered Q9 (with any result).

There can be any combination of the above scenarios. Additionally, as you can see, some questions can be reached from several different branches (Q10, for example, can be accessed on 3-4 different question flows).

The issue I'm having is coming up with a good database design to accommodate these requirements. The approach I've been aiming for to tackle this is to define some kind of dependency lookup table in the database.

Here's the general (simplified) structure I came up with:

----------
|Question|
----------
QuestionID,
QuestionText

---------------
|QuestionGroup|
---------------
GroupID, 
QuestionID

--------------------
|QuestionDependency|
--------------------
ParentGroupID,
ParentGroupValue,
ChildGroupID

The idea here is that each question could belong to one or more "groups", meaning that a single answer could enable several questions at once (by activating just one group). The QuestionDependency tables is used to look up which question groups to activate based on the ParentGroupValue field. Let's look at a concrete example based on the sample question flow from above:

Question Tables for Sample Flow

(In these tables, <ans> means the question has been answered and is not blank – can be any other value.)

So for example, as you can see in the QuestionDependency table, once the answers in G1 (which only has one question) are equal to "Yes", then G2 (which contains Q2 and Q10) is enabled. Otherwise if the answer is "No", G3 is enabled. Similarly for G5, if the answers for questions Q5, Q6, and Q7 are "Yes", "Dogs", and "Yes" respectively (parsing logic will be handled separately, don't worry about that), then questions in G8 are enabled. In all other combinations (assuming each question is actually answered), G6 is enabled instead.

The problem with this approach is that when questions belong to multiple groups (as with Q10), there's no way of knowing from the code which group is currently being accessed. If the user answers Q10, are they answering from G2 or G8? I'm unable to determine that.

My main question is, can you think of a better way to approach this problem? I've been trying to think of a better solution for several days now and this is unfortunately the best I've come up with.

Best Answer

First, three observations.

Already covered by Christophe in the previous answer: the table that controls the flow to another question group needs to have the QuestionID as a foreign key. Otherwise the answers "Yes" or "Dogs" can't be related to a question.

I would change the terminology from "ParentGroup/ChildGroup" to "QuestionGroup/FollowingGroup", because Parent/Child has a narrow, technical meaning that doesn't apply here.

Your Question Dependency table doesn't cater for conditions that involve multiple questions - for example "If Q1 answer is Yes, and Q2 answer is "Dogs", go to G4". To do this you'd have to normalise this table out into two tables:

Flow
FlowID OriginGroup       NextGroup
F1     G1                G2
F2     G1                G4

FlowDependency
FlowID   QID     Value
F2       Q1      Yes
F2       Q2      Dogs

And two further points:

Further possible answers There are two further possible answers that you may need to somehow encode in the QuestionDependency table: no answer, and any answer. (If you have independent code that globally enforces an "answers to all questions are mandatory" rule, then the former won't apply).

No answer would have to be encoded as a conventional string that will never be used as an actual question answer. Alternatively, it could be encoded as a row in the table with Value=NULL, so that the answer to that question is never even considered by the particular "flow".

Any answer could be encoded as the absence of a row. For example, say there's a question group G1 with questions Q3, Q4, Q5. If the user answers "Hovercraft" to Q3, and "Eels" to Q4, then they should be redirected to group G4 (which shows a Monty Python video about Hungarians speaking English). Q5 is about their favourite breakfast cereal, and is irrelevant to the flow between question groups. This could be encoded as

Group        Question        Value       FollowingGroup
G1           Q3              Hovercraft  G4
G1           Q4              Eels        G4

The code that interprets these rules would match the answers to Q3 and Q4 to the Values in the table, not consider Q5 at all, and since it finds that everything matches, direct the user to G4.

Default flow It would be very difficult to guarantee coverage of all possible answers/combinations of answers: in other words, to guarantee by populating a QuestionDependency table that, whatever answers the user gives, they go somewhere when they press "Next". So I suggest that each question group have a "default next group" specified. If none of the conditions in QuestionDependency are matched, the user goes to the default next group.

Related Topic