Investigating competing claims concerning relative inference difficulty
Ira A. Noveck
Centre de Recherche en Epistemologie Appliquée
Paris, France
Guy Politzer
Centre National de la Recherche Scientifique
Saint Denis, France
To appear in Mental Logic (M. D. S. Braine & D. P. O'Brien,
eds.). NY: Erlbaum.
This research was supported in part by grants from the Fyssen Foundation
(Paris, France), Center for Research in Learning, Perception, and Cognition
at the University of Minnesota, and the National Institute of Child Health
and Human Development (T32 HD-07151) to the first author. The authors express
their appreciation to Luca Bonatti, Martin Braine, Ruth Byrne, Francesco
Cara, Vittorio Girotto, David O'Brien, Dan Sperber and Charles Tijus for
helpful comments at various stages of this project. Requests for reprints
should be sent to Ira Noveck, CREA-Ecole Polytechnique, 1 rue DESCARTES,
75005 Paris, France.
ABSTRACT
INTRODUCTION
Given that the competing mental logic and mental model accounts of propositional
reasoning have led to a prolific exchange of arguments and counter-arguments,
as well as analyses and re-analyses of foundational work, one would assume
that a neutral observer would have enough data to determine which theory
better describes such reasoning. This is not the case, however. The debate's
empirical findings vary sufficiently to run the risk of leaving the neutral
observer nonplussed as to which position is more supportable. For example,
consider the contradictory reports concerning responses to the disjunctive
premise set presented in (1) below:
(1)
p or q
not q
The correct conclusion is, of course, p. Whereas Johnson-Laird, Byrne & Schaeken (1992) report relatively low rates (30%) of correct responses to (1), the mental logic approach typically reports high rates (often above 90%), as is to be expected for a mental logic schema (Braine et al., 1995; Lea, O'Brien, Fisch, Noveck, and Braine, 1990; Braine, Reiser & Rumain, 1984; see also Braine & Rumain, 1981). How does one reconcile such wildly divergent findings concerning a fundamental inference? This chapter endeavors to address the contradictory findings not only to maintain the integrity of the empirical literature, but also to help resolve part of a wider debate between the mental logic and mental model theories.
We begin by comparing empirical methods because gaps in findings may well be attributable to materials and procedure. Consider the or-elimination premise set in (1) above. The mental model study employs lengthy propositions and exclusive disjunctions (e.g., John is in London or Mary is in Caracas, but not both) and places the second premise on a separate page (Mary is not in Caracas), thus imposing a memory load -- the first, relatively complex premise must be remembered when the second premise is encountered. In contrast, the mental logic experiments typically employ less complex propositions concerning letters on an imaginary blackboard, use unspecified disjunctions, and present the two relevant premises simultaneously on the same page (e.g., [On the blackboard] There is an L or a C; There is not a C), i.e., no memory load is imposed.¹ Given these differences, it should not be surprising to find that subjects are less likely to draw the conclusion John is in London than There is an L. Once the differences in materials and procedure become apparent, the results from the two kinds of problems become easier to explain. However, it is not clear which factor accounts for most of the discrepancy and the question remains as to which of the two theories does a better job of describing propositional reasoning once variations are accounted for.
Among various points of disagreement (e.g., algorithms and representation), there is one feature of the two theories that is readily resolvable -- relative inference difficulty. The mental logic approach (Braine, Reiser and Rumain, 1984) classifies a set of highly feasible inference forms as core or ancillary, investigates their relative difficulty, and -- by exclusion -- underlines which valid forms do not merit mental logic status (e.g., modus tollens [as seen in 3 below]). The mental models theory predicts difficulty based on the presumed number of constructions an inference requires; the more models, the more difficult the inference. These two approaches lead to diverging predictions. For example, according to the mental logic theory, premise set (2) below triggers a core schema whereas the premise set that prompts modus tollens (3) does not; thus (2) should be easier to deduce than (3). As discussed below, the mental models account makes a competing prediction: (3) should not appear more difficult than (2).
| (2) | (3) |
| not both p and q | If p then q |
| p | not-q |
(The correct conclusion to (2) is not-q and to (3) it is not-p.) These two premise sets include the same number of negative elements and will serve as a source of comparison later.
In the remainder of the paper, we briefly review the inference-difficulty
claims and findings from the mental models approach. This is followed by
a discussion concerning how claims from mental models differ from those
of mental logic and how one can practically test between them. Finally,
we report a study in which we directly compare predictions from the two
theories in one overarching procedure.
Inference Difficulty and its Relevance to Mental Logic
Johnson-Laird et al. (1992, p. 428) predict that inferences drawn from
not both propositions will be more difficult than those drawn from or
propositions and that inferences drawn from or propositions will be more
difficult than those based on if propositions; this is because these
connectives require, at least initially, three mental models, two mental
models, and one (explicit) mental model, respectively. In an investigation
of this prediction, Johnson-Laird et al. (1992, Experiment 1) present subjects
with four premise sets and record subjects' spontaneous conclusions. Subjects
are presented two premises consecutively (concerning people in cities),
each premise on a separate page, and are required to write down the conclusion.
Two premise sets employ the disjunction or and two the conditional if.
The two or forms are expressed exclusively as p or q but not both; not-q
(conclusion: p) and p or q but not both; p (conclusion: not-q). The two
forms concerning if are if p then q; not-q, which prompts modus tollens
as seen in (3) above, and the premise set that triggers modus ponens,
if p then q; p, as shown in (4) below²:
| (4) |
| If Linda is in Amsterdam then Cathy is in Majorca. |
| Linda is in Amsterdam. |
The correct response, of course, is Cathy is in Majorca. Johnson-Laird et al. find that subjects' rates of correct conclusions to the four premise sets ([1] p or q but not both;not-q, [2] p or q but not both;p, [3] if p then q;not-q, and [4] if p then q;p) are in line with the ordinal prediction. That is, subjects give correct responses to these four premise sets on 30%, 48%, 64% and 91% of the trials, respectively. According to Johnson-Laird et al. (1992), two factors are considered responsible for these findings. One is that negative categorical premises (in premise sets [1] and [3] above) are expected to yield lower rates than those with affirmative categorical premises because they require an inconsistency to be detected.³ The other, more theoretically relevant, factor is the number of mental models required ab initio: If propositions are said to require one explicit and one implicit mental model initially and or propositions two explicit models.
Whereas mental models theory uses the number of models as a predictor of problem difficulty, mental logic takes a quite different approach. Broadly, assuming subjects interpret the premises as intended, the easiest problems are those that can be solved in one step by a core mental logic schema. The next easiest are those that require more than one step, but can be solved using the direct reasoning routine (see Chapter X for the core schemas and the direct reasoning routine). Problems that cannot be solved by the direct reasoning routine constitute a third (heterogeneous) level of difficulty. (Finer gradations in difficulty can be predicted -- see Chapter X,Y(?) -- but the levels just enunciated suffice for our purposes.)
Three of the core schemas are relevant here. One is modus ponens,
which solves problem (4) above in a simple step:
| (5) |
| if p then q. |
| p |
| q |
Note that the schema involving or, as in (1) above, uses an unspecified
or only. That means that when a problem presents an explicitly
exclusive-or, mental logic predicts that more than one inferential
step will be needed. For example, consider problem (6):
| (6) |
| There is an L or an R, but not both. |
| There is not an R. |
Mental logic predicts two steps. The reasoner first applies and-elimination
to the first premise of (6) in order to decouple it (this premise being
a pragmatic variation of There is an L or an R and there is not both
an L and an R). One can then apply the second premise in (6) to There
is an L or an R in order to carry out the inference as represented
in (1) above. Similarly, consider problem (7):
| (7) |
| There is an X or a V, but not both. |
| There is an X. |
Again, mental logic predicts that the reasoner first applies and-elimination, this time extracting There is not both an X and a V. Then, in combination with the second premise in (7), the schema represented in (2) applies to infer that There is not a V.
Thus, mental logic predicts that problem (6) with exclusive-or
should be slightly more difficult than the corresponding problem with
unspecified-or. Similarly, problem (7) should be slightly more difficult
than a corresponding problem that presents the not-both premise directly.
However, any difference in difficulty should be small and possibly hard to
detect statistically because the extra step involves and-elimination, a
very easy schema (see Chapter X).
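The two-step derivations for problems (6) and (7) can be sketched in code. The following Python fragment is only an illustration and not a formal statement of the mental logic theory: the tuple encoding of propositions and the function names (and_elimination, or_elimination, not_both, solve_exclusive) are assumptions introduced here, and the step count of two simply mirrors the description above.

```python
# Illustrative sketch only: propositions are encoded as nested tuples,
# e.g. ("or", "L", "R") or ("not", "R"). The encoding is an assumption.

def and_elimination(premise):
    """From ("and", A, B) derive A and B separately."""
    op, left, right = premise
    if op != "and":
        raise ValueError("and-elimination needs a conjunction")
    return left, right

def or_elimination(disjunction, negation):
    """From ("or", A, B) and the denial of one disjunct, derive the other."""
    _, a, b = disjunction
    denied = negation[1]
    if denied == b:
        return a
    if denied == a:
        return b
    return None  # schema does not apply

def not_both(negated_conjunction, asserted):
    """From ("not", ("and", A, B)) and one conjunct, derive the denial of the other."""
    _, (_, a, b) = negated_conjunction
    if asserted == a:
        return ("not", b)
    if asserted == b:
        return ("not", a)
    return None  # schema does not apply

def solve_exclusive(first, second, minor):
    """Return (conclusion, steps) for 'first or second, but not both' plus a minor premise."""
    major = ("and", ("or", first, second), ("not", ("and", first, second)))
    disjunction, negated_conjunction = and_elimination(major)  # step 1
    if isinstance(minor, tuple):  # negative minor premise, as in problem (6)
        return or_elimination(disjunction, minor), 2           # step 2
    return not_both(negated_conjunction, minor), 2             # step 2, as in problem (7)

problem_6 = solve_exclusive("L", "R", ("not", "R"))
problem_7 = solve_exclusive("X", "V", "X")
print(problem_6)  # ('L', 2)
print(problem_7)  # (('not', 'V'), 2)
```

In both cases the first step only decouples the exclusive premise; the second step is the same schema that would solve the corresponding unspecified-or or not-both problem in a single step.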
Constructing a Level Playing Field
In order to make comparative assessments of the two theories, it is important to use the same procedures and materials. Here are the ground rules we adopted. The first is that we imposed the more challenging procedure used by Johnson-Laird et al. (1992) of presenting premises separately in order to a) avoid ceiling effects and to b) allow replication of the mental models findings from Johnson-Laird et al. (1992). The second is that we investigate the effect of two variables on inference-difficulty (as measured by subjects' rates of correct responses), type of content (people-in-cities vs. letters) and manner of presenting the disjunction (exclusive-or vs. unspecified-or). Although theoretical predictions from the two theories do not change across content, it is worthwhile to know whether these factors are responsible for the inconsistencies observed across the two kinds of experiments, especially with respect to p or q;not-q. The third is that we include six inference forms -- the four that Johnson-Laird et al. (1992) investigated plus two that follow by having the negated disjunction (not both p and q) as a major premise with either p or not-q as a minor premise -- whose predicted relative difficulty differs as a function of the given theory.
Premises with not both p and q are added for two main reasons. First, the mental models position claims (Johnson-Laird et al., 1992, p. 428) that the premise set not both p and q;p should prompt even fewer correct conclusions than those involving disjunctions which, according to Johnson-Laird et al.'s data, would mean rates of correct responses that are much lower than those reported by Braine et al. (1984). Second, the inclusion of not both p and q;p premise sets makes possible a critical test that compares predictions of the two theories without involving the manipulated disjunctions. As described earlier, the mental logic position argues that reasoners ought to provide correct conclusions to the premise set not both p and q;p more readily than to the premise set that prompts modus tollens (if p then q;not-q) because the former is in the mental logic repertoire and the latter is not. The mental model position leads to the following competing prediction. Not both p and q propositions initially require three models and if propositions initially require one explicit model along with one implicit model. The implicit model for if can be fleshed out either into one other explicit model (biconditional representation) or into two explicit models (conditional representation) in order to perform a modus tollens inference. Thus, if can eventually be the source of two or three models. By adding one model to represent the minor premise and one more for producing the conclusion, one arrives at the following number of models for each kind of inference: five models for drawing the inference from not both p and q;p and either four or five models for modus tollens, which shows that in no case can if p then q;not-q be more difficult than not both p and q;p. If anything, modus tollens should appear easier because it would be expected that a number of subjects would construct the biconditional interpretation and would build only two models for their initial representation of if.
The mental model prediction is therefore the following: rates of correct performance on if p then q;not-q should be equivalent to, or higher than, performance on not both p and q; p.
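The arithmetic behind this prediction can be laid out in a few lines. The sketch below is ours and merely restates the counts given in the text; the dictionary and function names are hypothetical, and the counts themselves are those attributed above to the mental models account.

```python
# Sketch of the model-counting arithmetic described in the text.
# Names are hypothetical; the counts are those stated above.
INITIAL_MODELS = {
    "not both p and q": [3],
    # 'if' fleshed out as a biconditional (2 models) or a conditional (3 models)
    "if p then q": [2, 3],
}

EXTRA_MODELS = 2  # one for the minor premise, one for producing the conclusion

def total_models(major_premise):
    """Return the possible total model counts for an inference."""
    return [n + EXTRA_MODELS for n in INITIAL_MODELS[major_premise]]

not_both_totals = total_models("not both p and q")
modus_tollens_totals = total_models("if p then q")

# Modus tollens never needs more models than 'not both p and q; p'.
never_harder = max(modus_tollens_totals) <= min(not_both_totals)
print(not_both_totals, modus_tollens_totals, never_harder)  # [5] [4, 5] True
```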
In the Experiment described below, premises with people-in-cities content are expected to prompt fewer correct conclusions than those presented with letters content. When reasoning with people-in-cities content, subjects must store four objects (two names and two places) in memory; when reasoning with arbitrary letters, any given proposition has two constituents. We also investigate how rates of correct performance will be affected when the disjunction is presented exclusively or in an unspecified manner.
Experiment
Method
Subjects. One hundred and twenty-four undergraduate students from the University of Minnesota (in Minneapolis) participated. Subjects received either $4.00 or credit towards requirements for the Introductory Psychology course at the University of Minnesota.
Materials. Four problem sets were prepared. Each problem set included eighteen pairs of premises (a major propositional premise and a categorical premise), the result of preparing three instantiations of each of the following six premise sets: 1) if p then q;p, 2) if p then q;not-q, 3) p or q; p (or p or q but not both;p), 4) p or q;not-q (or p or q but not both; not-q), 5) not both p and q;p and 6) not both p and q;not-q. The correct answer to the last premise set is nothing follows.
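Which of these premise sets determine a conclusion can be checked by brute-force enumeration of truth values. The Python sketch below is illustrative only: it is a truth-table check, not a model of how subjects reason, the names MAJORS, MINORS and conclusion are ours, and reading the unspecified or inclusively is an assumption made for the purpose of the check.

```python
from itertools import product

# Major premises as truth functions of (p, q).
# Assumption: the unspecified 'p or q' is read inclusively here.
MAJORS = {
    "if p then q": lambda p, q: (not p) or q,
    "p or q": lambda p, q: p or q,
    "p or q but not both": lambda p, q: p != q,
    "not both p and q": lambda p, q: not (p and q),
}
MINORS = {
    "p": lambda p, q: p,
    "not-q": lambda p, q: not q,
}

def conclusion(major, minor):
    """Return what the premises determine about the letter the minor premise leaves open."""
    target = "q" if minor == "p" else "p"
    # Keep only the truth assignments consistent with both premises.
    rows = [(p, q) for p, q in product([True, False], repeat=2)
            if MAJORS[major](p, q) and MINORS[minor](p, q)]
    values = {(q if target == "q" else p) for p, q in rows}
    if values == {True}:
        return target
    if values == {False}:
        return "not-" + target
    return "nothing follows"

for major in MAJORS:
    for minor in MINORS:
        print(f"{major}; {minor} -> {conclusion(major, minor)}")
```

Under the inclusive reading, p or q; p joins not both p and q; not-q in yielding nothing follows, which is consistent with the analyses below treating four premise sets as determinable in the Unspecified-or conditions and five in the Exclusive-or conditions.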
Problem sets concerned either people in cities (to be referred to as the People-in-cities problem set) or letters said to be written on an index card (to be referred to as the Letters problem set). For the People-in-cities problem set, common first names and well known cities were chosen. Two constraints were considered when combining the names and cities. The first was that no name-city combination had fewer than three syllables or more than five. The second constraint was that there were equal numbers of female and male names and they were paired so that all gender combinations appeared equally often (when including two practice problems).
Given that names and cities vary in length, the total number of syllables in the people-in-cities condition was carefully balanced. Considering only the names and cities, six combinations had a total of seven syllables, six combinations had a total of eight syllables (name-city syllable subtotals of four and four), and six had a total of nine syllables. These were distributed as evenly as possible among the six premise sets. For the Letters set, only those letters that are pronounceable in one syllable were chosen (which excludes W). No letter was used more than twice. Letters were joined randomly with the constraint that half the problems presented letters in their alphabetic order (e.g., If there is an H then there is an L) and the other half presented the letters in their inverted alphabetic order (e.g., If there is a U then there is a J). Also, letter-pairs that could be construed to have meaning were avoided. The remaining two problem sets were identical to the two previously described, except that exclusive disjunctions were replaced with unspecified disjunctions. See Appendix for a complete listing of the materials in the Experiment.
Design. This is a 6 (Premise sets) X 2 (Kinds of Content: People-in-cities vs. Letters) X 2 (Expressions of disjunction: exclusive or vs. unspecified or) design with the first being a within subjects factor. The experimental groups are called Letters Exclusive-or, Letters Unspecified-or, People-in-Cities Exclusive-or, and People-in-Cities Unspecified-or. It follows that the non-disjunctive premise sets (1. if p then q;p, 2. if p then q;not-q, 5. not both p and q;p and 6. not both p and q;not-q) were identical across the two People-in-Cities conditions, as were the non-disjunctive premise sets across the two Letters conditions.
Procedure. Subjects were seated in front of a Macintosh computer that presented the experiment with Frida software (Poitrenaud, 1990). The experimenter conducted the entire procedure by prompting screens. The first screen presented the instructions. In the People-in-cities condition subjects were prompted to imagine that an international company keeps track of its employees on index cards, that the names of several colleagues were written on each index card and that, for each card, the computer will give two pieces of information. The first piece describes an existing relationship between two colleagues and the second informs something specific about one of the colleagues. For the Letters condition, subjects were prompted to imagine game cards that each have several letters written on them. As with the other condition, subjects were told that the computer will present two pieces of information, one concerning a relationship between two letters and a second, more specific, piece of information.
The instructions asked subjects to read each of the two pieces of information out loud (per Johnson-Laird et al., 1992) and to write down whatever conclusion follows. An example was given, followed by a sentence pointing out that they could say that nothing follows when there is insufficient information to draw a conclusion. Before moving on to two practice problems, the experimenter highlighted key aspects of the instructions (that the subject will read two pieces of information out loud and that they have to write down what follows and, if nothing follows, to write that down: nothing follows).
Both practice problems were variations of if p then q;p premises. Modus ponens inferences were used because subjects' performance on these is not under dispute here; the literature shows that subjects typically succeed with these forms and that successful performance is expected without practice. The first practice problem was the same as the one in the instruction screen and was presented to get subjects into the habit of reading the premises out loud. Its propositional premise had a negative consequent so that subjects would see that conclusions with negatives were relevant. The second example was expressed with an only if connective and provided affirmative constituents in both the antecedent and consequent. In the practice problems as well as in the task, the first premise disappeared when the second premise appeared (in order to be consistent with Johnson-Laird et al.'s paper-and-pencil procedure). In the rare event that subjects did not provide the correct conclusion to a practice problem, subjects were prompted to reconsider their response. No subject failed to correctly answer a practice problem when asked to reconsider. Once a subject was ready, the experimenter prompted the computer to present the eighteen premise sets. The computer program was designed to select the eighteen in a random order.
Results and Discussion
The results are divided into two parts. We first investigate inference difficulty, particularly with respect to disjunction presentation and content, in order to determine how subjects' performance varies as a function of materials. This is important to reasoning research generally. We then compare predictions from the mental logic and mental models theories.
Summary of effects. We consider only flawless responses (allowances were made for spelling errors). That is, a response was considered incorrect if 1) the conclusion was evaluated incorrectly (e.g., to respond P instead of not-P) or if 2) it had the wrong name or wrong city in the People-in-Cities condition, or the wrong letter in the Letters condition. Each subject received a score ranging from 0 to 3 for each kind of premise set and these scores were converted to percentages of correct responses, as seen in Table 1.⁴ Two separate analyses were computed: one for the four determinable premise sets in the two Unspecified-or conditions and the other for the five determinable premise sets in the two Exclusive-or conditions.
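The conversion from per-subject scores to a percentage of correct responses is straightforward; a minimal sketch follows. The function name and the example scores are hypothetical and are given only to show the computation; they are not data from the Experiment.

```python
def percent_correct(scores, trials_per_set=3):
    """Convert per-subject scores (0..trials_per_set) into a group percentage."""
    if not scores:
        raise ValueError("at least one subject score is required")
    if any(s < 0 or s > trials_per_set for s in scores):
        raise ValueError("each score must lie between 0 and trials_per_set")
    return 100.0 * sum(scores) / (trials_per_set * len(scores))

# Hypothetical scores for four subjects on one premise set (not real data).
example_scores = [3, 3, 0, 0]
print(percent_correct(example_scores))  # 50.0
```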
--------------------------------------
Insert Table 1 here
--------------------------------------
To investigate the two Unspecified-or conditions, a 4 (Determinable Premise Sets) X 2 (Contents: People-in-cities vs. Letters) ANOVA was computed with the first being a within subjects measure. The results revealed an effect for Premise Set, F(3,180) = 19.67, p < .001, MSe = .64, and no effect for Content, F(1,60)=2.43, p =.12. There were no interactions (p =.33). Figure 1 summarizes the Premise Set effect: If p then q;p yields the highest rate of correct responses, followed by a second subset of premise sets -- not both p and q;p and p or q;not-q -- which yield comparable rates of correct responses. Finally, representing a third subset, the premise set if p then q;not-q yields a rate of correct responses that is significantly lower than p or q;not-q.
--------------------------------------
Insert Figures 1 & 2 here
--------------------------------------
To investigate the two Exclusive-or conditions, a 5 (Determinable Premise Sets) X 2 (Contents: People-in-cities vs. Letters) ANOVA was computed with the first being a within subjects measure. There is an effect for Premise Set, F(4,240) = 19.89, p < .001, MSe= .61, and a main effect for Content, F(1,60)=19,193, p < .001, MSe = 1.13. There is a marginal, but not significant, Content X Premise Set interaction (p=.07).
Given the lack of a statistically significant interaction, we summarize performance across the two Exclusive-or conditions in Figure 2. For these conditions, if p then q; p again is easiest for subjects, followed by p or q but not both;p and not both p and q;p on a second tier. A third subset comprising p or q but not both;not-q and if p then q;not-q yielded comparable rates of correct responses. Figure 2 shows that when or is presented as an exclusive disjunction, subjects' rate of correct performance to p or q but not both;not-q drops down one level (when compared to p or q;not-q in Figure 1).
Before describing the main effect for content, we investigate how the but not both clause in the disjunctive premise set p or q but not both;not-q affected rates of correct performance. A 2 Factor ANOVA was computed in which Disjunction Type (Exclusive-or vs. Unspecified-or) and Content (Letters vs. People-in-Cities) were between subject variables. The ANOVA revealed one main effect -- for Content, F(1, 120) = 13.25, p < .001, MSe = .63. Although the effect for Disjunction Type had some marginal influence, it was not significant (p=.23); the same holds for the Content X Disjunction Type interaction (p = .31). Rates of correct responses to the premise sets p or q;not-q and p or q but not both;not-q combined (fourth row of Table 1) in the Letters and People-in-cities conditions were 82% and 60%, respectively. Clearly, subjects' rates of correct performance in response to this premise set were more sensitive to type of content than to the presence or absence of the exclusiveness clause.
Space does not permit a thorough discussion of the more general effect for Content but, briefly, subjects responded correctly to 86% of the determinable premise sets with Letters content and to 68% of those with the People-in-Cities content. Another 9% of responses in the People-in-Cities condition would have been correct if errors attributable to memory-limitations were considered acceptable (e.g., if we were to accept John is in Caracas as accurate when the correct response was George is in Caracas). Interestingly, errors in the People-in-cities sets were largely the result of errors on the name. In contrast, only one more response in the Letters condition (less than .3%) would have been accepted following a more generous accounting. When errors in memory are disregarded, the findings in the People-in-cities set more closely resemble those found in the Letters set.
Comparing predictions from the mental logic and mental model theories. We carry out two tests in order to decide between the two theories. First, we make a direct comparison of subjects' responses to the if p then q;not-q (modus tollens) and not both p and q;p premise sets. As discussed in the introduction, the mental logic theory predicts that if p then q;not-q will prompt a lower rate of correct responses than not both p and q;p because the former is not a core mental logic schema and the latter is. Mental model theory makes the competing prediction: if p then q;not-q ought to prompt a rate of correct responses that is either higher than, or equal to, that of not both p and q;p because if p then q;not-q ought to prompt either fewer mental models (four) or as many (five) as not both p and q;p (which prompts five). Table 1 shows that, across the four conditions, subjects consistently had significantly more success in responding correctly to not both p and q;p (80% correct overall) than to if p then q;not-q (59% overall). This finding is significant whether one sums across conditions, t (123) = 5.26, p < .001, or investigates conditions separately (see Table 2). These results favor the mental logic account of propositional reasoning.
Our second test involves a set of finer-grained analyses. We treat each experimental condition separately and check predictions with respect to performance on any two premise sets that prompt a prediction. A successful prediction is one in which a t-test is significant at the .05 level. A confirmed null hypothesis generously serves as the basis for confirming a predicted equivalence of inference difficulty. We focus on the middle part of the table because both theories correctly anticipate that if p then q;p (prompting modus ponens) would yield the highest rate of correct responses (Braine et al., 1984; Johnson-Laird et al., 1992) and that not both p and q;not-q would yield the lowest.
------------------------------
Insert Table 2 here
-------------------------------
The mental logic account predicts that among the remaining premise sets, if p then q;not-q will prompt the lowest rate of correct responses and significantly fewer correct responses than the others. This amounts to two predictions in each of the two Unspecified-or conditions and to three predictions in each of the two Exclusive-Or conditions. Table 2 shows that mental logic successfully predicts eight of ten effects with this analysis and a ninth is marginally confirmed.
Predictions from mental models are somewhat more complicated because the number of anticipated mental models varies on two inferences as a function of a fleshing-out procedure. As described earlier, conclusions from if p then q;not-q may require four or five models. Similarly, the number of total mental models required to infer p from the (Unspecified-or) premise set p or q;not-q also depends on a subject's interpretation of or (Byrne, personal communication). If subjects interpret or as inclusive, it will prompt 3 models initially (1. p q, 2. p not-q, 3. not-p q) and require five to reach solution; if subjects interpret or exclusively, it will prompt 2 models (1. p not-q, 2. not-p q) and require four to reach solution. Below we describe how we attempt to adopt the set of predictions most favorable to the mental models position.
To simplify the discussion, we refer to the premise sets by the row in which they appear in Table 1 (as II, III, IV, and V) and we indicate either that one premise set is predicted to be easier than another (i.e., prompt higher rates of correct responses) by way of a greater-than sign or that two premise schemas are predicted to be equally easy by way of an equal sign. We begin with the two Unspecified-or conditions in order to determine the most favorable assignments for if p then q; not-q (II) and p or q;not-q (IV) before applying the outcome to the entire set of data. Predictions for the two Unspecified-or conditions could be any one of the following (where the subscripts refer to the total number of mental models required to make the inference):
II₄ = IV₄ > V₅, i.e., (if p then q;not-q) requires 4 models and (p or q;not-q) requires 4
II₄ > IV₅ = V₅, i.e., (if p then q;not-q) requires 4 models and (p or q;not-q) requires 5
IV₄ > II₅ = V₅, i.e., (if p then q;not-q) requires 5 models and (p or q;not-q) requires 4
II₅ = IV₅ = V₅, i.e., (if p then q;not-q) requires 5 models and (p or q;not-q) requires 5
Each line yields three predictions per Content condition. The strongest set of predictions arises out of IV₄ > II₅ = V₅ (listed third above) because IV (p or q;not-q) led to significantly higher rates of solution than II (if p then q; not-q) in both Unspecified-or conditions. None of the other three potential assignments leads to more than one confirmed prediction per Content condition and no other assignment leads to a particular pattern of confirmed predictions among the two Content conditions. For this reason, we assume henceforth that the solution of if p then q; not-q requires five models and that p or q; not-q requires four. This provides the three predictions for each of the two Unspecified-or conditions in Table 2.
Given the above analysis, we now can adopt the prediction that
III₄ = IV₄ > II₅ = V₅ for the two Exclusive-or conditions, which leads
to six further predictions in each of two Content conditions. Table 2 shows
that, of the eighteen predictions overall, four are confirmed and that two
are marginally confirmed. This is the most favorable set of predictions when
a consistent number of mental models are assigned to each premise set.
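How the model counts translate into pairwise predictions, and why they total eighteen, can be made explicit with a short sketch. The function name and the dictionaries are ours; the model counts are the ones adopted above, and the rule that fewer models means an easier inference is the mental models claim as described in this chapter.

```python
from itertools import combinations

def pairwise_predictions(model_counts):
    """Fewer models predicts an easier inference; equal counts predict equivalence."""
    predictions = []
    for a, b in combinations(sorted(model_counts), 2):
        if model_counts[a] < model_counts[b]:
            predictions.append(f"{a} > {b}")   # a is predicted to be easier
        elif model_counts[a] > model_counts[b]:
            predictions.append(f"{b} > {a}")   # b is predicted to be easier
        else:
            predictions.append(f"{a} = {b}")   # predicted equivalence
    return predictions

# Total model counts adopted in the text (rows of Table 1).
unspecified_or = {"II": 5, "IV": 4, "V": 5}
exclusive_or = {"II": 5, "III": 4, "IV": 4, "V": 5}

unspecified_predictions = pairwise_predictions(unspecified_or)
exclusive_predictions = pairwise_predictions(exclusive_or)

# Two Content conditions for each kind of disjunction.
total = 2 * len(unspecified_predictions) + 2 * len(exclusive_predictions)
print(unspecified_predictions)  # ['IV > II', 'II = V', 'IV > V']
print(len(exclusive_predictions), total)  # 6 18
```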
General Discussion
The experiment aimed to remove procedural and material inconsistencies across two similar experimental designs in order to more confidently determine which of the two theories, mental logic or mental models, better describes propositional reasoning. As one analysis showed, People-in-Cities content leads to more errors than Letters content, undoubtedly because there is more information to store. Most notably, the more complex content negatively affects rates of correct responses to p or q;not-q (and p or q but not both;not-q). Although the presence of the exclusive clause in p or q but not both;not-q has the effect of making the inference as difficult as modus tollens, the test that compared performance between p or q;not-q and p or q but not both;not-q was not significant. This null finding is a clue that or is treated exclusively, at least in the kind of spontaneous production task employed here (cf. Evans and Newstead, 1980).
The results from this Experiment are generally very supportive of the mental logic approach. That is, premise sets that are hypothesized to prompt mental logic schemas -- if p then q;p, p or q;not-q and not both p and q;p -- yield the highest rates of correct responses in three conditions. Rates of correct responses in the fourth condition, the People-in-Cities Exclusive-Or set (which arguably presents propositions in the least user-friendly manner), indicate that two mental logic schemas are among the premise sets that prompt the highest rates of correct responses. This evidence extends previous findings that support the mental logic approach because the present study is based on a procedure (i.e., to store premises in memory) that is more challenging than any employed previously. Using a similar task, Klauer & Oberauer (1995) recently published findings that more closely follow the order of difficulty predicted by mental logic (they obtained higher rates of correct performance to p or q but not both;not-q).
The present findings favor mental logic over mental models as well. The most convincing piece of evidence favoring the mental logic account is that the premise set if p then q;not-q consistently prompted lower rates of correct responses than the premise set not both p and q;p. Mental model theory would not expect such an outcome according to its construal of modus tollens inference-making. This finding is consistent with those reported elsewhere (George & Politzer, 1995) and it is important because the two compared inferences are similar in many crucial ways: They are both logically valid, they both include a negative among the premises, and they both require that one produce a negative conclusion in order to be correct.
Table 2 captures how the two theories fundamentally differ in their accounts of propositional reasoning. On the one hand, mental logic makes a smaller set of predictions based on a straightforward categorical claim (i.e., whether or not a premise set triggers a mental logic schema). The strength of the mental logic approach is that it largely succeeds in confirming its moderate number of predictions. That these inferences are made routinely and with relatively little difficulty corroborates evidence collected elsewhere (e.g., Braine, O'Brien, Noveck, Samuels, Fisch, Lea and Yang, 1995) and argues in favor of the claims that 1) there exists a basic repertory of mental logic schemas and that 2) it is a necessary foundation for reasoning theories. On the other hand, mental models theory, to its credit perhaps, ventures more predictions based on an elaborate concept of mental models. Its weakness, however, is that it largely fails to confirm those predictions. Moreover, mental models theory compels one to take into account a variety of subtle factors that are not clearly articulated in the theory's presentation in order to derive predictions (e.g., the role of detecting an inconsistency; see Footnote 3). These underdefined factors may or may not be important. Their net result, however, is that they force investigators to choose insecurely among various interpretations of the text. In sum, mental logic is strikingly simple and reliable whereas mental models (despite its initial intuitive appeal) is complex and often indeterminate.
Finally, this work further reveals the value of attempts to standardize procedures in the investigation of reasoning phenomena (see also Noveck and O'Brien, 1996). Prior empirical work on propositional reasoning tasks has included several features that varied across theoretical paradigms, thus blocking confident assessments of the competing accounts. We have levelled the playing field by investigating subjects' performance with single propositional logic inferences and by examining the import of both content and the exclusiveness clause. The outcome of the tests weighs in favor of the mental logic account.
FOOTNOTES
2 To emphasize that we are usually referring to the premise
sets that prompt modus ponens and modus tollens inferences, we will simply
present the premise sets themselves.
3 The role of the "inconsistency to be detected" in mental
models, which is a source of inference difficulty, is mentioned twice in
Johnson-Laird et al. (1992). On page 425, while describing the algorithm,
the inconsistency refers to any contrary value between new and previously
constructed models. It follows that an inconsistency arises when p
appears in the minor premise and is inconsistent with a previously-constructed
model containing not-p (as in p or q, but not both; p). However,
on page 431 -- in the course of describing predictions from their first
Experiment -- "inconsistency" is employed only in reference to a categorical
(minor) premise that has a negative value. In our discussions of "inconsistency,"
we adopt the former interpretation because it follows from Johnson-Laird et al.'s
general description of their algorithm and, if its more general use were
adopted in describing their first experiment, it would not unduly affect
the predictions from their first experiment. It follows then that all the
determinable premise sets here but if p then q;p have an inconsistency
to be detected.
4 The same Experiment, using the two Exclusive-or problem
sets only, was presented in French to University students in Paris. The
French results were comparable to their companion problems in English.
For the sake of brevity, we do not include them here.
REFERENCES
Bonatti, L. (1994). Propositional reasoning by model? Psychological Review, 101, No.4, 725-733.
Braine, M. D. S. (1978). On the relation between the natural logic of reasoning and standard logic. Psychological Review, 85, 1-21.
Braine, M. D. S. (1990). The "natural logic" approach to reasoning. In W.F. Overton (Ed.), Reasoning, necessity and logic: Developmental Perspectives (pp. 133-157). NJ: Erlbaum.
Braine, M. D. S., O'Brien, D. P., Noveck, I. A., Samuels, M., Fisch, S. M., Lea, R. B., and Yang, Y. (1995). Predicting intermediate inferences in propositional reasoning. Journal of Experimental Psychology: General, 124, No. 3, 263-292.
Braine, M. D. S. & O'Brien, D. P. (1991). A theory of if: Lexical entry, reasoning program, and pragmatic principles. Psychological Review, 98, 182-203.
Braine, M. D. S., Reiser, B. J., and Rumain, B. (1984). Some empirical evidence for a theory of natural propositional logic. In G. H. Bower (Ed.), The psychology of learning and motivation: Advances in research and thinking (Vol. 18, pp. 317-371). New York: Academic Press.
Braine, M. D. S. & Rumain, B. (1981). Development of comprehension of "Or": Evidence for a sequence of competencies. Journal of Experimental Child Psychology, 31, 46-70.
Braine, M. D. S. & Rumain, B. (1983). Logical Reasoning. In P. Mussen (Ed.), Handbook of Child Psychology, Vol. 3. New York: Wiley.
Evans, J. St. B. T. & Newstead, S. E. (1980). A study of disjunctive reasoning. Psychological Research, 41, 373-388.
George, C. & Politzer, G. (submitted manuscript). Propositional reasoning as constraint satisfaction.
Johnson-Laird, P. N. (1975). Models of Deduction. In R. J. Falmagne (Ed.), Reasoning: Representation and process (pp. 7-54). N. J.: Lawrence Erlbaum Associates.
Johnson-Laird, P.N. & Byrne, R. M. J. (1991). Deduction. Hillsdale, N. J.: Lawrence Erlbaum Associates.
Johnson-Laird, P. N., Byrne, R. M. J. & Schaeken, W. (1992). Propositional reasoning by model. Psychological Review, 99, 418-439.
Johnson-Laird, P. N., Byrne, R. M. J. & Schaeken, W. (1994). Why models rather than rules give a better account of propositional reasoning: A reply to Bonatti and to O'Brien, Braine, and Yang. Psychological Review, 101, No. 4, 734-739.
Klauer, K.C. & Oberhauer, K. (1995). Testing the mental model theory of propositional reasoning. Quarterly Journal of Experimental Psychology, 48A (3), 671-687.
Lea, R. B., O'Brien, D. P., Fisch, S. M., Noveck, I. A. & Braine, M. D. S. (1990). Predicting propositional logic inferences in text comprehension. Journal of Memory and Language, 29, 361-387.
Noveck, I. A. & O'Brien, D. P. (1996). To what extent do pragmatic reasoning schemas affect performance on Wason's selection task? Quarterly Journal of Experimental Psychology, 49A(2), 463-489.
Noveck, I. A., Lea, R. B., Davidson, G. & O'Brien, D. P. (1990). Human reasoning is both logical and pragmatic. Intellectica, 11, 81-110.
O'Brien, D. P., Braine, M. D. S. & Yang, Y. (1994). Propositional reasoning by mental models? Simple to refute in principle and in practice. Psychological Review, 101, No. 4, 711-724.
Poitrenaud, S. (1990). FRIDA, logiciel de maquettage de Systèmes techniques du Laboratoire de psychologie cognitive du traitement de l'information symbolique. Fifth European Conference on Cognitive Ergonomics, Urbino, Italy.
Rips, L. J. (1983). Cognitive processes in propositional reasoning. Psychological Review, 90, 38-71.
Table 1. Percentage of correct responses to the six premise sets in the four problem sets of the Experiment, n=31 per problem set.
| Premise set | Unspecified-or: Letters | Unspecified-or: People-in-Cities | Exclusive-or: Letters | Exclusive-or: People-in-Cities | Total |
| if p then q;p | 99 | 92 | 100 | 95 | 97 |
| if p then q;not-q | 65 | 55 | 65 | 54 | 59 |
| p or q;p a | -- | -- | 96 | 69 | -- |
| p or q;not-q a | 82 | 67 | 81 | 53 | 70 |
| not both p and q;p | 81 | 83 | 87 | 70 | 80 |
| not both p and q;not-q | 32 | 31 | 42 | 27 | 33 |
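As a reading aid for the premise sets in Table 1, the following minimal sketch enumerates truth assignments to check which conclusion, if any, follows logically from each pair of premises. It is only a truth-table check under standard propositional semantics; it is not a model of either the mental logic or the mental models account, and the names `MAJOR`, `MINOR` and `conclusion` are illustrative assumptions rather than anything taken from the study.

```python
from itertools import product

# Major premises as truth functions of (p, q). Labels are illustrative.
MAJOR = {
    "if p then q": lambda p, q: (not p) or q,
    "p or q (inclusive)": lambda p, q: p or q,
    "p or q but not both": lambda p, q: p != q,
    "not both p and q": lambda p, q: not (p and q),
}

# Minor (categorical) premises used in the Experiment.
MINOR = {
    "p": lambda p, q: p,
    "not-q": lambda p, q: not q,
}


def conclusion(major, minor):
    """Return what follows about the atom not fixed by the minor premise."""
    # Keep only the truth assignments consistent with both premises.
    models = [
        (p, q)
        for p, q in product([True, False], repeat=2)
        if MAJOR[major](p, q) and MINOR[minor](p, q)
    ]
    # The minor premise fixes p (index 0) or q (index 1); inspect the other atom.
    idx, name = (1, "q") if minor == "p" else (0, "p")
    values = {model[idx] for model in models}
    if len(values) != 1:
        return "nothing follows"
    return name if values.pop() else "not-" + name


if __name__ == "__main__":
    for major in MAJOR:
        for minor in MINOR:
            print(f"{major}; {minor}  =>  {conclusion(major, minor)}")
```

On this check, p or q;p yields a determinate conclusion (not-q) only when or is read exclusively, and not both p and q;not-q yields none under either reading of or.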
Table 2. Account of the Mental Logic and Mental Models predictions.
| Theory and prediction no. | Condition | Prediction | Empirical Status |
| Mental Logic 1. | Letters, Uns-or | IV > II | Confirmed |
| 2. | Letters, Uns-or | V > II | Confirmed |
| 3. | People, Uns-or | IV > II | Marginally confirmed |
| 4. | People, Uns-or | V > II | Confirmed |
| 5. | Letters, Exc-or | III > II | Confirmed |
| 6. | Letters, Exc-or | IV > II | Confirmed |
| 7. | Letters, Exc-or | V > II | Confirmed |
| 8. | People, Exc-or | III > II | Confirmed |
| 9. | People, Exc-or | IV > II | Not Confirmed |
| 10. | People, Exc-or | V > II | Confirmed |
| Mental Models 1. | Letters, Uns-or | IV > II | Confirmed |
| 2. | Letters, Uns-or | IV > V | Not confirmed |
| 3. | Letters, Uns-or | II = V | Not confirmed |
| 4. | People, Uns-or | IV > II | Marginally confirmed |
| 5. | People, Uns-or | IV > V | Not confirmed |
| 6. | People, Uns-or | II = V | Not confirmed |
| 7. | Letters, Exc-or | III = IV | Not confirmed |
| 8. | Letters, Exc-or | III > II | Confirmed |
| 9. | Letters, Exc-or | III > V | Marginally confirmed |
| 10. | Letters, Exc-or | IV > II | Confirmed |
| 11. | Letters, Exc-or | IV > V | Not confirmed |
| 12. | Letters, Exc-or | II = V | Not confirmed |
| 13. | People, Exc-or | III = IV | Not confirmed |
| 14. | People, Exc-or | III > II | Confirmed |
| 15. | People, Exc-or | III > V | Not confirmed |
| 16. | People, Exc-or | IV > II | Not confirmed |
| 17. | People, Exc-or | IV > V | Not confirmed |
| 18. | People, Exc-or | II = V | Not confirmed |
Notes. Predictions refer to subjects' relative difficulty in drawing
correct conclusions from the compared premise sets. The Roman Numerals
refer to the premise sets found in the corresponding row in Table 1. For
example, II refers to if p then q; not-q. The symbol = is to be read as
"is as difficult as" and > is to be read as "is easier than". Empirical
status is determined by t-tests with a .05 level of significance on rates
of solution in Table 1.
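The empirical-status column of Table 2 can be tallied per theory with a short script. The sketch below transcribes the statuses in row order using a one-letter coding that is our own shorthand (C = confirmed, M = marginally confirmed, N = not confirmed); the variable and function names are illustrative assumptions, and the tally adds no information beyond the table.

```python
from collections import Counter

# Empirical status of each prediction, transcribed from Table 2 in row order.
# C = confirmed, M = marginally confirmed, N = not confirmed (our shorthand).
MENTAL_LOGIC = "C C M C C C C C N C".split()
MENTAL_MODELS = "C N N M N N N C M C N N N C N N N N".split()


def tally(statuses):
    """Count how many predictions fall under each empirical status."""
    counts = Counter(statuses)
    return {
        "confirmed": counts["C"],
        "marginal": counts["M"],
        "not confirmed": counts["N"],
        "total": len(statuses),
    }


if __name__ == "__main__":
    print("Mental logic: ", tally(MENTAL_LOGIC))
    print("Mental models:", tally(MENTAL_MODELS))
```

By this count, 8 of the 10 mental logic predictions are confirmed outright, against 4 of the 18 mental models predictions.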
Appendix
A list of the English materials with Cities and Letters content presented
in Experiment 2.
| People-in-Cities content | Letters content |
| if p then q; p | |
| 1. If Janet is in Nice then Paul is in Chicago. Janet is in Nice. | 1. If there is a U then there is a J. There is a U. |
| 2. If Peter is in Bombay then Stan is in Liverpool. Peter is in Bombay. | 2. If there is a Z then there is a K. There is a Z. |
| 3. If Daniel is in Tucson then Emily is in Dublin. Daniel is in Tucson. | 3. If there is an H then there is an L. There is an H. |
| if p then q; not-q | |
| 4. If Patricia is in Rome then Robert is in Madrid. Robert is not in Madrid. | 4. If there is an L then there is a K. There is not a K. |
| 5. If Susan is in Budapest then Laurie is in Stockholm. Laurie is not in Stockholm. | 5. If there is a C then there is an M. There is not an M. |
| 6. If Joseph is in Cambridge then Gordon is in Moscow. Gordon is not in Moscow. | 6. If there is a U then there is an A. There is not an A. |
| p or q; p* | |
| 7. Isabelle is in Milan or Phillip is in Nashville. Isabelle is in Milan. | 7. There is an F or a Y. There is an F. |
| 8. Terry is in Duluth or Alan is in London. Terry is in Duluth. | 8. There is an R or a G. There is an R. |
| 9. Sylvie is in Montreal or Anna is in Kiev. Sylvie is in Montreal. | 9. There is a D or an S. There is a D. |
| p or q; not-q* | |
| 10. Agnes is in Bonn or Steven is in Boston. Steven is not in Boston. | 10. There is an M or a B. There is not a B. |
| 11. Andrew is in New York or Sybil is in Helsinki. Sybil is not in Helsinki. | 11. There is a Z or an R. There is not an R. |
| 12. George is in Caracas or Marc is in Naples. Marc is not in Naples. | 12. There is a K or a V. There is not a V. |
| not both p and q; p | |
| 13. We do not have both Marie in Peking and Martin in Venice. Marie is in Peking. | 13. There is not both an O and a C. There is an O. |
| 14. We do not have both Karen in Lansing and Michelle in Bogota. Karen is in Lansing. | 14. There is not both an H and a P. There is an H. |
| 15. We do not have both Jack in Athens and Claire in Toronto. Jack is in Athens. | 15. There is not both a J and an S. There is a J. |
| not both p and q; not-q | |
| 16. We do not have both Sophie in Oslo and Sarah in Prague. Sarah is not in Prague. | 16. There is not both an E and a B. There is not a B. |
| 17. We do not have both Ed in Buffalo and Felix in Brussels. Felix is not in Brussels. | 17. There is not both an N and a D. There is not a D. |
| 18. We do not have both John in Amsterdam and Fawn in Munich. Fawn is not in Munich. | 18. There is not both a G and an L. There is not an L. |
Notes. *The exclusive or propositional premises appeared with
", but not both" appended to them.
Figure 1
Summary of inference difficulty, from least (top) to most (bottom),
based on subjects' rates of correct responses to the four determinable
premise sets in the two Unspecified-or conditions. The p values
refer to the results from within-subjects t-tests.
Figure 2
Summary of inference difficulty, from least (top) to most (bottom),
based on subjects' rates of correct responses to the five determinable
premise sets in the two Exclusive-or conditions. The p values refer
to the results from within-subjects t-tests.