Putting the Trolley in Order: Experimental Philosophy and the Loop Case

Forthcoming in Philosophical Psychology

S. Matthew Liao (NYU), Alex Wiegmann (Göttingen), Joshua Alexander (Siena), and Gerard Vong (Fordham)

June 5, 2011

Abstract

In recent years, a number of philosophers have been conducting empirical studies that survey people’s intuitions about various subject matters in philosophy. Some have found that intuitions vary according to seemingly irrelevant facts: facts about who is considering the hypothetical case, the presence or absence of certain kinds of content, or the context in which the hypothetical case is being considered. Our research applies this experimental philosophical methodology to Judith Jarvis Thomson’s famous Loop Case, which she used to call into question the validity of the intuitively plausible Doctrine of Double Effect. We found that intuitions about the Loop Case vary according to the context in which the case is considered. We contend that this undermines the supposed evidential status of intuitions about the Loop Case. We conclude by considering the implications of our findings for philosophers who rely on the Loop Case to make philosophical points and for philosophers who use intuitions in general.

I. Introduction and Background

In recent years, a number of philosophers have conducted empirical studies that survey people’s intuitions about various subject matters in philosophy. Some have found that intuitions vary according to seemingly irrelevant facts such as the social and cultural background of the subjects (see, e.g., Weinberg et al. 2001 and Machery et al. 2004), the affective content of the relevant case descriptions (see, e.g., Nichols & Knobe 2007), or the order of the presented cases (see, e.g., Swain et al. 2008). Our research applies this method of experimental philosophy to Judith Jarvis Thomson’s famous Loop case, which she used to call into question the intuitively plausible Doctrine of Double Effect (hereafter DDE) (Thomson, 1985). DDE draws a distinction between intending harm and merely foreseeing harm. According to DDE, we are morally prohibited from intending harm, even when that harm would bring about a greater good; however, we are morally permitted to intend to employ neutral or good means to promote a greater good, even though we foresee the same harmful side effects, if (a) the good is proportionate to the harm, and (b) there is no better way to achieve this good.[1]

DDE can be used to explain the normative difference in permissibility between the following two trolley cases:

Standard: A runaway trolley is headed toward five innocent people who are on the track and who will be killed unless something is done. Abigail can push a button, which will redirect the trolley onto a second track, saving the five people. However, on this second track is an innocent bystander, who will be killed if the trolley is turned onto this track.

Push: A runaway trolley is headed toward five innocent people who are on the track and who will be killed unless something is done. Abigail can push a button, which will activate a moveable platform that will move an innocent bystander in front of the trolley. The runaway trolley would be stopped by hitting the innocent bystander, thereby saving the five but killing the innocent bystander.

DDE is supposed to neatly capture our intuitive judgments that it is permissible to redirect the trolley in Standard but impermissible to move the innocent bystander into the trolley’s path in Push. According to DDE, because we merely foresee the death of the innocent bystander in Standard but do not intend him to be hit as a means to saving the five other innocent people, it is permissible for us to redirect the trolley. But in Push, because we intend the innocent bystander be hit by the trolley as a means to stopping it from hitting the five other innocent people, it is not permissible for us to push him into the trolley’s path.

While intuitions about Standard and Push have been used to support DDE, Judith Jarvis Thomson (1985) famously challenged DDE on the basis of intuitions about a different trolley case:

Loop: A runaway trolley is headed toward five innocent people who are on the track and who will be killed unless something is done. Abigail can push a button, which will redirect the trolley onto a second track, where there is an innocent bystander. The runaway trolley would be stopped by hitting the innocent bystander, thereby saving the five but killing the innocent bystander. The second track loops back towards the five people. Hence, if it were not the case that the trolley would hit the innocent bystander and grind to a halt, the trolley would go around and kill the five people.[2]

As Thomson explains, hitting the innocent bystander is causally required to stop the trolley and to save the five other innocent people. Also, Thomson argues that redirecting the trolley seems to involve intending to hit the innocent bystander in order to save the five other innocent people. If so, DDE would forbid our redirecting the trolley in Loop. And yet Thomson suggests that intuitively it seems permissible to redirect the trolley in Loop. As Thomson asks, what difference can an extra bit of track make? Since DDE prohibits an action that Thomson suggests that we find intuitively permissible, she argues that we should give up DDE.

Many philosophers agree with Thomson about Loop (see, e.g., Kamm 2007 and Scanlon 2008).[3] Indeed, Frances Kamm’s Doctrine of Triple Effect (DTE) offers an explanation of the apparent permissibility of redirecting the trolley in Loop. DTE distinguishes between doing something because an effect will occur and doing something in order to bring about the effect, whereby doing something because an effect will occur need not imply that one intends the effect to occur. Using DTE, Kamm argues that in Loop, when we redirect the trolley, we act because we believe that the innocent bystander will be hit, but not in order to bring about his being hit. This means, according to Kamm, that we do not intend for the innocent bystander to be hit. Consequently, Kamm believes that DTE accounts for Thomson’s intuition that redirecting the trolley in Loop is permissible.

For our purposes, it is significant that Thomson first presented Standard, then Loop, then Push in her seminal paper. Thomson did not consider whether or not this order of presentation might affect our intuitions about Loop – at the time, she did not have much reason to worry that this might be the case. But there is now evidence suggesting that people’s intuitive responses to some hypothetical cases can vary depending on which cases have been considered beforehand (Petrinovich & O’Neill, 1996; Sinnott-Armstrong, 2008; Swain et al., 2008; Wiegmann et al., 2010).[4] In fact, Petrinovich and O’Neill found that people’s intuitive responses to some trolley cases vary depending on which cases have been considered beforehand. This raises the possibility that people’s intuitive responses to Loop might be distorted by an order effect. Given the importance of Loop for moral philosophy, we set out to run a properly controlled experiment to test whether Loop is indeed vulnerable to such an effect.

Before proceeding, it is useful to make two preliminary remarks. First, those who are acquainted with the literature on the trolley cases will recognize that Standard, Loop, and Push are not exactly the same as Thomson’s original cases. The reason for the difference is to remove certain possible confounding factors that have been noticed since Thomson’s publication. For instance, Push is intended to parallel Thomson’s Fat Man case, according to which you are standing on a footbridge over a trolley track and can push a fat man over the footbridge so that his weight can stop a runaway trolley that is about to hit five innocent people. Some people have thought that what explains why many people judge the action in Fat Man to be morally impermissible is that in this case, but not in the other two cases, the action is “up close and personal,” that is, you are actively pushing the man off the footbridge (see, e.g., Greene 2008). To remove this possible confounding factor, we have made the action at issue uniform in all of our cases, namely, Abigail can push a button to achieve a certain consequence.

Secondly, in the literature on experimental philosophy, there is a debate about whether the responses gathered from surveys such as ours are intuitions. Some philosophers have argued that intuitions must have special (e.g., modally strong or abstract) propositional content, others have argued that intuitions must have the appearance of necessity, and still others have argued that they must be based solely on conceptual competence (see, e.g., Sosa 1998, Bealer 1998, and Ludwig 2007). If these philosophers are right, then there is an issue regarding whether survey responses are in fact genuine philosophical intuitions. We take note of this controversy without wishing to take a stand on it. (For a detailed response to this way of challenging experimental philosophy, see Weinberg & Alexander (forthcoming).) It is worth mentioning that the cases we have presented are just the kinds of cases that many philosophers, including Thomson and Kamm, whose work this paper addresses, have regarded as appropriate for eliciting intuitions and that have fueled much of the philosophical debate about trolleys. Indeed, Kamm (2007) explicitly describes this methodology as one in which one tests and develops “theories or principles by means of intuitive judgments about cases” (p. 14) (our italics). Given this, there is some reason to regard these responses as intuitions. In any case, whether the responses we have elicited are intuitions or not, our findings should still be of significant philosophical interest. So, for the purpose of this paper, we shall not make much of the distinction between “intuitions” and “responses,” and we shall continue to use the word “intuitions”.

II. Empirical Results

Subjects

145 participants were recruited via the Research Subject Volunteer Program^SM (http://rsvp.alkami.org) and redirected to the test page (www.psycho-experimente.de) where the experiment was run as an online questionnaire study. The majority of the subjects were between 18 and 30 years old (58%), and the sample had a female bias (70%). 84% listed English as their primary language, and 31% had some background in philosophy. The study ran from December 2008 to June 2009.

Design

In a between-subjects design, each subject was randomly assigned to one of three conditions (see figure B). In each condition participants had to judge several scenarios consecutively (see figure A) and the last two scenarios served as control scenarios. For the purposes of our experiment, the crucial comparison was between the rating of Loop in Condition 1 and the rating of Loop in Condition 2, i.e., the rating of Loop when preceded by Push vs. the rating of Loop when preceded by Standard.

Figure A: Overview of all scenarios used.

Scenario	Schematic	Description	Claim
Standard		A runaway trolley is headed toward five innocent people who are on the track and who will be killed unless something is done. Abigail can push a button, which will redirect the trolley onto a second track, saving the five people. However, on this second track is an innocent bystander, who will be killed if the trolley is turned onto this track.	It is morally permissible for Abigail to push the button to redirect the trolley onto the second track.
Loop		A runaway trolley is headed toward five innocent people who are on the track and who will be killed unless something is done. Abigail can push a button, which will redirect the trolley onto a second track, where there is an innocent bystander. The runaway trolley would be stopped by hitting the innocent bystander, thereby saving the five but killing the innocent bystander. The second track loops back towards the five people. Hence, if it were not the case that the trolley would hit the innocent bystander and grind to a halt, the trolley would go around and kill the five people.	It is morally permissible for Abigail to push the button to redirect the trolley onto the second track.
Push		A runaway trolley is headed toward five innocent people who are on the track and who will be killed unless something is done. Abigail can push a button, which will activate a moveable platform that will move an innocent bystander in front of the trolley. The runaway trolley would be stopped by hitting the innocent bystander, thereby saving the five but killing the innocent bystander.	It is morally permissible for Abigail to push the button to activate the moveable platform that will move the innocent bystander in front of the trolley.
Clearly Permissible		Same situation as in Standard, but this time there is no person located on the side track.	Same as in Standard
Clearly Impermissible		Same situation as in Push, but this time there are no people located on the track.	Same as in Push

Figure B: Scenarios used in each condition, in order of appearance, with mean ratings for the test scenarios [rating scale ranged from 1 (strongly disagree) to 6 (strongly agree)].

	First Scenario	Second Scenario	Third Scenario	Fourth Scenario	Fifth Scenario
Condition 1	Push 3.23	Loop 3.10	Standard 3.37	Clearly Permissible	Clearly Impermissible
Condition 2	Standard 4.34	Loop 3.82	Push 3.32	Clearly Permissible	Clearly Impermissible
Condition 3	Loop 4.19	Clearly Permissible	Clearly Impermissible

Procedure

After being redirected to the test page, participants began by reading a general description of the study. They were instructed to read the stories carefully and to imagine the situations as best as they could. Participants were also informed about the estimated length of the survey (ten minutes), about the possibility of leaving the survey at any point, and about the fact that their data would be treated anonymously. Furthermore, they were told that they were not allowed to go back and change their answers. To control for this, we recorded for each participant whether the back-button was pushed.

According to the condition they had been assigned, participants were then presented with three (control condition) or five scenarios (test conditions). The scenarios were described using a short piece of text. To supplement the text, the scenarios were also accompanied by a diagram illustrating the situation (see figure A). These scenarios were specifically designed to be clear and straightforward and to contain no extraneous information.[5] After each scenario participants were asked to indicate the degree to which they agreed or disagreed with a corresponding claim (see figure A). Responses were made on a rating scale ranging from 1 (strongly disagree) to 6 (strongly agree). Agreement is read as agreement with the claim that it is permissible to do the action in question, which means that the higher the number the more inclined the participants are to hold that it is permissible to do the action in question. Additionally, participants had the opportunity to provide comments on each scenario.

After responding to all scenarios, participants were asked to provide some demographic information. In addition, we asked participants (a) whether they had attended a course in philosophy and (b) how many philosophical books they had read.

Results

Since the ratings of “3” and “4” were labeled with “mildly disagree” and “mildly agree”, respectively, we interpreted ratings of 3 or below as “disagree” and ratings of 4 or above as “agree”. 12 of the 145 subjects failed to respond to one of the control cases correctly, i.e., they disagreed with the proposed action in Clearly Permissible or agreed with the action in Clearly Impermissible, and were therefore excluded from the analyses.
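
The coding and exclusion rules just described can be sketched in a few lines of Python. This is only an illustrative sketch: the function names and the example ratings are hypothetical and are not the study's data or analysis script.

```python
# Illustrative sketch only: function names and example ratings are hypothetical.

def agrees(rating: int) -> bool:
    """Code a 1-6 rating as agreement (4 or above) or disagreement (3 or below)."""
    if not 1 <= rating <= 6:
        raise ValueError("rating must be between 1 and 6")
    return rating >= 4


def passes_controls(clearly_permissible: int, clearly_impermissible: int) -> bool:
    """Retain a subject only if they agree with the action in Clearly Permissible
    and disagree with the action in Clearly Impermissible."""
    return agrees(clearly_permissible) and not agrees(clearly_impermissible)


# Hypothetical subjects: (rating for Clearly Permissible, rating for Clearly Impermissible)
subjects = [(6, 1), (5, 3), (3, 1), (6, 4)]
retained = [s for s in subjects if passes_controls(*s)]
print(retained)  # [(6, 1), (5, 3)]
```

In this toy example the third subject is excluded for disagreeing with the clearly permissible action and the fourth for agreeing with the clearly impermissible one.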

Figure B displays the mean responses for each scenario. The mean response for Loop when preceded by Push was 3.10 (SE = 0.23) and the mean response for Loop when preceded by Standard was higher, namely, 3.82 (SE = 0.25). Planned comparisons revealed that this effect was significant (F(1, 130) = 4.85; p = 0.029). This confirmed our prediction that intuitive judgments regarding the permissibility of Loop vary significantly depending on whether Loop is preceded by Standard or by Push. Moreover, the majority of participants presented with Loop after Push disagreed with the action (29 out of 52 or 56% answered 3 or below) whereas the majority of participants presented with Loop after Standard agreed with the action (25 out of 38 or 66% answered 4 or above). A chi-squared test revealed that this difference was significant (χ²(1) = 4.1; p < 0.05). This suggests that order seems to influence not only levels of agreement/disagreement but also whether people agree or disagree in the first place.
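
The chi-squared statistic can be recomputed from the counts reported above (29 of 52 disagreeing with the action in Loop after Push, 25 of 38 agreeing in Loop after Standard). The following is a minimal sketch using only the Python standard library. The text does not say whether a continuity correction was applied; the sketch assumes an uncorrected Pearson test, and the function name is hypothetical.

```python
import math


def chi_squared_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-squared test (no continuity correction; an assumption here)
    for the 2x2 table [[a, b], [c, d]]. Returns (statistic, p-value), df = 1."""
    n = a + b + c + d
    statistic = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, the chi-squared upper tail equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Counts from the text; the complementary cell is group size minus the reported count.
loop_after_push = (29, 52 - 29)      # (disagree, agree)
loop_after_standard = (38 - 25, 25)  # (disagree, agree)

stat, p = chi_squared_2x2(*loop_after_push, *loop_after_standard)
print(round(stat, 2), p < 0.05)  # 4.1 True
```

Under that assumption the recomputed statistic agrees with the reported value of 4.1 on one degree of freedom.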

III. Implications

Further research is needed before firm conclusions can be drawn, but our study suggests that intuitions about Loop are sensitive to the context in which the case is considered, in particular, when the intuitions are elicited in close temporal proximity to certain related hypothetical cases. Since evidence is only trustworthy to the extent to which its sensitivity is limited to things that are relevant to the truth or falsity of the claims for which it is supposed to provide evidence, the evidentiary status of Loop intuitions is significantly challenged by our findings. Either a convincing case must be made that context is relevant to the moral permissibility of redirecting the trolley in Loop or Loop intuitions cannot legitimately play the evidentiary role that they were supposed to play in Thomson’s argument against DDE. While we do not have any specific reasons to think that a convincing case cannot be made for the relevance of context, we also admit to having trouble seeing what such a case might look like.

In particular, if context is playing a role in shaping people’s Loop intuitions, it would seem to be doing so in one of two ways. On the one hand, people might be making comparative judgments, specifically comparing the moral permissibility of actions in one case against the moral permissibility of similar actions in cases considered in close temporal proximity. On the other hand, what other cases are considered in close temporal proximity might influence what features of a specific case are made salient to the reader and taken into consideration when thinking about the case. In either case, the challenge is not to explain why our intuitive judgments track such things, but why we should think that ethical truth tracks such things. Without that kind of explanation, whatever other reasons we might have for wanting to reject DDE, it would be problematic to base the case against DDE on Loop intuitions. Such a situation would also call into question the philosophical justification of theories, like Kamm’s DTE, that were constructed to accommodate Loop intuitions. Any theory proposed specifically to accommodate some set of evidence would have to be reevaluated in light of identified problems with that evidence.[6]

Our study also adds to a growing body of empirical work that challenges the method of using intuitions in philosophy. The nature of this challenge, and the general methodological conclusions that should be drawn from the empirical work, are difficult to state precisely, and here we do not want to take too strong a stand on this issue. (For more detailed discussions of this issue, see Alexander & Weinberg (2007), Weinberg (2007), Liao (2008), Horvath (2010), and Alexander (forthcoming).) On the one hand, the most radical version of this challenge calls for a global elimination of the use of intuitions in philosophy. This skeptical position is too strong, being warranted by neither the empirical evidence nor philosophical arguments. Merely showing that some limited number of intuitions vary according to seemingly irrelevant facts does not show that all intuitions are epistemically unreliable in this way. On the other hand, the most conservative version of this challenge calls for rather localized methodological restrictions, eliminating problematically sensitive intuitions from practice while leaving our intuition-deploying practices otherwise unchanged. This position is too conservative. We do not yet know all of the features that make the use of intuitions evidentially unreliable, nor whether there is some unifying explanation for this unreliability. As such, the most conservative approach fails to appreciate the risks involved in not knowing how widespread these kinds of problematic intuitional sensitivity might be. This may be especially risky in moral philosophy, as moral principles like DDE can be and are applied to issues in applied ethics such as abortion and military bombing. In such cases, purported justification of particular moral principles through unreliable intuitions can lead to bad moral consequences.

The right position on methodological reform seems to us to fall somewhere in between, combining local methodological restrictions with a global shift in how we use intuitions as evidence in philosophy. Specifically, we should be extra cautious about relying on intuitions that have been shown to be sensitive to the context in which a particular case is considered or to other seemingly irrelevant factors, and, in general, we should be on the lookout for such potentially unreliable intuitions. To do so, we need more empirical and philosophical work to determine which intuitions can legitimately be treated as evidence, so that philosophy’s intuition-deploying practice can go forward in an epistemically responsible manner.

Before concluding, it is worthwhile considering an increasingly popular response to empirical work of this kind, according to which, since the philosophical practice of appealing to intuitions is properly construed as the practice of appealing to philosophers’ intuitions, studies conducted on the intuitions of untutored folk provide no evidence for, or against, that practice (see, for example, Hales 2006, Ludwig 2007, Williamson 2007, and Horvath 2010). (For a detailed evaluation of this kind of defense, see Weinberg et al. (2010).) The success of this kind of response depends on its being the case that expert intuitions are in some sense more epistemically virtuous than folk intuitions. Interestingly, this might not be the case, at least when it comes to order effects. A number of recent psychological studies on expertise and expert performance suggest that expertise alone does not necessarily inoculate experts against order effects: Olympic gymnastics judges, professional auditors, and Master-level chess players have all been found to make judgments that are susceptible to order effects (see, for example, Damisch et al. 2006, Brown 2009, and Lewandowsky & Thomas 2009). Of even more relevance to our work, Eric Schwitzgebel and Fiery Cushman (manuscript) have recently surveyed trained philosophers with cases on the doctrine of double effect similar to our own and found that even trained philosophers are significantly influenced by order effects, in fact, to a similar extent as non-philosophers. This suggests that even if philosophers’ intuitions were more epistemically virtuous in other cases, they may not be more so in this case, and whatever traction this kind of response may have against other empirical work, it may not have the same traction against our study.

Acknowledgements

Each author contributed equally to this project. We would like to thank Michael Otsuka, Regina Rini, and audiences at the Metro Experimental Research Group Metaethics Workshop at NYU, the 36th Annual Meeting of the Society for Philosophy and Psychology, and Duke University’s Moral Psychology Group for their helpful comments on earlier versions of this paper. Thanks are also due to Jonas Kahle for setting up the online experiment. Alex Wiegmann was supported by a grant from the Deutsche Forschungsgemeinschaft (DFG WA 621/21-1), and the Courant Research Centre ‘Evolution of Social Behaviour’, University of Göttingen (funded by the German Initiative of Excellence).

References

Alexander, J. (forthcoming). Experimental Philosophy: An Introduction, Cambridge: Polity Press.

Alexander, J, & Weinberg, J.M. (2007). Analytic Epistemology and Experimental Philosophy. Philosophy Compass, 2(1), 56-80.

Aquinas, T. (1988) Summa Theologica II-II, Q. 64, art. 7, “Of Killing”. In William P. Baumgarth and Richard J. Regan, S.J. (eds.), On Law, Morality, and Politics. Indianapolis/Cambridge: Hackett Publishing Co.

Brown, C. (2009). Order Effects and the Audit Materiality Revision Choice. The Journal of Applied Business Research, 25(1), 21-36.

Cullen, S. (2010). Survey-driven romanticism. Review of Philosophy and Psychology, 1(2), 275-296.

Damisch, L., Mussweiler, T., & Plessner, H. (2006). Olympic medals as fruits of comparison? Assimilation and contrast in sequential performance judgments. Journal of Experimental Psychology: Applied, 12, 166-178.

Hales, S. D. (2006). Relativism and the foundations of philosophy, Cambridge, MA: MIT Press.

Horvath, J. (2010). How (Not) to React to Experimental Philosophy. Philosophical Psychology, 23(4), 447-480.

Kamm, F. (2007). Intricate Ethics: Rights, Responsibilities, and Permissible Harm. New York: Oxford University Press.

Kauppinen, A. (2007). The Rise and Fall of Experimental Philosophy. Philosophical Explorations 10: 95-118.

Lewandowsky, S., & Thomas, J. L. (2009). Expertise: Acquisition, Limitations, and Control. In F. T. Durso (Ed.), Reviews of Human Factors and Ergonomics (Vol. 5, pp. 140-165). Santa Monica: Human Factors and Ergonomics Society.

Liao, S. M. (2008). A Defense of Intuitions. Philosophical Studies 140: 247-262.

Liao, S. M. (2009). The Loop Case and Kamm’s Doctrine of Triple Effect. Philosophical Studies 146: 223-231.

Ludwig, K. (2007). The epistemology of thought experiments: First person versus third person approaches. Midwest Studies in Philosophy, 31, 128-159.

Machery, E., Mallon, R., Nichols, S., and Stich, S. (2004). Semantics, Cross-cultural style. Cognition 92: B1-B12.

McMahan, J. (2009). Intention, Permissibility, Terrorism, and War. Philosophical Perspectives 23: 345-372.

Nahmias, E., Morris, S., Nadelhoffer, T., and Turner, J. (2006). Is Incompatibilism Intuitive? Philosophy and Phenomenological Research 73: 28-53.

Nichols, S., and Knobe, J. (2007). Moral Responsibility and Determinism: The Cognitive Science of Folk Intuitions. Nous 41: 663-685.

Otsuka, M. (2008). Double-Effect, Triple-Effect and the Trolley Problem. Utilitas 20: 92-110.

Petrinovich, L., and O'Neill, P., (1996). Influence of wording and framing effects on moral intuitions. Ethology and Sociobiology 17: 145-171.

Quinn, W. (1993). Morality and Action. Cambridge: Cambridge University Press.

Scanlon, T. (2008). Moral Dimensions: Permissibility, Meaning, Blame. Cambridge: Harvard University Press.

Sinnott-Armstrong, W. (2008). Framing Moral Intuitions. In W. Sinnott-Armstrong (Ed.), Moral Psychology, Volume 2: The Cognitive Science of Morality (pp. 47-76). Cambridge, MA: MIT Press.

Swain, S., Alexander, J. and Weinberg, J. (2008). The Instability of Philosophical Intuitions: Running Hot and Cold on Truetemp. Philosophy and Phenomenological Research 76: 138-155.

Thomson, J. (1985). The Trolley Problem. The Yale Law Journal 94: 1395-1415.

Thomson, J. (2008). Turning the Trolley. Philosophy and Public Affairs 36: 359-374.

Weinberg, J.M., & Alexander, J. (forthcoming). The Challenge of Sticking with Intuitions Through Thick and Thin. In A. Booth and D. Rowbottom (Eds.), Intuitions. Oxford: Oxford University Press.

Weinberg, J.M., Gonnerman, C., Buckner, C., & Alexander, J. (2010). Are Philosophers Expert Intuiters? Philosophical Psychology, 23(3), 331-355.

Weinberg, J. M., Nichols, S., & Stich, S. (2001). Normativity and epistemic intuitions. Philosophical Topics, 29, 429-460.

Wiegmann, A., Okan, Y., Nagel, J., & Mangold, S. (2010). Order Effects in Moral Judgment. In S. Ohlsson & R. Catrambone (Eds.), Proceedings of the Thirty-Second Annual Conference of the Cognitive Science Society (pp. 2111-2116). Austin, TX: Cognitive Science Society.

Williamson, T. (2007). The Philosophy of Philosophy. Oxford: Blackwell.

[1] There is no standard formulation of DDE, although many trace its origins to a passage from Thomas Aquinas (1988). Here, we follow Frances Kamm’s (2007) formulation of DDE. For a different formulation, see Quinn (1993). For the purposes of this paper, not much turns on which formulation is adopted.

[2] It might be worthwhile stipulating that if the five were not present, the trolley would not go around and hit the one, but would carry on harmlessly down the track.

[3] Of course, others have disagreed. See, e.g., Liao (2009), McMahan (2009) and Otsuka (2008). In fact, Thomson (2008) seems to have recently moved away from this position herself.

[4] In addition to being inspired by these studies, our study was also inspired by an informal order-effect study conducted by Otsuka (2008) that suggested that Loop case intuitions, in particular, are susceptible to order effects.

[5] Simon Cullen (2010) has argued that experiments that utilize complex or ambiguous vignettes often result in subjects’ misunderstanding the vignettes because they draw upon formal features of the experiment in order to determine the implicit meaning of the questions asked. Our study was designed to avoid such misunderstandings through clear text and diagrams. Furthermore, as will be described in the results section, we excluded subjects who clearly did not understand the cases from our subsequent statistical analyses.

[6] It might be tempting to suggest that the solution is simply to evaluate hypothetical cases in isolation. It is not clear that this would be possible, as Loop is a philosophers' example that was introduced, and has been continually discussed, in the context of other trolley cases such as the ones in this study. But even if it were possible, it is not clear why we should think that isolating cases would help. Evidence that we have one set of intuitions in one context and another set of intuitions in another context gives us no more reason to prefer decontextualized intuitions than it does to prefer the intuitions generated in one context over those generated in another. In fact, evidence that context influences intuitions about certain hypothetical cases gives us as much reason to worry about those cases as it does to worry about context. Ideally, cases that are most suitable for philosophers should be ones that can elicit clear and stable intuitions across contexts. In general, instruments that are sensitive to the contexts in which they are used aren't likely to become any more reliable when used in isolation, even when contextual sensitivity is unwanted. Either the instrument is working properly, in which case isolation may decrease performance, or it is broken, in which case isolation can't ensure improved performance.