Make Thinking Visible Again: How to Teach in an Age of Instant Answers

By Natalia Gkagkosi

Natalia Gkagkosi writes for The Economy Research, focusing on Economics and Sustainable Development. Her background in these fields informs her analysis of economic policies and their impact on sustainable growth. Her work highlights the critical connections between policy decisions and long-term sustainability.

AI doesn’t make students “dumber”; low-rigor, answer-only tasks do
Redesign assessments for visible thinking—cold starts, source triads, error analysis, brief oral defenses
Legalize guided AI use, keep phones out of instruction, and run quick A/B pilots to prove impact

The most alarming number in today's classrooms is 49. Across OECD countries in PISA 2022, students who spent up to an hour a day on digital devices for leisure scored 49 points higher in math than those who were on screens five to seven hours a day. This gap approaches half a standard deviation, even after accounting for background differences. At the same time, one in four U.S. teens, or 26%, now uses ChatGPT for schoolwork. This is double the percentage from 2023. In simple terms, students are losing focus as finding answers has become effortless. When a system makes it easier to get a decent sentence than to think deeply, we can expect students to rely on it. The problem is not that kids will interact with an intelligent machine; it's that schoolwork often demands so little from them when such a machine is around.

The Wrong Debate

We often debate whether AI will make kids "dumber." This question overlooks the real issue. When a task can be completed by simply pasting a prompt into a chatbot, it is no longer a thinking task. AI then becomes a tool for cognitive offloading—taking over recall, organization, and even judgment. In controlled studies, people often rely too much on AI suggestions even when contradictory information is available; the AI's advice overshadows essential cues. That pattern is not a defect in children's brains; it reflects environments that reward speed and surface-level accuracy over reasoning, evidence, and revision. To address it, we need to focus less on blocking tools and more on rebuilding classroom practice so that visible thinking, reasoning that is laid out, open to questioning, and assessable, becomes the easiest option.

We also need to acknowledge the changes in the information landscape. When Google displays an AI summary, users click on far fewer links; many stop searching entirely after reading the summary. This "one-box answer" habit transfers to schoolwork. If a single synthetic paragraph seems trustworthy, why bother searching for sources, comparing claims, or testing counterexamples? Students are not lazy; they respond rationally to incentives. If the first neat paragraph gets full credit, the demand for messy drafts and thoughtful revisions goes away—exactly where most learning happens. We cannot lecture students out of this pattern while keeping assessments that encourage it. We must change the nature of the work itself.

Figure 1: Usage doubled in two years—if tasks reward quick answers, more students will choose AI shortcuts.

The phone debate serves as a cautionary tale. Bans can reduce distractions and improve classroom focus. The Netherlands has reported better attention after a national classroom ban, and England has issued guidance for stricter policies. However, the evidence on what bans do for grades and mental health is mixed: reviews show no consistent academic gains from blanket bans. Put simply, reducing noise helps, but it does not by itself deepen learning. A school can ban phones and still assign tasks that a chatbot can complete in 30 seconds. If we focus only on control, we may temporarily address symptoms while leaving the main learning problem, ask-and-paste assignments, untouched.

Build Cognitive Friction

The goal is not to make school anti-technology; it's to make it pro-thinking. Start by creating cognitive friction—small, deliberate hurdles that make getting unearned answers challenging and earned reasoning rewarding. One method is implementing "cold-start time": the first ten minutes of a task should involve handwritten or whiteboard notes capturing a plan, a claim, and two tests that could disprove it. AI can support brainstorming later, but students must first present their foundation. During a pilot of this approach in math and history departments last year (about 180 students across two grade bands), teachers noted fewer indistinguishable responses and richer class discussions. Note: since there was no control group, these results should be seen as suggestive; we tracked argument quality based on rubrics, not final grades. Available research supports this change: meta-analyses indicate positive effects of guided AI on performance while cautioning against unguided reliance. The takeaway is to design effectively, not deny access.

Figure 2: When an AI summary appears, users click sources roughly half as often—another reason assignments must require visible evidence checks.

Next, design prompts that are hard to outsource and natural for a prepared student. Ask for an answer plus its rationale: which three counterclaims the student considered and the evidence used to dismiss them. Require source triads (two independent sources that do not cite each other, plus one dataset) and ask students to reconcile any differences among them. In science, weight error analysis more heavily than the final answer; in writing, assess the revision memo that explains what changed and why. In math, have students verbally defend a solution for two minutes against a randomly chosen "what if" scenario (changing a parameter, inverting an assumption). These strategies make thinking visible and turn simple answer-chasing into a losing tactic. They also align with the field's direction: the OECD's "Learning in the Digital World" pilot for PISA 2025 emphasizes computational thinking and self-regulated learning, skills that will endure as automation spreads.
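
Where departments collect these requirements through a form or learning platform, the checklist can be made explicit. The sketch below (Python, with hypothetical field names not tied to any particular product) shows one way a "visible thinking" submission record and its completeness check might look; it is an illustration of the idea, not a prescribed implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Source:
    title: str
    cites: list[str] = field(default_factory=list)  # titles of works this source cites

@dataclass
class Submission:
    claim: str
    counterclaims: list[str]       # counterclaims considered and dismissed, with evidence
    sources: list[Source]          # the two independent written sources of the triad
    dataset: str                   # the third leg of the triad
    reconciliation_note: str       # how disagreements among the three were resolved

def completeness_flags(sub: Submission) -> list[str]:
    """List reasons a submission falls short of the visible-thinking checklist."""
    flags = []
    if len(sub.counterclaims) < 3:
        flags.append("fewer than three counterclaims addressed")
    if len(sub.sources) < 2:
        flags.append("source triad incomplete")
    elif any(a.title in b.cites for a in sub.sources for b in sub.sources if a is not b):
        flags.append("sources are not independent: one cites the other")
    if not sub.dataset.strip():
        flags.append("no dataset included")
    if not sub.reconciliation_note.strip():
        flags.append("differences among sources are not reconciled")
    return flags
```

The flags are prompts for a conversation with the student rather than automated grades; the point is that the checklist, not the polished final paragraph, is what earns credit.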

Finally, use AI as a counterpoint. Provide every student with the same AI-generated draft and assess their critiques: what is factually correct, what is plausible but unsupported, what is wrong, and what is missing? The psychology here matters. Research shows that people often over-trust AI suggestions. Training students to identify systematic errors—incorrect citations, flawed causal links, hidden assumptions—builds a healthy trust-but-verify habit. Teachers can support this with brief checklists and disclosure logs: if students use AI, they log the prompts and output they used and explain how they verified the claims. Note: this approach maintains academic integrity without punitive surveillance and can be implemented at scale. As districts expand AI training for teachers, the capacity to run these routines is improving quickly.
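
A disclosure log need not be elaborate. As a minimal sketch, assuming a class keeps its entries in a shared sheet exported as CSV (the column name below is an assumption for illustration), a teacher could summarize at a glance how students say they verified AI-provided claims:

```python
import csv
from collections import Counter

# Illustrative columns: student, ai_tool, prompt_summary, claim_used, verification_method
def verification_summary(path: str) -> Counter:
    """Count how students report verifying AI-provided claims (e.g., 'primary source',
    'textbook', 'not verified'), so unverified use is visible at a glance."""
    with open(path, newline="", encoding="utf-8") as f:
        return Counter(row["verification_method"].strip().lower()
                       for row in csv.DictReader(f))

# Example: print(verification_summary("unit3_ai_disclosures.csv"))
```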

Policy Now, Not Panic

At the system level, the principle should be freedom to use, paired with a duty to demonstrate. Schools need clear policies: students may use generative tools as long as they can show how the answer was derived, in a way that is traceable and at least partly produced offline. This requires rubrics that reward planning work, oral defenses, and revision notes, along with usage disclosures that protect privacy while ensuring transparency. UNESCO's 2023 global review is clear: technology can widen access and personalize learning, but it can also harm learning when it substitutes for essential tasks; governance and teaching should take the lead. A policy that allows beneficial uses while resisting harmful ones is more sustainable than outright bans. It also treats students as learners to be developed, not problems to be managed.

Regarding phones, aim for managed quiet rather than panic. Research shows distractions are common and linked to worse outcomes; structured limits during instructional time are justifiable. However, bans should be accompanied by redesigned assessments; otherwise, we may applaud compliance while critical skills lag. The OECD's 2024 briefs are useful here: smartphone bans can reduce interruptions, but whether learning improves depends on enforcement and on teaching quality. Countries are moving: the Netherlands has tightened classroom rules and reports better focus, while England has formalized schools' powers to restrict and confiscate phones when necessary. Districts should adopt what works, clear rules and consistent enforcement, while designing lessons that make phones and chatbots irrelevant to grades by making grades depend on reasoning.

We also need evidence quickly, not years of discussion. School networks can run simple A/B tests in a single term: one group of classes adopts the cognitive-friction routines (cold starts, source triads, oral defenses, AI critiques) while another continues with existing methods; compare rubric-scored reasoning and retention one month later. Note: keep the stakes low, pre-register the metrics, and keep classes intact to limit contamination. Meanwhile, state agencies should fund assessment updates: AI-resistant prompts and scoring guidelines, along with teacher training. The good news is that we are not starting from scratch; controlled studies and meta-analyses have already shown that guided AI can improve performance by speeding up feedback and revision cycles. Our task is to tie those gains to habits of judgment rather than to outsourcing.
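
For networks running such a pilot, the pre-registered comparison can be specified in a few lines. Here is a minimal sketch, assuming class-level mean rubric scores (1-4 argument-quality ratings) are collected for a friction arm and a business-as-usual arm; the numbers are purely illustrative:

```python
from statistics import mean, stdev

def cohens_d(treatment: list[float], control: list[float]) -> float:
    """Standardized mean difference between the two arms, using a pooled SD."""
    n1, n2 = len(treatment), len(control)
    pooled_var = ((n1 - 1) * stdev(treatment) ** 2 +
                  (n2 - 1) * stdev(control) ** 2) / (n1 + n2 - 2)
    return (mean(treatment) - mean(control)) / pooled_var ** 0.5

# Class-level means keep the unit of analysis aligned with the unit of assignment
# (intact classes), which is what limits contamination statistically.
friction_arm = [3.1, 2.8, 3.4, 3.0, 2.9]   # illustrative class means, friction routines
control_arm  = [2.8, 2.9, 3.1, 2.7, 3.0]   # illustrative class means, business as usual

print(f"Effect size (Cohen's d): {cohens_d(friction_arm, control_arm):.2f}")
```

Reporting an effect size on rubric scores, rather than a grade comparison, keeps the stakes low while still giving the network a number it can act on.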

Anticipating potential pushback is essential. Some may argue that any friction is unfair to students who struggle with writing fluency or multilingual learners who need AI support. The key is to distinguish between support and shortcuts. Allow AI for language clarity and brainstorming outlines, but evaluate argument structure—claim, evidence, warrant—through work that cannot be pasted. Others might say oral defenses are logistically demanding. They don't have to be: two-minute "micro-vivas" at random points, once per unit, scored on a four-point scale, reveal most superficial work with minimal time commitment. A third concern is that strict phone rules could affect belonging or safety. In this case, policies should be narrow and considerate: phones should be off during instruction but accessible at lunch and after school, with exceptions made for documented medical needs. The choice is not between total freedom and strict control. It is about creating classrooms that focus on visible thinking rather than just submitted text.

What about the argument that AI makes us think less? The evidence is mixed by design: in higher education, AI tutors and feedback tools often improve performance when integrated into lessons. At the same time, experiments reveal that people can overtrust confident but incorrect AI advice, and reviews highlight risks to critical thinking due to cognitive offloading. Both outcomes can coexist. The pivotal factor is the design of the task. If a task requires judgment across conflicting sources, tests a claim against data, and demands a transparent chain of reasoning, AI becomes an ally rather than a substitute. If the task asks for a neat paragraph that any capable model can produce, outsourcing wins. Our policy should not aim to halt technology. It should increase the cost of unearned answers while decreasing the effort needed for earned ones.

A final note on evidence: PISA correlations do not establish causation, but the direction and magnitude of these associations—especially after adjusting for socioeconomic factors—match what teachers observe: increased leisure screen time at school and peer device use correlate with weaker outcomes and reduced attention. Conversely, structured technology use, including teacher-controlled AI tutors, can be beneficial. The reasonable policy response is to minimize ambient distractions, ensure visible reasoning, and use AI for feedback rather than answers. This framework is now implementable and can be audited by principals, making it understandable to families.

Return to the number 49. It does not say "ban technology." It says that as leisure screen time in school increases, measured learning decreases. It says that attention is limited and fragile. In a world filled with instant answers, we risk short-circuiting the very skills—argument, analysis, revision—that education aims to strengthen. The solution is attainable. Make thinking the only route to a grade: cold starts, counterclaims, source triads, error analysis, and brief oral defenses. Use AI where it accelerates feedback, but require students to demonstrate their reasoning, not just deliver conclusions. Keep phones out during instruction unless they serve the task; write policies that permit beneficial uses and require that they be disclosed. If we do this, the machine stops being a crutch for weak prompts and becomes a tool that clarifies strong thinking. In a decade, let the takeaway be not that AI made students "dumber," but that schools became more thoughtful about what they ask young minds to achieve.


The views expressed in this article are those of the author(s) and do not necessarily reflect the official position of the Swiss Institute of Artificial Intelligence (SIAI) or its affiliates.


References

Campbell, M. et al. (2024). Evidence for and against banning mobile phones in schools: A scoping review. Journal of Children's Services Research.
Department for Education (England). (2024). Mobile phones in schools: Guidance.
Deng, R., Benitez, J., & Sanz-Valle, R. (2024). Does ChatGPT enhance student learning? A systematic review. Computers & Education.
EdWeek (reporting RAND). (2025, Apr. 8). More teachers than ever are trained on AI—are they ready to use it?
Klingbeil, A. et al. (2024). Trust and reliance on AI: An experimental study. Computers in Human Behavior.
OECD. (2024a). Students, digital devices and success. OECD Education and Skills.
OECD. (2024b). Technology use at school and students' learning outcomes. OECD Education Spotlights.
Pew Research Center. (2025, Jan. 15). About a quarter of U.S. teens have used ChatGPT for schoolwork—double the share in 2023.
Pew Research Center. (2025, July 22). Google users are less likely to click on links when an AI summary appears in the results.
Reuters. (2025, July 4). Study finds smartphone bans in Dutch schools improved focus.
UNESCO. (2023). Global Education Monitoring Report 2023: Technology in education—A tool on whose terms?
Wang, J. et al. (2025). The effect of ChatGPT on students' learning performance: A meta-analysis. Humanities and Social Sciences Communications.
Zhai, C. et al. (2024). The effects of over-reliance on AI dialogue systems on student learning: A systematic review. Smart Learning Environments.

The Case for a Contestable “Politician Scan”: Designing AI That Even the Loser Can Trust

By David O'Neill

David O’Neill is a Professor of Finance and Data Analytics at the Gordon School of Business, SIAI. A Swiss-based researcher, his work explores the intersection of quantitative finance, AI, and educational innovation, particularly in designing executive-level curricula for AI-driven investment strategy. In addition to teaching, he manages the operational and financial oversight of SIAI’s education programs in Europe, contributing to the institute’s broader initiatives in hedge fund research and emerging market financial systems.

AI scans simplify elections but risk bias
Clear rules and provenance reduce errors
With oversight, even losers can trust them

The largest election year ever recorded has coincided with the most persuasive media technology in history. In 2024, around half of humanity—about 3.7 billion people—was eligible to vote in more than 70 national elections. At the same time, a 2025 study in Nature Human Behaviour found that a leading large language model was more persuasive than humans in debate-style exchanges 64% of the time, particularly when it tailored arguments using minimal personal data. Simply put, more people are voting while influence becomes cheaper and faster. An “AI-assisted politician scan” that summarizes candidates’ positions, records, and trade-offs seems inevitable because it sharply reduces the time required to understand politics. However, design choices—such as what gets summarized, how sources are weighted, and when the system refuses to respond—quietly become policy. The crucial question is not whether these tools will emerge, but whether we can design them fairly enough that even losing candidates would accept them.

Why an “AI Politician Scan” Is Inevitable—and Contested, But Promising

When voters confront many issues and candidates, any system that makes it easier to find, understand, and compare information will gain traction. Early evidence highlights this point. In a randomized study of California voters during the November 2024 ballot measures, a chatbot based on the official voter guide improved accuracy on detailed policy questions by about 18 percentage points and reduced response time by about 10% compared to a standard digital guide. The presence of the chatbot also encouraged more information seeking; users were 73% more likely to explore additional propositions with its help. These are meaningful improvements because they directly address the obstacles that prevent people from reading long documents in the week leading up to a vote. However, the same study found no clear impact on voter turnout or vote choice, and knowledge gains did not last without continued access—a reminder that convenience alone does not guarantee better outcomes.

The challenge lies in reliability. Independent tests during the 2024 U.S. primaries revealed that general-purpose chatbots answered more than half of basic election-administration questions incorrectly, with around 40% of responses deemed harmful or misleading. Usability research warns that when people search for information using generative tools, they may feel efficient but often overlook essential verification steps—creating an environment where plausible but incorrect answers can flourish. Survey data reflects public caution: by mid-2025, only about one-third of U.S. adults reported ever using a chatbot, and many expressed low trust in election information from these systems. In short, while AI question-answering can increase access, standard models are not yet reliable enough for essential civic information without strict controls.

Figure 1: On in-depth items, accuracy rises by ~18 pp while time to answer falls by ~10%; basic items show little or negative change—gains concentrate where comprehension is hardest.

Fairness adds another layer of concern. A politician scan may intend to be neutral, but how it ranks, summarizes, and refuses information can influence interpretation. We have seen similar cases before; even traditional search-engine ranking adjustments can sway undecided voters by significant margins in lab conditions, and repeated experiments confirm this effect is real. If a scan highlights one candidate’s actual votes while presenting another’s aspirational promises—or if it disproportionately represents media coverage favoring incumbents or “front-runners”—the tool gains agenda-setting power. The solution is not to eliminate summarization; rather, it is to recognize design as a public interest issue where equal exposure, source balance, and contestability are top priorities, rather than simply focusing on user interface aesthetics.

Figure 2: One-off exposure yields no lasting knowledge gains; improvements appear only with week-long use (≈+9–12 on follow-up indices), underscoring the need for sustained, cited access.

Design Choices Become Policy

Three design decisions shape whether a scan serves as a voter aid or functions as an unseen referee: provenance, parity, and prudence. Provenance involves grounding information. Systems that quote and link to authoritative sources—such as legal texts, roll-call votes, audited budgets, and official manifestos—reduce the risk of incorrect information and are now standard practice in risk frameworks. The EU’s AI Act mandates clear labeling of AI-generated content and transparency for chatbots, while Spain has begun enforcing labeling with heavy fines. The Council of Europe’s 2024 AI treaty includes democratic safeguards that apply directly to election technologies. Together, these developments point to a clear minimum: scans should prioritize cited, official sources; display those sources inline; and indicate when the system is uncertain, rather than attempting to fill in gaps.

Parity focuses on balanced information exposure. Summaries should be created using matched templates for each candidate: the same fields in the same order, filled with consistent documentation levels. This means parallel sections for “Voting Record,” “Budgetary Impact,” “Independent Fact-Checks,” and “Conflicts/Controversies,” all based on sources of equal credibility. It also requires enforcing “balance by design” in ranking results. When a user requests a comparison, the scan should show each candidate's stance along with their evidentiary basis side-by-side, rather than listing one under another based on popularity bias. Conceptually, this approach treats the tool like a balanced clinical trial for information, with equal input, equal format, and equal outcome measures. Practically, this strategy reduces subtle amplification effects—similar to how minimizing biased rankings in search reduces preference shifts among undecided users.
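
What parity could look like in the scan's data model is straightforward to sketch. The illustration below assumes a fixed record type per candidate; the field names mirror the sections named above and are not drawn from any deployed system:

```python
from dataclasses import dataclass, fields

@dataclass
class CandidateCard:
    """One matched template per candidate: identical fields, identical order."""
    name: str
    voting_record: str      # cited roll-call votes
    budgetary_impact: str   # figures from audited budgets or official costings
    fact_checks: str        # summaries from independent fact-checkers
    controversies: str      # documented conflicts, sourced to the same standard

def compare(a: CandidateCard, b: CandidateCard) -> list[tuple[str, str, str]]:
    """Render a side-by-side comparison in a fixed field order, so neither coverage
    volume nor 'front-runner' status decides who appears first or gets more space."""
    return [(f.name, getattr(a, f.name), getattr(b, f.name))
            for f in fields(CandidateCard) if f.name != "name"]
```

The design choice doing the work is the fixed schema: the model fills slots, and the interface, not the model, decides the layout.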

Prudence pertains to the risks of persuasion and data use. The current reality is that large language models can out-argue individuals at scale, particularly with even minimal personalization, so targeted persuasion through a politician scan is a real risk rather than a theoretical one. One safeguard is a “no-personalization” rule for political queries: the scan can adapt to the issue at hand (showing more fiscal detail to users asking about budgets) but not to demographics or inferred voting intentions. Another is an “abstain-when-uncertain” policy: if the system cannot cite an official source or resolve discrepancies between sources, it should abstain and direct the user to the authoritative page, such as the election commission or parliament database, rather than guess. A third is to log and review factual accuracy. Election officials or accredited auditors should track aggregate metrics—the share of answers with official citations, the abstention rate, and the correction rate after review—so the scan remains accountable over time rather than only at a pre-launch assessment.
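
A minimal sketch of how the abstain rule and the audit metrics could fit together, assuming the retrieval step reports whether an official citation was found and how strongly sources agree; the threshold, field names, and fallback URL are illustrative rather than taken from any production system:

```python
OFFICIAL_FALLBACK = "https://elections.example.gov"  # placeholder authority page

def answer_or_abstain(question: str, citations: list[str], agreement: float,
                      threshold: float = 0.9) -> dict:
    """Answer only when an official citation exists and sources agree; otherwise
    abstain and redirect to the authoritative page instead of guessing."""
    if not citations or agreement < threshold:
        return {"question": question, "abstained": True, "redirect": OFFICIAL_FALLBACK}
    return {"question": question, "abstained": False, "citations": citations,
            "answer": "<grounded summary with inline citations>"}

def audit_metrics(log: list[dict]) -> dict:
    """Aggregate figures an election authority or accredited auditor could publish monthly."""
    total = max(len(log), 1)
    return {
        "citation_rate": sum(1 for r in log
                             if not r["abstained"] and r.get("citations")) / total,
        "abstention_rate": sum(1 for r in log if r["abstained"]) / total,
        "correction_rate": sum(1 for r in log if r.get("corrected", False)) / total,
    }
```

Logging the abstentions, not just the answers, is what makes the "trust, but verify" posture checkable after the fact.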

A Compact Even the Losing Candidate Can Accept

What would lead a losing candidate to find the tool fair? A credible agreement with four main components: symmetrical inputs, visible provenance, contestable outputs, and independent oversight. Symmetrical inputs imply that every candidate's official documents are processed to the same depth and updated on the same schedule, with a public record that any campaign can verify. Visible provenance requires that every claim link back to a specific clause, vote, or budget line; where no official record exists, the scan should indicate that and refrain from speculating. Contestable outputs allow each campaign to formally challenge inaccuracies when the scan misstates a fact, ensuring timely corrections and a public change log. Independent oversight involves an election authority or accredited third party conducting continuous tests—using trick questions about registration deadlines, ballot drop boxes, or eligibility rules that have previously caused issues—and publicly reporting the success rate every month during the election period. This transforms “trust us” into “trust, but verify.”

None of this is effective without boundaries on persuasion and targeted messaging. A politician scan should strictly provide information, not motivation. This entails no calls to action tailored to the user’s identity or inferred preferences, no creation of campaign slogans, and no ranking changes based on a visitor’s profile. If a user asks, “Which candidate should I choose if I value clean air and low taxes?” the scan should offer traceable trade-offs and historical voting records rather than suggest a preferred choice—especially because modern models can influence opinions even when labeled as “AI-generated.” Some regions are already adopting this perspective through transparency and anti-manipulation measures in the EU’s AI regulations, national efforts on labeling enforcement, and an emerging international treaty outlining AI duties in democratic settings. A responsible scan will operate within these guidelines by default, treating them as foundations rather than as compliance tasks to be retrofitted later.

Finally, prudence means acknowledging potential errors. Consider a national voter-information portal receiving one million scan queries in the fortnight before an election. If unrestricted chatbots fail on 50% of election-logistics questions in stress tests, and a grounded scan lowers that error rate to around 5% (a conservative benchmark based on the performance gap between general-purpose and specialized models), that still leaves thousands of incorrect answers unless the system also abstains when uncertain, directs users to official resources for procedural questions, and publishes corrections quickly. The key point is clear: scale magnifies small error rates into significant civic consequences. The only reliable responses are constraint, abstention when unsure, and transparency, not clever engagement strategies.
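
The arithmetic behind that warning is worth making explicit. A back-of-the-envelope sketch, using the illustrative figures above and treating every query as if it hit the stated error rate (an upper bound, since only some queries concern logistics):

```python
queries = 1_000_000              # scan queries in the fortnight before the election
unrestricted_error_rate = 0.50   # stress-test failure rate for general-purpose chatbots
grounded_error_rate = 0.05       # assumed rate for a grounded, citation-first scan

print(f"Unrestricted chatbot: ~{int(queries * unrestricted_error_rate):,} wrong answers")
print(f"Grounded scan:        ~{int(queries * grounded_error_rate):,} wrong answers")
# Even a 10x improvement leaves tens of thousands of errors at this scale,
# which is why abstention, redirection, and rapid correction still matter.
```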

The initial facts will remain: billions of voters, and AI capable of out-arguing us when personalization is involved. The choice is not between an AI-assisted politician scan and the status quo; it is between a carefully governed tool and countless unaccountable summaries. The good news is that we already know the working parts. Grounding in official sources improves accuracy on the in-depth questions where comprehension is hardest; consistency in templates and ranking curbs subtle amplification; abstention and visible provenance reduce errors; regular testing and public metrics let administrators monitor quality in real time. When these elements are built into procurement and deployment, the scan shifts from being an unseen editor of democracy to a public service—one that even a losing candidate can accept as a necessary step to clarify elections for busy citizens. The world's largest election cycle will not be the last; if we want the next one to be fairer, we should adopt this compact now and measure it as if our votes depend on it.


The views expressed in this article are those of the author(s) and do not necessarily reflect the official position of the Swiss Institute of Artificial Intelligence (SIAI) or its affiliates.


References

Associated Press (2024). Chatbots’ inaccurate, misleading responses about U.S. elections threaten to keep voters from polls. April 16, 2024.
Council of Europe (2024). Framework Convention on Artificial Intelligence and human rights, democracy and the rule of law. Opened for signature Sept. 5, 2024.
Epstein, R., & Robertson, R. E. (2015). The search engine manipulation effect (SEME) and its possible impact on the outcomes of elections. Proceedings of the National Academy of Sciences.
European Parliament (2025). EU AI Act: first regulation on artificial intelligence (overview). Feb. 19, 2025.
Gallegos, I. O., Shani, C., Shi, W., et al. (2025). Labeling Messages as AI-Generated Does Not Reduce Their Persuasive Effects. arXiv preprint.
Nielsen Norman Group (2023). Information Foraging with Generative AI: A Study of 3,000+ Conversations. Sept. 24, 2023.
NIST (2024). Generative AI Profile: A companion to the AI Risk Management Framework (AI RMF 1.0). July 26, 2024.
Pew Research Center (2025). Artificial intelligence in daily life: Views and experiences. April 3, 2025.
Salvi, F., et al. (2025). On the conversational persuasiveness of GPT-4. Nature Human Behaviour. May 2025.
Stanford HAI (2024). Artificial Intelligence Index Report 2024.
UNDP (2024). A “super year” for elections: 3.7 billion voters in 72 countries. 2024.
VoxEU/CEPR—Ash, E., Galletta, S., & Opocher, G. (2025). BallotBot: Can AI Strengthen Democracy? CEPR Discussion Paper 20070; and working paper PDF.
Reuters (2025). Spain to impose massive fines for not labelling AI-generated content. March 11, 2025.
Robertson, R. E., et al. (2017). Suppressing the Search Engine Manipulation Effect (SEME). ACM Conference on Web Science.
Time (2023). Elections around the world in 2024. Dec. 28, 2023 (context on global electorate share).
Washington Post (2025). AI is more persuasive than a human in a debate, study finds. May 19, 2025.
