The aim of the investigation was to explore the feasibility of using AI technology, specifically ChatGPT, to generate exemplar responses for open-ended questions (OEQs) in physics. The primary objectives were to:
Evaluate whether ChatGPT can consistently produce responses that align with the exam board criteria for excellent OEQ responses in physics.
Utilise the generated responses to lead pupil discussions on the key attributes of both exceptional and subpar OEQ responses, with the goal of enhancing understanding of OEQ structure and content.
ChatGPT, an AI model developed by OpenAI, specialises in interpreting and generating human-like natural language text. Trained on an extensive body of internet text, it is designed to understand natural language prompts and to draw on a broad base of general knowledge in its responses.
In Scottish Qualifications Authority (SQA) physics exams, OEQs challenge pupils to apply physics knowledge to real-world scenarios.
OEQs are often demanding for both pupils and teachers. Pupils routinely struggle with initiating responses, judging the necessary depth of explanation, and keeping their commentary relevant to the question. The marking instructions for OEQs contain no exemplary answers, only general grading criteria (e.g. 3 marks for strong understanding, 2 for reasonable understanding and 1 for limited understanding). This can make it difficult for teachers, especially newly qualified teachers (NQTs), to assess responses and apply consistent marking. Physics teachers therefore often spend considerable time crafting model responses to share with their pupils, to compensate for this absence of exemplar answers.

Six OEQs from the National 5 curriculum, representing a range of topics, were selected. Each was fed into ChatGPT multiple times using the following prompt variations (one way this workflow could be scripted is sketched after the list):
Original question.
Question + Learning Outcomes.
Question + Target Audience (i.e. the response should be written with the knowledge of a 15-year-old physics student).
Question + SQA Grading Criteria.
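The investigation itself used ChatGPT directly, but the same prompt variations could be generated programmatically. The sketch below is purely illustrative and was not part of the study: the question text, learning outcomes, grading summary, model name and helper function are hypothetical placeholders, and it assumes the OpenAI Python package (v1.x) with an API key set in the environment.

```python
# Illustrative sketch only: reproduces the four prompt variations used in the study.
# Assumptions: `openai` v1.x installed, OPENAI_API_KEY set, "gpt-4o-mini" as a placeholder model.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

QUESTION = "Explain why a satellite stays in orbit around the Earth."          # hypothetical N5 OEQ
LEARNING_OUTCOMES = "Newton's laws, gravitational field strength, projectiles."  # hypothetical
AUDIENCE = "Write the response with the knowledge of a 15-year-old National 5 physics student."
GRADING_CRITERIA = "3 marks: good understanding; 2: reasonable; 1: limited."     # paraphrased SQA-style criteria

# The four prompt variations described above.
prompt_variants = {
    "question_only": QUESTION,
    "with_learning_outcomes": f"{QUESTION}\n\nRelevant learning outcomes: {LEARNING_OUTCOMES}",
    "with_target_audience": f"{QUESTION}\n\n{AUDIENCE}",
    "with_grading_criteria": f"{QUESTION}\n\nMarking guidance: {GRADING_CRITERIA}",
}

def generate_responses(prompt: str, n_samples: int = 3) -> list[str]:
    """Generate several candidate OEQ answers for one prompt variant."""
    replies = []
    for _ in range(n_samples):
        completion = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder; any chat model could be substituted
            messages=[{"role": "user", "content": prompt}],
        )
        replies.append(completion.choices[0].message.content)
    return replies

if __name__ == "__main__":
    for name, prompt in prompt_variants.items():
        print(f"--- {name} ---")
        for answer in generate_responses(prompt, n_samples=1):
            print(answer, "\n")
```

Scripting the prompts in this way would simply make it easier to generate several candidate responses per variation for teachers to review; the assessment of quality would still be done manually, as in the study.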
The responses were assessed anonymously by physics teachers at Robert Gordon's College. Responses generated when ChatGPT was given the question only typically scored poorly (mean score of 1 out of 3) due to a lack of N5-level concepts and irrelevant or poorly explained physics content.
However, when prompts specified learning outcomes or target audiences, the responses slightly improved (mean of 1.9 and 1.7 respectively), showing more references to course concepts but often containing repetitive information. The most successful responses, averaging a score of 2.6, occurred when marking instructions accompanied the question.
Nevertheless, all responses fell short by omitting mathematical relationships, formulas and annotated diagrams, which are crucial elements in pupil explanations.

The teachers then shared these AI-generated examples with their classes. Despite the responses being of average quality, they served as effective stimuli for meaningful learning discussions, enabling pupils to identify key characteristics of strong answers, such as including essential physics relationships, definitions of core concepts and, when appropriate, annotated diagrams.
This method streamlined the teachers' work by efficiently generating multiple examples for analysis, freeing time for more focused teaching. While AI tools such as ChatGPT show promise in supporting education, the investigation highlighted the importance of precise prompts and the need to continually refine AI-generated material to meet specific educational requirements.
The scenario in this case study is genuine and based upon real events and data; however, its narration has been crafted by AI to maintain a standardised and clear format for readers.
Key Learning
ChatGPT proves adept at swiftly and effectively crafting open-ended question (OEQ) responses, particularly when guided by a more specific initial prompt. Although the generated responses varied in quality, they still served as valuable stimuli that pupils could explore and discuss with teachers and peers to enhance their grasp of what defines an exceptional response.
Risks
Class teachers should thoroughly review the generated responses before presenting them to their pupils. This step is crucial to minimise the risk of misconceptions or inaccuracies, especially concerning physics concepts, which could confuse the class. In cases where responses are generated during lesson time, engaging in a comprehensive discussion about response quality becomes vital. This allows for the identification and discussion of any errors to ensure clarity and accurate understanding.