
Open Source Software For Analysis And Correlation Of Reading Patterns With Superior SAT Scores Using Eye Gaze Tracking Device


An open-source application that uses a gaze tracking device to investigate the correlation between reading patterns and superior Scholastic Aptitude Test (SAT) reading scores is presented. The eye gaze tracking device tracks where the reader is looking on the screen and provides the coordinates of the gaze. The collected data enable us to determine reading patterns and the time spent looking at specific sections of the screen, and to analyze them. The statistical analysis shows that the ratio of time spent reading the passage to time spent reading the questions has the highest correlation with the number of correct answers. The software is released under the GNU General Public License and is freely available to all educators.


gaze-tracking device; eye gaze; reading patterns; superior Scholastic Aptitude Test (SAT)


We would like to thank the Tokyo Academics staff and the students who assisted in this project.

The SAT is a standardized test that many American students take in order to gain admission to university. Students want to know which methods are most effective for reaching their highest potential score. Previous analyses of the SAT's reading section have relied on post-test surveys, which are highly subjective and depend entirely on the students' own, potentially biased answers [1]. A more objective analysis of the SAT reading section is needed to provide students with the most reliable techniques for reading various passages.

Eye gaze tracking technology has recently become readily available to the average consumer. The device tracks eye gaze with a camera and a high-resolution infrared sensor. This data collection method is simple, nonintrusive, and objective, so we were able to collect data while keeping the test-taking scenario as realistic as possible. Fig. 1 shows a demonstration of the technology.

Our method uses an eye gaze tracking device to collect and store data on reading patterns while students take the SAT reading test. The unbiased data collected from the eye-tracking device allow us to identify and analyze different reading patterns. Correlation analysis between reading patterns and raw scores can uncover the most effective and efficient way to read an SAT reading passage. Our findings reveal that some reading patterns (e.g., the ratio between time spent reading the questions and time spent reading the passage) have a significant impact on SAT scores, while others (e.g., the total number of switches between the passage and the questions) have little to no impact on performance in the SAT reading section. Our application is open source and available to all educators who are interested in a more objective SAT analysis. The source code is available on GitHub:

Related Work
Traditionally, the College Board has collected data about the test through qualitative post-test surveys, which are prone to bias. This has produced a data set that is unreliable and hard to analyze. Prior to our research, there was no unbiased, nonintrusive, statistical method for evaluating reading patterns on the SAT. Our method aims to give students a more comprehensive and detailed way to improve their test scores. Moreover, our open-source code gives teachers around the world, whatever their background, the chance to analyze their own students in light of their own teaching experience.

Approach and methodology
For our research, we used an EyeTribe eye tracker to detect the gaze patterns of the test subjects. The largest screen on which an EyeTribe eye tracker works accurately is a 24-inch monitor. Our code is optimized for a 24-inch monitor because the largest screen makes the points where the subject is looking as distinguishable as possible.

Data Collection Method
To collect our data, we use Khan Academy's practice SAT reading passages as the test environment [4], in which the passage is on the left side of the screen and the questions are on the right side. Students take the test online while the EyeTribe device tracks their eyes. The coordinates of where the students are looking are stored in a CSV file for analysis. With the 24-inch monitor, the coordinates range from (0, 0) to (600, 1024). The EyeTribe device records 30 coordinates per second. Fig. 2 shows a visualization of a test taker's eye gaze on the Khan Academy interface.
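A minimal sketch of how the stored gaze samples could be turned into time spent on each half of the screen is given below. It assumes, hypothetically, that the CSV has one row per sample with columns named x and y, that x is the horizontal coordinate, and that the passage and question panes meet at a boundary SPLIT_X; all three are assumptions for illustration, not details of the released code. Because the device records 30 samples per second, the time spent on a region is its sample count divided by 30.

```python
import csv
import io

SAMPLE_RATE_HZ = 30  # samples per second, as stated for the EyeTribe device
SPLIT_X = 300.0      # hypothetical x boundary between the passage and question panes


def classify(x: float, split_x: float = SPLIT_X) -> str:
    """Label one gaze sample as 'passage' (left pane) or 'questions' (right pane)."""
    return "passage" if x < split_x else "questions"


def dwell_times(rows, split_x: float = SPLIT_X, rate_hz: int = SAMPLE_RATE_HZ) -> dict:
    """Return seconds spent on each pane, assuming one CSV row per gaze sample."""
    counts = {"passage": 0, "questions": 0}
    for row in rows:
        counts[classify(float(row["x"]), split_x)] += 1
    # Each sample represents 1/rate_hz seconds of gaze.
    return {region: n / rate_hz for region, n in counts.items()}


# Usage with a tiny in-memory CSV (hypothetical column names "x" and "y").
sample_csv = io.StringIO("x,y\n100,200\n120,210\n450,300\n")
print(dwell_times(csv.DictReader(sample_csv)))
```

SPLIT_X should be set to wherever the passage pane actually ends in the test layout; samples lost by the tracker are not treated specially in this sketch.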

Fig. 2. Heat map of eye gazes on Khan Academy's SAT passage test interface

In addition, we record the number of correct answers, incorrect answers, and unanswered questions, as well as other features of the test subjects such as ethnicity, gender, and age. Table 1 shows an example of our ground truth data, which is used in our statistical models.

Table 1. Example of ground truth data

Analysis Method
From the collected data, we extract 12 reading pattern features and compare each of them with the number of questions answered correctly. The features and their explanations are listed in Table 2.

Table 2. List of features and their explanation
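The 12 features themselves are defined in Table 2. As a hedged illustration only, a few of the quantities discussed in this paper (switch count, the ratio of passage time to question time, total time, and time spent reading the passage before first looking at the questions) could be computed from a per-sample sequence of region labels such as "passage" and "questions". The function names and exact definitions below are assumptions, not the authors' implementation.

```python
SAMPLE_RATE_HZ = 30  # samples per second, as stated for the EyeTribe device


def total_time(labels: list[str], rate_hz: int = SAMPLE_RATE_HZ) -> float:
    """Total tracked reading time in seconds."""
    return len(labels) / rate_hz


def switch_count(labels: list[str]) -> int:
    """Number of times consecutive samples change region."""
    return sum(1 for prev, cur in zip(labels, labels[1:]) if prev != cur)


def passage_question_ratio(labels: list[str]) -> float:
    """Time on the passage divided by time on the questions (sample counts suffice)."""
    questions = labels.count("questions")
    return labels.count("passage") / questions if questions else float("inf")


def initial_passage_time(labels: list[str], rate_hz: int = SAMPLE_RATE_HZ) -> float:
    """Seconds spent on the passage before the first glance elsewhere."""
    n = 0
    for label in labels:
        if label != "passage":
            break
        n += 1
    return n / rate_hz


# Usage: 2 s on the passage, 1 s on the questions, then 1 s back on the passage.
labels = ["passage"] * 60 + ["questions"] * 30 + ["passage"] * 30
print(switch_count(labels), passage_question_ratio(labels), initial_passage_time(labels))
```

Counting every label change treats a single stray sample as two switches; in practice some smoothing of the label sequence (for example, ignoring very short glances) may be needed, which this sketch does not do.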

After extracting the 12 features, we calculate the correlation of each feature with the number of correct answers in order to find which features have the biggest impact. The correlation coefficient ranges from -1 to 1. The closer the coefficient is to 1 or -1, the stronger the linear relationship between the feature and the number of correct answers; a positive coefficient means that higher feature values go with more correct answers, and a negative coefficient means the opposite. We want to find the features with the most impact in order to see how SAT reading passage scores can be maximized. The correlation coefficient between each feature and the number of correct answers is calculated with the Pearson correlation formula [2]:

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{S_x} \right) \left( \frac{y_i - \bar{y}}{S_y} \right)$$

where

  • r is the correlation coefficient
  • n is the total number of data points
  • x is the feature vector with entries x_i, and x̅ is the mean of x
  • y is the vector of correct-answer counts with entries y_i, and ȳ is the mean of y
  • Sx is the sample standard deviation of x
  • Sy is the sample standard deviation of y
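The Pearson correlation described above can be sketched in Python in its standardized form, r = (1/(n-1)) Σ ((x_i - x̅)/S_x)((y_i - ȳ)/S_y), using the sample standard deviations from the standard library's statistics module for S_x and S_y. The rank_features helper, which orders features by the absolute value of r, is a hypothetical illustration of finding the features with the biggest impact, not the authors' code. The values in the usage lines are made up for illustration and are not data from this study.

```python
from statistics import mean, stdev


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson r = 1/(n-1) * sum(z_x * z_y), with sample standard deviations."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = stdev(x), stdev(y)  # sample (n - 1) standard deviations
    if s_x == 0 or s_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return sum((xi - x_bar) / s_x * (yi - y_bar) / s_y
               for xi, yi in zip(x, y)) / (n - 1)


def rank_features(features: dict[str, list[float]], correct: list[float]):
    """Sort (name, r) pairs by |r|, strongest correlation first."""
    scores = {name: pearson_r(values, correct) for name, values in features.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Made-up values for illustration only; not results from this study.
features = {
    "passage_question_ratio": [0.8, 1.1, 1.6, 2.0],
    "switch_count": [12, 5, 9, 7],
}
correct_answers = [5, 6, 8, 9]
for name, r in rank_features(features, correct_answers):
    print(f"{name}: r = {r:.2f}")
```

Ranking by |r| treats strong negative correlations as just as informative as strong positive ones; with a small data set, these coefficients can change considerably as more students are added.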

Experiment Settings

Results Overview
We found that students who spent more time reading the questions than reading the passage performed better (Fig. 3); however, switching frequently between reading the passage and reading the questions did not improve performance. Because the current data set is small, some of the results may be skewed; we are collecting more data to determine more accurately which features matter most.

Discussion and conclusions

The purpose of this research was to provide a more quantifiable data set for standardized reading tests, specifically the SAT, and to use these data to help test takers improve their scores. The benefit of our approach is that it is easy, quantifiable, and can be used with any type of student. Our program is open source and available to all who are interested in collecting objective SAT reading data. From our research, we have found that the switch count and reading speed have the lowest (negative) correlations with the number of correct answers, while the total time and the time spent reading the passage at the beginning have the highest positive correlations. This means that reading the passage first and then the questions, while minimizing switches and making full use of the time given, is the most effective way to read an SAT passage and maximize one's score. To improve our research, we want to collect more data from a more diverse population to verify our analysis more objectively, and to apply a statistical test comparing high-scoring and low-scoring students in order to provide strong evidence of the impact of SAT reading patterns.