Machine Learning Bias in Predicting High School Grades: A Knowledge Perspective

Ricardo Costa-Mendes, Frederico Cruz-Jesus, Tiago Oliveira, Mauro Castelli


This study examines machine learning bias in the prediction of teacher grades. The experimental phase consists of predicting the grades of 11th- and 12th-grade Portuguese high school students and computing the bias and variance decomposition of the prediction error. In the base implementation, only the critical factors of academic achievement are considered. In the second implementation, the preceding year's grade is appended as an input variable. The machine learning algorithms used are random forest, support vector machine, and extreme gradient boosting (XGBoost). The poor performance of the machine learning algorithms is attributable either to the imprecision of the input space or to the lack of a sound record of student performance. We introduce the new concept of knowledge bias and a new classification of predictive models. Precision education would reduce bias by providing low-bias intensive-knowledge models. To avoid bias, however, it is not necessary to add knowledge to the input space: low-bias extensive-knowledge models are achievable simply by appending the student's earlier performance record to the model. The low-bias intensive-knowledge learning models promoted by precision education are suited to designing new policies and actions aimed at academic attainment. If the aim is solely prediction, opting for a low-bias extensive-knowledge model can be appropriate and correct.
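The two implementations described above can be illustrated with a small, self-contained sketch. It is not the authors' pipeline: the data are synthetic, every coefficient is invented, and a plain k-nearest-neighbours regressor stands in for the random forest, support vector machine, and XGBoost models so that only the Python standard library is needed. The helper names (`make_students`, `knn_predict`, `bias_variance`) are hypothetical. Bias is measured against the observed test grades, so it also absorbs irreducible noise.

```python
import random
import statistics

def make_students(n, rng):
    """Synthetic students as (ses, prior_grade, grade); all coefficients are invented."""
    rows = []
    for _ in range(n):
        ability = rng.gauss(0, 1)  # latent trait, never shown to the model
        ses = rng.gauss(0, 1)      # stand-in for an observed background factor
        prior = 12 + 3 * ability + rng.gauss(0, 1)
        grade = 12 + 3 * ability + ses + rng.gauss(0, 1)
        rows.append((ses, prior, grade))
    return rows

def knn_predict(train, x, cols, k=10):
    """Mean grade of the k training rows closest to x on the chosen columns."""
    nearest = sorted(train, key=lambda r: sum((r[c] - x[c]) ** 2 for c in cols))[:k]
    return statistics.fmean(r[2] for r in nearest)

def bias_variance(train, test, cols, rounds=30, seed=0):
    """Bootstrap estimate of squared bias, variance, and mean squared error."""
    rng = random.Random(seed)
    preds = [[] for _ in test]
    for _ in range(rounds):
        boot = rng.choices(train, k=len(train))  # resample with replacement
        for i, x in enumerate(test):
            preds[i].append(knn_predict(boot, x, cols))
    means = [statistics.fmean(p) for p in preds]
    bias_sq = statistics.fmean((m - x[2]) ** 2 for m, x in zip(means, test))
    variance = statistics.fmean(statistics.pvariance(p) for p in preds)
    mse = statistics.fmean((v - x[2]) ** 2 for p, x in zip(preds, test) for v in p)
    return bias_sq, variance, mse

rng = random.Random(42)
train, test = make_students(200, rng), make_students(50, rng)

base = bias_variance(train, test, cols=(0,))        # background factor only
extended = bias_variance(train, test, cols=(0, 1))  # plus preceding year's grade

for name, (b, v, m) in (("base", base), ("extended", extended)):
    print(f"{name:9s} bias^2={b:.2f} variance={v:.2f} mse={m:.2f}")
```

In this toy setting the preceding year's grade carries most of the latent signal, so appending it lowers the bias term without adding any explanatory knowledge to the input space. For squared loss, the reported error equals squared bias plus variance by construction.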


DOI: 10.28991/esj-2021-01298



Keywords: Knowledge Bias; Bias and Variance Decomposition; Random Forest; Support Vector Regression; Precision Education; Academic Achievement.




Copyright (c) 2021 Ricardo Costa-Mendes, Tiago Oliveira, Mauro Castelli, Frederico Cruz-Jesus