Beyond Course Averages: A Generalized Bayesian Hierarchical Framework for Course-Level Learning Evaluation

Vicente Montano; Archie Reyes

doi:10.17309/jltm.2026.7.1.04

Author(s)

Vicente Montano, University of Mindanao, https://orcid.org/0000-0001-9117-568X
Archie Reyes, University of Mindanao, https://orcid.org/0009-0005-7443-3022

DOI:

https://doi.org/10.17309/jltm.2026.7.1.04

Keywords:

Bayesian hierarchical modeling, multilevel analysis, course-level assessment, small-sample instability, educational measurement

Abstract

Background. Course-level evaluation of learning outcomes in higher education often relies on comparing mean scores, which assumes that courses are independent and that their estimates are equally reliable. With small and uneven enrollments, this practice produces statistical instability and exaggerates extreme values, making differences between courses harder to interpret.

Purpose. The study aims to substantiate a generalized methodological framework for applying Bayesian hierarchical modeling (BHM) to course-level evaluation of learning outcomes while accounting for uncertainty and unequal sample sizes.

Materials and methods. The study uses a hierarchical Bayesian model in which student learning outcomes are modeled at the individual level, with students nested within specific courses, reflecting the multilevel organization of educational data. The model decomposes total variance into within-course and between-course components and estimates course effects from posterior distributions. Partial pooling is applied to reduce distortions caused by small sample sizes. De-identified data on the learning outcomes of 279 students in 22 courses serve as an empirical illustration.
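As a hedged illustration of the kind of model described, a minimal two-level formulation can be written as follows. The normal likelihood, the normal distribution for course effects, and the symbols $\mu$, $u_j$, $\sigma^2$, $\tau^2$, $n_j$, and $\lambda_j$ are assumptions introduced here for exposition; the abstract does not state the authors' exact specification or priors.

```latex
\begin{aligned}
y_{ij} &= \mu + u_j + \varepsilon_{ij}, \qquad u_j \sim \mathcal{N}(0, \tau^2), \quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2) \\
\operatorname{Var}(y_{ij}) &= \tau^2 + \sigma^2 \\
\mathbb{E}\left[\mu + u_j \mid \bar{y}_j, \mu, \tau^2, \sigma^2\right] &= \lambda_j \bar{y}_j + (1 - \lambda_j)\,\mu, \qquad \lambda_j = \frac{\tau^2}{\tau^2 + \sigma^2 / n_j}
\end{aligned}
```

Here $y_{ij}$ is the outcome of student $i$ in course $j$, $\tau^2$ is the between-course variance, $\sigma^2$ is the within-course variance, and $n_j$ is the course enrollment. The weight $\lambda_j$ shrinks toward zero as $n_j$ decreases, so the estimate for a small course is pulled more strongly toward the overall mean $\mu$; this is the partial pooling mechanism in its simplest conditional form.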

Results. Naive comparisons of courses by their mean scores are shown to systematically exaggerate extreme estimates when samples are small, yielding unstable and potentially misleading conclusions. The hierarchical Bayesian approach with partial pooling substantially reduces the artificial extremeness of estimates while preserving structurally grounded differences between courses.
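The effect described in the results can be sketched with a small simulation. This is not the authors' analysis or data: the grand mean, the variance components, the enrollment range, and the helper names `simulate_courses`, `shrink`, and `mse` are all hypothetical, and the variance components are treated as known for simplicity, whereas a full Bayesian fit would estimate them under priors.

```python
import random
import statistics

def simulate_courses(n_courses=500, mu=75.0, tau=3.0, sigma=10.0, seed=0):
    """Return (true_mean, raw_mean, n) per simulated course. All values are hypothetical."""
    rng = random.Random(seed)
    courses = []
    for _ in range(n_courses):
        n = rng.randint(3, 30)                  # small, uneven enrollment
        true_mean = rng.gauss(mu, tau)          # between-course variation
        scores = [rng.gauss(true_mean, sigma) for _ in range(n)]  # within-course variation
        courses.append((true_mean, statistics.fmean(scores), n))
    return courses

def shrink(raw_mean, n, mu, tau, sigma):
    """Partial pooling: precision-weighted average of the course mean and the grand mean."""
    lam = tau**2 / (tau**2 + sigma**2 / n)      # weight on the course's own mean
    return lam * raw_mean + (1 - lam) * mu

def mse(estimates, truths):
    """Mean squared error of estimates against the simulated true course means."""
    return statistics.fmean((e - t) ** 2 for e, t in zip(estimates, truths))

# Assumed (not estimated) hyperparameters, chosen only for illustration.
MU, TAU, SIGMA = 75.0, 3.0, 10.0
courses = simulate_courses(mu=MU, tau=TAU, sigma=SIGMA)
truths = [t for t, _, _ in courses]
raw = [m for _, m, _ in courses]
pooled = [shrink(m, n, MU, TAU, SIGMA) for _, m, n in courses]

print("spread of raw means:   ", round(statistics.pstdev(raw), 2))
print("spread of pooled means:", round(statistics.pstdev(pooled), 2))
print("MSE raw vs pooled:     ", round(mse(raw, truths), 2), round(mse(pooled, truths), 2))
```

In this sketch every pooled estimate lies between the raw course mean and the grand mean, and courses with fewer students are pulled further, which is why the most extreme raw means (typically from the smallest courses) are moderated most.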

Conclusions. The proposed methodological framework offers a statistically grounded alternative to descriptive aggregation and ranking of courses, orienting the evaluation of learning outcomes toward a probabilistic, structural interpretation that accounts for uncertainty.

Author Biographies

Vicente Montano, University of Mindanao

Department of Human Resource Management, College of Business Administration and Education, Bolton Street, 8000 Davao City, Philippines

Archie Reyes, University of Mindanao

Department of Human Resource Management, College of Business Administration and Education, Bolton Street, 8000 Davao City, Philippines
