publications
2024
- Promises and Pitfalls: Using Large Language Models to Generate Visualization Items. IEEE VIS, 2024
Visualization items—factual questions about visualizations that ask viewers to accomplish visualization tasks—are regularly used in the field of information visualization as educational and evaluative materials. For example, researchers of visualization literacy require large, diverse banks of items to conduct studies where the same skill is measured repeatedly on the same participants. Yet, generating a large number of high-quality, diverse items requires significant time and expertise. To address the critical need for a large number of diverse visualization items in education and research, this paper investigates the potential for large language models (LLMs) to automate the generation of multiple-choice visualization items. Through an iterative design process, we develop the VILA (Visualization Items Generated by Large LAnguage Models) pipeline, for efficiently generating visualization items that measure people’s ability to accomplish visualization tasks. We use the VILA pipeline to generate 1,404 candidate items across 12 chart types and 13 visualization tasks. In collaboration with 11 visualization experts, we develop an evaluation rulebook which we then use to rate the quality of all candidate items. The result is the VILA bank of ∼1,100 items. From this evaluation, we also identify and classify current limitations of the VILA pipeline, and discuss the role of human oversight in ensuring quality. In addition, we demonstrate an application of our work by creating a visualization literacy test, VILA-VLAT, which measures people’s ability to complete a diverse set of tasks on various types of visualizations; comparing it to the existing VLAT, VILA-VLAT shows moderate to high convergent validity (R = 0.70). Lastly, we discuss the application areas of the VILA pipeline and the VILA bank and provide practical recommendations for their use.
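As a rough sketch of the kind of generation step such a pipeline performs (the prompt wording, model name, and helper function below are hypothetical illustrations, not VILA's actual prompts or code):

```python
# Hypothetical sketch of one generation step in an LLM item pipeline.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def generate_candidate_item(chart_type: str, task: str) -> str:
    """Ask an LLM for one multiple-choice visualization item."""
    prompt = (
        f"Write one multiple-choice question testing the ability to "
        f"'{task}' on a {chart_type}. Provide four options (A-D), mark "
        f"the correct answer, and keep the question factual."
    )
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Candidate items would then be screened by humans against an expert
# rulebook, as the paper's quality-evaluation step describes.
candidates = [generate_candidate_item("bar chart", "retrieve value")
              for _ in range(3)]
```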
- Odds and Insights: Decision Quality in Visual Analytics Under Uncertainty. ACM CHI, 2024
Recent studies have shown that users of visual analytics tools can have difficulty distinguishing robust findings in the data from statistical noise, but the true extent of this problem is likely dependent on both the incentive structure motivating their decisions, and the ways that uncertainty and variability are (or are not) represented in visualisations. In this work, we perform a crowd-sourced study measuring decision-making quality in visual analytics, testing both an explicit incentive structure designed to reward cautious decision-making and a variety of designs for communicating uncertainty. We find that, while participants are unable to control for false discoveries as well as idealised statistical models such as the Benjamini-Hochberg procedure, certain forms of uncertainty visualisations can improve the quality of participants’ decisions and lead to fewer false discoveries than not correcting for multiple comparisons. We conclude with a call for researchers to further explore visual analytics decision quality under different decision-making contexts, and for designers to directly present uncertainty and reliability information to users of visual analytics tools. This paper and the associated analysis materials are available at: https://osf.io/xtsfz/
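The Benjamini-Hochberg procedure referenced above is simple to state; here is a minimal, self-contained implementation of the standard step-up algorithm (illustrative code, not from the paper's analysis materials):

```python
# Benjamini-Hochberg step-up procedure: given p-values and a target
# false discovery rate q, report which hypotheses to reject.
def benjamini_hochberg(p_values, q=0.05):
    m = len(p_values)
    # Sort p-values ascending, remembering original positions.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-indexed) with p_(k) <= (k / m) * q.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            k_max = rank
    # Reject every hypothesis whose p-value ranks at or below k_max.
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            reject[idx] = True
    return reject

# Example: only the two smallest p-values survive correction at q = 0.05.
print(benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]))
```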
2023
- Adaptive Assessment of Visualization Literacy. IEEE VIS, 2023
Visualization literacy is an essential skill for accurately interpreting data to inform critical decisions. Consequently, it is vital to understand the evolution of this ability and devise targeted interventions to enhance it, requiring concise and repeatable assessments of visualization literacy for individuals. However, current assessments, such as the Visualization Literacy Assessment Test (VLAT), are time-consuming due to their fixed, lengthy format. To address this limitation, we develop two streamlined computerized adaptive tests (CATs) for visualization literacy, A-VLAT and A-CALVI, which measure the same set of skills as their original versions in half the number of questions. Specifically, we (1) employ item response theory (IRT) and non-psychometric constraints to construct adaptive versions of the assessments, (2) finalize the configurations of adaptation through simulation, (3) refine the composition of test items of A-CALVI via a qualitative study, and (4) demonstrate the test-retest reliability (ICC: 0.98 and 0.98) and convergent validity (correlation: 0.81 and 0.66) of both CATs via four online studies. We discuss practical recommendations for using our CATs and opportunities for further customization to leverage the full potential of adaptive assessments.
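The core step of a computerized adaptive test is choosing the next item to maximize information at the current ability estimate. Below is a minimal sketch under a two-parameter logistic (2PL) IRT model; the item parameters are made up for illustration and are not A-VLAT's or A-CALVI's actual configuration:

```python
import math

# 2PL item response model: probability of a correct response given
# ability theta, discrimination a, and difficulty b.
def p_correct(theta, a, b):
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Fisher information of a 2PL item at ability theta.
def item_information(theta, a, b):
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)

# Pick the unadministered item that is most informative at the
# current ability estimate -- the selection step of a CAT.
def select_next_item(theta_hat, item_bank, administered):
    remaining = [i for i in range(len(item_bank)) if i not in administered]
    return max(remaining,
               key=lambda i: item_information(theta_hat, *item_bank[i]))

# (a, b) pairs are invented parameters for illustration.
bank = [(1.2, -0.5), (0.8, 0.0), (1.5, 0.7), (1.0, 1.2)]
print(select_next_item(theta_hat=0.6, item_bank=bank, administered={0}))
```

A real CAT would interleave this selection step with re-estimating theta after each response and would also enforce non-psychometric constraints (e.g., chart-type coverage), as the paper describes.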
- CALVI: Critical Thinking Assessment for Literacy in Visualizations. Lily W. Ge, Yuan Cui, and Matthew Kay. ACM CHI, 2023
Visualization misinformation is a prevalent problem, and combating it requires understanding people’s ability to read, interpret, and reason about erroneous or potentially misleading visualizations, which lacks a reliable measurement: existing visualization literacy tests focus on well-formed visualizations. We systematically develop an assessment for this ability by: (1) developing a precise definition of misleaders (decisions made in the construction of visualizations that can lead to conclusions not supported by the data), (2) constructing initial test items using a design space of misleaders and chart types, (3) trying out the provisional test on 497 participants, and (4) analyzing the test tryout results and refining the items using Item Response Theory, qualitative analysis, a wrong-due-to-misleader score, and the content validity index. Our final bank of 45 items shows high reliability, and we provide item bank usage recommendations for future tests and different use cases.
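One of the screening statistics mentioned above, the content validity index, has a simple item-level form: the proportion of expert raters who judge an item relevant. A minimal sketch with invented ratings (not the study's data):

```python
# Item-level content validity index (I-CVI): the proportion of expert
# raters who judge an item relevant (e.g., 3 or 4 on a 4-point scale).
def item_cvi(ratings, relevant_threshold=3):
    relevant = sum(1 for r in ratings if r >= relevant_threshold)
    return relevant / len(ratings)

# Ratings below are invented for illustration; a common rule of thumb
# retains items with I-CVI >= 0.78 when there are 6-10 raters.
expert_ratings = [4, 4, 3, 2, 4, 3]
print(round(item_cvi(expert_ratings), 2))  # 0.83
```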
2021
- Can an Algorithm Be My Healthcare Proxy? Duncan McElfresh, Samuel Dooley, Yuan Cui, Kendra Griesman, Weiqin Wang, Tyler Will, Neil Sehgal, and John Dickerson. Explainable AI in Healthcare and Medicine, 2021
Planning for death is not a process in which everyone participates. Yet a lack of planning can severely impact a patient’s well-being, the well-being of her family, and the medical community as a whole. Advance Care Planning (ACP) has been a field in the United States for half a century, often using short surveys or questionnaires to help patients consider future end-of-life (EOL) care decisions. Recent web-based tools promise to increase ACP participation rates; modern techniques from artificial intelligence (AI) could further improve and personalize these tools. We discuss two hypothetical AI-based apps and their potential implications. We hope that this paper will encourage thought about appropriate applications of AI in ACP, and about implementing AI in ways that ensure patient intentions are honored.