# CASEset

## Overview

**CASEset** is the dataset and training infrastructure for **CASE (Context-Aware Screen-based Estimation of Gaze)**. CASEset captures synchronized webcam frames, desktop screenshots, and high-precision gaze labels to enable models that reason about what users are looking at on screen.

## The Problem

High-accuracy gaze tracking currently requires expensive specialized hardware. Webcam-based approaches are cheaper but lack access to on-screen context, which limits accuracy for screen-targeted tasks.

## The Key Insight

Existing large-scale gaze datasets capture appearance but not the screen content users view. CASEset pairs synchronized screen content with gaze to enable context-aware models that use visual saliency and UI structure.

## Technical Architecture

### Dataset Pipeline

The collection infrastructure synchronizes three streams:

* Webcam frames (face/eye appearance)
* Desktop screenshots (visual context)
* High-precision gaze (Tobii Pro Fusion)

Key requirements:

* Sub-50ms synchronization across modalities
* Temporal interaction sequences during natural tasks
* Diverse interface contexts (browsing, documents, apps)
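The sub-50ms requirement can be illustrated with a minimal alignment sketch. It assumes each stream records timestamps in seconds on a shared clock and that the lists are sorted; the names `nearest_index`, `align_streams`, and `SyncedSample` are hypothetical and not part of the CASEset code. A webcam frame is kept only when the nearest screenshot and gaze sample, together with the frame itself, span less than the tolerance.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import List, Optional, Sequence

SYNC_TOLERANCE_S = 0.050  # assumed reading of the "sub-50ms" requirement, in seconds


@dataclass
class SyncedSample:
    """Indices of one aligned triple (hypothetical container)."""
    webcam_idx: int
    screen_idx: int
    gaze_idx: int


def nearest_index(timestamps: Sequence[float], t: float) -> Optional[int]:
    """Return the index of the timestamp closest to t in a sorted sequence."""
    if not timestamps:
        return None
    i = bisect_left(timestamps, t)
    # Only the neighbors on either side of the insertion point can be closest.
    candidates = [j for j in (i - 1, i) if 0 <= j < len(timestamps)]
    return min(candidates, key=lambda j: abs(timestamps[j] - t))


def align_streams(
    webcam_ts: Sequence[float],
    screen_ts: Sequence[float],
    gaze_ts: Sequence[float],
    tolerance: float = SYNC_TOLERANCE_S,
) -> List[SyncedSample]:
    """Pair each webcam frame with its nearest screenshot and gaze sample.

    A triple is kept only if the spread between its earliest and latest
    timestamp is below the tolerance, so all three modalities agree.
    """
    samples = []
    for w_idx, t in enumerate(webcam_ts):
        s_idx = nearest_index(screen_ts, t)
        g_idx = nearest_index(gaze_ts, t)
        if s_idx is None or g_idx is None:
            continue
        stamps = (t, screen_ts[s_idx], gaze_ts[g_idx])
        if max(stamps) - min(stamps) < tolerance:
            samples.append(SyncedSample(w_idx, s_idx, g_idx))
    return samples
```

Checking the spread across all three timestamps, rather than each stream against the webcam alone, avoids accepting a screenshot and a gaze sample that are each close to the frame but far from each other.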

### FAZE-CCT Hybrid Model

A high-level pipeline:

* Stage 1: FAZE DT-ED processes webcam frames to extract normalized gaze vectors.
* Stage 2: Coordinate Translator maps gaze vectors to tentative screen coordinates.
* Stage 3: CCT (Compact Convolutional Transformer) refines predictions using a 400×400 screenshot patch centered on the tentative location and optional recent click history.

```mermaid
graph TD
    A[Webcam Frame] --> B["Stage 1 · FAZE DT-ED — Gaze Appearance Encoder"]
    B --> C("Normalized Gaze Vector")
    C --> D["Stage 2 · Coordinate Translator — Gaze Vector → Screen Coordinates"]
    D --> E("Tentative Screen Coordinates")
    F["Screenshot Patch · 400 × 400 px"] --> G
    H[Click History] --> G
    E --> G["Stage 3 · CCT Refiner — Context-Aware Transformer"]
    G --> I("Refined Screen-Gaze Prediction")

    style A fill:#455A64,stroke:#1C313A,stroke-width:2px,color:#FFFFFF
    style F fill:#455A64,stroke:#1C313A,stroke-width:2px,color:#FFFFFF
    style H fill:#455A64,stroke:#1C313A,stroke-width:2px,color:#FFFFFF
    style B fill:#546E7A,stroke:#29434E,stroke-width:2px,color:#FFFFFF
    style C fill:#ECEFF1,stroke:#607D8B,stroke-width:2px,color:#37474F
    style D fill:#607D8B,stroke:#37474F,stroke-width:2px,color:#FFFFFF
    style E fill:#ECEFF1,stroke:#78909C,stroke-width:2px,color:#37474F
    style G fill:#37474F,stroke:#1C313A,stroke-width:2px,color:#FFFFFF
    style I fill:#CFD8DC,stroke:#546E7A,stroke-width:2px,color:#263238
```
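The Stage 3 input, a 400×400 patch centered on the tentative location, needs a rule for estimates that fall near a screen edge. The sketch below assumes the window is shifted inward so it stays fully on screen; padding the patch instead would be an equally plausible choice, and the source does not say which is used. The function name `patch_box` is hypothetical.

```python
from typing import Tuple

PATCH_SIZE = 400  # patch side length in pixels, as stated for Stage 3


def patch_box(
    cx: float,
    cy: float,
    screen_w: int,
    screen_h: int,
    size: int = PATCH_SIZE,
) -> Tuple[int, int, int, int]:
    """Return (left, top, right, bottom) of a size x size crop around (cx, cy).

    Assumption: near an edge the box is shifted inward rather than padded,
    so the returned box always lies inside the screenshot.
    """
    if screen_w < size or screen_h < size:
        raise ValueError("screenshot is smaller than the requested patch")
    half = size // 2
    # Center the box on the tentative point, then clamp it to the screen.
    left = min(max(int(round(cx)) - half, 0), screen_w - size)
    top = min(max(int(round(cy)) - half, 0), screen_h - size)
    return left, top, left + size, top + size
```

The returned tuple follows the (left, top, right, bottom) ordering that common image libraries accept for cropping, so it can be passed on to whichever one loads the screenshot.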

### Knowledge Distillation Approach

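CASEset enables knowledge distillation: high-precision gaze from the Tobii eye tracker acts as the teacher signal for webcam-only student models. This improves their accuracy while keeping inference on-device and privacy-preserving, since the specialized hardware is needed only during data collection.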
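One simple way to read this setup is as supervision of a webcam-only model against Tobii-recorded gaze points. The sketch below computes the mean Euclidean error in pixels between the two, which could serve as a training objective or an evaluation metric. It is an illustration under that assumption, not the project's actual loss, and the name `mean_pixel_error` is hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def mean_pixel_error(student_xy: Sequence[Point], teacher_xy: Sequence[Point]) -> float:
    """Mean Euclidean distance between student predictions and teacher labels.

    student_xy: screen coordinates predicted by the webcam-only model.
    teacher_xy: screen coordinates recorded by the eye tracker (assumed
                to be in the same pixel coordinate system).
    """
    if len(student_xy) != len(teacher_xy):
        raise ValueError("student and teacher must have the same number of points")
    if not student_xy:
        raise ValueError("at least one point is required")
    total = sum(
        math.hypot(sx - tx, sy - ty)
        for (sx, sy), (tx, ty) in zip(student_xy, teacher_xy)
    )
    return total / len(student_xy)
```

For example, predictions of `(100, 100)` and `(203, 204)` against labels of `(100, 100)` and `(200, 200)` have errors of 0 and 5 pixels, for a mean of 2.5.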

## Roadmap

Milestones run through Fall 2026 in four high-level phases: infrastructure and pilot data; full data collection and an initial model; model refinement and optimization; and final evaluation and thesis completion.

## Project Structure

```
CASEset/
├── README.md          # This file
├── SUMMARY.md         # GitBook table of contents
├── references.md      # Bibliography and citations
├── collect/           # Data collection modules (webcam, screen, Tobii)
├── configs/           # YAML configs for collection & training
├── data/              # Raw and processed datasets
├── docs/              # Additional documentation
├── evaluation/        # Model evaluation scripts and metrics
├── models/            # Model architectures and weights
├── notebooks/         # Jupyter notebooks for analysis
├── outputs/           # Generated outputs
│   ├── checkpoints/   # Model checkpoints
│   ├── logs/          # Training and experiment logs
│   └── results/       # Evaluation results
├── scripts/           # Utility and automation scripts
├── tests/             # Unit and integration tests
└── training/          # Training pipelines and utilities
```

## Citation

If you use CASEset in your research, please cite:

```bibtex
@inproceedings{tran2024case,
  title={CASE: Context Aware Screen-Based Estimation of Gaze},
  author={Tran, M. and Milkowski, L.},
  booktitle={Eighth IEEE International Conference on Robotic Computing (IRC)},
  pages={112--113},
  year={2024},
  organization={IEEE}
}
```

## License

\[License information to be added]

## Acknowledgments

S.U. Fall 2025 Undergraduate Research Showcase