# References

\[1] Tran, M., & Milkowski, L. (2024). CASE: Context Aware Screen-Based Estimation of Gaze. 2024 Eighth IEEE International Conference on Robotic Computing (IRC), Tokyo, Japan, 112-113.

{% embed url="https://ieeexplore.ieee.org/document/10818040" %}

\[2] Bao, Y., Cheng, Y., Liu, Y., & Lu, F. (2022). Adaptive Feature Fusion Network for Gaze Tracking in Mobile Tablets. 2022 26th International Conference on Pattern Recognition (ICPR), 1473-1479.

{% embed url="https://doi.org/10.1109/ICPR56361.2022.9956543" %}

\[3] Cheng, Y., Wang, H., Bao, Y., & Lu, F. (2022). Appearance-Based Gaze Estimation With Deep Learning: A Review and Benchmark. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(11), 8428-8448.

{% embed url="https://doi.org/10.1109/TPAMI.2021.3111128" %}

\[4] Chen, Z., & Shi, B. E. (2023). Towards High Performance Low Complexity Calibration in Appearance Based Gaze Estimation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(3), 3817-3829.

{% embed url="https://doi.org/10.1109/TPAMI.2022.3182940" %}

\[5] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., et al. (2021). An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. International Conference on Learning Representations (ICLR).

{% embed url="https://arxiv.org/abs/2010.11929" %}

\[6] Wang, W., Xie, E., Li, X., Fan, D. P., Song, K., Liang, D., et al. (2022). PVT v2: Improved Baselines with Pyramid Vision Transformer. Computational Visual Media, 8(3), 415-424.

{% embed url="https://doi.org/10.1007/s41095-022-0274-8" %}

\[7] Li, J., Li, D., Savarese, S., & Hoi, S. (2023). BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models. International Conference on Machine Learning (ICML).

{% embed url="https://arxiv.org/abs/2301.12597" %}

\[8] Huang, S., Dong, L., Wang, W., Hao, Y., Singhal, S., Ma, S., et al. (2023). Language Is Not All You Need: Aligning Perception with Language Models. Advances in Neural Information Processing Systems (NeurIPS), 36.

{% embed url="https://arxiv.org/abs/2302.14045" %}

\[9] Biten, A. F., Gómez, L., Rusiñol, M., & Karatzas, D. (2022). Scene Text Visual Question Answering. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(7), 4073-4086.

{% embed url="https://doi.org/10.1109/TPAMI.2021.3055735" %}

\[10] Zhang, X., Park, S., Beeler, T., Bradley, D., Tang, S., & Hilliges, O. (2020). ETH-XGaze: A Large Scale Dataset for Gaze Estimation under Extreme Head Poses and Gaze Directions. European Conference on Computer Vision (ECCV), 365-381.

{% embed url="https://doi.org/10.1007/978-3-030-58558-7_22" %}

\[11] Fischer, T., Chang, H. J., & Demiris, Y. (2022). RT-BENE: A Dataset and Baselines for Real-Time Blink Estimation in Natural Environments. IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 1134-1143.

{% embed url="https://doi.org/10.1109/WACV51458.2022.00120" %}

\[12] Kothari, R., Yang, Z., Kanan, C., Bailey, R., Pelz, J. B., & Diaz, G. J. (2020). Gaze-in-Wild: A Dataset for Studying Eye and Head Coordination in Everyday Activities. Scientific Reports, 10(1), 2539.

{% embed url="https://doi.org/10.1038/s41597-020-0443-0" %}

\[13] Mehta, S., & Rastegari, M. (2022). MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer. International Conference on Learning Representations (ICLR).

{% embed url="https://arxiv.org/abs/2110.02178" %}

\[14] Cai, H., Gan, C., Wang, T., Zhang, Z., & Han, S. (2020). Once-for-All: Train One Network and Specialize It for Efficient Deployment. International Conference on Learning Representations (ICLR).

{% embed url="https://arxiv.org/abs/1908.09791" %}

\[15] Lin, J., Tang, J., Tang, H., Yang, S., Dang, X., & Han, S. (2023). AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration. arXiv preprint arXiv:2306.00978.

{% embed url="https://arxiv.org/abs/2306.00978" %}

\[16] Huang, M. X., Li, J., Ngai, G., Leong, H. V., & Bulling, A. (2022). Moment-to-Moment Detection of Internal Thought from Eye Vergence Behaviour. ACM Transactions on Computer-Human Interaction, 29(4), 1-49.

{% embed url="https://doi.org/10.1145/3517221" %}

\[17] Brousseau, B., Rose, J., & Eizenman, M. (2020). Accurate Model-Based Point of Gaze Estimation on Mobile Devices. Vision Research, 175, 1-9.

{% embed url="https://doi.org/10.1016/j.visres.2020.06.008" %}

\[18] Pathirana, P. N., Senarath, S., Meedeniya, D., & Jayarathna, S. (2022). Eye Gaze Estimation: A Survey on Deep Learning-Based Approaches. Expert Systems with Applications, 199, 116894.

{% embed url="https://doi.org/10.1016/j.eswa.2022.116894" %}

\[19] Park, S., De Mello, S., Molchanov, P., Iqbal, U., Hilliges, O., & Kautz, J. (2019). Few-Shot Adaptive Gaze Estimation. IEEE/CVF International Conference on Computer Vision (ICCV), 9367-9376.

{% embed url="https://doi.org/10.1109/ICCV.2019.00946" %}

\[20] Krafka, K., Khosla, A., Kellnhofer, P., Kannan, H., Bhandarkar, S., Matusik, W., & Torralba, A. (2016). Eye Tracking for Everyone. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2176-2184.

{% embed url="https://doi.org/10.1109/CVPR.2016.239" %}
