Publications

You can also find my articles on my Google Scholar profile.

Journal

M. Cai, J. Kezierbieke, X. Zhong, and H. Chen, "Uncertainty-aware and class-balanced domain adaptation for object detection in driving scenes," IEEE Transactions on Intelligent Transportation Systems (T-ITS), DOI:10.1109/TITS.2024.3413813, 2024. [paper]
G. Duan, H. Liu, M. Cai, J. Sun, and H. Chen, "MaDroid: A maliciousness-aware multifeatured dataset for detecting Android malware," Computers & Security, 2024. [paper]
G. Duan, Y. Fu, M. Cai, H. Chen, and J. Sun, "DongTing: a large-scale dataset for anomaly detection of the Linux kernel," Journal of Systems and Software, 2023. [paper]
C. Xue, X. Zhong, M. Cai, H. Chen, and W. Wang, "Audio-visual event localization by learning spatial and semantic co-attention," IEEE Transactions on Multimedia (TMM), DOI:10.1109/TMM.2021.3127029, 2021.
H. Yu, M. Cai, Y. Liu, and F. Lu, "First- and third-person video co-analysis by learning spatial-temporal joint attention," IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), DOI:10.1109/TPAMI.2020.3030048, 2020.
Y. Huang, M. Cai, Z. Li, F. Lu, and Y. Sato, "Mutual context network for jointly estimating egocentric gaze and action," IEEE Transactions on Image Processing (TIP), DOI:10.1109/TIP.2020.3007841, 2020.
Y. Huang, M. Cai, and Y. Sato, "An ego-vision system for discovering human joint attention," IEEE Transactions on Human-Machine Systems (THMS), DOI:10.1109/THMS.2020.2965429, 2020. [project]
M. Cai, F. Lu, and Y. Gao, "Desktop action recognition from first-person point-of-view," IEEE Transactions on Cybernetics (TCYB), DOI:10.1109/TCYB.2018.2806381, 2018. [preprint]
M. Cai, K. Kitani, and Y. Sato, "An ego-vision system for hand grasp analysis," IEEE Transactions on Human-Machine Systems (THMS), vol. 47, no. 4, pp. 524–535, 2017. [project] [preprint]

International Conference

H. Huang, H. Yu, D. Liu, H. Chen, and M. Cai, "Egocentric speaker diarization with vision-guided clustering and adaptive speech re-detection," IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025.
X. Ren, J. Luo, X. Zhong, and M. Cai, "Emotion-aware audio-driven face animation via contrastive feature disentanglement," INTERSPEECH, 2023. [paper]
Z. Liao, F. Xiong, J. Luo, M. Cai, E. S. Chng, J. Feng, and X. Zhong, "Blind estimation of room impulse response from monaural reverberant speech with segmental generative neural network," INTERSPEECH, 2023. [paper]
H. Jiang, J. Hu, D. Liu, J. Xiong, and M. Cai, "DriverSonar: Fine-grained dangerous driving detection using active sonar," Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (UbiComp), 2021. [paper]
M. Cai, F. Lu, and Y. Sato, "Generalizing hand segmentation in egocentric videos with uncertainty-guided model adaptation," IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020. (acceptance rate: 22%) [project]
Z. Li, Y. Huang, M. Cai, and Y. Sato, "Manipulation-skill assessment from videos with spatial attention network," International Conference on Computer Vision Workshop (ICCVW), 2019. [arXiv preprint]
H. Yu, M. Cai, Y. Liu, and F. Lu, "What I see is what you see: joint attention learning for first and third person video co-analysis," ACM International Conference on Multimedia (ACM MM), 2019. (acceptance rate: 26.8%) [arXiv preprint]
Y. Huang, M. Cai, Z. Li, and Y. Sato, "Predicting gaze in egocentric videos by learning task-dependent attention transition," Proceedings of European Conference on Computer Vision (ECCV), Sep 2018. (oral presentation, acceptance rate: 2.4%) [project] [arXiv preprint]
Y. Huang, M. Cai, H. Kera, R. Yonetani, K. Higuchi, and Y. Sato, "Temporal localization and spatial segmentation of joint attention in multiple first-person videos," Proceedings of IEEE International Conference on Computer Vision Workshop (ICCVW), pp. 2313-2321, Oct 2017. [project] [paper] [poster]
M. Cai, K. Kitani, and Y. Sato, "Understanding hand-object manipulation with grasp types and object attributes," Proceedings of Robotics: Science and Systems (RSS), XII.034, pp. 1-10, June 2016. (acceptance rate: 20%) [project] [paper]
M. Cai, K. Kitani, and Y. Sato, "A scalable approach for understanding the visual structures of hand grasps," Proceedings of IEEE International Conference on Robotics and Automation (ICRA), pp. 1360-1366, May 2015. [project] [paper]

Domestic Conference

Z. Li, Y. Huang, M. Cai, and Y. Sato, "Pairwise performance assessment using deep ranking," Meeting on Image Recognition and Understanding (MIRU), extended abstract, Aug 2018.
Y. Huang, M. Cai, Z. Li, and Y. Sato, "Egocentric gaze prediction using task-dependent attention transition," Meeting on Image Recognition and Understanding (MIRU), extended abstract, Aug 2018.
Y. Huang, M. Cai, H. Kera, R. Yonetani, K. Higuchi, and Y. Sato, "Spatial-temporal segmentation of joint attention in multiple first-person videos," Meeting on Image Recognition and Understanding (MIRU), extended abstract, Aug 2017.
M. Cai, K.M. Kitani, and Y. Sato, "Hand skeleton pruning based on contour partition with fingertip detection," Meeting on Image Recognition and Understanding (MIRU), extended abstract, Nov 2014.

Technical Report

M. Cai, K.M. Kitani, and Y. Sato, "Studying mutual context of grasp types and object attributes in hand manipulation activities," IEICE technical report, vol. 116, no. 208, pp. 105-112, Sep 2016.
M. Cai, K.M. Kitani, and Y. Sato, "Discovering appearance-based grasp structures with wearable cameras," IEICE technical report, vol. 114, no. 351, pp. 49-54, Nov 2014.

Minjie Cai (蔡敏捷)
