Publications

You can also find my articles on my Google Scholar profile.

Journal

  • C. Xue, X. Zhong, M. Cai, H. Chen, and W. Wang, "Audio-visual event localization by learning spatial and semantic co-attention," IEEE Transactions on Multimedia (TMM), accepted, 2021. (Impact factor: 6.513)
  • H. Yu, M. Cai, Y. Liu, and F. Lu, "First- and third-person video co-analysis by learning spatial-temporal joint attention," IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), DOI:10.1109/TPAMI.2020.3030048, 2020. (Impact factor: 17.861)
  • Y. Huang, M. Cai, Z. Li, F. Lu, and Y. Sato, "Mutual context network for jointly estimating egocentric gaze and action," IEEE Transactions on Image Processing (TIP), DOI:10.1109/TIP.2020.3007841, 2020. (Impact factor: 6.79)
  • Y. Huang, M. Cai, and Y. Sato, "An ego-vision system for discovering human joint attention," IEEE Transactions on Human-Machine Systems (THMS), DOI:10.1109/THMS.2020.2965429, 2020. (Impact factor: 3.332)
    [project]
  • M. Cai, F. Lu, and Y. Gao, "Desktop action recognition from first-person point-of-view," IEEE Transactions on Cybernetics (TCYB), DOI:10.1109/TCYB.2018.2806381, 2018. (Impact factor: 8.803)
    [preprint]
  • M. Cai, K. Kitani, and Y. Sato, "An ego-vision system for hand grasp analysis," IEEE Transactions on Human-Machine Systems (THMS), vol. 47, no. 4, pp. 524–535, 2017. (Impact factor: 2.563)
    [project] [preprint]

International Conference

  • M. Cai, F. Lu, and Y. Sato, "Generalizing hand segmentation in egocentric videos with uncertainty-guided model adaptation," IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020. (acceptance rate: 22%)
    [project]
  • Z. Li, Y. Huang, M. Cai, and Y. Sato, "Manipulation-skill assessment from videos with spatial attention network," International Conference on Computer Vision Workshop (ICCVW), 2019.
    [arXiv preprint]
  • H. Yu, M. Cai, Y. Liu, and F. Lu, "What I see is what you see: joint attention learning for first and third person video co-analysis," ACM International Conference on Multimedia (ACM MM), 2019. (acceptance rate: 26.8%)
    [arXiv preprint]
  • Y. Huang, M. Cai, Z. Li, and Y. Sato, "Predicting gaze in egocentric videos by learning task-dependent attention transition," Proceedings of European Conference on Computer Vision (ECCV), Sep 2018. (oral presentation, acceptance rate: 2.4%)
    [project] [arXiv preprint]
  • Y. Huang, M. Cai, H. Kera, R. Yonetani, K. Higuchi, and Y. Sato, "Temporal localization and spatial segmentation of joint attention in multiple first-person videos," Proceedings of IEEE International Conference on Computer Vision Workshop (ICCVW), pp. 2313–2321, Oct 2017.
    [project] [paper] [poster]
  • M. Cai, K. Kitani, and Y. Sato, "Understanding hand-object manipulation with grasp types and object attributes," Proceedings of Robotics: Science and Systems (RSS), XII.034, pp. 1–10, June 2016. (acceptance rate: 20%)
    [project] [paper]
  • M. Cai, K. Kitani, and Y. Sato, "A scalable approach for understanding the visual structures of hand grasps," Proceedings of IEEE International Conference on Robotics and Automation (ICRA), pp. 1360–1366, May 2015.
    [project] [paper]

Domestic Conference

  • Z. Li, Y. Huang, M. Cai, and Y. Sato, "Pairwise performance assessment using deep ranking," Meeting on Image Recognition and Understanding (MIRU), extended abstract, Aug 2018.
  • Y. Huang, M. Cai, Z. Li, and Y. Sato, "Egocentric gaze prediction using task-dependent attention transition," Meeting on Image Recognition and Understanding (MIRU), extended abstract, Aug 2018.
  • Y. Huang, M. Cai, H. Kera, R. Yonetani, K. Higuchi, and Y. Sato, "Spatial-temporal segmentation of joint attention in multiple first-person videos," Meeting on Image Recognition and Understanding (MIRU), extended abstract, Aug 2017.
  • M. Cai, K.M. Kitani, and Y. Sato, "Hand skeleton pruning based on contour partition with fingertip detection," Meeting on Image Recognition and Understanding (MIRU), extended abstract, Nov 2014.

Technical Report

  • M. Cai, K.M. Kitani, and Y. Sato, "Studying mutual context of grasp types and object attributes in hand manipulation activities," IEICE technical report, vol. 116, no. 208, pp. 105–112, Sep 2016.
  • M. Cai, K.M. Kitani, and Y. Sato, "Discovering appearance-based grasp structures with wearable cameras," IEICE technical report, vol. 114, no. 351, pp. 49–54, Nov 2014.