Multimodal Machine Learning: Integrating Vision, Language, and Sensor Data for Intelligent Systems
Abstract
Multimodal Machine Learning (MML) integrates multiple data sources, such as text, images, audio, and sensor inputs, to build more capable intelligent systems. This paper reviews recent advances in multimodal learning, including transformer-based architectures and cross-modal fusion strategies, and evaluates applications in healthcare diagnostics, autonomous vehicles, and smart surveillance. Key challenges, including data alignment, model complexity, and computational cost, are discussed. The paper also highlights emerging trends in foundation models and unified AI systems capable of understanding diverse data modalities.
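The cross-modal fusion strategies surveyed here can be illustrated with a minimal attention-weighted late-fusion sketch. This is not the paper's method: the feature dimensions, the random NumPy matrices standing in for learned projections, and the single fusion query are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-modality feature vectors (dimensions are illustrative).
text_feat = rng.standard_normal(300)   # e.g. a sentence embedding
image_feat = rng.standard_normal(512)  # e.g. a pooled CNN/ViT feature
sensor_feat = rng.standard_normal(64)  # e.g. aggregated IMU readings

def project(x, out_dim=128, seed=0):
    """Linearly project one modality into a shared embedding space.
    A random matrix stands in for a learned projection layer."""
    w = np.random.default_rng(seed).standard_normal((out_dim, x.shape[0]))
    return w @ x / np.sqrt(x.shape[0])

# Map all three modalities into a common 128-d space.
z = np.stack([project(f, seed=i)
              for i, f in enumerate([text_feat, image_feat, sensor_feat])])

# Attention-style fusion: score each modality embedding against a
# (here random, in practice learned) query, softmax the scores, and
# take the weighted sum as the fused multimodal representation.
query = rng.standard_normal(128)
scores = z @ query / np.sqrt(128)
weights = np.exp(scores - scores.max())
weights /= weights.sum()
fused = weights @ z  # single 128-d fused vector

print(fused.shape)  # (128,)
```

Transformer-based fusion generalizes this idea: cross-attention layers compute many such query-weighted combinations per token, letting each modality attend to the others rather than being collapsed into one vector.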