VisionForge AI started as an image understanding project, but the goal became more practical: help visually impaired users get clear guidance from a photo and a voice question.
Captioning alone was not enough. The output needed to be useful in real situations.
A normal caption might say “a beach at sunset” or “a room with furniture.” That can be accurate, but it is not always helpful for someone who needs to move safely. VisionForge AI focuses on the details that affect navigation: obstacles, surface conditions, nearby people, doors, stairs, water, vehicles, and the safest direction to move.
The assistant lets a user upload an image and ask a question such as “Can I walk forward safely?” or “What is in front of me?” The system sends the image and question to Gemini, speaks the accessibility-focused response back to the user, and stores the interaction in Google Datastore.
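A minimal sketch of how such an endpoint could look, assuming the official @google/generative-ai Node SDK, multer for file uploads, and a hypothetical route name /api/ask; the actual VisionForge AI code may differ:

```js
const express = require("express");
const multer = require("multer");
const { GoogleGenerativeAI } = require("@google/generative-ai");

const app = express();
const upload = multer({ storage: multer.memoryStorage() }); // keep the image in memory
const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY);

// Hypothetical endpoint: accepts a photo and a question, returns guidance text.
app.post("/api/ask", upload.single("image"), async (req, res) => {
  const model = genAI.getGenerativeModel({ model: "gemini-1.5-flash" });
  const prompt =
    "You are an accessibility assistant for a visually impaired user. " +
    "Describe obstacles, surface conditions, people, doors, stairs, water, " +
    "and vehicles, then give the safest direction to move. " +
    "Question: " + req.body.question;
  const result = await model.generateContent([
    prompt,
    {
      inlineData: {
        data: req.file.buffer.toString("base64"),
        mimeType: req.file.mimetype,
      },
    },
  ]);
  res.json({ answer: result.response.text() });
});

app.listen(process.env.PORT || 8080);
```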
The project is designed as a working web system, not just a static demo. It includes a Node.js and Express backend, cloud deployment on Google App Engine, Gemini image analysis, saved logs, and a mobile-friendly assistant page for quick use.
The main features focus on accessibility, safety, and saved results:
Users can upload a photo of their surroundings so the assistant can inspect the scene.
Users can ask practical questions about movement, obstacles, or what is nearby.
The response is shown as text and read aloud, with safety details and next-step guidance; a client-side sketch follows below.
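On the client side, the spoken reply can be produced with the browser's built-in Web Speech API, with no extra library. A sketch, assuming the hypothetical /api/ask endpoint above and a standard file input:

```js
// Hypothetical client code: upload the photo and question, then speak the answer.
async function askAssistant(fileInput, question) {
  const form = new FormData();
  form.append("image", fileInput.files[0]);
  form.append("question", question);

  const res = await fetch("/api/ask", { method: "POST", body: form });
  const { answer } = await res.json();

  // speechSynthesis is built into modern browsers.
  const utterance = new SpeechSynthesisUtterance(answer);
  utterance.rate = 0.9; // slightly slower speech for clarity
  window.speechSynthesis.speak(utterance);
  return answer;
}
```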
Upload Image → Ask Question → Gemini Analysis → Spoken Guidance → Save Log in Datastore → Review History
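The last two steps, saving the log and reviewing history, map naturally onto the @google-cloud/datastore client. A sketch, assuming a hypothetical Interaction kind and field names:

```js
const { Datastore } = require("@google-cloud/datastore");
const datastore = new Datastore();

// Save one interaction (question, answer, timestamp) as an Interaction entity.
async function saveLog(question, answer) {
  await datastore.save({
    key: datastore.key("Interaction"), // Datastore assigns the numeric ID
    data: { question, answer, createdAt: new Date() },
  });
}

// Review history: fetch the ten most recent interactions.
async function recentLogs() {
  const query = datastore
    .createQuery("Interaction")
    .order("createdAt", { descending: true })
    .limit(10);
  const [entities] = await datastore.runQuery(query);
  return entities;
}
```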
Built as a Web Systems project with a focus on practical AI and cloud deployment.
Full-Stack Development
Worked on the interface, backend integration, Google Cloud setup, and deployment workflow.
AI Feature Support
Supported the accessibility use case, prompt design, and response-quality testing.
Cloud and Data Support
Supported cloud database planning, interaction logging, and system documentation.