Think about the number of electronic appliances you encounter on a daily basis – using a microwave, adjusting the thermostat in your home, or buying a snack from the office vending machine. Now, imagine trying to operate any of these if you were blind. Impossible? Anhong Guo, a Ph.D. student in the Human-Computer Interaction Institute (HCII), is making these everyday tasks possible for blind users with the development of an interactive screen reader called VizLens.
VizLens is a first-of-its-kind mobile application for iOS that combines on-demand crowdsourcing with real-time computer vision, allowing blind users to interactively explore physical interfaces like those found on electronic appliances.
Although previous systems have combined these same concepts, they are generally limited to specific tasks and interfaces. This is where VizLens differs: it can quickly adapt to new interfaces through crowdsourcing, while using computer vision to give real-time feedback about where a blind user's finger is on an interface.
How it Works
The concept behind VizLens is fairly straightforward: users take a smartphone photo of an interface, assign it a name (e.g., microwave), and send the image to crowd workers for labeling. The workers first determine whether the image presents a clear view of the interface; if not, the user is prompted to take another photo. Once an appropriate “reference image” is captured, the crowd works together to number and label the interface’s buttons and controls. The labeled image is then stored on a server for when the user wants to operate the device.
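The crowd-labeled reference image described above can be thought of as a lookup table of button regions. Here is a minimal sketch in Python; the field names and the `button_at` helper are hypothetical illustrations, not the actual VizLens data format:

```python
# Hypothetical representation of a crowd-labeled reference image.
# Each button stores its label and a bounding box in reference-image
# pixel coordinates: (x, y, width, height).
reference = {
    "name": "microwave",
    "buttons": [
        {"label": "Start",      "box": (40, 200, 60, 30)},
        {"label": "Stop/Clear", "box": (40, 240, 60, 30)},
        {"label": "Popcorn",    "box": (110, 200, 60, 30)},
    ],
}

def button_at(ref, x, y):
    """Return the label of the button containing point (x, y), if any."""
    for button in ref["buttons"]:
        bx, by, bw, bh = button["box"]
        if bx <= x < bx + bw and by <= y < by + bh:
            return button["label"]
    return None
```

With a table like this on the server, answering "what is under this point?" reduces to a simple region lookup once the user's finger position has been mapped into reference-image coordinates.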
To operate the device, the user simply opens the VizLens application, points it at the interface, and holds a finger over the control they want to identify. Computer vision matches the previously labeled reference image to the smartphone camera feed in real time, detects which control the user is pointing to, and VizLens “speaks” its label back to the user through the VoiceOver screen reader in iOS. According to Guo, “We wanted to use the smartphone camera as sort of a replacement or an augmentation of the user’s eyes.”
After conducting user studies and gathering participant feedback, VizLens v2 was developed, which focuses on improvements such as helping users take better reference images and adapting how the system responds to changing or dynamic interfaces. According to HCII faculty member Jeff Bigham, who advised the project, one of the biggest challenges was engineering the app so that it would respond to users quickly and reliably enough.
Guo hopes that someday VizLens will be deployed in the real world to help blind users in their daily lives. “We want to study how the system collects data and improves itself over time. It can be used as a teaching tool for new users.” For example, the system could collect data about how users operate different appliances, then suggest a sequence of buttons to new users so that they can use the appliance more efficiently. Bigham notes that they would like to work with manufacturers of appliances to help VizLens work better, or work right out of the box, without as much reliance on the crowd.
VizLens is changing how blind people interact with the world, reducing the need for sighted assistance or tactile markers such as Braille labels, which are impractical on public devices or dynamic interfaces. It promises a future of greater independence and easier adaptability for the blind, things most of us take for granted.
Guo will be presenting his paper on VizLens at UIST, October 16-19 in Tokyo, Japan.