Document management support system based on computer vision and augmented reality.
The objective of this Chair, established in partnership with the University of Castile-La Mancha, is to develop a document management system that supports workers with intellectual or sensory disabilities, using technologies such as augmented reality and computer vision.
The system identifies the document the user is working with, as well as their interaction with it. A computing system then analyzes this input and presents augmented multimodal information (visual and audio) about the work in question.
The proposed solution will use the following hardware components:
- USB camera: The system will use an inexpensive camera to provide input for the computer vision module.
- Projector: The system will use a projector to display visual information aligned directly on the real document. The system will respond to requests the user makes directly in the physical space, augmenting the relevant information for the required action.
- Computing system: The computing system will take the images obtained from the USB camera as input and generate output via the projector. This output will factor in the relative 3D position of the document and the projector so that the visual augmentation is precisely aligned. The document can be moved within an area of the desktop, but the augmentation must remain aligned with the physical space. The computing system will also generate relevant audio information for the document in question (e.g. voice synthesis and sound alerts), and display additional information on a screen.
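One common way to keep projected content glued to a moving document is a planar homography between camera coordinates and projector coordinates. The sketch below is illustrative only (all pixel coordinates are invented, and a real system would re-detect the document's corners every frame); it estimates the homography with the Direct Linear Transform and uses it to warp an anchor point into projector space.

```python
import numpy as np

def estimate_homography(src, dst):
    """Estimate the 3x3 homography mapping src points to dst points
    via the Direct Linear Transform (needs at least 4 correspondences)."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The homography is the null vector of A (last row of V^T).
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]

def project(H, point):
    """Map a 2D point through the homography (homogeneous divide)."""
    x, y, w = H @ np.array([point[0], point[1], 1.0])
    return x / w, y / w

# Camera pixels of the document's four corners (detected per frame)...
camera_corners = [(120, 80), (520, 95), (505, 410), (110, 395)]
# ...and the projector pixels that should land on those corners,
# found once during calibration (illustrative values).
projector_corners = [(0, 0), (800, 0), (800, 600), (0, 600)]

H = estimate_homography(camera_corners, projector_corners)
# Any overlay anchored in camera coordinates can now be warped into
# projector coordinates so it stays aligned with the moving document.
top_left = project(H, camera_corners[0])
```

In practice a library such as OpenCV provides robust estimators for this step; the point here is only the geometric relationship the computing system must maintain.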
In short, the ARgos project will support workplace inclusion for people with disabilities who need to manage printed documents.
These are the general objectives. The specific objectives and expected outcomes are as follows:
- Document identification system. ARgos will feature a rapid document identification system based on specific computer vision algorithms. From a 2D image, the system will correct the distortions arising from perspective projection (using the camera's extrinsic and intrinsic parameters) and compare the captured document with a database of documents known to the system.
- Interaction in the physical space. Users will be able to interact directly in the physical space by pointing a finger at the paper. The system will also support voice commands, eliminating the need to use a mouse or keyboard with the platform.
- Multimodal augmentation. ARgos will also feature several modes of augmenting real-world information. Visual information will be augmented using the projector, which will display context-related information directly on the paper. Meanwhile, the computing system will generate audio information related to the operation in progress (voice synthesis and sound alerts). The computing system's screen will also be able to display 3D information precisely aligned with the document space, as well as other sources of visual information.
- Inexpensive components. To support actual deployment in the workplace, ARgos needs to operate with inexpensive components, with distortion correction and 3D registration implemented entirely in software.
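The database-lookup part of document identification can be illustrated with a deliberately simplified stand-in. The real system would match robust local features (e.g. ORB or SIFT descriptors) after perspective correction; the difference-hash below, and the document names and image sizes, are invented for the example, but the principle of comparing a capture against known documents is the same.

```python
import numpy as np

def dhash(gray, size=8):
    """Difference hash: sample the image down to size x (size + 1)
    pixels and record which horizontal neighbour is brighter."""
    h, w = gray.shape
    ys = (np.arange(size) * h) // size
    xs = (np.arange(size + 1) * w) // (size + 1)
    small = gray[np.ix_(ys, xs)]
    return (small[:, 1:] > small[:, :-1]).ravel()

def identify(query, database):
    """Return the known document whose stored hash is nearest to the
    query image's hash (smallest Hamming distance)."""
    qh = dhash(query)
    return min(database, key=lambda name: int(np.sum(qh ^ database[name])))

# Build a toy database from two synthetic "scanned documents".
rng = np.random.default_rng(0)
doc_a = rng.integers(0, 256, size=(64, 64))
doc_b = rng.integers(0, 256, size=(64, 64))
database = {"invoice": dhash(doc_a), "timesheet": dhash(doc_b)}

# A noisy capture of doc_a should still resolve to "invoice".
noisy = np.clip(doc_a + rng.integers(-10, 11, size=doc_a.shape), 0, 255)
result = identify(noisy, database)
```

A hash comparison like this tolerates mild noise but not rotation or perspective, which is why the real pipeline corrects the projective distortion first.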
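The finger-pointing interaction could be prototyped with background subtraction against a stored image of the empty desk. This is a sketch under strong simplifying assumptions (a production system would segment and track the whole hand; the frame below is synthetic): it marks pixels that differ strongly from the background and takes the topmost one as the fingertip.

```python
import numpy as np

def detect_pointer(frame, background, threshold=40):
    """Locate the user's fingertip as the topmost pixel that differs
    strongly from the stored empty-desk background."""
    mask = np.abs(frame.astype(int) - background.astype(int)) > threshold
    ys, xs = np.nonzero(mask)           # row-major: ys is sorted ascending
    if ys.size == 0:
        return None                     # nothing on the desk
    top = ys.argmin()                   # first (topmost) foreground pixel
    return int(xs[top]), int(ys[top])

# Synthetic 120x160 grayscale frame with a finger-like bright streak.
background = np.zeros((120, 160), dtype=np.uint8)
frame = background.copy()
frame[60:120, 75:85] = 200              # the "finger"
frame[55:60, 78:82] = 200               # its tip, reaching up to row 55
print(detect_pointer(frame, background))  # -> (78, 55)
```

The returned camera-space coordinate would then be mapped into the document space (for example with the homography relating camera and projector) to decide which element of the document the user is pointing at.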