IndicSignAI

IndicSignAI: A Hybrid AI System for Assamese Text to Indian Sign Language Animation

Bridging Multilingual Accessibility through AI-Driven Translation, Speech Synthesis, and Sign Language Animation


๐Ÿงญ Abstract

IndicSignAI presents a novel, end-to-end AI framework designed to facilitate inclusive communication by translating Assamese text into English speech and finally rendering Indian Sign Language (ISL) animations. This system is tailored for the Deaf and Hard of Hearing (DHH) community in India, enabling equitable access to information in multilingual environments.

Leveraging advancements in Natural Language Processing (NLP), Text-to-Speech (TTS) synthesis, rule-based glossification, and 3D sign avatar animation, IndicSignAI demonstrates modular, scalable, and real-time translation across sensory and linguistic modalities.


๐Ÿ” Problem Statement

Indiaโ€™s linguistic diversity and the underrepresentation of regional languages in AI-driven accessibility tools hinder effective communication for the DHH population. Existing ISL translation systems largely support Hindi or English, excluding speakers of low-resource languages like Assamese. This project addresses:


๐Ÿง  System Overview

The framework is structured as a four-stage AI pipeline:

  1. Neural Machine Translation
    • Tool: IndicTrans2
    • Input: Assamese text โ†’ Output: English text
  2. Text-to-Speech Synthesis
    • Tools: Tacotron2, FastSpeech2
    • Input: English text โ†’ Output: Natural-sounding English audio
  3. ISL Gloss Generation
    • Tools: Rule-based SVO restructuring, T5-based sequence models
    • Input: English text โ†’ Output: ISL-compatible gloss
  4. ISL Avatar Animation
    • Tools: Blender, SignAvatar, OpenPose, MediaPipe
    • Input: ISL gloss โ†’ Output: 3D sign language animation (.mp4)

๐Ÿงฉ Architectural Flow

[Assamese Text]
     โ†“ IndicTrans2
[English Text]
     โ†“ Tacotron2 / FastSpeech2 --------โ†’ [English Audio]
     โ†“ Glossifier (T5 / Rule-based)
[ISL Gloss]
     โ†“ SignAvatar + Blender
[3D ISL Animation Output]

๐Ÿงช Experimental Setup

โœจ Evaluation Metrics

Module Metric
Translation BLEU: 35.4, METEOR
Speech Synthesis MOS (Mean Opinion Score): 4.2
Gloss Generation Jaccard Similarity: 0.82
Animation Accuracy Human Expert Score: 87%
Latency < 200 ms

๐Ÿ“‚ Project Structure

IndicSignAI/
โ”œโ”€โ”€ models/             # Pretrained and fine-tuned model files
โ”‚   โ”œโ”€โ”€ translation/    # IndicTrans2
โ”‚   โ”œโ”€โ”€ tts/            # Tacotron2 / FastSpeech2
โ”‚   โ”œโ”€โ”€ glossifier/     # T5 model or rule scripts
โ”‚   โ””โ”€โ”€ avatar/         # Avatar rendering components
โ”œโ”€โ”€ frontend/           # Streamlit UI
โ”œโ”€โ”€ scripts/            # Batch jobs / training pipelines
โ”œโ”€โ”€ data/               # Input/output samples
โ”œโ”€โ”€ outputs/            # Audio, gloss, and animation files
โ”œโ”€โ”€ requirements.txt    # Python dependencies
โ””โ”€โ”€ README.md           # This file

โš™๏ธ Installation

git clone https://github.com/yourusername/IndicSignAI.git
cd IndicSignAI
pip install -r requirements.txt

Blender must be installed and accessible via command-line (blender --background) for animation rendering.


๐Ÿ“Œ Key Challenges


๐Ÿ”ฎ Future Work


๐Ÿ“š References

  1. AI4Bharat, IndicTrans2: Multilingual NMT for Indian Languages
  2. Shen et al., Tacotron2: Natural TTS with Spectrograms (ICASSP 2018)
  3. Ren et al., FastSpeech2: High-Quality TTS (NeurIPS 2020)
  4. OpenPose & MediaPipe for pose estimation
  5. SignAvatar.ai, Blender-based avatar rendering
  6. IIIT-D, ISLRTC, and NSL23 gloss datasets

Full reference list available in docs/references.md


๐Ÿค Acknowledgments


๐Ÿ“„ License

This project is licensed under the MIT License โ€“ see the LICENSE file for details.


๐Ÿ™Œ Contributing

We welcome contributions in the form of:

Fork the repository, create a branch, and submit a Pull Request.


๐Ÿ“ฌ Contact

For academic or research collaborations, please contact us via GitHub Issues or Discussions.