IndicSignAI

IndicSignAI: A Hybrid AI System for Assamese Text to Indian Sign Language Animation

Bridging Multilingual Accessibility through AI-Driven Translation, Speech Synthesis, and Sign Language Animation

🧭 Abstract

IndicSignAI presents a novel, end-to-end AI framework designed to facilitate inclusive communication by translating Assamese text into English speech and finally rendering Indian Sign Language (ISL) animations. This system is tailored for the Deaf and Hard of Hearing (DHH) community in India, enabling equitable access to information in multilingual environments.

Leveraging advancements in Natural Language Processing (NLP), Text-to-Speech (TTS) synthesis, rule-based glossification, and 3D sign avatar animation, IndicSignAI demonstrates modular, scalable, and real-time translation across sensory and linguistic modalities.

🔍 Problem Statement

India’s linguistic diversity and the underrepresentation of regional languages in AI-driven accessibility tools hinder effective communication for the DHH population. Existing ISL translation systems largely support Hindi or English, excluding speakers of low-resource languages like Assamese. This project addresses:

The absence of Assamese–ISL translation pipelines
Lack of standardized ISL grammar
Minimal digital representation of Indian regional languages

🧠 System Overview

The framework is structured as a four-stage AI pipeline:

Neural Machine Translation
- Tool: IndicTrans2
- Input: Assamese text → Output: English text
Text-to-Speech Synthesis
- Tools: Tacotron2, FastSpeech2
- Input: English text → Output: Natural-sounding English audio
ISL Gloss Generation
- Tools: Rule-based SVO restructuring, T5-based sequence models
- Input: English text → Output: ISL-compatible gloss
ISL Avatar Animation
- Tools: Blender, SignAvatar, OpenPose, MediaPipe
- Input: ISL gloss → Output: 3D sign language animation (.mp4)

🧩 Architectural Flow

[Assamese Text]
     ↓ IndicTrans2
[English Text]
     ↓ Tacotron2 / FastSpeech2 --------→ [English Audio]
     ↓ Glossifier (T5 / Rule-based)
[ISL Gloss]
     ↓ SignAvatar + Blender
[3D ISL Animation Output]

🧪 Experimental Setup

Hardware: NVIDIA RTX 3060 GPU, 32GB RAM
Software: Python 3.9, PyTorch, Transformers, Blender, Streamlit

✨ Evaluation Metrics

Module	Metric
Translation	BLEU: 35.4, METEOR
Speech Synthesis	MOS (Mean Opinion Score): 4.2
Gloss Generation	Jaccard Similarity: 0.82
Animation Accuracy	Human Expert Score: 87%
Latency	< 200 ms

📂 Project Structure

IndicSignAI/
├── models/             # Pretrained and fine-tuned model files
│   ├── translation/    # IndicTrans2
│   ├── tts/            # Tacotron2 / FastSpeech2
│   ├── glossifier/     # T5 model or rule scripts
│   └── avatar/         # Avatar rendering components
├── frontend/           # Streamlit UI
├── scripts/            # Batch jobs / training pipelines
├── data/               # Input/output samples
├── outputs/            # Audio, gloss, and animation files
├── requirements.txt    # Python dependencies
└── README.md           # This file

⚙️ Installation

git clone https://github.com/yourusername/IndicSignAI.git
cd IndicSignAI
pip install -r requirements.txt

Blender must be installed and accessible via command-line (blender --background) for animation rendering.

📌 Key Challenges

Scarcity of Assamese–English–ISL aligned datasets
ISL lacks formal grammatical structure for automation
Difficulty in capturing facial expressions & non-manual markers
Limited availability of expressive, animated sign avatars

🔮 Future Work

Expand gloss corpora via human annotation and crowdsourcing
Integrate facial expression synthesis for emotional nuance
Extend support to additional Indian languages (e.g., Bengali, Manipuri)
Build lightweight ONNX/TensorRT models for mobile deployment

📚 References

AI4Bharat, IndicTrans2: Multilingual NMT for Indian Languages
Shen et al., Tacotron2: Natural TTS with Spectrograms (ICASSP 2018)
Ren et al., FastSpeech2: High-Quality TTS (NeurIPS 2020)
OpenPose & MediaPipe for pose estimation
SignAvatar.ai, Blender-based avatar rendering
IIIT-D, ISLRTC, and NSL23 gloss datasets

Full reference list available in docs/references.md

🤝 Acknowledgments

AI4Bharat for IndicTrans2
ISLRTC & IIIT-Delhi for dataset resources
SignAvatar and open-source Blender animation tools

📄 License

This project is licensed under the MIT License – see the LICENSE file for details.

🙌 Contributing

We welcome contributions in the form of:

Dataset collection and annotation
Model improvement or benchmarking
UI/UX enhancement for the frontend

Fork the repository, create a branch, and submit a Pull Request.

📬 Contact

For academic or research collaborations, please contact us via GitHub Issues or Discussions.