This month’s initiative centers on developing versatile linguistic tools and expanding our core Speech-to-Text (STT) capabilities. Our primary focus is the Mahina Toolkit (mahina-tk), a modular set of language utilities designed to work with both live speech and static text.
Mahina Toolkit (mahina-tk)
- Modular Design:
- Merges existing Core Tools into a portable, modular web app
- Supports both live speech (real-time STT) and static text analysis.
- Integrated Platform Development:
- Building the Mahina HUD visionOS app as a central platform for Natural Language Processing (NLP) toolsets and AR language overlay tech.
- Experimental integration of mixed reality features to enhance user interaction.
- Additional Developments:
- Deployment of advanced handwriting and text recognition tools, including OCR functionalities.
- Handwriting tools: implementing the deepseek-vl-1.3b model for character recognition (OCR).
- Availability:
- mahina-tk API will be available for commercial licensing, with free access offered to educational institutions.
Toolkit Use Cases
- Use Case 1: Real-Time Speech-to-Text (STT) Processing
- Feature: Transcribes live speech with transient word display, POS tagging, and integrated language tools.
- Use Case 2: Static Text Analysis
- Feature: Analyzes longer, static text with POS tagging and integrated language tools.
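To illustrate how one tagging module could serve both use cases, here is a minimal Python sketch. It is not the mahina-tk implementation: the names `TOY_LEXICON`, `tag_word`, `analyze_static`, and `analyze_live` are hypothetical, the lexicon is a toy stand-in for a real POS tagger, and the tag labels are borrowed from the Universal Dependencies set as an assumption. The "transient word display" is modeled as a fixed-size window of the most recent words.

```python
from collections import deque
from typing import Iterable, Iterator

# Hypothetical toy lexicon; a real tagger would be a trained model.
TOY_LEXICON = {
    "the": "DET",
    "cat": "NOUN",
    "sat": "VERB",
    "on": "ADP",
    "mat": "NOUN",
}


def tag_word(word: str) -> tuple[str, str]:
    """Return (word, tag); unknown words get the placeholder tag 'X'."""
    key = word.lower().strip(".,!?")
    return word, TOY_LEXICON.get(key, "X")


def analyze_static(text: str) -> list[tuple[str, str]]:
    """Static mode: tag every word of a complete text in one pass."""
    return [tag_word(w) for w in text.split()]


def analyze_live(words: Iterable[str], window: int = 3) -> Iterator[list[tuple[str, str]]]:
    """Live mode: yield the current display after each incoming word.

    Only the most recent `window` tagged words are kept, so older
    words drop off the display as new ones arrive.
    """
    recent = deque(maxlen=window)
    for word in words:
        recent.append(tag_word(word))
        yield list(recent)


if __name__ == "__main__":
    print(analyze_static("The cat sat on the mat."))
    for display in analyze_live(["the", "cat", "sat", "on"], window=2):
        print(display)
```

Both modes call the same `tag_word` function, which is the point of the modular design: only the way text arrives (all at once or word by word) differs.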
STT Multisub Support & Compatibility Updates
- Transliteration – Romanization:
- Expanded language support now includes Thai, Lao, Croatian, Greek, Lithuanian, Swedish, Arabic, and more.
- Transliteration Mode:
- Chinese (zh-TW, zh-CN): added partial pinyin support.
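As a rough illustration of what partial pinyin support could look like, the Python sketch below romanizes only the characters it knows and passes everything else through unchanged. This is one possible reading of "partial", not a description of the actual STT Multisub code; the `PINYIN` table and the `to_partial_pinyin` function are hypothetical, and the table is deliberately tiny.

```python
# Hypothetical, deliberately tiny table; real coverage needs a full dictionary.
PINYIN = {
    "你": "nǐ",
    "好": "hǎo",
    "中": "zhōng",
    "文": "wén",
}


def to_partial_pinyin(text: str) -> str:
    """Romanize known characters; leave unknown ones as they are."""
    pieces = []
    for ch in text:
        # Fall back to the original character when no reading is known.
        pieces.append(PINYIN.get(ch, ch))
    return " ".join(pieces)


if __name__ == "__main__":
    print(to_partial_pinyin("你好"))
    print(to_partial_pinyin("中文字"))
```

A per-character lookup like this ignores context, so characters with more than one reading (for example 好, which can also be read hào) would need word-level handling in a fuller implementation.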
Ongoing & Future Developments
- PencilKit Integration:
- Currently under construction, integrating with OCR tools and open-source language models such as deepseek-vl-1.3b-chat.
- Visual Environment Augmentation (VEA):
- In the early exploration phase with a focus on hardware compatibility.
- 3D mapping and other environment augmentation tools are being explored.
Stay updated here:
GitHub: github.com/phasatek