A small tool to view real-world ActivityPub objects as JSON! Enter a URL
or username from Mastodon or a similar service below, and we'll send a
request with the right Accept header to the server to view the
underlying object.
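For readers who want to reproduce the request by hand, here is a minimal sketch in Python. The text does not say which Accept value this tool sends, so the `ACCEPT` constant below is an assumption: it lists the two media types that the ActivityPub specification names for retrieving objects. The function names `build_request` and `fetch_object` are hypothetical, and resolving a username to a URL is not covered.

```python
import json
import urllib.request

# Assumption: the exact header value used by the tool is not stated.
# These are the two media types named in the ActivityPub specification.
ACCEPT = (
    "application/activity+json, "
    'application/ld+json; profile="https://www.w3.org/ns/activitystreams"'
)


def build_request(url: str) -> urllib.request.Request:
    """Build a GET request that asks for the ActivityPub form of `url`."""
    return urllib.request.Request(url, headers={"Accept": ACCEPT})


def fetch_object(url: str, timeout: float = 10.0) -> dict:
    """Send the request and decode the response body as JSON."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        return json.load(resp)
```

Calling `fetch_object` with an object URL should return the kind of JSON document this tool displays, provided the server answers that Accept header with an ActivityPub representation.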
{
"@context": [
"https://join-lemmy.org/context.json",
"https://www.w3.org/ns/activitystreams"
],
"type": "Page",
"id": "https://lemmy.dbzer0.com/post/41533749",
"attributedTo": "https://lemmy.dbzer0.com/u/andrew0",
"to": [
"https://beehaw.org/c/foss",
"https://www.w3.org/ns/activitystreams#Public"
],
"name": "Open Source Text-to-Speech and Speech-to-Text on Android?",
"cc": [],
"content": "<p>Hello everyone! I am interested in replacing the Google <em>Speech Recognition and Synthesis</em> app on Android. For Speech-to-Text (STT), I’ve tried <a href=\"https://github.com/woheller69/whisperIME\" rel=\"nofollow\">Whisper</a> and <a href=\"https://gitlab.futo.org/keyboard/voiceinput\" rel=\"nofollow\">FUTO</a>, and settled on the latter because it seemed to be more versatile. Also, FUTO seems to have some decent recognition, but not yet capable of handling all the languages that I want. Regardless, so far happy with STT. The only annoyance I have is that it does not appear as an option in the settings for Speech recognition :(</p>\n<p>However, I can’t seem to find any replacements that have good Text-to-Speech (TTS) quality. I tried <a href=\"https://github.com/espeak-ng/espeak-ng\" rel=\"nofollow\">espeak-ng</a> and <a href=\"https://github.com/RHVoice/RHVoice\" rel=\"nofollow\">RHVoice</a>, but both have robotic outputs.</p>\n<p>Given the recent advancements in AI, I was expecting that there would be ways to incorporate open source TTS models like <a href=\"https://huggingface.co/onnx-community/Kokoro-82M-v1.0-ONNX\" rel=\"nofollow\">Kokoro</a> to generate speech on the go. Nevertheless, I could not really find any such apps so far.</p>\n<p>Has anyone managed to completely replace the Google app with (an)other privacy-focused FOSS app(s)?</p>\n",
"mediaType": "text/html",
"source": {
"content": "Hello everyone! I am interested in replacing the Google *Speech Recognition and Synthesis* app on Android. For Speech-to-Text (STT), I've tried [Whisper](https://github.com/woheller69/whisperIME) and [FUTO](https://gitlab.futo.org/keyboard/voiceinput), and settled on the latter because it seemed to be more versatile. Also, FUTO seems to have some decent recognition, but not yet capable of handling all the languages that I want. Regardless, so far happy with STT. The only annoyance I have is that it does not appear as an option in the settings for Speech recognition :(\n\nHowever, I can't seem to find any replacements that have good Text-to-Speech (TTS) quality. I tried [espeak-ng](https://github.com/espeak-ng/espeak-ng) and [RHVoice](https://github.com/RHVoice/RHVoice), but both have robotic outputs. \n\nGiven the recent advancements in AI, I was expecting that there would be ways to incorporate open source TTS models like [Kokoro](https://huggingface.co/onnx-community/Kokoro-82M-v1.0-ONNX) to generate speech on the go. Nevertheless, I could not really find any such apps so far. \n\nHas anyone managed to completely replace the Google app with (an)other privacy-focused FOSS app(s)?",
"mediaType": "text/markdown"
},
"attachment": [],
"sensitive": false,
"published": "2025-04-05T16:54:56.790983Z",
"updated": "2025-04-05T17:12:43.432618Z",
"audience": "https://beehaw.org/c/foss",
"tag": [
{
"href": "https://lemmy.dbzer0.com/post/41533749",
"name": "#foss",
"type": "Hashtag"
}
]
}
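
As a rough illustration of how an object like the one above could be read programmatically, the sketch below pulls out a few fields. The helper names `as_list` and `summarize` are hypothetical, and the `page` dictionary is a hand-trimmed copy of the object shown here, not a complete one. The sketch assumes `to` and `cc` may hold either a single value or a list, which ActivityStreams allows.

```python
PUBLIC = "https://www.w3.org/ns/activitystreams#Public"


def as_list(value) -> list:
    """Normalize a property that may be missing, a single value, or a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def summarize(obj: dict) -> dict:
    """Collect a few commonly used fields from an ActivityPub object."""
    recipients = as_list(obj.get("to")) + as_list(obj.get("cc"))
    return {
        "type": obj.get("type"),
        "title": obj.get("name"),
        "author": obj.get("attributedTo"),
        "is_public": PUBLIC in recipients,
        "hashtags": [
            tag.get("name")
            for tag in as_list(obj.get("tag"))
            if isinstance(tag, dict) and tag.get("type") == "Hashtag"
        ],
    }


# Trimmed copy of the object above, for illustration only.
page = {
    "type": "Page",
    "id": "https://lemmy.dbzer0.com/post/41533749",
    "attributedTo": "https://lemmy.dbzer0.com/u/andrew0",
    "to": ["https://beehaw.org/c/foss", PUBLIC],
    "name": "Open Source Text-to-Speech and Speech-to-Text on Android?",
    "cc": [],
    "tag": [{"href": "https://lemmy.dbzer0.com/post/41533749",
             "name": "#foss", "type": "Hashtag"}],
}

summary = summarize(page)
```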