Pass audio file to text to speech engine?

RoadhammerGaming · Sep 6, 2017

Hello, in my app I want to record the user's speech, run it through a band-pass filter, and then pass the resulting audio file (PCM/WAV) to the text-to-speech engine so that it speaks the filtered results. I have everything working except that I cannot find a way to pass an audio file to the TTS engine. I have been googling this for two weeks with no luck. Is there any workaround for achieving this?
What I tried was calling the RecognizerIntent and then starting the band-pass filter via recording. I also tried the other way around, starting the band-pass method first and then calling the RecognizerIntent, but either way kills the TTS instance even though it's running on a separate thread. I have tested this with both the normal TTS procedure in the RecognizerIntent and the web search version of the RecognizerIntent, with the same results. If I don't implement the band-pass filter (note that a recording thread is started at this time), it works fine, but as soon as I implement the band-pass filter it fails, with a helpful message in web search mode that says "google is unavailable". Here's my current code:

RecognizerIntent, normal version:

Java:

    public void getMic() {//bring up the speak now message window
        tts = new TextToSpeech(this, new TextToSpeech.OnInitListener() {
            @Override
            public void onInit(int status) {
                if (status == TextToSpeech.SUCCESS) {
                    result = tts.setLanguage(Locale.US);
                    if (result == TextToSpeech.LANG_MISSING_DATA || result == TextToSpeech.LANG_NOT_SUPPORTED) {
                        l = new Intent();
                        l.setAction(TextToSpeech.Engine.ACTION_INSTALL_TTS_DATA);
                        startActivity(l);
                    }
                }
            }
        });
        k = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
        k.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL, RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
        k.putExtra(RecognizerIntent.EXTRA_LANGUAGE, Locale.getDefault());
        k.putExtra(RecognizerIntent.EXTRA_PROMPT, "Say something");
        try {
            startActivityForResult(k, 400);
        } catch (ActivityNotFoundException a) {
            Log.i("CrowdSpeech", "Your device doesn't support Speech Recognition");
        }
        if (crowdFilter && running == 4) {
            try {
                startRecording();
            } catch (FileNotFoundException e) {
                e.printStackTrace();
            }
        }
    }

RecognizerIntent, web search version:

Java:

    public void getWeb() {//Search the web from voice input
        k = new Intent(RecognizerIntent.ACTION_WEB_SEARCH);
        k.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL, RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
        k.putExtra(RecognizerIntent.EXTRA_LANGUAGE, Locale.getDefault());
        k.putExtra(RecognizerIntent.EXTRA_PROMPT, "Say something");
        try {
            startActivityForResult(k, 400);
        } catch (ActivityNotFoundException a) {
            Log.i("CrowdSpeech", "Your device doesn't support Speech Recognition");
        }
        if(crowdFilter && running==4){
            try {
                startRecording();
            } catch (FileNotFoundException e) {
                e.printStackTrace();
            }
        }
    }

And the startRecording method that applies the bandpass filter:

Java:

    private void startRecording() throws FileNotFoundException {

        if (running == 4) {//start recording from mic, apply bandpass filter and save as wave file using TARSOS library
            dispatcher = AudioDispatcherFactory.fromDefaultMicrophone(RECORDER_SAMPLERATE, bufferSize, 0);
            AudioProcessor p = new BandPass(freqChange, tollerance, RECORDER_SAMPLERATE);
            dispatcher.addAudioProcessor(p);
            isRecording = true;
            // Output
            File f=new File(myFilename.toString()+"/Filtered result.wav");
            RandomAccessFile outputFile = new RandomAccessFile(f, "rw");
            TarsosDSPAudioFormat outputFormat = new TarsosDSPAudioFormat(44100, 16, 1, true, true);
            WriterProcessor writer = new WriterProcessor(outputFormat, outputFile);
            dispatcher.addAudioProcessor(writer);
            recordingThread = new Thread(new Runnable() {
                @Override
                public void run() {
                    dispatcher.run();
                }
            }, "Crowd_Speech Thread");
            recordingThread.start();
        }
    }
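For readers without the TarsosDSP library at hand, the band-pass step described above can be sketched in plain Java. This is only an illustration, not TarsosDSP's `BandPass` implementation: it assumes a single biquad section with coefficients from the widely used RBJ "Audio EQ Cookbook" band-pass form, and the class and method names (`BandPassSketch`, `process`, `sine`, `rms`) are made up for this example.

```java
/** Minimal band-pass biquad sketch (RBJ cookbook form, unity gain at the center frequency). */
final class BandPassSketch {
    private final double b0, b2, a1, a2; // b1 is always 0 for this filter shape
    private double x1, x2, y1, y2;       // previous two inputs and outputs

    BandPassSketch(double centerHz, double bandwidthHz, double sampleRateHz) {
        double w0 = 2.0 * Math.PI * centerHz / sampleRateHz;
        double q = centerHz / bandwidthHz;       // approximate Q for the requested bandwidth
        double alpha = Math.sin(w0) / (2.0 * q);
        double a0 = 1.0 + alpha;                 // normalize all coefficients by a0
        b0 = alpha / a0;
        b2 = -alpha / a0;
        a1 = -2.0 * Math.cos(w0) / a0;
        a2 = (1.0 - alpha) / a0;
    }

    /** Filters 16-bit PCM samples in place, clamping to the short range. */
    void process(short[] pcm) {
        for (int i = 0; i < pcm.length; i++) {
            double x0 = pcm[i];
            double y0 = b0 * x0 + b2 * x2 - a1 * y1 - a2 * y2;
            x2 = x1; x1 = x0;
            y2 = y1; y1 = y0;
            long r = Math.round(y0);
            pcm[i] = (short) Math.max(Short.MIN_VALUE, Math.min(Short.MAX_VALUE, r));
        }
    }

    /** Generates a test tone with amplitude 10000. */
    static short[] sine(double freqHz, double sampleRateHz, int n) {
        short[] out = new short[n];
        for (int i = 0; i < n; i++) {
            out[i] = (short) Math.round(10000.0 * Math.sin(2.0 * Math.PI * freqHz * i / sampleRateHz));
        }
        return out;
    }

    /** Root-mean-square level of a PCM buffer. */
    static double rms(short[] pcm) {
        double sum = 0;
        for (short s : pcm) sum += (double) s * s;
        return Math.sqrt(sum / pcm.length);
    }

    public static void main(String[] args) {
        short[] inBand = sine(1000, 44100, 44100);
        short[] outOfBand = sine(5000, 44100, 44100);
        double inBefore = rms(inBand), outBefore = rms(outOfBand);
        new BandPassSketch(1000, 200, 44100).process(inBand);
        new BandPassSketch(1000, 200, 44100).process(outOfBand);
        System.out.println("1000 Hz kept:  " + rms(inBand) / inBefore);
        System.out.println("5000 Hz kept:  " + rms(outOfBand) / outBefore);
    }
}
```

A tone at the center frequency should pass nearly unchanged while a tone well outside the band is strongly attenuated. Each filter instance keeps state between calls, so one instance should be used per audio stream.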

The only reason I'm doing it this way is the hope that, with the filter applied, the TTS engine would receive the modified audio. The audio is also saved to a file because I originally wanted to just pass the file to TTS to read after recording. Is there any way to accomplish this?

Another thing I'm wondering: is there any way, inside my project, to modify the source code of the library that the RecognizerIntent references, so that I can add a parameter to get audio from a file?

EDIT: 9/8/17
Getting closer to an answer: I dug deeper and found that Google takes a FLAC file rather than a WAV file when translating speech into text, so I imported two new libraries, AndroidAudioConverter and FFmpegAndroid, via the app-level build.gradle:

Gradle:

dependencies {
    //other compilations

    compile 'com.github.adrielcafe:AndroidAudioConverter:0.0.8'
    compile 'com.writingminds:FFmpegAndroid:0.3.2'
}
repositories {
    maven {
        url "https://jitpack.io"
    }
}

I then used a googleResponse class I found online, along with another recognizer class, to convert the WAV file to a FLAC file and submit it to Google. Now I'm trying to find out how to get the response text and send it to be spoken. There is so much bouncing around, and so many unused or unneeded (in my app's case) methods in the recognizer class, that it is totally confusing me!
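As a rough illustration of the "get the response text" step, here is a minimal plain-Java sketch. It assumes the recognizer returns a JSON body containing a `"transcript":"..."` field; the sample body, the class name `TranscriptSketch`, and the method `firstTranscript` are all hypothetical, and the real response shape depends on the recognizer class actually used. On Android, a proper JSON parser would be the more robust choice than a regular expression.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TranscriptSketch {
    // Matches "transcript":"..." and captures the quoted value (escaped characters allowed).
    private static final Pattern TRANSCRIPT =
            Pattern.compile("\"transcript\"\\s*:\\s*\"((?:[^\"\\\\]|\\\\.)*)\"");

    /** Returns the first transcript value in the body (not unescaped), or null if none is found. */
    static String firstTranscript(String responseBody) {
        Matcher m = TRANSCRIPT.matcher(responseBody);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Hypothetical response body; the actual format is an assumption.
        String body = "{\"result\":[]}\n"
                + "{\"result\":[{\"alternative\":[{\"transcript\":\"hello world\","
                + "\"confidence\":0.97}],\"final\":true}],\"result_index\":0}";
        String text = firstTranscript(body);
        System.out.println(text); // prints hello world
        // On Android (API 21+), the text could then be spoken with an initialized engine, e.g.:
        // tts.speak(text, TextToSpeech.QUEUE_FLUSH, null, "utteranceId");
    }
}
```

The extracted string is plain text, so it can go to the TTS engine directly, without the filtered audio file ever being passed to it.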

tinkyID · Jun 9, 2020

Once, I wanted to transcribe an audio file of an interview. I tried to come up with similar Java code, like you, but I needed the text soon. The easiest way for me was to find someone who could do the task quickly and cheaply. Luckily, I found this website with reviews of transcription services, which helped me choose one. Thanks to them and their website, I chose a service that is good for the price and delivers good-quality transcription. You can take a quick look at their site; maybe you'll find something interesting about your problem. I have no regrets about their reviews or the completed work.
