DESCRIPTION

TLK is an open-source set of tools for Automatic Speech Recognition (ASR) developed at the Universitat Politècnica de València (UPV) by the transLectures-UPV Team. Among other functionalities, it provides parameter estimation for hidden Markov models (HMMs) and recognition (speech, text…).

The current stable version (1.3.1) includes the following main ASR functions:

  • Diagonal Gaussian mixture and Bernoulli mixture HMM acoustic models.

  • Feature extraction.

  • I/O of acoustic models.

  • Initialisation of acoustic models.

  • Parameter estimation for acoustic models, including the Baum-Welch and Viterbi algorithms.

  • Acoustic model adaptation: MLLR and CMLLR features.

  • Recognition using ARPA language models and self-generated acoustic models.

  • Viterbi alignment.

  • Incremental training of acoustic models.

  • Weighted interpolation of acoustic models.

  • Deep Neural Network (DNN) support in recognition.

  • Fast model loading via binary models.

And these additional usability features:

  • High-level tools to facilitate the segmentation, preprocessing, training and recognition of standard acoustic systems.

  • Tool to directly transcribe a media file using a pre-installed system.

  • Simple configuration files for training setup.

  • Compressed ZIP file support.

  • Internationalisation of tools and documentation.
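The Viterbi algorithm named in the ASR function list above (used for parameter estimation and for alignment) can be illustrated with a minimal, generic sketch. This is textbook Viterbi decoding over log-probabilities, not TLK's own implementation; the function name viterbi, its arguments and the toy two-state model are hypothetical and chosen only for illustration.

```python
import math

NEG_INF = float("-inf")


def viterbi(log_init, log_trans, log_emit):
    """Return (best_log_prob, best_state_path) for a discrete-time HMM.

    log_init[s]     : log P(state s at frame 0)
    log_trans[r][s] : log P(state s at frame t+1 | state r at frame t)
    log_emit[t][s]  : log P(observation at frame t | state s)
    """
    n_states = len(log_init)
    # delta[s]: best log score of any path ending in state s at the current frame
    delta = [log_init[s] + log_emit[0][s] for s in range(n_states)]
    backptr = []
    for t in range(1, len(log_emit)):
        new_delta, ptr = [], []
        for s in range(n_states):
            # Best predecessor of state s at frame t
            best_prev = max(range(n_states),
                            key=lambda r: delta[r] + log_trans[r][s])
            ptr.append(best_prev)
            new_delta.append(delta[best_prev] + log_trans[best_prev][s]
                             + log_emit[t][s])
        delta = new_delta
        backptr.append(ptr)
    # Backtrace from the best final state
    state = max(range(n_states), key=lambda s: delta[s])
    best = delta[state]
    path = [state]
    for ptr in reversed(backptr):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return best, path


# Toy left-to-right model (hypothetical numbers): two states, four frames.
log_init = [0.0, NEG_INF]
log_trans = [[math.log(0.5), math.log(0.5)],
             [NEG_INF, 0.0]]
emit = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]]
log_emit = [[math.log(p) for p in frame] for frame in emit]

best, path = viterbi(log_init, log_trans, log_emit)
print(path)  # [0, 0, 1, 1]
```

The returned path assigns one state to each frame, which is the frame-to-state alignment that a Viterbi alignment step produces. In a real system the emission scores would come from the acoustic model rather than from a fixed table.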

TUTORIALS

TLK includes three tutorials:

  1. A tutorial on the use of tLtranscribe, which can directly transcribe a media file using a pre-installed system. This tool is recommended for non-experts, as it does not require any kind of configuration.

  2. A tutorial that covers the whole automatic transcription process of real video lectures using TLK’s high-level scripts. This tutorial is self-contained (it comes with all the data required), and is designed so that it can be followed without previous expertise in automatic speech recognition.

  3. A more technical short tutorial on how to train a simple monophone model using TLK’s basic, low-level tools.

TOOLS

High-level scripts

tLtask-preprocess

prepares data for use with the TLK tools.

tLtask-train

trains an HMM acoustic model.

tLtask-recognise

transcribes a list of audio files.

tLtask-segment

segments the input media file into speech and non-speech segments and extracts the corresponding speech samples.

tLtranscribe

automatically transcribes a media file using a pre-installed system.

Low-level tools

tLalign

align samples and transcriptions.

tLbin2txt

convert TLK files from binary to text format and vice versa.

tLclassify

classify samples.

tLcmllr

compute a transformation matrix using the CMLLR technique.

tLcmllrfeas

transform samples using a CMLLR matrix.

tLextract

extract features from audio files.

tLhinit

initialise acoustic models from Viterbi alignments.

tLinit

initialise acoustic models from linear alignments.

tLlmformat

format language models for use with tLrecognise.

tLmkproto

compute an acoustic model prototype.

tLmllr

adapt acoustic models using the MLLR technique.

tLmumix

split mixture components in acoustic models.

tLrecognise

recognise samples.

tLtomix

transform an acoustic model to mixture form.

tLtrain

train acoustic models using maximum likelihood.

tLupdate

update acoustic models from count files.

tLmustd

standardise the feature vectors of the input sample list.

tLdnn-classify

generate samples corresponding to DNN estimation of senones.

tLdnn-adapt

adapt a deep neural network given a list of samples.

tLdnn-combine

combine the samples generated from two different DNNs.
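Two of the feature-level operations listed above, standardisation (tLmustd) and applying a CMLLR matrix to samples (tLcmllrfeas), can be sketched generically. The code below is an illustration under stated assumptions, not TLK code: the function names standardise and affine_transform are hypothetical, statistics are computed over all frames passed in using the population variance, and the CMLLR transform is taken in its usual form as an affine map x' = A x + b. How TLK actually computes these quantities, and how tLcmllr estimates the matrix, is not described here.

```python
import math


def standardise(frames):
    """Shift and scale each feature dimension to zero mean and unit variance."""
    n, dim = len(frames), len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    std = [math.sqrt(sum((f[d] - mean[d]) ** 2 for f in frames) / n)
           for d in range(dim)]
    # Assumption: a constant dimension is centred but left unscaled
    std = [s if s > 0.0 else 1.0 for s in std]
    return [[(f[d] - mean[d]) / std[d] for d in range(dim)] for f in frames]


def affine_transform(frames, A, b):
    """Apply x' = A x + b to every feature vector in frames."""
    return [[sum(A[i][j] * x[j] for j in range(len(x))) + b[i]
             for i in range(len(A))]
            for x in frames]


# Hypothetical two-dimensional features for two frames
frames = [[1.0, 10.0], [3.0, 10.0]]
normalised = standardise(frames)
print(normalised)  # [[-1.0, 0.0], [1.0, 0.0]]

# Hypothetical transform matrix and bias
A = [[2.0, 0.0], [0.0, 1.0]]
b = [0.5, -1.0]
print(affine_transform(normalised, A, b))  # [[-1.5, -1.0], [2.5, -1.0]]
```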

LIBRARY

The TLK programming library, libTLK, is designed to make it easy to write programs that use the core utilities of TLK. The library can be linked directly to any program, and almost all programs in TLK are built on it.

The most recent version of the library’s documentation will always be available at https://www.translectures.eu//doctools/libtL/index.html.

AUTHORS

The transLectures-UPV Team. For a full list of members see the AUTHORS file.

For any questions about the software or the manual, please send an email to <translectures-tlk@dsic.upv.es>.

Copyright (C) 2013, 2014 The transLectures-UPV Team. The software and this documentation are distributed under the terms of the Apache License, Version 2.0.