The underlying problem of speech recognition is straightforward to lay out: how does one create software that recognizes and interprets the variations, nuances, and evolution of natural language in a deterministic manner? These aspects of language arise from the multiplicity of contexts and meanings that words and groups of words can carry. This greatly increases the complexity of the problem for the computer, because deciphering context to ascertain the correct sense of a word or phrase requires a kind of innate knowledge and instinct that is very difficult to translate into a programming language. Beyond the interpretative problems posed by the fluid, ever-shifting nature of language, there are also technical difficulties in recognizing the words the user actually speaks, including the need for noise recognition and cancellation to ensure that the computer analyzes the right sounds.
Let’s start with the two problems concerning the actual recognition of the words spoken. Even in a noise-free environment, recognizing a user’s words can be difficult because of the variety of accents that exist. A method typically employed to account for this is a training session with the user, usually lasting half an hour to an hour. The purpose is mutual: the new user becomes accustomed to speaking to the machine, and the machine becomes accustomed to the user’s voice. To speed up this learning, a relational database management system (RDBMS) could keep track of accents by relating them to the variables in user-supplied data that most strongly determine an accent. These variables might include ethnicity, date of birth (accents can drift within a region over time), and where the user has lived and for how long. Improving a machine’s rate of learning will be of paramount importance if speech technology is to spread to other industries, such as fast-food drive-thrus, where we expect machines to recognize what we are saying more or less immediately and where a dedicated training session is not feasible.
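To make the RDBMS idea concrete, here is a minimal sketch in Python using SQLite. The table, column names, and accent profiles ("scots_1990s" and so on) are all hypothetical placeholders for whatever acoustic models a real system would store; the point is only that a new user's background variables can be used to look up the closest existing profile instead of starting from scratch.

```python
import sqlite3

# Hypothetical schema: map user background variables to a stored
# accent/acoustic model so a new user starts from a close match.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE accent_profiles (
        profile_id   INTEGER PRIMARY KEY,
        region       TEXT,     -- where the user grew up
        birth_decade INTEGER,  -- accents drift over time within a region
        model_name   TEXT      -- identifier of a stored acoustic model
    )
""")
conn.executemany(
    "INSERT INTO accent_profiles (region, birth_decade, model_name) "
    "VALUES (?, ?, ?)",
    [("Scotland", 1960, "scots_1960s"),
     ("Scotland", 1990, "scots_1990s"),
     ("Texas",    1990, "texan_1990s")],
)

def closest_profile(region, birth_year):
    """Pick the stored profile in the same region with the nearest decade."""
    row = conn.execute(
        """SELECT model_name FROM accent_profiles
           WHERE region = ?
           ORDER BY ABS(birth_decade - ?) LIMIT 1""",
        (region, birth_year // 10 * 10),
    ).fetchone()
    return row[0] if row else None

print(closest_profile("Scotland", 1994))  # scots_1990s
```

A production system would of course use many more variables and a proper similarity measure, but even this toy lookup shows how relational data could shorten or replace the training session.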
Noise is an important impediment to deal with, because a speech-recognition machine takes its input as-is and has no inherent way of knowing which of the incoming sounds are intentional. To facilitate accurate recognition, these programs could borrow a technique from noise-cancelling headphones: digital signal processing algorithms split the noise into its component sounds, and the system then artificially generates sound waves that interfere destructively with those components, negating the noise. The same technique could in principle apply here, except that once the incoming sound waves are separated via digital signal processing, the machine should isolate the ones originating from the user by cross-checking their properties against a built-in database of sounds the user has produced in the past. This will not solve every problem: processing multiple simultaneous sounds may still yield inaccurate recognition of user input, given the limited processing power of current speech-recognition machines and a possible lack of data on the user’s speech patterns and properties.
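The destructive-interference idea above can be illustrated with a toy example. This is not a production DSP pipeline (real systems work with estimated, imperfect noise and adaptive filters); it only demonstrates the core principle that adding an inverted copy of the noise to the mixture cancels it, leaving the user's signal behind. The signals and frequencies chosen are arbitrary.

```python
import math

SAMPLE_RATE = 8000

def tone(freq, n):
    """n samples of a unit-amplitude sine wave at freq Hz."""
    return [math.sin(2 * math.pi * freq * t / SAMPLE_RATE)
            for t in range(n)]

n = 160
user = tone(220, n)   # stand-in for the user's voice
hum = tone(60, n)     # stand-in for background mains hum
mixture = [u + h for u, h in zip(user, hum)]

# "Anti-noise": an inverted copy of the (here, perfectly known) noise.
anti_noise = [-h for h in hum]
cleaned = [m + a for m, a in zip(mixture, anti_noise)]

# With a perfect noise estimate, the residual is the user's signal alone.
residual_error = max(abs(c - u) for c, u in zip(cleaned, user))
print(residual_error < 1e-9)  # True
```

The hard part in practice is the step this sketch assumes away: estimating the noise component accurately, which is exactly where cross-checking against stored properties of the user's voice would come in.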
Finally, we come to the interpretative problems in speech-recognition software, the set of problems most pertinent to linguists. Compared with the problems discussed above, which can be addressed through technological means, interpreting context is orders of magnitude more difficult. Context has no systematic structure and evolves continuously over time, which makes it hard to track with a database alone. Although still a relatively young field, machine learning will be indispensable to the future growth of speech-recognition software. Machine-learning algorithms should, in theory, help the machine keep up to date with the contexts of current words and phrases, with accuracy improving progressively as more contexts are introduced. This would reduce the size of the database needed while simultaneously improving the efficacy of speech-recognition software.
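A minimal sketch of this "accuracy improves as more contexts are introduced" idea is a simple count-based word-sense model. Everything here is illustrative: the senses of "bank" and their context words are made up, not drawn from any real corpus, and real systems use far richer statistical models. The point is only that each new labelled context refines the model without the database itself growing in any structured way.

```python
from collections import Counter, defaultdict

# sense -> counts of words seen in that sense's contexts
sense_counts = defaultdict(Counter)

def learn(sense, context_words):
    """Record one observed context for a given sense of a word."""
    sense_counts[sense].update(context_words)

def guess_sense(context_words):
    """Pick the sense whose previously seen contexts overlap the query most."""
    def score(sense):
        return sum(sense_counts[sense][w] for w in context_words)
    return max(sense_counts, key=score)

learn("bank/finance", ["money", "loan", "deposit"])
learn("bank/river",   ["water", "fishing", "shore"])
learn("bank/finance", ["deposit", "interest"])

print(guess_sense(["deposit", "money"]))  # bank/finance
print(guess_sense(["fishing", "water"]))  # bank/river
```

Each call to `learn` makes subsequent guesses better informed, which is the essential behaviour the paragraph above describes, scaled down to a few lines.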
In summation, despite the rapid growth and relevance of speech technology, many obstacles remain, especially in speech-recognition software, before it becomes widely incorporated into industry and, eventually, our everyday lives. Better technology and methodologies for filtering and accurately perceiving user input, together with machine-learning algorithms that improve and expand the interpretative capabilities of speech-recognition software, are the pivotal steps that, if taken, will allow us to tap into the vast potential speech technology can provide us in the future.