We are currently in the first century of the information age and the new era of information systems engineering. The Signals and Information Processing (SIP) Lab is offering exciting and challenging final-year projects to students who can demonstrate the required interest, motivation and passion. If you are interested in any of the projects on offer, please email Roberto Togneri at <roberto[@]ee[.]uwa[.]edu[.]au> or drop by Room 4.10 for an obligation-free discussion, additional information and even reading material to help you make a truly informed choice.
PLEASE NOTE: Most of the projects will require proficiency in the MATLAB programming language, and possibly basic shell scripting and/or command-line environments. Most students should be familiar with MATLAB and be able to learn the basics of shell scripting if required. This should take you no more than 1-2 weeks of practice, depending on your current computing skills. You may like to refer to my online documentation and tutorials on MATLAB and UNIX/Shell to help you with this. For projects emphasising real-time or embedded implementations, proficiency in C/C++ programming will be important. If you answered yes to all of the above, then you should be seriously considering post-graduate studies after you graduate in 2010. For more information please see:
Postgraduate Research in Signal and Information Processing
1. Audio-Visual Speaker Identification
Humans identify speakers based on a variety of attributes of the person, which include acoustic cues, visual appearance cues and behavioural characteristics (for example, gestures). In the past, machine implementations of person identification have focussed on single techniques relating to audio cues alone (speaker recognition), visual cues alone (face identification) or other biometrics. More recently, researchers have been attempting to combine multiple modalities for person identification (see http://ee.ucd.ie/mmsp/projects/biometrics.html). Audio-based speaker recognition accuracy under acoustically degraded conditions (such as background noise) and channel mismatch (different telephone handsets) still needs further improvement. Conversely, visual-based recognition is highly reliant on correct positioning of the camera and ambient lighting. So why not combine the two strategies such that one compensates for the degradations experienced by the other? In this project you will implement a basic audio-visual speaker recognition prototype using standard tools for face recognition and speaker recognition. You can do this by directly capturing audio-visual features of friends and family, by recording pertinent TV broadcasts (e.g. newsreader broadcasts) or by making use of available AV corpora. This is a highly challenging systems-level project, possibly involving advanced theory and algorithm development for the various fusion strategies that you may like to investigate.
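One of the simplest fusion strategies is score-level fusion, where each modality scores every enrolled speaker and the scores are combined. The sketch below is only an illustration under stated assumptions: the function names, the min-max normalisation and the weighted sum are choices made for this example, not the method prescribed for the project, and it is written in plain Python rather than MATLAB.

```python
def min_max_normalise(scores):
    """Rescale a {speaker: score} dict to the range [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores identical: no speaker is preferred by this modality.
        return {name: 0.5 for name in scores}
    return {name: (s - lo) / (hi - lo) for name, s in scores.items()}


def fuse_scores(audio_scores, visual_scores, audio_weight=0.5):
    """Weighted sum of normalised audio and visual scores (hypothetical helper).

    Both dicts are assumed to hold one score per enrolled speaker, with
    higher values meaning a better match.
    """
    audio = min_max_normalise(audio_scores)
    visual = min_max_normalise(visual_scores)
    return {name: audio_weight * audio[name] + (1.0 - audio_weight) * visual[name]
            for name in audio}


def identify(audio_scores, visual_scores, audio_weight=0.5):
    """Return the speaker with the highest fused score."""
    fused = fuse_scores(audio_scores, visual_scores, audio_weight)
    return max(fused, key=fused.get)
```

Lowering `audio_weight` when the audio is noisy, or raising it when the lighting is poor, is one way the two modalities could compensate for each other; how to set that weight automatically is left open here.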
2. Biomedical Signal Analysis of EEG waveforms
Are you interested in knowing how the brain works (see http://bci.tugraz.at/downloads.html)? Do you have a solid background or keen interest in signal analysis, modelling and neurophysiology? If so, then read on! In this project, highly motivated students will investigate one aspect of computer modelling of electroencephalogram (EEG) activity, especially as related to interactive evoked response potential (ERP) via Transcranial Magnetic Stimulation (TMS) or in Brain Computer Interface (BCI) applications. Interactive ERP is a new process in electrophysiology whereby stimuli are delivered in response to selected short-term patterns of the EEG. For example, you could try to model the effect of TMS, which has not yet been attempted, or attempt classification of EEG patterns in response to changes of mental state for BCI. You will be able to use the EEGLab software (http://sccn.ucsd.edu/eeglab) to analyse EEG data, use the GENESIS simulator (http://www.genesis-sim.org/GENESIS/whatisit.html) to investigate simulations of brain neural activity, and consult colleagues at the Centre for Clinical Research in Neuropsychiatry (CCRN) (http://www.health.wa.gov.au/ccrn/home).
3. Characterisation of Features for Mining Exploration by Capturing and Analysing Human Interpretation Behaviour
The mining industry uses multiple and incomplete datasets comprising geological, geochemical, geophysical and remote sensing data to target locations for mineral exploration. Such analysis is a largely subjective process, as each geoscientist's interpretation of the data is significantly governed by prior knowledge and experience, as clearly demonstrated by the outputs of human analysis of seismic data. Thus the choice of an exploration target within a mining company is a task that has to consider and reconcile varying individual opinions from a group of geoscientists. The aim of our study is to observe and analyse the image features (patterns) of interest by capturing the data interpretation behaviour of individuals using devices such as eye and head trackers, and EEG (electroencephalography), which captures the neurological response. The devices will allow us to identify the location in the data that a person is looking at and the level of his/her neurological response at that time. With this aim, we offer two separate projects for motivated students with a solid background in signal processing or pattern recognition, good hardware and software skills, and a keen interest in human behaviour and cognition and in working with one of the key industries of this state. One project will characterise EEG response patterns for geological features using signal processing methodologies comprising PCA, ICA and blind source separation; the other project will characterise image features of interest and examine the variations amongst individual data interpreters, based on data from an eye tracker and applying statistical and pattern recognition methods to model the responses. Both projects will be co-supervised by colleagues from the Centre for Exploration Targeting (http://www.cet.uwa.edu.au).
4. Auditory Speech Processing
The human auditory system allows us to listen to the subtlest sounds and yet cope with the loudest noises. We can understand one another in the presence of other speakers and other noises. However, when engineers attempt to get communications systems to code, transmit and recognise speech, it becomes apparent from the difficulties involved that there is a lot we don't understand about human audition. In this project you will explore different auditory software and paradigms (Google for "Auditory Toolbox", "Development System for Auditory Modelling" and "Auditory Image Model"), implement one or more of these as a frontend to a speech resynthesis, recognition or classification system, and evaluate the performance. This is an ideal project for the highly motivated student interested in biomedical computational processing with a solid background in signal processing, systems modelling and software use.
5. Robust Speech Processing using Spectral Subtraction and Voice-Activity Detection
In telecommunications (mobile telephony, VoIP) and speech recognition, it is important to transmit intelligible speech in the presence of interfering noise. Speech enhancement is an important goal for speech signal and telecommunications engineers, both to improve voice communications and as a frontend to speech recognisers (see http://www.isr.umd.edu/Labs/SCL/research.html). In this project you will contaminate samples of speech with additive noise and then attempt to use a combination of spectral subtraction (where you subtract an estimate of the noise spectrum from the noisy speech to yield the "clean" speech) with voice-activity detection (detecting when speech is not present so you know when you can estimate the noise spectrum!) for single-channel speech enhancement. So how much of the noise can you remove? How intelligible is the resulting signal? Can you improve the enhancement by tweaking the parameters? Do this project and find out!
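To make the interplay between the two components concrete, here is a minimal sketch of magnitude spectral subtraction driven by a simple energy-based voice-activity detector. Everything specific in it is an assumption made for illustration: the function names, the non-overlapping rectangular frames, the energy threshold `vad_ratio`, the smoothing factor and the spectral floor are not taken from the project description, and a naive DFT is used so that the example needs only the Python standard library. A practical system would use overlapping windowed frames and an FFT.

```python
import cmath
import math
import random


def dft(frame):
    """Naive O(N^2) discrete Fourier transform of a list of real samples."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse DFT, keeping only the real part of each output sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


def spectral_subtraction(noisy, frame_len=64, init_frames=2,
                         vad_ratio=2.0, smooth=0.9, floor=0.05):
    """Return (enhanced_samples, speech_flags), one flag per frame.

    Assumptions of this sketch: the first `init_frames` frames contain noise
    only, and a frame counts as speech when its energy exceeds `vad_ratio`
    times the running noise energy. Trailing samples that do not fill a
    whole frame are dropped.
    """
    noise_mag, noise_energy = None, None
    enhanced, speech_flags = [], []
    for i in range(len(noisy) // frame_len):
        frame = noisy[i * frame_len:(i + 1) * frame_len]
        spectrum = dft(frame)
        mag = [abs(c) for c in spectrum]
        energy = sum(s * s for s in frame) / frame_len

        # Voice-activity detection: compare frame energy with the noise energy.
        is_speech = (noise_mag is not None and i >= init_frames
                     and energy > vad_ratio * noise_energy)

        # Update the noise estimate only when no speech is detected.
        if not is_speech:
            if noise_mag is None:
                noise_mag, noise_energy = mag, energy
            else:
                noise_mag = [smooth * n + (1 - smooth) * m
                             for n, m in zip(noise_mag, mag)]
                noise_energy = smooth * noise_energy + (1 - smooth) * energy

        # Subtract the noise magnitude, keep a small spectral floor,
        # and reuse the phase of the noisy signal.
        clean_mag = [max(m - n, floor * m) for m, n in zip(mag, noise_mag)]
        clean_spec = [cmath.rect(cm, cmath.phase(c))
                      for cm, c in zip(clean_mag, spectrum)]
        enhanced.extend(idft(clean_spec))
        speech_flags.append(is_speech)
    return enhanced, speech_flags


if __name__ == "__main__":
    random.seed(0)
    n = 64 * 8
    # Synthetic "speech": silence for four frames, then a sinusoid.
    clean = [0.0] * (64 * 4) + [math.sin(2 * math.pi * 5 * t / 64) for t in range(64 * 4)]
    noisy = [s + random.gauss(0.0, 0.1) for s in clean]
    enhanced, flags = spectral_subtraction(noisy)
    print("speech detected per frame:", flags)
```

The parameters `vad_ratio`, `smooth` and `floor` are the kind of values the project invites you to tweak; the defaults here are arbitrary starting points, not recommended settings.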
6. Robust Voice Tagging for Inventory systems
An industry partner has a problem with voice tagging for hands-free inventories, which is just simple isolated word recognition: yes, no, 0-9. Sounds simple and easy enough? Nevertheless, the existing commercial systems are neither reliable enough nor fast enough. So is there a way to achieve 99% recognition accuracy with simple 0-9 digit recognition trained to a specific user and headset/pickup? Can a simple pattern recognition approach on the utterance spectrogram work (rather than a full-blown system using, say, the HTK software (see http://htk.eng.cam.ac.uk))? And how can you tell when a person is speaking to the system? Will it work in all situations (i.e. if the user is excited today, or just bored, so that the speech changes, will the system then fail)? Do this project and find out whether you can build a reliable, fast digit recogniser and trump the existing commercial systems!
7. Detecting the Presence of People in Images
From forensic analysis to medical diagnosis, the ability to correctly detect and identify objects of interest in images is paramount. In this project you will investigate the key surveillance problem of detecting the presence of people in an image (and, if possible, their location and the number of people present; see http://cvlab.epfl.ch/projects/cti/surv). You will do this by exploring a variety of techniques from image processing and pattern recognition: skin detection, feature selection, body component classification, etc. You can capture your own images of people in different environments (dark room, crowded mall, etc.) and see how well you can detect the location and number of people in the image. This is a challenging project for students interested in one of the strategic image and visualisation issues: object detection and identification.
8. Blind Extraction of a Target Source or Speaker
Imagine you are in a room with many people (sources) speaking. We humans can usually cope with this "cocktail party" problem and extract only the speech of the person we are interested in listening to. Being able to automate this process has obvious applications: interference-free voice transmission by extracting the dominant speaker (and ignoring all others), extraction of a biomedical signal of interest (e.g. EEG, ECG, etc.) given many neural sources, etc. These systems represent blind mixtures of competing sources. With blind extraction one can extract the signal of interest from the mixture. In this project you will investigate blind extraction of sources and carry out evaluations using simulated mixtures (e.g. two speakers, speaker and music, etc.). Then you listen to the result ... so how well have you extracted the speaker from the crowd? This is an ideal project for students with a good background in signal processing and MATLAB programming, and a strong interest in dabbling with speech and audio signal applications.
9. A New Way forward for Speaker Recognition
The SIP Lab has carried out investigations on adapting a new signal processing paradigm, Sparse Representation Classification (SRC), to classification for face recognition, and has since proposed an even simpler, yet equally powerful, paradigm, Linear Regression Classification (LRC), which has recently been published in a prestigious journal. But we can do more! Using the idea of Gaussian mean super-vectors we can now form a "template" of a speaker's voice (as we do with a face image), and we have had some preliminary success in applying SRC and LRC to speaker identification. In this challenging but rewarding project you will work with fellow researchers to further explore the latest innovations of the LRC in face recognition and apply these to speaker identification. You will work in the exciting area of biometric identification, explore novel methodologies, develop a deep understanding of face and speaker recognition, and stand a very good chance of a research publication.
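As a rough illustration of the idea behind LRC, the sketch below represents a test vector as a linear combination of each class's training vectors and picks the class with the smallest reconstruction error. It is an assumption-laden toy, not the lab's implementation: the function names are hypothetical, the vectors stand in for face images or speaker super-vectors, and the least-squares residual is computed by Gram-Schmidt projection onto each class subspace (which gives the same residual as solving the least-squares problem directly) so that only the Python standard library is needed.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def subspace_residual(y, training_vectors):
    """Distance from y to the span of one class's training vectors."""
    # Build an orthonormal basis of the class subspace (Gram-Schmidt).
    basis = []
    for v in training_vectors:
        w = list(v)
        for q in basis:
            c = _dot(w, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(_dot(w, w))
        if norm > 1e-12:  # skip vectors that are (numerically) dependent
            basis.append([wi / norm for wi in w])
    # Remove the component of y that lies in the subspace.
    r = list(y)
    for q in basis:
        c = _dot(r, q)
        r = [ri - c * qi for ri, qi in zip(r, q)]
    return math.sqrt(_dot(r, r))


def lrc_classify(y, gallery):
    """Return (best_label, residuals) for a {label: [vectors]} gallery."""
    residuals = {label: subspace_residual(y, vectors)
                 for label, vectors in gallery.items()}
    return min(residuals, key=residuals.get), residuals
```

For example, with a hypothetical gallery `{"A": [[1, 0, 0], [0, 1, 0]], "B": [[0, 0, 1], [0, 1, 1]]}`, the test vector `[2.0, 3.0, 0.1]` lies almost entirely in the subspace of class A and is therefore assigned to A.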
Last Updated: 31 July 2009