- Independent Component Analysis (ICA)
There are now many different ICA techniques and algorithms. A summary of the most widely used algorithms can be found in books on ICA and the references therein.
In order to optimize performance, early ICA algorithms have gone through several recharacterizations by a number of different groups. One such change is the use of the "natural gradient" described by Amari, Cichocki and Yang. Other popular ICA algorithms include methods that compute higher-order statistics such as cumulants (Cardoso; Comon; Hyvärinen and Oja). It is emphasized that the methods mentioned so far are restricted to the separation of signals resulting from a linear stationary mixture of source signals.
The phenomenon resulting from the summing of direct-path signals and their echoic counterparts is termed reverberation, and it poses a major issue in artificial speech enhancement and recognition systems. Presently, ICA algorithms require long filters to separate such time-delayed and echoed signals, precluding effective real-time use. In one such prior-art system, a network of filters, acting as a neural network, serves to resolve individual signals from any number of mixed signals fed into the filter network. The direct filters W1 and W2 communicate for direct adjustments.
The cross filters are feedback filters that merge their respective filtered signals with signals filtered by the direct filters. After convergence of the ICA filters, the produced output signals U1 and U2 represent the separated signals. Torkkola suggests an ICA system that maximizes the entropy of the separated outputs but employs un-mixing filters instead of static coefficients as in Bell's patent. However, the calculations described in Torkkola for computing the joint entropy and adjusting the cross-filter weights are numerically unstable in the presence of input signals with time-varying energy, such as speech, and they introduce reverberation artifacts into the separated output signals.
The proposed filtering scheme therefore does not achieve stable and perceptually acceptable blind source separation of real-life speech signals. Many ICA implementations also require multiple rounds of feedback filtering and direct correlation of filters. As a result, it is difficult to accomplish ICA filtering of speech in real time, or to use a large number of microphones to separate a large number of mixed source signals. In the case of sources originating from spatially localized locations, the un-mixing filter coefficients can be computed with a reasonable number of filter taps and recording microphones.
However, if the source signals are distributed in space, like background noise originating from vibrations, wind noise or background conversation, the signals recorded at the microphone locations emanate from many different directions, requiring either very long and complicated filter structures or a very large number of microphones. The computational complexity of such a system should be compatible with the processing power of small consumer devices such as cell phones, personal digital assistants (PDAs), audio surveillance devices, radios, and the like.
The speech process operates on a device having at least two microphones, such as a wireless mobile phone, headset, or cell phone. At least two microphones are positioned on the housing of the device for receiving desired signals from a target, such as speech from a speaker. The microphones are positioned to receive the target user's speech, but also receive noise, speech from other sources, reverberations, echoes, and other undesirable acoustic signals. Both microphones thus receive audio signals that include the desired target speech and a mixture of other, undesired acoustic information.
The mixed signals from the microphones are processed using a modified ICA (independent component analysis) process. The speech process uses a predefined speech characteristic to assist in identifying the speech signal. In this way, the speech process generates a desired speech signal from the target user and a noise signal.
The noise signal may be used to further filter and process the desired speech signal. The two channels of input signals are filtered by the cross filters, which are preferably infinite impulse response filters with nonlinear bounded functions. The nonlinear bounded functions are nonlinear functions with pre-determined maximum and minimum values that can be computed quickly, for example a sign function that returns either a positive or a negative value based on the input value.
Following repeated feedback of signals, two channels of output signals are produced, with one channel containing substantially desired audio signals and the other channel containing substantially noise signals. Input signals, which are combinations of desired speech signals and noise signals, are received from at least two channels. An equal number of independent component analysis cross filters are employed. Signals from the first channel are filtered by the first cross filter and combined with signals from the second channel to form augmented signals on the second channel.
The augmented signals on the second channel are filtered by the second cross filter and combined with signals from the first channel to form augmented signals on the first channel. The augmented signals on the first channel can be further filtered by the first cross filter. The filtering and combining processes are repeated to reduce information redundancy between the two channels of signals. The produced two channels of output signals represent one channel of predominantly speech signals and one channel of predominantly non-speech signals.
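The iterative cross-filter scheme described above can be sketched as follows. This is a minimal illustration, not the patented algorithm: the filter length, learning rate, and sign-based update rule are assumed values chosen for readability.

```python
import numpy as np

def cross_filter_separate(x1, x2, taps=16, mu=1e-6):
    """Two-channel feedback cross-filter separation (illustrative sketch).

    w12 filters channel 1's output into channel 2 and vice versa; the
    sign() nonlinearity bounds the adaptation term so it stays cheap to
    compute. Filter length, learning rate and update rule are assumptions
    for illustration, not the patented algorithm.
    """
    n = len(x1)
    w12 = np.zeros(taps)   # cross filter: channel 1 output -> channel 2
    w21 = np.zeros(taps)   # cross filter: channel 2 output -> channel 1
    u1 = np.zeros(n)
    u2 = np.zeros(n)
    for t in range(taps, n):
        # each output is its input minus the cross-filtered recent
        # samples of the other channel's output (feedback structure)
        u1[t] = x1[t] - w21 @ u2[t - taps:t][::-1]
        u2[t] = x2[t] - w12 @ u1[t - taps:t][::-1]
        # bounded-nonlinearity weight update using the sign function
        w21 += mu * np.sign(u1[t]) * u2[t - taps:t][::-1]
        w12 += mu * np.sign(u2[t]) * u1[t - taps:t][::-1]
    return u1, u2
```

With a sufficiently small learning rate mu the feedback loop stays bounded; choosing mu too large reproduces exactly the instability discussed above.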
Additional speech enhancement methods, such as spectral subtraction, Wiener filtering, de-noising and speech feature extraction may be performed to further improve speech quality. In one stabilization example, the filter weight adaptation rule is designed in such a manner that the weight adaptation dynamics are in pace with the overall stability requirement of the feedback structure. Unlike previous approaches, the overall system performance is thus not solely directed towards the desired entropy maximization of separated outputs but considers stability constraints to meet a more realistic objective.
This objective is better described as a maximum likelihood principle under a stability constraint. These stability constraints in maximum likelihood estimation correspond to modeling temporal characteristics of the source signals. In entropy maximization approaches, signal sources are assumed to be i.i.d. (independent and identically distributed in time). However, real signals such as sounds and speech are not random in this sense: they have correlations in time and are smooth in frequency. Modeling these temporal characteristics leads to a correspondingly modified ICA filter-coefficient learning rule.
The scaling factor is determined from a recursive equation and is a function of the channel input energy. It is thus unrelated to the entropy maximization of the subsequent ICA filter operations. Furthermore the adaptive nature of the ICA filter structure implies that the separated output signals contain reverberation artifacts if filter coefficients are adjusted too fast or exhibit oscillating behavior. Thus the learned filter weights have to be smoothed in the time and frequency domains to avoid reverberation effects.
Since this smoothing operation slows down the filter learning process, this design aspect, aimed at enhanced speech intelligibility, has an additional stabilizing effect on the overall system performance. For example, an alternative embodiment of the present invention contemplates including voice activity detection and adaptive Wiener filtering, since these methods exploit solely temporal or spectral information about the processed signals and would thus complement the ICA filtering unit. Quantization effects typically result in deteriorated convergence performance and reduced overall system stability. They can be controlled by limiting the cross-filter lengths and by changing the original feedback structure so that the post-processed ICA output is instead fed back into the ICA filter structure.
It is emphasized that the down-scaling of input energy in a finite precision environment is necessary not only from a stability point of view, but also because of the finite range of computed numerical values. Although performance in finite precision environments is reliable and adjustable, the proposed speech processing scheme should preferably be implemented in a floating point precision environment. Finally, implementation under computational constraints is accomplished by appropriately choosing the filter length and tuning the filter coefficient update frequency.
Indeed, the computational complexity of the ICA filter structure is a direct function of these latter variables.

Detailed Description of the Preferred Embodiment

Preferred embodiments of a speech separation system are described below in connection with the drawings. In order to enable real-time processing with limited computing power, the system uses an improved ICA processing sub-module of cross filters with simple and easy-to-compute bounded functions.
Compared to conventional approaches, this simplified ICA method reduces the computing power requirement and successfully separates speech signals from non-speech signals. The system includes a speech enhancement module, an optional speech de-noising module, and an optional speech feature extraction module. The speech enhancement module includes an improved ICA processing sub-module and, optionally, a post-processing sub-module. The improved ICA processing sub-module uses simplified and improved ICA processing to achieve real-time speech separation with relatively low computing power.
In applications that do not require real-time speech separation, the improved ICA processing can further reduce the requirement on computing power. As used herein, the terms ICA and BSS are interchangeable and refer to methods for minimizing or maximizing the mathematical formulation of mutual information, directly or indirectly through approximations, including time- and frequency-domain decorrelation methods such as time-delay decorrelation or any other second- or higher-order statistics based decorrelation methods.
It is to be understood that multiple modules or systems can be combined into one module or system, and one module or system can be separated into multiple modules or systems that perform the same functions. In preferred embodiments directed to cell phone applications, the improved ICA processing sub-module, on its own or in combination with other modules, is embodied in a microprocessor chip located in a cell phone.
When implemented in software or other computer-executable instructions, the elements of the present invention are essentially the code segments to perform the necessary tasks, such as with routines, programs, objects, components, data structures, and the like. The program or code segments can be stored in a processor readable medium or transmitted by a computer data signal embodied in a carrier wave over a transmission medium or communication link.
The "processor readable medium" may include any medium that can store or transfer information, including volatile, nonvolatile, removable and non-removable media. The computer data signal may include any signal that can propagate over a transmission medium such as electronic network channels, optical fibers, air, electromagnetic, RF links, etc. The code segments may be downloaded via computer networks such as the Internet, Intranet, etc. In any case, the present invention should not be construed as limited by such embodiments.
The speech separation system may also include one or more speech recognition modules (not shown), described below.
As described below, the speech separation system is preferably incorporated into an electronic device that accepts speech input in order to control certain functions, or that otherwise requires separation of desired sounds from background noise. Many applications require enhancing or separating a clear desired sound from background sounds originating from multiple directions. Such applications include human-machine interfaces in electronic or computational devices incorporating capabilities such as voice recognition and detection, speech enhancement and separation, voice-activated control, and the like.
Due to the lower processing power required by the speech separation system of the invention, it is suitable for devices that provide only limited processing capabilities. Input signals X1 and X2 are received from the two channels, respectively. Typically, each of these signals would come from at least one microphone, but it will be appreciated that other sources may be used. Cross filters W1 and W2 are applied to the input signals to produce a channel of separated signals U1 and a channel of separated signals U2. The speech channel contains predominantly desired signals, and the noise channel contains predominantly noise signals.
It should be understood that although the terms "speech channel" and "noise channel" are used, the labels "speech" and "noise" are interchangeable depending on which signal is desired. In addition, the method can also be used to separate mixed noise signals from more than two sources. An infinite impulse response filter is a filter whose output signal is fed back into the filter as at least a part of an input signal.
A finite impulse response filter is a filter whose output signal is not fed back as input. In other forms, the cross filters can each have dozens, hundreds or thousands of filter coefficients. As described below, the output signals U1 and U2 can be further processed by a post-processing sub-module, a de-noising module or a speech feature extraction module. The gain margin for such a system is low in general, meaning that an increase in input gain, such as encountered with non-stationary speech signals, can lead to instability and therefore an exponential increase of the weight coefficients.
Since speech signals generally exhibit a sparse distribution with zero mean, the sign function will oscillate frequently in time and contribute to the unstable behavior. Finally, since a large learning parameter is desired for fast convergence, there is an inherent trade-off between stability and performance, because a large input gain will make the system more unstable. Extensive analytical and empirical studies have shown that if the learning rules for the filter coefficients are stable, the systems are stable in the BIBO (bounded input, bounded output) sense.
The final objective of the overall processing scheme will thus be blind source separation of noisy speech signals under stability constraints. There is a compromise between performance and stability: scaling the input down by the scaling factor reduces the SNR, which leads to diminished separation performance, so the input should only be scaled to the degree necessary to ensure stability. Additional stabilization can be achieved for the cross filters by running a filter architecture that accounts for short-term fluctuations in the weight coefficients at every sample, thereby avoiding the associated reverberation.
This adaptation rule can be viewed as time-domain smoothing. Further filter smoothing can be performed in the frequency domain to enforce coherence of the converged separating filter over neighboring frequency bins. This can be conveniently done by zero-padding the K-tap filter to length L, Fourier transforming this filter with increased time support, and then inverse transforming it.
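This frequency-domain smoothing step can be sketched as follows. The three-bin moving average over the spectrum is an illustrative choice of smoothing kernel, not taken from the source:

```python
import numpy as np

def smooth_filter_freq(w, L, width=3):
    """Zero-pad a K-tap separating filter to length L, smooth its spectrum
    over neighboring bins, and return the first K taps (illustrative)."""
    K = len(w)
    w_pad = np.concatenate([w, np.zeros(L - K)])    # zero-pad to length L
    W = np.fft.rfft(w_pad)                          # finer frequency sampling
    kernel = np.ones(width) / width
    W_smooth = np.convolve(W, kernel, mode="same")  # average neighboring bins
    return np.fft.irfft(W_smooth, n=L)[:K]          # back to K time-domain taps
```

Applied at regular intervals, this re-initializes the adapted coefficients to a solution that is coherent across neighboring frequency bins.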
Since the filter has effectively been windowed with a rectangular time-domain window, it is correspondingly smoothed by a sinc function in the frequency domain. This frequency-domain smoothing can be accomplished at regular time intervals to periodically reinitialize the adapted filter coefficients to a coherent solution. Preferably, f(x) is a nonlinear bounded function which quickly approaches the maximum or minimum value depending on the sign of the variable x. For example, a sign function f(x) is a function with binary values of 1 or -1 depending on whether x is positive or negative. Example nonlinear bounded functions include, but are not limited to, the sign function. Although floating point precision is preferred, fixed point arithmetic may be employed as well, particularly in devices with minimal computational processing capabilities.
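A few nonlinear bounded functions of the kind described can be sketched in Python; the hard-limit bound of ±1 in f_clip is an assumed value:

```python
import numpy as np

def f_sign(x):
    """Sign function: +1 for non-negative x, -1 for negative x."""
    return np.where(np.asarray(x) >= 0, 1.0, -1.0)

def f_clip(x, bound=1.0):
    """Hard limiter: clamps x to the range [-bound, bound]."""
    return np.clip(x, -bound, bound)

def f_tanh(x):
    """Smooth bounded alternative approaching +/-1 for large |x|."""
    return np.tanh(x)
```

All three have fixed maximum and minimum values; the sign function is the cheapest to compute, which matters on low-power devices.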
Notwithstanding the capability to employ fixed point arithmetic, convergence to the optimal ICA solution is more difficult. Indeed, the ICA algorithm is based on the principle that the interfering source has to be cancelled out. Because of certain inaccuracies of fixed point arithmetic (in situations when almost equal numbers are subtracted, or very different numbers are added), the ICA algorithm may show less than optimal convergence properties. Because of the limited filter coefficient resolution, adaptation of the filter coefficients yields only gradual additional separation improvement beyond a certain point, and this is thus a consideration in determining convergence.
The quantization error effect depends on a number of factors but is mainly a function of the filter length and the bit resolution used. The input scaling discussed previously is also necessary in finite precision computations, where it prevents numerical overflow. Because the convolutions involved in the filtering process could potentially add up to numbers larger than the available resolution range, the scaling factor has to ensure that the filter input is sufficiently small to prevent this. The number of audio input channels can be increased beyond the minimum of two channels.
As the number of input channels increases, speech separation quality may improve, generally to the point where the number of input channels equals the number of audio signal sources. For example, if the sources of the input audio signals include a speaker, a background speaker, a background music source, and a general background noise produced by distant road noise and wind noise, then a four-channel speech separation system will normally outperform a two-channel system.
Of course, as more input channels are used, more filters and more computing power are required. For example, in a cellular phone application, one channel may contain substantially desired speech signal, another channel may contain substantially noise signals from one noise source, and another channel may contain substantially audio signals from another noise source. For example, in a multiuser environment, one channel may include speech predominantly from one target user, while another channel may include speech predominantly from a different target user.
A third channel may include noise, and may be useful for further processing of the two speech channels. It will be appreciated that additional speech or target channels may be useful. For example, teleconference applications or audio surveillance applications may require separating the speech signals of multiple speakers from background noise and from each other.
The improved ICA process can be used not only to separate one source of speech signals from background noise, but also to separate one speaker's speech signals from another speaker's speech signals. Pre-processing and post-processing techniques which complement the methods and systems described herein will clearly enhance the performance of blind source separation applied to audio mixtures. For example, post-processing techniques can be used to improve the quality of the desired signal by utilizing the undesirable output or the unseparated inputs.
Similarly, pre-processing techniques or prior information can enhance the performance of blind source separation applied to audio mixtures by improving the conditioning of the mixing scenario, complementing the methods and systems described herein. It is quite possible that the speech channel contains an undesirable level of noise signals and that the noise channel still contains some speech signals.
For example, if there are more than two significant sound sources and only two microphones, or if the two microphones are located close together but the sound sources are located far apart, then improved ICA processing alone might not always adequately separate desired speech from noise. Further enhancement is achieved by feeding the separated ICA outputs through a single- or multi-channel speech enhancement algorithm. For example, a Wiener filter, with the noise spectrum estimated from non-speech time intervals detected by a voice activity detector, can be used to achieve a better SNR for signals degraded by background noise with long time support.
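A per-bin Wiener gain of this kind might look like the following sketch; the decision-directed refinements are omitted, and the noise PSD is assumed to come from VAD-flagged non-speech frames:

```python
import numpy as np

def wiener_gain(noisy_psd, noise_psd, eps=1e-12):
    """Per-frequency-bin Wiener gain G = SNR / (1 + SNR), with the a-priori
    SNR roughly estimated as (noisy PSD / noise PSD) - 1 (a simple sketch,
    not the exact estimator used in any particular system)."""
    snr = np.maximum(noisy_psd / np.maximum(noise_psd, eps) - 1.0, 0.0)
    return snr / (snr + 1.0)
```

Bins dominated by speech receive a gain near 1, while bins at the noise floor are attenuated toward 0.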
In addition, the bounded functions are only simplified approximations to the joint entropy calculations, and might not always reduce the signals' information redundancy completely. Therefore, after the signals are separated using improved ICA processing, post-processing may be performed to further improve the quality of the speech signals. Based on the reasonable assumption that the remaining noise signals in the speech channel have signal signatures similar to those of the noise signals in the noise channel, those signals in the desired speech channel whose signatures are similar to the signatures of the noise-channel signals should be filtered out in the post-processing unit.
For example, spectral subtraction techniques can be used to perform the post-processing. The signatures of the signals in the noise channel are identified. Compared to prior-art noise filters that rely on predetermined assumptions about noise characteristics, this post-processing is more flexible because it analyzes the noise signature of the particular environment and removes noise signals characteristic of that environment.
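The spectral subtraction step can be sketched as follows, with the noise signature taken as the average magnitude spectrum of the noise channel. The frame size, overlap and spectral floor are assumed values, not taken from the source:

```python
import numpy as np

def spectral_subtract(speech, noise_ref, n_fft=512, hop=256, floor=0.05):
    """Subtract the noise channel's average magnitude spectrum from each
    frame of the speech channel (illustrative sketch)."""
    win = np.hanning(n_fft)

    def frame_spectra(x):
        idx = range(0, len(x) - n_fft + 1, hop)
        return np.array([np.fft.rfft(x[i:i + n_fft] * win) for i in idx])

    S = frame_spectra(speech)
    noise_mag = np.abs(frame_spectra(noise_ref)).mean(axis=0)   # noise signature
    mag = np.maximum(np.abs(S) - noise_mag, floor * np.abs(S))  # subtract, keep floor
    S_clean = mag * np.exp(1j * np.angle(S))                    # reuse noisy phase
    # overlap-add resynthesis
    out = np.zeros(len(speech))
    for k, i in enumerate(range(0, len(speech) - n_fft + 1, hop)):
        out[i:i + n_fft] += np.fft.irfft(S_clean[k], n=n_fft) * win
    return out
```

The spectral floor prevents the subtraction from driving bins to zero, which would otherwise produce musical-noise artifacts.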
It is therefore less likely to be over-inclusive or under-inclusive in noise removal. Other filtering techniques, such as Wiener filtering and Kalman filtering, can also be used to perform post-processing.

ICA identifies spatial patterns of BOLD-signal changes across the entire brain and the entire experiment time course (McKeown et al.). As such, this method does not implement any a-priori assumptions as to the onsets or locations of task effects. Due to this model-free nature, ICA is especially useful for artefact detection, denoising of data and analysing resting-state data (Beckmann). For the same reason, it is also useful for the analysis of task-related data that consist of long and inhomogeneous task periods (Beckmann), as for example during motor sequence learning (Kincses et al.).
ICA can also be used to explore spatial network properties, such as the composition of the default mode network (Leech et al.). In this study, we explored the use of probabilistic ICA (Beckmann and Smith) on complex task-fMRI data derived from our version of the TAP, in which the opponent showed neutral and angry facial expressions.
We re-analysed data previously studied with a GLM, to test the validity of the approach by directly comparing ICA and GLM results and to identify additional information gained by the model-free analysis. In a first step, we identified task-related networks among all ICA components, i.e. those recruited during the aggressive interaction. We further investigated how these relate to the different task conditions. Finally, the GLM analyses had revealed a correlation between activity in the dorsal anterior cingulate cortex (dACC) and intra-subject variability in aggressive behaviour in angry trials (Beyer et al.).
Thus, we also related component activity to between-trial, within-subject variability in aggressive behaviour, to test whether the recruitment of a neural system including the dACC would be differentially modulated during trials in which participants chose to aggress or refrained from aggression. Overall, the aim of this study was to allow a direct comparison between GLM and data-driven analysis approaches. That is, one or more networks spanning mPFC and superior temporal gyrus should show increased activity in early task stages of angry trials.
Activity patterns observed for the outcome phase (increased activity in angry trials across areas in temporal, parietal and prefrontal cortex) should likely split up into subnetworks. A network including the dACC should be sensitive to punishment selections on a per-trial basis. Forty-one healthy male volunteers participated in the study. Upon arrival in the laboratory, participants received instructions together with their ostensible opponent. They then entered the scanner for the aggression task (see below). Afterwards, participants were fully debriefed and paid for their participation.
For the analysis presented here, one additional participant was excluded due to poor whole-brain coverage in the functional MRI data. All participants gave written informed consent and received 8 Euro per hour as compensation for their participation. A detailed description of the task setup can be found in Beyer et al. Briefly, participants were introduced to a confederate of the experimenter who they believed would be playing against them in a competitive reaction time task, the TAP.
The game is split into a decision phase, in which the participant selects a punishment level for his opponent; the reaction time task; and an outcome phase, in which the participant learns whether he won or lost, as well as the punishment level selected by the opponent. If the participant loses, he receives the punishment stimulus at the end of the outcome phase. We used an aversive noise as punishment, which could be adjusted in loudness. The noise was adjusted individually such that, with the scanner running a functional scan, participants could clearly perceive the different noise levels and judged the loudest noise as uncomfortable but non-painful.
At the beginning of each decision phase, we implemented video sequences showing the opponent during the punishment selection, bearing either a neutral or angry facial expression. Two-second video sequences of the opponent were presented at the beginning of each trial, before participants made their punishment selection.
In one-third of trials, the opponent showed an angry expression while making the punishment selection; in two-thirds of trials, he showed a neutral expression. Figure 1 shows (A) the outline of a single trial of the TAP, (B) the eigenspectrum analysis for the estimation of the optimal number of components, (C) the frequency analyses for the task time course and example component time courses, and (D) the task regressors for the early and late halves of the task, convolved with an HRF and overlaid on the boxcar function for the entire trial. Data pre-processing was performed using FSL 5. Movement-related components were automatically identified and removed from the data by means of spatial regression. The cleaned data were then high-pass filtered using FEAT.
The main steps of the analysis procedure are shown in Figure 1. To ensure comparability of components between subjects, we used the concatenation option to conduct a group-level ICA. Thus, all functional datasets were concatenated in time and a single ICA decomposition was generated for the entire dataset (Calhoun et al.). After inspection of the initial eigenspectrum analysis performed by MELODIC (Figure 1B), we fixed the number of components to be extracted at 30, a value at the upper limit of the optimal range estimation.
This method can result in task-negative component time courses, reflecting a relative deactivation of voxels weighted positively for the respective component. Thus, for each component, we obtained a single time course, with all trials from all participants concatenated. Each data point in these time courses corresponds to one MRI volume. For each component, we determined the frequency of maximal power. We then compared the peak frequency of the task time course to the peak frequencies of the component time courses.
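This spectral-matching comparison can be sketched as follows; the repetition time and matching tolerance are assumed parameters, not values from the study:

```python
import numpy as np

def peak_frequency(ts, tr):
    """Frequency (Hz) at which a time course (sampled every tr seconds)
    has maximal power, ignoring the zero-frequency (mean) bin."""
    power = np.abs(np.fft.rfft(ts - ts.mean())) ** 2
    freqs = np.fft.rfftfreq(len(ts), d=tr)
    return freqs[np.argmax(power[1:]) + 1]

def is_task_related(comp_ts, task_ts, tr, tol=0.005):
    """A component counts as task-related when its peak frequency matches
    the task time course's peak frequency (tol in Hz is an assumption)."""
    return abs(peak_frequency(comp_ts, tr) - peak_frequency(task_ts, tr)) <= tol
```

Because only the peak frequencies are compared, a component with the task's periodicity is flagged even when its signal is phase-shifted relative to trial onsets.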
We used this approach of spectral matching to explore, for each component, whether its signal time course was related to the task. Thus, a component time course could be identified as task-related even if it was not phase-locked to a specific event in the task or to the task onset itself. Sixteen components showed a maximum of frequency power at 0.
The remaining two components (components 12 and 13) showed a maximum of frequency power at 0. As shown in Figure 3, the signal time courses of these components showed two large peaks within a trial, during early and late task stages, and we therefore included them in the data analysis as well. For the 18 task-related components, we conducted time course analyses to test whether a component was differentially modulated during angry and neutral trials. We did not perform a dual regression analysis, but used the original component time courses, which are based on a concatenation of the subject-wise data, to extract subject-specific time courses.
The concatenated signal time course for each component was split up into subject-specific sections, thus resulting in 31 time-courses per spatial group component. Each subject-wise time course consisted of data points, corresponding to three runs of MRI volumes.
These boxcar functions were convolved with a hemodynamic response function and served as a reference for the visualization of component time courses. For each subject, time courses were split into trial-wise time courses of 15 volumes each. Time points at the beginning of each trial, prior to any plausibly expected hemodynamic modulation (time points 1 and 2), and later time points overlapping with the following trials (13, 14 and 15), were omitted from this analysis.
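The trial-wise epoching described above can be sketched as follows; the kept indices reflect our reading of the text (volumes 3 through 12 of each 15-volume trial):

```python
import numpy as np

def trialwise_epochs(ts, trial_len=15, keep=slice(2, 12)):
    """Split a subject's time course into 15-volume trials and drop the
    first two and last three time points of each trial (a sketch)."""
    n_trials = len(ts) // trial_len
    trials = np.asarray(ts)[:n_trials * trial_len].reshape(n_trials, trial_len)
    return trials[:, keep]
```

The result is a trials-by-time-points matrix per subject, from which condition-wise means can be computed directly.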
For each time point, we computed the subject-wise means for angry and neutral trials. We then compared angry against neutral trials across subjects. As the GLM-analysis showed a correlation of neural reactivity to angry facial expressions and aggressive behaviour within subjects, we additionally tested whether this finding could be replicated using the component time courses. Note that as this analysis is based on previous findings in the same datasets, significant results should not be understood as a replication of previous findings.
Rather, this analysis serves a better understanding of potential differences between the results gained by GLM-based and data-driven analysis approaches. Within participants, we standardized punishment selections across all trials. Thus, for each subject we obtained 72 correlation coefficients (18 components with 4 time points each). We conducted one-sample t-tests to compare these coefficients across subjects, using a P-value threshold of 0. As this analysis was conducted as a complement to the main analysis of comparing time courses between conditions, we did not additionally correct for the comparisons between conditions.
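The per-subject correlation analysis might look like the following sketch; the array shapes and the 0.05 threshold are assumptions based on our reading of the text:

```python
import numpy as np
from scipy import stats

def behaviour_correlations(comp_by_subject, punish_by_subject, alpha=0.05):
    """For each subject, correlate trial-wise component activity
    (trials x time points) with standardized punishment selections,
    then run one-sample t-tests on the coefficients across subjects."""
    coeffs = []
    for comp, punish in zip(comp_by_subject, punish_by_subject):
        z = (punish - punish.mean()) / punish.std()  # standardize within subject
        coeffs.append([np.corrcoef(comp[:, t], z)[0, 1]
                       for t in range(comp.shape[1])])
    coeffs = np.array(coeffs)                        # subjects x time points
    _, pvals = stats.ttest_1samp(coeffs, 0.0, axis=0)
    return coeffs, pvals < alpha
```

A time point is flagged when the correlation coefficients differ systematically from zero across subjects, uncorrected for the condition comparisons, as described above.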
As noted above, any positive findings from this analysis need to be interpreted with caution. Task-related components which were not differentially modulated during angry and neutral trials included components spanning mostly sensory and motor areas (Figure 2), and were thus likely related to task events such as stimulus presentation and button presses. For example, component 2 included mostly auditory cortex, and its activity peaked following task offset. This component was therefore likely related to the presentation of the punishment sound (Figure 3).
Other components represented activation of the visual system, with a time course corresponding to the presentation of visual stimuli in the decision and outcome phases (components 5 and 6), and of the motor system, corresponding to the button presses during the decision phase and reaction time task (component 9).

Component maps. Spatial maps are shown for task-related, condition-unspecific components, excluding those shown in Figure 3.

Condition-unspecific components. Shown are examples of task-related, condition-unspecific components, with their spatial maps on the left and the corresponding mean time courses on the right. Error bars denote standard errors of the mean.

During the late task phase, increased activity was observed in a network including bilateral inferior and middle frontal gyrus, anterior insula, the cingulate gyrus ranging from anterior to posterior areas, inferior parietal lobule and middle temporal gyrus (component 11; Figure 3).
Again, this component did not differentiate between conditions. For the outcome phase, this contrast showed increased activity in the left temporal pole, middle temporal gyrus and precentral gyrus, and the right inferior frontal gyrus, fusiform gyrus, superior parietal lobule and thalamus (Beyer et al.). Based on the following results, we propose that these activation patterns are related to the modulation of several distinct neural systems. We found components that specifically differentiate between task conditions in late task stages, as well as some which show significant differences across the entire task time course.
Overall, six components showed differential activity during neutral and angry trials, reflected in significant differences between the component time courses for angry and neutral trials. Component 4 showed strong overlap with effects observed using the GLM, for both the decision and outcome phases. As can be seen in Figure 4, it showed a decrease in activity during neutral trials, but was less modulated by angry trials. The difference between the angry and neutral regressors was significant for both early and late task stages (time points 4, 5, 7 and 8).