IEEE ICASSP 2020 Virtual Conference May 2020 | IEEETV

Thu, 16 July, 2020

IEEE MemberUS $11.00
Society MemberUS $0.00
IEEE Student MemberUS $11.00
Non-IEEE MemberUS $15.00

Acoustic Scene Classification For Mismatched Recording Devices Using Heated-Up Softmax And Spectrum Correction

[2 Videos]

00:14:28

0 views

Deep neural networks (DNNs) are successful in applications with matching inference and training distributions. In real-world scenarios, DNNs have to cope with truly new data samples during inference, potentially coming from a shifted data distribution. Th

Multiple Points Input For Convolutional Neural Networks In Replay Attack Detection

00:14:39

0 views

The models based on convolutional neural network (CNN) have shown remarkable performance in spoofing detection for automatic speaker verification. In order to input data into CNN-based models in mini-batch unit, the shape of all data in each mini-batch mu

Improving Auditory Attention Decoding Performance Of Linear And Non-Linear Methods Using State-Space Model

00:14:21

680 views

Identifying the target speaker in hearing aid applications is crucial to improve speech understanding. Recent advances in electroencephalography (EEG) have shown that it is possible to identify the target speaker from single-trial EEG recordings using aud

Meta-Learning To Communicate: Fast End-To-End Training For Fading Channels

00:16:19

0 views

When a channel model is available, learning how to communicate on fading noisy channels can be formulated as the (unsupervised) training of an autoencoder consisting of the cascade of encoder, channel, and decoder. An important limitation of the approach

Lipreading Using Temporal Convolutional Networks

00:13:49

0 views

Lip-reading has attracted a lot of research attention lately thanks to advances in deep learning. The current state-of-the-art model for recognition of isolated words in-the-wild consists of a residual network and Bidirectional Gated Recurrent Unit (BGRU)

Improving Cross-Dataset Performance Of Face Presentation Attack Detection Systems Using Face Recognition Datasets

00:14:36

848 views

Presentation attack detection (PAD) is now considered critically important for any face-recognition (FR) based access-control system. Current deep-learning based PAD systems show excellent performance when they are tested in intra-dataset scenarios. Under

Orthogonal Training For Text-Independent Speaker Verification

00:13:26

0 views

In this paper we propose orthogonal training schemes to improve the effectiveness of cosine similarity measurements in text-independent speaker verification (SV) tasks. Compared to the PLDA backend, cosine similarity is simple to compute, and it does not

Multi-Layer Content Interaction Through Quaternion Product For Visual Question Answering

00:03:39

0 views

Multi-modality fusion technologies have greatly improved the performance of neural network-based Video Description/Caption, Visual Question Answering (VQA) and Audio Visual Scene-aware Dialog (AVSD) over the recent years. Most previous approaches only exp

On The Choice Of Graph Neural Network Architectures

00:13:16

1 view

Seminal works on graph neural networks have primarily targeted semi-supervised node classification problems with few observed labels and high-dimensional signals. With the development of graph networks, this setup has become a de facto benchmark for a sig

Griffin–Lim Like Phase Recovery Via Alternating Direction Method Of Multipliers

00:12:58

0 views

Recovering a signal from its amplitude spectrogram, or phase recovery, exhibits many applications in acoustic signal processing. When only an amplitude spectrogram is available and no explicit information is given for the phases, the Griffin-Lim algorithm

Bilevel Optimization Using Stationary Point Of Lower-Level Objective Function

00:14:56

590 views

In this letter, we address an audio signal separation problem and propose a new effective algorithm for solving a bilevel optimization in discriminative nonnegative matrix factorization (NMF). Recently, discriminative training of NMF bases has been develo

DeepJSCC: The Future Of Wireless Video Transmission

00:08:40

717 views

We propose a demonstration of a joint source-channel coding (JSCC) scheme, called DeepJSCC, for wireless video transmission. Unlike conventional digital communication systems, which rely on separate source and channel coding, DeepJSCC is a purely data-dri

A Random Gossip BMUF Process For Neural Language Modeling

00:14:10

0 views

Neural network language model (NNLM) is an essential component of industrial ASR systems. One important challenge of training an NNLM is to leverage between scaling the learning process and handling big data. Conventional approaches such as block momentum

Consistency-Aware Multi-Channel Speech Enhancement Using Deep Neural Networks

00:13:54

0 views

This paper proposes a deep neural network (DNN)-based multi-channel speech enhancement system in which a DNN is trained to maximize the quality of the enhanced time-domain signal. DNN-based multi-channel speech enhancement is often conducted in the time-

Unsupervised Training For Deep Speech Source Separation With Kullback-Leibler Divergence Based Probabilistic Loss Function

00:14:19

0 views

In this paper, we propose a multi-channel speech source separation method with a deep neural network (DNN) which is trained under the condition that no clean signal is available. As an alternative to a clean signal, the proposed method adopts an estimated

A Deep Gradient Boosting Network For Optic Disc And Cup Segmentation

00:12:24

0 views

Segmentation of the optic disc (OD) and optic cup (OC) is critical in automated fundus image analysis systems. Existing state-of-the-art methods focus on designing deep neural networks with one or multiple dense prediction branches. Such designs ignore connect

Feature Selection Under Orthogonal Regression With Redundancy Minimizing

00:14:51

0 views

Various supervised embedded methods have been proposed to select discriminative features from original ones, such as Feature Selection with Orthogonal Regression (FSOR) and Robust Feature Selection. Compared with embedded methods based on the least square

Audio Codec Enhancement With Generative Adversarial Networks

00:15:26

0 views

Audio codecs are typically transform-domain based and efficiently code stationary audio signals, but they struggle with speech and signals containing dense transient events such as applause. Specifically, with these two classes of signals as examples, we

Differentiable Branching In Deep Networks For Fast Inference

00:09:45

0 views

In this paper, we consider the design of deep neural networks augmented with multiple auxiliary classifiers departing from the main (backbone) network. These classifiers can be used to perform early-exit from the network at various layers, making them con

Cross Image Cubic Interpolator For Spatially Varying Exposures

00:07:03

0 views

Spatially varying exposures via rolling shutter is an efficient way to capture differently exposed images for high dynamic range (HDR) scenes. Neither camera movement nor moving objects is an issue for such a capture method. However, a possible issue is

Beyond The DCASE 2017 Challenge On Rare Sound Event Detection: A Proposal For A More Realistic Training And Test Framework

00:13:15

0 views

There are many ways to evaluate rare sound event detection (SED) approaches, e.g., the DCASE 2017 challenge provides a widely employed framework. This paper proposes a rare SED training and test framework, which is reflecting an SED application in a more

Lie Group State Estimation Via Optimal Transport

00:12:52

0 views

Many applications in science and engineering involve tracking the state of a stochastic differential equation (SDE) evolving in a Lie group. This has been tackled by particle filtering although some existing schemes fail to satisfy geometric constraints.

Distributed Quantization For Sparse Time Sequences

00:15:27

0 views

Analog signals processed in digital hardware are quantized into a discrete bit-constrained representation. Quantization is typically carried out using analog-to-digital converters (ADCs), operating in a serial scalar manner. In some applications, a set of

Maximum Likelihood Estimation Of The Interference-Plus-Noise Cross Power Spectral Density Matrix For Own Voice Retrieval

00:14:59

1 view

In headset and hearing aid applications, it is of interest to retrieve the user's own voice in a noisy environment, e.g. for telephony applications. To do so, the cross power spectral density (CPSD) of the noise is required. In this paper, a novel maximum

Multitask Learning And Multistage Fusion For Dimensional Audiovisual Emotion Recognition

00:13:07

0 views

Due to its ability to accurately predict emotional state using multimodal features, audiovisual emotion recognition has recently gained more interest from researchers. This paper proposes two methods to predict emotional attributes from audio and visual d

Joint Resource Allocation And Routing For Service Function Chaining With In-Subnetwork Processing

00:14:53

0 views

Network Function Virtualization (NFV) is an efficient approach to simplify and accelerate the deployment of diverse network services. A critical challenge lies in mapping Virtual Network Functions (VNFs) to high-volume servers, resource allocation, and tr

On-The-Fly Feature Selection And Classification With Application To Civic Engagement Platforms

00:14:07

0 views

Online feature selection and classification is crucial for time sensitive decision making. Existing work however either assumes that features are independent or produces a fixed number of features for classification. Instead, we propose an optimal framewo

Mutual-Information-Based Sensor Placement For Spatial Sound Field Recording

00:13:48

0 views

A sensor (microphone) placement method based on mutual information for spatial sound field recording is proposed. The sound field recording methods using distributed sensors enable the estimation of the sound field inside a target region of arbitrary shap

Dynamic Variational Autoencoders For Visual Process Modeling

00:14:39

0 views

This work studies the problem of modeling visual processes by leveraging deep generative architectures for learning linear, Gaussian representations from observed sequences. We propose a joint learning framework, combining a vector autoregressive model an

Exploiting Rays In Blind Localization Of Distributed Sensor Arrays

00:14:12

1 view

Many signal processing algorithms for distributed sensors are capable of improving their performance if the positions of sensors are known. In this paper, we focus on estimators for inferring the relative geometry of distributed arrays and sources, i.e. t

OpenDenoising: An Extensible Benchmark For Building Comparative Studies Of Image Denoisers

00:14:21

0 views

Image denoising has recently taken a leap forward due to machine learning. However, image denoisers, both expert-based and learning-based, are mostly tested on well-behaved generated noises (usually Gaussian) rather than on real-life noises, making perfor

Adaptive Blind Audio Source Extraction Supervised By Dominant Speaker Identification Using X-Vectors

00:14:56

0 views

We propose a novel algorithm for adaptive blind audio source extraction. The proposed method is based on independent vector analysis and utilizes the auxiliary function optimization to achieve high convergence speed. The algorithm is partially supervised

ESRGAN+: Further Improving Enhanced Super-Resolution Generative Adversarial Network

[2 Videos]

00:13:15

0 views

Enhanced Super-Resolution Generative Adversarial Network (ESRGAN) is a perceptual-driven approach for single image super-resolution that is able to produce photorealistic images. Despite the visual quality of these generated images, there is still room fo

VAMP With Vector-Valued Diagonalization

00:14:48

0 views

Vector approximate message passing is studied where vector-valued diagonalization instead of a uniform one is employed. Thereby, individual variances are tracked within the algorithm instead of an average one. Straightforward application based on the expe

Sparse Directed Graph Learning For Head Movement Prediction In 360 Video Streaming

00:14:46

0 views

High-definition 360 videos encoded in fine quality are typically too large to stream in their entirety over bandwidth (BW)-constrained networks. One popular remedy is to extract and send a spatial sub-region corresponding to a viewer's current field-of-view

Bandwidth Extension Of Musical Audio Signals With No Side Information Using Dilated Convolutional Neural Networks

00:12:48

2 views

Bandwidth extension has a long history in audio processing. While speech processing tools do not rely on side information, production-ready bandwidth extension tools of general audio signals rely on side information that has to be transmitted alongside th

CIF: Continuous Integrate-And-Fire For End-To-End Speech Recognition

00:15:00

1 view

In this paper, we propose a novel soft and monotonic alignment mechanism used for sequence transduction. It is inspired by the integrate-and-fire model in spiking neural networks and employed in the encoder-decoder framework consists of continuous functio

Gray-Scale Image Colorization Using Cycle-Consistent Generative Adversarial Networks With Residual Structure Enhancer

00:13:49

0 views

The colorization of gray-scale images has always been a challenging task in computer vision. Recently, novel approaches have been introduced for unsupervised image translation between two domains using Generative Adversarial Networks (GANs). Since one can

Inferring Dynamic Group Leadership Using Sequential Bayesian Methods

00:10:40

0 views

In group object tracking, the identification of the group leader can be highly beneficial for predicting the intention and future manoeuvres of objects as well as learning the underlying group behaviour traits. This paper presents an online approach for i

Audio-Visual Recognition Of Overlapped Speech For The LRS2 Dataset

00:12:54

0 views

Automatic recognition of overlapped speech remains a highly challenging task to date. Motivated by the bimodal nature of human speech perception, this paper investigates the use of audio-visual technologies for overlapped speech recognition. Three issues

A Proximal Dual Consensus Method For Linearly Coupled Multi-Agent Non-Convex Optimization

00:12:11

0 views

Motivated by large-scale signal processing and machine learning applications, this paper considers the distributed multi-agent optimization problem for a linearly constrained non-convex problem. Each of the agents owns a local cost function and local vari

Extracting Unit Embeddings Using Sequence-To-Sequence Acoustic Models For Unit Selection Speech Synthesis

00:12:15

0 views

This paper presents a method of using the intermediate representations between linguistic and acoustic features in a Tacotron model to derive the cost functions for unit selection speech synthesis. By extracting the outputs of the Tacotron encoder, each p

Non-Local Nested Residual Attention Network For Stereo Image Super-Resolution

00:12:25

0 views

Nowadays, CNN-based stereo image super-resolution (SR) methods have obtained remarkable performance. However, most existing methods only superficially portray the low-layer features without considering the uneven distribution of information, which is i

VGGSound: A Large-Scale Audio-Visual Dataset

00:13:05

0 views

Our goal is to collect a large-scale audio-visual dataset with low label noise from videos 'in the wild' using computer vision techniques. The resulting dataset can be used for training and evaluating audio recognition models. We make three contributions.

A Simple And Efficient Iterative Method For TOA Localization

00:12:16

1 view

This paper develops a simple and efficient method for source localization using signal time-of-arrival (TOA) measurements. There exist many TOA localization algorithms, most of which require matrix inversions. Their complexity often makes them unsuitable

Automatic And Simultaneous Adjustment Of Learning Rate And Momentum For Stochastic Gradient-Based Optimization Methods

00:12:38

0 views

Stochastic gradient-based methods are prominent for training machine learning and deep learning models. The performance of these techniques depends on their hyperparameter tuning over time and varies for different models and problems. Manual adjustment of

Guided Learning For Weakly-Labeled Semi-Supervised Sound Event Detection

00:15:00

0 views

We propose a simple but efficient method termed Guided Learning for weakly-labeled semi-supervised sound event detection (SED). There are two sub-targets implied in weakly-labeled SED: audio tagging and boundary detection. Instead of designing a single mo

Transmit Beampattern Shaping Via Waveform Design In Cognitive MIMO Radar

00:13:56

1 view

This paper focuses on designing a set of constant-modulus waveforms for cognitive Multiple-Input Multiple-Output (MIMO) radar systems. The aim is to shape the transmit beampattern to minimize the Integrated Side-lobe Level (ISL) in the spatial domain

Power Optimization Using Embedded Automatic Gain Control Algorithm With Photoplethysmography Signal Quality Classification

00:17:53

0 views

This paper presents the design and implementation of an Automatic Gain Control (AGC) embedded algorithm for photoplethysmographic (PPG) sensors. We use a number of statistical and spectral characteristics of the raw and filtered PPG signals, referred to a