
DICTA 2017 - Tutorials

Tutorial 1: Multi-view Learning

Dr Chang Xu (The University of Sydney)
Abstract: In recent years, many methods have been proposed for learning from multi-view data by exploiting the diversity of different views. These views may be obtained from multiple sources or from different feature subsets. For example, a person can be identified by face, fingerprint, signature or iris, with information obtained from multiple sources, while an image can be represented by its color or texture features, which can be seen as different feature subsets of the image. In this talk, we will organize the similarities and differences among the wide variety of multi-view learning approaches, highlight their limitations, and demonstrate the fundamentals underlying the success of multi-view learning. A thorough investigation of the view-insufficiency problem and an in-depth analysis of the influence of view properties (consistency and complementarity) will benefit the continued development of multi-view learning.

Tutorial 2: Learning with Label Noise

Dr Tongliang Liu (The University of Sydney)
Large-scale training data boosts the performance of supervised learning, but also burdens us with laborious and expensive labelling. Cheaper ways of labelling the data have been developed, and the obtained labels are therefore likely to be erroneous. A natural question is: can we avoid the adverse effects of label noise and obtain solutions as good as those learned from clean data? And if not, how can we mitigate those adverse effects?
In this tutorial, we will survey recent advances in both theoretical foundations and algorithm design for learning with label noise. We will first introduce the different types of label noise and the challenges they pose. We will then explain carefully designed algorithms and surrogate losses that can provably and efficiently learn from corrupted labels. We will finally offer some insights into open questions about label noise.
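To make the surrogate-loss idea concrete, here is a minimal NumPy sketch of "forward" loss correction for class-conditional label noise, one well-known family of such provably consistent losses: the model's clean-class probabilities are pushed through a noise transition matrix before computing cross-entropy against the noisy label. The transition matrix T, the logits, and all function names below are illustrative assumptions, not code from the tutorial.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# T[i, j] = P(observed label = j | true label = i); assumed known here
# (in practice it is estimated from data).
T = np.array([[0.9, 0.1],
              [0.2, 0.8]])

def forward_corrected_loss(logits, noisy_label, T):
    p_clean = softmax(logits)   # model's posterior over clean labels
    p_noisy = T.T @ p_clean     # implied distribution over noisy labels
    return -np.log(p_noisy[noisy_label])

logits = np.array([2.0, 0.5])   # model strongly favours class 0
loss_clean_label = forward_corrected_loss(logits, 0, T)
loss_flipped_label = forward_corrected_loss(logits, 1, T)
print(loss_clean_label < loss_flipped_label)  # → True
```

Minimizing this corrected loss over noisy data recovers, in expectation, the minimizer of the ordinary cross-entropy on clean data, which is the sense in which such losses "provably learn" under label noise.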

Tutorial 3: Large scale, vision-based person re-identification

Dr Pantelis Elinas (Data61, Australia)
Dr Fei Mai (Canon Information Systems Research Australia CiSRA)
Dr Geoffrey Taylor (Canon Information Systems Research Australia CiSRA)
Dr Getian Ye (Canon Information Systems Research Australia CiSRA)

Large-scale, vision-based person re-identification (person re-id for short) is a challenging computer vision and machine learning problem with many practical applications, including surveillance, security, and business analytics. Over the years, person re-id has received considerable attention from academia and industry, with steady progress in improving re-id accuracy towards a product roadmap for commercial use. However, person re-id, especially for large camera networks (hundreds to thousands of cameras) with non-overlapping fields of view, still remains largely unsolved. Challenges include large variation in image quality due to person pose, lighting conditions, and sensor properties, as well as low variability in person appearance. Application of the latest techniques in computer vision and machine learning is paving the way towards a solution, but there is much work yet to be done.
In this tutorial, which consists of the following four sessions, we will introduce the problem of vision-based person re-identification and survey the most widely researched methods applied to it, along with their performance. We will also survey the available datasets for training and evaluating person re-id algorithms, and discuss promising future directions and practical implementations. We hope this tutorial will help members of the DICTA community get started with research on this challenging problem.
1. Overview and problem definition - The first session will introduce the problem of person re-identification (for large-scale camera networks) and relate it to other similar computer vision problems such as long-term object tracking. Furthermore, the standard evaluation metrics and popular datasets used for benchmarking re-id algorithms will be surveyed and critically discussed.
2. Representation (features) - The second session will consider in depth the problems of representation and feature extraction/selection. This includes an overview of the state-of-the-art engineered and learned image features for characterising a person from single images and videos. The different features will be compared in terms of properties, efficiency of extraction, dimensionality, and matching accuracy using baseline metrics on standard benchmark datasets introduced in the previous session.
3. Matching (classification and metric learning) - The third session will address the challenging problem of person classification by matching the high dimensional features extracted from images. The widely researched and successful application of metric learning methods will be discussed including improvements using classifier ensembles and deep learning as well as extensions for domain adaptation and unsupervised learning.
4. Other topics - In the last session, we will present an overview of additional topics including post-processing techniques, e.g., manifold ranking, fine-grained matching using saliency, latent topic modelling, and human-in-the-loop methods for retrieval applications. Lastly, the challenges of deploying practical person re-identification systems will be discussed including computational and storage requirements for systems with thousands of cameras.
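As a toy illustration of the matching step discussed in session 3, the sketch below ranks a small gallery against a query under a Mahalanobis distance d(x, y)^2 = (x - y)^T M (x - y). The features, identities, and the placeholder metric M are all hypothetical; real re-id systems learn M from labelled pairs (e.g. with methods such as KISSME or XQDA) over high-dimensional appearance features.

```python
import numpy as np

rng = np.random.default_rng(0)

def mahalanobis_sq(x, y, M):
    d = x - y
    return float(d @ M @ d)

# Hypothetical 4-D appearance features for three gallery identities.
gallery = {pid: rng.normal(size=4) for pid in ("A", "B", "C")}

# Query: a slightly perturbed view of identity "B".
query = gallery["B"] + 0.05 * rng.normal(size=4)

# Placeholder for a learned metric; the identity matrix reduces the
# Mahalanobis distance to plain Euclidean distance.
M = np.eye(4)

# Rank gallery identities by distance to the query (smallest first).
ranking = sorted(gallery, key=lambda pid: mahalanobis_sq(query, gallery[pid], M))
print(ranking[0])  # rank-1 match; expected "B", since the query is a view of B
```

Standard re-id evaluation metrics such as the CMC curve are computed from exactly this kind of ranking: rank-k accuracy is the fraction of queries whose true identity appears in the top k positions.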

Tutorial 4: Consistent sequential dictionary learning

Dr Karim Seghouane (The University of Melbourne)
Abstract: Algorithms for learning overcomplete dictionaries for sparse signal representation are mostly iterative minimization methods. Such algorithms (including the popular K-SVD) alternate between a sparse coding stage and a dictionary update stage. For most of them, however, the consistency of the learned quantities has not been addressed. As an example, the inconsistency of the dictionary learned by K-SVD will be discussed in this presentation. New adaptive dictionary learning algorithms are then presented, based on the observation that the observed signals can be approximated as a sum of matrices of the same or different ranks. The proposed methods are derived via sequential penalized rank-one or rank-K matrix approximation, where a sparsity-promoting norm is introduced as a penalty. The proposed algorithms use a block coordinate descent approach to consistently estimate the unknowns, and have the advantage of simple closed-form solutions for both the sparse coding and dictionary update stages. The consistency properties of both the estimated sparse code and the dictionary atoms are discussed. Experimental results are presented on simulated data and on a real functional magnetic resonance imaging (fMRI) dataset from a finger-tapping experiment. The results illustrate the performance improvement of the proposed algorithms compared to existing algorithms.
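As a rough illustration of the rank-one dictionary-update step that K-SVD and the sequential methods described above share, the NumPy sketch below refreshes each atom and its code row from the best rank-one approximation (via the SVD) of the residual computed with that atom excluded. The data dimensions and generation are made up for the demo; the tutorial's specific penalized estimators are not implemented here.

```python
import numpy as np

rng = np.random.default_rng(1)
n, K, N = 8, 5, 20                        # signal dim, atoms, signals
D = rng.normal(size=(n, K))
D /= np.linalg.norm(D, axis=0)            # unit-norm atoms
X = rng.normal(size=(K, N)) * (rng.random((K, N)) < 0.3)  # sparse codes
Y = D @ X + 0.01 * rng.normal(size=(n, N))                # observations

def update_atom(Y, D, X, k):
    omega = np.nonzero(X[k])[0]           # signals that actually use atom k
    if omega.size == 0:
        return D, X
    # Residual with atom k removed, restricted to those signals.
    E = Y[:, omega] - D @ X[:, omega] + np.outer(D[:, k], X[k, omega])
    # Best rank-one approximation of E gives the new atom and code row.
    U, s, Vt = np.linalg.svd(E, full_matrices=False)
    D[:, k] = U[:, 0]                     # new unit-norm atom
    X[k, omega] = s[0] * Vt[0]            # updated sparse-code row
    return D, X

err_before = np.linalg.norm(Y - D @ X)
for k in range(K):
    D, X = update_atom(Y, D, X, k)
err_after = np.linalg.norm(Y - D @ X)
print(err_after <= err_before)  # → True: each rank-one update cannot worsen the fit
```

Because the leading singular pair minimizes the Frobenius error of a rank-one fit, each per-atom update is guaranteed not to increase the overall reconstruction error, which is what makes the closed-form sequential update attractive.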