# The Third Workshop of Melbourne-China Big Data Research Network

Date and Time: 14:30 - 18:00, Wednesday, 23 August 2017
Venue: Room 8.03, Doug McDonell Building, The University of Melbourne

### Schedule

| Schedule | Talk Title | Speaker | Affiliation |
| --- | --- | --- | --- |
| 14:30-14:35 | Opening | Prof. Rui Zhang | University of Melbourne |
| 14:35-15:30 | Challenges and Opportunities of Internet Architecture | Prof. Jianping Wu | Tsinghua University |
| 15:30-16:00 | Biomarkers for Triple Negative Breast Cancer | Prof. Lan Ma | Tsinghua University |
| 16:00-16:20 | Robust Survey Aggregation | Mr. Qingtao Tang and Prof. Shu-Tao Xia | Tsinghua University |
| 16:20-16:40 | Density-Based Multiscale Analysis for Clustering in Strong Noise Settings | Tiantian Zhang and Associate Prof. Bo Yuan | Tsinghua University |
| 16:40-17:00 | A Taste of Current Projects at Melbourne Bioinformatics | Associate Prof. Daniel Park | The University of Melbourne |
| 17:00-17:20 | Perspectives on personal and spatio-temporal data mining | Yousef Kowsar, Lida Rashidi and Prof. Lars Kulik | The University of Melbourne |
| 17:20-17:40 | Learning to Recommend Personalized Diverse Items | Mr. Xiaojie Wang (Supervisor: Prof. Rui Zhang) | The University of Melbourne |
| 17:40-18:00 | Exploiting frequent patterns to help explain individual predictions | Yunzhe (Alvin) Jia (Supervisors: Prof. James Bailey, Prof. Ramamohanarao Kotagiri and Prof. Christopher Leckie) | The University of Melbourne |

Talk 1: Challenges and Opportunities of Internet Architecture

Abstract
The Internet is the most important information infrastructure of cyberspace, which is becoming an increasingly important environment for our life, study and work. Internet architecture is the core technology of the Internet, and it faces significant challenges and opportunities in scalability, security, real-time operation, mobility, performance and manageability. In this talk, we introduce research programs and testbeds for Internet architecture around the world, and point out in particular the serious situation and major challenges of Internet security. We then indicate the opportunities for development that arise in solving Internet security problems. Finally, we introduce the authenticated source address validation solution and its effect on Internet security.

Speaker: Professor Jianping Wu

Bio: Jianping Wu is a Professor in the Department of Computer Science at Tsinghua University and a member of the Chinese Academy of Engineering. Dr. Wu is serving as the Dean of the Department of Computer Science, Dean of the Institute for Network Sciences and Cyberspace and Director of the Information Technology Center at Tsinghua University. Dr. Wu is also serving as the Director of the Network Center and Technical Board of the China Education and Research Network (CERNET), the Director of the National Engineering Laboratory for Next Generation Internet, a member of the Advisory Committee of National Information Infrastructure for the Secretariat of the State Council of China, and Vice President of the Internet Society of China (ISC). He is an IEEE Fellow. Dr. Wu was also the Chairman of the Asia Pacific Advanced Network (2007-2011). He was elected a member of the Chinese Academy of Engineering in 2015.

Dr. Wu has devoted himself for many years to computer network research, network engineering and the cultivation of talent. He has led his team in in-depth studies of network design, network engineering, network core equipment development and network architecture. He led the establishment of the China Education and Research Network (CERNET) in 1995 and of the first nationwide pure IPv6 network in 2005. Dr. Wu proposed the idea of the Source Address Validation Architecture and the 4over6 transition technology from IPv4 to IPv6. Dr. Wu has published more than 300 academic papers, edited and co-edited eight books, and supervised more than 100 undergraduates. As first author, he developed 4 IETF RFCs. He is inventor or co-inventor on 20 patents. He received the Jonathan B. Postel Service Award of ISOC in 2010.

Talk 2: Biomarkers for Triple Negative Breast Cancer

Abstract
This talk introduces recent research progress in the discovery and validation of biomarkers for triple negative breast cancer, and our lab’s work on these biomarkers using bioinformatics analysis.

Speaker: Professor Lan Ma

Bio: Lan Ma is a professor in the Graduate School at Shenzhen, Tsinghua University. She has wide research interests in stem cell biology, nano-biomedicine, and vaccine and antibody bio-engineering. She completed her Ph.D. at the University of Chinese Academy of Science in 2003, after receiving a bachelor's degree from Wuhan University (1987) and a master's degree from Peking University (1993). Currently, Lan Ma is the Associate Dean of the Graduate School at Shenzhen, Tsinghua University. Before joining Tsinghua University in 2005, she was an Associate Professor at Yunnan University and a visiting scholar at the Wisconsin National Primate Research Center, University of Wisconsin-Madison.

Talk 3: Robust Survey Aggregation

Abstract
Surveys, which have been widely used in business, social science and data mining, are a common way to investigate the characteristics, behaviors, or opinions of a target population. Most existing survey aggregation methods are sensitive to outliers, as these methods assume for convenience that the samples are drawn from a Gaussian distribution. To address this issue, we propose a robust survey aggregation method based on the Student-t distribution and sparse representation. Specifically, we assume that the samples follow a Student-t distribution instead of the common Gaussian distribution. Because of the Student-t distribution, our method is robust to outliers, which can be explained from both a Bayesian and a non-Bayesian point of view. In addition, inspired by the James-Stein (JS) estimator and Compressive Averaging (CAvg), we propose to sparsely represent the global mean vector by an adaptive basis comprising both a data-specific basis and a combined generic basis. Theoretically, we prove that JS and CAvg are special cases of our method. Extensive experiments demonstrate that our proposed method achieves significant improvement over state-of-the-art methods on both synthetic and real datasets.
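
As a rough illustration of why a Student-t assumption tolerates outliers, the sketch below estimates a single location parameter with the standard EM updates for a Student-t likelihood. It is not the speakers' aggregation method (the sparse representation over an adaptive basis is not shown); the function name `student_t_mean`, the degrees of freedom `nu=3.0` and the sample ratings are assumptions made for this example.

```python
def student_t_mean(samples, nu=3.0, iterations=100):
    """Location estimate under a Student-t likelihood, fitted by EM.

    Hypothetical helper for illustration; nu (degrees of freedom) is assumed fixed.
    """
    n = len(samples)
    mu = sum(samples) / n
    # Start from the ordinary sample variance; the floor avoids division by zero.
    sigma2 = max(sum((x - mu) ** 2 for x in samples) / n, 1e-12)
    for _ in range(iterations):
        # E-step: samples far from the current mean receive small weights.
        weights = [(nu + 1.0) / (nu + (x - mu) ** 2 / sigma2) for x in samples]
        # M-step: weighted mean and weighted scale.
        mu = sum(w * x for w, x in zip(weights, samples)) / sum(weights)
        sigma2 = max(
            sum(w * (x - mu) ** 2 for w, x in zip(weights, samples)) / n, 1e-12
        )
    return mu


if __name__ == "__main__":
    ratings = [4.0, 4.2, 3.9, 4.1, 4.0, 40.0]  # made-up survey answers, one outlier
    print("plain mean:    ", sum(ratings) / len(ratings))
    print("Student-t mean:", student_t_mean(ratings))
```

A Gaussian assumption corresponds to giving every sample the same weight, which is why a single extreme answer can drag the plain mean far from the bulk of the responses.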

Speaker: Mr. Qingtao Tang and Prof. Shu-Tao Xia

Bio: Qingtao Tang received his B.S. degree in statistics from Renmin University of China, Beijing, China, in 2015. He is currently a Master's student in the Department of Computer Science and Technology of Tsinghua University, China. His research focuses on machine learning and data mining. He has published several papers at IJCAI-17, IJCAI-16 and ECAI-16 and in IEEE TNNLS.
Shu-Tao Xia received his B.S. and PhD degrees in mathematics and applied mathematics from Nankai University, Tianjin, China, in 1992 and 1997, respectively. In 2004, he joined the Graduate School at Shenzhen, Tsinghua University, China, where he has been a full Professor in the Department of Computer Science and Technology since 2007. His teaching and research are in the areas of machine learning, coding and information theory, and networking. He has published about 60 papers in international journals, e.g., IEEE Transactions on Information Theory (TIT), TSP, TNNLS and TCSVT, and at refereed conferences, e.g., IJCAI, AAAI, NIPS, ISIT, ICASSP and DCC.

Talk 4: Density-Based Multiscale Analysis for Clustering in Strong Noise Settings

Abstract
Finding clustering patterns in data is challenging when clusters can be of arbitrary shapes and the data contains a high percentage (e.g., 80%) of noise. This paper presents a novel technique named density-based multiscale analysis for clustering (DBMAC) that can conduct noise-robust clustering without any strict assumption on the shapes of clusters. Firstly, DBMAC calculates the r-neighborhood statistics with different r (radius) values. Next, instead of trying to find a single optimal r value, a set of radius values appropriate for separating “clustered” objects and “noisy” objects is identified, using a formal statistical test for multimodality. Finally, the classical DBSCAN is employed to perform clustering on the subset of data with significantly less noise. Experimental results confirm that DBMAC is superior to classical DBSCAN in strong noise settings and also outperforms the latest technique, SkinnyDip, when the data contains arbitrarily shaped clusters.
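
The r-neighborhood statistics and the noise-reduction idea can be sketched roughly as follows, assuming 2-D points and Euclidean distance. This is not the DBMAC implementation: the formal multimodality test that selects suitable radii is replaced here by a hand-picked radius and a fixed neighbour threshold, and the final DBSCAN pass is left out. The names `neighborhood_counts` and `filter_noise` and the sample points are hypothetical.

```python
import math


def neighborhood_counts(points, r):
    """For each point, count the other points lying within distance r."""
    return [
        sum(1 for j, q in enumerate(points) if j != i and math.dist(p, q) <= r)
        for i, p in enumerate(points)
    ]


def filter_noise(points, r, min_neighbors):
    """Keep points whose r-neighborhood is dense enough.

    Stand-in for the statistical radius selection described in the abstract.
    """
    counts = neighborhood_counts(points, r)
    return [p for p, c in zip(points, counts) if c >= min_neighbors]


if __name__ == "__main__":
    cluster = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05)]
    noise = [(5.0, 5.0), (-4.0, 3.0), (8.0, -2.0)]
    points = cluster + noise
    # Multiscale view: the same statistic computed for several radii.
    for r in (0.2, 0.5, 10.0):
        print(r, neighborhood_counts(points, r))
    print(filter_noise(points, r=0.5, min_neighbors=3))
```

At a suitable radius the counts of clustered and noisy points separate clearly, while a very large radius makes every point look dense; this is the reason for examining several radii rather than one.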

Speaker: Tiantian Zhang and Associate Prof. Bo Yuan

Bio: Bo is a computer scientist with broad interests in Data Mining, Evolutionary Computation, Global Optimization, Parallel Computing and Pattern Recognition. He received the B.E. degree from Nanjing University of Science and Technology, P.R. China, in 1998, and the M.Sc. and Ph.D. degrees from The University of Queensland, Australia, in 2002 and 2006 respectively. From 2006 to 2007, he was a Research Officer on a project funded by the Australian Research Council at The University of Queensland.

He is an Associate Professor (Lecturer: 07/2007 - 12/2009) in the Division of Informatics, Graduate School at Shenzhen, Tsinghua University, P.R. China, and a member of the Intelligent Computing Lab. He is also a member of the IEEE and the IEEE Computational Intelligence Society.

Talk 5: A Taste of Current Projects at Melbourne Bioinformatics

Abstract
Melbourne Bioinformatics is The University of Melbourne’s hub of bioinformatics expertise, engaging in collaborative research and teaching in support of life science activities at the university. We work on diverse projects spanning microbial genomics to cancer research and diagnostics. This presentation gives a brief overview of some of our flagship activities and then looks at one exemplar project in cancer genomics in a little more detail.

Speaker: Associate Professor Daniel Park

Bio: Associate Professor Daniel Park trained as a molecular biologist at the University of Cambridge, where he read Natural Sciences as an undergraduate before completing his PhD through the Department of Biochemistry in 1999. He has developed numerous technologies in the area of DNA sequencing and novel applications of PCR, resulting in numerous granted patents and diagnostic products that are in widespread use. A/Prof Park started in bioinformatics in 2010 when identifying candidate cancer predisposition genes from whole-exome sequencing datasets. Since then, among other things, he has developed improved systems for targeted sequence screening (Hi-Plex), published methods to reduce noise in massively parallel sequencing datasets and pioneered new approaches to benchmarking genetic variant effect prediction tools.

Talk 6: Perspectives on Personal and Spatio-temporal Data Mining

Abstract
In this talk we present ongoing research into personalized data mining. The idea is to give individuals customized feedback and to understand their behaviours through their personal data. We aim to develop new methods that will enable personalized knowledge about people. We will study two showcases: time-series data from wearables and anomaly detection. Wearables are an emerging technology that can be used to assist personal wellbeing. We focus on monitoring athletes' performance during training because of the high risk of injury. Using machine learning techniques for time-series data from wearables, we are able to advise on how to perform a certain routine correctly and effectively. An injury can be seen as an outlier or an anomaly. Anomalies can provide valuable insights into the correlation between an abnormal pattern and a real-world phenomenon. Graph-based anomaly detection plays a vital role in various application domains, such as analysing the role of individuals in social networks. A key challenge in graph-based anomaly detection is how to deal with streaming structured input data, i.e., dynamic graphs. Although these graphs impose a curse of dimensionality on the learning models, they usually contain structural properties that anomaly detection schemes can exploit. Our main goal is to find a feature extraction technique that preserves graph structure while balancing the accuracy of the model against its scalability.

Speaker: Yousef Kowsar, Lida Rashidi and Prof. Lars Kulik

Bio: Lars Kulik is a Professor in the Department of Computing and Information Systems at the University of Melbourne. He received his PhD from the Department of Informatics at the University of Hamburg, Germany. Prior to joining the University of Melbourne, he was an associate faculty researcher in the Department of Spatial Information Science and Engineering at the University of Maine. His overall research goal is to integrate spatial information into pervasive computing systems that anticipate, adapt and respond to the needs of users, and provide services based on the user's location and context. His research focuses on methods for protecting privacy, techniques for personal and spatio-temporal data mining, efficient algorithms for intelligent traffic systems, and robust algorithms that can cope with imperfection, especially in the context of mobile and wearable computing.

Talk 7: Learning to Recommend Personalized Diverse Items

Abstract
In the Recommender Systems field, recommendation result diversification has attracted substantial attention, with an increasing consensus that users are more satisfied with diverse recommendations. We define the diversity of recommendations based on genre information that is readily available in domains like movies and music. In this work, we observe that the distribution of genre preference varies greatly across different users. We argue that the diversity level of the recommendations for a user should be personalized according to the user's genre preference. Inspired by $\alpha\mbox{-}$nDCG, a widely used diversity measure in Information Retrieval, we propose a new diversity measure, p$\mbox{-}$nDCG, which promotes the personalized diversity of recommendations. We also propose an efficient personalized diversification algorithm which can be used to produce recommendations with the preferred diversity level for each user. We demonstrate the feasibility of p$\mbox{-}$nDCG and the effectiveness of the new diversification algorithm using the MovieLens 100k dataset. We find that: (1) optimization towards p$\mbox{-}$nDCG rather than $\alpha\mbox{-}$nDCG makes the top $K$ items in the recommendation list more consistent with a user's preferred diversity level in terms of the user's genre preference, especially when the cutoff $K$ is small; (2) the proposed personalized diversification algorithm significantly outperforms the baseline algorithms, including state-of-the-art learning-to-rank models.
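
For readers unfamiliar with the measure that inspired this work, the sketch below computes the unnormalized α-DCG of a ranked list, treating genres as the "topics" an item covers. It shows only the existing α-DCG idea, not the proposed p-nDCG, whose definition is not given in this abstract; the function name `alpha_dcg`, the binary genre judgments and the example lists are assumptions for illustration.

```python
import math


def alpha_dcg(ranked_genres, alpha=0.5, k=None):
    """Unnormalized alpha-DCG of a ranked list.

    ranked_genres: one set of genres per recommended item, best rank first.
    alpha: how strongly a genre that was already shown is penalized.
    k: optional cutoff; None scores the whole list.
    """
    seen = {}  # genre -> number of higher-ranked items that cover it
    score = 0.0
    for rank, genres in enumerate(ranked_genres[:k], start=1):
        # Each repeated genre contributes (1 - alpha) ** (times already seen).
        gain = sum((1.0 - alpha) ** seen.get(g, 0) for g in genres)
        score += gain / math.log2(rank + 1)
        for g in genres:
            seen[g] = seen.get(g, 0) + 1
    return score


if __name__ == "__main__":
    diverse = [{"comedy"}, {"drama"}, {"action"}]
    redundant = [{"comedy"}, {"comedy"}, {"comedy"}]
    print(alpha_dcg(diverse), alpha_dcg(redundant))
```

Dividing by the score of an ideal ordering gives α-nDCG; that normalization is omitted here because the ideal ordering is usually approximated greedily. The measure rewards every user equally for genre variety, which is the behaviour the abstract argues should instead depend on each user's genre preference.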

Speaker: Mr. Xiaojie Wang

Bio: Xiaojie Wang is a Ph.D. candidate in the School of Computing and Information Systems at The University of Melbourne, Australia. He received his B.S. degrees in Applied Mathematics and Computer Science from Renmin University of China in 2016. His research interests include information retrieval, data mining, and machine learning.

Talk 8: Exploiting frequent patterns to help explain individual predictions

Abstract
Users cannot simply trust a classifier's prediction, especially when the resulting decision has severe consequences in domains like medical diagnosis and criminal analysis. Explanations for an individual prediction can help them to accept or reject the prediction with more confidence. Frequent-pattern-based techniques are a promising class of machine learning methods that are easy to interpret. Our recent work focuses on employing patterns to help provide instance-level explanations for any classifier. Patterns are used as supporting evidence that a classifier considers important for the prediction of a given instance. A framework for pattern-based explanation generation and possible applications will be discussed in this talk.

Speaker: Yunzhe (Alvin) Jia

Bio: Yunzhe (Alvin) Jia is a Ph.D. candidate in the Department of Computing and Information Systems at the University of Melbourne, with particular interests in frequent patterns and interpretable machine learning. He holds a master’s degree in Computer Science from New York University and a B.Eng. in Software Engineering from Xiamen University.