This page lists the research projects being conducted on spatial and temporal (spatio-temporal) data analytics. The projects are led by Professor Rui Zhang in the Department of Computing and Information Systems, the University of Melbourne.
Cross-validation is a commonly used method for evaluating the effectiveness of Support Vector Machines (SVMs). However, existing SVM cross-validation algorithms are not scalable to large datasets. In this work, we propose a scheme to dramatically improve the scalability and efficiency of SVM cross-validation. For the tested datasets of sizes that existing algorithms can handle, our scheme achieves several orders of magnitude of speedup. More importantly, our scheme enables SVM cross-validation on datasets of very large scale that existing algorithms are unable to handle.
Zeyi Wen, Rui Zhang, Kotagiri Ramamohanarao, Jianzhong Qi and Kerry Taylor, "MASCOT: Fast and Highly Scalable SVM Cross-validation using GPUs and SSDs", The IEEE International Conference on Data Mining (ICDM), December 2014
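For readers unfamiliar with the procedure this project accelerates, the following is a minimal sketch of plain k-fold cross-validation. It is not the MASCOT scheme and uses no GPU or SSD. The function names are hypothetical, and a nearest-centroid classifier stands in for an SVM so that the example has no dependencies. The point to notice is that the model is retrained once per fold, which is the cost that grows quickly on large datasets.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Sample = Tuple[float, ...]
Model = Callable[[Sample], int]
Trainer = Callable[[Sequence[Sample], Sequence[int]], Model]


def k_fold_indices(n: int, k: int) -> List[List[int]]:
    """Split indices 0..n-1 into k contiguous folds of near-equal size."""
    base, extra = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validate(xs: Sequence[Sample], ys: Sequence[int],
                   train: Trainer, k: int) -> float:
    """Return the mean accuracy over k folds; each fold is the test set once."""
    accuracies = []
    for test_idx in k_fold_indices(len(xs), k):
        held_out = set(test_idx)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        model = train(train_x, train_y)  # one full training run per fold
        correct = sum(model(xs[i]) == ys[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / k


def train_nearest_centroid(xs: Sequence[Sample], ys: Sequence[int]) -> Model:
    """Stand-in classifier (NOT an SVM): predict the label of the closest class mean."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for x, y in zip(xs, ys):
        acc = sums.setdefault(y, [0.0] * len(x))
        for d, v in enumerate(x):
            acc[d] += v
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [s / counts[y] for s in sums[y]] for y in sums}

    def predict(x: Sample) -> int:
        return min(centroids,
                   key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])))

    return predict
```

Calling `cross_validate(xs, ys, train_nearest_centroid, k)` with `k` no larger than the number of samples returns the average held-out accuracy. Swapping in an SVM trainer would leave the fold loop unchanged.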
Conventional spatial indexes employ multi-dimensional tree structures and thus require enormous effort to implement in a full-fledged database management system. Alternatively, mapping-based techniques employ B-trees but sacrifice query efficiency. We propose a mapping-based spatial index scheme, equipped with size separation, data distribution transformation, and novel mapping algorithms, that achieves both simplicity of implementation and query efficiency. A key advantage of this technique is that it can be added on top of an off-the-shelf database system without touching the database engine, and can therefore be implemented with very little effort.
Rui Zhang, Jianzhong Qi, Martin Stradling, and Jin Huang, "Towards a Painless Index for Spatial Objects", to appear in ACM Transactions on Database Systems, 2014
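To illustrate the general idea of a mapping-based index, the sketch below maps two-dimensional points to one-dimensional keys and keeps them in an ordered structure. It swaps in the well-known Z-order (Morton) curve as the mapping and a sorted Python list as a stand-in for a B-tree; it is not the mapping proposed in the paper, and it omits size separation and data distribution transformation. The class and function names are hypothetical, and coordinates are assumed to be non-negative integers that fit in `bits` bits.

```python
import bisect
from typing import List, Tuple


def z_value(x: int, y: int, bits: int = 16) -> int:
    """Interleave the bits of x and y into a single Z-order (Morton) key."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)      # x bits go to even positions
        key |= ((y >> i) & 1) << (2 * i + 1)  # y bits go to odd positions
    return key


class MappedPointIndex:
    """Points ordered by a one-dimensional key; a sorted list stands in for a B-tree."""

    def __init__(self) -> None:
        self._entries: List[Tuple[int, ...]] = []  # (key, x, y), sorted by key

    def insert(self, x: int, y: int) -> None:
        bisect.insort(self._entries, (z_value(x, y), x, y))

    def range_query(self, x1: int, y1: int, x2: int, y2: int) -> List[Tuple[int, int]]:
        """Return the points inside the rectangle [x1, x2] x [y1, y2]."""
        # Every point in the rectangle has a key between the keys of the
        # lower-left and upper-right corners, so one key-range scan suffices.
        lo = bisect.bisect_left(self._entries, (z_value(x1, y1),))
        hi = bisect.bisect_left(self._entries, (z_value(x2, y2) + 1,))
        # The scan can also hit points outside the rectangle; filter them out.
        return [(x, y) for _, x, y in self._entries[lo:hi]
                if x1 <= x <= x2 and y1 <= y <= y2]
```

The filtering step is where such simple mappings lose efficiency: the scanned key range may contain many points that are outside the query rectangle.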
Earth Mover's Distance based Similarity Analysis (EMDSA) is an important and effective tool in many multimedia retrieval and pattern recognition applications. We share a large-scale data generator we have designed and implemented for evaluating EMDSA techniques. The current implementation of the generator extracts content-based image features to generate distributions from real data collections. The generator supports a wide range of features and has flexible execution options. Both the binary and the source of the generator are available for download. We also provide dozens of feature datasets which are generated using the generator on the MirFlickr image collection, the Flickr image collection, and the ImageNet image collection.
Rui Zhang and Jin Huang, "A Generic Multi-Dimensional Data Generator for Earth Mover's Distance Similarity Analysis", technical report, February 2014
The Earth Mover’s Distance (EMD) similarity join retrieves pairs of records (represented as high-dimensional histograms) with EMD below a given threshold. It has a number of important applications such as near-duplicate image retrieval and pattern analysis in probabilistic datasets. We propose a MapReduce algorithm that exploits a cluster of commodity machines to perform EMD similarity joins efficiently on large datasets.
Jin Huang, Rui Zhang, Rajkumar Buyya, and Jian Chen, "Melody-Join: Efficient Earth Mover's Distance Similarity Join Using MapReduce", to appear in Proceedings of the IEEE International Conference on Data Engineering (ICDE), 2014
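As a small illustration of what an EMD similarity join computes, the sketch below runs a brute-force join on a single machine. It is not the Melody-Join algorithm and involves no MapReduce. It assumes the simplest setting: histograms of equal total mass over the same ordered bins, with ground distance |i - j| between bins i and j, where the EMD reduces to the sum of absolute cumulative differences. The function names and record identifiers are hypothetical.

```python
from itertools import accumulate, combinations
from typing import Dict, List, Sequence, Tuple


def emd_1d(p: Sequence[float], q: Sequence[float]) -> float:
    """EMD between two equal-mass histograms over the same ordered bins,
    with ground distance |i - j| between bins i and j."""
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    diffs = [a - b for a, b in zip(p, q)]
    # The running surplus is the mass that must cross each bin boundary.
    return sum(abs(c) for c in accumulate(diffs))


def emd_join(records: Dict[str, Sequence[float]],
             threshold: float) -> List[Tuple[str, str]]:
    """Return every pair of record ids whose EMD is below the threshold."""
    return [(a, b) for a, b in combinations(sorted(records), 2)
            if emd_1d(records[a], records[b]) < threshold]
```

The nested comparison of all pairs is quadratic in the number of records, and general EMD over high-dimensional histograms is far more expensive per pair than this one-dimensional special case, which is why a distributed approach is attractive for large datasets.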
Destination prediction is an essential task for many emerging location-based applications, such as recommending sightseeing places and targeted advertising based on destination. We propose a novel algorithm that dramatically improves the practicability, accuracy, and runtime efficiency of destination prediction.
Andy Yuan Xue, Jianzhong Qi, Xing Xie, Rui Zhang, Jin Huang, Yuan Li, "Solving the Data Sparsity Problem in Destination Prediction", to appear in The International Journal on Very Large Data Bases (VLDBJ), accepted in July 2014
Andy Yuan Xue, Rui Zhang, Yu Zheng, Xing Xie, Jianhui Yu, and Yong Tang, "DesTeller: A System for Destination Prediction Based on Trajectories with Privacy Protection", in Proceedings of the International Conference on Very Large Data Bases (VLDB) 2013 (Demo)
Andy Yuan Xue, Rui Zhang, Yu Zheng, Xing Xie, Jin Huang, and Zhenghua Xu, "Destination Prediction by Sub-Trajectory Synthesis and Privacy Protection Against Such Prediction", in Proceedings of the IEEE International Conference on Data Engineering (ICDE) 2013