Spatial and Temporal Data Analytics @ CIS UNIMELB

Multi-Dimensional Big Data Analytics

MASCOT: Fast and Highly Scalable SVM Cross-validation using GPUs and SSDs

Cross-validation is a commonly used method for evaluating the effectiveness of Support Vector Machines (SVMs). However, existing SVM cross-validation algorithms do not scale to large datasets. In this work, we propose a scheme to dramatically improve the scalability and efficiency of SVM cross-validation. On the tested datasets that are small enough for existing algorithms to handle, our scheme achieves speedups of several orders of magnitude. More importantly, our scheme enables SVM cross-validation on very large datasets that existing algorithms are unable to handle.
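
For readers unfamiliar with the procedure being accelerated, the following is a minimal sketch of plain k-fold cross-validation. It is not the MASCOT scheme and uses neither GPUs nor SSDs. The names `k_fold_indices`, `cross_validate`, `fit_majority`, and `predict_majority` are hypothetical, and the majority-label classifier is only a stand-in for where an SVM trainer would be plugged in.

```python
from typing import Callable, List, Sequence, Tuple


def k_fold_indices(n: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n-1 into k contiguous folds; return (train, test) pairs."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    base, extra = divmod(n, k)
    folds = []
    start = 0
    for i in range(k):
        # The first `extra` folds get one additional element.
        size = base + (1 if i < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        folds.append((train, test))
        start += size
    return folds


def cross_validate(X: Sequence, y: Sequence, k: int,
                   fit: Callable, predict: Callable) -> float:
    """Return overall accuracy: each instance is tested exactly once."""
    correct = 0
    for train, test in k_fold_indices(len(X), k):
        # A fresh model is trained in every round (k trainings in total).
        model = fit([X[i] for i in train], [y[i] for i in train])
        preds = predict(model, [X[i] for i in test])
        correct += sum(p == y[i] for p, i in zip(preds, test))
    return correct / len(X)


# Stand-in classifier (assumption): always predicts the majority training label.
def fit_majority(X: Sequence, y: Sequence):
    return max(set(y), key=list(y).count)


def predict_majority(model, X: Sequence) -> list:
    return [model for _ in X]
```

Calling `cross_validate(X, y, 10, fit_majority, predict_majority)` runs 10-fold cross-validation with the stand-in classifier; replacing the two callables with an SVM trainer and predictor gives SVM cross-validation in its basic form, where every round trains a model from scratch.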

Related Papers:

Zeyi Wen, Rui Zhang, Kotagiri Ramamohanarao, Jianzhong Qi and Kerry Taylor, "MASCOT: Fast and Highly Scalable SVM Cross-validation using GPUs and SSDs", The IEEE International Conference on Data Mining (ICDM), December 2014

Painless Index for Large Scale Spatial Objects (Patent Pending)

Conventional spatial indexes employ multi-dimensional tree structures and thus require enormous effort to implement in a full-fledged database management system. Alternatively, mapping-based techniques employ B-trees but sacrifice query efficiency. We propose a mapping-based spatial index scheme, equipped with size separation, data distribution transformation, and novel mapping algorithms, that achieves both simplicity of implementation and query efficiency. A key advantage of this technique is that it can be added on top of an off-the-shelf database system without touching the database engine, and can therefore be implemented with very little effort.
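
To illustrate what "mapping-based" means in general, here is a minimal sketch that maps 2D points to one-dimensional keys with the classic Z-order (Morton) curve and keeps them in a sorted list. This is a swapped-in textbook mapping, not the mapping algorithms proposed in the paper; it handles points only (no objects with extent, no size separation), and the sorted list merely stands in for a B-tree. The names `interleave_bits` and `ZOrderIndex` are hypothetical.

```python
import bisect
from typing import List, Tuple


def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Map non-negative integers (x, y) to a Z-order key by interleaving bits."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)        # x occupies the even bit positions
        key |= ((y >> i) & 1) << (2 * i + 1)    # y occupies the odd bit positions
    return key


class ZOrderIndex:
    """Points stored in key order; a sorted list plays the role of a B-tree."""

    def __init__(self) -> None:
        self._entries: List[Tuple[int, Tuple[int, int]]] = []

    def insert(self, x: int, y: int) -> None:
        bisect.insort(self._entries, (interleave_bits(x, y), (x, y)))

    def range_query(self, x_lo: int, y_lo: int,
                    x_hi: int, y_hi: int) -> List[Tuple[int, int]]:
        # Z-order keys are monotone in each coordinate, so every point inside
        # the rectangle has a key between the keys of the two corners.
        lo = interleave_bits(x_lo, y_lo)
        hi = interleave_bits(x_hi, y_hi)
        start = bisect.bisect_left(self._entries, (lo, (-1, -1)))
        result = []
        for key, (x, y) in self._entries[start:]:
            if key > hi:
                break
            # The key interval also covers points outside the rectangle,
            # so each candidate is filtered explicitly.
            if x_lo <= x <= x_hi and y_lo <= y <= y_hi:
                result.append((x, y))
        return result
```

The key-range scan followed by a filter is the part that an off-the-shelf B-tree can serve unchanged; the candidates it scans and then discards show why a naive mapping can lose query efficiency.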

Related Papers:

Rui Zhang, Jianzhong Qi, Martin Stradling, and Jin Huang, "Towards a Painless Index for Spatial Objects", to appear in ACM Transactions on Database Systems, 2014

A Generic Multi-Dimensional Data Generator for Evaluating Earth Mover's Distance Similarity Analysis
  • Grid

Earth Mover's Distance based Similarity Analysis (EMDSA) is an important and effective tool in many multimedia retrieval and pattern recognition applications. We share a large-scale data generator we have designed and implemented for evaluating EMDSA techniques. The current implementation of the generator extracts content-based image features to generate distributions from real data collections. The generator supports a wide range of features and has flexible execution options. Both the binary and the source of the generator are available for download. We also provide dozens of feature datasets which are generated using the generator on the MirFlickr image collection, the Flickr image collection, and the ImageNet image collection.
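
As a rough illustration of turning image content into a distribution, the sketch below builds a normalized intensity histogram from grayscale pixel values. It is an assumption-laden toy, not the generator's implementation: the function name `intensity_histogram`, the choice of grayscale intensity as the feature, and the default of 16 bins are all hypothetical.

```python
from typing import List, Sequence


def intensity_histogram(pixels: Sequence[int], bins: int = 16,
                        max_value: int = 255) -> List[float]:
    """Return a normalized histogram (sums to 1) of integer pixel intensities."""
    if not pixels:
        raise ValueError("pixels must not be empty")
    counts = [0] * bins
    for v in pixels:
        if not 0 <= v <= max_value:
            raise ValueError(f"pixel value {v} outside 0..{max_value}")
        # Equal-width bins over the range 0..max_value.
        counts[v * bins // (max_value + 1)] += 1
    total = len(pixels)
    return [c / total for c in counts]
```

Each image then becomes one fixed-length record whose entries sum to 1, which is the form of input that EMD-based similarity analysis compares.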

Related Papers:

Rui Zhang and Jin Huang, "A Generic Multi-Dimensional Data Generator for Earth Mover's Distance Similarity Analysis", technical report, February 2014

Available downloads (generator binary, source, and datasets):

  • Binary
  • M16D
  • F16D
  • I16D
  • M64D
  • M256D

Melody-Join: Efficient Earth Mover's Distance Similarity Joins Using MapReduce
  • Image Histogram
  • Temporal Distribution

The Earth Mover’s Distance (EMD) similarity join retrieves pairs of records (represented as high-dimensional histograms) with EMD below a given threshold. It has a number of important applications, such as near-duplicate image retrieval and pattern analysis in probabilistic datasets. We propose a MapReduce algorithm that exploits a cluster of commodity machines to perform EMD similarity joins efficiently on large datasets.
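
The join semantics can be sketched on a single machine as follows. This is a brute-force nested-loop baseline, not the Melody-Join MapReduce algorithm. It also assumes the simplest special case: one-dimensional histograms of equal total mass with unit ground distance between adjacent bins, where EMD equals the sum of absolute differences of the cumulative sums; general EMD requires solving a transportation problem. The names `emd_1d` and `emd_similarity_join` are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def emd_1d(p: Sequence[float], q: Sequence[float]) -> float:
    """EMD between two 1D histograms of equal total mass (adjacent bins cost 1)."""
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    total = 0.0
    carried = 0.0  # mass that still has to be moved past the current bin
    for a, b in zip(p, q):
        carried += a - b
        total += abs(carried)
    return total


def emd_similarity_join(records: Sequence[Sequence[float]],
                        threshold: float) -> List[Tuple[int, int]]:
    """Return index pairs (i, j), i < j, whose EMD is below the threshold."""
    return [
        (i, j)
        for (i, r), (j, s) in combinations(enumerate(records), 2)
        if emd_1d(r, s) < threshold
    ]
```

The nested loop evaluates EMD for every pair of records, so its cost grows quadratically with the dataset size; that is the workload a distributed join has to partition and prune.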

Related Papers:

Jin Huang, Rui Zhang, Rajkumar Buyya, and Jian Chen, "Melody-Join: Efficient Earth Mover's Distance Similarity Join Using MapReduce", to appear in Proceedings of the IEEE International Conference on Data Engineering (ICDE), 2014

Destination Prediction Based on GPS Trajectory Mining

Destination Prediction by Sub-Trajectory Synthesis
  • Trajectory Dataset
  • Demo Screenshot

Destination prediction is an essential task for many emerging location-based applications, such as recommending sightseeing places and targeted advertising based on destination. We propose a novel algorithm that dramatically improves the practicability, accuracy, and runtime efficiency of destination prediction.

Related Papers:

Andy Yuan Xue, Jianzhong Qi, Xing Xie, Rui Zhang, Jin Huang, and Yuan Li, "Solving the Data Sparsity Problem in Destination Prediction", to appear in The International Journal on Very Large Data Bases (VLDBJ), accepted in July 2014
Andy Yuan Xue, Rui Zhang, Yu Zheng, Xing Xie, Jianhui Yu, and Yong Tang, "DesTeller: A System for Destination Prediction Based on Trajectories with Privacy Protection", in Proceedings of the International Conference on Very Large Data Bases (VLDB) 2013 (Demo)
Andy Yuan Xue, Rui Zhang, Yu Zheng, Xing Xie, Jin Huang, and Zhenghua Xu, "Destination Prediction by Sub-Trajectory Synthesis and Privacy Protection Against Such Prediction", in Proceedings of the IEEE International Conference on Data Engineering (ICDE) 2013

Professor Rui Zhang, Department of Computing and Information Systems, The University of Melbourne
