Triangle counting is a cornerstone operation in large graph analytics. It has historically been a challenging problem owing to the irregular and dynamic nature of the algorithm, which inhibits compile-time optimizations and requires runtime optimizations such as message aggregation and load-imbalance mitigation. Popular triangle counting algorithms are either inherently slow, fail to take advantage of advanced hardware in modern processors, or involve expensive sparse matrix operations. With its support for fine-grained asynchronous messages, the Partitioned Global Address Space (PGAS) combined with the Actor model has been shown to be efficient for irregular applications \cite{hclib22,hclib23,10181177}. However, few triangle counting algorithms have been efficiently implemented on top of PGAS Actor runtimes. To address these challenges, we propose a set-intersection-based implementation of a distributed triangle counting algorithm atop a PGAS Actor runtime. Evaluation of our approach on the PACE Phoenix cluster and the Perlmutter supercomputer showed encouraging results.
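To make the core idea concrete, here is a minimal serial sketch of set-intersection triangle counting (our illustration only; the paper's distributed version partitions the graph across PGAS ranks and issues intersections as fine-grained actor messages, which is not reproduced here):

```python
# Minimal serial sketch of set-intersection triangle counting.
# Python set intersection stands in for the runtime's intersection kernel.

def count_triangles(adj):
    """adj: dict mapping each vertex to the set of its neighbors."""
    count = 0
    for u, nbrs in adj.items():
        for v in nbrs:
            if u < v:  # consider each undirected edge (u, v) once
                # Triangles through (u, v) = common neighbors w with w > v,
                # so each triangle u < v < w is counted exactly once.
                count += sum(1 for w in adj[u] & adj[v] if w > v)
    return count

# Example: a 4-clique contains exactly 4 triangles.
clique4 = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2}}
assert count_triangles(clique4) == 4
```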
EVA: An End-to-End Exploratory Video Analytics System
In recent years, deep learning models have revolutionized computer vision, enabling diverse applications. However, these models are computationally expensive, and leveraging them for video analytics involves low-level imperative programming. To address these efficiency and usability challenges, the database community has developed video database management systems (VDBMSs). However, existing VDBMSs lack extensibility and composability and do not support holistic system optimizations, which limits their practical application. In response, we present our vision for EVA, a VDBMS that provides extensible support for user-defined functions and employs a Cascades-style query optimizer. We additionally leverage Ray’s distributed execution to enhance scalability and performance, and explore hardware-specific optimizations to further improve runtime performance. We discuss the architecture and design of EVA, our achievements thus far, and our research roadmap.
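As one illustration of why holistic query optimization matters in a VDBMS (a worked toy example with hypothetical numbers, not EVA's actual API or cost model): with conjunctive predicates, evaluation order changes cost dramatically when one predicate is a deep-learning UDF.

```python
# Why predicate ordering matters for video queries (illustrative numbers).

def expected_cost(order):
    """Expected per-frame cost of evaluating (cost, selectivity) predicates
    in order; later predicates only run on frames that survive earlier ones."""
    total, pass_rate = 0.0, 1.0
    for cost, selectivity in order:
        total += pass_rate * cost
        pass_rate *= selectivity
    return total

detector = (100.0, 0.05)   # assumed: expensive model UDF, keeps 5% of frames
time_filter = (0.01, 0.5)  # assumed: cheap metadata filter, keeps 50%

print(expected_cost([detector, time_filter]))  # ~100.0005 per frame
print(expected_cost([time_filter, detector]))  # ~50.01 per frame: ~2x cheaper
```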
Distributed query processing engines suffer from expensive data shuffles, which incur high I/O and network costs to materialize and transfer large amounts of data over the network. We analyzed our production workload in SCOPE, an SQL-like language used extensively within Microsoft for big-data analytics, and found that a significant fraction of these shuffles are introduced by Spools (logical operators that facilitate reuse of intermediate results) and exchanges (pairs of data partitioning and aggregation operators). This paper presents the Splitter, a super-operator that implements all Spool-induced shuffles and specializes the data materialized for each of its consumers. Splitters reduce the cost of shuffles by opportunistically pre-evaluating operations performed after the Spool, shrinking the data materialized. A Splitter can also eliminate shuffles entirely by fusing all operators up to and including a subsequent exchange or data-outputting operation along one or more consumer branches. Our evaluation on production jobs at Microsoft indicates that 27.5% of jobs that run daily use at least one Splitter, which eliminates exchange-induced shuffles after Spools in 7.5% of all jobs. Introducing the Splitter operator leads to overall savings of about 12% in latency and query running time, 16% in CPU time, 39% in data read, and 22% in data written across all jobs that have Splitters. We see speedups of up to 6.5x in latency and 22x in total I/O time on some jobs where a Splitter was introduced.
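A toy sketch of the idea (our simplification; the table, columns, and consumer operations are invented): a plain Spool materializes the full intermediate result for every consumer, while a Splitter pre-evaluates each consumer's filter and projection so each branch materializes only what it needs.

```python
# Spool vs Splitter on a shared intermediate result (toy data).
rows = [{"id": i, "region": "EU" if i % 2 else "US", "payload": "x" * 100}
        for i in range(1000)]

# Plain Spool: one full copy; both consumers shuffle all 1000 wide rows.
spooled = rows

# Splitter: pre-evaluate each consumer's filter/projection before
# materializing, so each branch gets only the rows and columns it needs.
split_a = [{"id": r["id"]} for r in rows if r["region"] == "EU"]  # consumer A
split_b = [{"id": r["id"]} for r in rows if r["region"] == "US"]  # consumer B

print(len(spooled), len(split_a), len(split_b))  # 1000 vs 500 + 500 rows;
# each split branch also drops the wide 'payload' column entirely.
```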
We demonstrate Pipemizer, an optimizer and recommender aimed at improving the performance of queries, or jobs, in pipelines. These job pipelines are ubiquitous in modern data analytics because jobs read output files written by other jobs. Given that more than 650k jobs run on Microsoft’s SCOPE job service per day and about 70% of them have inter-job dependencies, identifying optimization opportunities across jobs is of considerable interest to both cluster operators and users. Pipemizer addresses this need by providing recommendations, allowing users to understand their system, and facilitating automated application of those recommendations. Pipemizer introduces novel optimizations that include holistic pipeline-aware statistics generation, inter-job operator push-up, and job split & merge. This demonstration showcases the optimizations and recommendations generated by Pipemizer, enabling users to understand and optimize their job pipelines.
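As a minimal illustration (our own simplification, not Pipemizer's implementation), inter-job dependencies of this kind can be recovered by matching the files each job reads against the files other jobs write:

```python
# Build an inter-job dependency DAG from read/write file sets (toy jobs).

jobs = {
    "ingest":   {"reads": {"raw.log"},                   "writes": {"clean.tsv"}},
    "sessions": {"reads": {"clean.tsv"},                 "writes": {"sessions.tsv"}},
    "report":   {"reads": {"clean.tsv", "sessions.tsv"}, "writes": {"report.tsv"}},
}

def dependency_edges(jobs):
    # Map each file to the job that produces it, then connect producers
    # to every job that consumes one of their output files.
    writers = {f: name for name, j in jobs.items() for f in j["writes"]}
    return {(writers[f], name)
            for name, j in jobs.items()
            for f in j["reads"] if f in writers}

print(sorted(dependency_edges(jobs)))
# [('ingest', 'report'), ('ingest', 'sessions'), ('sessions', 'report')]
```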
Triangle counting is a cornerstone operation used in large graph analytics. Hash-based algorithms for triangle counting fail to take advantage of the vectorization available in modern processors. Linear-algebraic methods often involve sparse matrix multiplication, which is inherently expensive. Non-linear-algebraic methods tend to have slow implementations and count each triangle multiple times, re-scaling the result to obtain the exact triangle count. We propose a fast vector-instruction implementation of a set-operation-based triangle counting algorithm that avoids matrix multiplication and finds the exact triangle count directly. Our implementation outperforms the reference implementations of the MIT Graph Challenge and miniTri on about 40 graphs from the SNAP large network dataset, giving speedups ranging from 41x to more than 1500x. A comparison against existing state-of-the-art techniques gave a speedup of 3x on average. We additionally show that this algorithm can easily be plugged into graph frameworks that either use, or can be modified to use, a compressed sparse row (CSR)-like representation to obtain faster compute times. We also propose an optimization to the k-truss decomposition algorithm that can be combined with the optimized triangle counting algorithm for better performance.
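The kernel below sketches the set-intersection count over a CSR-like representation with sorted adjacency lists (our illustration; the paper's implementation replaces the scalar merge loop with vector instructions, which is not shown here):

```python
# Set-intersection triangle counting over a CSR representation with
# sorted neighbor lists; edges are assumed stored in both directions.
import numpy as np

def triangles_csr(indptr, indices):
    count = 0
    n = len(indptr) - 1
    for u in range(n):
        for v in indices[indptr[u]:indptr[u + 1]]:
            if v <= u:
                continue
            # Merge-intersect N(u) and N(v), keeping only w > v so each
            # triangle (u < v < w) is counted exactly once.
            a, b = indptr[u], indptr[v]
            while a < indptr[u + 1] and b < indptr[v + 1]:
                wa, wb = indices[a], indices[b]
                if wa < wb:
                    a += 1
                elif wb < wa:
                    b += 1
                else:
                    count += wa > v
                    a += 1
                    b += 1
    return count

# 4-clique in CSR form: every vertex adjacent to the other three.
indptr = np.array([0, 3, 6, 9, 12])
indices = np.array([1, 2, 3, 0, 2, 3, 0, 1, 3, 0, 1, 2])
assert triangles_csr(indptr, indices) == 4
```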
State-of-the-art techniques for community detection in graphs use either feature-based clustering algorithms or structure-driven spectral clustering approaches. The former considers only the semantic relations derived from node features, while the latter uses structural information for analysis in eigenspace; hence, neither approach alone detects communities very accurately. The existing node2vec algorithm translates structural information into an embedding space. We propose a novel approach that combines all of these and obtains encouraging results on large graphs. The proposed approach brings a few innovations in the form of an auto-tuning framework that addresses the shortcomings of these methods. It stabilizes an important hyper-parameter of spectral clustering and introduces a step heuristic, along with a proposed BinKMeans (binary-search-driven k-means) algorithm, to better detect a distinct drop in eigenvalues during spectral clustering. Additionally, the pipeline implements NodeFeat2Vec to bring feature information into node2vec. An in-depth evaluation on real-world graph data showed that the proposed approach obtained superior results when evaluated with appropriate metrics on communities in labelled real-world graphs.
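For background, the sketch below shows the standard eigengap heuristic that such pipelines start from (our illustration; the paper's BinKMeans replaces the simple argmax with a binary-search-driven k-means procedure, which is not reproduced here):

```python
# Standard eigengap heuristic: choose k at the largest gap among the
# smallest eigenvalues of the graph Laplacian.
import numpy as np

def estimate_k(adjacency, k_max=10):
    degrees = adjacency.sum(axis=1)
    laplacian = np.diag(degrees) - adjacency
    eigvals = np.sort(np.linalg.eigvalsh(laplacian))[:k_max]
    gaps = np.diff(eigvals)
    return int(np.argmax(gaps)) + 1  # k = position of the largest gap

# Two disconnected triangles -> two zero eigenvalues, so k = 2.
block = np.ones((3, 3)) - np.eye(3)
A = np.block([[block, np.zeros((3, 3))], [np.zeros((3, 3)), block]])
print(estimate_k(A))  # 2
```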
The advent of cameras has only accelerated the need to digitize content, as digitization helps prevent data corruption by natural processes and enables faster transfer of data across communities. Handwritten documents and ancient manuscripts form a large part of this data, as they need to be translated from the local languages in which they were written. The first step towards solving this problem is the recognition of handwritten text. Existing handwritten datasets for the Devanagari script can be used to recognize individual characters, but they perform poorly when the text contains matras and conjuncts created by joining character modifiers. Character-level datasets also introduce a dependency between the model and the data source, since the characters the model recognizes must be pre-processed and extracted from each word. These datasets also lack variation in penmanship, which is essential to encompass diversity in writing style. We present an extensive dataset that addresses these issues. Our dataset contains around 4 million characters with varying handwriting styles, complex characters, and matras. Training a simple CNN on our data to detect characters with matras gave accuracies exceeding 98%. We also show that using this dataset separates the input data format from the model design, allowing researchers to focus on the latter. The dataset is made publicly available at DevChar2020.
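A minimal sketch of the kind of "simple CNN" character classifier the dataset supports (PyTorch; the layer sizes, 64x64 input resolution, and class count below are our assumptions, not the paper's exact architecture):

```python
# Two-block convolutional classifier for grayscale character images.
import torch
import torch.nn as nn

class DevanagariCNN(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        # 64x64 input -> 16x16 feature maps after two 2x poolings.
        self.classifier = nn.Linear(64 * 16 * 16, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

model = DevanagariCNN(num_classes=500)      # class count is illustrative
logits = model(torch.randn(8, 1, 64, 64))   # batch of 8 grayscale images
print(logits.shape)                         # torch.Size([8, 500])
```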
The venture of meta-heuristic optimization algorithms into the field of biology has only accelerated growth in swarm intelligence and evolutionary algorithms. We present one such novel approach, inspired by several avian algorithms, that mimics the lifestyle and breeding pattern of the Laysan Albatross. The proposed algorithm is suitable for any nonlinear continuous function optimization task and performs better for training neural networks than gradient-based approaches such as back-propagation and other bio-inspired approaches such as the Whale Optimization Algorithm. We also show that it is less susceptible to overfitting. We tested our algorithm on a total of 12 datasets and obtained an average train accuracy of 0.785 with an average standard deviation of 0.037, and an average test accuracy of 0.793 with an average standard deviation of 0.035.
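A generic skeleton of population-based training of this kind (a plain evolutionary stand-in, not the paper's algorithm; the Laysan Albatross breeding and movement operators would replace the selection and mutation steps below):

```python
# Population-based optimization of a tiny linear model's weights.
import numpy as np

rng = np.random.default_rng(0)

def loss(weights, X, y):
    # Mean squared error of the linear predictor X @ weights.
    return float(np.mean((X @ weights - y) ** 2))

def optimize(X, y, pop_size=30, iters=200, step=0.1):
    pop = rng.normal(size=(pop_size, X.shape[1]))
    for _ in range(iters):
        scores = np.array([loss(w, X, y) for w in pop])
        elite = pop[np.argsort(scores)[: pop_size // 2]]              # selection
        children = elite + rng.normal(scale=step, size=elite.shape)  # mutation
        pop = np.vstack([elite, children])
    return min(pop, key=lambda w: loss(w, X, y))

X = rng.normal(size=(100, 3))
true_w = np.array([1.0, -2.0, 0.5])
w = optimize(X, X @ true_w)
print(np.round(w, 1))  # close to [ 1.  -2.   0.5]
```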
The advent of computational intelligence has only accelerated the motive to automate manual processes. In sports, the award for the best player is questionable in many cases and requires the opinions of multiple distinguished experts in the sport. This often results in an unfair judgement of the situation, given the time constraint for making the decision and the dependence on pure human skill and reasoning. This calls for a reliable system to fairly award the man of the match, taking the numerous variables into account. Though there are machine-learning-based approaches that try to model the game to predict win/loss outcomes, it is hard to model the “man of the match” in a cricket game using the same approach. We propose a novel graph-based approach to award the “man of the match” to a player in a cricket game. We model the game as a directed graph with each player as a node and each interaction between two players as an edge. We use ball-by-ball delivery details to construct this graph, with a heuristic to calculate the edge weights, and compute the node centrality of every player using the Personalized PageRank algorithm to find the most central player in the game. A high centrality indicates a good overall performance by the player. Comparison with existing game data showed encouraging results.
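A toy sketch of the centrality step using networkx (the players, interactions, and edge weights are invented stand-ins for the paper's ball-by-ball heuristic, and the personalization vector is omitted for brevity):

```python
# PageRank centrality over a toy player-interaction graph.
import networkx as nx

G = nx.DiGraph()
# Edge u -> v with weight w: player u "conceded" value w to player v
# (e.g., a bowler conceding runs to a batter); weights are illustrative.
G.add_weighted_edges_from([
    ("bowler_A", "batter_X", 45),   # X scored 45 runs off A's bowling
    ("bowler_B", "batter_X", 30),
    ("batter_Y", "bowler_A", 2),    # A dismissed Y (weight per heuristic)
    ("bowler_A", "batter_Z", 10),
])

scores = nx.pagerank(G, weight="weight")
best = max(scores, key=scores.get)
print(best, round(scores[best], 3))  # batter_X has the highest centrality
```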