Triangle counting is a cornerstone operation in large graph analytics. It has historically been a challenging problem owing to the irregular and dynamic nature of the algorithm, which inhibits compile-time optimizations and requires runtime optimizations such as message aggregation and load-imbalance mitigation. Popular triangle counting algorithms are either inherently slow, fail to take advantage of advanced hardware in modern processors, or involve expensive sparse matrix operations. With its support for fine-grained asynchronous messages, the Partitioned Global Address Space (PGAS) combined with the Actor model has been shown to be efficient for irregular applications \cite{hclib22,hclib23,10181177}. However, few triangle counting algorithms have been efficiently implemented on top of PGAS Actor runtimes. To address these challenges, we propose a set-intersection-based implementation of a distributed triangle counting algorithm atop a PGAS Actor runtime. Evaluation of our approach on the PACE Phoenix cluster and the Perlmutter supercomputer showed encouraging results.
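To make the core idea concrete, here is a minimal serial sketch of set-intersection triangle counting (our illustration only; the paper's distributed version partitions the graph across PGAS ranks and issues intersections as fine-grained actor messages, which is not reproduced here):

```python
# Minimal serial sketch of set-intersection triangle counting.
# Python set intersection stands in for the runtime's intersection kernel.

def count_triangles(adj):
    """adj: dict mapping each vertex to the set of its neighbors."""
    count = 0
    for u, nbrs in adj.items():
        for v in nbrs:
            if u < v:  # consider each undirected edge (u, v) once
                # Triangles through (u, v) = common neighbors w with w > v,
                # so each triangle u < v < w is counted exactly once.
                count += sum(1 for w in adj[u] & adj[v] if w > v)
    return count

# Example: a 4-clique contains exactly 4 triangles.
clique4 = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2}}
assert count_triangles(clique4) == 4
```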
EVA: An End-to-End Exploratory Video Analytics System
In recent years, deep learning models have revolutionized computer vision, enabling diverse applications. However, these models are computationally expensive, and leveraging them for video analytics involves low-level imperative programming. To address these efficiency and usability challenges, the database community has developed video database management systems (VDBMSs). However, existing VDBMSs lack extensibility and composability and do not support holistic system optimizations, which limits their practical application. In response, we present our vision for EVA, a VDBMS that provides extensible support for user-defined functions and employs a Cascades-style query optimizer. We additionally leverage Ray’s distributed execution to enhance scalability and performance, and explore hardware-specific optimizations to further improve runtime performance. We discuss the architecture and design of EVA, our achievements thus far, and our research roadmap.
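As one illustration of why holistic query optimization matters in a VDBMS (a worked toy example with hypothetical numbers, not EVA's actual API or cost model): with conjunctive predicates, evaluation order changes cost dramatically when one predicate is a deep-learning UDF.

```python
# Why predicate ordering matters for video queries (illustrative numbers).

def expected_cost(order):
    """Expected per-frame cost of evaluating (cost, selectivity) predicates
    in order; later predicates only run on frames that survive earlier ones."""
    total, pass_rate = 0.0, 1.0
    for cost, selectivity in order:
        total += pass_rate * cost
        pass_rate *= selectivity
    return total

detector = (100.0, 0.05)   # assumed: expensive model UDF, keeps 5% of frames
time_filter = (0.01, 0.5)  # assumed: cheap metadata filter, keeps 50%

print(expected_cost([detector, time_filter]))  # ~100.0005 per frame
print(expected_cost([time_filter, detector]))  # ~50.01 per frame: ~2x cheaper
```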
Distributed query processing engines suffer from expensive data shuffles, which incur high I/O and network costs to materialize and transfer large amounts of data over the network. We analyzed our production workload in SCOPE, an SQL-like language used extensively within Microsoft for big-data analytics, and found that a significant fraction of these shuffles are introduced by Spools (logical operators that facilitate reuse of intermediate results) and exchanges (pairs of data partitioning and aggregation operators). This paper presents the Splitter, a super-operator that implements all Spool-induced shuffles and specializes the data materialized for each of its consumers. Splitters reduce the cost of shuffles by opportunistically pre-evaluating operations performed after the Spool, shrinking the data materialized. A Splitter can also eliminate shuffles entirely by fusing all operators up to and including a subsequent exchange or data-outputting operation along one or more consumer branches. Our evaluation on production jobs at Microsoft indicates that 27.5% of jobs that run daily use at least one Splitter, which eliminates exchange-induced shuffles after Spools in 7.5% of all jobs. Introducing the Splitter operator leads to overall savings of about 12% in latency and query running time, 16% in CPU time, 39% in data read, and 22% in data written across all jobs that have Splitters. We see speedups of up to 6.5x in latency and 22x in total I/O time on some jobs where a Splitter was introduced.
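A toy sketch of the idea (our simplification; the table, columns, and consumer operations are invented): a plain Spool materializes the full intermediate result for every consumer, while a Splitter pre-evaluates each consumer's filter and projection so each branch materializes only what it needs.

```python
# Spool vs Splitter on a shared intermediate result (toy data).
rows = [{"id": i, "region": "EU" if i % 2 else "US", "payload": "x" * 100}
        for i in range(1000)]

# Plain Spool: one full copy; both consumers shuffle all 1000 wide rows.
spooled = rows

# Splitter: pre-evaluate each consumer's filter/projection before
# materializing, so each branch gets only the rows and columns it needs.
split_a = [{"id": r["id"]} for r in rows if r["region"] == "EU"]  # consumer A
split_b = [{"id": r["id"]} for r in rows if r["region"] == "US"]  # consumer B

print(len(spooled), len(split_a), len(split_b))  # 1000 vs 500 + 500 rows;
# each split branch also drops the wide 'payload' column entirely.
```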
We demonstrate Pipemizer, an optimizer and recommender aimed at improving the performance of queries, or jobs, in pipelines. These job pipelines are ubiquitous in modern data analytics because jobs read output files written by other jobs. Given that more than 650k jobs run on Microsoft’s SCOPE job service per day and about 70% of them have inter-job dependencies, identifying optimization opportunities across jobs is of considerable interest to both cluster operators and users. Pipemizer addresses this need by providing recommendations, allowing users to understand their system, and facilitating automated application of those recommendations. Pipemizer introduces novel optimizations that include holistic pipeline-aware statistics generation, inter-job operator push-up, and job split & merge. This demonstration showcases the optimizations and recommendations generated by Pipemizer, enabling users to understand and optimize their job pipelines.
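As a minimal illustration (our own simplification, not Pipemizer's implementation), inter-job dependencies of this kind can be recovered by matching the files each job reads against the files other jobs write:

```python
# Build an inter-job dependency DAG from read/write file sets (toy jobs).

jobs = {
    "ingest":   {"reads": {"raw.log"},                   "writes": {"clean.tsv"}},
    "sessions": {"reads": {"clean.tsv"},                 "writes": {"sessions.tsv"}},
    "report":   {"reads": {"clean.tsv", "sessions.tsv"}, "writes": {"report.tsv"}},
}

def dependency_edges(jobs):
    # Map each file to the job that produces it, then connect producers
    # to every job that consumes one of their output files.
    writers = {f: name for name, j in jobs.items() for f in j["writes"]}
    return {(writers[f], name)
            for name, j in jobs.items()
            for f in j["reads"] if f in writers}

print(sorted(dependency_edges(jobs)))
# [('ingest', 'report'), ('ingest', 'sessions'), ('sessions', 'report')]
```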
Triangle counting is a cornerstone operation used in large graph analytics. Hash-based algorithms for triangle counting fail to take advantage of the vectorization available in modern processors. Linear-algebraic methods often involve sparse matrix multiplication, which is inherently expensive. Non-linear-algebraic methods tend to have slow implementations and count each triangle multiple times, re-scaling the result to obtain the exact triangle count. We propose a fast vector-instruction implementation of a set-operation-based triangle counting algorithm that avoids matrix multiplication and finds the exact triangle count directly. Our implementation outperforms the reference implementations of the MIT Graph Challenge and miniTri on about 40 graphs from the SNAP large network dataset, giving speedups ranging from 41x to more than 1500x. A comparison against existing state-of-the-art techniques gave a speedup of 3x on average. We additionally show that this algorithm can easily be plugged into graph frameworks that either use, or can be modified to use, a compressed sparse row (CSR)-like representation to obtain faster compute times. We also propose an optimization to the k-truss decomposition algorithm that can be combined with the optimized triangle counting algorithm for better performance.
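The kernel below sketches the set-intersection count over a CSR-like representation with sorted adjacency lists (our illustration; the paper's implementation replaces the scalar merge loop with vector instructions, which is not shown here):

```python
# Set-intersection triangle counting over a CSR representation with
# sorted neighbor lists; edges are assumed stored in both directions.
import numpy as np

def triangles_csr(indptr, indices):
    count = 0
    n = len(indptr) - 1
    for u in range(n):
        for v in indices[indptr[u]:indptr[u + 1]]:
            if v <= u:
                continue
            # Merge-intersect N(u) and N(v), keeping only w > v so each
            # triangle (u < v < w) is counted exactly once.
            a, b = indptr[u], indptr[v]
            while a < indptr[u + 1] and b < indptr[v + 1]:
                wa, wb = indices[a], indices[b]
                if wa < wb:
                    a += 1
                elif wb < wa:
                    b += 1
                else:
                    count += wa > v
                    a += 1
                    b += 1
    return count

# 4-clique in CSR form: every vertex adjacent to the other three.
indptr = np.array([0, 3, 6, 9, 12])
indices = np.array([1, 2, 3, 0, 2, 3, 0, 1, 3, 0, 1, 2])
assert triangles_csr(indptr, indices) == 4
```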
State-of-the-art techniques for community detection in graphs use either feature-based clustering algorithms or structure-driven spectral clustering approaches. The former considers only the semantic relations derived from node features, while the latter uses structural information for analysis in eigenspace; hence, neither approach alone detects communities very accurately. The existing node2vec algorithm translates structural information into an embedding space. We propose a novel approach that combines all of these and obtains encouraging results on large graphs. The proposed approach brings a few innovations in the form of an auto-tuning framework that addresses the shortcomings of these methods. It stabilizes an important hyper-parameter of spectral clustering and introduces a step heuristic, along with a proposed BinKMeans (binary-search-driven k-means) algorithm, to better detect a distinct drop in eigenvalues during spectral clustering. Additionally, the pipeline implements NodeFeat2Vec to bring feature information into node2vec. An in-depth evaluation on real-world graph data showed that the proposed approach obtained superior results when evaluated with appropriate metrics on communities in labelled real-world graphs.
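For background, the sketch below shows the standard eigengap heuristic that such pipelines start from (our illustration; the paper's BinKMeans replaces the simple argmax with a binary-search-driven k-means procedure, which is not reproduced here):

```python
# Standard eigengap heuristic: choose k at the largest gap among the
# smallest eigenvalues of the graph Laplacian.
import numpy as np

def estimate_k(adjacency, k_max=10):
    degrees = adjacency.sum(axis=1)
    laplacian = np.diag(degrees) - adjacency
    eigvals = np.sort(np.linalg.eigvalsh(laplacian))[:k_max]
    gaps = np.diff(eigvals)
    return int(np.argmax(gaps)) + 1  # k = position of the largest gap

# Two disconnected triangles -> two zero eigenvalues, so k = 2.
block = np.ones((3, 3)) - np.eye(3)
A = np.block([[block, np.zeros((3, 3))], [np.zeros((3, 3)), block]])
print(estimate_k(A))  # 2
```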
The advent of cameras has only accelerated the need to digitize content, as digitization helps prevent data corruption by natural processes and enables faster transfer of data across communities. Handwritten documents and ancient manuscripts form a large part of this data, as they need to be translated from the local languages in which they were written. The first step towards solving this problem is the recognition of handwritten text. Existing handwritten datasets for the Devanagari script can be used to recognize individual characters, but they perform poorly when the text contains matras and conjuncts created by joining character modifiers. Character-level datasets also introduce a dependency between the model and the data source, since the characters the model recognizes must be pre-processed and extracted from each word. These datasets also lack variation in penmanship, which is essential to encompass diversity in writing style. We present an extensive dataset that addresses these issues. Our dataset contains around 4 million characters with varying handwriting styles, complex characters, and matras. Training a simple CNN on our data to detect characters with matras gave accuracies exceeding 98%. We also show that using this dataset separates the input data format from the model design, allowing researchers to focus on the latter. The dataset is made publicly available at DevChar2020.
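A minimal sketch of the kind of "simple CNN" character classifier the dataset supports (PyTorch; the layer sizes, 64x64 input resolution, and class count below are our assumptions, not the paper's exact architecture):

```python
# Two-block convolutional classifier for grayscale character images.
import torch
import torch.nn as nn

class DevanagariCNN(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        # 64x64 input -> 16x16 feature maps after two 2x poolings.
        self.classifier = nn.Linear(64 * 16 * 16, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

model = DevanagariCNN(num_classes=500)      # class count is illustrative
logits = model(torch.randn(8, 1, 64, 64))   # batch of 8 grayscale images
print(logits.shape)                         # torch.Size([8, 500])
```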
The venture of meta-heuristic optimization algorithms into the field of biology has only accelerated growth in swarm intelligence and evolutionary algorithms. We present one such novel approach, inspired by several avian algorithms, that mimics the lifestyle and breeding pattern of the Laysan Albatross. The proposed algorithm is suitable for any nonlinear continuous function optimization task and performs better for training neural networks than gradient-based approaches such as back-propagation and other bio-inspired approaches such as the Whale Optimization Algorithm. We also show that it is less susceptible to overfitting. We tested our algorithm on a total of 12 datasets and obtained an average train accuracy of 0.785 with an average standard deviation of 0.037, and an average test accuracy of 0.793 with an average standard deviation of 0.035.
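A generic skeleton of population-based training of this kind (a plain evolutionary stand-in, not the paper's algorithm; the Laysan Albatross breeding and movement operators would replace the selection and mutation steps below):

```python
# Population-based optimization of a tiny linear model's weights.
import numpy as np

rng = np.random.default_rng(0)

def loss(weights, X, y):
    # Mean squared error of the linear predictor X @ weights.
    return float(np.mean((X @ weights - y) ** 2))

def optimize(X, y, pop_size=30, iters=200, step=0.1):
    pop = rng.normal(size=(pop_size, X.shape[1]))
    for _ in range(iters):
        scores = np.array([loss(w, X, y) for w in pop])
        elite = pop[np.argsort(scores)[: pop_size // 2]]              # selection
        children = elite + rng.normal(scale=step, size=elite.shape)  # mutation
        pop = np.vstack([elite, children])
    return min(pop, key=lambda w: loss(w, X, y))

X = rng.normal(size=(100, 3))
true_w = np.array([1.0, -2.0, 0.5])
w = optimize(X, X @ true_w)
print(np.round(w, 1))  # close to [ 1.  -2.   0.5]
```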
The advent of computational intelligence has only accelerated the motive to automate manual processes. In sports, the award for the best player is questionable in many cases and requires the opinions of multiple distinguished experts in the sport. This often results in an unfair judgement of the situation, given the time constraint for making the decision and the dependence on pure human skill and reasoning. This calls for a reliable system to fairly award the man of the match, taking the numerous variables into account. Though there are machine-learning-based approaches that try to model the game to predict win/loss outcomes, it is hard to model the “man of the match” in a cricket game using the same approach. We propose a novel graph-based approach to award the “man of the match” to a player in a cricket game. We model the game as a directed graph with each player as a node and each interaction between two players as an edge. We use ball-by-ball delivery details to construct this graph, with a heuristic to calculate the edge weights, and compute the node centrality of every player using the Personalized PageRank algorithm to find the most central player in the game. A high centrality indicates a good overall performance by the player. Comparison with existing game data showed encouraging results.
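A toy sketch of the centrality step using networkx (the players, interactions, and edge weights are invented stand-ins for the paper's ball-by-ball heuristic, and the personalization vector is omitted for brevity):

```python
# PageRank centrality over a toy player-interaction graph.
import networkx as nx

G = nx.DiGraph()
# Edge u -> v with weight w: player u "conceded" value w to player v
# (e.g., a bowler conceding runs to a batter); weights are illustrative.
G.add_weighted_edges_from([
    ("bowler_A", "batter_X", 45),   # X scored 45 runs off A's bowling
    ("bowler_B", "batter_X", 30),
    ("batter_Y", "bowler_A", 2),    # A dismissed Y (weight per heuristic)
    ("bowler_A", "batter_Z", 10),
])

scores = nx.pagerank(G, weight="weight")
best = max(scores, key=scores.get)
print(best, round(scores[best], 3))  # batter_X has the highest centrality
```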