A Data Engineer passionate about Data Science 📊. I like automating things, building pipelines, exploring scalability problems, improving efficiency, and tuning performance. I’m a strong advocate for 📜 open source, ☁️ cloud computing, 🚀 DevOps, 🆕 innovation, and automation 🤖.
I am a Data Engineer with 5+ years of experience in the design, architecture, development, and deployment of Hadoop, Spark, AWS, and other big data technologies, with work experience in the Middle East and India in the healthcare and pharmaceuticals domains.
M.S. Data Analytics Engineering, Expected Aug 2023
B.Tech in Computer Science and Engineering, 2016
Manipal Institute of Technology
MY MAJOR EXPERTISE
Technologies: Hadoop, Spark, Scala, Snowflake, AWS (RDS, S3, EMR, Athena), Hive, Impala, Unix, Shell scripting, Control-M, Bamboo, Git, Bitbucket, Maven, Eclipse, Cloudera distribution
Technologies: Hadoop, Sqoop, Hive, Impala, Shell scripting, MySQL, Spark, Scala, SonarQube, Flume, Unix, Git
Central Data Repository for MIT, Manipal:
Delivered a web application for data entry, data collection, analysis, and dynamic report generation according to each user's custom report format. Data was loaded from the databases using Sqoop and analyzed on a Hadoop cluster. Reports were generated from Hive queries and displayed in the web application.
Project Management System:
Developed a web application that let users from different departments interact with one another and with their respective projects, and access project functions at scale.
CERTIFICATIONS, HONOURS AND AWARDS
Created a data-driven storyboard in Tableau showing the impact of global plastic pollution on the environment (land and ocean) and the recycling rates of different countries.
Analyzed FBI Uniform Crime Reporting data on major crimes in every US state and visualized it on a Tableau dashboard. Built for a data mining hackathon.
Created a data-driven storyline from Centers for Medicare & Medicaid Services (CMS) nursing facility data, with visuals that highlight nursing homes’ resource limits, using Flourish, Datawrapper, and Tableau, hosted on Google Sites. Built for a computation and visualization hackathon.
Predicting hit songs on Spotify by classifying 40,000 songs with several machine learning classification models.
Visualized expenditure trends by country in sectors such as education, pharmaceuticals, military, infrastructure, and research and development from 1960 to 2020, using Flourish and Datawrapper, hosted on Google Sites.
Analyzing data from 1.6 million Twitter users to draw useful insights and explore interesting patterns. The techniques used include text mining, sentiment analysis, probability, time series analysis, and hierarchical clustering on text/words using R.
Discovering & visualizing various trends in 120 years of Olympic history using R
A content-based recommendation engine API for movies of the 1900s, built using NLP, Flask, Heroku, and Python.
Streaming real-time tweets on current affairs such as COVID-19 into Elasticsearch using a Kafka 2.0.0 high-throughput producer and consumer with safe, idempotent, and compression configurations.
Determining which programming languages and execution engines are the quickest or the slowest at processing files
Some of my recent literary work