A Data Engineer passionate about Data Science. I like automating things, building pipelines, exploring scalability problems, improving efficiency, and tuning performance. I’m a strong advocate for open source, cloud computing, DevOps, innovation, and automation.
I am a Data Engineer with 6 years of experience in the design, architecture, development, and deployment of Python, AWS, Snowflake, Hadoop, Spark, and other big data technologies, with work experience in the U.S., the Middle East, and India.
M.S. Data Analytics Engineering, Expected Dec 2023
Northeastern University
B.Tech in Computer Science and Engineering, 2016
Manipal Institute of Technology
MY MAJOR EXPERTISE
Technologies: Python, Plotly Dash, MariaDB, MySQL, Jupyter Notebook
Technologies: Python, API, AWS: S3, EMR, Athena, Glue, Redshift, Lambda, Batch, PySpark, Shell Scripting, Git, GitHub, Packages: Pandas, Requests, BeautifulSoup, Multiprocess, Pytest
Technologies: Hadoop, Spark, Scala, Snowflake, AWS: RDS, S3, EMR, Athena, Hive, Impala, Unix, Shell scripting, Control M, Bamboo, Git, Bitbucket, Maven, Eclipse, Cloudera distribution
Technologies: Hadoop, Sqoop, Hive, Impala, Shell scripting, MySQL, Spark, Scala, SQL, SonarQube, Flume, Unix, Git
Central Data Repository for MIT, Manipal:
Project Management System:
CERTIFICATIONS, HONOURS AND AWARDS
Exploring and drawing meaningful insights about patients readmitted with diabetes
Predicting the Energy consumed by appliances using Machine Learning algorithms built from scratch
Real-time analytics dashboard generated from an input YouTube video. It shows sentiment analysis that can be used to drive up ad revenue.
Created a data-driven storyboard in Tableau showing the impact of global plastic pollution on the environment (land and ocean) and the recycling rates of different countries.
Mined FBI uniform major-crime data reported for every US state and visualized it on a Tableau dashboard.
Built a data-driven story from Centers for Medicare & Medicaid Services (CMS) nursing facility data, with visuals that highlight nursing homes’ resource limits, using Flourish, Datawrapper, and Tableau, hosted on Google Sites.
Predicting hit songs on Spotify by classifying 40,000 songs using various Classification Machine Learning Models
Visualized expenditure trends by country across sectors such as Education, Pharmaceuticals, Military, Infrastructure, and Research and Development for the years 1960-2020 using Flourish and Datawrapper, hosted on Google Sites.
Examining data from 1.6 million Twitter users to draw useful insights and explore interesting patterns. The techniques used include text mining, sentiment analysis, probability, time series analysis, and hierarchical clustering on text/words using R.
Discovering & visualizing various trends in 120 years of Olympic history using R
A content-based recommendation engine API for movies of the 1900s, built using NLP, Flask, Heroku, and Python.
Clustering Neighborhoods of Paris and London using Machine learning.
Predicting the cost of treatment and insurance using Machine Learning.
Streaming real-time tweets on current affairs such as COVID-19 into Elasticsearch using a Kafka 2.0.0 high-throughput producer and consumer with safe, idempotent, and compression configurations.
Determining which programming languages and execution engines are the quickest or the slowest at processing files
Data cleaning, pre-processing, and analytics on a million movies using Spark and Scala.
Some of my recent literary work