Experience

School of Business, George Mason University

Data Scientist • Oct, 2021 — Present

  • Design and Implement ETL Pipelines - Developed and implemented data pipelines and ETL processes for ingesting and processing large amounts of business data from AWS S3 storage.
  • Data Cleaning and Feature Engineering - Performed data cleaning and pre-processing tasks such as handling missing values, removing duplicates, and correcting data errors, and implemented validation checks to ensure data quality and consistency.
  • Research and Analysis on Business Reports - Applied advanced natural language processing techniques to analyze EDGAR 10-K and 10-Q business reports of various companies, including text classification, named entity recognition, and text summarization using custom-trained BERT and Transformer models.

College of Education and Human Development, George Mason University

Graduate Research Assistant • Sept, 2021 — Present

  • Teaching Assistance - Providing assistance in setting up the environment and delivering the weekly content for the Praxis Certification Exam and the AWS Cloud Practitioner Certification.
  • Research on Education Methodologies - Verifying existing educational methodologies with machine learning algorithms on large datasets and formulating statistical conclusions from the results.

Education

George Mason University   (2021 — 2023)

Master's in Computer Science

Courses: Machine Learning, Natural Language Processing, Massive Data Mining, Artificial Intelligence.

SCSVMV University   (2017 — 2021)

Bachelor's in Computer Science

Courses: Machine Learning, Natural Language Processing, Massive Data Mining, Artificial Intelligence.

Projects

Crypto Wallet

Cryptocurrency Portfolio with Predictive Analytics for Investors.

  • Developed an ETL pipeline for sourcing and storing data from Crypto APIs into S3 buckets using Python and Apache Airflow. Analyzed and made predictions on the data using PySpark ML models.
  • Implemented a web application using Angular and Flask to present the portfolio and predictions to investors.
  • Deployed the Dockerized application on Google Kubernetes Engine using Rancher installed on AWS EC2.
  • Gained experience handling massive data and applying data engineering concepts with technologies including Python, PySpark, and Apache Airflow. Gained hands-on experience with DevOps tools and learned to deploy applications across cloud platforms for cost optimization.

Skills

  • Languages - Python, R, C, C++, JavaScript, Java, C#, Dart, SQL, Go, Scala
  • Databases - MySQL, PostgreSQL, MongoDB, GraphQL
  • ML Libraries - TensorFlow, PyTorch, spaCy, OpenCV, Hugging Face, PySpark
  • Big Data - Apache Spark, MapReduce, Hadoop, HDFS, Databricks, AWS Elastic MapReduce
  • Web Development - HTML, CSS, AJAX, Angular, jQuery, JEE, Flask, Node.js, REST, SOAP, Spring Boot
  • Cloud & DevOps - AWS, Azure, GCP, Git, Maven, Gradle, Kubernetes, Docker, Jenkins, Rancher, Terraform
  • Soft Skills - Leadership, Event Management, Public Speaking