Analyst
Airflow
MongoDB
Kafka
A dashboard built around the k-means clustering algorithm, an unsupervised machine learning method. It lets users segment data into clusters and analyze each segment efficiently.
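To show what the dashboard's clustering step does, here is a minimal pure-Python sketch of k-means using Lloyd's algorithm. The function name `kmeans`, the random initialization from input points, and the toy points are illustrative assumptions, not the dashboard's actual code.

```python
import math
import random

def kmeans(points, k, max_iters=100, seed=0):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid updates."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize with k of the input points
    labels = []
    for _ in range(max_iters):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Update step: move each centroid to the mean of the points assigned to it.
        new_centroids = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                new_centroids.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid in place
        if new_centroids == centroids:  # converged: no centroid moved
            break
        centroids = new_centroids
    return labels, centroids

# Two well-separated toy groups (illustrative data).
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
labels, centroids = kmeans(points, k=2)
```

In practice a library implementation such as scikit-learn's `KMeans` (with k-means++ initialization) is the usual choice; the loop above just makes the two alternating steps explicit.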
Check out this video of a dashboard I built around a linear regression model. Using Python and data visualization libraries, I created an interactive tool for predicting apartment prices.
Data Analytics on Superstore Dataset
In this project, I analyzed a retail dataset to uncover key insights and help optimize business strategies. Using SQL and Pandas, I found the top-selling categories, identified the most loyal customers, and tracked purchasing patterns over time. By applying the Pareto Principle, I pinpointed the customers driving 80% of the profit. I also segmented customers based on their buying frequency to tailor strategies for each group. These insights can be leveraged to refine marketing strategies, enhance customer experiences, and drive revenue growth.
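The Pareto and frequency-segmentation steps can be sketched as follows. This is a minimal pure-Python illustration, assuming each order line is a `(customer, profit)` pair; the function names, the order-count thresholds, and the sample rows are hypothetical, not the project's code or data. In Pandas the same Pareto logic maps onto `groupby`, `sum`, `sort_values` and `cumsum`.

```python
from collections import Counter, defaultdict

def pareto_customers(orders, share=0.8):
    """Smallest group of top-profit customers whose combined profit reaches `share` of the total."""
    profit = defaultdict(float)
    for customer, amount in orders:
        profit[customer] += amount
    total = sum(profit.values())
    # Rank customers by total profit, highest first, and accumulate until the share is reached.
    ranked = sorted(profit.items(), key=lambda item: item[1], reverse=True)
    selected, running = [], 0.0
    for customer, customer_profit in ranked:
        if running >= share * total:
            break
        selected.append(customer)
        running += customer_profit
    return selected

def frequency_segments(order_customers, thresholds=(2, 5)):
    """Bucket customers by order count; the thresholds are illustrative assumptions."""
    regular_min, frequent_min = thresholds
    segments = {}
    for customer, n in Counter(order_customers).items():
        if n >= frequent_min:
            segments[customer] = "frequent"
        elif n >= regular_min:
            segments[customer] = "regular"
        else:
            segments[customer] = "occasional"
    return segments

# Illustrative (customer, profit) order lines, not the Superstore data.
orders = [("A", 300.0), ("A", 200.0), ("B", 350.0), ("C", 100.0), ("D", 50.0)]
top = pareto_customers(orders)
segments = frequency_segments([customer for customer, _ in orders])
```

Note that retail data often contains loss-making orders; negative profits lower the total, so the cutoff is reached with fewer customers than a revenue-based ranking would need.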
ETL Data Pipeline and BI reports
In this project, I designed and implemented an Airflow-based data pipeline that streamlines the extraction, transformation, and loading (ETL) process on cloud servers (Amazon EC2). The pipeline collects data from various databases and consolidates it into a central database for unified access. It also automates the generation of daily CSV files by running scheduled queries on the collected data. These CSV files are then stored securely in Amazon S3 for archival and easy access.
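The extract, transform, load and daily-report steps can be sketched as plain functions. This is a hypothetical illustration: in-memory `sqlite3` databases stand in for the real sources, and the `orders` schema, the cleaning rule and the report query are assumptions. In an Airflow deployment each function could be wrapped in a task (for example with `PythonOperator`), and the CSV could be pushed to S3 with boto3's `upload_file`; neither is shown here.

```python
import csv
import os
import sqlite3
import tempfile

def extract(source_conn):
    """Pull raw rows from one source database (hypothetical `orders` table)."""
    return source_conn.execute("SELECT order_id, customer_id, amount FROM orders").fetchall()

def transform(rows, source_name):
    """Drop rows with a missing amount and tag each remaining row with its source."""
    return [(source_name, oid, cid, round(float(amt), 2)) for oid, cid, amt in rows if amt is not None]

def load(central_conn, rows):
    """Append cleaned rows to the central database."""
    central_conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", rows)
    central_conn.commit()

def export_daily_csv(central_conn, path):
    """Run a (hypothetical) scheduled report query and write the result to CSV."""
    rows = central_conn.execute(
        "SELECT source, COUNT(*), ROUND(SUM(amount), 2) FROM orders GROUP BY source ORDER BY source"
    ).fetchall()
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["source", "orders", "total_amount"])
        writer.writerows(rows)
    return path

def make_source(rows):
    """Create an in-memory stand-in for one source database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    return conn

sources = {
    "shop_a": make_source([(1, "c1", 10.0), (2, "c2", None), (3, "c1", 5.5)]),
    "shop_b": make_source([(1, "c3", 20.0)]),
}
central = sqlite3.connect(":memory:")
central.execute("CREATE TABLE orders (source TEXT, order_id INTEGER, customer_id TEXT, amount REAL)")
for name, conn in sources.items():
    load(central, transform(extract(conn), name))

out_path = export_daily_csv(central, os.path.join(tempfile.mkdtemp(), "daily_report.csv"))
```

Keeping each step a separate function mirrors how an orchestrator retries or reruns one task without repeating the whole pipeline.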
Machine Learning Projects
"Transforming Data into Actionable Insights: Discover my diverse portfolio of projects, showcasing expertise in predictive modeling, regression analysis, hypothesis testing, and other data-driven solutions."
This project explores data collected by a team of Polish economists who studied bankruptcy. The objective is to build a predictive model that determines whether a company will go bankrupt.
Conducted a chi-square test to determine whether sending an email increases program enrollment at WorldQuant University (WQU). I also built custom Python classes to implement an extract, transform, load (ETL) process, ensuring efficient data handling and analysis.
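A chi-square test of independence on a 2x2 table (rows: received email or not; columns: enrolled or not) can be sketched in pure Python. The counts below are illustrative, not the project's data. With one degree of freedom the statistic is the square of a standard normal variable, so the p-value is `erfc(sqrt(chi2 / 2))`. This sketch omits the continuity correction; `scipy.stats.chi2_contingency` applies Yates' correction to 2x2 tables by default, so its result will differ slightly.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table (no continuity correction)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence; assumes no row or column total is zero.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # 1 degree of freedom: P(X >= chi2) = P(|Z| >= sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Illustrative counts: [[email & enrolled, email & not], [no email & enrolled, no email & not]].
chi2, p_value = chi_square_2x2([[30, 70], [20, 80]])
```

Here a p-value around 0.10 would not be significant at the conventional 0.05 level, so these made-up counts would not support a claim that the email raised enrollment.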
Segmenting Credit-Strained Households: Using the 2019 Survey of Consumer Finances, this project applies unsupervised learning to identify households with credit difficulties and group them into clusters. The insights gained can inform marketing strategies and support sociological research.
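Before clustering, two preparation steps are typical: selecting the credit-strained subset and putting features on a common scale, since k-means relies on Euclidean distance and a large-dollar column would otherwise dominate. The sketch below assumes a hypothetical `credit_fear` flag and made-up `income`/`debt` values; it does not use the survey's actual variable names or data.

```python
import statistics

# Hypothetical household records; "credit_fear" stands in for the survey's credit-difficulty flag.
households = [
    {"credit_fear": 1, "income": 42000, "debt": 15000},
    {"credit_fear": 0, "income": 95000, "debt": 30000},
    {"credit_fear": 1, "income": 28000, "debt": 22000},
    {"credit_fear": 1, "income": 61000, "debt": 5000},
]

def credit_strained(rows, flag="credit_fear"):
    """Keep only households whose (hypothetical) credit-difficulty flag is set."""
    return [r for r in rows if r.get(flag) == 1]

def standardize(rows, features):
    """Z-score each feature so that no column dominates Euclidean distances."""
    stats = {}
    for f in features:
        values = [r[f] for r in rows]
        stats[f] = (statistics.mean(values), statistics.pstdev(values))
    # A constant column (standard deviation 0) is mapped to 0.0 instead of dividing by zero.
    return [
        [(r[f] - stats[f][0]) / stats[f][1] if stats[f][1] else 0.0 for f in features]
        for r in rows
    ]

subset = credit_strained(households)
X = standardize(subset, ["income", "debt"])
```

The resulting rows in `X` are ready to pass to a clustering routine such as k-means.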
This project builds logistic regression and decision tree models to predict the level of earthquake damage to buildings, using data extracted from a SQLite database. The objective is to contribute to the development of unbiased and effective disaster response strategies.
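The SQLite extraction and logistic regression steps can be sketched with the standard library alone. Everything specific here is hypothetical: the `buildings` table, the single `age_years` feature, the toy rows, and the rule that a damage grade above 3 counts as severe. The model is fit by batch gradient descent on the mean log-loss; in practice scikit-learn's `LogisticRegression` and `DecisionTreeClassifier` would be used instead.

```python
import math
import sqlite3
import statistics

# Hypothetical table and columns; the project's real schema is not described.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE buildings (building_id INTEGER, age_years REAL, damage_grade INTEGER)")
conn.executemany(
    "INSERT INTO buildings VALUES (?, ?, ?)",
    [(1, 5, 1), (2, 10, 2), (3, 15, 1), (4, 20, 2), (5, 40, 4), (6, 55, 5), (7, 60, 4), (8, 70, 5)],
)
# Extract one feature and a binary target (assumption: grade > 3 means severe damage).
rows = conn.execute(
    "SELECT age_years, CASE WHEN damage_grade > 3 THEN 1 ELSE 0 END FROM buildings"
).fetchall()
conn.close()

ages = [r[0] for r in rows]
targets = [r[1] for r in rows]
mu, sigma = statistics.mean(ages), statistics.pstdev(ages)
xs = [(a - mu) / sigma for a in ages]  # standardize so gradient descent converges quickly

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.5, epochs=2000):
    """Batch gradient descent on the mean log-loss for a one-feature model."""
    w = b = 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(sigmoid(w * x + b) - y for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

w, b = fit_logistic(xs, targets)

def predict_proba(age):
    """Estimated probability of severe damage for a building of the given age."""
    return sigmoid(w * (age - mu) / sigma + b)
```

Reusing the training mean and standard deviation inside `predict_proba` keeps new inputs on the same scale the model was fit on.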
Built a linear regression model to predict apartment prices in Argentina, using a data pipeline to impute missing values and encode categorical features. Reduced overfitting to improve model performance.
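The imputation and encoding stages of such a pipeline can be sketched as a small fit/transform class: statistics are learned on the training rows only and then applied unchanged to new rows, which avoids leaking information from test data. The class name, the `surface_m2` and `neighborhood` columns, and the sample rows are hypothetical. scikit-learn's `SimpleImputer` and `OneHotEncoder(handle_unknown="ignore")` provide the same behavior inside a `Pipeline`.

```python
import statistics

class ApartmentPreprocessor:
    """Hypothetical fit-on-train preprocessing: mean-impute numeric columns, one-hot encode categorical ones."""

    def __init__(self, numeric, categorical):
        self.numeric = numeric
        self.categorical = categorical

    def fit(self, rows):
        # Learn column means (ignoring missing values) and the category vocabulary from training rows.
        self.means_ = {
            col: statistics.mean([r[col] for r in rows if r.get(col) is not None])
            for col in self.numeric
        }
        self.categories_ = {
            col: sorted({r[col] for r in rows if r.get(col) is not None})
            for col in self.categorical
        }
        return self

    def transform(self, rows):
        out = []
        for r in rows:
            # Missing numeric values are replaced by the training mean.
            vec = [r[col] if r.get(col) is not None else self.means_[col] for col in self.numeric]
            for col in self.categorical:
                # One indicator per known category; unseen or missing categories become all zeros.
                vec.extend(1.0 if r.get(col) == cat else 0.0 for cat in self.categories_[col])
            out.append(vec)
        return out

# Made-up training rows for illustration.
train = [
    {"surface_m2": 50.0, "neighborhood": "Palermo"},
    {"surface_m2": None, "neighborhood": "Recoleta"},
    {"surface_m2": 70.0, "neighborhood": "Palermo"},
]
pre = ApartmentPreprocessor(["surface_m2"], ["neighborhood"]).fit(train)
X_train = pre.transform(train)
X_new = pre.transform([{"surface_m2": 40.0, "neighborhood": "Belgrano"}])
```

The resulting feature vectors can then be passed to any linear regression routine.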
"Let's turn data into insights together. Whether you're a data enthusiast or a curious explorer, let's uncover the hidden stories within data!"
Email: [email protected]