Proven Data Analyst with a track record of leveraging Python programming and statistical analysis to drive decision-making. Excels at transforming complex datasets into actionable insights. Skilled in SQL databases and adept at communicating complex findings, significantly improving data utilization for strategic planning.
- Python Programming: Pandas, NumPy, scikit-learn, Matplotlib, Seaborn, Plotly
Google Advanced Data Analytics Certificate - Coursera Platform
Providing Data-Driven Suggestions for HR (using Python):
Input Data: A dataset of 14,999 rows and 10 columns provided by the HR department at Salifort Motors.
Goal: Analyzing the data collected by the HR department and building a model that predicts whether an employee will leave the company.
Result: After feature engineering, the decision tree model achieved an AUC of 93.8%, precision of 87.0%, recall of 90.4%, F1-score of 88.7%, and accuracy of 96.2% on the test set. A random forest model modestly outperformed the decision tree.
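As a quick consistency check on the metrics above, the F1-score is the harmonic mean of precision and recall. The sketch below recomputes it from the rounded precision and recall values; it is an illustrative check, not the project's own code, and the helper name `f1_from_precision_recall` is hypothetical.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (returns 0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded test-set values reported for the decision tree model
f1 = f1_from_precision_recall(0.870, 0.904)
print(f"F1 = {f1:.1%}")  # F1 = 88.7%
```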
Automatidata Project (using Python):
Input Data: A dataset of 22,698 rows and 21 columns provided by the New York City Taxi & Limousine Commission (New York City TLC).
Goal: Predicting whether or not a customer is a generous tipper.
Result: Selected a random forest model, which achieved an F1-score of 72.35% and an overall accuracy of 68.65%. It correctly identified 78% of the actual generous tippers in the test set, which is 48% better than a random guess. Suggested testing the model with a select group of taxi drivers to gather feedback.
POS Project (using Python):
Input Data: Transaction and POS reports provided by Banque Misr.
Goal: Reviewing unsettled POS transactions to flag potential risks.
Result: A daily report of unsettled POS transactions and of POS terminals that are not processing transactions.
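A daily report like this can be sketched with pandas by filtering one day's transactions on their settlement status and comparing the active terminals against a master list. This is a minimal sketch under stated assumptions: the column names (`terminal_id`, `txn_date`, `amount`, `status`), the `"SETTLED"` status value, the sample rows, and the function `daily_pos_report` are all hypothetical, not the bank's actual report schema.

```python
import pandas as pd


def daily_pos_report(transactions: pd.DataFrame, terminals: pd.DataFrame, report_date: str):
    """Return (unsettled transactions, idle terminals) for one business day.

    Hypothetical schema: transactions has terminal_id, txn_date, amount, status;
    terminals is the master list of installed POS terminals (terminal_id).
    """
    day = pd.Timestamp(report_date)
    # Keep only transactions whose timestamp falls on the report date
    todays = transactions[transactions["txn_date"].dt.normalize() == day]

    # Transactions from that day that have not been settled
    unsettled = todays[todays["status"] != "SETTLED"]

    # Terminals in the master list with no transactions at all that day
    active_ids = todays["terminal_id"].unique()
    idle = terminals[~terminals["terminal_id"].isin(active_ids)]
    return unsettled, idle


# Hypothetical sample data
transactions = pd.DataFrame({
    "terminal_id": ["T1", "T1", "T2", "T3"],
    "txn_date": pd.to_datetime(["2024-01-15 09:30", "2024-01-15 14:10",
                                "2024-01-15 11:00", "2024-01-14 16:45"]),
    "amount": [250.0, 80.0, 1200.0, 40.0],
    "status": ["SETTLED", "UNSETTLED", "UNSETTLED", "SETTLED"],
})
terminals = pd.DataFrame({"terminal_id": ["T1", "T2", "T3", "T4"]})

unsettled, idle = daily_pos_report(transactions, terminals, "2024-01-15")
print(unsettled[["terminal_id", "amount"]])
print(idle["terminal_id"].tolist())  # ['T3', 'T4']
```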
Medical Application Project (using Python):
Input Data: Samples collected from 253 doctors, consisting of 7 features and 1 label (the target output).
Goal: Classifying whether a doctor will prescribe any of the studied drugs to their patients.
Result: Selected the Decision Tree Classifier as the best method, as it gave the highest F-score (83.33%).
Titanic Project (using Python):
Input Data: Passenger data from the famous sunken ship "Titanic".
Goal: Analyzing passenger features to predict which passengers survived.
Result: Achieved a prediction accuracy of 80.47% using rules derived from analysis of passenger features, without using ML methods.
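Predicting survival "without ML methods" typically means writing simple if/else rules from exploratory analysis and scoring them against the known outcomes. The sketch below illustrates that approach on a tiny made-up sample; the rule itself (females and children under 10 survive) and the function names are hypothetical examples, not the rules behind the 80.47% result. The column names `Sex`, `Age`, and `Survived` follow the widely used Titanic dataset layout.

```python
import pandas as pd


def rule_based_predictions(passengers: pd.DataFrame) -> pd.Series:
    """Hypothetical rule: predict survival (1) for females and for anyone under 10."""
    survived = (passengers["Sex"] == "female") | (passengers["Age"] < 10)
    return survived.astype(int)


def accuracy(truth: pd.Series, predictions: pd.Series) -> float:
    """Fraction of passengers whose predicted outcome matches the actual one."""
    return float((truth == predictions).mean())


# Tiny made-up sample, for illustration only
passengers = pd.DataFrame({
    "Sex": ["female", "male", "male", "female", "male"],
    "Age": [29, 4, 40, 60, 22],
    "Survived": [1, 1, 0, 0, 0],
})

preds = rule_based_predictions(passengers)
print(preds.tolist())                                # [1, 1, 0, 1, 0]
print(accuracy(passengers["Survived"], preds))       # 0.8
```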
Finding Donors Project (using Python):
Input Data: Samples from potential donors.
Goal: Using ML methods to predict the income of potential donors.
Result: Selected the AdaBoost Classifier as the best method, as it gave the highest accuracy (87.02%) and the highest F-score (74.39%).
Transactions Dashboard (using Power BI):
Input Data: PTV transaction reports.
Goal: Analyzing transactions to support decision-making.
Result: Transactions Dashboard.