AI engineer with a master's degree, a strong academic background, and over five years of experience in data science and AI. Proven track record in developing and customizing AI models for applications including trend summarization, sentiment analysis, and topic modeling. Additional experience in research, writing surveys and articles, and mentoring, supported by strong leadership and a commitment to applying AI to real-world problems. Highly motivated to keep advancing professionally and academically.
Data Mining: Python - SQL - NumPy - Pandas
Data Visualization: Matplotlib - Seaborn - Power BI - Tableau
Machine Learning: Scikit-learn - Linear Regression - Logistic Regression - SVM - Decision Tree - K-means - Hierarchical Clustering - KNN - PCA
Deep Learning: TensorFlow - Keras - PyTorch - DNN
Computer Vision: Image Preprocessing - Data Augmentation Techniques - CNN (Transfer Learning, Vision Transformers) - OpenCV - OCR - YOLO
Natural Language Processing: Text Scraping (Requests, Beautiful Soup) - Text Feature Extraction - Stemming & Word Embedding - Sequence Models (RNN / LSTM / GRU / Transformers) - GloVe - fastText - spaCy - BERT - Word2Vec - Doc2Vec
1. Extract stock sentiment from news headlines
Clean the text of financial news headlines from Finviz, then train a sentiment analysis model with machine learning techniques to detect whether sentiment toward a stock is positive or negative.
2. Sentiment analysis of comments on Egyptian online journalism
Train a sentiment analysis model on news comments by fine-tuning AraBERT on a benchmark dataset of 40,000 Egyptian tweets, to measure the impact on Egyptian society.
3. Trend summarization of trending topics on the X platform
Collect tweets using the Twitter API, preprocess the text, extract features using TF-IDF, and finally apply multi-document summarization algorithms.
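A minimal sketch of the pipeline described above, assuming a simple centroid-based extractive approach (the project's actual summarization algorithm is not specified here). The tweets are hard-coded, hypothetical examples rather than Twitter API output, and the function names and stopword list are illustrative only.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not the project's list).
STOPWORDS = {"the", "a", "an", "is", "are", "to", "of", "in", "on", "and", "for", "at"}

def preprocess(text):
    """Lowercase, drop URLs and @mentions, keep alphabetic tokens, remove stopwords."""
    text = re.sub(r"https?://\S+|@\w+", " ", text.lower())
    return [t for t in re.findall(r"[a-z]+", text) if t not in STOPWORDS]

def tfidf_vectors(docs_tokens):
    """Return one L2-normalized {term: tf-idf weight} dict per document."""
    n_docs = len(docs_tokens)
    doc_freq = Counter(term for tokens in docs_tokens for term in set(tokens))
    vectors = []
    for tokens in docs_tokens:
        counts = Counter(tokens)
        # Smoothed inverse document frequency, so no weight is ever zero.
        vec = {t: c * (math.log((1 + n_docs) / (1 + doc_freq[t])) + 1)
               for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors

def summarize(tweets, k=2):
    """Pick the k tweets most similar to the centroid of all tweet vectors."""
    vectors = tfidf_vectors([preprocess(t) for t in tweets])
    centroid = Counter()
    for vec in vectors:
        centroid.update(vec)  # adds the weights term by term
    scores = [sum(w * centroid[t] for t, w in vec.items()) for vec in vectors]
    ranked = sorted(range(len(tweets)), key=lambda i: scores[i], reverse=True)
    return [tweets[i] for i in sorted(ranked[:k])]  # keep original order

# Hypothetical sample input standing in for tweets on one trending topic.
sample_tweets = [
    "New phone launch today, the phone camera is amazing",
    "The phone launch event showed a great camera demo",
    "Phone launch: camera and battery look great",
    "Pasta for lunch was delicious",
]
summary = summarize(sample_tweets, k=2)
```

Because every vector is normalized to unit length, a tweet's score grows with how much vocabulary it shares with the other tweets, so an off-topic tweet ranks last.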
4. The Hottest Topics in Machine Learning
Discover topics in research papers from NIPS, a prestigious machine learning and computational neuroscience conference held every year. The project has two parts: pre-processing the text and identifying topics using Latent Dirichlet Allocation (LDA).
5. Question answering with a fine-tuned BERT
Fine-tune BERT on the CoQA dataset, a collection of 127,000 questions with answers released by Stanford in 2019. The goal is to use the fine-tuned BERT model to answer questions based on the dataset provided.