CV
Education
- Ph.D. in Computer Science, University of Lille, Lille, France, 2024-2027
- M.Sc. in Natural Language Processing, University of Lorraine, Nancy, France, 2022-2024
- B.Sc. in Data Science, Lebanese University, Beirut, Lebanon, 2019-2022
Work Experience
- Position: Ph.D. Student
- Duration: October 2024 - September 2027
- Organization: INRIA, Villeneuve-d’Ascq, France
- Technologies Used: Pharo, LLMs
- Description: The project aims to enhance the code completion capabilities of the Pharo Integrated Development Environment (IDE) by leveraging Large Language Models (LLMs). This includes developing algorithms for efficient code suggestions, type inference, and real-time feedback within the IDE. Special focus is placed on optimizing performance and usability, given the limited training data available for Pharo. The project also explores novel techniques in code generation and heuristics to improve the coding experience.
- Position: AI Engineer
- Duration: August 2023 - September 2024
- Organization: INERIS, Verneuil-en-Halatte, France
- Technologies Used: Machine Learning, NLP, Software Engineering
- Description: My role focused on developing “INERIS-IA,” a tool that classifies textual documents according to INERIS’s strategic goals using machine learning and natural language processing techniques. Two experiments were conducted: the first used boolean queries for document retrieval, and the second improved corpus quality through document similarity and keyword extraction. The findings showed that combining machine learning with human validation significantly improves accuracy while reducing the workload of document classification.
- Position: Intern - AI Researcher
- Duration: June 2023 - August 2023
- Organization: LIPN - Sorbonne University, Villetaneuse, France
- Technologies Used: Generative Models, Applied Machine Learning, Reinforcement Learning
- Description: My role focused on comparing sampling techniques for probabilistic planning, particularly for generating literary narratives. The internship evaluated several methods, including the Score Function Estimator (SFE) and more advanced techniques such as Gumbel-Softmax, to assess their effectiveness in creating coherent and creative stories. It concluded with a prototype model that uses a discrete Variational Autoencoder (VAE) to generate literary narratives, demonstrating the feasibility and potential of integrating these methods into the creative process.
Projects
- Project Name: Real-Fake detection
- Duration: September 2023 - January 2024
- Organization: IDMC, Nancy, France
- Technologies Used: Python, TensorFlow/Keras, NumPy, Matplotlib/Seaborn
- Description: This project demonstrates the process of building and training a neural network for classification tasks using machine learning frameworks. It walks through data preprocessing, model architecture definition, training, evaluation, and visualization of results, providing insights into model performance and prediction accuracy.
- GitHub Repository: GitHub
- Project Name: Evaluating the effectiveness of SBERT and MiniLM on analogy classification with FrameNet
- Duration: September 2022 - June 2023
- Organization: IDMC, Nancy, France
- Technologies Used: SBERT, MiniLM, Transfer Learning
- Description: This project evaluates the effectiveness of two NLP models, SBERT (Sentence-BERT) and MiniLM, in classifying analogies using the FrameNet dataset. FrameNet is a rich semantic database that captures the meanings of lexical units within specific contexts called frames. The objective is to distinguish valid from invalid analogies in FrameNet. SBERT is used to create dense sentence embeddings that preserve semantic similarity, while MiniLM is fine-tuned to classify analogies directly. The fine-tuned MiniLM achieved 99% accuracy in distinguishing valid from invalid analogies, outperforming SBERT, which reached around 55% accuracy.
- GitHub Repository: GitHub
- Published Paper: Paper
- Project Name: DeGatto: A Sentiment Analysis Framework for E-Commerce
- Duration: September 2022 - January 2023
- Organization: IDMC, Nancy, France
- Technologies Used: Sentiment Analysis, Python, Machine Learning, Deep Learning, GUI
- Description: The “DeGatto” project is a sentiment analysis framework for e-commerce, focused on women’s apparel reviews. Using a dataset sourced from Kaggle, which includes over 23,000 sentences and aspect-level annotations for material, size, design, and comfort, the project aims to support e-commerce businesses and customers by analyzing feedback at both the sentence and aspect levels. Various NLP, DL, and ML models were tested, including BiLSTM, SVM, Logistic Regression, and Multinomial Naive Bayes. A visualization tool was developed using ReactJS and NodeJS, enabling users to view results as bar or pie charts. The project concluded that BiLSTM is the most suitable model for sentence-level sentiment analysis, while LinearSVC performs best at the aspect level.
- GitHub Repository: GitHub
- Published Paper: Paper
Skills
- Programming Languages & Software
- Python, Pharo, Java, R, LaTeX, JavaScript, PHP, MySQL
- Django, MongoDB, VS, PyCharm, AWS
- Operating Systems
- Frameworks & Tools
- Data Visualization: Matplotlib, Seaborn
- Natural Language Processing (NLP): NLTK, spaCy
- Data Manipulation and Analysis: Pandas, NumPy
- Scientific Computing: SciPy, NumPy
- Machine Learning and Deep Learning: TensorFlow, Keras
- Big Data Processing: Apache Hadoop, Apache Spark
- Ontology and Knowledge Management: Protégé
- Speech Analysis: Praat
- Linguistics
- Linguistics and Textual Linguistics
- Corpus Linguistics
- Computational Linguistics
- Formal Grammar
- Discourse and Lexicometrics
- Semantics