@mrevilsmash
I have hands-on experience extracting structured transition triplets from .docx documents using Python and natural language processing (NLP) techniques. The work involves reading Word files with python-docx, processing text at the paragraph level, and identifying transition sentences with spaCy, NLTK, and regular expressions. I structured the data as [previous paragraph, transition sentence, next paragraph] triplets for further analysis or model training. The process also included data cleaning, sentence tokenization, and exporting the output to CSV or JSON with pandas. This project strengthened my skills in document parsing, ...
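The workflow described above might look roughly like the following minimal sketch. It is an illustration under stated assumptions, not the original project code: the cue list `TRANSITION_CUES`, the helper names, and the rule that a transition sentence is the first sentence of a paragraph opening with a cue phrase are all hypothetical, and a naive regex splitter stands in for spaCy or NLTK sentence tokenization.

```python
import json
import re

# Hypothetical cue phrases; the real project may have used a different list
# or a spaCy/NLTK-based classifier instead.
TRANSITION_CUES = ("however", "moreover", "in addition", "meanwhile", "therefore")


def read_docx_paragraphs(path):
    """Return the non-empty paragraph texts of a .docx file (needs python-docx)."""
    from docx import Document  # imported lazily so the rest runs without it

    return [p.text.strip() for p in Document(path).paragraphs if p.text.strip()]


def split_sentences(text):
    """Naive splitter on ., ! or ? followed by whitespace (stand-in for NLTK/spaCy)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def is_transition(sentence):
    """Assumption: a sentence is a transition if it starts with a cue phrase."""
    lowered = sentence.lower()
    return any(re.match(rf"{re.escape(cue)}\b", lowered) for cue in TRANSITION_CUES)


def extract_triplets(paragraphs):
    """Build previous-paragraph / transition-sentence / next-paragraph records."""
    triplets = []
    for prev, curr in zip(paragraphs, paragraphs[1:]):
        sentences = split_sentences(curr)
        if sentences and is_transition(sentences[0]):
            triplets.append(
                {
                    "previous_paragraph": prev,
                    "transition_sentence": sentences[0],
                    # Assumption: the "next paragraph" is the rest of the
                    # paragraph that the transition sentence opens.
                    "next_paragraph": " ".join(sentences[1:]),
                }
            )
    return triplets


if __name__ == "__main__":
    sample = [
        "Cats sleep a lot.",
        "However, dogs are active. They run daily.",
        "Birds fly.",
    ]
    print(json.dumps(extract_triplets(sample), indent=2))
```

For a real file, `extract_triplets(read_docx_paragraphs("report.docx"))` would produce the same record structure (the file name is a placeholder), and the resulting list of dictionaries can be passed to `pandas.DataFrame(...)` for CSV or JSON export.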
Member since June 2025