Text processing and word stemming in R for classification models in a master data management (MDM) context