Machine learning (pxtextmining)

Author

YiWen Hon

pxtextmining is a package built using Python. The package creates and trains models for categorising patient comments using the labels from the qualitative framework. The code is fully open source and is available on GitHub.

On a basic level, this is what is happening:

  1. The dataset created using the qualitative framework is processed. This involves standardising capital letters to lowercase, or removing punctuation and numbers. The words are then converted into numbers, this process is called vectorisation. This is a necessary step to enable the models to handle the words.

  2. The processed dataset is split into two parts, a training and a test set, usually in a ratio of about 3:1. The model is then trained on the training set. The test set is set aside completely and only used at the end to evaluate the performance of the trained model.

  3. Different model architectures and algorithms will be tried, each attempting to learn patterns from the dataset. The models will also be fine-tuned to improve performance. A full discussion of the model types explored will be posted on this website.

  4. After rigorous testing and experimentation, the model that is able to predict labels for patient comments most effectively will be selected and saved. This model will be used to label any new text comments that it receives.

  5. The best model will be connected to the experiencesdashboard frontend via an API.