Deep Learning models based on the Transformer architecture have revolutionized the state of the art of NLP tasks. As English is the language in which most significant advances are made, languages like Spanish require specific training, but this training has a computational cost so high that only big corporations with servers and GPUs are capable of generating them. This work has explored how to create a model for the Spanish language from a big multilingual model. Specifically, a model aimed at creating text summarization, a very common task in NLP.
The aim was to gather information about off-label use of medicines by screening patient comments on social media. We faced several challenges in doing so: It is difficult for pharmaceutical companies that do not have direct contact with patients to collect information about off-label use of medicines. However, they have to monitor the off-label use of their medicines in order to comply with legal regulations. In addition, patient records are subject to data protection and, if available, it is not possible to reconstruct which drug was prescribed for which condition.