In recent years the volume of information available electronically has grown exponentially, giving rise to the term Big Data to refer to this phenomenon. The medical domain is one area in which the number of documents generated by primary care centers is constantly increasing. This creates a bottleneck, because processing these documents requires specialized personnel who perform the tasks manually. In this context, the development of automated tools for textual analysis can be a breakthrough for health systems. This project will develop a set of processors for the automatic analysis of medical texts, designed for robustness, high precision and broad coverage. Computer technologies have reached a level of maturity at which such tools can help medical staff increase their productivity.
The project will deliver a comprehensive and versatile set of tools, built on advanced methods and algorithms, for the following tasks:
- Morphological, syntactic and semantic analysis of medical texts according to the state of the art in natural language processing. An important point is to go beyond generic linguistic processors and adapt them to the medical domain, particularly for the recognition of named entities such as drugs, chemicals, diseases, symptoms, procedures and body parts, which are especially important in the processing of medical texts.
- Extraction of semantic graphs from medical records and acquisition of patterns of clinical behavior. Each patient's medical record contains textual information about the patient's clinical evolution, and analyzing this information can be valuable for planning future clinical actions. A methodology that can extract semantic graphs representing that information in a structured format, and acquire patterns of clinical behavior from them, is therefore of great interest to the primary care medical community.
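To make the two tasks above concrete, the following is a minimal Python sketch, not the project's actual method. It assumes a tiny hypothetical lexicon (`LEXICON`) in place of a trained named entity recognizer, and it uses simple co-occurrence within a note as a stand-in for real semantic relations; the function names `find_entities` and `build_graph` and the sample notes are illustrative only.

```python
import re
from collections import defaultdict

# Hypothetical toy lexicon (assumption): a real system would rely on a
# curated medical terminology and a trained recognizer instead.
LEXICON = {
    "ibuprofen": "DRUG",
    "paracetamol": "DRUG",
    "headache": "SYMPTOM",
    "fever": "SYMPTOM",
    "influenza": "DISEASE",
}


def find_entities(text):
    """Return (surface form, entity type) pairs found by lexicon lookup."""
    tokens = re.findall(r"\w+", text.lower())
    return [(tok, LEXICON[tok]) for tok in tokens if tok in LEXICON]


def build_graph(notes):
    """Link entities that co-occur in the same note.

    Each edge is a pair of entities; its weight is the number of notes
    in which both entities appear. Co-occurrence is an assumed stand-in
    for the semantic relations a real system would extract.
    """
    graph = defaultdict(int)
    for note in notes:
        entities = sorted(set(find_entities(note)))
        for i, first in enumerate(entities):
            for second in entities[i + 1:]:
                graph[(first, second)] += 1
    return dict(graph)


# Invented example notes, for illustration only.
notes = [
    "Patient reports fever and headache. Suspected influenza.",
    "Fever persists; ibuprofen prescribed.",
]
graph = build_graph(notes)
for (first, second), weight in sorted(graph.items()):
    print(first, "--", second, "weight:", weight)
```

In this sketch, recurring edges across many records would be the raw material for the patterns of clinical behavior mentioned above; a lexicon lookup of this kind would miss inflected forms, abbreviations and misspellings, which is one reason domain-adapted processors are needed.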
The project will use supervised and semi-supervised machine learning techniques. Tools will be developed for Spanish and Catalan. Covering Spanish is an ambitious goal, given how widely the language is used in health systems in Spain and internationally.