My research is multidisciplinary and spans
several areas: computer science,
linguistics, Natural Language Processing and the biomedical
domain. More specifically, I am involved in several research
topics (whose results have been used and tested in several
research projects):
- Automatic creation of terminological
resources, which has become a research area in its own right
following the development of semantic methods for
information access (i.e., the Semantic Web). More
specifically, I investigate the detection of semantic
relations between terms (equivalence, hierarchical or
transversal relations). The methods designed and
exploited rely on lexical inclusion and on the
contribution of morphology. More recently, in
collaboration with Thierry Hamon, LIMSI, Université
Paris 13, I have been working on the detection of synonymy
relations through the compositionality principle. The evaluation
of the acquired resources shows that the
precision obtained is often higher than 95%. The comparison with
the existing synonymy resource WordNet indicates that
the overlap between the resources is very low: our
method provides many new synonymy relations not
recorded in the existing resource. Currently, we are
also working on the reliability of the acquired synonyms,
exploiting for this purpose endogenously generated clues and
the structure of the graphs.
- Improvement of access to information
through information retrieval and extraction
methods. My first experiments addressed access to
the medical portal CISMeF: query expansion and suggestion
of additional keywords to users. The expansion was
done with morphological variants of the medical terms
and showed a positive effect on the search results.
- Characterization of Web information, addressed from
various points of view: detection of racist content on
the Web and detection of the technicality level of health
documents. In both areas, the methodology relies on the
contrastive analysis of the documents and on the
exploitation of their internal properties (lexicon,
document structure, colors, morphology,
stylistics...). Machine learning and lexicometric
algorithms have been used and provided similar and
convergent results. Moreover, the detection of the
technicality of health documents reached nearly 90%
precision and recall, which is a very good performance
for the automatic system and for the chosen features
(morphology).
- Quality of online health information, which
consists in the automatic detection of the reliability and
medical quality of online health literature. I started
working on this topic in 2006 as a member of the
Health on the Net Foundation in Geneva, Switzerland. The
tool developed implements the ethical HONcode. It is
based on machine learning algorithms. The evaluation
indicates that the results generated are at least as
good as those provided by human annotators.
- Information extraction, which consists in detecting,
in narrative documents, the elements that are of
interest for a given task. My main experience is related
to participation in the international NLP challenges
led by the I2B2 initiative. The methods exploited are
based on semantic resources and on rule-based and/or machine
learning approaches. Some of the tasks addressed are:
extraction of medications and of their characteristics
(dosage, frequency, duration...), of clinical events
(medical problems, lab examinations, treatments), and of
causal and temporal relations between different clinical
events.
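
To make the first topic above more concrete, here is a minimal sketch of how lexical inclusion and the compositionality principle can suggest relations between terms. It is an illustration only, not the implementation used in the work described: the function names, the whitespace tokenization, the position-by-position comparison of words (in place of a real syntactic analysis of the terms) and the toy terms are all assumptions made for the example.

```python
def tokens(term):
    """Split a term into lowercase words (naive whitespace tokenization)."""
    return term.lower().split()


def is_lexically_included(short, long):
    """Return True if the words of `short` occur as a contiguous
    sub-sequence of the words of the strictly longer term `long`."""
    s, l = tokens(short), tokens(long)
    if len(s) >= len(l):
        return False
    return any(l[i:i + len(s)] == s for i in range(len(l) - len(s) + 1))


def hierarchical_candidates(terms):
    """Propose (longer term, included term) pairs as candidate
    hierarchical relations (assumption: the longer term is more specific)."""
    return sorted(
        (long, short)
        for short in terms
        for long in terms
        if is_lexically_included(short, long)
    )


def compositional_synonyms(terms, seed_synonyms):
    """Infer synonymy between two terms of equal length that differ in
    exactly one word, when the two differing words are known synonyms.
    This is a simplified, token-based reading of the compositionality
    principle (hypothetical simplification for this sketch)."""
    seeds = {frozenset(pair) for pair in seed_synonyms}
    inferred = set()
    ordered = sorted(terms)
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            ta, tb = tokens(a), tokens(b)
            if len(ta) != len(tb):
                continue
            diffs = [(x, y) for x, y in zip(ta, tb) if x != y]
            if len(diffs) == 1 and frozenset(diffs[0]) in seeds:
                inferred.add((a, b))
    return sorted(inferred)


if __name__ == "__main__":
    # Toy terms and seed synonyms, invented for the illustration.
    toy_terms = ["pain", "abdominal pain", "abdominal ache", "chest pain"]
    toy_seeds = [("pain", "ache")]
    print(hierarchical_candidates(toy_terms))
    print(compositional_synonyms(toy_terms, toy_seeds))
```

In a setting like this, the inferred pairs would still need to be validated, for example against an existing resource or by an expert, before being added to a terminology.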