Resources

CLEAR - Simple Corpus for Medical French
Rated lexicon with French medical words
UA-target: parallel and aligned corpus with Ukrainian language as target
French morphologically related medical words


CLEAR - Simple Corpus for Medical French

Source work:
Natalia Grabar, Rémi Cardon
CLEAR - Simple Corpus for Medical French
ATA 2018 (INLG workshop on Automatic Text Adaptation)
8 November 2018, Tilburg, The Netherlands
pdf

Download the datasets of comparable medical corpora in French:

  1. encyclopedia articles: 6 MB archive
  2. drug leaflets: 146 MB archive
  3. Cochrane summaries: 7 MB archive
Download the dataset of comparable general-language corpora in French:
  1. encyclopedia articles: 155 MB archive
The medical dataset contains three corpora of documents with comparable contents.
Each corpus pairs technical texts with simple or simplified texts on the same topics, in French.
This work was funded by the French National Agency for Research (ANR) as part of the CLEAR project (Communication, Literacy, Education, Accessibility, Readability), ANR-17-CE19-0016-01.


Rated lexicon with French medical words

Source work:
Natalia Grabar, Thierry Hamon
A large rated lexicon with French medical words
LREC (Language Resources and Evaluation Conference) 2016
23-28 May 2016, Portorož, Slovenia
pdf

The French medical lexicon was rated by three annotators, who assigned each word to one of three categories:

  1. + I can understand
  2. / I am not sure
  3. - I cannot understand

Download the datasets with the rated medical lexicon
The dataset contains three files, one per annotator.
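Because each word receives one rating per annotator, one possible first step is to combine the three ratings. The following Python sketch takes the three symbols given to one word and returns the majority rating, or None when all three annotators disagree. It assumes the ratings are already loaded as lists of the symbols `+`, `/` and `-`; the layout of the three files is not described on this page, so the loading step is left out, and the function names are illustrative.

```python
from collections import Counter

# The three categories used by the annotators.
CATEGORIES = {
    "+": "I can understand",
    "/": "I am not sure",
    "-": "I cannot understand",
}


def majority_rating(ratings):
    """Return the symbol chosen by most annotators, or None if there is no majority.

    `ratings` holds one symbol per annotator, e.g. ["+", "+", "-"].
    (Assumed in-memory format; the real files may need parsing first.)
    """
    if not ratings:
        return None
    for symbol in ratings:
        if symbol not in CATEGORIES:
            raise ValueError(f"unknown rating symbol: {symbol!r}")
    ranked = Counter(ratings).most_common()
    best_symbol, best_count = ranked[0]
    # A tie for first place (e.g. "+", "/", "-") means there is no majority.
    if len(ranked) > 1 and ranked[1][1] == best_count:
        return None
    return best_symbol


def is_unanimous(ratings):
    """True when every annotator chose the same category."""
    return len(set(ratings)) == 1


print(majority_rating(["+", "+", "-"]))              # +
print(majority_rating(["+", "/", "-"]))              # None
print(CATEGORIES[majority_rating(["-", "-", "/"])])  # I cannot understand
```

Keeping ties as None rather than forcing a choice leaves it to the user to decide how to treat words on which the annotators fully disagree.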
This work was funded by the French National Agency for Research (ANR) as part of the CLEAR project (Communication, Literacy, Education, Accessibility, Readability), ANR-17-CE19-0016-01.


UA-target: parallel and aligned corpus with Ukrainian language as target

(source languages: French, English, Polish)

Source work:
Natalia Grabar, Thierry Hamon
Creation of a multilingual aligned corpus with Ukrainian as the target language and its exploitation
COLINS 2017
21 April 2017, Kharkiv, Ukraine
pdf

Download the whole dataset
The dataset contains 122 UTF-8 text files. Paired files are aligned at the sentence level.
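A minimal Python sketch for reading one pair of aligned files. It assumes, beyond what is stated above, that each file holds one sentence per line and that line i of the source-language file is aligned with line i of the Ukrainian file. The function names and the file names in the commented usage line are hypothetical.

```python
from pathlib import Path


def pair_sentences(source_lines, target_lines):
    """Zip two sentence lists into (source, target) pairs.

    Assumes line-by-line alignment, so both lists must have the same length.
    """
    source_lines, target_lines = list(source_lines), list(target_lines)
    if len(source_lines) != len(target_lines):
        raise ValueError(
            f"length mismatch: {len(source_lines)} vs {len(target_lines)} sentences"
        )
    return list(zip(source_lines, target_lines))


def read_aligned_pair(source_path, target_path):
    """Read two UTF-8 files (one sentence per line, assumed) and pair their lines."""
    source = Path(source_path).read_text(encoding="utf-8").splitlines()
    target = Path(target_path).read_text(encoding="utf-8").splitlines()
    return pair_sentences(source, target)


# Hypothetical file names; substitute a real pair from the archive.
# pairs = read_aligned_pair("example.fr.txt", "example.uk.txt")
```

Checking that both files have the same number of lines is a cheap sanity test of the one-sentence-per-line assumption before the pairs are used.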

Overview:

  • total number of characters: 17,743,950
  • total number of words: 2,116,694
  • total number of sentences: 156,740
  • source languages: French, English, Polish
  • target language: Ukrainian


French morphologically related medical words

Source work:
Natalia Grabar, Pierre Zweigenbaum
A general method for sifting linguistic knowledge from structured terminologies
AMIA 2000: 310-314
PMID 11079895
pdf

Data:

lemme-tag-fr-4.4.liste (2389 lines)
Content: pairs of morphologically related lemmas (derivation, compounding), with part-of-speech tags and formation rules; some accents are missing:

  • canal/SBC|canalaire/ADJ|/SBC|aire/ADJ
  • urine/SBC|urinaire/ADJ|e/SBC|aire/ADJ
  • sinus/SBC|sinusal/ADJ|/SBC|al/ADJ
  • irradier/V|irradiation/SBC|er/V|ation/SBC
  • neurocytome/SBC|neuroblastome/SBC|ocytome/SBC|oblastome/SBC
  • psammome/SBC|psammomateux/ADJ|ome/SBC|omateux/ADJ
  • liquide/ADJ|liquide/SBC|/ADJ|/SBC
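Judging from the examples above, each line has four `|`-separated fields: the first word with its tag, the related word with its tag, then two suffixes (each with a tag) such that removing the first suffix from the first word and appending the second yields the related word (urine, minus e, plus aire, gives urinaire). This reading is inferred from the examples, not from a format specification. The Python sketch below parses a line and applies the rule; `TaggedPair`, `parse_tagged_line` and `apply_rule` are illustrative names.

```python
from typing import NamedTuple


class TaggedPair(NamedTuple):
    word1: str
    tag1: str
    word2: str
    tag2: str
    suffix1: str  # removed from word1 (inferred reading)
    suffix2: str  # appended in its place to obtain word2


def split_tagged(field):
    """Split 'canal/SBC' into ('canal', 'SBC'); the word part may be empty ('/SBC')."""
    word, _, tag = field.rpartition("/")
    return word, tag


def parse_tagged_line(line):
    """Parse a line such as 'canal/SBC|canalaire/ADJ|/SBC|aire/ADJ'."""
    fields = line.strip().split("|")
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {line!r}")
    (w1, t1), (w2, t2), (s1, _), (s2, _) = (split_tagged(f) for f in fields)
    return TaggedPair(w1, t1, w2, t2, s1, s2)


def apply_rule(word, old_suffix, new_suffix):
    """Swap old_suffix at the end of word for new_suffix (rule inferred from the examples)."""
    if not word.endswith(old_suffix):
        raise ValueError(f"{word!r} does not end with {old_suffix!r}")
    return word[: len(word) - len(old_suffix)] + new_suffix


pair = parse_tagged_line("urine/SBC|urinaire/ADJ|e/SBC|aire/ADJ")
print(pair.tag1, pair.tag2)                                # SBC ADJ
print(apply_rule(pair.word1, pair.suffix1, pair.suffix2))  # urinaire
```

Slicing with `len(word) - len(old_suffix)` rather than `[:-len(old_suffix)]` keeps the empty-suffix entries (canal, sinus, liquide) working, since `word[:-0]` would return an empty string.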

lemme-deriv-fr.2.liste (462 lines)
Content: pairs of morphologically related lemmas (derivation, compounding):

  • abdomen|abdominal
  • abrasion|abrasé
  • acanthose|acanthosique
  • adhérence|adhérent
  • agrégation|agrégé
  • aine|inguinal
  • aisselle|axillaire
  • amnios|amniotique
  • amphophilie|amphophile
  • amygdale|amygdalien
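This file lists only the two related lemmas, without rules, and some pairs are not suffix-based at all (aine|inguinal, aisselle|axillaire). One simple way to use such pairs, sketched here in Python, is a symmetric lookup table from each word to the words it is paired with. The function name is illustrative, and the sketch takes already-decoded lines because the file encoding is not stated on this page.

```python
from collections import defaultdict


def load_pairs(lines):
    """Build a symmetric index word -> set of related words from 'a|b' lines."""
    related = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        left, right = line.split("|")
        related[left].add(right)
        related[right].add(left)  # the relation is used in both directions
    return related


index = load_pairs(["abdomen|abdominal", "aine|inguinal", "aisselle|axillaire"])
print(sorted(index["inguinal"]))  # ['aine']
print(sorted(index["abdomen"]))   # ['abdominal']
```

Because `index` is a `defaultdict`, looking up an unknown word inserts an empty entry; `index.get(word, set())` avoids that side effect.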

forme-deriv-fr-4.2.liste (2418 lines)
Content: pairs of morphologically related word forms (derivation, compounding); some accents are missing:

  • coronaire|coronarien
  • dorsum|dorsal
  • épithélium|épithélial
  • hypertendu|hypertension
  • distendu|distension
  • sarcomateux|sarcomateuse
  • hospitalise|hospitalisation

forme-deriv-fr-4.4.liste (2418 lines)
Content: pairs of morphologically related word forms (derivation, compounding), with formation rules; some accents are missing:

  • coronaire|coronarien|aire|arien
  • irradiant|irradiation|ant|ation
  • cranium|cranien|um|en
  • ameliorer|amelioration|er|ation
  • membrane|membranaire|e|aire
  • sclerose|sclerosant|e|ant
  • branche|branchial|e|ial
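Here the last two fields act as a formation rule: in every example above, removing the third field from the end of the first word and appending the fourth gives the second word (coronaire, minus aire, plus arien, gives coronarien). Under that inferred reading, one possible use, sketched below in Python, is to collect the rules and apply them to other words to propose candidate related forms. The rules over-generate, so candidates would need checking against a lexicon; function names are illustrative.

```python
def parse_rule_line(line):
    """Split 'coronaire|coronarien|aire|arien' and check that the rule reproduces the pair."""
    word1, word2, old, new = line.strip().split("|")
    # Inferred reading: word1 without `old`, plus `new`, should equal word2.
    if not word1.endswith(old) or word1[: len(word1) - len(old)] + new != word2:
        raise ValueError(f"rule does not reproduce the pair: {line!r}")
    return word1, word2, (old, new)


def collect_rules(lines):
    """Return the set of (old_suffix, new_suffix) rules found in the lines."""
    return {parse_rule_line(line)[2] for line in lines if line.strip()}


def propose(word, rules):
    """Apply every rule whose old suffix ends `word`; results are only candidates."""
    return {
        word[: len(word) - len(old)] + new
        for old, new in rules
        if word.endswith(old)
    }


examples = [
    "coronaire|coronarien|aire|arien",
    "irradiant|irradiation|ant|ation",
    "cranium|cranien|um|en",
    "ameliorer|amelioration|er|ation",
    "membrane|membranaire|e|aire",
    "sclerose|sclerosant|e|ant",
    "branche|branchial|e|ial",
]
rules = collect_rules(examples)
print(propose("irradier", rules))       # {'irradiation'}
print(sorted(propose("urine", rules)))  # ['urinaire', 'urinant', 'urinial']
```

For urine, all three rules with old suffix e fire; only urinaire appears among the examples on this page, so the other candidates would need to be checked.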

forme-flex-deriv-fr.4.liste (5826 lines)
Content: pairs of morphologically related word forms (inflection, derivation, compounding), with formation rules; some accents are missing:

  • cholera|cholerae||e
  • rhabditida|rhabditidae||e
  • influenza|influenzae||e
  • arizona|arizonae||e
  • grec|grece||e
  • lourd|lourde||e
  • cord|corde||e
  • acid|acide||e
  • froid|froide||e
  • grand|grande||e

forme-flex-deriv-fr-4.2.liste (4517 lines)
Content: pairs of morphologically related word forms (inflection, derivation, compounding), with formation rules; some accents are missing:

  • abandon|abandonne
  • abdominal|abdominale
  • abdominal|abdominales
  • atrophie|atrophique
  • cicatrice|cicatriciel
  • cicatricielle|cicatricielles
  • abrasé|abrasion
  • absent|absence

forme-flex-fr.2.liste (3470 lines)
Content: pairs of morphologically related word forms (inflection); some accents are missing:

  • adoptif|adoptive
  • ancien|ancienne
  • ancien|anciens
  • canin|canine
  • capillaire|capillaires
  • caverneux|caverneuse
  • dural|durale

famille-forme-flex-deriv-fr.liste (1678 families)
Content: families of morphologically related word forms (inflection, derivation, compounding):

  • abdom|abdomen|abdominal|abdominale|abdominales|abdominaux|abdomino
  • abeille|abeille|abeilles
  • aberra|aberrante|aberration
  • abondant|abondante|abondants
  • abras|abrasion|abrasé
  • absen|absence|absent
  • absorbé|absorbée|absorbées|absorbés
  • acantholy|acantholyse|acantholyses|acantholytique
  • acantho|acanthomateux|acanthome|acanthose|acanthosique
  • acanth|acanthocyte|acanthrocyte
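Each family line starts with a key followed by the member forms. In every example above the key is the longest common prefix of the members (abras for abrasion and abrasé), although this page does not state it as a rule. The Python sketch below loads families into a key-to-members table plus a reverse index from each form to its family keys, and checks the prefix observation with `os.path.commonprefix`, which compares strings character by character. Function names are illustrative.

```python
import os
from collections import defaultdict


def parse_family(line):
    """Split 'abras|abrasion|abrasé' into ('abras', ['abrasion', 'abrasé'])."""
    key, *members = line.strip().split("|")
    return key, members


def build_index(lines):
    """Return (key -> members, form -> set of family keys)."""
    families = {}
    form_to_keys = defaultdict(set)  # a set in case a form appears in several families
    for line in lines:
        if not line.strip():
            continue
        key, members = parse_family(line)
        families[key] = members
        for form in members:
            form_to_keys[form].add(key)
    return families, form_to_keys


def key_is_common_prefix(key, members):
    """Observation from the examples, not a documented guarantee."""
    return os.path.commonprefix(members) == key


families, form_to_keys = build_index([
    "abras|abrasion|abrasé",
    "absen|absence|absent",
    "acantho|acanthomateux|acanthome|acanthose|acanthosique",
])
print(families["absen"])          # ['absence', 'absent']
print(form_to_keys["acanthose"])  # {'acantho'}
print(all(key_is_common_prefix(k, m) for k, m in families.items()))  # True
```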

famille-lemme-tag-fr.liste (1078 families)
Content: families of morphologically related lemmas (derivation, compounding), with part-of-speech tags:

  • abdom|abdomen/SBC|abdomino/PFX|abdominal/ADJ
  • aberra|aberrant/ADJ|aberration/SBC
  • abras|abrasé/ADJ|abrasion/SBC
  • absen|absent/ADJ|absence/SBC
  • acantholy|acantholyse/SBC|acantholytique/ADJ
  • acantho|acanthome/SBC|acanthose/SBC|acanthosique/ADJ|acanthomateux/ADJ
  • acanth|acanthocyte/SBC|acanthrocyte/SBC
  • acari|acarien/SBC|acariase/SBC
  • achromi|achromie/SBC|achromique/ADJ
  • acid|acide/ADJ|acide/SBC|acido/PFX|acidité/SBC|acidose/SBC|acidurie/SBC|acidémie/SBC|acidophile/ADJ|acidocétose/SBC
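In this file each member also carries a tag after a slash, and the same lemma can appear with two tags (acide/ADJ and acide/SBC). A small Python sketch, with an illustrative function name, groups the members of one family by tag so that, for instance, the adjectives of a family can be retrieved directly. The tags (SBC, ADJ, PFX and, elsewhere on this page, V) are treated as opaque labels.

```python
from collections import defaultdict


def parse_tagged_family(line):
    """Return (key, {tag: [lemmas]}) for a line like 'abras|abrasé/ADJ|abrasion/SBC'."""
    key, *entries = line.strip().split("|")
    by_tag = defaultdict(list)
    for entry in entries:
        lemma, _, tag = entry.rpartition("/")  # the tag follows the last slash
        by_tag[tag].append(lemma)  # keeps the order of the line
    return key, dict(by_tag)


line = ("acid|acide/ADJ|acide/SBC|acido/PFX|acidité/SBC|acidose/SBC|"
        "acidurie/SBC|acidémie/SBC|acidophile/ADJ|acidocétose/SBC")
key, by_tag = parse_tagged_family(line)
print(key)            # acid
print(by_tag["ADJ"])  # ['acide', 'acidophile']
print(by_tag["PFX"])  # ['acido']
```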