Future Earth: Mountain Biodiversity Thesaurus
Program Overview
Introduction to the Global Mountain Biodiversity Assessment Program
The Global Mountain Biodiversity Assessment (GMBA) program addresses a gap in global assessments of mountain ecosystems and species. It aims to build a comprehensive understanding of mountain biodiversity by analyzing the existing literature and by developing text mining and Artificial Intelligence-based approaches.
Program Objectives
The primary objectives of the GMBA program are to:
- Conduct a first-of-its-kind global assessment of mountain biodiversity research and trends
- Develop and apply text mining workflows for the analysis of scientific literature in multiple languages, including English, Chinese, and Spanish
- Create a custom-built thesaurus of mountain biodiversity-related terms in Chinese (or Spanish)
- Establish an annotated corpus of mountain mammals literature in Chinese (or Spanish)
- Develop training datasets for named-entity recognition and relationship extraction in Chinese (or Spanish)
Program Structure and Tasks
The program is structured around three main tasks:
Task 1: Translation and Adaptation
- Translate the relevant GMBA libraries for literature mining and annotation into Chinese (or Spanish)
- Adapt the current text mining workflow for application to Chinese (or Spanish) literature
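As an illustration of what a translated thesaurus could feed into, the following is a minimal sketch of dictionary-based term annotation. It is not the GMBA workflow: the `THESAURUS` entries, the concept labels, and the `annotate` function are hypothetical and chosen only to show the idea. Longest terms are matched first so that a multi-word term is not split into its shorter parts. No word-boundary check is applied, which keeps the approach usable for Chinese text but can over-match in Spanish.

```python
import re

# Hypothetical thesaurus entries: Spanish surface form -> English concept label.
THESAURUS = {
    "bosque nublado": "cloud forest",
    "bosque": "forest",
    "línea de árboles": "treeline",
    "páramo": "paramo",
}


def annotate(text, thesaurus):
    """Return (start, end, surface, concept) for each thesaurus term found in text."""
    # Sort longest first: regex alternation tries alternatives left to right,
    # so "bosque nublado" is preferred over "bosque" at the same position.
    terms = sorted(thesaurus, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)
    return [
        (m.start(), m.end(), m.group(0), thesaurus[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]


if __name__ == "__main__":
    for hit in annotate("El bosque nublado limita con el páramo.", THESAURUS):
        print(hit)
```

The character offsets returned by `annotate` can be stored alongside the source text, which is the usual starting point for a span-annotated corpus.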
Task 2: Corpus Development
- Create a corpus of annotated Chinese (or Spanish) scientific literature on mountain mammals
- Apply the adapted text mining workflow to OpenAlex and other language-specific scientific repositories
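To make the OpenAlex step more concrete, here is a minimal sketch that only builds a query URL for the public OpenAlex works endpoint and sends no request. The search terms, the helper name `build_query_url`, and the default page size are assumptions for illustration. The `search`, `filter`, `per-page`, and `mailto` parameters and the `language` filter are used as documented for the OpenAlex API at the time of writing and should be checked against the current documentation before use.

```python
from urllib.parse import urlencode

OPENALEX_WORKS = "https://api.openalex.org/works"


def build_query_url(search_terms, language, per_page=50, mailto=None):
    """Build an OpenAlex works query URL restricted to one publication language.

    language is assumed to be a two-letter code such as "es" or "zh".
    """
    params = {
        "search": " ".join(search_terms),
        "filter": f"language:{language}",
        "per-page": per_page,
    }
    if mailto:
        # Optional contact address for the API's polite usage pool.
        params["mailto"] = mailto
    return f"{OPENALEX_WORKS}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_query_url(["mamíferos", "montaña"], "es"))
```

Keeping URL construction separate from the HTTP call makes the query easy to log and to rerun against other language-specific repositories with a different base URL.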
Task 3: Dataset Development for AI Training
- Develop datasets of annotated publications in Chinese (or Spanish) for the training of Artificial Intelligence approaches
- Focus these datasets on relationship extraction and named-entity recognition
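Training data for named-entity recognition is often stored as one tag per token in the BIO scheme (B = beginning of an entity, I = inside, O = outside). The sketch below converts character-offset annotations into BIO tags. The labels `SPECIES` and `HABITAT`, the function names, and the whitespace tokenizer are assumptions for illustration; Chinese text would need a dedicated word segmenter instead of whitespace splitting.

```python
import re


def whitespace_tokens(text):
    """Split on whitespace and keep character offsets: (token, start, end)."""
    return [(m.group(0), m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def spans_to_bio(tokens, spans):
    """Convert (start, end, label) character spans into one BIO tag per token."""
    tags = ["O"] * len(tokens)
    for span_start, span_end, label in spans:
        inside = False
        for i, (_, tok_start, tok_end) in enumerate(tokens):
            # A token belongs to the entity if it lies fully within the span.
            if tok_start >= span_start and tok_end <= span_end:
                tags[i] = ("I-" if inside else "B-") + label
                inside = True
    return tags


if __name__ == "__main__":
    sentence = "Puma concolor habita el páramo"
    # Hypothetical annotations with hypothetical labels.
    annotations = [(0, 13, "SPECIES"), (24, 30, "HABITAT")]
    tokens = whitespace_tokens(sentence)
    for (token, _, _), tag in zip(tokens, spans_to_bio(tokens, annotations)):
        print(token, tag)
```

Relationship extraction datasets typically add a second layer on top of such tags, linking pairs of annotated entities with a relation label.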
Required Skills and Qualifications
Applicants should possess:
- Communication skills in English and either Chinese or Spanish
- Proficiency in R programming (essential)
- Knowledge of (mountain) biodiversity and ecology (highly desirable)
- Experience with text mining, processing, and analysis tools (desirable)
- Familiarity with Artificial Intelligence, database programming (PostgreSQL), and open research data ecosystems (desirable)
- Geo-processing skills (ArcGIS or QGIS) (desirable)
Mentorship and Collaboration
The program offers mentorship under Dr. Davnah Urbach (GMBA executive director) in collaboration with Mark Snethlage. Participants will have the opportunity to collaborate with the GMBA team and contribute to the global mountain mammals assessment, potentially leading to co-authorship on individual products.
Outputs and Publications
The expected outputs of the program include:
- A custom-built thesaurus of mountain biodiversity-related terms in Chinese (or Spanish)
- An annotated corpus of mountain mammals literature in Chinese (or Spanish)
- Chinese or Spanish training datasets for named-entity recognition and relationship extraction
All outputs will be shared following open science principles, with individual authors acknowledged. Participants may also have opportunities for individual publications based on project outputs.
