Reconstructing the Digital Genealogy of Human Languages

Ancient Language Big Data Platform

The iLogos Platform is the first research infrastructure dedicated to ancient language big data. It integrates the iLatin Latin corpus, the iGreek Greek corpus, and the iLex intelligent dictionary, providing new tools for research in historical linguistics and the digital humanities.

Platform Components

Three core components building a new paradigm for ancient language research

iLatin Corpus

A comprehensive collection of Latin texts from the classical to the medieval period, drawn from multiple sources including inscriptions, manuscripts, and literary works. Supports advanced grammatical search, diachronic analysis, and variant comparison.

Time Coverage: 3rd century BC to 15th century AD

Data Scale: Target 15M+ tokens

In Development

iGreek Corpus

A collection of Ancient Greek texts covering all periods and dialects, from Mycenaean Linear B to Byzantine literature, enabling dialect comparison and the tracking of language change across eras.

Dialects Covered: Ionic, Doric, Attic, etc.

Data Scale: Planned 10M+ tokens

Planning Phase

iLex Dictionary

An intelligent Latin dictionary closely linked to examples in the corpus, providing etymological information, usage frequency statistics, diachronic semantic evolution, and co-occurrence network visualization.

Core Features: Intelligent search, word family analysis, collocation statistics

Entry Target: 50,000+ base entries

In Parallel Development

Technical Features

Integrating modern NLP technology with classical linguistic methods

NLP-Enhanced Analysis

Custom ancient language processing models supporting lemmatization, syntactic parsing, and semantic annotation

Diachronic Visualization

Cross-century tracking of word usage frequency, semantic evolution, and grammatical change
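
As an illustration of cross-century frequency tracking, the sketch below computes a lemma's relative frequency per century from dated, lemmatized tokens. It is a minimal example and not the platform's implementation: the `(year, lemma)` token format, the function names, and the sample data are all assumptions made for this example. Years BC are written as negative integers.

```python
from collections import Counter

def century_of(year: int) -> int:
    """Map a year to a signed century: -3 for the 3rd c. BC, 15 for the 15th c. AD."""
    if year == 0:
        raise ValueError("BC/AD dating has no year 0")
    if year > 0:
        return (year - 1) // 100 + 1
    return -((-year - 1) // 100 + 1)

def relative_frequency_by_century(tokens, lemma, per=1_000_000):
    """Return {century: occurrences of `lemma` per `per` tokens}.

    `tokens` is an iterable of (year, lemma) pairs (a hypothetical format).
    """
    totals = Counter()  # all tokens per century
    hits = Counter()    # tokens of the target lemma per century
    for year, token_lemma in tokens:
        century = century_of(year)
        totals[century] += 1
        if token_lemma == lemma:
            hits[century] += 1
    return {c: hits[c] * per / totals[c] for c in sorted(totals)}

# Invented sample data, for illustration only.
SAMPLE = [
    (-50, "amor"), (-50, "bellum"), (-45, "amor"), (-44, "res"),
    (1210, "amor"), (1220, "res"), (1230, "res"), (1240, "res"),
]

if __name__ == "__main__":
    for century, freq in relative_frequency_by_century(SAMPLE, "amor", per=1000).items():
        print(century, freq)
```

Normalizing by the total token count of each century keeps periods with very different amounts of surviving text comparable.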

Advanced Search

Support for regular expressions, fuzzy queries, and grammatical feature searches
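
The three query types can be sketched over a small annotated token list using only the Python standard library. This is an illustrative sketch under stated assumptions, not the platform's search engine: the token dictionaries, their field names (`form`, `lemma`, `pos`, `tense`, `case`), and the function names are hypothetical. Fuzzy matching here uses `difflib`, which is one possible way to catch spelling variants.

```python
import re
from difflib import get_close_matches

# Hypothetical annotated tokens; field names are assumptions for this example.
TOKENS = [
    {"form": "amavit", "lemma": "amo", "pos": "VERB", "tense": "perfect"},
    {"form": "amat", "lemma": "amo", "pos": "VERB", "tense": "present"},
    {"form": "puellam", "lemma": "puella", "pos": "NOUN", "case": "acc"},
    {"form": "karissimo", "lemma": "carus", "pos": "ADJ", "case": "dat"},
]

def regex_search(tokens, pattern):
    """Return tokens whose surface form matches the whole regular expression."""
    rx = re.compile(pattern)
    return [t for t in tokens if rx.fullmatch(t["form"])]

def fuzzy_search(tokens, query, cutoff=0.8):
    """Return tokens whose form is similar to `query` (catches spelling variants)."""
    forms = [t["form"] for t in tokens]
    close = set(get_close_matches(query, forms, n=max(1, len(forms)), cutoff=cutoff))
    return [t for t in tokens if t["form"] in close]

def feature_search(tokens, **features):
    """Return tokens whose annotations equal every requested feature value."""
    return [t for t in tokens if all(t.get(k) == v for k, v in features.items())]

if __name__ == "__main__":
    print([t["form"] for t in regex_search(TOKENS, r"ama.*")])
    print([t["form"] for t in fuzzy_search(TOKENS, "carissimo")])
    print([t["form"] for t in feature_search(TOKENS, pos="VERB", tense="perfect")])
```

In this sketch the fuzzy query for the normalized spelling "carissimo" retrieves the variant spelling "karissimo", which an exact-match search would miss.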

Open API

RESTful APIs supporting academic research and third-party tool integration