Information Extraction and Text Analytics

Dissertation topics suggested by Andreas Vlachidis
I welcome dissertation proposals that relate to information extraction and text analytics
with a focus on the Humanities and Social Sciences domains. If you are interested in any of
the following topics, please email me to arrange a meeting. The topics will be allocated on a
first-come, first-served basis. Note that students who would like to work on a topic
suggested by a member of staff should not expect any more guidance or help from that member
of staff or their supervisor than they would get if they chose their own topic. It
is the student’s responsibility to further develop or narrow down the topic and do the
relevant research.
1. Negation Detection approaches in the humanities domain
Topic area: Information Extraction, Semantic Annotation
Suitable for: Students interested in Information Extraction and Semantic Annotation of
humanities text.
Contact: Dr. Andreas Vlachidis, a.vlachidis@ucl.ac.uk
Description: The techniques and approaches that are employed to address the issue of
negation within Natural Language Processing vary and cover a wide spectrum of rule-based,
machine learning and, lately, deep learning applications. However, the focus so far
has been on the biomedical domain and there is little evidence of research aimed at the
study of negation detection in the domain of humanities. A dissertation in this area might
explore areas of application of negation detection in the domain of Humanities and propose
pathways for the automatic detection of negated expressions in unstructured text. Other
research pathways might focus on the ontological modelling of negated facts, where the
emphasis shifts from positive to negative assertions, or on comparing and benchmarking
how well existing negation detection tools transfer to the
humanities domain.
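As a starting point, the rule-based end of that spectrum can be sketched in a few lines of Python. The sketch below is only an illustration under stated assumptions: the trigger list, the scope terminators, the four-token window and the function name find_negated_terms are all hypothetical choices and are not taken from any existing tool.

```python
import re

# Hypothetical trigger and terminator lists, chosen only for illustration.
NEGATION_TRIGGERS = ["no evidence of", "absence of", "without", "not", "no"]
SCOPE_TERMINATORS = {"but", "however", ",", ";", "."}

def find_negated_terms(sentence, terms, window=4):
    """Return the single-word terms that fall inside the scope of a negation trigger."""
    # Keep punctuation as separate tokens so it can end a negation scope.
    tokens = re.findall(r"\w+|[^\w\s]", sentence.lower())
    term_set = {t.lower() for t in terms}
    negated = set()
    for trigger in NEGATION_TRIGGERS:
        trig = trigger.split()
        n = len(trig)
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] != trig:
                continue
            # Scope: up to `window` tokens after the trigger, cut short by a terminator.
            for tok in tokens[i + n:i + n + window]:
                if tok in SCOPE_TERMINATORS:
                    break
                if tok in term_set:
                    negated.add(tok)
    return sorted(negated)

sentence = "The excavation revealed no evidence of pottery, but coins were found."
print(find_negated_terms(sentence, ["pottery", "coins"]))  # ['pottery']
```

A fixed window with a handful of terminators is deliberately crude. Part of a dissertation could be to examine where such rules break down on humanities text and how machine learning approaches compare.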
2. Text mining and Critical Discourse Analysis on cultural and social contexts
Topic area: Text Analysis, Corpus linguistics, Corpus-Based quantitative methods
Suitable for: Students with an interest in computational linguistics and quantitative methods
Contact: Dr. Andreas Vlachidis, a.vlachidis@ucl.ac.uk
Description: Over the past few years, text mining has started to catch on in the domains of
social sciences, anthropology, education and sociology. Using automated methods of
corpus linguistics and critical discourse analysis (CDA), we can identify the presence of
features of interest over large document collections and investigate social and cultural
phenomena through quantitative analysis methods. A dissertation in this area might examine
the political economy of AI technologies in educational or cultural contexts, which could
lead to forms of exclusion and discrimination; or the way ‘immigrant workers’ and
‘asylum seekers’ are represented in news reporting; or similar social and cultural
phenomena that can be investigated using quantitative CDA methods.
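One simple corpus-based measure of how a group is represented is to count the words that appear near a target phrase. The following Python sketch assumes a toy corpus of two invented sentences, a hypothetical stopword list and a three-token window; the function name collocates is likewise only illustrative.

```python
import re
from collections import Counter

# Hypothetical stopword list, kept short for illustration.
STOPWORDS = {"the", "a", "an", "of", "in", "to", "and", "are", "were", "as", "by"}

def collocates(documents, phrase, window=3):
    """Count non-stopword tokens within `window` tokens of each occurrence of `phrase`."""
    target = phrase.lower().split()
    n = len(target)
    counts = Counter()
    for doc in documents:
        tokens = re.findall(r"[a-z']+", doc.lower())
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == target:
                left = tokens[max(0, i - window):i]
                right = tokens[i + n:i + n + window]
                counts.update(t for t in left + right if t not in STOPWORDS)
    return counts

# Invented example sentences, not real news data.
docs = [
    "Asylum seekers face long delays.",
    "Officials said asylum seekers face delays.",
]
print(collocates(docs, "asylum seekers").most_common(2))
```

Raw co-occurrence counts are only a first step. A real study would normally use a much larger corpus and an association measure to separate meaningful collocates from frequent but uninformative words.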
3. Named Entity Recognition and Linking of Ancient and Historic Places
Topic area: Information Extraction, Named Entity Recognition and Linking, Linked Open
Data
Suitable for: Students interested in Named Entity Recognition and Semantic Web
Technologies
Contact: Dr. Andreas Vlachidis, a.vlachidis@ucl.ac.uk
Description: Named Entity Recognition (NER) is a Natural Language Processing
(NLP) task aimed at the recognition and classification of units of information into predefined
categories, such as names of persons, locations, organisations, etc. Several Named Entity
Linking applications have become available recently that link textual mentions of entities to
corresponding unique Linked Open Data (LOD) references that originate from general-purpose
knowledge bases such as Wikipedia. However, the use of domain-specific knowledge bases for a
finer and more accurate linking of entities to corresponding references remains an
unexplored area. A dissertation in this area might explore the integration of domain-focused
place name resources such as Pleiades (https://pleiades.stoa.org/) in the process of Named
Entity Recognition and Linking.
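The basic idea of linking place mentions to a domain gazetteer can be sketched as a dictionary lookup. In the Python sketch below, the gazetteer entries and their example.org identifiers are placeholders invented for illustration, not real Pleiades records, and the function name link_places is hypothetical.

```python
import re

# Hypothetical gazetteer: the URIs are placeholders, not real Pleiades identifiers.
GAZETTEER = {
    "athens": "https://example.org/places/athens",
    "rome": "https://example.org/places/rome",
    "delphi": "https://example.org/places/delphi",
}

def link_places(text, gazetteer=GAZETTEER):
    """Return (mention, start, end, uri) for each capitalised word found in the gazetteer."""
    links = []
    for match in re.finditer(r"\b[A-Z][a-z]+\b", text):
        uri = gazetteer.get(match.group().lower())
        if uri is not None:
            links.append((match.group(), match.start(), match.end(), uri))
    return links

for mention, start, end, uri in link_places("The envoy travelled from Athens to Rome."):
    print(mention, start, end, uri)
```

Exact string matching ignores ambiguity, variant spellings and ancient versus modern names, which is where a domain-specific resource and a proper disambiguation step become interesting research questions.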