Some of Europe’s leading computational linguists studied or worked at Essex, which last year celebrated 40 years of computational linguistics research. Essex is also renowned for looking at natural language processing on a much larger scale than other organisations, which tend to concentrate on specific areas of language.
“Our expertise in natural language processing at Essex means we can make sense of the data generated by these games as we know what questions to ask to get the maximum amount of reliable data,” added Professor Kruschwitz. “This work helps push the state of the art in artificial intelligence (AI) research by turning tedious annotation tasks into an enjoyable pastime.”
The research will focus particularly on ambiguity in anaphora: the use of a word, such as a pronoun, to refer back to a word used earlier in a text or conversation. The new game will involve players annotating a wide range of copyright-free material, including Brothers Grimm fairy tales, Victorian texts and Wikipedia.
The game was developed by Chris Madge, a PhD student on the Intelligent Games and Game Intelligence Programme (IGGI), a collaborative project between the universities of Essex, Queen Mary, York and Goldsmiths.
Professor Kruschwitz added: “Human language may seem easy for us to understand, but it is full of complexities when it comes to how phrases, people, places and ideas are connected. Collecting data from people playing TileAttack means we can collect knowledge about natural language to help computers fully understand the rules of language. TileAttack is part of our research direction of ‘gamifying’ every single step in a long pipeline of natural language processing steps.”
The project is being funded as part of a €2.5 million ERC (European Research Council) Advanced Grant awarded to Essex and Queen Mary, led by Professor Massimo Poesio and also involving Professor Richard Bartle and Dr Jon Chamberlain at Essex.
The researchers are also working in partnership with the Linguistic Data Consortium, which was formed in 1992 to address the critical data shortage facing language technology research and development.