Fine-Grained Entity Recognizer
This distribution contains the source code for the experiments presented in the following research publication (PDF):
Xiao Ling and Daniel S. Weld (2012).
"Fine-Grained Entity Recognition",
in Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence (AAAI), 2012.
You can test the trained model on the evaluation data or on new data.
Run ./downloadModel.sh to download the model and save it in the root directory.
A better model has since been trained and can be fetched with ./downloadModel.sh new. Then update the model file setting in the config to point to it.
sbt >= 0.13.0
To run the experiments in the AAAI-12 paper, you can proceed as follows:
$ ./run.sh "aaai/exp.conf" &> aaai/exp.log
To make predictions on new data, see edu.washington.cs.figer.FigerSystem for example code, or run:
$ sbt "runMain edu.washington.cs.figer.FigerSystem <text_file>"
Alternatively, you can change the parameter values (e.g. the input file name) in config/figer.conf and get a more structured output by running:
$ ./run.sh "config/figer.conf"
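To run the predictor over several text files, the runMain command above can be wrapped in a small loop. This is only a sketch: the function name figer_batch, the FIGER_RUNNER variable, and the .figer.out suffix are made up for illustration and are not part of this distribution. It also assumes that predictions are printed to standard output, where they may be mixed with sbt's own log lines.

```shell
# Run FigerSystem once per input file and save whatever is printed to stdout.
# FIGER_RUNNER is a hypothetical override; set it to `echo` for a dry run.
figer_batch() {
  figer_runner=${FIGER_RUNNER:-sbt}
  for figer_input in "$@"; do
    # Skip arguments that are not regular files instead of failing midway.
    [ -f "$figer_input" ] || { echo "skipping missing file: $figer_input" >&2; continue; }
    # One sbt invocation per file; paths containing spaces are not handled here.
    "$figer_runner" "runMain edu.washington.cs.figer.FigerSystem $figer_input" \
      > "$figer_input.figer.out" || return 1
  done
}

# Hypothetical usage:
# figer_batch docs/a.txt docs/b.txt
```

Each call starts sbt again, so for many files the config-driven route through ./run.sh may be the more practical choice.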
$ sbt assembly
Run
$ sbt ~container:start
and go to localhost:8080/index.html for a simple web demo.
The training data train.data.gz is gzipped and serialized with Protocol Buffers. Please see entity.proto in the code package for the message definitions.
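Since train.data.gz is a large gzipped binary file, it can be worth checking that the download is intact before starting a long training run. The helper below is a minimal sketch using only standard gzip options; the name check_gz is hypothetical and not part of this distribution. It verifies only the gzip layer and does not parse the Protocol Buffer messages.

```shell
# Verify that a file is a valid gzip archive and print its uncompressed size in bytes.
check_gz() {
  gz_file=$1
  [ -f "$gz_file" ] || { echo "not found: $gz_file" >&2; return 1; }
  # `gzip -t` tests archive integrity without writing any output.
  gzip -t "$gz_file" 2>/dev/null || { echo "corrupt or not gzip: $gz_file" >&2; return 1; }
  # Streams the whole file, so this can take a while for large archives.
  gzip -dc "$gz_file" | wc -c | tr -d ' '
}

# Hypothetical usage:
# check_gz train.data.gz
```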
Download link
In config/figer.conf, make the following changes:
useModel=false
modelFile=<the output model file>
trainFile=<training file> # the training file has to follow the specs from `entity.proto`. See `train.data.gz` for example
Then run ./run.sh config/figer.conf to train a new model. Training needs over 10 GB of memory and takes about an hour.
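The config changes for training can also be scripted rather than edited by hand. The sketch below assumes the config uses plain key=value lines with no spaces around the equals sign, as in the snippet above. The helper name set_conf_value and the paths in the usage comments are placeholders, not part of this distribution.

```shell
# Replace the value of an existing key=value line in a config file.
# Lines for keys that are not already present are left untouched.
set_conf_value() {
  conf_key=$1; conf_value=$2; conf_file=$3
  conf_tmp=$(mktemp) || return 1
  # Escape characters that are special in a sed replacement: \, & and the | delimiter.
  conf_escaped=$(printf '%s' "$conf_value" | sed 's/[\\&|]/\\&/g')
  # Write to a temp file first, then copy back so file permissions are preserved.
  sed "s|^${conf_key}=.*|${conf_key}=${conf_escaped}|" "$conf_file" > "$conf_tmp" \
    && cat "$conf_tmp" > "$conf_file"
  conf_status=$?
  rm -f "$conf_tmp"
  return $conf_status
}

# Hypothetical usage with placeholder paths:
# set_conf_value useModel false config/figer.conf
# set_conf_value modelFile path/to/output.model config/figer.conf
# set_conf_value trainFile path/to/train.data.gz config/figer.conf
```

Note that the whole line is rewritten, so any trailing comment on a matched line is dropped.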
A mapping file from Freebase mids to titles is available here.