WOJ-DB: a Word Order Judgement Database

Survey data collected for the project leading to the Ph.D. thesis entitled
"The mat sat on the cat: Investigating structure in the evaluation of order
in machine translation" by Martin McCaffery, supervised by Mark-Jan Nederhof.

23 June 2017

INTRODUCTION

This dataset contains both subjective (human) and mechanical judgements of
sentence quality, along with metadata on both the sentences and their
judges. It was produced for the above-mentioned Ph.D. project, but is
intended as a tool to aid researchers in as broad an area as possible. A
more detailed overview can be found in the Ph.D. thesis text, section 6.2.

FORMAT & FILES

The data is provided in tab-separated values format, suitable for use with
the read.delim() command in R (read.csv() assumes comma separators).

- sentences.txt contains three primary columns: a human-produced fluent
  "reference" translation; a machine-produced "hypothesis" translation; and
  a unique identifier for the pair.

- dataset.csv contains all scores and metadata detailed below. Any section
  and page references below are to the Ph.D. thesis, which is freely
  available online.

DATASET FIELDS

### SENTENCE IDENTIFICATION

sent.id          unique ID for the sentence, matching sentences.txt
sent.index       position (1..50) at which the sentence appeared within its survey
participant.id   unique ID for this questionnaire and participant

### PARTICIPANT METADATA (thesis section 6.3.2)

english.native   does the participant consider themselves a native English speaker?
age              participant's self-reported age
english.since    number of years the participant has spoken English, if not native
langs.minimal    number of languages spoken at a minimal fluency level
langs.moderate   number of languages spoken at a moderate fluency level
langs.fluent     number of languages spoken fluently
langs.native     number of languages spoken as a native
langs.points     score based on languages spoken: see 6.5.2
gender           participant's self-reported gender: see 6.3.2 and page 167
nationality      participant's self-reported nationality: see 6.3.2 and page 167
education        participant's self-reported education level: see 6.3.2 and page 167
caring           participant's self-reported interest in language: see 6.3.2 and page 167
dyslexia         whether the participant has dyslexia: see 6.3.2 and page 167

### HUMAN SCORES (thesis section 6.2.1 and page 167)

human.hol        reported overall sentence quality: section 6.2.1 question 1
human.ord        reported quality of word ordering: section 6.2.1 question 2

### SENTENCE METADATA (thesis section 6.3.4)

is.translation   is the hypothesis a machine translation of the same source
                 sentence as the reference?
is.permutation   inverse of is.translation: was this hypothesis produced by
                 altering the reference itself, as per thesis section 6.2.3?
permutation.type        if (is.permutation), text name of the permutation type
is.order                equal to (permutation.type == 'order')
is.phrase               equal to (permutation.type == 'phrase')
is.choice               equal to (permutation.type == 'choice')
is.swap                 equal to (permutation.type == 'swap')
ref.length              number of words in the reference sentence
hyp.length              number of words in the hypothesis sentence
src.length              number of words in the source sentence (not English; not provided)
degree                  if (is.permutation), number of words altered by the permutation algorithm
human.adq               WMT score for the sentence, normalised as per section 6.1.1.5
real.score              either 'degree' or 'human.adq', as appropriate
real.score.factorised   real.score, rounded to a five-point scale
sent.id.count           how many times this sentence appeared across all surveys
both.count              how many rows contain these exact human.hol and human.ord scores

### AUTOMATIC SCORES FOR TOOLS PRODUCED FOR THESIS

# Column names are built from the following underscore-separated parts, in order:
# - "dted" and "derp" are the names of the tools (chapters 4 and 5 respectively)
# - "mlt" and "stn" are the MaltParser and Stanford Parser (section 3.4.3)
# - "met" and "stn" are the MaxEnt Treebank and Stanford Parser (section 3.4.2)
# - "cdec" and "giza" are the cdec and GIZA++ projects (section 3.4.5)
# - 'f' at the end indicates dependency trees were flattened before use (section 3.4.4)
# - other suffix letters indicate DTED configurations (section 4.3)
# The full list of columns for DTED and DERP is as follows:

dted_mlt_met_cdec_fcl  dted_mlt_met_cdec_fco  dted_mlt_met_cdec_cl  dted_mlt_met_cdec_co
dted_mlt_met_cdec_c    dted_mlt_met_cdec_fc   dted_mlt_met_cdec_fb  dted_mlt_met_cdec_b
dted_mlt_met_giza_fcl  dted_mlt_met_giza_fco  dted_mlt_met_giza_cl  dted_mlt_met_giza_co
dted_mlt_met_giza_c    dted_mlt_met_giza_fc   dted_mlt_met_giza_fb  dted_mlt_met_giza_b
dted_mlt_stn_cdec_fco  dted_mlt_stn_cdec_co
dted_mlt_stn_giza_fcl  dted_mlt_stn_giza_fco  dted_mlt_stn_giza_cl  dted_mlt_stn_giza_co
dted_mlt_stn_giza_c    dted_mlt_stn_giza_fc   dted_mlt_stn_giza_fb  dted_mlt_stn_giza_b
dted_stn_stn_giza_fcl   dted_stn_stn_giza_fco    dted_stn_stn_giza_cl    dted_stn_stn_giza_co
dted_stn_stn_giza_c     dted_stn_stn_giza_fc     dted_stn_stn_giza_fb    dted_stn_stn_giza_b
dted_mlt_met_meteor_c   dted_mlt_met_meteor_fc   dted_mlt_met_meteor_fb  dted_mlt_met_meteor_b
dted_mlt_stn_meteor_c   dted_mlt_stn_meteor_fc   dted_mlt_stn_meteor_fb  dted_mlt_stn_meteor_b
dted_stn_stn_meteor_c   dted_stn_stn_meteor_fc   dted_stn_stn_meteor_fb  dted_stn_stn_meteor_b
derp_mlt_met_cdec_f     derp_mlt_met_cdec        derp_mlt_met_giza_f     derp_mlt_met_giza
derp_mlt_stn_cdec_f     derp_mlt_stn_cdec        derp_mlt_stn_giza_f     derp_mlt_stn_giza
derp_stn_stn_cdec_f     derp_stn_stn_cdec        derp_stn_stn_giza_f     derp_stn_stn_giza

### AUTOMATIC SCORES FOR OTHER TOOLS

meteor.1.5       Meteor scores calculated during the Ph.D. project
meteor.1.5.wo    Meteor (chunking) scores: see 3.5.4
meteor.wmt       Meteor scores from WMT data: see 7.3.1

The following columns each contain scores extracted from WMT data (see 7.3.1):

depref.align        depref.exact         LEPOR_v3            MEANT
nLEPOR.baseline     reverted             SIMPBLEU.prec       SIMPBLEU.recall
terrorcat           UMEANT               apac                BEER
DCUcomb.seg         DCU_seg              DiscoTK.light.kool  DiscoTK.light
DiscoTK.party       DiscoTK.party.tuned  nrc_amber           nrc_bleu
upc.ipa             upc.stout            VERTa.EQ            VERTa.W
BEER_Treepel        BS                   chrF3               chrF
DPMFcomb            DPMF                 dreem               LeBLEU.default
LeBLEU.optimized    meteor_wsd           ratatouille         UoW.LSTM
upf.cobalt          VERTa.70Adeq30Flu
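EXAMPLE

As a quick-start sketch, the tab-separated files can be read with nothing
but Python's standard library. This is illustrative only: the miniature
inline sample, the assumption of a header row, and the semantic labels in
the final line (tool/parser/tagger/aligner/configuration, following the
component list above) are assumptions, not guaranteed properties of the
released files.

```python
import csv
import io

def load_tsv(stream):
    """Parse a tab-separated stream into a list of dict rows (header row assumed)."""
    return list(csv.DictReader(stream, delimiter="\t"))

# Hypothetical miniature sample standing in for dataset.csv;
# the real file holds all of the fields described above.
sample = io.StringIO(
    "sent.id\tsent.index\thuman.hol\thuman.ord\n"
    "s001\t1\t4\t5\n"
    "s001\t7\t3\t3\n"
)
rows = load_tsv(sample)

# Group the two human scores by sent.id, e.g. to compare repeated
# judgements of the same sentence across surveys.
by_sentence = {}
for row in rows:
    by_sentence.setdefault(row["sent.id"], []).append(
        (int(row["human.hol"]), int(row["human.ord"]))
    )
print(by_sentence["s001"])  # [(4, 5), (3, 3)]

# Decompose an automatic-score column name into its underscore-separated
# parts; the variable names here are an interpretation of the component
# list above, not terminology from the dataset itself.
tool, parser, tagger, aligner, config = "dted_mlt_met_cdec_fcl".split("_")
```

The same load_tsv() call applies to sentences.txt (reference, hypothesis,
identifier); in R, read.delim("dataset.csv") plays the equivalent role.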