Programs for training / testing / ensembling BERT

Overview

This is a set of programs for training and testing BERT on the legal textual entailment task (COLIEE Task 4).
Programs: BERT.zip

Description

This is a set of tools for training and testing BERT on the legal textual entailment task. You can train BERT using a TSV file (label, question, relevant article) as training data. The model writes its certainty factors to the same directory as its judgements (predictions) for the problems in the test data. You can also ensemble the certainty factors produced by multiple BERT models.
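
As an illustration of the three-column TSV layout described above, the following is a minimal sketch that writes and reads rows of (label, question, relevant article). The label values "Y"/"N", the absence of a header row, and the helper names `write_tsv` and `read_tsv` are assumptions made for the example; they are not taken from the programs in BERT.zip.

```python
import csv
import io


def write_tsv(rows, handle):
    """Write (label, question, relevant article) rows as tab-separated text."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)


def read_tsv(handle):
    """Read tab-separated rows back as tuples, skipping empty lines."""
    reader = csv.reader(handle, delimiter="\t")
    return [tuple(row) for row in reader if row]


# Placeholder rows. The "Y"/"N" label encoding is an assumption.
rows = [
    ("Y", "question text 1", "article text 1"),
    ("N", "question text 2", "article text 2"),
]

# An in-memory buffer stands in for train.tsv so the sketch has no side effects.
buffer = io.StringIO()
write_tsv(rows, buffer)
buffer.seek(0)
loaded = read_tsv(buffer)
```

To produce a real file, the same helper could be called with `open("train.tsv", "w", newline="", encoding="utf-8")` instead of the in-memory buffer.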

Requirements

Usage

Programs for training BERT
  1. Put the training data, named “train.tsv”, and the augmentation data, named “augmentation.tsv”, in the input directory. If you want to test the model, put the test data, named “test.tsv”, in the test data directory.
  2. Run “train.py”
    $ python train.py -i input_directory/ -t test_data_directory -o output_directory/ -m model_directory/ -e
    Arguments
    --input/-i: input data directory (train.tsv and augmentation.tsv)
    --test/-t: (optional) test data directory (test.tsv)
    --output/-o: output directory
    --model_path/-m: directory in which to store the model
    --evaluate/-e: flag to evaluate the model after training
  3. The trained model (BERT_Task4.h5) is saved in the directory you specified with “-m”. If you run with test data, a certainty factor file (CF.tsv) and a prediction file (prediction.tsv) will be written to the output directory, and a detailed classification report will be displayed.
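
The options listed above map naturally onto a standard argparse definition. The following is only a sketch of how such a command line could be declared, not the actual contents of train.py; in particular, which options are mandatory and the help texts are assumptions.

```python
import argparse


def build_parser():
    """Declare the options documented for train.py (sketch, not the real script)."""
    parser = argparse.ArgumentParser(description="Sketch of the train.py command line")
    parser.add_argument("-i", "--input", required=True,
                        help="input data directory (train.tsv and augmentation.tsv)")
    parser.add_argument("-t", "--test", default=None,
                        help="(optional) test data directory (test.tsv)")
    # Treating -o and -m as required is an assumption.
    parser.add_argument("-o", "--output", required=True,
                        help="output directory")
    parser.add_argument("-m", "--model_path", required=True,
                        help="directory in which to store the model")
    parser.add_argument("-e", "--evaluate", action="store_true",
                        help="evaluate the model after training")
    return parser


# Parse the same arguments as in the example command above.
args = build_parser().parse_args([
    "-i", "input_directory/",
    "-t", "test_data_directory",
    "-o", "output_directory/",
    "-m", "model_directory/",
    "-e",
])
```

Declaring “-e” with `action="store_true"` means it takes no value: it is `True` when present and `False` otherwise, which matches its description as a flag.
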
Program for prediction/test
  1. Put the test data, named “test.tsv”, in the input data directory and put a BERT model in the model directory.
  2. Run “test.py”
    $ python test.py -i input_directory/ -o output_directory/ -m model_directory/ -e
    Arguments
    --input/-i: directory containing the input data (test.tsv)
    --output/-o: output directory
    --model_path/-m: model file to use for testing
    --evaluate/-e: flag to evaluate the model
  3. A prediction file (prediction.tsv) and a certainty factor file (CF.tsv) will be written to the output directory. If you run the program with “-e”, a detailed classification report will be displayed.
Program for ensemble prediction
  1. Put the certainty factor files, named “CF*.tsv”, in the test data directory.
  2. Run “ensemble_predict.py”
    $ python ensemble_predict.py -t test_data_directory/ -o output_directory/
    Arguments
    --test_data_path/-t: directory of certainty factor files (CF*.tsv) to ensemble
    --output/-o: output directory
  3. A prediction file (prediction.tsv) will be written to the output directory.
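
One common way to combine several certainty factor files is simple averaging (soft voting). The sketch below illustrates that idea only; it is not necessarily what ensemble_predict.py does. It assumes that each CF*.tsv file holds one `problem_id<TAB>score` row per problem, that the score is the certainty of a "Y" answer, and that 0.5 is the decision threshold. The function names are hypothetical.

```python
import csv
import glob
import os
import tempfile
from collections import defaultdict


def load_cf(path):
    """Read one certainty factor file as {problem_id: score} (assumed layout)."""
    with open(path, newline="", encoding="utf-8") as handle:
        return {row[0]: float(row[1])
                for row in csv.reader(handle, delimiter="\t") if row}


def ensemble_average(cf_dir, threshold=0.5):
    """Average the scores of all CF*.tsv files in cf_dir and threshold the mean."""
    paths = sorted(glob.glob(os.path.join(cf_dir, "CF*.tsv")))
    if not paths:
        raise FileNotFoundError(f"no CF*.tsv files found in {cf_dir}")
    totals = defaultdict(float)
    for path in paths:
        for problem_id, score in load_cf(path).items():
            totals[problem_id] += score
    return {problem_id: ("Y" if total / len(paths) >= threshold else "N")
            for problem_id, total in totals.items()}


# Demonstration with two small placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as cf_dir:
    for name, lines in [("CF1.tsv", "q1\t0.9\nq2\t0.2\n"),
                        ("CF2.tsv", "q1\t0.6\nq2\t0.5\n")]:
        with open(os.path.join(cf_dir, name), "w", encoding="utf-8") as handle:
            handle.write(lines)
    predictions = ensemble_average(cf_dir)
```

In this demonstration the mean score is 0.75 for q1 and 0.35 for q2, so q1 is predicted "Y" and q2 is predicted "N".
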
Program for model selection/testing ensemble
  1. Put the certainty factor files, named “CF*.tsv”, in the validation data directory. If you want to test the ensemble, also put certainty factor files named “CF*.tsv” in the test data directory.
  2. Run “ensemble_test.py”
    $ python ensemble_test.py -v validation_data_directory/ -t test_data_directory/ -o output_directory/
    Arguments
    --val_data_path/-v: validation data directory
    --test_data_path/-t: (optional) test data directory
    --output/-o: output directory
    --how_many_ensemble/-r: number of models to ensemble
  3. An ensemble result file will be written to the output directory.
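
Model selection for an ensemble can be sketched as an exhaustive search: try every combination of r models and keep the one that scores best on the validation data. The example below is an illustration under assumptions, not the logic of ensemble_test.py. It assumes averaged certainty factors thresholded at 0.5, "Y"/"N" gold labels, and accuracy as the selection criterion; all names and scores are hypothetical.

```python
from itertools import combinations


def ensemble_accuracy(score_lists, gold, threshold=0.5):
    """Accuracy of the averaged scores of several models against gold labels."""
    correct = 0
    for index, label in enumerate(gold):
        mean = sum(scores[index] for scores in score_lists) / len(score_lists)
        predicted = "Y" if mean >= threshold else "N"
        correct += predicted == label
    return correct / len(gold)


def select_models(val_scores, gold, r):
    """Return the r-model combination with the highest validation accuracy."""
    def accuracy_of(names):
        return ensemble_accuracy([val_scores[name] for name in names], gold)

    best = max(combinations(sorted(val_scores), r), key=accuracy_of)
    return best, accuracy_of(best)


# Placeholder validation scores for three models over four problems.
gold = ["Y", "N", "Y", "N"]
val_scores = {
    "CF1": [0.9, 0.4, 0.3, 0.2],
    "CF2": [0.6, 0.7, 0.8, 0.1],
    "CF3": [0.2, 0.1, 0.9, 0.6],
}
best_names, best_accuracy = select_models(val_scores, gold, r=2)
```

The number of combinations grows quickly with the number of candidate models, so an exhaustive search of this kind is only practical for a modest pool of certainty factor files.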

Generated Models