Run Submission Guidelines (BTEC Task)

Input Files:

  • Each participant must translate two input files, the testsets of IWSLT 2009 and IWSLT 2010, for each translation task they registered for.

    • (Arabic)
      • BTEC/Arabic-English/test/TXT/IWSLT09_BTEC.testset.ar.txt
      • BTEC/Arabic-English/test/TXT/IWSLT10_BTEC.testset.ar.txt
    • (French)
      • BTEC/French-English/test/TXT/IWSLT09_BTEC.testset.fr.txt
      • BTEC/French-English/test/TXT/IWSLT10_BTEC.testset.fr.txt
    • (Turkish)
      • BTEC/Turkish-English/test/TXT/IWSLT09_BTEC.testset.tr.txt
      • BTEC/Turkish-English/test/TXT/IWSLT10_BTEC.testset.tr.txt

Data Format:

  • The same format as the BTEC Develop Corpus. For details, refer to the respective README files:

    • BTEC/Arabic-English/README.BTEC_AE.txt
    • BTEC/French-English/README.BTEC_FE.txt
    • BTEC/Turkish-English/README.BTEC_TE.txt
  • Input text is case-sensitive and contains punctuation
  • English MT output should:

    • be in the same format (<SentenceID>\01\MT_output_text) as the input file
    • be case-sensitive, with punctuation
    • contain the same number of lines (= sentences) as the input file

      • Example:

        TEST_IWSLT10_001\01\This is the English translation of the 1st input sentence.
        TEST_IWSLT10_002\01\This is the English translation of the 2nd input sentence.
        TEST_IWSLT10_003\01\
        TEST_IWSLT10_004\01\Sentence ID=003 could not be translated, thus the translation is empty!
        TEST_IWSLT10_005\01\...
        ...
        TEST_IWSLT10_469\01\This is the English translation of the last input sentence.
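
The output requirements above can be checked mechanically before submission. The following Python sketch is only an illustration, not an official validation tool: the function names `split_line` and `check_output` are hypothetical, and it assumes that the separator is the literal character sequence `\01\` and that the output repeats the sentence IDs of the input file in the same order.

```python
# Assumption: the field separator is a literal backslash, "01", and a backslash.
SEPARATOR = "\\01\\"


def split_line(line):
    """Split one line into (sentence_id, text); text may be empty."""
    sentence_id, sep, text = line.rstrip("\n").partition(SEPARATOR)
    if not sep:
        raise ValueError(f"missing separator {SEPARATOR!r} in {line!r}")
    return sentence_id, text


def check_output(input_lines, output_lines):
    """Return a list of problems; an empty list means all checks passed."""
    problems = []
    # The MT output must have as many lines (sentences) as the input.
    if len(input_lines) != len(output_lines):
        problems.append(
            f"line count differs: input={len(input_lines)}, "
            f"output={len(output_lines)}"
        )
    # Compare sentence IDs line by line (assumed to be in input order).
    for number, (src, hyp) in enumerate(zip(input_lines, output_lines), start=1):
        try:
            src_id, _ = split_line(src)
            hyp_id, _ = split_line(hyp)
        except ValueError as err:
            problems.append(f"line {number}: {err}")
            continue
        if src_id != hyp_id:
            problems.append(
                f"line {number}: sentence ID {hyp_id!r} does not match {src_id!r}"
            )
    return problems
```

An untranslated sentence still needs its own line with an empty text field, so a line such as `TEST_IWSLT10_003\01\` passes this check.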
        

Run Submission Format:

  • Each participant must submit at least one translation of the given input files for each of the translation tasks they registered for.
  • Multiple run submissions are allowed, but participants must explicitly indicate one PRIMARY run that will be used for human assessments. All other run submissions are treated as CONTRASTIVE runs. If none of the runs are marked as PRIMARY, the latest submission (according to the file time-stamp) will be used as the primary run submission.
  • Runs must be submitted as a gzipped TAR archive (see the format below) and sent as an email attachment to Michael Paul (michael DOT paul AT nict DOT go DOT jp).

     

    TAR archive file structure:

    <UserID>/<TestSet>_<TranslationTask>.<UserID>.primary.txt
            /<TestSet>_<TranslationTask>.<UserID>.contrastive1.txt
            /<TestSet>_<TranslationTask>.<UserID>.contrastive2.txt
            /...

    where:

    <UserID> = the user ID the participant used to download the data files
    <TestSet> = IWSLT09 | IWSLT10
    <TranslationTask> = BTEC_AE | BTEC_FE | BTEC_TE

    Examples:

        nict/IWSLT10_BTEC_AE.nict.primary.txt
            /IWSLT10_BTEC_FE.nict.primary.txt
            /IWSLT10_BTEC_FE.nict.contrastive1.txt
            /IWSLT10_BTEC_FE.nict.contrastive2.txt
            /IWSLT10_BTEC_FE.nict.contrastive3.txt
            /IWSLT10_BTEC_TE.nict.primary.txt
            /IWSLT10_BTEC_TE.nict.contrastive1.txt
  • Re-submitting your runs is allowed as long as the mails arrive BEFORE the submission deadline. If multiple TAR archives are submitted by the same participant, only the runs from the most recent submission mail will be used for the IWSLT 2010 evaluation, and those from previous mails will be ignored.
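
The naming scheme and archive layout above could be assembled with a short script. The Python sketch below is an illustration under stated assumptions, not an official submission tool: `run_path` and `build_archive` are hypothetical helper names, and the local file paths passed in are placeholders for wherever a participant stores the MT output. The name of the archive file itself is not specified by these guidelines.

```python
import tarfile

# Allowed values, taken from the file structure definition above.
TEST_SETS = {"IWSLT09", "IWSLT10"}
TASKS = {"BTEC_AE", "BTEC_FE", "BTEC_TE"}


def run_path(user_id, test_set, task, run):
    """Build the path of one run file inside the archive.

    run is "primary" or "contrastiveN" with N a number.
    """
    if test_set not in TEST_SETS:
        raise ValueError(f"unknown test set: {test_set!r}")
    if task not in TASKS:
        raise ValueError(f"unknown translation task: {task!r}")
    prefix = "contrastive"
    is_contrastive = run.startswith(prefix) and run[len(prefix):].isdigit()
    if run != "primary" and not is_contrastive:
        raise ValueError(f"unknown run label: {run!r}")
    return f"{user_id}/{test_set}_{task}.{user_id}.{run}.txt"


def build_archive(archive_path, user_id, runs):
    """Write a gzipped TAR archive.

    runs maps (test_set, task, run) to the local path of the output file.
    """
    with tarfile.open(archive_path, "w:gz") as archive:
        for (test_set, task, run), local_path in sorted(runs.items()):
            archive.add(local_path, arcname=run_path(user_id, test_set, task, run))
```

For example, `run_path("nict", "IWSLT10", "BTEC_FE", "contrastive1")` yields `nict/IWSLT10_BTEC_FE.nict.contrastive1.txt`, which matches the layout shown in the examples.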