The EMILLE/CIIL Corpus

  • Document Type:
    text
  • Language:
    Tamil
    Sinhala; Sinhalese
    Urdu
    Panjabi; Punjabi
    English
    Kannada
    Telugu
    Marathi
    Bengali
    Kashmiri
    Gujarati
    Malayalam
    Assamese
    Oriya
    Hindi
  • Additional Information
    • Publication Information:
      ELRA (European Language Resources Association)
    • Publication Date:
      2004
    • Collection:
      OLAC: Open Language Archives Community
    • Abstract:
      The EMILLE/CIIL Corpus consists of three components: monolingual, parallel and annotated corpora. There are fourteen monolingual corpora, including both written and (for some languages) spoken data for fourteen South Asian languages: Assamese, Bengali, Gujarati, Hindi, Kannada, Kashmiri, Malayalam, Marathi, Oriya, Punjabi, Sinhala, Tamil, Telugu and Urdu. The EMILLE monolingual corpora contain approximately 92,799,000 words (including 2,627,000 words of transcribed spoken data for Bengali, Gujarati, Hindi, Punjabi and Urdu). The parallel corpus consists of 200,000 words of text in English and its accompanying translations in Hindi, Bengali, Punjabi, Gujarati and Urdu. The annotated component includes the Urdu monolingual and parallel corpora automatically annotated for parts-of-speech, together with twenty written Hindi corpus files annotated to show the nature of demonstrative use. All other components are annotated at the sentence level. The corpus is marked up using CES-compliant SGML and encoded using Unicode. References: Xiao, Z., McEnery, A., Baker, P. and Hardie, A. 2004. ‘Developing Asian language corpora: standards and practice’ in Sornlertlamvanich, V., Tokunaga, T. and Huang, C. (eds.) Proceedings of the Fourth Workshop on Asian Language Resources, pp. 1-8. March 25, Sanya. This database is available for research use by academic organisations only. For use by commercial organisations, a subset of the EMILLE/CIIL Corpus is available under the reference ELRA-W0038, The EMILLE Lancaster Corpus.
    • File Description:
      Not specified
    • Relation:
      http://catalog.elra.info/en-us/repository/browse/ELRA-W0037/
    • Online Access:
      http://catalog.elra.info/en-us/repository/browse/ELRA-W0037/
    • Rights:
      Rights available for: nonCommercialUse
    • Accession Number:
      edsbas.963CF38A
  • Citations
    • ABNT:
      The EMILLE/CIIL Corpus. [s. l.], 2004. Disponível em: https://ezproxy.mscc.edu/login?url=https://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsbas&AN=edsbas.963CF38A. Acesso em: 21 mar. 2023.
    • AMA 11th Edition:
      The EMILLE/CIIL Corpus. January 2004. Accessed March 21, 2023. https://ezproxy.mscc.edu/login?url=https://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsbas&AN=edsbas.963CF38A
    • APA 7th Edition:
      The EMILLE/CIIL Corpus. (2004).
    • Chicago 17th Edition:
      “The EMILLE/CIIL Corpus.” 2004, January. https://ezproxy.mscc.edu/login?url=https://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsbas&AN=edsbas.963CF38A.
    • Harvard:
      ‘The EMILLE/CIIL Corpus’ (2004). Available at: https://ezproxy.mscc.edu/login?url=https://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsbas&AN=edsbas.963CF38A (Accessed: 21 March 2023).
    • Harvard: Australian:
      ‘The EMILLE/CIIL Corpus’ 2004, viewed 21 March 2023.
    • MLA 9th Edition:
      The EMILLE/CIIL Corpus. Jan. 2004. EBSCOhost, ezproxy.mscc.edu/login?url=https://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsbas&AN=edsbas.963CF38A.
    • Chicago 17th Edition:
      “The EMILLE/CIIL Corpus,” January 1, 2004. https://ezproxy.mscc.edu/login?url=https://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsbas&AN=edsbas.963CF38A.
    • Vancouver/ICMJE:
      The EMILLE/CIIL Corpus. 2004 Jan 1 [cited 2023 Mar 21]; Available from: https://ezproxy.mscc.edu/login?url=https://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsbas&AN=edsbas.963CF38A