|  | OLAC Record oai:mulce.org:mce-infral-tagged_blogs | 
| Metadata | ||
| Title: | Structured, tokenized and tagged data from Infral's blogs | |
| Access Rights: | open access after registration | |
| Audience: | Researchers or teachers in educational sciences or linguistics | |
| Bibliographic Citation: | Laurent, M. (2011). Structured, tokenized and tagged data from Infral's blogs. LETEC Corpus from INFRAL collection, Chanier, T. (editor). Mulce.org : Clermont Université. [oai : mulce.org:mce-infral-tagged_blogs ; http://repository.mulce.org ] | |
| Conforms To: | IMS-CP for packaging | |
| Contributor (author): | Laurent Mario | |
| Contributor (compiler): | Laurent Mario | |
| Chanier Thierry | ||
| Contributor (data_inputter): | Laurent Mario | |
| Contributor (depositor): | Chanier Thierry | |
| Contributor (editor): | Chanier Thierry | |
| Contributor (researcher): | Laurent Mario ; Chanier Thierry | |
| Creator: | Laurent Mario ; Chanier Thierry | |
| Creator (URI): | Mario Laurent | |
| Date Created (W3CDTF): | 2011-12-02 | |
| Description: | This corpus is based on data extracted from the global Learning & Teaching Corpus Infral, archived in the Mulce data repository: http://repository.mulce.org. It was created by Mario Laurent as part of his Master's project, carried out at the Laboratoire de Recherche sur le Langage, Université Blaise Pascal, Clermont-Ferrand. | |
| Structuring language interactions into exploitable corpora is necessary to analyze the data from the Infral project. To understand the development of intercultural competences, we have to quantify features of each participant's production, such as language use or lexical diversity. To achieve this, we used the Python programming language and the NLTK library. During the Infral course, participants from a French and a German university communicated in both languages via blogs. We developed a program that converts plain text from Infral's blogs into a structured XML file in which each message is tokenized into words. Each word is tagged according to its form and its original language. | ||
| Extent: | 28 500 KB | |
| Format (IMT): | text/xml | |
| application/pdf | ||
| Identifier: | mce-infral-tagged_blogs | |
| Identifier (URI): | http://mulce.univ-bpclermont.fr:8080/PlateFormeMulce/VIEW/PUBLIC/03/VMeta.do?adr=Infral%2FCorpus_objets%2Fmce-infral-tagged_blogs | |
| Language: | French | |
| German | ||
| Language (ISO639): | fra | |
| deu | ||
| Publisher: | Mulce (MULtimodal Corpus Exchange) ; Universite Blaise Pascal ; Clermont-Ferrand:France ; URL:http://mulce.org | |
| References: | Abendroth-Timmer, D., Bechtel, M., Chanier, T. & Ciekanski, M. (2009) "From developing to investigating intercultural competence in practice through oral and written interactions in online exchanges", Kongress für Fremdsprachendidaktik der Deutschen Gesellschaft für Fremdsprachenforschung (DGFF-Tagung), Universität Leipzig, October 2009. [http://edutice.archives-ouvertes.fr/edutice-00548891/ ] | |
| Abendroth-Timmer, D., Chanier, T., Ciekanski, M., Bechtel, M. & Henning, E.-V. (2010) "Du développement à l’investigation de la compétence interculturelle en pratique à partir des interactions à l’oral et à l’écrit dans des échanges en ligne à distance." Colloque "Plurilingualism and Pluriculturalism in a Globalised World: which Pedagogy?" (PLIDAM), 17-19 June, Paris. | ||
| Laurent, M. (2011). Structuration des données des blogues de la formation Infral à l’aide des outils de programmation Python et NLTK. Report of Master 2 Sciences du Langage, Université Blaise Pascal. | ||
| Chanier, T. & Ciekanski, M. (2010). Utilité du partage des corpus pour l'analyse des interactions en ligne en situation d'apprentissage : un exemple d'approche méthodologique autour d'une base de corpus d'apprentissage. ALSIC - Apprentissage des Langues et Systèmes d'Information et de Communication 13 [http://edutice.archives-ouvertes.fr/edutice-00486676/ ] | ||
| Reffay, C., Chanier, T., Noras, M. & Betbeder, M.-L. (2008). Contribution à la structuration de corpus d'apprentissage pour un meilleur partage en recherche. In Basque, J. & Reffay, C. (dir.), numéro spécial EPAL (échanger pour apprendre en ligne), Sciences et Technologies de l'Information et de la Communication pour l'Education et la Formation (STICEF), 15. [http://sticef.univ-lemans.fr/num/vol2008/01-reffay/sticef_2008_reffay_01p.pdf , http://edutice.archives-ouvertes.fr/edutice-00159733 ] | ||
| References (URI): | http://edutice.archives-ouvertes.fr/edutice-00548891/ | |
| http://edutice.archives-ouvertes.fr/edutice-00486676/ | ||
| http://sticef.univ-lemans.fr/num/vol2008/01-reffay/sticef_2008_reffay_01p.pdf | ||
| Requires: | mce-infral-letec-all | |
| Rights: | Rights holders of this corpus are: Thierry Chanier ; Dagmar Abendroth-Timmer; Maud Ciekanski ; Mark Bechtel ; Laurent Mario ; licence = http://creativecommons.org/licenses/by-nc-sa/2.0/ | |
| Rights (URI): | http://lrl-diffusion.univ-bpclermont.fr/mulce/metadata/vdex/mce_licence.xml | |
| Spatial Coverage (ISO3166): | DE | |
| FR | ||
| Spatial Coverage (TGN): | 7005286 | |
| 7008356 | ||
| Subject: | NLP; XML; telecollaboration ; intercultural; online teaching | |
| French language | ||
| Subject (ISO639): | fra | |
| Subject (LCSH): | Education | |
| Data processing | ||
| Computer-assisted instruction | ||
| Language and languages | ||
| Study and teaching | ||
| Subject (OLAC): | applied_linguistics | |
| discourse_analysis | ||
| text_and_corpus_linguistics | ||
| Temporal Coverage: | name=Infral course ; start=2008-09-29; end=2009-01-09 | |
| name=Master Project ; start=2011-03-01; end=2011-06-30 | ||
| Type (DCMI): | Dataset | |
| Collection | ||
| Type (Discourse): | dialogue | |
| narrative | ||
| Type (OLAC): | primary_text | |
| OLAC Info | ||
| Archive: | Multimodal Learning and teaching Corpora Exchange | |
| Description: | http://www.language-archives.org/archive/mulce.org | |
| GetRecord: | OAI-PMH request for OLAC format | |
| GetRecord: | Pre-generated XML file | |
| OAI Info | ||
| OaiIdentifier: | oai:mulce.org:mce-infral-tagged_blogs | |
| DateStamp: | 2012-09-05 | |
| GetRecord: | OAI-PMH request for simple DC format | |
| Search Info | ||
| Citation: | Mario Laurent; Laurent Mario ; Chanier Thierry. 2011. Mulce (MULtimodal Corpus Exchange) ; Universite Blaise Pascal ; Clermont-Ferrand:France ; URL:http://mulce.org. | |
| Terms: | area_Europe country_DE country_FR dcmi_Collection dcmi_Dataset iso639_deu iso639_fra olac_applied_linguistics olac_dialogue olac_discourse_analysis olac_narrative olac_primary_text olac_text_and_corpus_linguistics | |
| Inferred Metadata | ||
| Country: | France | |
| Area: | Europe | |
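
The Description field above says that each blog message is tokenized into words and that each word is tagged with its form and its original language. The record does not show the program itself, so the following is only a minimal sketch of that idea under stated assumptions. The element and attribute names (`message`, `w`, `form`, `lang`), the function names and the two tiny word lists are all hypothetical. A regular expression stands in for NLTK's tokenizers so that the example runs with the Python standard library alone.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, deliberately tiny function-word lists used only for this sketch;
# they are not taken from the corpus or from the original program.
FRENCH_HINTS = {"le", "la", "les", "et", "je", "est", "un", "une", "bonjour"}
GERMAN_HINTS = {"der", "die", "das", "und", "ich", "ist", "ein", "eine", "hallo"}

# Runs of word characters, or any single non-space punctuation character.
# This stands in for an NLTK tokenizer (an assumption of this sketch).
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def guess_language(token):
    """Return 'fra', 'deu' or 'und' (undetermined) from the hint lists."""
    lowered = token.lower()
    if lowered in FRENCH_HINTS:
        return "fra"
    if lowered in GERMAN_HINTS:
        return "deu"
    return "und"


def classify_form(token):
    """Return a coarse form label: 'number', 'word' or 'punctuation'."""
    if token.isdigit():
        return "number"
    if re.fullmatch(r"\w+", token):
        return "word"
    return "punctuation"


def message_to_xml(author, text):
    """Build a <message> element with one tagged <w> child per token."""
    message = ET.Element("message", author=author)
    for token in TOKEN_RE.findall(text):
        word = ET.SubElement(
            message, "w", form=classify_form(token), lang=guess_language(token)
        )
        word.text = token
    return message


# Usage: serialize one invented bilingual message.
example = message_to_xml("student01", "Hallo, je suis Marie und ich bin 22!")
print(ET.tostring(example, encoding="unicode"))
```

A dictionary lookup like this leaves most words undetermined. A real pipeline would need a fuller lexicon or a statistical language identifier, and the actual XML schema of the corpus may differ from the one sketched here.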