PONTO 1 Livro English For Academics Purposes - 30-01-2024 - Matemática (2024)

UNIR

João Ricardo Pessoa Xavier de Siqueira 07/10/2024


<p>English for</p><p>Academic Purposes</p><p>reflections, description & pedagogy</p><p>Simone Sarmento, Rozane Rebechi,</p><p>Marine Laísa Matte (Eds.)</p><p>Porto Alegre • 2024 • 1st edition</p><p>Editorial Board</p><p>Cristiane Tavares – Instituto Vera Cruz/SP</p><p>Daniela Mussi – UFRJ</p><p>Idalice Ribeiro Silva Lima – UFTM</p><p>Joanna Burigo – Emancipa Mulher</p><p>Leonardo Antunes – UFRGS</p><p>Lucia Tennina – UBA</p><p>Luis Augusto Campos – UERJ</p><p>Luis Felipe Miguel – UnB</p><p>Maria Amelia Bulhões – UFRGS</p><p>Regina Dalcastagnè – UnB</p><p>Regina Zilberman – UFRGS</p><p>Renato Ortiz – Unicamp</p><p>Ricardo Timm de Souza – PUCRS</p><p>Rodrigo Saballa de Carvalho – UFRGS</p><p>Rosana Pinheiro Machado – University College Dublin</p><p>Susana Rangel – UFRGS</p><p>Winnie Bueno – Winnieteca</p><p>copyright © 2024 Simone Sarmento, Rozane Rebechi, Marine Laísa Matte</p><p>Graphic design and publishing: Editora Zouk</p><p>Proofreading: Simone Sarmento, Rozane Rebechi, Marine Laísa Matte</p><p>Cover image: SKELL</p><p>rights reserved to</p><p>Editora Zouk</p><p>r. Cristóvão Colombo, 1343 sl. 203</p><p>90560-004 – Floresta – Porto Alegre – RS – Brasil</p><p>f. 51. 3024.7554</p><p>www.editorazouk.com.br</p><p>International Cataloging-in-Publication Data (CIP),</p><p>in accordance with ISBD</p><p>Prepared by Vagner Rodolfo da Silva - CRB-8/9410</p><p>E58</p><p>English for Academic purposes [electronic resource] : reflections,</p><p>description & pedagogy / edited by Simone Sarmento, Rozane Rebechi,</p><p>Marine Laisa Matte. - Porto Alegre, RS : Zouk, 2024.</p><p>268 p. ; ePUB.</p><p>Includes bibliography.</p><p>ISBN: 978-65-5778-135-7 (Ebook)</p><p>1. Linguistics. I. Sarmento, Simone. II. Rebechi, Rozane. III. Matte, Marine</p><p>Laisa. IV. 
Title.</p><p>CDD 410</p><p>2024-175 CDU 81’1</p><p>Contents</p><p>Exploring the complexities of EAP: a collection of voices</p><p>Simone Sarmento, Rozane Rebechi and Marine Laísa Matte</p><p>7</p><p>The role of Corpus Linguistics in EAP</p><p>Deise P. Dutra and Tony Berber Sardinha</p><p>14</p><p>From specialized corpus to the EAP classroom: integrating authentic</p><p>data into materials design</p><p>Ana Eliza Pereira Bocorny, Ana Luiza Freitas and Rozane Rodrigues</p><p>Rebechi</p><p>55</p><p>Do-It-Yourself Corpora to Support SHAPE and STEM Research Paper</p><p>Writing</p><p>Paula Tavares Pinto, Luciano Franco da Silva, Talita Serpa and Diva</p><p>Cardoso de Camargo</p><p>97</p><p>Creating a local learner corpus: Insights on project design and data</p><p>analysis from the pilot phase</p><p>Sandra Zappa-Hollman, Alfredo Afonso Ferreira, Greta Perris, Simone</p><p>Sarmento, Marine Laísa Matte and Laura Baumvol</p><p>127</p><p>The role of genre in academic language use: the case of Critiques and</p><p>Case Studies in BAWE</p><p>Marine Laísa Matte, Deise Amaral and Larissa Goulart</p><p>155</p><p>Investigating Brazilian English Learners’ Use of Academic</p><p>Collocations: A Corpus-Based Study</p><p>Marine Laísa Matte and Simone Sarmento</p><p>178</p><p>From corpus to classroom: evaluating Web-based tools to teach</p><p>collocations</p><p>Larissa Goulart, Maria Kostromitina and Jennifer Klein</p><p>204</p><p>Driving forces to adopt EMI: scholars’ perceived benefits of English</p><p>medium of instruction in Brazilian higher education</p><p>Laura Baumvol, Lucas Marengo and Simone Sarmento</p><p>243</p><p>About the authors</p><p>263</p><p>Exploring the complexities of EAP: a collection of voices</p><p>Simone Sarmento (UFRGS)</p><p>Rozane Rebechi (UFRGS)</p><p>Marine Laísa Matte (UFRGS/IFSul)</p><p>In this introduction, we aim to discuss aspects related to English for</p><p>Academic Purposes (EAP), to highlight the significance of this collection</p><p>to the 
broader field of EAP, and to provide a brief overview of the book and</p><p>its contributions.</p><p>EAP refers to the study and use of English in academic settings, with</p><p>a focus on the development of the language skills necessary to succeed in</p><p>higher education (Hyland, 2009). This includes the improvement of com-</p><p>petencies in academic reading, writing, listening, and speaking, as well as</p><p>the ability to understand and produce discipline-specific vocabulary and</p><p>discourse. The field has become increasingly important in recent years, as</p><p>the demand for English language proficiency continues to grow in academ-</p><p>ic contexts around the world. As a result, there has been a surge of research</p><p>and teaching practices focused on the language skills and competencies re-</p><p>quired for academic success, from writing research papers to participating</p><p>in academic discussions (Biber, 2006).</p><p>With the field of EAP being an active area of research, new studies are</p><p>being published regularly. These studies usually rely on a myriad of meth-</p><p>ods, since different methodological procedures can be employed to answer</p><p>research questions related to the use of academic language. Among them,</p><p>Corpus Linguistics (CL) stands out as a highly productive research method-</p><p>ology for investigating the demands of academic communication, includ-</p><p>ing the usage of language. One of the greatest contributions of CL to the</p><p>field of EAP is that it enables access to large amounts of authentic language</p><p>data, which can be used to identify and analyze the lexical, grammatical,</p><p>and discourse features of academic language (Nesi, 2016). As a result, EAP</p><p>researchers and instructors can identify the most frequent and relevant lan-</p><p>guage patterns and create targeted language learning materials and activities</p><p>for students. 
Thus, students can develop their own academic writing and</p><p>speaking skills by studying and practicing how to use language patterns</p><p>and structures that are typical of academic discourse.</p><p>Finally, CL can facilitate the identification of patterns among differ-</p><p>ent academic disciplines, enabling instructors to tailor their EAP teaching</p><p>to the students’ specific needs in different fields. For example, the language</p><p>used in medical research papers is likely to differ from that used in hu-</p><p>manities papers, and CL creates opportunities to identify these patterns,</p><p>allowing instructors to provide targeted support to students based on their</p><p>individual needs. As we will show below, seven out of the eight chapters</p><p>in this book use CL to varying degrees, exemplifying the productivity of</p><p>corpus-based research for the field of EAP.</p><p>EAP also encompasses English as a Medium of Instruction (EMI), a</p><p>relatively new branch of EAP, in which the English language is used as the</p><p>primary means for delivering academic content and facilitating commu-</p><p>nication in a multilingual academic environment (Macaro, 2017). EMI in</p><p>higher education settings refers to the use of English as the primary lan-</p><p>guage of instruction for academic courses or programs in universities and</p><p>other higher education institutions where the students’ first language is</p><p>not English. The use of EMI in higher education can offer learners several</p><p>benefits, such as the opportunity to study in an international environment,</p><p>exposure to English-language academic literature and research, and the de-</p><p>velopment of language skills that can enhance future academic and profes-</p><p>sional opportunities. 
However, EMI also poses challenges, such as ensuring</p><p>that students have sufficient language proficiency to understand the subject</p><p>matter and instruction and that instructors are able to deliver high-quality</p><p>instruction in English (Marengo, 2022). Research on EMI seeks to better</p><p>illuminate the benefits and challenges of using English as a teaching lan-</p><p>guage and to identify effective strategies and best practices for promoting</p><p>both language and subject learning in EMI settings.</p><p>This book, entitled “English for Academic Purposes: Reflections, de-</p><p>scription & pedagogy”, brings together nine chapters (the first being this</p><p>introduction) written by a diverse group of scholars and practitioners from</p><p>different universities who share a common interest in exploring the com-</p><p>plexities of academic language and communication. The contributors offer</p><p>unique perspectives on the possibilities, challenges, and opportunities of</p><p>researching, teaching, and learning EAP.</p><p>This book is a valuable resource for the field of English for Academic</p><p>Purposes (EAP) for several reasons. First, it provides a diverse range</p><p>achieve a particular rhetorical purpose and, in this way, can</p><p>serve as entry points into academic language, thereby enabling EAP curric-</p><p>ulum developers to design instructional materials centered around macro</p><p>functions rather than around individual linguistic features.</p><p>In this section, we review multi-dimensional analysis studies that</p><p>provide an overview of academic language by looking at articles, article</p><p>sections, reports, textbooks, and campus registers. 
The first of these studies</p><p>was conducted by Gray (2013), who analyzed variation in research articles</p><p>by academic discipline, using a corpus of 270 research articles compris-</p><p>ing three sub-registers (theoretical, qualitative, and quantitative research</p><p>reports) from six disciplines (philosophy, history, applied linguistics, po-</p><p>litical science, biology, and physics). The first dimension, labeled “academ-</p><p>ic involvement and elaboration versus information density,” distinguishes</p><p>between research articles that interact with the reader and present frequent</p><p>evaluation, argumentation, and interpretation with overt textual signals</p><p>(the positive pole) and texts that exhibit high-density informational lan-</p><p>guage (the negative pole). The positive pole is marked by such linguistic</p><p>features as first-person pronouns, predicative adjectives, modals (predic-</p><p>tion, possibility, necessity), subordinating conjunctions, adverbial con-</p><p>juncts, and a range of that-complement clauses and to-clauses. In contrast,</p><p>the negative pole comprises nouns, prepositions, passive voice, past tense,</p><p>a high type–token ratio, and long words. The distribution of the disciplines</p><p>shows essentially a contrast between a single discipline (philosophy), with</p><p>very high scores on the positive pole, and all the other disciplines, which</p><p>have either negative scores or scores close to zero on the positive pole.</p><p>Thus, the involved and elaborated style is very discipline specific whereas</p><p>the high-information style is more commonly embraced by different dis-</p><p>ciplines. Yet ample variation exists within each discipline; although most</p><p>disciplines prefer an information focus rather than an involved, elaborated</p><p>style, they allow for both styles. 
The exceptions are philosophy (which in-</p><p>cludes the involved, elaborated style only) and quantitative biology (which</p><p>includes the high-information style only). The two theoretical disciplines</p><p>of philosophy and theoretical physics both have texts with positive scores</p><p>(although theoretical physics also includes texts with negative scores, unlike</p><p>philosophy), suggesting that the involved, elaborated style is generally pre-</p><p>ferred in theoretical papers.</p><p>The second dimension distinguishes between contextualized</p><p>narration (positive pole) and procedural description (negative pole).</p><p>Contextualized narration is marked by features such as past tense verbs,</p><p>third-person pronouns, coordinating conjunctions, that- and to-com-</p><p>plement clauses, long words, a high type–token ratio, and long texts.</p><p>Meanwhile, procedural description is marked by nouns, attributive adjec-</p><p>tives, and passive voice. The way the disciplines are distributed along the</p><p>dimension shows two clusters: one comprising qualitatively oriented dis-</p><p>ciplines (history, political science, and applied linguistics), with high scores</p><p>on the positive pole, and the other comprising theoretical and quantita-</p><p>tive disciplines, with low positive scores or negative scores. This finding</p><p>suggests that contextualized narration is a style largely preferred for qual-</p><p>itative reports, whereas procedural description is a common style used in</p><p>non-qualitative articles.</p><p>The third dimension is based on a distinction between a human focus (pos-</p><p>itive pole) and a non-human focus (negative pole). The positive pole includes</p><p>such linguistic characteristics as second- and third-person pronouns; men-</p><p>tal, cognition, and communication verbs; and that- and to-complement</p><p>clauses. The negative pole, in contrast, comprises adjectives (in attributive</p><p>position), adverbs, and prepositions. 
Disciplines having a human focus are</p><p>essentially applied linguistics (qualitative and, to a lesser degree, quantita-</p><p>tive) and philosophy, whereas all the other disciplines share a non-human</p><p>focus.</p><p>Finally, the fourth dimension identifies academese as a major trait</p><p>in academic writing, which corresponds to “a concern to overtly represent</p><p>research as empirical, well-motivated and founded in previous research”</p><p>(Gray, 2013: 174). Academese is associated with the prevalent use of nomi-</p><p>nalizations, process nouns, abstract nouns, attributive adjectives, existence</p><p>verbs, that- and to-complement clauses, and long words. This style is most com-</p><p>monly found in articles from applied linguistics and political science.</p><p>Although a research article is generally seen as a single unit in which</p><p>the internal variation is minimal or of limited relevance, research articles</p><p>in fact comprise several sections, each performing a particular function</p><p>in the text. For instance, according to Swales (1990), introductions are sup-</p><p>posed to establish a territory and a niche (problem) and occupy the niche</p><p>(present a solution), among other rhetorical moves. In contrast, methods</p><p>are supposed to lay out the procedures followed in the study and present</p><p>the data, tools, and other methodological decisions taken by the authors</p><p>when conducting the study. Given the different rhetorical purposes of the</p><p>different research article sections, it is reasonable to expect that variation</p><p>exists within research articles that reflects the different purposes of the var-</p><p>ious sections. 
The variation across the language used in different sections</p><p>should be of interest to EAP practitioners, especially those concerned with</p><p>writing instruction, as a detailed description of the most typical language</p><p>used in different sections could help them better understand and select the</p><p>teaching points necessary to prepare their students to write efficient article</p><p>sections.</p><p>Dutra and Berber Sardinha (2018, 2021) looked at variation across</p><p>sections in a corpus of applied linguistics, biology, and chemistry research</p><p>articles. Each article was segmented into individual sections—namely, ab-</p><p>stract, introduction, method, results, discussion, and conclusion. The cor-</p><p>pus comprises 900 sections for each discipline, totaling 2.9 million words.</p><p>The first dimension, labeled interpretive elaboration, includes</p><p>third-person pronouns, communication and mental verbs, that- and</p><p>to-complement clauses, wh-words, infinitives, and nominalizations. This</p><p>dimension corresponds to a distinction between applied linguistics and</p><p>the other two disciplines, as all sections from applied linguistics, especially</p><p>conclusions and discussions, exhibit positive scores on this dimension.</p><p>The second dimension, which corresponds to logical argumentation,</p><p>comprises characteristics such as present tense verbs, adverbs, adjectives in</p><p>predicative position, adverbial conjuncts, that- and to-complement clauses,</p><p>demonstrative pronouns, and prediction modals. The conclusion and dis-</p><p>cussion sections, mainly from applied linguistics, biology, and chemistry,</p><p>have higher scores on this dimension.</p><p>The third dimension reveals a distinction between informational</p><p>density (on the positive pole) and procedural narrative and description (on</p><p>the negative pole). 
Informational density corresponds to the dense use of</p><p>long words and adjectives in attributive position whereas procedural narra-</p><p>tive and description relies on past tense verbs, agentless passives, long sec-</p><p>tions, and activity verbs. The variation across sections shows that informa-</p><p>tional density is more typical of abstracts, conclusions, and introductions</p><p>whereas procedural narrative and description is more typical of methods</p><p>and results. Based on the results, discipline alone is not a good predictor of</p><p>the variation. Rather, the variation is patterned along a combination of dis-</p><p>cipline and section, with no clear-cut distinctions. For instance, biology</p><p>conclusions score high on informational density whereas biology methods</p><p>score high on procedural narrative and description.</p><p>In general, for all dimensions, a higher share of the variation is explained</p><p>when discipline and section are considered together than when sec-</p><p>tion alone or discipline alone is considered. This suggests that, because sec-</p><p>tions can be very discipline specific, care should be taken in EAP not to</p><p>generalize across disciplines when trying to characterize the language of</p><p>research article sections. Rather, EAP practitioners should be aware of the</p><p>section specificities of different disciplines when teaching their students to</p><p>write academic articles.</p><p>Whereas the studies reviewed thus far focused on journal</p><p>articles, the next study looked at student writing in an American university.</p><p>Hardy and Römer (2013) analyzed the Michigan Corpus of Upper-level</p><p>Student Papers (MICUSP), which includes samples of written assignments</p><p>from 16 disciplines, totaling more than 2.6 million words. 
The samples rep-</p><p>resent a range of registers, such as argumentative essays, proposals, reports,</p><p>and research papers.</p><p>The first dimension comprises two poles: involved, academic nar-</p><p>rative (positive pole) and descriptive, informational discourse (negative</p><p>pole). The linguistic features that loaded on the positive pole of the first</p><p>dimension include verbs of different types (mental verbs, private verbs,</p><p>activity verbs), past tense verbs, that-deletion, and first- and third-person</p><p>pronouns. On the other hand, the features loading on the negative pole, which con-</p><p>vey dense quantities of information, are nominal features such as nouns,</p><p>nominalizations, and adjectives. The disciplines are sharply distinguished</p><p>on this dimension, with the humanities, arts, and social sciences scoring on</p><p>the positive pole (particularly philosophy and education) and biological,</p><p>health, and physical sciences scoring on the negative pole (most markedly</p><p>physics and biology). The exception is linguistics, which scored on the neg-</p><p>ative pole.</p><p>The second dimension, labeled expression of opinions and men-</p><p>tal processes, primarily comprises a large number of stance clauses (both to- and</p><p>that-stance clauses, controlled by adjectives and verbs) and that-comple-</p><p>ment clauses (controlled by factive and non-factive predicates, verbs of likelihood, and adjec-</p><p>tives of likelihood). 
The disciplines are distributed along this dimension in</p><p>a similar manner as in the first dimension, with the humanities and social</p><p>sciences having higher scores on the positive pole (philosophy and edu-</p><p>cation being the top two), thereby being more readily associated with the</p><p>expression of opinions and mental processes, whereas in the remaining</p><p>disciplines the expression of opinions and mental processes is much less</p><p>common (civil engineering and physics as the most marked).</p><p>The third dimension corresponds to a distinction between situa-</p><p>tion-dependent, non-procedural evaluation (positive pole) and procedural</p><p>discourse (negative pole). The features loading on the positive pole include</p><p>a range of adverbs (including stance), verbs, pronouns, and that-comple-</p><p>ment clauses controlled by verbs of likelihood. In contrast, the negative</p><p>pole is based on nouns and passives. The register distribution along the</p><p>dimension is similar to the previous dimensions, with a split between the</p><p>humanities on one pole and the remaining sciences on the other. The hu-</p><p>manities (e.g., philosophy, English) score highly on the situation-depen-</p><p>dent, non-procedural evaluation end of the dimension whereas the natural</p><p>and exact sciences (physics, mechanical engineering) score highly on the</p><p>procedural discourse end.</p><p>The final dimension, labeled production of possibility, is based on</p><p>the use of modals (possibility, prediction), stance (that-complement claus-</p><p>es controlled by adjectives, to-complement clauses controlled by adjec-</p><p>tives), infinitives, and verbs in general. Unlike the previous dimensions, the</p><p>disciplines are not evenly split between the humanities and the remaining</p><p>sciences. 
The disciplines most marked by this dimension include human</p><p>sciences (e.g., philosophy, linguistics), life sciences (nursing, psycholo-</p><p>gy), and education; the least marked include the humanities (history and</p><p>classical studies), natural sciences (physics), and engineering (mechanical</p><p>engineering).</p><p>As the results of this study indicate, the language used in disci-</p><p>pline-specific writing differs sharply, mainly between the humanities and</p><p>the remaining disciplines. In the humanities, authors prefer language that</p><p>is more involved, narrative, opinionated, and situation dependent; in all</p><p>the remaining disciplines, authors tend to use language that is more in-</p><p>formational, less opinionated, and procedural. Yet this divide between the</p><p>humanities and non-humanities does not apply to the expression of</p><p>possibilities and arguments, where the distinction is much more blurred as</p><p>each specific discipline relies on this type of discourse to a different degree.</p><p>Multi-dimensional analysis has been applied to the description of</p><p>academic English mostly from a grammatical perspective, as the studies</p><p>discussed thus far have demonstrated. However, multi-dimensional anal-</p><p>ysis can provide detailed descriptions of academic language from a lexi-</p><p>cal perspective as well, thereby shedding light on how academic language</p><p>is patterned for such aspects as collocations (Zuppardi, 2020; Zuppardi &</p><p>Berber Sardinha, 2020) and discourse (Berber Sardinha, 2021). 
We next</p><p>review Zuppardi and Berber Sardinha’s (2020) study, which provides a</p><p>unique view on how collocations cluster in academic writing that can help</p><p>EAP educators as they prepare their students to handle the large number of</p><p>collocations needed to master academic English.</p><p>Zuppardi and Berber Sardinha (2020) used a novel form of multi-di-</p><p>mensional analysis based on collocations (Berber Sardinha, 2017; Zuppardi,</p><p>2020) to analyze a large corpus of academic writing comprising articles and</p><p>textbooks from seven disciplines: behavioral and cognitive sciences, social</p><p>and economic sciences, anthropology, political science, psychology, and</p><p>economics.</p><p>The first dimension corresponds to a distinction between colloca-</p><p>tions referring to human nature, culture, and research methods and col-</p><p>locations related to economics. Collocations in the first group encompass</p><p>a large number of nominal, adjectival, and verbal collocations formed</p><p>around nodes such as literature (e.g., literature review), culture (common</p><p>culture), behavior (human behavior), human (human tendency), develop-</p><p>mental (developmental basis), genetic (genetic variation), highlight (highlight</p><p>the importance), review (review the evidence), and live (live alone). 
In con-</p><p>trast, the economics collocations are formed around nodes such</p><p>as saving (national saving), currency (foreign currency), corporation (large</p><p>corporation), fiscal (fiscal policy), extra (extra revenue), nominal (nominal</p><p>rate), finance (finance and investment), purchase (purchase bond), and bor-</p><p>row (borrowing constraints).</p><p>The second dimension, which refers to human</p><p>evolution and society, includes collocations around noun nodes such as</p><p>species (separate species), ape (ape behavior), and anthropologist (cultur-</p><p>al anthropologist); adjective nodes like ancient (ancient remains), African</p><p>(African populations), and evolutionary (evolutionary change); and verb</p><p>nodes such as date (date fossils), remember (remember a discussion), and</p><p>gather (gather data).</p><p>The third dimension, interpreted as business and finance, encom-</p><p>passes collocations around nouns like dollar (dollar cost), bank (bank ac-</p><p>count), and interest (interest payments); adjectives like net (net worth), an-</p><p>nual (annual income), and marginal (marginal cost); and verbs like sell (sell</p><p>products), pay (pay dividend), and raise (raise funds).</p><p>The final dimension, referring to statistical vocabulary, includes col-</p><p>locations with</p><p>the following nodes: nouns like error (error variance), cor-</p><p>relation (correlation coefficient), and population (population parameter);</p><p>adjectives such as linear (linear model), estimated (estimated effect), and ex-</p><p>planatory (explanatory variable); and verbs like compute (compute average)</p><p>and estimate (estimate model).</p><p>The dimensions provide a network-like outlook on collocations, un-</p><p>like the literature in general, which tends to see collocations individually</p><p>or in small sets. The study demonstrated that collocations are shared sys-</p><p>tematically across texts. 
Therefore, a skilled academic writer needs to be</p><p>able to select the most appropriate collocations for the particular topics ad-</p><p>dressed in the article or textbook. Similarly, the fact that words tend to ap-</p><p>pear in predictable combinations has consequences for readers as well, as a</p><p>proficient reader is able to anticipate these collocations in the text. Overall,</p><p>this study shows that the bulk of the collocations in aca-</p><p>demic writing is not a set of specialized technical expressions; rather, most</p><p>collocations can be frequently found in non-academic domains.</p><p>Biber (2006) presented a multi-dimensional analysis of the TOEFL</p><p>2000 Spoken and Written Academic Language Corpus (T2K-SWAL), which</p><p>consists of spoken and written registers with which students in American</p><p>universities need to engage as part of campus life. The first dimension</p><p>includes two poles: one corresponding to orality and the other to literacy.</p><p>The pole corresponding to orality comprises linguistic features usu-</p><p>ally associated with informal spoken language, such as contractions, first-,</p><p>second-, and third-person pronouns, stranded prepositions, that-omission,</p><p>discourse particles, and demonstrative and indefinite pronouns. In addi-</p><p>tion, this pole includes linguistic features that reflect a non-technical use</p><p>of language, such as common and relatively common adverbs, verbs in the</p><p>present tense, and lexical bundles initiated by pronouns, verbs, and wh-pro-</p><p>nouns, all of which reflect the interactive tendency of the dimension. The</p><p>highest scoring academic registers on this pole include office hours, study</p><p>groups, classroom management, and classroom teaching. 
In these registers,</p><p>the face-to-face interactions between teachers and students are enabled by</p><p>these linguistic features, which in turn allow for the desired level of infor-</p><p>mality and interaction in North American university settings.</p><p>In the negative pole, the predominant linguistic features are related</p><p>to the use of specialized nouns, such as abstract nouns, human nouns, and</p><p>group nouns, as well as to-clauses controlled by stance nouns or adjectives.</p><p>The lexical bundles also reflect this nominal orientation of the dimension,</p><p>including lexical bundles initiated by prepositions. This dimension pole</p><p>also includes passive structures, formed with by-passive and by-less-pas-</p><p>sive voice structures, and adjectives in an attributive position. All these</p><p>features—in addition to others not mentioned here—generally refer to</p><p>nominal structures common in specialized literate language. The academic</p><p>registers that scored highest on this pole are textbooks and course packs,</p><p>which make consistent use of the features present in this dimension pole.</p><p>Like the first dimension, the second dimension also includes two</p><p>poles: one corresponding to procedural discourse and the other to con-</p><p>tent-focused discourse. Procedural discourse is marked mainly by modals</p><p>(present and future), common verbs of activity and causative verbs,</p><p>to-clauses controlled by verbs, and conditional adverbial clauses. Content-</p><p>focused discourse, on the other hand, is principally marked by specialized</p><p>vocabulary, such as rare nouns, rare adjectives, rare verbs, and special-</p><p>ized adjectives. This dimension basically distinguishes between spoken</p><p>and written registers, with few exceptions. 
The pole corresponding to</p><p>procedural discourse includes spoken registers such as classroom manage-</p><p>ment, office hours, and classroom teaching whereas the pole correspond-</p><p>ing to content-focused discourse comprises registers such as textbooks and</p><p>course packs.</p><p>The third dimension refers to a reconstructed account of events, dis-</p><p>tinguishing between language used to report past events (positive</p><p>pole) and to convey concrete information (negative pole). The positive pole</p><p>is essentially composed of non-specialized vocabulary (common human</p><p>and mental nouns, common verbs of communication, and common men-</p><p>tal verbs), plus a range of that-clauses controlled by communication verbs,</p><p>likelihood verbs, and stance nouns as well as that-omission and past tense</p><p>verbs. This dimension distinguishes between written and spoken registers,</p><p>with spoken registers (such as study groups, office hours, and labs) occurring</p><p>mainly on the positive pole and written registers occurring mainly on the</p><p>negative pole.</p><p>The last dimension refers to teacher-centered stance, which relies mainly on</p><p>adverbial features, such as attitudinal adverbials and adverbials of</p><p>certainty and likelihood, together with conditional adverbial clauses and that-clauses</p><p>controlled by stance nouns. Unlike the other dimensions, it does not neat-</p><p>ly distinguish between written and spoken registers. On the positive pole,</p><p>the most prominent academic registers are classroom teaching and office</p><p>hours; on the negative pole, they are study groups and institutional writing.</p><p>Conclusion</p><p>In this chapter, we presented corpus-based studies and their contri-</p><p>butions to EAP. First, we discussed the advances in vocabulary studies as</p><p>the area moved from lists of individual words to the analysis of</p><p>phraseological patterns. 
Second, grammatical complexity research was considered, showing how corpus linguistics (CL) can point to novel ways of observing linguistic phenomena. Finally, we presented multi-dimensional analysis studies and the insights they have provided into the understanding of lexical-grammatical patterns in academic registers. EAP education can include learning about the registers that students are likely to encounter at university, beyond the usual academic registers such as academic articles and dissertations. Corpus linguistics has been an integral part of EAP education, and the continued application of corpus-based language analysis promises to further enrich EAP programs.</p><p>References</p><p>Ackermann, K. & Chen, Y-H. (2013). Developing the Academic Collocation List (ACL): A corpus-driven and expert-judged approach. Journal of English for Academic Purposes, 12, 235–247. https://doi.org/10.1016/j.jeap.2013.08.002</p><p>Almela, A., Cantos Gómez, P. & Berber Sardinha, T. (2022). Métodos multidimensionales basados en corpus del español. In G. Parodi, P. Cantos Gómez, & L. Howe (Eds.), The Routledge Handbook of Spanish Corpus Linguistics (pp. 545-557). Routledge.</p><p>Altenberg, B. & Tapper, M. (1998). The use of adverbial connectors in advanced Swedish learners’ written English. In S. Granger (Ed.), Learner English on computer (pp. 80-93). London: Pearson Education.</p><p>Almeida, V., Orfanó, B. & Dutra, D. (2022). Is there a better choice? Verb-noun combinations in academic writing. In V. Viana (Ed.), Teaching English with Corpora: A Resource Book (pp. 228-231). Abingdon: Routledge. http://dx.doi.org/10.4324/b22833-47</p><p>Alves, J. C. (2022). Grammatical complexity in a learner corpus: assessing students’ development through a longitudinal study. Master’s Thesis, Universidade Federal de Minas Gerais, Brazil.</p><p>Ang, L. H., Tan, K. H. 
& He, M. (2017). A Corpus-based Collocational Analysis of Noun Premodification Types in Academic Writing. The Southeast Asian Journal of English Language Studies, 23(1), 115–131. DOI: 10.17576/3L-2017-2301-09</p><p>Ansarifar, A., Shahriari, H. & Pishghadam, R. (2018). Phrasal complexity in academic writing: A comparison of abstracts written by graduate students and expert writers in applied linguistics. Journal of English for Academic Purposes, 31, 58-71. https://doi.org/10.1016/j.jeap.2017.12.008</p><p>Bardovi-Harlig, K. (1992). A second look at T-unit analysis: Reconsidering the sentence. TESOL Quarterly, 26, 390–395. doi:10.2307/3587016</p><p>Berber Sardinha, T. (2000). Análise Multidimensional. DELTA, 16(1), 99-127.</p><p>Berber Sardinha, T. (2017). Lexical priming and register variation. In M. Pace-Sigge & K. Patterson (Eds.), Lexical Priming: Applications and Advances (pp. 190-230). Amsterdam: John Benjamins. https://doi.org/10.1075/scl.79.08ber</p><p>Berber Sardinha, T. (2021). Discourse of academia from a multi-dimensional perspective. In E. Friginal & J. Hardy (Eds.), The Routledge Handbook of Corpus Approaches to Discourse Analysis (pp. 298-318). Abingdon: Routledge.</p><p>Berber Sardinha, T. & Veirano Pinto, M. (Eds.). (2014). Multi-Dimensional Analysis, 25 years on: A Tribute to Douglas Biber. John Benjamins.</p><p>Berber Sardinha, T. & Veirano Pinto, M. (Eds.). (2019). Multi-Dimensional Analysis: Research Methods and Current Issues. New York: Bloomsbury.</p><p>Berber Sardinha, T. & Shimazumi, M. (2021). Variation in learner writing in English: A multi-dimensional analysis of the new ICLE v.3. [Paper presentation]. XV Encontro de Linguística de Corpus (ELC). Online.</p><p>Biber, D. (1988). 
Variation across speech and writing. Cambridge: Cambridge University Press.</p><p>Biber, D. (1993). Representativeness in corpus design. Literary and Linguistic Computing, 8, 243–257.</p><p>Biber, D. (2006). University Language: A corpus-based study of spoken and written registers. Amsterdam/Philadelphia, PA: John Benjamins.</p><p>Biber, D. (2009). A corpus-driven approach to formulaic language in English: multi-word patterns in speech and writing. International Journal of Corpus Linguistics, 14(3), 275-311.</p><p>Biber, D., Conrad, S. & Reppen, R. (1998). Corpus linguistics: Investigating language structure and use. Cambridge: Cambridge University Press.</p><p>Biber, D., Johansson, S., Leech, G., Conrad, S. & Finegan, E. (1999). Longman grammar of spoken and written English. Longman.</p><p>Biber, D., Conrad, S. & Cortes, V. (2004). If you look at ...: lexical bundles in university teaching and textbooks. Applied Linguistics, 25(3), 371–405.</p><p>Biber, D. & Conrad, S. (2009). Register, genre and style. Cambridge: Cambridge University Press.</p><p>Biber, D. & Gray, B. (2010). Challenging stereotypes about academic writing: complexity, elaboration, explicitness. Journal of English for Academic Purposes, 9(1), 2–20. doi: 10.1016/J.JEAP.2010.01.001</p><p>Biber, D., Gray, B. & Poonpon, K. (2011). Should we use characteristics of conversation to measure grammatical complexity in L2 writing development? TESOL Quarterly, 45(1), 5-35. https://doi.org/10.5054/tq.2011.244483</p><p>Biber, D. & Gray, B. (2016). Grammatical complexity in academic English: Linguistic change in writing. Cambridge: Cambridge University Press.</p><p>Biber, D., Gray, B. & Staples, S. (2016). Contrasting the Grammatical Complexities of Conversation and Academic Writing: Implications for EAP Writing Development and Teaching. Language in Focus Journal, 2(1), 1-18. 
DOI: 10.1515/lifijsal-2016-0001</p><p>Biber, D., Reppen, R., Staples, S. & Egbert, J. (2020). Exploring the longitudinal development of grammatical complexity in the disciplinary writing of L2-English university students. International Journal of Learner Corpus Research, 6(1), 38-71. https://doi.org/10.1075/ijlcr.18007.bib</p><p>Bocorny, A. E. P. & Welp, A. (2021). Desenho de tarefas pedagógicas para o ensino de Inglês para Fins Acadêmicos: conquistas e desafios da Linguística de Corpus. Revista Estudos da Linguagem, 29(2), 1529-1638. DOI: 10.17851/2237-2083.29.2.1529-1638</p><p>Campion, M. & Elley, W. (1971). An Academic Word List. Wellington: New Zealand Council for Educational Research.</p><p>Carter, R. & McCarthy, M. (2006). Cambridge grammar of English: A comprehensive guide to spoken and written English usage. Cambridge: Cambridge University Press.</p><p>Conrad, S. & Biber, D. (Eds.). (2001). Variation in English: Multi-Dimensional Studies. London: Longman.</p><p>Cortes, V. (2013). The purpose of this study is to: Connecting lexical bundles and moves in research article introductions. Journal of English for Academic Purposes, 12(1), 33-43. https://doi.org/10.1016/j.jeap.2012.11.002</p><p>Coxhead, A. (2000). A new academic word list. TESOL Quarterly, 34, 213–238.</p><p>Crosthwaite, P., Luciana & Wijaya, D. (2021). Exploring language teachers’ lesson planning for corpus-based language teaching: a focus on developing TPACK for corpora and DDL. Computer Assisted Language Learning (pp. 1-29). https://doi.org/10.1080/09588221.2021.1995001</p><p>Davies, M. (2008). 
The Corpus of Contemporary American English (COCA): 600 million words, 1990-present. Available online at https://www.english-corpora.org/coca/.</p><p>Delegá-Lucio, D. (2013). A variação entre textos argumentativos e o material didático de inglês: Aplicações da análise multidimensional e do Corpus Internacional de Aprendizes de Inglês (ICLE). Doctoral dissertation. Pontifícia Universidade Católica de São Paulo, Brazil.</p><p>Dutra, D. P., Orfanò, B. M., Guedes, A. S., Alves, J. C. & Fekete, J. G. (2022). The learner corpus path: a worthwhile methodological challenge. DELTA, 38(2), 1-24. https://doi.org/10.1590/1678-460X202238249731</p><p>Dutra, D. & Berber Sardinha, T. (2021). A multi-dimensional typology of English research article sections. American Association for Applied Linguistics Conference (AAAL). Online.</p><p>Dutra, D. P., Queiroz, J. M., Macedo, L. D., Costa, D. & Mattos, E. (2020). Adjective as nominal premodifiers in Chemistry and Applied Linguistics Corpora. In U. Römer, V. Cortes & E. Friginal (Eds.), Advances in Corpus-based Research on Academic Writing: Effects of discipline, register, and writer expertise (pp. 205-226). Amsterdam: John Benjamins Publishing Company. https://doi.org/10.1075/scl.95.09dut</p><p>Dutra, D. P., Orfanó, B. M. & Almeida, V. C. (2019). Result linking adverbials in learner corpora. Domínios de Lingu@gem, 13(1), 400-431. https://doi.org/10.14393/DL37-v13n1a2019-17</p><p>Dutra, D. P. & Berber Sardinha, T. (2018). A linguistic typology of sections in research articles: a Multi-Dimensional perspective. [Paper presentation] AZCL Conference, Northern Arizona University, Flagstaff, AZ, USA.</p><p>Dutra, D. P., Queiroz, J. & Alves, J. C. (2017). Adding information in argumentative tests: a learners corpus-based study of additive linking adverbials. 
Estudos Anglo Americanos, 46(1), 9-32.</p><p>Egbert, J. & Staples, S. (2019). Doing Multi-Dimensional Analysis in SPSS, SAS, and R. In T. Berber Sardinha & M. Veirano Pinto (Eds.), Multi-dimensional analysis: Research methods and current issues (pp. 125-144). New York: Bloomsbury.</p><p>Ellis, N. (2008). Phraseology: the periphery and the heart of language. In F. Meunier & S. Granger (Eds.), Phraseology in Foreign Language Learning and Teaching (pp. 1-13). Amsterdam & Philadelphia: Benjamins.</p><p>Friginal, E. & Hardy, J. A. (2014). Conducting Multi-Dimensional analysis using SPSS. In T. Berber Sardinha & M. Veirano Pinto (Eds.), Multi-Dimensional Analysis, 25 years on: A Tribute to Douglas Biber (pp. 298-316). Amsterdam & Philadelphia: John Benjamins.</p><p>Firth, J. R. (1957). Papers in linguistics: 1934–1951. London, England: Oxford University Press.</p><p>Gardner, D. & Davies, M. (2014). A new academic vocabulary list. Applied Linguistics, 35(3), 305–327. https://doi.org/10.1093/applin/amt015</p><p>Ghadessy, P. (1979). Frequency counts, word lists, and materials preparation: a new approach. English Teaching Forum, 17, 24–7.</p><p>Granger, S. & Larsson, T. (2021). Is core vocabulary a friend or foe of academic writing? Single-word vs multi-word uses of thing. Journal of English for Academic Purposes, 52. https://doi.org/10.1016/j.jeap.2021.100999</p><p>Granger, S., Dupont, M., Meunier, F., Naets, H. & Paquot, M. (2020). The International Corpus of Learner English, Version 3. 
Louvain-la-Neuve: Presses universitaires de Louvain.</p><p>Granger, S. (1998). The computer learner corpus: a versatile new source of data for SLA research. In S. Granger (Ed.), Learner English on Computer (pp. 3–18). Harlow: Longman.</p><p>Gray, B. (2013). More than discipline: Uncovering multi-dimensional patterns of variation in academic research articles. Corpora, 8, 153-181.</p><p>Hardy, J. & Römer, U. (2013). Revealing disciplinary variation in student writing: A multi-dimensional analysis of the Michigan Corpus of Upper-level Student Papers (MICUSP). Corpora, 8, 183-207.</p><p>Hutter, Jo-Anne. (2015). A Corpus Based Analysis of Noun Modification in Empirical Research Articles in Applied Linguistics. Master’s Thesis, Portland State University.</p><p>Hyland, K. (2008). As can be seen: Lexical bundles and disciplinary variation. English for Specific Purposes, 27, 4–21. doi:10.1016/j.esp.2007.06.001</p><p>Hyland, K. (2016). General and specific EAP. In K. Hyland & P. Shaw (Eds.), The Routledge Handbook of English for academic purposes (pp. 17-29). New York: Routledge.</p><p>Hyland, K. & Jiang, F. (2021). Delivering relevance: The emergence of ESP as a discipline. English for Specific Purposes, 64, 13-25. https://doi.org/10.1016/j.esp.2021.06.002</p><p>Johns, T. (1991). Should you be persuaded - two samples of data-driven learning materials. In T. Johns & P. King (Eds.), Classroom Concordancing. ELR Journal, 4, 1-16.</p><p>Lake, W. M. & Cortes, V. (2020). Lexical bundles as reflections of disciplinary norms in Spanish and English literary criticism, history, and psychology research. In U. Römer, V. Cortes & E. Friginal (Eds.), Advances in Corpus-based Research on Academic Writing: Effects of discipline, register, and writer expertise (pp. 95-183). 
Amsterdam: John Benjamins Publishing Company.</p><p>Liu, C.-Y. & Chen, H.-J. H. (2020). Analyzing the functions of lexical bundles in undergraduate academic lectures for pedagogical use. English for Specific Purposes, 58, 122-137. https://doi.org/10.1016/j.esp.2019.12.003</p><p>Lorentz, G. (1998). Overstatement in advanced learners’ writing: Stylistic aspects of adjective intensification. In S. Granger (Ed.), Learner English on Computer (pp. 53–66). Harlow: Longman.</p><p>Lynn, R. W. (1973). Preparing word lists: a suggested method. RELC Journal, 4, 25–32.</p><p>Matte, M. L. & Sarmento, S. (2018). A corpus-based study of connectors in student academic writing. English for Specific Purposes World, 20(55), 1-21.</p><p>McCarthy, M., McCarten, J., & Sandiford, H. (2014). Touchstone 1. Cambridge: Cambridge University Press.</p><p>Nation, I. S. P. (2001). Learning vocabulary in another language. Cambridge: Cambridge University Press.</p><p>Neely, E. & Cortes, V. (2009). A little bit about: analyzing and teaching lexical bundles in academic lectures. Language Value, 1(1), 17–38.</p><p>Nesselhauf, N. (2003). The use of collocations by advanced learners of English and some implications for teaching. Applied Linguistics, 24(2), 223–242.</p><p>Nesselhauf, N. (2005). Collocations in a learner corpus. Amsterdam: John Benjamins.</p><p>Nesi, H. (2016). Corpus studies in EAP. In K. Hyland & P. Shaw (Eds.), The Routledge Handbook of English for academic purposes (pp. 206-217). New York: Routledge.</p><p>Nesi, H. & Basturkmen, H. (2006). Lexical bundles and discourse signalling in academic lectures. International Journal of Corpus Linguistics, 11(3), 283-304.</p><p>Ortega, L. (2003). Syntactic complexity measures and their relationship to L2 proficiency: A research synthesis of college-level L2 writing. 
Applied Linguistics, 24, 492–518. doi:10.1093/applin/24.4.492</p><p>Paquot, M. (2008). Exemplification in learner writing: A cross-linguistic perspective. In F. Meunier & S. Granger (Eds.), Phraseology in Foreign Language Learning and Teaching (pp. 101-119). Amsterdam & Philadelphia: Benjamins.</p><p>Parkinson, J. & Musgrave, J. (2014). Development of noun phrase complexity in the writing of English for Academic Purposes students. Journal of English for Academic Purposes, 14, 48-59. https://doi.org/10.1016/j.jeap.2013.12.001</p><p>Praninskas, J. (1972). American University Word List. Longman.</p><p>Queiroz, J. (2019). The grammatical complexity of English noun phrases in Brazilian learners’ academic writing: a corpus-based study. Master’s Thesis, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil.</p><p>Reppen, R. (2018). Teaching lexical bundles: Which ones and how? In E. Hinkel (Ed.), Teaching essential units of language: Beyond single word vocabulary (pp. 186-200). Routledge. https://doi.org/10.4324/9781351067737</p><p>Reppen, R. & Olson, S. B. (2020). Lexical bundles across disciplines. In U. Römer, V. Cortes & E. Friginal (Eds.), Advances in Corpus-based Research on Academic Writing: Effects of discipline, register, and writer expertise (pp. 169-182). Amsterdam: John Benjamins.</p><p>Römer, U. (2010). Using general and specialized corpora in English language teaching: past, present and future. In M. C. Campoy-Cubillo, B. Belles-Fortuno, & M. L. Gea-Valor (Eds.), Corpus-Based Approaches to English Language Teaching (pp. 18-35). London: Continuum.</p><p>Salager-Meyer, F., de Segura, G. M. L. & Ramos, R. C. G. (2016). EAP in Latin America. In K. Hyland & P. Shaw (Eds.), The Routledge Handbook of English for academic purposes (pp. 109-124). New York: Routledge.</p><p>Sarmento, S., Dutra, D. P., Barbosa, M. V. & Moraes Filho, W. B. 
(2016). IsF e Internacionalização: da teoria à prática. In S. Sarmento, D. M. de Abreu-e-Lima & W. B. Moraes Filho (Org.), Do Inglês sem Fronteiras ao Idiomas sem Fronteiras: a construção de uma política linguística para a internacionalização (pp. 77-100). Belo Horizonte: Editora UFMG.</p><p>Simpson-Vlach, R. & Ellis, N. C. (2010). An Academic Formulas List: New methods in phraseology research. Applied Linguistics, 31(4), 487–512.</p><p>Sinclair, J. (1987). Collins COBUILD English language dictionary. London: Collins.</p><p>Sinclair, J. (1991). Corpus, concordance and collocation. Oxford: Oxford University Press.</p><p>Swales, J. (1990). Genre analysis: English in academic and research settings. Cambridge: Cambridge University Press.</p><p>Viana, V. & O’Boyle, A. (2022). Corpus Linguistics for English for Academic Purposes (Routledge Corpus Linguistics Guides). Abingdon: Taylor and Francis. Kindle Edition.</p><p>Welp, A., Didio, Á. & Finkler, B. (2019). Questões contemporâneas no cinema e na literatura: o desenho de uma sequência didática para o ensino de inglês como língua adicional. Brazilian English Language Teaching Journal, 10(2), 1-25. https://doi.org/10.15448/2178-3640.2019.2.3586</p><p>West, M. (1953). A general service list of English words. London: Longman, Green & Co.</p><p>Xue, G. & Nation, P. (1984). A university word list. Language Learning and Communication, 3, 215–29.</p><p>Zuppardi, M. C. (2020). Collocation dimensions in academic English. PhD dissertation. Pontifícia Universidade Católica de São Paulo, São Paulo.</p><p>Zuppardi, M. C. & Berber Sardinha, T. (2020). A multi-dimensional view of collocations in academic writing. In U. Römer, V. Cortes, & E. 
Friginal (Eds.), Advances in Corpus-based Research on Academic Writing: Effects of Discipline, Register, and Writer Expertise (pp. 334–353). Amsterdam/Philadelphia: John Benjamins. https://doi.org/10.1075/scl.95.14zup</p><p>Zuppardi, M. C., Veirano Pinto, M. & Berber Sardinha, T. (in prep.). Multi-Dimensional Analysis. In C. Chapelle (Ed.), The Encyclopedia of Applied Linguistics (2nd ed.). Hoboken, NJ: Wiley.</p><p>From specialized corpus to the EAP classroom: integrating authentic data into materials design</p><p>Ana Eliza Pereira Bocorny (UFRGS)</p><p>Ana Luiza Freitas (UFCSPA)</p><p>Rozane Rodrigues Rebechi (UFRGS)</p><p>Introduction</p><p>Almost two decades ago, Sinclair (2004a) anticipated that corpus-based language teaching would revolutionize language pedagogy. After all, relying on empirical evidence enables the design of pedagogical applications based on authentic input, providing teachers and researchers with an accurate perspective on how language actually works. 
Today, the positive impact of corpus-based approaches to additional language learning and teaching is undeniable (Boulton & Cobb, 2017; Boulton, 2021; Karlsen, 2021; Anthony, 2022a; O’Keeffe, 2022).</p><p>Despite the importance of corpus linguistics as a means of identifying authentic language use and the fact that many studies (Flowerdew, 2009, 2013, 2014; Gray et al., 2020; Charles & Frankenberg-Garcia, 2021) suggest integrating corpus data into English for Academic Purposes¹ (EAP) pedagogy, the use of authentic data in language classrooms around the world is still incipient (Kavanagh, 2021; Poole, 2020; Pérez-Paredes, 2019). Moreover, according to Römer (2006: 122), “there is still a strong resistance towards corpora from the side of students, teachers, and materials writers.”</p><p>¹ The term English for Academic Purposes (EAP) refers to the English which is needed to study or conduct research in the academic context. Although it is often associated with non-native speakers of the language, EAP has also been extended to native speakers who are faced with writing essays, presenting papers, reading articles, etc. (Charles, 2013).</p><p>Previous studies have suggested that “lack of time, group sizes, and technological obstacles” (Kavanagh, 2021: 2) could be standing between corpus data and the language classroom. Poole (2020: 1) reports that although teachers embrace the use of corpora, they also reveal “emergent tensions regarding the use of ready-made corpus activities and the key affordances of discovery, authenticity, and autonomy often forwarded in support of corpus pedagogy.” Breyer (2011: 207) claims that the lack of “(classroom) user-friendly concordancing software” was mentioned by teachers as one of the hurdles to the smooth adoption of corpora as language learning input. 
Other reasons identified by Mukherjee (2004: 243) had to do with the fact that not enough teachers were acquainted with “the basic foundations, implications, and applications of Corpus Linguistics.”</p><p>Set in the context of graduate and undergraduate students at the Federal University of Rio Grande do Sul (UFRGS), this contribution arose from the needs of Brazilian pre-service and in-service novice EAP teachers when designing EAP writing course materials with corpus data at the Center of Languages for Academic Purposes (CLA)². After being introduced to corpus linguistics principles and methods, these novice teachers were asked to design a Pedagogical Unit (PU), i.e., a set of learning activities sequenced together to promote advances in learning, for a given EAP course in which selected language features would be taught within the context of a given academic genre. The teachers were then asked to extract and analyze the relevant language data and integrate it into their EAP materials.</p><p>² CLA website: https://www.ufrgs.br/cla/</p><p>That said, the aim of this chapter is twofold: (i) to help EAP teachers better understand corpus linguistics methods for the extraction of language data from specialized corpora and (ii) to show how this language data can be used in the design of EAP writing course materials through a pedagogical model that combines corpus and genre-based approaches.</p><p>The first section, ‘Combining corpus and genre-based approaches’, reviews the literature on corpus and genre-based approaches to language learning and teaching and on pedagogical models that combine both approaches. 
Section 2, ‘The design of EAP materials’, describes the framework suggested in the study for designing EAP materials and presents a step-by-step guide on extracting and integrating corpus data into materials used for EAP writing courses. Finally, we close the chapter with some considerations and suggestions for further studies.</p><p>Combining corpus and genre-based approaches</p><p>Corpus Linguistics</p><p>According to Sinclair (1991: 171), “a corpus corresponds to a collection of natural texts chosen to characterize a state or variety of language”. For Biber and Conrad (1999: 4), the notion of corpus is naturally approached from the perspective of register: “a collection of spoken or written texts, organized by the register and codified for other discursive considerations, comprises a corpus.” McEnery and Hardie (2012: 1) define corpus linguistics as “an area which focuses upon a set of procedures, or methods, for studying language.” As such, it can be applied to different areas.</p><p>Two central concepts are pillars of the field: the empiricist approach and the view of language as a probabilistic system. The empiricist approach is based on the premise that knowledge originates from data organized in the form of a corpus. The view of language as a probabilistic system stems from the epistemological basis of the field, according to which linguistic traits do not occur randomly. Rather, it is possible to identify and quantify patterns of regularity, highlighting a correlation between such traits and the situational contexts of use. From these patterns, it can be recognized that language does not consist of empty slots filled arbitrarily. Instead, the linguistic environment acts on the co-selection of lexical items. Within a linguistic environment, a given item tends to select certain others. 
In this way, language is seen as a motivated, non-arbitrary, and functional system of potential choices. These aspects refer to the issue of usage patterns and, therefore, to the idiomatic principle postulated by Sinclair (1991).</p><p>Let us take an example from the corpus used to extract linguistic data in this text. ‘The aim of this study’ is a sequence whose continuation is restricted to a form of the verb ‘be’ followed by ‘to’, confirming the preference of academic genres/registers (Hyland, 2008; Biber & Conrad, 1999) for this association of words. Thus, the phrase above is expected to precede ‘is to’ or ‘was to’.</p><p>Although the literature proposes many definitions for what constitutes a corpus (such as Atkins et al., 1992; Francis, 1992; Kennedy, 1998; McEnery et al., 2006), the consensus is that it should comprise:</p><p>1. Authentic Linguistic Data;</p><p>2. Computer-Readable Segments;</p><p>3. Specially Organized Language Portions;</p><p>4. Texts Capable of Representing a Particular Language or Variety of Language.</p><p>For this chapter, a corpus is roughly understood as a set of machine-readable texts compiled with the aim of providing answers to specific research questions (McEnery & Hardie, 2012). To achieve these goals, a corpus should be built according to well-defined criteria.</p><p>Corpus-Based Pedagogy</p><p>Ever since John Sinclair’s seminal work on corpus research led to the use of corpus-based approaches (Sinclair, 1987, 1991, 2004b), corpus linguistics has been connected with language teaching. Contributions such as Gavioli (2005), O’Keeffe et al. (2007), Aijmer (2009), Flowerdew (2012), and Cotos (2014), among others, all followed the principle of adopting empirical data to boost language learning. 
Hence, corpus-based pedagogy is the application of the foundations of corpus linguistics to facilitate the teaching and learning of additional languages on the basis of authentic occurrences of language.</p><p>Among the advantages of adopting corpora for language teaching are the possibilities of explaining the differences in the uses of words and linguistic forms, among other traits, based on the probability of occurrence in specific contexts (Biber et al., 1998), as intuition alone could not explain these facts (Sinclair, 1991). As pointed out by Shepherd (2009: 152), the analytical enterprise “cannot depend on the researcher’s intuitions, since human beings tend to recognize what is not typical more often than what is standardized”. Corpora, therefore, are used to generate empirical knowledge about languages. Besides, using corpora for pedagogical purposes can disclose solutions to language queries that have not been dealt with otherwise. Furthermore, the use of corpora can highlight frequency patterns of words and language structures, and such patterns can be used to teach and to create or improve teaching materials.</p><p>The most common tools used in corpus analysis for pedagogical purposes are concordancing programs, understood as text search engines with sorting functions, as will be demonstrated in the ‘Step-by-step guide’ to ‘The design of EAP materials’ below. Currently, among the most popular concordancing programs are WordSmith Tools (Scott, 2020), Sketch Engine (Kilgarriff et al., 2004), and AntConc 4.1 (Anthony, 2022b). 
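</p><p>To make the idea of a concordancer as a "text search engine with sorting functions" more concrete, the following is a minimal sketch in Python. It is not how WordSmith Tools, Sketch Engine, or AntConc are implemented; the function names (tokenize, concordance), the window size, and the three toy sentences are assumptions made purely for illustration and are not data from the chapter's corpus.</p>

```python
import re


def tokenize(text):
    """Lowercase a text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def concordance(texts, node, window=3):
    """Return keyword-in-context lines as (left context, node, right context)."""
    node_tokens = tokenize(node)
    n = len(node_tokens)
    lines = []
    for text in texts:
        tokens = tokenize(text)
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == node_tokens:
                left = " ".join(tokens[max(0, i - window):i])
                right = " ".join(tokens[i + n:i + n + window])
                lines.append((left, " ".join(node_tokens), right))
    # Sort by the right-hand context so that recurrent continuations
    # of the node phrase end up next to each other.
    return sorted(lines, key=lambda line: line[2])


# Toy sentences invented for illustration (hypothetical, not corpus data).
toy_corpus = [
    "The aim of this study is to describe lexical bundles in abstracts.",
    "In short, the aim of this study was to compare two registers.",
    "The aim of this study is to identify noun phrases.",
]

kwic = concordance(toy_corpus, "aim of this study")
for left, node, right in kwic:
    print(f"{left:>15} | {node} | {right}")
```

<p>Sorting on the right-hand context is only one possible design choice; real concordancers typically let users sort on several positions to the left or right of the node, which is what allows recurrent patterns to stand out visually in the concordance lines.</p><p>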
When queried, concordancing tools enable users to come into contact with “a collection of the occurrences of a word-form, each in its textual environment” (Sinclair, 1991: 32).</p><p>By using corpora for teaching purposes, users are empowered, as this approach holds the potential to foster autonomous and personalized learning (Boulton & Cobb, 2017; McEnery & Wilson, 1997). That happens because, on the one hand, the adoption of corpora encourages discoveries. Corpora can be employed, for example, to have students explore patterns of specific language features that stand out from the concordance lines. On the other hand, exploring language corpora with software enables learners within the same class to focus on different language features. Furthermore, corpus-based pedagogy can lead learners themselves to draw conclusions about language use and its principles.</p><p>Data-Driven Learning (DDL)</p><p>As Boulton (2021: 9) affirms, “Data-driven learning (DDL) typically involves language learners consulting corpus data, either directly or via prepared materials, to answer questions about language.” Some claimed benefits of using DDL are that it stimulates learners’ autonomy and increases language awareness (Boulton, 2007). As for teachers, the use of DDL allows for a change of roles from a lecturer to “a co-ordinator of student-initiated research” (Johns, 1991: 3). Nevertheless, the change of roles mentioned by Johns (1991) does not come without challenges, such as learning how to compile and extract language data from a corpus or how to include the language data extracted into the materials designed for EAP courses in a meaningful and contextualized way. 
Besides, employing DDL implies choosing which approach to use: direct DDL, through hands-on activities (where you teach your learners how to look for information in the corpus), or indirect DDL, an approach through which you (the teacher) extract the language data yourself beforehand and include them in pedagogical units.</p><p>Corpus processing systems like Sketch Engine (Kilgarriff et al., 2004), WordSmith Tools (Scott, 2020), AntConc (Anthony, 2022b), and #LancsBox v6 (Brezina et al., 2020) can be of great help. They usually offer varied resources to extract language features, such as lists of words, keywords, and n-grams. In Sketch Engine (SE), it is also possible to use Corpus Query Language (CQL) to create special search syntaxes or queries to look for more complex grammatical and lexical patterns (see ‘Description of the EAP writing course’, Table 5, for examples of language features and ways to retrieve them from the corpus using CQL queries). The smart search option available in the #LancsBox v6 (henceforth, LancsBox) software package is another option for extracting more complex language patterns. Pérez-Llantada (2022), for example, uses the LancsBox smart search option to retrieve passive voice forms from four corpora.</p><p>To cater to the challenges mentioned above, in this chapter we provide EAP teachers with a step-by-step guide on retrieving and integrating corpus data into materials designed for EAP writing courses through indirect DDL. 
Here, we chose to focus on indirect DDL because we consider its simplicity an asset for encouraging novice EAP teachers to work with corpus-based pedagogy.</p><p>Genre, Genre-Analysis, Move-Analysis and Genre-Based Pedagogy</p><p>Bhatia (1993: 13) defines genre as “a recognizable communicative event characterized by a set of communicative purpose(s) identified and mutually understood by the members of the professional or academic community in which it regularly occurs”. For Swales (1990, 1994), these characteristics are organized from models that shape the structure of the text and guide specialists of the discursive communities in terms of content and style choices. While guiding members, these models are, at the same time, delimited by their motivations regarding the schematic formatting of the manuscript.</p><p>When Swales (1990) introduced criteria for defining the academic genre, he also established an organizational description of the conventions for introducing academic articles, which would become widespread. The structure, known as the Create a Research Space (CARS) model, comprises the description of the segments<sup>3</sup> that perform specific functions in the text, called rhetorical moves.</p><p>Next, we present the CARS model, as adapted from Swales (1990: 141), set into three moves that cover specific steps:</p><p>1. Move 1 – Establish the Territory</p><p>Step 1: Establish the importance of research and/or</p><p>Step 2: Make generalizations about the topic and/or</p><p>Step 3: Review the literature</p><p>2. 
Move 2 – Establish the Niche</p><p>Step 1a: Counterargue or</p><p>Step 1b: Indicate gap(s) in already established knowledge or</p><p>Step 1c: Raise questions or</p><p>Step 1d: Continue the tradition</p><p>3. Move 3 – Occupy the Niche</p><p>Step 1a: Outline the goals or</p><p>Step 1b: Submit the survey or</p><p>Step 2: Present the main results or</p><p>Step 3: Indicate the structure of the article.</p><p><sup>3</sup> Various labels have been used to refer to the information units observed from this format: moves and steps (Swales, 1990), moves and sub-moves (Santos, 1999), moves and subfunctions (Motta-Roth, 1995), moves and strategies (Araújo, 1999) and rhetorical units (Meurer, 1997).</p><p>The models for the rhetorical structure of genres are not prescriptions but classifications for didactic purposes. Therefore, as mentioned above, they are subject to variations that derive from the characteristics of the different research areas. According to Biber and Conrad (2009), academic texts do not encompass universal characteristics, but may vary situationally, given their publication conditions. However, the traits we recognize as the most constant show us what is most relevant and conventional to the user’s discursive community in question. 
Likewise, such traits indicate what should be prioritized, as this investigation aims to highlight.</p><p>Genre pedagogy, genre-based pedagogy, and genre-based approach are some of the names given to the framework comprising a set of assumptions, strategies, and practices for EAP teaching and learning that have as a premise the need to communicate a message to a particular audience in an appropriate way using discourse genres (for example, research papers, webinars, abstracts).</p><p>Swales’s (1990: 9) genre pedagogy, as described in his seminal book Genre Analysis: English in academic and research settings, “rests on a pragmatic concern to help people, both non-native and native speakers, to develop their academic, communicative competence”. It is essential to mention that, even though genre pedagogy has its origins in academic settings, the approach is used to teach different discourse genres.</p><p>Pedagogical Models Combining Corpus and Genre-Based Approaches</p><p>According to Charles (2020), even though corpus methods and genre analysis share a close connection, applications of such approaches for teaching purposes are not so frequent in practice. In said applications, both the target genre and the language features to be taught play a fundamental role. While the target genre serves as the starting point and the context within which language features are embedded, the language data extracted from the corpus reveal patterns that are conventionally used by experts of the discourse community of a given discipline. 
Therefore, the language features to be taught should be selected according to their relevance to the chosen genre and students’ needs.</p><p>As noted by Moreno and Swales (2018), the identification of linguistic features characterizing the various rhetorical moves of different genres for pedagogical purposes has been reported in many studies as the main aim of move analysis (for example, Cortes, 2013; Cotos et al., 2017; Kanoksilapatham, 2005; Le & Harrington, 2015; Swales, 1981). Moreno and Swales (2018: 41) highlight that filling the “function-form gap” involves “establishing the most salient types of text items, or patterns, occurring in a specific rhetorical context in an RA, or any other genre, that may lead a competent reader to interpret a given communicative function in a highly predictable manner”. Few research methodologies and pedagogical models, though, have managed to bring together the two analytic paradigms involved: the top-down, which involves investigations into “the rhetorical composition of texts through Swalesian (1981, 2004) move analysis”, and the bottom-up, which refers to “investigations into the linguistic characteristics of texts through analysis of lexical, phraseological, grammatical, and lexico-grammatical patterns of use” (Gray et al., 2020: 261). Charles (2007: 289), for example, suggested reconciling top-down (discourse analysis) and bottom-up (corpus investigation) approaches as she presents EAP writing materials designed through “a pedagogic approach which combines discourse analysis with corpus investigation”.</p><p>As the pedagogical model described above sets the scene for the EAP teaching and learning framework to be suggested in this chapter, it is essential to remember that another gap needs to be filled: the one between corpus linguistics and teaching practice. 
It is also noteworthy that initial decisions should be made in EAP course planning and materials design. An essential first step is to carry out a needs analysis in order to know the students’ background (e.g., their language proficiency level, their background knowledge in the discipline they work with), their learning preferences (e.g., using inductive or deductive methods), as well as what they expect and need from the course<sup>4</sup>. Also, decisions need to be made about which genre (e.g., oral presentation, research article), section (e.g., abstract, introduction, methodology, results), discipline (e.g., Nursing, Physics, Applied Linguistics), and language skill(s) (e.g., reading, listening, writing, speaking) the EAP course will focus on. Information about the course to be taught and its target audience allows for defining clear and achievable learning objectives based on the learners’ prior knowledge, skills, needs, preferences, and expectations. The choice of an appropriate methodology, the selection and design of materials, the feedback between learners and teachers, and the construction of knowledge that will be a consequence of this process are essential elements for designing and implementing EAP courses. It is always important to remember that course and materials design are not linear processes. Figure 1 shows an interplay between actions and procedures involved in implementing an EAP course, the design of materials being one of them:</p><p>Figure 1. Stages involved in the process of designing and implementing an EAP course</p><p><sup>4</sup> See Viana et al. 
(2018) for a detailed overview of types of information that can be gathered in a needs analysis, the likely sources to be examined and methods that can be employed.</p><p>The design of EAP materials</p><p>Framework</p><p>Schneuwly and Dolz (2004: 51) define didactic sequences<sup>5</sup> as “a sequence of teaching modules, organized together to improve a given language practice.” The authors advocate for having genres as the basis for organizing didactic sequences. With the genre as a starting point, the process of knowledge construction is scaffolded by tasks, activities, and exercises<sup>6</sup> designed according to specific guiding principles (Bocorny & Welp, 2021: 1601-1602), ultimately achieving pre-established learning objectives within a specific time frame.</p><p>For the design of activities with online corpora, Reppen (2010: 43) suggests a checklist with general guidelines:</p><p>• Have a clear idea of the point that you want to teach;</p><p>• Select the corpus that is the best resource for your lesson;</p><p>• Explore the corpus completely for the point you want to teach;</p><p>• Make sure that your directions are complete and easy to follow;</p><p>• Make sure that your examples focus on the point that you are teaching;</p><p>• Provide a variety of ways for interacting with the materials;</p><p>• Use a variety of exercise types;</p><p>• If you are using computers, always have an alternative plan or activity in the event of computer glitches.</p><p>In coursebooks, a pedagogical unit can be the focus of one or more classes, and its structure tends to be the same throughout the book. 
Table 1 shows the structure of the pedagogical unit and the section titles used in the EAP writing course presented as an example in this chapter:</p><table><tr><th>Pedagogical unit structure</th><th>Section titles of a pedagogical unit</th></tr><tr><td>Context of use, purpose and definition</td><td>1) Activate previous knowledge</td></tr><tr><td>Characteristics of the genre</td><td>2) Learn about key characteristics</td></tr><tr><td>Rhetorical structure</td><td>3) Find the parts</td></tr><tr><td>Language features</td><td>4) Know important language features</td></tr><tr><td>Production of genre</td><td>5) Analyze examples<br>6) Write the first draft<br>7) Get feedback<br>8) Write the final draft</td></tr></table><p>Table 1. Pedagogical unit structure for an EAP writing course</p><p><sup>5</sup> In this study, the terms ‘didactic sequences’ and ‘pedagogical units’ are considered equivalent in meaning.</p><p><sup>6</sup> In this study, the term ‘task’ is used as a didactic plan to produce a communicative response from participants, comprising one or more sets of activities. The terms ‘activity’ and ‘exercise’, in turn, are considered equivalent in meaning, and, for this reason, they are used interchangeably in the sense of segments that make up a task.</p><p>Welp et al. (2019: 6) list guiding principles to orient teachers in planning and designing general English teaching materials. Those principles were adapted by Bocorny and Welp (2021: 1601-1602) to guide the design of EAP materials:</p><p>1. Learning objectives should be established based on the knowledge area and academic needs of the group of learners the tasks are aimed at;</p><p>2. Target genres should be academically relevant and coherent with the established learning objectives;</p><p>3. Selected texts should be authentic and representative of social practices and genres that circulate in the academic context;</p><p>4. 
Tasks should offer the learners opportunities to use the language proper to the texts produced in the learners’ domain and raise awareness of such use in a contextualized way;</p><p>5. Tasks dealing with linguistic resources should take into account the frequency of lexical and discursive items present in academic texts in the learners’ area of knowledge;</p><p>6. Tasks’ order and statements should be organized in a way to promote progress and scaffold learning;</p><p>7. Tasks should provoke relevant interactions between learners and texts, learners and learners, and learners and teacher;</p><p>8. Task performance should provide meaningful learning opportunities and achieve results beyond the classroom.</p><p>Specifically, when it comes to the design of EAP materials within a framework that combines corpus and genre-based pedagogies, two elements are key: knowing the rhetorical structure of the target genre and identifying language features that are relevant to the genre that is being taught, considering the learners’ prior knowledge, skills, needs, and expectations (see ‘Corpus Linguistics’ and ‘Genre, Genre-Analysis, Move-Analysis and Genre-Based Pedagogy’ above for details on both elements). In particular, it is vital to identify the language features used to realize the functions expressed in genre moves and steps. Moreno and Swales (2018: 40) mentioned that “A widely shared aspiration of move analysts has been to identify the linguistic features characterizing the various RA moves not only in English but also across languages.”</p><p>A checklist for planning and designing EAP materials within a corpus and genre-based framework is proposed in the next section having in mind these two major elements, along with the guidelines suggested by Reppen (2010) and the principles put forward by Welp et al. 
(2019) and used by Bocorny and Welp (2021).</p><p>Step-by-step guide</p><p>This section is organized as a guide to be used by novice EAP teachers when designing materials within the proposed pedagogical model that combines corpus and genre-based approaches. We use the first five guiding principles suggested by Welp et al. (2019) and adapted by Bocorny and Welp (2021) as a checklist to be followed. Next, we provide brief explanations and describe some associated actions for each of the first five principles. Finally, examples of the proposed actions are presented, considering an EAP writing course for producing Health Sciences structured abstracts.</p><p>Description of the EAP writing course</p><p>As can be seen in Table 2, structured abstracts are the target genre of the course, which is aimed at upper-intermediate (B2, C1) Health Sciences graduate students and researchers. The course is to be taught online with a total of 16 hours divided into 8 hours of synchronous activities and 8 hours of asynchronous activities:</p><table><tr><td>Name of the course</td><td>Written production of structured abstracts in the area of Health Sciences</td></tr><tr><td>Target genre</td><td>Structured abstracts</td></tr><tr><td>Target section</td><td>All sections</td></tr><tr><td>Students’ level of proficiency</td><td>Upper-intermediate (B2, C1)</td></tr><tr><td>Students’ level of education</td><td>Tertiary level (graduate students)</td></tr><tr><td>Course modality</td><td>Online</td></tr><tr><td>Length of the course</td><td>4-week course (16 hours: 8 hours of synchronous activities and 8 hours of asynchronous activities)</td></tr></table><p>Table 2. Description of the EAP writing course</p><p>PRINCIPLE 1. 
Learning objectives should be established based on the knowledge area and academic needs of the group of learners the tasks are aimed at</p><p>EXPLANATION: A learning objective is a description of what the learner should be able to do upon successful completion of an educational step (for example, course, task, exercise/activity) over a period of time. Clearly defined learning objectives specify the knowledge, skills, and/or attitudes the learner will gain from the educational step so that such aspects can be assessed later on.</p><p>EXAMPLE: As can be seen in Table 3, there are two types of learning objectives for the course described: (i) the course learning goal, which is the outcome that is expected after its successful conclusion (being able to produce a structured abstract in the area of Health Sciences to be submitted to a journal in the area) and (ii) the learning goal of each class. The successful accomplishment of each of these goals can be verified through pedagogical tasks:</p><table><tr><td>Learning objective of the course</td><td>By the end of this course, participants should be able to produce a structured abstract in the area of Health Sciences to be submitted to a journal in the area.</td></tr><tr><td>Learning objective of class 1</td><td>By the end of this class, participants should be able to understand what a structured abstract is and in which contexts it is used in the area of Health Sciences.</td></tr><tr><td>Learning objective of class 2</td><td>By the end of this class, participants should be able to recognize the rhetorical structure of a structured abstract in the area of Health Sciences.</td></tr><tr><td>Learning objective of class 3</td><td>By the end of this class, participants should be able to use language features relevant to producing a structured abstract in the area of Health Sciences.</td></tr><tr><td>Learning objective of class 4</td><td>By the end of this class, participants should be able to produce the first draft of a structured abstract in the area of Health Sciences.</td></tr></table><p>Table 3. Learning objectives for course and classes</p><p>PRINCIPLE 2. The target genres should be academically relevant and coherent with the established learning objectives</p><p>EXPLANATION: The target genre is the one that is going to be worked with throughout the course. As already mentioned (see ‘Framework’), within the framework proposed, two elements are central: knowing the rhetorical structure of the target genre and identifying relevant language features. Many patterns representing the rhetorical structure of academic genres can be found in the literature. Can et al. (2016: 4), for example, present the rhetorical structure of abstracts within Applied Linguistics, as shown in Figure 2:</p><p>Figure 2. Rhetorical structure of Applied Linguistics abstracts. From Can et al. (2016: 4)</p><p>The rhetorical structure of a given genre can also be obtained by using: (i) text structure analyzers like AntMover (Anthony, 2003); (ii) rhetorical tagging or rhetorical move-step coding (Bondi, 2022; Berdanier, 2019; Gray et al., 2020; Yoon & Casal, 2020a; 2020b; Geluso, 2019) or, concerning structured abstracts, (iii) the section headings, as suggested by Freitas and Bocorny (2021).</p><p>EXAMPLE: The target genre of the course described is structured abstracts, that is, abstracts that “describe a study using specific content headings rather than paragraph format” (Stevenson & Harrison, 2009: 1). Figure 3 exemplifies the rhetorical structure aimed at in a writing course for structured abstracts in Health Sciences:</p><p>Figure 3. Example of a structured abstract in Health Sciences. From Gaspar et al. 
(2022: 2)</p><p>The example of the rhetorical structure frequency distribution shown in Figure 4 was extracted from three corpora of structured abstracts in the area of Epidemiology using the section headings, as suggested by Freitas and Bocorny (2021). To obtain the rhetorical structure shown in Figure 4, the following CQL query was used in Sketch Engine: []{1,3} [word=":"]</p><p>Figure 4. Rhetorical structure of Epidemiology structured abstracts. From Freitas and Bocorny (2021: 3)</p><p>As seen in Figure 4, the section headings in all three corpora are Methods, Results/Findings, and Conclusions, and in two corpora, Background and Objectives (aim, purpose). The procedure for identifying SECTION HEADINGS used in this study is described below.</p><p>PROCEDURE 1:</p><p>1) Go to Sketch Engine</p><p>2) Select the corpus you want to work with</p><p>3) Go to Concordance</p><p>4) Select Advanced</p><p>5) Click on CQL</p><p>6) Paste the CQL query []{1,3} [word=":"]</p><p>7) Click on GO</p><p>The results from PROCEDURE 1 are shown in Figure 5. These headings can be categorized into families representing the sections of the structured abstracts of the discipline under study:</p><p>Figure 5. Section headings of the structured abstracts being studied</p><p>PRINCIPLE 3. The selected texts should be authentic and representative of social practices and genres that circulate in the academic context</p><p>EXPLANATION: An authentic and representative sample of texts from which to extract language data to inform materials design can be obtained from existing freely available corpora (for example, COCA<sup>7</sup>, MICUSP<sup>8</sup>, CODISSAE<sup>9</sup>). However, suppose you want to design a pedagogical unit of a genre (or section of a genre) that is not available in the existing freely available corpora. 
In that case, you can compile your corpus using tools like AntCorGen (Anthony, 2022b)<sup>10</sup> or Sketch Engine (Kilgarriff et al., 2004)<sup>11</sup>. AntCorGen, for example, is very useful for designing tasks and exercises for discipline and section-specific EAP writing courses on research articles or abstracts, that is, EAP courses that focus on one of the sections of research articles within a particular discipline. Now, suppose you want to work with a more specific genre within a particular area. In that case, you may have to compile your corpus manually and upload it to a tool that will enable language data extraction.</p><p>EXAMPLE: Three corpora were compiled for the course on the Written Production of Health Sciences Structured Abstracts. As described by Freitas and Bocorny (2021), the corpora comprise abstracts from Epidemiology articles published in peer-reviewed indexed journals between 2003 and 2021. Their characteristics are represented in Table 4:</p><table><tr><th>Domain</th><th>Corpus</th><th>Words with repetition (tokens)</th><th>Words without repetition (types)</th><th>Texts</th><th>Average words per abstract</th></tr><tr><td>Epidemiology</td><td>SJC</td><td>662,747</td><td>21,087</td><td>1,915</td><td>346</td></tr><tr><td>Epidemiology</td><td>PLOS ONE</td><td>1,000,003</td><td>43,066</td><td>4,330</td><td>230</td></tr><tr><td>Epidemiology</td><td>BJSTD</td><td>83,261</td><td>9,010</td><td>360</td><td>231</td></tr></table><p>Table 4. Numbers of corpora used in the study. From Freitas and Bocorny (2021: 2)</p><p><sup>7</sup> https://www.english-corpora.org/coca/</p><p><sup>8</sup> http://micusp.elicorpora.info/</p><p><sup>9</sup> https://drive.google.com/drive/folders/145ZFPOUuCwvTWFirM-lqG1vGbD-1g7p7o?usp=sharing</p><p><sup>10</sup> https://www.laurenceanthony.net/software/antcorgen/</p><p><sup>11</sup> https://www.sketchengine.eu/blog/build-a-corpus-from-the-web/</p><p>PRINCIPLE 4. 
The tasks should offer the learners opportunities to use the language proper to the texts produced in the learners’ domain and promote reflections on such use in a contextualized way</p><p>EXPLANATION: After compiling the corpus that will be used to inform the design of tasks and exercises within a pedagogical unit, it is time to choose a language feature (or language features) that will be focused on. Said language feature needs to be proper and relevant to the texts produced in the learners’ knowledge area. The decision on which language features to focus on in EAP courses can challenge novice EAP teachers. Some of these features have been addressed in different studies as relevant for producing academic genres. Swales and Feak (2009), for example, mention tenses (past tense vs. simple present tense), passive voice, metadiscoursal expressions, lexical bundles, ‘that’ clauses, reporting verbs, and pronouns (I, we). Kanoksilapatham (2005) refers to passive constructions, past tense, ‘that’ clauses, and metatextual devices. Table 5 provides examples of language features and ways of retrieving them from corpora using SE CQL queries. 
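</p><p>Before turning to those queries, it may help to see in procedural terms what the simplest one used in this chapter, []{1,3} [word=":"], looks for: one to three tokens followed by a colon. The sketch below is only a rough plain-Python approximation for readers without access to Sketch Engine, not a reimplementation of CQL. The function name headings_before_colon and the two toy abstracts are hypothetical, and, unlike the CQL wildcard [], the sketch stops collecting at punctuation so that it returns only the words immediately before each colon.</p>

```python
import re
from collections import Counter


def headings_before_colon(text, max_words=3):
    """Return the (up to max_words) words that immediately precede each colon."""
    # Split into word tokens and single punctuation tokens
    tokens = re.findall(r"\w+|[^\w\s]", text)
    found = []
    for i, tok in enumerate(tokens):
        if tok != ":":
            continue
        words = []
        j = i - 1
        # Walk backwards until punctuation, the text start, or the word limit
        while j >= 0 and len(words) < max_words and tokens[j].isalnum():
            words.append(tokens[j])
            j -= 1
        if words:
            found.append(" ".join(reversed(words)).lower())
    return found


# Hypothetical toy abstracts, invented for illustration only
abstracts = [
    "BACKGROUND: Malaria remains common. METHODS: We surveyed 200 adults. "
    "RESULTS: Prevalence fell. CONCLUSIONS: Nets help.",
    "OBJECTIVES: To estimate risk. METHODS: Cohort study. "
    "RESULTS: Risk doubled. CONCLUSIONS: Screening is advised.",
]

counts = Counter(h for a in abstracts for h in headings_before_colon(a))
print(counts.most_common())
```

<p>On real data, the resulting counts would still need to be grouped manually into heading families (for example, Results and Findings), as described for Figure 5.</p><p>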
It is important to emphasize that the previous identification of language features elicited by learners as relevant also works as a compass needle pointing to what to focus on.</p><table><tr><th>Language feature to be analyzed</th><th>Way to extract language feature using SE CQL queries</th></tr><tr><td>Sentence voice</td><td><p>Passive voice:</p><p>[]{1,5} [tag="VBD.*" | tag="VBG" | tag="VBN" | tag="VBP" | tag="VBZ"] [tag="VVN"]</p><p>Passive voice in each section of a structured abstract:</p><p>[]{1,3} [word=":"] []{1,5} [tag="VBD.*" | tag="VBG" | tag="VBN" | tag="VBP" | tag="VBZ"] [tag="VVN"]</p><p>Obs: It is possible to FILTER the results obtained in the previous search by section heading or specific words (for example, the word ‘by’) to obtain concordance lines with passive voice in section CONCLUSION of a structured abstract followed by the word ‘by’. See Appendix 5 for results.</p></td></tr><tr><td>Pronouns (I, we)</td><td><p>Pronouns in each section of a structured abstract:</p><p>[]{1,3} [word=":"] [lemma="we" | lemma="I"]</p></td></tr><tr><td>Lexical bundles</td><td><p>Lexical bundles in each section of a structured abstract:</p><p>[]{1,3} [word=":"] []{1,4} [word="study"] []{1,4}</p><p>Obs: In this case, the word ‘study’ can be replaced by any of the collocation nodes identified in the wordlist (see Figure 11).</p></td></tr></table><p>Table 5. Some language features and ways of retrieving them from corpora using SE CQL queries.</p><p>Some of these language features are easier to extract and analyze. Imagine that one of your students wants to know whether to use ‘I’ or ‘we’<sup>12</sup> when writing structured abstracts. Simply checking the wordlist for pronouns will show that, in our study corpus, ‘we’ occurs 3,345 times per million words (pmw) while ‘I’ occurs 95 times (pmw). 
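</p><p>Normalized figures such as these are obtained by scaling a raw count to a corpus of one million words, which is what makes frequencies from corpora of different sizes comparable. A minimal sketch follows; the function name per_million and the raw numbers in the first usage line are hypothetical, since the raw counts behind the figures above are not reported here.</p>

```python
def per_million(raw_count, corpus_tokens):
    """Scale a raw frequency to occurrences per million words (pmw)."""
    if corpus_tokens <= 0:
        raise ValueError("corpus_tokens must be positive")
    # Multiply first so that whole-number results stay exact
    return raw_count * 1_000_000 / corpus_tokens


# Hypothetical example: 50 hits in a 20,000-token corpus
print(per_million(50, 20_000))

# Ratio of the two normalized frequencies reported in the text
print(round(3345 / 95, 1))
```

<p>The last line shows that, going by the reported pmw values, ‘we’ is roughly 35 times as frequent as ‘I’ in the study corpus.</p><p>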
If your students want to know which pronoun is more conventional in initial position in the different sections of structured abstracts, that is, right after the section heading (for example, ‘CONCLUSION: We concluded that’), it is possible to use the CQL query []{1,3} [word=":"] [lemma="we" | lemma="I"]. All the 1,037 concordance lines obtained with this query show section headings followed by the pronoun ‘we’. This information could orient an exercise on authorial identity (see footnote 12) and on the use of pronouns in a course on writing structured abstracts.</p><p><sup>12</sup> Previous research has explored the role of personal pronouns in academic writing (Henderson & Barr, 2010; Martínez, 2005; Hyland, 2002). According to Hyland (2002), a solid authorial identity that refers to authors taking ‘ownership’ for their work has to do with the use of self-reference in active voice constructions (where personal pronouns are used) as opposed to the anonymity of passive forms.</p><p>EXAMPLE: For the course on Written Production of Structured Abstracts in Health Sciences, the language feature selected was Lexical Frames (LFs), that is, discontinuous sequences of words forming a structure around variable slots (Gray & Biber, 2013). According to Gray and Biber (2013), written academic discourse relies primarily on LFs. For this reason, that language feature has great pedagogical importance in written academic genres.</p><p>PRINCIPLE 5. Tasks dealing with linguistic resources should take into account the frequency of lexical and discursive items present in academic texts in the learner’s area of knowledge</p><p>EXPLANATION: The lexical and discursive items selected as language features should be conventional. 
In other words, they should reveal the language used by the expert discourse community of a given discipline.</p><p>EXAMPLE: Learning about tools that can facilitate the teacher’s access to linguistic data obtained from corpora might help bridge the gap between corpus linguistics and language teaching (Cheng, 2010). Different methodologies (for example, bundles-to-frames approach and fully inductive approach<sup>13</sup>) and tools (for example, AntGram 0.0.3 (Anthony, 2017), AntConc 4.1 (Anthony, 2022b)<sup>14</sup>, WordSmith Tools 8.0 (Scott, 2020), KfNgram 1.3.1 (Fletcher, 2012)) have been suggested for the extraction of LFs. AntConc 4.1 is, in our opinion, the most user-friendly tool for extracting LFs. Figure 6 shows the LFs extracted from the corpus of Health Sciences RA structured abstracts with AntConc 4.1 (Anthony, 2022b).</p><p><sup>13</sup> Bundles-to-frames approach (Biber, 2009; Römer, 2010) and fully inductive approach (Gray & Biber, 2013) are methodological procedures for identifying LFs in a corpus. While, according to Gray and Biber (2013), the former starts by finding the most frequent continuous lexical sequences in a register and then analyzes the sequences to determine if they are associated with discontinuous lexical frames with variable slots, the latter “directly identifies the full set of discontinuous sequences in a corpus” (Gray & Biber, 2013: 111).</p><p><sup>14</sup> Using other versions of AntConc may make it impossible to extract certain data related to Lexical Frames.</p><p>The criteria used for</p><p>the</p><p>of</p><p>perspectives on the challenges and opportunities of teaching and learning EAP. 
As EAP is a broad and complex field, encompassing various academic contexts and language skills, this collaborative effort offers unique insights that can enrich understanding within the field and inspire new approaches to support students in their academic language development.</p><p>Second, this book demonstrates the complexity of the field of EAP by presenting a range of different research initiatives. It highlights the numerous factors that can impact language learning and use in academic settings, which can inform the design of effective language teaching and learning materials.</p><p>Lastly, this book encourages collaboration and dialogue by bringing together a diverse group of scholars and practitioners. This collaborative approach is intended to foster a sense of community and shared purpose within the field of EAP, leading to the development of new ideas and approaches to teaching and learning. In summary, this book is an important contribution to the field of EAP as it provides a platform for advancing research and practice. We now provide a brief overview of the next chapters.</p><p>In the second chapter of this book, Deise Prina Dutra and Tony Berber Sardinha provide a comprehensive overview of English for Specific Purposes (ESP), a field that has experienced considerable growth and development over the past decades. Within ESP, EAP has emerged as a key area of focus, with studies from a CL perspective providing insights into the characteristics of academic speech and writing. 
This chapter explores the</p><p>contribution of general, specialized, and learner corpora to EAP research</p><p>and practice, with a particular focus on how corpus-based approaches</p><p>have influenced the study of vocabulary and grammar in academic texts.</p><p>The authors review the major literature on corpus-based research in EAP</p><p>10</p><p>and highlight the ways in which multi-dimensional analysis can provide</p><p>a deeper understanding of the underlying patterns of lexico-grammatical</p><p>characteristics in academic writing. By examining these patterns, the au-</p><p>thors shed light on some of the differences across academic registers that</p><p>have previously been overlooked in the field.</p><p>In recent years, the integration of corpus-based language learning</p><p>and teaching has gained attention in the field of English for Academic</p><p>Purposes (EAP). Despite the potential benefits of using corpus data in EAP</p><p>pedagogy, the application of corpus-based approaches in Brazilian EAP</p><p>classrooms is still limited. This issue is addressed in the third chapter of</p><p>this book, authored by Ana Eliza Pereira Bocorny, Ana Luiza Freitas, and</p><p>Rozane Rebechi. The chapter provides a practical guide for EAP teachers</p><p>on how to integrate corpus data into materials designed for EAP writing</p><p>courses. The authors review corpus and genre-based approaches to lan-</p><p>guage learning and teaching, besides describing a framework and princi-</p><p>ples for the design of EAP materials that combine these pedagogies. The</p><p>chapter concludes by highlighting the feasibility of the application of genre-</p><p>based corpus linguistics for both novice and experienced teachers, who can</p><p>use the step-by-step guide to integrate corpus and genre-based approaches</p><p>for academic writing in their classrooms. 
This chapter will be of interest</p><p>to anyone seeking to enhance their understanding of the potential of cor-</p><p>pus-based pedagogy in EAP, particularly novice EAP teachers.</p><p>Chapter 4, authored by Paula Tavares Pinto, Luciano Franco da Sil-</p><p>va, Talita Serpa, and Diva Cardoso de Camargo, explores the potential of</p><p>using do-it-yourself corpora to support academic writing and translation</p><p>in the areas of humanities, science, and math. The authors demonstrate</p><p>how to quickly compile two specialized corpora in SHAPE (Social Sciences</p><p>Humanities, Arts for People and Economy) and STEM (Science, Technolo-</p><p>gy, Engineering, and Mathematics) areas with the tool AntCorGen and ex-</p><p>plore them with Sketch Engine to help researchers write their own research</p><p>papers. By examining the corpora, readers can identify frequently used</p><p>adjectives, verbs, and lexical bundles, as well as recurrent academic struc-</p><p>tures for each research paper section, such as the Introduction, Method-</p><p>ology, Discussion, and Conclusions. The chapter offers practical guidance</p><p>11</p><p>for researchers who wish to use corpora to enhance their academic writing</p><p>skills.</p><p>In Chapter 5, Sandra Zappa-Hollman, Alfredo Afonso Ferreira,</p><p>Greta Perris, Simone Sarmento, Marine Laísa Matte, and Laura Baumvol</p><p>report on their experiences designing and piloting a local learner corpus</p><p>for use by instructors, students, and researchers at a Canadian university</p><p>that offers first-year undergraduate programs for speakers of English as an</p><p>additional language. 
This project was motivated by the need for data-driv-</p><p>en instruction and research, and the authors present the stages of conduct-</p><p>ing the project, highlighting the importance of collaborative teamwork,</p><p>and sharing the results of initial data analysis for pedagogical and research</p><p>applications.</p><p>Chapter 6 focuses on how genre mediates variation in language, in-</p><p>dicating that different communicative purposes are expressed through the</p><p>use of different linguistic features. Marine Laísa Matte, Deise Amaral, and</p><p>Larissa Goulart analyze the variation of linguistic features associated with</p><p>academic writing in two genres of university assignments: Case Studies</p><p>and Critiques from the BAWE (British Academic Written English) corpus.</p><p>Mann-Whitney U tests indicate that there is variation in the use of features</p><p>between the two genres, with a higher frequency of features in Critiques.</p><p>The study reveals that, although the two genres share the same features,</p><p>their usage is mostly diverse as they serve different communicative objec-</p><p>tives. This finding suggests that different genres have specific language re-</p><p>quirements, which can influence the way in which authors express their</p><p>ideas and communicate with their readers.</p><p>In the seventh chapter, Marine Laísa Matte and Simone Sarmento</p><p>explore the role of collocations in EAP. Collocations are words that fre-</p><p>quently occur together due to their attraction, and their appropriate use</p><p>is indispensable for ensuring fluency and accuracy in written communi-</p><p>cation. In this study, the authors analyze how Brazilian students produce</p><p>collocations in academic texts written in English. The analysis is based on</p><p>a list of 125 nodes and their corresponding collocates in a comparison be-</p><p>tween the Brazilian Academic Written English (BrAWE) corpus and the</p><p>BAWE corpus. 
The findings indicate that, overall, the nodes are underused in BrAWE. The study shows a balance of syntactic structures in both corpora. The research also reveals that Brazilian students use a limited variety of collocations when compared to students in BAWE.</p><p>In recent years, Web-based Learning Tools (WBLTs) that use CL research have become a popular way of teaching learners how to use collocations. In chapter 8, Larissa Goulart, Maria Kostromitina, and Jennifer Klein evaluate the effectiveness of five WBLTs (FLAX, SKELL, Linggle, Just the Word, and Netspeak) aimed at helping learners of English produce accurate collocations. The evaluation is divided into three parts: research conducted in the development of the WBLT, the WBLTs’ design and accessibility, and WBLT pedagogical applications. The results of the study show that most of these tools rely on frequency-based collocations and contribute to different types of class activities. The authors finish the chapter by proposing task ideas for using these tools in the English language classroom.</p><p>In the last chapter of this collection, Laura Baumvol, Lucas Marengo, and Simone Sarmento explore the concept of EMI. EMI is an approach to teaching and learning in which English is the language of instruction, with the purpose of imparting a diverse</p><p>extraction was: n-gram size = 6, open slots = 2, minimum frequency = 60, minimum range = 20.</p><p>Figure 6. LFs extracted with AntConc 4.1 described in PROCEDURE 2. 
From Anthony (2022b)</p><p>PROCEDURE 2:</p><p>1) Open AntConc 4.1</p><p>2) Upload the corpus you want to work with</p><p>3) Click on N-Gram</p><p>4) Select the extraction criteria (in this extraction we used n-gram size = 6, open slots = 2, minimum frequency = 60, minimum range = 20)</p><p>5) Click on START</p><p>The results show the most recurrent LFs in this corpus. The most frequent units are those that linguistically express the rhetorical function ‘presenting the aim of the study’. If you double-click on one of the LFs (for example, ‘this study + to + the’), you can see the unit in context, as shown in Figure 7:</p><p>Figure 7. LF ‘this study + to + the’ in context. From Anthony (2022a)</p><p>The LFs extracted with AntConc 4.1 can ‘inspire’ the creation of a CQL that could be used in SE to identify the LFs used in the different sections of the structured abstracts. For example, the LF ‘the + of + study was’ can lead to the following CQL: [lemma="the"] [tag="N.*"] [lemma="of"] [lemma="this"] [lemma="study"] [tag="VB.*"] [lemma="to"] [tag="V.*"]. To extract the LF in different sections of structured abstracts, this CQL should begin with []{1,3} [word=":"], which matches one to three tokens followed by a colon (that is, a section heading). Hence, the CQL becomes: []{1,3} [word=":"] [lemma="the"] [tag="N.*"] [lemma="of"] [lemma="this"] [lemma="study"] [tag="VB.*"] [lemma="to"] [tag="V.*"].</p><p>Another way of identifying recurrent LFs in sections of structured abstracts is to take collocation nodes as a starting point. Following Flowerdew (2013), Freitas and Bocorny (2021) used a combination of lexical and phraseological elements to extract LFs from Epidemiology RA structured abstracts. A list of frequent noun collocation nodes was used “as a starting point for collocation look-ups” (Frankenberg-Garcia et al., 2021: 208). 
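</p><p>As a rough illustration of what the extraction criteria in PROCEDURE 2 control, the following Python sketch counts six-word sequences with two open slots and keeps those that reach a minimum frequency and a minimum range (number of texts). It is not AntConc's implementation: the function name extract_frames, the ‘+’ slot marker, the restriction of open slots to internal positions, and the toy corpus are all assumptions made for illustration.</p>

```python
from collections import Counter, defaultdict
from itertools import combinations


def extract_frames(texts, n=6, open_slots=2, min_freq=60, min_range=20):
    """Return {frame: (frequency, range)} for frames meeting both thresholds.

    texts is a list of token lists, one per text. Range is the number of
    texts in which a frame occurs. Assumption: the first and last words of
    a frame are always fixed, so open slots fall in internal positions only.
    """
    freq = Counter()
    seen_in = defaultdict(set)
    for text_id, tokens in enumerate(texts):
        for start in range(len(tokens) - n + 1):
            gram = tokens[start:start + n]
            # Try every way of opening `open_slots` internal positions.
            for slots in combinations(range(1, n - 1), open_slots):
                frame = tuple("+" if i in slots else w for i, w in enumerate(gram))
                freq[frame] += 1
                seen_in[frame].add(text_id)
    return {
        frame: (count, len(seen_in[frame]))
        for frame, count in freq.items()
        if count >= min_freq and len(seen_in[frame]) >= min_range
    }


# Toy corpus (invented sentences), with thresholds lowered accordingly.
toy_corpus = [
    "the aim of this study was to assess the risk".split(),
    "the purpose of this study was to compare the groups".split(),
]
frames = extract_frames(toy_corpus, n=6, open_slots=2, min_freq=2, min_range=2)
for frame, (count, text_range) in sorted(frames.items()):
    print(" ".join(frame), count, text_range)
```

<p>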
As can be seen in Figure 8, the five most frequent nouns in the Epidemiology PLOS ONE study corpus were ‘patient’, ‘risk’, ‘study’, ‘cancer’, and ‘result’. Collocation nodes could also be found in other word classes, like verbs, adjectives, adverbs, and prepositions:</p><p>Figure 8. Noun wordlist for the Health Sciences PLOS ONE study corpus. From Kilgarriff et al. (2004)</p><p>Using Sketch Engine and searching for concordance lines with the lemma ‘study’ as a noun, it is possible to retrieve language data that could be easily integrated into exercises to be used in the course Written Production of Structured Abstracts in the Area of Health Sciences. Figure 9 shows the results:</p><p>Figure 9. Concordance lines with the lemma ‘study’ as a noun. From Kilgarriff et al. (2004)</p><p>PROCEDURE 3:</p><p>1) Open Sketch Engine</p><p>2) Select the corpus you want to work with</p><p>3) Choose Concordance</p><p>4) Select Advanced</p><p>5) Click on lemma, in Query type</p><p>6) Click on noun, in Part of speech</p><p>7) Write ‘study’ (or any other recurrent collocation node) under Lemma</p><p>8) Press GO</p><p>Figure 10 illustrates the search for ‘study’:</p><p>Figure 10. SE interface for PROCEDURE 3. From Kilgarriff et al. (2004)</p><p>The results obtained with PROCEDURE 3 can be filtered for each recurrent section heading of the structured abstracts (METHODS, RESULTS/FINDINGS, CONCLUSIONS, BACKGROUND, and OBJECTIVES/AIM/PURPOSE). For example, Figure 11 shows the filtered results of concordance lines with the lemma ‘study’ for the section CONCLUSIONS:</p><p>Figure 11. Filtered results of concordance lines with the lemma ‘study’ for the section CONCLUSIONS. From Kilgarriff et al. 
(2004)</p><p>PROCEDURE 4 presents the steps for filtering data:</p><p>1) Use the results obtained with PROCEDURE 3 (search for the lemma ‘study’, as a noun)</p><p>2) Click on the Filter icon, as shown in Figure 12:</p><p>Figure 12. Filtering data in SE. From Kilgarriff et al. (2004)</p><p>3) Select Advanced</p><p>4) Click on lemma, in Query type</p><p>5) Click on noun, in Part of speech</p><p>6) Write ‘Conclusion’, under Lemma</p><p>7) Press GO</p><p>Figure 13 illustrates the search:</p><p>Figure 13. SE interface for PROCEDURE 4. From Kilgarriff et al. (2004)</p><p>If you want to organize the results obtained with PROCEDURE 4, you can click on the icon SORT (to the left of the FILTER icon). The results obtained are shown in Figure 14:</p><p>Figure 14. Sorting data in SE. From Kilgarriff et al. (2004)</p><p>A more direct way of finding recurrent LBs (and afterwards the LFs) in the sections of structured abstracts is to use Corpus Query Language (CQL) syntaxes. The CQL []{1,3} [word=":"] []{1,4} [word="study"] []{1,4}, for example, extracts all the collocations that occur in the sections of the structured abstracts that have ‘study’ as a collocation node. In this case, the collocation node ‘study’ can be replaced by any of the collocation nodes identified in the wordlists extracted from the corpus. Figure 15 shows the results when using this CQL:</p><p>Figure 15. Results for the CQL []{1,3} [word=":"] []{1,4} [word="study"] []{1,4}. From Kilgarriff et al. (2004)</p><p>PROCEDURE 5:</p><p>1) Open Sketch Engine</p><p>2) Select the corpus you want to work with</p><p>3) Go to Concordance</p><p>4) Select Advanced</p><p>5) Click on CQL, in Query type</p><p>6) Paste the CQL []{1,3} [word=":"] []{1,4} [word="study"] []{1,4} under CQL</p><p>7) Press GO</p><p>8) Click on KWIC (to organize the results alphabetically)</p><p>Figure 16. 
SE interface for extracting LBs from sections of the structured abstracts using CQL []{1,3} [word=":"] []{1,4} [word="study"] []{1,4}. From Kilgarriff et al. (2004)</p><p>The results in Figure 16 indicate that collocations with ‘study’ occur across sections of these structured abstracts. These results can also be filtered for each section identified as part of the rhetorical structure of the abstracts under study. For example, as shown in Figure 17, collocations with the word ‘study’ occur 440 times in the section CONCLUSION in the corpus of Health Sciences:</p><p>Figure 17. Collocations with the word ‘study’ filtered for the section CONCLUSION. From Kilgarriff et al. (2004)</p><p>The collocations extracted with the node ‘study’ filtered for the section CONCLUSIONS show different LFs that can be used in exercises. An example is the LF shown in Table 6, below:</p><p>Frame: * * study * that</p><p>First slot: - | The results of (25x)</p><p>Second slot: The | Our | This</p><p>Node: study</p><p>Third slot: showed (68x) | shows (48x) | suggests (54x) | suggested (6x) | indicates (24x) | indicated (8x)</p><p>Final word: that</p><p>Table 6. LF with the node ‘study’</p><p>As can be seen in Table 6, the LF *(The, Our, This) study *(show(ed), suggests, indicates) is a chunk of language that can be taught as an option to be used at the beginning of the section CONCLUSION(S) in structured abstracts in Health Sciences. ‘The results of’ precedes some of the sentences where this LF occurs. ‘Showed’ is the most recurrent slot filler after the collocation node ‘study’. 
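</p><p>A tally like the one in Table 6 could also be produced outside SE from exported concordance lines. The Python sketch below is only an approximation under stated assumptions: the frame is matched with a regular expression over raw text rather than with CQL over a tagged corpus, the first slot is limited to ‘the’, ‘our’, and ‘this’, and the sample lines are invented for illustration (they are not taken from the corpus).</p>

```python
import re
from collections import Counter

# Assumption: the frame "* study * that" is approximated with a regular
# expression; the determiner slot is restricted to the three fillers in Table 6.
FRAME = re.compile(r"\b(the|our|this)\s+study\s+(\w+)\s+that\b", re.IGNORECASE)


def count_slot_fillers(lines):
    """Count the words filling the slots before and after the node 'study'."""
    before, after = Counter(), Counter()
    for line in lines:
        for match in FRAME.finditer(line):
            before[match.group(1).lower()] += 1
            after[match.group(2).lower()] += 1
    return before, after


# Invented concordance lines, for illustration only.
sample_lines = [
    "CONCLUSIONS: This study suggests that screening should be expanded.",
    "CONCLUSION: Our study shows that the intervention is feasible.",
    "CONCLUSIONS: The results of the study showed that risk increased with age.",
    "CONCLUSIONS: This study suggests that further trials are needed.",
]
before, after = count_slot_fillers(sample_lines)
print(before.most_common())
print(after.most_common())
```

<p>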
The procedure of filtering, shown in Figure 12, can be repeated for the other sections of structured abstracts to identify the recurrent LFs to be included in exercises.</p><p>Concluding remarks</p><p>As mentioned earlier, this chapter drew on the needs of Brazilian pre-service and in-service novice EAP teachers, graduate and undergraduate students from the Federal University of Rio Grande do Sul (UFRGS), all teachers at CLA (Center of Languages for Academic Purposes). While the COVID-19 pandemic obliged us to stay home for two years and two months, we held weekly online pedagogical meetings. During these meetings, we reported and reflected upon our online classroom experiences to find solutions to problems that we had never faced before. Moreover, we discussed language learning and teaching theories. Finally, we planned courses and classes. Above all, however, we tried to figure out how corpus linguistics and genre studies could guide us in designing materials to help our students, the Brazilian academic community, write more conventional academic texts. The insights that emerged from these meetings guided the writing of this chapter.</p><p>During this period, we identified that novice EAP teachers were not confident using corpus linguistics to inform their teaching practice, even though this approach has been proven effective by many scholars. With this gap in mind, we created a framework drawing on the principles proposed by Welp et al. (2019) and adapted by Bocorny and Welp (2021) to design EAP materials combining corpus and genre-based pedagogies. In this chapter, we introduced a step-by-step guide to help teachers retrieve and integrate corpus data into materials designed for EAP writing courses through indirect DDL. 
Moreover, we provided explanations and descriptions of actions for each of the first five principles. To exemplify those actions, we had in mind an EAP writing course for producing Health Sciences structured abstracts.</p><p>The COVID-19 pandemic is now over (or so we believe), and we are back to on-site classes. Nevertheless, we are glad to say that we genuinely believe we have all become more skilled and knowledgeable teachers. Although we had a particular group of teachers in mind to produce this study, we believe that the insights it led to can be generalized. Even so, further studies could focus on work with a larger sample of teachers, from both the secondary and tertiary levels. Above all, we expect this contribution will help to bridge the gap between corpus linguistics and EAP materials design.</p><p>Acknowledgement</p><p>For the shared route and mutual support, we wish to thank the CLA teachers for sticking together and supporting each other in a genuine case of scaffolding.</p><p>References</p><p>Aijmer, K. (2009). (Ed.) Corpora and language teaching. John Benjamins Publishing.</p><p>Anthony, L. (2003). AntMover (Version 1.1.0) [Computer Software]. Tokyo, Japan: Waseda University. https://www.laurenceanthony.net/software</p><p>Anthony, L. (2017). AntGram (Version 0.0.3) [Computer Software]. Tokyo, Japan: Waseda University. Available from http://www.antlab.sci.waseda.ac.jp/</p><p>Anthony, L. (2019). AntCorGen (Version 1.2.0) [Computer Software]. Tokyo, Japan: Waseda University. https://www.laurenceanthony.net/software</p><p>Anthony, L. (2022a). International Perspectives on Corpus Technology for Language Learning - The University of Queensland Seminar Series). Addressing the challenges of data-driven learning through corpus tool design: An introduction to AntConc 4 [Video]. 
https://languages-cultures.uq.edu.au/event/session/8171</p><p>Anthony, L. (2022b). AntConc (Version 4.1) [Computer Software]. Tokyo, Japan:</p><p>Waseda University. https://www.laurenceanthony.net/software</p><p>Araújo, A. D. (1999). Uma análise de organização discursiva de resumos na área de</p><p>Educação. Revista do GELNE – Grupo de Estudos Linguísticos do Nordeste, 1, 26-30.</p><p>Atkins, S., Clear, J. & Oslter, N. (1992). Corpus design criteria. Literary and Linguistic</p><p>Computing, 7, 1-16.</p><p>https://www.laurenceanthony.net/software</p><p>https://www.laurenceanthony.net/software</p><p>https://www.laurenceanthony.net/software</p><p>https://www.laurenceanthony.net/software</p><p>89</p><p>Berdanier, C. G. (2019). Genre maps as a method to visualize engineering writing and</p><p>argumentation patterns. Journal of Engineering Education, 108(3), 377-393.</p><p>Bhatia, V. K. (1993). Analysing Genre: Language Use in Professional Settings. London:</p><p>Longman.</p><p>Biber, D. (2009). A corpus-driven approach to formulaic language in English: Multi-</p><p>word patterns in speech and writing. International Journal of Corpus Linguistics, 14</p><p>(3), 275-311.</p><p>Biber, D., Conrad, S. & Reppen, R. (1998). Corpus linguistics Investigating language</p><p>structure and use. Cambridge: Cambridge University Press.</p><p>Biber, D. & Conrad, S. (1999). Lexical bundles in conversation and academic prose.</p><p>In H. Hasselgard & S. OKSEFJELL (Eds.). Out of Corpora: Studies in Honour of Stig</p><p>Johansson (pp. 181-190). BRILL.</p><p>Biber, D. & Conrad, S. (2009). Register, genre, and style. Cambridge: Cambridge</p><p>University Press.</p><p>Bocorny, A. E. P. & Welp, A. K. D. S. (2021). O desenho de tarefas pedagógicas para o</p><p>ensino de inglês para fins acadêmicos: conquistas e desafios da Linguística de Corpus.</p><p>Revista de estudos da linguagem, 29(2), 1589-1638.</p><p>Bondi, M. (2022). 
Comparable corpora in cross-cultural genre studies: Tools for the</p><p>analysis of CSR reports. Corpus Linguistics and Translation Tools for Digital Humanities:</p><p>Research Methods and Applications, 37-63.</p><p>Boulton, A. (2007). But where’s the proof? The need for empirical evidence for da-</p><p>ta-driven learning. In Proceedings of the BAAL Annual Conference 2007, p. 13-16.</p><p>Boulton, A. (2021). Research in data-driven learning. Beyond Concordance Lines:</p><p>Corpora in language education, 102, 9-34.</p><p>Boulton, A. & Cobb, T. (2017). Corpus use in language learning: A meta-analysis.</p><p>Language learning, 67(2), 348-393.</p><p>Brezina, V., Weill-Tessier, P. & McEnery, T. (2020). #LancsBox 5.x and 6.x [software].</p><p>http://corpora.lancs.ac.uk/lancsbox</p><p>Breyer, Y. (2011). Corpora in Language Teaching and Learning. Potential, Evaluation,</p><p>Challenges. Peter Lang.</p><p>http://corpora.lancs.ac.uk/lancsbox</p><p>90</p><p>Can, S., Karabacak, E. & Qin, J. (2016). Structure of moves in research article abstracts</p><p>in applied linguistics. Publications, 4(3), 23.</p><p>Charles, M. (2007). Reconciling top-down and bottom-up approaches to graduate</p><p>writing: Using a corpus to teach rhetorical functions. Journal of English for academic</p><p>purposes, 6(4), 289-302.</p><p>Charles, M. (2013). English for academic purposes. The handbook of English for specific</p><p>purposes, 137-153.</p><p>Charles, M. (2020). Combining genre analysis and corpus consultation in class: Using</p><p>do-it-yourself corpora to explore the literature review. Approaches to Specialized</p><p>Genres, 243-258.</p><p>Charles, M. & Frankenberg-Garcia, A. (2021). Introduction: Dichotomies and debates</p><p>in corpora and ESP/EAP writing. In M. Charles, A. & Frankenberg-Garcia (Eds.).</p><p>Corpora in ESP/EAP Writing Instruction (pp. 1-10). Routledge.</p><p>Cheng, W. (2010). What can a corpus tell us about language teaching?. In M. McCarthy</p><p>& A. O’Keeffe (Eds.) 
The Routledge handbook of corpus linguistics (pp. 319-332).</p><p>Routledge.</p><p>Cortes, V. (2013). The purpose of this study is to: Connecting lexical bundles and moves</p><p>in research article introductions. Journal of English for academic purposes, 12(1), 33-43.</p><p>Cotos, E. (2014). Genre-based automated writing evaluation for L2 research writing:</p><p>From design to evaluation and enhancement. New York, NY: Palgrave Macmillan.</p><p>Cotos, E., Haufman, S. & Link, S. (2017). A move/step model for methods sections:</p><p>Demonstrating Rigour and Credibility. English for Specific Purposes, 46, 90-106.</p><p>Fletcher, W. H. (2012). kfNgram (Version 1.3.1). Retrieved from http://kwicfinder.</p><p>com/kfNgram/kfNgramHelp.html</p><p>Flowerdew, L. (2009). Applying corpus linguistics to pedagogy: A critical evaluation.</p><p>International journal of corpus linguistics, 14(3), 393-417.</p><p>Flowerdew, L. (2012). Corpus and Language Education. Basingstoke: Palgrave</p><p>Macmillan.</p><p>Flowerdew, L. (2013). Corpus-based research and pedagogy in EAP: From lexis to</p><p>genre. Language Teaching, 48(1), 99-116.</p><p>91</p><p>Flowerdew, L. (2014). Corpus-based analyses in EAP. In J. Flowerdew & C. Candlin</p><p>(Eds.). Academic discourse (pp. 105-124). Routledge.</p><p>Francis, W. N. (1992). Language Corpora BC. In Svartvik, J. [ed.] Directions in Corpus</p><p>Linguistics. Proceedings of Nobel Symposium 82, Stockholm. Berlin/ New York, p. 17-32.</p><p>Frankenberg-Garcia, A., Lew, R., Rees, G. Roberts, J.C., Sharma, N. & Butcher, P.</p><p>(2021).ColloCaid(around 30 thousand academic English collocations and examples</p><p>of collocations in context curated from corpora of expert academic English), open</p><p>access athttp://www.collocaid.uk/</p><p>Freitas, A. L. P. & Bocorny, A. E. P. (2021). How to write medical abstracts? The rhe-</p><p>torical structure and phrases used in Epidemiology. 
Brazilian Journal of Sexually</p><p>Transmitted Diseases, 33, 1-6.</p><p>Gavioli, L (2005). Exploring corpora for ESP learning (pp. 1-176). Amsterdam: John</p><p>Benjamins.</p><p>Gaspar, P. C., Santos, A. S. D. dos.,</p><p>Santana, L. B., Aragón, M. G., Machado, N. M.</p><p>da S., López, M. A. A., Passos, M. R. L., Pereira, G. F. M. & Miranda, A. E. (2022).</p><p>The fight against sexually transmitted infections cannot stop in the COVID-19 era:</p><p>a brazilian experience in online training for sexually transmitted infections guide-</p><p>lines. Brazilian Journal of Sexually Transmitted Diseases, 34. https://doi.org/10.5327/</p><p>DST-2177-8264-20223404</p><p>Geluso, J. (2019). Frequency, semantic, and functional characteristics of discontinuous</p><p>formulaic language: A learner corpus study. Master’s dissertation, Iowa State University.</p><p>Gray, B. & Biber, D. (2013). Lexical frames in academic prose and conversation.</p><p>International journal of corpus linguistics, 18(1), 109-136.</p><p>Gray, B., Cotos, E. & Smith, J. (2020). Combining rhetorical move analysis with</p><p>multi-dimensional analysis: Research writing across disciplines. Advances in cor-</p><p>pus-based research on academic writing: Effects of discipline, register, and writer exper-</p><p>tise, 137-168.</p><p>Henderson, A. & Barr, R. (2010). Comparing indicators of authorial stance in psychol-</p><p>ogy students’ writing and published research articles. Journal of Writing Research, 2(2),</p><p>245-264.</p><p>Hyland, K. (2002). Authority and invisibility: Authorial identity in academic writing.</p><p>Journal of pragmatics, 34(8), 1091-1112.</p><p>http://www.collocaid.uk/</p><p>http://www.collocaid.uk/</p><p>http://www.collocaid.uk/</p><p>92</p><p>Hyland, K. (2008). Academic clusters: text patterning in published and postgraduate</p><p>writing. International Journal of Applied Linguistics, 18(1), 41–61.</p><p>Johns, T. (1991). 
From printout to handout: Grammar and vocabulary teaching in the</p><p>context of data-driven learning. English Language Research Journal, 4, 27–45.</p><p>Karlsen, P. H. (2021). Teaching and Learning English through Corpus-based Approaches</p><p>in Norwegian Secondary Schools: Identifying Obstacles and a Way Forward. Doctoral</p><p>thesis. Inland Norway University of Applied Sciences.</p><p>Kanoksilapatham, B. (2005). Rhetorical structure of biochemistry research articles.</p><p>English for specific purposes, 24(3), 269-292.</p><p>Kavanagh, B. (2021). Bridging the Gap from the Other Side: How Corpora Are Used</p><p>by English Teachers in Norwegian Schools. Nordic Journal of English Studies, 20(1),</p><p>pp.1–35. DOI: http://doi.org/10.35360/njes.522</p><p>Kennedy, G. (1998). An introduction to Corpus Linguistics. New York: Longman.</p><p>Kilgarriff, A., Rychlý, P., Smrž, P. & Tugwell, D. (2004) The sketch engine. Proceedings</p><p>of the 11th EURALEX International Congress: 105-116.</p><p>Le, T. N. P. & Harrington, M. (2015). Phraseology used to comment on results in</p><p>the discussion section of applied linguistics quantitative research articles. English for</p><p>Specific Purposes, 39, 45-61.</p><p>Martínez, I. A. (2005). Native and non-native writers’ use of first person pronouns in</p><p>the different sections of biology research articles in English. Journal of second language</p><p>writing, 14(3), 174-190.</p><p>McEnery, T. & Hardie, A. (2012). Corpus Linguistics: Method, Theory and Practice.</p><p>Cambridge: Cambridge University Press.</p><p>McEnery, T., Xiao, R. & Tono, Y. (2006). Corpus-Based Language Studies. USA/Canada:</p><p>Routledge.</p><p>McEnery, T. & Wilson, A. (1997). Teaching and Language Corpora. ReCALL, 9(1),</p><p>5-14.</p><p>Meurer, J. L. (1997). Esboço de um modelo de produção de textos. In J. L. Meurer &</p><p>D. Motta-Roth (Eds.). Parâmetros de textualização. (pp. 14- 27). 
Santa Maria: Editora</p><p>da UFSM.</p><p>http://doi.org/10.35360/njes.522</p><p>93</p><p>Moreno, A. I. & Swales, J. M. (2018). Strengthening move analysis methodology to-</p><p>wards bridging the function-form gap. English for Specific Purposes, 50, 40-63.</p><p>Motta-Roth, D. (1995). Rhetorical Features and Disciplinary Cultures: A Genre-Based</p><p>Study of Academic Book Review in Linguistics, Chemistry and Economics. Doctoral the-</p><p>sis. Universidade Federal de Santa Catarina.</p><p>Mukherjee, J. (2004). Bridging the gap between applied corpus linguistics and the real-</p><p>ity of English language teaching in Germany. In U. Connor & T. A. Upton, T. A. (Eds.).</p><p>Applied Corpus Linguistics, 239-250. Amsterdam: Rodopi.</p><p>O’Keeffe, A. (2022). Data-driven learning and second language acquisition – it’s time</p><p>to connect. [Video]. School of Languages and Cultures. https://languages-cultures.</p><p>uq.edu.au/event/session/7987</p><p>O’Keeffe, A., McCarthy, M. & Carter, R. (2007). From corpus to classroom: language use</p><p>and language teaching. Cambridge: Cambridge University Press.</p><p>Pérez-Llantada, C. (2022). Online Data Articles: The Language of Intersubjective</p><p>Stance in a Rhetorical Hybrid. Written Communication. https://doi.</p><p>org/10.1177/07410883221087486</p><p>Pérez-Paredes, P. (2019). The pedagogic advantage of teenage corpora for secondary</p><p>school learners. Data-Driven Learning for the Next Generation, 67-87.</p><p>Poole, R. (2020). “Corpus can be tricky”: revisiting teacher attitudes towards cor-</p><p>pus-aided language learning and teaching. Computer Assisted Language Learning,</p><p>1–22. doi:10.1080/09588221.2020.1825</p><p>Reppen, R. (2010). Using corpora in the language classroom. Cambridge University</p><p>Press.</p><p>Römer, U. (2006). Pedagogical Applications of Corpora: Some Reflections on the</p><p>Current Scope and a Wish List for Future Developments. 
Zeitschrift für Anglistik und</p><p>Amerikanistik, 54(2), 121-134. https://doi.org/10.1515/zaa-2006-0204</p><p>Römer, U. (2010). Establishing the phraseological profile of a text type: The construc-</p><p>tion of meaning in academic book reviews. English Text Construction, 3(1), 95-119.</p><p>Santos, A. R. (1999). Metodologia científica: a construção do conhecimento. Rio de</p><p>Janeiro: DP & A Editora.</p><p>https://languages-cultures.uq.edu.au/event/session/7987</p><p>https://languages-cultures.uq.edu.au/event/session/7987</p><p>https://doi.org/10.1515/zaa-2006-0204</p><p>94</p><p>Schneuwly, J. & Dolz, B. (2004). Gêneros orais e escritos na escola. Campinas: Mercado</p><p>de Letras.</p><p>Scott, M. (2020). WordSmith Tools version 8. Stroud: Lexical Analysis Software.</p><p>Shepherd, T. M. (2009). O Estatuto da Linguística de Corpus: Metodologia ou Área da</p><p>Linguística? Matraga, 16(24), 150-172.</p><p>Sinclair, J. (1987). Collins Cobuild English Language Dictionary: Helping Learners with</p><p>Real English. Heinle ELT.</p><p>Sinclair, J. (1991). Corpus, Concordance, Collocation. Oxford: OUP.</p><p>Sinclair, J. (2004a). ‘Introduction’. In J. M. Sinclair (Ed.) How to Use Corpora in Language</p><p>Teaching (pp. 1-13). Amsterdam and Philadelphia: John Benjamins.</p><p>Sinclair, J. (2004b). Trust the text: Language, Corpus and Discourse. London/ New</p><p>York: Routledge.</p><p>Stevenson, H. A. & Harrison, J. E. (2009). Structured abstracts: Do they improve cita-</p><p>tion retrieval from dental journals?. Journal of orthodontics, 36(1), 52-60.</p><p>Swales, J. (1981). Aspects of article introductions. Birmingham: Language Studies Unit.</p><p>University of Aston [Aston ESP Research Reports 1].</p><p>Swales, J. (1990). Genre analysis: English in academic and research settings. Cambridge</p><p>University Press.</p><p>Swales, J. (1994). The writing of research articles introduction. Written Communication,</p><p>4(2), 175-191.</p><p>Swales, J. (2004). 
Research genres: Exploration and applications. Cambridge: Cambridge</p><p>University Press.</p><p>Swales, J. & Feak, C. B. (2009). Abstracts and the writing of abstracts (Vol. 2). University</p><p>of Michigan Press ELT.</p><p>Viana, V., Bocorny, A. & Sarmento, S. (2018). Teaching English for Specific Purposes.</p><p>ELT Development Series. TESOL Press.</p><p>Welp, A. K., Didio, Á. R. & Finkler, B. (2019). Questões contemporâneas no cinema e</p><p>na literatura: o desenho de uma sequência didática para o ensino de inglês como língua</p><p>adicional. BELT-Brazilian English Language Teaching Journal, 10(2), e35861-e35861.</p><p>95</p><p>Yoon, J. & Casal, J. E. (2020a). P-frames and rhetorical moves in applied linguistics</p><p>conference abstracts. Advances in corpus-based research on academic writing: Effects of</p><p>discipline, register, and writer expertise, 95, 282-305.</p><p>Yoon, J. & Casal, J. E. (2020b). Rhetorical structure, sequence, and variation: A</p><p>step-driven move analysis of applied linguistics conference abstracts. 
International Journal of Applied Linguistics, 30(3), 462-478.</p><p>Appendix I - Checklist for planning and designing an EAP course using a framework that combines corpus and genre-based pedagogies</p>
<table>
<tr><td rowspan="6">Information about learners</td><td>Know learners’ language proficiency level</td></tr>
<tr><td>Know learners’ level of instruction or position (e.g. undergraduate, master’s, doctoral, professor)</td></tr>
<tr><td>Know the discipline learners work with</td></tr>
<tr><td>Know learners’ needs</td></tr>
<tr><td>Know learners’ wants</td></tr>
<tr><td>Know learners’ expectations</td></tr>
<tr><td rowspan="4">Information about the course</td><td>Select the target genre</td></tr>
<tr><td>Select the target section (may not apply)</td></tr>
<tr><td>Select the target skill(s)</td></tr>
<tr><td>Know how many and which disciplines (multiple or single) you will be working with</td></tr>
<tr><td rowspan="9">Planning the course</td><td>Set learning objectives</td></tr>
<tr><td>Select methodology and approach</td></tr>
<tr><td>Select materials: find existing materials</td></tr>
<tr><td>Select materials: design materials that are corpus-based, genre (section) and discipline specific</td></tr>
<tr><td>Find the target-genre rhetorical structure in the literature or describe it</td></tr>
<tr><td>Decide which language features are worth working on in the academic context in which the target genre is used, considering all the previously collected information</td></tr>
<tr><td>Compile a genre (section) and discipline specific corpus</td></tr>
<tr><td>Extract language data from the corpus</td></tr>
<tr><td>Use said language data to design tasks, exercises and activities within the context of the target genre</td></tr>
</table>
<p>Appendix II - Example of completed checklist for the course Written Production of Health Sciences Structured Abstracts</p>
<table>
<tr><td rowspan="3">Information about learners</td><td>Language proficiency level</td><td>B2, C1</td></tr>
<tr><td>Learner level of instruction or position (e.g. undergraduate, master’s, doctoral, professor)</td><td>Graduate students</td></tr>
<tr><td>Discipline or specialty learners work with</td><td>Health sciences</td></tr>
<tr><td rowspan="4">Information about the course</td><td>Target genre</td><td>Structured abstracts</td></tr>
<tr><td>Target section (may not apply)</td><td>Background and objectives, method, results, conclusion</td></tr>
<tr><td>Target skill(s)</td><td>Written production</td></tr>
<tr><td>Discipline (multiple or single)</td><td>Single discipline</td></tr>
<tr><td>Rhetorical structure of the target genre</td><td>Found in the literature or described by the teacher</td><td>Described by the teacher</td></tr>
<tr><td>Language feature(s) worth working on within the context of the target genre</td><td>Lexical Frames</td><td>The first LF after the section name</td></tr>
<tr><td>Methodology</td><td>Combination of corpus and genre-based approaches</td><td></td></tr>
</table>
<p>Do-It-Yourself Corpora to Support SHAPE and STEM Research Paper Writing</p><p>Paula Tavares Pinto (Unesp)</p><p>Luciano Franco da Silva (Unesp)</p><p>Talita Serpa (Unesp)</p><p>Diva Cardoso de Camargo (Unesp)</p><p>Introduction</p><p>Writing research papers in English may be a challenge for newcomer authors at the beginning of their academic careers. For non-native speakers of English who have not had the chance to use academic English frequently, it may be even harder. Most of the time these researchers are used to reading scientific papers, but do not have much experience in writing them.</p><p>Some of the scholars who have studied academic writing in depth are Swales and Feak (2004, 2009), Hyland (2004, 2014), Lee and Swales (2006), and Flowerdew (2010). Even though these authors have widely described the features of academic writing, there are some characteristics that may still not be as salient for novice researchers, such as the use of academic collocations and lexical bundles. 
Some authors use word combinations that do not sound natural to their scientific community, and this may jeopardize the acceptance of their articles. Some of the scholars who have pointed out the academic issues found in research papers by non-native speakers of English are Charles (2012), Howarth (2013), Chang and Swales (2014), Karpenko-Seccombe (2020), Tavares-Pinto et al. (2021) and Pinto et al. (2021).</p><p>In this context, corpus linguistics has played an important role in providing a range of writing tools to help researchers from different fields find language patterns in academic discourse that are recognized by their peers. This happens because authors can rely on large collections of academic texts, hereafter corpora, which show them how their research community generally writes, along with the specific terminology and frequent patterns that can be rapidly identified and retrieved for writing purposes. This methodological approach can be used in different areas, such as Mathematics, the Humanities and the Biological Sciences. In order to do that, authors can use pre-compiled specialized corpora or compile their own collection of research papers published in high impact journals and use them as a Do-it-Yourself corpus (Varantola, 2002; Maia, 2002; Frankenberg-Garcia et al., 2019; Carvalho et al., 2021).</p><p>According to Berber Sardinha (2010: 304), these linguistic patterns will show how co-occurring combinations are vital to the written discourse and how things are “said” and “organized” when structuring language. 
To the author, corpus linguistics</p><p>[…] shows that language is used in a patterned way (that is, in a way recognized as ‘expected’ or ‘typical’ by its users), with correlations between usage and context - different contexts are expressed in different ways, with their own usage probabilities, often quite specifically adjusted […] to the social, situational, speaker, historical period context, etc. […]. Therefore, through the use of corpora in teaching, we can bring this system to students more clearly than with contributions from other linguistic theories and methodologies. The nature of knowledge of a language changes with corpora research. ‘Knowing a language’ implies knowing how to say and write according to the conventions of specific varieties of the language (a specific genre or register in a given context); for this, it is necessary to know the lexicogrammar of the necessary and desired choices for that specific situation. 
In order to use lexicogrammar efficiently, it is necessary to know the probabilities of those choices, that is, the frequencies of the elements, their combinations and their frequencies<sup>1</sup> (Berber Sardinha, 2010: 304).</p><p>By using corpora, writers will be able to find the information that is useful for their specific needs and will develop an autonomous learning process that leads them to master academic English based on their own interpretation of their peers’ writing.</p><p>This chapter discusses how specialized corpora can be explored by researchers who want to compile their own language database to help them write different sections of their own research papers. We will illustrate our proposal by taking examples from SHAPE disciplines, which involve Social Sciences, Humanities and the Arts for People and the Economy, as well as STEM disciplines, which involve Science, Technology, Engineering, and Mathematics.</p><p>The next sections of this chapter are divided into the following topics: 2. Corpus Linguistics and Academic Writing; 3. AntCorGen for the compilation of SHAPE and STEM areas; 4. Analyses with Sketch Engine; 5. Building your Research paper with SHAPE Plos and STEM Plos corpus; 6. Discussion and 7. Final considerations.</p><p>1 Original text: […] mostra que a linguagem é usada de modo padronizado (isto é, de modo reconhecido como ‘esperado’ ou ‘típico’ por seus usuários), com correlações entre uso e contexto - contextos diferentes são expressos de maneiras distintas, com suas próprias probabilidades de uso, muitas vezes ajustadas de modo bastante específico […] ao contexto social, situacional, falante, período histórico, etc. […] Assim, por meio de uso de corpora no ensino, podemos trazer aos alunos esse sistema de modo mais claro do que com aportes de outras teorias e metodologias da linguística. A natureza do conhecimento de uma língua se altera com a pesquisa em corpora. ‘Saber uma língua’ implica conhecer como dizer e escrever segundo as convenções de variedades específicas da língua (um gênero ou registro específico em um contexto determinado); para isso, é preciso conhecer a lexicogramática das escolhas necessárias e desejadas para aquela situação específica. Para usar a lexicogramática com eficiência, é necessário conhecer as probabilidades daquelas escolhas, isto é, as frequências dos elementos, suas combinatórias e as frequências destes (Berber Sardinha, 2010: 304).</p><p>Corpus Linguistics and Academic Writing</p><p>Corpora will help authors develop their pragmatic competencies, such as the intercultural competency which will, according to Hurtado Albir (2001), help them recognize the contextual norms of a given text. Varantola (2003) also points out that “proficiency” will depend on competence and practical skills that are combined to favor the cultural and linguistic decision-making process.</p><p>As we elevate corpora to the status of teaching and informational material, we allow the writer to concentrate on numerous possibilities of language variation and specialized language, which will be discussed in this chapter. 
By using a bottom-up approach, writers will also be able to observe different texts within the academic genre, depending on the kind of text they will be writing.</p><p>The use of corpus linguistics has been advocated by several scholars. In the case of Brazilian academia, Berber Sardinha (2003) pointed out that university students and scholars should have access to basic tools and infrastructure in order to explore corpora in class. Almost 20 years later, we have seen this advance in academia, since more and more researchers have been using corpus linguistics tools to help them write their own texts. This approach was recently put into practice at the São Paulo State University (Unesp) and at the Federal University of Rio Grande do Sul (UFRGS), where 127 researchers and English for Academic Purposes teachers worked in partnership to learn how to use corpus tools to write their own research papers and produce EAP teaching materials. The experience was described in detail in two publications by the British Council (Frankenberg-Garcia et al., 2019; Frankenberg-Garcia, 2020) and by Carvalho et al. (2021).</p><p>During the course, junior and senior researchers were introduced to corpus techniques and tools and were able to compile their own study corpora from high impact journals in their respective fields/disciplines. In this course, they learned how to use Sketch Engine (Kilgarriff, 2014) to explore academic language and see how key terms were used in specific contexts. Researchers and EAP teachers could help each other by analysing recurrent language features and typical terminology in their DIY study corpora. There were specialists in Engineering, Agricultural Sciences, Humanities, Social Sciences and Health, among others. According to Carvalho et al. 
(2021):</p><p>results showed that although scholars were familiar with the terminology of their own areas, the tool pointed out other possibilities of word combinations they had difficulty with, such as verbal collocations and the most common patterns of academic English if compared to Portuguese. At the same time, the English teachers who were participating in the workshops were inspired by the terminology and language to develop teaching activities for their own EAP students (Carvalho et al., 2021: 79).</p><p>To add to the usefulness of corpora as learning and translation material, Zanettin et al. (2003: 2) have stated that “(…) competent use of corpora and corpus analysis tools will enable students to become better language professionals in a working environment where computational facilities for processing text have become the rule rather than the exception”.</p><p>It is important to mention that writers will need to be trained on how to better explore corpora with appropriate tools; they will also need to know how to interpret the information generated by those tools. 
By doing so, these writers will be using the Data Driven Learning approach (Johns, 1991), in which concordance lines are displayed on the screen to the reader.</p><p>According to Varantola (2003), the use of corpora will provide writers with two sets of skills related to: i) corpus compilation - criteria for corpus compilation, strategies to find relevant language patterns, access to reliable corpora, recognition of corpus compilation tools, integration of text processing and corpora processing tools; ii) use of corpus information - skills for deduction based on corpus information, use of pre-compiled corpora for translation retrieval, corpus assessment for translation decisions, new correlated skills for corpus management.</p><p>Regarding the search for specialized terminology, Bowker (1999) states that corpora will make it possible for terminologists and language users to become aware of specificities of technical and scientific language. The researcher points out that translators, when dealing with specialized texts, will be able to interact with the lexicon and terminology in different areas if using corpus tools and collections of specific texts. Therefore, when focusing on academic, technical and scientific writing, corpora may help researchers to compile glossaries that can be used in present and future works. Pearson (1996) supports this idea, stating that corpora will enable the observation of domains and subdomains in the same areas. 
Also, Maia (2000) points out the importance of deepening the use of corpora for specific purposes and collection of vocabulary and observation of complex language when preparing teaching materials.</p><p>Formulaic language and its contributions to language studies</p><p>Corpora studies have shown that many language patterns are so recurrent among language users that they could be classified as pre-fabricated structures. The recurrence of pre-fabricated expressions in the language is explained by Sinclair (1991) through his idiom principle, in which he proposes that speakers do not simply choose random words to perform certain language functions; in fact, they seem to routinely use the same set of language combinations instead of creating new ones.</p><p>In this chapter, we chose the term formulaic language, coined by Wray (2002), to refer to the different types of semi-preconstructed language combinations. Sinclair (1991) and Wray (2002) argue that the human brain optimizes the processing of large amounts of data through the repeated use of conventionalized language structures, which in turn reduces the cognitive demands of on-line processing during language production and prevents speakers from becoming overloaded by decoding phrases and combinations they have never heard before. Because of this double advantage, the proper use of formulaic language is one of the central aspects for teaching and learning any language (Schmitt & Carter, 2004; Wray, 2002; Wray & Perkins, 2000). However, it is not so simple to define what is or is not formulaic in language (Granger & Paquot, 2008; Schmitt & Carter, 2004; Siyanova-Chanturia, 2015; Wray, 2002). Depending on the theoretical-methodological criteria, one can find dozens of terms for similar lexical combinations, such as idioms (e.g. 
kick the bucket), collocations (e.g. fast food), lexical bundles (e.g. if you look at), among many others.</p><p>Nevertheless, Conrad et al. (2004) explain that, regardless of the name adopted, there are some characteristics that tend to be especially recurrent in the identification of formulaic language, such as fixedness; idiomaticity; frequency; length of sequence; completeness in syntax, semantics, or pragmatics; and intuitive recognition by the speaker of a language community.</p><p>The authors also explain that different types of formulaic language are identified depending on the priority these features receive. In other words, if the focus is idioms, the researcher is expected to prioritize certain characteristics that would not be interesting to identify collocations or lexical bundles, for example.</p><p>In the present study, we use the frequency-driven approach to find the most recurrent combinations in two DIY study corpora. This approach enables the semi-automatic extraction of massive amounts of linguistic data from a corpus, based on external criteria set by the researcher. Studies of recurrent combinations tend to converge towards similar goals, as evidenced by Conrad et al. (2004: 58):</p><p>Our research questions in this approach are exploratory. 
We ask whether there are multi-word sequences that are used with high frequency in texts, whether different registers tend to use different sets of these sequences, and, if so, to what extent the bundles fulfill discourse functions and thus play an important part in the communicative repertoire of speakers and writers.</p><p>By exploring DIY corpora, researchers will also have a better view of discourse variation in different academic areas. Many corpus-based studies of academic writing have presented evidence of disciplinary variation (Bondi & Sanz, 2014; Gray, 2015; Hyland, 2012; Römer et al., 2020).</p><p>According to Becher and Trowler (2001), scientific knowledge is created from different disciplinary communities or tribes with particular interests, literature and conventions that shape how researchers see the world and interpret reality. Similarly, the concept of discipline is presented by Hyland (2004, 2012) as a human institution where the creation of knowledge and use of language are influenced by personal and interpersonal factors from its members, as well as by institutional and sociocultural norms of the community of which they are part. 
Given the relevance of these investigations into disciplinary discourse, a member of a disciplinary community or novice researcher needs not only to demonstrate technical and theoretical competence in his or her field, but also to know the linguistic conventions that create and maintain the cultural identity of its members (Becher & Trowler, 2001; Hyland, 2004, 2008).</p><p>The next section will discuss how two corpora were compiled in SHAPE and STEM disciplines.</p><p>AntCorGen for the compilation of SHAPE and STEM areas</p><p>AntCorGen (Anthony, 2019) is a tool used to quickly compile specialized corpora with research papers from the PLOS ONE platform. A tutorial of this tool was recorded by its creator in a short video<sup>2</sup>. Below we will talk about the compilation of SHAPE Plos and STEM Plos and their exploration for academic writing.</p><p>SHAPE</p><p>As previously mentioned, SHAPE stands for Social Sciences, Humanities and the Arts for People and the Economy. All these disciplines and subareas can be found at PLOS, which is a nonprofit, open access multi-disciplinary publisher<sup>3</sup>. All areas of SHAPE can be easily accessed in AntCorGen and researchers can choose the parts of research papers they want to analyse. Since we wanted to have mostly written material, we selected the articles’ abstracts, introduction, materials & methods, results & discussion and conclusions, as we can see in the figure below:</p><p>2 AntCorGen tutorial: https://www.youtube.com/watch?v=WrsIzE9to4o. Access: Oct. 30th, 2021.</p><p>3 PLOS available at https://plos.org/about/. Access: October 27th, 2021.</p><p>Figure 1. 
AntCorGen screen with part of SHAPE disciplines selected</p><p>We called this corpus SHAPE Plos and, since it was compiled to illustrate the process described in this chapter, we set a maximum of 100 articles, but it is possible to have a much larger study corpus if desired. After this compilation we had a study corpus of 445,291 words to be explored.</p><p>STEM</p><p>STEM disciplines are related to both Biology and Hard Sciences. Although the figure below seems to have only Biology and Life Sciences, the actual list of disciplines selected was longer and we could include areas such as Math and Computer Sciences as well. In the same way, we selected 100 articles for the STEM Plos corpus, as shown in the figure below:</p><p>Figure 2. AntCorGen screen with part of STEM disciplines selected</p><p>After this compilation, we had a specialized corpus of STEM disciplines with a total of 297,255 words to be observed and compared to the results from SHAPE Plos.</p><p>Analyses with Sketch Engine</p><p>We uploaded both corpora, SHAPE and STEM, to Sketch Engine (Kilgarriff, 2014) so we would be able to observe the frequent adjectives and verbs in each broad area and see the similarities and differences between them. We could also generate concordance lines with search words, terms and phrases that can be used by researchers to explore and observe how international researchers in their area have been writing different sections of their research papers.</p><p>Figure 3. Sketch Engine dashboard with its tools and STEM Plos and SHAPE Plos as main corpora</p><p>The main tools we have been using to explore both corpora are Wordlist, Concordance and Word Sketch. 
Wordlist will list all words from a study corpus in order of frequency, from the most frequent to the least; Concordance will generate concordance lines with a search word in context that can be expanded if the researcher wishes to do so; and Word Sketch will show a search word with co-occurring categories such as modifiers, verbs with the search word as object or subject, and frequent adjectives and adverbs used with it. We are going to describe these searches in the next section.</p><p>Adjectives and verbs in SHAPE Plos and STEM Plos</p><p>During the Academic Masterclasses (Frankenberg-Garcia et al., 2019), one of the main searches was on how to use appropriate adjectives to bring more emphasis to the articles; therefore, instructors taught researchers and EAP teachers how to look for adjectives within the corpus wordlist. Below we bring the lists of the twenty most frequent adjectives and modifiers used by international researchers in the SHAPE Plos and STEM Plos corpora.</p>
<table>
<tr><th colspan="4">SHAPE Plos</th><th colspan="4">STEM Plos</th></tr>
<tr><th>Item</th><th>Freq.</th><th>Item</th><th>Freq.</th><th>Item</th><th>Freq.</th><th>Item</th><th>Freq.</th></tr>
<tr><td>high</td><td>1138</td><td>large</td><td>439</td><td>such</td><td>549</td><td>standard</td><td>251</td></tr>
<tr><td>other</td><td>953</td><td>linguistic</td><td>404</td><td>different</td><td>509</td><td>open</td><td>242</td></tr>
<tr><td>such</td><td>786</td><td>small</td><td>369</td><td>other</td><td>508</td><td>many</td><td>230</td></tr>
<tr><td>social</td><td>785</td><td>female</td><td>337</td><td>available</td><td>443</td><td>low</td><td>228</td></tr>
<tr><td>different</td><td>699</td><td>likely</td><td>327</td><td>new</td><td>318</td><td>mobile</td><td>220</td></tr>
<tr><td>low</td><td>646</td><td>sound</td><td>326</td><td>high</td><td>280</td><td>several</td><td>203</td></tr>
<tr><td>more</td><td>599</td><td>religious</td><td>306</td><td>large</td><td>273</td><td>medical</td><td>192</td></tr>
<tr><td>significant</td><td>523</td><td>similar</td><td>304</td><td>same</td><td>271</td><td>multiple</td><td>186</td></tr>
<tr><td>same</td><td>494</td><td>cultural</td><td>298</td><td>more</td><td>258</td><td>specific</td><td>185</td></tr>
<tr><td>first</td><td>451</td><td>lexical</td><td>298</td><td>standard</td><td>251</td><td>good</td><td>183</td></tr>
</table>
<p>Table 1. 
Twenty most frequent adjectives and modifiers in SHAPE Plos and STEM Plos corpora</p><p>In Table 1, the lists for SHAPE disciplines (left columns) and STEM disciplines (right columns) show adjectives and modifiers specific to each area in bold and those common to both areas underlined. Authors can choose specific items, as in the following examples, where we see concordance lines with “social” and “linguistic” from SHAPE Plos and “standard” and “mobile” from STEM Plos. The underlined words are the ones being modified by the search words:</p><p>1. Although a great deal of attention has been paid to how conspiracy theories circulate on social media, and the deleterious effect that they, and their factual counterpart conspiracies, have on political institutions, there has been little computational work done on describing their narrative structures. [SHAPE Plos]</p><p>2. Public pension insurance has become a major form of social protection around the world. [SHAPE Plos]</p><p>3. Feature stability, time and tempo of change, and the role of genealogy versus areality in creating linguistic diversity are important issues in current computational research on linguistic typology. [SHAPE Plos]</p><p>4. The database is pre-prepared for statistical and phylogenetic analyses and contains both linguistic typological data from languages spanning over four millennia, and linguistic metadata concerning geographic location, time period, and reliability of sources. [SHAPE Plos]</p><p>We did the same search for adjectives in STEM Plos, which can be seen below:</p><p>5. 
Other implemented functions are focused on the quality control of the fitted standard curve: detection of outliers, estimation of the confidence or prediction interval, and estimation of summary statistics. [STEM Plos]</p><p>6. On the other hand, if there are not enough dilutions of the standard samples, the extra standard sample using the background information will influence data similarly to an outlier due to the fact that the standard points have not reached the lower asymptote. [STEM Plos]</p><p>7. Our data allow a fleeting glimpse into the future, where mobile health will not replace the doctor-patient relationship, but will hopefully help to establish more effective and efficient treatment and accelerate e-health strategies. [STEM Plos]</p><p>8. During investigations of crimes involving mobile devices, there is usually some accumulation or retention of data on the device that will need to be identified, preserved, analyzed and presented in a court of law–a process known as digital or mobile forensics (also known as cyber forensics). [STEM Plos]</p><p>In some excerpts the search word combined with a noun to form a complex expression, such as “social media”, “standard curve” and “mobile forensics”; in others it simply qualified a noun, such as “social protection”, “linguistic diversity” and “standard curve”. More than examples of context, the search for adjectives and modifiers will bring several combinations that are frequently used with a search word, which can bring strength to a text and serve as an inspiration to authors in different areas.</p><p>Some adjectives were used in both SHAPE Plos and STEM Plos, such as “high”. One of the advantages of using corpus tools is that you can quickly search for examples in both corpora, which will show how the same adjective is being used in different articles, as shown below:</p><p>9. 
Although, productivity is maximized by the combination of high wages and low labor input, high productivity cities show invariably high wages and high levels of employment relative to their size expectation. [SHAPE Plos]</p><p>10. Both logistic regression and PSM models revealed that early marriage decreased the chances of completing the first cycle of high school. [SHAPE Plos]</p><p>11. We also find that the effect of ICT use on economic growth is higher in high income group rather than other groups. [STEM Plos]</p><p>12. The framework employs features of centralized monitoring, high availability and on demand access services of computational clouds for computational offloading. [STEM Plos]</p><p>In examples 9, 11 and 12 we see the adjective “high” used to intensify the nouns it accompanies; the only exception is “high school” in example 10, which is a complex term.</p><p>If authors want to find other adjectives that can be used as synonyms or that occur in the same contexts, such as antonyms, they can use the Thesaurus option in Sketch Engine. We looked for alternatives to “high” in SHAPE Plos and found, in order of frequency, “great”, “large”, “overall” and “positive”; we also looked for alternatives to “high” in STEM Plos and found “overall”, “maximum”, “large” and “great”.</p><p>In the same Academic Masterclasses we encouraged researchers and postgraduate students of SHAPE and STEM disciplines to do a search on the most frequent verbs that had been used in their areas of research. In a similar way, we show a list of the most frequent verbs in Table 2:</p>
<table>
<tr><th colspan="4">SHAPE Plos</th><th colspan="4">STEM Plos</th></tr>
<tr><th>Item</th><th>Freq.</th><th>Item</th><th>Freq.</th><th>Item</th><th>Freq.</th><th>Item</th><th>Freq.</th></tr>
<tr><td>be</td><td>15020</td><td>provide</td><td>449</td><td>be</td><td>10879</td><td>develop</td><td>349</td></tr>
<tr><td>have</td><td>2865</td><td>suggest</td><td>436</td><td>use</td><td>2057</td><td>make</td><td>331</td></tr>
<tr><td>use</td><td>1563</td><td>compare</td><td>424</td><td>have</td><td>1637</td><td>propose</td><td>296</td></tr>
<tr><td>do</td><td>919</td><td>report</td><td>417</td><td>provide</td><td>592</td><td>create</td><td>280</td></tr>
<tr><td>show</td><td>910</td><td>associate</td><td>409</td><td>include</td><td>486</td><td>present</td><td>256</td></tr>
<tr><td>include</td><td>720</td><td>consider</td><td>405</td><td>show</td><td>480</td><td>follow</td><td>255</td></tr>
<tr><td>find</td><td>688</td><td>indicate</td><td>375</td><td>base</td><td>394</td><td>define</td><td>250</td></tr>
<tr><td>see</td><td>588</td><td>follow</td><td>369</td><td>do</td><td>373</td><td>find</td><td>243</td></tr>
<tr><td>give</td><td>483</td><td>make</td><td>369</td><td>require</td><td>368</td><td>describe</td><td>242</td></tr>
<tr><td>base</td><td>465</td><td>increase</td><td>345</td><td>allow</td><td>368</td><td>identify</td><td>237</td></tr>
</table>
<p>Table 2. Twenty most frequent verbs in SHAPE Plos and STEM Plos corpora</p><p>When we compare the twenty most frequent verbs in SHAPE Plos and in STEM Plos, we find eleven verbs used in both areas. Some of them are “show”, “include” and “provide”, which are academic content verbs that are common to all areas but will be used in specific contexts, as we can see in examples 1 to 6:</p><p>1. Results show that the amount of fine does not impact tax payments, whereas participants’ beliefs regarding tax authority’s power significantly shape compliance decisions. [SHAPE Plos]</p><p>2. Detail results that show how tally the simulation results and the analytical results in both abstract and graphical forms and some scientific justifications for these have been documented and discussed. [STEM Plos]</p><p>3. These effects include stress regularity and stress consistency, both of which have been especially important in studies of word recognition and reading aloud in Italian. [SHAPE Plos]</p><p>4. A systematic framework and associated workflow include cloud service filtration, solution generation, evaluation, and selection of public cloud services. [STEM Plos]</p><p>5. It would be difficult to provide a comprehensive explanation for this result. [SHAPE Plos]</p><p>6. 
The evaluation of all network breakups can provide transportation planners and administrators with plenty of data for further statistical analyses. [STEM Plos]</p><p>When we look for verbs that are specific to either area, we find only one verb that could be considered typical of SHAPE areas, which is the verb “see”. To illustrate that use, we bring some concordance lines in examples 7 to 9:</p><p>7. We also see internal fluctuations in the use of this style during this campaign. [SHAPE Plos]</p><p>8. Those who believe that their own religious group is something special tend to see extremism as an opportunity to assert their own group interests. [SHAPE Plos]</p><p>9. Other than education, for social participation we see that disability characteristics, motivation, and knowledge of the system are important for explaining the education gradient. [SHAPE Plos]</p><p>We also find verbs that, although present in only one of the lists, could be used in both SHAPE and STEM, such as “compare” and “associate”, from the SHAPE list, and “allow” and “propose”, which are part of the STEM list.</p><p>Clusters, which are recurrent groups of words, can also help an author quickly identify features of academic writing. In the next subsection we present this discussion.</p><p>Clusters in SHAPE Plos and STEM Plos corpora</p><p>Another tool from Sketch Engine that can be used in search of clusters or lexical bundles is the one called n-gram. Table 3, below, shows the twenty most recurrent n-grams in the introduction section from the papers in the study corpora SHAPE Plos and STEM Plos. 
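The n-grams in bold indicate that the sequence was recurrent in both corpora.</p>
<table>
<tr><th>SHAPE PLOS</th><th>Normalized frequency (per 100,000)</th><th>STEM PLOS</th><th>Normalized frequency (per 100,000)</th></tr>
<tr><td><strong>the number of</strong></td><td>69</td><td><strong>the number of</strong></td><td>58</td></tr>
<tr><td><strong>as well as</strong></td><td>61</td><td><strong>in order to</strong></td><td>48</td></tr>
<tr><td>number of children</td><td>35</td><td><strong>as well as</strong></td><td>46</td></tr>
<tr><td>more likely to</td><td>34</td><td><strong>based on the</strong></td><td>36</td></tr>
<tr><td><strong>based on the</strong></td><td>31</td><td><strong>one of the</strong></td><td>30</td></tr>
<tr><td>in terms of</td><td>29</td><td><strong>the use of</strong></td><td>30</td></tr>
<tr><td><strong>in order to</strong></td><td>26</td><td>can be used</td><td>27</td></tr>
<tr><td>the effect of</td><td>26</td><td>the accuracy of</td><td>23</td></tr>
<tr><td>in this study</td><td>24</td><td><strong>due to the</strong></td><td>22</td></tr>
<tr><td>the relationship between</td><td>21</td><td>be used to</td><td>21</td></tr>
<tr><td><strong>due to the</strong></td><td>21</td><td>according to the</td><td>21</td></tr>
<tr><td><strong>one of the</strong></td><td>20</td><td>in this paper</td><td>21</td></tr>
<tr><td>there is a</td><td>20</td><td>of the data</td><td>20</td></tr>
<tr><td>the fact that</td><td>19</td><td><strong>on the other</strong></td><td>19</td></tr>
<tr><td><strong>on the other</strong></td><td>19</td><td>the development of</td><td>19</td></tr>
<tr><td>a number of</td><td>18</td><td>a set of</td><td>19</td></tr>
<tr><td>the distribution of</td><td>17</td><td>that can be</td><td>18</td></tr>
<tr><td>the present study</td><td>17</td><td><strong>on the other hand</strong></td><td>18</td></tr>
<tr><td><strong>on the other hand</strong></td><td>17</td><td>in addition to</td><td>17</td></tr>
<tr><td><strong>the use of</strong></td><td>17</td><td>part of the</td><td>16</td></tr>
</table>
<p>Table 3. Clusters in SHAPE Plos and STEM Plos corpora</p>
<p>As can be seen, the sequence as well as was among the most recurrent in the introduction sections of both study subcorpora, and it was commonly used to structure the discourse by adding new elements to the text, as shown in examples 1 and 2:</p>
<p>1. Violent and delinquent behaviour patterns, as well as associated attitudes, can also manifest themselves in various forms of extremism. [SHAPE Plos]</p>
<p>2. 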
Healthcare provision via wearable devices brought changes in treatment and examination of patients, as well as research and development in different areas. [STEM Plos]</p>
<p>Other recurring elements of textual cohesion in the introduction sections of both study corpora were the n-grams on the one hand and on the other hand, used to express contrast between the ideas and elements in the text, as shown in examples 3 and 4:</p>
<p>3. Loneliness at work is such a possible mediator: on the one hand there is a potential association between working temporarily and loneliness at work, on the other hand there are indications of a negative association between loneliness at work and job satisfaction. [SHAPE Plos]</p>
<p>4. Pharmacokinetics is the study of what the body does to a drug including processes from drug absorption to excretion. On the other hand, pharmacodynamics focuses on the effects of drugs on organisms. [STEM Plos]</p>
<p>Another discourse function expressed by the extracted n-grams was the limitation of research conditions. Although this function was found in both subcorpora, the data indicate that authors in the SHAPE and STEM domains use different sequences for it, such as in terms of, the relationship between and based on the, as illustrated in examples 5 to 9:</p>
<p>5. In this study, we aim to refine the analysis in terms of the Liberal versus the Individual views [SHAPE Plos]</p>
<p>6. The latter type of news effects has been studied mainly in terms of news on the internet, rather than television. [SHAPE Plos]</p>
<p>7. In the present research, we investigate the relationship between linguistic cohesion and real-world action in times of social conflict and unrest. [SHAPE Plos]</p>
<p>8. 
We thus introduce a simple but practical measure evaluating network disintegration based on the overall number of people isolated from the primary network. [STEM Plos]</p>
<p>9. Based on the employed cryptographic mechanism, Lu et al. [6] distinguished the privacy-preserving authentication scheme of VANETs into five categories. [STEM Plos]</p>
<p>In the previous sections we discussed how the search for content words and lexical bundles can help writers use more specific and elaborate language in their articles. In the following sections, we discuss how researchers may access academic phrases by searching concordance lines that will help them write different sections of their research papers.</p>
<p>Building your Research paper with SHAPE Plos and STEM Plos corpus</p>
<p>If researchers want examples from research papers in SHAPE and STEM disciplines, they can search for common expressions in the corpus. In our case, we have divided both subcorpora into the sections that are usually found in research articles. Based on Karpenko-Seccombe (2020), we are going to discuss how researchers can use their own specialized corpora for writing their research papers. The search we are going to propose is similar to what is found in the Manchester Phrasebank (Morley, 2014), where it is possible to observe frequent phrases in different parts of an article. 
However, unlike the Manchester Phrasebank, where phrases from all areas may be seen, a search in a specialized corpus has the advantage that researchers will be able to read more contexts from their own areas.</p>
<p>Researchers can read concordance lines much as they would read a dictionary: they will find several examples of a search word or expression and select the one that best suits their own texts. There will therefore be a combination of a fast search aided by the tool and human selection of the best examples by the researchers.</p>
<p>In the following sections, authors will find useful strategies for searching for contexts in the introduction, materials and methods, discussion and conclusion sections.</p>
<p>Writing the Introduction Section</p>
<p>According to Swales and Feak (2009), a research paper introduction typically contains three main steps or moves: a) establishing the area of research, where the authors show the importance of a field and introduce previous research in the area; b) establishing a gap in the knowledge or a problem to be solved; and c) presenting the paper, i.e., identifying objectives, introducing expected outcomes and describing the structure of the work. In order to explore introductions in the SHAPE Plos and STEM Plos corpora, we searched for concordance lines with the query phrase “this paper” and selected some of the lines to be used as examples here:</p>
<p>1. This paper attempts to fill the gap of existing research concerning the link between public pension and fertility. [SHAPE Plos]</p>
<p>2. In this paper, we perform a comprehensive survey of the worldwide linguistic landscape as emerging from mining the Twitter microblogging platform. [SHAPE Plos]</p>
<p>3. 
In this paper, we are interested in measuring linguistic regularities both at the level of word structure and at the level of word order. [SHAPE Plos]</p>
<p>4. This paper explores the ways abortion attitudes intersect with causal beliefs about gender categories, within the unique social context of a national referendum held to legalise abortion in the Republic of Ireland. [SHAPE Plos]</p>
<p>5. In this paper, we introduce a novel mobile application called “Medikamentenplan” (“Medication Plan”), which was developed to support medication compliance and vital sign documentation. [STEM Plos]</p>
<p>6. In this paper, we propose a concise, improved and effective privacy framework for wearable device manufacturers, as well as application developers, capable of providing greater privacy and security to the wearable device owners. [STEM Plos]</p>
<p>7. This paper innovatively proposes countermeasures to improve the innovation of e-commerce practitioners in rural areas. [STEM Plos]</p>
<p>8. The objective of this paper is to outline our approach of establishing and implementing this IT infrastructure. [STEM Plos]</p>
<p>We can see that authors from SHAPE and STEM use similar strategies to introduce their research papers. In examples 1, 4 and 7, authors used the structure This paper + [adverb] + verb (simple present). In examples 2, 3, 5 and 6, authors opted for In this paper + we + verb (simple present). Finally, in example 8, the authors preferred to introduce their paper with the structure The objective of this paper is + to + verb (infinitive).</p>
<p>These recurrent patterns can be used with more confidence by researchers in SHAPE and STEM.</p>
<p>Writing the Materials and Methods Section</p>
<p>According to McCombes (2019), in the methodology section the authors will explain what they did and how they did it. 
By doing so, they enable other researchers to evaluate the reliability and validity of the research. In this section, authors discuss the type of research they carried out and how they collected and analysed the data. They also describe the tools and materials of the research. This section is usually written in the past tense.</p>
<p>As in the previous section, by consulting concordance lines a researcher will have access to the writing of different authors who have described their methods in SHAPE and STEM disciplines. Below you will find eight concordance lines describing methods and methodological procedures that can be used as examples for writing this section:</p>
<p>1. Recent advances in data-driven methods of embedding words and phrases into a multidimensional vector space such that their Euclidean distances have correlations with their semantic similarity have made it possible to assign a quantitative measure to the similarity metric. [SHAPE Plos]</p>
<p>2. This method provides a second ranking of headwords including non-named entities. [SHAPE Plos]</p>
<p>3. The methods compared are: Cysouw and colleagues consider the consistency of the cross-linguistic distribution of an individual feature with the pattern generated by multiple features, and they propose three quantifications of this measure based on Mantel’s correlation, a coherence and a rank method (…) [SHAPE Plos]</p>
<p>4. There are several well-established methods for combining significance (p-value) and effect size information from independent tests of the same null hypothesis, especially developed for meta-analyzes, such as: Fisher’s classic method [45], and the more recent Z-transform [46], but a priori they are not appropriate to our case due to the mentioned non-independence. [SHAPE Plos]</p>
<p>5. 
The above-described method resulted in better recognition of confluent colonies than methods employing binary thresholding and segmentation (using, e.g., watershed separation), which we tried as alternatives. [STEM Plos]</p>
<p>6. In this study we aimed at reproducing the results from 11 PLOS ONE papers dealing with statistical methods for longitudinal data. [STEM Plos]</p>
<p>7. In this section, we introduce our experimental methods, which include definitions, attack strategies and benchmark networks. [STEM Plos]</p>
<p>8. The most common issue was that papers did not provide enough detail about the methods used (e.g. model type was mentioned but no detailed model specifications, for details see Table 4). [STEM Plos]</p>
<p>In examples 1, 4, 6 and 7 we see the structure modifier/adjective + methods, which offers readers a range of possibilities from which to select the one that suits their own text. Examples 2 and 3 show the structure This method + verb in the active voice, presenting “method” as the agent of an action. In examples 5 and 8 we have method + present/past participle. These are three different ways of describing methodology that are used in both SHAPE and STEM and that other researchers can adopt.</p>
<p>Writing the Discussion and Conclusions Sections</p>
<p>In the “Discussion” and “Conclusions” sections, authors talk about their achievements and conclude by: a) highlighting the significance of the results; b) comparing their results with previous research; c) emphasising the novelty and contribution of their research; or d) suggesting that results be treated with caution. One way of finding out how researchers write their Discussion and Conclusions sections is to search for the keyword “Results” in both subcorpora, as we have done below:</p>
<p>1. 
The results show that, in the pre-period of 2010, women in the NRPS group have more children and are more likely to have a second child than those without NRPS coverage, while there is no significant difference between treatment and control groups in the post-period of 2014. [SHAPE Plos]</p>
<p>2. These results suggest that a post-treatment effect on women’s fertility outcomes may occur when they had participated in the pension scheme. [SHAPE Plos]</p>
<p>3. The results demonstrate a noteworthy extension of the common support between the treated and control groups, implying that the overall distributions of the conditional probability to participate in the NRPS are similar between the two groups. [SHAPE Plos]</p>
<p>4. The results show that while some variables are significantly different between the unmatched treated and control group, the differences between the two groups for all covariates are no longer significant after matching. [SHAPE Plos]</p>
<p>5. The results for both Chromeleon and HappyTools show a higher percentage of Fab-glycosylation in ACPA samples than IgG samples, with the values reported by ThermoFisher Chromeleon and HappyTools showing a significant correlation (Fig 3 and S5–S7 Tables). [STEM Plos]</p>
<p>6. The results of the present study can be compared directly to our previous study that focused on the accuracy of the GPS60 for the detection of bouts of walking and resting [15]. [STEM Plos]</p>
<p>7. The algorithm then omits all results related to the combinations of links containing at least one of the marked links. [STEM Plos]</p>
<p>8. The results show that the relative Aps reported by HappyTools are comparable to both Waters Empower and ThermoFisher Chromeleon (Fig 2 and S3 Table). 
[STEM Plos]</p>
<p>Most of the examples shown above follow the structure presented in Swales and Feak (2009), which is The results show/suggest/demonstrate + that. In examples 5 and 6 we see another structure, which is The results for/of + object + verb. Finally, example 7 has “results” as the object of a sentence.</p>
<p>Discussion</p>
<p>In this chapter we discussed the advantages of compiling specialized corpora in the areas of SHAPE and STEM, which can be explored by researchers in different areas with the aid of corpus tools such as Sketch Engine. By using this set of tools, researchers can quickly access the specific terminology of their own areas and select the lexicon that best suits their own writing. By searching for specific vocabulary with WordList, WordSketch and Concordance lines, it is possible to observe frequent adjectives in each area, such as “social” in SHAPE and “standard” in STEM. It is also possible to observe that “high” is one of the most frequent adjectives in both areas but is used in contexts specific to each, as in “high productivity cities” in SHAPE and “high income” in STEM. Verbs, on the other hand, did not show very specific use, since the lists of frequent verbs are very similar in SHAPE and STEM. The only verb among the twenty most frequent in SHAPE that we considered specific to that area is “see”.</p>
<p>Several of the most recurrent n-grams found in the introduction sections are text-oriented (Hyland, 2008: 13), which means they are concerned with the organization of the text and its meaning as a message or argument. 
Some examples of text-oriented n-grams are: as well as, in addition to, on the other [hand]; these sequences are important for signalling logical relationships between the ideas presented and for maintaining cohesion.</p>
<p>Following the findings of Swales and Feak (2004, 2009) and Karpenko-Seccombe (2020), another important aspect discussed in this paper was the use of similar language structures in each research section. The examples previously presented show how authors rely on the same structures for introducing their papers (This paper aims …), describing their methodology (method + present/past participle), and discussing their findings and conclusions (The results show/suggest/demonstrate + that). Taking that into account, we can infer that these structures provide safe ground for non-native speakers of English and novice researchers to “walk on” and to use in their own research papers in order to be accepted by their discourse communities, which will include peer reviewers and internationally recognized researchers.</p>
<p>The last aspect we would like to mention is that, although we have used Sketch Engine to explore the SHAPE and STEM corpora to write our own paper, there are similar tools that can be used by researchers, such as AntConc (Anthony, 2005) and LexTutor (Cobb, n.d.).</p>
<p>Final Considerations</p>
<p>In this chapter we presented an overview of how to compile specialized corpora in SHAPE and STEM with the AntCorGen tool and how researchers can use those corpora to access the academic language used by their peers. By doing so, researchers can confirm or refute ways of presenting their studies in each research paper section, find the best way of describing their methodological approach, and call attention to their studies’ contribution. 
We hope this chapter may inspire research teams to start building their own language database that can be used by future members and can be constantly updated.</p>
<p>References</p>
<p>Anthony, L. (2005, July). AntConc: design and development of a freeware corpus analysis toolkit for the technical writing classroom. In IPCC 2005. Proceedings. International Professional Communication Conference, 2005. (pp. 729-737). IEEE.</p>
<p>Anthony, L. (2019). AntCorGen (Version 1.1.2) [Computer Software]. Tokyo, Japan: Waseda University. Available from https://www.laurenceanthony.net/software</p>
<p>Becher, T. & Trowler, P. R. (2001). Academic Tribes and Territories (2nd ed.). SRHE.</p>
<p>Berber Sardinha, T. (2003). Uso de corpora na formação de tradutores. Delta: documentação de estudos em lingüística teórica e aplicada, 19, 43-70.</p>
<p>Berber Sardinha, T. (2010). Como usar a linguística de corpus no ensino de língua estrangeira–por uma linguística de corpus educacional brasileira. Corpora no ensino de línguas estrangeiras, 293-348.</p>
<p>Biber, D., Conrad, S. & Cortes, V. (2004). If you look at…: Lexical bundles in university teaching and textbooks. Applied linguistics, 25(3), 371-405.</p>
<p>Bondi, M. & Sanz, R. L. (2014). Abstracts in Academic Discourse: Variation and Change (1st ed.). Peter Lang.</p>
<p>Bowker, L. (1999). Exploring the potential of corpora for raising language awareness in student translators. Language awareness, 8(3-4), 160-173.</p>
<p>Carvalho, C. T., Laranja, L. A. N. & Pinto, P. T. (2021). DIY Corpora: o que são e para quem são?. Tradterm, 37(1), 64-87. https://doi.org/10.11606/issn.2317-9511.v37p64-87</p>
<p>Chang, Y. Y. & Swales, J. M. (2014). Informal elements in English academic writing: threats or opportunities for advanced non-native speakers?. In Writing: Texts, processes and practices (pp. 
145-167). Routledge.</p>
<p>Charles, M. (2012). ‘Proper vocabulary and juicy collocations’: EAP students evaluate do-it-yourself corpus-building. English for Specific Purposes, 31(2), 93-102.</p>
<p>Cobb, T. (n.d.). Range for texts v.3 [computer program]. Retrieved 19 November 2021.</p>
<p>Flowerdew, L. (2010). Using corpora for writing instruction. In A. O’Keeffe & M. McCarthy (Eds.), The Routledge handbook of corpus linguistics (pp. 444-457).</p>
<p>Frankenberg-Garcia, A., Bocorny, A. E. P., Tavares-Pinto, P. & Sarmento, S. (2019). Supporting the Internationalization of Brazilian Research. Workshops delivered at the Federal University of Rio Grande do Sul and at São Paulo State University, Porto Alegre and São José do Rio Preto, April-June 2019.</p>
<p>Frankenberg-Garcia, A. (2020). Combining user needs, lexicographic data and digital writing environments. Language Teaching, 53(1), 29-43.</p>
<p>Granger, S. & Paquot, M. (2008). Disentangling the phraseological web. Phraseology: An interdisciplinary perspective, 27-49.</p>
<p>Gray, B. (2015). On the complexity of academic writing: Disciplinary variation and structural complexity. In V. Cortes & E. Csomay (Eds.), Corpus-based Research in Applied Linguistics: Studies in Honor of Doug Biber (1st ed., pp. 49–78). John Benjamins Publishing Company.</p>
<p>Howarth, P. A. (2013). Phraseology in English academic writing. Max Niemeyer Verlag.</p>
<p>Hyland, K. (2004). Disciplinary discourses: Social interactions in academic writing. University of Michigan Press.</p>
<p>Hyland, K. (2008). As can be seen: Lexical bundles and disciplinary variation. English for Specific Purposes, 27(1), 4–21.</p>
<p>Hyland, K. (2012). Disciplinary Differences: Language Variation in Academic Discourses. In K. Hyland & M. 
Bondi (Eds.), Academic Discourse Across Disciplines (1st ed., pp. 17–45). Peter Lang.</p>
<p>Hyland, K. (2014). Disciplinary discourses: Writer stance in research articles. In H. Candlin & K. Hyland (Eds.), Writing: Texts, Processes and Practices (pp. 99-121). Routledge.</p>
<p>Hurtado Albir, A. (2001). Traducción y traductología. Introducción a la traductología. Cátedra.</p>
<p>Johns, T. F. (1991). Should You Be Persuaded: Two Examples of Data-Driven Learning Materials. English Language Research Journal, No. 4, 1-16.</p>
<p>Karpenko-Seccombe, T. (2020). Academic writing with corpora: A resource book for data-driven learning. Routledge.</p>
<p>Kilgarriff, A., Baisa, V., Bušta, J., Jakubíček, M., Kovář, V., Michelfeit, J. & Suchomel, V. (2014). The Sketch Engine: ten years on. Lexicography, 1(1), 7-36.</p>
<p>Lee, D. & Swales, J. (2006). A corpus-based EAP course for NNS doctoral students: Moving from available specialized corpora to self-compiled corpora. English for specific purposes, 25(1), 56-75.</p>
<p>Maia, B. (2000). Making corpora – a learning process. In S. Bernardini & F. Zanettin (Eds.), I corpora nella didattica della traduzione. Bologna: CLUEB. 47-6.</p>
<p>Maia, B. (2002). Do-it-yourself, disposable, specialised mini corpora–where next? Reflections on teaching translation and terminology through corpora. Cadernos de Tradução, 1(9), 221-235.</p>
<p>McCombes, S. (2019). How to write a research methodology. Accessed on November 9th, 2020.</p>
<p>Morley, J. (2014). Academic phrasebank. Manchester: University of Manchester.</p>
<p>Pearson, J. (1996). Electronic texts and concordances in the translation classroom. TEANGA: The Irish Yearbook of Applied Linguistics, 16, 85-95.</p>
<p>Pinto, P. 
T., de Camargo, D. C., Serpa, T. & da Silva, L. F. (2021). Analysing the behaviour of academic collocations in a corpus of research-papers: a data-driven study/Analisando o comportamento de colocações acadêmicas em um corpus de artigos científicos: um estudo dirigido por dados. Revista de Estudos da Linguagem, 29(2), 1229-1252.</p>
<p>Römer, U., Cortes, V. & Friginal, E. (2020). Advances in corpus-based research on academic writing: Effects on discipline, register, and writer expertise (1st ed.). John Benjamins Publishing Company.</p>
<p>Schmitt, N. & Carter, R. (2004). Formulaic sequences in action. Formulaic sequences: Acquisition, processing and use, 1-22.</p>
<p>Sinclair, J. M. (1991). Corpus, concordance, collocation. Oxford: Oxford.</p>
<p>Siyanova-Chanturia, A. (2015). On the ‘holistic’ nature of formulaic language. Corpus Linguistics and Linguistic Theory, 11(2), 285-301.</p>
<p>Swales, J. M. & Feak, C. B. (2004). Academic writing for graduate students: Essential tasks and skills (Vol. 1). Ann Arbor, MI: University of Michigan Press.</p>
<p>Swales, J. M. & Feak, C. B. (2009). Abstracts and the writing of abstracts (Vol. 2). University of Michigan Press ELT.</p>
<p>Tavares-Pinto, P., Rees, G. & Frankenberg-Garcia, A. (2021). Identifying collocation issues in English L2 research article writing. In M. Charles & A. Frankenberg-Garcia (Eds.), Corpora in ESP/EAP Writing Instruction: Preparation, Exploitation, Analysis (1st ed., pp. 1-20). London: Routledge.</p>
<p>Varantola, K. (2003). Translators and disposable corpora. Corpora in translator education, 55-70.</p>
<p>Wray, A. (2002). Formulaic language and the lexicon. Cambridge University Press.</p>
<p>Wray, A. & Perkins, M. R. (2000). The functions of formulaic language: An integrated model. Language & Communication, 20(1), 1-28.</p>
<p>Zanettin, F., Bernardini, S. 
& Stewart, D. (Eds.). (2003). Corpora in translator education. Manchester: St. Jerome.</p>
<p>Creating a local learner corpus: Insights on project design and data analysis from the pilot phase</p>
<p>Sandra Zappa-Hollman (UBC-CA)</p><p>Alfredo Afonso Ferreira (UBC-CA)</p><p>Greta Perris (UBC-CA)</p><p>Simone Sarmento (UFRGS)</p><p>Marine Laísa Matte (UFRGS)</p><p>Laura Baumvol (UBC-CA)</p>
<p>Introduction</p>
<p>A learner corpus (LC) is a principled collection of texts produced by additional language learners. These texts are collected and systematically organized electronically to allow for a range of teaching and research applications. Learner corpora have been typically created by academics or publishers for so-called “delayed pedagogical use” (i.e., not necessarily for the immediate benefits of those students sharing their writing samples), as well as for research purposes; that is, for contributing to theorization in additional language acquisition and applied linguistics through identification of patterns in learner</p>
<p>range of contents through the medium of the English language, rather than teaching the language itself. This phenomenon is rapidly gaining ground on a global scale and is closely linked to the internationalization and globalization of higher education institutions. This study focuses on EMI practices in Brazil, using data collected through a large-scale questionnaire sent to higher education teachers across all regions and states of the country. The authors investigate whether EMI occurs in the eight different fields of knowledge as classified by Brazilian funding agencies and examine teachers’ perceptions of the benefits, or lack thereof, of classes taught in English. 
The findings of the study indicate that EMI is more widely accepted in the “harder” sciences, such as biological sciences, health sciences, agricultural sciences, and STEM. On the other hand, the fields of the “softer” sciences, including human sciences and linguistics, literature, and arts, appear to be more cautious in adopting EMI in their practices.</p>
<p>We hope all these voices can reverberate, so that new avenues of research and teaching arise and foster dialogue around EAP!</p>
<p>Acknowledgements</p>
<p>This book has been supported by the Graduate Program in Linguistics and Literature at the Federal University of Rio Grande do Sul and CAPES’ PROEX. Simone Sarmento holds a CNPq research productivity scholarship level 1D.</p>
<p>References</p>
<p>Biber, D. (2006). University language: A corpus-based study of spoken and written registers. John Benjamins Publishing.</p>
<p>Hyland, K. (2009). Academic discourse: English in a global context. A&C Black.</p>
<p>Macaro, E. (2017). English medium instruction: Global views and countries in focus: Introduction to the symposium held at the Department of Education, University of Oxford on Wednesday 4 November 2015. Language Teaching 1–18. doi:10.1017/S0261444816000380.</p>
<p>Marengo, L. H. F. (2022). The role of English language proficiency in Brazilian EMI practices. [Unpublished master’s thesis]. Federal University of Rio Grande do Sul.</p>
<p>Nesi, H. (2016). Corpus studies in EAP. In K. Hyland & P. Shaw (Eds.), The Routledge Handbook of English for Academic Purposes (pp. 206-217). Routledge.</p>
<p>The role of Corpus Linguistics in EAP</p>
<p>Deise P. 
Dutra (UFMG)</p><p>Tony Berber Sardinha (PUC-SP)</p>
<p>Introduction</p>
<p>Since the 1960s a considerable portion of research about English has been intimately connected with the teaching and learning of English for specific purposes (ESP), a branch of applied linguistics which has evolved, especially from 1990 to 2020, to become “a mature discipline of global importance” (Hyland & Jiang, 2022: 23). For instance, such research has helped teachers and material designers by providing word frequency lists that can support class preparation and textbook writing (e.g., General Service List [GSL] by West, 1953; Academic Word List [AWL] by Coxhead, 2000).</p>
<p>ESP comprises several strands, including, among others, business English, aviation English, English for medical purposes, and English for academic purposes (EAP), which is the focus of this book. Unsurprisingly, studies from a corpus linguistics (CL) perspective have informed EAP practices, providing detailed descriptions of academic speech and writing “from lexical, phraseological, grammatical, and genre perspectives” (Nesi, 2016: 206).</p>
<p>Whether corpora are the backbone of teaching syllabi and reference materials, such as dictionaries (Sinclair, 1987<sup>1</sup>), grammar books (Biber et al., 1999; Carter & McCarthy, 2006), and textbooks (McCarthy et al., 2014), or are used by teachers and students (Johns, 1991; Crosthwaite et al., 2021), these perspectives lead us to reflect on CL’s pedagogical implications for language teaching and learning, especially for EAP.</p>
<p>1 The Collins COBUILD English Language Dictionary was the first corpus-based dictionary.</p>
<p>Römer (2010) views the pedagogical application of corpora as either indirect, when researchers and materials developers use corpora, or direct, when teachers and students are able to get their hands on corpus data. 
Researchers and material designers deal with corpus results when writing syllabi, textbooks, and reference materials, among other materials. They are the ones who deal with the data from the corpora and filter the relevant information for the audience and teaching context of the proposed material. Therefore, the pedagogical applications are indirect. Conversely, when teachers use corpora to prepare activities or have their students carry out corpus investigations, they are involved in the direct applications of CL through their teaching and learning experiences. Above all, when EAP teachers and students use corpus tools or have access to corpus-based materials, they have access to real language: “[T]he methodological paradigm of corpus research has a direct influence on what is regarded as reliable knowledge sources. Corpus investigations give primacy to data, that is, they prioritize empirical analyses of language use” (Viana & O’Boyle, 2022: 52).</p>
<p>In this chapter, we discuss how corpus studies relate to EAP, showing how they have impacted this area in different ways. We first review the major literature on corpus-based research into EAP vocabulary. Second, we focus on corpus-based research on grammatical complexity and how it has affected and could better contribute to EAP. Finally, we discuss how multidimensional analysis (Biber, 1988) approaches to EAP can provide insights into the underlying patterns of lexico-grammatical characteristics found in academic texts, discussing how these patterns can reveal striking differences across academic registers,<sup>2</sup> some of which have been ignored in the field.</p>
<p>2 “… a register is a variety associated with a particular situation of use (including particular communicative purposes). 
The description of a register covers three major</p><p>components: the situational context, the linguistic features, and the functional rela-</p><p>tionships between the first two components” (Biber & Conrad, 2009: 6).</p><p>16</p><p>Vocabulary through the lenses of CL: From lists of individual words to</p><p>phraseological patterns</p><p>Corpus-based research may be motivated by teaching and/or learn-</p><p>ing issues. One of the areas with a direct connection to pedagogical impli-</p><p>cations (e.g., syllabus preparation, materials design and classroom tasks) is</p><p>vocabulary, making corpus-generated frequency lists a valuable contribu-</p><p>tion to EAP. In this section, we concentrate on how word lists have evolved</p><p>from general English to academic general English to better cater to EAP</p><p>learners’ needs. The aim is to relate CL contribution to the presented lists</p><p>without exhaustively reviewing all corpus-generated vocabulary to date.</p><p>Distinctions will be made between contributions that focus on individual</p><p>vocabulary and on a phraseological perspective for list compilation.</p><p>Since West (1953 as cited in Coxhead, 2000) developed the GSL,</p><p>a corpus-based 2,000-word family list for English as a Second Language</p><p>(ESL)/ English as a Foreign Language (EFL) learners, it has been widely</p><p>used by English language teachers. The GSL was compiled to support the</p><p>teaching and learning of general English while being used as a reference</p><p>for other lists, including the new AWL3 (Coxhead, 2000). Following “the</p><p>assumption that frequency and coverage are important criteria for select-</p><p>ing vocabulary” (Coxhead, 2000: 215), Coxhead considered these compi-</p><p>lation criteria: representativeness (Biber, 1993), organization (subregisters’</p><p>distribution across subject areas), corpus size (Sinclair, 1991), and word</p><p>selection. 
To support EAP programs and students, the AWL was based</p><p>on an academic register corpus with 28 subject areas distributed in four</p><p>disciplines: arts, commerce, law, and science. The academic subregisters</p><p>covered in Coxhead’s academic corpus were articles, book chapters, course</p><p>3 Other academic lists were made available for teachers, students,</p><p>language. More recently, however, a growing number</p><p>of LCs have been created locally by researcher-practitioners for “immedi-</p><p>ate pedagogical use” in their specific institutional contexts (Granger, 2009,</p><p>2015), leading to data-driven enhancements in curriculum development,</p><p>teaching, and learning.</p><p>The LC project we report on here was designed to systematically col-</p><p>lect and access large samples of our students’ writing for relatively imme-</p><p>diate pedagogical application. Over time, this resource is meant to better</p><p>track writing development within and across student cohorts and identify</p><p>patterns of variation at larger scales such as across disciplines, language</p><p>background of learners, and instructional programs. This scope of interest</p><p>128</p><p>across teaching and research is indicative of the close relationship between</p><p>them in data-based learning. In addition to helping us systematize access</p><p>to student texts for research purposes, our LC is also designed to inform</p><p>curriculum development and instructional practice.</p><p>When we embarked on this project, our team represented a range</p><p>of expertise and background knowledge that enabled us to envision the</p><p>overall objectives and structure of our LC. Yet it was evident early on that</p><p>creating a successful local LC would require effort and a steep learning</p><p>curve. This chapter reports on some of the key choices we made as we de-</p><p>signed and implemented the pilot phase of the LC project. 
We then describe some of the challenges</p><p>we had to overcome and the considerations we weighed in relation to</p><p>technological and logistical aspects. To illustrate the potential</p><p>benefits to teaching and research in our context of even a small dataset from</p><p>the pilot phase of the project, we also present the results of an analysis of</p><p>comparative discourse in student expository writing. We close the chapter</p><p>with reflections synthesizing what we have learned from the pilot phase</p><p>and outline our next steps.</p><p>The VanCor Project</p><p>The Vantage College Corpus of Student Texts Across Disciplines</p><p>(henceforth, VanCor) project aims to create a systematic and searchable</p><p>online repository of student written assignments. VanCor is conceived as a</p><p>resource that gives faculty at Vantage College (VC) at the University of British</p><p>Columbia (UBC) easy access to written assignments that students</p><p>engage in across a range of disciplines in first-year programs. VanCor has the</p><p>potential to be relevant for research, data-driven curriculum development,</p><p>instructional materials development, and program evaluation purposes.</p><p>Institutional Context</p><p>Launched in 2014, VC is a unit at UBC that offers first-year</p><p>programming for international English as an additional language (EAL) speakers</p><p>whose proficiency is slightly below the university’s English language</p><p>admission standards for direct entry. At the time of data collection, three program</p><p>options were available: first-year Bachelor of Arts, first-year Bachelor of</p><p>Engineering, and first-year Bachelor of Science. 
Program faculty include a</p><p>team of English for Academic Purposes (EAP) instructors who work with</p><p>disciplinary faculty seconded to VC from their respective departments in</p><p>Arts, Engineering, and Science.</p><p>VC offers instructional programming tailored to support students’</p><p>transition into the second year of their bachelor’s degree at UBC. VC</p><p>programs are characterized by a cohort-based model and standard timetables,</p><p>providing a coordinated curriculum that includes content-focused and</p><p>language-focused<sup>1</sup> credit-bearing courses. Thus, alongside their</p><p>program-specific courses, students receive general EAP and discipline-specific English</p><p>instruction. After successfully finishing their first year at VC, students</p><p>continue as second-year students in their respective faculties. The program</p><p>expands the usual two academic terms of first year to three academic terms,</p><p>totaling 11 months of instruction. This time extension accommodates the</p><p>required disciplinary courses in the respective programs of study as well as</p><p>VC-specific programming aimed at scaffolding students’ linguistic,</p><p>cognitive, and skills development as apprentice multilingual scholars.</p><p>The custom-designed programming includes an introductory</p><p>research methods course with an application component that engages</p><p>students in a small-group research project they eventually present at an annual</p><p><sup>1</sup> VC uses an integrated language and content approach which views the learning</p><p>of language and subject area knowledge as inseparable and mutually constitutive. We</p><p>use “content-focused” and “language-focused” as shorthand for what are otherwise</p><p>referred to in the literature as “subject, or disciplinary” courses versus</p><p>“language” courses. Yet we view both types of courses as involving both content and</p><p>language. 
To try and foreground this relationship between language and content, we</p><p>classify these as courses that place an emphasis or focus in either of the two, based on</p><p>what most course learning outcomes stipulate.</p><p>130</p><p>student-led capstone conference. To explicitly support their academic (gen-</p><p>eral as well as discipline-specific) language and literacy, students complete</p><p>academic English courses informed by Systemic Functional Linguistics as</p><p>well as have access to on-demand academic English support via writing</p><p>consultations.2</p><p>These multiple, relatively uncommon, aspects of the programs at VC</p><p>make it an attractive context for researching learners’ language character-</p><p>istics, use, and development. In what follows, we recount the genesis of the</p><p>international collaboration that led to the VanCor project.</p><p>International Collaboration: A Brief History</p><p>The VanCor project brings together researchers and educators from</p><p>UBC in Canada, and the Federal University of Rio Grande do Sul (UFRGS)</p><p>in Brazil. The genesis of this project was in late 2019, over conversations</p><p>amongst Simone, Alfredo, Laura, and Sandra, about ways to collaborate</p><p>around a project of mutual interest. Since one of the mandates of VC is to</p><p>serve as a living lab for pedagogical and research innovation, designing a</p><p>research project with the goal of supporting activities such as curriculum</p><p>development and design of student tasks seemed most fit and appealing.</p><p>Given Simone’s expertise in LC development and the desire from VC mem-</p><p>bers to create an institutional learner corpus, our group decided to embark</p><p>on a project, seeing the potential benefits of the international collaboration.</p><p>By early 2020, we had obtained competitive funding via a Social Sciences</p><p>and Humanities Research Council (SSHRC) institutional grant. 
This</p><p>funding supported the hiring of our two graduate research assistants from UBC.</p><p>In what follows, we provide an overview of the project sequence and key</p><p>stages.</p><p><sup>2</sup> For further details on the Vantage program, see Zappa-Hollman & Fox (2021),</p><p>Ferreira & Zappa-Hollman (2019), and Zappa-Hollman (2018), as well as the Vantage</p><p>College website: https://vantagecollege.ubc.ca/program-overview</p><p>Project Timeline</p><p>The pilot phase involved four stages (Fig. 1).</p><p>Figure 1. VanCor project timeline</p><p>The first stage involved an extensive, updated review of the literature</p><p>on learner corpora, with a focus on the creation and uses of LCs for research</p><p>and pedagogical applications. This literature review was complemented</p><p>by consultations with experts in learner corpora in university contexts</p><p>outside of Canada as well as with UBC librarians with</p><p>expertise in data management.</p><p>The second stage involved defining the scope, objectives, procedures,</p><p>and timeline, and developing the data collection instruments. To collect the</p><p>learner texts that our project participants were willing to share with us,</p><p>we used a survey (hosted on the Qualtrics survey tool). This survey,</p><p>reproduced in Appendix A, included a section for collecting demographic</p><p>data about the participants, a second section for assignment</p><p>information and for uploading assignments (up to 15 files), and a third section</p><p>inviting participants for a debriefing interview. The interviews aimed to</p><p>gather feedback from the students about their experience participating in</p><p>the study (i.e., completing the survey), potentially offering deeper insights</p><p>about the process of writing their assignments. 
To complete the survey,</p><p>participants had to first provide their informed consent via a form included</p><p>at the start of the survey. The survey also included the request for student</p><p>consent to the collection and use of their data. At this stage we also applied</p><p>to the institutional ethics board for approval to conduct this pilot study.</p><p>The third stage involved participant recruitment and data collection.</p><p>This stage spanned six months and took place virtually<sup>3</sup> in two courses</p><p>taught by two instructors who are also members of this project team. In</p><p>late November 2020 (end of our Fall term), we recruited participants in</p><p>one section<sup>4</sup> of an academic writing course taught by Laura Baumvol in</p><p>the Arts program and collected texts from this class until January 2021. At</p><p>the start of the Winter term, we recruited participants from two sections</p><p>of an adjunct course taught by Alfredo Ferreira that links EAP instruction</p><p>to courses in the Science program. The recruitment was carried out by the</p><p>two graduate research assistants during a 15-minute class visit to a</p><p>synchronous session when the instructors were not present. During this visit,</p><p>the students were introduced to the project through a 5-minute video with</p><p>an overview of the project goals and a description of what participating</p><p>involved. This was followed by some Q&A time in case prospective</p><p>participants had any queries. After the class visit, a link and QR code to the</p><p>survey were posted as an announcement on the course learning management</p><p>system sites.</p><p>In total, we collected nine assignments and two sets of instructions,</p><p>and conducted two interviews; these took place once the</p><p>final grades for their respective classes had already been awarded. Following</p><p>data collection, the fourth stage involved data preparation and data</p><p>analysis. 
To protect the identity of participants and systematize the process of</p><p>data management, we assigned unique identifiers to each text and set of</p><p>instructions, and removed all personal identifying information prior to starting</p><p>data analysis. Next, we used a metadata coding sheet to describe the</p><p>relevant context and genre of each text. We developed our text metadata</p><p>coding sheet partly based on a similar resource from Graves and Hyland</p><p>(2017) with some adaptations for our context and project purposes. The</p><p>coding sheet can be found in Appendix B<sup>5</sup>. For this classification, we</p><p>draw on Systemic Functional Linguistic theory. Section 5 includes an</p><p>illustration of the analysis of corpus data for use in research and instruction.</p><p><sup>3</sup> Since our project was carried out during the Covid-19 pandemic, all research</p><p>activities, including recruitment and data collection, were carried out online.</p><p><sup>4</sup> Each course section has a maximum student registration of 25.</p><p>Key Reference Literature</p><p>As mentioned above, we consulted canonical texts on learner</p><p>corpora (Granger, 2002, 2009, 2015; Gardner & Nesi, 2013; Römer & O’Donnell,</p><p>2011) to gain insights on the types of data to collect and the steps and sequencing to</p><p>follow, as well as tips to avoid common pitfalls and minimize challenges</p><p>in data retrieval and analysis. 
Recent articles focusing on the process of</p><p>designing and implementing an LC helped us learn from insights the</p><p>authors gained through trial and error.</p><p>For instance, Granger et al.’s (2020) International Corpus of Learner</p><p>English (ICLE)<sup>6</sup>, which is composed of texts written by upper-intermediate</p><p>and advanced learners from 25 different language backgrounds, offers an</p><p>excellent model for gathering metadata on the texts that allow for an in-depth</p><p>view of both the learners and the tasks.</p><p>Some projects have expanded their scope to provide additional types</p><p>of resources to assist with writing research, support instructors’</p><p>professional development, and train those intending to design and use an LC.</p><p>Two such corpora we found impressive in this regard are the Multilingual</p><p>Academic Corpus of Assignments: Writing & Speech (MACAWS) and the</p><p>Corpus & Repository of Writing (CROW), both with Dr. Shelley Staples as</p><p>a lead investigator. MACAWS (Staples et al., 2019) is a corpus, still being built,</p><p>of assignments written by students enrolled in language programs</p><p>at the University of Arizona. CROW (Staples & Dilger, 2018) contains texts</p><p>that L1 and L2 first-year undergraduate students write in their composition</p><p>classes at three universities in the US. Access to these resources is available</p><p>by requesting registration on their customized websites. 
Once registered</p><p>with the MACAWS website, for example, we were able to access a</p><p>repository of pedagogical materials associated with the assignments, such as syllabi,</p><p>assignment sheets, lesson plans, and instructional materials; and language</p><p>learning activities in Portuguese and Russian designed on the basis of the</p><p>language patterns that emerge from the corpus. In CROW, we also accessed</p><p>demographic data and a repository of resources intended to help others</p><p>like us with the design and use of LCs. The resources shared in these two</p><p>projects guided our decisions about several aspects of our own project. For</p><p>example, the demographic information helped us further refine the kind of</p><p>metadata we would collect. The corpus helped us reflect on whether</p><p>collecting drafts of our students’ final writing was useful for our project. The</p><p>pedagogical resources provided several suggestions for ways in which to</p><p>involve other VC instructors in the next phases of our project.</p><p><sup>5</sup> This genre classification system will be revised as we collect more texts from</p><p>different genres.</p><p><sup>6</sup> https://uclouvain.be/en/research-institutes/ilc/cecl/icle.html</p><p>Another corpus that informed our pilot is the Civil Engineering</p><p>Writing Project (Conrad, 2017)<sup>7</sup>, led by Susan Conrad at Portland State</p><p>University. The corpus includes student and practitioner writing in the field</p><p>of engineering, as well as an impressive collection of open-source</p><p>instructional materials for use by course instructors and students in self-study.</p><p>The studies that emerged from the project (reviewed below) relate to genre</p><p>and linguistic analysis, grammar and mechanics errors, as well as holistic</p><p>evaluations of writing effectiveness. 
The genre classification in this corpus</p><p>helped us reflect on the way we would approach genres in our own texts.</p><p>We also drew from “Writing assignments across five academic</p><p>programs” by Graves (2017), a chapter in an edited book by Canadian</p><p>researchers who created a corpus of undergraduate student assignments. From this</p><p>resource we used the writing assignments coding guide, which served as</p><p>the basis for our coding guide. This coding sheet is used to record standard</p><p>information (e.g., genre, length of text, topic, grade) collected about</p><p>each text, which is then entered into the web-based application that</p><p>compiles the records and creates reports. In addition to guiding our coding</p><p>procedures, the process of adapting the coding sheet became an opportunity for</p><p>our team to revisit and adjust as needed the goals and scope of the project.</p><p><sup>7</sup> http://www.cewriting.org/</p><p>Alongside canvassing websites and scholarly publications, we also</p><p>reached out to a number of the scholars who led those works,</p><p>primarily with questions related to preferred communication and collaboration</p><p>practices within their research teams, and questions related to data</p><p>collection and management.<sup>8</sup> The guidance that was generously offered extended</p><p>well beyond these questions. The scholars candidly shared their</p><p>experiences of learner corpora development</p><p>(e.g., MACAWS, CROW) and lessons</p><p>learned along the way. They highlighted important yet often overlooked</p><p>aspects of corpus development, such as steps to ensure that a sufficient</p><p>number of texts is collected, and advised starting small and staying focused on</p><p>the scope, which may involve starting with a smaller project before scaling</p><p>it up. 
Based on these insights, we adjusted our timeline for collecting the</p><p>texts, and decided that in the scaled-up version of our project we will not</p><p>provide monetary incentives for participation (as these can prove</p><p>challenging to distribute and add significant cost to the project). These</p><p>projects also provided access to a wealth of resources encompassing the</p><p>lifecycle of a corpus-building project, from detailed information on</p><p>developing the backend of the corpus (such as the database structure, automated</p><p>tools, indexing, and text-processing tools) to illustrations of how corpora can</p><p>be used to create relevant pedagogical materials.</p><p>Drawing on the LC community of practice helped us reflect on our</p><p>research questions and practices with experts in the field, make informed</p><p>choices that strengthened our project, and refine the project</p><p>and move forward with heightened confidence. We also consulted experts</p><p>in digital scholarship through workshops that provided crucial training on</p><p>the choice of digital tools available for project management, data collection</p><p>and storage, and the dissemination of project outcomes. These training</p><p>sessions also introduced us to institutional norms and best practices</p><p>pertaining to handling sensitive research data (e.g., the institutional requirement to</p><p>store data on Canadian servers).</p><p><sup>8</sup> We are extremely grateful for the generosity of our colleagues from the Corpus</p><p>Linguistics field who have kindly shared their knowledge with our team.</p><p>Analysis of selected data from pilot project</p><p>Insights into student writing based on quantitative analysis of a large</p><p>sample are a key goal of local learner corpora collected for immediate</p><p>pedagogical purposes. 
While a corpus that is sufficiently large for quantita-</p><p>tive analysis would have been a welcome outcome of our pilot project, as</p><p>mentioned earlier, the first aim of the project was to test approaches to</p><p>data collection in our context. Having described the steps we followed in</p><p>designing, collecting, and storing the data for our project, we now focus on</p><p>a small subset of four texts collected from Science students to illustrate how</p><p>even such a small sample can inform teaching, research, and the VanCor</p><p>project in valuable ways.</p><p>The contributions to instruction of a very limited sample of student</p><p>texts from the same instructional setting can be likened to those of quali-</p><p>tative, case study analysis (Duff, 2008), the primary overlap being that the</p><p>data emerge from a specific, well-defined context. As with case study data,</p><p>such pilot corpus data suggest hypotheses about learner practices that can</p><p>be subsequently explored in a wider study, targeted for collection in larger</p><p>corpora, and, on the teaching side, can inform the development of instruc-</p><p>tional materials to test in classrooms for fit with student needs and interests.</p><p>Given the interest in forming hypotheses and developing instruc-</p><p>tional materials from the pilot data, two aspects of the data come directly</p><p>into play. First, it is important to recognize that the texts are not represen-</p><p>tative of the Science program cohort or all students in the class: these are</p><p>relatively successful texts voluntarily submitted by four high-performing</p><p>students within the top 10% of the class. 
This quality of the data points to</p><p>a weakness in the opt-in approach to the collection of student texts:</p><p>generally, high-performing students submitted writing assignments that were</p><p>also high-performing in terms of the grade received.</p><p>The second aspect of the data that informs their use for instruction is</p><p>the nature of the writing undertaken in this assignment and our focus on</p><p>pedagogical application. The students wrote comparative discussions,</p><p>approximately 1,400 words in length, across three drafts with instructor and</p><p>peer feedback. The student-writer selects two scientific theories, concepts,</p><p>or approaches in the history of science to compare in relation to a</p><p>specified criterion. This critically engaged discussion typically concludes with</p><p>claims about the different motivations for these concepts in the history of</p><p>science.<sup>9</sup></p><p>Correspondingly, instruction focused on expository genres,</p><p>specifically comparative discussions in the history of science. Comparison is a</p><p>semantic domain relevant to the discussion assignment as well as to the</p><p>first-year Physics, Chemistry, and Mathematics textbooks used by the students,</p><p>as it is in science discourse more widely. The instructor in this case, Alfredo,</p><p>observed that students were frequently challenged when using</p><p>comparative language in their reports, such as those from Chemistry labs, as well as in</p><p>longer writing assignments. This particular discourse-analytic research out</p><p>of VanCor arose from an interest in developing materials that would help</p><p>address this observed need for instructional materials in the history of</p><p>science module of the VC science Content and Language Integrated Learning</p><p>(CLIL) course. 
Students in the Vantage science stream received no other</p><p>explicit instruction in the language and functions of comparison.</p><p>Qualitative analysis of these texts following SFL theory (Halliday &</p><p>Matthiessen, 2014) led to a number of instruction and research-worthy</p><p>insights into the functions of comparison in historical expositions and</p><p>academic writing more generally. Table 1 outlines the functional range of</p><p>comparative language identified in student writing across metafunctions (i.e.,</p><p>organization, interpersonal positioning, and representation) and some</p><p>subfunctions. For background on the one more technical subfunction listed in</p><p>the table, theme, understood in SFL as the informational point of departure</p><p>for the clause, see Kang (2016). Within the function of representation, the</p><p>genre-specific distinction between focal and non-focal compared things is</p><p>explained below.</p><p><sup>9</sup> For a relevant outline of the development of disciplinary literacy practices in</p><p>history, see Coffin (1997).</p><table><tr><th>Metafunctions</th><th>Subfunction</th><th>Example of Comparative Language in Students’ History of Science Writing</th></tr><tr><td rowspan="3">Organization</td><td>Title</td><td>A Comparative Exposition of Celestial Mechanics and Quantum Electrodynamics in relation to the Description of the State of Motion. (Text 09 - Science)</td></tr><tr><td>Thesis statement; topic sentence</td><td>In the exposition that follows, phlogiston theory and oxygen theory are compared from macro and micro perspectives in studying science. (Text 10 - Science)</td></tr><tr><td>Theme</td><td>Neo-Darwinism places greater emphasis on natural selection, whereas eugenics affirms that artificial selection is required to conserve the useful features of individuals (Paul, 2013). […] These contrasts will be further discussed within the section below. 
(Text 07 - Science)</td></tr><tr><td rowspan="2">Interpersonal Positioning</td><td>Hedge: claim</td><td>A more detailed exploration of the kinematic relation between two or more objects in macro and micro perspectives is provided to consider the difference between the types of acting force. (Text 09 - Science)</td></tr><tr><td>Hedge: disciplinary category</td><td>“better adapted individuals” can be described as a group of organisms with higher reproductivity which enables their “more useful” genetic characteristics to pass onto their offspring and onto the future generations, whereas “less adapted individuals” are less likely to survive (Abbey & Abalaka, 2011). (Text 07 - Science)</td></tr><tr><td rowspan="2">Ideational</td><td>Comparison of focal things</td><td>Comparing the symmetrical aspect of nature has the possibilities to predict the existence of unknown materials or phenomena in the universe (Capra, 1975). (Text 08 - Science)</td></tr><tr><td>Comparison of non-focal things</td><td><p>A more detailed exploration of the kinematic relation between two or more objects in macro and micro perspectives is provided to consider the difference between the types of acting force. (Text 09 - Science)</p><p>Both observation and experiments are indispensable in studying science, making science more rigorous and accurate (Ainsworth et al., 1991). (Text 10 - Science)</p></td></tr></table><p>Table 1. The functional scope of comparative language in high-performing</p><p>expositions in History of Science by first-year science students.</p><p>These data highlight several important features of comparative</p><p>language that can help develop hypotheses about this area of discourse for use</p><p>in teaching and research. The main finding is that comparative language</p><p>realizes all three main metafunctions and various subfunctions. 
An</p><p>interesting example of this lies within the function of representation (technically</p><p>in SFL, the experiential function), which indicates two levels of focus when</p><p>analyzing genres that explicitly set out to compare things: the comparison</p><p>of two or more things in focus in the comparative text, and the comparison</p><p>of everything else for purposes such as organizing ideas for the readers, that</p><p>is, the comparison of non-focal things. The latter function arises, for</p><p>example, in comparing relative degrees of information detail across the text,</p><p>where the writer signposts “a more detailed exploration of… is provided”.</p><p>This finding indicates that comparison is both a defining feature of some</p><p>genres and a more broadly functional resource in academic discourse.</p><p>These corpus data also help qualify the comparative exposition as a</p><p>useful genre for understanding the development of student writing. This</p><p>claim is based on the wide functional scope of comparative language, which spans</p><p>Field (realized in the lexicogrammatical choices for representing ideas),</p><p>Tenor (interpersonal positioning), and Mode (textual organization) (on</p><p>these register variables, see Halliday & Matthiessen, 2014). Such a map</p><p>helps us to chart trajectories of development of language and academic</p><p>writing within and across functions by focusing on comparative language.</p><p>In this way, the data also lend validity to the assignment in relation to the</p><p>course learning objectives, which aspire to development across the three</p><p>metafunctions.</p><p>In relation to language and writing development, it is worth noting</p><p>the potential of extending this map. This opportunity has arisen in the</p><p>transcript of a visiting lecture (which led the history of science module in</p><p>the course) by an established historian of science<sup>10</sup>. 
The lecture includes</p><p>instances of comparative language used for engagement (superlative/hy-</p><p>perbole used to bait readers into the counter-argument before arguing</p><p>against it) and politeness through reverse polarity (reference to an unreli-</p><p>able academic source as “not the most impartial judge”). The extension of</p><p>the semantic potential of comparative language suggested by the practices</p><p>10 The transcript referred to here comes from a lecture on Ancient Greek protosci-</p><p>ence delivered by Dr. Sylvia Berryman, Philosophy Professor at UBC.</p><p>140</p><p>of a more mature scholar shows how comparative language can realize in-</p><p>creasingly fine-grained functions in accordance with disciplinary and lin-</p><p>guistic development, illustrating Halliday’s (1993) conception of language</p><p>development as increasing one’s registerial repertoire or capacity to mean</p><p>across situated contexts; for discussion in advanced language development,</p><p>see Matthiessen (2006). These insights indicate potential directions for re-</p><p>searching language and writing development in this context.</p><p>Moving to a lexicogrammatical view of comparison in History of</p><p>Science arguments, an analysis of the comparative lexis from the history</p><p>texts in the pilot corpus yielded the results shown in Table 2 below. The</p><p>word lists on the right-side columns are classified by grammatical and se-</p><p>mantic/functional units, subunits, and whether the words instantiate the</p><p>semantic domains of similarity or difference. 
The ordering of grammatical units from nominal group (noun phrase in traditional grammar) at the top of the table down to verb, adverbial, and conjunction at the bottom is motivated by the degrees of information density afforded by these units (i.e., from most abstract and/or general to most concrete) per the concept of ideational grammatical metaphor (Ferreira, 2020; Halliday, 1998). The word lists in each of the subunit categories are ordered from most to least frequently occurring, with the number of tokens listed in the right-hand column.</p><p>Columns: Grammatical/Functional Unit; Subunit; Instances in Pilot Corpus (Similarity, Difference); #</p><p>1 Nominal Group / Participant</p><p>Head Noun / Thing</p><p>comparison/s 8</p><p>similarities 8</p><p>difference/s 7</p><p>contrast/s 2</p><p>alignment 1</p><p>opposite 1</p><p>superiority 1</p><p>Adjective & premodifier/post-pointer, describer</p><p>different 8</p><p>opposite 7</p><p>better 3</p><p>greater 3</p><p>both better adapted 2</p><p>comparative broader 2</p><p>higher 2</p><p>more likely 2</p><p>similar deeper 1</p><p>corresponding less adapted 1</p><p>less likely 1</p><p>more appealing 1</p><p>same more like 1</p><p>more predictable 1</p><p>more regular 1</p><p>more useful 1</p><p>opposing 1</p><p>proportional 1</p><p>superior 1</p><p>2 Verb / Process</p><p>Relational process</p><p>overlap 1</p><p>share 1</p><p>Material & other process</p><p>compare/d 9</p><p>comparing contrasts 3</p><p>distinguished from 2</p><p>correlated with 2</p><p>3 Adverbial / Circumstance</p><p>also by contrast 1</p><p>like in contrast with 1</p><p>similarly more frequently 1</p><p>4 Conjunction / Relator</p><p>as while 4</p><p>as whereas 2</p><p>however 1</p><p>rather than 1</p><p>Table 2. 
Comparative lexis by grammar and function in four high-performing History of Science expositions in 1st-year EAP</p><p>As can be seen, the tokens of comparative language cluster predominantly in the nominal group (e.g. “These contrasts”; “A more detailed exploration of the kinematic relation between two or more objects in macro and micro perspectives”). This result can be understood to reflect the relatively high functional load of the nominal group in academic writing, especially with regard to the specification of concepts and foci that is associated with disciplinary writing development in university (Duff et al., 2015). Unsurprisingly, abstract concepts involving comparison are central to texts and genres that set out to compare historical theories in science.</p><p>The finding of high frequency of comparative language in nominal groups relative to its use in more dynamic processes (verbs), circumstances (adverbs) and logical reasoning (conjunctions) points to a need for additional attention to this role of comparative language in construing abstract concepts in writing instruction. A cursory examination of two popular EAP writing textbooks, both fourth editions (Oshima & Hogue, 2005; Blanchard & Root, 2017), highlights a potential emphasis on the latter dynamic, syntactically more complex and material meanings, while the more frequent realizations of abstract concepts involving nominal groups receive little explicit attention. 
The “comparison signal words” recommended as useful for comparative writing in one of these textbooks, shown in Table 3, illustrate this tendency:</p><p>Comparison Signal Words</p><p>Transition Words and Phrases: similarly; likewise; also; too</p><p>Subordinators: as; just as</p><p>Coordinators: and; both… and; not only… but also; neither… nor</p><p>Others: like (+noun); just like (+noun); similar to (+noun); (be) like; (be) similar (to); (be) the same as; (be) the same; (be) alike; (be) similar; to compare (to/with)</p><p>Table 3. Words and phrases used in comparisons recommended by a popular EAP writing textbook (Oshima & Hogue, 2005: 116-117)</p><p>According to this edition of the textbook, students should focus their attention on realizations of comparison for these functions of logical ordering and transition, with minimal attention given to elements of the nominal group (noting that the “+noun” elements under “Others” do not themselves realize a comparative meaning). Such an emphasis does not align with the functional distribution of comparative language in the sub-corpus of high-performing texts in the History of Science.</p><p>These results suggest potentially useful insights for research and instruction. 
We have found that the semantic scope of comparison encompasses a wide functional range of language: ideational, interpersonal, and organizational meanings, and various sub-functions of these such as evaluation, affect and multiple scales of text organization including signposting through topic sentences and various cohesive devices.</p><p>Given the wide functional scope and grammatical realizations of comparative language in the comparative exposition genre, a relatively holistic perspective on language and writing development in EAP contexts can be operationalized by focusing on comparative language in this genre. The same results suggest various corpus-based approaches and tasks for instructional curricula involving comparative and related genres of academic writing. In these and other ways, the focus on a few student texts within a relatively specific written genre has yielded useful insights to apply to teaching, research, and the next phases of the VanCor project.</p><p>Current status and next steps</p><p>Through our collaboration on the VanCor project, the team has taken the first steps in designing, compiling, storing, and applying learner corpora: reviewing the literature, consulting with experts, piloting the various sub-tasks involved in data collection, and analyzing the results. These experiences in the pilot phase of the project will inform the next phase of the project.</p><p>Our efforts to disseminate our ideas and experiences range from the local to the global. We introduced our LC project and preliminary findings to our VC program colleagues with the aim of generating interest in collaborating on the larger scale of the project through realizing its potential for curriculum development, instruction, and research. 
Additionally, we have engaged in dissemination efforts, which include presentations at professional organization annual conferences¹¹, with the intent of sharing the insights gained from our pilot and our preliminary findings.</p><p>As for the next steps in VanCor itself, we plan to implement the project by inviting all VC instructors as collaborators and thus expand the nature of the student texts included in the LC. A higher number of instructor collaborators across all VC programs will allow us to collect texts from, ideally, all courses included in first year programs. This scope of text types will result in a diversity of genres across several disciplinary fields, expanding the potential contributions of the corpus to research.</p><p>Acknowledgments</p><p>The UBC graduate research assistant positions for this project, filled by Greta Perris and Sara van Dan Acker, were supported by an institutionally administered Social Sciences and Humanities Research Council (SSHRC) grant (UBC Explore SSHRC grant), for which we are very grateful. Simone Sarmento would like to acknowledge the support of CNPq Productivity Grant and CAPES Print. Our gratitude also goes to the students who participated in this pilot phase of our project, generously sharing samples of their writing. We would also like to thank Brian Wilson, Curriculum Manager at VC, for his feedback and advice on survey design; Dr. Shelley Staples, CROW and MACAWS project leader, and two of her team members, Dr. Bruna Sommer-Farias and Dr. Nina Conrad, for taking the time to meet with us, giving us access to their project materials, and sharing their lessons learned with us; and Dr. 
Susan Conrad, leader of the Civil Engineering corpus project, for making available to us a number of helpful teaching materials derived from that corpus.</p><p>11 Zappa-Hollman, S., Ferreira, A. A., Perris, G. & Matte, M. L. (March 2022). Designing a local learner corpus for pedagogical applications and research. Paper presentation at the Virtual TESOL Annual Convention.</p><p>References</p><p>Blanchard, K. & Root, C. (2017). Ready to write 3: From paragraph to essay (4th Edition). Pearson.</p><p>Coffin, C. (1997). Constructing and giving value to the past: An investigation into secondary school history. In F. Christie & J. R. Martin (Eds.), Genre and institutions: Social processes in the workplace and school (pp. 196-230). Cassell.</p><p>Conrad, S. (2017). A comparison of practitioner and student writing in civil engineering. Journal of Engineering Education, 106, 191-217. doi:10.1002/jee.20161</p><p>Duff, P. (2008). Case study research in applied linguistics. Lawrence Erlbaum/Taylor & Francis.</p><p>Duff, P. A., Ferreira, A. A. & Zappa-Hollman, S. (2015). Putting (functional) grammar to work in content-based English for academic purposes instruction. In M. A. Christison, D. Christian, P. A. Duff, & N. Spada (Eds.), Teaching and learning English grammar: Research findings and future directions: A festschrift for Betty Azar (pp. 139–158). Routledge.</p><p>Ferreira, A. A. (2020). Sociocultural development in the spectrum of concrete and abstract ideation. Mind, Culture, and Activity, 27(1), 50-69. doi:10.1080/10749039.2019.1686027</p><p>Ferreira, A. & Zappa-Hollman, S. (2019). Disciplinary registers in a first-year program. A view from the context of curriculum. Language, Context and Text, 1(1), 148-193. https://doi.org/10.1075/langct.00007.fer</p><p>Gardner, S. & Nesi, H. (2013). 
A classification of genre families in university student writing. Applied Linguistics, 34(1), 25-52. https://doi.org/10.1093/applin/ams024</p><p>Granger, S. (2002). Computer learner corpora, second language acquisition and foreign language teaching. John Benjamins Publishing Company.</p><p>Granger, S. (2009). The contribution of learner corpora to second language acquisition and foreign language teaching: A critical evaluation. In K. Aijmer (Ed.), Corpora and language teaching. John Benjamins, 13–32. https://doi.org/10.1075/scl.33.04gra</p><p>Granger, S. (2015). The contribution of learner corpora to reference and instructional materials design. In Granger, S., Gilquin, G. & Meunier, F. (eds.) The Cambridge handbook of learner corpus research. Cambridge University Press, pp. 486-510.</p><p>Granger, S., Dupont, M., Meunier, F., Naets, H. & Paquot, M. (2020). The International Corpus of Learner English. Version 3. Presses universitaires de Louvain. Available at: https://uclouvain.be/en/research-institutes/ilc/cecl/icle.html</p><p>Graves, R. (2017). Writing assignments across five academic programs. In R. Graves & T. Hyland (Eds.). Writing assignments across university disciplines. Trafford Publishing.</p><p>Graves, R. & Hyland, T. (Eds.) (2017). Writing assignments across university disciplines. Trafford Publishing.</p><p>Halliday, M. A. K. (1993). Towards a language-based theory of learning. Linguistics and Education, 5, 93–116. doi:10.1016/0898-5898(93)90026-7</p><p>Halliday, M. A. K. (1998). 
Things and relations: Regrammaticising experience as technical knowledge. In J. R. Martin & R. Veel (Eds.), Reading science: Critical and functional perspectives. Routledge. pp. 185–235.</p><p>Halliday, M. A. K. & Matthiessen, C. M. I. M. (2014). Halliday’s introduction to functional grammar (4th ed.). Routledge.</p><p>Kang, J. (2016). A functional approach to the status of theme and textual development. Theory and practice in language studies, 6(5), 1053-1059. http://dx.doi.org/10.17507/tpls.0605.20</p><p>Matthiessen, C. M. I. M. (2006). Educating for advanced foreign language capacities: Exploring the meaning-making resources of languages systemic-functionally. In H. Byrnes (Ed.), Advanced language learning: The contribution of Halliday and Vygotsky (pp. 31–57). London, UK: Continuum.</p><p>Oshima, A. & Hogue, A. (2005). Writing academic English (4th ed.). Pearson-Longman.</p><p>Römer, U. & O’Donnell, M. B. (2011). From student hard drive to web corpus (part 1): The design, compilation and genre classification of the Michigan Corpus of Upper-level Student Papers (MICUSP). Corpora, 6(2), 159-177. doi:10.3366/cor.2011.0011</p><p>Staples, S., Novikov, A., Picoral, A. & Sommer-Farias, B. (2019-). Multilingual Academic Corpus of Assignments – Writing & Speech (MACAWS). Available at https://macaws.corporaproject.org</p><p>Staples, S. & Dilger, B. (2018). Corpus and repository of writing [learner corpus articulated with repository]. Available at https://crow.corporaproject.org</p><p>Zappa-Hollman, S. (2018). Collaborations between EAP and disciplinary instructors: Factors and indicators of positive partnerships. International Journal of Bilingual Education and Bilingualism, 21(5), 591-606. 
doi:10.1080/13670050.2018.1491946</p><p>Zappa-Hollman, S. & Fox, J. (2021). Engaging in linguistically responsive instruction: Insights from a first-year university program for emergent multilingual learners. TESOL Quarterly, 55(4), 1081-1091. https://doi.org/10.1002/tesq.3075</p><p>Zappa-Hollman, S., Ferreira, A. A., Perris, G. & Matte, M. L. (March 2022). Designing a local learner corpus for pedagogical applications and research. Paper presentation at the Virtual TESOL Annual Convention.</p><p>Appendix A</p><p>Vantage Corpus of Student Texts Across Disciplines Project Survey</p><p>[Embedded institutional student consent form included here in original survey. The survey can be completed only after students provide informed consent]</p><p>Part 1 - Demographic information</p><p>Q1 What is your name? (as it appears in your UBC ID)</p><p>Q2 Please write down your preferred e-mail so that we can contact you:</p><p>Q3 Please confirm your email:</p><p>Q4 What Vantage Program are you in?</p><p>• Arts</p><p>• Science</p><p>• Engineering</p><p>Q5 How old are you?</p><p>• 17 to 19 years old</p><p>• 20 to 22 years old</p><p>• older than 22</p><p>Q6 What is your preferred gender?</p><p>• Male</p><p>• Female</p><p>• Other ______________</p><p>Q7 What is (are) your native language(s)? 
You can select one or more, as it applies to you, to a maximum of three.</p><p>▢ Arabic</p><p>▢ Cantonese</p><p>▢ Farsi</p><p>▢ French</p><p>▢ German</p><p>▢ Hindi</p><p>▢ Indonesian</p><p>▢ Japanese</p><p>▢ Korean</p><p>▢ Malay</p><p>▢ Mandarin</p><p>▢ Mongolian</p><p>▢ Portuguese</p><p>▢ Russian</p><p>▢ Spanish</p><p>▢ Other ____________________</p><p>Q8 How many years of high-school education did you complete in English?</p><p>• None</p><p>• 1</p><p>• 2</p><p>• 3</p><p>• 4</p><p>• More than 4</p><p>Q9 In what country did you receive your high-school diploma? (If none of the countries apply to you, please select Other at the end of the list.)</p><p>• Brazil</p><p>• Cambodia</p><p>• Canada</p><p>• Chile</p><p>• China</p><p>• Taiwan</p><p>• Ecuador</p><p>• Egypt</p><p>• France</p><p>• Germany</p><p>• Hong Kong</p><p>• India</p><p>• Indonesia</p><p>• Iran</p><p>• Japan</p><p>• Korea</p><p>• Macao</p><p>• Malaysia</p><p>• Mexico</p><p>• Mongolia</p><p>• Panama</p><p>• Russia</p><p>• Other</p><p>Q9a Other: In what country did you receive your high-school diploma?</p><p>End of Part 1 - Demographic Information (participants complete this once)</p><p>Part 2 - Assignment Information and upload</p><p>Q10 Would you like to upload another assignment?</p><p>• Yes</p><p>• No</p><p>Q11 Vantage College Corpus of Texts Across Disciplines Assignment information and uploading. Please, answer the following questions and then upload your assignment.</p><p>You will be prompted to answer the same questions for every assignment you upload.</p><p>Q12 Assignment upload:</p><p>Is this a single document?</p><p>• Yes</p><p>• No</p><p>Q12a You can upload only one document at a time. 
Please select another document and continue answering the questions.</p><p>Q13 Are you the only author of this assignment?</p><p>• Yes</p><p>• No</p><p>Q13a You can only submit an assignment completed by you only. Please select another assignment that you completed by yourself. Assignments completed together with your peers or classmates as part of pair/group work cannot be accepted.</p><p>Q14 Have you received a grade for this assignment?</p><p>• Yes</p><p>• No</p><p>Q14a You can only submit assignments that have been graded. Please select an assignment you have completed by yourself and for which you received a grade.</p><p>Q15 Course you completed this assignment for:</p><p>(NOTE: you can only upload assignments submitted only for the courses listed below)</p><p>Course</p><p>• VANT 140</p><p>• WRDS 150</p><p>• ASTU 204</p><p>Q16 What grade did you receive for this assignment?</p><p>• 90 - 100</p><p>• 70 - 89</p><p>• 60 - 69</p><p>• 50 - 59</p><p>• below 50</p><p>• Prefer not to answer</p><p>Q17 Upload your assignment:</p><p>Q18 If available, please upload the instructions you received to complete this assignment.</p><p>End of Part 2 - Assignment Information and upload</p><p>Part 3 - Interview Invitation</p><p>Q19 Thank you for uploading your assignment(s).</p><p>How easy or difficult was it to answer the questions and upload the assignment?</p><p>• Extremely easy</p><p>• Somewhat easy</p><p>• Neither easy nor difficult</p><p>• Somewhat difficult</p><p>• Extremely difficult</p><p>Q20 Would you be available to participate in a 30 minute interview to share your experience in this pilot project and to share with us information about the process of writing your assignment(s)?</p><p>For your participation in the interview you will receive a $20 UBC Bookstore web gift card.</p><p>Do you want to participate?</p><p>• Yes, please send me more information about the 
interview.</p><p>• No</p><p>Q21 Is this the email you would like to be contacted at: [email entered by participant]?</p><p>• Yes</p><p>• No</p><p>Q22 Please, provide your preferred e-mail so that we can send you more information about the interview.</p><p>End of Part 3 - Interview Questions</p><p>Appendix B</p><p>VanCor Metadata Annotated Coding Sheet</p><p>Date coded: [yyyy/mm/dd]</p><p>Coder: [Name of person who coded]</p><p>Project: [name of LC project]</p><p>Assignment UID: [unique ID assigned to text being coded]</p><p>• Date submitted to instructor: [yyyy/mm/dd]</p><p>• Date submitted to VanCor: [yyyy/mm/dd; this is the date the assignment was uploaded by the participant to the Qualtrics survey]</p><p>Vantage Program: [select what applies]</p><p>Science</p><p>Engineering</p><p>Arts</p><p>Type of course: [Include here dropdown menu with list of courses from the corresponding program]</p><p>• EAP Writing course</p><p>• LLED 200</p><p>• LLED 201</p><p>• EAP disciplinary-linked course</p><p>• VANT 140</p><p>• Other writing and communication course</p><p>• ASTU 204</p><p>• WRDS 150</p><p>• Disciplinary courses</p><p>Semester:</p><p>W1 [September-December]</p><p>W1-2 [September-April]</p><p>W2 [January-April]</p><p>S [May-July]</p><p>Course length in weeks: [include number of weeks]</p><p>Demographic Info:</p><p>• Age:</p><p>• Gender:</p><p>• Native language(s):</p><p>• Years of high-school education in English:</p><p>• Country received HS diploma:</p><p>Assignment</p><p>• Grade received:</p><p>• Percentage of final grade:</p><p>• Researcher’s rating:</p><p>• Assignment instructions provided?</p><p>• Yes</p><p>• No</p><p>• Genre:</p><p>• Instructor’s label if provided: [this refers to the way the instructor called the genre of the assignment; e.g., annotated bibliography]</p><p>• Student’s label if provided: [this refers to how the student may have labeled the genre of the assignment; e.g., “in this 
discussion”; this is determined by looking at “clues” related to the overall structure of the text; e.g., “On the one hand…on the other hand…”]</p><p>• Researcher’s label: [use SFL-based classification]</p><p>• Is this assignment a component of a larger assignment? Yes/No</p><p>No</p><p>Yes: (link to genre of final assignment) (e.g. Results part of IMRD)</p><p>Length/# words:</p><p>Title:</p><p>Visuals included in the text? (e.g., figures, images, symbols, tables, graphs):</p><p>No</p><p>Yes</p><p>Completed In-class?</p><p>Yes</p><p>No</p><p>Completed out of class?</p><p>Timed</p><p>Not timed</p><p>The role of genre in academic language use: the case of Critiques and Case Studies in BAWE</p><p>Marine Laísa Matte (UFRGS)</p><p>Deise Amaral (UFRGS)</p><p>Larissa Goulart (Montclair State University)</p><p>Introduction</p><p>As users of any language, we know that different linguistic features are employed when we write or speak for different purposes. When we write a Facebook message, for instance, we use colloquial linguistic resources, like contractions and subject omission, that we do not include in a course paper (Biber, 2006). In other words, texts with different communicative purposes aimed at different interlocutors adopt distinct linguistic features in order to convey these purposes. However, only recently has academic discourse ceased to be treated as a single unit; it has instead been shown that there is variation between different genres in academic writing (Biber, 2006; Biber & Gray, 2016; Hardy & Friginal, 2016; Staples et al., 2016, 2018; Staples & Reppen, 2016). 
Undergraduate argumentative essays and research articles, for instance, are both part of what we call academic discourse; nevertheless, these two genres have distinct characteristics (e.g., length, methodology description, the use of visual elements), which are reflected in the language used in their texts.</p><p>Although the texts required by teachers in university settings are referred to generically as assignments or course papers, variation in university writing becomes even more salient as the required texts can range from laboratory reports to case studies or explanations. Gardner and Nesi (2013) suggest that some university assignments are written in preparation for professional practice (Case Studies, Designs, Proposals, among others) while others are written as a form of showing independent reasoning and of developing critical thinking (Essays, Critiques, etc.). The goal of this paper is to investigate how linguistic features vary in two academic genres of unpublished university writing: Case Studies and Critiques. The research questions to be answered are:</p><p>a) To what extent is there linguistic variation between Case Studies and Critiques?</p><p>b) How is this variation reflected in the way different academic language features are used to express the communicative purposes of Case Studies and Critiques?</p><p>Academic writing and genre/register¹ studies</p><p>Academic writing is usually considered more complex than writing in non-academic contexts. But what does complex mean? In this paper, we align with Biber and colleagues’ definition of complexity (e.g., Biber et al., 2021), where grammatical complexity is defined as the addition of optional structural elements to simple phrases and clauses. Biber et al. 
(2011) used</p><p>corpus-based analyses to contrast the grammatical complexity of academic</p><p>research articles and conversation through 28 lexico-grammatical features</p><p>associated with structural complexity in previous studies (Biber, 1988,</p><p>1 The terms “genre” and “register” have been used alternately depending on the time</p><p>the research study has been produced and/or on the distinct conceptualizations they</p><p>represent. Most researchers choose not to make any distinction between the terms</p><p>and use one or the other without specifying the construct being followed. However,</p><p>when they are theoretically distinct, genre studies tend to focus their analyses on “the</p><p>conventional structures used to construct a complete text within the variety” (Biber &</p><p>Conrad, 2009: 2) while register studies analyses search for “characteristic lexico-gram-</p><p>matical linguistic features” (Biber, 2006: 11). Both perspectives look for “linguistic va-</p><p>rieties associated with particular situations of use and particular communicative pur-</p><p>poses” (Biber, 2006: 11). According to Berber Sardinha, “register has been proposed as</p><p>a central construct in corpus linguistic research (Biber, 2012) and as the driving force</p><p>behind the analysis, rather than as an afterthought: ‘the practice advocated […] is to</p><p>begin a research study with the hypothesis that […] register differences exist, and to</p><p>include analysis of those differences unless they are empirically shown to be unimport-</p><p>ant’ (Biber, 2012: 34).” (Berber Sardinha, 2014: 241).</p><p>157</p><p>1992, 2006; Biber et al., 1999). Their findings show that conversation is</p><p>characterized by clausal elaboration (complement clauses, adverbials, etc.),</p><p>while academic writing contains more phrasal compression (e.g., nominal-</p><p>ization, non-finite clauses, etc.). 
This means that the phrasal constructions seen in “the use of different transformations would have significant effects on our perceptions of spatial patterns in kelp holdfast assemblages” (Biber et al., 2011: 27) are characteristic of academic writing, such as prepositional phrases modifying the noun (of different transformations), attributive adjectives (different, significant, spatial), and nominal premodifiers (kelp holdfast).</p><p>Several researchers have focused their attention on the analysis of grammatical complexity to account for language development in L1 (Biber et al., 2011; Ansarifar et al., 2018) or L2 writing (Bulté & Housen, 2018; Goulart, 2020; Kuiken & Vedder, 2019). Ansarifar et al. (2018) compared 99 MA-level abstracts and 64 PhD dissertation abstracts written by L1 Persian writers with 149 research article abstracts by expert writers in Applied Linguistics. The authors found that phrasal features develop over the years of university study, with the MA group differing significantly from the expert writers, while the PhD abstracts did not show such a difference in relation to the published articles, corroborating Biber et al.’s (2011) hypothesized stages of complexity development.</p><p>The use of phrasal constructions is also discussed in Staples et al. (2016). The authors conducted a study on academic writing development by analyzing texts retrieved from BAWE² organized in four levels of study (three years of undergraduate and first year of MA), four different genres (Essays, Critiques, Case Studies and Explanations), and in the four disciplinary groups present in BAWE (Arts and Humanities, Social Sciences, Life Sciences, and Physical Sciences), meaning that three separate analyses were done. 
By including 23 linguistic features that previous research has shown to account for grammatical complexity (Biber et al., 2011; Biber & Gray, 2013; Biber et al., 2016), their study corroborates the assumption that there is a high incidence of phrasal features in advanced academic writing: as students gain experience, their writing tends to become more compressed, that is, less explicit, with a preference for using more phrasal features which “are more economical and allow writers to package information more densely” (Staples et al., 2016: 179).</p><p>2 The British Academic Written English corpus (BAWE; Nesi et al., 2008-2010) is a collection of proficient texts written by university students from 2004 to 2007 with representation of discipline areas and genre families.</p><p>In their analyses of disciplines and genres, Staples et al. (2016) conclude that the writers in Arts and Humanities used more clausal features, such as finite clauses (e.g., although the number of participants in the coup itself was indeed small) than the ones in Life and Physical Sciences, who tend to use more phrasal features, such as attributive adjectives (e.g., unique, efficient). That implies that, in BAWE, Case Studies present more phrasal features than Critiques, as most of the Case Studies are from these two areas of the hard sciences. 
This coincides with the results of a study of academic sub-genres in Biber and Gray (2016) which finds that research articles in Humanities use more clausal modification while the ones in the Natural Sciences rely on phrasal structures modifying nouns (e.g., patient report).</p><p>Other studies have considered genre differences in analyses of lexico-grammatical variation, finding that certain language features are more recurrent in specific situationally-defined varieties than in others (Biber, 2006, 2012; Biber & Conrad, 2009). From the studies that compare more general registers, such as oral against written (Biber et al., 2011, 2016), to investigations of differences among academic genres (Biber, 2006; Biber & Gray, 2016; Hardy & Friginal, 2016; Staples et al., 2016, 2018; Staples & Reppen, 2016), there seems to be a growing understanding that genre mediates variation in language. In the present study, although looking only at genre variation in academic writing, we believe that as academic experience is gained, particular lexico-grammatical features are developed, like phrasal constructions (Biber et al., 2011; Ansarifar et al., 2018; Staples et al., 2016). Unlike many of the studies mentioned above that compare levels of academic writing, from undergraduate to PhD and expert published articles, we chose to work exclusively with texts written by MA students, the highest level in BAWE, which might indicate that they have already had a greater exposure to academic language when compared to less experienced university students. Next, we present the corpus of study as well as the methodological procedures.</p><p>Methodology</p><p>The corpus</p><p>In order to answer our research questions, we have selected two university genres from the BAWE corpus, Critiques (CR) and Case Studies (CS). 
According to Gardner and Nesi (2013), CS are written in order to prepare students for professional practice and usually take large amounts of data into account. The authors describe the purpose of CS as “to demonstrate understanding of professional practice through the analysis of a single example” (Gardner & Nesi, 2013: 37). CR, on the other hand, require that students show informed and independent reasoning while also developing “understanding of the object of study and the ability to evaluate and/or assess its significance” (Nesi & Gardner, 2012: 94). As discussed above, we have selected CR and CS, two genres with different communicative purposes, written by first-year MA students. As for native language, we included texts written in English as L1 and L2, without making the distinction, since our object of investigation is the development of academic writing irrespective of L1.</p><p>Our final corpus contains 95 CS and 83 CR from the four disciplinary groups in BAWE: Arts and Humanities (AH), Social Sciences (SS), Life Sciences (LS), and Physical Sciences (PS). Table 1 describes the number of texts, the number of words, and the average text length for each genre. As this table shows, CS are somewhat longer texts than CR. In addition, we can see a difference in length across disciplines, with SS CR being the shortest, and PS CS the longest.</p>
<table>
<tr><th>Genre</th><th>Discipline group</th><th>No. of texts</th><th>No. of words</th><th>Mean text length</th></tr>
<tr><td>Case study</td><td>AH</td><td>-</td><td>-</td><td>-</td></tr>
<tr><td></td><td>LS</td><td>66</td><td>161,080</td><td>2,440.6</td></tr>
<tr><td></td><td>PS</td><td>10</td><td>38,139</td><td>3,813.9</td></tr>
<tr><td></td><td>SS</td><td>19</td><td>64,692</td><td>3,404.8</td></tr>
<tr><td></td><td>Total</td><td>95</td><td>263,911</td><td>2,778</td></tr>
<tr><td>Critique</td><td>AH</td><td>15</td><td>37,447</td><td>2,496.5</td></tr>
<tr><td></td><td>LS</td><td>30</td><td>73,306</td><td>2,443.5</td></tr>
<tr><td></td><td>PS</td><td>13</td><td>33,225</td><td>2,555.8</td></tr>
<tr><td></td><td>SS</td><td>25</td><td>48,572</td><td>1,942.9</td></tr>
<tr><td></td><td>Total</td><td>83</td><td>192,550</td><td>2,319.9</td></tr>
</table>
<p>Table 1. Description of the corpus</p><p>These texts were tagged for a series of grammatical features associated with academic language using the Biber Tagger<sup>3</sup> (Biber, 1988). The features included in our analysis are described in the next section.</p><p>3 Although originally designed to be used in Multidimensional Analysis studies, the Biber Tagger has recently been used by researchers who analyze grammatical complexity in L2 writing (Biber et al., 2011; Biber et al., 2016).</p><p>Lexico-grammatical features</p><p>The linguistic features used to contrast academic language in CR and CS were chosen based on a review of studies that examined academic writing in English as well as complexity features associated with academic language (Biber, 2006; Goulart, 2020; Gray, 2015; Parkinson & Musgrave, 2014; Staples et al., 2016; Staples & Reppen, 2016). We have included phrasal features (nouns - group, stance, abstract and cognitive nouns - plus stance nouns followed by prepositional phrases, attributive adjectives, premodifying nouns, and nominalizations), clausal features (verbs, subordinate clauses - causative, conditional and others -, that-complement clauses controlled by verbs, and clausal coordinating conjunctions) and also intermediate features encountered in these previous studies. Intermediate features are clauses that add information to the noun phrase, such as relative clauses, or complement specific types of nouns. We have selected passive voice, relative clauses, to-complement clauses controlled by verbs of desire and stance nouns, and that-complement clauses controlled by nouns - attitudinal, stance and nouns of likelihood. 
The 23<sup>4</sup> linguistic features selected can be seen in Table 2.</p>
<table>
<tr><th>Linguistic feature</th><th>Example</th></tr>
<tr><th colspan="2">Phrasal features</th></tr>
<tr><td>Nouns</td><td>keyboard, engine, patient</td></tr>
<tr><td>Attributive adjectives</td><td>short term, previous research</td></tr>
<tr><td>Nominalizations</td><td>satisfaction, interference, invasion</td></tr>
<tr><td>Premodifying nouns</td><td>customer satisfaction, patient report</td></tr>
<tr><td>Group nouns</td><td>the committee took account of the severity</td></tr>
<tr><td>Stance nouns</td><td>reason, claim, assumption</td></tr>
<tr><td>Abstract nouns</td><td>a description of the progress</td></tr>
<tr><td>Cognitive nouns</td><td>analysis, decision, concern, idea</td></tr>
<tr><td>Stance nouns followed by prepositional phrases</td><td>advantages of having a large database</td></tr>
<tr><th colspan="2">Clausal features</th></tr>
<tr><td>Verbs</td><td>believe, make, propose</td></tr>
<tr><td>Subordinate clauses (causative)</td><td>this cannot happen because there would be arbitrage opportunities</td></tr>
<tr><td>Subordinate clauses (conditional)</td><td>they are only acceptable if one first accepts the existence of memes</td></tr>
<tr><td>Subordinate clauses (others)</td><td>increments of twice the error until the function value goes positive</td></tr>
<tr><td>That verb complement clauses</td><td>the findings show that cash flows map into returns with significantly higher coefficients</td></tr>
<tr><td>Clausal coordinating conjunctions</td><td>This can be applied for all multiple cylinders engine but is more commonly found in four- and six-cylinders engines</td></tr>
<tr><th colspan="2">Intermediate features</th></tr>
<tr><td>Passive voice</td><td>the high-level solution was selected</td></tr>
<tr><td>Non-finite to- verb complement clauses controlled by verbs of desire</td><td>Those speculators intend to save money to obtain benefits</td></tr>
<tr><td>Wh-relative clauses</td><td>potential flights to Cape town, which will be stored for future access</td></tr>
<tr><td>That-relative clauses</td><td>an antidepressant that has a low toxicity</td></tr>
<tr><td>That-noun complement clause controlled by attitudinal noun</td><td>it is therefore no surprise that the main objective of the article is to aid conservation agencies in their management of present woodland</td></tr>
<tr><td>That-noun complement clause controlled by noun of likelihood</td><td>because of the assumption that more is better or that qualitative research is incomplete</td></tr>
<tr><td>Non-finite to- complement clauses controlled by stance nouns</td><td>it might take time to change laws but that is not a reason to inhibit new inventions</td></tr>
</table>
<p>Table 2. Grammatical variables included in this study</p><p>4 The grammatical variable stance nouns is referred to as stance nouns in other contexts and new stance nouns in Table 3. Thus, Tables 2 and 3 have 22 and 23 features, respectively.</p><p>Statistical analysis</p><p>The frequency of occurrence of each feature was normed to 10,000 to account for different text lengths (see Table 1). Since the data did not meet the assumptions of normality and linearity, as shown by Shapiro-Wilk normality tests, we ran Mann-Whitney U tests with the linguistic features as dependent variables and genre as the independent variable using R version 4.0.5 (R Core Team, 2020), and calculated effect sizes<sup>5</sup> using the R Companion package (Mangiafico, 2015). 
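</p><p>The norming and the rank-based comparison described above can be illustrated with a minimal sketch. It is written in Python with the standard library only, as a stand-in for the R workflow the authors report, and it is not their code. The function names, the pair-counting way of obtaining U, and the effect-size formula (U_a - U_b) / (n_a * n_b) are assumptions made for illustration; the chapter does not state which function of the R Companion package was used. The counts are invented, and no p-value is computed.</p>

```python
# Illustrative sketch only: the chapter's analysis was run in R, not Python.
# All names and numbers below are hypothetical.


def normed_rate(raw_count, text_length, basis=10_000):
    """Frequency of a feature per `basis` words of one text."""
    if text_length <= 0:
        raise ValueError("text_length must be positive")
    return raw_count * basis / text_length


def mann_whitney_u(group_a, group_b):
    """Return (U_a, U_b) by direct pair counting; a tie adds 0.5 to each."""
    u_a = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u_a += 1.0
            elif a == b:
                u_a += 0.5
    u_b = len(group_a) * len(group_b) - u_a
    return u_a, u_b


def rank_biserial(group_a, group_b):
    """Rank-biserial correlation (U_a - U_b) / (n_a * n_b), in [-1, 1]."""
    u_a, u_b = mann_whitney_u(group_a, group_b)
    return (u_a - u_b) / (len(group_a) * len(group_b))


# Invented (raw count, text length) pairs for one feature in three texts per genre.
cr_rates = [normed_rate(c, n) for c, n in [(60, 2_000), (80, 2_500), (105, 3_000)]]
cs_rates = [normed_rate(c, n) for c, n in [(62, 2_000), (85, 2_500), (120, 3_000)]]

print(mann_whitney_u(cr_rates, cs_rates))
print(rank_biserial(cr_rates, cs_rates))
```

<p>Under this sign convention, a negative value means that the first group tends to have lower normed rates than the second. The basis is left as a parameter because norming conventions differ across studies.</p><p>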
The Mann-Whitney U tests indicated whether there are statistically significant differences between genres for each feature.</p><p>Results</p><p>The results of the analysis can be observed in Table 3, which contains the descriptive statistics for each feature in both genres (Critiques and Case Studies): means and medians of the features for each genre as well as the Mann-Whitney U test results, the statistical significance (p-value), and the effect size (rrb - rank biserial correlation).</p><p>5 “Effect size is a standardized measure, that is a measure comparable across different studies that expresses the practical importance of the effect observed in the corpus or corpora” (Brezina, 2018: 14)</p>
<table>
<tr><th>Feature</th><th>Mean (SD) CR</th><th>Mean (SD) CS</th><th>Median CR</th><th>Median CS</th><th>Mann-Whitney U</th><th>Alpha (p)</th><th>rrb</th></tr>
<tr><td>nouns</td><td>328.79 (30.3)</td><td>343.49 (32.9)</td><td>327</td><td>342</td><td>2864</td><td>0.003*</td><td>-0.222</td></tr>
<tr><td>attributive adjectives</td><td>70.34 (17.2)</td><td>74.94 (12.7)</td><td>68.6</td><td>73.1</td><td>3028.5</td><td>0.014</td><td>-0.185</td></tr>
<tr><td>nominalizations</td><td>86.59 (23.1)</td><td>75.51 (18.3)</td><td>87.7</td><td>70.7</td><td>5085</td><td></td><td></td></tr>
</table>
<p>[…] conjunctions as clausal connectors are between brackets (displayed by an individual, [and] to the mental structures). Below, the excerpts are presented and followed by a discussion.</p><p>On page 485 the authors state that they aim to argue against dryland farming. The first argument to support this is the pollen record that shows the absence of woodland clearance. 
Then, although the reader expects a second argument, there is a ^long^ ^technical^ narration about alluvium phases at the end of which the point is made that Neolithic sites were located near a then ^active^ floodplain… (6006bCR)</p><p>In the excerpt above, the writer begins with a direct reference to a previous paper, a typical feature of CR, and reports the authors’ ideas using the verbs “state”, “aim” and “argue”, followed by a that-complement clause, a to-complement clause and a noun phrase, respectively. The second sentence initiates the analysis of the argumentative structure of the text being evaluated, making use of a relative clause to explain the argument related to the “pollen record”. The writer’s stance can be recognized through the construction “although the reader expects” and the use of the attributive adjective in “long narration”, which together show criticism of the text analyzed.</p><p>Mayo and Jarvis referred to perception as “the process by which an individual selects, organizes, [and] interprets information to create a ^meaningful^ picture of the world.” Learning is “changes in an individual’s behavior based on his experiences. Personality refers to ‘the patterns of behavior displayed by an individual, [and] to the ^mental^ structures that relate experience and behavior in an orderly way’. Motives are described as ‘the ^internal^ ^energizing^ forces that direct a person’s behavior toward the achievement of ^personal^ goals. (3050bCR)</p><p>Even though in this second excerpt the same phenomenon of relying on a previous paper can be observed, the purpose here is to demonstrate comprehension of the concepts developed in previous work, another fundamental characteristic of Critiques (Gardner & Nesi, 2013). 
The concept definitions (“Personality refers to”, “Motives are”) are built with the use of nominalizations and abstract nouns (“perception”, “personality”, “experience”, “achievement”), with that-relative clauses (defining “mental structure” and “energizing forces”) and with coordination using the conjunction “and”. These definitions corroborate that CR are a type of genre where students make sense of phenomena and claims in their disciplines (Nesi & Gardner, 2012: 37). Furthermore, this excerpt taken from a CR exemplifies what Ansarifar et al. (2018) and Staples et al. (2016) mention about academic writing being characterized by phrasal features rather than clausal ones. Considering that CR are expected to engage students in critical thinking more than CS, this argument is supported by the example above.</p><p>But according to Scott […] It is important not to ignore experience [but] recognise its constructed nature and the role played by language and discourse (1992:25). In essence we need to be aware that experience is factual and socially constructed. Experience establishes the existence of individuals and operates within the ^ideological^ construction that makes the individual the ^starting^ point of knowledge (Scott, 1992:27). The argument about the truth in women’s account only validates the *claim* by second wave feminists that women are ^different^. Feminists claim that using women’s experience as a ^starting^ point is the only option left for feminist researchers. (0402cCR)</p><p>In this excerpt, the author shows his/her understanding of the theoretical context in which the research area - feminist research - is developed, which figures as one of the purposes of Critiques. 
We can easily notice that this discussion of theory demands the constant use of abstract nouns and nominalizations. The paraphrasing of one writer’s ideas in the first part to validate the feminists’ claim constructs the argument in support of the feminist point of view, emphasized by the sequence “only validates”. The use of both the likelihood noun (“the claim […] that women are different”) and the verb “claim” establishes a distance from the writer and a degree of uncertainty in relation to the propositions in the relative and in the verb complement clause (“claim that using women’s experience […] is the only option left”) that follow. The first relative clause giving more information about “construction” is typical of this type of academic discourse.</p><p>It is an ^enormous^ achievement that a project of such complexity was completed both on schedule and within budget… Another ^key^ success of the project was the company’s ^final^ expenditure of only $470 million - approximately half of the originally allocated budget. This is indicative of ^effective^ cost planning on Eiffage’s behalf. The ^low^ ^final^ expenditure also suggests the company may have had a rather ^large^ Risk Budget in place to deal with uncertainty. The *fact* that Eiffage were able to outbid the other three ^competing^ parties to build the bridge while maintaining such a ^sizeable^ budget demonstrates both ^effective^ cost planning analysis and risk identification and assessment.” (0177aCR)</p><p>The excerpt above evaluates the performance of a company in a construction project, one of the possible uses of the genre family Critique in BAWE. 
The purpose of the evaluation is clear in the use of the attributive adjectives, as in “enormous achievement” and “effective cost planning”, as well as in the sequences with pre-qualifiers and adverbs, “rather large risk budget”, “such a sizeable budget”, “such complexity” and “only $470 million”. The relative clause defining the “achievement” of the company, with the idioms “on schedule” and “within budget”, suggests a more informal style, which is reinforced by the use of the emotive language also shown by the choice of adjectives used. On the other hand, the complement clause with the modal following the hedge verb “suggest” and the noun complement clause introduced by “the fact that” both demonstrate a more distant style, helping the writer to express his opinion in a less direct way.</p><p>For the CS excerpts, the features are marked as follows: group nouns are between asterisks (*laboratory*), verb complement clauses controlled by verbs of desire are underlined and the verb is bold (prefer to opt), wh-relative clauses are underlined (which indicate), nominalizations are in italics (surgery), premodifying nouns are between brackets ([work] hours), nouns are in bold (patients) and attributive adjectives are between circumflexes (^nutritional^ treatment).</p><p>Some patients prefer to opt for surgery at presentation rather than ^pharmacological^ or ^nutritional^ treatment of ^unknown^ duration. Unfortunately, there is no ^controlled^ data to confirm the ^best^ approach for patients. There is a ^well-known^ association between ^ulcerative^ colitis and an increased risk of ^colorectal^ cancer, and patients with Crohn’s disease are believed to be at ^increased^ risk of cancer of the ^small^ intestine. 
Studies have shown that the ^relative^ risk of ^colorectal^ cancer in patients with Crohn’s colitis is approximately 5.6 and should raise the same concerns as in patients with ^ulcerative^ colitis. (0203iCS)</p><p>The excerpt above is from a Case Study and some of the highlighted features are different from the ones in the previous excerpts, which were taken from Critiques. As Nesi and Gardner (2012: 40) state, CS are used “to demonstrate/develop an understanding of professional practice through the analysis of a single exemplar” and are common in the Health area. In this part, the author discusses the possible treatment for a patient with Crohn’s disease and the recommendation of surgery, a common stage of CS. The regular presence of</p><p>and material designers in the 20th century (e.g., University Word List by Xue & Nation, 1984), but the AWL (Coxhead, 2000) was the first one based on a digitally compiled corpus. Xue and Nation (1984) used previously composed lists, mainly put together manually (Campion & Elley, 1971; Ghadessy, 1979; Lynn, 1973; Praninskas, 1972, as cited in Gardner & Davies, 2014).</p><p>workbooks, laboratory manuals, and course notes.<sup>4</sup> This corpus included 3.5 million words, yielding a list with 570 word families. The AWL’s contribution to EAP is undeniable, and it has been influential “in setting vocabulary goals for language courses, guiding learners in their independent study, and informing course and material designers in selecting texts and developing learning activities” (Coxhead, 2000: 214). Criticisms, however, have been leveled against the AWL, especially due to its use of word families and its relationship to the GSL (Gardner & Davies, 2014). 
In addition, it has been challenged due to its listing of individual words and its basis not being an updated and larger corpus.</p><p>Other corpus-based studies have provided academic vocabulary lists (Ackermann & Chen, 2013; Biber et al., 1999; Biber et al., 2004; Gardner & Davies, 2014; Simpson-Vlach & Ellis, 2010)<sup>5</sup> based on larger corpora than the GSL and AWL and included information on word co-occurrence and phraseology. The recognition of phraseology as a central element of language is not novel in linguistics. Nearly 70 years ago, Firth (1957) claimed that to understand a word, it is necessary to consider the other words it co-occurs with. Sinclair’s (1991) groundbreaking work in corpus linguistics using large collections of texts made it possible to find evidence of recurrent patterns of words and constructions, which led him to propose the idiom principle that “a language user has available to him or her a large number of semi-preconstructed phrases that constitute single choices, even though they might appear to be analyzable into segments” (p. 110). He further explored how this pervasive principle is productive in language in phrases such as “set eyes on,” “it’s not in his nature to” (Sinclair, 1991: 111), “hard work,” and “hard evidence” (p. 112), defining the term “collocation” as “the occurrence of two or more words within a short space of each other in a text” (p. 170). As Ellis (2008: 9) metaphorically puts it, phraseology is everywhere in language: “Like blood in systemic circulation it flows through heart and periphery, nourishing all.” Therefore, phraseology should be vital to language teaching in general and to EAP in particular.</p><p>4 https://www.wgtn.ac.nz/lals/resources/academicwordlist/information/corpus</p><p>5 Even if some of these publications, such as Biber et al. (1999), did not have a major goal of providing a list to EAP, as they carried out careful corpus-based research, they presented results that can be sources for data-based language materials and classes.</p><p>Biber et al. (1999) introduced a particular kind of phraseological unit, which they termed lexical bundles. Lexical bundles are defined as “the sequences of words that most commonly co-occur in a register” (Biber et al., 1999: 989) and “serve the most important communicative needs of a register” (Biber, 2009: 285). Biber et al. (1999) analyzed their use in both conversation and academic prose, while Biber et al. (2004) showed how these units are used in university classroom teaching and textbooks. After generating a list of four-, five-, and six-word lexical bundles, Biber et al. (1999) analyzed them from a structural perspective (e.g., dependent clause fragment, such as know what I mean, and noun phrase of preposition phrase fragments, such as the end of the). As investigating the use of lexical bundles can contribute to our understanding of language use, Biber et al. (2004) presented not only structural, but also functional categories of lexical bundles. This frequency-driven study followed specific criteria for bundle inclusion for analysis, namely, a frequency cut-off point of 40 times per million words, a bundle word length of four, and the occurrence of the bundle in at least five different texts. Their corpus of classroom teaching and textbooks includes 2,009,400 words, which is not bigger than Coxhead’s (2000) corpus. Nevertheless, Biber et al. 
(2004) compared their results to the Longman Spoken and Written English Corpus’s (Biber et al., 1999) conversation section (7 million words of British and American English) and academic prose section. One of their major contributions was the detailed comparison across four registers (classroom teaching, textbooks, conversation, and academic prose), especially the presentation of a functional categorization of the bundles, which was also used in Biber (2006) to analyze other university registers (e.g., office hours, study groups, service encounters). Bundles were classified into four functions: stance expressions (e.g., I don’t know if, it is important to), discourse organizers (e.g., if you look at, on the other hand), referential expressions (e.g., that’s one of the, as a result of), and special conversation functions (e.g., I said to him/her). Biber et al. (2004) did not claim that their study could generate an academic list, but their results can inform EAP professionals of the most important lexical bundles that students need to understand in both written and spoken higher education English, which adds a register perspective to our understanding of lexical bundle use.</p><p>The Academic Formula List (AFL; Simpson-Vlach & Ellis, 2010) expanded on the functional taxonomy provided by Biber et al. (2004), combining quantitative and qualitative criteria to include three to four n-grams in their list, which is also devoted to English used in the university context. Their methodology involved corpus statistics, linguistic analyses, psycholinguistic processing metrics, and EAP instructors’ and language testers’ insights, yielding a 435-lexical-bundle list. 
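</p><p>The purely frequency-driven part of bundle identification, of the kind reported above for Biber et al. (2004), can be sketched as follows. This is a minimal illustration and not the procedure of any of the studies cited: the function name, the input format (each text as a list of lowercase tokens) and the toy texts are assumptions, and the sketch ignores refinements such as keeping bundles from crossing punctuation or turn boundaries. The default thresholds mirror the criteria mentioned in this section (four-word sequences, at least 40 occurrences per million words, at least five different texts).</p>

```python
from collections import Counter, defaultdict


def lexical_bundles(texts, n=4, min_per_million=40, min_texts=5):
    """Return {bundle: rate per million words} for n-word sequences that
    meet both a normed-frequency threshold and a text-range threshold.

    `texts` is a list of token lists (hypothetical input format).
    """
    counts = Counter()              # raw frequency of each n-gram
    text_range = defaultdict(set)   # ids of the texts each n-gram occurs in
    total_words = 0

    for text_id, tokens in enumerate(texts):
        total_words += len(tokens)
        for i in range(len(tokens) - n + 1):
            gram = tuple(tokens[i:i + n])
            counts[gram] += 1
            text_range[gram].add(text_id)

    if total_words == 0:
        return {}

    bundles = {}
    for gram, count in counts.items():
        rate = count * 1_000_000 / total_words
        if rate >= min_per_million and len(text_range[gram]) >= min_texts:
            bundles[gram] = rate
    return bundles


# Toy corpus: five texts share a sequence, one text contains a one-off sequence.
toy_texts = [["on", "the", "other", "hand", "it", "is"] for _ in range(5)]
toy_texts.append(["as", "a", "result", "of"])
found = lexical_bundles(toy_texts)
print(sorted(found))
```

<p>In a corpus this small every repeated sequence passes the frequency threshold, so the text-range criterion does the filtering; in corpora of millions of words both thresholds matter.</p><p>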
They used the Michigan Corpus of Academic Spoken English (MICASE) and the oral academic part of the British National Corpus (BNC), in addition to Hyland’s 2008 corpus and written BNC files of various academic subjects. As the main purpose of creating a list such as the AFL was pedagogical, it is a valuable resource for EAP practitioners. The fact that they took into consideration professionals’ perceptions when selecting the bundles, as a refinement of what the quantitative analyses provided, added pedagogical reliability to the list. EAP practitioners can use this lexical bundle list to inform class activities that go beyond the three major categories (referential expressions, stance expressions, and discourse markers) identified in Biber et al. (2004) and help learners develop an awareness of specific bundle functions, as the AFL includes 18 subcategories, such as referential expressions of tangible framing attributes (e.g., (as) part of [a/the], the change in), stance expressions of hedging (e.g., (more) likely to (be), [it/there] may be), and discourse-organizing function expressions of metadiscourse and textual reference (e.g., come back to, I’m talking about). The list distinguishes bundles that are core AFL, meaning frequent in both oral and written academic language (e.g., [a/the] result of), and bundles that are more frequent in either spoken (e.g., in order to get) or written texts (e.g., as a consequence).</p><p>Another important contribution to EAP has been Ackermann and Chen’s (2013) Academic Collocation List (ACL) because it was based on a large corpus, relied on both human judgment and quantitative analyses, and focused on lexical collocations. They used a written curricular component of the Pearson International Corpus of Academic English (PICAE) comprising over 25 million words. 
Although Simpson-Vlach and Ellis (2010) also incorporated EAP practitioners’</p><p>abstract nouns and nominalizations is comparatively lower than what can be observed in the examples of CR, but nouns (in bold) are quite frequent. Moreover, in this example, there is no explicit judgment on the previous studies of the area; instead, the writer refrains from being conclusive about recommended treatment with the use of the stance verbs in “studies have shown”, “patients with Crohn’s disease are believed to be” and “the relative risk […] should raise the same concerns”. Here it is worth pointing out that, although not quantitatively analyzed, stance verbs appear in both CR and CS, thus being a feature in common.</p><p>The ^above^ guidelines therefore suggest that Mr’s ischaemia was most likely due to an ^embolic^ event. The fact that capillary refill occurred, albeit delayed, in the ^right^ foot suggests that either the obstruction was incomplete or that some collaterals were available in order to maintain perfusion. Although it is impossible to determine the ^precise^ cause of the emboli without further investigation, there was no evidence to suggest the presence of ^atrial^ fibrillation or an ^abdominal^ ^aortic^ aneurysm.” (0047dCS)</p><p>In this excerpt, there is a clear reference to a particular case and the writer relies heavily on the hedge verb “suggest” followed by that-verb complement clauses or noun phrases. The clause “that either the obstruction was incomplete or that some collaterals were available” is used to present possible reasons for the occurrence of capillary refill. 
The technical phrases with attributive adjectives and nouns, such as in “embolic event”, “right foot” and “atrial fibrillation”, confirm Staples et al.’s (2016) claim that attributive adjectives are frequently used in CS to help the description of technical terms, as it is very common in Life and Physical Sciences” (p. 169), which is also the case in assignments written by graduate students in BAWE.</p><p>In ^traditional^ ^hierarchical^ organization, promotion is one of the most ^useful^ HR policies. In Oticon, people may be [project] leaders for several times, but they are seldom promoted. So the *organization* need to use other methods to compensate, like arranging tailored [training] program for ^excellent^ staffs, or giving them the freedom to choose ^suitable^ tasks, [work] hours, place of work and so on… (0166aCS)</p><p>This analysis of one company’s HR policies shows the writer’s knowledge about the professional practices in his area and his capacity to analyze and evaluate them. We can see here the use of the two features that were significantly more frequent in CS than in CR: group or institution nouns (“organization”) and to-clauses following verbs of desire (“need to use”). In the first case, it is expected that institutions like companies or hospitals are mentioned in these texts as they describe events and practices typical of these contexts. As for the verbs of desire, they appear to be very useful for the recommendations part of CS.</p><p>There are several ^salient^ features of the history which indicate ^Infective^ Endocarditis (IE) as the most ^likely^ cause for Mr’s symptoms. […] In addition, the onset of [loin] pain may be due to splenomegaly or ^immune-complex^ deposition, which is commonly seen in IE. 
(0047fCS)</p><p>In this excerpt, there is a discussion of the medical history of a patient that suggests he might be suffering from a disease, namely Infective Endocarditis (IE); this understanding is conveyed through the use of the wh-relative clause starting with “which indicate”. Besides, the following sentence, introduced by “in addition”, gives a further explanation of the symptoms and problems that are a result of IE. As can be observed, there are occurrences of nouns (“onset”, “pain”, “splenomegaly”), premodifying nouns (“loin”) and attributive adjectives (“immune-complex”) to explain and detail the disease under discussion.</p><p>Based on these excerpts, it is worth highlighting that the uses of the linguistic features match the purposes of both genres analyzed; CR evaluate and review an object of study while CS demonstrate understanding of professional practices (Gardner & Nesi, 2013). The first CR excerpt illustrates references and evaluations of an object of study; in the second one the author makes sense of a specific object of study and the evaluation is not direct; the same phenomenon happens in the third example, in which instances of stance utterances are observed; the fourth CR excerpt presents an evaluation of a business issue.</p><p>When it comes to the CS excerpts, the purposes of this genre can be observed as well in the use of the analyzed features. 
In the first excerpt, there is an understanding of professional practice in the health field as well as the use of hedging in recommending a treatment; the second one brings even more distancing from explicit judgements at the same time as it discusses a specific case; the third excerpt also shows recommendations but in the business area.</p><p>Overall, it is possible to affirm that the features under investigation are used in order to convey particular meanings in each of the two genres examined. This means that the way the linguistic features selected for this study are used shows, above all, that they are at the service of the communicative purposes of Critiques and Case Studies, as described by Nesi and Gardner (2012).</p><p>Conclusion</p><p>The objective of this paper was to analyze academic language features in two university student genres, under the assumption that academic writing cannot be regarded as a homogeneous group. Thus, the first research question asked to what extent there is linguistic variation in the comparison between Case Studies and Critiques. The results of Mann-Whitney U tests revealed that most of the linguistic features analyzed appeared with a higher incidence in CR than in CS, possibly due to the distinct purposes of each genre. In CS, students show knowledge of a professional practice, whereas in CR they must demonstrate informed and independent reasoning as well as understanding of the object of study while evaluating and/or assessing its importance.</p><p>Regarding the second research question, which involved the understanding of the grammar features observed in this variation, some patterns emerged, leading to the diverse usages of the same features according to the communicative objectives of each genre. 
For instance, attributive adjectives are used in both genres, but in CS it is possible to observe that this feature helps in the description of issues, diseases or products, whereas in CR it is commonly used to evaluate previous studies or to present and discuss theoretical concepts.</p><p>As reported in the quantitative results, most of the differences between the two genres were not statistically significant, with the same features being used in both: sometimes for the same purpose, as in the use of stance features; at other times to convey different purposes in each genre, as is the case of the attributive adjectives previously mentioned. Below, we summarize what some usages highlighted in the excerpts extracted from the corpus might suggest:</p><p>• Attributive adjectives appear to help the descriptions in CS, such as patients’ issues and diseases, or products in the business area, by specifying technical terms; in CR they support the evaluation of previous studies.</p><p>• Abstract nouns and nominalizations are very frequent in both genres and seem to be very useful to present and discuss theoretical concepts in CR.</p><p>• Group nouns referring to institutions, companies or universities are considerably more frequent in CS as this genre describes situations or products that happen in or are produced by these institutions.</p><p>• Stance features and hedging are present in both genres but are very recurrent in CR to assess previous studies or one specific object of study.</p><p>• That-relative clauses and noun complement clauses controlled by nouns of likelihood are more common in CR and are used to explain or define the object of study.</p><p>• To-clauses controlled by verbs of desire appear to help the recommendations of CS.</p><p>This study has the limitation of analyzing a restricted set of
linguistic features rather than the full range of what has been proposed in the grammatical complexity literature. As suggestions for future studies, besides including more grammatical features in the investigation, it would be of great value to expand the variety of genres under analysis in order to explore other uses of the same features and others that were not included in this study and that might contribute to the description of how these features are employed in different genres. Also, it would be interesting to include different levels of university study, apart from undergraduate and graduate first years, in order to account for language development in academic writing.</p><p>Although limited in scope, the results of this investigation might contribute to the understanding of how the expression of diverse communicative objectives is built in academic writing through its various genres.</p><p>References</p><p>Ansarifar, A., Shahriari, H. & Pishghadam, R. (2018). Phrasal complexity in academic writing: A comparison of abstracts written by graduate students and expert writers in applied linguistics. Journal of English for Academic Purposes, 31, 58–71. https://doi.org/10.1016/j.jeap.2017.12.008</p><p>Berber Sardinha, T. (2014). 25 years later. In Berber Sardinha, T. & Veirano Pinto, M. (eds.), Multi-dimensional analysis, 25 years on: A tribute to Douglas Biber (pp. 81–108). John Benjamins.</p><p>Biber, D. (1988). Variation across Speech and Writing. Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9780511621024</p><p>Biber, D. (1992). On the complexity of discourse complexity: A multidimensional analysis. Discourse Processes, 15, 133–163. https://doi.org/10.1080/01638539209544806</p><p>Biber, D. (2006). University Language: A Corpus-based Study of Spoken and Written Registers. John Benjamins.</p><p>Biber, D. (2012).
Register as a predictor of linguistic variation. Corpus Linguistics and Linguistic Theory, 8(1), 9–37. https://doi.org/10.1515/cllt-2012-0002</p><p>Biber, D. & Conrad, S. (2009). Register, Genre, and Style. Cambridge University Press.</p><p>Biber, D. & Gray, B. (2013). Discourse characteristics of writing and speaking task types on the TOEFL iBT Test: A lexico-grammatical analysis (TOEFL iBT Research Report iBT-19). Princeton, NJ: Educational Testing Service.</p><p>Biber, D. & Gray, B. (2016). Grammatical complexity in academic English: Linguistic change in writing. Cambridge University Press.</p><p>Biber, D., Gray, B. & Poonpon, K. (2011). Should we use characteristics of conversation to measure grammatical complexity in L2 writing development? TESOL Quarterly, 45, 5–35. https://doi.org/10.5054/tq.2011.244483</p><p>Biber, D., Gray, B. & Staples, S. (2016). Predicting patterns of grammatical complexity across language exam task types and proficiency levels. Applied Linguistics, 37(5), 639–668. https://doi.org/10.1093/applin/amu059</p><p>Biber, D., Johansson, S., Leech, G., Conrad, S. & Finegan, E. (1999). The Longman grammar of spoken and written English. London, England: Longman.</p><p>Biber, D., Gray, B., Staples, S. & Egbert, J. (2021). Theoretical and Descriptive Linguistic Foundation of the Register-Functional Approach to Grammatical Complexity. In: Biber, D., Gray, B., Staples, S. & Egbert, J. The Register-Functional Approach to Grammatical Complexity (pp. 6–22). Routledge.</p><p>Brezina, V. (2018). Statistics in corpus linguistics: A practical guide. Cambridge University Press.</p><p>Bulté, B. & Housen, A. (2018). Conceptualizing and measuring syntactic diversity.
International Journal of Applied Linguistics, 28, 147–164. https://doi.org/10.1111/ijal.12196</p><p>Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates, Publishers. https://doi.org/10.4324/9780203771587</p><p>Gardner, S. & Nesi, H. (2013). A classification of genre families in university student writing. Applied Linguistics, 34(1), 25–52.</p><p>Gardner, S., Nesi, H. & Biber, D. (2019). Discipline, level, genre: Integrating situational perspectives in a new MD analysis of university student writing. Applied Linguistics, 40(4), 646–674.</p><p>Goulart, L. (2020). Analyzing the patterns of lexico-grammatical complexity across Graded Reader levels. Reading in a Foreign Language, 32(2), 83–103. http://hdl.handle.net/10125/67375</p><p>Gray, B. (2015). Linguistic variation in research articles: When discipline tells only part of the story. Amsterdam, Netherlands: John Benjamins.</p><p>Hardy, J. A. & Friginal, E. (2016). Genre variation in student writing: A multi-dimensional analysis. Journal of English for Academic Purposes, 22, 119–131.</p><p>Kuiken, F. & Vedder, I. (2019). Syntactic complexity across proficiency and languages: L2 and L1 writing in Dutch, Italian and Spanish. International Journal of Applied Linguistics. Special Issue. https://doi.org/10.1111/ijal.12256</p><p>Mangiafico, S. S. (2015). An R Companion for the Handbook of Biological Statistics, version 1.3.2. rcompanion.org/rcompanion/.</p><p>Nesi, H. & Gardner, S. (2012). Genres across the disciplines: Student writing in higher education. Cambridge, UK: Cambridge University Press.
Available from http://bit.ly/slWnd5</p><p>Nesi, H., Gardner, S., Thompson, P. & Wickens, P. (2008-2010). The British Academic Written English (BAWE) corpus. Available from: https://www.coventry.ac.uk/research/research-directories/current-projects/2015/british-academic-written-english-corpus-bawe/</p><p>Parkinson, J. & Musgrave, J. (2014). Development of noun phrase complexity in the writing of English for Academic Purposes students. Journal of English for Academic Purposes, 14, 48–59. https://doi.org/10.1016/j.jeap.2013.12.001</p><p>R Core Team (2020). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.</p><p>Staples, S., Biber, D. & Reppen, R. (2018). Using corpus-based register analysis to explore the authenticity of high-stakes language exams: A register comparison of TOEFL iBT and disciplinary writing tasks. The Modern Language Journal, 102(2), 310–332. https://doi.org/10.1111/modl.12465</p><p>Staples, S. & Reppen, R. (2016). Understanding first-year L2 writing: A lexico-grammatical analysis across L1s, genres, and language ratings. Journal of Second Language Writing, 32, 17–35. https://doi.org/10.1016/j.jslw.2016.02.002</p><p>Staples, S., Egbert, J., Biber, D. & Gray, B. (2016). Academic writing development at the university level: Phrasal and clausal complexity across level of study, discipline, and genre. Written Communication, 33, 149–183.
https://doi.org/10.1177/0741088316631527</p><p>Investigating Brazilian English Learners’ Use of Academic Collocations: A Corpus-Based Study</p><p>Marine Laísa Matte (UFRGS/IFSul)</p><p>Simone Sarmento (UFRGS)</p><p>Introduction</p><p>Writing has a special role in academic contexts as it is one of the main skills students have to master in order to achieve academic success (Biber & Gray, 2016). In addition, at the higher education (HE) level, academic literacies are being learned and tested all the time. New ways of constructing knowledge are constantly being discovered (Lea & Street, 1998), and these practices are necessarily dependent upon academic writing. In spite of the importance writing plays in academic contexts, it is usually assumed that students should already know the rules and conventions of this practice. However, these rules are not transparent, forming what Lillis (2001) called “practices of mystery”.
For Lillis, students who are not familiar with academic writing conventions may have their participation in HE impaired. Thus, these conventions should be explicitly taught since we cannot depend on incidental learning or on a hidden curriculum, as students “must now gain fluency in the conventions of English language academic discourses to understand their disciplines and to successfully navigate their learning.” (Hamp-Lyons, 2002: 1)</p><p>Academic language is a specific subset of general language, and differs considerably from the type of language used in daily life situations, not only in terms of formality but also in terms of language features (Simpson-Vlach & Ellis, 2010). The language features specific to academic contexts may range from, for instance, choice of verb to combination of words, i.e. collocations. Collocations are also important in general language, but gain even more importance in academic registers. According to Sinclair (1991: 110), any language user will have a repertoire of “a large number of semi-preconstructed phrases that constitute single choices, even though they might appear to be analysable into segments.” In other words, proficient language users resort to collocations to convey meaning. Therefore, mastering collocations is imperative for guaranteeing fluency in a text, as writing proper academic English goes beyond knowing isolated words. When it comes to judging text quality, one of the criteria a reader has in mind, however unconsciously, is how conventionalized language is.
This conventionality is partly guaranteed by the appropriate use of collocations.</p><p>Bearing the importance of collocations for academic texts in mind, the main goal of this study is to analyze how Brazilian students produce collocations in academic texts written in English by comparing two corpora of unpublished texts: one with texts produced by Brazilians studying in British universities (BrAWE) whose grades are unknown, and a reference corpus with texts written by students of multiple nationalities studying in British universities, all of which were graded merit or distinction (BAWE). The latter will be used as baseline data. The following research questions will be addressed:</p><p>a) Is there a statistically significant difference in the frequency of the noun nodes and their respective collocates in BrAWE and BAWE?</p><p>b) Are there differences in syntactic structures of collocations between the two corpora?</p><p>Collocations</p><p>“You shall know a word by the company it keeps” is probably a sentence that immediately comes to the mind of anyone acquainted with collocational studies. This sentence, formulated by J. Firth (1957: 11), has inspired a great deal of research in the field, as it summarizes the core meaning of collocations, i.e., the likelihood of two or more words occurring together (Sinclair, 1991; Hill, 1999; Durrant, 2009). Sinclair (1991) proposed the idea that language operates according to the open-choice principle and the idiom principle. The former considers language as the result of complex choices to complete each unit (word, phrase and clause) that composes a text, i.e., all slots of a text can be filled with any word as long as grammaticality is preserved.
The latter assumes that “a language user has available to him or her a large number of semi-preconstructed phrases that constitute single choices, even though they might appear to be analyzable into segments.” (Sinclair, 1991: 110).</p><p>Regarding language learners, evidence shows that they do make use of collocations but tend to have a more limited repertoire of conventional combinations (Granger, 1998; Lorenz, 1999; Nesselhauf, 2005). The comparison between native (NS) and non-native (NNS)<sup>1</sup> collocational performance is presented in Howarth (1998), who analyzes adult learners of English writing academically in Social Sciences postgraduate courses and focuses on the use of collocations composed of verb + noun. The study reveals that the NNS “produced, on average, a much lower density of conventional combinations (25%), suggesting either a generally lower level of knowledge of collocations, or a lack of awareness of how to deploy them appropriately, or both.” (Howarth, 1998: 36).</p><p>Granger (1998) analyzes intensifying adverbs ending in –ly that function as amplifiers and modifiers as the nodes of the collocations. A comparison of a corpus of native English writers with a similar corpus of advanced French-speaking learners of English revealed a statistically significant overall underuse of amplifiers in the learner corpus. However, when looking at some amplifiers individually, completely and totally were overused by the learners, while highly was underused. Granger suggests that this overuse can possibly be explained by the fact that these adverbs have direct equivalents in French and, consequently, students choose to translate them from French into English.
Additionally, some combinations with amplifiers such as acutely aware, bitterly disillusioned, gravely disorganised, and steeply dipping are used exclusively by native speakers.</p><p><sup>1</sup> It is important to point out that most studies related to proper use of collocations rely on a contrastive analysis between native speakers (NS) and non-native speakers (NNS). However, in this study the comparison was not based on a NS vs. NNS dichotomy.</p><p>Collocations composed of adjective + noun or noun + noun are analyzed by Durrant and Schmitt (2009). The authors analyze a total of 96 texts organized in two sets: one containing NNS texts and the other NS texts. By classifying collocations into low-frequency and high-frequency and establishing collocational strength with t-score and Mutual Information measures<sup>2</sup>, they came to three main findings: Firstly, native writers use more low-frequency combinations than non-natives. […] Secondly, non-native writers make at least as much use of collocations with very high t-scores as do natives. […] Thirdly, non-native writers significantly underuse collocations with high mutual information (MI)<sup>3</sup> scores in comparison with native norms (Durrant & Schmitt, 2009). These findings suggest that learners have a tendency to repeat favored items: they quickly pick up frequent collocations, whereas the less frequent and strongly associated items take longer to acquire (Durrant & Schmitt, 2009). Ellis, Simpson-Vlach and Maynard (2008) reinforce this idea that NS use a wider range of collocations, whereas NNS tend to use collocations they encounter more frequently.
The issue of overusing collocations is discussed by Ackermann and Chen (2013: 3), who argue that “by using a less appropriate collocate, a non-native speaker will sound unnatural or may even become unintelligible among speakers of the target language.”</p><p>Laufer and Waldman (2011) investigate verb + noun collocations produced by L1-Hebrew learners of English. Besides comparing the learner corpus to a NS one, the authors also compared the data across the L1-Hebrew learners of English represented in the corpus. Results indicated that the NS produced almost twice as many collocations as the learners. Learners underused verb + noun collocations when compared to NS and produced significantly more deviant collocations. Advanced and intermediate learners were the ones who produced more deviant collocations, probably because they feel more confident in relation to the English language when compared to basic students.</p><p><sup>2</sup> The t-score is an association measure that “highlights frequent combinations of words. [H]owever while all collocations identified by the t-score are frequent, not all frequent word combinations have a high t-score. [On the other hand], MI-score is negatively linked to frequency, meaning that the value is larger the more exclusively the two words are associated and the rarer the combination is.” (Gablasova et al., 2017: 8-9).</p><p><sup>3</sup> MI is a measure of association between words. The higher the MI score, the stronger the relation between the items (Church & Hanks, 1990).</p><p>Chinese learners of English and their use of collocations in academic written texts were investigated by Wu (2016). The author analyzed verb + adverb and adverb + verb collocations comparing three academic English corpora, two of NS and one of NNS.
Wu (2016) also shows that there are significant differences in terms of collocations chosen by Chinese learners of English, who use, for instance, develop quickly, widely use and abolish completely more frequently than NS do. This difference regarding lexical competence and knowledge of collocations might be related to the fact that the teaching of collocations is not common in China, and that Mandarin and English have few similarities.</p><p>Ohlrogge (2009) analyzed 170 compositions written for an EFL proficiency test and found correlations between level of proficiency and collocations: the students who received higher grades presented a higher incidence of collocations. This is in line with what Crossley et al. (2015) state regarding the relation between proficiency and collocations. In their investigation of lexical proficiency in both oral and written texts produced by learners at three different levels (beginning, intermediate and advanced), raters judged lexical proficiency according to analytical and holistic features, one of them being collocations. Results indicate that higher proficiency writers tend to use a wider range of collocations than lower proficiency writers, corroborating what was found in our study.</p><p>When it comes to the analysis of collocations used by Brazilian learners of English in academic genres, more specifically in argumentative essays, Guedes (2017) explored collocations of verb + adverb ending in -ly. The author found that the most common verbs used by the learners are action verbs (apply and provide). Also, there is a high frequency of verbs such as improve, develop, and adopt among learners of English. On the other hand, verbs such as increase, include, occur, reduce, and require are more frequent in BAWE.
Due to the low frequency of verb + adverb ending in -ly, their collocational strength could not be statistically measured.</p><p>Matte and Rebechi (2019) analyzed the differences in the use of collocations of the Academic Collocation List (ACL)<sup>4</sup> (Ackermann & Chen, 2013) in the same corpora used in the present study. Their results show that only a few collocations of ACL are used differently in the comparative analysis of BAWE and BrAWE. Furthermore, the most frequent collocations in both corpora are not exactly the same as those presented in the list, which suggests a possible mismatch between what is presented in ACL and authentic language produced by students both in BrAWE and BAWE.</p><p>There are ready-made lists containing relevant collocations and formulas to be mastered, such as those presented in the ACL (Ackermann & Chen, 2013) and the Academic Formulas List (AFL) (Simpson-Vlach & Ellis, 2010). However, despite the “progression in research from studies that provide evidence of the importance of collocations for L2 learners” (Boers & Webb, 2018), it is necessary to create pedagogical materials that fit students’ needs. Thus, more than memorizing vocabulary and collocation lists, it is imperative to master collocations in terms of knowing their appropriate use, that is, collocational competence must be acquired in context.
This argument is sustained by Frankenberg-Garcia (2018: 101), who points out that “the lexical knowledge is not just about understanding words, but also about employing words in context.”</p><p><sup>4</sup> https://www.eapfoundation.com/vocab/academic/acl/</p><p><sup>5</sup> BAWE contains texts of undergraduate and master’s students.</p><p>The corpora</p><p>The BAWE corpus (Alsop & Nesi, 2009) was compiled with the objective of gathering unpublished written assignments from students of multiple nationalities studying<sup>5</sup> in four different British universities: Warwick University, Reading University, Oxford Brookes University, and Coventry University. Unlike other academic corpora that are mostly composed of texts written by experts and edited by professionals, the BAWE is composed of discipline-specific learner texts. Despite containing students’ writing, this corpus is different from those compiled with essays written under examination conditions for analyzing non-native-speaker error and language acquisition, as it contains assignments written during undergraduate and master’s courses which were graded merit or distinction. The BAWE corpus was, thus, designed to enable the investigation of academic literacy and disciplinary knowledge development. BAWE has 6,968,089 words and is balanced across four areas<sup>6</sup>: Life Sciences (LS), Social Sciences (SS), Physical Sciences (PS), and Arts and Humanities (AH). Each area encompasses a variety of disciplines.
Moreover, the corpus is organized according to 13 different academic genre families proposed by Gardner and Nesi (2013). A total of 2,858 texts were compiled, of which 1,953 were written by L1 speakers of English and the remainder by highly proficient English as an additional language (EAL) students.</p><p>The Brazilian version of BAWE is BrAWE (the Brazilian Academic Written English corpus), compiled by Goulart (2017). The organization of the corpus is similar to the British one, as it covers the same areas of expertise and gathers assignments produced by undergraduate students. Therefore, BrAWE also follows Gardner and Nesi’s (2013) classification of academic genre families, but only 12 categories were found. The final version of the corpus contains 380 assignments of students from 59 universities. The high number of universities involved is due to the fact that most of the students were participants of the Sciences without Borders (SwB) program, which partnered with over 80 universities in the United Kingdom alone. The SwB was a Brazilian scientific mobility program created in 2011 with the objective of strengthening and expanding the internationalization of Brazilian higher education by providing scholarships for both students and researchers.</p><p>Overall, engineering, natural sciences, and health sciences were the areas covered by the SwB. Areas such as arts and humanities were not included in the program, but some texts from this area were included in the corpus as some students from other mobility programs were also contacted. Despite being comparable to BAWE, the corpus is unbalanced in terms of size of subcorpora.
Considering that Life Sciences (LS), Social Sciences (SS) and Physical Sciences (PS) are the most representative areas in BrAWE, a subcorpus of BAWE was created in order to make it comparable to the BrAWE corpus. Thus, whenever BAWE is mentioned, we are referring to BAWE’s subcorpora that contain only assignments in the fields of LS, SS, and PS.</p><p><sup>6</sup> Alsop and Nesi (2009) refer to these areas as disciplinary groups.</p><table><tr><th></th><th>BAWE</th><th>BrAWE</th></tr><tr><td>Words</td><td>3,312,196</td><td>768,323<sup>7</sup></td></tr><tr><td>Number of assignments</td><td>2,761</td><td>380</td></tr><tr><td>Quality of assignments</td><td>Merit and distinction</td><td>Passing (and higher)</td></tr></table><p>Table 1. BAWE and BrAWE corpora</p><p>As stated above, the attested quality of assignments distinguishes BAWE and BrAWE. In BAWE, students were awarded merit or distinction, whereas in BrAWE students may have obtained a passing grade by the minimum requirement, which does not necessarily mean that no one wrote outstanding texts. Although grades were not given because of the quality of language, one can speculate that language may indeed play an important part in the quality of an assignment. According to Kumar and Rao (2018: 9), “poor academic writing skills and lack of command over the knowledge of English language” feature among the reasons why manuscripts are rejected. Therefore, due to the quality of texts, and to the high level of English language proficiency of participants, BAWE may be considered an adequate reference corpus to fulfill the purposes of a contrastive corpus analysis.</p><p>Methodological procedures</p><p>Collocations can be analyzed according to the frequency of the words or to the strength of association between the composing words using statistical measures, such as MI, t-score, Log Dice (Brezina, 2018).
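</p><p>To make the three association measures named above more concrete, the following is a minimal sketch of how each one is commonly computed for a single node-collocate pair. The function name and the frequency counts are hypothetical, and the formulas are the standard textbook versions; Sketch Engine computes Log Dice over frequencies within a grammatical relation, so the values it reports may differ from this simplified illustration.</p>

```python
import math


def association_scores(f_node, f_collocate, f_pair, corpus_size):
    """Return MI, t-score and logDice for one node-collocate pair.

    f_node, f_collocate: corpus frequencies of the two words
    f_pair: frequency of the two words occurring together
    corpus_size: total number of tokens in the corpus
    (All argument names are illustrative, not taken from the chapter.)
    """
    # Co-occurrence frequency expected if the two words were independent
    expected = f_node * f_collocate / corpus_size
    # MI rewards exclusive pairings, even rare ones
    mi = math.log2(f_pair / expected)
    # t-score rewards pairings that are frequent in absolute terms
    t_score = (f_pair - expected) / math.sqrt(f_pair)
    # logDice does not depend on corpus size and has a maximum of 14
    log_dice = 14 + math.log2(2 * f_pair / (f_node + f_collocate))
    return {"MI": mi, "t-score": t_score, "logDice": log_dice}


# Hypothetical counts, for illustration only
print(association_scores(f_node=1000, f_collocate=500, f_pair=50,
                         corpus_size=1_000_000))
```

<p>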
In this study, we used Log Dice to calculate the strength of association between words since this is the default statistical measure of Sketch Engine, the software used to extract the collocations.</p><p><sup>7</sup> The size of BrAWE in Sketch Engine is 768,323, rather than the 670,314 shown in Table 3, because this software counts punctuation marks as words.</p><p>Three different types of collocations<sup>8</sup> were investigated: modifier + noun, noun (subject) + verb, and verb + noun (object). For example,</p><p>• Modifier: adjectives that come before the node</p><p>Ex.: difficult + task, advanced + technique</p><p>• Verb (object of): used when the node is the object of the verb</p><p>Ex.: execute + task, apply + technique</p><p>• Verb (subject of): used when the node is the subject of the verb</p><p>Ex.: task + require, technique + use</p><p>These categories of collocates follow Frankenberg-Garcia et al.’s list (2018), composed of 187 collocational nodes, which is a merge of three lists: the Academic Vocabulary List<sup>9</sup> (AVL, Gardner & Davies, 2014), the Academic Keyword List<sup>10</sup> (AKL, Paquot, 2010), and the Academic Collocations List (ACL, Ackermann & Chen, 2013). Of these 187 nodes, 125 are nouns, 38 are verbs, and the remaining 24 are adjectives.</p><p>We focused on the identification of overused and underused academic noun-node collocations, through the comparison of two different corpora, the British Academic Written English corpus (BAWE) and the Brazilian Academic Written English corpus (BrAWE). The cut-off point to include a collocation in the study was a minimum frequency of four occurrences in BAWE in at least two out of the three remaining areas, i.e. Life Sciences, Physical Sciences, and Social Sciences. Thus, collocations found in a single area were not included, as is the case of health need, a collocation that only appears in LS assignments.
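</p><p>The cut-off criterion described above can be sketched as a simple filter. This is only an illustration under one reading of the criterion, namely that a collocation must reach four occurrences in each of at least two areas; the function name, the dictionary layout and the counts are hypothetical.</p>

```python
MIN_FREQ = 4   # minimum occurrences required within an area
MIN_AREAS = 2  # number of areas in which the minimum must be reached


def passes_cutoff(freq_by_area, min_freq=MIN_FREQ, min_areas=MIN_AREAS):
    """Return True if a collocation meets the inclusion criterion.

    freq_by_area maps an area label (e.g. "LS", "SS", "PS") to the
    frequency of the collocation in that BAWE subcorpus.
    """
    areas_meeting_minimum = sum(
        1 for freq in freq_by_area.values() if freq >= min_freq
    )
    return areas_meeting_minimum >= min_areas


# Hypothetical counts: a collocation found in one area only is excluded
print(passes_cutoff({"LS": 12, "SS": 0, "PS": 0}))
# Hypothetical counts: a collocation attested in two areas is kept
print(passes_cutoff({"LS": 5, "SS": 4, "PS": 1}))
```

<p>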
The five methodological steps were:</p><p><sup>8</sup> The main word of a collocation is called node, and the ones associated to the node are the collocates. Thus, the basic structure of a collocation is node + collocate.</p><p><sup>9</sup> Derived from BAWE.</p><p><sup>10</sup> https://uclouvain.be/en/research-institutes/ilc/cecl/academic-keyword-list.html</p><p>1st: Listing the 125 nouns from Frankenberg-Garcia et al.’s (2018) list in descending order of frequency in BAWE, using the “search” tool in Sketch Engine<sup>11</sup>. The node was typed in the “lemma” box and the PoS noun was selected. All the words that derive from the base form of the node came up as a result; for example, for approach, the plural form (approaches) was also selected. This procedure was repeated for each of the 125 nodes.</p><p>2nd: Extracting the collocates of the 125 nodes in both corpora using the “Word Sketch” tool. The following syntactic structures mattered to this study: modifier (different + approach), object of (verb) (use + approach), and subject of (verb) (approach + involve). Again, the node was typed in the “lemma” box in “Word Sketch”, and the PoS noun was selected.</p><p>3rd: Calculating the Log Likelihood (LL) value (Rayson, 2002) for the different frequencies of each one of the 125 nodes in both corpora.
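</p><p>As an illustration of the third step, the sketch below computes a two-corpus log-likelihood value in the way it is usually formulated for corpus frequency comparisons. The function name and the node frequencies are hypothetical; the corpus sizes are those reported in Table 1. Giving a negative sign to nodes that are relatively less frequent in the study corpus is an assumption made here so that underuse can be told apart from overuse, and 6.63 is the chi-square critical value for p &lt; 0.01 with one degree of freedom.</p>

```python
import math

CRITICAL_LL = 6.63  # critical value for p < 0.01 (1 degree of freedom)


def signed_log_likelihood(freq_study, size_study, freq_ref, size_ref):
    """Log-likelihood for one word in a study corpus vs. a reference corpus.

    The result is negated when the word is relatively less frequent in
    the study corpus (an assumed convention for marking underuse).
    """
    total_freq = freq_study + freq_ref
    total_size = size_study + size_ref
    # Frequencies expected if the word were equally common in both corpora
    expected_study = size_study * total_freq / total_size
    expected_ref = size_ref * total_freq / total_size

    ll = 0.0
    for observed, expected in ((freq_study, expected_study),
                               (freq_ref, expected_ref)):
        if observed > 0:  # a zero count contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    ll *= 2

    if freq_study / size_study < freq_ref / size_ref:
        ll = -ll
    return ll


# Hypothetical node frequencies; corpus sizes taken from Table 1
value = signed_log_likelihood(freq_study=120, size_study=768_323,
                              freq_ref=900, size_ref=3_312_196)
print(round(value, 2), abs(value) >= CRITICAL_LL)
```

<p>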
If the outcome of the statistical test is 6.63 or higher, the difference is statistically significant at the 99% level (p &lt; 0.01)</p>
<p>When comparing the collocations formed with the nodes that showed statistically significant differences (the overused and the underused nodes), it is possible to observe a balance in terms of syntactic structures in BAWE and in BrAWE, as shown below:</p>
<table>
<tr><th></th><th colspan="3">BAWE</th><th colspan="3">BrAWE</th></tr>
<tr><th></th><th>Modifier</th><th>Verb (object)</th><th>Verb (subject)</th><th>Modifier</th><th>Verb (object)</th><th>Verb (subject)</th></tr>
<tr><td>Overused nodes</td><td>562 (50.4%)</td><td>423 (37.9%)</td><td>130 (11.6%)</td><td>219 (48.2%)</td><td>196 (43.1%)</td><td>39 (8.6%)</td></tr>
<tr><td>TOTAL</td><td colspan="3">1115</td><td colspan="3">454</td></tr>
<tr><td>Underused nodes</td><td>468 (51.03%)</td><td>355 (38.7%)</td><td>94 (10.2%)</td><td>147 (49.3%)</td><td>134 (44.9%)</td><td>17 (5.7%)</td></tr>
<tr><td>TOTAL</td><td colspan="3">917</td><td colspan="3">298</td></tr>
</table>
<p>Table 4. Syntactic structures of collocations in both corpora</p>
<p>Modifiers that precede the nodes are the most productive category, with 50.4% and 48.2% of occurrences in BAWE and BrAWE with the overused nodes, and 51.03% and 49.3% in BAWE and BrAWE with the underused nodes. Verb + node (object) collocations have the second highest percentage of occurrences, with 37.9% in BAWE with overused nodes and 43.1% in BrAWE with the same nodes. When it comes to the underused nodes, the percentages are 38.7% and 44.9% in BAWE and in BrAWE respectively. Node (subject) + verb collocations account for the lowest percentages with both overused and underused nodes: 11.6% and 10.2% in BAWE, and 8.6% and 5.7% in BrAWE.</p>
<p>When analyzing the LL values of the nodes, the range of values is wider for the underused nodes than for the overused ones. 
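</p>
<p>For readers who want to reproduce this kind of comparison, the sketch below computes a log-likelihood value for one item in a study corpus against a reference corpus, using the standard two-corpus formulation with expected frequencies. It is a minimal illustration rather than the authors' script: the function names are hypothetical, the counts are invented, and the negative sign for underuse is a convention assumed here to mirror how the chapter reports underused nodes.</p>

```python
import math

# Critical value of the chi-square distribution (1 d.f.) for p < 0.01,
# the threshold mentioned in the text.
CRITICAL_VALUE_99 = 6.63


def log_likelihood(freq_study: int, size_study: int,
                   freq_ref: int, size_ref: int) -> float:
    """Signed log-likelihood for one item in a study vs. a reference corpus.

    The sign is an illustrative convention (assumed here): negative values
    mean the item is relatively less frequent in the study corpus.
    """
    total_freq = freq_study + freq_ref
    total_size = size_study + size_ref
    # Expected frequencies if the item were equally likely in both corpora.
    expected_study = size_study * total_freq / total_size
    expected_ref = size_ref * total_freq / total_size

    ll = 0.0
    for observed, expected in ((freq_study, expected_study),
                               (freq_ref, expected_ref)):
        if observed > 0:  # a zero count contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    ll *= 2

    # Flip the sign when the item is underused in the study corpus.
    if freq_study / size_study < freq_ref / size_ref:
        ll = -ll
    return ll


def is_significant(ll: float) -> bool:
    """True if the absolute LL value reaches the p < 0.01 threshold."""
    return abs(ll) >= CRITICAL_VALUE_99


# Hypothetical frequencies and corpus sizes, for illustration only.
ll = log_likelihood(freq_study=5, size_study=1000, freq_ref=100, size_ref=1000)
print(round(ll, 2), is_significant(ll))
```

<p>In the setting of this chapter, the study corpus would be BrAWE and the reference corpus BAWE, with one call per node.</p>
<p>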
Table 5 shows the nodes with the lowest and the highest LL values among those with significant differences between the two corpora. Considering that BAWE is the reference corpus, the terms overused and underused refer to the uses in BrAWE:</p>
<table>
<tr><th></th><th>Overused</th><th>Underused</th></tr>
<tr><td>Lowest LL</td><td>Factor (7.06)</td><td>Difficulty (-6.84)</td></tr>
<tr><td>Highest LL</td><td>Example (370.55)</td><td>Data (-615.78)</td></tr>
</table>
<p>Table 5. Lowest and highest LL</p>
<p>Higher LL values (in absolute terms) indicate that the differences between the frequency scores are more significant (Rayson, 2002). Table 6 shows collocations with the node data (the underused node with the highest LL) in both corpora. Differences can be observed not only in the total number of collocations (55 in BAWE vs. 10 in BrAWE), but also in the syntactic patterns, since 90% of the words that collocate with data in BrAWE are verbs, as compared to 63.6% in BAWE.</p>
<table>
<tr><th></th><th colspan="3">BAWE</th><th colspan="3">BrAWE</th></tr>
<tr><th></th><th>Modifier</th><th>Verb (object)</th><th>Verb (subject)</th><th>Modifier</th><th>Verb (object)</th><th>Verb (subject)</th></tr>
<tr><td>data</td><td>20 (36.3%)</td><td>23 (41.8%)</td><td>12 (21.8%)</td><td>1 (10%)</td><td>7 (70%)</td><td>2 (20%)</td></tr>
<tr><td>TOTAL</td><td colspan="3">55</td><td colspan="3">10</td></tr>
</table>
<p>Table 6. Collocations with the node data</p>
<p>While 20 different modifiers<sup>12</sup> collocate with data in BAWE, in BrAWE the only modifier is “raw”. A possible explanation is that the assignments which compose the BAWE corpus are mostly evidence-based studies, justifying the higher use of data. We can also speculate that Brazilian students prefer not to characterize the type of data under analysis, using the word on its own rather than as part of a collocation. 
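</p>
<p>The proportions reported for the node data follow directly from the counts in Table 6. As a small worked example (the function names are hypothetical, and the one-decimal rounding is an assumption, so a value may differ by 0.1 from the table):</p>

```python
from typing import Dict

# Counts of distinct collocates of "data", copied from Table 6.
bawe = {"modifier": 20, "verb_object": 23, "verb_subject": 12}
brawe = {"modifier": 1, "verb_object": 7, "verb_subject": 2}


def shares(counts: Dict[str, int]) -> Dict[str, float]:
    """Percentage of each syntactic category, rounded to one decimal place."""
    total = sum(counts.values())
    return {category: round(100 * n / total, 1) for category, n in counts.items()}


def verb_share(counts: Dict[str, int]) -> float:
    """Percentage of collocates that are verbs (object or subject position)."""
    total = sum(counts.values())
    verbs = counts["verb_object"] + counts["verb_subject"]
    return round(100 * verbs / total, 1)


print(shares(bawe))
print(shares(brawe))
print(verb_share(bawe))   # 63.6
print(verb_share(brawe))  # 90.0
```

<p>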
When it comes to the verbs that combine with data, regardless of whether the node is the object or the subject, the differences continue to be significant. Table 7 demonstrates the different behaviors:</p>
<p>12 experimental, empirical, quantitative, historical, available, raw, recent, sample, past, primary, following, financial, other, survey, character, relevant, personal, important, actual, old.</p>
<table>
<tr><th></th><th>BAWE</th><th>BrAWE</th></tr>
<tr><td>Verb (object)</td><td>use, collect, obtain, show, analyse, contain, provide, give, record, gather, transmit, compare, present, produce, take, require, store, plot, interpret, send, receive, need, fit</td><td>collect, obtain, show, transmit, store, plot, need</td></tr>
<tr><td>Verb (subject)</td><td>show, suggest, use, collect, follow, gather, link, seem, demonstrate, support, indicate, exist</td><td>show, seem</td></tr>
</table>
<p>Table 7. Verbs that collocate with data</p>
<p>Among the nodes with statistically significant differences, difficulty is the underused node with the lowest LL (-6.84). This means that overall difficulty is underused in BrAWE in comparison to BAWE. Table 8 portrays the syntactic structures of the collocations with this node.</p>
<table>
<tr><th></th><th colspan="3">BAWE</th><th colspan="3">BrAWE</th></tr>
<tr><th></th><th>Modifier</th><th>Verb (object)</th><th>Verb (subject)</th><th>Modifier</th><th>Verb (object)</th><th>Verb (subject)</th></tr>
<tr><td>difficulty</td><td>7 (46.6%)</td><td>8 (53.3%)</td><td>0</td><td>3 (37.5%)</td><td>5 (62.5%)</td><td>0</td></tr>
<tr><td>TOTAL</td><td colspan="3">15</td><td colspan="3">8</td></tr>
</table>
<p>Table 8. Collocations with the node difficulty</p>
<p>In total, there are 15 different collocations in BAWE and eight in BrAWE, with collocates in the modifier and verb (object) categories. While seven different modifiers collocate before the node in BAWE, only three are produced by Brazilians. 
As for the verbs that accompany the node when it is the object, eight go together with difficulty in BAWE whereas five are used in BrAWE, as shown in Table 9:</p>
<table>
<tr><th></th><th>BAWE</th><th>BrAWE</th></tr>
<tr><td>Modifier</td><td>Great, technical, financial, main, economic, other</td><td>Great, main, other</td></tr>
<tr><td>Verb (object)</td><td>Face, cause, encounter, experience, pose, highlight, create</td><td>Face, cause, highlight, create</td></tr>
</table>
<p>Table 9. Types of collocates with difficulty</p>
<p>Conclusion</p>
<p>This corpus-based study aimed to unveil the use of collocations by Brazilians studying in British universities. To that end, a comparative analysis of collocations in the Brazilian Academic Written English Corpus (BrAWE; Goulart, 2017) and the British Academic Written English corpus (BAWE; Alsop & Nesi, 2009) was conducted.</p>
<p>Regarding the first research question, Is there a statistically significant difference in the frequency of the noun nodes and their respective collocates in BAWE and BrAWE?, it is possible to state that, of the 125 nodes analyzed, 36 have a similar frequency in both corpora, 48 were underused and 41 were overused in BrAWE. When it comes to the collocates, the 125 nodes produced 2,679 collocates in BAWE that met our inclusion criteria. In BrAWE, only 1,015 collocates occur with the same 125 nodes. Out of these collocates, 287 showed a statistically significant difference in use when the behavior of the 125 nouns was analyzed, of which 190 were underused by Brazilians and 97 overused.</p>
<p>As for the second research question, Are there differences in syntactic structures of collocations between the two corpora?, the data revealed that, of the 287 collocates which presented significant differences, 202 are modifiers, 76 are verbs in the object position, and nine are verbs in the subject position. 
In both corpora modifiers account for half of the occurrences (50.7% and 49.8% in BAWE and BrAWE respectively). Nodes as objects are more frequent in BrAWE (46.7%) as compared to BAWE (39.1%), whereas nodes as subjects are more frequent in BAWE (10.1%) than in BrAWE (6.4%). This discrepancy might be related to the type of study conducted by Brazilian students and to how proficient they are in employing different types of verbs when nodes are used as subjects. For instance, studies conducted by students who wrote the texts that compose BrAWE may be of a different nature, hence the need to use a verb that best combines with the study itself (make process, conduct research). On the other hand, when choosing verbs that are used after the node (subject of the sentence), their repertoire is narrower.</p>
<p>Based on the comparison of the two corpora used in this study, BAWE and BrAWE, we noted that academic collocations do not seem to be fully mastered by Brazilian students who write academic texts. For Sinclair (1991), learners operate more on the open choice principle than on the idiom principle, producing fewer collocations or collocations that do not sound natural. This lack of collocational competence was observed in the reduced number of collocations in BrAWE (1,015) when compared to BAWE (2,679) and in the number of outcomes that came up with statistically significant differences in the comparison between the data in the studied corpora. A node that illustrates this phenomenon is data, as displayed in Tables 6 and 7, in which it is possible to observe that the number of collocates used with data is significantly smaller in BrAWE than in BAWE.</p>
<p>The findings of this study suggest that Brazilian students have a limited variety of vocabulary as far as collocations are concerned. 
It is our belief that proper use of collocations is a major element in academic writing and should, thus, be treated as such in English teaching environments (AlHassan & Wood, 2015; Li & Schmitt, 2009; Martinez & Schmitt, 2012). For instance, the ones which are underused in BrAWE, such as design + system, measured + value, good + value, decision-making + process, detailed + analysis, further + analysis, empirical + data, and quantitative + data, should be addressed with Brazilian students.</p>
<p>As pointed out by Hyland and Hamp-Lyons (2002: 10), “EAP offers the possibility of making even greater contributions to our understanding of the varied ways language is used in academic communities to provide even more strongly informed foundations for pedagogic materials.” Some suggestions are given by Nesselhauf (2005: 253), for whom teaching collocations should begin with making students aware of this phenomenon. AlHassan and Wood (2015) also support the idea that a focus on formulaic sequences in teaching promotes development in L2 writing proficiency. Thus, a large repertoire of academic collocations improves students’ writing, making it more formulaic and fluent, as formulaic sequences (such as collocations) provide fluency and conventionality to the language.</p>
<p>Considering that more information on the use of collocation by academic English learners would help us to establish a greater degree of accuracy on this matter, a natural progression of this work would be to thoroughly analyze and describe the collocates of all 125 nodes.</p>
<p>Acknowledgements</p>
<p>Marine Laísa Matte would like to thank CAPES for the financial support during her Master’s. Simone Sarmento holds a CNPq research productivity scholarship level 1D.</p>
<p>References</p>
<p>Ackermann, K. & Chen, Y. H. (2013). 
Developing the Academic Collocation List (ACL) – A corpus-driven and expert-judged approach. Journal of English for Academic Purposes, 12(4), 235-247.</p>
<p>AlHassan, L. & Wood, D. (2015). The effectiveness of focused instruction of formulaic sequences in augmenting L2 learners’ academic writing skills: A quantitative research study. Journal of English for Academic Purposes, 17, 51-62.</p>
<p>Alsop, S. & Nesi, H. (2009). Issues in the development of the British Academic Written English (BAWE) corpus. Corpora, 4(1), 71-83.</p>
<p>Biber, D. & Gray, B. (2016). Grammatical complexity in academic English: Linguistic change in writing. Cambridge University Press.</p>
<p>Boers, F. & Webb, S. (2018). Teaching and learning collocation in adult second and foreign language learning. Language Teaching, 51(1), 77-89.</p>
<p>Brezina, V. (2018). Statistics in corpus linguistics: A practical guide. Cambridge University Press.</p>
<p>Church, K. W. & Hanks, P. (1990). Word association norms, mutual information, and lexicography. Computational Linguistics, 16(1), 22-29.</p>
<p>Crossley, S. A., Salsbury, T. & McNamara, D. S. (2015). Assessing lexical proficiency using analytic ratings: A case for collocation accuracy. Applied Linguistics, 36(5), 570-590.</p>
<p>Durrant, P. (2009). Investigating the viability of a collocation list for students of English for academic purposes. English for Specific Purposes, 28(3), 157-169.</p>
<p>Durrant, P. & Schmitt, N. (2009). To what extent do native and non-native writers make use of collocations? IRAL-International Review of Applied Linguistics in Language Teaching, 47(2), 157-177.</p>
<p>Ellis, N. C., Simpson-Vlach, R. & Maynard, C. (2008). Formulaic language in native and second language speakers: Psycholinguistics, corpus linguistics, and TESOL. TESOL Quarterly, 42(3), 375-396.</p>
<p>Firth, J. (1957). 
A Synopsis of Linguistic Theory, 1930-55. In Studies in Linguistic Analysis (pp. 1-31). Special Volume of the Philological Society. Oxford: Blackwell. [Reprinted as Firth (1968)]</p>
<p>Frankenberg-Garcia, A. (2018). Investigating the collocations available to EAP writers. Journal of English for Academic Purposes, 35, 93-104.</p>
<p>Frankenberg-Garcia, A., Lew, R., Roberts, J. C., Rees, G. P. & Sharma, N. (2018). Developing a writing assistant to help EAP writers with collocations in real time. ReCALL, 31(1), 23-39.</p>
<p>Gablasova, D., Brezina, V. & McEnery, T. (2017). Collocations in corpus-based language learning research: Identifying, comparing, and interpreting the evidence. Language Learning, 67(S1), 155-179.</p>
<p>Gardner, S. & Nesi, H. (2013). A classification of genre families in university student writing. Applied Linguistics, 34(1), 25-52.</p>
<p>Gardner, D. & Davies, M. (2014). A new academic vocabulary list. Applied Linguistics, 35(3), 305-327.</p>
<p>Goulart, L. (2017). Compilation of a Brazilian academic written English corpus. Revista e-scrita: Revista do Curso de Letras da UNIABEU, 8(2), 32-47.</p>
<p>Granger, S. (1998). Prefabricated patterns in advanced EFL writing: Collocations and formulae. Phraseology: Theory, analysis, and applications, 145-160.</p>
<p>Granger, S., Dagneaux, E., Meunier, F. & Paquot, M. (Eds.). (2009). International corpus of learner English (Vol. 2). Louvain-la-Neuve: Presses universitaires de Louvain.</p>
<p>Guedes, A. D. S. (2017). Verbos do inglês acadêmico escrito e suas colocações: um estudo baseado em um corpus de aprendizes brasileiros de inglês. PhD Thesis. Universidade Federal de Minas Gerais.</p>
<p>Hill, J. (1999). Collocational competence. Readings in Methodology, 162.</p>
<p>Howarth, P. (1998). Phraseology and second language proficiency. Applied Linguistics, 19(1), 24-44.</p>
<p>Hyland, K. & Hamp-Lyons, L. (2002). 
EAP: Issues and directions. Journal of English for Academic Purposes, 1(1), 1-12.</p>
<p>Kilgarriff, A., Rychly, P., Smrz, P. & Tugwell, D. (2004). The sketch engine. Information Technology, 105, 116.</p>
<p>Kumar, V. P. & Rao, C. S. (2018). A review of reasons for rejection of manuscripts. Journal for Research Scholars and Professionals of English Language Teaching, 8(2), 1-11.</p>
<p>Laufer, B. & Waldman, T. (2011). Verb-noun collocations in second language writing: A corpus analysis of learners’ English. Language Learning, 61(2), 647-672.</p>
<p>Lea, M. R. & Street, B. V. (1998). Student writing in higher education: An academic literacies approach. Studies in Higher Education, 23(2), 157-172.</p>
<p>Li, J. & Schmitt, N. (2009). The acquisition of lexical phrases in academic writing: A longitudinal case study. Journal of Second Language Writing, 18(2), 85-102.</p>
<p>Lillis, T. M. (2001). Student Writing: Regulation, Access, Desire. London: Routledge.</p>
<p>Lorenz, G. (1999). Adjective Intensification – Learners Versus Native Speakers: A Corpus Study of Argumentative Writing. Amsterdam: Rodopi.</p>
<p>Martinez, R. & Schmitt, N. (2012). A phrasal expressions list. Applied Linguistics, 33(3), 299-320.</p>
<p>Matte, M. L. & Rebechi, R. R. (2018). A quantitative analysis of collocations in Brazilian and British students’ academic writing. Entrepalavras, 9(2), 195-213.</p>
<p>Nesselhauf, N. (2005). Collocations in a learner corpus. Amsterdam: John Benjamins.</p>
<p>Ohlrogge, A. (2009). Formulaic expressions in intermediate EFL writing assessment. Formulaic language, 2, 375-86.</p>
<p>Paquot, M. (2010). Academic vocabulary in learner writing: From extraction to analysis. London: Bloomsbury Publishing.</p>
<p>Rayson, P. (2002). Matrix: A statistical method and software tool for linguistic analysis through corpus comparison. PhD Thesis, Lancaster University.</p>
<p>Simpson-Vlach, R. & Ellis, N. C. 
(2010). An academic formulas list: New methods in</p><p>phraseology research. Applied linguistics, 31(4), 487-512.</p><p>Sinclair, J. (1991). Corpus, concordance, collocation. Oxford: Oxford University Press.</p><p>Wu, J. (2016). A Corpus-Based Contrastive Study of Adverb + Verb Collocations in</p><p>Chinese Learner English and Native Speaker English. Master degree project. Stockholm</p><p>University.</p><p>Appendix 01</p><p>Node BAWE BrAWE</p><p>0 occur-</p><p>rences in</p><p>BrAWE</p><p>Node BAWE BrAWE</p><p>0 occur-</p><p>rences in</p><p>BrAWE</p><p>system 48 23 25 example 24 8 16</p><p>result 53 31 22 conclusion 8 6 2</p><p>value 50 23 27 conflict</p><p>7 2 5</p><p>figure 15 3 12 standard 25 8 17</p><p>process 52 20 32 reference 1 1 0</p><p>group 50 16 34 aspect 22 11 11</p><p>level 49 14 35 error 15 7 8</p><p>model 59 17 42 movement 3 1 2</p><p>develop-</p><p>ment 45 12 33 task 20 13 7</p><p>data 55 10 45 measure 25 0 25</p><p>information 51 21 30 importance 25 12 13</p><p>research 41 15 26 support 18 5 13</p><p>analysis 34 15 19 feature 23 5 18</p><p>rate 55 18 37 discussion 4 1 3</p><p>effect 53 22 31 perspective 6 1 5</p><p>method 51 19 32 influence 13 6 7</p><p>change 55 20 35 requirement 21 8 13</p><p>strategy 43 13 30 extent 8 5 3</p><p>199</p><p>factor 68 25 43 characteristic 23 3 20</p><p>control 31 7 24 interaction 6 2 4</p><p>use 45 21 24 author 2 1 1</p><p>policy 30 8 22 degree 10 5 5</p><p>theory 20 3 17 capacity 12 5 7</p><p>approach 32 13 19 understand-</p><p>ing 13 7 6</p><p>structure 26 11 15 concern 15 8 7</p><p>role 32 12 20 pattern 17 8 9</p><p>quality 29 16 13 reduction 10 5 5</p><p>difference 41 18 23 basis 9 4 5</p><p>function 28 12 16 definition 11 5 6</p><p>activity 37 11 26 procedure 9 5 4</p><p>organisation 16 5 11 trend 25 5 20</p><p>environ-</p><p>ment 31 6 25 consideration 12 2 10</p><p>resource 26 11 15 observation 5 3 2</p><p>type 34 11 23 potential 11 3 8</p><p>society 5 2 3 improvement 11 6 5</p><p>condition 46 16 30 purpose 7 
2 5</p><p>production 34 7 27 finding 13 8 5</p><p>form 20 4 16 assumption 9 3 6</p><p>section 16 5 11 outcome 10 5 5</p><p>interest 23 7 16 aim 5 2 3</p><p>relationship 35 12 23 presence 6 3 3</p><p>source 25 13 12 consequence 9 3 6</p><p>impact 30 16 14 explanation 6 4 2</p><p>practice 18 5 13 implication 7 0 7</p><p>need 46 20 26 variation 9 4 5</p><p>growth 23 8 15 category 10 2 8</p><p>material 26 11 15 difficulty 14 8 6</p><p>period 14 5 9 description 6 3 3</p><p>increase 28 11 17 link 8 3 5</p><p>review 6 3 3 attempt 1 1 0</p><p>term 16 6 10 shift 5 2 3</p><p>solution 24 17 7 significance 1 0 1</p><p>individual 6 0 6 limitation 2 1 1</p><p>concept 18 10 8 proportion 7 5 2</p><p>demand 25 9 16 phenomenon 7 5 2</p><p>population 26 10 16 recognition 2 1 1</p><p>element 24 12 12 contrast 0 0 0</p><p>200</p><p>knowledge 23 8 15 contribution 5 3 2</p><p>introduc-</p><p>tion 3 0 3 alternative 4 4 0</p><p>benefit 35 15 20 insight 7 5 2</p><p>experience 17 6 11 tendency 1 1 0</p><p>technique 30 10 20 exception 1 1 0</p><p>range 21 9 12</p><p>TOTAL</p><p>BAWE BrAWE 0 occurrences in BrAWE</p><p>2679 1015 1664</p><p>Appendix 02: Types of collocates for each node</p><p>NODE</p><p>Modifier Object Subject Modifier Object Subject</p><p>BAWE BrAWE</p><p>system 9 22 17 4 11 8</p><p>result 20 23 10 11 14 6</p><p>value 19 23 8 7 15 1</p><p>figure 9 5 1 0 2 1</p><p>process 14 24 14 5 9 6</p><p>analysis 12 14 8 4 10 1</p><p>group 18 20 12 7 5 4</p><p>level 21 25 3 6 8 0</p><p>model 14 24 21 4 11 2</p><p>development 25 16 4 10 3 0</p><p>data 20 23 12 1 7 2</p><p>information 24 25 2 10 11 0</p><p>research 22 9 10 9 3 3</p><p>rate 24 24 7 7 10 1</p><p>effect 25 25 3 11 10 1</p><p>method 17 23 11 11 6 2</p><p>change 24 24 7 10 8 2</p><p>strategy 18 19 6 6 6 1</p><p>factor 25 25 18 14 8 3</p><p>control 14 16 1 3 5 0</p><p>use 25 18 2 8 12 1</p><p>policy 9 16 5 1 5 2</p><p>theory 3 9 8 0 3 0</p><p>approach 10 15 7 3 9</p><p>structure 10 15 1 3 8 0</p><p>role 18 14 0 8 4 
0</p><p>201</p><p>quality 12 16 1 6 10 0</p><p>difference 24 15 2 9 9 0</p><p>function 11 16 1 5 7 0</p><p>activity 16 19 3 4 7 0</p><p>organisation 5 3 8 1 2 2</p><p>environment 23 7 1 4 2 0</p><p>resource 16 9 1 9 2 0</p><p>type 22 12 0 8 3 0</p><p>society 5 0 0 2 0 0</p><p>condition 25 17 4 11 4 1</p><p>production 22 9 3 4 3 0</p><p>form 16 4 0 3 1 0</p><p>section 10 1 5 3 1 1</p><p>interest 13 10 0 4 3 0</p><p>relationship 17 17 1 5 7 0</p><p>source 20 4 1 10 2 1</p><p>impact 21 8 1 12 4 0</p><p>practice 14 4 0 4 1 0</p><p>need 25 21 0 10 10 0</p><p>growth 13 10 0 5 3 0</p><p>material 14 9 3 7 4 0</p><p>period 11 2 1 5 0 0</p><p>increase 21 6 1 7 4 0</p><p>review 4 2 0 2 1 0</p><p>term 10 5 1 4 2 0</p><p>solution 9 13 3 5 9 3</p><p>individual 1 3 2 0 0 0</p><p>concept 9 9 0 2 8 0</p><p>demand 14 11 0 5 4 0</p><p>population 19 5 2 8 2 0</p><p>element 18 7 0 8 4 0</p><p>knowledge 13 10 0 3 5 0</p><p>introduction 3 0 0 0 0 0</p><p>benefit 2 19 2 8 6 1</p><p>experience 13 4 0 4 2 0</p><p>technique 18 10 3 4 5 1</p><p>range 14 7 0 6 3 0</p><p>example 14 8 2 5 3 0</p><p>conclusion 4 4 0 2 4 0</p><p>conflict 2 3 2 0 2 0</p><p>standard 12 12 1 5 2 1</p><p>202</p><p>reference 0 1 0 0 1 0</p><p>aspect 15 7 0 10 1 0</p><p>error 5 10 0 3 4 0</p><p>movement 2 1 0 1 0 0</p><p>task 11 8 1 8 4 1</p><p>measure 16 7 2 0 0 0</p><p>importance 11 14 0 6 6 0</p><p>support 12 6 0 1 4 0</p><p>feature 16 5 2 3 1 1</p><p>discussion 4 0 0 1 0 0</p><p>perspective 6 0 0 1 0 0</p><p>influence 11 2 0 6 0 0</p><p>requirement 15 6 0 5 3 0</p><p>extent 6 2 0 5 0 0</p><p>characteristic 17 5 1 2 1 0</p><p>interaction 4 1 1 2 0 0</p><p>author 2 0 0 1 0 0</p><p>degree 7 3 0 3 2 0</p><p>capacity 8 4 0 5 0 0</p><p>understanding 8 5 0 4 3 0</p><p>concern 12 3 0 6 2 0</p><p>pattern 12 5 0 5 3 0</p><p>reduction 4 6 0 2 3 0</p><p>basis 6 3 0 3 1 0</p><p>definition 8 3 0 2 3 0</p><p>procedure 5 3 1 1 3 1</p><p>trend 15 7 3 1 4 0</p><p>consideration 8 5 0 1 1 0</p><p>observation 2 3 0 1 2 0</p><p>potential 
5 6 0 2 1 0</p><p>improvement 5 6 0 2 4 0</p><p>purpose 7 1 0 2 0 0</p><p>finding 5 4 4 2 3 3</p><p>assumption 4 4 1 1 2 0</p><p>outcome 5 6 0 4 1 0</p><p>aim 4 1 0 1 1 0</p><p>presence 2 4 1 0 2 1</p><p>consequence 9 0 0 3 0 0</p><p>explanation 3 3 0 2 2 0</p><p>implication 5 2 0 0 0 0</p><p>variation 6 3 0 2 2 0</p><p>203</p><p>category 8 1 1 2 0 0</p><p>difficulty 7 8 0 3 5 0</p><p>description 4 2 0 1 1 0</p><p>link 4 4 0 2 1 0</p><p>attempt 0 1 0 0 1 0</p><p>shift 1 4 0 0 2 0</p><p>significance 1 0 0 0 0 0</p><p>limitation 2 0 0 1 0 0</p><p>proportion 6 1 0 5 0 0</p><p>phenomenon 3 4 0 2 3 0</p><p>recognition 1 1 0 1 0 0</p><p>contrast* 0 0 0 0 0 0</p><p>contribution 4 4 0 2 1 0</p><p>alternative 2 2 0 2 2 0</p><p>insight 2 5 0 1 4 0</p><p>tendency 1 0 0 1 0 0</p><p>exception 1 0 0 1 0 0</p><p>TOTAL</p><p>1359</p><p>(50.7%)</p><p>1049</p><p>(39.1%)</p><p>271</p><p>(10.1%)</p><p>506</p><p>(49.8%)</p><p>444</p><p>(46.7%)</p><p>65</p><p>(6.4%)</p><p>2679 1015</p><p>*contrast is an academic noun classified in Frankenberg-Garcia et al.’s (2018)</p><p>study that does not have productivity in BAWE nor in BrAWE.</p><p>204</p><p>From corpus to classroom: evaluating</p><p>Web-based tools to teach collocations</p><p>Larissa Goulart (Montclair State University)</p><p>Maria Kostromitina (Northern Arizona University)</p><p>Jennifer Klein (Coconino Community College)</p><p>Introduction</p><p>Throughout the years, researchers have defined collocations in dif-</p><p>ferent ways. Men (2018) for instance, defines collocations as sequences of</p><p>words that are transparent in meaning (e.g. make a decision). Durrant and</p><p>Mathews-Aydınlı (2011: 60) on the other hand, focus their definition on</p><p>the linguistic aspect, stating that collocations are “successions of linguis-</p><p>tic entities that are best learned as integral wholes or independent entities</p><p>(…) (collocations) occur with sufficient frequency that their independent</p><p>learning will facilitate fluency”. 
These sometimes conflicting definitions of collocations emerge from the different approaches used in the study of collocations. Here we will focus on two of those: the phraseological approach and the frequency approach (Wolter & Gyllstad, 2013; Gablasova et al., 2017).</p>
<p>Researchers such as Paquot and Granger (2012) have focused on the phraseological approach to analyze collocations. Some examples of collocations in the phraseological approach are: face a problem, take a step, and reach a conclusion (Paquot & Granger, 2012). As we can see, the phraseological approach focuses on the semantic relationship between words in a collocation and their idiomatic nature. The frequency approach, in contrast, focuses on which words frequently occur together, such as prepare meals, fixed an error, and conquered the city. Research using the frequency approach can also adopt measures of association, such as MI (mutual information) scores or t-scores (see Brezina, 2018: 74 for a complete description of measures of association). It is worth noting, however, that some measures of association can be misleading. MI scores, for example, identify words that tend to occur together, but do not necessarily identify collocations that are frequent in the language overall. This distinction between the ways collocations can be identified is reflected in collocation teaching materials. 
That is, when teaching collocations, some materials focus on the most frequent constructions, while others focus on idioms. This is one reason why it is important for teachers to be able to evaluate ready-made tools for learning collocations, as teachers seek to teach the most frequent collocations to their learners.</p>
<p>In addition to this divide between the phraseological and the frequency approach, there are other aspects that cause confusion when defining a collocation. Within the frequency approach, many studies focus exclusively on collocations with lexical words, such as verb-noun combinations (Boers et al., 2014; Tsai, 2020) or adjective-noun combinations (Wolter & Gyllstad, 2013). Another point of disagreement is how to account for intervening words; some research allows for intervening words in a collocation (bring to light) (Tsai, 2020), while other research does not (give thanks) (Yamashita & Jiang, 2010). Additionally, some researchers investigate n-grams as collocations, that is, they examine longer sequences of words in terms of collocational use (Gablasova et al., 2017).</p>
<p>Studies have also taken dispersion into account when defining collocations. Dispersion refers to the degree to which collocations are distributed across the different texts in a corpus (Gablasova et al., 2017). This is particularly important for language teachers, as teaching only the most frequent collocations, without accounting for dispersion, could lead the learner to acquire a collocation that, in reality, only occurs in one particular text, or in one particular discipline.</p>
<p>Collocation research has also been connected to L2 learning. 
One relevant concept found in the domain of L2 writing research is the distinction between congruent and non-congruent collocations (Wolter & Gyllstad, 2013; Yamashita & Jiang, 2010). Congruent collocations have lexical elements similar to those of collocations in the learner’s first language, while non-congruent collocations do not exist in the learner’s native language. This distinction is especially relevant in research examining language transfer from a learner’s L1 to an L2 and should be kept in mind when teaching collocations in the L2.</p>
<p>Even though the appropriate use of collocations is usually associated with native-like English proficiency (Bahns & Eldaw, 1993; Chen, 2011), research in second language acquisition has shown that collocations can be a challenge to language learners. Granger (1998), for example, shows that L2 learners of English tend to use more collocations that are congruent with collocations that exist in their native language. Nesselhauf (2011) reports similar results when examining the production of German learners of English, suggesting that non-congruent collocations are the most difficult ones for second language learners. Ellis (1996) argues that L2 learners’ acquisition of formulaic sequences differs from that of native speakers in the sense that native speakers process formulas relying on semantic associations, while L2 learners rely on orthography and phonology, which may lead them to make incorrect associations based on orthographic or phonological confusion. In a comparatively recent study, Ellis et al. 
(2008) confirm that native speakers process formulas based on different criteria than L2 learners; while the latter used formulas that are more frequent, the former used formulas that had a stronger association between words.</p>
<p>In sum, collocations present a challenge for learners due to four main aspects. First, language learners have difficulty identifying exactly which words collocate (Jiang, 2009); for example, there is no grammatical reason why conduct research is more common than perform research. Second, a node-word can have more than one collocate and each of these combinations can have different meanings (Nesselhauf, 2003). One such case is the word face: when it collocates with to face (as in face to face), it means to stand in front of, and when it collocates with away, it means to look to the other side. Third, collocations do not transfer from students’ first language. Chan and Liou (2005), for example, point out that the difference between take medicine and eat medicine is not clear for learners with a Chinese background because this difference does not exist in Mandarin. Finally, Cobb (2018) suggests that even though collocations are pervasive in language, it is unlikely that students will encounter them often enough in their classroom readings and textbooks for these structures to be acquired in a classroom environment.</p>
<p>Considering the challenge that collocations can present to language learners, researchers have proposed a number of ways to help learners acquire these constructions. 
Cobb (2018) argues for the use of concordance lines in the teaching of collocations because, in contrast to textbooks, concordance lines give students standard associations that are possible in language, while textbooks expose students to non-standard collocations in exercises such as fill in the gap. Chan and Liou (2005) also highlight the fact that without the use of concordancing tools it is unlikely that students will encounter a collocation enough times to learn it inductively. Another argument for the use of corpus tools in the learning and teaching of collocations is that it allows learners to work independently, which is crucial for the acquisition of collocations (Woolard, 2000; Conzett, 2000). In spite of these arguments for the use of corpus tools in teaching collocations, Cobb (2018) notes that most Computer Assisted Language Learning (CALL) tools developed for English language learners focus on a single unit (i.e., only words), with concordancing tools that integrate collocations (and other formulaic language) remaining somewhat limited. Therefore, the goal of this chapter is to evaluate what collocation tools have to offer to language teachers and suggest tasks that integrate the use of these tools in the EFL classroom.</p>
<p>Web-Based Learning tools for collocations</p>
<p>Web-based learning tools (WBLT), or learning objects, are “interactive, online learning tools that support the learning of specific concepts by enhancing, amplifying, or guiding the cognitive processes of learners” (Kay, 2011: 1849). WBLTs allow students to manipulate different aspects of language in order to understand how language works. 
In the case of collocations, this manipulation can involve the node-word, the position of the collocates, and the types of texts in which they occur, among other variables.</p><p>To date, most studies evaluating WBLTs have been conducted by the tools’ developers, usually upon the launch of the tool (see Chen, 2011; L’Huillier, 1990; Nesbitt, 2012, for examples). There are two issues with this type of evaluation. First, this approach focuses on the evaluation of a single tool at a time; therefore, it does not present comparisons between existing tools. These comparisons are relevant to understanding which tool fits best in a specific context. Second, since each researcher is conducting an independent evaluation, the criteria set for evaluation vary widely. Chen (2011), for instance, includes part of speech tagging, frequency summary, retrieval speed, link to examples, search options and corpus size as criteria, while Nesbitt (2012) focuses only on the design of the WBLT. These differences in evaluation criteria make it impossible to compare results across studies.</p><p>Evaluating several WBLTs using the same criteria allows teachers and researchers to determine which tool is more appropriate for specific learning contexts (e.g., teaching English to high school students or teaching Academic English to L2 graduate students). Kay and Knaack (2009) also argue for the need for structured and organized evaluation criteria that can later be used by teachers to evaluate new tools as they appear on the market. 
Therefore, we seek to propose an evaluation scheme that can be used by teachers and tool developers to assess the applicability of their tools for specific classroom contexts.</p><p>Previous studies such as Nurmukhamedov (2015) have investigated collocation tools from a learner’s perspective; however, that study explored online collocation dictionaries and a printed version of word and phrase. The current study seeks to evaluate WBLTs developed to teach collocations that are completely online and free to use. We believe that by evaluating these tools we can a) help inform the development of better tools in the future; b) inform teachers’ decisions about which tools to use in each context; c) suggest tasks for integrating these tools in the classroom; and d) push developers to make more information available as to how they developed these tools.</p><p>Methods</p><p>Selecting the tools</p><p>The first step in addressing the research questions in the present study consisted of selecting the web-based collocation tools for evaluation and comparison. For this purpose, we developed specific inclusion criteria. To be included in the study, the web-based collocation tools had to meet the following criteria:</p><p>(1) be hosted on a specific website (i.e., online);</p><p>(2) be free to access;</p><p>(3) be corpus-based (i.e., grounded in corpus-based methodology)1;</p><p>(4) allow word searches.</p><p>These criteria allowed us to exclude collocation software such as AntConc, which is not hosted online and generally requires installation on a computer. Additionally, Sketch Engine, while widely used in research, was not included because it is a paid tool that offers only a free 30-day trial with limited access to its tools. 
We also excluded pre-made collocation lists, such as a dictionary of collocations, and websites like CollocAid that do not have an option to search for collocations for a word of interest. In the end, five web-based collocation tools were identified for the evaluation: FLAX, SKELL, Just the Word, Linggle, and Netspeak.</p><p>1 The extent to which a tool was corpus-based or corpus-informed was determined by reading the information available on the tool’s website. More specifically, we examined the source of the collocations presented to the user.</p><p>Evaluation Rubric</p><p>After the web-based collocation tools were selected, they were assessed using an evaluation framework that was developed on the basis of a) existing rubrics for the evaluation of education tools (e.g., Rosell-Aguilar, 2017) and b) findings of previous research that focused on language learning apps overall. Broadly, research on learners’ evaluation of English-learning computer or mobile apps has indicated that, on top of the content quality, learners value the usability (also defined as operation and design) and customization of a tool, as well as its ability to give feedback (Chen, 2016; Smith & Ragan, 2004). In relation to usability, Rosell-Aguilar (2017) named navigation, accessibility, clear instructions, and the quality of sound and image among the factors that contribute to the success of a tool. Haughley and Muirhead (2005) also propose that learning tools need to be linked to the communicative experiences of the learners and thus encourage engagement. 
Other categories that are often used to evaluate computer-based language learning tools include the feasibility of a platform, such as its flexibility and reaction speed (Nesbitt, 2012).</p><p>While the criteria above needed to be taken into consideration in developing the evaluation rubric for the present study, we also accounted for additional characteristics specific to collocations. McEnery et al. (2006) provide a list of criteria that are required in collocation learning tools. They emphasize that a tool has to provide substantial information about the collocation and its use. For instance, a tool should allow its users to check the frequency of a collocation and its distribution across source texts in case the collocation is register-specific. A collocation tool should also report the statistical measure(s) used (e.g., t-scores or MI scores) and the position of the collocate relative to the node. Finally, learners should be able to adjust the distance between collocating items (or the collocation window) and to distinguish between colligations (a type of collocation where lexical items are tied to grammatical ones, e.g., verbs of perception colligate with object and a non-finite verb complement clause) and collocations (McEnery et al., 2006). In addition, Chan and Liou (2005) and Yoon and Hirvela (2004) comment on the presentation of collocations, reporting that learners experienced difficulty with collocation tools because the concordancer presented cut-off sentences and learners were unable to locate the appropriate collocates. 
Along with these essential features, Chen (2011) highlights the importance of the corpora underlying a collocation tool, saying that the corpus used in a tool needs to be large (more than 100 million words) and pre-tagged for parts of speech as well as text registers.</p><p>Synthesizing and adapting the characteristics of language learning tools and collocation apps that are pervasive across the existing frameworks, we developed an evaluation rubric that consisted of four major categories: content quality, interface, presentation of search outcomes, and feedback. The first category addressed the concerns about the quality of the presented collocations in terms of the underlying corpus research as well as the ability of the tool to be adjusted based on a learner’s needs. Thus, content quality was operationalized as the amount and quality of linguistic research conducted and corpora used to create the tool. Additionally, the criterion included the tool’s ability to account for register variation, to filter the presented collocations based on specific criteria, and to account for adjacent words between the nodes in the search. The interface criterion encompassed the usability of a tool, clearly defined menu options, navigation in the tool, and user-friendliness (e.g., how many clicks it takes a learner to get to the information they are looking for). The third criterion, presentation of search outcomes, involved the tool options related to the identification of the collocations. The criterion accounted for the ability of a tool to provide learners with information about the part of speech of the collocates and the frequency of the collocation. 
It also evaluated the flexibility in the way learners can search for collocations, including tolerance of misspellings and the side of the collocates in relation to the node. Finally, the last criterion in the rubric addressed the issue of feedback and assessed whether a tool provided learners with an evaluation of the collocation they produced. The complete evaluation rubric can be found in Appendix A.</p><p>Scoring</p><p>We evaluated each collocation tool in the study on a five-point Likert scale. That is, each criterion in the rubric was scored from one to five, with one being the lowest and five being the highest score a tool could receive. The total score each tool could receive was 15 points. While the length of Likert scales is often defined arbitrarily, five-point scales have been commonly used in various domains of Second Language Acquisition (SLA) research, such as speech comprehensibility and accentedness (Trofimovich & Isaacs, 2013), learners’ individual differences (MacIntyre & Vincze, 2017), and L2 writing (Becker, 2018), among others. Each tool was first evaluated by the first and second author separately, after which the raters met to discuss the discrepancies until agreement reached 100%.</p><p>Results</p><p>Description and evaluation of tools</p><p>This section focuses on describing the five web-based tools and their evaluation. The tools examined in this study were SKELL, FLAX, Linggle, Just the Word, and Netspeak. First, each tool is described in terms of three aspects: a general description of the tool, searches using that tool, and results of searches using the tool. Next, the reference corpus/corpora for each tool are described, followed by examples of searches and results. 
Then, a summary of the three main evaluation criteria (content quality, interface, and presentation of search outcomes) is given for each tool.</p><p>SKELL</p><p>https://skell.sketchengine.eu/</p><p>SKELL, or Sketch Engine for Language Learning (Baisa & Suchomel, 2014), is a search engine that allows</p><p>judgment in selecting the bundles, Ackermann and Chen’s (2013: 236) list considered human judgment for both “the selection of lexical items for pedagogical purposes” and “for the refinement for the final listing.” Their choice of creating a list with collocations is based on several studies (i.e., Nation, 2001; Nesselhauf, 2003, 2005) that pointed out the relevance of teaching collocations as they “are difficult to learn and retain even with the assistance of dictionaries” (Ackermann & Chen, 2013: 246). Above all, Nation (2001) argued that the frequency of academic collocations may not be enough for learning them implicitly. The ACL comes in handy for EAP practitioners as it includes 2,468 entries categorized into four types: noun combinations (adjective + noun or noun + noun; e.g., anecdotal evidence, assessment process); verb + noun / adjective combinations (e.g., gather information, seem plausible); verb + adverb combinations (e.g., explicitly state, grow rapidly); and adverb + adjective combinations (e.g., highly controversial, (be) markedly different). A crucial piece of information about the ACL is the high percentage of noun combinations: 74.3% (adjective + noun = 71.8% and noun + noun = 2.5%; Ackermann & Chen, 2013: 241), leading the authors to suggest that both implicit and explicit collocation teaching are required to impact learners’ understanding and production of academic English with high information load. 
These results support studies (Biber & Gray, 2010, 2016) that show how compressed academic language is, an issue that will be discussed in the next section.</p><p>Based on the 120-million-word academic subcorpus of the Corpus of Contemporary American English (COCA; Davies, 2008), the new Academic Vocabulary List (AVL) (Gardner & Davies, 2014) is an invaluable resource for EAP practitioners as it covers nine major disciplines (i.e., education; history; business and finance; medicine and health; law and political science; humanities; philosophy, religion, and psychology; science and technology; and social science). In addition, it has been integrated into the COCA interface, allowing users to download it freely, input their texts, and get information about the word(s) in focus in many different ways. The search tool provides “(i) synonyms, (ii) definitions, (iii) relative frequency across nine academic disciplines, (iv) the top collocates of the word, which provide useful insights into meaning, usage, and phrasal possibilities, and (v) up to 200 sample concordance lines” (Gardner & Davies, 2014: 325). Above all, this powerful resource, integrated into a user-friendly interface, grants students several possibilities to explore language, which could contribute to a more confident use of academic English. In Almeida et al. (2023), this interface is used as a tool to guide EAP students to reflect on the importance of collocates and how register affects the choices language users make. They can contrast examples from blogs, web, TV/movie, fiction, news, magazine, spoken, and academic registers. 
The series of activities proposed in their chapter uses information students can extract from the “word” tool (Figure 1) in COCA to understand in which register certain verbs are more frequently used (e.g., achieve) and the noun collocates they often attract. The tasks culminate in a focus on verb–noun collocations in the academic register. They lead students to fill in the blanks in authentic sentences extracted from COCA and, finally, create their own texts using the verbs that occur most often in the academic register together with their appropriate noun collocates.</p><p>Figure 1. Collocates of achieve in COCA.</p><p>Other corpus-based studies have also dealt with corpora that allow for the investigation of variation across disciplines, specifically lexical bundle variation (Cortes, 2013; Hyland, 2008; Lake & Cortes, 2020; Reppen & Olson, 2020). Differentiating between general and discipline-specific lexical bundles meets one of EAP’s demands: to have materials for use in general and specific EAP courses. Hyland (2008), who compiled a corpus of articles, master’s degree theses, and doctoral-level dissertations written in four areas (i.e., electrical engineering, biology, business studies, and applied linguistics), discovered that more than 50% of the lexical bundles were not shared across the four areas. “The best candidate bundles for a general EAP course are on the other hand, in the case of, as well as the, and the end of the” (Hyland, 2008: 13). Taking a similar path to Hyland (2008) to uncover discipline variation, Reppen and Olson (2020) compiled a corpus of more than 25 million words, comprising 898 texts (textbooks, web pages, and academic articles) from nine disciplines. 
They examined more than 700 four-word lexical bundles, identifying cross-disciplinary and discipline-specific bundles:</p><p>The bundles that occurred in four or more disciplines function as discourse frames providing signposts for readers (e.g., on the other hand, the rest of the; in the case of), while the discipline-specific bundles are often content or discipline specific (e.g., of the interior design, role of hotel owners). (Reppen & Olson, 2020: 172)</p><p>Having access to the cross-disciplinary and discipline-specific bundle lists, as presented in Reppen and Olson (2020), can make it easier for EAP instructors to prepare classes that cater to their students’ needs. Activities with cross-disciplinary bundles are quite useful in EAP classes with students from various disciplines, and the discipline-specific bundles can certainly be a unique contribution to any EAP classes, especially those that want to boost students’ awareness of bundles as they contrast how some of the most frequent bundles vary across disciplines.</p><p>As lexical bundles “are an important part of the communicative repertoire of speakers and writers” (Biber et al., 2004: 377), novice writers can be trained to recognize and use bundles, making their oral or written texts easier to understand. Activities in which learners deal with academic cross-disciplinary lexical bundles that work as signposts in writing (Reppen, 2018: 195–196) are of great help to students. 
Such bundles are crucial for giving the text appropriate discourse frames, such as presenting “how the text or information is organized (e.g., at the beginning of, at the end of), expressing relationships about the information being presented (e.g., as a result of, in addition to, on the basis of), showing contrast (e.g., on the other hand), and highlighting information or processes (e.g., it is important to).” Reppen (2018) presented several activities, including a jigsaw task that can be a great discovery moment for students as, individually or in pairs, they put together words or groups of words (e.g., at the, of the) to form the bundles. Once their list is done, they can compare the lexical bundles they formed to a list of academic lexical bundles taken from Biber et al. (1999). Another activity Reppen (2018) suggested is to have students individually look for bundles in academic texts (textbooks or any class readings) and then, in pairs or groups, compare the results to determine if they were all able to identify the same bundles. She also advises letting students work with different texts and having them compare what they found. Students could also compare the texts they have written for class assignments with the analyzed texts to determine if they used the same bundles that are present in published materials. This last step of the activity would go beyond raising awareness, making it possible for learners to edit their texts and thereby improve their use of lexical bundles in their own writing.</p><p>Along the same phraseological trend as Reppen and Olson (2020) and Reppen (2018), Bocorny and Welp (2021) developed a description of key lexical bundles in</p><p>users to search for words or phrases and see concordance lines and possible collocations of the word by part of speech. 
Information regarding the reference corpus/corpora for this tool was not available.</p><p>Searches in SKELL. First, users click on the “Try SKELL” button on the homepage of the website. Then, they type in the word or phrase that they wish to find information about. Learners are presented with three different tabs when they enter their word in SKELL: 1) Examples (which are concordance lines), 2) Word Sketch (which provides collocational information), and 3) Similar Words (providing synonyms). Learners will therefore need to be instructed by their teacher to click on the “Word Sketch” tab to access collocations.</p><p>Results of searches in SKELL and examples. “Word Sketch” provides learners with information related to the word that they searched. First, Word Sketch allows users to select the part of speech of the word they searched. Possible collocations of a word are then organized by part of speech and function. Finally, users can see examples of how these collocations are used (portions of concordance lines). For example, if one searches for the word purpose (the base form of a word must be used when performing the search), they can choose whether to search the word as a noun or verb (the most frequent part of speech is automatically selected; a dropdown menu provides options for other parts of speech). For this example, the word purpose was searched as a ‘noun’. 
The following categories are presented when this search is performed: 1) verbs with purpose as subject (a purpose built), 2) verbs with purpose as object (serve the purpose), 3) adjectives with purpose (purpose is twofold), 4) modifiers of purpose (for the sole purpose of), 5) nouns modified by purpose (all-purpose flour), 6) words and (showing phrases like purpose and meaning, purpose and direction), and 7) or purpose (showing phrases like motive or purpose).</p><p>Content Quality of SKELL. Of all the tools examined, SKELL is the most complete in terms of content quality and usability. The collocations presented to learners come from large corpora that are available on Sketch Engine; as a result, learners can find more information about the collocations of interest if they log on to Sketch Engine. The only issue in terms of content quality is that learners do not have the option to select collocations that occur in specific genres or disciplines.</p><p>SKELL’s interface. The interface clearly indicates that SKELL was developed for English learners. In terms of the language used and the menu design, the website is clearly targeted at learners, using only a single word to describe the menus and providing a limited number of options. The fact that SKELL provides only three types of search, described above, also makes it easy for learners to locate the right option for their needs.</p><p>Presentation of search outcomes. In comparison to the other WBLTs evaluated, SKELL seems to be the most appropriate to use with students without extensive classroom training. It only requires that learners know how to type the word that they are searching for. The results are then presented by part of speech, as detailed above. 
One of the great advantages of SKELL is that learners can click on a collocation and find example sentences of this collocation being used in context.</p><p>FLAX</p><p>http://flax.nzdl.org/</p><p>FLAX, or Flexible Language Acquisition (Fitzgerald et al., 2015), provides a tool for searching for collocations based on the British National Corpus (BNC), British Academic Written English (BAWE) corpus, and the Wikipedia corpus.</p><p>Searches in FLAX. The collocation tool in FLAX is under the menu Learning Collocations. This tool searches the reference corpora (BNC, BAWE, or Wikipedia corpus) for collocations, and presents them to the user. FLAX allows for collocation searches in six different registers: 1) Contemporary English (Wikipedia corpus), 2) Standard English (BNC), 3) Academic English in Physical Sciences, 4) Academic English in Social Sciences, 5) Academic English in Life Sciences, and 6) Academic English in Arts and Humanities. Users need to choose which register to search in from a dropdown menu. Users simply enter a word, choose the register, and click “go”.</p><p>Results of Searches in FLAX and examples. The results of a search display collocates by part of speech, and include the frequency of the collocation in the reference corpus. 
The ten most frequent collocations are automatically displayed for each part of speech, and users can click “more” to see less frequent collocations.</p><p>Continuing with the example of purpose, the most frequent collocations for each of the six registers are displayed in Table 1.</p><table><tr><th>Register</th><th>Collocations</th></tr><tr><td>Contemporary English</td><td>main purpose, primary purpose, sole purpose</td></tr><tr><td>Standard English</td><td>main purpose, sense of purpose, primary purpose</td></tr><tr><td>Academic English in Physical Sciences</td><td>used for this purpose, suited for this purpose, developed for this purpose</td></tr><tr><td>Academic English in Social Sciences</td><td>non-commercial purpose, main purpose, commercial purpose</td></tr><tr><td>Academic English in Life Sciences</td><td>used for this purpose, purpose in life, purpose of this study</td></tr><tr><td>Academic English in Arts and Humanities</td><td>purpose of the study, main purpose, different purpose</td></tr></table><p>Table 1. Most frequent collocations of purpose by register</p><p>Content Quality of FLAX. As described above, FLAX relies on a combination of different corpora to extract collocations; unlike SKELL, however, FLAX lets learners choose the specific text types that they are interested in. In terms of content quality, FLAX also has good documentation of the criteria used to extract collocations, making it one of the best tools on this criterion.</p><p>FLAX’s Interface. FLAX seems geared towards advanced learners and teachers. Unlike SKELL, FLAX’s interface can be confusing due to the extensive number of menu options. While these options can be beneficial for learners who are interested in learning collocations in a specific discipline (e.g., law, life sciences) or register (e.g., university writing, abstracts), the menus are not clearly labelled, which can be distracting.</p><p>Presentation of search outcomes. 
This tool provides plenty of materials for learners and teachers interested in academic English, from lesson plans to lists of collocations. The advantage of this tool is that results are organized by part of speech, which can help learners visualize language patterns.</p><p>Linggle</p><p>https://linggle.com/</p><p>Linggle (Boisson et al., 2013) is a search engine that allows users to search for collocations, specifying the number of words and parts of speech of the collocations. The search engine draws from several reference corpora including Google Web 1T 5-gram, the BNC, and the New York Times Annotated Corpus.</p><p>Searches in Linggle. Users need to know wildcards2 to use this search engine. Linggle also allows users to search with part of speech tags. This makes searching for collocations more challenging than in some of the other tools described, which allow users to search for words or phrases with the use of buttons rather than wildcards and parts of speech.</p><p>Results of searches in Linggle and examples. Results of a search in Linggle are collocations displayed by frequency. The results provide a frequency and a percentage for each collocation. It is not stated whether these are raw frequencies or normed frequencies. In addition, there is no explanation for the percentages. The percentages might therefore represent how often the collocate appears in collocation with that node, or the frequency of that word as a percentage of the number of words in the reference corpus. 
If a user clicks on a collocation, concordance lines are displayed under the collocation.</p><p>We will again use the word purpose for the example</p><p>the introduction section of physics articles, integrating linguistic analysis, genre awareness, and text production in such a way that genre moves and linguistic analyses work hand in hand as the basis of task design. The linguistic description, based on corpus linguistics and genre theory (Swales, 1990), led them to detect key lexical bundles with fixed grammatical words and internal variable slots that are filled with content words (e.g., the * of * * is/was: the purpose/aim of this paper/present study is/was). Bocorny and Welp (2021) highlighted that key lexical bundles have clear communicative purposes; therefore, they are worth teaching to the target group in focus (i.e., physicists), who wish to improve their writing skills to be able to successfully publish research articles. Considering that the unique teaching and learning context of EAP warrants carefully designed principles, they adopted Welp et al.’s (2019) proposal. First, the target group discipline and students’ needs should guide the setting of objectives. Second, text genres should match the objectives and be relevant for the EAP group. Third, authentic texts should be used and “represent the social practices and the genres that are produced in the academic context” (p. 6). Fourth, the use of language should be promoted along with awareness of use. Fifth, tasks should be organized to encourage scaffolding and facilitate learning. Sixth, “tasks should induce relevant interaction among students and texts, students and students and students and teachers” (p. 6). Finally, tasks should generate learning that is meaningful and impacts language usage beyond the classroom. 
The series of tasks in Bocorny and Welp (2021) is a good example of a sequence that aims to make learners activate knowledge to write the genre they need. Learners do so by, first, accessing their previous genre knowledge or acquiring new knowledge through observation of the text type. Second, they have several opportunities to see how lexical and phraseological resources are used with specific communicative purposes in the chosen genre. The corpus-based analysis informs the meaningful key bundles and is the basis for this guided language analysis. Finally, they write their own texts, giving and receiving feedback and learning from each other. Consequently, the classroom context may foster scaffolding and meaningful learning opportunities.</p><p>In addition to the description of general and specialized corpora, corpus linguists have used learner corpora to conduct systematic description of learner language and to “help to develop new pedagogical tools and classroom practices” (Granger, 1998: 17), which has positively affected the EAP area. The International Corpus of Learner English (ICLE) was the first major learner corpus to compile argumentative essays written in English by university students from 25 mother-tongue backgrounds, totaling 5.5 million words in its third version (Granger et al., 2020).6</p><p>6 https://uclouvain.be/en/research-institutes/ilc/cecl/icle.html</p><p>The investigations based on ICLE have contributed to English for general academic purposes (EGAP; Hyland, 2016) as they have covered an array of topics, namely learners’ use of adjective intensification (Lorenz, 1998), adverbial connectors (Altenberg & Tapper, 1998), exemplification (Paquot, 2008), and core vocabulary from a phraseological perspective (Granger & Larsson, 2021). 
In the academic contexts in Brazil, where there is pressure to internationalize higher education (Sarmento et al., 2016), EAP programs have more recently boosted the need for a focus on learners’ writing ability in EAP courses.<sup>7</sup> This development has led learner corpus research to flourish with an analysis of discrete categories (Dutra et al., 2017, 2019 on linking adverbials; Matte & Sarmento, 2018) and a great number of linguistic features with the objective of understanding variation in ICLE, especially on the Brazilian learners’ subcorpus (Berber Sardinha & Shimazumi, 2021; Delegá-Lúcio, 2013) using the multi-dimensional methodology, which will be further discussed later in this chapter.</p><p>Some CL studies have concentrated on academic oral language (Liu & Chen, 2020; Neely & Cortes, 2009), offering good support to EAP instructors, who often need to prepare their students to attend and understand academic lectures. Based on Biber et al.’s (2004) and Nesi and Basturkmen’s (2006) lexical bundle lists, Neely and Cortes (2009) investigated the five most frequent lexical bundles used to introduce new topics in lectures, studying their occurrence in the Michigan Corpus of Academic Spoken English (MICASE) as well as their functions in the academic context. Comparing the use of specific bundles (namely, if you look at, a little bit of, a little bit about, I want you to, and I would like you) by instructors and students, they were able to analyze the specific bundle functions in context.
Neely and Cortes (2009: 29) realized that bundles that are broadly categorized as “discourse markers” or “topic introducer” may play different functions during lectures, such as “if you look at, [which is] (…) not always used to introduce a topic in a lecture or student presentation, [but which is] (…) often used to ask students to turn their attention to a new object in the classroom or to imagine or contemplate a topic already under discussion.”</p><p>7 EAP courses in Brazil adopted a greater focus on reading skills in the 20th century (Salager-Meyer et al., 2016).</p><p>The authors also presented a series of lesson plans in which students are led to analyze MICASE lecture excerpts to identify lexical bundles used to introduce new topics, compare such uses with lectures included in textbooks, and detect the specific functions of the bundles. These model lesson plans can serve as inspiration to EAP instructors who are compelled to design materials for their classes, which could also be supported by Liu and Chen’s (2020) results in their study on lecture lexical bundle variation across disciplines. In this regard, this article, which is based on an 8.8-million lecture corpus in four disciplines (engineering, science and math, humanities and arts, and social sciences), is a valuable source of cross-disciplinary variations in information from a central university register and presentations, allowing for the preparation of activities that could boost learners’ listening comprehension skills. Liu and Chen (2020) provided a list of the most frequently used lexical bundles across the four areas, comparing the frequency and the role of the bundles as well as their functions as referential, stance, and discourse-organizer bundles.
Among the differences, they highlighted that the engineering, science and math, and social sciences lectures carried more stance lexical bundles than the humanities and arts lectures. The three areas often use bundles, such as is going to and is going to be a, to give explicit step-by-step guidance in which logical steps, effects, and outcomes can be observed and are crucial for the process. On the other hand, humanities and arts lectures appeared to be “less definite and less clearly defined,” enabling students to make connections and come to conclusions in a “distinct style of knowledge construction” (Liu & Chen, 2020: 132). They concluded that, “although the frequency of lexical bundles appearing in disciplines vary considerably, the items used across disciplines are similar” (Liu & Chen, 2020: 133), which can be interpreted by EAP instructors as a warning for working with both cross-disciplinary and discipline-specific bundle activities.</p><p>We close this section by bringing to the foreground the notion that lexis and grammar are interconnected and, therefore, their associations are worth studying. This notion is fundamental in corpus linguistics as it “allows researchers to identify and analyze complex ‘association patterns’” (Biber et al., 1998: 5). These authors argued that patterns should be investigated in terms of their linguistic associations (how words relate to each other and how grammatical structures are associated). In addition, linguistic features should be studied from a perspective of non-linguistic associations, such as how registers, dialects, and time periods affect language use.
Another perspective would be to explore text or text varieties through the linguistic association patterns of linguistic features, including how patterns co-occur. Our next two sections will present corpus linguistics studies that prioritize the associations mentioned: grammatical complexity with a focus on noun phrases and co-occurrence of linguistic features based on MD analyses. In these sections we will show the centrality of lexico-grammatical features in language, their associations with registers, and contributions to EAP.</p><p>Grammatical complexity from the lens of CL and contribution to EAP</p><p>In this section, we discuss what grammatical complexity is as well as how it has been studied in first and additional languages and highlight suggestions to EAP programs based on corpus-based studies that deal with such complexity. A widespread interest in language teaching, in both first language (L1) and second language (L2), focuses on writing development, its relation to grammatical complexity, and how to measure it. The T-unit concept of grammatical complexity, defined as “a main clause and all associated dependent clauses” (Biber et al., 2011: 7), has permeated most studies in L1 and L2 in the last century and in the first decade of this century. More specifically, two measures have often been used in investigations on grammatical complexity:</p><p>mean length of T-unit (MLTU), which relies on the overall length in words of the T-unit, averaged across all T-units in a text, and clauses per T-unit (C/TU), which relies on the number of dependent clauses per T-unit, again averaged across all T-units in a text. (Biber et al., 2011: 7)</p><p>The common interpretation of these measures was that more complex texts would carry longer T-units and more dependent clauses.
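</p><p>The two T-unit measures quoted above can be sketched in a few lines of Python. This is only an illustrative sketch: the TUnit class, the function names, and the word and clause counts are assumptions introduced here, and segmenting a text into T-units is taken to have been done by hand beforehand.</p>

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TUnit:
    """One hand-annotated T-unit: a main clause plus its dependent clauses."""
    words: int               # length of the T-unit in words
    dependent_clauses: int   # dependent clauses attached to the main clause


def mean_length_of_t_unit(t_units: List[TUnit]) -> float:
    """MLTU: T-unit length in words, averaged across all T-units in a text."""
    if not t_units:
        raise ValueError("a text needs at least one T-unit")
    return sum(t.words for t in t_units) / len(t_units)


def clauses_per_t_unit(t_units: List[TUnit]) -> float:
    """C/TU: dependent clauses per T-unit, averaged across all T-units."""
    if not t_units:
        raise ValueError("a text needs at least one T-unit")
    return sum(t.dependent_clauses for t in t_units) / len(t_units)


# Hypothetical annotation of a text with two T-units (counts invented for illustration).
sample_text = [
    TUnit(words=5, dependent_clauses=0),
    TUnit(words=14, dependent_clauses=1),
]

mltu = mean_length_of_t_unit(sample_text)
ctu = clauses_per_t_unit(sample_text)
print(f"MLTU = {mltu}, C/TU = {ctu}")
```

<p>With these invented counts, MLTU is 9.5 words and C/TU is 0.5; both values rise whenever T-units get longer or carry more dependent clauses, which is the interpretation described above.</p><p>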
Above all, clausal subordination became synonymous with complex and elaborated L2 written texts, influencing many EAP courses to overemphasize the role of connectors in academic writing.</p><p>Despite the popularity of the MLTU and C/TU measures in applied linguistics studies in the 20th century, a few scholars noticed that other measures were called for. Bardovi-Harlig (1992) challenged the T-unit measures as they seemed to not reflect how advanced learners of English were writing. She showed how coordination needed to be accounted for, as it is frequently used in earlier-stage writing, and pointed out that embedding should also be considered as a characteristic of advanced learners. She stated that:</p><p>T-unit analysis artificially divides sentences that were intended to be units by the language learner, imposing uniformity of length and complexity on output that is not present in the original language sample. By treating all conjoined sentences as if they were not conjoined, a T-unit analysis discounts the learner’s knowledge of coordination. (Bardovi-Harlig, 1992: 391)</p><p>One of her examples, reproduced below, shows that, by simply counting the number of clauses, a T-unit analysis would ignore that the sentence reflects a certain rhetorical sophistication that includes coordination:</p><p>Hundreds of schools were built, and tens of institutions are starting to join in providing technical education to the public. (L1 Arabic) (2 T units/1 sentence). (Bardovi-Harlig, 1992: 391)</p><p>Ortega’s (2003) review paper, published 11 years after Bardovi-Harlig’s warning, confirmed that T-units were still a popular measure in L2 writing studies.
In order to understand how studies had been looking at L2 writing syntactic complexity in relation to proficiency, Ortega (2003) analyzed 27 studies: 21 cross-sectional and 6 longitudinal studies. The majority of the reported investigations (i.e., 25) relied on MLTUs. Ortega (2003: 514) was careful to point out that:</p><p>researchers interested in using syntactic complexity measures as global indices of L2 proficiency may refer to these findings as interpretive landmarks for aiding study design and interpretation of study outcomes in future college level L2 writing research.</p><p>She thus recommended that studies focus on developmental prediction and cross-rhetorical transfer.</p><p>Biber et al.’s (2011) corpus-based study filled the gap Ortega (2003) identified as they revisited the concept of grammatical complexity in light of a register perspective. This study presented an analysis of 28 features in two different registers, conversation and academic research articles, and concluded that clausal complexity was characteristic of conversation while complexity in research articles was attested by phrasal complexity, such as by nonclausal features frequently embedded in noun phrases. In other words, finite clauses often occur in conversation and function as adverbials and verb complements (e.g., “I think we better wait. […] he gets mad cause he can’t smoke cause we always take non-smoking”; Biber et al., 2011: 24) while prepositional phrases, attributive adjectives, and noun phrases are commonly found in articles (e.g., We expected that the use of different transformations would have significant effects on our perceptions of spatial patterns in kelp holdfast assemblages; Biber et al., 2011: 27).
This publication marked a major turning point in grammatical complexity studies, demystifying the T-unit and subordination characteristics as the best measures of grammatical complexity. The paper culminated in the presentation of hypothesized developmental English stages for complexity features. These stages are based on their analysis of oral and written texts in English as an L1 and are hypothesized as following the same sequence of acquisition in English as an L2. They argue that “conversation is acquired first; the grammar of writing is acquired later, and not always successfully” (Biber et al., 2011: 28). Not all native speakers produce academic texts, and the phrasal complexity features detected in research articles, if acquired, would be part of the adult repertoire. Taking into account this rationale, the authors proposed that the hypothesized developmental stages for complexity features include five stages, starting from the production of features, such as “finite complement clauses (that and WH) controlled by extremely common verbs (e.g., think, know, say),” and continuing to quite complex phrasal embedding: “extensive phrasal embedding in the NP: multiple prepositional phrases as postmodifiers, with levels of embedding,” as in “The [presence of layered [[structures] at the [[[borderline]] of cell territories]]]” (Biber et al., 2011: 31).</p><p>In the following paragraphs, we first highlight studies on English as an L1 that were inspired by this expanded notion of grammatical complexity.
We then explore how the hypothesized developmental stages influenced studies on English as an L2, taking into consideration the implications for EAP.</p><p>Biber and his associates (e.g., Biber, 2006; Biber & Gray, 2010; Biber et al., 2011) have investigated the unique qualities of academic language, culminating in a historical analysis of academic English in Biber and Gray (2016) that revealed how a register can change diachronically to reflect new community practices. In the 18th and 19th centuries, academic scientific papers were most frequently organized around clausal features, and academic research articles were quite similar, linguistically, to fiction; thus, phrasal features were often not found in academic texts of those periods. The authors claimed that, in the 20th century, two major societal changes influenced written texts. First, mass literacy became a reality, increasing readership of any written registers. Many different types of texts, such as fiction books and newspaper articles, had to popularize and were influenced by oral registers. Second, science became much more specialized with the emergence of sub-disciplines, which meant that written scientific texts have increasingly targeted very specific audiences. Biber and Gray (2016) argued that this social force influenced scientific writing in two ways: There is a constant rise in information volume, and texts need to “present more information in an efficient and concise way,” leading to “greater ‘economy’ in written informational texts” (p. 129).
In the 20th and 21st centuries, scientific writing has adopted a compressed and dense style, with a high use of phrasal features; when this register is compared with conversation, it becomes clear that clausal embedding is much more frequent in the latter register (Biber et al., 2016), revealing clausal complexity in conversation but not in academic writing. These results from corpus-based studies, unlike investigations using T-unit measures, unveiled a use of phrasal features in academic writing that had not been noticed before.</p><p>Along the same lines as Biber et al. (2016) and Biber and Gray (2016), other corpus-based disciplinary and register variation investigations on English as an L1 as well as an L2 have been carried out, uncovering more characteristics of academic discourse that were not known and that can take EAP closer to students’ needs. Gray (2013) studied the extent to which discipline as well as the nature of the research (quantitative, qualitative, or theoretical) would affect linguistic variation in research articles. The disciplines investigated were physics, biology, applied linguistics, philosophy, history, and political science. Analyses of qualitative history, political science, and applied linguistics texts revealed the co-occurrence of similar features (e.g., nouns, time and topic adjectives, tense and aspect markers, communication verbs) whose “focus is on reconstructing an event to serve as the foundation for interpretations and subsequent claims” (Gray, 2013: 168) and which characterize contextualized narrative. Quantitative political science and applied linguistics articles showed many fewer narrative features as they also incorporated features that make the text more concise and informative to construct descriptions.
Quantitative biology and physics as well as theoretical physics are aligned in their use of several features that convey procedural description, carry heavy information load (e.g., nouns, attributive adjectives), and compose the frequent phrasal features. Gray’s conclusion was that multiple parameters should be considered to augment the understanding of linguistic variation in research articles. EAP teachers should be aware of discipline variation as well as the nature of the research (be it quantitative, qualitative, or theoretical) as it does influence linguistic variation across and within disciplines.</p><p>Considering that complex phrasal structures play a major role in the construction of economical and dense academic scholarly writing, there has been a growing interest in better understanding noun pre-modification (Ang et al., 2017; Dutra et al., 2020; Hutter, 2015). Results from discipline-specific complex noun phrase investigations should provide EAP teachers with information that has received little coverage in popular English textbooks, which “extensively cover finite dependent clausal structures (e.g., relative clauses, conditionals, and complement clauses for reporting speech)” (Biber et al., 2016: 16). Through a detailed description of complex noun phrases composed of adjectives and/or nouns in chemistry and applied linguistics research articles, two distinct disciplines, similarities and differences were uncovered in Dutra et al. (2020). First, high lexical variation in the noun phrases was found, and only 1.7% of adjective pre-modified noun phrases were lexically the same in both corpora. Not surprisingly, these commonly shared noun phrases are not discipline specific.
Nonetheless, they play crucial referential roles addressing parts of the article (e.g., the statistically significant results) or referring to present or previous studies (e.g., more recent study), which makes them strong candidates for being easily taught in general EAP classes. Second, they discovered that both disciplines pack a great deal of information as their communities produce noun phrases ranging from two words (e.g., prosodic nature) to seven words (e.g., four identical in-class individual web-based writing tasks). This result confirms the need to explicitly teach complex noun phrases to EAP learners in these two disciplines. Third, by carefully analyzing the relationship between the elements of long noun phrases, they were able to attest that noun phrase complexity is the result of not only packing premodifiers, but also interrelationships between the elements of the phrase (Dutra et al., 2020). Such a complexity trait was acknowledged by Biber et al. (1999: 600):</p><p>…sequence of words in the premodification can represent a large number of different structural/logical relations, with forms often modifying other premodifiers instead of the head noun. As a result, there is much structural indeterminacy, leading to the possibility of incorrect interpretations.</p><p>A good example of how noun phrase complexity can add difficulties to comprehension comes from their chemistry corpus’s eight-word noun phrases, most of whose modifiers do not modify the head noun: low temperature 3He strongly adsorbed gas diffusion experiments (Figure 2). The head noun (experiments) is modified by gas diffusion and by low temperature, but not by adsorbed or strongly. The adverb strongly modifies adsorbed, and this adjective modifies gas.
Such a noun phrase may not be a barrier in understanding for an expert in the area, but novice writers would certainly benefit from teaching interventions focused on such a linguistic phenomenon.</p><p>Figure 2. Sample of interrelations of modifiers from a chemistry corpus</p><p>Dutra et al. (2020) also noticed that many applied linguistics complex noun phrases behave quite differently from the chemistry noun phrases since all modifiers refer to the head noun (Figure 3): writing modifies tasks, the head noun, in the same way that web-based, individual, in-class and identical modify tasks.</p><p>Figure 3. Sample of interrelations of modifiers from an applied linguistics corpus</p><p>Presenting the information shown in Figures 2 and 3 in EAP classes should raise learners’ awareness of the extent of phrasal complexity in different disciplines. It should also help improve the writing of dense academic texts in higher education in countries such as Brazil where the first language differs from English, in some ways, in how it constructs noun phrases. In other words, long noun phrase structure may pose challenges for many students, especially for the ones whose first language does not frequently use heavily pre-modified noun phrases, such as for Portuguese speakers (Dutra et al., 2020). Noun phrases are structured differently in Portuguese, most learners’ first language in the country:</p><p>Portuguese allows the use of attributive adjectives but not the use of nouns as pre-nominal modifiers.
Consequently, understanding and producing heavily pre-modified [noun phrases] can be an arduous task in a second language, especially in research writing. (Dutra et al., 2020: 209)</p><p>It seems undeniable that grammatical complexity should be addressed in EAP in academic writing classrooms, and learner corpus studies can further support the planning and implementation of such interventions so that they are adequate for students’ needs. It is not the case that EAP learners do not use complex noun phrases even when they are B2<sup>8</sup>, with an intermediate level of proficiency, but the question is which complex noun phrases are used when they produce which type of essay (Queiroz, 2019). Queiroz’s study revealed that Brazilian writers use more complex than simple noun phrases, especially those with premodifying adjectives as well as with postmodifying prepositional phrases. The EAP corpus that Queiroz investigated, CorIFA<sup>9</sup>, included a subcorpus formed from general topic and specific topic essays. Queiroz found that the mean score of complex noun phrases in the specific topic subcorpus was clearly higher than in the general topic essay subcorpus. These complex noun phrases are discipline specific, leading the author to posit that the task type, specific topic essays, promoted the use of more complex noun phrases.
This result is relevant for general EAP courses as they should find room for discipline-specific language activities and, above all, should stimulate writings about students’ learning and research area.</p><p>8 The Common European Framework of Reference (CEFR) describes levels of proficiency ranging from A1 (beginners) to C2 (proficient users of the language).</p><p>9 CorIFA stands for Corpus de Inglês para Fins Acadêmicos (see Dutra et al., 2022 for information on CorIFA).</p><p>Other learner corpus studies have focused on investigating whether the hypothesized stages proposed in Biber et al. (2011) correspond to real learners’ development in their writing skills. Parkinson and Musgrave’s (2014) corpus-based study revealed that EAP learners’ essays, when compared to the essays of master’s degree students in applied linguistics, present significantly more adjectives as premodifiers and fewer prepositional phrases. The more proficient students (i.e., master’s degree students) use more nouns as premodifiers and more prepositional phrases as postmodifiers. In other words, more proficient students use more complex noun phrases, as hypothesized.</p><p>More recent learner corpus studies have looked at longitudinal data to track learners’ development to see if they confirm cross-sectional studies’ results (Ansarifar et al., 2018; Parkinson & Musgrave, 2014). Biber et al. (2020) explored a multiple L1 learner corpus compiled from students’ disciplinary texts written in English, and Alves (2022) assessed a longitudinal corpus of Brazilian EAP learners who have produced a range of different register assignments (statements of purpose, abstracts, essays, literature reviews, and research articles) in various disciplines.
Both studies revealed a decrease of dependent clause complexity features while phrasal complexity feature usage went up as students’ proficiency increased, as hypothesized in Biber et al. (2011). However, Alves (2022) found no steady increase of all expected phrasal features along the three moments of corpus compilation, which may be due to the short interval between the terms when students wrote the text (i.e., about 4 months). The author added that a qualitative analysis pointed to an increase in lexical variation, “specifically in the scope of attributive adjectives, linking adverbials, nouns as premodifiers, adjectives in extraposed constructions, and as [preposition phrases’] postmodifiers” (Alves, 2022: 117), and most of them contributed to improvement in textual phrasal complexity. Alves also compared EAP learners’ texts across academic divisions (social sciences and education, humanities and arts, physical sciences and engineering, and biological and health sciences), detecting a high use of attributive adjectives in all academic areas as noun modifiers, but a preference for nouns as postmodifiers in social sciences and education texts. These academic divisions include many disciplines, which means that EAP teachers should consider these results with caution and compare them to discipline-specialized corpora. If they compile or have their students compile small discipline-specialized corpora, according to their students’ disciplines, they could lead learners to explore texts written by experts and compare them to their own use of complex noun phrases.</p><p>MD Analysis and EAP</p><p>Multi-dimensional analysis is a framework used to identify sets of correlated linguistic features shared across many different texts in a corpus.
These correlated sets, which are statistically identified through factor analysis, are communicatively interpreted as dimensions, the underlying parameters of variation in language use. In the 1980s, Douglas Biber (1988) developed the multi-dimensional analysis as a tool for analyzing variations in spoken and written language, with the assumption that multiple dimensions shape the texts simultaneously. Such an assumption was in sharp contrast to the literature at the time, which tended to describe registers using a single parameter (e.g., formality, involvement). Multi-dimensional analysis was revolutionary not only because of its emphasis on a multi-faceted approach to text analysis, but also because it was designed as a corpus-based framework at a time when corpus linguistics was in its early stages and the focus of most corpus linguistic studies was the corpus rather than the actual texts in the corpus.</p><p>10 “Factor analysis identifies sets of features that co-vary …” (Biber, 1988: 65)</p><p>It is beyond the scope of the current chapter to provide a detailed description of the procedures involved in conducting a multi-dimensional analysis (see Almela, Cantos Gómez & Berber Sardinha, 2022; Berber Sardinha, 2000; Berber Sardinha & Veirano Pinto, 2014, 2019; Biber, 1988; Conrad & Biber, 2001; Egbert & Staples, 2019; Friginal & Hardy, 2014; Zuppardi, Veirano Pinto & Berber Sardinha, in prep.). Briefly, however, the basic steps involve: (1) Collecting a corpus that represents a particular register or domain; (2) Tagging the corpus for part-of-speech<sup>10</sup> or for other linguistic characteristics automatically; (3) Counting the linguistic features annotated and norming the counts (e.g.
to a rate per thousand words); (4) Entering the counts in a factor analysis, and determining the latent factors in the data; (5) Scoring each text by summing up the counts of the features loading on each factor; (6) Interpreting the factors communicatively by reading samples of texts and assigning a label to each factor that reflects the major communicative properties of the dimension. It is important to note that it is common for dimensions to comprise two ‘poles’, that is, two different sets of features in complementary distribution in the texts, such that when the features in one pole occur in the text, the features in the other pole are generally absent, and vice-versa. Although these poles are referred to as ‘positive’ and ‘negative’, these labels are not evaluative and simply reflect the fact that two complementary sets of features exist in a single dimension. In summary, then, each dimension comprises a set of linguistic features co-occurring in the texts, determined through statistical analysis and interpreted qualitatively by the analyst to reflect its underlying communicative purpose.</p><p>The multi-dimensional analysis literature on EAP is vast, encompassing studies conducted on the basis of grammatical structures, lexical units (collocations, lexical bundles), and discourse. Because of its emphasis on cross-text analysis and statistical rigor, multi-dimensional analysis provides rich descriptions that can be of interest to EAP teachers, as these descriptions provide a detailed view of the most used sets of linguistic features in academic registers. It is important to stress that dimensions are sets of correlated linguistic features that frequently occur together in texts because they perform a particular communicative function.
As such, multi-dimensional analysis descriptions show how seemingly different features work together to</p>
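<p>Steps (3) and (5) of the procedure outlined earlier, norming the counts and scoring a text on a dimension, might be sketched as follows. This is a simplified illustration rather than the chapter's procedure: the feature names, the counts, and the assignment of features to poles are hypothetical, the factor analysis of step (4) is assumed to have been run already, and the standardization of normed rates that published MD studies usually apply before summing is left out. Subtracting the negative pole from the positive pole is likewise an assumption made here to reflect the two complementary sets of features.</p>

```python
from typing import Dict, Iterable


def norm_per_thousand(raw_counts: Dict[str, int], text_length: int) -> Dict[str, float]:
    """Step (3): convert raw feature counts to rates per 1,000 words."""
    if text_length <= 0:
        raise ValueError("text_length must be a positive number of words")
    return {feature: count * 1000 / text_length
            for feature, count in raw_counts.items()}


def dimension_score(normed: Dict[str, float],
                    positive_pole: Iterable[str],
                    negative_pole: Iterable[str]) -> float:
    """Step (5), simplified: sum the positive-pole rates and subtract the negative-pole rates."""
    positive = sum(normed.get(feature, 0.0) for feature in positive_pole)
    negative = sum(normed.get(feature, 0.0) for feature in negative_pole)
    return positive - negative


# Hypothetical counts for one 2,000-word text (values invented for illustration).
raw_counts = {
    "nouns": 300,
    "attributive_adjectives": 80,
    "private_verbs": 10,
    "contractions": 5,
}
normed = norm_per_thousand(raw_counts, text_length=2000)

# Hypothetical pole membership, as if taken from an earlier factor analysis.
score = dimension_score(
    normed,
    positive_pole=["private_verbs", "contractions"],
    negative_pole=["nouns", "attributive_adjectives"],
)
print(normed)
print(score)
```

<p>In this invented example the text scores -182.5, that is, it sits on the negative pole of the hypothetical dimension because nouns and attributive adjectives far outnumber the positive-pole features once the counts are normed.</p>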