I’m still working on a FutureLearn/Lancaster University course on Corpus Linguistics (CL). It runs for 8 weeks and is much more work than any of the previous FutureLearn courses that I have undertaken, so whether I’ll get to the end of it remains to be seen, given my new teaching commitments and other roles to juggle. In the meantime, I’ll share with you my thoughts as each week progresses.

The first activity this week was to look at the language used around a term such as ‘refugee’ in a newspaper article. I looked at an article from a recent edition of the Guardian on President Trump’s changes to the US refugee programme. It referred to them as refugees and displaced people, and talked about them in comparison to ‘people seeking asylum at the southern border’ before pointing out that the asylum program and the refugee program were separate, which implies that the two categories are not the same (I know they aren’t legally – my point is that the text doesn’t make this explicit and you are left to work it out). It gives examples of groups with whom we might be expected to be sympathetic (those fleeing religious persecution, for example), along with a case study and quotes from a resettled refugee who had contributed to American society. The language here is positive: ‘love’, ‘safely’ and ‘allowed’. There is an interesting contrast, though, towards the end of the article, where the positive language of contribution is replaced by language such as ‘loss’, ‘sad’, ‘shut down’, ‘difficult’ and ‘complicated’ when describing what life is going to be like in the future.

The week then drew on an ESRC-funded research project on how British newspapers talk about refugees and asylum seekers. The focus was methodological, looking at how Corpus Linguistics (CL) might contribute to critical discourse analysis.

Tony McEnery, the course tutor, described how the project team put together the corpus on which the study was based.  First they put together a pilot corpus of texts about refugees and asylum seekers and looked at what words became key in that corpus.  This helped them to compose a query string which could be used to search huge corpora of newspaper articles for relevant material.  Then they split the data into two corpora – one of tabloid and one of broadsheet journalism. They looked carefully at the number of articles and words in each of the two corpora, explaining the difference by pointing out that tabloid articles are usually shorter than broadsheet ones. Moving on to think about how CL contributes to critical discourse analysis, he introduced the idea of the topos (plural = topoi) – a broad theme in data, although according to the course website, ‘In critical discourse analysis it usually has a slightly different sense of ‘warrant for an argument’ rather than theme’.

The two corpora were tested to see which keywords were associated with the query terms. As well as looking at the overall picture of which keywords were most commonly associated with the query terms in the broadsheets and the tabloids, this could be done on a more focussed basis by looking at, for example, the words used to describe the arrival of refugees. So the keywords help to shape the topoi, but they also showed that the discourse was created mainly by the tabloids: almost all the keyword categories were dominated by the tabloids, except for ‘plight’, where the language was shared by both but the broadsheet newspapers had more.
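As a rough illustration of how keywords can be pulled out by comparing two corpora, here is a minimal Python sketch. It assumes log-likelihood as the keyness measure, which is a common choice but not necessarily the statistic the project used, and the function names and toy tokens are invented for the example.

```python
import math
from collections import Counter


def log_likelihood(freq_a, total_a, freq_b, total_b):
    """Log-likelihood keyness score for one word across two corpora."""
    combined = freq_a + freq_b
    # Expected frequencies if the word were spread evenly across both corpora.
    expected_a = total_a * combined / (total_a + total_b)
    expected_b = total_b * combined / (total_a + total_b)
    score = 0.0
    # A zero observed frequency contributes nothing to the sum.
    if freq_a > 0:
        score += freq_a * math.log(freq_a / expected_a)
    if freq_b > 0:
        score += freq_b * math.log(freq_b / expected_b)
    return 2 * score


def keywords(tokens_a, tokens_b, top=5):
    """Words relatively more frequent in corpus A than in B, highest score first."""
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    total_a, total_b = len(tokens_a), len(tokens_b)
    scored = []
    for word, freq_a in counts_a.items():
        freq_b = counts_b[word]
        if freq_a / total_a > freq_b / total_b:
            scored.append((word, log_likelihood(freq_a, total_a, freq_b, total_b)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top]


# Toy data, invented for illustration only (not taken from the project corpora).
corpus_a = "flood flood flood of refugees".split()
corpus_b = "plight of refugees the plight".split()
top_words = keywords(corpus_a, corpus_b)
```

In this toy case only ‘flood’ comes out as key for the first corpus, because ‘of’ and ‘refugees’ are equally frequent in both. A real study would run this over millions of words and usually apply a significance cut-off as well.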

The next video looked at whether the words which turned up most frequently in the articles were collocates. There were some collocates relating to the number of refugees and the theme of plight, and these were found across both tabloid and broadsheet newspapers. But once the team looked at words clustered to the right of ‘illegal’, which might indicate modifying adjectives, the theme of illegality proved more emblematic of the tabloids, with some especially strong tabloid clustering with ‘immigration and’ – the conjunction ‘and’ was forcing discourses together. In comparing the two corpora, the use of particular words and clusters had to be normalised per million words because the broadsheet corpus was much larger.
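The normalisation step is simple arithmetic: divide the raw count by the size of the corpus and scale to a million words. A minimal sketch in Python follows; the function name and all the figures are invented for illustration and are not the project's actual counts.

```python
def per_million(raw_count, corpus_size):
    """Convert a raw frequency into a rate per million words."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be a positive number of words")
    return raw_count * 1_000_000 / corpus_size


# Hypothetical figures: the larger corpus has more raw hits but a lower rate.
tabloid_rate = per_million(120, 8_000_000)
broadsheet_rate = per_million(150, 30_000_000)
print(f"tabloid: {tabloid_rate:.1f} per million")
print(f"broadsheet: {broadsheet_rate:.1f} per million")
```

With these made-up numbers the broadsheet corpus has more occurrences in absolute terms, yet the tabloid rate per million words is three times higher, which is exactly why raw counts from corpora of different sizes can't be compared directly.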

Step 4 looked at a particular cluster, ‘pose as’ – both how it was used on its own and how it was used in proximity to refugees or asylum seekers. The tabloids used the phrase far more often than the broadsheets (normalised per million words) and especially so in relation to refugees and asylum seekers. The course also noted that in the tabloids, the phrase was reported as fact not opinion and was closely associated with a negative stance, with no space given to an opposing side. Another interesting cluster was thrown up by ‘X pose as’ plus a statement of status such as ‘asylum seeker’, which was used to show faults in the asylum system.
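To make the idea of a cluster used ‘in proximity to’ certain terms concrete, here is a small Python sketch that counts every occurrence of a phrase and how many of those fall within a few words of a target term. The function names, the window of three words and the sample sentence are all assumptions made for the example; the span the project actually used is not stated here.

```python
def find_phrase(tokens, phrase):
    """Return the start index of every occurrence of phrase in tokens."""
    n = len(phrase)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == phrase]


def count_near(tokens, phrase, targets, window=3):
    """Count all hits of phrase, and those with a target word within the window."""
    hits = find_phrase(tokens, phrase)
    near = 0
    for i in hits:
        left = tokens[max(0, i - window):i]
        right = tokens[i + len(phrase):i + len(phrase) + window]
        if any(word in targets for word in left + right):
            near += 1
    return len(hits), near


# Invented sentence, for illustration only.
text = ("officials said some people pose as refugees but "
        "the actor will pose as a detective in the new film")
total_hits, hits_near_targets = count_near(
    text.split(), ["pose", "as"], {"refugee", "refugees", "asylum"}
)
```

Here the phrase occurs twice but only once near a target term. Both counts would then need normalising per million words before the tabloid and broadsheet corpora could be compared.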

The final video looked at direct and indirect presentation. Direct presentation is when a quality is directly attributed to something through modification. For example, in the phrase ‘illegal immigrants suffocated’, the modification of the immigrants with the adjective ‘illegal’ attributes that quality to them directly. Indirect presentation relies on more general references which imply the same thing – words such as ‘trafficked’ or phrases such as ‘sneaking in’.

After the quiz, we moved on to the hands-on section of the course in #LancsBox. At least I would have done, had I been able to make it work. Because here again I was hit with the same problem as in week 1: #LancsBox has to be run from a folder which will allow you to make changes… and that’s not as simple as it sounds. I’ve changed the machine on which I’m working, so I had to download the software again. After over an hour of searching around the internet, the course forums, my own notes and my computer folders, as well as several failed attempts to extract and run the files, I finally got it working on my Desktop.