This week I started a FutureLearn/Lancaster University course on Corpus Linguistics (CL). It runs for 8 weeks and is much more work than any of the previous FutureLearn courses that I have undertaken, so whether I’ll get to the end of it remains to be seen. But the course leader suggests that once we get past the first few weeks we can pick and choose to study the elements which are useful to us, so I’m hoping it will be manageable alongside the new teaching activities that I’ve got ahead. In the meantime, I’ll share with you my thoughts as each week progresses.

I signed up to the course way back at the beginning of summer, so that I would get a grounding in CL ready to undertake my project on Fake News and Facts in English ballads. It has become immediately apparent, however, that my little project is more Corpus Discourse Analysis than true CL – more on this another time!

The simplest definition of a corpus is a lot of words stored on a computer.  But it is also a methodology for approaching language.  Large amounts of data in a corpus can tell us about tendencies and what is normal or typical, or rare or exceptional cases – you can’t tell either of these things from looking at single texts.  Computers are quicker and more accurate than humans in dealing with such large amounts of data.

When putting a corpus together or choosing one to use, what you look at depends on your research questions. The corpus must be broadly representative of an appropriate type of language, and it must be in machine-readable form such as a text file.  It might be considered a standard reference of what’s typical and is often annotated with further linguistic information.

Next, we had a brief look at annotation (or tagging).  The computer can’t tell what is a heading or where new paragraphs start.  You might want to search for something just within titles, but a computer can’t tell the difference unless you tell it, and therefore can’t do what you want it to.  A lot of this information is there in the control characters that make a document appear the way you want it to.  The computer can’t tell grammatical data such as where a word might begin or end.  It can’t tell what part of speech each word is.  These sorts of annotation allow you to tailor your request, to look at what words follow a particular word, or just to search the headings. They improve the quality of your searches. Actually, this annotation is often done by computer.

There are two main types of corpora:

  • Specialist – with a limited focus of time, place, language etc which creates a smaller amount of data
  • General – often with a much larger number of words and more representative.

Other corpora include:

  • Multilingual – comparing different languages or different varieties of the same language
  • Parallel – comparing the same texts in translation
  • Learner – the language used by language learners
  • Historical or Diachronic – language used in the past
  • Monitor – which are continually being added to.

From here we moved on to look at some technical terms, including frequency data, which quickly shows how often words appear per million words in the corpus. Concordances show the context of the hits, and again, this can be done quickly, so you can sort the context according to emerging patterns.  Collocation is the tendency of words to co-occur, with meaning built from those co-occurrences.
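For anyone who likes to see the arithmetic, here is a minimal Python sketch of two of those terms: frequency per million words, and a very crude collocation count. It is only an illustration under my own assumptions, not how any corpus tool actually works. The function names (`tokenize`, `per_million`, `collocates`), the simple tokenizer and the sample sentence are all made up for the example.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())

def per_million(tokens):
    """Return each word's frequency normalised to occurrences per million tokens."""
    counts = Counter(tokens)
    total = len(tokens)
    return {word: n * 1_000_000 / total for word, n in counts.items()}

def collocates(tokens, node, span=2):
    """Count the words that occur within `span` tokens either side of `node`."""
    found = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            # Words to the left and right of the node, excluding the node itself
            window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
            found.update(window)
    return found

# Invented sample text, purely for illustration
sample = "The prince rode north. The prince took the town and the prince rode south."
tokens = tokenize(sample)
freq = per_million(tokens)
near_prince = collocates(tokens, "prince", span=1)
```

Normalising to a per-million figure is what lets you compare a small specialist corpus with a much larger general one. Real collocation measures go further than raw counts and use statistics to decide which co-occurrences are noteworthy, which this sketch does not attempt.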

Then we thought a bit about the limitations of corpus data analysis.  It can’t tell us whether something is possible in language (ie, it can’t tell us about appropriate usage). We can only deduce things from corpora: they aren’t facts in and of themselves, and although a corpus gives us evidence it can’t give us explanations.  Finally, corpora rarely present the related images alongside the text, or the body language, behaviours etc of speakers, so they present language out of context.

Then the course moved on to having a go at using #LancsBox, which we had to download and open – bizarrely, this was by far the hardest bit of the course so far, because it has to be run from the Applications folder (my Applications folder is dusty and otherwise unused, hidden somewhere in the attic of my machine and only located using the search function on the file manager).  #LancsBox comes with various ready-prepared corpora that you can download, so I decided to have a go with the 1641-1660 English newsbooks.  As I wasn’t working on my desktop, but my laptop, it wasn’t the quickest import, but once it was done we had a go at using the KWIC function to create a concordance.  You search for a word or ‘node’. Having found all the instances of the word ‘Rupert’, in my case, I could sort them by the context on the left or the right, or I could filter them by clicking on one side or the other and typing in a word (although if you want to filter by the word immediately preceding or following the node, you need to use the advanced filter function).  But I had really jumped the gun, as the next activity was a walk-through in using one of the corpora to search for some given words!  Still it was fun to play around with it.
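To make the idea of a KWIC (key word in context) concordance a bit more concrete, here is a small Python sketch of searching for a node, sorting by the right-hand context and filtering by the word immediately to the left. This is not #LancsBox’s own code or interface, just my assumption of the general idea. The names `kwic`, `sort_right` and `filter_left` and the sample sentences are hypothetical.

```python
import re

def kwic(text, node, width=3):
    """Return (left, node, right) tuples with up to `width` words of context per side."""
    # Crude tokenizer: context is allowed to run across sentence boundaries
    tokens = re.findall(r"\w+", text)
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == node.lower():
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            lines.append((left, tok, right))
    return lines

def sort_right(lines):
    """Sort concordance lines alphabetically by the words to the right of the node."""
    return sorted(lines, key=lambda line: [w.lower() for w in line[2]])

def filter_left(lines, word):
    """Keep only lines whose word immediately before the node is `word`."""
    return [line for line in lines if line[0] and line[0][-1].lower() == word.lower()]

# Invented sample text, purely for illustration
sample = ("Prince Rupert marched to York. It was said that Rupert fled. "
          "Then Prince Rupert came again.")
lines = kwic(sample, "Rupert")
for left, node, right in sort_right(lines):
    print(f"{' '.join(left):>25} | {node} | {' '.join(right)}")
```

Sorting on one side of the node is what makes patterns jump out, because lines with the same neighbouring words end up stacked together.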

There were several readings for the week – the first was interesting and talked, for example, about the distinction between types of corpora (monitor, balanced, snapshot). Most of the corpora in which I would be interested would, I think, be classified as opportunistic corpora, because something else comes into play when working with historical corpora – survival. So in some respects, historical corpora are self-selecting, because we can only create a corpus out of what survives. #LancsBox comes with the 1640-60 newspapers, but (I think – not quite my field so I’m not sure) they are a relatively ‘good’ source because Thomason made the decision to collect everything that he could of the printed ephemeral material relating to the civil war. Without collectors like him, other material is much more dependent on survival rates. Which isn’t to say that I don’t think CL is useful, just that there are extra (or maybe different) caveats about what it can tell us, so we need to be very aware of these when we interpret our data.

As I’ve never done any linguistics, the second chapter was much harder going and I didn’t really understand a lot of it!  Then things got even more complicated for me with a chapter on statistics for CL.

The final task before the optional activities was to think about the design aspects of our proposed work.  Although my Fake News and Facts project uses concordances and collocation, I wanted to think about something on a rather bigger scale, so I imagined a larger corpus of ballad texts and news pamphlets to search for any features that were particularly common or common to both in the period prior to the civil war.  The question is whether the data would be skewed by the inclusion of all ballads rather than exclusively topical ones.  The restriction to topical ballads (and news pamphlets) would in itself be subjective, whereas the inclusion of all ballads might show up other common features that we would not otherwise spot, so as far as I’m concerned the jury is still out on that question!

The corpus would be quite large, but as it is an opportunistic corpus based on only those texts which have survived, it would not be anything like as large as some that have been mentioned on the course. Annotation might be helpful, in terms of divisions between headings and body text, as this might highlight particularly sensational language which is used to grab attention, or news lexicon which highlights topicality, novelty and truth claims. 

At the end of September I went down to London to hear a paper by Chris Marsh at the Royal Historical Society, so I took the opportunity to travel down a bit ahead of time and spend the afternoon in the British Library.  This is something I haven’t done for a couple of years, for one thing because it isn’t all that easy for me to get down there, but also because up to now I’ve been working mainly on the documents that I found while I was carrying out my doctoral research.  But with the submission of the manuscript to Routledge, the time has come to move on.  This post is less about what I found when I was there and more about the process of carrying out the research itself.  It’s about how I work.

I only knew that I would be going to London a couple of days in advance, so I had to drop everything and start finding something to look at when I was there.  The first job, in fact, was to check up on how to renew my reader’s pass, as it had expired since I last went.  Once I’d got that sorted out, I knew that I would only have a few hours in the library itself. This affects the way I work, I think: I need to make sure that I am well prepared with a list of exactly what I want to look at.

I ran a search on the British Library Archives and Manuscripts catalogue for ‘ballad’, up to the mid-seventeenth century, and read through the descriptions of each result (of which there were many).  If I thought it looked potentially interesting, I copied the entry into Word, making each manuscript number a heading and including the descriptions for each entry.  It makes for a long document (at the moment, it’s 45 pages long!), but at least every item was easily accessible and the descriptions mean that when I’m in the library I know what I’m looking for and where to find it in the manuscript itself.  Next, I sorted the descriptions into the order that I wanted to look at them – by which I mean I put the materials I wanted to see first at the top of my list, running right down to the ones I considered to be less urgent.  Finally, I logged into my British Library account and pre-ordered as many as I could for the day of my visit.

When I arrived at the library I renewed my reader pass, had a quick brew and then settled myself into the Western Manuscript Reading Room with my tablet (much easier to carry than my laptop), my camera, notepad and pencil.  My trips to the British Library are a bit like a smash and grab…  metaphorically-speaking, of course.  This visit was going to be a particularly short one.  My priority is to accumulate as much evidence as I can, so that I can then work on it at home.  I looked at the documents that I ordered ahead of my visit and made notes on their features which I added to my Archive Research Document.  Then I photographed the relevant parts of the manuscript. Often, I took several photos of the same folios, showing the overall layout on one and the detail on others. For each document that I’d looked at, I added a tick before its title in my list.

What I didn’t do much of when I was in the library itself was to make transcriptions.  As I mainly work on 16th century documents, they are often in secretary hand, which can take a bit of deciphering at times (and yes, I suffer palaeographic jealousy when I look at the people working on beautiful italic hands!). I usually do my transcribing at home.  So when I’d looked at all the ones I’d pre-ordered, I prioritised working on what I thought was the most useful manuscript.  I kept this out, sent the others back to storage and called up some more.  While I waited for them to arrive, I started to transcribe the document that I’d kept, making the transcription in the big document but in a different colour of text so that I knew that it was my own transcription.  I then repeated the process until I’d looked at as many items as I could that afternoon – it was the bell that stopped me!

Once I got home, I transferred my archive photographs to Dropbox and a mobile hard drive, putting each document into a separate folder under the heading Archives/British Library. Then I spent a relentlessly boring day renaming each individual file by the name of its folio number – I have learned in the past how difficult it is to find the relevant image of a particular folio later if I don’t do this.
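The renaming itself could in principle be scripted once you have noted which photograph shows which folio. Here is a hedged Python sketch of that idea. It assumes you supply the mapping by hand, since nothing can work out folio numbers from the images. The function name `rename_by_folio`, the example file names and the folio labels are all hypothetical.

```python
from pathlib import Path

def rename_by_folio(folder, folios, dry_run=True):
    """Rename each image listed in `folios` to its folio label, keeping the extension.

    `folios` maps the camera's file name to a folio label,
    e.g. {"IMG_0001.jpg": "f012r", "IMG_0002.jpg": "f012r_detail"}.
    Returns the list of (old, new) paths; no file is renamed when dry_run is True.
    """
    folder = Path(folder)
    planned = []
    targets = set()
    for old_name, folio in folios.items():
        old = folder / old_name
        new = folder / f"{folio}{old.suffix}"
        if not old.exists():
            raise FileNotFoundError(old)
        # Refuse to overwrite an existing file or reuse a label within this batch
        if new.exists() or new in targets:
            raise FileExistsError(new)
        targets.add(new)
        planned.append((old, new))
    if not dry_run:
        for old, new in planned:
            old.rename(new)
    return planned
```

Running it first with `dry_run=True` returns the planned renames so they can be checked before anything is changed. Because there are often several photos of the same folio, each needs its own label (for example a `_detail` suffix), otherwise the function stops rather than overwrite an image.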

I’m now in the process of transcribing the document in which I was most interested – I open the image on one screen and use another, usually my tablet, to make the transcription, making sure that I mark any words about which I’m uncertain with a question mark and each new folio with its number.  I am doing this in a new document, which I save alongside the images in the relevant folder.

 

 

 


 

I was fascinated by this series of posts on Twitter by Bradley Irish…  It’s true, I think.  I was reminded of some interviews done by the Marine Lives project last year which looked at the way historians carry out research using electronic databases.  I wrote a short blog post at the time, which made much the same point that Bradley did – we rarely talk about the ways in which we carry out the research that leads to our outputs, be they books, articles, websites, even blog posts…  Okay, we might (and probably do) mention our methodology in the output itself, but not in the level of detail that Bradley and I both meant.  There are students out there who might find this sort of openness helpful.  Heavens, I might find it helpful.  The way that I work as an academic morphed out of the way I worked as an undergraduate 20 odd years ago.  There was nothing planned, and certainly nothing taught, about it. I can only remember one single conversation about how to sit down and do the research I do, and it consisted of something like this:

‘Prof. X keeps all their research notes in a single, huge file – it makes it really easy to search for a key term or a person…’

And that was it.  Thinking about it, it wasn’t really a conversation at all.

As I embark on finding something new to work on over the next few months (plenty of ideas, by the way, just nothing concrete yet), I’m going to write a few posts about what I’m doing along the way, subtitled ‘the way I work’.  If anyone felt moved to join me, or to respond, that would be great.  I’m absolutely sure that I’ve got plenty to learn.

I was warned on Wednesday that my luck will have to run out eventually.  That may not sound too much like good news, but the converse is, of course, that,  in order to provoke the comment, things must be going relatively well at the moment.  Work on the commonwealth chapter continues, with some quite major revisions to the opening of the chapter and smaller changes to individual sentences.  It’s getting closer.  I still need to check a couple of references and make some alterations to one of the musical examples, but it’s certainly getting closer. (And about time too, I might add, considering that it’s taken the best part of six months!)

I spent almost all of yesterday just working on the footnotes, trying to get Endnote to play ball.  Don’t get me wrong, I do like Endnote.  I used to enjoy writing my footnotes by hand, but the way that Endnote does it for me is, usually, enormously labour saving.  But for some reason, yesterday, it got its knickers in an almighty twist and started putting in references to whatever manuscript it felt like.  It wasn’t a problem with the books, or the journal articles, or the webpages: just the manuscripts.  Since the chapter is based around manuscript collections, it caused a bit of a problem.  I have no idea what caused the glitch, but I ended up typing in the manuscript references manually.

I’ve also started secondary reading for my concluding chapter on the news.   If anyone has any suggestions of things I should read on early modern news, I’d be very glad to hear of them.  The reading that I’ve done this week surprised me by giving me several ideas for  my first couple of chapters on ballad music.  In fact, I had to leap out of bed at 11 one night this week to write down an idea!  It’s the first time that that’s happened for a very long time, so I think I can safely say that the thesis is out of the doldrums and on the move again.

This afternoon I briefly revisited my chapter plan, taking into account some of the comments that my supervisors made when they looked at it last and writing an abstract for the commonwealth chapter now that it’s completed.  The rest of the afternoon I spent  transcribing documents in the State Papers.  For once, the handwriting is relatively easy to read.  Unfortunately, the digital scan of one page is so dark that it is illegible in places – I suppose a girl can’t have everything.

On Wednesday evening I went to the committee meeting for the Historical Association in Bolton.  A very productive meeting and plenty of things to work on in the coming months, not least of which is putting together the programme of lectures for next season.

After a couple of dodgy days at the beginning, the week has definitely ended on a high.  I spent quite a lot of time at the beginning of the week consolidating the ideas that my trip to the British Library generated and I wrote a thousand words in a couple of hours, bringing together my thoughts.  It was very satisfying, especially in the light of the 6 months I’ve been struggling with the 7000 words of the commonwealth chapter.  In a sense, it made the chapter all the more frustrating.  Although the chapter had improved, I was still really struggling to make it flow.  Everything was there, in vaguely the right order, but with no grace and no flow.  Cue accusations that the naughty child in me didn’t want it to flow yet.  My response was along the lines of ‘get lost’.  There is nothing fun about spending six months messing with the same set of words.  But at least writing about London proved to me that I hadn’t lost it (whatever ‘it’ is) completely.

On Wednesday night I did something a bit different.  I read the chapter aloud.  Perhaps I should have done it a long time ago, because it was so obvious when I thought about it, but it simply hadn’t occurred to me.  I printed the chapter out and attacked it with a red pen and scissors.  And it worked.  Bashing it out line by line, aloud, showed exactly where the  problems were and what didn’t make sense, what needed more explanation and what would be better broken down into more sentences.   Thursday I spent typing up all the changes that I had made and by 2.30 that afternoon, I was a very happy girl.  It’s not ready, by any stretch of the imagination, but it will do as a first draft.  What’s more, it has lost its hold on my nightmares and no longer causes me feelings of guilt and insecurity.  Maybe it won’t be the best chapter in the thesis (who knows, maybe it will), but at least I’ve now got something down that I’m confident about.

I celebrated by unpacking a box-load of books.  I’ve inherited another library, the second in three months, so my brand new shelves are now groaning under the weight of scholarship I could never have afforded to buy.

Today I checked through the results of some searches that I ran on State Papers Online and found a perfect little nugget to help with one of my arguments, so I am very happy indeed.

Finally, I’d like to pass on my very best wishes to Glyn Redworth who retires from the University of Manchester this week after more years than either of us probably cares to think about.  Time to start a new chapter, in more ways than one.

I don’t have a lot to tell, this week (after all, it’s only a couple of days since I last posted) so I thought I’d just share the good news that I’d managed to write a bit of my common weal chapter and then post some photos of some of my favourite birds from today’s visit to Martin Mere.

Yesterday morning I intended to spend a couple of hours on my common weal chapter, but just as I got stuck in and finally started making something that feels like proper progress, I had to abandon it in favour of looking after a dying hamster.  The hamster is still with us, just, but I doubt it will be much longer.  The chapter remains unfinished, but I can see a light at the end of the tunnel.  I hope it isn’t the oncoming train.

To the left is a fibre optic crane.  At least that’s what we call it – really it’s a grey-crowned crane.  Fabulous creatures.  And below are some avocets.

The year of big, scary life changes.  The year in which my husband is likely to retire and in which I need to become the main breadwinner for the family.  The year in which, 20 years after starting at the University of Manchester the first time round, I should earn the title of doctor.

So to end 2013, I got some new bookshelves.  I need them because in the last couple of months I’ve accumulated so many books that I’ve run out of space to put them.  Two of the shelves on the bookcase in my bedroom are now devoted to post-1950 history, as I was given a lot of high-quality books by a friend who could no longer use them.  I’ve also had to buy quite a few texts for my work and, of course, there are the ones that Father Christmas brought for me last week.  New bookshelves were a must.

And to begin 2014, I put some books on them.

The eagle-eyed among you might have noticed that it required the movement of my printer from my right to my left.  This may not seem significant, but it created a strange sense of space.  Working in there this morning, it felt like there was a lot more room.  I stopped for a moment to consider it, deciding that the space in the corner had been redundant space, because it was trapped between my Spanish dictionary and the printer.  Now it isn’t.  I’m not sure how ‘working round a corner’ is going to pan out in the long run, but for now it seems quite pleasant.

On a more research-based note, I am pleased to report that my chapter finally seems to be coming together.  I’m slightly more confident of it than I was.  This week, I’ve been working very much part-time, alternating it with playing games with the family and trying to get some fresh air between the raindrops and gales.  Somewhere along the way, I have found 6500 words of a chapter, which is interesting because it’s certainly not yet what I’d call a chapter – a lot of it is still in notes, or just lists of primary or secondary quotations.  When I mentioned this to my husband the other day, he commented that I had brain incontinence!  Puddles of words that don’t have any flow.  But, today, what prose there is is finally beginning to coalesce.  I’ve read several articles (I could do with going to the library but I don’t think I’m going to get there before the children go back to school next week), ordered yet another pile of books from Amazon and in the evenings, I’ve been cataloguing and analysing ballads, a few at a time.  Progress, I think.

Yesterday I began an 8-week mindfulness course, a Christmas present from a friend, intended to help me with my depression and stress since I can no longer take anti-depressants.  I’ll keep you posted on how it goes.