The EMDA folks spent yesterday afternoon enthralled by Mark Davies’ corpora and his interface for them. Rather than casually noodling around, as I like to say, many of us were in a mad dash to engage with one corpus in particular. We were dashing because, while Davies had built the thing, most of us had only a very short window in which to access it. I’m being deliberately vague here because I value the access that Davies gave us and because my point isn’t about the particularities of any one resource. Instead, I’m concerned with differential access to legacy data and how we think about this problem.
The data that Davies was working from belongs to a major organization, one that many early modernists depend upon. But, as we learned earlier in the week, our access to that data is not equal. Rather, there are multiple levels of subscription for this resource and with that comes differential access to the underlying data. If one is at an institution that has the highest-level subscription, then using Davies’ bewitching tools in the future is not a problem. If, however, one’s institution has one of those other levels of subscription… well, access was limited to a window measured in days. Hence the dashing.
What was paradoxical about my own dashing yesterday is that I’m not generally interested in corpus analysis and I am pretty suspicious of the quality of this particular data resource. What’s more, while this isn’t ‘big data’ in the sense of the sciences, it is bigger, and my current research agenda is focused on relatively small scales. There are a number of reasons for this, which deserve a different post, but none of this kept me from feeling desperate about the short time I had with the data and Davies’ interface yesterday. Nor did it stop me from being openly frustrated about hierarchies of access.
We all know that there are different resources and expectations (although this latter bit is shifting in disturbing ways) at R1s. As a colleague helpfully pointed out via Twitter, it’s not just small liberal arts colleges (SLACs) where these differences become apparent – comprehensive universities (CUs) and community colleges (CCs) have similarly differential access. While a handful of SLACs, CUs, and CCs have access, smaller or less affluent institutions are far less likely to subscribe to a $60,000 humanities resource (don’t get me started on the comparison with science data subscriptions).
A handful of people spent some time yesterday talking about the ways that we might address this kind of issue. We might leverage local consortial arrangements to make the case for subscription, we might engage with national consortia (like the Alliance to Advance Liberal Arts Colleges), we might turn to our professional organizations (MLA, RSA, etc.) for help with subscriptions and data access – or we might undertake more “guerrilla” approaches. Each, I suspect, has its affordances and constraints. But I’m aware that I spent a bunch of time thinking about getting access to something that I’m not even sure I want.
Jonathan Sawday asked us earlier in the week if our current technological situation might have been otherwise. This morning I think that this might be a more fruitful vein of inquiry than “how can I hack access?” It’s easy to become entrenched in the haves/have-nots conversation – while the structures of higher education hierarchy and closed data deserve calling out, they might also be a distraction. Why bother fighting for a very dirty data set when we could create it anew and in better form? We were shocked to hear how little it would actually cost to create a new set of high-quality images and transcriptions of early modern texts, particularly given how much we value that kind of resource. Given that we’re talking about a relatively small set of texts, such work might not actually take that long.
Now, having hand-encoded texts myself as a graduate student and now as a researcher, I know that the devil is in the details and that one needs money to make the workflow happen on a scale of months rather than years. But I’d rather put my energy into that set of intellectual and practical questions. Focusing on making a better, open data set wouldn’t constitute an avoidance of the real issues of access inequity but, rather, a refusal to engage in a battle created by corporate control of humanities resources. I woke up with Audre Lorde in my head (at a spaghetti dinner, but I digress) and I think it’s worth considering alternative approaches when tools are old and broken. And we don’t have to start from scratch: there are a number of existing early modern text projects (Women Writers Online is just one) that have already begun the work, in some sense. That’s where I’ll be hanging out.