
A Match Made in the Archive: Reading and Poaching Through Ngrams and Rare Books (22 January 2016)

On a hunch, I went home after the DH events last September and typed “Jesuit” into the English corpus of Google Books’ Ngram Viewer. The tool is capable of much more than I used it for, but my search revealed how popular the word was in the English-language books that Google has digitized and made searchable. One hit from 1524 (before the Society was founded in 1540) turns out to be a misdated book (actually from 1920). But things get really interesting in 1609, four years after the failure of the Gunpowder Plot to restore a Catholic monarchy, for which English Jesuit missionaries took a good chunk of the blame. Things more or less taper off as the corpus of extant books expands in later years, but with curious spikes in popularity, one of which occurs between 1840 and 1860.

A line graph tracing the popularity of “Jesuit” from 1550 to 1900.

Jesuits making seismic waves in English literature … or at least tremors.
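For readers curious about what the graph is measuring: the Ngram Viewer plots a word’s yearly frequency as a percentage of all the words (strictly, all n-grams of the same length) in the corpus for that year, not as a raw count, which is why a word can taper off even as more books mention it. Here is a minimal Python sketch of that normalization, using made-up counts; the numbers and the function name `relative_frequency` are hypothetical illustrations, not Google’s data or code.

```python
# Hypothetical yearly counts, for illustration only (NOT real Google Books data).
term_counts = {1600: 12, 1609: 90, 1700: 150, 1850: 2_000}
total_words = {1600: 100_000, 1609: 120_000, 1700: 1_000_000, 1850: 40_000_000}


def relative_frequency(term_counts, total_words):
    """Return each year's term count as a percentage of all words in that year's corpus."""
    return {
        year: 100 * count / total_words[year]
        for year, count in term_counts.items()
    }


for year, pct in sorted(relative_frequency(term_counts, total_words).items()):
    print(f"{year}: {pct:.3f}%")
```

In this toy data the raw count peaks in 1850, but the relative frequency peaks in 1609, because the 1850 corpus is hundreds of times larger.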

This spike follows the 1844 Philadelphia Nativist Riots, which have been in the news lately as journalists draw apt comparisons between the anti-Muslim paranoia of Donald Trump & Co. and the ultra-nationalist anti-Catholicism of nineteenth-century America. In what are sometimes called the Bible Riots, Philadelphia nativists violently destroyed Catholic property, outraged that Catholic parents wanted their children to be allowed to use a Catholic translation of the Bible in public-school Bible study: Protestants feared a foreign takeover by these people, who often hailed from Ireland or Italy and followed a religious leader in Rome. The Google Books corpus shows that anti-Catholic sentiment persisted on both sides of the Atlantic, however, and includes such salacious titles as Jesuit Interference with Domestic Affairs: A True Statement of Facts Concerning the Conduct of the Jesuit Priests of Texas (place unknown, 1848); The Friendship of a Jesuit (with an epigraph from Hamlet: “meet it is, I set it down / That one may smile, and smile, and be a villain”; Edinburgh and London, 1848); The Perverter in High Life: a True Narrative of Jesuit Duplicity (London, 1851); The Female Jesuit; or, The Spy in the Family (New York, 1851, and its 1853 sequel); and Madelon Hawley, or, The Jesuit and His Victim: A Revelation of Romanism (New York, 1857; 1859).

The title page and an engraving from Madelon Hawley: the latter features an old man in a black cassock and biretta seated next to a young man in a suit; in the background are a leaded window and at least three crucifixes, one quite large.

Sometimes stereotypes ring true, though, and a Catholic affection for many and ornate crucifixes is definitely one of them.

A hard copy of this last title, written by William Earle Binder, is housed in the Special Collections Resource Center (SCRC) at Syracuse University’s Bird Library. The book is fairly small for a hardbound book by today’s standards, a little smaller than a trade paperback: a book for casual reading. Though the interior is in good shape, it has a worn-out cover and spotted edges. Framed as a story that the author heard from a dying man, Joseph Secor, who left a corrupt Society of Jesus, the text traffics in the hallmarks of anti-Catholic prejudice handed down from the English Reformation. During his time in the Society, Fr. Joseph clashes with a tyrannical and cunning senior member, Fr. Eustace, who tormented to death the married Mrs. Hawley, a woman who refused his sexual advances; later, Eustace pursues her virginal daughter Madelon even after excommunicating her (spoiler alert: he and Madelon both die). The pages are filled with priests in disguise, gross caricatures of the Irish, kidnapped women held prisoner in cloisters, allusions to the Inquisition, and vivid (and historically and doctrinally inaccurate) depictions of ritual. Flipping through the book, I wondered what kind of person would have held it before I put my hands on it, and what they would have thought of a dorky Catholic studying Jesuit literature for a living. Indeed, I found written large inside the back cover:

Mrs Clara T Crane

91 3/4 Clark Street

Auburn NY

A hand holds open the back cover of a book, with Mrs Clara T Crane's name and address written in large, curly script.

Don’t feel bad if you read that as 9 3/4, too.

Auburn hits rather closer to home than Philadelphia. A little Googling revealed Mrs. Crane to be the wife of one W. W. Crane, an English immigrant and manufacturer; he died in 1900 and was buried with Episcopalian and Masonic services.[1] Perhaps she was an anti-Catholic crusader: her husband came to America in 1852, when many of the anti-Jesuit texts were being published in England and the US. Maybe she just liked sensational conspiracy novels. Or maybe the book belonged to someone else who merely noted her address on the back flap of a text they didn’t care about.

A torso in a blue sweater reaches out, holding open the book: one page full of text, the other an illustration of the dying Madelon and Fr. Eustace.

Caution: Dorky Catholic at Work.

In his book The Practice of Everyday Life, Michel de Certeau (himself a Jesuit) suggests that there is no text without a reader: the reader “invents in texts something different from what [the author] ‘intended’” (169), “like nomads poaching their way across fields they did not write, despoiling the wealth of Egypt to enjoy it themselves” (174).[2] Consequently, there are as many readings of a text as there are readers; each reader interacts with the text differently, and that interaction subtly shapes their interpretation of the book. My experience, as a dorky Catholic studying Jesuit literature, with a specific copy of a text that may have belonged to someone who enjoyed anti-Catholic sensation novels will necessarily shape my interpretation of that particular book. Similarly, your experience, as someone with your own background, with the digitized version of a different copy of the text will necessarily shape your interpretation differently, especially since you’ve read this post and are probably thinking your own thoughts in the process. For one thing, you can’t touch the handwriting at the back of the SCRC’s book; and if this book were reproduced by a nineteenth-century equivalent of EEBO instead, you wouldn’t even know there’s a name and address copied there, since EEBO doesn’t usually include covers in its scans.

A sepia-toned portrait photograph of a middle-aged white man in a turtleneck, corduroy blazer, and tinted glasses.

Michel de Certeau, in the craftiest of Jesuit disguises: nerdy scholar

Books have real, material, human consequences. Sometimes, digital humanities can efface the small consequences: somewhere in Auburn, probably, someone bought an anti-Jesuit sensation novel in the wake of the Nativist Riots, buying into anti-Catholic sentiment at least economically. But other times, DH brings to light the bigger consequences: Madelon Hawley fits into a long literary tradition of demonizing the Catholic other. The humanities are at their best when they combine the two to reveal something about how humans read the books we can still access today, no matter the format.

Next week: The human in the humanities.

[1] See Auburn Weekly Bulletin, February 27, 1900 and Official Gazette of the United States Patent Office, Vol. 6.

[2] Michel de Certeau, “Reading as Poaching,” The Practice of Everyday Life (Berkeley: University of California Press, 1984), 165-76.

Many thanks to the Special Collections Resource Center at Bird Library, and especially to Nicolette Dombrowski and Nicole Dittrich, for their assistance with researching this post and for permission to post photographs of the book.


Ashley O’Mara (@ashleymomara | ORCID 0000-0003-0540-5376) is a PhD student and teaching assistant in the Syracuse University English program. She studies how Ignatian imagination and Catholic iconology shape representations of sacred femininity in Early Modern devotional writings. In her down time, she writes creative nonfiction and snuggles her bunny Toffee.

Common Knowledge?: EEBO, #FrEEBO, and Public Domain Information (15 January 2016)

If you work in the humanities and you’ve used a database, a dictionary, or Google Docs in the past ten years, congratulations: you’re already doing digital humanities. This was a point emphasized by Syracuse University professor Chris Hanson in a panel discussion on the digital humanities that I attended after the Six Degrees of Francis Bacon workshop last fall. Grad students, faculty, and a librarian from a range of disciplines underscored that, by this definition, anyone can do digital humanities (in fact, many already do) as long as they have access to digital information and the tools to manipulate it.

Not everyone has that kind of access, however, and this became painfully obvious for Renaissance-studies scholars a few weeks later when ProQuest discontinued access to the Early English Books Online (EEBO) database for Renaissance Society of America (RSA) members. Previously, those who didn’t have EEBO access through a university’s library subscription — such as independent scholars or those at smaller schools with smaller budgets — could gain access by joining the RSA, a professional organization rather than a library. After a Twitter uproar, ProQuest quickly restored access without much of an explanation, but not before Renaissance scholars could write about the implications of a private business’s controlling access to what is ultimately public domain information.

EEBO’s origins lie in World War II, when the London Blitz threatened to destroy English libraries and the thousands of medieval and Early Modern books they contained: a potentially massive loss of information. University Microfilms International (UMI) stepped in to microfilm the texts for future generations … and for profit. UMI began to offer microfilmed titles from the Short-Title Catalogue (STC) to university libraries through print-on-demand services.[1] For decades, Renaissance scholars outside the UK relied upon libraries’ microfilm reprints to do their research. Seventy years later, UMI is now ProQuest and the microfilmed STC is now EEBO, a digitized and expanded collection of scanned texts. Just under half of the (rapidly expanding) current collection was released into the public domain last year. But anyone without library access will have to wait until 2020 for ProQuest’s exclusive rights to expire in order to access the complete collection.[2]

A library with the ceiling caved in. Beams, rubble, curtains, and ladders are heaped in the center. Three men in hats and wool coats inspect the books that remain on the shelves.

The private library at the seventeenth-century Holland House was bombed in the London Blitz. Books in national libraries were quaking in their dust jackets.

I’m one of the lucky ones: Syracuse University participates in the EEBO Text-Creation Partnership, so I have access even to texts that haven’t been made fully searchable. Without my university library access, I couldn’t possibly be an Early Modernist studying Jesuit literature. Syracuse is a long way from the Huntington and the Folger libraries, let alone Cambridge or Oxford. Not only do I not have a research budget as a PhD student, but some of the most prestigious libraries limit access to students already working on a dissertation. If I hadn’t spent time browsing EEBO’s collections, I wouldn’t even know that I wanted to write about Jesuit literature. I might eventually have read that Richard Crashaw, a seventeenth-century poet and Catholic sympathizer-turned-convert, was raised by a virulently anti-Catholic father who wrote a tract called “The Bespotted Jesuite.” But without EEBO, I would never have had the opportunity to actually read the elder Crashaw’s text, trace its obsession with the maternal role of the Virgin Mary in Catholic notions of salvation, and then compare its horrified images of breastfeeding with the glorifying images that appear in the younger Crashaw’s baroque, even mystical, poetry. Without EEBO, I couldn’t read about the Maryland colony’s connection to the English Jesuit mission; I couldn’t perform full-text proximity searches comparing discourse on Eucharistic flesh and New-World cannibals; and I couldn’t crosscheck textual references to English Jesuits to add to Six Degrees of Francis Bacon.
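A proximity search of this kind simply asks whether two words fall within a set distance of each other. EEBO’s own search interface has its own syntax and its own handling of variant spellings, which this does not attempt to reproduce; the following is only a minimal Python sketch of the underlying idea, with a hypothetical function `within_n_words` and an invented sample sentence (not a quotation from any EEBO text).

```python
import re


def within_n_words(text, term_a, term_b, n):
    """Return True if term_a and term_b occur within n words of each other.

    Matching is case-insensitive and on whole words only; variant spellings
    (e.g. "canibal" vs. "cannibal") are NOT normalized in this sketch.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    positions_a = [i for i, tok in enumerate(tokens) if tok == term_a.lower()]
    positions_b = [i for i, tok in enumerate(tokens) if tok == term_b.lower()]
    return any(abs(i - j) <= n for i in positions_a for j in positions_b)


# Invented sample sentence in Early Modern-style spelling (not a real quotation).
sample = "they eate the flesh of men, as the Canibals doe"

print(within_n_words(sample, "flesh", "canibals", 5))    # True: 5 words apart
print(within_n_words(sample, "flesh", "canibals", 4))    # False: too far apart
print(within_n_words(sample, "flesh", "cannibals", 10))  # False: modern spelling misses
```

The last check is a reminder of why Early Modern spelling complicates full-text searching: without variant-spelling support, “cannibals” never matches “Canibals.”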

A poorly copied black-and-white page of text titled “To OUR LADY OF Hall, and to the Child JESUS”; the rest of the text is half-obscured because text from the opposite side bleeds through.

A page from William Crashaw’s “The Bespotted Jesuite,” aka the “Jesuites Gospell” (1642). Read might be a generous verb.

But not everyone is so fortunate: in the few days when some RSA members believed they would lose their only means of accessing the full EEBO, proposals for a #FrEEBO circulated on the internet. The conversations reminded me of when I graduated from undergrad and realized, to my horror, that I no longer had access to the Oxford English Dictionary. I found myself keeping younger classmates “on retainer,” pestering them to please, please look up the seventeenth-century definitions of this word so I could revise my writing sample to apply to grad school. Imagine being a scholar trying to publish a journal article for tenure and having to do the same thing, but with every single primary text you’re analyzing. Unlike the OED, the texts in EEBO are in the public domain, after all, even if ProQuest’s digitizations aren’t; there’s no reason scholars couldn’t create a parallel database that’s wholly public domain from inception.[3]

Digital texts have their shortcomings, of course, including other forms of inaccessibility. Untranscribed texts are wholly inaccessible to those with visual impairments. Databases like EEBO offer OCR transcriptions of some scanned texts, and while the good ones can be helpful, quality is inconsistent and frequently bad, especially for Early Modern typefaces and spellings. (If anyone has had a good experience using a screen reader with EEBO, let me know in the comments.) Digital texts also necessarily misrepresent the material objects they’re based on by translating them into a different medium: a scan of a book obscures its size, its texture, its color, its smell, and even, in EEBO’s case, its cover. (More about that next week!)

A black-and-white scan of two pages of text fills the top two-thirds of the image; a transcription fills the bottom third. The transcription is filled with punctuation marks to signal line breaks and diacritical marks. Each transcription has a yellow post-it note icon in the middle of sentences. The text that fills the margins of the scan is not included in the transcription.

A side-by-side comparison between the scan and the transcription of two pages from “True relations of sundry conferences had between certaine Protestant doctours and a Iesuite called M. Fisher” (1626) in EEBO. To read marginal commentary, you have to click the yellow post-it note icons — a very different experience than the Early Moderns had.

But shortcomings shouldn’t stop us from finding new ways to increase access to these texts. One aspect of Jesuit philosophy that has always resonated with me is that education is inseparable from social justice. Extensive higher education is required during Jesuits’ training in part because they are meant to share that knowledge in service to others. Education itself is a common good, and as an aid to education, the cultural heritage contained in databases like EEBO shouldn’t be limited to scholars attached to the wealthiest schools, or even to scholars alone. If public scholars are truly committed to democratizing knowledge, our work shouldn’t end at merely presenting our research to the public, which only reinforces the ivory tower’s hierarchical relationship to the public. Our service should extend to enabling universal access to the primary sources we work with, so that anyone who wants to, no matter their situation, can discover not only our knowledge but also how we arrived at it, and how they could make some new knowledge themselves.

[1] http://folgerpedia.folger.edu/History_of_Early_English_Books_Online

[2] http://www.textcreationpartnership.org/tcp-eebo/

[3] https://medium.com/@john_overholt/together-we-can-freebo-b33d39618f8#.wpxzn95s1


Ashley O’Mara (@ashleymomara | ORCID 0000-0003-0540-5376) is a PhD student and teaching assistant in the Syracuse University English program. She studies how Ignatian imagination and Catholic iconology shape representations of sacred femininity in Early Modern devotional writings. In her down time, she writes creative nonfiction and snuggles her bunny Toffee.

The Human in the Digital Humanities (8 January 2016)

The digital humanities (or, as the cool kids call them, DH) have been in my peripheral vision since my first year of grad school. They look useful and fun, but for someone who dreads calculating grades, working with data is intimidating. Last September, a series of DH events in a symposium on the future of the humanities inspired me to reconsider how the digital humanities fit into the humanities generally. This month, I’ll be looking at the human in the digital humanities in order to think about where the human is located in the humanities. To do this, I’m going to introduce you to some of my research on the Society of Jesus, or Jesuits, a missionary order of Catholic priests founded by the Spaniard Ignatius Loyola at the time of the Reformation. This focus is partly self-serving: I’m a dork and I love studying Jesuit literature even beyond my Early Modern period. But the connection between the Jesuits and the issues I’ll tackle goes well beyond my research. Ignatian philosophy on education, public service, and the relationship of the material to the ideal has greatly informed my appreciation of the digital and public humanities.

A cartoon of Ignatius Loyola, wearing sunglasses and holding a to-go cup of coffee.

Time to get Iggy with it.

The first event I was able to attend in the DH series this past September was a workshop with Daniel Shore and Chris Warren on the new DH project they’ve launched, Six Degrees of Francis Bacon (or SDFB).* If you’re familiar with cinema’s (Kevin) Bacon Number, the principle is similar: the database maps degrees of separation between major and minor figures in Early Modern England based on the different kinds of relationships they had with each other. Francis Bacon’s network of relationships greets visitors on the home page: he is one degree of separation from Anne Bacon (“parent of” Francis), Elizabeth Hatton (“attracted to” Francis), and William Fulbecke (“collaborated with” Francis); and he’s two degrees of separation from Archbishop William Laud (via Thomas Coventry) and Sir Edwin Sandys (via Sir Thomas Coke). Francis Bacon also belongs to the groups “Virginia Company” and “Company of Mineral and Battery Works,” and you can search just for members of a single group or for members of two groups (it turns out Bacon is the only one in the database who belonged to both those companies). While the foundational information for SDFB was imported from the Oxford Dictionary of National Biography, new information is crowd-sourced: anyone can add a new person, add a person to a group, add a new relationship, or assign a new relationship type (subject to admin approval, of course).

A very busy map of Francis Bacon’s first- and second-degree relationships.

Two degrees of Francis Bacon. I can only hope to be so socially well-connected.
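Under the hood, a “degree of separation” is the length of the shortest chain of relationships between two people in a network graph. SDFB’s actual implementation isn’t described in this post, so the following is only a minimal Python sketch using a plain breadth-first search over a hypothetical toy graph built from the handful of relationships mentioned above (treated as undirected, and ignoring relationship types and likelihoods).

```python
from collections import deque

# Hypothetical toy network built only from relationships named in the post;
# real SDFB data has far more people, typed relationships, and likelihoods.
edges = [
    ("Francis Bacon", "Anne Bacon"),
    ("Francis Bacon", "Elizabeth Hatton"),
    ("Francis Bacon", "William Fulbecke"),
    ("Francis Bacon", "Thomas Coventry"),
    ("Thomas Coventry", "William Laud"),
    ("Francis Bacon", "Thomas Coke"),
    ("Thomas Coke", "Edwin Sandys"),
]


def build_graph(edges):
    """Turn a list of pairs into an undirected adjacency map."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def degrees_from(graph, start):
    """Breadth-first search: map each reachable person to their degree of separation."""
    degrees = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        for neighbor in graph.get(person, ()):
            if neighbor not in degrees:
                degrees[neighbor] = degrees[person] + 1
                queue.append(neighbor)
    return degrees


graph = build_graph(edges)
degrees = degrees_from(graph, "Francis Bacon")
print(degrees["William Laud"])  # 2 (via Thomas Coventry)
```

Because breadth-first search reaches everyone one step away before anyone two steps away, the first chain it finds to a person is a shortest one. Every edge counts as a single step, whatever kind of relationship it records.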

Before I started playing with SDFB the night before the workshop, I hadn’t really understood how any DH methodologies, outside of simple word frequency analyses, would be useful to my research. But as I clicked around the website, looking up individual English Jesuits whose writings I’d read, I began to appreciate the power of visual representation of the connections between these priests and the social circles they moved in during their work in England.

Some historical context about the sixteenth-century English Jesuit mission is helpful here. With the replacement of all Roman Catholic bishops with conforming Church of England bishops, and with the institution of the 1584 “act against Jesuits, seminary priests, and such other like disobedient persons,” English Catholics could no longer ordain their own priests to serve their communities. Politically, England was reduced to the same status as an Aztec, Chinese, or any other historically non-Catholic kingdom: it became a mission field served by foreign-trained priests, mainly from the expat community in Douai, France. Indeed, it could even be more hostile than other foreign mission fields: the 1584 act made it high treason to be a Catholic priest, and a felony to aid one; even suspicion of either crime could subject a person to any number of gruesome tortures.

Protestant-era England did have two advantages over other mission fields, however. First, most of the Jesuit missionaries serving in England were born there or descended from English families. And second, Catholicism, though driven underground, was still fairly widespread, with some families even managing to retain considerable wealth. English Jesuits had something of a home-turf advantage, and these connections were crucial to carrying out their work in often hostile territory.

Printed engravings of Edmund Campion (with a dagger in his heart, a noose around his neck, and gallows and stretchers in the background); Robert Southwell (with a dagger in his heart, a tiny noose around his neck, and a cherub waving a crown of laurels over his head); and Alexander Briant (with a dagger in his heart, a noose around his neck, and holding a handful of reeds while a cherub waves a laurel crown over his head)

Some of the English Jesuit martyrs: Edmund Campion, Robert Southwell, and Alexander Briant (whose good looks were legendary).

When I came to the SDFB workshop the next morning, my goal was to help map the English Jesuits’ human networks of supporters, and I was thrilled to find that this was a task I had the expertise to contribute to, and one that was useful to my research. Because of SDFB, I could begin to really see just how tightly connected many of the Jesuit missionaries and their English supporters were, something I hadn’t recognized in the disparate texts I had read before. And I was very pleased to convince Drs. Shore and Warren to add “Jesuits” as a standalone group in addition to “Jesuit missionaries to England,” in order to account for expats who never returned to England.

The database is in beta testing, so there are still some quirks and bugs and inefficiencies. There is a terrible shortage of women in the database, a consequence of the fact that only 6% of the entries in the Oxford Dictionary of National Biography, from which the vast majority of SDFB is drawn, are for women.[1] This is particularly problematic for my research, as recusant Catholic women were better able to fly under the English government’s radar (so to speak), especially if their husbands conformed, and thus were essential to Jesuit ministry: hiding priests, offering financial support, and granting access to printing presses. (If you want to help boost the number of women in SDFB, check out the Networking Early Modern Women event on January 23 at the Carnegie Mellon and Folger libraries and live online.)

Some features of the current design can also be shortcomings. A sometimes-limiting selection of terms used to categorize and group relationships can flatten their contours and conceal their dynamics. In some ways, broad strokes are necessary even to begin sorting relationships. For instance, “collaborated with” or “attracted to” mean different things to different people, but a general sense of what they could mean enables the first step of investigation. On the other hand, it was rather chilling to see Robert Southwell’s visualized relationships to Robert Persons (a Jesuit) and Anne Howard (a recusant and priest-harborer) given the same weight as his relationship to Richard Topcliffe, his torturer.

Robert Southwell’s first-degree relationships.

But these are precisely the problems that the humanities side of digital humanities can resolve. As I often remind my students, data does not an argument make. It doesn’t tell us anything on its own; it must be interpreted. Thanks to SDFB, we can see the names, the dates, the likelihood of the relationships between people, and the extent of their networks. But we need to read their texts and contexts not only to understand the difference between an ally and an enemy, but also to fully appreciate the contributions these figures made to literary history.

*Full disclosure: To my surprise and delight, I was made a curator for SDFB between writing this post and its publication. Opinions are very much my own.

Next week: EEBO and public-access literature

[1] networkingwomen.sixdegreesoffrancisbacon.com

Ashley O’Mara (@ashleymomara | ORCID 0000-0003-0540-5376) is a PhD student and teaching assistant in the Syracuse University English program. She studies how Ignatian imagination and Catholic iconology shape representations of sacred femininity in Early Modern devotional writings. In her down time, she writes creative nonfiction and snuggles her bunny Toffee.

The Dust-Heap of the Database and the Specters of the Spectator

In 2014, networks launched some 1,715 new television series, a staggering number that prompted many articles to declare variations on the theme “there are too many shows to watch.” Same story, different medium, I say. Franco Moretti, a contemporary literary scholar, writes that while twenty-first-century Victorianists may (may) read around two hundred Victorian titles, that is barely a drop in the bucket of the 40,000 titles published in the nineteenth century. And the other 39,800 novels? The short version: gone. The longer version: maybe not.

The plethora of “lost” Victorian novels challenges any sweeping claims about Victorian society based on the fourteen or so (depending on how you count) full-length novels of Charles Dickens. But the task becomes even more daunting if one’s studies include explorations of Victorian popular magazines and journals. The Waterloo Directory of English Newspapers and Periodicals 1800-1900 lists 50,000 titles. If each of those titles published a single twenty-page issue (and certainly they published more), that alone would amount to 1,000,000 pages to read.

The imbalance between what we read, what we could read, and what we can’t read makes Victorian studies (and, I suspect, other historical studies) a strange beast. Any decent Victorianist monograph will address the familiar tunes (Dickens, the Brontës, Eliot, etc.), but it will probably do so through ephemera and periodicals that perhaps only the author has read, thanks to hours of archival digging. The internet makes this strange beast even stranger. It changes how I do history, and not only because I can do most of my archival work from the back corner of Mello Velo (the local coffee shop to which I will owe my doctorate, whenever I finally defend). Historical research online changes academic reading practices, the kinds of arguments we can make, and, finally, how we teach historical reading in the classroom. Electronic archives offer the chance to reinvigorate the dust-heap of forgotten novels, although with the change in what we can read comes an inevitable and sometimes ineffable change in how we read. They also make it possible to discover a text nobody has read without leaving the comfort of your favorite coffee-shop table.

And yet, when I say a text nobody has read, this isn’t quite true. These texts do not simply appear on one’s screen. These historical documents already bear the marks of their nineteenth-century readers, but now they also bear the marks of my search terms, the database’s algorithms and tags, the scanners, the computer processing, and, somewhere in a basement, the other people who plugged this material into the database. These extra, mostly ineffable hands mark the text like the fingerprints of electronic ghosts, and these spectral hands can sometimes offer us bizarre, fortuitous accidents.

I’m sorry, Peter. I’m afraid you can’t read that.

Here’s an example. My dissertation is in part about Charles Dickens, because of course it is. I’m also heavily invested in Victorian literary criticism; that is, as opposed to the Victorianist literary criticism of the twentieth and twenty-first centuries, I gravitate toward the theories and ideas the Victorians themselves used to analyze their own work. I’m specifically interested in Dickens’s serial publications (stories told in installments, like a modern television show), and I wanted to see what the Victorians thought about serialization.

So, off I go to sundry databases and metadatabases, where I search terms like “serial,” “part,” “periodical,” “novel,” and “publication.” As part of my search, I examined the Spectator Archive (1.5 million pages, by the way), where I found this priceless artefact: “Doe’s Oliver Twist.”

Wait, didn’t Dickens write Oliver Twist? you ask. Who on earth is “Doe”?

Welcome, Dear Reader, to the dust-heap of the archival database. Archives like the Spectator Archive use something called Optical Character Recognition (OCR), the process by which a computer converts scanned images of pages, from something like an 1838 issue of a magazine, into searchable text. It’s improved in part by programs like reCAPTCHA, the obnoxious text you have to enter before buying or registering at some websites to prove that you’re a human, because only humans scream obscenities at their computers after the thirtieth failed entry. It’s pretty incredible, when you think about it.

And it’s also terrible, as proven by the title: the Spectator Archive’s OCR rendered “Boz” as “Doe.” Wait, didn’t Dickens—

Yes, Dickens wrote Oliver Twist. But before that, he published Sketches by Boz, a series of wonderfully liberal musings on life in London. And so, when Dickens began to serialize Oliver in Bentley’s Miscellany in 1837, he did so as “Boz.” But the Spectator Archive doesn’t know that. In fact, it doesn’t know anything. It’s a scanner, and a computer that runs OCR software, tags its garbled output, and then throws it into the ether for some random grad student to stumble across. And behind that, someone (probably a random grad student or intern in the basement of the Spectator building on Old Queen Street) could have read this article, because someone had to put the page on the scanner and press “go.” Behind the Spectator is a series of spectral readers: the Victorians who may have read the article in 1838, the person who scanned the article, the scanner, the computer, and the series of algorithms and programs that brought me from Google to the Archive and to that article.

“Doe’s Oliver Twist” is a gold mine for Victorian theories of reading, serial publication, and distinctions between common readers and academic readers. But in order to find it, one has to enter the right search terms, and (here’s the real punchline) those search terms may abound in a document and still not show up in the results because the OCR is wrong. But there’s one final twist, and it isn’t Oliver.

No, it’s not that, either.

In fact, “Doe’s” showed up in my search results because something was OCR’d incorrectly: the software thought it recognized one of my search terms, but that term does not actually appear in the document.
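Both failure modes, a search term that is printed on the page but missing from the OCR text and a term the OCR invented, can be illustrated with a toy keyword search. The search terms are the ones listed earlier; the page text and its “OCR” rendering below are invented for illustration (they are not the Spectator Archive’s actual data, and the specific misreadings are hypothetical). The point of this minimal Python sketch is simply that a search runs over what the OCR produced, never over the page itself.

```python
import re

# Search terms taken from the post; the two texts below are hypothetical.
SEARCH_TERMS = {"serial", "part", "periodical", "novel", "publication"}


def matched_terms(text, terms=SEARCH_TERMS):
    """Return the search terms that appear as whole words in text (case-insensitive)."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return terms & words


# What a (hypothetical) printed page says...
page_text = "Boz's Oliver Twist, a novel, apart from its pathos"
# ...and a (hypothetical) garbled OCR rendering of it:
# "Boz" -> "Doe", "novel" -> "nove1", "apart" -> "a part".
ocr_text = "Doe's Oliver Twist, a nove1, a part from its pathos"

print(matched_terms(page_text))  # {'novel'}: what a human reader would find
print(matched_terms(ocr_text))   # {'part'}: a false hit, and "novel" is missed
```

In the OCR version, a search for “novel” misses the page even though the word is printed there, while a search for “part” finds it even though the word is not.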

Internet archives allow scholars to dive into the dust-heap of history. In their clunky, unintuitive ways, they cough up garbage and leave us to sort the mess. And as I will argue in future posts, they fundamentally alter the ways we perform these readings. Welcome to twenty-first-century history: a tangled heap of trashed treasures and treasured trash.


Cover image: Stone, Marcus and Dalziel. The Bibliomania of the Golden Dustman. Scanned by Philip V. Allingham. Victorian Web.

Peter Katz is a fifth-year Ph.D. student in Victorian Literature and Culture. His dissertation focuses on sensation fiction, the history of science, and the history of the novel.