
Command Lines for the Humanities

Published online by Cambridge University Press:  08 October 2024



Type: Theories and Methodologies

Copyright © 2024 The Author(s). Published by Cambridge University Press on behalf of Modern Language Association of America

I begin my classes about poetry by asking my students what they already know. How do they know a poem when they see one? Would they be able to tell the difference between a sonnet and a ballad from across the room or in a column of newsprint? If they are not seeing white space on a page, what do they think a poem should sound like? Do any of these things fit precisely with that thing we might refer to as Poetry with a capital P? The metaphor of pretraining a model has been useful in approaching the question of how our formation as students and critics of literature prepares us to interpret different kinds of poetic genres, or of how we might prepare ourselves to be better readers of genres as such. The more we know about historical genres and media formats, the better we can understand historical contexts and therefore interpret historical texts.[1] And the more we know about the circulation of these genres and their formats, the more we understand about the power structures that underlie our access to the cultural materials we study. That access is now largely mediated not only by the books in schoolrooms and libraries but also by the companies that sell the subscriptions those libraries purchase (or don't purchase) so that we might access (or not have access to) collections of digitized materials from the past.[2] Asking what poetry means in the era of large language model (LLM) outputs allows us to approach historical and contemporary reading practices alongside questions about our own training: What is it we think we know about poems now and in the past, and where did that knowledge come from?

Prompted by that question, I've spent my career studying (collecting, counting, classifying, interpreting) the history of scholars of literature and linguistics (before these were separate disciplines) who argued about how to define poetry, from its smallest segments to its forms and genres and societal impacts. I'm therefore not too surprised that we now find ourselves in a moment in which, as Matthew Kirschenbaum and Rita Raley write, “linguistic protocols have been eclipsed by arithmetic means.” In some ways that was the dream of one branch of linguistics’ study of prosody, the branch of linguistics from which English departments split in the early twentieth century. This split, I might argue, still haunts our interpretations of poems (at the most granular level) and our conceptions of the discipline of English poetic criticism (at the macro level).[3] Agreeing with Katherine Elkins in our convention seminar, I would argue that our current models for academic research, writing, publication, service, and promotion disallow and disincentivize any response other than one that aligns with expected, traditional, individual modes of humanities research, and that they undermine the collaborative modes at work in linguistics. Could we retrofit some of our disciplinary habits, or develop new ones, in order to think beyond our disciplinary formations? To the institutional models that Kirschenbaum and Raley identify, I add a few more that I would like to see become normalized within the profession.

Collaboration → Forums, Open Conversations, Working Groups

The seminar format of the conversation at the MLA convention meant that we were encouraged to share papers in advance of the discussion. Working papers and works in progress are important in any new field, and there is plenty of work to go around. When I saw Rita Raley and Seth Perlow again in the seminar AI and/as Form at the annual meeting of the American Comparative Literature Association (ACLA) a few months later, it felt like a continued conversation. I commend PMLA for publishing these essays so soon after the MLA convention in January. We all have to be careful about how we manage our extrainstitutional commitments, and those of us working in this space are often too overtaxed to provide what Kirschenbaum and Raley call “sacrificial labor” for our various institutions (I am on the AI committees for the dean of the college, the dean for research, and the provost, and on five other campus executive committees; I served on the interdisciplinary data science search committee; and I direct the Center for Digital Humanities). And yet, what if we imagined, instead of containment as an institutional response, a proliferation of new formats for interinstitutional work that would lead to collaborative outputs? This is already happening, but faculty members in modern languages are not rewarded or recognized for this work. Institutional administrators and faculty members alike should cite and acknowledge these ongoing, iterative policy papers; they rely on hard-won expertise and years of research and experience working with and thinking about language.[4] Yes, products from the tech industry help us here. Let's leverage Slack, Zoom, and Google Drive, and sync our Outlook calendars across multiple time zones.

Fostering and sustaining interinstitutional collaboration, especially with our global colleagues who work under different regulations and funding structures, benefits not only us as individual scholars but the entire profession.[5] I don't mean to reenact an era of manifestos within the profession as it currently stands (though the genre of the manifesto might be helpful);[6] I mean to encourage avenues for collaborative, proactive response and cooperation across institutions and associations so that we might continue to reimagine the profession ourselves. Perhaps with enough participation and shared knowledge we could (in some cases) intervene to forestall what now seems an inevitable, foolhardy reliance on all manner of questionable AI-related products and services.[7] At the very least, we might take advantage of our interest in cocreated textual outputs to think about collaboratively writing a research paper.[8]

Communication → Rapid Response Publications, Communicating to Broader Audiences

We also need to ensure that these collaborations become, in Kirschenbaum and Raley's words, “capturable by the metrics of performance review.” Now more than ever, we must move away from valuing the single-author monograph or peer-reviewed article as the sole marker of achievement. Publishing, circulating, citing, and valuing as scholarly labor the outputs of these collaborative working groups and forums is vital, and our funding structures, strapped though they may be, need to support this scholarly infrastructure broadly and effectively.[9] Scholars need to cite scholar-built resources and workflows (software, web applications, databases) as research outputs. We need to eliminate, once and for all, the sense that public writing or public scholarship is not real scholarship. We need to take seriously the idea that in some cases we cannot wait to respond until we have an article-length argument that will appear in a peer-reviewed publication twelve or fourteen months later. We should regularize the shorter forms that allow us to distill complicated arguments for a broader public even (maybe especially) when they correspond to longer-form writing.[10] Shifting our values toward collaboration and shorter, process-oriented publications will help us intervene in areas of machine learning where we are experts, such as style and character.[11] It is not easy to move outside our regular models of publication and promotion, and I am not advocating that we abandon them but that we meaningfully expand them. I would click the bait that advertised a review of Character.ai by my colleagues who study character or agency, for example, and I would like such important evaluative work to count at the institutional level. Can we work with our university communications units to get these and other interventions, engagements, and critiques into op-eds and onto social media platforms?

Is the endgame to improve the closed LLMs with our input, to provide drag on the corporations that control them, or to build our own alternatives and explain them to different publics? I think that will depend on the scholar, but the more we own our expertise and turn it into actionable and, yes, marketable interventions, the more likely it is that researchers who are publishing papers at a lightning pace will take that expertise seriously. Or, to put it another way, we already know how to find relevant data and how to succeed at tasks that aren't easily measured by simple metrics, but we need to fine-tune our ability to communicate our expertise in simpler ways.

These systems are not one monolithic entity. What parts of our own training might be applicable to critical interventions, and how can we collectively respond in public (in addition to publishing our white papers and special issues and organizing seminars)? I believe strongly in the need for projects like AI for Humanists, but I believe even more in the need for humanities for AI.[12] We know more than computer scientists do about character, style, image, creativity, poetry, and so forth, and we likely always will, but we are not great at translating this knowledge into the kinds of critique or intervention that broader publics will read, nor are we rewarded for doing so. Of course, many of us are already collaborating meaningfully with researchers in machine learning, but here I also mean the kind of critique that comes from those who are not interested in collaboration per se but whose critique exposes how much better an output could have been had the experts (the humanists) been consulted. All the interventions Kirschenbaum and Raley propose could use better circulation beyond the pages of this publication, especially their call to “bring the entire tradition of critical judgment—as it has been honed, debated, and theorized over the centuries—to bear on the problem of how to qualitatively evaluate model output.”

Method → Making Data Work for Us

How can we put our values as humanists into practice in a way that is applied rather than theoretical? We must focus on and clearly define our method, our praxis, as something that translates into the language of workflow. Some of you may be rolling your eyes, but others may be looking at a list of grant deadlines or going over Gantt charts and time lines and feeling grateful that you had to translate your research project into a work plan. When we do this, we break down how we work into its important component parts. When all we have is an argument or an idea for an article or a book, we are seldom forced to slow down and think about our process as researchers and writers. How do we do research? How does our writing process, and our planning for our writing, inform the arguments we make or the histories we write? I am asking whether we might translate our epistemic practices into communicable information, into a kind of workflow that might be broadly applicable (I would argue that these practices are already broadly applied, but not by experts). Humanities approaches are essential for understanding data and for understanding the world; they are not extra- or cocurricular. If (and let's just speculate here) everything becomes a branch of data science, then let's make it a slower, humanistic data science in which contextualization, close reading, critical thinking, collaboration, and communication are the starting points. We need more work, not less, that explains how we do what we do and why it matters. We need a command line for the humanities, one we can argue for beyond (and alongside) the claim that we are uniquely equipped to intervene.[13]

Collectives → Datasets, Data Reviewing, Open Repositories, Data in the Humanities

Kirschenbaum and Raley ask us to move past copyright battles and “instead work to cultivate and support alternative data resources. Let there be legibility all the way down.” If we are going to cultivate and support alternative data resources and build community data hubs, then we must fairly value and acknowledge, within the profession, the work of curating and creating datasets. The Post45 Data Collective and the Nineteenth-Century Data Collective are working toward an umbrella Cultural Data Collective, and the Journal of Cultural Analytics and the Journal of Open Humanities Data both review datasets among other nontraditional outputs.[14] If we are going to value data work as scholarly labor, then all of us who are in positions of power must meaningfully intervene at the moment of hiring or of tenure and promotion to recognize datasets and data essays as meaningful scholarly interpretation. Scholars need to be trained to understand their cultural materials as data and to rethink their relationships both with their already highly mediated forms of digitally assisted research and with the people who have been doing the labor of creating and maintaining these digitally mediated sources. Publishing datasets as research outputs is a good start, but building a scholarly community that knows how to read and evaluate a dataset is the dream. How many faculty members who read review files know how to review a dataset? Graduate students nowadays, one hopes, have learned the difference between a text file and a page image as undergraduates in any course where their materials were uploaded electronically into a content management system. We all take a moment to explain this important mediation, don't we?[15]

I would add to Kirschenbaum and Raley's remarks on data work that open-source tools and datasets require the often invisible labor of creating, curating, reviewing, and updating or versioning them, in addition to storing them and providing a platform for their retrieval. For the latter, the default community data hubs are a Wild West. Alan Liu (in, e.g., “Data Moves”) and others in digital humanities have known for years that curated humanities datasets are important for all kinds of exploratory data analysis and discovery, and that how we access and think through our resources as data is a crucial first step. If we are going to truly do the work of collaborating on datasets for language models, then we need to reconsider what we mean by “open” and why open access matters (HathiTrust sends me a daily email notifying me that certain texts are no longer available because of Google's copyright restrictions). Legibility all the way down and the cultivation and support of alternative data resources (deciding parameters, training annotators) rely on a huge amount of work that our profession has yet to learn how to measure, and they often rely on the invisible and undercompensated labor of information specialists and librarians who are tasked with doing coordination work on our behalf in addition to their other full-time jobs. We need to build relationships and collaborations with the people on whose labor our scholarly infrastructure already relies, if we haven't already done so.

I would not know anything about poetry without my decades-long conversation with the Historical Poetics reading group. I would not be able to participate in any conversations about AI or data science in the humanities had I not wanted to build a database so that I could search for the weird scansion marks that prosodists made on poems. When I started on this path, I didn't know the difference between a dataset and a website, and I certainly couldn't have told you what natural language processing is. The researchers at the Center for Digital Humanities at Princeton University have been interlocutors and collaborators and advocates for the humanities in many ways that I reflect on here only tangentially, from holding institutes on data science and the humanities to organizing sessions supported by the National Endowment for the Humanities on linguistic diversity in natural language processing to getting humanities datasets into Introduction to Data Science classrooms (and training graduate students in R along the way). Some days I would like to be able to think about a poem all day. Some days I still do that. That's what I naively thought my job as a professor was going to be. But historical poetry, though seemingly made more accessible by large-scale digitization than ever before, was also rendered unsearchable by Google Books’ OCR, and my own pretrained assumptions about literary value and archives and canons and literary history and even the affordances of particular kinds of close reading began to stretch so thin I could see right through them. We are already in a moment where our scholarly research and writing exceed the knowledge infrastructures and systems of evaluation of our profession. How quickly we buy into and build new infrastructures and systems is up to us.

Footnotes

With thanks to Natasha Ermolaev, Grant Wythoff, Quinn Dombrowski, Laure Thompson, Mary Naydan, and Ted Underwood.

1. The Princeton Prosody Archive (prosody.princeton.edu), for example, facilitates this kind of knowledge. For more on format, see McGill, “Format” and “Literary History”; McGill and Parker; and Sterne. On genre, see most of the work of the Historical Poetics reading group (historicalpoetics.edu), especially McGill's “The Traffic in Poems: Traversing the Atlantic,” one of its foundational documents.

2. On digitized archives and the question of access, see Cordell; Underwood; Schwartz and Cook; Mak; Gitelman; Gregg; Foreman and Mookerjee; Caswell and coeditors’ special issue of the Journal of Critical Library and Information Studies naming the field (Critical Archival Studies); and in that issue especially Caswell's introduction (“Critical Archival Studies”). As Dugan and Smith write in this cluster, “[L]ibraries reveal institutional and epistemological investments, physically [and, I would add, digitally] connecting the legacies of techniques of curation, ordering, and description rooted in outmoded models for efficiency with their newer iterations.”

3. See Martin, Rise and “Writing.” In an effort to “bring linguistics and literary study closer,” as Aarthi Vadde suggests in this cluster that AI will, we might start by thinking about why we exiled linguistics from literature departments to begin with.

4. See the guidelines recently revised through the tireless work of Alan Liu and others on the MLA Committee on Information Technology. I am also thinking of the reports and position papers of Always Already Computational: Collections as Data (collectionsasdata.github.io/), among other collectively authored statements.

5. Examples include the various working groups of the Digital Research Infrastructure for the Arts and Humanities (DARIAH-EU), the European Consortium for Humanities Institutes and Centres, the European Alliance for Social Sciences and Humanities, standing committees and special interest groups of the Association for Computers and the Humanities, the European Association for Digital Humanities, and of course the working groups of the MLA. Elkins also mentions the Open Innovation AI Research Community. Many of these groups offer institutional visibility for collaborative work, and people are certainly collaborating, but in the United States the model for collaboration in the humanities is neither institutionally supported nor expected.

6. See, for example, Schüller-Zwierlein et al.; “Digital Humanities Manifesto,” which has one of my favorite definitions of the field (in the abstract): “The Humanities are more necessary than ever as our cultural heritage as a species migrates to digital formats. Our relationship to knowledge and information is changing in profound and unpredictable ways. Digital Humanities studies the cultural and social impact of new technologies as well as takes an active role in the design, implementation, interrogation, and subversion of these technologies.” See also “Collaborators’ Bill of Rights”; and Di Pressi et al.

7. Ithaka S+R keeps a valuable product tracker here: sr.ithaka.org/our-work/generative-ai-product-tracker/. We have already been through and are living in the aftermath of the digitization crisis and the educational technology crisis, and we have been in a labor crisis for some time. See Marcum and Schonfeld; Seybold, “Jason Wingard's EdTech Griftopia” and “Ed Tech.”

8. Or we might use our critiques of the models built with the labor of underpaid annotators to reflect on the impact these models have had on adjunct writing instructors at our own institutions. Scholars in cultural analytics and digital humanities more broadly are more accustomed to collaboration, and there is no lack of coauthorship in those fields, but the typical literature department's tenure and promotion structure is less likely to treat coauthored work or collaborative research projects as something it is able to evaluate. In most cases, therefore, we continue to rely on, and train our students to produce, the same kinds of scholarship our advisers produced, except now in a Word file instead of on a Brother word processor.

9. The Center for Digital Humanities oversaw an early discussion of “On the Dangers of Stochastic Parrots” in late October 2021, six months after the piece was published, with Underwood, Lauren Klein, and Gimena del Rio Riande in conversation with two of the piece's coauthors, Angelina McMillan-Major and Margaret Mitchell. Grant Wythoff oversaw the publication of these papers and their translation into Spanish (startwords.cdh.princeton.edu/issues/3/), and an introduction by Natasha Ermolaev and Toma Tasovac contextualized the discussion. Hosting a bespoke, in-house journal is difficult, to say the least, but more forums for quick responses (such as the recent forum in Critical Inquiry, “Against Theory”) are needed.

10. For a playful, pre-ChatGPT example, see Lang and Dombrowski.

11. One genre of scholarly intervention we've seen in the past year asks what the models can do for literary and cultural analysis, and another asks what we can learn about the data in the models. At the most recent Computational Humanities Research conference, for example, several of the papers fit into one of these two categories. We need teams of humanists willing to test products; for an example of this kind of work, see Kubacka.

12. The AI for Humanists project (aiforhumanists.com; formerly the BERT for Humanists project) is developing resources to “inform, empower, and inspire” humanities scholars to use LLMs in their disciplines in creative new ways. The project is directed by Matt Wilkens and David Mimno at Cornell University and Melanie Walsh at the University of Washington.

13. When I write “command lines for the humanities,” I am referring to the interface (now alien to many computer science students!) through which we can enter text commands to ask a computer to perform a variety of tasks, but I am also talking about something that is commonly known as a basic building block in computer programming. In my classes, the command line is a useful way to teach the history of computing (it improved on punch card technology) as well as to show students how to find things beyond the metaphor of the folder to which they are accustomed. The humanities, or, more specifically, literary study, has a proliferation of building blocks but not a single concept—or even a collection of concepts—that we might consider a command line, or a common set of practices that we (within and beyond literary study) agree on.

14. Post45 Data Collective (data.post45.org) is currently edited by Dan Sinykin (Emory University) and Melanie Walsh (University of Washington). Laura McGrath (Temple University) was a founding editor. The Nineteenth-Century Data Collective (c19datacollective.com) is edited by Megan Ward (University of Oregon) and me, along with Sarah Rief-Connell (Princeton University).

15. We must reckon here with our own complacency about the infrastructures, standards, labor, and material structures of our digital environments. As Kirschenbaum and Raley write, “[W]hen we type, the flickering signifiers that appear on the screen mask a cascading series of translations from character encoding down through levels of programming languages and assembly code, at the root of which is machine code's manipulation of electrical voltages. Input and output signals, in other words, have been undergirding language as such for many decades of this journal's publication.” Tenen, Kirschenbaum and Raley, and others have argued that these environments have as many political and social consequences as the book market and the material formats through which we traditionally encounter texts. And yet, unlike the technology of the book, the digital formats on which we rely require ongoing human labor to update and maintain them.

References

Works Cited

“The AI for Humanists Project.” AI for Humanists, aiforhumanists.com. Accessed 11 Apr. 2024.
Caswell, Michelle, et al., editors. Critical Archival Studies, special issue of Journal of Critical Library and Information Studies, vol. 1, no. 2, 2017.
Caswell, Michelle, et al. “Critical Archival Studies: An Introduction.” Journal of Critical Library and Information Studies, vol. 1, no. 2, 2017, https://doi.org/10.24242/jclis.v1i2.50.
Cordell, Ryan. “‘Q i-tjb the Raven’: Taking Dirty OCR Seriously.” Book History, vol. 20, 2017, pp. 188–225, https://doi.org/10.1353/bh.2017.0006.
“Digital Humanities Manifesto 2.0.” Assembled by Schnapp, Jeffrey, et al., Multitudes, vol. 59, no. 2, 2015, pp. 181–95.
Di Pressi, Haley, et al. “A Student Collaborators’ Bill of Rights.” Humanities Technology, University of California, Los Angeles, 8 June 2015, humtech.ucla.edu/news/a-student-collaborators-bill-of-rights/.
Foreman, P. Gabrielle, and Mookerjee, Labanya. “Computing in the Dark: Spreadsheets, Data Collection, and DH's Racist Inheritance.” Always Already Computational: Library Collections as Data: National Forum Position Statements, Mar. 2017, pp. 11–12, collectionsasdata.github.io/aac_positionstatements.pdf.
Gitelman, Lisa. “Searching and Thinking about Searching JSTOR.” Representations, vol. 127, no. 1, summer 2014, pp. 73–82, https://doi.org/10.1525/rep.2014.127.1.73.
Gregg, Stephen H. Old Books and Digital Publishing: Eighteenth-Century Collections Online. Cambridge UP, 2020.
Kubacka, Teresa. “There Is More to Reliable Chatbots than Providing Scientific References: The Case of ScopusAI.” The Scholarly Kitchen, 21 Feb. 2024, scholarlykitchen.sspnet.org/2024/02/21/guest-post-there-is-more-to-reliable-chatbots-than-providing-scientific-references-the-case-of-scopusai/.
Lang, Anouk, and Dombrowski, Quinn. “The Ghost in Anouk's Laptop.” The Data-Sitters Club, no. 9, 18 Feb. 2021, https://doi.org/10.25740/ys319vz9576.
Liu, Alan. “Data Moves: Libraries and Data Science Workflows.” Libraries and Archives in the Digital Age, edited by Mizruchi, Susan, Palgrave Macmillan, 2020, pp. 211–19.
Mak, Bonnie. “Archaeology of a Digitization.” Journal of the Association for Information Science and Technology, vol. 65, no. 8, 2014, pp. 1515–26.
Marcum, Deanna, and Schonfeld, Roger C. Along Came Google: A History of Library Digitization. Princeton UP, 2021.
Martin, Meredith. The Rise and Fall of Meter, 1860–1930. Princeton UP, 2012.
Martin, Meredith. “The Writing of Sound.” The Sound of Writing, edited by Cannon, Christopher and Justice, Steven, Johns Hopkins UP, 2023, pp. 127–50.
McGill, Meredith L. “Format.” Keywords in Early American Literature and Material Texts, special issue of Early American Studies, edited by Dinius, Marcy and Hazard, Sonia, vol. 16, no. 4, fall 2018, pp. 671–77.
McGill, Meredith L. “Literary History, Book History, and Media Studies.” Turns of Event: American Literary Studies in Motion, edited by Blum, Hester, U of Pennsylvania P, 2016, pp. 23–39.
McGill, Meredith L. “The Traffic in Poems: Traversing the Atlantic.” Introduction. The Traffic in Poems: Nineteenth-Century Poetry and Transatlantic Exchange, edited by McGill, Rutgers UP, 2008, pp. 1–12.
McGill, Meredith L., and Parker, Andrew. “The Future of the Literary Past.” PMLA, vol. 125, no. 4, Oct. 2010, pp. 959–67.
MLA Committee on Information Technology. “Guidelines for Evaluating Digital Scholarship.” Modern Language Association, 2024, www.mla.org/Digital-Scholarship. PDF download.
Schüller-Zwierlein, André, et al. The Ljubljana Reading Manifesto: Why Higher-Level Reading Is Important. readingmanifesto.org/. Accessed 11 Apr. 2024.
Schwartz, Joan M., and Cook, Terry. “Archives, Records, and Power: The Making of Modern Memory.” Archival Science, no. 2, 2002, pp. 171–85.
Seybold, Matt. “Ed Tech, AI, and the Unbundling of Teaching and Research.” The American Vandal, episode 13, 2 Nov. 2023, theamericanvandal.substack.com/p/ed-tech-ai-and-the-unbundling-of.
Seybold, Matt. “Jason Wingard's EdTech Griftopia.” Los Angeles Review of Books, 23 Feb. 2023, lareviewofbooks.org/article/jason-wingards-edtech-griftopia/.
Sterne, Jonathan. MP3: The Meaning of a Format. Duke UP, 2012.
Tenen, Dennis. Plain Text: The Poetics of Computation. Stanford UP, 2017.
Underwood, Ted. “Theorizing Research Practices We Forgot to Theorize Twenty Years Ago.” Representations, vol. 127, no. 1, Aug. 2014, pp. 64–72.