A morphosyntactic authorship attribution study of the speeches of Demosthenes and Apollodorus

Vanessa B. Gorman; Robert J. Gorman

doi:10.1017/S0075426924000302


Published online by Cambridge University Press: 21 November 2024


Vanessa B. Gorman* (University of Nebraska-Lincoln)
Robert J. Gorman (University of Nebraska-Lincoln)
*Corresponding author: Vanessa B. Gorman


Abstract

Questions about authorship have plagued the corpus of Demosthenic orations since antiquity. In particular, scholars often assign certain speeches (usually 46, 49, 50, 52, 53 and 59; sometimes also 47 and 51) to Apollodorus, son of Pasion. We apply an innovative approach to the problem, using morphosyntactic information from dependency treebanks. From the treebank annotation we create input data for various well-established computational approaches to authorship attribution. The usefulness of the input data is first tested with clustering algorithms. We then make finer distinctions with a logistic regression classifier. All steps are explained in detail for the benefit of those unfamiliar with computational stylometry. In broadest terms, our results are remarkably consistent with the common opinion about the orations, identifying 49, 50, 52 and 53 as written by a single author, who was not Demosthenes (presumably Apollodorus). We also discuss syntactic traits that are peculiarly ‘Apollodoran’ or ‘Demosthenic’. However, we demonstrate that the data point away from both authors for Dem. 46 and 51, while conclusions about 47 and 59 are ambiguous.

Keywords

Demosthenes; stylometry; author attribution; dependency syntax; Apollodorus

Type: Research Article
Information: The Journal of Hellenic Studies, Volume 144, November 2024, pp. 65-92

DOI: https://doi.org/10.1017/S0075426924000302
Creative Commons: This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright: © The Author(s), 2024. Published by Cambridge University Press on behalf of the Society for the Promotion of Hellenic Studies

I. Introduction

Since antiquity, the collection of speeches associated with the name Demosthenes has been plagued by questions of authorship. While critics have long since reached consensus for many works in the Corpus Demosthenicum, other orations are still the subject of doubt and dispute.Footnote ¹ In particular, a number of speeches are often assigned to Apollodorus, son of Pasion. Recently developed stylometric techniques allow us to approach the same attribution question in ways that employ the big data sets of corpus linguistics in order to assign a statistical probability of authorship. This new approach confirms the common opinion for the most part, but also demonstrates that several speeches usually assigned to either Demosthenes or Apollodorus instead have an outlier status.

In his seminal work, Jeremy Trevett examines the speeches delivered by Apollodorus (45, 46, 49, 50, 52, 53 and 59) and others that have variously been attributed to him (47 and 51), and reviews the scholarship pertaining to them.Footnote ² Drawing conclusions on stylistic grounds, as well as from a consideration of the ancient testimonia, Trevett deduces that ‘Demosthenes almost certainly wrote 45’, and that ‘46, 49, 50, 52, 53, and 59 are not the work of Demosthenes, and were probably written by the same man’, presumably Apollodorus. Finally, he asserts that 51 is genuinely Demosthenic, while the authorship of 47 is problematic.Footnote ³

Computational stylometry came early to the study of Demosthenic authorship. Already in the 1970s, Donald McCabe used computers to calculate distributions of certain metrical patterns, hiatus, etc. among the works of the corpus.Footnote ⁴ That study was a Herculean task since the author had to acquire a copy of the texts digitized on magnetic tape and write a series of programs in the BASIC computer language. Fortunately, the recent arrival of inexpensive computational power and the associated increase in research in stylometric authorship attribution across many academic disciplines have made the tools and methods of this field of study widely available.

In the current scholarship of authorship attribution, most analyses are based on the authorial lexicon. The vocabulary of a text or author is scrutinized to reveal patterns that may be presumed to transcend the local circumstances (topic, audience/readership, genre, etc.) in order to serve as a reliable authorial signature. Scholars recognize that content words (for example, most nouns and verbs) are highly dependent upon local context. Therefore, most investigations concentrate on a limited stratum of vocabulary, the most frequent words. Underlying this procedure is the fact that for any language there are relatively few content words among the several hundred most frequently occurring items of vocabulary. Instead, the top frequency ranks are populated by function words (prepositions, conjunctions, auxiliaries, articles and the like). Patterns in function words are taken to be a reasonable proxy for the peculiarities of an author’s syntax.

Studies based on a direct analysis of syntax are rare. Hand annotation by experts is costly and time-consuming, while the results of algorithmic, or automated, annotation are considered too noisy to be useful.Footnote ⁵ Students of Demosthenes have a rare advantage in this connection: syntactic data are available for parts of the corpus. Vanessa Gorman has manually annotated a substantial selection of Greek prose and the results are publicly available for download.Footnote ⁶ At the time of the current investigation, the Corpus Demosthenicum is represented in Gorman’s dependency treebank by 21 works totalling more than 95,000 words.

Dependency grammar is a mode of syntactic analysis that has become increasingly widespread in recent times. In essence, it differs from the more familiar (at least in the anglophone world) phrase structure grammar in at least two important ways. First, dependency grammar makes the verb the root structure of a clause, while phrase structure grammar portrays a basic binary structure (subject/predicate). Next, dependency grammar posits a direct relationship between the words of a clause and constitutes the syntactic structure from this relationship. In contrast, phrase structure grammar assumes the existence of intermediate elements, such as the noun phrase, for which a syntactic analysis must account.Footnote ⁷ These characteristics, and a resultant tolerance for free word order and discontinuous constituents, make dependency grammar especially suitable for analysis of languages such as ancient Greek.Footnote ⁸ Thus, a variety of dependency analysis was chosen as the annotation scheme by the Ancient Greek and Latin Dependency Treebank (AGLDT) hosted by the Perseus Digital Library.Footnote ⁹ This lead has been followed by many others, including Gorman in her extensive work.

An example may be the best way to introduce the details of AGLDT annotation. Fig. 1 shows a graphical representation of a sentence from Demosthenes’s Against Conon (54.3):

Fig. 1. Sample tree of Demosthenes 54.3. ἐσκήνωσαν οὖν οἱ υἱϵῖς οἱ Κόνωνος τουτουὶ ἐγγὺς ἡμῶν, ὡς οὐκ ἂν ἐβουλόμην (‘Thus, the sons of that Conon made camp near us, as I would not have wished’).Footnote ¹⁰

Here we see the hallmark of dependency grammar: each word in the tree except the main verb is dependent directly on another word, but only one word. The lines connecting words bear a label indicating the syntactic relationship that holds between the two relevant words. These labels are drawn from a standard inventory set forth in the AGLDT Guidelines. For example, AuxY indicates a sentence adverbial, AuxC a subordinating conjunction, OCOMP a complementary object and ATR an attribute of a substantive. Invisible in this view is the fact that each word has also been annotated with lemma and morphological information: gender, number, case for nominals, etc.

The graphical form (the syntax tree) in which this information is presented here illustrates the crucial advantage offered by texts that are annotated according to a well-designed protocol such as the AGLDT: they are machine actionable. This quality not only allows for the production of useful visualizations such as trees, but also ensures that the annotation is available to serve as input for the wide variety of computer applications now available for language analysis.

II. Feature preparation

Considered from the simplest perspective, stylometric authorship studies are based on counting phenomena in the relevant texts. The particular textual phenomena whose counts are used for the computations that may produce an attribution are commonly called (input) features or (input) variables. The choice among types of input features (for example, most frequent words, character n-grams, hapax legomena) is a crucial step and is often used to classify attribution studies themselves in bibliographical reviews.Footnote ¹¹ As noted, most authorship attribution experiments use features based on vocabulary, and those rare attempts to incorporate syntactic features have often proven disappointing.Footnote ¹² It is likely that these poor results are often due to the use of shallow syntactic features. Generally, the relevant studies limit themselves to features based on part-of-speech (POS). Thus, attribution is calculated according to the relative frequency of nouns, verbs, etc. Some more sophisticated feature sets include POS n-grams and some morphological information,Footnote ¹³ but these data, at best, only indirectly reflect the syntactic structures of the texts in question.

In contrast, recently published work has shown that using dependency treebanks to generate deeper syntactic features can result in very successful attribution experiments.Footnote ¹⁴ The present investigation will incorporate input features designed on the same lines.

Treebanks annotated according to the AGLDT protocol offer a range of morphosyntactic information. Each word is given a morphological annotation: gender, number, case and possibly degree of comparison for nominals; person, number, tense, mood and voice for finite verbs, etc. Along with the POS itself, these data do give some information about syntactic structure and certainly allow a better representation of that grammatical level than does a study based on frequencies of function words.Footnote ¹⁵ However, each word in the treebank contains three data points that transform the morphological annotation into a deep structural nexus. First, and most obviously, each word is explicitly tagged with its syntactic function in relation to its parent word. In the sentence given in fig. 1, the annotation tells us that Κόνωνος is a masculine, singular, genitive noun and that it functions as an attribute. And, crucially, we know what word Κόνωνος is an attribute of, for the second key syntactic data point is the index of the parent word. Thus, we see that Κόνωνος is an attribute of υἱϵῖς which in turn is the subject of ἐσκήνωσαν, which is the main verb. More importantly for authorship attribution, if we abstract away from specific vocabulary, we have a MASC SG GEN noun attribute of a MASC PL NOM noun subject of an AOR IND ACT verb. The final important structural datum is the index for the position of each word in the linear order of the sentence.Footnote ¹⁶
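The three data points just described (syntactic function, index of the parent word and linear position) can be illustrated with a minimal Python sketch. The `Token` class and the helper names are hypothetical and do not reproduce the AGLDT file format; only the three words discussed above are encoded, with the values stated in the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Token:
    index: int            # position in the linear order of the sentence (1-based)
    form: str
    head: Optional[int]   # index of the dependency parent; None for the main verb
    relation: str         # dependency label
    morph: dict           # morphological category -> value


# Only the three words discussed in the text; the rest of Dem. 54.3 is omitted.
sentence = {
    1: Token(1, "ἐσκήνωσαν", None, "PRED",
             {"tense": "AOR", "mood": "IND", "voice": "ACT"}),
    4: Token(4, "υἱϵῖς", 1, "SBJ",
             {"gender": "MASC", "number": "PL", "case": "NOM"}),
    6: Token(6, "Κόνωνος", 4, "ATR",
             {"gender": "MASC", "number": "SG", "case": "GEN"}),
}


def chain_to_root(tokens, index):
    """Follow parent links from a word up to the main verb."""
    chain = []
    current = index
    while current is not None:
        token = tokens[current]
        chain.append(token)
        current = token.head
    return chain


def abstract(token):
    """Describe a token by morphology and relation, ignoring its vocabulary."""
    keys = ("gender", "number", "case", "tense", "mood", "voice")
    values = [token.morph[k] for k in keys if k in token.morph]
    return " ".join(values + [token.relation])


description = " <- ".join(abstract(t) for t in chain_to_root(sentence, 6))
print(description)
```

Read from left to right, the printed chain climbs from Κόνωνος through υἱϵῖς to the main verb, which is the vocabulary-free description given above.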

Taken together, the morphological annotation and the linear and hierarchical ordering constitute an extremely rich database of the deeper syntactic characteristics of a Greek text. However, the extent to which this information can be brought to bear on questions of authorship depends in large measure on the specific design of the input features based on this data.

As we have seen, the AGLDT schema annotates nine morphological categories plus the dependency relationship.Footnote ¹⁷ These data are the building blocks of our input features. For example, from the annotation for Κόνωνος and its parent υἱϵῖς, we may derive many different input features.Footnote ¹⁸ Here is a sample of possibilities:

• noun
• dependency on a noun
• singular noun
• noun dependent on noun
• genitive dependent on subject

The process of combining morphosyntactic features along a hierarchical dependency axis can lead to an enormous set of input features, many of great complexity, so setting a limit is necessary. Because the presence of annotation giving the dependency parent of every word allows the transformation of shallow morphological data into deeper structural information, it might seem optimal to include such information in the greatest depth possible. Here depth is meant as a measure of the number of dependency generations involved in an input feature. Thus, the word sequence ἐσκήνωσαν … υἱϵῖς … Κόνωνος from Dem. 54.3 has a depth of three. However, including features of greater hierarchical depth inevitably leads to a sharp increase in data sparsity. Sparsity is a technical term referring to the non-occurrence of input features in a text. For example, an attribution attempt might take the entire Greek lemmatized vocabulary as its feature set. In such circumstances, most texts in a corpus, no matter how large, would show a count of zero for the large majority of the input features. In other words, the resultant data would be extremely sparse. This is a deleterious outcome and should be avoided, since accuracy in author attribution tends to decrease with an increase in sparsity. Indeed, extra precautions against sparsity must be taken when a hierarchical relationship such as syntax is included in the input features, as the following examples will show.

It is easy to see the connection between depth and sparsity. Once more, take as an example the dependency sequence ἐσκήνωσαν … υἱϵῖς … Κόνωνος. A simple search of Against Conon reveals that there are 594 words annotated as ATR out of a total of 3,283 for the speech (= 18 per cent). We can expand the depth of our query by including the dependency relationship of the parent word of our target (υἱϵῖς … Κόνωνος = ATR dependent on SBJ). There are 97 instances of this hierarchical sequence in the speech (≈ 3 per cent). If we increase the depth by another increment to include the dependency relationship of the syntactic grandparent of Κόνωνος (the sequence ATR-SBJ-PRED),Footnote ¹⁹ the query finds 25 occurrences (≈ 0.8 per cent).

The generational effect on sparsity increases as the input variables become more complex. For example, considered from the perspective of a combination of case and number, Κόνωνος is one of 199 words annotated as GEN SG in the speech (= 6 per cent). The sequence GEN SG dependent on NOM PL (υἱϵῖς … Κόνωνος) is reported only five times (= 0.15 per cent). It will therefore come as no surprise that a three-generation sequence of GEN SG dependent on NOM PL dependent on AOR occurs just the once in Against Conon, precisely in our example sentence. To avoid sparsity, therefore, we have limited input features in this study to data drawn from only two generations: features are constructed using annotation from each target word and the dependency parent of the target word, if such a parent exists.
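The percentages quoted in the last two paragraphs can be recomputed from the reported counts. The short sketch below does only that arithmetic; the dictionary keys are informal labels chosen for illustration.

```python
TOTAL_WORDS = 3283  # words in Against Conon, as reported above

# Counts reported in the text for features of increasing depth.
counts = {
    "ATR": 594,
    "ATR <- SBJ": 97,
    "ATR <- SBJ <- PRED": 25,
    "GEN SG": 199,
    "GEN SG <- NOM PL": 5,
    "GEN SG <- NOM PL <- AOR": 1,
}


def per_cent(count, total=TOTAL_WORDS):
    """Share of the words of the speech, in per cent."""
    return 100 * count / total


shares = {name: per_cent(n) for name, n in counts.items()}
for name, share in shares.items():
    print(f"{name:<26}{share:6.2f} per cent")
```

Each additional generation cuts the share by roughly an order of magnitude, which is the motivation for stopping at two generations.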

Similarly, a balance must be struck with respect to the number of morphosyntactic categories that may be included in a single input feature. The few previous studies that have tested such features have shown a tendency to group morphological categories into compound features. For example, in one study the Portuguese verb é (‘is’) generates a ‘flexion’ feature with the value PR 3S IND VFIN, while the indefinite article um is assigned the flexion feature M S (masc sg).Footnote ²⁰ That is, instead of treating the different morphological values as separate variables, the researchers group them as one, creating additional and unnecessary sparsity. The frequency of these compound features undoubtedly contains some information about the style of a given author, but it may at the same time obscure other more informative data. It is impossible to know in advance whether, for example, the frequency distributions of tense or person are more useful for an authorship attribution than is the compound feature. In other words, it is by no means certain which of the following facts carries more information about authorial style in the Against Conon: it contains 88 words that are MASC SG GEN, 173 MASC GEN words, 199 SG GEN words, 1,166 SG words and so forth. For this reason, in addition to simplex features (containing only one category of annotation), we construct compound features including all feasible combinations of categories.

Feasibility is always a serious consideration when dealing with combinations. As we have seen, the AGLDT annotation produces nine categories of morphosyntactic data, as well as the dependency relationship, that may be associated with any word. Since our features will contain data for each target word and its parent word, there are 20 categories from which to construct combinations. Accordingly, there are 2²⁰ possible combinations: over a million in all. Leaving aside the enormous sparsity that a million types of input features introduce, the resultant computations are practically impossible. Thus, we limit the study to simplex features and to those made from combinations of two or three categories of data. Even under this constraint, there remain over 1,300 combinations under consideration.
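Both figures in this paragraph follow from elementary combinatorics, as a brief check shows. The variable names are illustrative only.

```python
from math import comb

CATEGORIES_PER_WORD = 10             # nine morphological categories plus the dependency relation
N = 2 * CATEGORIES_PER_WORD          # target word and parent word together

all_subsets = 2 ** N                 # every possible selection of categories
limited = sum(comb(N, k) for k in (1, 2, 3))  # simplex, two-way and three-way combinations

print(all_subsets)  # 1048576
print(limited)      # 1350 = 20 + 190 + 1140
```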

A few specific examples may illustrate this limitation on the input features. Once more, we will use the annotation for ἐσκήνωσαν … υἱϵῖς … Κόνωνος. An input feature containing values for the gender, number and case of Κόνωνος is allowed, but if we include the POS value, the annotation for either gender, number or case must be deleted, since we have limited the variables to sets of no more than three elements. As a result, the information from the four morphological categories, POS, gender, number and case, is distributed among a series of input variables: gender, number and case; POS, gender and number; POS, gender and case; etc. In a similar fashion, the greater morphological complexity of verbs means that information about a verb may be more sharply truncated in any single input feature. Thus, ἐσκήνωσαν may be represented by a variable with values for tense, voice and mood, another variable for tense, voice and person, etc. Combining annotation for a word and its dependency parent naturally limits the contribution made by each word to the input feature. A variable may include the gender, number and case of υἱϵῖς, but if the tense and mood of the parent ἐσκήνωσαν are included, the result is a series of variables such as gender of υἱϵῖς, tense of parent and mood of parent; number of υἱϵῖς, tense of parent and mood of parent, etc. From this discussion, we can see that the limits of what is computationally practical entail that much of the morphosyntactic information from the texts is encoded indirectly.Footnote ²¹
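One way to picture how the annotation of a word and its parent is spread over many small variables is to enumerate the combinations directly. The sketch below does so for υἱϵῖς and its parent ἐσκήνωσαν. It is an illustration under stated assumptions, not the procedure used in the study: the `self-`/`parent-` prefixes and the function name are hypothetical, and only the categories named in the text are filled in.

```python
from itertools import combinations

# Annotation for υἱϵῖς and its parent ἐσκήνωσαν (categories named in the text only).
annotation = {
    "self-pos": "noun",
    "self-gender": "MASC",
    "self-number": "PL",
    "self-case": "NOM",
    "self-relation": "SBJ",
    "parent-pos": "verb",
    "parent-tense": "AOR",
    "parent-mood": "IND",
    "parent-voice": "ACT",
    "parent-relation": "PRED",
}


def type_value_pairs(annotation, max_size=3):
    """Return every combination of one to max_size categories with its values."""
    pairs = []
    for size in range(1, max_size + 1):
        for categories in combinations(annotation, size):
            values = tuple(annotation[c] for c in categories)
            pairs.append((categories, values))
    return pairs


pairs = type_value_pairs(annotation)
print(len(pairs))  # 175 = 10 + 45 + 120
```

The variable described above as gender of υἱϵῖς, tense of parent and mood of parent appears in this list as `("self-gender", "parent-tense", "parent-mood")` with the value `("MASC", "AOR", "IND")`.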

Alongside the morphosyntactic categories discussed so far, we have also included a small additional group in our feature set. These additions reflect information about the linear order of words in our texts. The first of these linear features considers dependency distance (DD). This DD is nothing more than the distance in the linear order of a sentence between a given word and its parent word, as measured in the number of words. Alternatively, we can think of DD as the absolute value of the index of the parent word minus the index of the target word. Thus, in our example sentence, the DD of Κόνωνος is 2 since it is word six and its parent υἱϵῖς is word four. As treebanks of many languages become widely available, research on DD is burgeoning. It has been suggested as a proxy for sentence complexityFootnote ²² and as an explanation for aspects of word order.Footnote ²³ It is therefore reasonable to include DD among our input features on the assumption that it represents something important about authorial style.Footnote ²⁴

The second addition to our set of categories is dependency direction (DDir). This category is concerned directly with word order and is quite simple, noting whether each word comes before or after its parent word in the linear order of the sentence. Word order has long been a staple of analyses of Greek stylistics, so it naturally finds a place in a stylometric study.Footnote ²⁵
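Both linear features reduce to simple operations on word indices. In the following sketch the function names and the `before parent`/`after parent` labels are hypothetical; the indices for Κόνωνος (word six) and υἱϵῖς (word four, dependent on ἐσκήνωσαν, word one) are those of the example sentence.

```python
def dependency_distance(index, head):
    """Absolute difference between the positions of a word and its parent."""
    return abs(head - index)


def dependency_direction(index, head):
    """Whether the word precedes or follows its parent in the linear order."""
    return "before parent" if index < head else "after parent"


# Κόνωνος is word 6 and its parent υἱϵῖς is word 4.
print(dependency_distance(6, 4), dependency_direction(6, 4))
# υἱϵῖς is word 4 and its parent ἐσκήνωσαν is word 1.
print(dependency_distance(4, 1), dependency_direction(4, 1))
```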

Selecting which categories of hierarchical or linear data to use does not complete the process of creating effective input features. Each of the many combinations of one to three categories represents a data type.Footnote ²⁶ But each type can itself have many values: a type representing the gender and number of the target word can have any one of nine values since there are three possible genders and also three possible numbers, and so on for all types. Thus, another round of feature culling is in order. Again, because we cannot know in advance which combinations of types and values will be more or less significant for authorship, we fall back upon a naïve method. Only the most frequent items will be retained. These frequencies are established empirically, through an examination of the corpus. Because counting the number of occurrences for such a large number of combinations is computationally slow, a small sample corpus is used. This sample consists of 500 randomly selected words from every file in the treebank. Evaluation of the sample shows that, even for our limited set of combinations (1–3 categories), there are more than 156,000 different type-value pairs. As is to be expected from linguistic data, most of these pairs are useless for authorship attribution owing to their low frequency. More than 21 per cent occur only once in the sample while 70 per cent of the pairs have a frequency of less than 0.1 per cent. To avoid sparsity and at the same time include a wide range of variables, we set the minimum frequency for inclusion at 2.5 per cent.Footnote ²⁷ The result is a set of 3,189 input features. A few examples of type-value pairs may help readers reach a clear understanding of the input features (fig. 2).
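The culling step can be sketched as a frequency filter over the sample. The example below assumes that frequency means the share of sampled words exhibiting a given type-value pair; the toy sample and the function name are hypothetical, and the toy threshold of 0.5 stands in for the 2.5 per cent (0.025) used in the study.

```python
from collections import Counter


def frequent_pairs(words, min_share):
    """Keep type-value pairs found in at least min_share of the sampled words."""
    counts = Counter(pair for pairs in words for pair in set(pairs))
    n = len(words)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_share}


# Hypothetical toy sample: each word is listed with the type-value pairs it yields.
sample = [
    [("case", "GEN"), ("number", "SG")],
    [("case", "NOM"), ("number", "PL")],
    [("case", "GEN"), ("number", "PL")],
    [("case", "ACC"), ("number", "SG")],
]

kept = frequent_pairs(sample, min_share=0.5)
print(sorted(kept))
```

Pairs that occur in only one of the four toy words fall below the threshold and are discarded, just as rare combinations are dropped from the real feature set.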

Fig. 2. Examples of type-value pairs in Against Conon.

III. Classification

The scholarship on authorship attribution offers an ever-increasing set of methods, some of great complexity. Before attempting a full-scale attribution experiment, it is prudent to test one’s chosen input features in order to see if they carry the kind of information pertinent to the question at issue. Perhaps the simplest such test is based on measuring the distance between each pair of files in the target corpus. We will illustrate this step using a toy example (i.e. a simplified, small example) with a corpus containing only: Demosthenes For Phormio (36) and Against Pantaenetus (37), Xenophon Hellenica 1 and 2 and Polybius Histories 1.

Mathematically, distance between texts is analogous to physical distance. Thus, we take the input features of our texts and treat them as dimensions in space. In Polybius 1 the frequency of nouns is 20.7, verbs 18.3 and adjectives 9.4 (per 100 words). The respective frequencies in Xenophon Hellenica 1 are 21.6, 21.6 and 8.8. We can imagine these values as coordinates (x-, y- and z-axis) fixing the location of the two texts as points in three-dimensional space. The straight-line distance between the points can be calculated relying on Euclid:

$$\sqrt{(20.7 - 21.6)^2 + (18.3 - 21.6)^2 + (9.4 - 8.8)^2} =$$

$$\sqrt{(-0.9)^2 + (-3.3)^2 + (0.6)^2} =$$

$$\sqrt{0.81 + 10.89 + 0.36} =$$

$$\sqrt{12.06} \approx 3.47$$

The Euclidean distances between the positions established by these input features are presented in fig. 3. The same calculation can be made using any number of dimensions (i.e. input features). Although distances in these higher dimensions cannot readily be visualized (or even comprehended) geometrically, research in classification has recognized that such simple distance measures can effectively capture the similarity/dissimilarity among many phenomena.
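The three-feature calculation can be reproduced in a few lines of Python, and the same function works unchanged for any number of dimensions. The function and variable names are illustrative.

```python
from math import sqrt


def euclidean(a, b):
    """Straight-line distance between two texts described by the same features."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Frequencies per 100 words of nouns, verbs and adjectives, as given above.
polybius_1 = (20.7, 18.3, 9.4)
hellenica_1 = (21.6, 21.6, 8.8)

d = euclidean(polybius_1, hellenica_1)
print(round(d, 2))  # 3.47
```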

Fig. 3. A sample toy distance matrix.

The matrix meets our expectations in several ways. The smallest distances and therefore the greatest similarities are within the two sets of texts by a single author. Demosthenes 36 and 37 have a distance of 55.4 and the two books of Xenophon’s Hellenica have a distance of 47.8. It is also reassuring that the two parts of Xenophon’s single work are closer together than the two speeches of Demosthenes, written as they were for delivery in very different circumstances. Book 1 of Polybius, a writer well known for the idiosyncrasies of his style, is not surprisingly quite dissimilar to Demosthenes, with Histories 1 roughly three times as distant from the two speeches as the Demosthenic texts are from each other. At the same time, Polybius 1 is much closer to Xenophon than to Demosthenes, perhaps reflecting generic similarities between the historians. In sum, the morphosyntactic information contained in our input features seems to support a grouping of texts that is consistent with what we know from other sources.

Patterns within data are generally much harder to discern than those given in our toy example. Cluster analysis is a powerful tool to identify groups in complex data sets. Essentially, clusters are groups such that ‘the objects in any group are more similar to one another than they are to objects in other groups’.Footnote ²⁸ Distance/dissimilarity measures such as those in fig. 3 are the most common basis for cluster analysis. The results of cluster analysis can be displayed in a dendrogram (fig. 4).

Fig. 4. Dendrogram showing clusters of the sample toy data set.

We see that the cluster analysis identifies two main groups. Group 1 contains the two Demosthenic orations and group 2 consists of the three texts of narrative history. Group 2 has an internal structure, with the two books of the Hellenica forming a subgroup distinct from Polybius 1. The height scale on the y-axis indicates distances in the underlying matrix. The position on the y-axis of the horizontal line joining two texts gives the distance between elements. For example, the height of the line joining Hell. 1 and Hell. 2 is about 50 (actually, 47.8 according to the matrix), and the line between Polybius 1 and the Xenophon group shows a distance of about 100.Footnote ²⁹
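How a dendrogram of this kind arises from a distance matrix can be shown with a small agglomerative clustering routine. The sketch below is an assumption-laden illustration: it uses average linkage, which the text does not specify for fig. 4, and only the distances 55.4 (Dem. 36 and 37) and 47.8 (Hell. 1 and 2) come from the matrix discussed above. All other distances are hypothetical placeholders chosen to be consistent with the description.

```python
from itertools import combinations


def average_linkage(labels, dist):
    """Agglomerative clustering; returns the merges as (cluster, cluster, height)."""
    clusters = [(label,) for label in labels]
    merges = []

    def between(c1, c2):
        # Average of all pairwise distances between the members of two clusters.
        values = [dist[frozenset((a, b))] for a in c1 for b in c2]
        return sum(values) / len(values)

    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: between(*pair))
        merges.append((c1, c2, round(between(c1, c2), 1)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


labels = ["Dem. 36", "Dem. 37", "Hell. 1", "Hell. 2", "Polyb. 1"]
raw = {
    ("Dem. 36", "Dem. 37"): 55.4,    # from the text
    ("Hell. 1", "Hell. 2"): 47.8,    # from the text
    ("Dem. 36", "Hell. 1"): 150.0,   # hypothetical placeholder
    ("Dem. 36", "Hell. 2"): 152.0,   # hypothetical placeholder
    ("Dem. 37", "Hell. 1"): 148.0,   # hypothetical placeholder
    ("Dem. 37", "Hell. 2"): 151.0,   # hypothetical placeholder
    ("Dem. 36", "Polyb. 1"): 166.0,  # hypothetical placeholder
    ("Dem. 37", "Polyb. 1"): 164.0,  # hypothetical placeholder
    ("Hell. 1", "Polyb. 1"): 99.0,   # hypothetical placeholder
    ("Hell. 2", "Polyb. 1"): 101.0,  # hypothetical placeholder
}
dist = {frozenset(pair): value for pair, value in raw.items()}

merges = average_linkage(labels, dist)
for left, right, height in merges:
    print(left, "+", right, "at height", height)
```

With these values the two books of the Hellenica join first, the two Demosthenic speeches second, and Polybius 1 then joins the Xenophon pair, mirroring the grouping described for fig. 4.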

This extremely simplified application of distance measurement and cluster analysis indicates that our input features show promise. The clusters revealed in this process correspond to the different authors involved. It remains to test the value of the data on the more difficult problem posed by the Demosthenic texts in the treebank. Caution is called for here, since the results of cluster analysis can be quite sensitive to the input features and parameters chosen.Footnote ³⁰ A useful method of dealing with this concern is to repeat the analysis many times with a wide variety of settings. The results may be visualized in consensus trees for comparison.Footnote ³¹ Figs 5 and 6 present examples of consensus clustering of the 21 Demosthenes files in the Gorman treebank. Common to all analyses is the underlying sampling process. Repeated cluster analyses were made based on different sets of input features: starting with the 100 features with the greatest values (the most frequently occurring morphosyntactic combinations), features were added to the cluster analyses two at a time in order of decreasing frequency. Thus, each consensus clustering represents 1,472 iterations. Each consensus tree differs in the type of distance measurement and clustering linkage used.

Fig. 5. Unrooted consensus tree of 21 works of Demosthenes. Distance measure = Euclidean. Clustering linkage = Average.

Fig. 6. Unrooted consensus tree of 21 works of Demosthenes. Distance measure = Würzburg. Clustering linkage = Ward.

These trees show some odd disagreements that will bear closer examination: for example, the placement of Demosthenes 46 and 51 is quite divergent. However, the analyses reproduce reasonably well the traditional view about Demosthenic and Apollodoran authorship. In both consensus trees, the grouping of 47, 49, 50, 52, 53 and 59 is sharply separated from the other speeches. Recall that Trevett identified 49, 50, 52, 53 and 59 as ‘not the work of Demosthenes’ and considered 47 to be ‘very similar’ in style to these others.Footnote ³²

These consensus trees are clear evidence that the information inherent in our input features is granular enough to contribute to the question at issue. This impression is confirmed by the last of our clustering tests, the creation of a consensus network. In this method, each text’s nearest neighbour is identified in a large number of iterations.Footnote ³³ The results are then combined to create a network of nodes joined by heavier or lighter edges according to the extent to which the nodes were found to be most similar to each other. Visualizations of the results are presented in figs 7 and 8.
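The idea behind such a network can be sketched in simplified form: repeat a nearest-neighbour search over growing feature sets and count how often each pair of texts is linked. The code below is a hypothetical simplification, not the procedure cited in the footnote; the toy vectors, the text labels and the function name are invented for illustration, and Euclidean distance is assumed.

```python
from collections import Counter
from math import dist


def consensus_edges(vectors, start, step):
    """Count how often two texts are nearest neighbours as features are added."""
    n_features = len(next(iter(vectors.values())))
    edges = Counter()
    for k in range(start, n_features + 1, step):
        for name, vec in vectors.items():
            # Nearest neighbour of this text using only the first k features.
            nearest = min(
                (other for other in vectors if other != name),
                key=lambda other: dist(vec[:k], vectors[other][:k]),
            )
            edges[frozenset((name, nearest))] += 1
    return edges


# Hypothetical feature frequencies, ordered from most to least frequent feature.
vectors = {
    "A1": [10, 8, 5, 3],
    "A2": [11, 8, 5, 2],
    "B1": [20, 3, 9, 7],
    "B2": [19, 4, 9, 8],
}

edges = consensus_edges(vectors, start=2, step=1)
for pair, weight in edges.items():
    print(sorted(pair), weight)
```

The resulting counts play the role of edge thickness: pairs that are nearest neighbours in every iteration receive the heaviest edges, and pairs that are never nearest neighbours receive none.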

Fig. 7. Consensus network of all 21 Demosthenic works. The thickness of edges between nodes represents the frequency at which node pairs were identified as nearest neighbours.

Fig. 8. Consensus network of 13 works in two clusters.

Once again, the structure is parallel to the hypotheses that certain speeches were not written by Demosthenes but by some other author, probably Apollodorus. Fig. 7 represents all 21 Demosthenic works in the treebank.

The odd behaviour of Against Stephanus 2 (46) and On the Trierarchic Crown (51) is again apparent, with both belonging to a cluster that otherwise includes only public speeches. Nonetheless, the most interesting structure identified by the algorithm remains two large clusters, one located on the left of the diagram, the other at bottom centre. Fig. 8 reduces clutter by showing only those texts belonging to the two clusters.

Just as in the consensus trees (figs 5 and 6), the texts commonly associated with Apollodorus are grouped together, showing strong connections among themselves and practically no edges linking them to texts in the second group. This second group consists of speeches where traditional evidence strongly points to Demosthenes as author.

By now it is clear that the morphosyntactic input features as we have constructed them from the treebank annotations may offer a viable avenue for investigating the authorship of the speeches in question. The next step in our study is to look beyond clusters to the characteristics of individual speeches. To do so, we will move from a method based on simple distance measures to a machine learning algorithm. Specifically, we will use logistic regression classification to refine our understanding of the relationships among the speeches. In addition, we will focus not only on the patterns of classification, but also the reasons underlying them. In other words, we will examine the input features identified as most or least Demosthenic or Apollodoran.

A logistic regression classifier assigns a probability that data, such as a set of input features, are associated with a particular class, such as works by a certain author. Unlike the clustering algorithms that we have relied upon up to this point, logistic regression classifiers are supervised. This term indicates that the user trains the classifier by providing a set of data (the training set). The training set must be labelled with the relevant class information. Thus, in an author attribution experiment, each element of the training set would be labelled with the author. Since it is a machine learning method, the logistic regression classifier learns to identify the various classes in the training set. More specifically, it learns discriminatively, in that it calculates which input features are more or less valuable to distinguish each class in the training set from all other classes.Footnote ³⁴
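For readers unfamiliar with the method, the following minimal sketch shows the mechanics of a two-class logistic regression trained by gradient descent. It is a teaching illustration under stated assumptions rather than the classifier used in this study: the data are hypothetical (two standardized feature frequencies per text), the labels 1 and 0 stand for two arbitrary authors, and the learning rate and number of epochs are arbitrary choices.

```python
from math import exp


def sigmoid(z):
    return 1 / (1 + exp(-z))


def predict_proba(x, w, b):
    """Probability that a text with features x belongs to class 1."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(xi, w, b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Hypothetical labelled training set: 1 = author A, 0 = author B.
X = [[1.0, -0.5], [0.8, -0.7], [1.2, -0.4], [-1.0, 0.6], [-0.9, 0.4]]
y = [1, 1, 1, 0, 0]

w, b = train(X, y)
p_like_a = predict_proba([0.9, -0.6], w, b)
p_like_b = predict_proba([-1.1, 0.5], w, b)
print(w, b, p_like_a, p_like_b)
```

The learned weights indicate which features pull a text towards one class or the other: a positive weight marks a feature whose higher frequency favours class 1, a negative weight one that favours class 0.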

Because the first step in setting up the classifier is to provide an accurately labelled training set, we immediately face a problem. The usefulness of any result is dependent on the quality of the training set we construct. In particular, we must decide which texts will serve as our model for the style of Demosthenes and which will represent Apollodorus. To reach this decision we have relied upon the opinions of traditional scholarship as well as a computational method. A recently developed clustering algorithm, ‘affinity propagation clustering’, both separates data into groups and identifies an exemplar for each cluster.Footnote ³⁵ The exemplar of a cluster is the sample (i.e. the text) whose features are most representative of the cluster as a whole. For our data set, the algorithm produces a clustering with two major branches quite similar to the patterns in figs 5–9. In addition, it identifies Against Polycles (Dem. 50) as the exemplar of what we may call the Apollodoran cluster. Scholarly consensus assigns Dem. 50 to Apollodorus’ authorship, and Trevett has identified it as the most dissimilar to genuine Demosthenes among those speeches delivered by Apollodorus.Footnote ³⁶ Thus, we assign Against Polycles to our Apollodoran training set. In addition, because we may assume a certain stylometric variation among the texts in our corpus, according to the demands of the occasion and other factors, it will be prudent to include an additional text in this set. Against Callippus (Dem. 52) is also securely Apollodoran since that orator delivered it in 369/8 BCE, too early for Demosthenes (born 384 BCE) to be the author. Accordingly, Dem. 50 and 52 constitute the material on which the Apollodoran model will be based.

As for genuine Demosthenes, the affinity propagation clustering algorithm identifies Against Stephanus 1 (Dem. 45) as the Demosthenic exemplar. Traditionally, Dem. 45 is accepted as genuine.Footnote ³⁷ It is therefore the first text in the Demosthenes training set. As the second text, the genuineness of For Phormion (Dem. 36) is supported algorithmically as well as by contemporary testimony and scholarly opinion. Finally, we include Against Pantaenetus (Dem. 37) as a third genuinely Demosthenic speech. This addition is meant to give our classifier a slight bias in favour of Demosthenes. We prefer a classifier that is conservative by being slightly more resistant to producing false negatives. This bias should make more plausible any results rejecting Demosthenic authorship.

Logistic regression has been shown to be quite reliable with ancient Greek morphosyntactic input.Footnote ³⁸ To validate the classifier used in the present study, we ran a preliminary classification test using all texts in the treebank. To increase difficulty, short texts of 250 randomly selected words were created from all files.Footnote ³⁹ Classification was repeated 20 times. The average accuracy in this 20-class test was 89.2 per cent (median = 89.5 per cent, interquartile range = 81.6 per cent to 95.4 per cent). It is reasonable to expect high accuracy when the input is a full text.
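The construction of short test texts can be sketched as follows. This is only an illustration under stated assumptions: the function name `sample_short_text`, the stand-in word list and the choice to sample without replacement are hypothetical, since the text above specifies only that 250 randomly selected words were drawn from each file and that classification was repeated 20 times.

```python
import random


def sample_short_text(words, size=250, seed=None):
    """Return `size` words drawn at random, without replacement, from one text."""
    rng = random.Random(seed)
    return rng.sample(words, size)


# Stand-in for the annotated words of one treebank file (hypothetical data).
full_text = [f"word_{i}" for i in range(3000)]

# Twenty repetitions, each producing a fresh 250-word sample for testing.
samples = [sample_short_text(full_text, seed=run) for run in range(20)]
print(len(samples), len(samples[0]))
```

In a real experiment each sampled word would carry its morphosyntactic annotation, from which the input feature frequencies are then computed.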

In classifying the works in our test set, we follow what is known as the impostor method.Footnote ⁴⁰ In essence, texts of uncertain authorship are tested against a slate of works of known authorship, including works by those considered the best candidates for authorship of the target works. In the manner of a police line-up, if the classifier chooses the wrong candidate, this is strong evidence that the suspected author is not responsible. On the other hand, if the right candidate is chosen, the positive evidence is only as strong as the composition of the impostor set allows. Impostors that are similar to the target candidate (in genre, intended audience, etc.) allow the greatest confidence.

Our impostor set is limited by the contents of the treebank. It includes: Aeschines 1; Antiphon 1, 2 and 5; Andocides 1.1–75; Isaeus 3; Isocrates 18; and Lysias 1, 12 and 13. To these are added ‘Apollodorus’ (Dem. 50 and 52) and ‘Demosthenes’ (Dem. 36, 37 and 45) to form our training set. Only ‘Apollodorus’ and ‘Demosthenes’ are considered plausible candidates; the attribution of a test text to any other class is considered powerful negative evidence.
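The decision rule of the impostor method can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `interpret_attribution` and the string labels are hypothetical.

```python
# Hypothetical labels: the two plausible candidates named in the text.
PLAUSIBLE_CANDIDATES = {"Apollodorus", "Demosthenes"}


def interpret_attribution(predicted_class: str) -> str:
    """Interpret a classifier's top choice under the impostor method."""
    if predicted_class in PLAUSIBLE_CANDIDATES:
        # Positive evidence, only as strong as the impostor set allows.
        return f"positive evidence for {predicted_class}"
    # An impostor was chosen: evidence against both plausible candidates.
    return "negative evidence: attributed to an impostor"


print(interpret_attribution("Apollodorus"))
print(interpret_attribution("Aeschines"))
```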

Our test set includes those texts not already in the training set that are associated with Apollodorus: Dem. 46, 47, 49, 51, 53 and 59.Footnote ⁴¹ We also include the three other private orations in the treebank for which Demosthenic authorship is secure: Dem. 27, 39 and 41. These will serve as a control group, providing a basis of comparison for our results.

The logistic regression classifier used for this study is published in the LiblineaR package for R.Footnote ⁴² The classifier creates a model for each class in the training set. This model is simply a series of real numbers, the length of the series corresponding to the number of input features in the data. This series of numbers (also called the weights of a class or model) can be usefully thought of as a row in a matrix, such as a spreadsheet. The classifier calculates the probability of a target text belonging to a given class by multiplying the values of input features for that text by the weights for the relevant class; the results are then summed to produce a single real-valued number. This number is also called the inner product or the dot product of the row of input variables and the row of weights. From the dot product is calculated the probability that the text in question belongs to the class being modelled.Footnote ⁴³
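In symbols, and assuming the standard logistic (sigmoid) function that gives the method its name, the calculation just described can be written as follows. Here $x_i$ is the frequency of input feature $i$ in the target text, $w_{c,i}$ is the weight of that feature in the model for class $c$, and $n$ is the number of input features. The notation is introduced here for illustration only, and any intercept term or normalization across classes applied by the software is left out of the sketch.

$$z_c = \sum_{i=1}^{n} w_{c,i}\,x_i, \qquad P(c \mid x) \approx \frac{1}{1 + e^{-z_c}}$$

A dot product $z_c$ of zero thus corresponds to a probability of one half, while large negative values push the probability towards zero.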

We may illustrate with a highly simplified (and fanciful) example. Imagine the hypothetical scenario in which the authorship of book 3 of Thucydides is in doubt. In this imaginary situation, it is decided that a logistic classifier based on Thucydides book 1, Herodotus book 1 and Polybius book 1 can help solve the problem. Such a classifier is then trained on the frequencies of occurrence for the morphosyntactic input features in these three authors. In this toy example, we imagine only five independent variables (input features):

• var. 1: a word precedes its parent in the linear order of the sentence
• var. 2: a word is plural
• var. 3: a word has a dependency distance greater than 4
• var. 4: the parent of a word is annotated as a coordinator
• var. 5: the parent of a word is annotated as a coordinated main verb

The resultant classifier contains a model for each of the three authors, which is simply a row of positive or negative scores (weights) for each input feature considered.

Positive weights indicate that a greater frequency of the input feature is associated with a higher probability for that model, and contrariwise for negative weights. The relevant frequencies for Thucydides book 3 are: var. 1 = 0.59, var. 2 = 0.34, var. 3 = 0.24, var. 4 = 0.19, var. 5 = 0.09. The classifier applies the models by multiplying the value of the input features by the corresponding weight in each row and then summing the result:

$$\mathrm{Thuc.:}\ (0.59 \times 0.43) + (0.34 \times 0.59) + (0.24 \times 0.39) + (0.19 \times 0.32) + (0.09 \times 0.25) =$$

$$0.25 + 0.20 + 0.094 + 0.06 + 0.02 = \mathbf{0.62}$$

$$\mathrm{Hdt.:}\ (0.59 \times (-0.42)) + (0.34 \times (-0.73)) + (0.24 \times (-0.49)) + (0.19 \times (-0.27)) + (0.09 \times 0.05) =$$

$$-0.25 - 0.25 - 0.12 - 0.05 + 0.004 = \mathbf{-0.66}$$

$$\mathrm{Polyb.:}\ (0.59 \times (-0.18)) + (0.34 \times 0.05) + (0.24 \times (-0.16)) + (0.19 \times (-0.1)) + (0.09 \times (-0.32)) =$$

$$-0.11 + 0.02 - 0.04 - 0.02 - 0.03 = \mathbf{-0.18}$$

The results for ‘Thucydides’ are the highest (and only positive) value, and the classifier therefore assigns our ‘questionable’ text to that author.
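The toy calculation can be reproduced with a short script. The frequencies and weights are those appearing in the sums above; the variable and function names are chosen here for illustration only.

```python
# Feature frequencies for the 'questionable' text (Thucydides book 3), vars 1-5.
frequencies = [0.59, 0.34, 0.24, 0.19, 0.09]

# Toy model weights, one row per author, as used in the sums above.
weights = {
    "Thuc.": [0.43, 0.59, 0.39, 0.32, 0.25],
    "Hdt.": [-0.42, -0.73, -0.49, -0.27, 0.05],
    "Polyb.": [-0.18, 0.05, -0.16, -0.10, -0.32],
}


def dot_product(xs, ws):
    """Multiply each frequency by its weight and sum the results."""
    return sum(x * w for x, w in zip(xs, ws))


scores = {author: dot_product(frequencies, row) for author, row in weights.items()}
best = max(scores, key=scores.get)

for author, score in scores.items():
    print(f"{author:7s} {score:+.2f}")
print("assigned to:", best)
```

Because the script sums unrounded products, the score for ‘Thucydides’ comes out marginally higher (about 0.63) than the 0.62 obtained above from rounded terms; the ranking of the three models is unaffected.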

Despite the relative simplicity of the logistic regression classifier, as demonstrated in this toy example, the method has proved successful for many years across a range of subjects. Logistic regression remains in wide use even in the face of competition from an ever-increasing group of more complex methods. It is clear that the weights are the essential part of a logistic regression classifier and serve to separate this method from those based on simple distance measures. Crucially, the logistic classifier’s machine learning algorithm learns how to adjust the weights in order to distinguish each class from all others in the training set.Footnote ⁴⁴ As we have seen, the weights for some input features are set relatively high, indicating that higher frequencies for these features make the target class more probable. Weights for other input features are set as negative numbers of relatively large magnitude when larger values for those features make the target class less likely. Other weights are set very near zero, discounting the corresponding input features in the probability calculation. This ability to learn the best weights to discriminate the classes can make logistic regression a very effective method of classification. The details of how the classifier sets the value of each weight are the most technical part of the algorithm and would take us beyond the scope of this paper. Good explanations of this process are widely available.Footnote ⁴⁵

IV. Results and discussion

The results of the classification of our test set are presented in fig. 9.Footnote ⁴⁶

Fig. 9. Results of classification by logistic regression.

The classifier’s handling of the control group may allow us some confidence. Given the information available, the algorithm considers no one other than the author of Dem. 36, 37 and 45 (the training set) as a likely author of Dem. 27, 39 or 41. When possible, it is prudent to interpret the probabilities reported by logistic regression in light of the associated dot products.Footnote ⁴⁷ The dot product represents the model’s estimation of the likelihood that the text in question belongs to the associated class, as opposed to all other classes taken together.Footnote ⁴⁸ In the case of Dem. 27, 39 and 41, the probabilities and the dot products are consistent with each other, and this agreement indicates that the classifier’s attribution is strong.

The second clear and unequivocal result evident in fig. 9 is that logistic regression based on morphosyntax supports the general view of scholars that (with the exception of Dem. 45) the speeches delivered by Apollodorus or otherwise associated with him were not written by Demosthenes.Footnote ⁴⁹ This negative judgement is emphatic, as indicated by the low reported probabilities and the correspondingly large negative magnitudes of the dot products. The classification also offers positive results. Evidently, two speeches from the test set that were delivered by Apollodorus, Dem. 49 and 53, were written by the same person who wrote Dem. 50 and 52 in our training set. The attributions are fairly unambiguous, as is apparent from the data in columns 5 and 6.Footnote ⁵⁰

On the other hand, while previously published stylometric tests do not give evidence on the possibility of different authorship within the Apollodoran group,Footnote ⁵¹ the morphosyntactic data indicate divisions within that group. The classification results are decidedly against Apollodorus’ authorship for Against Stephanus 2 (Dem. 46), although it was delivered by that orator and is accepted as Apollodoran by Trevett. Scholars have tentatively assigned Against Evergus and Mnesibulus (Dem. 47) to Apollodorus on the grounds of style. The logistic classifier makes the same attribution, but the relatively small dot product (-0.18) does not constitute strong support for that decision. There was an ancient tradition that held that Apollodorus was the author of On the Trierarchic Crown (Dem. 51), but much modern scholarship comes down firmly in favour of Demosthenic authorship.Footnote ⁵² The evidence from the classifier is inconsistent with either view. Both dot products are unusually large negative values (‘Demosthenes’ -6.83, ‘Apollodorus’ -17.4) making the probability that Dem. 51 belongs to either training class practically zero. Finally, Against Neaera (Dem. 59) is a speech that was, for the most part, delivered by Apollodorus, and its author is identified by modern scholarship as ‘almost certainly’ that orator.Footnote ⁵³ The logistic classifier disagrees, although the small magnitude of the relevant dot product (-0.65) indicates that there is room for doubt.Footnote ⁵⁴

These results are plausible in their general outline, in that they more or less reproduce the general consensus about Demosthenic and non-Demosthenic groups of speeches. The results are intriguing since they seem to offer new information about subdivisions within the non-Demosthenic group. However, students of Demosthenes are not likely to find these attributions useful without further information about the precise bases of the classifier’s decisions. Fortunately, logistic regression provides access to the details of each of the class models used for classification. We will now turn to an examination of the morphosyntactic features that contribute most significantly to the logistic models.

The most straightforward way to understand the authorial models produced by the classifier is a careful analysis of the weights for each class. Generally speaking, the greater the magnitude of the weights (positive or negative), the more important the associated input features for attribution of the relevant class. In the analysis that follows, only a select few input features will be discussed. There are hundreds of large-magnitude weights in each model in the classifier, and a detailed investigation of a large fraction of them is far beyond the space available here. However, even in this limited analysis, it is important to understand that morphosyntactic input features such as those used in our study are extensively collinear, and necessarily so. Roughly, collinear features are statistically interdependent, and morphosyntax is rife with such relationships. For example, if a word in our data set has a tense, it will also have a voice, be a verb, etc. Collinearity complicates the analysis of weights and features and must be carefully considered for each example.Footnote ⁵⁵

We will first examine the basis of the most general level of attribution in our experiment, that between the Demosthenic and Apollodoran groups. We may conveniently do this by focusing on those input features where the model weights for ‘Demosthenes’ and for ‘Apollodorus’ differ most greatly. It may come as a surprise that by this measure the most important distinction occurs at the simplest possible level of grammar: the frequency of words annotated with the neuter gender. For this input feature (target word is neuter) the difference in weights between the two models is 3.44 (‘Demosthenes’ 2.91, ‘Apollodorus’ -0.52), making this input feature a good discriminator (to use a convenient term from the attribution literature) to distinguish ‘Demosthenes’ from ‘Apollodorus’. As expected, the weights reflect the frequencies of the relevant phenomenon in the training set. In the ‘Demosthenes’ training texts, the frequency of a neuter target word, considering only those words annotated for gender, is 0.311. In contrast, the ‘Apollodorus’ training texts show a frequency of 0.214. This is an unusually large difference for such a common grammatical category.

The input feature with the second largest difference between the ‘Demosthenes’ and the ‘Apollodorus’ weight is a good example of the collinearity of morphosyntactic data. The weight in question is for the feature ‘target word is neuter plural’, where a large positive weight for ‘Demosthenes’ corresponds to a small negative weight for ‘Apollodorus’. These weights are consistent with the frequency of neuter plural words in the training set. Among words annotated for both gender and number, the value for ‘Demosthenes’ is 0.179 and that for ‘Apollodorus’ 0.103. This difference is not surprising; given the great discrepancy in the frequency of neuter words between training texts for the two authors, we would expect many input features constructed from combinations including neuter gender to reflect this imbalance.Footnote ⁵⁶ At the same time, we cannot assume that the discrepancy of the frequency of neuter plural words is due entirely to this collinearity. If we control the data for the variable ‘neuter’ by considering only words of that gender, we find that the difference in frequency remains large. Among words annotated as neuter, the ‘Demosthenes’ training set has a plural frequency of 0.575; for the ‘Apollodorus’ texts, this value is 0.480. Thus, the relative preference of ‘Demosthenes’ for neuter plural words is a genuine element of the style of the speeches used to train the classifier and not a ‘statistical artefact’ arising from the design of the input variables.
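The reported figures can be cross-checked with simple arithmetic: dividing the frequency of neuter plural words by the frequency of neuter words approximates the share of plurals among neuters. The sketch below assumes that the two frequencies are measured over nearly the same set of words (words annotated for gender, and words annotated for both gender and number); that assumption is not stated in the text, and the variable names are illustrative.

```python
# Frequencies reported in the text for the two training sets.
reported = {
    "Demosthenes": {"neuter": 0.311, "neuter_plural": 0.179, "plural_given_neuter": 0.575},
    "Apollodorus": {"neuter": 0.214, "neuter_plural": 0.103, "plural_given_neuter": 0.480},
}

# Approximate share of plurals among neuter words: joint / marginal frequency.
# Assumes both frequencies are measured over (nearly) the same set of words.
estimates = {
    author: round(f["neuter_plural"] / f["neuter"], 3)
    for author, f in reported.items()
}

for author, estimate in estimates.items():
    print(author, estimate, "reported:", reported[author]["plural_given_neuter"])
```

Under that assumption the estimates agree closely with the controlled frequencies given above, which is what one would expect if the preference for the plural holds within the neuter gender itself.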

The two input features that we have examined so far involve only shallow grammatical information: gender and number. The next good discriminator is based on the deeper syntactic information associated with dependency annotation. This input feature is the frequency of words whose dependency parent has present tense and active voice. Considering only words annotated for the two relevant categories, the values for this combination are ‘Demosthenes’ 0.357, ‘Apollodorus’ 0.263. Again, this is quite a significant difference, considering the number of words involved. It is clear that this input feature corresponds to an interesting and important stylistic distinction.

We have looked at the three weights in the classifier that most sharply favour an attribution of ‘Demosthenes’ over ‘Apollodorus’; we will now merely touch upon three more to try to give an impression of the range of stylistic information forming the basis of the algorithm’s attributions. Two of these weights concern types of dependency relationship. The first is the frequency of words that depend on the main verb of a sentence (labelled PRED in the AGLDT). In ‘Demosthenes’ the frequency is 0.163, in ‘Apollodorus’ 0.119. Next is the frequency of words whose parent has been annotated as an attribute (ATR). The dependency annotation of the AGLDT is functional, in that it assigns syntactic labels according to the role played by a word in the clause or sentence rather than by POS, etc. Thus, ATR is essentially the dependent modifier of a noun, and this function is commonly performed by both nominal forms and verbs (a participle or conjugated verb as head of a relative clause). The relevant frequencies are ‘Demosthenes’ 0.062 and ‘Apollodorus’ 0.048. Our final example of pro-‘Demosthenes’ weights takes into consideration an aspect of word order: the frequency of dependents whose parent word precedes its own parent. The values for this input feature are ‘Demosthenes’ 0.437, ‘Apollodorus’ 0.385.

The examination of the classification models has so far focused on weights and input features that point to Demosthenic authorship. The phenomena selected by the algorithm as discriminators in favour of ‘Apollodorus’ as against ‘Demosthenes’ are also important. We will present these examples as a list (fig. 10), with a minimum of discussion.Footnote ⁵⁷

Fig. 10. Selected input features identified by the model as pro-‘Apollodoran’ discriminators.

To this point, we have explored the classifier’s models in the abstract. Model weights represent a set of generalized discriminators calculated to distinguish each class in the training set from all others. We will now move from the general to the more specific and look at a selection of input features and the effects they have on the actual attribution problem that is the subject of this paper. Figs 11 and 12 concern speeches in our experiment that are attributed to ‘Apollodorus’. Fig. 11 presents some features that are important in pointing the classifier towards ‘Apollodorus’ while fig. 12 lists features that point away from ‘Demosthenes’.Footnote ⁵⁸

Fig. 11. Selected input features favouring attribution to ‘Apollodorus’: Against Evergus and Mnesibulus (Dem. 47), Against Timotheus (Dem. 49), Against Nicostratus (Dem. 53).

Fig. 12. Selected input features against attribution to ‘Demosthenes’: Against Evergus and Mnesibulus (Dem. 47), Against Timotheus (Dem. 49), Against Nicostratus (Dem. 53).

Many of the input features in these tables are morphological (for example, ‘Is masculine’) and therefore self-evident. Others may be less clear, and we give a few selected examples.

Parent follows its own parent: our example tree from fig. 1 contains the sequence ἐσκήνωσαν οὖν οἱ υἱϵῖς οἱ Κόνωνος (‘the sons of Conon set up camp’). The noun υἱϵῖς is the parent of three words here, οἱ, οἱ and Κόνωνος. ἐσκήνωσαν, in turn, is the parent of υἱϵῖς. Because υἱϵῖς follows ἐσκήνωσαν, the three dependents of υἱϵῖς are counted as examples of ‘parent follows its own parent’.
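The counting of this feature can be illustrated with a small script. It is a sketch under stated assumptions: the tuple layout and function name are hypothetical, and the attachment of the particle οὖν to the verb is assumed for the purposes of the example; the other relations follow the description above.

```python
# Tokens of the example in linear order: (position, form, position of parent).
# A parent position of 0 marks the root. The attachment of the particle to
# the verb is an assumption made for this sketch.
tokens = [
    (1, "ἐσκήνωσαν", 0),
    (2, "οὖν", 1),
    (3, "οἱ", 4),
    (4, "υἱϵῖς", 1),
    (5, "οἱ", 4),
    (6, "Κόνωνος", 4),
]

parent_of = {position: parent for position, _, parent in tokens}


def parent_follows_own_parent(position):
    """True if the word's parent stands after the parent's own parent."""
    parent = parent_of[position]
    if parent == 0:
        return False  # the word is the root and has no parent
    grandparent = parent_of[parent]
    if grandparent == 0:
        return False  # the parent is the root and has no parent of its own
    return parent > grandparent


matches = [form for position, form, _ in tokens if parent_follows_own_parent(position)]
print(matches)
```

Only the three dependents of υἱϵῖς satisfy the condition, since υἱϵῖς (position 4) follows its own parent ἐσκήνωσαν (position 1).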

Is OBJ: the category ‘object’ is the annotation applied to the second and/or third arguments of most verbs.Footnote ⁵⁹ This single functional category includes a range of different grammatical phenomena, considered morphologically or semantically. In addition to the expected direct and indirect objects, there are many expressions such as ἐπισκηψάμϵνος ταῖς μαρτυρίαις (‘having brought an action against the testimony’, Dem. 47.1). Here μαρτυρίαις is annotated as the OBJ of the participle, which requires the dative as part of the verb ‘frame’ of ἐπισκήπτω (i.e. it is an argument of the verb) in the context of Athenian legal terminology (LSJ s.v. ἐπισκήπτω, III). It is also important to remember that dependency grammar assigns syntax labels by function rather than by word form. Thus, alongside nouns in various cases, verbal structures may also be annotated as OBJ. Infinitive clauses with this function are very common: for example, at 47.2 ἵνα μὴ … ἀποτρέπωνται διώκϵιν (‘so that they may not be deterred from prosecuting …’), διώκϵιν is an OBJ of ἀποτρέπωνται. Instances of indirect discourse are also frequently categorized as the OBJ of an appropriate verb. For example, at 47.10 ᾔδϵσαν ὅτι … ἐξϵλϵγχθήσονται ἀδικοῦντϵς (‘they knew that they would be proved guilty’), ἐξϵλϵγχθήσονται, the verb of the indirect statement, is the OBJ of ᾔδϵσαν. Less expected structures also occur, as at 47.3, πιστϵύσαντϵς οἷς ἂν οὗτοι μαρτυρήσωσιν (‘relying on whatever testimony they might give’), where οἷς … μαρτυρήσωσιν is an autonomous relative clause functioning as the second argument of πιστϵύσαντϵς. Thus, μαρτυρήσωσιν, as the verb of the relative clause, is annotated as an OBJ dependency of πιστϵύσαντϵς.

Parent is ADV: This category is more complicated than it may at first seem. One must keep in mind that ADV indicates the function ‘adverbial’ and is not restricted to the adverb as POS. It is in fact quite rare that a morphological adverb has a dependency and is associated with the ‘Parent is ADV’ input variable. For practical purposes, the dependencies of morphological adverbs are limited to negative and intensifying particles, for example οὐκ ἄλλοθϵν (‘not from elsewhere’, 47.39) and καὶ νῦν (‘even now’, 47.17). In contrast, adverbials that are morphologically nominal are regularly further specified by dependencies. This is true whether or not the ADV is the object of a preposition. A prepositional example is 47.8, ἔϕη γὰρ ἐν τῇ δίκῃ τῆς αἰκϵίας (‘for he said during the trial for assault’); here the direct dependencies of the ADV δίκῃ are τῇ and αἰκϵίας. Without a preposition is 47.76, τὸν μὲν ἄλλον χρόνον ἀνέμϵνϵν (‘he waited the rest of the time’); τὸν and ἄλλον specify χρόνον as its direct dependencies.

By far the most frequent sources of the ‘Parent is ADV’ variable are clausal structures.Footnote ⁶⁰ Part of the first sentence in Dem. 47 is sufficient to give some idea of the range of such constructions. The speaker points out the excellence of the laws allowing actions based on false testimony:

… ἵνα, ϵἴ τις μάρτυρας τὰ ψϵυδῆ μαρτυροῦντας παρασχόμϵνος … ἐξηπάτησϵν τοὺς δικαστάς, μηδὲν αὐτῷ πλέον γένηται …

… so that if anyone, by producing witnesses offering false testimony … has deceived the jurors, he may gain no advantage …

Here γένηται, the verb of the final/result clause, is naturally annotated as adverbial; the direct dependencies of γένηται are μηδὲν and πλέον. γένηται is itself modified by an adverbial clause, in this case the conditional ϵἴ … ἐξηπάτησϵν. The direct dependents of ἐξηπάτησϵν are τις, παρασχόμϵνος and δικαστάς. Even more numerous than finite verbs in adverbial clauses are adverbial uses of the participle.Footnote ⁶¹ The example here is the circumstantial participle παρασχόμϵνος. Given the fact that dependency grammar assumes that most verbs regularly have mandatory dependents (i.e. arguments) while most nominals do not, verbal structures usually predominate as the sources of many ‘Parent is X’ input variables.Footnote ⁶²

To return now to the details of figs 11 and 12, in these tables the first three numeric columns present the frequency of the input features in the target texts. The centre column contains the frequencies for the two training sets. The final three columns present the product that results from multiplying the frequencies by the associated weights. Observe that the values in the final columns are most often of lesser magnitude than the frequencies on which they are based. The algorithm constrains the size of the weights to prevent a single input feature (or a small group of features) from dominating the classification. Therefore, relatively few weights are larger than 1.0 and most are significantly smaller.

A detailed analysis of even this limited set of data is beyond our scope here. We present these tables primarily to further demystify the workings of the classifier, as well as to give a glimpse of the range of morphosyntactic information that underlies the attributions. If one examines the training frequencies given in the centre column of fig. 11, the basis of the significant positive weights that the ‘Apollodorus’ model assigns to these input features becomes clearer. For all the listed features, the frequency in the ‘Apollodorus’ training set is higher than that in the ‘Demosthenes’ training set, often distinctly so. Thus, the algorithm considers texts with higher values in these features to be more ‘Apollodoran’, at least with regard to the relevant morphosyntactic characteristics. Indeed, we can see that the frequencies listed for the target texts are generally higher than the ‘Demosthenic’ frequency and often higher even than the value for ‘Apollodorus’. Combined with the model weights, these relatively high frequencies contribute to the classifier’s score in favour of ‘Apollodoran’ attribution.

The data in fig. 12 can be understood similarly, but with an important modification. Once more, the frequencies in the centre column are generally larger for ‘Apollodorus’ than for ‘Demosthenes’. This time, however, the weights are drawn from the ‘Demosthenes’ model, and it is clear from the last three columns that these weights are negative. Thus, the greater the frequency of a feature in the target texts, the greater the negative magnitude of the corresponding score. From the perspective of the ‘Demosthenes’ model, high frequencies of such input features are evidence that the possibility of ‘Demosthenic’ authorship should be rejected.

Figs 13 and 14 present data for the three speeches in the test set that the classifier assigns neither to ‘Demosthenes’ nor to ‘Apollodorus’ but to a so-called impostor. The same input features are listed in fig. 13 as in fig. 11, since both tables concern the positive weights in the ‘Apollodorus’ model. Fig. 14 presents input features with significant negative scores according to the ‘Apollodoran’ model.

Fig. 13. Selected input features in favour of attribution to ‘Apollodorus’: Against Stephanus 2 (Dem. 46), On the Trierarchic Crown (Dem. 51), Against Neaera (Dem. 59).

Fig. 14. Selected input features against attribution to ‘Apollodorus’: Against Stephanus 2 (Dem. 46), On the Trierarchic Crown (Dem. 51), Against Neaera (Dem. 59).

The inclusion of the same input features in figs 11 and 13 allows a useful comparison. Dem. 46, 51 and 59 consistently have frequencies below those found in Dem. 47, 49 and 53. Dem. 46 has lower frequencies in 80/93 possible comparisons. The ratio for Dem. 51 is 76/93 and for Dem. 59 it is 55/93. When we compare the three files to the frequencies in the training set, we see that Dem. 46 is lower 30/31 times, Dem. 51 29/31 and Dem. 59 21/31. Since these input features are associated with significant positive weights by the ‘Apollodoran’ model, it is clear that, if these data represent a general phenomenon, the classifier will calculate a relatively low positive score for Dem. 46, 51 and 59. In fact, the relevant positive scores are Dem. 46 = 28.4, Dem. 51 = 23.1 and Dem. 59 = 31.9. By comparison, the positive scores for the texts attributed by the classifier to ‘Apollodorus’ are Dem. 47 = 32.9, Dem. 49 = 38.9 and Dem. 53 = 33.0. It is also noteworthy that the data from Dem. 51 behave differently to those from the other two speeches, showing a sharply lower positive score than any of the comparanda.

Fig. 14 tells roughly the same story. Here the data represent input features that have been assigned negative weights by the ‘Apollodorus’ model. Larger frequencies in the target texts correspond to a larger negative score and a greater probability that the relevant class should be rejected by the classifier. Thus, it is important to observe that these input features tend to be more frequent in the three speeches under examination than in the training set. For Dem. 46, 12/15 features are more frequent; for Dem. 51, the value is 11/15 and for Dem. 59, 10/15. As might be expected, this phenomenon results in a relatively large negative score for each text. Dem. 46 = -35.65, Dem. 51 = -40.57 and Dem. 59 = -32.56. These numbers may be compared to those for the texts considered ‘Apollodoran’ by the classifier: Dem. 47 = -33.07, Dem. 49 = -31.2 and Dem. 53 = -30.6. Once again, On the Trierarchic Crown (Dem. 51) differs sharply from all other texts under consideration.
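The positive and negative scores reported in the last two paragraphs can be combined to recover each text's overall dot product under the ‘Apollodorus’ model. The sketch assumes that the dot product is simply the sum of the two partial scores; the dictionary layout and names are illustrative.

```python
# (positive score, negative score) under the 'Apollodorus' model, as reported.
apollodorus_scores = {
    "Dem. 46": (28.4, -35.65),
    "Dem. 47": (32.9, -33.07),
    "Dem. 49": (38.9, -31.2),
    "Dem. 51": (23.1, -40.57),
    "Dem. 53": (33.0, -30.6),
    "Dem. 59": (31.9, -32.56),
}

# Net score: the sum of the positive and negative parts of the dot product.
net = {text: round(pos + neg, 2) for text, (pos, neg) in apollodorus_scores.items()}

for text, value in sorted(net.items(), key=lambda item: item[1]):
    print(f"{text}: {value:+.2f}")
```

Up to rounding, the sums match the dot products quoted elsewhere in this article for Dem. 46 (-7.2), Dem. 47 (-0.18), Dem. 51 (-17.4) and Dem. 59 (-0.65), and they are positive only for Dem. 49 and 53.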

The outlier status of Dem. 51 is not surprising. The scores that we have mentioned in the two previous paragraphs are calculated from the input feature frequencies and the ‘Apollodoran’ logistic model. While Trevett reports that this speech and Dem. 47 ‘have been associated with’ the Apollodoran orations, he rejects this association on stylistic grounds.Footnote ⁶³ The evidence of the morphosyntactic data used in the present study strongly supports this rejection. On the other hand, Trevett finds Dem. 51 ‘thoroughly Demosthenic in style’ and attributes it to that orator. Victor Bers, dismissing the doubts of earlier scholars, also prefers an attribution to Demosthenes, based on his view that the ‘polish and vigor of the writing seem consistent with speeches unquestionably written by Demosthenes’.Footnote ⁶⁴ Our morphosyntactic evidence is strongly at odds with this view. A brief selection of some of this evidence is presented in figs 15 and 16.

Fig. 15. Selected input features in favour of attribution to Demosthenes: On the Trierarchic Crown (Dem. 51).

Fig. 16. Selected input features against attribution to ‘Demosthenes’: On the Trierarchic Crown (Dem. 51).

Frequencies such as those in figs 15 and 16, when multiplied by the weights in the ‘Demosthenes’ model, produce scores that are inconsistent with a Demosthenic attribution. The positive ‘Demosthenes’ score for Dem. 51 is 37.49; the negative score is -44.30. These values should be understood in the context of the corresponding numbers produced from the evidence of the three speeches that we have used as a Demosthenic control group (27, 39 and 41). These values are Dem. 27 = 40.37 and -38.10, Dem. 39 = 42.61 and -38.72, and Dem. 41 = 42.63 and -38.40. Because these three speeches are generally accepted as genuine, Dem. 51’s relatively low positive score (37.49, against a mean of 41.87) and large negative score (-44.30, against a mean of -38.41) seriously undermine the plausibility of the view that On the Trierarchic Crown (Dem. 51) was written by Demosthenes.
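The comparison with the control group can be verified directly from the figures just quoted. The names below are illustrative, and the net score again assumes that the dot product is the sum of the positive and negative scores.

```python
from statistics import mean

# (positive score, negative score) under the 'Demosthenes' model, as reported.
control = {
    "Dem. 27": (40.37, -38.10),
    "Dem. 39": (42.61, -38.72),
    "Dem. 41": (42.63, -38.40),
}
dem_51 = (37.49, -44.30)

mean_positive = mean(pos for pos, _ in control.values())
mean_negative = mean(neg for _, neg in control.values())

print(f"control mean positive: {mean_positive:.2f}")
print(f"control mean negative: {mean_negative:.2f}")
print(f"Dem. 51 net score: {sum(dem_51):+.2f}")
```

Dem. 51 falls below every control speech on the positive score and beyond every control speech on the negative score, so the contrast does not depend on the means alone.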

V. Conclusion

In closing, we will summarize the results of our study in concrete terms: what traditional attributions do our data support and to what extent? Most generally, of the speeches in our test set, all of those delivered by Apollodorus (Dem. 46, 49, 53 and 59) are very unlikely to have been written by Demosthenes. For each of these texts, the dot product of the morphosyntactic frequencies and the weights of the ‘Demosthenes’ model is a negative number, sometimes of great magnitude. Large negative values correspond to extremely low probabilities that the texts belong to the model class. So far, the attributions calculated by the logistic classifier agree with the communis opinio of scholars.

On the other hand, the evidence of the morphosyntax differs from the scholarly consensus about authorship within the group of speeches that Apollodorus delivered. The classifier divides these works into three categories. First, the logistic regression model strongly supports the view that Against Timotheus (Dem. 49) and Against Nicostratus (Dem. 53) belong to the class ‘Apollodorus’. Thus, the morphosyntactic data confirm that Dem. 49 and 53 were written by the same person as Dem. 50 and 52, presumably Apollodorus. In contrast, the evidence is decidedly against the Apollodoran authorship of Against Stephanus 2 (Dem. 46). The dot product score produced by the ‘Apollodorus’ model is very low at -7.2, suggesting that the odds that Dem. 46 belongs to the same class as Dem. 50 and 52 (the training set) are worse than 1/1,000. The classifier’s attribution of Against Neaera (Dem. 59) is more ambiguous. The dot product score is negative, but of small magnitude: -0.65. This number indicates that input feature frequencies in Dem. 59 are much more similar to the training set speeches than are the frequencies in, for example, Dem. 46. All the same, the possibility that Apollodorus wrote Dem. 59 is undercut by the classifier’s attribution of this speech to an impostor, ‘Aeschines’. The fact that Dem. 59 is, with regard to important input features, closer to Aeschines’ Against Timarchus (Or. 1) than to the speeches in the ‘Apollodorus’ training set should make one very cautious about the authorship of the Against Neaera.
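The relationship between these dot products and the probabilities or odds quoted for them follows from the logistic function described in n.43. The short Python sketch below is offered only as an illustration; the function names are our own, and the intercept is assumed to be already included in the reported scores.

```python
import math


def logistic(z):
    """Map a dot product z to an unadjusted probability, 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def odds(z):
    """Odds p / (1 - p) implied by the logistic function; algebraically e^z."""
    return math.exp(z)


# 'Apollodorus' model scores reported in the text.
scores = {
    "Dem. 46": -7.2,
    "Dem. 59": -0.65,
}

for speech, z in scores.items():
    print(f"{speech}: z = {z}, probability = {logistic(z):.4f}, odds = {odds(z):.5f}")
```

A score of -7.2 yields odds below 1/1,000, whereas -0.65 corresponds to a probability of roughly one in three, which is why the two speeches are treated so differently.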

There remain the two speeches sometimes associated with Apollodorus, although not delivered by him. Against Evergus and Mnesibulus (Dem. 47) is characterized by Trevett as ‘problematic’, in that ‘its style is very similar’ to that in the other texts in the Apollodoran group.⁶⁵ Our classification experiment does little to make the situation clearer. The ‘Apollodoran’ dot product for this speech is negative, but close to zero. From the point of view of the input feature frequencies, then, the odds that Dem. 47 is a member of class ‘Apollodorus’ are roughly 50/50. The traditional arguments on the authorship of this work remain unaffected by the evidence of the present study. In contrast, the morphosyntactic evidence is emphatic in the case of On the Trierarchic Crown (Dem. 51). The dot product generated by the ‘Demosthenes’ model is negative and quite large (-6.83). Thus, against the prevailing view of scholars of the Demosthenic corpus, it seems doubtful that Dem. 51 can be genuine.

By way of conclusion, a few remarks on the limitations of the methods of our investigation are in order. It should be evident to the careful reader that attribution by logistic regression is only as reliable as the chosen training set allows. For any author, it is reasonable to assume that the countless features that constitute their style will vary significantly according to many factors: genre, subject matter, intended audience, developing/declining skill, care/inattention, etc. Optimally, we would control for such variety in what we may call externalities⁶⁶ by including in the training set examples of all possible styles of the authors whose work we seek to classify. This ideal situation is of course impossible for almost all ancient authors. For Apollodorus, the matter is worse even than usual, since the texts are few and the authorship of each is uncertain. Thus, the narrow ‘Apollodoran’ training set certainly admits the possibility that a speech such as Dem. 59 might be misclassified by the algorithm due to insufficient information about Apollodorus’ stylistic range, given some change in externalities.

Offsetting this limitation is the great advantage that the input features used in this study are based on annotations in publicly available treebanks. Suppose, for example, one suspects that the ‘un-Demosthenic’ character of the morphosyntax of Dem. 51 arises because the speech was written for delivery before the boulē rather than the courts. All the evidence used by the classifier is available for exploration, and the case can be argued on that basis. Thus, in addition to adding a new perspective from which to view the complexities of the Corpus Demosthenicum, this paper has sought to demonstrate the general value of dependency annotation and the ancient Greek treebanks.

Footnotes

¹ Trevett (2019). Dilts (2002–2009) argues against Demosthenic authorship for 25 of the orations in the corpus.

² Trevett (1992) 50–74, (2019) 419–22.

³ Trevett (1992) 73; cf. Scafuro (2011) 8–9.

⁴ McCabe (1981).

⁵ Surveys of linguistic phenomena used in stylometric investigations are Stamatatos (2009); Swain et al. (2017); Lagutina et al. (2019).

⁶ Gorman (2020). Her GitHub repository of annotated dependency trees of ancient Greek prose (https://github.com/vgorman1/Greek-Dependency-Trees) contains more than 670,000 tokens as of July 2022.

⁷ De Marneffe and Nivre (2019).

⁸ Happ (1976) 318 traces parallels for several features of a full-blown dependency syntax to a ‘Frankfurtischen Gelehrtenverein für deutsche Sprache’ at the beginning of the 19th century. Students of Greek and Latin will recognize the name of at least one member of this learned society, Raphael Kühner. Kühner’s fundamental works on the grammar of the classical languages gave rise to a ‘praktisch Dependenz-Grammatik des Griechischen und Lateinischen’ (332) that is still influential in the German tradition. The classical languages’ general suitability for dependency analysis can be seen from its use by Harm Pinkster (2015; 2021) in the monumental Oxford Latin Syntax.

⁹ Bamman and Crane (2011).

¹⁰ All translations are our own unless noted otherwise.

¹¹ Lagutina et al. (2019). An n-gram is a contiguous sequence of textual elements, where the n stands for the number of elements considered. For example, the expression ‘one of our own’ contains the bi-grams (also ‘2-grams’) # one, one of, of our, our own, own # (# indicates the beginning or end of a sentence or expression). Surprisingly, character n-grams (essentially sequences of letters rather than words) have been found to be very successful input features for authorship attribution.
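The extraction of n-grams described in this note can be sketched in a few lines of Python. The function names and the choice of whitespace tokenization are our own simplifying assumptions for illustration; they are not taken from any particular stylometric package.

```python
def word_ngrams(text, n=2, boundary="#"):
    """Return word n-grams, padding the expression with a boundary marker."""
    tokens = [boundary] + text.split() + [boundary]
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(text, n=3):
    """Return character n-grams (spaces count as characters)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


print(word_ngrams("one of our own"))
# ['# one', 'one of', 'of our', 'our own', 'own #']

print(char_ngrams("one of our own", n=3)[:4])
# ['one', 'ne ', 'e o', ' of']
```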

¹² Eder and Górski (2022) 10 report ‘significantly worse performance’ for syntax-based features.

¹³ For example, Eder and Górski (2022).

¹⁴ Gorman and Gorman (2016); Gorman (2019), (2022).

¹⁵ Attribution studies based on lexical features most often lemmatize all words. As a consequence, most information about morphology is lost.

¹⁶ As we will see, this annotation allows us to consider word order and related phenomena in the stylometric analysis.

¹⁷ POS, person, number, tense, mood, voice, gender, case and degree of comparison. Of course, no single Greek word has a value in all morphological categories.

¹⁸ Ignoring matters of linear order, the various annotations of Κόνωνος and υἱϵῖς allow for 1,023 different ways to combine these values. We discuss the implications of the so-called combinatorial explosion below. Note that, for convenience, values for a parent word may be included in the input feature of the dependent word, but values for a dependent word are not used in input features for a parent word. The rules of the AGLDT allow only one parent per word, while each parent can have many dependencies. Thus, adding to the dependent word information about the parent word is more straightforward than vice versa.

¹⁹ PRED is the label for the main verb in a sentence.

²⁰ Varela et al. (2018). Eder and Górski (2022) experiment with breaking these compound features into at least some of their component parts.

²¹ For example, we have explained that there is no input feature that will indicate that Κόνωνος is a genitive singular masculine noun. However, Κόνωνος will be associated with a series of simplex input features identifying the word as noun, as masculine, etc. The duplex and triplex input features will also contain partial information. Thus, information about all four of these morphological categories will be available indirectly to the classifier.

²² Liu (2008); Chen et al. (2021); Ferrer-i-Cancho et al. (2022).

²³ Futrell et al. (2020).

²⁴ Since DD may serve as a proxy for sentence complexity or difficulty, input features based on DD may have a particular stylistic relevance. For example, one might expect a greater frequency of high DD values in a speech delivered by its author than in a logographic text. A detailed discussion of modern research on DD as it may pertain to ancient Greek prose can be found in Vatri (2017) 137–59. We thank the anonymous reader for drawing our attention to this aspect of DD.

²⁵ Computational studies based on treebank data for DDir tend to be focused on typological questions rather than stylistic ones; for example, Chen and Gerdes (2017).

²⁶ With the addition of the DD and DDir for both target and parent word, the total number of data category combinations is 2,324.

²⁷ Because several type-value pairs had the cut-off frequency and all such were included, the actual minimum percentage is slightly lower.

²⁸ Moisl (2020) 406.

²⁹ This particular clustering algorithm calculates the distance between two clusters as the average distance between the elements of one cluster and the elements of the other. Here Polybius 1 is 83.4 from Hell. 1 and 109 from Hell. 2, so the distance between the clusters is 96.2.
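The calculation in this note can be reproduced with a minimal Python sketch of average linkage. The function name and the dictionary-based distance table are our own illustrative choices; the two distances are those given above, and we assume that Hell. 1 and Hell. 2 already form one cluster.

```python
def average_linkage(cluster_a, cluster_b, dist):
    """Mean of all pairwise distances between members of two clusters."""
    pairs = [(a, b) for a in cluster_a for b in cluster_b]
    return sum(dist[frozenset(pair)] for pair in pairs) / len(pairs)


# Pairwise distances quoted in the note; frozenset keys make lookup symmetric.
dist = {
    frozenset(("Polybius 1", "Hell. 1")): 83.4,
    frozenset(("Polybius 1", "Hell. 2")): 109.0,
}

d = average_linkage(["Polybius 1"], ["Hell. 1", "Hell. 2"], dist)
print(round(d, 1))  # 96.2
```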

³⁰ Among possible parameters are a range of distance measures (Euclidean, Manhattan, cosine, etc.) as well as different methods of calculating distances between clusters made up of multiple points (average, centroid, etc.).

³¹ A helpful procedure is outlined by Eder (2015). Eder et al. (2016) provide very useful code for the R software environment.

³² Trevett (1992) 73.

³³ Eder (2015). We used the same sampling system as for the consensus trees: 1,472 iterations of input feature sets of increasing number. Six distance matrices were produced from each iteration using different distance measurements: Euclidean, Manhattan, Burrows’s Delta, Eder’s Delta, cosine and Würzburg.

³⁴ Jurafsky and Martin (2021) offer a very good introduction to logistic regression in chapter 5.

³⁵ For details of the algorithm, see Frey and Dueck (2007). We used the implementation for R produced by Bodenhofer et al. (2011).

³⁶ Trevett (1992) 69–70.

³⁷ Trevett (1992); Dilts (2009). Dem. 45 concerns a dispute between Apollodorus and a certain Phormion. Trevett discusses the ancient accounts which accused Demosthenes, in his role of logographer, of working both for and against Phormion. Dem. 36 is the speech Demosthenes wrote (and perhaps delivered) on Phormion’s behalf. If Demosthenes composed a speech against Phormion as well, among the extant works that speech could only be Dem. 45. It is noteworthy that Trevett’s overview of stylometric characteristics (his own and previous calculations) finds Dem. 45 to be strongly similar to genuine speeches of Demosthenes.

³⁸ Gorman (2019).

³⁹ Dividing the texts is necessary for supervised classification, because many authors in the treebank are represented by only one text. Texts were partitioned by random sampling (without replacement) to populate separate training and testing sets. The classifier was trained on 90 per cent of the texts; 10 per cent were withheld for testing.

⁴⁰ Koppel and Winter (2014).

⁴¹ Trevett (1992) 50.

⁴² Helleputte (2021).

⁴³ Strictly speaking, logistic regression calculates probability from the dot product plus an additional term called the ‘bias’ or ‘intercept’. This value is converted to a probability through the logistic function: ${1 \over {1 + {e^{ - z}}}}$ where z is this value (the dot product plus the bias) and e is Euler’s Number. When z = 0, the probability is 0.5, indicating that the input features provide no effective basis for the classification. A positive value of z corresponds to a probability greater than 0.5 and a negative value to a probability lower than 0.5.

⁴⁴ In other words, the LiblineaR package classifier used the ‘one versus the rest’ (OVR) method.

⁴⁵ The best starting point is chapter 5 of Jurafsky and Martin (2021).

⁴⁶ Any probability below 0.001 has been rounded down to zero. Dealing with the various probabilities produced by a logistic regression classifier can be confusing. The algorithm produces both individual and comparative probabilities. The latter are those shown in fig. 9. They are based on the requirement that the algorithm must choose an author from the set of labels provided, even if the odds are very small that any of those labels are correct. The classifier attributes a class by picking that with the highest dot product for a target text. Comparative probabilities are derived from the dot products of all the classes and are scaled so that together they sum to 1.

⁴⁷ Although in this paper we focus on the dot product as a more transparent calculation, we also include the probability as derived from the dot product. Reporting probability is usual in stylometric classification studies. In addition, the probability helps contextualize the dot product. For example, in fig. 9, the probability that Dem. 46 belongs to the class ‘Demosthenes’ is 14 per cent, while the relevant dot product is -4.81. We would not expect such a negative dot product to correspond to a significant probability. Thus, we can assume that support for Dem. 46 is weak across all classes considered. This is in fact the case, since no model produces a positive dot product, but the values range from -9.4 (‘Antiphon’) to -3.3 (‘Andocides’).

⁴⁸ So the unadjusted probability that Dem. 41 is from the class ‘Demosthenes’ is given by ${1 \over {1 + {e^{ - 4.22}}}}$ = 0.985. Because the reported probabilities of all eight classes must, taken together, sum to 1, the algorithm here reduces the reported likelihood accordingly.
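The adjustment can be illustrated with a hedged Python sketch. We assume here that each class’s unadjusted probability is simply divided by the sum over all classes; this is our reading of the adjustment, not a statement of the package’s documented behaviour. Only the ‘Demosthenes’ dot product (4.22, for Dem. 41) is taken from the article. The other seven values and the class labels ‘other class 1’ to ‘other class 7’ are hypothetical placeholders chosen for illustration.

```python
import math


def logistic(z):
    """Unadjusted probability for a single one-versus-rest model."""
    return 1.0 / (1.0 + math.exp(-z))


def normalise(dot_products):
    """Rescale the unadjusted probabilities so that they sum to 1 (assumed method)."""
    raw = {label: logistic(z) for label, z in dot_products.items()}
    total = sum(raw.values())
    return {label: p / total for label, p in raw.items()}


# 4.22 is the reported value for Dem. 41; all other values are HYPOTHETICAL.
dot_products = {
    "Demosthenes": 4.22,
    "other class 1": -2.5,
    "other class 2": -3.0,
    "other class 3": -3.5,
    "other class 4": -4.0,
    "other class 5": -4.5,
    "other class 6": -5.0,
    "other class 7": -6.0,
}

adjusted = normalise(dot_products)
print(f"unadjusted: {logistic(4.22):.3f}")
print(f"adjusted:   {adjusted['Demosthenes']:.3f}")
print(f"sum of adjusted probabilities: {sum(adjusted.values()):.3f}")
```

Under this assumption the adjusted probability for the winning class is always somewhat lower than its unadjusted value, because every competing class contributes a small positive share.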

⁴⁹ Of course, the claims about authorship in the following paragraphs are subject to the proviso ‘if the results can be relied on’. We will discuss strengths and weaknesses of our classification experiment in the final section of this article.

⁵⁰ The logistic model for ‘Apollodorus’ as applied to Dem. 49 generates a positive dot product of unusual magnitude (7.69). This value represents an unadjusted likelihood of 0.999. However, Dem. 49 also includes input features that produce a non-trivial dot product from the ‘Isaeus’ model (-0.357, an unadjusted probability of 0.41). Considering that the ‘Isaeus’ model is based on the annotation of a single speech, it is reasonable to withhold doubts until further data become available.

⁵¹ Trevett (Reference Trevett1992) 70.

⁵² Trevett (1992) 73; Bers (2003) 39–40 summarizes the modern view: ‘The polish and vigor of the writing seem consistent with speeches unquestionably written by Demosthenes’.

⁵³ Bers (2003) 151. For a detailed overview of the evidence for Apollodoran authorship, see Kapparis (1991) 19–27.

⁵⁴ Deciding when a dot product is convincingly high or low is a rather subjective proposition. On the one hand, every dot product in a logistic regression corresponds to a precise value in the logistic probability distribution. Thus, the dot product of the ‘Apollodorus’ model as applied to Dem. 59, -0.65, corresponds to a probability of 0.343 of ‘Apollodoran’ authorship ( ${1 \over {1 + {e^{ - \left( { - 0.65} \right)}}}}$ ). It is likely that most scholars will feel that such a probability is too large to dismiss out of hand. We may also inform this theoretical interpretation with a more empirical perspective. Three texts, Dem. 27, 39 and 41, are included in our test set as controls since they are generally accepted as genuine. We see that the ‘Demosthenic’ dot products for these works fall between 2.27 and 4.22. Comparing the magnitudes of these values with Dem. 59’s ‘Apollodoran’ value of -0.65 suggests that the classifier’s rejection of ‘Apollodorus’ is substantially less definitive than the positive attributions for the Demosthenic control texts.

⁵⁵ The presence of so much collinearity is one of the principal reasons we chose to use logistic regression for classification. Logistic regression deals well with interdependent input features. For example, it divides the weight shared by a set of highly correlated features among all the features concerned. See Jurafsky and Martin (2021).

⁵⁶ We are only speaking of the creation of input features by the combination of morpheme values, etc. We do not mean to suggest that, at a psycholinguistic level, authorial style develops from the combination of smaller units into larger ones. In statistical studies such as the present article, the arrow of causation often remains unclear, and the greater frequency in ‘Demosthenes’ of neuter words in general could be the result of a stylistic preference for neuter plurals rather than the convergence of two ‘independent’ preferences, one for neuter words and another for plural words.

⁵⁷ The first frequency given in each column is calculated as the fraction per relevant word: i.e. a frequency of 0.615 masculine words means that 61.5 per cent of words annotated with gender (not adverbs, non-participle verbs, etc.) have that trait. The number in parentheses gives the absolute frequency and is calculated as a fraction of all words in the relevant texts.

⁵⁸ The input feature always describes characteristics of the target word unless the dependency parent is specified. OBJ means the dependency relationship object (OBJ_CO is used for an OBJ that is coordinated with another OBJ). This annotation includes a verb’s direct and indirect objects as well as other complements that are required by the verb’s argument frame or valency. ADV means the dependency relationship ‘adverbial’. In addition to indicating the function of adverbs per se, it may also refer to various finite clauses (temporal, causal, etc.) and participles annotated as adverbial to a matrix verb. ATR is the label for a word used as an attribute; it is generally used of dependent adjectives, genitives, etc.

⁵⁹ The second argument of copular verbs is called ‘predicate nominal’ (PNOM). For three-place verbs with an object and a complement, the third argument is called ‘object complement’ (OCOMP).

⁶⁰ For example, in Dem. 47, clauses annotated as ADV produce 401 instances of the variable ‘Parent is ADV’. Nominal structures (both prepositional and not) produce 233 such instances.

⁶¹ Of the 401 instances of the variable ‘Parent is ADV’ associated with verbs in Dem. 47, 213 are generated by participles.

⁶² In Dem. 47, for example, the average number of direct dependencies per verb is 1.97; the average number of direct dependencies per noun is 1.125.

⁶³ Trevett (1992) 73.

⁶⁴ Bers (2003) 39.

⁶⁵ Trevett (1992) 73.

⁶⁶ By externalities we mean any factors, whether subjective or objective, other than authorship itself that may affect the stylometric variables of interest.

References

Bamman, D. and Crane, G. (2011) ‘The ancient Greek and Latin dependency treebanks’, in C. Sporleder, A. van den Bosch and K. Zervanou (eds), Language Technology for Cultural Heritage: Selected Papers from the LaTeCH Workshop Series (Berlin) 79–98

Bers, V. (2003) Demosthenes: Speeches 50–59 (Austin)

Bodenhofer, U., Kothmeier, A. and Hochreiter, S. (2011) ‘APCluster: an R package for affinity propagation clustering’, Bioinformatics 27, 2463–64 DOI: 10.1093/bioinformatics/btr406

Chen, R., Deng, S. and Liu, H. (2021) ‘Syntactic complexity of different text types: from the perspective of dependency distance both linearly and hierarchically’, Journal of Quantitative Linguistics 1–31 DOI: 10.1080/09296174.2021.2005960

Chen, X. and Gerdes, K. (2017) ‘Classifying languages by dependency structure: typologies of delexicalized universal dependency treebanks’, in S. Montemagni and J. Nivre (eds), Proceedings of the Fourth International Conference on Dependency Linguistics (Pisa) 54–63

Dilts, M. (2002–2009) Demosthenis orationes (4 vols) (Oxford)

Eder, M. (2015) ‘Visualization in stylometry: cluster analysis using networks’, Digital Scholarship in the Humanities 32.1, 50–64 DOI: 10.1093/llc/fqv061

Eder, M. and Górski, R. (2022) ‘Stylistic fingerprints, POS-tags and inflected languages: a case study in Polish’, arXiv DOI: 10.48550/ARXIV.2206.02208

Eder, M., Rybicki, J. and Kestemont, M. (2016) ‘Stylometry with R: a package for computational text analysis’, R Journal 8.1, 107–21 https://journal.r-project.org/archive/2016/RJ-2016-007/index.html

Ferrer-i-Cancho, R., Gómez-Rodríguez, C., Esteban, J.L. and Alemany-Puig, L. (2022) ‘Optimality of syntactic dependency systems’, Physical Review E 105 DOI: 10.1103/PhysRevE.105.014308

Frey, B. and Dueck, D. (2007) ‘Clustering by passing messages between data points’, Science 315, 972–77 DOI: 10.1126/science.1136800

Futrell, R., Levy, R. and Gibson, E. (2020) ‘Dependency locality as an explanatory principle for word order’, Language 96.2, 371–412

Gorman, R.J. (2019) ‘Author identification of short texts using dependency treebanks without vocabulary’, Digital Scholarship in the Humanities 35.4, 812–25 https://doi.org/10.1093/llc/fqz070

Gorman, R.J. (2022) ‘Universal dependencies and author attribution of short texts with syntax alone’, Digital Humanities Quarterly 16.2 http://www.digitalhumanities.org/dhq/vol/16/2/000606/000606.html

Gorman, R.J. and Gorman, V.B. (2016) ‘Approaching questions of text reuse in ancient Greek using computational syntactic stylometry’, Open Linguistics 2.1, 500–10 https://doi.org/10.1515/opli-2016-0026

Gorman, V.B. (2020) ‘Dependency treebanks of ancient Greek prose’, Journal of Open Humanities Data 6.1 http://doi.org/10.5334/johd.13

Happ, H. (1976) Grundfragen einer Dependenz-Grammatik des Lateinischen (Göttingen)

Helleputte, T. (2021) LiblineaR: linear predictive models based on the Liblinear C/C++ Library. R package version 2.10–12 https://rdrr.io/cran/LiblineaR/

Jurafsky, D. and Martin, J. (2021) Speech and Language Processing (3rd edition), unpublished draft of 29 December 2021 https://web.stanford.edu/~jurafsky/slp3/

Kapparis, K. (1991) Demosthenes 59, Against Neaira: Introduction and Commentary (Ph.D. Diss. Glasgow) https://theses.gla.ac.uk/78355/1/11011422.pdf

Koppel, M. and Winter, Y. (2014) ‘Determining if two documents are written by the same author’, Journal of the American Society for Information Science and Technology 65.1, 178–97

Lagutina, K., Lagutina, N., Boychuk, E., Shliakhtina, O. and Paramonov, I. (2019) ‘A survey on stylometric text features’, Proceedings of the 25th Conference of FRUCT Association, 184–95 DOI: 10.23919/FRUCT48121.2019.8981504

Liu, H. (2008) ‘Dependency distance as a metric of language comprehension difficulty’, Journal of Cognitive Science 9.2, 159–91

McCabe, D. (1981) The Prose-Rhythm of Demosthenes (New York)

de Marneffe, M.-C. and Nivre, J. (2019) ‘Dependency grammar’, Annual Review of Linguistics 5.1, 197–218

Moisl, H. (2020) ‘Cluster analysis’, in M. Paquot and S.T. Gries (eds), A Practical Handbook of Corpus Linguistics (Cham) 401–34 DOI: 10.1007/978-3-030-46216-1

Pinkster, H. (2015) The Oxford Latin Syntax, Volume 1: The Simple Clause (Oxford)

Pinkster, H. (2021) The Oxford Latin Syntax, Volume 2: The Complex Sentence and Discourse (Oxford)

Scafuro, A. (2011) Demosthenes: Speeches 39–49 (Austin)

Stamatatos, E. (2009) ‘A survey of modern authorship attribution methods’, Journal of the Association for Information Science and Technology 60.3, 538–56 DOI: 10.1002/asi.21001

Swain, S., Mishra, G. and Sindhu, C. (2017) ‘Recent approaches on authorship attribution techniques—an overview’, in International Conference of Electronics, Communication and Aerospace Technology (ICECA), Coimbatore, 557–66 DOI: 10.1109/ICECA.2017.8203599

Trevett, J. (1992) Apollodoros the Son of Pasion (Oxford)

Trevett, J. (2019) ‘Authenticity, composition, publication’, in G. Martin (ed.), The Oxford Handbook to Demosthenes (Oxford) 419–30

Varela, P.J., Albonico, M., Justino, E.J.R. and Bortolozzi, F. (2018) ‘A computational approach for authorship attribution on multiple languages’, in 2018 International Joint Conference on Neural Networks (IJCNN), Rio de Janeiro, 1–8 DOI: 10.1109/IJCNN.2018.8489704

Vatri, A. (2017) Orality and Performance in Classical Attic Prose: A Linguistic Approach (Oxford)



