Published online by Cambridge University Press: 12 November 2019
We shall first introduce the use of artificial intelligence (AI) in producing new intellectual creations, distinguishing approaches based on knowledge representation from those based on machine learning. Then we shall provide an overview of some significant applications of AI to the production of intellectual creations, distinguishing the extent to which they depend on pre-existing works and the different ways in which such works are used in the creative process. In addition, we shall examine some methods for automatically assessing the similarity of works and styles, in the context of AI technologies for text generation. Finally, we shall discuss the legal aspects of the reuse of copyrighted works by AI systems, focusing on the rights of the authors of such works relative to the process and the outputs of AI.
1 A Halevy et al, “The Unreasonable Effectiveness of Data” (2009) 24 IEEE Intelligent Systems 8.
2 See A Ramalho, “Will Robots Rule the (Artistic) World? A Proposed Model for the Legal Status of Creations by Artificial Intelligence Systems” (2017) Journal of Internet Law; J Ginsburg and L Ali Budiardjo, “Authors and Machines” (2019) 34 Berkeley Technology Law Journal.
3 M Sag, “Orphan Works As Grist for the Data Mill” (2012) 27 Berkeley Technology Law Journal 1503.
4 MA Boden, “Computer Models of Creativity” (2009) 30 AI Magazine 23.
5 P Gervás, “WASP: Evaluation of Different Strategies for the Automatic Generation of Spanish Verse” (2000) Proceedings of the AISB’00 Symposium on Creative & Cultural Aspects of AI.
6 H-Y Shum et al, “From Eliza to XiaoIce: Challenges and Opportunities with Social Chatbots” (2018) 19 Frontiers of Information Technology & Electronic Engineering 10.
7 W-F Cheng et al, “Image Inspired Poetry Generation in XiaoIce” (2018) <arxiv.org/pdf/1808.03090.pdf> accessed 26 September 2019.
8 See <www.nextrembrandt.com> accessed 26 September 2019.
9 <deepdreamgenerator.com> accessed 26 September 2019.
10 A Mordvintsev et al, “DeepDream: A Code Example for Visualizing Neural Networks” (2015) 2 Google Research 5.
11 A Elgammal et al, “CAN: Creative Adversarial Networks, Generating Art by Learning About Styles and Deviating from Style Norms” (2017) <arxiv.org/abs/1706.07068> accessed 26 September 2019.
12 See, for example, M Li and P Vitányi, An Introduction to Kolmogorov Complexity and Its Applications (Springer-Verlag 1997) and R Cilibrasi and PMB Vitányi, “Clustering by Compression” (2005) 51 IEEE Transactions on Information Theory 1523.
13 IH Witten et al, Managing Gigabytes (Morgan Kaufmann 1999).
14 MA Boden, “Creativity and Artificial Intelligence” (1999) 103(1–2) Artificial Intelligence 347.
15 E Reiter and R Dale, Building Natural Language Generation Systems (Cambridge University Press 2000) p 33.
16 Y LeCun et al, “Deep Learning” (2015) 521 Nature 436.
17 Information theory was born in 1948 with CE Shannon’s “A Mathematical Theory of Communication” (1948) 27 Bell System Technical Journal 379, which poses and solves the problem of defining the amount of information contained in a “message”, for example a text or, more generally, any sequence of symbols. For a more extensive account see JR Pierce, An Introduction to Information Theory: Symbols, Signals and Noise (Dover 1980).
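To make the notion concrete, the following minimal Python sketch (our own illustration, not drawn from the works cited here) estimates the information content of a text as its empirical Shannon entropy, computed from single-character frequencies:

```python
import math
from collections import Counter

def shannon_entropy(text: str) -> float:
    """Empirical Shannon entropy of a text, in bits per character,
    estimated from single-character frequencies."""
    counts = Counter(text)
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

print(shannon_entropy("abracadabra"))  # roughly 2.04 bits per character
```

Compression algorithms of the Lempel-Ziv family (note 18) can be read as practical estimators of this quantity for sources whose statistics are unknown.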
18 Cf, among others, A Lempel and J Ziv, “On the Complexity of Finite Sequences” (1976) 22 IEEE Transactions on Information Theory 75; J Ziv and A Lempel, “A Universal Algorithm for Sequential Data Compression” (1977) 23 IEEE Transactions on Information Theory 337; J Ziv and A Lempel, “Compression of Individual Sequences via Variable-Rate Coding” (1978) 24 IEEE Transactions on Information Theory 530; and the review paper AD Wyner, J Ziv and AJ Wyner, “On the Role of Pattern Matching in Information Theory” (1998) 44 IEEE Transactions on Information Theory 2045.
19 AD Wyner, “Typical Sequences and All That: Entropy, Pattern Matching and Data Compression” (1995) IEEE Information Theory Society Newsletter.
20 E Stamatatos, “A Survey of Modern Authorship Attribution Methods” (2009) 60 Journal of the American Society for Information Science and Technology 538.
21 J Ziv and N Merhav, “A Measure of Relative Entropy Between Individual Sequences with Application to Universal Classification” (1993) 39 IEEE Transactions on Information Theory 1270.
22 D Benedetto et al, “Language Trees and Zipping” (2002) 88 Physical Review Letters 48702.
23 C Basile et al, “An Example of Mathematical Authorship Attribution” (2008) 49 Journal of Mathematical Physics 125211.
24 Benedetto et al, supra, note 22.
25 For a detailed analysis of what happens when one file is compressed after being attached (ie appended) to another, see A Puglisi et al, “Data Compression and Learning in Time Sequences Analysis” (2003) 180 Physica D 92.
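The intuition behind this “attachment” technique can be illustrated with a minimal Python sketch (our own, using zlib rather than the compressors employed in the studies cited above): the fewer extra bytes a compressor needs for an unknown fragment appended to a reference text, the closer the two are statistically.

```python
import zlib

def compressed_size(data: bytes) -> int:
    """Size in bytes of the zlib-compressed data."""
    return len(zlib.compress(data, 9))

def attachment_cost(reference: str, fragment: str) -> float:
    """Extra compressed bytes per character needed for `fragment`
    when it is appended to `reference`; a lower cost means the
    compressor's model of `reference` predicts `fragment` well."""
    a, b = reference.encode(), fragment.encode()
    return (compressed_size(a + b) - compressed_size(a)) / len(b)

# Toy attribution: pick the reference with the lowest attachment cost.
references = {
    "author_A": "the cat sat on the mat and purred " * 40,
    "author_B": "lorem ipsum dolor sit amet consectetur " * 40,
}
fragment = "the cat sat on the mat"
print(min(references, key=lambda k: attachment_cost(references[k], fragment)))
# expected: author_A
```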
26 After a first experiment based on bigram frequencies presented in 1976 by WR Bennett, Scientific and Engineering Problem-Solving with the Computer (Prentice-Hall 1976), V Kešelj and collaborators published a paper in 2003 in which n-gram frequencies were used to define a similarity distance between texts: V Kešelj et al, “N-gram-based Author Profiles for Authorship Attribution” (2003) Proceedings of the Conference Pacific Association for Computational Linguistics, PACLING’03, pp 255–264. See also R Clement and D Sharp, “Ngram and Bayesian Classification of Documents for Topic and Authorship” (2003) 18(4) Literary and Linguistic Computing 423.
27 Basile et al, supra, note 23.
28 To be more precise, d_n is a pseudo-distance, since it does not satisfy the triangle inequality and it is not even positive definite: two texts X, Y can be at distance d_n(X,Y) = 0 without being the same, but this has basically no effect on concrete applications. Note that in the previous formula, in contrast with what happens for the Euclidean distance, each term of the sum is weighted with the inverse of the square of the sum of the frequencies of that particular n-gram. In this way rare n-grams, ie those with lower frequencies, give a larger contribution to the sum.
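A minimal Python sketch of a pseudo-distance with this shape (our reading of the description above; the normalisation details in Basile et al, supra, note 23, may differ):

```python
from collections import Counter

def ngram_freqs(text: str, n: int) -> Counter:
    """Relative frequencies of the character n-grams of a text."""
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    total = len(grams)
    return Counter({g: c / total for g, c in Counter(grams).items()})

def d_n(x: str, y: str, n: int = 4) -> float:
    """Sum of squared n-gram frequency differences, each weighted by
    the inverse square of the summed frequencies, so that rare n-grams
    contribute more; 0 for texts with identical n-gram statistics."""
    fx, fy = ngram_freqs(x, n), ngram_freqs(y, n)
    return sum(((fx[g] - fy[g]) / (fx[g] + fy[g])) ** 2
               for g in set(fx) | set(fy))

print(d_n("to be or not to be", "to be or not to be"))  # 0.0
```

As the footnote notes, d_n(X,Y) = 0 whenever X and Y have identical n-gram frequencies, even if they are different texts.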
29 M Lippi et al, “Natural Language Statistical Features of LSTM-Generated Texts” (2019) IEEE Transactions on Neural Networks and Learning Systems 12.
30 S Hochreiter and J Schmidhuber, “Long Short-Term Memory” (1997) 9 Neural Computation 1735.
31 Y Bengio et al, “Learning Long-Term Dependencies with Gradient Descent Is Difficult” (1994) 5 IEEE Transactions on Neural Networks 157.
32 Lippi et al, supra, note 29.
33 PJ Werbos, “Backpropagation Through Time: What It Does and How to Do It” (1990) 78 Proceedings of the IEEE 1550.
34 Hochreiter and Schmidhuber, supra, note 30; Bengio et al, supra, note 31.
35 Lippi et al, supra, note 29.
36 ibid.
37 For a critical discussion, see J Grimmelmann, “Copyright for Literate Robots” (2016) 101 Iowa Law Review 657.
38 PN Leval, “Toward a Fair Use Standard” (1990) 103 Harvard Law Review 1105.
39 For a liberal approach to non-expressive uses, see Sag, supra, note 3.
40 We may wonder whether the traces of the work in a trained system (eg in the data structures resulting from the training of a neural network) should also be removed, assuming that this does not require an unreasonable effort. We thank Bert-Jaap Koops for pointing out this issue.
41 L Floridi, The Fourth Revolution: How the Infosphere Is Reshaping Human Reality (Oxford University Press 2014).
42 R Kurzweil, The Singularity Is Near (Viking 2005).
43 JM Balkin, “Information Power: The Information Society from an Antihumanist Perspective” in E Katz and R Subramanian (eds), The Global Flow of Information (New York University Press 2011).
44 L Lessig, Remix: Making Art and Commerce Thrive in the Hybrid Economy (Penguin 2008).