Introduction
Organizations in both the public and private sectors hold large quantities of data, both in static form and in real-time flows. US government organizations increasingly depend on the timely use of data for evidence-based policy-making, which allows government data to be viewed as a public resource and allows the governance of data to influence public interpretations of the role of government in serving the public good. If these datasets were stored and shared more widely within and across organizations, the resulting analytics could be used to improve organizational efficiency and productivity, enable and empower the general public, and produce economic and commercial value. The governance of data is, however, subject to a tension between data sharing and the need to apply both legal and technical means to protect the privacy of individuals represented in the data, as well as the need to address questions about data as property for public and private agents (Whittington et al. 2015; Young et al. 2019). Local governments face technical and organizational barriers to governing data in the public interest, and some, such as the City of Seattle, have recently begun to piece together policies, departmental resources, and implementation strategies for the effective governance of data.
The lack of such governance structures not only prevents organizations from receiving the full benefits of their data, but also imposes costs on organizations and on the individuals represented in the data. Data sharing is often an essential first step in enabling public–private partnerships, such as those needed for government oversight of firms operating within the city under permits or as vendors. Without an established set of protocols for sharing and governing data, governments and firms incur considerable costs from repeated negotiation and legal disputes (Savage 2019). Furthermore, the lack of established governance structures for data sharing opens up an unregulated and unmonitored market for data brokers, who may collect and rejoin released datasets, re-identify data subjects, and sell the resulting artifacts for profit (Federal Trade Commission 2014). Any economic or social costs that arise from the loss of privacy in public data can be treated as externalities that, by definition, are borne by the general public (i.e., the data subjects), while the benefits are captured by private data brokers, an imbalance that also harms social equity (Savage 2019).
This chapter examines institutional design for urban data governance in the City of Seattle as a collective action problem, drawing on three prominent theoretical frameworks for studying institutional change and institutional economics. The work centers on the Governing Knowledge Commons (GKC) framework, which Frischmann, Madison, and Strandburg (2014) adapted from Elinor Ostrom's Institutional Analysis and Development (IAD) framework for natural resource commons (Ostrom 1990) to study institutional arrangements for overcoming the social dilemmas associated with sharing and producing information, innovation, and creative works. The chapter also notes the foundational integration of the IAD framework with Oliver Williamson's transaction cost economics (TCE), highlighting the role of transaction costs in understanding the externalities associated with the governance of data (Hoofnagle and Whittington 2014; Whittington and Hoofnagle 2012; Williamson 1975, 1985).
The chapter is organized in two main sections. The first provides theoretical context for understanding smart city governance in the City of Seattle, covering GKC, IAD, and TCE as presented in the privacy literature, which benefits from the concepts of privacy as contextual integrity (Nissenbaum 2004) and the taxonomy of privacy (Solove 2006). The second section applies this theoretical framework to Seattle's governance structure, as one of several case studies in the GKC series that comparatively analyze institutional change and city governance.
Theoretical Frameworks for Evaluating City Governance of Privacy
Privacy research draws on theoretical frameworks from economics (Acquisti 2014; Hoofnagle and Whittington 2014) and information economics (Choi, Jeon, and Kim 2019), as well as on theories developed within the field itself (Nissenbaum 2004; Rubinstein 2018; Solove 2006). Some recent advances examine the formation of institutions for governing privacy as part of a commons (Savage 2019), with reference to Elinor Ostrom's path-breaking work in institutional economics (Ostrom 1990).
Elinor Ostrom's research on the governance of common pool resources provided an empirical and theoretical explanation of institutional change within communities (Ostrom 1990). Problems in governing common pool resources, such as fisheries and groundwater aquifers, offer iconic representations of the tragedy of the commons, found principally in Cournot's model of noncooperation in the prisoner's dilemma (Hardin 1971). Ostrom's contributions to game theory reveal, however, the mechanisms employed to create institutions of self-governance in these settings, providing empirical evidence, more elaborate models, and grounded theories that bear on Nash's (1953) theory of cooperation. Frischmann, Madison, and Strandburg (2014) extend Ostrom's (1990) IAD framework for natural resource commons to study commons-based knowledge production, or the governance of knowledge commons.
Adapted to describe institutions governing information, the knowledge commons framework expands the concept of resource characteristics to include resources other than products of the natural world (Frischmann, Madison, and Strandburg 2014). Resource characteristics, attributes of the community, and rules-in-use (a reference to path dependence in institutional economics) are factors that influence actors in “action situations,” which give rise to patterns of interaction and feedback loops within the community. Further, within Ostrom's IAD framework (Ostrom 2011), action situations are given internal structure by recognizing persons in positions meaningful to potential outcomes, who may differ in their information about the situation or in their authority or ability to control it (Figure 2.1). The costs and benefits of outcomes also vary, both for the community as a whole and for the participants and those they may represent in the action situation. As frameworks, then, Ostrom's IAD and the GKC identify universal elements to consider in the analysis of institutions (Ostrom 2011, 8).
In general, the IAD and GKC frameworks support research on the economics of institutions, as found in transaction cost economics. Transaction cost economics, as described by Ronald Coase (1960) and operationalized by Oliver Williamson (1975), provides theoretical and empirical support for the idea that institutions and organizations within well-functioning economies are formed to economize on transaction costs. This theory and the associated body of empirical work extend to the comparative analysis of economies (North 1990), where institutions are the defining source of variation in economic performance. TCE, GKC, and IAD are complementary, as each elaborates on institutions and their economic performance. Their principal difference may be that GKC and IAD provide a framework for connecting qualitative variables in the formation of institutions to outcomes in the form of costs and benefits to participants, outcomes that TCE would describe as transactions in a more typical, well-formed market. TCE, in contrast, offers research designs for comparatively analyzing the efficiency of institutions in terms of costs, which can answer questions about the relative collective benefits of institutional changes examined, in GKC and IAD fashion, from place to place or over time.
Both the GKC and the TCE frameworks have been applied to analyze alternative governance forms for privacy. Sanfilippo, Frischmann, and Strandburg (2018) adopted the GKC framework and complemented it with Helen Nissenbaum's (2004) “privacy as contextual integrity” approach and Solove's (2006) taxonomy of privacy's diverse meanings. Through a meta-analysis of fourteen case studies using the GKC or IAD framework, the study demonstrated the usefulness of the GKC framework for systematically exploring and structuring variance among communities with respect to knowledge resources and participation, obstacles and dilemmas surrounding knowledge formation and flows, objectives of participants, and rules-in-use structuring knowledge and privacy commons. Whittington and Hoofnagle (2012; Hoofnagle and Whittington 2014) used the TCE framework and the concept of asset specificity to explain the hidden privacy cost of the exchange of consumer information for “free” online services. Asset specificity arises in data because individuals may be identified from it, making data subjects parties to transactions that use the data, or third parties subject to the external effects of those transactions. Because the exchange between consumers and online firms is not simple and discrete but a continuous transaction with atypical attributes, it is very difficult for consumers to determine the value of what they are trading. Even after an individual consumer becomes aware of the cost of privacy loss ex post, it is difficult for that consumer to switch services or withdraw their information without significant cost, owing to the high asset specificity of personal information.
Building upon existing studies using GKC and TCE, this case study of governance of information by the City of Seattle offers an application of these theories to municipal data governance for privacy, and an illustration of the potential for research design on the efficiency of institutional change for the purpose of privacy governance.
Case Study of the City of Seattle
The City of Seattle has led the nation in the adoption of privacy principles and governance structures since appointing its first Chief Privacy Officer in 2015. This appointment was followed by a surveillance ordinance and a series of system-wide evaluations of applications, technologies, and data assets, including the contents available as open data and through public records requests. The following subsections map Seattle's systems and policies of information governance to the components of the GKC framework.
This analysis covers several relevant actions as Seattle has expanded and deepened its means and methods for governing privacy in municipal data over the past five years. This brief evolution of policy and its implementation includes the governance of data and the technologies used to collect and process data by the City of Seattle to protect the privacy of city residents, while retaining the utility of data for municipal purposes. The IAD and GKC frameworks exist as a constellation of variables that surround and define these actions. This subsection first addresses so-called external variables, which are (1) the resource characteristics, (2) attributes of the community, and (3) the rules-in-use. This is followed by subsections that focus on the action arena, by identifying (4) action situations, (5) actors in positions, (6) rule configurations as they may affect the action under consideration, and (7) patterns of potential outcomes and interactions.
Resource Characteristics
Cities often face conflicting objectives in the governance of urban data. On the one hand, they are pushed to provide more public access to data to better inform decisions, facilitate research, and enhance governance transparency and accountability. On the other hand, they are obligated to protect the privacy of their residents in order to build and maintain public trust in representative government, to uphold the fiduciary responsibility of government with taxpayer funds, and to provide government services for the public good. The goal of urban data governance is to balance these two objectives by managing urban data as a public resource. This subsection discusses the characteristics of urban data and the associated common privacy challenges, noting that public trust in government is associated with effective governance of data in the public interest, while a perceived lack of trust can be interpreted as an indicator of inefficient or ineffective governance of data in the public interest. However, since the concept of trust in governance of the commons is complex and has been discussed extensively in other IAD contexts (Ostrom 2009), this chapter focuses only on the governance of urban data. Data has monetary and governmental value as property, bringing measurable costs and benefits to users and data subjects. In transactions involving personal data, safeguards for privacy are viewed as necessary to reduce the cost to the data subject that may arise from opportunism with guile on the part of any individual or organization possessing such data. In this sense, governance structures that safeguard privacy reduce ex post transaction costs over data that may be used to identify individuals and groups in society.
A single transaction represents the smallest indivisible unit of analysis (Whittington and Hoofnagle 2012) in the study of information flow (e.g., Nissenbaum 2004). A transaction is anchored in contextual integrity when the data subject is a voluntary party to the transaction and, furthermore, knows the ex post implications of the exchange. This knowledge, however, is not easy to acquire. Safeguards for privacy lower transaction costs for the data subject by making the implications of exchange more explicit (reducing information asymmetry), constraining information flow (e.g., preventing transfer of data to third parties or distribution of data to secondary markets), and giving data subjects the right to delete data held by others (Whittington et al. 2015; Hoofnagle and Whittington 2014). The aim is to form institutional arrangements that the parties (including the data subject) would have formed if endowed with equal bargaining power (Hoofnagle and Whittington 2014; Whittington and Hoofnagle 2012).
However, not all transactions of personal data require the same level of safeguarding, and implementing the appropriate safeguard requires the evaluation of privacy risks and associated transaction costs involved in each type of transaction. Safeguards can be construed as alternative governance structures in transaction cost economics (i.e., alternative institutional arrangements), and the idea of economizing is to find the alignment of transactions (with their privacy characteristics) with governance structures (institutional arrangements) to minimize transaction costs to the collection of parties involved, ex ante and ex post.
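The alignment principle can be written compactly as a cost-minimization problem. The notation below is our own illustrative shorthand rather than Williamson's: $t$ stands for a transaction in personal data together with its privacy characteristics (for example, the asset specificity of the data involved), $G$ for the set of feasible governance structures (alternative safeguards), and the two cost terms for the ex ante and ex post transaction costs summed over all parties, including the data subject.

```latex
g^{*}(t) = \arg\min_{g \in G} \Big[ C_{\text{ex ante}}(g, t) + C_{\text{ex post}}(g, t) \Big]
```

Read this way, stronger safeguards typically raise $C_{\text{ex ante}}$ (negotiation, compliance, and administration) while lowering $C_{\text{ex post}}$ (losses from re-identification or opportunistic reuse), so the economizing choice depends on the characteristics of each transaction rather than on a single uniform level of protection.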
In terms of safeguarding personal data for privacy, both Nissenbaum (2004) and Solove (2006) have explored the variability and heterogeneity of privacy expectations. Nissenbaum (2004) points to key parameters of information norms, such as actors, attributes, and transmission principles, that locate context, identify disruptive flows, and determine the constraints on the flow of information. The TCE framework of comparative institutional analysis could be used to empirically examine or implement Nissenbaum's contextual integrity framework in order to find governance structures that fit, i.e., that minimize ex post privacy loss. In other words, the TCE framework is a complementary methodology for understanding, and perhaps quantifying, the variability of privacy risk and associated harms by examining, through the lens of asset specificity, the relationship between personal data and the bilateral contractual relationship of the transaction (e.g., trade in personal or identifiable information for another good).
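To make Nissenbaum's parameters concrete, an information flow can be modeled as a record of who sends it, who receives it, whom it is about, what type of information it carries, and under what transmission principle; a context's norms then specify which combinations are appropriate. The sketch below is a minimal illustration under our own assumptions: the `Flow` class, the hypothetical `BILLING_NORMS` table, and the `is_disruptive` function are invented for this example and are not drawn from Nissenbaum (2004) or from any City of Seattle system.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Flow:
    """One information flow, described by contextual-integrity parameters."""
    sender: str
    recipient: str
    subject: str                 # the person the information is about
    attribute: str               # the type of information transmitted
    transmission_principle: str  # the terms under which it flows

# Hypothetical norms for a hypothetical utility-billing context. Keys are
# (sender, recipient, attribute); values are the permitted transmission principles.
# For brevity these toy norms do not vary by data subject.
BILLING_NORMS = {
    ("resident", "utility_department", "service_address"): {"with_consent", "required_by_law"},
    ("utility_department", "collections_vendor", "account_balance"): {"under_contract"},
}

def is_disruptive(flow: Flow, norms: dict) -> bool:
    """Flag a flow that no norm covers, or that uses a principle the norm does not permit."""
    allowed = norms.get((flow.sender, flow.recipient, flow.attribute))
    return allowed is None or flow.transmission_principle not in allowed

# Usage: an expected flow versus resale of the same attribute to a third party.
expected = Flow("resident", "utility_department", "resident", "service_address", "with_consent")
resale = Flow("utility_department", "data_broker", "resident", "service_address", "sale")
print(is_disruptive(expected, BILLING_NORMS))  # False
print(is_disruptive(resale, BILLING_NORMS))    # True
```

A TCE-style analysis could then attach an estimated ex post cost to each flow flagged as disruptive, which is one possible way of operationalizing the comparison of governance structures described above.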
Asset specificity describes the degree to which an asset can be redeployed to alternative uses and by alternative users without sacrifice of productive value (Williamson 1975). Asset specificity in information is, in this way of thinking, a function of the personal or re-identifiable nature of the data in question (Hoofnagle and Whittington 2014). Personal information is, in the TCE sense, an asset unique to each consumer and difficult to redeploy. For example, daily routines or habits often take months or years to change, and it is almost impossible to change a person's biological information. When an asset cannot be redeployed without significant cost, transactions are more likely to form a bilateral dependent relationship ex post, even when the contractual relationship starts from perfect competition ex ante. Such a bilateral dependency locks consumers into the transaction via their personal or re-identifiable information, which increases the risk of exploitation through opportunism. The higher the asset specificity of a piece of personal information, the more likely a consumer is to be locked into bilateral dependent trading relations with the firms that obtain this information, and therefore the higher the level of safeguard required.
Previous analysis of Seattle data governance highlights the outsized role of personally identifiable data in city affairs (Whittington et al. 2015). The emergence of location-based services has led to an unprecedented surge in spatiotemporal data sources available to cities and their vendors, and Seattle is no exception. While the new sources offer opportunities to discover subject-level knowledge and expand fields of inquiry, they also allow the re-identification of individuals, raising privacy risk by revealing intimate information about persons (Thompson and Warzel 2019). Location-based and time-stamped data may be analyzed with malicious intent, with serious consequences for the persons identified through the data. This subsection identifies the common data sources the City of Seattle encounters in daily practice and summarizes the empirical evidence in the literature on the privacy risks of each type of data.
Public Records
Although data from public records do not possess the same level of spatiotemporal resolution as data emerging from new sources, studies have shown that even the simplest location or temporal information in public records can be linked to other available records to re-identify people. Golle (2006) examined data from the 2000 US Census and found that 63 percent of the US population is uniquely identifiable by gender, five-digit zip code, and date of birth. Even at the county level, 18 percent of the US population can be identified using these three variables. With the same identifiers of gender, zip code, and birth date, Sweeney (2002) linked two publicly available datasets, the voter registration list and health insurance data in Massachusetts, and successfully identified the governor of Massachusetts. Acquisti and Gross (2009) used birth records from the SSA Death Master File (DMF) and observed a correlation between individuals' Social Security numbers (SSNs) and their birth data, which allows statistical inference of private SSNs. The correlation is stronger for the younger cohort born between 1989 and 2003, with 61 percent of records being correctly predicted from their birth data. More recently, Whittington et al. (2015) examined the datasets available from Seattle's open data portal and estimated that nearly all tables in the selected sample can be spatially linked to data identifying persons, either by spatial coordinate information or by zip code.
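The linkage attacks reported by Sweeney (2002) and Golle (2006) rest on one mechanism: a record stripped of names can still be matched to a named record when its combination of gender, zip code, and date of birth is unique. The following minimal sketch illustrates that mechanism on a handful of invented records; the data, field names, and helper functions (`qi_key`, `uniqueness`, `link`) are hypothetical and do not reproduce either study.

```python
from collections import Counter

# Hypothetical toy records (invented for illustration, not real data).
voter_roll = [
    {"name": "A. Smith", "gender": "F", "zip": "98101", "dob": "1960-07-31"},
    {"name": "B. Jones", "gender": "M", "zip": "98101", "dob": "1975-01-02"},
    {"name": "C. Lee", "gender": "M", "zip": "98102", "dob": "1975-01-02"},
    {"name": "D. Park", "gender": "M", "zip": "98102", "dob": "1975-01-02"},
]
# A "de-identified" release: names removed, quasi-identifiers left in place.
health_claims = [
    {"gender": "F", "zip": "98101", "dob": "1960-07-31", "diagnosis": "X"},
    {"gender": "M", "zip": "98102", "dob": "1975-01-02", "diagnosis": "Y"},
]
QUASI_IDENTIFIERS = ("gender", "zip", "dob")

def qi_key(record):
    """The record's combination of quasi-identifier values."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)

def uniqueness(records):
    """Share of records whose quasi-identifier combination occurs exactly once."""
    counts = Counter(qi_key(r) for r in records)
    return sum(1 for r in records if counts[qi_key(r)] == 1) / len(records)

def link(identified, deidentified):
    """Attach a name to each released row that matches exactly one named record."""
    names_by_key = {}
    for r in identified:
        names_by_key.setdefault(qi_key(r), []).append(r["name"])
    matches = []
    for row in deidentified:
        names = names_by_key.get(qi_key(row), [])
        if len(names) == 1:  # a unique match is a re-identification
            matches.append((names[0], row["diagnosis"]))
    return matches

print(uniqueness(voter_roll))           # 0.5
print(link(voter_roll, health_claims))  # [('A. Smith', 'X')]
```

In this toy example half of the named records are unique on the three quasi-identifiers, so the first released claim links to exactly one person, while the second matches two people and cannot be attributed. Coarsening any of the fields (for example, releasing birth year instead of full date) enlarges these groups, which is the intuition behind generalization as a privacy protection.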
Surveillance Cameras
Surveillance cameras, and closed-circuit television (CCTV) in particular, have long been adopted by police forces, governments, local councils, and business owners to maintain safety and security (Ditton 2000). Surveillance cameras have received considerable attention in the urban studies literature. As early as 1996, Jean Hillier documented the course of events in the summer of 1994 at Burswood Casino, where security camera operators abused their access to control equipment by targeting the cameras at women for voyeuristic pleasure. The story caused major public outrage and started a widespread debate on the blurring boundaries between public and private space and activities (Koskela 2002). More recently, Spiller (2016) described his own experience in the United Kingdom of identifying seventeen different CCTV cameras that had recorded him and of attempting to access his images through subject access requests. He wrote thirty-seven letters, made thirty-one phone calls, and spent £60 making the requests, and he faced a number of obstacles in obtaining the footage, including inadequate contact information, misleading or incorrect information, lack of responses, and outright rejection.
Apart from these qualitative studies, others have taken a more quantitative approach. Ma et al. (2010) studied how snapshots of traffic intersections can be used as side information to mount various privacy attacks on a person's mobility traces. Using both real and simulated mobility trace data, the study found that ten snapshots can identify the traces of 30–50 percent of the victims. Chen, Yang, and Xu (2017) applied the K-means clustering algorithm to one week of license plate recognition data obtained in Shenzhen, China, and successfully reduced the data to groups with distinct travel times, travel purposes, and spatial travel patterns. Gao, Sun, and Cai (2019) measured the privacy vulnerabilities of license plate recognition data captured by high-resolution cameras on highways in Guangzhou, China. The study found that five spatiotemporal records are enough to uniquely identify about 90 percent of individuals, even when the temporal granularity is set at half a day. The study also proposed two privacy protection methods, a suppression solution and a generalization solution, and introduced an entropy measure of information loss to quantify the utility lost under each solution.
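The two protection strategies named above can be illustrated generically. The sketch below does not reproduce Gao, Sun, and Cai's method; it assumes hour-of-day timestamps, implements generalization as coarsening them into fixed-width bins, implements suppression as dropping values that occur fewer than a chosen number of times, and uses the reduction in Shannon entropy of the released values as a crude stand-in for information (utility) loss. The data, function names, and parameter values are hypothetical.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy, in bits, of the empirical distribution of `values`."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def generalize(hours, width):
    """Coarsen hour-of-day values into bins `width` hours wide (12 = half a day)."""
    return [h // width * width for h in hours]

def suppress(values, min_count):
    """Drop values seen fewer than `min_count` times, since rare values single people out."""
    counts = Counter(values)
    return [v for v in values if counts[v] >= min_count]

def info_loss(original, released):
    """Entropy reduction in bits: a crude proxy for the analytic detail removed."""
    return entropy(original) - entropy(released)

# Hypothetical hours at which plate reads were recorded.
hours = [7, 7, 8, 8, 8, 13, 17, 17, 18, 23]
half_day = generalize(hours, 12)   # every hour becomes 0 or 12
common_only = suppress(hours, 2)   # hours observed once (13, 18, 23) are dropped

print(round(info_loss(hours, half_day), 2))     # 1.45
print(round(info_loss(hours, common_only), 2))  # 0.89
```

Wider bins or a higher suppression threshold protect more but lose more detail; the value of an explicit loss measure is that it makes this trade-off comparable across options.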
Spatial Trajectory Data
Spatial trajectory data is another popular data type. Compared with public records data and data collected by location-based sensors, it has the highest spatiotemporal resolution, typically with sampling intervals of less than a minute. Thus, it also poses the highest privacy risk of the data types considered here. Spatial trajectory data can come in several forms, such as GPS traces, cell phone signal tower data, or location-based micro-transaction information.
Studies on the privacy risk of spatial trajectory data abound. Terrovitis and Mamoulis (2008) examined the privacy risk of trajectory data using a synthetic dataset with an initial setting of 100 unique addresses and 10,000 trajectories. As the number of adversaries observing the data was gradually increased, the results showed that with five adversaries (i.e., spatiotemporal data points), over 90 percent of the individuals could be identified through the dataset. Munizaga and Palma (2012) developed an estimation method for transit alighting and applied it to a week of transit smartcard and GPS data comprising 36 million observations for Santiago, Chile. The proposed method can build a detailed public transport origin–destination matrix at any desired time–space disaggregation. De Montjoye et al. (2013) have shown, in their study of hourly cell-phone tower tracking of 1.5 million devices by media access control (MAC) address over fifteen months, that only four spatiotemporal data points per day are needed to re-identify 95 percent of the owners of those devices. And as noted above, Gao, Sun, and Cai (2019) measured the risk of license plate recognition data and found that, even when data are aggregated over a twelve-hour period, about 90 percent of individuals may be identified with as few as five spatiotemporal data points.
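The studies above share a measurement idea that de Montjoye et al. (2013) call unicity: the share of individuals who are singled out by a handful of points drawn from their own trace. The following is a simplified Monte Carlo sketch under our own assumptions; the three traces, the (place, hour) representation, and the function names are hypothetical, and the estimator omits the spatial and temporal coarsening those studies examine.

```python
import random

# Hypothetical toy traces: person -> set of (place, hour) observations.
traces = {
    "p1": {("A", 8), ("B", 9), ("C", 18), ("A", 22)},
    "p2": {("A", 8), ("B", 9), ("D", 18), ("A", 22)},
    "p3": {("E", 7), ("B", 9), ("C", 18), ("F", 23)},
}

def is_unique(person, known_points, traces):
    """True if `person` is the only trace containing every known point."""
    matches = [p for p, pts in traces.items() if known_points <= pts]
    return matches == [person]

def unicity(traces, p, trials=1000, seed=0):
    """Estimate the share of individuals singled out by p points from their own trace."""
    rng = random.Random(seed)
    people = list(traces)
    hits = 0
    for _ in range(trials):
        person = rng.choice(people)
        # random.sample needs a sequence, so sort the set into a list first.
        known = set(rng.sample(sorted(traces[person]), p))
        hits += is_unique(person, known, traces)
    return hits / trials

print(unicity(traces, 1))  # a fraction strictly between 0 and 1 for this toy data
print(unicity(traces, 4))  # 1.0
```

With one known point only some people in the toy data are unique; with four points every trace is, a miniature version of how quickly identifiability rises with the number of observed points.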
In summary, the public resource under consideration for this study is data collected, held, and used by the municipality, and many types of municipal data happen to carry the threat of loss of privacy and associated costs to the data subject if released to the public, giving municipalities a compelling rationale for governing municipal data with privacy in mind. In the City of Seattle, events occurring in 2013 and 2014 elevated public concern over privacy to a peak, prompting the city to adopt a fresh perspective on the problem of public surveillance.
Attributes of the Community
The modern evolution of Seattle’s privacy policies and their implementation began in 2013, as the Seattle police began to install surveillance cameras and a mesh network with the capability of tracking wireless devices through downtown. Attentive to the emergence of cameras on city streets, critics of the system were vocal in their concern and opposition, including the Seattle Privacy Coalition, a group formed in March 2013 and incorporated as a nonprofit organization in 2014 to protect citizen privacy from government surveillance programs and intrusive corporate data collection (Seattle Privacy Coalition 2013). In response to criticism, the city deactivated the network, and began a multiyear process to conceive of policies and a governance system to protect public privacy in municipal data and information technology.
Action on the part of the city was swift. In November 2014, the city convened a Privacy Advisory Committee composed of academics, practitioners, lawyers, and community advocates, which advised city departments as they engaged in a new initiative to explore the role of the municipality in protecting the privacy of its residents. By February 2015, these efforts resulted in a unanimous city council vote to adopt the City's Privacy Principles, which Mayor Murray described as “a guide for our work in local government in order to help build and maintain trust with the people we represent” (City of Seattle 2015). Implementation began immediately, in 2015, as the city hired a Chief Privacy Officer and initiated policies and procedures associated with the principles, including notice and consent, the minimization of data collection and use, and the deletion or de-identification of data according to city data retention schedules. At the same time, the city's approach to data governance for privacy was furthered by its participation in a study of internal data governance practices and public perception of privacy risk (Whittington et al. 2015). For example, Whittington et al. (2015) found that while the city's open data initiative was motivated by the hope of improving government transparency and accountability, without a comprehensive assessment of latent risks and effective governance structures it could harm privacy and social equity for the general public and for public employees.
In 2016, the City of Seattle made an ambitious change to its organizational structure by consolidating its IT staff from the Department of IT and other departments into one office, Seattle Information Technology (Seattle IT), to provide centralized information management and technical support to its twenty-eight departments. Seattle IT hosts the Security, Risk, and Compliance division, which works in tandem with the Privacy Program to meet both privacy and security needs. The Privacy Program holds city departments accountable to the city's privacy principles and has published a privacy toolkit for each department to use in assessing the privacy implications of the data it collects and uses. In 2016–17, Seattle IT trained 92 percent of municipal employees on data privacy and security through interactive training, achieving high completion rates through internal monitoring, reminders, and customization. Seattle IT's organizational restructuring plays a key role in facilitating administration of the Privacy Program and privacy training. In 2017, managerial performance reviews began to include personnel completion of annual privacy training as a success criterion (Whittington, Young, and Armbruster 2018).
Figure 2.2 illustrates the structure of the Privacy Program and its relationship with other municipal departments in 2018. The Privacy Program oversees privacy issues associated with data used in the other twenty-eight departments in the municipality. The Privacy Program personnel include a Chief Privacy Officer, a Privacy Program Manager, a Senior Privacy Specialist, a Privacy Specialist, and a Data Analyst Intern. It also has indirect reports within each department called “Privacy Champions.” Privacy Champions are volunteers nominated by the directors of each department. They are trained in data privacy and assist the Privacy Program personnel in carrying out privacy assessments of datasets intended for Seattle's Open Data Platform.
The evolution of Seattle's governance of municipal privacy was given further definition in 2018, when a city ordinance took effect that the American Civil Liberties Union touted as the “Nation's Strongest Regulations for Surveillance Technology” (ACLU Washington 2017). Seattle had, in 2013 and again in 2016 (Ordinance Nos. 124142 and 125376), already enacted two ordinances advancing privacy concerns over technologies that may be used for the purpose of surveillance. The latest of these acts (Ordinance No. 125679) significantly expanded the city's efforts. It deepened the role of municipal governance of privacy in relation to the community by establishing a Community Surveillance Working Group, an advisory body to the city composed of community members, and a detailed apparatus for communicating to the public about the process and results of city decisions regarding the adoption and use of technologies capable of surveillance.
Altogether, this brief overview of the development of privacy policy and administration at the City of Seattle reinforces the concept of municipal privacy governance as an iterative process between the municipality and the general public it serves, even as the varied departments and personnel, with their roles and responsibilities, grow and adapt to the new norms of privacy protection in the governance of municipal data.
The Action Arena: Action Situations and Actors
The next factor of study in the City of Seattle’s institutional environment is the action situation in the action arena, in particular the action situations for governing privacy in urban data. An action situation is a key conceptual unit in the GKC framework, describing the social space where individuals or actors interact, exchange goods and services, solve problems, dominate one another, or fight (Ostrom 2011). Identifying the action situations and their resulting patterns and outcomes is essential, since most of the description, prediction, analysis, and explanation under the GKC framework takes place at this level (Ostrom 2011; Sanfilippo, Frischmann, and Strandburg 2018).
To identify the action situations in this case study, we first apply Varian’s (2004) concept of system reliability, common to data security as well as privacy, which holds that system reliability can be treated as a type of public good that depends on the successful functioning of the weakest link in the system. In other words, any one of many possible actors or action situations can release or distribute data in a way that results in a loss of privacy: multiple dimensions, actors, and interactions could act as weak links in the effort to govern data for privacy and undermine the municipality’s collective effort to preserve citizens’ privacy. This study therefore conceives of the action situation as existing in these multiple contexts.
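Varian (2004) contrasts three ways in which individual efforts can aggregate into system reliability: total effort (a function of the sum), weakest link (the minimum), and best shot (the maximum). The minimal Python sketch below applies each rule to protection scores for the five action situations in Table 2.1. The scores, the 0-to-1 scale, and the function name `system_protection` are hypothetical illustrations, not measurements of Seattle's programs.

```python
from typing import Mapping


def system_protection(efforts: Mapping[str, float], rule: str = "weakest_link") -> float:
    """Aggregate per-situation protection efforts into a system-level protection score.

    The three rules follow the aggregation technologies discussed by Varian (2004):
    total effort (sum), weakest link (min), and best shot (max).
    """
    values = list(efforts.values())
    if not values:
        raise ValueError("at least one action situation is required")
    if rule == "total_effort":
        return sum(values)
    if rule == "weakest_link":
        return min(values)
    if rule == "best_shot":
        return max(values)
    raise ValueError(f"unknown rule: {rule}")


# Hypothetical scores (0 = no protection, 1 = full protection); not observed data.
efforts = {
    "vendor_agreement": 0.9,
    "surveillance_ordinance": 0.8,
    "privacy_impact_assessment": 0.85,
    "public_disclosure_request": 0.4,
    "open_data_release": 0.7,
}

# Under the weakest-link rule, the least protected situation sets the system level.
print(system_protection(efforts))
```

With these illustrative scores, raising protection in the already strong vendor agreement situation leaves the weakest-link result unchanged; only improving the public disclosure request situation would raise it, which is the intuition behind treating every action situation as a potential weak link.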
To further dimensionalize privacy, we also employ Solove’s (2006) taxonomy of privacy to identify the array of privacy problems and the action situations under consideration. Table 2.1 presents the privacy dimensions identified in the taxonomy and the corresponding action situations, actors, and associated privacy concerns. Note that the original taxonomy groups privacy problems into four dimensions: information collection, information processing, information dissemination, and invasion. Here we cover the first three dimensions, combining collection and processing because they often fall under the same policies.
Privacy dimension | Action situations | Actors | Privacy concerns |
---|---|---|---|
Information collection & processing | Vendor agreement | Public agency and vendors | Sale to third-party data brokers of data collected by vendors for public use |
Information collection & processing | Surveillance ordinance | Public agency and special interest groups | Abusive use of data collected by city surveillance technologies |
Information collection & processing | Privacy impact assessment | Departments within public agency | Personally identifiable or sensitive information |
Information dissemination | Public disclosure request | Public agency and the general public | Request with malicious intent |
Information dissemination | Open data release | Public agency and the general public | Data in a single dataset or from multiple joined datasets contain personally identifying or sensitive information |
As illustrated in Table 2.1, public agencies face a variety of action situations in which they may interact with different actors, including private vendors, special interest groups, other departments in the municipality, and the general public. Depending on their access to information and control, the same actors may hold different positions in different action situations. For example, while public agencies act as monitors in the vendor agreement action situation, they are themselves monitored by special interest groups in the surveillance ordinance action situation, which is meant to prevent opportunistic behavior by public agencies.
Rules-in-Use
We group rules into two categories: federal and state laws, and municipal policies. Federal and state laws can be viewed as exogenous to the action situations because they are not influenced by the situations’ outcomes. Municipal policies, in contrast, can be endogenous, since they may be created or amended as an outcome of an action situation.
In the State of Washington, the Public Records Act (RCW 42.56) makes the clear majority of public agency records disclosable in response to a specific request. The City of Seattle estimates that it receives between 200 and 250 public disclosure requests (PDRs) each week, over 12,000 annually. Each request requires careful consideration of what can be disclosed. To comply with the Public Records Act, the city may redact or withhold only those attributes that are explicitly exempted from disclosure under the law, such as the home addresses of city employees, children’s information, and personal information about individuals receiving certain welfare-related services.
Recognizing the impact of these requests on state and municipal government, in 2017 the Washington State Legislature passed two bills relevant to public disclosure. HB 1595 allows an agency to charge a per-gigabyte fee for producing electronic records, whereas fees were previously charged only for photocopies and hard drives. It also allows agencies to deny requests generated by bots, automated software programs that had been used to send multiple requests for records. HB 1594 requires public records officers to undergo additional training from the Attorney General’s Office on how electronic records must be handled under the law, and it initiated a study of how new technologies, such as a statewide online public records platform, could facilitate the disclosure of records. In addition to the Public Records Act, other state and federal legislative activities with implications for the city are listed in Table 2.2.
Legislative activities | Timeline |
---|---|
State | |
The California Consumer Privacy Act (CCPA) | Effective January 1, 2020 |
The Washington Privacy Act (WaPA – SB 6281) | Failed a House vote in 2019; Reintroduced January 13, 2020 |
The Use of Facial Recognition Services Bill (SB 6280) | Did not pass out of committee in 2019; Reintroduced in 2020 |
The Remedies for Misuse of Biometric Data Bill (HB 2363) | Introduced in 2020 |
The Consumer Protection Requirements for Data Brokers Bill (HB 1503) | Held over in 2019; Reintroduced in 2020 |
Federal | |
Information Transparency and Personal Data Control Act | Reintroduced in 2018 |
Consumer Online Privacy Rights Act (COPRA) | Introduced in 2019 |
The rules-in-use by the City of Seattle for governing data for privacy may be categorized according to Solove’s taxonomy of privacy, by their purposes in information collection, information processing, and information dissemination.
Information Collection and Processing
Within the city’s governance of municipal data for privacy, information collection and processing are governed through three mechanisms: the Data Privacy Review Process, vendor agreements, and the implementation of the surveillance ordinance. Together, these mechanisms offer insight into the city’s information collection practices.
Data Privacy Review Process.
All projects initiated since 2016 must follow the Data Privacy Review Process. Which steps of the process are completed depends on whether a program is deemed to involve personally identifying or sensitive information. Programs that were in place before the process was created are referred for privacy review on a case-by-case basis for specific questions, such as a request for Privacy Program personnel to evaluate an existing vendor agreement. In addition, the city has incorporated the privacy review into the technology purchasing process. This is intended to identify technologies that meet the criteria of the surveillance technology ordinance and to ensure that they are submitted to the City Council for review and approval before acquisition.
Step I: Self-assessment. The first step of the privacy review process is the self-assessment, a simple web form completed by the project manager. It asks whether the dataset contains any personal information, defined as “any information relating to an identified or identifiable individual” and covering more than twenty data elements, such as name, address, social security number, financial records, or ethnicity. If the data neither contains personal information nor meets the definition of surveillance, no further action is needed, and the results of the self-assessment are filed for record-keeping purposes. If the project manager indicates that the data does contain personal information, the data proceeds to the threshold analysis. The output of the self-assessment is automatically filed on an internal SharePoint server to document that it has been completed. This record also notes how many of the dataset’s attributes have been reviewed, so that if the data is later updated or expanded, it can be monitored for further privacy assessment. In 2017, the Privacy Team implemented a case management and automated workflow process to keep a record of review cases and their details, manage response expectations (service-level agreements for response time), and track surveillance and privacy impact assessment requirements.
Step II: Threshold analysis. This analysis assesses the risk rating associated with the data collected. It requires users to specify whether the program collects sensitive attributes, such as names, addresses, driver’s license numbers, social security numbers, birthdates, email addresses, biometric data, sex and/or gender, race, household information, credit card information, or financial, health, or location data. It then asks a series of questions about the dataset’s present purpose, data minimization, provision of notice, third-party vendor contract terms, data security, and records retention schedule. The output of the threshold analysis is a recommendation to the respondent as to whether a privacy impact assessment is needed to evaluate the program. Like the self-assessment, the threshold analysis is a web form completed by the project manager and filed to an internal SharePoint site as documentation of the answers provided.
Step III: Privacy impact assessment. The third step of the Data Privacy Review Process is a privacy impact assessment (PIA), conducted on programs that use personally identifiable information and that the threshold analysis has identified as higher risk. The project manager, privacy champion, or data owner creates an initial draft of the PIA, which asks for detailed information about the program: its contractual terms, security measures, the data collected, how the data is used, and its retention period. The intent of the PIA is to compare the program against the city’s stated commitments in its privacy principles, for example by asking whether data subjects have a means to opt into or out of the dataset, or to correct inaccurate information. The Privacy Program Manager or other personnel then work closely with the project manager or data owner to refine the assessment; this investigative period usually requires a series of in-person meetings between the data owner and Privacy Program personnel. The content of a PIA depends on the salient qualities of the data collection program under review. Its output is a written report to the project manager documenting the privacy practices in place and issuing privacy impact mitigation recommendations where needed. In the long term, Seattle IT intends to release its privacy impact assessments as open data.
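The three steps can be read as a triage pipeline in which each stage decides whether the next one is needed. The Python sketch below is a simplified illustration under stated assumptions: a single attribute set (`SENSITIVE_ATTRIBUTES`) stands in for both the self-assessment's definition of personal information and the threshold analysis's list of sensitive attributes, the risk rating is reduced to a count of answers that raise a concern, and surveillance technologies are routed straight to council review. The names `Program`, `review`, and `max_concerns`, and the cut-off value, are hypothetical; the city's actual forms and scoring are not reproduced here.

```python
from dataclasses import dataclass, field
from enum import Enum


class Outcome(Enum):
    FILED_NO_ACTION = "self-assessment filed; no further review"
    COUNCIL_REVIEW = "meets surveillance definition; route to council review"
    THRESHOLD_ONLY = "threshold analysis filed; no PIA recommended"
    PIA_REQUIRED = "privacy impact assessment recommended"


# Illustrative subset of the attributes named in the threshold analysis.
SENSITIVE_ATTRIBUTES = {
    "name", "address", "drivers_license", "ssn", "birthdate", "email",
    "biometric", "sex_gender", "race", "household", "credit_card",
    "financial", "health", "location",
}


@dataclass
class Program:
    attributes: set[str]
    meets_surveillance_definition: bool = False
    # Hypothetical answers to the threshold questions (purpose, minimization, notice,
    # vendor terms, security, retention); True marks an answer that raises a concern.
    threshold_concerns: dict[str, bool] = field(default_factory=dict)


def review(program: Program, max_concerns: int = 1) -> Outcome:
    # Step I: self-assessment.
    if program.meets_surveillance_definition:
        return Outcome.COUNCIL_REVIEW
    if not program.attributes & SENSITIVE_ATTRIBUTES:
        return Outcome.FILED_NO_ACTION
    # Step II: threshold analysis with an assumed cut-off on the number of concerns.
    if sum(program.threshold_concerns.values()) <= max_concerns:
        return Outcome.THRESHOLD_ONLY
    # Step III: higher-risk programs proceed to a full PIA with Privacy Program staff.
    return Outcome.PIA_REQUIRED


green_buildings = Program(attributes={"parcel_id", "energy_score"})
print(review(green_buildings).name)

service_program = Program(
    attributes={"name", "address", "health"},
    threshold_concerns={"notice": True, "retention": True, "security": False},
)
print(review(service_program).name)
```

A dataset with no personal attributes stops at Step I, while a program with personal data and two flagged threshold answers is sent on to a PIA. Making the cut-off a parameter reflects that the chapter does not specify how the threshold analysis converts answers into a risk rating.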
Vendor Agreement.
Whittington et al. (2015) analyzed eighteen agreements between the City of Seattle and vendors that handle its data and found wide variation in the terms governing data privacy, security, and accountability. Third parties are required to meet the same privacy principles that city departments are obligated to follow. Accordingly, the city has drafted model contracts for consulting engagements and third-party data-sharing agreements that include appropriate data privacy and security expectations. These are available to all departments considering data-intensive engagements with firms.
Surveillance Ordinance.
The first City of Seattle Surveillance Ordinance (SMC 14.18) went into effect in 2013. Its purpose was to provide transparency and oversight to the city’s increasing acquisition of specific surveillance technologies, such as cameras and drones. With public input, including active lobbying by the American Civil Liberties Union (ACLU), the statute was revised in 2015 to provide the council with the authority to develop an approval process for a broader definition of surveillance technologies. The new focus is on technologies whose primary purpose is to track and analyze the behavior and actions of individuals in a manner that negatively impacts civil liberties. This revised definition applies to all city departments; however, it primarily impacts public safety, transportation, and utilities, whose missions both provide needed services and regulate the public’s activities.
Figure 2.3 shows the criteria used to identify a surveillance technology in reviews under the surveillance ordinance. The Surveillance Ordinance requires that these criteria be incorporated into the privacy review process. At the review intake stage, a set of questions is used to determine whether a technology acquisition qualifies as surveillance technology. Technologies that qualify must undergo council review and receive ordinance approval for their purchase, deployment, or continued use. For certain categories of technology acquisitions, privacy review is required by default at the purchase request stage.
Information Dissemination
Information dissemination within the scope of governance of privacy occurs mainly through the public disclosure request process and the open data program.
Public Disclosure Request.
As noted above, the Public Records Act (RCW 42.56) makes the clear majority of public agency records disclosable on request, and the city receives between 200 and 250 such requests each week. Each department employs staff to handle these requests, and each response requires careful consideration of what can be disclosed: only attributes explicitly exempted under the law, such as the home addresses of city employees, children’s information, and personal information about individuals receiving certain welfare-related services, may be redacted or withheld. A summary of the information exempt from disclosure is available online from the Washington State Legislature. The 2017 bills discussed above, HB 1594 and HB 1595, were passed in recognition of the burden these requests place on state and municipal government.
Open Data Program.
Since 2016, all new datasets published to the Open Data Platform, data.seattle.gov, have undergone the Data Privacy Review Process described above. Open data is published through the release process by an open data champion in consultation with the open data manager in Seattle IT and the data owner. Most prospective open datasets are not about individual people: examples include data useful to home buyers, such as green building data and records of which properties use underground gas storage tanks. In these cases, the privacy self-assessment is used to document that the dataset does not contain personally identifiable information. Other datasets require a thorough privacy review and consultation before they become open data. Two such datasets come from the police department, “officer use of force” and “officer involved shootings,” which document incidents in which police officers used force or discharged their weapons. These datasets were released as part of a transparency and accountability initiative within the police department.
Privacy Program personnel also advise open data champions and the open data program manager not to include “foreign keys” or other attributes in a dataset that could be used to link it with another dataset. This measure guards against the “mosaic problem”: the ability to combine disparate datasets on common attributes, which makes it more likely that the persons represented in anonymized datasets can be re-identified. In a seminal study, Harvard researcher Latanya Sweeney (2000) found that 87 percent of the US population could be uniquely identified using only three attributes: date of birth, gender, and five-digit zip code.
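The mosaic problem can be made concrete with a small linkage attack. In the Python sketch below, two hypothetical releases each look harmless on their own: one has diagnoses but no names, the other has names but no diagnoses. Joining them on Sweeney's three quasi-identifiers re-attaches a name to a diagnosis. All records and the helper name `link` are invented for illustration.

```python
# Hypothetical records: neither release contains both a name and a diagnosis.
health_release = [
    {"dob": "1961-07-31", "sex": "F", "zip": "98103", "diagnosis": "asthma"},
    {"dob": "1975-02-14", "sex": "M", "zip": "98122", "diagnosis": "diabetes"},
]
voter_roll = [
    {"name": "Resident A", "dob": "1961-07-31", "sex": "F", "zip": "98103"},
    {"name": "Resident B", "dob": "1980-11-02", "sex": "M", "zip": "98122"},
]

# The three attributes from Sweeney's (2000) study act as the join key.
QUASI_IDENTIFIERS = ("dob", "sex", "zip")


def link(left, right, keys=QUASI_IDENTIFIERS):
    """Join two tables on shared quasi-identifiers (a simple hash join)."""
    index = {}
    for row in right:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)
    linked = []
    for row in left:
        for match in index.get(tuple(row[k] for k in keys), []):
            linked.append({**match, **row})
    return linked


for record in link(health_release, voter_roll):
    print(record["name"], record["diagnosis"])
```

Only the first health record matches, and the merged row carries both "Resident A" and "asthma". Withholding or coarsening any one of the shared attributes in the released table (for example, publishing only birth year) removes the exact join key, which is the intent of the advice against including foreign keys.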
Ostrom (2011) classified rules-in-use into seven categories: boundary rules, position rules, scope rules, choice rules, aggregation rules, information rules, and payoff rules. Boundary rules define the number and attributes of participants. Position rules establish positions in the situation. Scope rules identify the potential outcomes that can be affected and the actions linked to specific outcomes. Choice rules articulate the choice sets, that is, the actions that actors in each position may, must, or must not take. Aggregation rules define the level of control that an actor exercises in a position. Information rules delimit an actor’s access to information or define what information should be kept secret. Payoff rules describe the rewards and punishments that follow from certain actions or outcomes. Table 2.3 summarizes the categories of rules used in each action situation. Two immediate observations emerge from Table 2.3. First, whereas the other action situations mostly rely on punishment or cost as deterrents, the vendor agreement is a more market-driven form of governance whose payoff rules use both punishment and rewards. Second, the public disclosure request and the open data program, as alternative forms of information governance, may exert weaker control over shared information because fewer rule categories apply to them than to the other action situations, which may make them more vulnerable to privacy attacks.
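Privacy dimension | Action situations | Categories of rules-in-use | Descriptions of rules-in-use in the action situations |
---|---|---|---|
Information collection & processing | Vendor agreement | Boundary rules | Who has access to the data |
Information collection & processing | Vendor agreement | Position rules | The role of public agencies and the vendor |
Information collection & processing | Vendor agreement | Scope rules | Purposes of data collection |
Information collection & processing | Vendor agreement | Choice rules | Intended uses of data |
Information collection & processing | Vendor agreement | Aggregation rules | Control over collected data |
Information collection & processing | Vendor agreement | Information rules | Access to the information of data collection and processing |
Information collection & processing | Vendor agreement | Payoff rules | Rewards for fulfilling the agreement and punishment for violations |
Information collection & processing | Surveillance ordinance | Boundary rules | The geographic boundary of the proposed surveillance technology |
Information collection & processing | Surveillance ordinance | Position rules | The role of public agencies |
Information collection & processing | Surveillance ordinance | Scope rules | Purposes of data collection |
Information collection & processing | Surveillance ordinance | Choice rules | Intended uses of data |
Information collection & processing | Surveillance ordinance | Aggregation rules | Control over collected data |
Information collection & processing | Surveillance ordinance | Information rules | Access to the information of data collection and processing |
Information collection & processing | Surveillance ordinance | Payoff rules | Punishment for violations |
Information collection & processing | Privacy impact assessment | Boundary rules | The geographic boundary of the proposed project |
Information collection & processing | Privacy impact assessment | Position rules | The role of public agencies and other involved parties |
Information collection & processing | Privacy impact assessment | Scope rules | Purposes of data collection |
Information collection & processing | Privacy impact assessment | Choice rules | Intended uses of data |
Information collection & processing | Privacy impact assessment | Aggregation rules | Control over collected data |
Information collection & processing | Privacy impact assessment | Information rules | Required training and access to the information of data collection and processing |
Information collection & processing | Privacy impact assessment | Payoff rules | Punishment for violations |
Information dissemination | Public disclosure request | Boundary rules | The requested dataset |
Information dissemination | Public disclosure request | Position rules | Not applicable |
Information dissemination | Public disclosure request | Scope rules | Purposes of data request |
Information dissemination | Public disclosure request | Choice rules | Intended uses of data request |
Information dissemination | Public disclosure request | Aggregation rules | Not applicable |
Information dissemination | Public disclosure request | Information rules | Not applicable |
Information dissemination | Public disclosure request | Payoff rules | Cost of the data request |
Information dissemination | Open data program | Boundary rules | The geographic boundary of the data |
Information dissemination | Open data program | Position rules | Personnel involved in the privacy impact assessment |
Information dissemination | Open data program | Scope rules | Not applicable |
Information dissemination | Open data program | Choice rules | Not applicable |
Information dissemination | Open data program | Aggregation rules | Not applicable |
Information dissemination | Open data program | Information rules | Not applicable |
Information dissemination | Open data program | Payoff rules | Not applicable |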
Outcomes and Patterns
This subsection summarizes the three patterns discernible at this time. Other patterns may emerge as time goes on, or perhaps in relation to additional shifts in technology, internal organizational changes within the municipality, or the municipality’s relationship with firms engaged in permitted activities, firms acting as vendors, and city residents.
Positive Feedback Loop
The original privacy policy and the Privacy Office set up a feedback loop within the city’s organizational structure, which reinforced the purpose of the new institutional rules under development and in action. Table 2.4 illustrates this institutional feedback loop through the development of the city’s Privacy Program from 2015 to 2019. As the Privacy Program has matured, its privacy practices, policies, and processes have become more institutionalized, evolving from unstructured and reactive practices into more formally defined governance rules and organizational cultural norms. Beyond these organizational changes, the Privacy Program has strengthened its technical capacity by adopting new tools, such as the OneTrust Privacy Review and Risk Management Tool, the Data and Survey Demographic Data Collection Playbook, and the If-Then Planning Tool for IT Project Reviews, and has extended its scope to integrate systems, such as credit card purchases, that were previously not covered by the review process. The If-Then Planning Tool, a privacy recommendation tool created by Orrick and the City Attorney’s Office, identifies action items and risk mitigations before privacy review in order to shorten the review process (Privacy Office 2018).
2015 | 2017 | 2018 | 2019 | ||
---|---|---|---|---|---|
First created | Ad hoc | Repeatable | Defined | Managed | Optimized |
The program was created with six principles to provide guidance and tools for city employees when working with personal information. | Unstructured approach where privacy policies, processes, and practices are not sufficiently defined or documented. Privacy management is mostly dependent on initiatives by individuals rather than processes. | Privacy is viewed as a compliance exercise and the approach is largely reactive with some guidelines. There is limited central oversight of the privacy policies, processes, and practices, with siloed approaches between units. | Privacy policies, processes, and practices are defined, comprehensive to meet business needs, and are consistently implemented throughout. There is a holistic and proactive approach with widespread awareness. | Privacy is embedded in the design and functionality of business processes and systems and is consistent across the agency. Well-defined governance and oversight structures exist. | Privacy is viewed as a strategic initiative with a clear agency culture of continuous improvement. The agency is viewed by stakeholders and the public as a leader in privacy management, introducing innovative initiatives to meet their needs. |
A citywide data privacy and information security training is foundational to the city’s Privacy Program. The training is included among the mandatory courses for new employees, reflecting the priority the city’s leadership places on data privacy. It was deployed in late 2016, and employees received reminder emails until they had completed it. As of March 2017, 92 percent of the City of Seattle’s 12,000 employees had taken the training (Privacy Office 2018), and some departments, such as the Seattle Police Department, reached 98 percent compliance. Since October 2017, all city employees have been required to complete the course annually. Completion is tracked through an automated training system, and managers are held accountable for their employees’ completion through performance review metrics. The training materials were developed over six months in collaboration with a private partner specializing in online training management systems. After authenticating their identity through an online portal, employees can access a thirty-minute interactive training. Personnel without access to a computer (e.g., stage hands for events at the Seattle Center) are mailed the key points of the training as a paper document. Some training is customized for the needs of particular personnel, such as the City Light service fleet, which handles unique data types.
Privacy Governance Consolidation and Scope Expansion
The Privacy Office of the city gradually became a locus of consolidation for the privacy review of data-intensive technologies and activities. This is a sign of maturity of the system of governance, and may be considered a source of efficiency, even as it can be considered an expansion of the scope of its work across programs appropriate to privacy concerns. This includes reaching back into some of the more challenging tasks that such an office may face, such as the assessment of existing data for potential privacy concerns.
Many of these patterns are evident in the organizational changes, and the changes in roles and responsibilities, noted above. They are also evident, however, in the growth and types of privacy reviews undertaken by the office. Table 2.5 shows the number of privacy assessments by type of privacy review. Overall, contracts with vendors, acquisitions, and IT projects receive the most assessments. Moreover, from 2017–18 to 2018–19 the number of assessments for acquisitions increased, and the shares of all assessments devoted to acquisitions and to contracts both rose, even though the total number of assessments fell. While multiple factors could contribute to this shift, from a transaction cost economics perspective it may reflect the city’s effort to gradually provide more safeguards for these two types of activities, which plausibly involve higher complexity and privacy risks than the others.
Type of review | 2017–18 assessments | 2017–18 percentage | 2018–19 assessments | 2018–19 percentage | Total 2017–19 assessments | Total 2017–19 percentage |
---|---|---|---|---|---|---|
Acquisitions | 153 | 14.87% | 229 | 40.18% | 382 | 23.89% |
Contracts | 225 | 21.87% | 191 | 33.51% | 416 | 26.02% |
IT projects | 324 | 31.49% | 90 | 15.79% | 414 | 25.89% |
Other | 257 | 24.98% | 30 | 5.26% | 287 | 17.95% |
Survey/form | 19 | 1.85% | 19 | 3.33% | 38 | 2.38% |
Open data | 51 | 4.96% | 11 | 1.93% | 62 | 3.88% |
Total | 1029 | 100.00% | 570 | 100% | 1599 | 100.00% |
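Because the total number of assessments fell from 1,029 in 2017–18 to 570 in 2018–19, a review type's share can rise even when its count falls. The short Python sketch below recomputes the percentages in Table 2.5 from the table's own counts so that the two effects can be read side by side; the dictionary and function names are illustrative.

```python
# Assessment counts by type of review, transcribed from Table 2.5.
counts_2017_18 = {"Acquisitions": 153, "Contracts": 225, "IT projects": 324,
                  "Other": 257, "Survey/form": 19, "Open data": 51}
counts_2018_19 = {"Acquisitions": 229, "Contracts": 191, "IT projects": 90,
                  "Other": 30, "Survey/form": 19, "Open data": 11}


def shares(counts):
    """Percentage of the period's total assessments for each review type."""
    total = sum(counts.values())
    return {kind: 100 * n / total for kind, n in counts.items()}


before, after = shares(counts_2017_18), shares(counts_2018_19)
for kind in counts_2017_18:
    change = counts_2018_19[kind] - counts_2017_18[kind]
    print(f"{kind:<13} count change {change:+4d}   "
          f"share {before[kind]:6.2f}% -> {after[kind]:6.2f}%")
```

Run on these counts, the sketch shows acquisitions rising in both count (153 to 229) and share (14.87% to 40.18%), while contracts fall in count (225 to 191) but rise in share (21.87% to 33.51%).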
Table 2.6 shows the number of technologies reviewed under the surveillance ordinance, by city department. Notably, of all 912 technologies reviewed, only eight were determined to be surveillance technologies; Table 2.7 lists them. While Seattle City Light (the city’s publicly owned electric utility) and the IT department had the largest numbers of technologies reviewed, the police department had the most surveillance technologies (six of eight). The large gap between the number of technologies reviewed and the number identified as surveillance technologies shows that the surveillance ordinance serves not only as an assessment tool for a large collection of technologies but also as a screening tool that can substantially reduce review time by narrowing detailed assessment down to a small selection of technologies.
Department (count, share of period total) | 2017 Q4 | 2018 Q1–Q4 | 2019 Q1–Q4 | 2020 Q1–Q4 | 2021 Q1 | Total |
---|---|---|---|---|---|---|
Seattle City Light | 28 (19.31%) | 40 (17.47%) | 48 (17.52%) | 39 (19.02%) | 8 (13.56%) | 163 (17.87%) |
IT dept. | 45 (31.03%) | 41 (17.90%) | 41 (14.96%) | 28 (13.66%) | 4 (6.78%) | 159 (17.43%) |
Seattle police dept. | 14 (9.66%) | 38 (16.59%) | 31 (11.31%) | 21 (10.24%) | 5 (8.47%) | 109 (11.95%) |
Seattle public utility | 15 (10.34%) | 23 (10.04%) | 27 (9.85%) | 26 (12.68%) | 10 (16.95%) | 101 (11.07%) |
Transportation dept. | 5 (3.45%) | 21 (9.17%) | 29 (10.58%) | 19 (9.27%) | 6 (10.17%) | 80 (8.77%) |
Citywide | 0 (0.00%) | 7 (3.06%) | 16 (5.84%) | 16 (7.80%) | 7 (11.86%) | 46 (5.04%) |
Other | 38 (26.21%) | 59 (25.76%) | 82 (29.93%) | 56 (27.32%) | 19 (32.20%) | 254 (27.85%) |
Total | 145 | 229 | 274 | 205 | 59 | 912 |
Department | Reviewed items | Year | Quarter |
---|---|---|---|
Seattle police dept. | SmartForce, BulletinWizard for Retail Theft | 2017 | 4 |
IT dept. | Seattle IT Visitor Registration System | 2017 | 4 |
Transportation dept. | Seattle’s Safest Driver Competition Mobile App | 2017 | 4 |
Seattle police dept. | Body-Worn Video Program | 2017 | 4 |
Seattle police dept. | Photo Enforcement Program | 2017 | 4 |
Seattle police dept. | UFED Premium Software Upgrade | 2019 | 2 |
Seattle police dept. | Black Bag Forensic Software | 2019 | 3 |
Seattle police dept. | Seattle Justice Center Interview Room Camera Replacement | 2021 | 1 |
City and Public Interaction and Public Attitude Change
The city’s work has elicited increasing interest and participation from the community as it has addressed matters of community concern, such as uses of technology by the police and transportation departments that raise concerns about civil liberties. Table 2.8 shows the number of public meetings held by different departments or groups from 2018 to 2020. Before 2019, public meetings were used only to gather public comments on surveillance technologies and were held by the department responsible for introducing the new technology. In 2019, regular monthly meetings were held by the Surveillance Advisory Working Group, which includes members from the public and private sectors, academia, and communities. Beyond meetings on newly acquired surveillance technologies, there were also public meetings designed to raise public awareness of information privacy, such as Data Privacy Day and public workshops delivered at the Seattle Public Library.
Departments or groups | 2018 | 2019 | 2020 |
---|---|---|---|
Police dept. | 3 | 0 | 0 |
Transportation dept. | 2 | 0 | 0 |
Fire dept. | 2 | 0 | 0 |
Surveillance Advisory Working Group | 0 | 10 | 3 |
Seattle privacy office | 0 | 1 | 2 |
IT department | 0 | 0 | 2 |
To explore attitudinal change toward privacy among the general public, we examined the City of Seattle’s 2013 and 2018 Technology Access and Adoption Surveys. The surveys were conducted by the City of Seattle IT department to learn about residents’ use of and attitudes toward information and communication technologies, such as computers and the Internet, cable TV, and mobile phones. Table 2.9 presents the number of respondents with and without privacy concerns about high-speed internet, stratified by age and income group. Overall, only 32 percent of respondents expressed privacy concerns in 2013, whereas 70 percent did so in 2018, indicating a substantial increase in privacy awareness among the general public. In terms of demographic differences, baby boomers and the middle-income groups ($25–75K) had the highest percentages of respondents with privacy concerns in both 2013 and 2018. However, millennials (aged 22–37) and the highest income group ($100K+) showed the greatest increases in privacy concerns from 2013 to 2018.
Group | 2013: respondents without privacy concern | 2013: respondents with privacy concern | 2018: respondents without privacy concern | 2018: respondents with privacy concern | % Increase of respondents with privacy concern |
---|---|---|---|---|---|
Age | | | | | |
Millennials (aged 22–37) | 625 (75%) | 206 (25%) | 290 (37%) | 504 (64%) | 156% |
Gen X (aged 38–53) | 498 (67%) | 243 (33%) | 416 (33%) | 843 (67%) | 104% |
Baby Boomers (aged 54–72) | 467 (59%) | 325 (41%) | 287 (24%) | 925 (76%) | 86% |
Income | | | | | |
<$25K | 205 (66%) | 108 (35%) | 194 (34%) | 385 (67%) | 93% |
$25K to <$50K | 195 (64%) | 110 (36%) | 136 (28%) | 357 (72%) | 101% |
$50K to <$75K | 244 (66%) | 125 (34%) | 121 (26%) | 348 (74%) | 119% |
$75K to <$100K | 233 (68%) | 112 (32%) | 135 (31%) | 295 (69%) | 111% |
$100K+ | 596 (73%) | 220 (27%) | 527 (33%) | 1062 (67%) | 148% |
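As a check on the last column of Table 2.9, the sketch below recomputes each group’s share of respondents with privacy concerns from the raw counts and expresses the change from 2013 to 2018 as a relative increase, (share in 2018 − share in 2013) / share in 2013. This reading of the column is an inference, but it reproduces the published figures (e.g., 156 percent for millennials and 148 percent for the $100K+ group). Recomputed shares may differ by about one percentage point from the rounded shares printed in the table.

```python
# Counts transcribed from Table 2.9 as ((without, with) in 2013, (without, with) in 2018).
COUNTS = {
    "Millennials (aged 22-37)": ((625, 206), (290, 504)),
    "Gen X (aged 38-53)": ((498, 243), (416, 843)),
    "Baby Boomers (aged 54-72)": ((467, 325), (287, 925)),
    "<$25K": ((205, 108), (194, 385)),
    "$25K to <$50K": ((195, 110), (136, 357)),
    "$50K to <$75K": ((244, 125), (121, 348)),
    "$75K to <$100K": ((233, 112), (135, 295)),
    "$100K+": ((596, 220), (527, 1062)),
}


def share_with_concern(without: int, with_: int) -> float:
    """Fraction of a group's respondents who reported a privacy concern."""
    return with_ / (without + with_)


def relative_increase(before: float, after: float) -> float:
    """Percentage change in a share; e.g. 0.25 -> 0.64 is a 156% increase."""
    return (after - before) / before * 100


for group, (y2013, y2018) in COUNTS.items():
    s13 = share_with_concern(*y2013)
    s18 = share_with_concern(*y2018)
    print(f"{group:28s} {s13:6.1%} {s18:6.1%} {relative_increase(s13, s18):+5.0f}%")
```

Because the increase is relative to each group’s 2013 share, groups that started from a low share (such as millennials) can show larger increases than groups that end with a higher share in 2018 (such as baby boomers).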
In summary, the city’s Privacy Program has seen significant growth, with a more structured institutional design, an expanded scope of work, and more active public engagement. However, because the program has existed for only five years, most of these observations reflect short-term patterns; long-term patterns and outcomes remain to be examined in future research.
Conclusion and Future Research
Through the case of the City of Seattle, this chapter suggests the extent to which municipal governments can evolve systems of governance to address the external effects of the technologies they deploy, and can do so in constructive iterations with the public they serve. The GKC framework provides an organizational mapping tool and a structured narrative that help break down the complex interactions and rule configurations within these systems and allow comparison among governance systems for urban data and technologies. Through the analysis, this study found two aspects unique to the governance of urban data that require a different research design from studies of conventional natural resources. First, the study of urban data governance must cover a broader array of action situations, because any one of many possible actors or action situations could prove to be a weak link that undermines the municipality’s collective effort to preserve citizens’ privacy. Second, it is important to examine urban data governance from a longitudinal perspective, given the rapid change of technology and the evolution of related laws and policies.
Ostrom (2011) distinguished among the concepts of framework, theory, and model. A framework defines the boundary of the studied system and maps each component within it. Based on the framework, a theory proposes relationships between the selection of components and the outcomes of the system. A model focuses on a more specific issue and tests hypotheses generated from the theory. This study demonstrates the effectiveness of the GKC as a framework for mapping the systems and policies of information governance for data privacy in the City of Seattle. A potential direction for future research is to build on this study and further examine the effectiveness and efficiency of alternative forms of governance based on transaction cost economics (TCE). In particular, one TCE hypothesis is that transactions involving greater asset specificity carry increased risk to one or more parties to the transaction, and possibly to third parties in the case of externalities, which calls for more safeguards and possibly hierarchical governance to minimize ex post transaction costs. In the context of privacy, information with higher privacy risks would thus require stronger rules and enforcement characteristics for its governance. Transaction risks can be measured either qualitatively, using the “contextual integrity” approach (Nissenbaum 2004), or quantitatively, using k-anonymity (Sweeney 2002), and governance forms can be categorized by the seven categories of rules-in-use (Ostrom 2011) or by incentive intensity, administrative controls, adaptation, and contract law (Williamson 2000).
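Because k-anonymity is proposed above as a quantitative measure of privacy risk, a brief illustration may help. A table is k-anonymous (Sweeney 2002) when every record shares its quasi-identifier values (attributes such as ZIP code, birth year, and sex that can be linked to outside data) with at least k − 1 other records. The minimal Python sketch below computes the largest k a table satisfies as the size of its smallest quasi-identifier group. The records and column names are hypothetical, and deciding which attributes count as quasi-identifiers is a judgment the sketch does not make.

```python
from collections import Counter
from typing import Iterable, Mapping, Sequence


def k_anonymity(records: Iterable[Mapping[str, str]],
                quasi_identifiers: Sequence[str]) -> int:
    """Return k: the size of the smallest group of records that share the
    same values on all quasi-identifier columns."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    if not groups:
        raise ValueError("no records")
    return min(groups.values())


# Hypothetical toy records; not drawn from any City of Seattle dataset.
rows = [
    {"zip": "98101", "birth_year": "1980", "sex": "F", "complaint": "noise"},
    {"zip": "98101", "birth_year": "1980", "sex": "F", "complaint": "parking"},
    {"zip": "98103", "birth_year": "1975", "sex": "M", "complaint": "noise"},
    {"zip": "98103", "birth_year": "1975", "sex": "M", "complaint": "graffiti"},
    {"zip": "98105", "birth_year": "1990", "sex": "F", "complaint": "noise"},
]
print(k_anonymity(rows, ["zip", "birth_year", "sex"]))  # prints 1: one record is unique

# Generalization: truncate ZIP codes to three digits and drop birth year.
coarse = [{**r, "zip": r["zip"][:3]} for r in rows]
print(k_anonymity(coarse, ["zip", "sex"]))  # prints 2
```

Coarsening the quasi-identifiers raises k from 1 to 2 in this toy example, at the cost of less detailed data, which is the privacy–utility trade-off that stronger governance rules would have to adjudicate.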
Introduction
Governments around the world have embraced technology deployments intended to make cities, communities, and governments smarter. Yet the wealth of data that governments and communities gather is often left in databases or simply released as files on an online portal. The commitment to this simplified, portal-based data sharing has led some to question whether governments have data management plans to help decide what data should be widely open and publicly available in a highly connected, always-available online world where data is easy to collect but difficult to delete or challenge.
To examine approaches to managing government-held data in a structured and detailed way, this chapter applies the Governing Knowledge Commons (GKC) framework to analyze open government data (OGD) as a knowledge resource (Frischmann, Madison, and Strandburg 2014). Specific to our case, we argue that a framework that considers data as belonging to the digital commons of intellectual and cultural resources is an appropriate and useful tool for considering open government data. As Divya Siddarth and E. Glen Weyl (2021) point out:
Data lends itself especially well to a commons framework: both inputs and impacts are fundamentally shared, distributing access to these resources provides a foundation for further bottom-up innovation and technological progress, siloing or privatizing these erodes the possibility of stewarding collective benefit. Together, they form a shared layer necessary for economic growth and democratic participation.
They go on to argue for a paradigm shift in the way we think about technology and economic power:
Many core technologies increasingly resemble resources like air, water and a habitable earth – resources that are expected to be accessible to, and managed in the interest of, all members of society. In other words – a commons-based resource. Taking inspiration from the Ostromian approach, shifting our view to see technology as a digital commons could begin to create more flexible, responsive and regenerative systems to build and deploy technology.
Digital data is an essential part of the technology ecosystem. It has become a ubiquitous, essential resource that drives economic and technological development and, as such, it needs to be managed in the interest of all. In some instances, data likely should be considered a digital public good: a nonrivalrous and nonexcludable resource that can be created and distributed for free or at low cost without being depleted (Digital Public Goods Alliance n.d.; Greenstein 2013). To reconsider data and related technologies in the hands of governments as digital public goods that belong to the commons, the conversation must address the challenges and responsibilities of collecting and managing OGD.
Many of the digital public good conceptualizations are already captured at the federal level in the United States. For example, the federal circular entitled “Managing Federal Information as a Strategic Resource” summarizes:
Federal information is both a strategic asset and a valuable national resource. It enables the Government to carry out its mission and programs effectively. It provides the public with knowledge of the Government, society, economy, and environment – past, present, and future. Federal information is also a means to ensure the accountability of Government, to manage the Government’s operations, and to maintain and enhance the performance of the economy, the public health, and welfare. Appropriate access to Federal information significantly enhances the value of the information and the return on the Nation’s investment in its creation.
Simply put, data and information are valuable for many reasons in the government’s use of smart and innovative technologies. In addition to driving technological innovation, open data can increase governmental accountability and efficiency; it is one of the drivers behind the move toward improved decisions, equity, and progress. Yet much of the data held by governments remains in a troubling state. Because open data mandates are often unfunded, portals receive little attention beyond basic technical implementation. One might argue that the government’s commitment to “open data” has largely failed, particularly when it comes to achieving impacts beyond access to information (Attard et al. 2015).
Open Data and Open Government
Underlying the idea of open government is a long-standing, well-researched tradition surrounding the right to know as a US constitutional principle (Schudson 2018). Even though the principles of open government often need to be reconciled with certain information restrictions, US administrations tend to publicly uphold and even expand their commitment to these principles. As President Obama wrote in his Executive Order “Making Open and Machine Readable the New Default for Government Information”: “My Administration is committed to creating an unprecedented level of openness in Government. We will work together to ensure the public trust and establish a system of transparency, public participation, and collaboration. Openness will strengthen our democracy and promote efficiency and effectiveness in Government” (The White House, Office of the Press Secretary 2013).
The goal of this and similar policies is to promote economic growth and to improve trust by opening government workings to scrutiny. As the executive order suggests, the first step toward building an efficient government is to open up access to the information and data used in its decision-making. The technology community has also recognized the value of data for government transparency and for innovation in citizen services and, accordingly, produced the Open Government Data Principles (“The 8 Principles of Open Government Data” 2007), which state that government data should be:
1. Complete, i.e., data that are not subject to valid privacy, security, or privilege limitations must be made available.
2. Primary, i.e., data are published as collected at the source, not in aggregate or modified forms.
3. Timely, i.e., available as quickly as necessary to preserve the value of the data.
4. Accessible, i.e., available to the widest range of users for the widest range of purposes.
5. Machine Processable, i.e., reasonably structured to allow automated processing.
6. Non-Discriminatory, i.e., available to anyone, with no requirement of registration.
7. Non-Proprietary, i.e., available in a format over which no entity has exclusive control.
8. License-Free, i.e., not subject to any copyright, patent, trademark, or trade secret regulation.
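Several of these principles can be read as a checklist. As a minimal, hypothetical sketch, the Python code below tests a catalog entry against the four principles that can plausibly be judged from metadata alone (machine processable, non-proprietary, non-discriminatory, and license-free). The `CatalogEntry` fields and the format and license sets are illustrative assumptions, not any portal’s metadata schema or an official classification.

```python
from dataclasses import dataclass


@dataclass
class CatalogEntry:
    """Hypothetical catalog record; field names are illustrative only."""
    title: str
    file_format: str            # e.g. "CSV", "PDF", "XLS"
    license: str                # e.g. "public-domain", "proprietary"
    requires_registration: bool


# Illustrative assumptions, not an official list of formats or licenses.
MACHINE_PROCESSABLE = {"CSV", "JSON", "XML", "XLS"}
OPEN_FORMATS = {"CSV", "JSON", "XML"}
OPEN_LICENSES = {"public-domain", "CC0-1.0"}


def principle_gaps(entry: CatalogEntry) -> list[str]:
    """Return the names of the principles the entry appears to fail."""
    gaps = []
    fmt = entry.file_format.upper()
    if fmt not in MACHINE_PROCESSABLE:
        gaps.append("Machine Processable")
    if fmt not in OPEN_FORMATS:
        gaps.append("Non-Proprietary")
    if entry.requires_registration:
        gaps.append("Non-Discriminatory")
    if entry.license not in OPEN_LICENSES:
        gaps.append("License-Free")
    return gaps


print(principle_gaps(CatalogEntry("Budget report", "PDF", "proprietary", True)))
print(principle_gaps(CatalogEntry("Bus stops", "csv", "public-domain", False)))
```

The remaining principles (complete, primary, timely, accessible) cannot be checked this way, since they depend on how, when, and for whom the data was produced, which is information catalog metadata often does not record.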
The transition to a new view of data that acknowledges its entrepreneurial value and treats security and privacy as crucial elements of federal government information occurred in 2016, when the US government formalized its commitment to managing data as a strategic resource. The circular “Managing Federal Information as a Strategic Resource” stated that openness and interoperability increase operational efficiencies, reduce costs, and increase public access to valuable federal information, and that making federal information discoverable, accessible, and usable “can fuel entrepreneurship, innovation, and scientific discovery that improves the lives of Americans, and contributes significantly to national stability and prosperity, and fosters public participation in Government” (Office of Management and Budget (OMB) 2016, emphasis added). In this way, the turn from openness and transparency toward unlocking the value of data while ensuring its privacy and security began to take formal shape.
The shift, however, required additional consideration of how data should be managed and handled, especially in the area of entrepreneurial activity, which involves a much larger group of actors viewing, using, and potentially exploiting the data. For example, the federal government had to formalize the expectation that each entity consider and protect individuals’ privacy throughout the data or information lifecycle. The term “lifecycle” is an important choice, because privacy and data experts use it to denote a much broader conceptualization of data than its storage in a database.
The data lifecycle is a high-level, general framework that describes how data can flow through an organization or from the point of its collection or creation to dissemination and use. Like living organisms, data can be seen as being born (created) and then going through evolution (changes) up to its archiving or disposal. In general, the lifecycle includes such stages as collection, storage, sharing, analysis, preservation, and, if needed, deletion (Ball 2012). Viewing data through the lens of the lifecycle framework helps to maximize the benefit of data, minimize its cost, and improve its quality.
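The lifecycle view can be made concrete as a simple data structure. In the hypothetical Python sketch below, the stages named in the text (after Ball 2012) form an ordered enumeration, and each dataset keeps a history recording which steward was responsible at each stage, so that responsibility is tracked alongside the data. The class, method, and steward names are illustrative assumptions; the only rule enforced is that no stage may follow deletion.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Optional


class Stage(IntEnum):
    """Lifecycle stages listed in the text (after Ball 2012), in order."""
    COLLECTION = 1
    STORAGE = 2
    SHARING = 3
    ANALYSIS = 4
    PRESERVATION = 5
    DELETION = 6


@dataclass
class DatasetLifecycle:
    name: str
    history: list[tuple[Stage, str]] = field(default_factory=list)

    def advance(self, stage: Stage, steward: str) -> None:
        """Record that the dataset entered `stage` under `steward`.
        Assumption for this sketch: any stage may follow any other, since
        real lifecycles loop back, but nothing may follow deletion."""
        if self.history and self.history[-1][0] is Stage.DELETION:
            raise ValueError(f"{self.name} has already been deleted")
        self.history.append((stage, steward))

    def current(self) -> Optional[Stage]:
        """Return the most recent stage, or None if nothing is recorded."""
        return self.history[-1][0] if self.history else None


# Hypothetical dataset and stewards.
permits = DatasetLifecycle("building permits")
permits.advance(Stage.COLLECTION, "permit office")
permits.advance(Stage.STORAGE, "IT department")
permits.advance(Stage.SHARING, "open data team")
print(permits.current().name)  # prints SHARING
```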
Lifecycle thinking helps to break down the processes of working with data into stages and to understand what types of activities and resources are needed at each stage. As such, the database as a tool for storage and retrieval is a small portion of what must be considered. Moreover, managing throughout the lifecycle encompasses not only the data or information itself but also related resources, such as people, technology, and finances (Kouper et al. 2021). The lifecycle approach is further amplified by explicit attention to privacy and security: “While security and privacy are independent and separate disciplines, they are closely related, and it is essential for agencies to take a coordinated approach to identifying and managing security and privacy risks and complying with applicable requirements” (Office of Management and Budget (OMB) 2016, emphasis added). While “coordinated” can be read as agencies working together to develop a unified approach, it should also be read as requiring agencies to understand how to manage the risks of aggregated data. For example, while the owner of one dataset may claim that individuals’ names and addresses are harmless to collect and share, once that information moves further through its lifecycle and is merged with other data, such as data about police callouts, the result might cause serious harm. As such, “coordination” means entities working together to communicate the risks and restrictions of their own data and to understand how their datasets fit into a repository full of other datasets.
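The aggregation risk described above can be illustrated with a toy example. The Python sketch below joins two hypothetical datasets, each seemingly innocuous on its own, on a shared address field; the result attaches a named individual to a police callout. All names, addresses, and fields are invented for illustration, and the join is a plain hash join, not any agency’s actual process.

```python
# Two hypothetical datasets, each published by a different owner.
residents = [
    {"name": "A. Smith", "address": "12 Elm St"},
    {"name": "B. Jones", "address": "40 Oak Ave"},
]
police_callouts = [
    {"address": "12 Elm St", "date": "2021-03-02", "type": "domestic disturbance"},
    {"address": "77 Pine Rd", "date": "2021-03-05", "type": "noise"},
]


def link_on(left: list[dict], right: list[dict], key: str) -> list[dict]:
    """Inner join of two lists of records on a shared key (a simple hash join)."""
    index: dict = {}
    for right_row in right:
        index.setdefault(right_row[key], []).append(right_row)
    return [
        {**left_row, **right_row}
        for left_row in left
        for right_row in index.get(left_row[key], [])
    ]


for row in link_on(residents, police_callouts, "address"):
    print(row)  # one merged record linking "A. Smith" to a callout
```

Because the join key appears in both datasets, neither owner can assess the risk by examining its own data in isolation, which is why coordination across data owners matters.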
Moreover, “The Federal Government shall provide members of the public with access to public information on Government websites. This responsibility includes taking affirmative steps to ensure and maximize the quality, objectivity, utility, and integrity of Federal information prior to public dissemination, and maintaining processes for addressing requests for correction of information disseminated publicly” (Office of Management and Budget (OMB) 2016, emphasis added). The language in the quote places the burden on the government entity as the owner of the website. Thus, the onus is on the government to monitor and ensure that the website meets these standards. In making decisions in all these areas, “The design of information collections shall be consistent with the intended use of the information, and the need for new information shall be balanced against the burden imposed on the public, the cost of the collection, and any privacy risks” (Office of Management and Budget (OMB) 2016, emphasis added).
At the federal level, examining the federal open government website can provide some insight into how these aspirations are accomplished and into the challenges of managing data throughout the lifecycle. The data.gov website aggregates data from government publishers under the Open Government Data Act and the Federal Data Strategy (“About This Repository | Resources.Data.Gov” 2021). Its Data Governance Playbook acknowledges that effective data governance is foundational to leveraging data as a strategic asset and tasks every agency with identifying its data assets, developing a data management plan, and providing adequate data oversight and communications (“Federal Data Strategy: Data Governance Playbook” 2020). The task, however, is not easy to implement at the portal or repository level, where data comes from multiple sources and owners.
The data.gov website is a catalog that offers the opportunity to download data from over 300,000 datasets in various formats. The catalog draws from federal, state, and local datasets, so it is a vast data resource that covers almost any aspect of government data gathering imaginable. There are seven broad categories, including agriculture, climate, and energy. In addition to federal data, the repository also catalogs data from local governments, currently across ten cities, three counties, and eight states. According to data.gov, local governments should list their data on data.gov to make local data more discoverable and to show the international open data community how active the United States is in the domain of data (admin 2015).
The website brings together, organizes, and provides a search function for a broad array of datasets. For example, when examining the area of “older adult health,” users of the website can view various datasets and then click through to access them. In the case of the dataset “Examination of Resident Abuse in Assisted Living Facilities in the United States, 2011 (ICPSR 34575),” the federal website links to the repository that provides the dataset, the National Archive of Criminal Justice Data (NACJD), which restricts access to users who have completed a Restricted Data Use Agreement and obtained approval from an Institutional Review Board (IRB), an entity that ensures the ethical conduct of research.
This brief search and review of the data illustrates that the website serves an important function, as it is a single source of search for government-gathered and government-funded data. As a catalog, though, data.gov is not responsible for the stages of data collection, description, analysis, and even, in most cases, preservation of data. It is a search and discovery interface that links to other websites. As such, one can infer that the rules of creating, accessing, and using the datasets are all controlled by the dataset hosts/sources. To assess the quality and utility of data, one must follow the links, visit the websites, and read about their procedures of data management and curation.
This design style of a centralized catalog seems to have been replicated in many other locations, across levels of government. Many US states have followed the federal example and implemented open data initiatives, although each did so in its own way, resulting in vastly different state websites. The policy landscape also varies widely at the state level: fewer than half of the states (seventeen) have a solid public policy foundation (“State Open Data Laws and Policies” 2021).
For example, in 2014, the Illinois General Assembly unanimously passed a bill (Illinois, Public Act 098-0627) intended to increase transparency, accountability, and savings in government by establishing a new State Open Operating Standard. As a result, the State of Illinois built its Open Data Portal at data.illinois.gov, which, according to its About page, “lets you find data across state, find facts about your state, lets you create maps and graphs, and lets you freely download the data for your own analysis. Many of these datasets are updated daily, and some even more often.” The website provides a user guide for those who are new to open data, but the guide offers only a basic description of how the site works and the technology behind it. The website uses CKAN, an open-source data management system that provides functions for describing and storing resources (datasets), searching for them using keywords and categories (metadata), and registering and managing user accounts. As the guide points out, an account and login are usually not needed to search for and find data but are needed for other data lifecycle functions: datasets can be created, edited, etc. by users with the appropriate permissions.
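Because the portal is built on CKAN, its catalog can in principle also be queried programmatically. CKAN exposes an Action API whose `package_search` endpoint returns matching datasets as JSON, including each dataset’s owning organization and resource formats. The Python sketch below assumes that the portal still exposes CKAN’s standard endpoint under the base URL shown, which may no longer hold; `package_search_url`, `summarize`, and `search` are hypothetical helper names, and only `search` makes a network request.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the portal serves CKAN's Action API under /api/3/action/.
# The base URL is illustrative; the portal's platform or address may change.
BASE_URL = "https://data.illinois.gov"


def package_search_url(base_url: str, query: str, rows: int = 10) -> str:
    """Build a CKAN package_search request URL for a keyword query."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base_url}/api/3/action/package_search?{params}"


def summarize(response: dict) -> list[tuple[str, str, list[str]]]:
    """Extract (title, owning organization, resource formats) for each
    dataset in a package_search JSON response."""
    if not response.get("success"):
        raise RuntimeError("CKAN request failed")
    out = []
    for pkg in response["result"]["results"]:
        # A dataset may have no organization; CKAN then returns null.
        org = (pkg.get("organization") or {}).get("title", "(none)")
        formats = [res.get("format", "") for res in pkg.get("resources", [])]
        out.append((pkg["title"], org, formats))
    return out


def search(query: str, rows: int = 10) -> list[tuple[str, str, list[str]]]:
    """Fetch and summarize live results (requires network access)."""
    with urlopen(package_search_url(BASE_URL, query, rows), timeout=30) as resp:
        return summarize(json.load(resp))
```

The organization returned for each dataset names an account, not the people or processes behind the data, which is the attribution problem discussed for this portal below.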
As such, the website seems to create a three-tier system of interacting with data throughout its lifecycle. Anonymous use of the first tier allows one to view and sometimes download the data without registration. Creating an account enables the second tier with limited and likely specific use, such as uploading data. The third tier requires both registration and additional permissions, most likely, to edit other datasets and perform other functions. In addition, and similar to the federal website, the datasets are typically owned by the organization that created the dataset.
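The three tiers can be expressed as a cumulative permission model. The Python sketch below encodes the tiers as an ordered enumeration and maps each action to the minimum tier that permits it. The mapping is inferred from the description above and is hypothetical; it is not the portal’s actual CKAN authorization configuration.

```python
from enum import IntEnum


class Tier(IntEnum):
    ANONYMOUS = 0    # no account: view and sometimes download
    REGISTERED = 1   # account: limited actions such as uploading data
    AUTHORIZED = 2   # account plus extra permissions: edit other datasets


# Hypothetical minimum tier per action, inferred from the text.
REQUIRED_TIER = {
    "view": Tier.ANONYMOUS,
    "download": Tier.ANONYMOUS,
    "upload": Tier.REGISTERED,
    "edit_any_dataset": Tier.AUTHORIZED,
}


def allowed(tier: Tier, action: str) -> bool:
    """Tiers are cumulative: each tier may do whatever the tiers below it can."""
    return tier >= REQUIRED_TIER[action]


print(allowed(Tier.ANONYMOUS, "upload"))      # prints False
print(allowed(Tier.AUTHORIZED, "download"))   # prints True
```

Notably, no action in this model corresponds to a data subject requesting the correction or removal of a record.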
Each of these websites provides a rich set of examples that illustrate gaps in the basic design and decision-making that went into the creation of “open” government data websites. These gaps point to collective action dilemmas that can be conceptualized as a lack of cooperation among individuals, communities, and government entities in the area of open government data. The dilemmas concern the creation, dissemination, and use of the datasets. For example, how did the dataset host determine the contents of the data and which portions of it should be open and widely available? Which government units were responsible for creating the data? Was there community engagement as part of this decision-making process? Moreover, if someone’s data has been inappropriately included, is outdated, or is otherwise inaccurate or capable of misuse, how would individuals and/or others update or otherwise challenge the data? Finally, as open data concerns both individual and community well-being and the functioning of civic society, what are the plans for long-term curation and preservation of these datasets, including their migration to newer technologies?
Both open data and open government are key to advancing government transparency and entrepreneurship, but the transparency does not seem to extend to decision-making in the creation and sharing of the datasets. Within the Illinois data portal, for example, dataset ownership is attributed to an organization, and the portal notes that it may have more than one organization (State of Illinois Data Portal 2020). “Organization” here is merely a type of account with administrative rights that allows government departments to create and publish datasets and authorize others as contributors. It seems that individuals would need to find out who is behind such an account and interact with each organization that holds the original data (prior to its being added to the larger dataset). This is a short-sighted design decision, as it requires individuals to contact every organization separately to have data edited, updated, or removed.
The portal itself provides no description of the mechanisms of data gathering, of the decision-making that determines what data is “open,” or of how to challenge such a decision. There appears to be no mechanism for an individual to ask for the removal or correction of information. There is also no description of how the government can be held accountable for providing incorrect data. The lack of such information makes it unclear how to connect openness with participatory governance structures. On one hand, individuals “contribute” the data through their interactions with the government and its record-keeping; on the other hand, the government compiles those records and, through digitization and aggregation, creates a new product. Data undergoes multiple transformations, and yet there is often limited citizen participation in the decision-making about the data itself or its aggregation.
The expectations of what is being done with data vary across the stages of data collection, sharing (publication), and access, and across the actors involved in managing data. If the community is not involved at the stage of data collection, but its data is now open for all to use, what are the implications for the management and governance of this digital resource? And, finally, what are the obstacles to sustainable sharing and cooperation around government data? The remaining sections of the chapter take up these issues, beginning with a broad description of knowledge commons governance models.
Knowledge Commons Framework and Open Government Data
Aligning the general knowledge commons framework outlined in Governing Knowledge Commons (GKC; Frischmann, Madison, and Strandburg 2014) with a generic framework for open government data and the data portals created as part of this movement generates insight into both frameworks. Doing so makes sense if one considers public data held by various levels of government as a public good generated by co-creation that yields public value (Foster and Iaione 2015; Jetzek, Avital, and Bjorn-Andersen 2019; McBride et al. 2019). As mentioned earlier, data is an evolving entity that goes through various lifecycle stages, including creation, processing, analysis, dissemination, and even destruction. The data used in an OGD portal could be collected by and from individuals and separate organizations prior to being digitally aggregated. Digitization and aggregation create new knowledge resources, which can exacerbate the collective action dilemmas mentioned earlier. Following GKC terminology, we introduce three action arenas that correspond to three stages of the data lifecycle: (1) collecting the data, (2) publishing data as digitized aggregated material on a portal, and (3) accessing and using the data.
The first action arena, collecting data, encompasses collective action at the beginning of the data lifecycle. It involves multiple participants (actors), among them citizens and government officials, who have various levels of agency, responsibility, and power. Governments collect data as a means to keep records of government activities, such as notes from committee meetings and financial planning. Government also collects data as a function of being an organizing entity of citizens. In most of these situations, citizens are required to submit their data to a government for various reasons: registering to vote, securing a driver’s license, purchasing or renting a property, etc. Government employees also play a role in the creation of data through research, observation, and statistics gathering (Boettcher and Dames 2018).
Within this action arena, government entities have been collecting records and data for considerable periods of time. Many of these datasets existed in analog form and have been converted to digital form. This conversion presents new challenges, especially for citizen-generated data, because those who provided information collected in analog form may have presumed it to be less “open” and subject to specific rules of access. Data collection continues in parallel with the next action arena, data publishing, as governments continue to require data from citizens and third parties collect information from citizens’ devices to make their own software more user-friendly. In the latter case, data collection often occurs without citizens’ knowledge or consent.
The second action arena concerns the publishing, sharing, or dissemination stage of the data lifecycle, which could range from a simple storage-and-discovery model to more complicated models that involve processing, analytics, and visualization (Shah, Peristeras, and Magnisalis 2021). This arena requires an examination of actions around how and when to publish government data. First, it is important to recall that government-generated data falls broadly into the following categories: data generated by the governmental entity as a byproduct of its activities; data gathered by the government about its citizens, their activities, and/or their need to interact with the government entity; and data collected as part of the government’s research and statistics efforts. The last category is the most documented area of action, and it is mostly overseen by the federal government, so it is not examined in detail here.
For the first two categories, some local governments are trying to be active and transparent about how they make publishing decisions, or at least to provide guidance for those who wish to publish data. For example, the Nashville Open Data Portal contains links to resources that could be useful for data publishers and for members of the public who wish to know how this local government makes data publishing decisions. At the time of writing this chapter, however, the links were defunct, as they did not lead to any relevant materials, such as a data policy or open data documentation.Footnote 1
Through open data policies that encourage citizen contributions and feedback, citizens could become involved in deciding what data should be published, how, and when. The existence of such a policy creates a commitment to comply with federal and state laws. Unfortunately, in many instances of creating an OGD portal, as briefly described earlier and explored further below, the design process seems to omit citizen cooperation and participation. Instead, legislators and government officials become the prime co-creators of such portals by incentivizing or enforcing portal creation efforts. Third parties, whether nonprofit or commercial, also become participants in OGD portal creation, as these organizations provide governments with the technological infrastructure to support these portals.
The third action arena involves data access and use. The protocols for accessing data differ according to whether the data is available in analog format or on a data portal. If the government data is only available from an analog source, those who request the data must often go in person to the appropriate office and fill out forms declaring who they are and why they need this information. There are thus greater obstacles that must be overcome to acquire these public records.
Digitally accessible data, such as the data on an OGD portal, is often much easier to acquire. Anyone can create an account and use it to access records that are available online. Such ease of access gives the impression that greater access to data means that greater numbers of citizens can participate in this action arena. This would be true, of course, only if participation were measured solely by the number of people who access digitally available OGD and view or download the data. This narrow understanding of participation excludes the creation and meaningful use of data. Disregarding meaningful use shifts the narrative of the OGD commons toward data access and dissemination (passive availability) rather than use (active co-production).
Bloomington Open Data Portal
To further explore the issues discussed above, we use the city of Bloomington, Indiana, and its open data portal as a case study. As we consider open data portals to be instantiations of digital commons, we assume that their design and governance would support cooperation and community participation and at least some forms of communal ownership, co-creation, and use. To test our assumptions, we apply the GKC framework and its concepts and guiding questions to this specific case to understand the actions around the portal and their patterns and outcomes. Our approach involved a close examination of the portal (extensive searching, browsing, reading, and note-taking on the portal materials until no new information could be inferred) and an analysis of one dataset. For the latter, we downloaded the data and performed an exploratory analysis and visualization using R statistical packages.
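The exploratory analysis described above was performed in R; the Python sketch below is only a rough analogue of the kind of first-pass profile such an analysis might begin with, and not the code used in this study. It counts rows, counts missing values per column, and tallies the most frequent values of one categorical column. The sample rows and the column names (`address`, `units`, `landlord_zip`) are hypothetical and are not drawn from the B-Clear rental properties dataset.

```python
import csv
import io
from collections import Counter


def profile(csv_text: str, category_column: str, top: int = 3) -> dict:
    """First-pass profile of a CSV export: row count, missing (blank)
    values per column, and the most frequent values of one column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    columns = reader.fieldnames or []
    # Short rows yield None for trailing fields, so treat None as blank.
    missing = {c: sum(1 for r in rows if not (r[c] or "").strip()) for c in columns}
    counts = Counter(r[category_column] for r in rows)
    return {"rows": len(rows), "missing": missing, "top": counts.most_common(top)}


# Hypothetical sample, not real B-Clear data.
sample = """address,units,landlord_zip
101 Main St,4,47401
22 Elm St,,47404
9 Oak Ave,2,47401
"""
print(profile(sample, "landlord_zip"))
```

A profile like this is typically where questions about data quality first surface (for example, which fields are often left blank), before any visualization is attempted.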
We selected Bloomington for several reasons: it is an example of a local government committed to open data; it has a community of entrepreneurial, technology-driven actors; and its citizens are drawn from a large university environment, so they are somewhat predisposed to be proactive about data creation and use. Moreover, selecting a local OGD portal allows us to compare local initiatives with federal and state efforts. As shown earlier, federal data websites often serve as aggregators of datasets, with data collection and ownership distributed among many organizations. The city, on the other hand, gathers the most specific, local, and granular data, which makes it an important actor in government-driven data gathering. At the city level, the rules of data collection, ownership, and access may be different. In this section we examine one such local government actor and its data gathering through the Bloomington Open Data Portal, B-Clear.
To accomplish this goal, we follow OGD through the three previously identified action arenas of its lifecycle: (1) collecting data, (2) publishing data, and (3) accessing data. The GKC framework prompts us to ask questions about the background environment of this data commons, the attributes of the resource, the governance strategies (including relevant action arenas) for managing the data commons, and the patterns of interaction and outcomes that emerge from each action arena. To build this case study, we first address the background environment and the attributes of the resource. Second, we turn to governance strategies by examining a specific B-Clear dataset, "Rental Properties," as an example of the granular attributes one might find in a data resource. Finally, we make observations about governance strategies and about patterns of interaction and outcomes.
The City of Bloomington and the Open Data Portal
The city of Bloomington is in south-central Indiana, roughly fifty miles southwest of Indianapolis. More than 85,000 people reside in Bloomington, including the students who attend Indiana University, making it the seventh-largest city in Indiana by population (Cubit 2021). The data collected and published by the city of Bloomington reflects these demographic and biophysical attributes. For instance, nuisance complaints and bus rides increase during the academic year, making the relevant data a more prominent part of the portal homepage.
We consider the open government data portal in Bloomington to be a type of data commons. The background environment for this commons comprises three layers: (1) the city of Bloomington, (2) the city's Information & Technology Services Department (ITS), which manages and oversees digital and smart initiatives, and (3) the city's open data portal, B-Clear, which ITS created. As with the websites briefly described earlier, neither the portal nor any other easily accessible source makes clear what policies guide the creation of the portal, its structure, the definition of "open," or the mechanisms that would allow citizens to challenge the "open" nature of data. The metadata of the datasets within the portal shows which entity uploaded each dataset: individual employees of the ITS administration. One can therefore reasonably assume that ITS implements policy and ensures compliance with applicable law, but this is not explicitly stated anywhere.
Since the data portal is open, any member of the community can use it, including local politicians, city employees, citizens, and others who seek to ensure that government operates transparently. However, usage statistics and other engagement metrics are not published on the portal. One can assume, for example, that the city and its employees use some of the data to make decisions about city services and the functioning of government. The data could also be used in an entrepreneurial manner by those who seek to do business with the city or who want to use city-driven data to build apps, provide services, or otherwise build business environments based on data-driven decision-making, for example, Lyft ride rentals and Lime scooters. Evidence of these uses did not appear in the searches we performed. The only active use of the data that our examination confirmed was the analysis that local journalists have done on topics relevant to life in the city (Askins 2021).
Open Data Portal and Its Data Resources
B-Clear, which stands for "Bloomington Clear, or Be Clear about what we're up to," was launched in January 2016 at data.bloomington.in.gov as part of the mayor's campaign to "open the doors of city hall, both figuratively and literally" (Banta 2016). The portal was envisioned to be a source that can inform and educate people about what is going on in city government. As the website explains: "B-Clear is a one-stop place to build an ever-growing assembly of useful data. We're organizing it as open, accessible data so everyone can see and use it and manipulate it."
It is important to note two distinct functions of the Bloomington Open Data Portal. First, the portal seeks to provide transparency and openness regarding governmental (and similarly situated) institutions and activities, such as city finances and the zoning commission. Second, the portal opens access to, and use of, data generated by government entities in the performance of their functions. This directly matches the federal commitments mentioned above, namely, to be a place of open data for the purposes of both governmental transparency and entrepreneurial use.
Yet this dual commitment makes it difficult to establish boundaries and define communities. Who is the producer of the data resources: the government or the individuals who supply their records? Similarly, the data can be used by many actors, including government employees, citizens, businesses, and so on. The absence of a well-defined user community and of clear boundaries between producers and consumers makes it difficult to define rules of access and sharing and to evaluate contributions. On the one hand, federal and state mandates require local governments to provide open data to citizens; on the other, citizens who use the data, especially for business and entrepreneurship, may become, in a way, "free riders" who pursue selfish interests rather than cooperate for the public good. Citizens can also act "selfishly" by withdrawing their information from the records pool, reducing the quality of the aggregated data.
So far, this emerging conflict has been potential rather than actual, as it is not clear how much of the published data is actually used. A lack of understanding of the portal's users and communities did not deter city employees from publishing to the portal. As city departments pooled their data resources on the shared infrastructure of B-Clear, each department was expected to contribute by making some data available. As of June 22, 2021, the B-Clear portal hosted 219 datasets, ranging from transportation to government finance.
The main page of the B-Clear website provides information about the datasets and about the organizations contributing them (defined as CKAN Organizations, or accounts, as discussed earlier). A "group" function allows datasets and information about the portal to be grouped into categories. A separate Google document linked from the website explains "what makes a good dataset" and how to upload data. The lack of attention to users and communities is evident in the absence of information for citizens whose data is captured in the various datasets. There is no explanation of what data is shared or of the authority that determines publishing rules and restrictions, and there is no means to request changes to, or challenge, wrong or inaccurate data.
Moreover, an examination of the portal reveals that some datasets "available" on B-Clear are hosted or managed by third parties, adding a third layer of control and participation. For example, the CityProtect service links to an external website (cityprotect.com.map) that aggregates and visualizes crime data for Bloomington, IN. The visualization, which includes some granular data such as the locations of specific crimes, is available without any barrier, agreement to terms of use, or other commitment regarding the use of the visualization or the data it contains. The data behind the visualization is updated regularly using information gathered by the Bloomington Police Department.
A Specific Dataset: Rental Properties
Background
Examining the history of one dataset that was made available through the B-Clear data portal allows us to understand the granular nature of a data resource in an OGD portal. The dataset is called “Rental Properties,” and it includes information about all properties that have been registered with the city of Bloomington as rental properties. The dataset was created in 2018 and is regularly updated. In April 2020, we downloaded the rentals dataset to conduct an exploratory analysis and examine what kind of information the dataset contains and what implications, both positive and negative, it may have for citizens.
The portal interface provides an activity stream for each dataset, which allows anyone to see who uploads and updates the datasets. The rentals dataset was created by a user whose account information provides few details: we know the user's name, but nothing else, such as their position, job title, or contact information. At least five people besides the author have been involved in updating the dataset. These people are primarily associated with the ITS department of the city of Bloomington, although some are also associated with Indiana University. An Internet search revealed that at some point these users were university students who worked as interns or participated in the "Summer of Code" program organized by the city.
Dataset Attributes
The dataset is available in comma-separated values (CSV) format. It contains twenty columns that provide information about properties, their owners, and agents who may represent the owners, including the following: (1) property ID, (2) date the property was registered, (3) date the rental registration permit expires, (4) address of the property, (5) owner's name, (6) owner's address, including street address, city, state, and zip code, (7) name of the agent representing the owner, (8) agent's address, including street address and zip code, (9) type of building in which the rental property is located, (10) type of the property itself, (11) number of bedrooms, and (12) maximum occupancy allowance. Many columns (fourteen out of twenty) have gaps in the data, leading us to conclude that the rental property owners who filled out the registration forms were probably not required to provide all the information. Anyone who visits the portal can download the full dataset and access all of this information.
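Counting gaps of this kind is mechanical once the file is downloaded. The Python sketch below is only an illustration (the authors worked in R, and this is not their code): it counts empty values per column of a CSV file. The column names and the three sample rows are hypothetical placeholders, not the actual B-Clear schema or records.

```python
import csv
import io

# Hypothetical excerpt: column names and values are illustrative
# assumptions, not the actual B-Clear schema or records.
SAMPLE_CSV = """property_id,registered_date,owner_name,owner_zip,agent_name,bedrooms
1001,2018-03-01,Owner A,47401,,2
1002,2018-04-15,Owner B,,Agent X,
1003,2019-01-20,Owner A,47401,,3
"""

def missing_counts(csv_text):
    """Return {column: number of rows whose value is empty or absent}."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = {name: 0 for name in reader.fieldnames}
    for row in reader:
        for name in reader.fieldnames:
            value = row.get(name)
            if value is None or value.strip() == "":
                counts[name] += 1
    return counts

gaps = missing_counts(SAMPLE_CSV)
columns_with_gaps = [name for name, n in gaps.items() if n > 0]
print(gaps)
print(f"{len(columns_with_gaps)} of {len(gaps)} columns have gaps")
```

Applied to the full file, the same tally would show which fields registrants tend to leave blank, which bears directly on the dataset's quality.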
In what follows, we discuss the Rental Properties dataset in the context of the three action arenas: (1) collecting the data, (2) publishing the data, and (3) accessing the data.
Action Arena One: Collecting the Data
In examining the action arenas, we seek to consider: "(1) participants (who may be either single individuals or corporate actors), (2) positions, (3) potential outcomes, (4) action-outcome linkages, (5) the control that participants exercise, (6) types of information generated, and (7) the costs and benefits assigned to actions and outcomes" (Ostrom 2005, 32–68). In the analysis here we focus on participants and discuss outcomes in a separate section below.
The participants in our first action arena, "collecting the data," are the owners of rental property in Bloomington, the city of Bloomington, and the city's department of Housing and Neighborhood Development (HAND). Most of the information in the "Rental Properties" dataset is collected from property owners. These owners, with the exception of Indiana University, are required by the city of Bloomington to register their property with HAND, which also inspects the property. The data is collected through this registration process: every property owner has to fill out a registration form available on the city website.
To register, property owners use a fillable PDF form, with mandatory fields for the property address, owner’s name and address, and some additional information about the property. Neither the form nor the website contains statements about whether the owner’s information is intended to be publicly available. There is no description of mechanisms by which owners can request that this information not be shared. Thus, it seems that control over this information lies largely in the hands of the city.
Rental property procedures are governed by Title 16 of the Municipal Code, which states that the owner is required to register their rental unit (City of Bloomington IN 1998, Ordinance No. 21-32:16); such registration requires the owner to provide HAND with the name, address, phone number, and email address of the owner(s), and the name of an in-state agent if the owner lives outside of Indiana. The title also contains a section titled "Disclosure" that requires the owner to keep this information current. Title 16 contains no other statement regarding whether the owner's name or address may be shared publicly or on request.
Action Arena Two: Publishing the Data
The same groups of actors participate in the second action arena, "data publishing," but their actions concern data that has already been collected and processed into a standardized digital format, e.g., a spreadsheet. At this stage of the data lifecycle, participants decide which data resources are published, when, how often they are updated, and who is responsible for data quality (i.e., accuracy, timeliness, and completeness). Guidance on these decisions is simply not found on the portal or on any easily accessible city of Bloomington website.
It almost seems that both data producers and consumers are expected to be familiar with the various legal requirements and best practices that guide this decision-making, especially regarding critical and sensitive data. At the same time, some of the data in this dataset may fit the definition of "Protected Information" and therefore should not be openly available, as set out, for example, in the Open Data Policy of Nashville, TN (City of Nashville TN 2016):
Data shall not be Open Data if it meets the definition of Protected Information. … “Protected Information” means, but is not limited to (i) all confidential or restricted information, as defined in the Information Classification Policy, or any record or portion thereof, disclosure of which could be denied under the TPRA or other law; (ii) any record that contains a significant amount of confidential information … (iii) any record that reflects the internal deliberative or administrative process(es) of any Department … (iv) any record subject to privacy laws, or to copyright, patent, trademark or trade secret protection, or that are otherwise protected by law or contract.
This is an extremely broad and comprehensive policy, but it illustrates well the wide-ranging concerns that can arise when publishing data in an open format. It also shows that open data policies are connected to other policies, such as the Information Classification Policy or the Tennessee Public Records Act (TPRA). Many published datasets contain data that is personal or that, in aggregate, can become highly sensitive. One can therefore argue that the broadest possible policies must be in place when considering publication in an open environment. Such policies should not only define and regulate protected information, but also address potential harms from aggregation, loss of privacy, and terms that regulate user-generated content.
Moreover, policy-makers need to understand that citizens whose data is captured in the open data portal may be surprised to discover, sometimes long after providing the data to the city, that it is shared in the portal environment. Much of the data "shared" was originally gathered in paper form. The aggregation of data in one place and the ease of search and discovery are unintended uses of digitized record-keeping. The decision to publish such data should therefore not be taken lightly.
Cost and effort are two other considerations that need to be factored into OGD governance. Creating quality datasets is a time-consuming and often costly process, especially if the intended use is innovation, research, or other public benefit. The Rental Properties dataset is available in a spreadsheet format, which makes it easy to process with statistical packages. However, as noted above, it has missing data that affects its quality. Many of the other datasets on B-Clear are available as PDF documents. For analysis, such data is virtually useless, and it shifts the burden of conversion onto those who seek to use the data. Without providing machine-readable data, the local government does not deliver on its second function: enabling analytical and entrepreneurial use.
Action Arena Three: Accessing the Data
The participants in our third action arena, "accessing the data," are prospective renters, HAND, and any citizens who might be interested in this information. From a review of the website, it appears that HAND maintains records of all rental properties, so one can reasonably assume that it controls the data source in this action arena. Yet it is not clear which entity converts the paper forms into a spreadsheet. Prospective renters can also visit the department and look at landlords' property files before they sign a lease. Reviewing the records allows prospective renters to verify the information that landlords have presented to them and to make sure that the landlords comply with the necessary city codes and regulations, including safety and occupancy codes.
Prior to digitization, a renter would have to fill out a public records request form, such as the one in Figure 3.1, and provide their name and address.
As with all other public records, including vital records and criminal records, requests for real estate and property records are reviewed by the city, which decides whether the request is valid and whether the data is disclosable.
Digitization changes the dynamics in this action arena. Rather than accessing individual records, interested citizens can download the whole dataset and review information on multiple property owners at once. Moreover, in addition to information on specific rental properties, one obtains each property owner's address and a list of all the properties they own. Such information potentially enables uses other than a prospective renter inquiring about a property of interest. We discuss some of these issues in the "Patterns and Outcomes" section.
The rental properties dataset had been updated as of June 27, 2021, and was available for download without registration or other restriction. Services that offer online searches of public records, such as Doxpop, a service for accessing court records (www.doxpop.com/prod/), usually require users to register before they can download data reports. Sharing digital records in aggregated form as large datasets, and making these datasets available for download without any registration, as B-Clear does, allows anyone to obtain not only the specific records of interest, such as those of a particular landlord, but also the records of all landlords, including their personally identifiable information (PII).
The downloader does not have to provide any information about themselves, and HAND makes no determination as to whether the information may be disclosed to them. This situation suggests that while HAND maintains the records regarding rental properties, it (1) no longer directly interacts with the prospective renter and (2) no longer exercises governance over whether the information may be disclosed. In an OGD portal, control of this information seems to have been transferred to its user. No governance structure applies to this or other B-Clear datasets once they are digitized and aggregated, and users are free to use the resource however they wish.
Patterns and Outcomes
Like other data resources on the portal, the Rental Properties dataset provides a glimpse into city life. In this case, we can learn, for example, how many properties are registered as rentals in Bloomington (28,739) and where the owners come from (21,888 from Indiana, 3,938 from Illinois, and 756 from California).
The dataset may provide useful insights for residents, visitors, or business owners about the geographical distribution or density of rental properties. By geocoding the property addresses, one can create a map that shows which zip code areas have a higher concentration of rental properties. As Figure 3.2 shows, the most concentrated area of rentals is at the bottom right corner of Bloomington with the zip code 47408. Not surprisingly, Indiana University is located near that area, and the students who attend the university are one of the largest groups of people that rent houses and apartments in Bloomington.
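Figures such as the owner-state counts above, and the zip-code concentration behind a map like Figure 3.2, begin with simple group-and-count operations. The Python sketch below illustrates that step on invented rows; the column names `property_zip` and `owner_state` are assumptions, and geocoding and map drawing are outside its scope.

```python
import csv
import io
from collections import Counter

# Hypothetical toy rows; real column names and values will differ.
SAMPLE_CSV = """property_id,property_zip,owner_state
2001,47408,IN
2002,47408,IL
2003,47401,IN
2004,47408,IN
2005,47403,CA
"""

def tally(csv_text, column):
    """Count rows per distinct, non-empty value of `column`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    values = ((row.get(column) or "").strip() for row in reader)
    return Counter(v for v in values if v)

by_state = tally(SAMPLE_CSV, "owner_state")   # where owners are based
by_zip = tally(SAMPLE_CSV, "property_zip")    # where rentals are concentrated
print(by_state.most_common())
print(by_zip.most_common(1))
```

The per-zip counts are what a choropleth map would shade; the same function applied to the owner-state column yields the kind of breakdown reported above.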
Aggregate mapping, such as that discussed above, is a positive implication of making rental property data available. More disturbing implications may arise from the fact that the dataset contains personally identifiable information about the property owners, including their names and addresses.
While many landlords in the dataset are companies, many are also individual owners. By analyzing the dataset one can identify the biggest individual owners in Bloomington as well as learn about small property owners, where they live, and where their properties are located. We discovered that while several individuals own hundreds or thousands of properties in Bloomington, the majority of owners are small-scale landlords who own between one and five single-unit houses, condos, or apartments.
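Identifying the largest and smallest owners amounts to counting properties per owner name. The Python sketch below shows the idea on invented names. It assumes that owners can be matched by case- and whitespace-normalized name, which is a simplification: real records may spell one owner several ways or spread holdings across entities. The five-property threshold mirrors the "one and five" range in the text.

```python
from collections import Counter

def portfolio_sizes(owner_names):
    """Count properties per owner, matching names case- and space-insensitively."""
    return Counter(" ".join(n.upper().split()) for n in owner_names if n.strip())

def size_band(count, small_max=5):
    """'small' for 1..small_max properties, 'large' above that."""
    return "small" if count <= small_max else "large"

# Hypothetical owner column: one entry per registered property.
owners = ["Owner A", "owner  a", "Owner B", "Acme Rentals LLC"] + ["Big Holdings"] * 7
sizes = portfolio_sizes(owners)
bands = Counter(size_band(c) for c in sizes.values())
print(sizes.most_common(2))
print(bands)
```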
The detailed PII in this dataset creates multiple risks for individual owners. Names and addresses can be combined with other data and analyzed in ways that increase the risk of exposing other personal information, such as gender, race, workplace, family members, and other descriptors. On learning that their data is exposed in this way, rental property owners may lose trust in the city government and become less willing to provide data to it. One pattern we observed is that some owners try to protect themselves and their information by providing a PO Box address rather than their street address. This may be because they know they can no longer exert control over their information once it is in the city's databases, and they may disagree with the city's governance of such highly sensitive personal information. Rental property owners may also be perceived as wealthy and become targets for crime, especially when their home addresses are publicly available.
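The combination risk described above is, mechanically, a record-linkage (join) problem. The deliberately simple Python sketch below uses entirely invented records to show how an exact match on normalized name and address attaches an attribute from a second, hypothetical source to a rental owner, and why a PO Box address breaks that particular match. Real linkage often uses fuzzier matching; this is an illustration of the principle, not of any specific technique used against this dataset.

```python
def _norm(text):
    """Upper-case and collapse whitespace so trivial variants compare equal."""
    return " ".join(text.upper().split())

def join_key(name, address):
    """Crude linkage key: normalized owner name plus normalized address."""
    return (_norm(name), _norm(address))

# Hypothetical records; all names, addresses, and fields are invented.
rentals = [
    {"owner_name": "Jane Doe", "owner_address": "12 Elm St"},
    {"owner_name": "John Roe", "owner_address": "PO Box 99"},
]
other_source = [
    {"name": "JANE  DOE", "address": "12 elm st", "employer": "Example Corp"},
    {"name": "John Roe", "address": "34 Oak Ave", "employer": "Sample Inc"},
]

# Index the second source by key, then look up each rental owner.
index = {join_key(r["name"], r["address"]): r for r in other_source}
linked = []
for row in rentals:
    match = index.get(join_key(row["owner_name"], row["owner_address"]))
    if match is not None:
        linked.append({**row, "employer": match["employer"]})
print(linked)
```

The owner who registered a street address is linked; the one who registered a PO Box is not, which is consistent with the protective behavior observed in the dataset.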
Governance of the Bloomington Open Data Portal
Our analysis of the datasets and how they are presented on B-Clear reveals that the city of Bloomington at some point identified open data as one of its priorities, requiring the city's departments to share their data on the portal. Historically, some data, such as city budgets and spending, has been subject to policies that govern its sharing, but other data either has no such policies or has policies designed for analog record-keeping, based on the "walk-in" scenario for requesting data. In that scenario, a citizen who is interested in the data walks into city hall and submits a request, and a clerk provides the specific record or records requested and no other data.
Governance of open data on B-Clear exists in bits and pieces rather than as a systematic set of rules and policies that all participants have developed and approved. For example, for the “City of Bloomington Annual Compensation” dataset, there is the following statement: “I agree, pursuant to Indiana Code 5-14-3-3(f), that any information, including the names and addresses of government employees, that I obtain by viewing, printing and/or downloading will not be used for commercial or political purposes.” For some datasets, such as “Cities and Towns Report,” the metadata provides information about the reasons this dataset is made publicly available (although the latest data is from 2015): “Per IC 5-11-1-4 every municipality and local government is required to provide electronically and, in a manner prescribed by the state examiner, financial reports for the fiscal year not later than sixty days after the close of the fiscal year.”
Yet the majority of datasets carry no restrictions whatsoever. For example, the rental property data, the guns reported stolen, the domestic battery data, the hate crimes data, and the SarsCoV2 Data 2020, to name a few, are all available without restriction or limitation. The "About" page on the B-Clear portal contains several sentences that declare B-Clear a "one-stop place" for an ever-growing assembly of useful data. These sentences capture the city of Bloomington's priority of openness, but one can argue that the governance of the portal and of the Bloomington Open Data website is severely lacking, especially in terms of transparency of process, accountability, and citizen engagement. Moreover, the use of the phrase "our data" on that page is troubling because it appears to include the individuals whose information is recorded; both the term and its underlying concept (referent) must be challenged and redefined in the context of citizen participation and contributions to data gathering, curation, and publishing.
Data Dump without a Lesson Learned
In 2018, Bloomington began tracking opioid overdose deaths online through the portal and created a visualization map that pinpointed, very precisely, the addresses of overdose victims within city limits. After a loud outcry by many in the community, the city of Bloomington altered the dataset to remove “specific addresses” and justified the existence of the dataset by stating:
Sharing this public data about where these deaths are occurring can help give those working toward a solution more tools to help those who are suffering … At the same time, we understand that this is a sensitive issue. As we have done since launching the site, we will continue to consult with those on the front lines of this crisis to follow best practices.
These actions and the accompanying statement are another example of the lack of governance-oriented thinking evident in the B-Clear portal. If the data needs to be made available to those who are working toward solutions, it does not have to be open to everyone without restriction. For example, the city could restrict access to registered users or to accounts with specific permissions. The city could even require entities to request access explicitly, with an explanation of their permissible use. None of this was done. As with the rental properties data, the city seems to have acted in the spirit of complete openness without fully considering the consequences of its actions for all participants.
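The alternatives listed above (registration, account permissions, and a permissible-use request) amount to tiered access control. The Python sketch below models them as three hypothetical tiers. The tier names, dataset identifiers, and `User` fields are illustrative assumptions, not features of B-Clear or of any particular portal software.

```python
from dataclasses import dataclass, field
from enum import Enum

class Tier(Enum):
    OPEN = "open"                  # anyone, no account needed
    REGISTERED = "registered"      # any signed-in account
    PERMISSIONED = "permissioned"  # explicit grant plus a stated purpose

@dataclass
class User:
    registered: bool = False
    grants: set = field(default_factory=set)  # dataset ids this account may access
    stated_purpose: str = ""                   # permissible-use explanation

def may_download(dataset_id: str, tier: Tier, user: User) -> bool:
    """Decide whether `user` may download `dataset_id` published at `tier`."""
    if tier is Tier.OPEN:
        return True
    if not user.registered:
        return False
    if tier is Tier.REGISTERED:
        return True
    # PERMISSIONED: require both a grant for this dataset and a stated purpose.
    return dataset_id in user.grants and bool(user.stated_purpose.strip())

anonymous = User()
outreach = User(registered=True, grants={"overdose-deaths"},
                stated_purpose="planning harm-reduction outreach")
print(may_download("overdose-deaths", Tier.PERMISSIONED, anonymous))
print(may_download("overdose-deaths", Tier.PERMISSIONED, outreach))
```

Under a scheme like this, each dataset's tier would be set at publication time, turning a decision such as whether to release overdose locations into an explicit governance choice rather than a default of full openness.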
This data upload provoked a public reaction, but many other datasets that have not yet been combed through, integrated with other data, and visualized remain open to any use or misuse. Because citizens appear to have no input at the earlier stages of the data lifecycle, they may find themselves unprotected in the future from the effects of aggregated datasets.
Conclusion
Like many communities across the United States, Bloomington seeks to adhere to the federal commitment of providing "the public with knowledge of the Government, and to ensure the accountability of Government" and to maintain the public trust by establishing a system of transparency, public participation, and collaboration. While a great deal of knowledge in the form of open data has been provided, one can argue that the latter parts of this commitment, namely, public participation and collaboration, have been lost in many communities.
As the chapter has briefly highlighted, some communities have embraced collaboration and trust as essential elements of “open” data, government or otherwise. But it appears more needs to be done. Viewing government data about the citizens as a digital data commons is one important step in the right direction that would allow open data portals to codify their policies and collect, publish, and access data with a purpose. It would allow local governments to prioritize which data is most important for which community and how the public and businesses can best use government data as a shared resource. If trust, participation, and accountability are the hallmarks of the government commitment to open data, then they must be embedded as implementing principles of the data portal creation, maintenance, and governance.
Of course, this is only an important first step; more must also be understood about the data that already exists on the data portal. Digitized data, whether converted from paper or created in digital form, resides within an infrastructure (the portal) that enables sharing, aggregation, and the use of additional services. Every decision about the data must be made with this in mind. As local governments get involved in data aggregation and sharing, they cannot ignore the fact that they, the citizens, and all other participants become part of a global digital platform that is increasingly pervasive in everyday life.
Viewing OGD as a digital data commons will also enable local governments to collaborate with other stakeholders and draw on the expertise of legal and other scholars who have been studying the governance of knowledge and other resources. Scholars have discussed multiple governance models that prioritize participation, innovation, or control (Janowski, Estevez, and Baguma 2018). Exploring these models in the context of OGD will enrich both the policy debates and the collective action around digital platforms and data commons. It will help to create mechanisms for safe and useful releases of open government data without encouraging selfish or harmful behavior (Yakowitz 2011). Finally, none of this works without full citizen participation throughout the entire data lifecycle, including decision-making around user- or citizen-generated content and the ability to access and use data. A digital data commons framework will help make governance clearer, as it will engage citizens and government to work together in a way that benefits all.