Seeing Urban Spaces Anew at the University of California
The digital revolution in the humanities, arts, and social sciences (HASS) is most definitely here. It has been slow and difficult in coming, for multiple, complicated reasons. Cyberinfrastructure (CI) is just starting to be explored in academic fields outside of high-performance computing. HASS scholars are finding ways to overcome impediments to their participation in CI, and are producing new knowledge by using advanced technology to pursue their research.1 This article will first lay out some of the obstacles these disciplines face in using technology, and will then profile two projects in the University of California system that exemplify what computational analysis can bring to HASS fields.
Compared to scientific disciplines, HASS fields have historically been “low tech” in their methodologies. So it can seem surprising that their technological needs for research, now that they are turning in this direction, pose significant difficulties for programmers. Data sets, if indeed the data exist as “sets” at all, are disparate, fragmented, and may actually be larger than those in the scientific fields. Structuring data for access and display is extremely complex. Technological difficulties abound: comparative analysis with respect to geographical location and temporal, societal, linguistic, and cultural aspects requires capabilities for creating hierarchically structured data with various levels of abstraction. The challenge lies in determining how best to aggregate, organize, and display the data and in developing the most beneficial tools for analysis. The final report of the ACLS Commission on Cyberinfrastructure and the Humanities describes many of the obstacles that HASS research needs pose for effective CI:
Digitizing the products of human culture and society poses intrinsic problems of complexity and scale. [This cultural record is] multilingual, historically specific, geographically dispersed, and often highly ambiguous in meaning. . . . [A] critical mass of information is often necessary for understanding both the content and the specifics of an artifact or event, and this may include large collections of multimedia content. . . . [HASS] scholars are often concerned with how meaning is created, communicated, manipulated, and perceived [which further complicates programming for access and display]. Recent trends in scholarship have broadened the sense of what falls within a given academic discipline: for example, scholars who in the past might have worked only with texts now turn to architecture and urban planning, art, music, video games, film and television, fashion illustrations, billboards, dance videos, graffiti, and blogs.2
An equally important, if not greater, barrier to HASS participation in CI is the need for enormous cultural change. The first thing that may come to mind is faculty resistance. Most faculty members welcome the increased access to objects of study that digital technologies have made possible. But some, especially in the humanities, have not believed that advanced technology can make a real contribution to their fields, or they fear an overemphasis on technology: that it is largely bells and whistles, ultimately distracting scholars and students from careful interpretive study requiring close attention to text, context, artifact, human communication, interaction, and performance, and other objects of study. Resistance, however, is not the most significant impediment to the use of advanced technologies in these disciplines; plenty of HASS researchers are tech-savvy and excited about new tools for their research and teaching. The far more pressing obstacles concern data, collaboration, and funding.
What is HASS data? It is not “data” in the dictionary sense (“measurements or statistics,” Merriam-Webster parenthetically suggests), but the objects of study in these disciplines. HASS data is extremely varied and can pose many challenges to effective technological representation. Medievalists may look at illuminated manuscripts, tapestries, paintings, and ecclesiastical and court records. Visual Studies researchers examine film, paintings, photos, sculpture, and more. Those in the arts need these as well, plus recordings of music, dance, and theater productions. Philosophers need access to texts, in various languages, with comparative translations. Legal historians need case law as well as court and voting records. Social scientists’ needs may include not only studies that have produced quantifiable data but also audio and video recordings of subject interviews. Languages ancient, early modern, and modern are fundamental to many of these disciplines, so access to data can be limited without good mechanisms for translation. Older data may be difficult to represent because they are delicate or unwieldy (ancient maps and texts, for example, did not originally appear in neat typeset book format). More recent materials may be under copyright, which adds another layer of complexity.
HASS data are often not reducible to numerical analysis and are thus characterized as “qualitative” rather than “quantitative.” This may be what has made these fields a blind spot for computational applications. Yet robust computational storage and aggregation of HASS data can allow quantity to play an important role in interpretive, qualitative study. Steven Spielberg’s Survivors of the Shoah Visual History Foundation, now the Shoah Foundation Institute at the University of Southern California, “has collected more than 52,000 eyewitness testimonies [of Holocaust survivors] in 56 countries and 32 languages…” Managed with computational infrastructure, this qualitative data becomes extremely valuable as a “critical mass”:
“The tale of what happened to one or two families, in one or two villages, in one or two countries, is worth recording and disseminating. But we can gain far more knowledge from the record of some 52,000 testimonies. In history, art history, classics, or any scholarly enterprise that benefits from a comprehensive comparative approach, quantity can become quality.”3
But HASS scholars want to learn lots of different things from their data, and they do not always seek the numeric answers that are (stereo)typical of computational analysis. They may want to ask why or how something happens, or how it accrues its meaning. This poses a conundrum for HASS researchers and computer programmers alike. How can disparate types of data best be integrated and displayed? How can interfaces be developed that will be truly useful to research, representing genuine advances over other methodologies? What tools will be most beneficial to analysis, and how can they be developed?
Still another problem is that while there may be data, in many cases there is no data set. Simply collecting the data often requires enormous effort. Early initiatives in this regard have proven extremely beneficial for access. A well-known example is JSTOR, which collects entire archives of scholarly journals and represents them online in searchable form but exactly as they originally appeared in print, including advertisements. Some of these journals have been in existence for over 100 years, so finding a complete archive in good condition whose owner is willing to have it dismantled can be prohibitively difficult. Integrating the data into searchable form through a user interface cannot happen until the data is prepared, which itself requires a large staff, including those who collect the archives, those who photograph them page by page, and those who index the contents of each issue. A recent comparable endeavor by a single scholar is that of Patricia Seed, a historian at the University of California, Irvine, whose article appears in this issue of CTWatch Quarterly. She has traveled to the archives of libraries all over the world, single-handedly collecting all of the extant early maps of coastal Africa, which she is preparing to make accessible online and available for analysis. The maps have never before appeared collectively in books or other publications and represent a significant, previously inaccessible resource to historians.
Developing CI or advanced digital projects necessitates a kind of collaboration atypical of academic HASS culture. These disciplines emphasize and reward individual work. Collaborative teaching is not unheard of but is not the norm. Collaborative research can put one at risk of not receiving tenure, which is generally awarded for single-authored books. The topics of research are very specific, often unique to the individual researcher, which is the usual route to establishing oneself as a scholar. These structures have been in place for so long that, even if they were changed, many faculty would likely have a hard time knowing just how to divide labor and reconstitute it, as collaboration requires.
There is also the issue of collaboration with the computer science engineers (CSEs) who can realize digital projects for HASS. While there have been successful collaborations across these disciplines, there is also a cultural divide that can prevent productive work. This is new territory. In 2006, the University of California Humanities Research Institute and the San Diego Supercomputer Center offered the first CI HASS Summer Institute, which brought CSEs and HASS researchers together to explore in depth the CI needs of HASS communities. Workshop participants acknowledged the cultural divide between the fields, but more prominent was their strong desire to bridge it. Evaluations of CI HASS were unequivocal about the need for this kind of cross-disciplinary exchange to increase CI capabilities and broaden its user base. Working together, CSEs and HASS researchers can think in new ways about what data is valuable and how it can be accessed for comparative analysis.
HASTAC (“haystack,” the Humanities, Arts, Science, Technology Advanced Collaboratory) has made considerable inroads in cybertechnology and HASS collaboration. The organization was formed specifically to bridge the disciplinary structures that prevent necessary cooperation and to network scholars and practitioners from many disciplines and institutions, both nationally and internationally. The digital humanities centers now springing up on campuses across the US have also promoted the collaboration with CSEs that successful digital HASS projects need. This collaboration will undoubtedly increase as the centers and their projects gain more attention.
Lack of resources for HASS research in any form is a perpetual problem. The tide is starting to turn somewhat for digital HASS projects, which until recently were largely neglected by the traditional funders of these disciplines. The National Endowment for the Humanities (NEH), the American Council of Learned Societies, the Mellon Foundation, and the MacArthur Foundation all have new initiatives funding digital media. NEH and the Maryland Institute for Technology in the Humanities at the University of Maryland recently held a summit meeting to plan a national coalition of digital humanities centers. NEH wants to encourage national collaboration between the centers and funding organizations, which is a hopeful sign.
Thus far, none of the National Science Foundation’s (NSF) programs have supported technological humanities or arts research. This is understandable, given that NSF’s mission is to support science and engineering and that its counterparts, NEH and the National Endowment for the Arts (NEA), were founded to support these disciplines. Unfortunately, the funding levels of NEH and NEA are far below that of NSF and, in real dollars, are lower than they were in the 1970s. It would not be unreasonable for NSF to allocate at least some of its “broadening participation” funds to HASS projects utilizing advanced technology. This would be an effective way to broaden participation outside the traditional science and engineering sectors and perhaps to attract new converts to the science and engineering fields.
The T-RACES and Hypermedia Berlin projects described below exemplify the efforts needed for data acquisition and aggregation as well as the intense interdisciplinary collaboration that are crucial to successful HASS CI. Together, they have the potential to contribute to many HASS disciplines, including but not limited to history, urban studies, ethnic studies, anthropology, human geography, literature, linguistics, art and architectural history, musicology, philosophy, history of science, political science, and sociology.
These projects (as well as Patricia Seed’s African map project described elsewhere in this issue of CTWatch) have very different objectives, data, and interfaces, but they hold in common the need for similar technological solutions. Both utilize GIS, historic maps, and historical and cultural data spatially associated with the maps. Both also are taking digital HASS forward in that they not only provide increased access to data but have the potential to create new knowledge that would not be possible without these digital resources. New research methodologies are coming into being.
T-RACES: Testbed for the Redlining Archives of California’s Exclusionary Spaces, a collaborative endeavor of the San Diego Supercomputer Center (SDSC) and the University of California Humanities Research Institute (UCHRI), will preserve, analyze, and make publicly accessible online digital versions of historical documents relating to the practice of “redlining” neighborhoods in the 1930s and 1940s in eight California cities. The research will make use of the UCHRI HASS grid, a CI initiative to bring the benefits of advanced information technologies to the humanities, arts, and social sciences at all 10 University of California campuses. The project is supported by a National Leadership Grant for Building Digital Resources from the Institute of Museum and Library Services (IMLS).
Data for the T-RACES project comprise neighborhood maps, interviews, financial and banking documents, and detailed city surveys. Thus far, 11 large historical color maps and 5,000 pages of text are included. These documents are part of the “Confidential Residential Security Maps,” a national collection established by the federal Home Owners’ Loan Corporation (HOLC) for all major US cities. HOLC, created under legislation that Franklin Delano Roosevelt signed into law in 1933 as one of several New Deal measures, initiated the practice of redlining. Because redlining is frequently conceived as housing loan discrimination resulting from individual bias in the banking industry, its historic, federal origins and institutionally deliberate dimensions often go unrecognized. The confidential maps and associated secret City Survey Files, compiled by thousands of HOLC agents, reflect neighborhood desirability and loan-granting conditions in over 200 US urban centers. Four grades were used: First, Second, Third, and Fourth, coded respectively as A, B, C, and D and as green, blue, yellow, and red. Redlined areas are typically characterized by “detrimental influences, undesirable population or infiltration of it.” In southeast San Diego, for example, which was redlined by design in the 1930s, residents of two categories of neighborhoods (A and B) enjoyed preferential treatment of their loan applications and significantly lower lending rates, while the opportunities of others were severely restricted. La Jolla was almost entirely “green” (A) or “blue” (B). Its one “red” (D) section was known as the district’s “servants’ quarters” and was “set aside by common consent for the colored population.”4
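Because each mapped area carries exactly one of four grades, the scheme is simple enough to encode as a lookup table, which is roughly what a database linking digitized map polygons to their grades would need. The sketch below is illustrative only: the class and function names are hypothetical and are not taken from T-RACES.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class HolcGrade:
    """One tier of the four-level HOLC classification (hypothetical model)."""
    rank: str   # "First" through "Fourth"
    code: str   # letter printed on the maps
    color: str  # color used to shade the area


# The scheme as described in the text, keyed by letter code.
HOLC_GRADES = {
    g.code: g
    for g in (
        HolcGrade("First", "A", "green"),
        HolcGrade("Second", "B", "blue"),
        HolcGrade("Third", "C", "yellow"),
        HolcGrade("Fourth", "D", "red"),
    )
}


def color_for(code: str) -> str:
    """Map a grade letter (case-insensitive) to its map color."""
    return HOLC_GRADES[code.strip().upper()].color


def is_redlined(code: str) -> bool:
    """Fourth-grade (D, red) areas are the redlined ones."""
    return HOLC_GRADES[code.strip().upper()].rank == "Fourth"


def grade_summary(area_codes: list[str]) -> Counter:
    """Count how many mapped areas fall into each color tier."""
    return Counter(color_for(c) for c in area_codes)
```

For instance, `grade_summary(["A", "B", "b", "D"])` counts two blue areas, one green, and one red.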
As is often the case with technological HASS projects, obtaining the data has turned out to be a considerable challenge. The bulk of the collection of once-confidential (now declassified) redlining files currently sits on the shelves of the National Archives in Washington, D.C., and can only be browsed in the research room there. This is also the first time that project co-PI Richard Marciano, a computer scientist and director of SDSC’s Sustainable Archives and Library Technologies (SALT) lab, has worked with paper records; in his other projects, content was born digital. The human and financial resources needed to bring the data into the digital realm were grossly underestimated and necessitated some creative problem-solving for image scanning.
The digitized content will form a unique collection spanning eight California cities. This will include color maps and textual documents. The maps will be vectorized (image to lines and polygons) and the text will be OCR-ed (image to searchable text) using ABBYY FineReader 8.0. This will allow for searchable PDFs and databases to be created from the text with linkages to the maps. The data will be spatially and temporally geolinked, providing area comparison of past to present with simultaneous viewing of historic and contemporary maps.
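Comparing past and present in this way ultimately requires answering one question: which vectorized historic area contains a given present-day location? The following is a minimal sketch of that step, under a few assumptions. The polygons and coordinates are made up, the area IDs are hypothetical, and the ray-casting point-in-polygon test is a generic technique rather than the project's actual GIS stack.

```python
from dataclasses import dataclass, field

Point = tuple[float, float]  # (x, y), e.g. (longitude, latitude)


@dataclass
class HistoricArea:
    area_id: str                 # hypothetical identifier, e.g. "D-5"
    grade: str                   # HOLC grade letter
    polygon: list[Point]         # vertices of the vectorized boundary
    survey_pages: list[str] = field(default_factory=list)  # OCR-ed text linked to this area


def contains(polygon: list[Point], pt: Point) -> bool:
    """Even-odd (ray-casting) point-in-polygon test."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray running right from pt cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def historic_context(areas: list[HistoricArea], pt: Point) -> list[HistoricArea]:
    """Return the historic areas whose boundaries contain a present-day location."""
    return [a for a in areas if contains(a.polygon, pt)]
```

Once a location is matched to an area, the OCR-ed survey pages attached to that area come along with it. This is the kind of map-to-text linkage the project describes.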
The project is in its early stages, with Marciano and his team currently digitizing data and evaluating the software involved, including open source GIS servers and viewers, open source databases such as MySQL and PostgreSQL, Greenstone digital library software, and other tools. As no integrated toolkit exists, they are designing a framework that will work across the various environments of GIS servers, databases, grid technologies and digital libraries. Ultimately, the project will enable public desktop access to the historic records themselves with search and analysis tools “on top” as a user interface.
The archive and preservation technology is based on the use of data grid technology to manage distributed data. A central metadata catalog (MCAT) at SDSC, built on Oracle technology and capable of handling tens of millions of electronic records, stores the preservation metadata for each electronic record. The preservation metadata includes authenticity, integrity, and descriptive information about the electronic record. The data grid technology maintains consistency between the storage locations of the electronic records and the preservation metadata.
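To see what "consistency between storage locations and preservation metadata" means in practice, consider this toy in-memory catalog. It records a checksum (integrity), provenance (authenticity), and a description for each record, then audits every catalogued copy against the stored checksum. The sketch is purely illustrative: it is not MCAT's schema or API, and SHA-256 is simply one reasonable checksum choice.

```python
import hashlib
from dataclasses import dataclass, field


def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


@dataclass
class PreservationRecord:
    record_id: str
    description: str   # descriptive metadata
    provenance: str    # authenticity: where and how the record was acquired
    checksum: str      # integrity: SHA-256 digest taken at ingest
    locations: set[str] = field(default_factory=set)  # storage sites holding a copy


class MetadataCatalog:
    """Toy in-memory stand-in for a central preservation metadata catalog."""

    def __init__(self) -> None:
        self.records: dict[str, PreservationRecord] = {}

    def ingest(self, record_id: str, data: bytes, description: str,
               provenance: str, location: str) -> PreservationRecord:
        rec = PreservationRecord(record_id, description, provenance,
                                 sha256(data), {location})
        self.records[record_id] = rec
        return rec

    def add_replica(self, record_id: str, location: str, data: bytes) -> None:
        """Register a new copy only if it matches the checksum taken at ingest."""
        rec = self.records[record_id]
        if sha256(data) != rec.checksum:
            raise ValueError(f"replica at {location} does not match {record_id}")
        rec.locations.add(location)

    def audit(self, record_id: str, storage: dict[str, bytes]) -> dict[str, bool]:
        """Check each catalogued location; `storage` maps location -> bytes found there."""
        rec = self.records[record_id]
        return {
            loc: loc in storage and sha256(storage[loc]) == rec.checksum
            for loc in sorted(rec.locations)
        }
```

A missing or altered copy shows up as `False` in the audit result. That is the signal a data grid would use to restore the copy from a good replica.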
Through the HASS community grid in development at UCHRI, each participating University of California site has access to a separate preservation environment in which it can define preservation metadata unique to its digital content, with its own structural organization. Each site controls access and update permissions for its preservation environment independently of the other participants. Metadata administration is off-loaded to the central MCAT catalog at SDSC. Sites are able to leverage common software and hardware resources for the management of the data and metadata, which lowers their cost of participation.
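The arrangement described above has each site define its own metadata and permissions while a shared catalog does the bookkeeping. It can be modeled in a few lines. All names here are hypothetical, and the model assumes that a "preservation environment" reduces to a set of required metadata fields plus reader and writer lists, which is far simpler than the real grid.

```python
from dataclasses import dataclass, field


@dataclass
class SiteEnvironment:
    """One campus's preservation environment (hypothetical model)."""
    site: str
    required_fields: set[str]                       # metadata this site chose to require
    writers: set[str] = field(default_factory=set)  # users allowed to update
    readers: set[str] = field(default_factory=set)  # users allowed to read


class SharedCatalog:
    """Central catalog that stores every site's metadata but enforces each
    site's own schema and permissions independently."""

    def __init__(self) -> None:
        self.sites: dict[str, SiteEnvironment] = {}
        self.entries: dict[tuple[str, str], dict[str, str]] = {}

    def register(self, env: SiteEnvironment) -> None:
        self.sites[env.site] = env

    def put(self, site: str, user: str, record_id: str, metadata: dict[str, str]) -> None:
        env = self.sites[site]
        if user not in env.writers:
            raise PermissionError(f"{user} cannot update {site}")
        missing = env.required_fields - set(metadata)
        if missing:
            raise ValueError(f"{site} requires {sorted(missing)}")
        self.entries[(site, record_id)] = dict(metadata)

    def get(self, site: str, user: str, record_id: str) -> dict[str, str]:
        env = self.sites[site]
        if user not in env.readers | env.writers:
            raise PermissionError(f"{user} cannot read {site}")
        return self.entries[(site, record_id)]
```

Keying entries by `(site, record_id)` keeps each site's namespace separate even though the storage is shared. This mirrors, loosely, the split between local control and central administration.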
Marciano was drawn to the project because of his interest in applying technology to the humanities and social sciences as well as his desire to unearth the traces of urban zoning in the US. He finds the collaborative aspects the best part of the project and looks forward to developing this further. “I would almost say that putting digital content ‘out there’ is minor, compared to the potential for interactions and discussions,” he says. “Technology creates opportunities for contact and dialog with scholars in related fields (history, black studies, planning, ethnic studies, etc.). Conversely, these interactions impact the development of the interfaces and more importantly the use of the technology. This is the biggest thing we want to learn: how to represent the information to enable ‘community’ and dialog.” Meetings of experts on the California cities are being held at UCHRI, convening scholars and professionals from many fields to collaborate on developing the project to its fullest potential.
Marciano is acutely aware of the need to design access and storage tools that open the use of the content, not restrict it. “In some sense it is the responsibility of the ‘digital curator’ to be true to the collection and not create new ‘digital’ barriers. The collection needs to drive the process, and as a computer scientist I feel I have a deep responsibility to nurture the content and keep it alive and relevant. Falling into the trap of creating a digital ghetto would be redlining the content all over again,” he explains.
He and co-PI David Theo Goldberg, director of UCHRI, expect eventually to extend the T-RACES resource to include the data of many more redlined cities across the US. It is anticipated that other historical, cultural, and legal documents can be overlaid as the project progresses. The project will result in increased knowledge through access to previously remote data, and will transform possibilities in HASS research and pedagogy. Furthermore, the PIs hope the project’s importance will extend beyond academia, with the availability of this content benefiting communities and neighborhoods. The federal implementation of redlining has had a lasting impact on the shape of American cities, the decline of urban cores, urban sprawl, suburbanization and racial segregation in cities. The knowledge to be gained from the dissemination and analysis of this content is significant to many constituencies.
Hypermedia Berlin is an interactive, web-based research platform and collaborative authoring environment for analyzing the cultural, architectural, and urban history of a city space. Founded and directed by Todd Presner, associate professor of Germanic Languages and Jewish Studies at UCLA, Hypermedia Berlin uses GIS technologies and a geo-temporal database to bring the study of cultural and urban history together with spatial analyses and modeling tools. Managing, displaying, and rendering data useful to researchers and teachers in fields as varied as history, urban studies, geography, architecture, and literary studies are central to the mission of the project.
The project is organized according to time-layers in which the uneven spatial and temporal coordinates of Berlin’s cultural and architectural histories can be apprehended. Traditional models of cultural history proceed chronologically and take the linearity of time as their structuring principle. By contrast, Hypermedia Berlin articulates Berlin’s time-layers through multiple detailed, annotated maps connected together by interlinking “hotspots” at hundreds of key regions, structures, and streets. The ability to “drill through” these maps functions to spatialize historical practice, thereby transforming cultural history into a kind of “cultural geography.”
Berlin is a highly stratified, complex space in which linear histories stop making sense. Over its nearly eight centuries, Berlin emerged from a backwater mercantile town built on sand to become the capital of a unified Germany under Bismarck and the site of Hitler’s dream for a world-dominant Germania. It was devastated by the Thirty Years’ War, occupied by Napoleon in 1806, rebuilt numerous times throughout the eighteenth and nineteenth centuries, destroyed in World War II, divided by the Berlin Wall for 28 years, and put back together again in 1990. Poised on the border between Western and Eastern Europe, this cosmopolitan city has variously welcomed and persecuted its minorities: Huguenots, Jews, Poles, Russians, Turks, and others. Between 1890 and the outbreak of WWI, in less than a quarter of a century, it doubled in size, reaching 4,000,000 people; another quarter of a century later, it lost almost half of its population and nearly all of its Jewish population in WWII and the Holocaust. The city’s complexities make it an excellent candidate for developing functionality and modeling for other cultural mapping projects. Hypermedia Berlin’s back-end systems architecture, the database, and the front-end user interface are all open, modular, and easily scalable to support new functionality and research.
The data “centerpiece” of Hypermedia Berlin is a series of 50 fully annotated, geo-referenced maps of Berlin from 1237 (when the city was founded) up through the present. The project requires that historical maps be rendered in a format that supports interactivity, flexibility, and search functionality. It uses ESRI’s ArcGIS for geo-referencing the historical maps and the Google Maps API for zooming and adding hotspots. MySQL and PHP are used for the geo-temporal database, and XML for the dynamically populated “intelli-list” and user interface. Viewers can move in and out of the maps, choose locations from a sidebar menu that is keyed to relevant “people” and “place” links, and “travel” both by time and place, diachronically and synchronically, throughout Berlin’s history. The content can be searched, viewed, and organized according to the research and pedagogic needs of the user.
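Geo-referencing a scanned historical map comes down to finding a transform from pixel positions to real-world coordinates. Here is a minimal sketch of the idea, assuming the simplest case: an affine transform fitted to exactly three ground control points. ArcGIS performs this step with its own tools, usually from more control points and sometimes with higher-order transforms for distorted old maps. This is not its implementation, and the coordinates used below are only illustrative values near Berlin.

```python
from typing import Callable

Pair = tuple[float, float]


def _det3(m: list[list[float]]) -> float:
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def affine_from_control_points(pixels: list[Pair], world: list[Pair]) -> Callable[[float, float], Pair]:
    """Fit x' = a*px + b*py + c and y' = d*px + e*py + f from exactly
    three ground control points, solving each axis with Cramer's rule."""
    if len(pixels) != 3 or len(world) != 3:
        raise ValueError("this sketch needs exactly three control points")
    m = [[px, py, 1.0] for px, py in pixels]
    det = _det3(m)
    if abs(det) < 1e-12:
        raise ValueError("control points are collinear")
    coeffs = []
    for axis in (0, 1):  # 0 -> x' (e.g. longitude), 1 -> y' (e.g. latitude)
        rhs = [w[axis] for w in world]
        solution = []
        for col in range(3):
            # Replace one column with the right-hand side (Cramer's rule).
            mc = [row[:] for row in m]
            for r in range(3):
                mc[r][col] = rhs[r]
            solution.append(_det3(mc) / det)
        coeffs.append(solution)
    (a, b, c), (d, e, f) = coeffs

    def to_world(px: float, py: float) -> Pair:
        return (a * px + b * py + c, d * px + e * py + f)

    return to_world
```

Consider control points at pixels (0, 0), (100, 0), and (0, 100), pinned to (13.0, 52.6), (13.1, 52.6), and (13.0, 52.5). With these, pixel (50, 50) maps to approximately (13.05, 52.55), because image rows run downward while latitude runs upward.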
Hypermedia Berlin’s hallmark is its interactive, collaborative authorship of data content. The project’s cultural and historical data includes articles and encyclopedic entries. These “annotations” are typically authored by scholars. Capacities are being developed for an editorial board to collaboratively vet scholarly content, and for teachers and researchers to analyze or excerpt portions of Hypermedia Berlin by using a “citation” function that in turn becomes part of the resources for the project. Widespread data creation will come from community annotation functions, which allow any user to upload “micro-annotations” in any media for a highly localized set of temporal-spatial coordinates in Berlin. For example, a user may annotate a grandparent’s apartment or place of business on a particular street at a particular time. When Version 2.0 is launched, Presner anticipates storing and mining hundreds of thousands of these micro-annotations, amounting to several terabytes. The platform is being built out to support any kind of data and media. There will be large-scale population and demographic datasets as well as text, video, audio, multimedia articles, and more.
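Micro-annotations pinned to a place and a span of years are what make "synchronic" and "diachronic" travel possible. A synchronic query looks at one moment across an area, while a diachronic query follows one area across time. The sketch below illustrates both. Python's built-in `sqlite3` stands in for the project's MySQL database, the table schema is hypothetical, and a plain latitude/longitude bounding box replaces the spatial indexing a system at terabyte scale would likely need.

```python
import sqlite3


def open_annotation_store() -> sqlite3.Connection:
    """Create an in-memory table of micro-annotations (hypothetical schema)."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE annotation ("
        " id INTEGER PRIMARY KEY, author TEXT, body TEXT, media TEXT,"
        " lat REAL, lon REAL, year_from INTEGER, year_to INTEGER)"
    )
    return con


def annotate(con, author, body, media, lat, lon, year_from, year_to) -> None:
    """Pin one annotation to a point and to the span of years it describes."""
    con.execute(
        "INSERT INTO annotation (author, body, media, lat, lon, year_from, year_to)"
        " VALUES (?, ?, ?, ?, ?, ?, ?)",
        (author, body, media, lat, lon, year_from, year_to),
    )


def synchronic(con, bbox, year) -> list[str]:
    """Everything inside a bounding box that applies to a single year."""
    south, west, north, east = bbox
    rows = con.execute(
        "SELECT body FROM annotation"
        " WHERE lat BETWEEN ? AND ? AND lon BETWEEN ? AND ?"
        " AND year_from <= ? AND year_to >= ? ORDER BY id",
        (south, north, west, east, year, year),
    )
    return [body for (body,) in rows]


def diachronic(con, bbox) -> list[str]:
    """One area across all of its time layers, oldest first."""
    south, west, north, east = bbox
    rows = con.execute(
        "SELECT body FROM annotation"
        " WHERE lat BETWEEN ? AND ? AND lon BETWEEN ? AND ?"
        " ORDER BY year_from, id",
        (south, north, west, east),
    )
    return [body for (body,) in rows]
```

Parameterized `?` placeholders keep user-contributed text out of the SQL itself, which matters once any user can upload annotations.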
The project is designed to be able to pull datasets and media items from other digital repositories as well as share data and media with other projects or repositories. Rather than imposing a single database schema or creating a centralized repository, the team is constructing middleware that would allow them to undertake queries or federated searches across repositories that store and share digital assets, potentially on the UCHRI UC-wide HASS grid. The philosophy is that once datasets and media items can be mined across repositories, new kinds of data slicings and visualizations will be created that scale the project in ways that cannot be conceived or delimited ahead of time. This can lead to new research and new knowledge unachievable without the technology.
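Federated search without a single shared schema usually means putting a thin adapter in front of each repository that translates its native records into a common result shape. The middleware can then query every repository and merge what comes back. The sketch below shows that pattern in outline. The repositories are in-memory stand-ins, and `Hit`, `InMemoryRepository`, and `federated_search` are hypothetical names, not the project's middleware.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass(frozen=True)
class Hit:
    """Common result shape the middleware hands back to the interface."""
    repository: str
    title: str
    year: Optional[int]


class Repository(Protocol):
    name: str

    def search(self, term: str) -> list[Hit]: ...


class InMemoryRepository:
    """Hypothetical repository whose native records use their own field names."""

    def __init__(self, name: str, records: list[dict], title_key: str, year_key: str) -> None:
        self.name = name
        self.records = records
        self.title_key = title_key
        self.year_key = year_key

    def search(self, term: str) -> list[Hit]:
        needle = term.casefold()
        return [
            Hit(self.name, r[self.title_key], r.get(self.year_key))
            for r in self.records
            if needle in r[self.title_key].casefold()
        ]


def federated_search(repos: list[Repository], term: str) -> list[Hit]:
    """Query every repository, skip ones that are unreachable, and merge."""
    hits: list[Hit] = []
    for repo in repos:
        try:
            hits.extend(repo.search(term))
        except ConnectionError:
            continue  # one offline repository should not sink the whole query
    # Undated items sort last; dated ones run oldest to newest.
    return sorted(hits, key=lambda h: (h.year is None, h.year or 0, h.title))
```

Only the adapters know each repository's field names (`title_key` and `year_key` here). A new repository can therefore join by supplying an adapter, without any change to a central schema.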
Hypermedia Berlin’s considerable collaborative apparatus includes 11 members of the project team and an international advisory board. Those involved represent many disciplines, institutions, and skill sets. The project has attracted a diversity of supporters, some of whom are collaborators as well, including the American Council of Learned Societies, several UCLA constituencies, the Stanford Humanities Laboratory, CUNY-Baruch, and Berlin’s Hochschule der Künste.
There is a growing recognition that effective CI cannot simply be produced by one field for utilization by other fields but that its very conception and production must be interdisciplinary: “No single academic discipline or point of view is sufficient to comprehend all the implications of cyberinfrastructure.”5 If CI participation is relegated to science and engineering fields, then important questions go unasked, research goes unpursued, and approaches go unrealized, not only in other disciplines but in computer science engineering itself. HASS scholars bring a different dimension to research questions: not how to accomplish X, but why; who benefits, who doesn’t, and what the latent implications are. Collaboration with HASS researchers can also lead computer scientists to new ideas about what data is valuable and how it can be accessed. Bringing these fields together in the creation of CI is essential for preparing current and future generations of scientists, scholars, and educators for its use and design, and for training broader, more diverse constituencies for its expanded utilization.
2 Our Cultural Commonwealth: The Report of the ACLS Commission on Cyberinfrastructure and the Humanities, 2006, pp. 18-19 (pp. 25-26 in PDF). www.acls.org/cyberinfrastructure/OurCulturalCommonwealth.pdf
3 Our Cultural Commonwealth: The Report of the ACLS Commission on Cyberinfrastructure and the Humanities, 2006, pp. 15, 19 (pp. 22, 26 in PDF). www.acls.org/cyberinfrastructure/OurCulturalCommonwealth.pdf
4 HOLC Division of Research and Statistics with Cooperation of the Appraisal Department, San Diego, October 20, 1936.
5 Berman, F., and Brady, H. Workshop on Cyberinfrastructure for the Social and Behavioral Sciences: Final Report, 2005, p. 11 (p. 9 in PDF). www.sdsc.edu/sbe/