Knowledge for All Universal Citation Index
A Proposal for the Global Library Community
Prepared by Amanda Stevens and Mark Leggott
June 24, 2010
Academic libraries are finding it increasingly difficult to provide users with sufficient access to scholarly journal literature due to unending increases to subscription costs of indexes and journals. Many academic libraries have reached a tipping point as a result of continuing and sometimes inappropriate increases in subscription fees for these products. Fortunately, current and evolving technology provides the opportunity to create alternative and open modes of access to scholarly literature and decrease reliance on expensive commercial products.
The open access publishing movement has provided scholars with an alternative venue for publishing their research and ensuring it reaches the broader public, while other projects have focused on providing alternative discovery tools to commercial scholarly indexes. These include MEDLINE, HighWire Press, ERIC, and CiteSeer. In addition, we have seen the success of collaborative knowledge projects like Wikipedia, in which the efforts of thousands benefit millions. These projects demonstrate the viability of building an open access alternative to citation-only databases like the Web of Science. In such an alternative, open source software tools could be used to create a robust system for searching bibliographic citations of journals, while partner institutions could collaborate to provide indexing expertise and rich content. Now is the time to leverage the strength of the crowd for the benefit of all.
There are still significant limitations to subject-specific indexes of scholarly literature, namely that users must still use multiple tools to find all literature on a given topic. Instead we need a single open access tool that would allow one to search all of the world's scholarly journal literature, including that which is published commercially as well as via open access, and in all subject areas. Google Scholar aims to be this tool, but it has not yet replaced commercial databases due to its limited search capabilities and the ever-present cloud of a future Google business model that does not provide open access.
Therefore, we propose to engage the international academic library community to collectively create a universal citation index to all of the world's past and current scholarly journal literature. The tool will be accessible to all via the web and will be called Knowledge for All. The benefits of such a tool are undeniable:
- Libraries would gain control over the content they can access and would decrease their reliance on expensive proprietary products.
- Researchers and all users anywhere could access the available literature and be certain they are using the most current and comprehensive research.
- Developers could create new ways of accessing and repurposing the world’s academic literature, via an open and accessible data/software framework.
- Institutions would save thousands of dollars on database subscriptions, which could be reallocated to collections and services.
This proposal outlines how the tool could be developed and populated with content and how the project would be funded and maintained.
The Citation Index Database
Using an appropriate open source software suite, an open source citation index will be developed that provides a comprehensive set of desired features, including searching, browsing, citation analysis, the ability to annotate and share data, links to author authority records, and links to full text articles. Knowledge for All will use controlled vocabularies for each broad subject area, and records will be assigned subject terms from vocabularies such as LCSH and MeSH. Journals themselves will be classified into broad subject categories, and users will have the option to limit their searches to specific categories.
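As a rough illustration of the features listed above, the following sketch shows one way a citation record and a category-limited search might be modelled. The field names, the `CitationRecord` class, and the `search` function are hypothetical and are not part of any chosen software suite; the sample data in the usage note is invented.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CitationRecord:
    """One indexed article. Field names are illustrative, not a fixed schema."""
    title: str
    authors: List[str]
    journal: str
    year: int
    journal_category: str  # broad subject category assigned to the journal
    subject_terms: List[str] = field(default_factory=list)  # e.g. LCSH or MeSH terms
    cited_titles: List[str] = field(default_factory=list)   # works this article cites
    full_text_url: Optional[str] = None


def search(records, keyword, category=None):
    """Case-insensitive match on title and subject terms.

    If `category` is given, only records whose journal belongs to that
    broad subject category are considered.
    """
    needle = keyword.lower()
    hits = []
    for rec in records:
        if category is not None and rec.journal_category != category:
            continue
        searchable = [rec.title] + rec.subject_terms
        if any(needle in text.lower() for text in searchable):
            hits.append(rec)
    return hits
```

For example, given two invented records with the titles "Malaria vaccine trials" (category "Health Sciences") and "Vaccines in public policy" (category "Social Sciences"), `search(records, "vaccine")` would return both, while `search(records, "vaccine", category="Health Sciences")` would return only the first.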
Generating Content for the Citation Index Database
A significant amount of metadata for each record could be generated automatically through a number of possible means. One example is the table of contents RSS feeds provided by many scholarly journals. A second option is to incorporate metadata created by authors and encoded directly into articles. Automated citation indexing software, such as that developed at the NEC Research Institute for the CiteSeer project, provides another option. These tools locate publications on the web, extract citations, and store the content in a database (Lawrence, Bollacker, and Giles, 1999). Content could also be supplemented by free subject-specific journal indexes such as MEDLINE and ERIC. The full range of options will be researched further. Metadata for journal articles is not protected by copyright because it is considered factual information, and thus harvesting this metadata using the above means would not violate copyright law. The Knowledge for All project will take care to respect licenses and intellectual property rights.
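To make the first option more concrete, the sketch below extracts basic citation metadata from a table of contents feed using only the Python standard library. The sample feed is invented, and the element names (RSS 2.0 items with Dublin Core `dc:creator` and `dc:date`) are an assumption; real journal feeds vary in format and would need per-publisher handling.

```python
import xml.etree.ElementTree as ET

# Invented sample feed for illustration only. Real table of contents feeds
# may use RSS 1.0/RDF, Atom, or other metadata elements instead.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Example Journal of Library Science</title>
    <item>
      <title>Open Indexing in Practice</title>
      <link>http://example.org/articles/1</link>
      <dc:creator>Doe, Jane</dc:creator>
      <dc:date>2010-05-01</dc:date>
    </item>
    <item>
      <title>Shared Cataloguing Revisited</title>
      <link>http://example.org/articles/2</link>
      <dc:creator>Roe, Richard</dc:creator>
    </item>
  </channel>
</rss>
"""

# Namespace prefix map for the Dublin Core elements (assumed to be present).
DC = {"dc": "http://purl.org/dc/elements/1.1/"}


def records_from_feed(feed_xml):
    """Return one metadata dict per <item> in an RSS 2.0 feed."""
    root = ET.fromstring(feed_xml)
    channel = root.find("channel")
    journal = channel.findtext("title", default="")
    records = []
    for item in channel.findall("item"):
        records.append({
            "journal": journal,
            "title": item.findtext("title", default=""),
            "url": item.findtext("link", default=""),
            "authors": [c.text for c in item.findall("dc:creator", DC) if c.text],
            # Left as None when the feed omits a date, so an indexer can fill it in.
            "date": item.findtext("dc:date", default=None, namespaces=DC),
        })
    return records
```

Calling `records_from_feed(SAMPLE_FEED)` yields two records. The second has no date, which is the kind of gap that would be left for human review.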
People will monitor and edit automatically generated content and also create original content for citation records, including adding missing metadata and doing subject indexing using controlled vocabularies. This work could be done by staff at participating institutions (or interested members of the larger community), wherein each participating institution would be responsible for indexing a certain number of journal titles on an ongoing basis. Quality control of this content could also be provided and shared by participating institutions.
In addition to indexing newly published literature, all past published literature will be added to the Knowledge for All database. Again, this content could be generated using a combination of automatic processes and human labor, and the human labor could be shared among participating institutions and in the context of a Wikipedia-like model of participation.
Resources Required for Content Generation
To estimate the resources required for this portion of the project, ten journals in a variety of subject areas were randomly selected and the average number of issues published per year, average number of articles per issue, and average number of citations per article were calculated. We then selected Web of Science as an example of an indexing system for comparison. The selected journals publish an average of 100 articles per year, and Web of Science indexes approximately 10,000 journals, so Web of Science indexes approximately 1 million articles per year. Articles average 37 citations, which means there are approximately 37,000,000 citations per year from the literature covered by the Web of Science.
We can then extend our calculations beyond literature indexed by the Web of Science and consider the resources required to index all of the world's scholarly journal literature. The precise number of scholarly journals currently published is unknown, but in 2004 Tenopir estimated that there are 43,500 scholarly journals published per year (approximately 24,000 of which are peer-reviewed). If we apply the same per-journal and per-article averages here, we find there would be approximately 4,350,000 journal articles and 160,950,000 citations per year to index.
If we look at a Wikipedia-style model for producing the data in a system like Knowledge for All, we can also consider the human resources needed to maintain such a system. If the work of indexing all of the world's scholarly journal literature were divided among 500 institutions, each would be responsible for indexing approximately 87 journals; if divided among 1,000 institutions, 44 journals; among 3,000 institutions, 15 journals, etc.
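The estimates in this section can be reproduced with a few lines of arithmetic. The sketch below uses only the figures given above (100 articles per journal per year, 37 citations per article, 10,000 and 43,500 journals); the function names are illustrative, and rounding up the per-institution share is an assumption made so that every journal is assigned.

```python
import math

ARTICLES_PER_JOURNAL_PER_YEAR = 100  # average from the ten-journal sample
CITATIONS_PER_ARTICLE = 37           # average from the ten-journal sample


def yearly_volume(journal_count):
    """Return (articles, citations) to index per year for a set of journals."""
    articles = journal_count * ARTICLES_PER_JOURNAL_PER_YEAR
    citations = articles * CITATIONS_PER_ARTICLE
    return articles, citations


def journals_per_institution(journal_count, institution_count):
    """Journals each institution would index, rounded up so none are left over."""
    return math.ceil(journal_count / institution_count)


if __name__ == "__main__":
    for label, journals in [("Web of Science", 10_000), ("All journals", 43_500)]:
        articles, citations = yearly_volume(journals)
        print(f"{label}: {articles:,} articles, {citations:,} citations per year")
    for institutions in (500, 1_000, 3_000):
        share = journals_per_institution(43_500, institutions)
        print(f"{institutions:,} institutions: {share} journals each")
```

Changing the two averages or the journal count shows how sensitive the workload is to the sample, which is drawn from only ten journals.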
Resources Required for Project Development and Operations
The first phase of the Knowledge for All project would be a three-year development phase in which the citation database tool would be developed and initial content generated. Staff needed during this phase would include a Project Manager, a team of software developers, a Content Manager to coordinate the collective generation of content among participating institutions, and a Community Liaison Coordinator to recruit and work with participating institutions. The Project Manager would report to a Board of Directors or Steering Committee composed of representatives from the participating institutions. Start-up costs are estimated at a total of $3 million for the three-year development phase. In this phase the IT infrastructure for the project would be provided by the University of Prince Edward Island as the host institution. As the project progresses, a long-term business model for a sustainable IT foundation would be developed, suitable for the project's size and scope. One option would be to use a cloud computing approach, leveraging the existing resources of member institutions. We could also explore partnering with another collaborative resource sharing project such as the Scholars Portal in Ontario.
It is suggested that participating institutions fund the development phase of Knowledge for All through shared and equal contributions. For example, 200 institutions/organizations could fund the project by contributing $5,000 each per year. While a proposal like Knowledge for All would lend itself to short-term funding, starting immediately with a community-driven model of support and maintenance would allow us to begin quickly with a robust and sustainable approach and without a reliance on one-time funds.
Following the development phase, ongoing operations would require a Project Manager, Systems Administrator, Content Manager, and Community Liaison Coordinator. While further information about operational costs needs to be gathered, costs of ongoing operations are estimated at $300,000 per year. It is proposed that ongoing operational costs would be funded by a monthly or annual membership fee paid by participating institutions. Thus, the system would be free to all and a gift of the international library community. Once the index is in full operation, libraries could collectively save hundreds of millions of dollars; a portion of those savings, steered back into Knowledge for All, would provide a long-term funding framework for the project.
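The funding figures above can be checked with simple arithmetic. In the sketch below, the development-phase inputs (200 institutions, $5,000 per year, three years) come from this proposal. The operational example, however, assumes the same 200 members continue to share the $300,000 annual cost equally, which is an illustration only and not a proposed fee.

```python
def development_funding(institutions, annual_contribution, years):
    """Total raised when each institution pays the same amount every year."""
    return institutions * annual_contribution * years


def annual_fee_per_member(annual_operating_cost, members):
    """Equal share of the yearly operating cost per member institution."""
    return annual_operating_cost / members


if __name__ == "__main__":
    # 200 institutions x $5,000 x 3 years covers the estimated $3 million start-up cost.
    print(f"Development phase: ${development_funding(200, 5_000, 3):,}")
    # Hypothetical: 200 members sharing the estimated $300,000 yearly operating cost.
    print(f"Annual fee per member: ${annual_fee_per_member(300_000, 200):,.2f}")
```

A larger membership would lower the per-member fee proportionally, since the operating estimate is a fixed yearly amount.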
This proposal was initiated by the University of Prince Edward Island with the participation of the Council of Atlantic University Libraries and is being presented to key contacts, consortia, and libraries internationally to gauge their interest and request further feedback. If sufficient support is gathered, a Project Manager will be hired immediately to develop a business plan and secure participation and funding from institutions.
We encourage you to join us in ensuring that knowledge remains accessible to all and that libraries remain in the forefront of the effort to build a sustainable information ecosystem that truly benefits all. Please let us know if you have comments or suggestions on this proposal, or if you would like to step forward and join the University of PEI as a founding member of the Knowledge for All project.
Knowledge for All Website now live at http://www.k4all.ca/
References
Cameron, Robert D. “A Universal Citation Database as a Catalyst for Reform in Scholarly Communication.” First Monday 2.4 (7 April 1997). http://firstmonday.org/htbin/cgiwrap/bin/ojs/index.php/fm/article/view/522/443
Lawrence, Steve, Kurt Bollacker, and C. Lee Giles. “Indexing and Retrieval of Scientific Literature.” Proceedings of the Eighth International Conference on Information and Knowledge Management, Kansas City, Missouri (1999): 139-146.
Liu, Mengxiong, and Peggy Cabrera. “The New Generation of Citation Indexing in the Age of Digital Libraries.” Policy Futures in Education 6.1 (2008): 77-86.
Panzera, Don, and Evelinde Hutzler. “E-Journal Access through International Cooperation: Library of Congress and the Electronic Journals Library EZB.” Serials Review 30.3 (August 2004).
Tenopir, Carol. “Online Databases—Online Scholarly Journals: How Many?” Library Journal (1 February 2004). http://www.libraryjournal.com/article/CA374956.html
Willinsky, John, and Larry Wolfson. “The Indexing of Scholarly Journals: A Tipping Point for Publishing Reform?” The Journal of Electronic Publishing 7.2 (December 2001). http://quod.lib.umich.edu/cgi/t/text/text-idx?c=jep;view=text;rgn=main;idno=3336451.0007.202