TECHNICAL AND CHRONOLOGICAL HISTORY
Development, Decision Points, and Directions
We've debated this for a while, and as a team, we've decided it would be good to keep a record of our technical decision points, development issues, and directions we have taken and will continue to take at the University of Arizona. Needless to say, these are exciting times at Zona and at the Libraries.
CONTENT FOR THE IMMEDIATE FUTURE…. Within two years, UAiR is projected to include as many as 420,000 digital assets (with associated metadata and abstract records) and as many as 750,000 independent metadata and abstract records. The University of Arizona Libraries has a wealth of digital assets, most of which will wind up incorporated in the Commons in some fashion. Collection by collection, we plan to upload our digital collections, journals, docs, books, images, and audio–and there is a mess of audio.
STORAGE CAPACITY AND FAILOVER…. Our current storage capacity is approximately 39 TB, mirrored within the Tucson area. While this may seem like quite a bit, we are already wondering how long this will hold out. The bulk of this space, 24 TB (mirrored), arrived in March 2008 and was installed shortly thereafter. Our mirror is designed for fairly effortless failover should anything go wrong locally, and we're still hammering out some of the kinks in the pipes.
- UPDATE: Price negotiation allowed us to purchase more storage space than we originally planned, which is good.
THE NEXT FIVE YEARS…. Through the next five years, UAiR may well become one of the nation's larger repository systems. Then again, it might not. But we like to dream big, and as such, UAiR design considerations have not only included budgets, systems, and architecture, but also the need for simplicity, sustainability, and adaptability, what we call SSA.
PERSONNEL…. Nick Jury was named program manager and tasked with the design, implementation, and management of the university's institutional repository. In October 2007, Patrick Barabe was hired as the lead technician, programmer, systems guru, and general wizard for the project, then named “Digital Commons.” We would include their resumes, but they're pretty boring. Actually, their resumes are included in the top ten home remedies for insomnia. Update: Patrick left the UAiR family for greener pastures in August 2008.
In August 2007, Jie “Canny” Yao was hired as a graduate assistant. Canny developed our first search engine, which was an incredible blend of Lucene and our own mishmash. Canny completed her GA in May 2008.
In February 2008, the project employed Ginger Bidwell, an undergraduate at the University of Arizona. Ginger designed the original exhibits for UAiR and has designed the interfaces for our Beta release (September 2008).
UAiR is being built with minimal personnel on a tight budget in a rapid-development fashion.
April 2007
- University of Arizona Libraries' personnel formally selected Fedora as UAiR's repository system. Through the following six months, UAL personnel studied the Fedora system and designed a rapid-prototyping methodology that would allow us to deploy Fedora components and to identify a common content model (data model) that would simplify the myriad requirements we were facing.
- For those investigating the possibilities of your own repositories…. We talked to a bunch of folks who had taken the leap into repository land. Everyone who had successfully established and released a repository was consistent in what they said. Here's an overview.
- Do your best to define the high-level or short-term scope of your repository. What will your repository store? Who is your audience? How do you need to expose your digital assets to meet your customers' needs? Who are your customers? There are a couple dozen more questions along these lines. Tell a story. Sit down and tell yourself the story of what you want to see when you're done, even if it's only phase one. “I want a common portal on the Internet through which people can search for and browse through our collection of rare images.” Excellent! You just set a goal. Set the goals. Ask yourself what it looks like, when you can get it done, and how. And when you have all this, then ask yourself one more question: Does this really make any sense? The answer(s) to this last question might startle you.
- Get the data structure right. If you get the data structure right, most of the larger issues become manageable if they don't outright evaporate. In our case, we chose to focus on a single, Dublin Core-based data structure. Everything in its place without regard for the type of object. To this end, we chose to minimize the complexities of our content models. For the most part, our storage of journal data looks a whole lot like our storage of image data, which looks a whole lot like our storage of random digitized objects. Of course, there are a couple hundred other considerations here, but the best decision we made (and stuck to) was to avoid exceptions at all costs. Exceptions require special coding or configuration and special management and tracking. The more exceptions you have, the more complex and unmanageable your system becomes. Get the data right, and the rest will fall into place. (Thanks largely to Tim Lynch, Cornell)
- Create your repository in small chunks. This is more or less the same as saying, “Don't bite off more than you can chew.” To this, we have added….
- Keep your eye on your goals. Don't ever forget the goals. If you forget where you're going, you might just compromise something critical in your data structure. You might accept an exception in your coding techniques that could have been resolved if you had accepted a delay of 4 issues.
- Finally, don't listen to anyone else! Is this just about the best twist of irony you've seen all day? It's true. This is your repository. This is your digital world. No one but you can define how it works and what it looks like. Just you. There are dozens of people with opinions and suggestions, but the bottom line is you and you alone have to create the “story” of your repository, share the story so others can understand, and then craft the story. Most of you could easily plug into UAiR and use it. And even if you did, your repository would look and function differently than ours!
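To make the "get the data structure right" advice above a little more concrete, here is a minimal sketch in Python of what a single, Dublin Core-based record shape might look like. It is an illustration only, not UAiR's actual data model: the `Record` class, the `titles` helper, the `demo:` identifiers, and the sample values are all hypothetical. The only thing taken from a standard is the list of fifteen Dublin Core element names.

```python
from dataclasses import dataclass, field

# The fifteen elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


@dataclass
class Record:
    """Hypothetical record shape shared by every kind of object."""
    pid: str                                 # repository identifier (made up here)
    dc: dict = field(default_factory=dict)   # element name -> list of values

    def add(self, element, value):
        # Refuse anything outside the common structure: no per-type exceptions.
        if element not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {element}")
        self.dc.setdefault(element, []).append(value)
        return self


def titles(records):
    """One code path for every record, whatever kind of object it describes."""
    return [r.dc.get("title", [""])[0] for r in records]


# A journal article and an image are stored the same way.
article = Record("demo:1").add("title", "Sample journal article").add("type", "Text")
image = Record("demo:2").add("title", "Sample photograph").add("type", "StillImage")
print(titles([article, image]))
```

Because a journal article and a photograph share one shape, `titles` needs no special case for either; that is the practical payoff of avoiding exceptions.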
July 2007
- Nick Jury is selected to lead the digital library effort.
October 2007
- First code.
- Patrick Barabe is hired as lead programmer.
November 2007
- Fedora's FOXML standard is deployed in our testing environment.
- Fedora's PID methodology is integrated into the UAiR test environment.
December 2007
- UAL personnel select the name “Digital Commons” for the University's repository system.
- Research into Fedora's search functionality leads to the decision that the system is far too complex for the Commons' needs. We found that Fedora's developers had created a highly configurable search functionality that on one hand was too arduous for our needs and on the other hand did not fit well with our considerations for SRU, RSS, and ATOM. We began in earnest the development of our own Lucene-based search functionality.
- A single data model is developed and released for UAiR.
- A content model for journals is developed. (We continue to characterize the content models.)
- Contractual obligations require the release of UAiR. We would have preferred withholding public release until October 2008, but….
January 2008
- Facing extensive Web development for the repository and for exhibits that will use repository assets, we researched Omeka and decided to integrate it with UAiR. Omeka is an open source LAMP-based environment for the development and deployment of Web exhibits, the product of the Center for History and New Media, George Mason University. We found its development in line with our own Web development: a PHP-based, object-oriented architecture employing the MVC (model, view, controller) pattern.
- In an effort to ease requirements for federation and to help many of our depositors, we chose to deploy RSS 2.0 and SRU. We hope to release detailed technical specifications on our implementations and the use of these as well as ATOM in the summer of 2008.
- The Libraries considered and rejected the use of Drupal. While there is nothing inherently wrong with Drupal as an application and we could see many uses for it, we are also in the process of limiting the number of applications our network and server personnel have to manage and maintain, and Drupal is just another one of the many. Without profound need, we're going to shy away from implementing any more widgets.
March 2008
- We reviewed the OAI functionality in Fedora and determined that the benefits of deploying Fedora's solution are more than offset by the amount of programming and systems development and support that would be required. Out of respect for our budgets and personnel, we decided that OAI will be developed as a “piggyback” system to our very similar RSS 2.0 system (which is also XML- and Dublin Core-based).
- Because of competing priorities, we are targeting Q4 2008 (calendar) for deployment of OAI-PMH.
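The “piggyback” idea above rests on the fact that an RSS 2.0 item carrying Dublin Core elements and an OAI `oai_dc` record hold essentially the same metadata. Here is a rough Python sketch of that overlap, using only the standard library. It is not our implementation and it does not speak OAI-PMH (no verbs, no response envelope); the function names, the sample record, and the example.org link are all assumptions made for illustration. The two namespace URIs are the standard ones for Dublin Core and `oai_dc`.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
ET.register_namespace("dc", DC_NS)
ET.register_namespace("oai_dc", OAI_DC_NS)


def add_dc_fields(parent, record):
    """Shared step: write each Dublin Core value as a dc:* child element."""
    for name, values in record.items():
        for value in values:
            ET.SubElement(parent, f"{{{DC_NS}}}{name}").text = value


def to_rss_item(record, link):
    """Hypothetical RSS 2.0 <item> with the Dublin Core fields attached."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = record["title"][0]
    ET.SubElement(item, "link").text = link
    add_dc_fields(item, record)
    return item


def to_oai_dc(record):
    """Hypothetical oai_dc payload built from the very same fields."""
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    add_dc_fields(root, record)
    return root


# Made-up sample record: Dublin Core element name -> list of values.
record = {
    "title": ["Sample photograph"],
    "creator": ["Example, A."],
    "type": ["StillImage"],
}
rss_item = to_rss_item(record, "http://example.org/demo:2")
oai_dc = to_oai_dc(record)
print(ET.tostring(rss_item, encoding="unicode"))
print(ET.tostring(oai_dc, encoding="unicode"))
```

Both outputs are produced by the same `add_dc_fields` step, which is why building OAI on top of an existing Dublin Core-based RSS feed can cost far less than standing up a separate system.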
April 2008
- FEDORA CONSIDERATIONS: For quite some time, we have struggled to find a good way to explain why we backed away from our initial fervor for Fedora. During a visit yesterday from our beloved compatriot from Arizona State University, Phil Komonos said it all when he said, “I think there are six [persons] who really understand Fedora.” Precisely. Thanks, Phil! Perfectly said.
May 2008
- OMEKA INTEGRATION: We have begun the process of integrating Omeka with the Commons to provide administration and a framework for exhibits of Commons assets. Our first exhibits (based on a UAL HTML/CSS template) are slated for release in July 2008.
- The Journal of Meteoritics and Planetary Sciences (MAPS) is loading into our prototype environment (mid-May) and will be released in the production Commons within a few days, certainly no longer than the end of the month. Included in this release is support for MAPS subscription patrons.
June 2008
- Shantz collection of images is released, followed by….
July 2008
- Shantz exhibit dedicated to Dr. Shantz's African image collections released.
August 2008
- Escarcega's initial collection and exhibit are released.
October 2008
- Castro collection and exhibit released to the public.
The Future
Through the remainder of 2008, we are slowing to allow time to add functionality. In 2009, we plan to ingest quite a few collections, integrate UAiR collection data with the Libraries' catalogs, and add e-Commerce capabilities.
