Get it online and keep it there
Using Blacklight and CDL microservices to preserve and serve content
Our goal
1. dude submits newspapers or some other stuff
2. ...
3. our awesome permanent online surrogate brings tears of joy to the dude
Problem
what happens in step 2?
Simplification
Let's skip the process of digitization for now and go straight to taking submissions, putting them online, and keeping them around forever.
Staging
We usually get submissions on a USB hard drive.
C:\> robocopy E:\news \\SAN\staging\news /E
Identity
We use the Identity microservice to assign an identifier.
$ identity mint 1 bind original_id=news
Id: 16417/xt7msb3wt81w
Element: original_id
Bind: set
Status: ok, 4 bytes written, replacing 0 bytes
AipMaker
We use the AipMaker service to bag and tag the submission.
$ aipmaker $SIPS/news xt7msb3wt81w $AIPS
... grind grind grind ...
Built AIP at /opt/oaisis/aips/xt7msb3wt81w
Bagging...
The AipMaker service dumps the submission into a BagIt directory structure. We submit the resulting archival package to long-term storage.
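The bagging step can be sketched roughly as follows. This is a minimal illustration, not AipMaker's actual code: the function name build_bag, the BagIt version string, and the choice of an MD5 payload manifest are all assumptions.

```python
import hashlib
import shutil
from pathlib import Path


def build_bag(submission: Path, bag_dir: Path) -> Path:
    """Copy a submission into bag_dir/data and write minimal BagIt tag files."""
    data_dir = bag_dir / "data"
    # copytree creates bag_dir and data_dir; it raises if data_dir already exists.
    shutil.copytree(submission, data_dir)

    # Bag declaration. The version number here is an assumption.
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 0.97\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )

    # Payload manifest: one "<md5>  <path relative to the bag>" line per file.
    # read_bytes() loads each file whole, which is fine for a sketch only.
    lines = []
    for path in sorted(p for p in data_dir.rglob("*") if p.is_file()):
        digest = hashlib.md5(path.read_bytes()).hexdigest()
        lines.append(f"{digest}  {path.relative_to(bag_dir).as_posix()}")
    (bag_dir / "manifest-md5.txt").write_text(
        "\n".join(lines) + "\n", encoding="utf-8"
    )
    return bag_dir
```

Calling build_bag(Path("staging/news"), Path("aips/xt7msb3wt81w")) would leave the submission under aips/xt7msb3wt81w/data with the tag files beside it; both paths are placeholders.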
... and tagging
The AipMaker service collects initial checksums after building the bag and verifies that they match the contents of the data directory as well as the contents of the submission.
e3e2cf29edaa8919d7b08828a688ed07 data/0001.tif
4df418e6069a1b30a90b6d8e815b6952 data/0002.tif
a21b3dd2073a70b5f257f3f0fa06b978 data/mets.xml
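A checksum listing like the one above can be re-verified at any later time. The sketch below assumes the listing is stored as manifest-md5.txt at the top of the bag and that the 32-character digests are MD5; the function names are hypothetical and this is not AipMaker's own verification code.

```python
import hashlib
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so a large TIFF is never held in memory whole."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(bag_dir: Path, manifest_name: str = "manifest-md5.txt") -> list[str]:
    """Return the listed paths that are missing or whose checksum has changed."""
    failures = []
    manifest = (bag_dir / manifest_name).read_text(encoding="utf-8")
    for line in manifest.splitlines():
        if not line.strip():
            continue
        # Each line is "<checksum> <relative path>"; the path may contain spaces.
        expected, rel_path = line.split(maxsplit=1)
        target = bag_dir / rel_path
        if not target.is_file() or md5_of(target) != expected:
            failures.append(rel_path)
    return failures
```

An empty list means every file in the manifest still matches its recorded checksum.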
Representing objects
We have chosen to represent most objects as a collection of related items in deliberate sequence.
We describe objects using METS.
Items...
<mets:fileGrp ID="FileGrp0001">
<mets:file ID="MasterFile0001"
USE="master"
MIMETYPE="image/tiff">
<mets:FLocat xlink:href="..."/>
</mets:file>
<mets:file ID="OcrFile0001" ...>
...
</mets:file>
</mets:fileGrp>
... in deliberate sequence
<mets:div TYPE="page"
LABEL="17"
ORDER="17">
<mets:fptr FILEID="MasterFile0001" />
<mets:fptr FILEID="OcrFile0001" />
</mets:div>
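To show how the sequence can be read back out of the METS, here is a small sketch using Python's standard XML parser. It assumes the mets prefix is bound to the standard METS namespace and that every page div carries an integer ORDER; the function name pages_in_order is hypothetical.

```python
import xml.etree.ElementTree as ET

# Standard METS namespace; assumed to be what the "mets:" prefix is bound to.
NS = {"mets": "http://www.loc.gov/METS/"}


def pages_in_order(mets_xml: str) -> list[tuple[int, str, list[str]]]:
    """Return (ORDER, LABEL, [FILEID, ...]) for each page div, sorted by ORDER."""
    root = ET.fromstring(mets_xml)
    pages = []
    for div in root.iterfind(".//mets:div[@TYPE='page']", NS):
        # Each fptr points at a mets:file in the file section by its ID.
        file_ids = [fptr.get("FILEID") for fptr in div.findall("mets:fptr", NS)]
        pages.append((int(div.get("ORDER")), div.get("LABEL"), file_ids))
    return sorted(pages, key=lambda page: page[0])
```

For the page shown above, the entry would pair ORDER 17 and LABEL "17" with the file IDs MasterFile0001 and OcrFile0001.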
DipMaker
The archival package is not web-friendly. We use the DipMaker service to produce an access package containing a low-res, web-friendly substitute for the archival package.
$ dipmaker $AIPS/xt7msb3wt81w $DIPS
... grind grind grind ...
Built DIP at /opt/oaisis/dips/xt7msb3wt81w
Derivatives
We create low-res tiled images to enable fast-loading, easily zoomable web images.
<mets:div TYPE="page"
LABEL="17"
ORDER="17">
<mets:fptr FILEID="TiledImageFile0001"/>
<mets:fptr FILEID="PrintImageFile0001"/>
...
</mets:div>
Enabling search
Our discovery system uses a Blacklight front-end powered by a Solr search engine.
We need to summarize the metadata in the access package.
SolrMaker
We use the SolrMaker service to construct JSON serializations of minimal metadata for each item in the object.
$ solrmaker $DIPS/xt7msb3wt81w $SOLR
... grind grind grind ...
Built Solr documents at /opt/oaisis/solr/xt7msb3wt81w
Metadata for Solr
{ "id": "xt7msb3wt81w_17",
"parent_id": "xt7msb3wt81w",
"title": "Snoozy Snorum and the Magic Pickle",
"label": "17",
"mets_url": "http://HOST/dips/xt7msb3wt81w/data/mets.xml",
"format": "books",
"text": "But Snoozy Snorum was snoozy...",
... }
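A document like this could be assembled per page along the following lines. This is only a sketch: the function name solr_document is hypothetical, and building the id from the parent identifier plus the page's ORDER is an assumption inferred from the example, not a documented SolrMaker rule. Only the fields shown above are included.

```python
import json


def solr_document(parent_id: str, title: str, label: str, order: int,
                  mets_url: str, fmt: str, text: str) -> dict:
    """Build a minimal per-page record with the fields shown in the example."""
    return {
        # Assumption: page ids are "<parent id>_<ORDER>".
        "id": f"{parent_id}_{order}",
        "parent_id": parent_id,
        "title": title,
        "label": label,
        "mets_url": mets_url,
        "format": fmt,
        "text": text,
    }


def to_json(doc: dict) -> str:
    """Serialize one record; ensure_ascii=False keeps non-ASCII OCR text readable."""
    return json.dumps(doc, ensure_ascii=False, indent=2)
```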
Indexing into Blacklight
With very little up my sleeves, I will now index a book into our demo site.
Unless something breaks.
Which would never happen in a live demo.
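For readers without the live demo, the indexing step amounts to posting the JSON documents to Solr. The sketch below only builds the HTTP request and does not send it. The core URL passed in is a placeholder, and the /update path with a JSON content type is an assumption that may differ by Solr version and configuration; this is not necessarily how our own tooling does it.

```python
import json
import urllib.request


def build_update_request(solr_url: str, docs: list[dict]) -> urllib.request.Request:
    """Build (but do not send) a POST that adds docs to a Solr core and commits."""
    return urllib.request.Request(
        # Assumption: the core accepts a JSON array of documents at /update.
        url=f"{solr_url.rstrip('/')}/update?commit=true",
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned request to urllib.request.urlopen would perform the actual update against a running Solr instance.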
Thanks
http://scdp.uky.edu/mps/talks/2011-04-01/overview-of-our-process/
HTML5+CSS+JS slides designed by Edward O'Connor.
Next: Paper Vault