Get it online and keep it there

Using Blacklight and CDL microservices to preserve and serve content

Our goal

1. dude submits newspapers or some other stuff

2. ...

3. our awesome permanent online surrogate brings tears of joy to the dude

Problem

what happens in step 2?

Simplification

Let's skip the process of digitization for now and go straight to taking submissions, putting them online, and keeping them around forever.

Staging

We usually get submissions on a USB hard drive.


C:\> robocopy E:\news \\SAN\staging\news /E
        

Identity

We use the Identity microservice to assign an identifier.


$ identity mint 1 bind original_id=news
Id:      16417/xt7msb3wt81w
Element: original_id
Bind:    set
Status:  ok, 4 bytes written, replacing 0 bytes
        

AipMaker

We use the AipMaker service to bag and tag the submission.


$ aipmaker $SIPS/news xt7msb3wt81w $AIPS
... grind grind grind ...
Built AIP at /opt/oaisis/aips/xt7msb3wt81w
        

Bagging...

The AipMaker service dumps the submission into a BagIt directory structure. We submit the resulting archival package to long-term storage.
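As a rough illustration of what "bagging" means, here is a minimal Python sketch that copies a submission into a BagIt-style layout. It is not the AipMaker code: the function name `make_bag`, the choice of BagIt version 0.97, and the use of MD5 for the manifest are all assumptions made for the example.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def make_bag(submission: Path, bag_dir: Path) -> Path:
    """Copy a submission into bag_dir/data and write minimal BagIt tag files.

    Hypothetical helper for illustration; not part of AipMaker.
    """
    data_dir = bag_dir / "data"
    # copytree creates bag_dir and data_dir; it fails if data_dir already exists.
    shutil.copytree(submission, data_dir)

    # Bag declaration required by the BagIt specification (version is an assumption).
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 0.97\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )

    # Payload manifest: one "<md5>  <relative path>" line per payload file.
    lines = []
    for path in sorted(p for p in data_dir.rglob("*") if p.is_file()):
        digest = hashlib.md5(path.read_bytes()).hexdigest()
        lines.append(f"{digest}  {path.relative_to(bag_dir).as_posix()}")
    (bag_dir / "manifest-md5.txt").write_text(
        "\n".join(lines) + "\n", encoding="utf-8"
    )
    return bag_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sip = Path(tmp) / "news"
        sip.mkdir()
        (sip / "0001.tif").write_bytes(b"fake image bytes")
        bag = make_bag(sip, Path(tmp) / "aips" / "xt7msb3wt81w")
        print((bag / "manifest-md5.txt").read_text())
```

The payload lives under `data/`, and everything else in the bag directory is a tag file describing it.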

... and tagging

After building the bag, the AipMaker service computes a checksum for each file and verifies that it matches both the copy in the bag's data directory and the original file in the submission.


e3e2cf29edaa8919d7b08828a688ed07 data/0001.tif
4df418e6069a1b30a90b6d8e815b6952 data/0002.tif
a21b3dd2073a70b5f257f3f0fa06b978 data/mets.xml
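The checksums above are 32 hex characters long, which is consistent with MD5. Assuming that, a verification pass could look something like the following Python sketch. The helper names (`parse_manifest`, `md5_of`, `verify`) are made up for the example and do not come from AipMaker.

```python
import hashlib
from pathlib import Path


def parse_manifest(text: str) -> dict[str, str]:
    """Map each payload path to its expected MD5 from "<md5> <path>" lines."""
    expected = {}
    for line in text.splitlines():
        if line.strip():
            digest, path = line.split(maxsplit=1)
            expected[path.strip()] = digest.lower()
    return expected


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large TIFFs are not read into memory at once."""
    md5 = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()


def verify(root: Path, expected: dict[str, str]) -> list[str]:
    """Return the paths that are missing or whose checksum differs."""
    failures = []
    for rel_path, digest in expected.items():
        target = root / rel_path
        if not target.is_file() or md5_of(target) != digest:
            failures.append(rel_path)
    return failures
```

Calling `verify(bag_dir, parse_manifest(manifest_text))` checks the bag itself. Checking the original submission the same way would mean stripping the `data/` prefix from each path first.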
        

Representing objects

We have chosen to represent most objects as a collection of related items in deliberate sequence.

We describe objects using METS.

Items...


<mets:fileGrp ID="FileGrp0001">
  <mets:file ID="MasterFile0001" 
             USE="master"
             MIMETYPE="image/tiff">
    <mets:FLocat xlink:href="..."/>
  </mets:file>
  <mets:file ID="OcrFile0001" ...>
    ...
  </mets:file>
</mets:fileGrp>
        

... in deliberate sequence


<mets:div TYPE="page" 
          LABEL="17" 
          ORDER="17">
  <mets:fptr FILEID="MasterFile0001" />
  <mets:fptr FILEID="OcrFile0001" />
</mets:div>
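To show how the two fragments fit together, here is a small Python sketch that reads a METS document, resolves each page's file pointers to file locations, and returns the pages sorted by ORDER. The sample document, the file names in it, and the function `pages_in_order` are invented for the example; only the element and attribute names come from the fragments above.

```python
import xml.etree.ElementTree as ET

NS = {
    "mets": "http://www.loc.gov/METS/",
    "xlink": "http://www.w3.org/1999/xlink",
}

# Hypothetical two-page object, deliberately listed out of order.
SAMPLE = """\
<mets:mets xmlns:mets="http://www.loc.gov/METS/"
           xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:fileSec>
    <mets:fileGrp ID="FileGrp0002">
      <mets:file ID="MasterFile0002" USE="master" MIMETYPE="image/tiff">
        <mets:FLocat LOCTYPE="URL" xlink:href="0002.tif"/>
      </mets:file>
    </mets:fileGrp>
    <mets:fileGrp ID="FileGrp0001">
      <mets:file ID="MasterFile0001" USE="master" MIMETYPE="image/tiff">
        <mets:FLocat LOCTYPE="URL" xlink:href="0001.tif"/>
      </mets:file>
      <mets:file ID="OcrFile0001" USE="ocr" MIMETYPE="text/plain">
        <mets:FLocat LOCTYPE="URL" xlink:href="0001.txt"/>
      </mets:file>
    </mets:fileGrp>
  </mets:fileSec>
  <mets:structMap>
    <mets:div TYPE="page" LABEL="18" ORDER="18">
      <mets:fptr FILEID="MasterFile0002"/>
    </mets:div>
    <mets:div TYPE="page" LABEL="17" ORDER="17">
      <mets:fptr FILEID="MasterFile0001"/>
      <mets:fptr FILEID="OcrFile0001"/>
    </mets:div>
  </mets:structMap>
</mets:mets>
"""


def pages_in_order(mets_xml: str) -> list[tuple[int, str, list[str]]]:
    """Return (order, label, file hrefs) for each page div, sorted by ORDER."""
    root = ET.fromstring(mets_xml)
    href_attr = f"{{{NS['xlink']}}}href"

    # Index every mets:file by ID so fptr references can be resolved.
    locations = {}
    for file_el in root.iterfind(".//mets:file", NS):
        flocat = file_el.find("mets:FLocat", NS)
        if flocat is not None:
            locations[file_el.get("ID")] = flocat.get(href_attr)

    pages = []
    for div in root.iterfind(".//mets:div[@TYPE='page']", NS):
        hrefs = [locations[f.get("FILEID")] for f in div.iterfind("mets:fptr", NS)]
        pages.append((int(div.get("ORDER")), div.get("LABEL"), hrefs))
    return sorted(pages, key=lambda page: page[0])


if __name__ == "__main__":
    for order, label, hrefs in pages_in_order(SAMPLE):
        print(order, label, hrefs)
```

The file section says what the items are; the structural map says which items belong to which page and in what sequence.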
        

DipMaker

The archival package is not web-friendly. We use the DipMaker service to produce an access package containing a low-res, web-friendly substitute for the archival package.


$ dipmaker $AIPS/xt7msb3wt81w $DIPS
... grind grind grind ...
Built DIP at /opt/oaisis/dips/xt7msb3wt81w
        

Derivatives

We create low-res tiled images to enable fast-loading, easily zoomable web images.


<mets:div TYPE="page" 
          LABEL="17" 
          ORDER="17">
  <mets:fptr FILEID="TiledImageFile0001"/>
  <mets:fptr FILEID="PrintImageFile0001"/>
  ...
</mets:div>
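The talk does not say which tiling scheme DipMaker uses, so the following Python sketch only illustrates the general idea behind tiled, zoomable images: a pyramid of progressively halved copies, each cut into fixed-size tiles. The 256-pixel tile size and the function `tile_pyramid` are assumptions for the example.

```python
import math


def tile_pyramid(
    width: int, height: int, tile_size: int = 256
) -> list[tuple[int, int, int, int, int]]:
    """Return (level, width, height, columns, rows), smallest level first.

    Each level halves the previous one (rounding up) until the whole
    image fits inside a single tile.
    """
    levels = []
    w, h = width, height
    while True:
        levels.append((w, h, math.ceil(w / tile_size), math.ceil(h / tile_size)))
        if w <= tile_size and h <= tile_size:
            break
        w, h = math.ceil(w / 2), math.ceil(h / 2)
    levels.reverse()  # level 0 is the most zoomed-out view
    return [(index, *level) for index, level in enumerate(levels)]


if __name__ == "__main__":
    for level in tile_pyramid(1000, 600):
        print(level)
```

A viewer only has to fetch the handful of tiles covering the visible region at the current zoom level, which is why the page loads quickly even when the master TIFF is large.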

SolrMaker

We use the SolrMaker service to construct JSON serializations of minimal metadata for each item in the object.


$ solrmaker $DIPS/xt7msb3wt81w $SOLR
... grind grind grind ...
Built Solr documents at /opt/oaisis/solr/xt7msb3wt81w
        

Metadata for Solr



{ "id": "xt7msb3wt81w_17",
  "parent_id": "xt7msb3wt81w",
  "title": "Snoozy Snorum and the Magic Pickle",
  "label": "17",
  "mets_url": "http://HOST/dips/xt7msb3wt81w/data/mets.xml",
  "format": "books",
  "text": "But Snoozy Snorum was snoozy...",
  ... }
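A per-page document like the one above could be assembled along these lines. This Python sketch covers only the fields shown on the slide, and it assumes the page id is the parent id joined to the page label with an underscore, which matches the example but is not confirmed by the talk. The function `page_document` is hypothetical and is not the SolrMaker implementation.

```python
import json


def page_document(
    parent_id: str,
    title: str,
    label: str,
    text: str,
    host: str,
    fmt: str = "books",
) -> dict:
    """Build a minimal Solr document for one page of an object."""
    return {
        # Assumption: page id = parent id + "_" + page label.
        "id": f"{parent_id}_{label}",
        "parent_id": parent_id,
        "title": title,
        "label": label,
        "mets_url": f"http://{host}/dips/{parent_id}/data/mets.xml",
        "format": fmt,
        "text": text,
    }


if __name__ == "__main__":
    doc = page_document(
        parent_id="xt7msb3wt81w",
        title="Snoozy Snorum and the Magic Pickle",
        label="17",
        text="But Snoozy Snorum was snoozy...",
        host="HOST",
    )
    print(json.dumps(doc, indent=2))
```

Keeping `parent_id` on every page document lets the index group pages back into their object.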
        

Indexing into Blacklight

With very little up my sleeves, I will now index a book into our demo site.

Unless something breaks.

Which would never happen in a live demo.

Thanks

CC BY-SA 3.0

http://scdp.uky.edu/mps/talks/2011-04-01/overview-of-our-process/

HTML5+CSS+JS slides designed by Edward O'Connor.