-
Story
-
Resolution: Fixed
-
Major
The user interface for OCR-On-Demand in Virgo has been designed and approved by UX. To support it, we need a web service, accessible to the public (or to Virgo on behalf of the public), that triggers the production of OCR or delivers previously prepared OCR.
The requirements are:
It must be possible to request that OCR be produced for a particular item (a metadata record in tracksys), to associate that request with a given e-mail address, and to have that address notified when the OCR is available within Virgo. The OCR should be the concatenation of the OCR for each master file associated with that metadata record. Subsequent requests that arrive while OCR is being generated should not trigger redundant processing, but should still result in e-mail notification.
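The de-duplication requirement above could be sketched roughly as follows. This is only an illustration under stated assumptions, not a proposed design: `OcrRequestRegistry`, `request`, and `complete` are hypothetical names, state is held in memory rather than in tracksys, and actually sending mail is left to the caller.

```python
from dataclasses import dataclass, field


@dataclass
class OcrRequestRegistry:
    """Hypothetical tracker for in-flight OCR jobs (not a tracksys API)."""

    # metadata record id -> e-mail addresses waiting on that record
    pending: dict[int, set[str]] = field(default_factory=dict)
    # record ids for which a job was started, kept only for illustration
    started: list[int] = field(default_factory=list)

    def request(self, record_id: int, email: str) -> bool:
        """Register a request; return True only if a new job was started."""
        if record_id in self.pending:
            # A job is already running: remember the address, do no new work.
            self.pending[record_id].add(email)
            return False
        self.pending[record_id] = {email}
        self.started.append(record_id)
        return True

    def complete(self, record_id: int, master_file_ocr: list[str]) -> tuple[str, list[str]]:
        """Concatenate per-master-file OCR and return it with the addresses to notify."""
        text = "\n".join(master_file_ocr)
        recipients = sorted(self.pending.pop(record_id, set()))
        return text, recipients


registry = OcrRequestRegistry()
registry.request(42, "a@example.org")  # first request starts a job
registry.request(42, "b@example.org")  # job in flight: only the address is added
text, recipients = registry.complete(42, ["page one", "page two"])
```

A real service would presumably also check for already-produced OCR before starting a job and would persist the pending addresses so they survive a restart.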
There must also be a way to query whether OCR exists for a given metadata record, whether that record is flagged as being OCR'able, and whether a transcription (rather than OCR) exists for it.
There should be a way to quickly retrieve the OCR for any record for which OCR has previously been produced.
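The query and retrieval requirements above might look something like the following sketch. Everything here is an assumption made for illustration: `RecordState`, `ocr_status`, `fetch_ocr`, and the response keys are hypothetical, and a plain dictionary stands in for whatever store tracksys would actually use.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RecordState:
    """Hypothetical per-record state; field names are illustrative only."""

    ocr_candidate: bool = False               # record is flagged as OCR'able
    ocr_text: Optional[str] = None            # previously produced OCR, if any
    transcription_text: Optional[str] = None  # transcription, if any


def ocr_status(record: RecordState) -> dict[str, bool]:
    """Answer the three status questions for one metadata record."""
    return {
        "has_ocr": record.ocr_text is not None,
        "ocr_candidate": record.ocr_candidate,
        "has_transcription": record.transcription_text is not None,
    }


def fetch_ocr(store: dict[int, RecordState], record_id: int) -> Optional[str]:
    """Return stored OCR for a record, or None if none has been produced."""
    record = store.get(record_id)
    return record.ocr_text if record else None


store = {
    1: RecordState(ocr_candidate=True, ocr_text="page one\npage two"),
    2: RecordState(transcription_text="hand-keyed text"),
}
status = ocr_status(store[1])
```

Keeping the status check separate from retrieval would let Virgo decide what to offer the user without transferring the full text.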
Whether, and how much of, this information is simply indexed into the Solr record is an open question. Given that we currently have only nightly updates, it may be most expedient to have all OCR delivered
Ideally, the service would take advantage of auto-provisioned cloud infrastructure to reduce turnaround time. Please discuss the possibilities with Dave Goldstein, as this is likely our first application that needs to scale in this manner.