This process currently assumes that you do not have a sufficiently powerful OCR tool. ABBYY 8.0 and Microsoft Document Imaging have not proved powerful enough, but ABBYY 10 may allow a different procedure from the one below.

Digitization Workflow

Stage I: Photography

Materials: digital camera, camera stand, light sources, page weights (e.g. coins)


  1. Photograph all even-numbered pages (weighting down the page margin if needed)
  2. Photograph all odd-numbered pages

Issues:

  1. Glossy pages, such as those in 1960s yearbooks, require different lighting to avoid glare; no ideal solution has been found yet.
  2. Page curvature: can this be corrected using Photoshop?

Stage II: Image processing & upload

Materials: digital photos, IrfanView, file compression program; optional: PDF to DJVU GUI

  1. Transfer photos to folder on computer
  2. Make backup ZIP file (important to do this before doing any actual processing)
  3. Check for duplicate or missing photos
  4. In IrfanView, do batch rename & rotate, first all evens & then all odds. When done, all pages should be in order & right-side up.
  5. Either as a batch or individually, fine-rotate & recrop images to show page only (no random background crud)
  6. Re-sort, select & print images to PDF
  7. Upload PDF via, selecting "Public Domain Mark" & providing informative description
  8. If of potential Wikisource value:
    1. Convert PDF to DJVU (or just wait for conversion script to complete & then download DJVU)
    2. Upload DJVU to
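
The workflow above does the backup, duplicate check, and even/odd re-ordering with a compression program and IrfanView. For anyone who prefers to script those steps, here is a minimal Python sketch of the same idea. Everything named in it is an assumption made for illustration: the function names `backup_zip`, `find_duplicates`, and `interleaved_names`, the `page_0001.jpg` naming scheme, and the premise that the even pass starts at page 2, the odd pass at page 1, and no page was skipped in either pass.

```python
import hashlib
import zipfile
from pathlib import Path


def backup_zip(photo_dir, zip_path):
    """Copy every file in photo_dir into a ZIP archive before any processing."""
    photo_dir = Path(photo_dir)
    zip_path = Path(zip_path)
    # ZIP_STORED (no compression): camera JPEGs are already compressed.
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_STORED) as archive:
        for path in sorted(photo_dir.iterdir()):
            # Skip sub-folders and the archive itself if it lives in photo_dir.
            if path.is_file() and path.resolve() != zip_path.resolve():
                archive.write(path, arcname=path.name)


def find_duplicates(photo_dir):
    """Return groups of byte-identical files (e.g. a photo copied twice).

    This does not detect two separate photos of the same page; those
    still have to be spotted by eye.
    """
    by_hash = {}
    for path in sorted(Path(photo_dir).iterdir()):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_hash.setdefault(digest, []).append(path.name)
    return [names for names in by_hash.values() if len(names) > 1]


def interleaved_names(even_files, odd_files, ext=".jpg"):
    """Map original file names to page-numbered names.

    Assumption: even_files[0] is page 2 and odd_files[0] is page 1,
    each list is in shooting order, and no page was skipped.
    """
    mapping = {}
    for i, name in enumerate(even_files):
        mapping[name] = f"page_{2 * (i + 1):04d}{ext}"
    for i, name in enumerate(odd_files):
        mapping[name] = f"page_{2 * i + 1:04d}{ext}"
    return mapping
```

The mapping returned by `interleaved_names` only proposes new names; the actual renaming (for example with `Path.rename`) is left out so that nothing is changed until the mapping has been checked against the photos. Zero-padded numbers are used so that an alphabetical sort puts the pages in reading order.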

Stage III: Text processing

  1. Following the instructions here, start a new page at of DJVU file . (For example, if the file is "Oread_August_1881.djvu", you would create the page ).
  2. Fill in the data in the resulting form, leaving any mysterious fields as they are. Save.
  3. There should now be a row of little red numbers at the bottom of the Index page.
    1. Click on one (preferably "1").
    2. You'll get a little edit box next to a page image.
    3. You can either:
      1. Type the text by hand (not recommended), or
      2. Upload the individual page image to in "Guest" mode to extract high-quality OCR text and paste that text into the edit field, or
      3. OCR the full PDF/DJVU with a program on your computer (probably not worthwhile unless your program is really good), and paste the resulting text into the edit field.
    4. Go through the resulting text, fixing hyphenation, paragraph breaks, scannos &c.
    5. Save.
    6. Go back to Index: file and click on another page.
  4. (To be determined...)
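
Part of the clean-up step above (fixing hyphenation and paragraph breaks) can be roughed out automatically before proofreading by hand. The following Python sketch is one possible approach, not an established part of this workflow; the function name `fix_line_breaks` is hypothetical, and it assumes that the pasted OCR text separates paragraphs with blank lines.

```python
import re


def fix_line_breaks(ocr_text: str) -> str:
    """Rejoin words split across lines and unwrap lines within paragraphs.

    Assumption: blank lines separate paragraphs in the OCR output.
    """
    paragraphs = re.split(r"\n\s*\n", ocr_text.strip())
    cleaned = []
    for paragraph in paragraphs:
        # "stu-\ndents" -> "students"; only when a lowercase letter follows,
        # so a line-ending dash before a capitalised word is left alone.
        paragraph = re.sub(r"(\w)-\n\s*([a-z])", r"\1\2", paragraph)
        # Any remaining single line break becomes a space.
        paragraph = re.sub(r"\s*\n\s*", " ", paragraph)
        cleaned.append(paragraph.strip())
    return "\n\n".join(cleaned)
```

For example, the two lines "The stu-" and "dents met" come out as "The students met". Note that a genuinely hyphenated compound that happens to break at the end of a line (such as "well-" followed by "known") will also lose its hyphen, and scannos are not touched at all, so the result still needs to be read against the page image.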

