This process assumes that you do not have access to a powerful OCR tool. ABBYY 8.0 and Microsoft Document Imaging have not proven powerful enough, but ABBYY 10 may allow a different procedure from the one below.

Digitization Workflow

Stage I: Photography

Materials: digital camera, camera stand, light sources, page weights (e.g. coins)

Steps:

  1. Photograph all even-numbered pages (weighing down the page margin if needed)
  2. Photograph all odd-numbered pages

Issues:

  1. Glossy pages, such as those in 1960s yearbooks, require different lighting to avoid glare; no ideal solution has been found yet.
  2. Page curvature: is it possible to correct this using Photoshop?

Stage II: Image processing & upload

Materials: digital photos, IrfanView, file compression program; optional: PDF to DJVU GUI

Steps:

  1. Transfer photos to folder on computer
  2. Make backup ZIP file (important to do this before doing any actual processing)
  3. Check for duplicates/missing
  4. In IrfanView, batch rename & rotate the images, first all evens & then all odds. When done, all pages should be in order & right-side up.
  5. Either as a batch or individually, fine-rotate & recrop the images to show the page only (no background clutter)
  6. Re-sort, select & print images to PDF
  7. Upload PDF via http://archive.org/create, selecting "Public Domain Mark" & providing informative description
  8. If of potential Wikisource value:
    1. Convert PDF to DJVU (or just wait for archive.org conversion script to complete & then download DJVU)
    2. Upload DJVU to http://commons.wikimedia.org/Special:Upload
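
Steps 2 to 4 above (backup, duplicate check, and putting the even and odd batches into page order) can also be sketched in Python for anyone who prefers a script. This is only an illustration under stated assumptions, not a replacement for the IrfanView batch step: the function names, the `page_0001.jpg` naming scheme, and the idea that each batch is listed in shooting order are all hypothetical. The duplicate check only catches byte-identical files (for example, a photo copied twice), not two separate photographs of the same page.

```python
import hashlib
import zipfile
from pathlib import Path


def backup_folder(folder, archive):
    """Zip every file directly inside `folder`; return how many were stored.

    `archive` should live outside `folder` so the backup does not include itself.
    """
    folder, archive = Path(folder), Path(archive)
    files = sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.resolve() != archive.resolve()
    )
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for p in files:
            zf.write(p, arcname=p.name)
    return len(files)


def find_duplicates(folder):
    """Return (first_seen, duplicate) filename pairs for byte-identical files."""
    seen = {}
    dupes = []
    for p in sorted(Path(folder).iterdir()):
        if not p.is_file():
            continue
        digest = hashlib.sha256(p.read_bytes()).hexdigest()
        if digest in seen:
            dupes.append((seen[digest], p.name))
        else:
            seen[digest] = p.name
    return dupes


def plan_renames(evens, odds, first_even=2, first_odd=1):
    """Map original filenames to page-ordered names (hypothetical scheme).

    `evens` and `odds` are the filenames of each batch in shooting order.
    A count mismatch of more than one suggests a missing or extra photo.
    """
    if abs(len(evens) - len(odds)) > 1:
        raise ValueError("even/odd counts differ by more than one page")
    plan = {}
    for i, name in enumerate(evens):
        page = first_even + 2 * i
        plan[name] = f"page_{page:04d}{Path(name).suffix.lower()}"
    for i, name in enumerate(odds):
        page = first_odd + 2 * i
        plan[name] = f"page_{page:04d}{Path(name).suffix.lower()}"
    return plan
```

`plan_renames` only returns a mapping, so it can be reviewed before any file is touched; applying it (for instance with `Path.rename`) is left as a separate, deliberate step after the backup exists.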

Stage III: Text processing

Wikisource
  1. Following the instructions here, start a new page at en.wikisource.org/wiki/Index:name of DJVU file. (For example, if the file is "Oread_August_1881.djvu", you would create the page en.wikisource.org/wiki/Index:Oread_August_1881.djvu.)
  2. Fill in the data in the resulting form, leaving any mysterious fields as they are. Save.
  3. There should now be a row of little red numbers at the bottom of the Index page.
    1. Click on one (preferably "1").
    2. You'll get a little edit box next to a page image.
    3. You can either:
      1. Type the text by hand (not recommended), or
      2. Upload the individual page image to http://onlineocr.net in "Guest" mode to extract high-quality OCR text and paste that text into the edit field, or
      3. OCR the full PDF/DJVU with a program on your computer (probably not worthwhile unless your program is really good), and paste the resulting text into the edit field.
    4. Go through the resulting text, fixing hyphenation, paragraph breaks, scannos (OCR misreadings), etc.
    5. Save.
    6. Go back to Index: file and click on another page.
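
Two of the Wikisource steps above lend themselves to a small helper: building the Index page address from the DJVU filename, and rejoining words that OCR left hyphenated across a line break. The sketch below is a rough illustration only; the function names are hypothetical, the `https://` scheme is assumed, and the hyphenation rule is deliberately simple (it only joins lowercase letters on either side of the break). It will also wrongly join genuine compounds such as "well-" / "known", so the text still needs proofreading by hand.

```python
import re
from urllib.parse import quote


def index_url(djvu_filename):
    """Build the Wikisource Index page URL for a DJVU file name."""
    # Wiki page titles use underscores in place of spaces.
    title = "Index:" + djvu_filename.replace(" ", "_")
    return "https://en.wikisource.org/wiki/" + quote(title, safe=":_.-()")


def join_hyphenated(text):
    """Rejoin words split across a line break, e.g. 'col-' + 'lege'."""
    # Only lowercase-to-lowercase breaks are joined, to leave things like
    # initials or headings alone; real compounds may still be merged.
    return re.sub(r"([a-z])-\n([a-z])", r"\1\2", text)


print(index_url("Oread_August_1881.djvu"))
print(join_hyphenated("the Great Books col-\nlege of Chicago"))
```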
ShimerSource
  1. (To be determined...)


This page is part of the Shimer College Wiki, an independent documentation project. Shimer College, the Great Books college of Chicago, is not responsible for its content.


