Specifications for Digitizing the Newspapers of Connecticut
Technical Specifications
These recommended technical specifications for digitizing newspapers are based on the National Digital Newspaper Program Technical Guidelines & Specifications.
- TIFF 6.0, uncompressed, one file per page
- 8-bit grayscale (if digitizing from microfilm or black-and-white originals)
- 24-bit color, if digitizing from color originals
- Avoid 1-bit bitonal, which can reduce the accuracy of OCR
- Maximum resolution possible, between 300 and 400 dpi (relative to the size of the original page and considering the size of the lettering)
- Crop to the page edge, not to the edge of the text, with up to 1/4 inch beyond the page
- Deskew, if skew is greater than 3 degrees
- If newspapers were microfilmed two pages per frame, split into two separate page files
- Optional: multi-page searchable PDF per issue
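To get a feel for what these specifications mean for storage, the arithmetic can be sketched in a few lines of Python. This is only a rough estimate under stated assumptions: the 16 x 22 inch page size is a hypothetical example, the function name `scan_size` is made up for illustration, and the figure covers raw image data only (TIFF headers and metadata add a little more).

```python
def scan_size(width_in, height_in, dpi, bits_per_pixel=8):
    """Estimate pixel dimensions and uncompressed image size for one page.

    width_in / height_in: page size in inches (hypothetical values below).
    bits_per_pixel: 8 for grayscale, 24 for color.
    """
    px_w = round(width_in * dpi)
    px_h = round(height_in * dpi)
    # Uncompressed data: every pixel stores bits_per_pixel bits.
    size_bytes = px_w * px_h * bits_per_pixel // 8
    return px_w, px_h, size_bytes


# Hypothetical 16 x 22 inch page scanned at 400 dpi.
for label, bits in (("8-bit grayscale", 8), ("24-bit color", 24)):
    w, h, size = scan_size(16, 22, 400, bits)
    print(f"{label}: {w} x {h} pixels, about {size / 1_000_000:.0f} MB")
```

Under these assumptions a color scan takes three times the space of a grayscale scan of the same page, which is worth factoring into storage planning before choosing a bit depth.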
File Naming Advice
- Make sure that your filenames are consistent and identifiable across your digitization project.
- Use a standard identifier, such as an OCLC number or LCCN, to identify the newspaper.
- Record the issue date in format YYYYMMDD
- Include the page sequence number as three digits: “001”, “002”, “003”, and so on.
- Example:
- Issue folder for the January 2, 1941, issue of the Thompsonville Press (OCLC number 27354139): o27354139_19410102
- Page 1 of the January 2, 1941, issue of the Thompsonville Press: o27354139_19410102_001.tif
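The naming pattern above can be generated and checked with a short script. The following is a minimal sketch, not a required tool: the function names are hypothetical, and the identifier is passed in exactly as it should appear in the filename (the "o" prefix on the OCLC number is taken from the example above).

```python
import re
from datetime import date, datetime


def issue_folder(identifier: str, issue_date: date) -> str:
    """Build the issue folder name: <identifier>_<YYYYMMDD>."""
    return f"{identifier}_{issue_date:%Y%m%d}"


def page_filename(identifier: str, issue_date: date, page: int) -> str:
    """Build a page filename with a three-digit page sequence number."""
    if not 1 <= page <= 999:
        raise ValueError("page sequence number must be between 1 and 999")
    return f"{issue_folder(identifier, issue_date)}_{page:03d}.tif"


# Assumed pattern: alphanumeric identifier, 8-digit date, 3-digit page, .tif
NAME_PATTERN = re.compile(r"^[A-Za-z0-9]+_(\d{8})_\d{3}\.tif$")


def is_valid_page_filename(name: str) -> bool:
    """Check that a filename follows the pattern and contains a real date."""
    match = NAME_PATTERN.match(name)
    if not match:
        return False
    try:
        datetime.strptime(match.group(1), "%Y%m%d")
    except ValueError:
        return False
    return True


issue = date(1941, 1, 2)
print(issue_folder("o27354139", issue))      # o27354139_19410102
print(page_filename("o27354139", issue, 1))  # o27354139_19410102_001.tif
```

Running a check like `is_valid_page_filename` over a delivered batch is one way to catch inconsistent names before the files are loaded into a repository.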
Optical Character Recognition (OCR)
Optical character recognition (OCR) is the process of converting a static image into searchable text. It is one of the main advantages of digitizing your newspapers, because it makes them more accessible to researchers. OCR has limitations, however, and a number of factors will influence its quality:
- Stains, tears, faded text, or bleedthrough on the original newspapers
- Damaged or deteriorating microfilm
- Lighting issues during microfilming
- Complex newspaper layouts
- Fonts of different styles and sizes
- Page curvature
- Technical specifications used for digitization
To ensure that you’re getting the best OCR results possible, make sure that you’re following the recommended technical specifications and digitizing from the best quality microfilm or originals that you have access to.