About this collection

The Santa Clara newspaper digitization pilot project

 

This collection is one of two sets produced for The Santa Clara newspaper digitization pilot project. Both sets were digitized by Backstage Library Works from our microfilm newspaper collection, and both cover issues from January 12, 1961, through December 14, 1961.

 

The two sets differ in significant ways. The Santa Clara – PDF collection employs the traditional process of digitizing the film, turning the images into PDF pages, collecting the pages together for each issue, performing Optical Character Recognition on the pages to extract the full text of the articles, and presenting the issues as browsable, searchable PDFs.

 

This collection, The Santa Clara – segmented, greatly extends the functionality of the collection by applying METS/ALTO metadata to the images. Briefly, METS defines the structure of each page, and ALTO adds the textual content, layout, styles, and spatial coordinates. This gives the presentation of the issues several important advantages:

- Searches will return results with the searched word or phrase highlighted on the page;

- Headlines are indexed, so researchers can scan the headlines of a page rather than scrolling through the PDF to discover its contents;

- Articles are ‘segmented’: clicking on an article will open the full article in a new window, even if it originally was continued on one or more subsequent pages, so researchers can read complete articles without searching through the issue for the article’s continuation.
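For readers curious how the spatial coordinates make search highlighting possible, here is a minimal sketch in Python. It is not the software used to present this collection. The sample XML is a small hand-written fragment, not data from these newspapers, and the function name find_word_boxes is hypothetical. It relies only on the fact that ALTO records each word as a String element with CONTENT, HPOS, VPOS, WIDTH, and HEIGHT attributes.

```python
import xml.etree.ElementTree as ET

# A tiny hand-written ALTO fragment for illustration only;
# it is not taken from the collection.
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout>
    <Page ID="P1" WIDTH="2000" HEIGHT="3000">
      <PrintSpace>
        <TextBlock ID="TB1">
          <TextLine>
            <String CONTENT="Broncos" HPOS="100" VPOS="200" WIDTH="180" HEIGHT="40"/>
            <SP/>
            <String CONTENT="win" HPOS="300" VPOS="200" WIDTH="80" HEIGHT="40"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def local_name(tag):
    """Drop the '{namespace}' prefix so the code works across ALTO versions."""
    return tag.rsplit("}", 1)[-1]


def find_word_boxes(alto_xml, word):
    """Return (x, y, width, height) for every word on the page matching `word`.

    Hypothetical helper: matching is case-insensitive and exact, which is
    simpler than what a real search index would do.
    """
    root = ET.fromstring(alto_xml)
    boxes = []
    for element in root.iter():
        if local_name(element.tag) != "String":
            continue
        if element.get("CONTENT", "").lower() == word.lower():
            boxes.append(
                tuple(
                    float(element.get(key, 0))
                    for key in ("HPOS", "VPOS", "WIDTH", "HEIGHT")
                )
            )
    return boxes


if __name__ == "__main__":
    # Each tuple is a rectangle a viewer could draw over the page image.
    print(find_word_boxes(SAMPLE_ALTO, "broncos"))
```

A viewer can draw each returned rectangle over the page image to highlight the matching word. A PDF with plain extracted text does not carry this positional information in the same structured form.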

 

Examples of the differences

 

Here is an item returned as a search result. On the left is the PDF Collection version; on the right is the Segmented Collection version. Notice that the segmented item shows the headlines for the page and highlights the search word on the page:

 

[Image: comparison of the PDF and segmented versions of a search result]

 

When a headline (or an article on the page) is clicked in the segmented version, the article becomes highlighted (on the left below), ready to open in its own window to be read, downloaded, or printed (on the right below).

[Image: example of article highlighting in the segmented version]

We invite you to compare the collections and let us know if you find value in the (more expensive to produce) article-segmented version. Details on the costs of the complete newspaper digitization project and information on how to support it are on our Giving to the Library page. 

 