Digitization Guidelines

From MPDLMediaWiki

Revision as of 09:06, 1 April 2008

* work in progress -- Andreas Gros 11:06, 1 April 2008 (CEST)

This page will soon provide a relevant subset of the DFG guidelines for digitization[1], together with a description of the workflow for digitizing documents and submitting them to the digital collections maintained by the MPDL.

Scanning

Resolution and image quality

For greyscale or color scans of printed material, a minimum resolution of 300 dpi is recommended. Documents containing handwriting, and maps with fine lines and small lettering, may require up to 400 dpi. Bitonal scans require 600 dpi.
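The resolution rules can be checked against a finished scan by converting the original's physical size into a minimum pixel count. The following is a minimal Python sketch, assuming the dimensions of the original are known in whole millimetres. The category keys and the helpers required_pixels and meets_target are hypothetical, and 400 dpi is used as a conservative target for manuscripts and maps, since the guideline only says these might need up to 400 dpi.

```python
# Target resolutions (dpi) taken from the guideline above.
# The category keys are hypothetical labels used only in this sketch.
TARGET_DPI = {
    "print": 300,              # greyscale or color prints: minimum 300 dpi
    "manuscript_or_map": 400,  # handwriting, fine lines, small lettering: up to 400 dpi
    "bitonal": 600,            # bitonal (b/w) scans: 600 dpi
}


def required_pixels(length_mm: int, dpi: int) -> int:
    """Minimum number of pixels for an edge of length_mm scanned at dpi.

    pixels = (mm / 25.4) * dpi, computed as an integer ceiling division
    (mm * dpi * 10 / 254) so that floating-point rounding cannot creep in.
    """
    return -(-(length_mm * dpi * 10) // 254)


def meets_target(scan_px: tuple[int, int], original_mm: tuple[int, int], category: str) -> bool:
    """True if both scan dimensions reach the target resolution for the category."""
    dpi = TARGET_DPI[category]
    return all(px >= required_pixels(mm, dpi) for px, mm in zip(scan_px, original_mm))


# Example: an A4 page (210 x 297 mm).
a4 = (210, 297)
print(required_pixels(210, 300), required_pixels(297, 300))  # 2481 3508
print(meets_target((2481, 3508), a4, "print"))    # True
print(meets_target((2481, 3508), a4, "bitonal"))  # False
```

Rounding up means a scan that is one pixel short of the nominal size is reported as not meeting the target; a workflow might choose to allow a small tolerance instead.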


Color depth

Bitonal (black-and-white) scans use a color depth of 1 bit per pixel, i.e. 2 levels. Greyscale images are digitized with 8 bits (1 byte) per pixel, i.e. 256 levels. Color images use 3 color channels (red, green, blue) with 1 byte per channel, i.e. 3 bytes (24 bits) per pixel, allowing 256 x 256 x 256 = approx. 16.7 million colors. A color depth of 24 bits is sufficient for color scans. Scanning at 48 bits (2 bytes per channel) is worthwhile only in the few cases where images need to be corrected or reworked after scanning.
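The color depth determines both the number of possible values per pixel (2 to the power of the bit depth) and the size of the uncompressed master file. The Python sketch below illustrates this arithmetic; the function names are hypothetical, file headers and metadata are ignored, and rows are padded to whole bytes as uncompressed TIFF does for 1-bit data.

```python
def levels(bits_per_pixel: int) -> int:
    """Number of distinct values a pixel can take at the given bit depth."""
    return 2 ** bits_per_pixel


def uncompressed_bytes(width_px: int, height_px: int, bits_per_pixel: int) -> int:
    """Size of the raw image data, ignoring file headers and metadata.

    Each row is padded to a whole number of bytes (relevant for 1-bit images).
    """
    bytes_per_row = -(-(width_px * bits_per_pixel) // 8)  # ceiling division
    return bytes_per_row * height_px


# An 8 x 10 inch original scanned at 300 dpi gives 2400 x 3000 pixels.
for name, bpp in [("bitonal", 1), ("greyscale", 8), ("color", 24), ("color, 48 bit", 48)]:
    print(name, levels(bpp), uncompressed_bytes(2400, 3000, bpp))
```

For that example page the raw data amounts to 900,000 bytes (bitonal), 7,200,000 bytes (greyscale), 21,600,000 bytes (24-bit color) and 43,200,000 bytes (48-bit color), which is one practical reason to reserve 48 bit for the cases mentioned above.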


File formats

Greyscale and color master images should be stored as uncompressed TIFF. Bitonal images may be stored as TIFF with CCITT Group 4 compression. JPEG2000 may become an alternative to uncompressed TIFF in the future, but at present there is not enough software support to make it a practical format for archiving master images.
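Whether a delivered master file follows this rule can be read from the Compression tag (tag 259) of the TIFF file: in the TIFF 6.0 specification, value 1 means no compression and value 4 means CCITT T.6 (Group 4). The Python sketch below uses only the standard library and inspects only the first image directory of a classic (non-BigTIFF) file; the function names tiff_compression and master_format_ok are hypothetical.

```python
import struct

COMPRESSION_TAG = 259        # TIFF tag "Compression"
COMPRESSION_NONE = 1         # uncompressed
COMPRESSION_CCITT_G4 = 4     # CCITT T.6, i.e. Group 4


def tiff_compression(data: bytes) -> int:
    """Return the Compression value of the first image directory (IFD)."""
    if data[:2] == b"II":
        bo = "<"             # little-endian ("Intel")
    elif data[:2] == b"MM":
        bo = ">"             # big-endian ("Motorola")
    else:
        raise ValueError("not a TIFF file: unknown byte-order mark")
    magic, ifd_offset = struct.unpack(bo + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF file")
    (count,) = struct.unpack(bo + "H", data[ifd_offset:ifd_offset + 2])
    for i in range(count):
        pos = ifd_offset + 2 + 12 * i          # each IFD entry is 12 bytes
        tag, typ = struct.unpack(bo + "HH", data[pos:pos + 4])
        if tag != COMPRESSION_TAG:
            continue
        if typ == 3:         # SHORT: value is left-justified in the 4-byte field
            (value,) = struct.unpack(bo + "H", data[pos + 8:pos + 10])
        elif typ == 4:       # LONG: tolerated, although the spec prescribes SHORT
            (value,) = struct.unpack(bo + "I", data[pos + 8:pos + 12])
        else:
            raise ValueError(f"unexpected field type {typ} for Compression")
        return value
    return COMPRESSION_NONE  # tag absent: the TIFF default is "no compression"


def master_format_ok(data: bytes, bitonal: bool) -> bool:
    """Uncompressed TIFF for greyscale/color; uncompressed or Group 4 for bitonal."""
    comp = tiff_compression(data)
    if bitonal:
        return comp in (COMPRESSION_NONE, COMPRESSION_CCITT_G4)
    return comp == COMPRESSION_NONE
```

In practice the function would be called on the bytes of a master file, for example open(path, "rb").read(). It does not validate the rest of the file, so it complements rather than replaces a full format check.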


Fulltext digitization

The machine-readable text must be provided either in ISO 8859-1 (Latin-1, an 8-bit extension of ASCII) or in Unicode, encoded as UTF-8 or as UTF-16 with a byte order mark (BOM)[2].
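A receiving workflow might check which of the accepted encodings a fulltext file uses. The Python sketch below is one possible approach, not a prescribed procedure; the function name decode_fulltext and the NUL-byte heuristic for detecting UTF-16 without a BOM are assumptions. Because Latin-1 assigns a character to every byte value, it is tried last and can only serve as a fallback.

```python
import codecs


def decode_fulltext(data: bytes) -> tuple[str, str]:
    """Return (encoding, text) for a fulltext file in an accepted encoding."""
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The "utf-16" codec uses the BOM to pick the byte order and strips it.
        return "utf-16", data.decode("utf-16")
    if b"\x00" in data:
        # Heuristic: NUL bytes without a BOM suggest UTF-16 lacking the required BOM.
        raise ValueError("UTF-16 text must start with a byte order mark")
    try:
        # "utf-8-sig" also accepts (and strips) an optional UTF-8 BOM.
        # Plain ASCII is valid UTF-8 and is reported as "utf-8".
        return "utf-8", data.decode("utf-8-sig")
    except UnicodeDecodeError:
        pass
    # Latin-1 maps every byte to a character, so this fallback never fails;
    # it cannot tell Latin-1 apart from other 8-bit encodings.
    return "latin-1", data.decode("latin-1")
```

For example, decode_fulltext("Straße".encode("latin-1")) returns ("latin-1", "Straße"), because the byte 0xDF followed by "e" is not a valid UTF-8 sequence.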

Fulltext can be created from the digital master files either by optical character recognition (OCR) or by manual transcription. Current OCR software, however, is suitable only for printed texts, and only back to texts set in modern roman type or Gothic (blackletter) type that were produced on mechanized printing presses (from approx. 1850 on).
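The rule of thumb above can be expressed as a small decision helper for planning a digitization project. This is a hypothetical Python sketch: the function name, the typeface labels and the conservative treatment of undated prints are assumptions, and the 1850 cutoff is approximate, so documents near the boundary are best judged by testing OCR on sample pages.

```python
from typing import Optional

# Typefaces the guideline names as suitable for OCR (hypothetical labels).
OCR_TYPEFACES = {"roman", "gothic"}


def fulltext_method(handwritten: bool, print_year: Optional[int], typeface: str,
                    cutoff_year: int = 1850) -> str:
    """Suggest "ocr" or "manual transcription" for a source document."""
    if handwritten:
        return "manual transcription"   # OCR is suitable for printed text only
    if print_year is None or print_year < cutoff_year:
        return "manual transcription"   # early or undated prints: treated conservatively
    if typeface.lower() not in OCR_TYPEFACES:
        return "manual transcription"
    return "ocr"
```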



Metadata requirements

* to be continued



Notes