Linguistic Literature Single Pages


= Introduction =

This work package (WP) consists of three subparts:


 * 1) Separating PDFs into single pages [[Image:Symbol_Remove.png | 20px]]
 * 2) Providing a possibility to view individual pages in a browser [[Image:Symbol_Remove.png | 20px]]
 * 3) Providing a possibility to access snippets of pages through rectangular coordinates [[Image:Symbol_Remove.png | 20px]]

= Usage Scenarios =

1. Separating PDFs into single pages
Multi-page PDFs as available in PubMan should be made available on a page-by-page basis, so that a particular page can be accessed through a direct link. For example, if a PDF is accessible through a URI /xxx/, then an individual page should be accessible through a URI like /xxx/page_yyy/. These links can then be inserted in other webpages that refer to information on a particular page of a scientific work. For example, http://wals.info/ links to sources. For some of these sources, links to Google Books have been added to show the actual page, cf. http://wals.info/languoid/lect/wals_code_abk. Such links should also be possible for the works in PubMan.
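
The URI pattern described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `page_uri` is hypothetical, and the `page_` segment simply mirrors the /xxx/page_yyy/ example rather than any scheme confirmed for PubMan.

```python
from urllib.parse import quote


def page_uri(document_uri: str, logical_page: str) -> str:
    """Build a per-page URI of the form <document_uri>/page_<label>/.

    The 'page_' segment follows the /xxx/page_yyy/ example in the text;
    the real scheme used by the system may differ (assumption).
    """
    # Make sure the document URI ends with exactly one slash.
    base = document_uri if document_uri.endswith("/") else document_uri + "/"
    # Percent-encode the label so that unusual page labels stay URL-safe.
    return f"{base}page_{quote(str(logical_page), safe='')}/"


print(page_uri("/xxx/", "42"))
print(page_uri("/xxx", "iv"))
```

Encoding the label keeps the link usable when a logical page label contains characters other than digits and letters.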

The pages should be accessible through their logical page numbers (i.e. the actual page numbers in the published version), not through the position of the pages in the multi-page PDF. In most PDFs processed in the Linguistic Literature project, these logical page numbers have already been added to the PDF. This was done with Adobe Acrobat 3D, and the results can be reused by every viewing environment. When the logical page numbers are not available for a publicly accessible document, the MPI EVA librarians will add them to the PDF manually.

When a page within a PDF is referenced via a URL, this URL shall include the logical page number. The logical page numbers do not always form a single sequence. For example, a document may start with 5 unnumbered pages, continue with pages I-V, and then continue with pages 3-66. This means that no information about the logical page numbers can be derived from the total number of pages. It also means that the unnumbered pages cannot be referenced.
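
The relation between logical page numbers and positions in the PDF can be illustrated with a small sketch. It assumes a hypothetical description of the numbering as a list of ranges `(count, style, start)`; how the labels are actually stored in the PDFs is not specified here. The example data reproduces the document layout mentioned above (5 unnumbered pages, pages I-V, pages 3-66).

```python
ROMAN = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"), (100, "C"),
         (90, "XC"), (50, "L"), (40, "XL"), (10, "X"), (9, "IX"),
         (5, "V"), (4, "IV"), (1, "I")]


def roman(n):
    """Convert a positive integer to an upper-case Roman numeral."""
    out = []
    for value, numeral in ROMAN:
        while n >= value:
            out.append(numeral)
            n -= value
    return "".join(out)


def build_labels(ranges):
    """Expand (count, style, start) ranges into one label per physical page.

    style is None (unnumbered), "roman" or "arabic" (hypothetical names).
    """
    labels = []
    for count, style, start in ranges:
        for offset in range(count):
            if style is None:
                labels.append(None)
            elif style == "roman":
                labels.append(roman(start + offset))
            else:
                labels.append(str(start + offset))
    return labels


def physical_index(labels, logical):
    """Return the 1-based physical page for a logical label."""
    matches = [i + 1 for i, label in enumerate(labels) if label == logical]
    if len(matches) != 1:
        raise KeyError(f"logical page {logical!r} is missing or ambiguous")
    return matches[0]


# 5 unnumbered pages, then I-V, then 3-66 (64 pages)
labels = build_labels([(5, None, 0), (5, "roman", 1), (64, "arabic", 3)])
print(len(labels), physical_index(labels, "I"), physical_index(labels, "3"))
```

Unnumbered pages have no label in this sketch, so they cannot be looked up, which matches the restriction stated above.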

1.1 Submission
Documents do not have to be split "manually", so no changes to the submission mask are needed; they will only be displayed page by page via a special viewing environment (e.g. Digilib). This viewing environment requires some technical adjustments (details have to be discussed in the dev team). As a result, all documents within PubMan can be displayed page by page.

2. Providing a possibility to view individual pages in a browser
Publicly accessible documents
It is essential that direct links to individual pages can be used to download individual pages. The collection contains many large PDFs, and it is impractical to download and view a whole document when only one page is needed. When viewing a whole document, the following functionality should be offered:
 * 1) Display of the URL of each individual page (must)
 * 2) Download possibility for the whole document (must) or for a selected page only (should)

Restricted accessible documents
These are documents that are only visible to members of the MPI EVA and to individual users.
 * 1) The user has the right to view the document
 * The view will work similarly to the one for publicly accessible documents.
 * 2) The user does not have the right to view the document
 * Based on general usage rights, it is allowed to show single pages from a restricted document, but not the whole book. (This does not mean that the usage rights for the pages differ from the usage rights for the whole book.)
 * This case is relevant when a restricted book is cited somewhere. It would then be useful to view at least the page the citation comes from. This means that restricted accessible documents will be displayed to the public in some predefined cases: the pages of the books that are referenced in WALS. These restricted accessible documents shall not be displayed as search results for public users, only when someone (like WALS) uses the direct URL. This single page should then also be downloadable as a single page.
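
The visibility rules above can be summarized in a minimal sketch. All names are hypothetical, and the set of cited pages stands in for the "predefined cases" (pages referenced in WALS); how such a set would be maintained is not defined here.

```python
def can_view_whole_document(is_public, user_has_right):
    """A whole document is visible if it is public or the user has the right."""
    return is_public or user_has_right


def can_view_single_page(is_public, user_has_right,
                         document_id, logical_page, cited_pages):
    """A single page is visible if the whole document is visible,
    or if the page belongs to the predefined set of cited pages."""
    if can_view_whole_document(is_public, user_has_right):
        return True
    return (document_id, logical_page) in cited_pages


# Hypothetical example data: one page of a restricted book is cited.
cited = {("doc-1", "42")}
print(can_view_single_page(False, False, "doc-1", "42", cited))
print(can_view_single_page(False, False, "doc-1", "43", cited))
```

In this sketch a public user reaches a cited page of a restricted document only through its direct URL; the whole-document check stays negative, so the page is shown without access to the rest of the book.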

Search results are part of the Linguistic Literature Fulltext Search Specification.

3. Providing a possibility to access snippets
NOTE: Not part of the first milestone!

Often only small parts of a page are needed rather than complete pages. In the context of the Linguistic Literature project it is, for example, of interest to be able to access individual examples on a page of a book. A system is needed to access specific rectangles on a page through coordinates. As above, these snippets should be available both as images to be downloaded and as snippets in a special browser preview.
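
Access through rectangular coordinates could look roughly like the following sketch. It assumes a hypothetical coordinate format "x,y,width,height" in pixels with the origin at the top-left corner, and it models a rendered page as a plain list of pixel rows; a real implementation would operate on an image produced by the viewing environment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    x: int
    y: int
    width: int
    height: int


def parse_rect(text):
    """Parse 'x,y,width,height' (hypothetical format) into a Rect."""
    parts = [int(p) for p in text.split(",")]
    if len(parts) != 4:
        raise ValueError("expected four comma-separated integers")
    x, y, width, height = parts
    if x < 0 or y < 0 or width <= 0 or height <= 0:
        raise ValueError("rectangle must lie on the page and be non-empty")
    return Rect(x, y, width, height)


def crop(pixels, rect):
    """Return the rows and columns of the page covered by rect."""
    page_height = len(pixels)
    page_width = len(pixels[0]) if pixels else 0
    if rect.x + rect.width > page_width or rect.y + rect.height > page_height:
        raise ValueError("rectangle exceeds page bounds")
    return [row[rect.x:rect.x + rect.width]
            for row in pixels[rect.y:rect.y + rect.height]]


# A tiny 4x3 "page" whose pixel values encode row and column.
page = [[r * 10 + c for c in range(4)] for r in range(3)]
print(crop(page, parse_rect("1,1,2,2")))
```

Rejecting rectangles that exceed the page bounds avoids silently returning a smaller snippet than the one requested.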

= Use Cases =

UC_LL_SP_01 Browse file
Status/Schedule
 * Status: This use case is realized through the functionality of the PDF Viewer to browse within one document.
 * Schedule: PubMan 6.2

Motivation
 * The user wants to view one whole file of a selected item.

Pre-Condition
 * one item is selected

Triggers
 * This use case can be included by the use cases
 * UC_PM_BD_04 view item version

Steps
 * 1) The user chooses to browse within the selected file.
 * 2) The system displays the first page of the selected file including the logical page numbering.
 * 3) (Optionally) The user chooses to go to a specific logical page within the file.
 * 3.1 The system displays the selected page.
 * 4) (Optionally) The user chooses to go to the previous, the next, the first or the last page.
 * 4.1 The system displays the selected page.
 * 5) Extension point: export single page
 * 5.1 If the user wants to export a single page of the whole file, include UC_LL_SP_04 Export single page.
 * 6) Extension point: view URL of a single page
 * 6.1 If the user wants to view the URL of a single page of the whole file, include UC_LL_SP_02 View URL of a single page.
 * 7) Extension point: export whole document
 * 7.1 If the user wants to export the whole file, include UC_LL_SP_03 Export file.
 * 8) The use case ends successfully.

Actors Involved
 * every user (for publicly accessible files)
 * user group with rights to access restricted items of the LDH collection (for restricted accessible files)

Remarks
 * One example from Google Docs: http://docs.google.com/viewer?url=http%3A%2F%2Frelativity.livingreviews.org%2FArticles%2Flrr-2007-5%2Fdownload%2Flrr-2007-5Color.pdf
 * Further information about using PDF: http://www.adobe.com/devnet/acrobat/javascript.html

UC_LL_SP_02 View URL of a single page
Status/Schedule
 * Status: cannot be implemented
 * Schedule: PubMan 6.2

Motivation
 * The user wants to view the URL of one individual page which is part of a whole file, so that they can save it and use it as a reference.

Expected Outcome
 * The URL of the selected individual page is displayed

Pre-Condition
 * The user has the right to view the whole file.
 * One single page within the file is selected.

Triggers
 * This use case can be included by the use cases
 * UC_LL_SP_01 Browse file

Steps
 * 1) The user wants to view the URL of one individual page within a file.
 * 2) The system displays the URL of the selected page. This URL shall include the logical page number of the selected page.
 * 3) The use case ends successfully.

Actors Involved
 * user (for publicly accessible files)
 * user group with rights to access restricted items of the LDH collection (for restricted accessible files)

Comments
 * This use case is not possible because the document will be displayed in an external viewer (e.g. PDF Viewer), and eSciDoc has no influence on the functionality of this viewer.

UC_LL_SP_03 Export file
Status/Schedule
 * Status: This use case is already realized via the "save" functionality of a PDF viewer.
 * Schedule: PubMan 6.2

Motivation
 * The user wants to export the whole file.

Pre-Condition
 * One file is selected.
 * The user has the right to view the file.

Triggers
 * This use case can be included by the use cases
 * UC_LL_SP_01 Browse file

Steps
 * 1) The user chooses to export the selected file.
 * 2) The system downloads the file and makes it available for saving.
 * 3) The use case ends successfully.

Actors Involved
 * user (for publicly accessible files)
 * user group with rights to access restricted items of the LDH collection (for restricted accessible files)

UC_LL_SP_04 Export single page
Status/Schedule
 * Status: cannot be implemented
 * Schedule: PubMan 6.2

Motivation
 * The user wants to download a single page of a file, but not the whole file itself.

Pre-Condition
 * One single page within a file is selected.

Triggers
 * This use case can be included by the use cases
 * UC_LL_SP_01 Browse file

Steps
 * 1) The user chooses to download the selected page of a file.
 * 2) The system downloads the selected page and makes it available for saving.
 * 3) The use case ends successfully.

Actors Involved
 * every user (for publicly accessible files)
 * user group with rights to access restricted items of the LDH collection (for restricted accessible files)

Comments
 * This use case is not possible because the document will be displayed in an external viewer (e.g. PDF Viewer), and eSciDoc has no influence on the functionality of this viewer.

UC_LL_SP_05 Resolve single page by URL
Status/Schedule
 * Status: This use case is realized through the PDF Viewer's support for URL parameters. When the user appends "#page=10" to the PDF link, the viewer automatically displays page 10 (and not, as usual, the first page).
 * Schedule: PubMan 6.2
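
The "#page=" parameter mentioned in the status can be appended programmatically, as in the following sketch. The function name is hypothetical. Note that this viewer parameter presumably counts pages by their position in the PDF, so a logical page number would first have to be translated into a position.

```python
from urllib.parse import urldefrag


def with_page_fragment(pdf_url, physical_page):
    """Return pdf_url with a '#page=N' fragment for the PDF viewer.

    physical_page is the 1-based position of the page in the PDF
    (assumption: the viewer does not interpret logical page labels).
    """
    if physical_page < 1:
        raise ValueError("page numbers start at 1")
    # Drop any existing fragment so that the result has exactly one.
    base, _ = urldefrag(pdf_url)
    return f"{base}#page={physical_page}"


print(with_page_fragment("http://example.org/doc.pdf", 10))
```

The URL in the example is a placeholder, not an actual PubMan address.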

Motivation
 * The user knows the exact URL of a page and follows that URL (e.g. the page is cited somewhere via the URL)

Steps
 * 1) The user requests a specific page by providing the URL assigned by the system for persistently citing the page.
 * 2) The system displays the requested page including the logical page number and checks whether the user has the right to view the whole file.
 * 3) The user does not have the right to view the whole file. The use case ends successfully.

Alternatives
 * 3a. The user has the right to view the whole file. Continue with UC_LL_SP_01 Browse file Step 3.

Actors Involved
 * every user (as it is allowed to show single pages of a not public file to everyone)

Comments
 * This is only relevant when the user knows the exact URL of the page; otherwise the user will always have to view the whole file (see UC_LL_SP_01 Browse file). This use case means that the page itself will be displayed, but without the connection to the other pages of the whole file. That will only be the case for a predefined set of pages (the ones that are cited in WALS).