Talk:ESciDoc Application Profiles

This page lists issues from our discussion while defining the eSciDoc application profile.

Open

PURLs / Namespaces

Natasa reserved the domain http://purl.org/escidoc.

Substructure

To be able to move forward, we have to define a substructure for the domain. We agreed on the following conventions for the path: all lower case, without delimiters.

Suggested structure:

Andi, I thought that "degree" would be an eSciDoc-specific term, so I would rather use http://purl.org/escidoc/metadata/terms/degree than http://purl.org/escidoc/metadata/degree... but again, it looks like we are facing terminology problems --Inga 12:48, 10 March 2008 (CET)
Inga, you're right -- Andreas Gros 08:46, 13 March 2008 (CET)
Am I right that http://purl.org/escidoc/metadata/terms/creatorrole/ would only include encoding scheme entries which are used by eSciDoc in addition to those selected from MARC relators? --Inga 16:44, 13 March 2008 (CET)
Yes, that's right Andreas Gros 09:45, 14 March 2008 (CET)

The standard procedure is to indicate that something is an encoding scheme with a forward slash at the end:

and refer to the property itself, e.g. degree, without the slash:

Andi, I probably did not quite understand the last paragraph. I think the difference between a path with "/" and one without "/" is very small. It means that we are tying the possible values (encoding schemes) very closely to the metadata elements themselves. Why do we put the metadata elements (only without "/") under "terms"? Are all of our metadata (those that are eSciDoc-specific) under the "terms" namespace? Or, to be more precise: is there a strong reason not to have
"http://purl.org/escidoc/metadata/elements/degree" and
"http://purl.org/escidoc/metadata/terms/degree" -> which points to the allowed values?
Maybe it is only my confusion, but I just want to cross-check again. --Natasa 13:26, 13 March 2008 (CET)
This URL (http://purl.org/escidoc/metadata/terms/degree/) is only an identifier that is listed in an application profile together with a descriptive text (and links to the list of terms that appear on the page http://escidoc.org/metadata/terms/degree/index.xxx (for example)). Therefore, I think that it is no problem that the terms belonging to the encoding scheme degree appear under http://purl.org/escidoc/metadata/terms/degree/phd, http://purl.org/escidoc/metadata/terms/degree/staatsexamen, etc. that link to entries on the page mentioned above (http://escidoc.org/metadata/terms/degree/index.xxx). -- Andreas Gros 09:52, 14 March 2008 (CET)
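
The two conventions discussed in this section (lower-case path without delimiters; trailing slash for an encoding scheme, no slash for the property or a scheme entry) can be sketched as a small check. This is only an illustration: the function names are hypothetical, and reading "without delimiters" as "letters and digits only" is an assumption, not something agreed on this page.

```python
from typing import List

# PURL domain reserved for eSciDoc (from this page).
BASE = "http://purl.org/escidoc/"


def path_segments(purl: str) -> List[str]:
    """Return the non-empty path segments below the eSciDoc PURL domain."""
    if not purl.startswith(BASE):
        raise ValueError(f"not an eSciDoc PURL: {purl}")
    return [s for s in purl[len(BASE):].split("/") if s]


def follows_path_convention(purl: str) -> bool:
    """True if every segment is lower case and has no delimiters.

    Assumption: "no delimiters" means letters and digits only,
    so "academic_degree" or "creator-role" would be rejected.
    """
    return all(s.isalnum() and s == s.lower() for s in path_segments(purl))


def is_encoding_scheme(purl: str) -> bool:
    """A trailing slash marks an encoding scheme; no slash marks
    the property itself or a single entry of a scheme."""
    return purl.endswith("/")
```

For example, `is_encoding_scheme("http://purl.org/escidoc/metadata/terms/degree/")` is true, while the same URL without the final slash (the property) and `.../degree/phd` (an entry of the scheme) are not.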

Location

Furthermore, the PURLs have to link/resolve to somewhere (see discussion on https://zim01.gwdg.de/trac/wiki/MDSSpec/Namespaces). Following this discussion, Andreas Gros suggests using:

During the FIZ-SUB-MPDL-VC-Meeting we agreed on switching from escidoc.de to escidoc.org because of the increasingly international context eSciDoc is used in and linked to.

Would it be http://purl.org/escidoc/metadata/terms/2008/03/xx/reviewmethod/ or http://purl.org/escidoc/metadata/terms/reviewmethod/2008/03/xx/? --Inga 16:53, 13 March 2008 (CET)

Elements

dc:creator versus dc:contributor

What is our understanding of the property "creator"? Do we actually mean "contributor" with different roles? Do we use creator for "the main or leading author"?

Recommendation from meeting with Natasa, Ulla, Malte, Andi, Inga: Replace dc:creator by dc:contributor.

Just to remark: dc:contributor - An entity responsible for making contributions to the resource. dc:creator - An entity primarily responsible for making the resource. Frank 08:51, 11 March 2008 (CET)
Currently, the PubMan metadata schema does not comply with the distinction between "primarily responsible" and "contributed" made by DC. We only have entities (persons/organizations) that participated in content generation and their specific role (e.g. author, editor, etc.). It was argued that all PubMan creators of type "author" are dc:creators and PubMan creators of any other type are dc:contributors. Two counterarguments:
  • for proceedings and edited books, a creator of type "editor" is probably the entity primarily responsible for making the resource
  • for journal articles with more than a handful of authors, it is quite plausible that not all of them were primarily responsible for the publication (see edoc example). But the person entering the record in PubMan does not necessarily know which creators are dc:creators in the core sense (as we understand it now) and which are not. In addition, we have no use case that requires the distinction introduced by Dublin Core.
    BTW: As you can see in the edoc example provided above, the authors are not listed alphabetically; instead, the order encodes their importance for this paper, see http://www.sciencedaily.com/releases/2007/11/071105103938.htm. Please note that we will consider mapping the first creator to dc:creator for transformations to dc simple. --Inga 20:19, 11 March 2008 (CET)
Traugott made the point that we must not over-interpret the DC comment on "primarily responsible". The standard use case is that each author of a paper is a dc:creator, no matter how important they were in creating the article. Each creator can be given a corresponding creatorrole. The cases in which contributors are used should follow best practices, like:
  • http://dublincore.org/documents/usageguide/elements.shtml
  • http://www.lib.ncsu.edu/cataloging/metadata/NCSUcore1.html
Most importantly, using dc:contributor as a replacement for dc:creator would decouple eSciDoc from most external search-engines because the common property to be looked up when searching for an author is dc:creator and not dc:contributor.
It is fine for me to use dc:creator with a creator role (this is what we are currently practicing ;). Does anybody expect us to provide a more detailed mapping (e.g. an illustrator of an article is dc:contributor but not dc:creator)? --Inga 10:28, 13 March 2008 (CET)
I looked up the example of MARC-Relators and found out that most of these relators refine dc:contributor and not dc:creator. In fact, MARCREL:CRE (creator) is the only refinement of dc:creator. -- Andreas Gros 17:46, 25 March 2008 (CET)

dc:subject

Vocabulary:

Discussion:

Traugott strongly suggests using DDC (Dewey Decimal Classification), because it is the only system that is adapted to online usage and is maintained and enhanced continuously. It is also fine-grained enough to allow the classification of a large number of documents, so that not too many documents end up in the same class.

missing & more

Are there predefined properties for:

  • Person->Pseudonym,
  • Person->Alternative_Name
  • Organization->Address,
    eduPerson defines a property called postalAddress. Perhaps that's interesting for you? --Kristina 09:05, 13 February 2008 (CET)
  • Event->Place,
  • Event->Invitation_Status ?

Further questions, see element list below

Encoding Schemes and Data Types

However, "degree" is a rather ambiguous term; wouldn't it be better to use academic_degree instead?
Andreas Gros sent an email to the BMBF to ask for a list of standardized terms for academic degrees used internationally and within Germany.
  • Publication identifier: which data type to use?
    • either datatype "string" with a corresponding encoding scheme for the plain-text entries. Taking this option would mean that we can store just the ISBN/ISSN/DOI/...-number in this property, but we would need an export mechanism for the external world to access this information.
    • or datatype "URI" where the identifier is encoded into a URI, e.g.: http://...&ISSN=... Using this option would mean that we are more interoperable with the outside world as others would access such URIs directly, but for internal usage we would have to decode the information again.
Konstantin (GBV) recommended not storing the HTTP resolver part with PIDs, because the PID is then bound to a specific resolver. Instead, some kind of handler should be established that can transform a PID into the appropriate URL. The HTTP resolver part is thus not part of the PID itself but part of the resolving system that comes with the PID. Frank 09:31, 11 March 2008 (CET)
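
The handler idea from Konstantin's remark could look roughly like the sketch below: the record stores only the plain identifier and its scheme, and a separate table turns it into a URL on demand. The table, the function name and the ISSN entry (an example.org placeholder) are hypothetical; only https://doi.org/ is an existing public resolver.

```python
from urllib.parse import quote

# Hypothetical resolver table: scheme name -> URL template.
# Only the DOI entry points at a real public resolver; the ISSN entry
# is a placeholder and would have to be replaced by a real service.
RESOLVERS = {
    "doi": "https://doi.org/{pid}",
    "issn": "https://resolver.example.org/issn/{pid}",
}


def resolve(scheme: str, pid: str, resolvers: dict = RESOLVERS) -> str:
    """Build an access URL from a plain PID and its scheme.

    The stored PID never contains the resolver part, so switching
    to another resolver only means editing the table.
    """
    try:
        template = resolvers[scheme.lower()]
    except KeyError:
        raise ValueError(f"no resolver configured for scheme {scheme!r}") from None
    # Keep "/" unescaped because DOIs contain a slash.
    return template.format(pid=quote(pid, safe="/"))
```

With this table, `resolve("doi", "10.1234/example")` yields `https://doi.org/10.1234/example` (the DOI here is made up for illustration). This corresponds to the "string plus encoding scheme" option above, with the export mechanism reduced to a lookup.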

Further Tasks

  • Answer open questions
  • Polishing of the AP, e.g. adding best practice examples and defining the structure and headings of the AP?
  • What consequences does the AP have for the existing eSciDoc publication XML schemas? (e.g. usage of dc:creator, dc:source and dc:type; re-use foaf elements; integrate links to terms?)
  • Functional changes under discussion for r3, see https://zim01.gwdg.de/trac/wiki/MDSSpec/Revision
  • Requirements regarding copyrights, see http://colab.mpdl.mpg.de/mediawiki/EDoc_to_PubMan_migration#Copyright_Information
  • Do we need an application model as a basis for this application profile?
  • Recommendation for identifier usage required
  • Check person-pseudonym: Inga
  • AP for file required: Andi
  • Decision on subject vocabulary (DDC) - check this with Traugott: Andi - DONE, see above
  • Input for contributor roles used on edoc: Ulla & Vlad
  • Compare with relators provided by LoC, see http://www.loc.gov/loc.terms/relators/

Process

  1. finish AP
  2. register PURLs
  3. revise pubman xsd to reference definitions (ours and externals)


Namespaces

Currently, URLs for eSciDoc namespaces do not resolve to anything useful:

This should be changed, and they should be made persistent (possibly using PURL?).

Responsible: Natasa & Lars

Usage of dc:source

dc:source is used against DC semantics and should be changed to "isPartOf". In addition, the element bibliographicCitation has been created to improve interoperability (it should be available twice: machine-readable as well as human-readable). See also: Guidelines for Encoding Bibliographic Citation Information in Dublin Core Metadata

Usage of dc:creator

dc:creator is used against DC semantics. Two options:

* introduce the understanding of "primary creator" to pubman (-> could be stored in dc:creator)
* use the dc:contributor element + role (re-use from loc)

Closed

Components

From the discussion on 30 January: introduce a distinction between metadata (e.g. format, description, content category) and properties (e.g. date-created, etc.?)

Further process of splitting the component information agreed with FIZ, see mails between Frank and Natasa --Inga 18:02, 1 February 2008 (CET)

Repeating refinements defined in existing AP?

Question: Do we need to document refinements which are inherited from external sources? Example: For "alternative" we currently repeat the information ("Refines http://purl.org/dc/elements/1.1/title") from dcterms; the same is not done for "tableofcontents" (refines http://purl.org/dc/elements/1.1/description).

Result from the meeting with Traugott, Andreas, Kristina and Inga on 31 January: The application profile should be self-contained, thus all refinements used by the eSciDoc AP should be explicitly specified, even if the information is just replicated from the external AP. --Inga 22:41, 31 January 2008 (CET)

Best Practices

... are used to provide further remarks and cataloging recommendations to users of the application profile. Inga is responsible for delivering this information.

Complex Elements: Creator and Organization

The proposed data model for a complex type like "Creator" in eSciDoc looks like this:

  • Creator, with properties
    • CreatorType, which can either be:
      • Person, which has the following properties:
        • Complete Name
        • Family Name
        • Given Name
        • Alternative Name
        • Person Title
        • Pseudonym
        • Organisation
        • Identifier
      • Organisation, with the following properties:
        • Organisation Name
        • Address
        • Identifier
    • Creator Type has a further property:
      • Creator Role, which can be:
        • Author
        • Artist
        • Editor, ....

Has been solved by introducing separate application profiles for persons and organizations --Inga 12:24, 29 February 2008 (CET)
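
For reference, the proposed Creator model listed above (before it was split into separate application profiles) can be written down as a small set of types. This is a sketch only: the class and field names are transliterations of the list entries, while optionality, single-valued fields and the nesting of Organisation inside Person are assumptions that the list does not specify.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Union


class CreatorRole(Enum):
    # The original list is open-ended ("Editor, ...."); only the
    # three roles named there are included.
    AUTHOR = "author"
    ARTIST = "artist"
    EDITOR = "editor"


@dataclass
class Organisation:
    name: str
    address: Optional[str] = None
    identifier: Optional[str] = None


@dataclass
class Person:
    complete_name: str
    family_name: Optional[str] = None
    given_name: Optional[str] = None
    alternative_name: Optional[str] = None
    person_title: Optional[str] = None
    pseudonym: Optional[str] = None
    # Assumption: the person's organisation is a nested Organisation.
    organisation: Optional[Organisation] = None
    identifier: Optional[str] = None


@dataclass
class Creator:
    # "CreatorType" in the list: either a Person or an Organisation.
    entity: Union[Person, Organisation]
    role: CreatorRole


# Usage: a person acting as author.
creator = Creator(
    entity=Person(complete_name="Jane Doe", family_name="Doe", given_name="Jane"),
    role=CreatorRole.AUTHOR,
)
```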

Complex Element: Date

Date: We do have more date types, e.g. publication, publication-online

Has been flattened to explicit elements, e.g. dcterms:created --Inga 12:22, 29 February 2008 (CET)