PubMan Func Spec Easy Submission
Revision as of 11:39, 1 September 2009
Functional Specification
UC_PM_EASM_01 upload file in structured format
Data involved
BibTeX file (structured format). See the example file provided by the AEI.
Constraints
- BibTeX files are structured idiosyncratically; BibTool may help with preprocessing and normalization:
- e.g. upper/lower-case corrections, resolving macros, Unicode vs. (La)TeX encodings.
- Basic TeX parsing is needed to interpret non-ASCII characters and similar markup (see for example https://dev.livingreviews.org/projects/epubtk/browser/trunk/ePubTk/lib/bibtexlib.py).
- BibTeX fields are not repeatable, so all authors share a single author field (separated by "and") and must be parsed out of it.
- BibTeX accepts several formats for an author's name ("First von Last", "von Last, First", "von Last, Jr, First"); the parser must recognize all of them. See for example http://search.cpan.org/~gward/Text-BibTeX-0.34/BibTeX/Name.pm
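Splitting the author field and recognizing the three name forms can be sketched in Python as follows. This is a minimal sketch under simplifying assumptions, and all function names are hypothetical: "and" is matched case-insensitively only outside braces, and a word counts as part of the "von" particle when its first character is lowercase (BibTeX's full rule also looks inside braced special characters). Braced groups such as `{Max Planck Society}` stay single tokens, so a corporate author ends up entirely in the last name.

```python
from typing import List, NamedTuple


class Name(NamedTuple):
    """One author name split into BibTeX's four parts."""
    first: str
    von: str
    last: str
    jr: str


def tokenize(field: str) -> List[str]:
    """Split on whitespace and commas outside braces; each comma becomes its own token."""
    tokens, buf, depth = [], [], 0
    for ch in field:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        if depth == 0 and (ch.isspace() or ch == ","):
            if buf:
                tokens.append("".join(buf))
                buf = []
            if ch == ",":
                tokens.append(",")
        else:
            buf.append(ch)
    if buf:
        tokens.append("".join(buf))
    return tokens


def split_authors(field: str) -> List[List[str]]:
    """Split the single author field at every top-level word 'and' (any case)."""
    names, current = [], []
    for tok in tokenize(field):
        if tok.lower() == "and":
            names.append(current)
            current = []
        else:
            current.append(tok)
    names.append(current)
    return [n for n in names if n]


def is_von(tok: str) -> bool:
    # Simplification: only the first raw character decides.
    return tok[:1].islower()


def split_von_last(words: List[str]) -> tuple:
    """Split 'von Last' words; the final word always belongs to Last."""
    von_end = 0
    for i, w in enumerate(words[:-1]):
        if is_von(w):
            von_end = i + 1
    return " ".join(words[:von_end]), " ".join(words[von_end:])


def parse_name(tokens: List[str]) -> Name:
    parts, buf = [], []
    for tok in tokens:
        if tok == ",":
            parts.append(buf)
            buf = []
        else:
            buf.append(tok)
    parts.append(buf)

    if len(parts) == 1:  # "First von Last"
        words = parts[0]
        von_idx = [i for i, w in enumerate(words[:-1]) if is_von(w)]
        if von_idx:
            start, end = von_idx[0], von_idx[-1] + 1
            return Name(" ".join(words[:start]), " ".join(words[start:end]),
                        " ".join(words[end:]), "")
        return Name(" ".join(words[:-1]), "", words[-1], "")
    von, last = split_von_last(parts[0])
    if len(parts) == 2:  # "von Last, First"
        return Name(" ".join(parts[1]), von, last, "")
    # "von Last, Jr, First" (further commas are ignored in this sketch)
    return Name(" ".join(parts[2]), von, last, " ".join(parts[1]))
```

Calling `parse_name` on each element of `split_authors(field)` yields one `Name` per author, which can then be mapped to the creator structure of the target item.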
Suggested steps to prepare BibTeX files for import
- Normalize the BibTeX with BibTool (it resolves macros, can map field names, and unifies the syntax).
- Parse the now-normalized records.
- Allow for/provide a mapping for non-standard fields (and possibly genres).
- Handle the substructure of fields:
- Multiple entries in author and keyword fields (see also http://nwalsh.com/tex/texhelp/bibtx-23.html).
- (La)TeX encoding of special characters and formulae (see for example https://dev.livingreviews.org/projects/epubtk/browser/trunk/ePubTk/lib/charmaps/tex2unicode.py).
- Map BibTeX fields and genres (including non-standard ones) to the eSciDoc PubItem application profile. The mapping can be found here.
- Java tools to check
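The (La)TeX-to-Unicode step can be approximated by mapping accent commands to Unicode combining characters and letting NFC normalization compose them. The sketch below is an illustration under assumptions, not the tex2unicode.py charmap linked in the list: its hand-written tables cover only common text accents and a few letter commands, math and unknown commands are left untouched, and the final pass strips command-free brace groups, so any case protection those braces carried must be evaluated before this step. Function and table names are hypothetical.

```python
import re
import unicodedata

# Partial tables (assumption: common text accents only, not a complete charmap).
SYMBOL_ACCENTS = {'"': "\u0308", "'": "\u0301", "`": "\u0300", "^": "\u0302",
                  "~": "\u0303", "=": "\u0304", ".": "\u0307"}
LETTER_ACCENTS = {"u": "\u0306", "v": "\u030c", "H": "\u030b", "c": "\u0327",
                  "k": "\u0328", "r": "\u030a", "d": "\u0323", "b": "\u0331"}
COMMANDS = {"ss": "\u00df", "o": "\u00f8", "O": "\u00d8", "aa": "\u00e5",
            "AA": "\u00c5", "ae": "\u00e6", "AE": "\u00c6", "oe": "\u0153",
            "OE": "\u0152", "l": "\u0142", "L": "\u0141"}

BASE = r"(\\[ij]|[A-Za-z])"  # the accented letter; \i and \j are dotless i and j
SYMBOL_RE = re.compile(r"\\([\"'`^~=.])\s*(?:\{" + BASE + r"\}|" + BASE + r")")
LETTER_RE = re.compile(r"\\([uvHckrdb])(?:\s*\{" + BASE + r"\}|\s+" + BASE + r")")
# (?![A-Za-z]) makes the whole control word match, so \o is not found in \ldots.
COMMAND_RE = re.compile(r"\\(ss|aa|AA|ae|AE|oe|OE|o|O|l|L)(?![A-Za-z])\s*(?:\{\})?")
PLAIN_GROUP_RE = re.compile(r"\{([^{}\\]*)\}")


def _accent(table):
    def repl(m):
        base = (m.group(2) or m.group(3)).lstrip("\\")  # \i -> i, \j -> j
        return unicodedata.normalize("NFC", base + table[m.group(1)])
    return repl


def tex_to_unicode(value: str) -> str:
    value = SYMBOL_RE.sub(_accent(SYMBOL_ACCENTS), value)
    value = LETTER_RE.sub(_accent(LETTER_ACCENTS), value)
    value = COMMAND_RE.sub(lambda m: COMMANDS[m.group(1)], value)
    # Remove brace groups without commands, innermost first, until nothing changes.
    while True:
        stripped = PLAIN_GROUP_RE.sub(r"\1", value)
        if stripped == value:
            return value
        value = stripped
```

For example, `tex_to_unicode(r"Erd\H{o}s")` gives "Erdős", while a formula such as `$\alpha$` passes through unchanged and still needs separate handling.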
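Because the actual field and genre mapping lives on a separate page, the sketch below uses hypothetical target names purely to show the shape of the mapping step: a default table for standard BibTeX types and fields, an optional per-import table for non-standard fields, and a bucket for unmapped fields so they can be reviewed rather than silently dropped. Type and field names are lowercased before lookup because BibTeX treats them case-insensitively.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# HYPOTHETICAL target names for illustration only; the real values come from the
# eSciDoc PubItem mapping, which is not reproduced here.
DEFAULT_GENRES = {"article": "article", "book": "book",
                  "inproceedings": "conference-paper", "phdthesis": "thesis",
                  "techreport": "report"}
DEFAULT_FIELDS = {"title": "title", "author": "creators", "year": "date-issued",
                  "journal": "source-title", "volume": "source-volume",
                  "pages": "source-pages", "doi": "identifier-doi"}


@dataclass
class MappedItem:
    genre: str
    fields: Dict[str, str] = field(default_factory=dict)
    unmapped: Dict[str, str] = field(default_factory=dict)  # kept for manual review


def map_record(entry_type: str, bib_fields: Dict[str, str],
               extra_fields: Optional[Dict[str, str]] = None,
               fallback_genre: str = "other") -> MappedItem:
    """Map one parsed record; extra_fields (lowercase keys) covers non-standard fields."""
    field_map = {**DEFAULT_FIELDS, **(extra_fields or {})}
    item = MappedItem(DEFAULT_GENRES.get(entry_type.lower(), fallback_genre))
    for name, value in bib_fields.items():
        target = field_map.get(name.lower())  # BibTeX names are case-insensitive
        if target is None:
            item.unmapped[name] = value
        else:
            item.fields[target] = value
    return item
```

Keeping the non-standard mapping as a per-import argument lets each submitting institute supply its own field conventions without changing the defaults.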
Future development
- Upload structured-format files containing more than one reference (see Ingestion).
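Supporting files with more than one reference mainly requires splitting the input into individual records before the per-record steps run. A minimal sketch follows, assuming the file has already been normalized with BibTool so that string macros and `#` concatenations are resolved; `@comment`, `@string` and `@preamble` blocks are skipped, and all names are hypothetical. Entry boundaries are found by counting braces, which works because BibTeX requires braces to be balanced inside an entry, including inside quoted values.

```python
import re
from typing import Dict, Iterator, NamedTuple, Tuple

ENTRY_START = re.compile(r"@\s*(\w+)\s*\{")
FIELD_NAME = re.compile(r"\s*([\w:-]+)\s*=\s*")
BARE_VALUE = re.compile(r"[^,\s}]+")
SEPARATOR = re.compile(r"\s*,?")
SKIPPED_TYPES = {"comment", "string", "preamble"}


class Entry(NamedTuple):
    entry_type: str
    key: str
    fields: Dict[str, str]


def read_value(s: str, i: int) -> Tuple[str, int]:
    """Read a {braced}, "quoted" or bare value at s[i]; return (value, index after it)."""
    if i >= len(s):
        raise ValueError("missing field value")
    if s[i] in '{"':
        close = "}" if s[i] == "{" else '"'
        depth = 0
        for j in range(i + 1, len(s)):
            if s[j] == close and depth == 0:
                return s[i + 1:j], j + 1
            if s[j] == "{":
                depth += 1
            elif s[j] == "}":
                depth -= 1
        raise ValueError("unterminated field value")
    m = BARE_VALUE.match(s, i)
    if not m:
        raise ValueError("missing field value")
    return m.group(0), m.end()


def parse_body(entry_type: str, body: str) -> Entry:
    key, _, rest = body.partition(",")
    fields: Dict[str, str] = {}
    i = 0
    while (m := FIELD_NAME.match(rest, i)):
        value, i = read_value(rest, m.end())
        fields[m.group(1).lower()] = value
        i = SEPARATOR.match(rest, i).end()  # skip whitespace and one comma
    return Entry(entry_type, key.strip(), fields)


def iter_entries(text: str) -> Iterator[Entry]:
    """Yield one Entry per reference in a (normalized) multi-entry BibTeX file."""
    pos = 0
    while (m := ENTRY_START.search(text, pos)):
        depth, i = 1, m.end()
        while i < len(text) and depth:
            if text[i] == "{":
                depth += 1
            elif text[i] == "}":
                depth -= 1
            i += 1
        if depth:
            raise ValueError(f"unbalanced braces in entry at offset {m.start()}")
        pos = i
        entry_type = m.group(1).lower()
        if entry_type not in SKIPPED_TYPES:
            yield parse_body(entry_type, text[m.end():i - 1])
```

Each yielded `Entry` can then go through the single-reference steps (author parsing, character decoding, field mapping) unchanged.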