| Priority | Identifier | Problem | Description / Example | Status
|
| ZNC-1988-43c-xyz | This is the problem | This is a more detailed description of the problem, also comments etc. are welcome in this area. | e.g. reported/in progress/ solved
|
| Testing from Nov. 2011
|
 | ZNC-1988-43c-0015 | Author has wrong affiliation | | reported/corrected in new run
|
 | ZNC-1988-43c-0021 | Wrong title | Dilinoylgalactosylglycerol => DiIinoyIgalactosylglycerol (the l becomes a capital i) This problem is OCR related | reported
|
 | ZNC-1988-43c-0029 | All affiliations are listed as creators (not child from author) | | reported/corrected in new run
|
 | ZNC-1988-43c-0034 | Merged affiliations | | reported/corrected in new run
|
 | ZNC-1988-43c-0084 | Author has wrong affiliation | | reported/bug under consideration
|
 | ZNC-1988-43c-0185 | Wrong author name | Gülz => Giilz This problem is OCR related | reported
|
 | ZNC-1988-43c-0189 | Merged affiliations | | reported
|
 | ZNC-1988-43c-0211 | Wrong author name | Hans Ulrich Seitz & Ernst Reinhard => Hans Ulrich & Seitz Ernst Reinhard | reported
|
 | ZNC-1988-43c-0243 | Author has wrong affiliation & Affiliations are listed twice | | reported
|
 | ZNC-1988-43c-0261 | Wrong author name | Helmut Schipp von Branitz => Helmut Schipp & von Branitz (one author becomes two) | reported
|
 | ZNC-1988-43c-0287 | Wrong author name | Leopold von Proff => Leopold von & Proff (one author becomes two) | reported
|
 | ZNC-1988-43c-0340 | Wrong title & Author has wrong affiliation | Pimpinella anisum => Pimpinella anisutn This problem is OCR related | reported
|
 | ZNC-1988-43c-0382 | Wrong title | "New Constituents of Essential Oil from Elsholtzia pilosa* Materials and Methods" (the phrase "Materials and Methods" does not belong here) | reported
|
 | ZNC-1988-43c-0455 | Author has wrong affiliation | | reported
|
| ZNC-1988-43c-0461 | Author has wrong affiliation | | reported
|
 | ZNC-1988-43c-0475 | Wrong author name (special character, OCR related) & All affiliations are listed as creators (not child from author) & Merged affiliations | | reported
|
 | ZNC-1988-43c-0491 | Wrong author name | Schübel => Schiibel This problem is OCR related | reported
|
| ZNC-1988-43c-0523 | Wrong author name | Börner => Borner This problem is OCR related | reported
|
| ZNC-1988-43c-0527 | Author has wrong affiliation & place is listed twice | | reported
|
 | ZNC-1988-43c-0589 | Author has wrong affiliation | | reported
|
 | ZNC-1988-43c-0613 | Wrong author name | Ilpo => lipo This problem is OCR related | reported
|
 | ZNC-1988-43c-0621 | All subjects become authors | | reported
|
| ZNC-1988-43c-0668 | Merged affiliations | | reported
|
| ZNC-1988-43c-0721 | Author has wrong affiliation | | reported
|
 | ZNC-1988-43c-0794 | Merged affiliations & Author has wrong affiliation | | reported
|
| ZNC-1988-43c-0830 | Author has wrong affiliation | | reported
|
 | ZNC-1988-43c-0869 | Author has wrong affiliation & Merged affiliations | | reported
|
 | ZNC-1988-43c-0900 | Author has wrong affiliation & Merged affiliations & Wrong author created | Francesco Dall'Acqua becomes 2 authors | reported
|
 | ZNC-1988-43c-0905 | part of subtitle becomes author | The English subtitle information gets lost, and author Erich F. Elstner becomes 2 authors, "Heme Erich" and "F Elstner" (the 'Heme' comes from the subtitle) | reported
|
 | ZNC-1988-43c-0920 | Author has wrong affiliation | | reported
|
| ZNC-1988-43c-0317_b | Author has wrong affiliation & Merged affiliations | | reported
|
 | ZNC-1988-43c-0372_b | Author has wrong affiliation & Merged affiliations | | reported
|
 | ZNC-1988-43c-0_805b | All affiliations are listed as creators (not child from author) & Wrong author names | Three authors are merged to 2 authors | reported
|
| Testing from 17.01.2012
|
| | ZNA-1947-2a-0491 | Wrong pdf??? |
|
| | ZNA-1947-2a-0494 | Wrong pdf??? |
|
| | ZNA-1947-2a-0497 | Wrong pdf??? |
|
| | ZNA-1947-2a-0509 | Wrong pdf??? |
|
| | ZNA-1947-2a-0539b_b | Wrong pdf??? |
|
| Testing from 24.01.2012
|
| | ZNB-1947-2b-0001 | <forename type="first">Von</forename> <forename type="middle">A</forename> <surname>Catsch</surname> |
|
| | ZNB-1947-2b-0005 | content in abstract |
|
| | ZNB-1947-2b-0010 | First part of content becomes abstract and chemical formula becomes text:OH r jYVT^ COOH |
|
| | ZNB-1947-2b-0012 | Beginning of content in abstract |
|
| | ZNB-1947-2b-0014 | title is missing |
|
| | ZNB-1947-2b-0019 | OCR problem, wrong author name, S-Gold-N-Allyl-N' => Ä-Gold-A T -Allyl-A part of abstract is missing |
|
| | ZNB-1947-2b-0025 | OCR problem: Bacterium => Bactenum, Bact => Bad. @BULLET in org info |
|
| | ZNB-1947-2b-0029 | OCR problem: Vermehrung => Yermehrung (title), Tabak => T&bak content in title |
|
| | ZNB-1947-2b-0035 | OCR problem, Beziehungen => Beziehlingen (title) content becomes abstract |
|
| | ZNB-1947-2b-0063 | part of content in abstract |
|
| | ZNB-1947-2b-0066 | tei missing |
|
| | ZNB-1947-2b-0072 | content becomes abstract |
|
| | ZNB-1947-2b-0073 | OCR problem, Bruno => Bkuno part of content becomes abstract |
|
| | ZNB-1947-2b-0081 | OCR problem, Strompotentialkurven => Strompotentialkuryen, wrong translation of formula in abstract |
|
| | ZNB-1947-2b-0089 | OCR problem, Verbindungen => Verbinclungen |
|
| | ZNB-1947-2b-0094 | content in abstract |
|
| | ZNB-1947-2b-0104 | OCR problem in title |
|
| | ZNB-1947-2b-0108 | OCR problem in abstract content in abstract |
|
| | ZNB-1947-2b-0112 | content in abstract |
|
| | ZNB-1947-2b-0146 | OCR problem in title |
|
| | ZNB-1947-2b-0158 | OCR problem in title, author name |
|
| | ZNB-1947-2b-0187 | part of abstract missing |
|
| | ZNB-1947-2b-0203 | tei missing |
|
| | ZNB-1947-2b-0215 | OCR problem in abstract |
|
| | ZNB-1947-2b-0222 | Affiliation missing, content in abstract |
|
| | ZNB-1947-2b-0233 | OCR problem in title |
|
| | ZNB-1947-2b-0249 | OCR problem in title |
|
| | ZNB-1947-2b-0286 | content in abstract |
|
| | ZNB-1947-2b-0292 | content in abstract |
|
| | ZNB-1947-2b-0295 | OCR problem in abstract |
|
| | ZNB-1947-2b-0301 | part of abstract missing |
|
| | ZNB-1947-2b-0308 | OCR problem in title affiliation missing |
|
| | ZNB-1947-2b-0313 | part of content in abstract |
|
| | ZNB-1947-2b-0330.header.tei.xml | no corresponding pdf on regensburg server |
|
| | ZNB-1947-2b-0330 | OCR problem in author name part of content in abstract |
|
| | ZNB-1947-2b-0358 | OCR problem in affiliation |
|
| | ZNB-1947-2b-0361 | content in abstract |
|
| | ZNB-1947-2b-0367 | content in abstract |
|
| | ZNB-1947-2b-0369 | content in abstract |
|
| | ZNB-1947-2b-0382 | OCR too bad for tei transformation |
|
| | ZNB-1947-2b-0397 | OCR problem in title |
|
| | ZNB-1947-2b-0397 | content in abstract |
|
| | ZNB-1947-2b-0400 | content in abstract |
|
| | ZNB-1947-2b-0404 | OCR problem in abstract, content in abstract |
|
| | ZNB-1947-2b-0410 | OCR problem in abstract, content in abstract |
|
| | ZNB-1947-2b-0414 | OCR problem in abstract |
|
| | ZNB-1947-2b-0419 | content in abstract |
|
| | ZNB-1947-2b-0421 | OCR problem in author name content in abstract |
|
| | ZNB-1947-2b-0428 | part of abstract missing |
|
| | ZNB-1947-2b-0433 | OCR problem in abstract, content in abstract |
|
| | ZNB-1947-2b-0444 | OCR problem in abstract part of abstract missing |
|
| | ZNB-1947-2b-0450 | OCR problem in abstract part of abstract missing |
|
| | ZfN-1946-1-0003 | content in abstract |
|
| | ZfN-1946-1-0010 | wrong abstract |
|
| | ZfN-1946-1-0013 | content in abstract |
|
| | ZfN-1946-1-0018 | OCR problem in abstract |
|
| | ZfN-1946-1-0053 | OCR problem in abstract Maxwellschen => Maxwell sehen |
|
| | ZfN-1946-1-0067 | Wrong affiliation address info |
|
| | ZfN-1946-1-0070 | OCR problem in title |
|
| | ZfN-1946-1-0087 | content becomes abstract |
|
| | ZfN-1946-1-0093 | OCR problem in author name |
|
| | ZfN-1946-1-0108 | |
|
| | ZfN-1946-1-0120 | content becomes abstract |
|
| | ZfN-1946-1-0121 | content becomes abstract |
|
| | ZfN-1946-1-0125 | OCR problem in title (special character) part of content in abstract |
|
| | ZfN-1946-1-0131 | OCR problem in title Mattauch -Herzog'schen => M attauch -H e r z o gsehen |
|
| | ZfN-1946-1-0146 | part of abstract missing |
|
| | ZfN-1946-1-0151 | content becomes abstract |
|
| Testing from 22.03.2012
|
| | ZNC-1988-43c-0001 | Keywords are in abstract | Very similar layout; may improve as we reach this year proper later since we will have accumulated data corresponding to the field. CNR
|
| | ZNC-1988-43c-0019 | part of keyword is in dedication | Very similar layout, difficult to differentiate
|
| | ZNC-1988-43c-0029 | Keywords are in abstract | idem; keywords and abstract have very similar format
|
| | ZNC-1988-43c-0074 | Author gets additional affiliation | Would go in the training data - checking this with Patrice
|
| | ZNC-1988-43c-0099 | Department info gets lost | Would go in the training data - checking this with Patrice
|
| | ZNC-1988-43c-0126 | Subtitle is added to title | No model for subtitles; Impossible to do any better there
|
| | ZNC-1988-43c-0133 | special character problem in title; wrong parsing of keywords | comes from OCR; goes in training data
|
| | ZNC-1988-43c-0155 | Subtitle is added to title | cf. above
|
| | ZNC-1988-43c-0167 | All affiliations are listed as creators (not child from author) | This is manageable (default rule when all affiliations are applied to all authors)
|
| | ZNC-1988-43c-0173 | Keywords are in abstract | cf. above, very difficult case
|
| | ZNC-1988-43c-0177 | All affiliations are listed as creators (not child from author) Authors become affiliations | To be checked on our side whether we can reduce such problems
|
| | ZNC-1988-43c-0199 | One author becomes two | Already identified issue with previous volumes; contrib to training data CNR
|
| | ZNC-1988-43c-0213 | special character problem in title | OCR issue
|
| | ZNC-1988-43c-0231 | author gets wrong affiliation; zfn becomes affiliation; affiliations are listed as authors | Grobid gets desynchronized; candidate for training data
|
| | ZNC-1988-43c-0269 | Title is missing | Grobid mistakes (happens...); training data
|
| | ZNC-1988-43c-0285 | address becomes author | Interesting! Training data CNR
|
| | ZNC-1988-43c-0337 | affiliations are listed as authors abstract is missing | Difficult case (organisation looks like a name)
|
| | ZNC-1988-43c-0363 | Institute info is missing |
|
| | ZNC-1988-43c-0370 | Abstract is missing |
|
| | ZNC-1988-43c-0397 | Subtitle is added to title |
|
| | ZNC-1988-43c-0403 | Abstract is missing |
|
| | ZNC-1988-43c-0418 | special characters in affiliations are not recognized (in names they are) | OCR issues
|
| | ZNC-1988-43c-0431 | Keywords are in abstract |
|
| | ZNC-1988-43c-0438 | One author becomes two | Training data
|
| | ZNC-1988-43c-0443 | Affiliation gets lost, author gets wrong affiliation |
|
| | ZNC-1988-43c-0449 | Authors get wrong affiliations |
|
| | ZNC-1988-43c-0463 | Author gets lost; abstract is missing |
|
| | ZNC-1988-43c-0467 | Abstract is missing |
|
| | ZNC-1988-43c-0479 | Keywords in abstract |
|
| | ZNC-1988-43c-0505 | Keywords in abstract |
|
| | ZNC-1988-43c-0511 | Abstract is missing |
|
| | ZNC-1988-43c-0515 | Keywords in abstract |
|
| | ZNC-1988-43c-0519 | Title and subtitle are merged Abstract is missing |
|
| | ZNC-1988-43c-0529 | Keywords in abstract | CNR
|
| | ZNC-1988-43c-0545 | Institution and department info are lost |
|
| | ZNC-1988-43c-0554 | Institution and department info are lost Keywords in abstract |
|
| | ZNC-1988-43c-0563 | Keywords in abstract |
|
| | ZNC-1988-43c-0577 | OCR problem in title |
|
| | ZNC-1988-43c-0601 | Title is missing Author name is title OCR problem in author name |
|
| | ZNC-1988-43c-0609 | Institution and department info are lost Keywords in abstract |
|
| | ZNC-1988-43c-0613 | Subtitle is added to title Abstract is missing |
|
| | ZNC-1988-43c-0636 | Address is added to affiliation name OCR problem in author name |
|
| | ZNC-1988-43c-0665 | Start of abstract is in keywords | CNR
|
| | ZNC-1988-43c-0709 | Institution and department info are lost |
|
| | ZNC-1988-43c-0717 | Institution and department info are lost Keywords in abstract |
|
| | ZNC-1988-43c-0731 | Keywords in abstract |
|
| | ZNC-1988-43c-0765 | author affiliation mix up | difficult data
|
| | ZNC-1988-43c-0769 | Keywords in abstract |
|
| | ZNC-1988-43c-0777 | Start of abstract is in keywords |
|
| | ZNC-1988-43c-0782 | Relation between author and affiliation gets lost Keywords in abstract |
|
| | ZNC-1988-43c-0795 | Relation between author and affiliation gets lost |
|
| | ZNC-1988-43c-0799 | Keywords in abstract |
|
| | ZNC-1988-43c-0823 | Department info in address line |
|
| | ZNC-1988-43c-0850 | Keywords in abstract |
|
| | ZNC-1988-43c-0857 | OCR problem in title |
|
| | ZNC-1988-43c-0857 | affiliations merged Keywords in abstract | difficult data
|
| | ZNC-1988-43c-0893 | subtitle becomes author | CNR
|
| | ZNC-1988-43c-0903 | Keywords in abstract |
|
| | ZNC-1988-43c-0908 | Abstract is missing |
|
| | ZNC-1988-43c-0918 | Abstract is missing |
|
| | ZNC-1988-43c-0938 | Department info is missing |
|
| | ZNC-1988-43c-0955 | Keywords in abstract |
|
| | ZfN-1946-1-0151 | content becomes abstract |
|
| Testing from 29.03.2012 / Series A (Reihe A), Volume 2 (1947)
|
| | ZNA-1947-2a-0154 | Abstract is missing |
|
| | ZNA-1947-2a-0159 | ok |
|
| | ZNA-1947-2a-0163 | ok |
|
| | ZNA-1947-2a-0167 | ok |
|
| | ZNA-1947-2a-0171 | PDF file is missing, or rather has the same content as ZNA-1947-2a-0173 |
|
| | ZNA-1947-2a-0173 | <affiliation><orgName type="department" key="dep1">Institut für physikalische Chemie und Elektrochemie</orgName><orgName type="department" key="dep2">Kaiser-Wilhelm-Institut für physikalische Chemie und Elektrochemie</orgName><orgName type="institution">Technischen Universität Berlin-Charlottenburg</orgName><address><settlement>Berlin-Dahlem</settlement></address></affiliation> /// Assignment???? |
|
| | ZNA-1947-2a-0177_b | ZNA-1947-2a-0175_b.header.tei.xml and ZNA-1947-2a-0177_b.pdf /// Names do not match. |
|
| | ZNA-1947-2a-0184_n | <date type="published" when="10471"/> /// OCR |
|
| | ZNA-1947-2a-0185 | ok |
|
| | ZNA-1947-2a-0202 | <publicationStmt>unknown</publicationStmt> imprint incomplete, abstract missing |
|
| | ZNA-1947-2a-0216 | <head>Abstract</head> A ls "Neue Sterne" oder "Novae" werden /// There is no abstract, only content. |
|
| | | ZNA-1947-2a-0217.header.tei /// IDENTICAL TO ZNA-1947-2a-0219.header.tei |
|
| | ZNA-1947-2a-0219 | ok |
|
| | ZNA-1947-2a-0226 | ok |
|
| | | <title level="a" type="main">dereinschalten</title> author missing, imprint incomplete /// Wrong title (Gasballastpumpen) |
|
| | ZNA-1947-2a-0238_n | <date type="published" when="1946"/> /// Wrong title (Gasballastpumpen), imprint incomplete |
|
| | ZNA-1947-2a-0239_n | <orgName type="institution">Unterharzer Berg</orgName><author><persName><forename type="first">Hüttenwerke</forename><forename type="middle">G m b H</forename><roleName>Goslar</roleName></persName> imprint incomplete |
|
| | ZNA-1947-2a-0241 | ok |
|
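Several of the recurring problems listed above (missing title, missing author, missing abstract, or a publication date such as `when="10471"`) could be pre-screened automatically before manual review. The following is only a minimal sketch, not part of the project tooling: the function name `check_header`, the TEI namespace, and the assumption that the abstract sits in an `<abstract>` element are all hypothetical and may need adjusting to the actual header files.

```python
import xml.etree.ElementTree as ET

# Assumption: the header files use the standard TEI namespace.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def check_header(xml_text, expected_year):
    """Return a list of problem strings for one TEI header document."""
    root = ET.fromstring(xml_text)
    problems = []

    # Main title must exist and contain text.
    title = root.find(".//tei:title[@type='main']", TEI_NS)
    if title is None or not (title.text or "").strip():
        problems.append("title is missing")

    # At least one author with a non-empty surname.
    surnames = root.findall(".//tei:author/tei:persName/tei:surname", TEI_NS)
    if not any((s.text or "").strip() for s in surnames):
        problems.append("author is missing")

    # Publication date must be the expected year, optionally followed by -MM[-DD].
    date = root.find(".//tei:date[@type='published']", TEI_NS)
    when = date.get("when", "") if date is not None else ""
    if not (when == str(expected_year) or when.startswith(f"{expected_year}-")):
        problems.append(f"publication date {when!r} does not match {expected_year}")

    # Abstract must contain more than a bare "Abstract" heading.
    abstract = root.find(".//tei:abstract", TEI_NS)
    text = "".join(abstract.itertext()).strip() if abstract is not None else ""
    if text in ("", "Abstract"):
        problems.append("abstract is missing")

    return problems
```

A check like this cannot detect OCR errors or wrong author/affiliation assignments; it only narrows down which files need a closer look.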
In a project meeting it was decided that we would use the PDF/A-1b format, which is not the case.
The question here is: how do I decide whether a given file conforms to the
standard?
We create the PDF/A files with Adobe Acrobat Pro 9 and, where necessary, 10. We
establish PDF/A conformance by checking and, where necessary, converting the
files with the Preflight tool built into Acrobat. This tool comes from the
company callas (see e.g.
http://www.callassoftware.com/callas/doku.php/en:news:press:20110706).
A renewed check of the file you named (I took it from the FTP server) with
Acrobat 9.5 and also with the current callas pdfaPilot 3 revealed no problems
regarding conformance with PDF/A-1b. You are welcome to reproduce this;
pdfaPilot is available for download as a trial version:
http://www.callassoftware.com/callas/doku.php/de:download
In general, the PDF/A standards are complex, and so, accordingly, is the
validation of files for conformance with them (cf.
http://www.pdfa.org/2011/08/validating-pdfa/). Various software products for
conformance checking now exist. But which software should one trust? What is
missing is a reference implementation, that is, software approved by the
standardization body that checks files for conformance reliably, correctly, and
completely. There is a competent and extensive test of various software
products from 2009
(http://www.pdflib.com/fileadmin/pdflib/pdf/pdfa/2009-05-04-Bavaria-report-on-PDFA-validation-accuracy.pdf),
including predecessor versions of the software named here. Acrobat/callas do
not perform badly in it. No program is perfect. The most common problem with
Acrobat, callas, Solid Documents, and PDF Tools is false alarms, i.e.
violations of the standard are reported that do not actually exist.
Of the available options, the callas software does not seem to us the worst
choice. It evidently has a certain market presence and is, not without reason,
integrated into the Acrobat software by Adobe (the creators of PDF!). callas is
one of the members of the PDF/A Competence Center and actively contributed to
the creation of various PDF standards, including the PDF/A ISO standards.
On the other hand: today I also tested the file in question myself with the
online validation from pdf-tools.com that you mentioned, and I likewise
received the error message. It is of little help, however, because it does not
pinpoint the problem it found. So one (or at least I) cannot really remedy it.
The validation report from validatepdf.com that you quoted criticizes an error
in the OutputIntent but leaves open what exactly the problem is. It also issues
warnings regarding the XMP metadata. The XMP metadata, however, is not flagged
by the other programs, including the XMP validator for PDF/A-1 from PDFlib
(http://www.pdflib.com/de/knowledge-base/xmp-metadaten/kostenloser-xmp-validator/).
Conclusion: some software confirms conformance with the standard, while other
software issues errors or warnings. None of the alleged defects is found by a
second program. This matches the picture from the Bavaria report cited above.
In substance, the defects, if they exist at all, do not seem serious to me.
We proposed delivery as PDF/A in order to give you the greatest possible
assurance of the long-term availability of the data. That said, our PDF files
are quite simply structured anyway and would give little cause to expect
problems even without PDF/A. Absolute future-proofing does not yet exist in
digital long-term archiving. We are confident, however, that the files
delivered to you, which are also PDF/A-conformant according to Acrobat/callas,
are comparatively very future-proof.
I consider PDF/A conformance according to Acrobat/callas to be sufficient. I
would be glad if you could agree to this.
(The MPDL project management (Malte Dreyer) agrees and, against this background, is happy to accept the delivery.)
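As a quick first look before running any of the validators discussed above, one can at least read which PDF/A level a file declares about itself. PDF/A files carry this declaration in their XMP metadata as `pdfaid:part` and `pdfaid:conformance`. The sketch below only reads that declaration and says nothing about actual conformance; the function name `declared_pdfa_level` is hypothetical, and the approach assumes the XMP packet is stored uncompressed in the file.

```python
import re

# pdfaid:part / pdfaid:conformance may be written as XML elements
# (<pdfaid:part>1</pdfaid:part>) or as attributes (pdfaid:part="1").
_PART = re.compile(rb"pdfaid:part(?:>|=[\"'])\s*(\d)")
_CONF = re.compile(rb"pdfaid:conformance(?:>|=[\"'])\s*([A-Za-z])")


def declared_pdfa_level(pdf_bytes):
    """Return the declared level, e.g. 'PDF/A-1b', or None if none is declared."""
    part = _PART.search(pdf_bytes)
    conf = _CONF.search(pdf_bytes)
    if not part or not conf:
        return None
    return f"PDF/A-{part.group(1).decode()}{conf.group(1).decode().lower()}"
```

Reading a delivered file in binary mode and passing its bytes to `declared_pdfa_level` should yield `PDF/A-1b` for files that declare that level. A file that declares PDF/A-1b may still violate the standard, so this complements the validators rather than replacing them.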