Difference between revisions of "Imeji Performance eSciDoc"

From MPDLMediaWiki
 
(38 intermediate revisions by 5 users not shown)
Line 1: Line 1:
{{Faces}}
{{Imeji_Tech}}


<accesscontrol>MPDL,,</accesscontrol>
This page compares the technology options for implementing [[Imeji | imeji]]. To meet the requirements, the performance of each option is the main concern.


== eSciDoc ItemHandler (SOAP)==
{|{{table}}
All metadata are stored in an eSciDoc item. The item is updated etc. via the eSciDoc item handler.  
|- bgcolor="#F5F5DC"
! width="300" |'''Technology'''
! width="100" |'''Time to update one item *'''
! width="100" |'''Time to ingest one item **'''
!width="200"|'''Pro [[Image:Happy.gif]]''' 
! width="200" |'''Con [[Image:Sad.gif]]'''
! width="300" |'''Open Questions'''
! width="100" |'''Status'''


[[Image:Happy.gif | 20px]] Fast development, as already implemented in other solutions
|- style="height:20px"
: All eSciDoc services can be used (versioning, statistics, aa etc.)
|'''eSciDoc Item Retrieval (SOAP)'''|| 0,6 sec ||2,65 sec || Fast development, as already implemented in other solutions <br/>All eSciDoc services can be used (versioning, statistics, aa etc.)||Very slow <br/> Extra release, pid assignment etc. is necessary  || -- ||[http://jira.mpdl.mpg.de/browse/FACESBUG-431 Tested]
|-
|'''eSciDoc Item Retrieval (REST)''' ||0,5 sec ||2,2 sec ||All eSciDoc services can be used (versioning, statistics, aa etc.) <br/> retrieve Operation is faster (approx. half a second per item)  || Slow <br/> Extra release, pid assignment etc. is necessary|| -- ||[http://jira.mpdl.mpg.de/browse/FACESBUG-431 Tested]
|-
|'''eSciDoc IngestHandler''' ||-- ||0,4 sec  ||--|| No PID assigned <br/> User needs special role: ingester <br/> Items do not seem to be indexed: '''blocker!'''|| -- ||[http://jira.mpdl.mpg.de/browse/FACESBUG-436 Tested]
|-
|'''eSciDoc ContentRelation''' ||-- || --||--|| CR is not under version control || Cannot be updated once released (the documentation says the public-status of a CR must not be "released"). Thus, CRs are '''not feasible''' for this purpose || [http://jira.mpdl.mpg.de/browse/FACESBUG-434 Tested]
|-
|'''eSciDoc Item with 1000+ components''' <br/> All metadata of a collection are stored within one item ||?? ||0.9 sec || faster ingest compared to single item ingest || Retrieval times for item with 1000 components: > 33 sec<br/> Initial filesize: 0,6MB (will increase with each version)<br/> '''Failed to ingest''' an item with 10000 components<br/> Initial file size: > 5MB ||-- ||Tested
|-
|'''eSciDoc as archive, MD in Triple Store'''||updating 1000 items = 3503ms, == 3,5ms/item<br/>updating 100000 items = 21073ms, == 210µs/item || ingesting 100000 items (= 1,2 million triples) = 81452ms, == 814µs/item||Very fast|| synchronization issues <br/> Possibly redundant data <br/> aa has to be implemented  || How do we perform status updates? (eSciDoc has to know the status, not only the triple store) <br/> This alternative may be acceptable in a decoupled scenario, e.g. ingest/updates are done directly on the triple store and stored with a delay in the eSciDoc core; in this case, AA must be taken seriously as well <br/> see also [[Image:Batch_metadata_update.pptx]] || [http://jira.mpdl.mpg.de/browse/FACESBUG-435 Tested] '''Decided and agreed'''<br/> see also [[MD_Store|MD Store implementation]]
|-
|'''eSciDoc Core Performance Tuning'''  ||-- || --||All solutions could profit from this || Development has to be done together with FIZ, so that we do not develop our own eSciDoc which we have to adapt with every FW release <br/> Development process can be very long  <br/> Code seems to be complex to understand  || Would FIZ be willing to provide development resources to perform this task?  || Discarded
|-
|'''No eSciDoc''' ||-- || --||Can be much faster || Services cannot be reused  <br/> High development effort  || What to use as storage? Fedora, DB? || Discarded
|}
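
The per-item figures in the triple store row follow from dividing each batch total by the number of items. A minimal sketch of that arithmetic, using only the totals reported in the table (the function name <code>per_item_ms</code> is illustrative, not part of any imeji or eSciDoc API):

```python
def per_item_ms(total_ms: float, item_count: int) -> float:
    """Average time per item, in milliseconds, for a batch operation."""
    if item_count <= 0:
        raise ValueError("item_count must be positive")
    return total_ms / item_count


# Batch totals as reported in the table: (total in ms, number of items).
measurements = {
    "update 1,000 items": (3503, 1_000),
    "update 100,000 items": (21073, 100_000),
    "ingest 100,000 items": (81452, 100_000),
}

for label, (total_ms, count) in measurements.items():
    ms = per_item_ms(total_ms, count)
    # 1 ms = 1000 microseconds
    print(f"{label}: {ms:.3f} ms/item ({ms * 1000:.0f} µs/item)")
```

Both 100,000-item runs come out below one millisecond per item, i.e. in the range of hundreds of microseconds.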


[[Image:Sad.gif | 20px]] Very slow (approx. 2,65 sec for one item (create, update, submit, pid, release))
(*) Only update operation
: Extra release, pid assignment etc. is necessary


'''Open Questions:'''
(**) Whole process from create to release, including any necessary retrieves, pid assignment, submit etc.


== eSciDoc Item retrieval via REST Interface==
[[Category:Imeji_Technical_Specification|Performance eSciDoc]]
All metadata are stored in an eSciDoc item. The item is updated etc. via the eSciDoc REST interface.
[[Category: ESciDoc]]
 
[[Image:Happy.gif | 20px]]  All eSciDoc services can be used (versioning, statistics, aa etc.)
: retrieve Operation is faster (approx. half a second per item)
 
[[Image:Sad.gif | 20px]] Slow (approx. 2,2 sec for one item (create, update, submit, pid, release))
: Extra release, pid assignment etc. is necessary
 
'''Open Questions:'''
 
== eSciDoc IngestHandler ==
 
[[Image:Happy.gif | 20px]]
* Items ingested with status released:
** ingest is about 0,4 s
 
[[Image:Sad.gif | 20px]]
* Users would need a new role, ingester, which means that all Faces users would need this privilege
* Items released without a PID:
** Possibility to ingest the item with a dummy PID and to update that PID after the ingest with a real value: this decreases performance.
* Items do not seem to be indexed: blocker!
 
'''Open Questions:'''
 
== eSciDoc ContentRelation ==
All metadata are stored in a content relation object, which is related to the item (image).
 
[[Image:Happy.gif | 20px]] For a metadata change, the content relation only needs to be updated (no extra release, pid assignment etc.)
* Unfortunately, this is wrong: A content relation also has to be submitted and released. Additionally, it seems it cannot be updated anymore once released (the documentation says the public-status of a CR must not be "released"). Thus, CRs are not feasible for this purpose --[[User:Haarlaender|MarkusH]] 14:05, 1 June 2010 (UTC)
 
[[Image:Sad.gif | 20px]] To be checked if fast enough
* As submit and release are still necessary, there is not much difference to an item. And update is not possible, see comment above. --[[User:Haarlaender|MarkusH]] 14:05, 1 June 2010 (UTC)
 
'''Open Questions:'''
* Is a content relation under version control?
:No, it is not under version control
* How is the aa for content relations?
:same principle as for items, excluding versions
 
== eSciDoc only as archive ==
The item (image) itself will be stored in eSciDoc, together with the technical metadata.
 
'''Open Questions:'''
* Is there a set of 'core' metadata we have to/ should store with the item for LTA reasons?
:any metadata are subject to LTA, in addition, PREMIS event history is tracked for items/containers
 
=== MD in Triple Store ===
All metadata are stored in a triple store, all operations (search, update etc.) can take place here.
 
The triple store would know the eSciDoc id of the image item, but the image item would not know its metadata in the triple store.
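
The one-way link described above can be sketched as follows. This is a toy in-memory stand-in, not the actual imeji implementation: the class name, the <code>escidoc:…</code> ids, and the <code>dc:title</code> predicate are all illustrative assumptions. The eSciDoc item id is the subject of every triple, so metadata can be looked up from the id, while the archived item carries no pointer back.

```python
from collections import defaultdict


class MetadataTripleStore:
    """Toy in-memory triple store (hypothetical stand-in for a real RDF store)."""

    def __init__(self):
        # subject (eSciDoc item id) -> set of (predicate, object) pairs
        self._by_subject = defaultdict(set)

    def add(self, item_id, predicate, value):
        self._by_subject[item_id].add((predicate, value))

    def update(self, item_id, predicate, value):
        """Replace all existing values of `predicate` for this item."""
        triples = self._by_subject[item_id]
        triples -= {t for t in triples if t[0] == predicate}
        triples.add((predicate, value))

    def metadata_for(self, item_id):
        """Return the item's metadata as a dict (assumes one value per predicate)."""
        return dict(self._by_subject.get(item_id, set()))

    def search(self, predicate, value):
        """Return the ids of all items that carry the given triple."""
        return sorted(
            item_id
            for item_id, triples in self._by_subject.items()
            if (predicate, value) in triples
        )


store = MetadataTripleStore()
store.add("escidoc:1001", "dc:title", "Untitled portrait")
store.add("escidoc:1002", "dc:title", "Untitled portrait")
store.update("escidoc:1001", "dc:title", "Portrait, frontal view")

print(store.metadata_for("escidoc:1001"))
print(store.search("dc:title", "Untitled portrait"))
```

Search and update touch only the store; the eSciDoc item is never retrieved, which is where the speed advantage would come from.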
 
 
 
[[Image:Happy.gif | 20px]] Very fast (30,000 items in 2 seconds)
:items or triples?
 
[[Image:Sad.gif | 20px]] synchronization issues
: redundant data
: aa has to be implemented
 
'''Open Questions:'''
* How can we synchronize the two systems? (Do we have to synchronize them at all, or is it sufficient to store the md only in the triple store?)
* How do we perform status updates? (escidoc has to know the status, not only the triple store)
* This alternative may be acceptable in a decoupled scenario, e.g. ingest/updates are done directly on the triple store and stored with a delay in the eSciDoc core. In this case, AA must be taken seriously as well
*see also [[Image:Batch_metadata_update.pptx]]
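
The decoupled scenario raised in the open questions can be sketched as a write-behind queue: updates are applied to the triple store immediately and pushed to the archive later. This is only an illustration under stated assumptions; <code>WriteBehindSync</code> and the <code>archive_write</code> callback are hypothetical names, a plain dict stands in for the triple store, and no real eSciDoc call is made.

```python
from collections import deque


class WriteBehindSync:
    """Apply updates to the fast store now, archive them later (sketch)."""

    def __init__(self, archive_write):
        self.triple_store = {}   # item_id -> metadata dict (stand-in for the triple store)
        self.pending = deque()   # ids changed since the last successful flush
        self._archive_write = archive_write  # hypothetical callback into the archive

    def update(self, item_id, metadata):
        """Fast path: write to the triple store and remember the item as dirty."""
        self.triple_store[item_id] = dict(metadata)
        if item_id not in self.pending:
            self.pending.append(item_id)

    def flush(self):
        """Slow path: push the latest state of every dirty item to the archive."""
        flushed = 0
        while self.pending:
            item_id = self.pending[0]
            self._archive_write(item_id, self.triple_store[item_id])
            self.pending.popleft()  # dequeue only after the write succeeded
            flushed += 1
        return flushed


archived = {}  # stand-in for the eSciDoc core
sync = WriteBehindSync(archived.__setitem__)
sync.update("escidoc:1001", {"dc:title": "Draft title"})
sync.update("escidoc:1001", {"dc:title": "Final title"})
flushed = sync.flush()
print(flushed, archived)
```

Several updates to the same item collapse into one archive write, which is the main saving. The sketch leaves the status-update and AA questions above unanswered.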
 
== eSciDoc Core Performance Tuning ==
Tune the eSciDoc core so that retrieval of items is faster.
 
[[Image:Happy.gif | 20px]] All solutions could profit from this
 
[[Image:Sad.gif | 20px]] Development has to be done together with FIZ, so that we do not develop our own eSciDoc which we have to adapt with every FW release.
: Development process can be very long
: Code seems to be complex to understand
 
'''Open Questions:'''
* Would FIZ be willing to provide development resources to perform this task?
 
== No eSciDoc ==
We do not use eSciDoc at all.
 
[[Image:Happy.gif | 20px]] Can be much faster
 
[[Image:Sad.gif | 20px]] Services cannot be reused
: High development effort
 
'''Open Questions:'''
* What to use as storage? Fedora, DB?
 
[[Category:Faces | Metadata Update]]
[[Category:Faces 4.0| Metadata Update]]

Latest revision as of 07:41, 19 August 2013



This page compares the technology options for implementing imeji. To meet the requirements, the performance of each option is the main concern.

{|{{table}}
|- bgcolor="#F5F5DC"
! width="300" |'''Technology'''
! width="100" |'''Time to update one item *'''
! width="100" |'''Time to ingest one item **'''
! width="200" |'''Pro [[Image:Happy.gif]]'''
! width="200" |'''Con [[Image:Sad.gif]]'''
! width="300" |'''Open Questions'''
! width="100" |'''Status'''
|-
|'''eSciDoc Item Retrieval (SOAP)''' || 0,6 sec || 2,65 sec || Fast development, as already implemented in other solutions <br/> All eSciDoc services can be used (versioning, statistics, aa etc.) || Very slow <br/> Extra release, pid assignment etc. is necessary || -- || [http://jira.mpdl.mpg.de/browse/FACESBUG-431 Tested]
|-
|'''eSciDoc Item Retrieval (REST)''' || 0,5 sec || 2,2 sec || All eSciDoc services can be used (versioning, statistics, aa etc.) <br/> retrieve Operation is faster (approx. half a second per item) || Slow <br/> Extra release, pid assignment etc. is necessary || -- || [http://jira.mpdl.mpg.de/browse/FACESBUG-431 Tested]
|-
|'''eSciDoc IngestHandler''' || -- || 0,4 sec || -- || No PID assigned <br/> User needs special role: ingester <br/> Items do not seem to be indexed: '''blocker!''' || -- || [http://jira.mpdl.mpg.de/browse/FACESBUG-436 Tested]
|-
|'''eSciDoc ContentRelation''' || -- || -- || -- || CR is not under version control || Cannot be updated once released (the documentation says the public-status of a CR must not be "released"). Thus, CRs are '''not feasible''' for this purpose || [http://jira.mpdl.mpg.de/browse/FACESBUG-434 Tested]
|-
|'''eSciDoc Item with 1000+ components''' <br/> All metadata of a collection are stored within one item || ?? || 0.9 sec || faster ingest compared to single item ingest || Retrieval times for item with 1000 components: > 33 sec <br/> Initial filesize: 0,6MB (will increase with each version) <br/> '''Failed to ingest''' an item with 10000 components <br/> Initial file size: > 5MB || -- || Tested
|-
|'''eSciDoc as archive, MD in Triple Store''' || updating 1000 items = 3503ms, == 3,5ms/item <br/> updating 100000 items = 21073ms, == 210µs/item || ingesting 100000 items (= 1,2 million triples) = 81452ms, == 814µs/item || Very fast || synchronization issues <br/> Possibly redundant data <br/> aa has to be implemented || How do we perform status updates? (eSciDoc has to know the status, not only the triple store) <br/> This alternative may be acceptable in a decoupled scenario, e.g. ingest/updates are done directly on the triple store and stored with a delay in the eSciDoc core; in this case, AA must be taken seriously as well <br/> see also [[Image:Batch_metadata_update.pptx]] || [http://jira.mpdl.mpg.de/browse/FACESBUG-435 Tested] '''Decided and agreed''' <br/> see also [[MD_Store|MD Store implementation]]
|-
|'''eSciDoc Core Performance Tuning''' || -- || -- || All solutions could profit from this || Development has to be done together with FIZ, so that we do not develop our own eSciDoc which we have to adapt with every FW release <br/> Development process can be very long <br/> Code seems to be complex to understand || Would FIZ be willing to provide development resources to perform this task? || Discarded
|-
|'''No eSciDoc''' || -- || -- || Can be much faster || Services cannot be reused <br/> High development effort || What to use as storage? Fedora, DB? || Discarded
|}

(*) Only update operation

(**) Whole process from create to release, including any necessary retrieves, pid assignment, submit etc.
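
To put the per-item ingest times into perspective, the sketch below projects them onto a collection of 100,000 items. It assumes strictly sequential processing and linear scaling, neither of which was measured here, so the results are rough orders of magnitude only. The function name and dictionary are illustrative.

```python
def projected_ingest_seconds(seconds_per_item: float, item_count: int) -> float:
    """Total ingest time under the assumption of sequential, linear scaling."""
    return seconds_per_item * item_count


# Per-item ingest times taken from the table (in seconds).
SECONDS_PER_ITEM = {
    "eSciDoc Item Retrieval (SOAP)": 2.65,
    "eSciDoc Item Retrieval (REST)": 2.2,
    "eSciDoc IngestHandler": 0.4,
    # 81452 ms for 100,000 items, converted to seconds per item
    "eSciDoc as archive, MD in Triple Store": 81.452 / 100_000,
}

ITEM_COUNT = 100_000

for technology, seconds in SECONDS_PER_ITEM.items():
    total = projected_ingest_seconds(seconds, ITEM_COUNT)
    print(f"{technology}: {total:,.0f} s ({total / 3600:.1f} h)")
```

Under these assumptions the item-based options take hours to days, while the triple store stays under two minutes.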