Imeji Performance eSciDoc
Latest revision as of 07:41, 19 August 2013
This page compares the technology options for implementing imeji. To meet the requirements, the performance of each option is of greatest interest.
Technology | Time to update one item * | Time to ingest one item ** | Pro | Con | Open Questions | Status |
---|---|---|---|---|---|---|
eSciDoc Item Retrieval (SOAP) | 0.6 sec | 2.65 sec | Fast development, as already implemented in other solutions<br/>All eSciDoc services can be used (versioning, statistics, AA, etc.) | Very slow<br/>Extra release, PID assignment, etc. are necessary | -- | Tested (http://jira.mpdl.mpg.de/browse/FACESBUG-431) |
eSciDoc Item Retrieval (REST) | 0.5 sec | 2.2 sec | All eSciDoc services can be used (versioning, statistics, AA, etc.)<br/>Retrieve operation is faster (approx. half a second per item) | Slow<br/>Extra release, PID assignment, etc. are necessary | -- | Tested (http://jira.mpdl.mpg.de/browse/FACESBUG-431) |
eSciDoc IngestHandler | -- | 0.4 sec | -- | No PID assigned<br/>User needs special role: ingester<br/>Items seem not to be indexed: blocker! | -- | Tested (http://jira.mpdl.mpg.de/browse/FACESBUG-436) |
eSciDoc ContentRelation | -- | -- | -- | CR is not under version control | Cannot be updated once released (the documentation says the public-status of a CR must not be "released"). Thus, CRs are not feasible for this purpose | Tested (http://jira.mpdl.mpg.de/browse/FACESBUG-434) |
eSciDoc Item with 1000+ components<br/>All metadata of a collection are stored within one item | ?? | 0.9 sec | Faster ingest compared to single-item ingest | Retrieval time for an item with 1000 components: > 33 sec<br/>Initial file size: 0.6 MB (will increase with each version)<br/>Failed to ingest an item with 10000 components<br/>Initial file size: > 5 MB | -- | Tested |
eSciDoc as archive, MD in Triple Store | Updating 1000 items = 3503 ms (≈ 3.5 ms/item)<br/>Updating 100000 items = 21073 ms (≈ 0.21 ms/item, i.e. about 210 µs) | Ingesting 100000 items (= 1.2 million triples) = 81452 ms (≈ 0.81 ms/item, i.e. about 814 µs) | Very fast | Synchronization issues<br/>Possibly redundant data<br/>AA has to be implemented | How do we perform status updates? (eSciDoc has to know the status, not only the triple store)<br/>This alternative may be acceptable in a decoupled scenario, e.g. ingests/updates are done directly on the triple store and stored with a delay in the eSciDoc core; in this case, AA must be taken seriously as well<br/>See also File:Batch metadata update.pptx | Tested (http://jira.mpdl.mpg.de/browse/FACESBUG-435)<br/>Decided and agreed<br/>See also MD Store implementation |
eSciDoc Core Performance Tuning | -- | -- | All solutions could profit from this | Development has to be done together with FIZ, so that we do not develop our own eSciDoc, which we would have to adapt with every FW release<br/>Development process can be very long<br/>Code seems to be complex to understand | Would FIZ be willing to provide development resources to perform this task? | Discarded |
No eSciDoc | -- | -- | Can be much faster | Services cannot be reused<br/>High development effort | What to use as storage? Fedora, DB? | Discarded |
(*) Only update operation
(**) Whole process from create to release, including any necessary retrieves, PID assignment, submit, etc.
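
The per-item figures in the triple store row come from dividing each measured total by the number of items, which works out to fractions of a millisecond per item for the 100000-item runs. A minimal sketch of that arithmetic follows, using the totals from the table. The function and variable names are illustrative only and are not part of imeji or eSciDoc.

```python
def per_item_ms(total_ms: float, items: int) -> float:
    """Average time per item in milliseconds."""
    return total_ms / items


# Totals taken from the triple store row: (total time in ms, number of items).
measurements = {
    "update 1000 items": (3503, 1000),
    "update 100000 items": (21073, 100000),
    "ingest 100000 items": (81452, 100000),
}

# Average per-item time for each measurement, in milliseconds.
averages = {
    label: per_item_ms(total, count)
    for label, (total, count) in measurements.items()
}

for label, avg in averages.items():
    # 1 ms = 1000 µs, so multiply by 1000 to express the average in microseconds.
    print(f"{label}: {avg:.3f} ms/item ({avg * 1000:.0f} µs/item)")
```

The same helper can be applied to the SOAP and REST rows if batch totals are measured for them, which would make the per-item figures directly comparable across rows.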