Imeji Performance eSciDoc



This page compares the technology options for implementing imeji. The main criterion is performance, since it determines whether the requirements can be met.

Each technology below is described by: Technology, Time to update one item (*), Time to ingest one item (**), Pro, Con, Open Questions, Status.

Technology: eSciDoc (Enhanced Scientific Documentation) Item Retrieval (SOAP)
Time to update one item (*): 0.6 sec
Time to ingest one item (**): 2.65 sec
Pro: Fast development, as this is already implemented in other solutions; all eSciDoc services can be used (versioning, statistics, AA, etc.)
Con: Very slow; extra steps (release, PID assignment, etc.) are necessary
Open Questions: --
Status: Tested

Technology: eSciDoc Item Retrieval (REST)
Time to update one item (*): 0.5 sec
Time to ingest one item (**): 2.2 sec
Pro: All eSciDoc services can be used (versioning, statistics, AA, etc.); the retrieve operation is faster (approx. half a second per item)
Con: Slow; extra steps (release, PID assignment, etc.) are necessary
Open Questions: --
Status: Tested

Technology: eSciDoc IngestHandler
Time to update one item (*): --
Time to ingest one item (**): 0.4 sec
Pro: --
Con: No PID (persistent identifier) is assigned; the user needs a special role (ingester); items seem not to be indexed: blocker!
Open Questions: --
Status: Tested

Technology: eSciDoc ContentRelation
Time to update one item (*): --
Time to ingest one item (**): --
Pro: --
Con: A ContentRelation (CR) is not under version control; it cannot be updated once it has been released (the documentation says the public-status of a CR must not be "released"). CRs are therefore not feasible for this purpose.
Status: Tested

Technology: eSciDoc item with 1000+ components (all metadata of a collection are stored within one item)
Time to update one item (*): ??
Time to ingest one item (**): 0.9 sec
Pro: Faster ingest compared to single-item ingest
Con: Retrieval time for an item with 1000 components: > 33 sec; initial file size: 0.6 MB (will increase with each version); failed to ingest an item with 10000 components; initial file size: > 5 MB
Open Questions: --
Status: Tested

Technology: eSciDoc as archive, MD (metadata) in triple store
Time to update one item (*): updating 1000 items = 3503 ms (3.5 ms/item); updating 100000 items = 21073 ms (0.21 ms/item)
Time to ingest one item (**): ingesting 100000 items (= 1.2 million triples) = 81452 ms (0.81 ms/item)
Pro: Very fast
Con: Synchronization issues; possibly redundant data; AA (authentication and authorization) has to be implemented
Open Questions: How do we perform status updates? (eSciDoc has to know the status, not only the triple store.) This alternative may be acceptable in a decoupled scenario, e.g. ingests/updates are done directly on the triple store and stored with a delay in the eSciDoc core; in this case AA must be taken seriously as well. See also File:Batch metadata update.pptx
Status: Tested; decided and agreed. See also MD Store implementation

Technology: eSciDoc Core Performance Tuning
Time to update one item (*): --
Time to ingest one item (**): --
Pro: All solutions could profit from this
Con: Development has to be done together with FIZ (Fachinformationszentrum) Karlsruhe, so that we do not develop our own eSciDoc variant that has to be adapted with every framework (FW) release; the development process can be very long; the code seems complex to understand
Open Questions: Would FIZ Karlsruhe be willing to provide development resources to perform this task?
Status: Discarded

Technology: No eSciDoc
Time to update one item (*): --
Time to ingest one item (**): --
Pro: Can be much faster
Con: Services cannot be reused; high development effort
Open Questions: What to use as storage? Fedora (Flexible Extensible Digital Object Repository Architecture), DB?
Status: Discarded

(*) Only update operation

(**) Whole process from create to release, including any necessary retrieves, PID assignment, submit, etc.
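
The per-item figures above can be checked, and roughly compared across technologies, with a few lines of arithmetic. The sketch below is only an illustration: the helper names `per_item_ms` and `batch_hours` are hypothetical, and the projection to 100,000 items assumes that per-item cost stays constant as the batch grows, which the measurements above do not confirm.

```python
def per_item_ms(total_ms: float, items: int) -> float:
    """Average time per item in milliseconds for one batch measurement."""
    if items <= 0:
        raise ValueError("items must be positive")
    return total_ms / items


def batch_hours(seconds_per_item: float, items: int) -> float:
    """Projected batch duration in hours.

    Assumption (not measured): every item costs the same time,
    i.e. the total scales linearly with the number of items.
    """
    return seconds_per_item * items / 3600.0


# Triple-store batch measurements from the table: (label, total ms, items)
TRIPLE_STORE_RUNS = [
    ("update 1000 items", 3503, 1_000),
    ("update 100000 items", 21073, 100_000),
    ("ingest 100000 items", 81452, 100_000),
]

# Per-item ingest times in seconds from the table
INGEST_SECONDS = {"SOAP": 2.65, "REST": 2.2, "IngestHandler": 0.4}

if __name__ == "__main__":
    for label, total_ms, n in TRIPLE_STORE_RUNS:
        print(f"{label}: {per_item_ms(total_ms, n):.3f} ms/item")
    for name, sec in INGEST_SECONDS.items():
        print(f"{name}: {batch_hours(sec, 100_000):.1f} h for 100000 items")
```

Under the linear-scaling assumption, the gap between a per-item eSciDoc ingest and a bulk write to the triple store is one of hours versus seconds for a collection of this size.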