PubMan Web Syndication Feeds

From MPDLMediaWiki
Revision as of 09:48, 23 October 2009 by Makarenko

Motivation

Web feeds allow software programs to check for updates published on a web site. To provide a web feed, a site owner may use specialized software (such as a content management system) that publishes a list (or "feed") of recent articles or content in a standardized, machine-readable format. The feed can then be downloaded by service providers that syndicate content from the feed, or by feed reader programs that allow Internet users to subscribe to feeds and view their content.

In the PubMan context, a range of syndication feeds is reasonable; see Candidates for syndications on PubMan.

Widespread formats

Currently there are two main families of web syndication feed (WSF) formats: RSS and Atom.

  • RSS can be divided into two sub-branches: RSS 1.* and RSS 2.*. See here for features. Today, most feed readers and syndication tools support both branches.
  • Atom is a relatively new web syndication feed format with many advantages. It has several backward-compatible dialects.

Distribution: As of August 2008, the syndic8.com website indexed 546,069 feeds in total, of which 86,496 were some dialect of Atom and 438,102 were some dialect of RSS (see the feed summary). The following usage distribution of the RSS branches is taken from the Peachpit report of January 2007:

RSS version                  Usage
RSS 0.91 (RSS 2.* branch)    13%
RSS 1.0 (RSS 1.* branch)     17%
RSS 2.0 (RSS 2.* branch)     67%

Conclusion: It makes sense to implement RSS 2.0 and Atom first. Atom is a good candidate for implementation because of Google's support and its currently increasing usage. A later release may add support for RSS 1.* if users explicitly request it.
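The three format families discussed above can be told apart by the root element of the feed document. The following is a minimal sketch in Python (standard library only), not part of PubMan; the function name detect_feed_format is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET

# Namespaces defined by the Atom 1.0 and RDF specifications.
ATOM_NS = "http://www.w3.org/2005/Atom"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def detect_feed_format(xml_text):
    """Guess the feed family from the root element (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    if root.tag == "rss":
        # RSS 0.9x and 2.0 (the RSS 2.* branch) share the <rss> root element.
        return "RSS " + root.get("version", "unknown")
    if root.tag == "{%s}feed" % ATOM_NS:
        return "Atom 1.0"
    if root.tag == "{%s}RDF" % RDF_NS:
        # RSS 1.0 (the RSS 1.* branch) is an RDF vocabulary.
        return "RSS 1.0"
    return "unknown"
```

A consumer could use such a check to pick the right parser before reading the entries.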

Usage

The Web Syndication Feed (WSF) interface of PubMan can be used

  • directly by users in browsers (Firefox, Internet Explorer, Opera, etc.) that already have built-in support for managing WSFs
  • for the automated generation of institute web sites. See Feeding local webpages for more details.
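As an illustration of the second use case, the sketch below turns an RSS 2.0 feed into an HTML list that could be embedded in an institute page. It is a minimal Python example using only the standard library; the sample feed content, the example.org links and the function name render_feed_as_html are assumptions for illustration, not actual PubMan output. In practice the feed text would be downloaded from one of the feed URLs.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical sample data standing in for a downloaded RSS 2.0 feed.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Recent releases (sample data)</title>
    <link>http://pubman.mpdl.mpg.de/</link>
    <description>Hypothetical sample feed</description>
    <item>
      <title>First sample publication</title>
      <link>http://example.org/item/1</link>
    </item>
    <item>
      <title>Second &amp; last sample</title>
      <link>http://example.org/item/2</link>
    </item>
  </channel>
</rss>"""


def render_feed_as_html(feed_xml, limit=10):
    """Render the first `limit` items of an RSS 2.0 feed as an HTML list."""
    channel = ET.fromstring(feed_xml).find("channel")
    rows = []
    for item in channel.findall("item")[:limit]:
        title = item.findtext("title", default="(untitled)")
        link = item.findtext("link", default="#")
        # Escape both values so feed content cannot break the page markup.
        rows.append('<li><a href="%s">%s</a></li>' % (escape(link), escape(title)))
    return "<ul>\n" + "\n".join(rows) + "\n</ul>"


if __name__ == "__main__":
    print(render_feed_as_html(SAMPLE_FEED))
```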


Candidates for syndications on PubMan

The WSFs can be divided into three groups according to their visibility in PubMan:

1. Public views

Implemented

  • recent releases in repository (item versions)
    • Interface location: Home page
    • <link rel="alternate" ...>:
http://pubman.mpdl.mpg.de/syndication/feed/rss20/releases
  • recent releases for a specific Organizational Unit (item versions)
    • Interface location: Page of the Organizational Search Results
    • <link rel="alternate" ...>:
http://pubman.mpdl.mpg.de/syndication/feed/rss20/releases/affiliation/escidoc:persistent3
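Feed readers find these feeds through the <link rel="alternate" ...> elements in the page head. The sketch below shows how a client might discover them, using only the Python standard library. The sample page, including the type and title attributes of its link element, is an assumption for illustration: the attributes PubMan actually emits are not specified here.

```python
from html.parser import HTMLParser

# Hypothetical page head; the type and title attribute values are assumed.
SAMPLE_PAGE = """<html><head>
<title>PubMan</title>
<link rel="stylesheet" type="text/css" href="style.css">
<link rel="alternate" type="application/rss+xml" title="Recent releases"
      href="http://pubman.mpdl.mpg.de/syndication/feed/rss20/releases" />
</head><body></body></html>"""


class FeedLinkFinder(HTMLParser):
    """Collect (type, href) pairs of feed auto-discovery links."""

    FEED_TYPES = {"application/rss+xml", "application/atom+xml"}

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rels = (attributes.get("rel") or "").lower().split()
        mime = (attributes.get("type") or "").lower()
        if "alternate" in rels and mime in self.FEED_TYPES:
            self.feeds.append((mime, attributes.get("href")))


def discover_feeds(html_text):
    """Return all feed links announced in the given HTML text."""
    finder = FeedLinkFinder()
    finder.feed(html_text)
    return finder.feeds
```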

Further development if needed

  • recent changes for a specific publication
    • Interface location: Any View Item page
    • <link rel="alternate" ...>:
http://pubman.mpdl.mpg.de/syndication/feed/rss20/changes/item/escidoc:28123

2. Session dependent views

  • each advanced search (a session-based variant cannot be implemented for the moment because the advanced search history scenario is not yet specified for PubMan)
  • The search feed is implemented without session handling: a CQL query against the framework indexes is used instead. Web presentation is here. URL syntax:
http://pubman.mpdl.mpg.de/syndication/feed/<feedType>/search?q=<CQL query>
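Because a CQL query usually contains characters such as spaces, quotes and "=", it has to be percent-encoded before it is placed in the q parameter. A minimal Python sketch follows; the function name search_feed_url and the index name title in the usage example are assumptions for illustration, and the real framework index names may differ.

```python
from urllib.parse import quote

# Base URL taken from the URL syntax above.
BASE_URL = "http://pubman.mpdl.mpg.de/syndication/feed"


def search_feed_url(feed_type, cql_query):
    """Build a search feed URL with a percent-encoded CQL query."""
    return "%s/%s/search?q=%s" % (
        BASE_URL,
        quote(feed_type, safe=""),
        quote(cql_query, safe=""),
    )


if __name__ == "__main__":
    # "title" is a hypothetical index name, not a documented framework index.
    print(search_feed_url("rss20", 'title="open access"'))
```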

3. Authorization dependent views

  • Workspaces:
    • Latest submissions
    • Latest changes

In the first stage of the implementation, we could concentrate on the syndication of the public views.

Implementation

  • The new service, SyndicationManager, could be located in common_services so that it is accessible to all solutions (PubMan, Faces, ViRR). The SyndicationManager can use structuredexportmanager for PubItemListXML->FeedXML transformations, treating the WSF formats (e.g. RSS20, ATOM) as new export formats.
  • structuredexportmanager should be redesigned to be able to export aggregated information like: name of the feed, its description, date of last change, etc.

--Makarenko 09:39, 6 April 2009 (UTC):

    • The current version of the SyndicationManager calls the Search&Export interface directly to retrieve the item list
    • Search&Export delivers eSciDoc XML for the item list
  • The ROME project provides a set of open source Java tools which can be used for processing and generating well-formed WSFs.
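To make the item list to feed transformation more concrete, the sketch below builds an RSS 2.0 document from a list of items plus the aggregated information mentioned above (name of the feed, description, date of last change). It is written in Python with the standard library only and does not use ROME or the eSciDoc XML format. The function name build_rss20 and the item keys (title, link, id, released) are assumptions for illustration, not the actual PubMan metadata mapping.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def build_rss20(title, link, description, last_change, items):
    """Serialize feed-level information and items as an RSS 2.0 string."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # Aggregated, feed-level information.
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    # RSS 2.0 dates use the RFC 822 style produced by format_datetime.
    ET.SubElement(channel, "lastBuildDate").text = format_datetime(last_change)
    for entry in items:
        node = ET.SubElement(channel, "item")
        ET.SubElement(node, "title").text = entry["title"]
        ET.SubElement(node, "link").text = entry["link"]
        # The identifier is not a URL, so it is marked as a non-permalink.
        ET.SubElement(node, "guid", isPermaLink="false").text = entry["id"]
        ET.SubElement(node, "pubDate").text = format_datetime(entry["released"])
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    stamp = datetime(2009, 4, 6, 9, 39, tzinfo=timezone.utc)
    sample_items = [{
        "title": "Sample publication",          # hypothetical item data
        "link": "http://example.org/item/1",
        "id": "escidoc:28123",
        "released": stamp,
    }]
    print(build_rss20("Recent releases", "http://pubman.mpdl.mpg.de/",
                      "Sample feed", stamp, sample_items))
```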

Outcome of the SyndicationManager design and architecture meeting

  • no EJB interface will be implemented
  • SyndicationManager will consist of a single presentation module (syndication_presentation)
  • The transformation to RSS/Atom (ROME) will be done in the new Transformation Service (in R4, the transformation will be encapsulated in its own class in the SyndicationManager)
  • The feed definition (configuration XML) will be held in the SyndicationManager
  • The feed definition should be extendable by:
    • definition of alternative search/data services
    • definition of the formats of the input/output of these services
  • SyndicationManager will use SearchAndExport instead of the Search module (in R4, Search will still be used and only publication items will be used for syndication)
  • Therefore, SearchAndExport will be extended by a pure "eSciDoc XML" output format
  • Tom will test/explore caching in a proxy.

Required ToDos:

  • Mapping PubMan MD -> RSS/Atom
  • Design of the SyndicationManager component
  • Revise the user interface to allow auto-discovery of feeds (<link rel="alternate" [...]>) on the corresponding web pages
  • Identification of additional candidates for syndication

see comment regarding naming on Talk:PubMan_Web_Syndication_Feeds

Further reading and related pages

RSS 2.0 Standard

JIRA Tasks: AS-377, AS-586

Atom Wikipedia

RSS Wikipedia

ROME project

Media RSS Module (mrss), ROME module for mrss.

See Design in EA: /Desing Model/Use Case Realization/SyndicationManager

What Is RSS by Mark Pilgrim

w3c feed validator