PubMan Func Spec Search Engine Optimization

From MPDLMediaWiki

This page describes possible ways to improve how PubMan content appears in search engines, especially in Google.

---- Work in progress ----

Current Situation with Google


Example: Search for “Experimental demonstration of a suspended diffractively coupled optical cavity”.

  • The headline is taken from the 'title' tag currently used in PubMan:

Publication Manager :: Experimental demonstration of a suspended ...

  • The snippet contains a confusing mixture of irrelevant information:

Item StatisticsRevisionsRelease HistoryView item. Experimental demonstration of a suspended diffractively coupled optical cavity. Item is Released ...

Google preferably takes the snippet text from the 'description' meta tag (up to about 150 characters). PubMan currently does not set a 'description' tag.
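The following is a minimal sketch of how a 'description' tag could be filled, assuming the source text (for example an abstract or an author/genre summary) is already available as a string. It collapses whitespace, cuts the text to at most 150 characters at a word boundary, and HTML-escapes it; the function names are hypothetical and not part of PubMan.

```python
from html import escape

# Approximate snippet length limit mentioned above.
MAX_DESCRIPTION_LENGTH = 150


def truncate_at_word(text: str, limit: int = MAX_DESCRIPTION_LENGTH) -> str:
    """Collapse whitespace and cut text to at most `limit` characters,
    preferring a word boundary and marking the cut with '...'."""
    text = " ".join(text.split())
    if len(text) <= limit:
        return text
    cut = text[: limit - 3]  # leave room for the '...' marker
    # Back off to the last space so that no word is split, if possible.
    if " " in cut:
        cut = cut[: cut.rfind(" ")]
    return cut.rstrip(" ,;:") + "..."


def description_meta_tag(text: str) -> str:
    """Render an HTML <meta name="description"> element with escaped content."""
    content = escape(truncate_at_word(text), quote=True)
    return '<meta name="description" content="%s">' % content
```

For example, `description_meta_tag('A & B "C"')` yields `<meta name="description" content="A &amp; B &quot;C&quot;">`. Cutting at a word boundary avoids the mid-word "..." seen in the current headline.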

Alternatively, Google uses information from the Open Directory Project; this can be prevented through the 'robots' meta tag (the 'noodp' directive).

Articles indexed by Google Scholar have structured snippets with information about the author(s), the publication year and the full-length title. Additionally, features like "cited by", "related articles" or "all xy versions" become available for those articles.

  • Google Scholar indexes content only if its crawlers are permitted to access the site. So far only scholarly articles are indexed, not books or monographs. Items with access restrictions must have at least an abstract.
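Google Scholar's inclusion guidelines describe Highwire Press-style `citation_*` meta tags for bibliographic data. The sketch below assumes PubMan would emit such tags on its item pages; the `Item` class and its fields are hypothetical. It also encodes the abstract rule above in a simplified form: a restricted item (no public full text) without an abstract gets no tags.

```python
from dataclasses import dataclass
from html import escape
from typing import List, Optional


@dataclass
class Item:
    """Hypothetical, minimal view of a PubMan item (field names are assumptions)."""
    title: str
    authors: List[str]
    year: Optional[int] = None
    journal: Optional[str] = None
    abstract: Optional[str] = None
    public_pdf_url: Optional[str] = None  # None if the full text is restricted


def scholar_meta_tags(item: Item) -> List[str]:
    """Return Highwire Press-style <meta> tags for Google Scholar.

    Returns an empty list for restricted items without an abstract,
    assuming such items should not be offered for indexing.
    """
    if item.public_pdf_url is None and not item.abstract:
        return []
    pairs = [("citation_title", item.title)]
    pairs += [("citation_author", a) for a in item.authors]  # one tag per author
    if item.year is not None:
        pairs.append(("citation_publication_date", str(item.year)))
    if item.journal:
        pairs.append(("citation_journal_title", item.journal))
    if item.public_pdf_url:
        pairs.append(("citation_pdf_url", item.public_pdf_url))
    return ['<meta name="%s" content="%s">' % (name, escape(value, quote=True))
            for name, value in pairs]
```

One tag per author, rather than a single joined string, lets the indexer keep authors separate.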


eDoc content appears with structured information in the snippet, though the information shown is not always the same:

Sometimes Google shows the beginning of the abstract; in other cases, e.g., the full-length title and the authors are displayed.

Future State

  • Headline: 'title'

It is debatable whether "Publication Manager" needs to appear in the title, since the source can easily be identified from the URL, which is always displayed at the bottom of each result.

  • Snippet: Author(s), Genre, Pages, OA-Status.

eDoc Solution

Robots trying to access PubMan are redirected to an index of PubMan content, which offers a predefined set of metadata as plain text for the search engine to index. The search engine uses the information provided in the tags of the index to fill the snippet on its search result page. As a result, robots do not access the live system.
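The page does not specify how robots are recognized. The sketch below assumes detection by the HTTP `User-Agent` header and shows the redirect as a small WSGI middleware. The crawler token list, the `/robot-index/` path and the mapping from item path to index file are all hypothetical.

```python
from typing import Optional

# Assumed substrings of crawler User-Agent headers (e.g. Google's crawler
# identifies itself as "Googlebot"); a real list would need maintenance.
CRAWLER_TOKENS = ("googlebot", "bingbot", "slurp", "baiduspider")

# Hypothetical location of the static robot index, not an actual PubMan URL.
ROBOT_INDEX_BASE = "/robot-index/"


def is_crawler(user_agent: Optional[str]) -> bool:
    ua = (user_agent or "").lower()
    return any(token in ua for token in CRAWLER_TOKENS)


def redirect_target(path: str, user_agent: Optional[str]) -> Optional[str]:
    """Return the index URL a crawler is sent to, or None for normal visitors."""
    if not is_crawler(user_agent):
        return None
    item_id = path.rstrip("/").rsplit("/", 1)[-1]  # assumes the ID ends the path
    return ROBOT_INDEX_BASE + item_id + ".html"


def robot_redirect_middleware(app):
    """Wrap a WSGI app so that crawlers never reach the live system."""
    def wrapped(environ, start_response):
        target = redirect_target(environ.get("PATH_INFO", ""),
                                 environ.get("HTTP_USER_AGENT"))
        if target is None:
            return app(environ, start_response)  # normal visitor: live system
        start_response("302 Found", [("Location", target)])
        return [b""]
    return wrapped
```

Because the decision is made before the wrapped application is called, crawler requests are answered without touching the live system.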

This solution is supposed to work not only for Google but for most search engines.

Metadata provided in the index could be:

  • Author(s): taken from the PubMan creator. For items with more than three authors, the list should be cut after the third author and end with ",...".
  • Genre: taken from PubMan genre.
  • Number of pages: taken from PubMan pages.
  • OA status: The note "Open Access" could be provided if there is at least one file in visibility status "public".
  • Abstract: taken from PubMan abstract.
  • Full length title: taken from PubMan title.
  • Keywords: taken from PubMan Free keywords.
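The metadata list above could be assembled per item roughly as follows. This is a sketch that assumes a simplified item object; `PubManItem`, its fields and the record keys are hypothetical names, not PubMan's data model. It implements the two rules stated in the list: authors are cut after the third with ",...", and "Open Access" is set only if at least one file is "public".

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PubManItem:
    """Hypothetical, simplified view of a PubMan item; field names are assumptions."""
    title: str
    creators: List[str]
    genre: str
    pages: str = ""
    abstract: str = ""
    free_keywords: List[str] = field(default_factory=list)
    file_visibilities: List[str] = field(default_factory=list)  # e.g. "public", "private"


def format_authors(creators: List[str], limit: int = 3) -> str:
    """List at most `limit` authors; longer lists end with ',...'."""
    shown = ", ".join(creators[:limit])
    return shown + ",..." if len(creators) > limit else shown


def index_record(item: PubManItem) -> Dict[str, str]:
    """Collect the proposed metadata for the robot index."""
    record = {
        "authors": format_authors(item.creators),
        "genre": item.genre,
        "pages": item.pages,
        "title": item.title,
        "abstract": item.abstract,
        "keywords": ", ".join(item.free_keywords),
    }
    # "Open Access" only if at least one file is publicly visible.
    if "public" in item.file_visibilities:
        record["oa_status"] = "Open Access"
    return record
```

For four creators A to D, `format_authors` returns `A, B, C,...`.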

To be defined:

  • The robot directory has to be kept up to date at all times.
  • How often the index is regenerated to keep the provided information up to date. The interval between two regenerations should not exceed one week, so that new content can be indexed by Google within an acceptable period of time.
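Since the regeneration cycle is still to be defined, the following is only a sketch of the one-week constraint: a check whether regeneration is due, plus an incremental selection of items changed since the last run. It assumes last-modified timestamps are available per item; the dictionary shape and function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List

# Upper bound from the requirement above: at most one week between regenerations.
MAX_REGENERATION_INTERVAL = timedelta(weeks=1)


def regeneration_due(last_run: datetime, now: datetime,
                     interval: timedelta = MAX_REGENERATION_INTERVAL) -> bool:
    """True once the allowed interval since the last regeneration has elapsed."""
    return now - last_run >= interval


def changed_since(modified: Dict[str, datetime], last_run: datetime) -> List[str]:
    """IDs of items modified after the last regeneration (hypothetical input shape)."""
    return sorted(item_id for item_id, ts in modified.items() if ts > last_run)


if __name__ == "__main__":
    last = datetime(2009, 1, 1, tzinfo=timezone.utc)
    print(regeneration_due(last, last + timedelta(days=7)))
```

Regenerating only changed items would allow a shorter cycle than one week without rebuilding the whole index each time.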

Direct Access to PubMan

Optimizing PubMan for direct indexing by Google should include using the 'description' tag to offer a reasonable set of metadata for the snippet (see metadata list above).

Using Google Webmaster Tools requires registration and therefore agreement to the Google Terms of Service. Once registered, it is possible to submit content as well as sitemaps. Google Scholar uses its own registration form for submissions.
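Sitemaps follow the sitemaps.org protocol: a `urlset` element in the `http://www.sitemaps.org/schemas/sitemap/0.9` namespace with one `url` entry per page. The sketch below builds such a file with Python's standard library, assuming the item URLs and last-modified dates come from PubMan; the example host is hypothetical. It enforces the protocol's limit of 50,000 URLs per sitemap file. `xml_declaration` requires Python 3.8 or later.

```python
import xml.etree.ElementTree as ET
from datetime import date
from typing import Iterable, Tuple

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000  # limit set by the sitemaps.org protocol


def build_sitemap(entries: Iterable[Tuple[str, date]]) -> bytes:
    """Return a UTF-8 sitemap listing each URL with its last-modified date."""
    entries = list(entries)
    if len(entries) > MAX_URLS_PER_SITEMAP:
        raise ValueError("too many URLs: split into several sitemaps")
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc  # ElementTree escapes '&' etc.
        ET.SubElement(url, "lastmod").text = lastmod.isoformat()  # YYYY-MM-DD
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


if __name__ == "__main__":
    data = build_sitemap([("https://pubman.example.org/item/1", date(2009, 5, 1))])
    print(data.decode("utf-8"))
```

The `lastmod` values let crawlers skip unchanged items, which complements a periodic regeneration cycle.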


  • CoLab page about SEO [1]