Archive for June, 2007

Searching for people and authors

The problem with people is that they don’t have unique names. To take an extreme case, Yahoo reported a couple of weeks ago that the Chinese authorities were considering a move to try to end the confusion caused by the fact that more than a billion people are now sharing just 100 surnames, and 93 million have the family name Wang.

More pertinently for publishing, the problem of author disambiguation has long been an issue for searching bibliographic databases such as PubMed/Medline. There is a lot of work being done on automatic ways to disambiguate author names, such as using affiliations, email addresses, subjects and co-authorships.
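As a rough illustration of how such signals might be combined, here is a minimal sketch in Python. It is not taken from any real disambiguation system: the `AuthorRecord` fields, the weights and the scoring function are all assumptions made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class AuthorRecord:
    """One author mention on one paper (all fields are hypothetical)."""
    name: str
    affiliation: str = ""
    email: str = ""
    coauthors: frozenset = field(default_factory=frozenset)
    subjects: frozenset = field(default_factory=frozenset)


def jaccard(a, b):
    """Overlap of two sets as a fraction of their union (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def same_author_score(x, y):
    """Score from 0 to 1; higher means the two mentions more likely share an author.

    The weights below are arbitrary illustrative choices.
    """
    if x.name.lower() != y.name.lower():
        return 0.0
    score = 0.0
    if x.email and x.email.lower() == y.email.lower():
        score += 0.4
    if x.affiliation and x.affiliation.lower() == y.affiliation.lower():
        score += 0.2
    score += 0.25 * jaccard(x.coauthors, y.coauthors)
    score += 0.15 * jaccard(x.subjects, y.subjects)
    return score
```

Two records with the same name but nothing else in common score zero, which is the "93 million Wangs" case: the name alone says very little.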

However, a more “Web 2.0” way to do this has been suggested in the WikiAuthors proposal. The idea here is that a copy of the database (in this case, Medline initially) would be placed on a wiki, and the authors and their colleagues – that is, the scientific community at large – would do the necessary work.

At present the WikiAuthors proposal appears stalled, pending the development of other WikiMedia projects (e.g. WikiProteins).

I was struck by some similarities with Spock, the current hot new search engine. Spock (currently in private beta, but there’s a good overview on Read/WriteWeb) focuses on people search, that is, it treats all search terms as a request to find matching people. Thus searching for “President of the United States” returns George W Bush and Bill Clinton as its first two hits.

The similarity with the WikiAuthors proposal is that Spock will allow users to add tags (in addition to automatically generated tags).

Spock will, however, be much more semantically rich than is proposed in WikiAuthors. Tags will include name, gender, age, occupation and location, and others. The really interesting bit comes from the “relationship” tag, which will link people together. Thus Spock can offer links to related people – in the case of Bill Clinton, this might be Chelsea Clinton (daughter), Bush (successor), Hillary (wife), Gore (VP). This will be a powerful tool if it works as promised.

Looking back the other way, I wondered if it would be useful to tag relationships between authors in a bibliographic database, for instance co-authors, co-workers, student/supervisor, etc. This could give a whole new way of exploring links in the literature beyond the current way of using citations.
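To make the idea of tagged author relationships a little more concrete, here is a minimal sketch in Python. The `RelationshipGraph` class and its methods are hypothetical names invented for this example. For simplicity every link is treated as symmetric, which is a simplification for a directional relationship like student/supervisor.

```python
from collections import defaultdict


class RelationshipGraph:
    """Undirected graph of people with labels on each link (hypothetical model)."""

    def __init__(self):
        # person -> {other person: set of relationship labels}
        self._links = defaultdict(dict)

    def tag(self, a, b, label):
        """Record that a and b are related by `label`, e.g. 'co-author'."""
        self._links[a].setdefault(b, set()).add(label)
        self._links[b].setdefault(a, set()).add(label)

    def related(self, person, label=None):
        """People linked to `person`, optionally filtered by relationship label."""
        links = self._links.get(person, {})
        return sorted(p for p, labels in links.items()
                      if label is None or label in labels)
```

Browsing would then mean starting from one author and following links by label, rather than following citations from one paper to the next.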

uBioRSS – Track the latest research by taxon or species

uBioRSS looks a very interesting development. From Matthew Cockerill’s post on the BioMedCentral blog:

uBioRSS is a nifty service from the MBLWHOI Library at Woods Hole, which harvests bibliographic information about new articles from publishers’ RSS feeds, and then passes them through the uBio taxonomic classification system which identifies any species that are mentioned in the article, and classifies the article appropriately.

This makes it possible to browse the literature taxonomically, so that, for example you might view a list of all the latest articles on cetaceans far more easily than can be done using plain text search.

uBioRSS is a great example of the way in which semantic enrichment can add value to the literature

Of course, it’s not new for third parties to add tagging to content, for example to improve the search experience (product names in Google Product Search, place names in MetaCarta, etc.), but this is a nice example of what can easily be done with STM content. I’m sure this sort of thing will become increasingly common.
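The harvest-then-classify idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tiny `NAME_TO_TAXON` lookup is hand-made for the example, whereas a real service would consult a large taxonomic name index, and the function names are hypothetical.

```python
import re

# Hypothetical, hand-made lookup from species names to a higher taxon.
# A real service would consult a large taxonomic name index instead.
NAME_TO_TAXON = {
    "tursiops truncatus": "Cetacea",
    "orcinus orca": "Cetacea",
    "drosophila melanogaster": "Diptera",
}


def classify(item_text):
    """Return the sorted list of taxa whose species names appear in the text."""
    text = re.sub(r"\s+", " ", item_text.lower())
    return sorted({taxon for name, taxon in NAME_TO_TAXON.items() if name in text})


def browse_by_taxon(items):
    """Group feed item titles under every taxon detected in them."""
    index = {}
    for title in items:
        for taxon in classify(title):
            index.setdefault(taxon, []).append(title)
    return index
```

Looking up `"Cetacea"` in the resulting index gives the cetacean articles directly, without the reader having to guess which species names to type into a text search.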

Online advertising primer

I was a bit surprised to see that, in his presentation “Online Advertising in Scholarly Journals: the Opportunities, Risks, and Rewards” to the STM Spring Conference in April, Richard Newman of the American Medical Association felt it necessary to explain how Google’s AdSense and AdWords programmes work. But on reflection I realised that he was probably right, because many STM publishers have yet to engage with online advertising in any substantial way.

Anyway, if you’re looking for a primer on online advertising, covering the different kinds of advertising, the size and growth of the market, challenges and innovation, current and future trends, and lots of links, I can recommend this recent post on the MediaShift blog: Your Guide to Online Advertising

From time to time, I’ll give an overview of one broad MediaShift topic, annotated with online resources and plenty of tips. The idea is to help you understand the topic, learn the jargon, and take action. I’ve already covered blogging, citizen journalism, presidential campaign videos and various other topics. This week I’ll look at online advertising.

Automated Content Access Protocol (ACAP) Conference

Today sees the first major conference on the Automated Content Access Protocol (ACAP).

ACAP is potentially very important for online (i.e., all) publishers. What is it? This is the description from the conference website:

a standard by which the owners of content published on the World Wide Web can provide permissions information (relating to access and use of their content) in a form that can be recognised and interpreted by a search engine “spider”, so that the search engine operator is enabled systematically to comply with the permissions granted by the owner. ACAP will allow publishers, broadcasters and any other to express their individual access and use policies in a language that search engine’s robot “spiders” can be taught to understand.

(For more information, see also the Wikipedia page; the official ACAP website)

ACAP was conceived in January 2006 and born some 9 months later at the Frankfurt Book Fair. As of today, it is mid-way through a first pilot project that is intended to design v1.0 of ACAP. The pilot is due to finish in October 2007, with a final conference scheduled for 29 October in London.

The ACAP partners include leading publishers (STM and others), media and news organisations (including WAN, the World Association of Newspapers) and the British Library. But right now none of the three major search engines are formal members (though they have participated informally), and clearly ACAP is never going to work without their active endorsement and participation.

So what’s in it for the search engines? On the one hand, they stand to lose access to content, or be barred from certain kinds of re-use they currently enjoy (particularly in news). But on the other hand there are potential gains, with publishers being able to make certain kinds of currently restricted content (books, databases) available to search engines if they feel they have more control over re-use (and potential to monetise that use). There are certainly potentially huge gains for end-users here.

But one problem for an early adoption of ACAP is that it (at least partly) addresses an area of current tension between content owners and the search engines: copyright. For example, publishers and Google have engaged in legal battles over Google’s interpretation of copyright law in relation to book scanning (e.g. The Authors Guild of America and Association of American Publishers have separately sued Google) and news aggregation (e.g. Agence France-Presse). So it wouldn’t be surprising if the search engines decided to play their cards close to their chests at this point.

So what’s new at the London conference? In the opening keynote (pdf), WAN’s Gavin O’Reilly says of the Big Three search engines’ non-membership:

So however perplexing I find the fact that the big three still aren’t full participants in ACAP, it is – for me, probably the sole and minor disappointment among a long and continuing litany of successes and triumphs and I welcome the self-evident operational involvement that we continue to see from some of them.

Francis Cave’s project report (pdf) appears to show that the pilot is roughly on track, with the Use Cases defined and the technology options for the Technical Framework researched and specified. Defining the Use Cases was clearly an important milestone, as they lie at the heart of what ACAP is about and are potentially quite complex. For instance Mark Maddocks’ presentation Why ACAP? Reed Elsevier Perspective (pdf) gives these examples:

  • Specify permitted use of indexed content (e.g. Limit number of displayed words in the search result; or Require direct link back to publisher site)
  • Exclude certain SE services from using indexed content (e.g. Allow for inclusion in main Google index but not Google Scholar)
  • Exclude specific parts of the page from indexing and/or display (e.g. Paragraphs, images, figures, or tables)
  • Exclude from the index copies of the page not found at the specified URL
  • First Click Free – site is indexed, but provides limited content to the user (e.g. Crawlers are allowed to index pages; A search results page can present search results from these pages; People can link to the page from the search results page but not onward link from that destination page. Instead they are redirected to a registration or other page)
  • Registration – site is indexed & free, but have to register for access (e.g. Crawlers are allowed to index the pages bypassing registration, the pages are flagged as Registration so that the crawler can explain this to the user if they choose to; Users clicking on a link are asked to register before seeing the content)
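To give a feel for what a crawler would have to do with such permissions, here is a minimal sketch in Python covering the first two use cases. It does not use ACAP syntax, which was still being designed at the time of writing. The `UsagePolicy` class, its fields and the service names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class UsagePolicy:
    """Hypothetical publisher policy for one page (not real ACAP syntax)."""
    allow_index: bool = True
    max_snippet_words: Optional[int] = None   # None means no limit
    excluded_services: set = field(default_factory=set)
    require_link_back: bool = False


def may_use(policy, service):
    """True if the named search service may use the indexed page."""
    return policy.allow_index and service not in policy.excluded_services


def build_snippet(policy, text, service, url):
    """Return the snippet a compliant service might display, or None if barred."""
    if not may_use(policy, service):
        return None
    words = text.split()
    if policy.max_snippet_words is not None:
        words = words[:policy.max_snippet_words]
    snippet = " ".join(words)
    if policy.require_link_back:
        snippet += f" [{url}]"
    return snippet
```

Even this toy version shows why the use cases are "potentially quite complex": the search engine has to check the policy at display time for each of its services, not just once at crawl time.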

Update/correction on Nature Precedings statistics

Oops. In my note about the launch of Nature Precedings last week, I said incorrectly there were 64 submissions on the launch date and gave the breakdown by subject category.

I made a rather elementary error – my numbers assumed that each submission was in only one subject category, whereas of course many have multiple categories.

Looking at the site a week later, there appear to be 43 submissions, split roughly 40:60 between manuscripts (17) and poster/presentations (26).

Bioinformatics is still by far the most popular single category, though.
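The counting error described above is easy to reproduce. The sketch below uses a few made-up submissions (the identifiers and category assignments are hypothetical, not real Nature Precedings data) to show how summing per-category counts overstates the total.

```python
# Hypothetical submissions: each one may be filed under several categories.
submissions = {
    "doc-1": {"Bioinformatics", "Evolution & Ecology"},
    "doc-2": {"Bioinformatics"},
    "doc-3": {"Biotechnology", "Bioinformatics", "Molecular Cell Biology"},
}


def per_category_counts(subs):
    """Number of submissions listed under each category."""
    counts = {}
    for categories in subs.values():
        for category in categories:
            counts[category] = counts.get(category, 0) + 1
    return counts


def naive_total(subs):
    """Sum of category counts: overcounts multi-category submissions."""
    return sum(per_category_counts(subs).values())


def true_total(subs):
    """Each submission counted once, however many categories it has."""
    return len(subs)
```

With this toy data the naive total is twice the true one. The corrected split is also easy to check by hand: 17 of 43 is about 40%, and 26 of 43 about 60%.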

Mother Goose & the scientific peer review process

Since we have been covering peer review developments recently (e.g. here) we couldn’t resist posting a link to this (an oldie but goodie): Mother Goose & the scientific peer review process (from the Science Creative Quarterly). Extract:

Hey diddle diddle, the cat and the fiddle.
The cow jumped over the moon.
The little dog laughed, to see such a sight.
And the dish ran away with the spoon.

The reviewers felt that not enough data was presented to support your claims. For example – how many times did your group observe the cow jumping over the moon? From the text and supporting figures, it would appear that you base this conclusion on one data point as no calculations regarding standard deviations were presented. As an analytical journal of high repute, the reviewers felt that this is simply not acceptable. In addition, several of the reviewers felt that the word ‘diddle’ was inappropriate, and should have been replaced by the more scientifically correct, ‘Hey fornicate fornicate.’ Because of these, and other problems, we are sorry to inform you that your manuscript has not been accepted for publication.

Web 2.0 for higher education

A couple of reports/blog entries caught my eye recently in the area of Web 2.0 and education.

A new JISC-funded report Web 2.0 for Content for Learning and Teaching in Higher Education by Tom Franklin and Mark van Harmelen (who blogged its release here) was published at the end of last month. It offers recommendations to JISC on how to respond to the opportunities and challenges of Web 2.0. Overall they recommend that:

Recommendation 1: Guidelines should not be so prescriptive as to stifle the experimentation that is needed with Web 2.0 and learning and teaching that is necessary to take full advantage of the possibilities offered by this new technology.

From a publisher’s perspective, these recommendations could be important:

Recommendation 2: JISC should consider funding projects investigating how institutional repositories can be made more accessible for learning and teaching through the use of Web 2.0 technologies, including tagging, folksonomies and social software.

Recommendation 6: JISC should consider funding a study to look at how repositories can be used to provide end-user (i.e. referrer) archiving services for material that is referenced in academic published material, including Internet journal papers. Part of this consideration should extend to copyright issues.

Recommendation 3: JISC should consider funding work looking at the legal aspects of ownership and IPR, including responsibility for infringements in terms of IPR, with the aim of developing good practice guides to support open creation and re-use of material.

Other blog coverage: see Brian Kelly (UKOLN) on UK Web Focus.

Coincidentally, the Read/WriteWeb blog today published a round-up of some of its recent coverage of Web 2.0 in e-learning, titled e-learning 2.0: All You Need To Know. This gives a whistle-stop tour at 30,000 feet with a lot of useful links, of which I found these particularly interesting:

e-learning 2.0 – how Web technologies are shaping education

Elgg – social network software for education

Elgg is an open source social platform that:

provides each user with their own weblog, file repository (with podcasting capabilities), an online profile and an RSS reader. Additionally, all of a user’s content can be tagged with keywords – so they can connect with other users with similar interests and create their own personal learning network. However, where Elgg differs from a regular weblog or a commercial social network (such as MySpace) is the degree of control each user is given over who can access their content. Each profile item, blog post, or uploaded file can be assigned its own access restrictions – from fully public, to only readable by a particular group or individual.

Elgg is being used at a number of universities including Brighton, Leeds and MIT. Coincidentally, there is a detailed case study of the Elgg implementation at Brighton in the Franklin/van Harmelen report discussed above.
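The per-item access control described in the extract above can be sketched as follows. This is a hypothetical Python model written for illustration only. It is not Elgg’s actual code or API, and the `Item` and `Site` names are assumptions.

```python
from dataclasses import dataclass, field

PUBLIC = "public"


@dataclass
class Item:
    """A blog post, file, or profile field with its own access setting."""
    owner: str
    content: str
    access: str = PUBLIC   # PUBLIC, a group name, or a single user name


@dataclass
class Site:
    """Holds group membership: group name -> set of user names."""
    groups: dict = field(default_factory=dict)

    def can_read(self, user, item):
        """Owner always reads; otherwise check public, individual, then group."""
        if user == item.owner or item.access == PUBLIC:
            return True
        if item.access == user:
            return True
        return user in self.groups.get(item.access, set())
```

The point of the design is that the access decision is made per item rather than per site or per blog, which is what distinguishes it from an all-or-nothing privacy setting.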

Nature Precedings goes live

I noted the launch announcement of Nature Precedings a few days ago.

The site has now gone live with a total of 64 submissions. Timo Hannay’s announcement is here and the press release here.

The site is very nicely implemented, with all the Web 2.0 features we have come to expect from Nature Publishing Group, including tagging (documents and people), voting for articles, and open discussion on articles.

Perhaps not surprisingly, by far the largest single subject category is Bioinformatics, with 20 documents. Biotechnology, Evolution & Ecology, and Molecular Cell Biology each have 9, while other topics have only a couple of documents. It will be interesting to see whether this distribution just reflects the make-up of the beta testers, or whether it will hold as the site attracts more submissions.

The list of partners has also been released: the British Library, the European Bioinformatics Institute, Science Commons, and the Wellcome Trust. Nature hopes the stature of the partners will allay fears about Nature’s plans for possible future control of the content.

The site differs from some of the earlier preprint sites (like arXiv in physics) in that it accepts PowerPoint presentations as well as journal article preprints, e.g. this interesting presentation: Open Notebook Science Using Blogs and Wikis.

UPDATE/CORRECTION: There’s a mistake in these figures, see here for the correction

UKSG publishes final report on Usage Factor

The UK Serials Group has now published the final report on the feasibility study into developing and implementing journal Usage Factors (UFs).

The summary says that “based on these results it appears that it would not only be feasible to develop a meaningful journal Usage Factor, but that there is broad support for its implementation.” Some of the key findings:

  • the majority of publishers are supportive of the UF concept, appear to be willing, in principle, to participate in the calculation and publication of UFs, and are prepared to see their journals ranked according to UF
  • there is not a significant difference between authors in different areas of academic research on the validity of journal Impact Factors as a measure of quality
  • UF, were it available, would be a highly ranked factor by librarians, not only in the evaluation of journals for potential purchase, but also in the evaluation of journals for retention or cancellation

The idea for a Usage Factor derives partly from dissatisfaction with the Impact Factor as a measure of the quality or usefulness of a journal. The Usage Factor is not without its critics, though; see for instance this posting by Springer’s Director of Open Access, Jan Velterop:

… as a measurement of value? It’s an unholy idea, potentially compounding the misery of improper use of the impact factor.

Peer review developments

An article in the first issue of Open Medicine, a new open access journal, Peer review in open access scientific journals by Dr Falagas (via Journalogy), discusses developments in peer review from an open access perspective:

Open access publications should be at the forefront in experimenting with strategies to foster what might be called an increasingly open science. As the open access movement blossoms, its supporters should continue to critically evaluate the parallel development of openness and transparency in the peer review process.

…while all manner of electronic journals are experimenting with reader input on published material, little is known about the scientific value of post-publication review in the modern era of open access publishing

Peer review is a surprisingly active area for discussion and experimentation, given that it has been the standard approach to selecting material for publication in scholarly journals for about 300 years. (For instance the American Medical Association runs a four-yearly International Congress on Peer Review and Biomedical Publication.) There are two reasons for this:

  • dissatisfaction with the present system: it has been described as unreliable, unfair, unstandardised, untested, open to bias, failing to validate or authenticate, stifling innovation, perpetuating the status quo, rewarding the prominent, expensive and slow
  • because we can: new online publishing and social networking technologies make it easier to test new ideas

Some examples of new approaches to peer review:

  1. An early online experiment with open reviewing took place at the (now defunct) journal Electronic Transactions on Artificial Intelligence. The peer review process worked as follows. All submitted articles within scope were immediately posted on the Web for a 90-day discussion period. At the end of this “review” period, authors were given the option to revise; the revised article was then sent out for “pass-fail” review. If it passed, the article was published.
  2. Nature’s open peer review trial: authors were invited to have their submitted manuscripts placed on an open website where anyone could review and comment on them. About 5% of authors agreed to participate, and the displayed papers got a healthy level of interest and traffic, but the trial was unsuccessful because the quantity (and quality) of the comments received was low.
  3. In PLoS ONE, the new OA “journal of everything” from PLoS, articles are assessed by a member of the editorial board purely for technical correctness (roughly, answering the question, is this science?). Once accepted, papers are made available for community-based open peer review involving online annotation, discussion, and rating.
  4. At Biology Direct (a BioMedCentral journal), their novel system of peer review “will include making the author responsible for obtaining reviewers’ reports, via the journal’s Editorial Board; making the peer review process open rather than anonymous; and publishing the reviewers’ reports along with the articles, thus increasing both the responsibility and the reward of the referees and eliminating sources of abuse in the refereeing process”
