PolITiGenomics

Politics, Information Technology, and Genomics

Double standard

June 5th, 2009


Since the Biology of Genomes meeting in early May, a tempest has been brewing. It is only in this last week that the tempest has gathered enough strength that it could no longer be contained by those who chose to stir it up. The esteemed Daniel MacArthur blogged and tweeted from the conference. This apparently caught the attention of the conference organizers and of the journalists at GenomeWeb. As journalists, the folks at GenomeWeb are required to follow CSHL’s media rules, which require that journalists get a speaker’s permission before publishing any information from her talk. GenomeWeb saw a double standard when comparing what Daniel was allowed to do with what they were allowed to do, so they contacted CSHL. The initial write-up of the gathering storm in Science Insider characterized this contact as complaining; GenomeWeb characterized it as asking CSHL for clarification of its policy (in a comment on a response Daniel posted on his blog, Genetic Future). Of course, this attempt to, in effect, censor has only served to bring more attention to Daniel’s blog (the so-called Streisand effect). It has resulted in responses from other bloggers like Anthony Fejes and DrugMonkey, comments (some quite passionate) on the Science Insider story and on Daniel’s response, and a couple of well-reasoned pieces by Ed Yong and Andrew Maynard on where the policy should go from here. Daniel himself provides a nice summary of it all in a follow-up post. With all that sound and fury, there is not much to add on the subject other than to say I suppose I am lucky that the 500 or so emails I had to pore through each night after the meeting ended at 10:30 or 11 p.m. prevented me from posting any commentary during the meeting (well, the emails plus the fact that I knew Daniel would do a better job than I would).

Taking a step back, there is a larger double standard at play here than the distinction between professional journalists and peddlers of new media. Many of the conclusions about whether CSHL is right to restrict any type of journalist focus on the type of conference and the expectations that type of conference creates in the minds of the presenters. At a private, invitation-only conference, no publishing. At a breaking-results conference like Biology of Genomes, get permission. At an open conference, anything goes. So then one might ask: why aren’t all conferences open? The whole notion that people will only present something at a conference that has some understanding of respecting others’ unpublished work is a bit ridiculous (this point has been made by others, along with the fact that Biology of Genomes is over-subscribed every year; getting people in the door is not a problem). But I am not even going to debate that point. The more interesting question is: why aren’t all data and research released rapidly and made freely available? Since the Bermuda Principles were agreed to in 1996, all genome sequencing centers have submitted their data, from raw sequence data to finished sequence to assemblies to annotation, to public repositories as quickly after generation as possible. These principles were reinforced by the Fort Lauderdale agreement in 2003, which added a provision that protected the production centers’ right to first publication. But as we have seen recently, that provision of the agreement is not always honored, as Church and Hillier discuss in a recent article. As sequencing has moved into medical applications, the sequencing centers have taken great pains to release human sequence data in a responsible manner, but still rapidly. What’s more, they now also release the detected variants, fully annotated and correlated with phenotypic information, in protected-access databases available to any researcher. As data that require more and more analysis and significant human curation are made rapidly available well before publication, the production centers become ever more vulnerable to getting “scooped” on their hard-won findings.

As Church and Hillier properly conclude in the above-referenced article:

Sequence data are now easier to produce, but decisions about timelines for data release, publication, and ownership and standards for assembly comparison and quality assessment, as well as the tools for managing and displaying these data, need considerable attention in order to best serve the entire community. (Emphasis mine)

This conclusion begets many questions. If the rapid release described in the Bermuda Principles still holds true, why does it only apply to large-scale sequencing centers? Many researchers are generating more sequence in a month than the Human Genome Project was able to produce in a year. As they continue to be allowed to perform pre-publication (as opposed to post-generation) data submission, why are they not being held to the same standard as the large-scale sequencing centers?

Stepping back further, does dumping all of those data, literally terabytes and terabytes, into public nucleotide repositories like the SRA and ERA as soon as they are generated still make sense? Who has the bandwidth to download and use them all? Mainly only the centers that are submitting them. For human data, a single instrument run contains enough data to identify an individual. Should there not be at least some provisions in place to allow data generators to properly assess and quality control their data?

The human reference has been published (and recently updated). The blueprint exists. Thus, many of the reasons underlying the conclusions of the Bermuda Principles are no longer applicable. So should those open access principles be applied more widely to other areas of biology and science at large, or should they no longer apply to sequence data from a genome for which a reference exists? It is time to rethink the current policies and begin to apply them to all sequence generators. And people are doing just that. The double standard must end.

Posted in genomics | 10 Comments




10 Responses to “Double standard”

  1. > If the rapid release described in the Bermuda Principles still holds
    > true, why does it only apply to large-scale sequencing centers? Many
    > researchers are generating more sequence in a month than the Human
    > Genome Project was able to produce in a year. As they continue to be
    > allowed to perform pre-publication (as opposed to post-generation) data
    > submission, why are they not being held to the same standard as the
    > large-scale sequencing centers?

    Because they’re not doing the same thing, nor for the same purpose?

    The Bermuda principles were designed “to apply for all human genomic
    sequence generated by large-scale sequencing centres, funded for the
    public good, in order to prevent such centres establishing a privileged
    position in the exploitation and control of human sequence
    information”.

    The Fort Lauderdale agreement stipulates that the pre-publication rules
    should be extended to “other large-scale production centers”, but still
    only to apply to “community resource projects” defined as those
    “specifically devised and implemented to create a set of data, reagents
    or other material whose primary utility will be as a resource to the
    broad scientific community”.

    Most next-gen sequencing and other medical research projects do not
    fall under this definition, but instead are trying to answer rather
    specific questions with the minimum (well-powered) set of data required
    to do so. On-publication release of data – or even post-generation
    release of, say, the control portion of the data, as was the case in
    WTCCC – may be of more general
    utility, but that is not why the project was funded: there is one key
    analysis to be performed, and that by the group funded to do so.

    However, this is in the funders’ court. If they asked:

    “would you take this grant for your rather specific medical genetics
    project only on condition of pre-release of all data?”

    the answer would probably be a grudging, “well, if you insist” …

  2. “If the rapid release described in the Bermuda Principles still holds true, why does it only apply to large-scale sequencing centers?”

    Because the large centers are contracted to produce sequences and make them publicly available, while small researchers are given grants for particular areas of research.

    A genome center, also, does not care about publications. As long as they provide the sequences they were paid for, in the time they promised, they’ll get more work.

    If an individual researcher pays for sequencing something, pre-releases the sequences, and then gets scooped by another group (a larger one, or one that just happened to have a few people free to crunch the data faster than the researcher’s team could), that researcher’s career is toast. He won’t get funding again.

    If we institute a policy that forces small researchers to pre-release, all we’ll get is that small researchers will stop sequencing and start relying entirely on data produced by large centers.

    The only alternative is to reform the grant system, but how to do that is a whole different – and much larger – can of worms.

  3. Neil,

    I agree that most next-gen sequencing and medical sequencing projects do not fall under the provisions described by Bermuda/Fort Lauderdale, which is exactly the point. Most of what large genome sequencing centers are doing now is not the type of project that was done five to ten years ago, i.e., not the type of project that falls under Bermuda/Fort Lauderdale. The goals and end results of many of the projects are no different than those of single investigators. The only difference is the scale. So why should there continue to be a difference in submission policies?

  4. M.,

    The genome sequencing centers are obligated to do much more than just generate raw data (as described in the post). It is also not accurate to say we are neither concerned with nor evaluated on publications. Nor is my concern whether large-scale sequencing centers continue to “get more work”.

    I do agree that the issue is much larger than I describe in the above post (admittedly due to lack of time). One of the main problems is that enforcement of the Fort Lauderdale agreement has been sporadic at best. It relies on journal editors for enforcement, but they suffer from the same perverse incentives as principal investigators: wanting to scoop one another.

    There needs to be a common set of standards for similar types of projects and they need to be enforced through funding institutions.

  5. You saw this?

    Group Calls for Rapid Release of More Genomics Data
    Science 22 May 2009:
    Vol. 324. no. 5930, pp. 1000 – 1001
    http://www.sciencemag.org/cgi/content/summary/324/5930/1000-b

    I’ve forwarded this URL (and Mike the Mad Biologist’s) to the “wordsmiths” …

    This inaccurate quote sums it up nicely for me:

    F Scott Fitzgerald: The rich are different than you and me.
    Ernest Hemingway: Yes, they have more money.

  6. Neil, yes I had seen that (and actually heard about the meeting from some of its participants). It was one of the jumping off points for the post (although not properly referenced). I have added a link to the story at the end of the post.

  7. [...] a recent post, David Dooling asks why genome centers are forced to release their data early, when other smaller labs with a sequencing machine aren’t. In responding to some of the [...]

  8. Most of what large genome sequencing centers are doing now is not the type of project that was done five to ten years ago, i.e., not the type of project that falls under Bermuda/Fort Lauderdale. The goals and end results of many of the projects are no different than those of single investigators.

    I think this is key, and I think it’s why we’ll see a slow fading of big genome centers in the coming years. More on my blog here:

  9. Chris, we have already seen a reduction in “big genome centers”. The 2003 large-scale sequencing grant funded five genome centers; the 2006 grant funded only three (WU, Broad, Baylor). I do not think there will be a fading. If anything, the large centers will become more important as the rate of change of sequencing technology increases. The large centers lead the adoption of these technologies and develop the expertise which is disseminated throughout the scientific community, which in turn allows these technologies to be widely adopted. Another thing in favor of the large centers is the large infrastructure required to really handle a significant number of these instruments. Finally, there will always be big, bold projects that will be beyond the scope and interest of smaller investigators.

  10. [...] way of ScienceBlogling Daniel MacArthur, I came across this excellent post by David Dooling about, among other things, how different genome centers, based on size, have different release [...]
