PolITiGenomics

Politics, Information Technology, and Genomics

Lightning strike

April 21st, 2010

A previous cloud post, Puff piece, has gotten a bit of attention from and . While the Informatics Iron piece was positive, Mr. Stowe took issue with some of the points I made. First, he says that my claim that IT expertise and software engineering are needed to get things running on the cloud is inaccurate:

You are implying that to get running in the cloud, an end user must worry about the “IT expertise” and “software engineering” needed to get applications up and running. I believe this is a straw-man, an incorrect assertion to begin with.

One of the major benefits of virtualized infrastructure and service oriented architectures is that they are repeatable and decouple the knowledge of building the service from the users consuming it. This means that one person, who creates the virtual machine images or the server code running the service, does need the expertise to get an application running properly in the cloud. But after that engineering is done once, a whole community of end-users of that service can benefit without knowledge of the specifics of getting the application to scale.

For example, does everyone that uses GMail/Yahoo/Hotmail know every line of software code to make it run? Do they know every operational aspect of how to make mail scale to tens of thousands of processors across many data centers?

Definitely not, and the point is they don’t have to. The same is true for high performance and high throughput computing. To give examples of free services that don’t require end user software engineering or IT expertise to do bioinformatics/proteomics/etc.:

  • The NIH Website for BLAST has, for years, been running BLAST as a service so that researchers can use GUIs to run queries on parallel back-end infrastructure (see http://www.ncbi.nlm.nih.gov/genome/seq/BlastGen/BlastGen.cgi?taxid=9606) This requires no complicated knowledge or software engineering for scientists to run BLAST as a Service.
  • Tools like ViPDAC have 2-minute tutorial videos to run proteomics on Amazon Web Service.

His argument is absolutely correct when dealing with established systems, applications, and work flows. For use cases like email and running BLAST, there is no need for additional software engineering or IT expertise (other than getting on the internet). In fact, The Genome Center has long offered a for anyone to use. Further, over the past few weeks, several prepackaged bioinformatics work flows that run on the cloud (or some approximation thereof) have been announced: Mr. Stowe’s company Cycle Computing announced CycleCloud for Life Sciences, , from Bio-Team, ChIP-seq and RNA-seq analysis pipelines from DNAnexus, the work flows available in Galaxy, and of course the previously published .

Unfortunately, canned analyses are not the norm in bioinformatics. Bioinformaticians love to tinker, trying to get just a little more biological information out of their data sets. The result is that bioinformatics applications and work flows are constantly being tweaked, updated, and improved. Because of this, maintenance of these pipelines is a huge burden. The supporters of these generic pipelines must work constantly to update and verify software, or their users will be left waiting for the latest fix to be applied or the latest feature to be available (anyone who installs each new version of velvet can attest to this).

The saving grace in all of this is that as the use of sequencing becomes more widespread, the percentage of people doing the analysis who qualify as bioinformaticians will decrease (greatly). This means that a larger and larger percentage of people with sequence data to analyze will likely not be interested in tweaking analysis pipelines but will just want to run something and get an answer. It is this ever-growing group of people that will greatly benefit from easy-to-use analysis tools, whether they are deployed on the cloud or not. Both Mr. Stowe and I agree that creating easy-to-use tools for non-bioinformaticians is a very worthwhile goal.

Unfortunately, the proliferation of existing tool options (e.g., maq, bwa, bowtie, bfast, soap, novoalign, etc.), now layered with a proliferation of cloud offerings, will make it even more difficult for non-experts to choose which pipeline is the best to use. Therefore, approaches like those taken by Cycle Computing and GenomeQuest, which provide default analysis pipelines and the ability for bioinformaticians to create and share their own work flows, are the most likely to be successful. The development of these generic, distributed analysis frameworks that also provide useful defaults is an even more worthwhile goal because it achieves two important ends: ease of use for non-experts and the ability for bioinformaticians to tinker. Bioinformaticians are more likely to find tools like these useful and therefore will be early adopters: they will choose the best platforms, establish best practices on these platforms, and publish results using these platforms, and then the non-experts will follow.

Mr. Stowe’s other objection related to my point that no process scales linearly with the number of cores. He concedes that point but notes:

In fact, regardless of whether the job is linearly scalable, most companies and research institutions don’t have 1 cluster to 1 user scenarios. There are multiple users with multiple jobs each. What if you have 10 crossbow users with 10 runs to do on various genomes? Then you can get 100x performance on the *workflow as a whole*.

Again, this is true, but, to be fair, that is not the same point he made in his original article. His original point was that if you needed your analysis to run faster you could just provision more nodes. I just pointed out that this is true, but you would likely pay a premium for that because nothing scales linearly. It may seem like a fine distinction, but with all the misinformation around clouds nowadays, it’s an important one to make. It should also be noted that without good software engineering and system administration, even algorithms that should scale nearly linearly might not. The take-home message is that if someone has done that software engineering and systems administration work to make a program scale well and run well in a cloud environment and made it available to you, great. If not, someone is going to have to do it.
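To put rough numbers on the "premium" for provisioning more nodes, here is a minimal sketch using Amdahl's law. This is a textbook model I am swapping in for illustration, not something taken from Mr. Stowe's article or measured on any real pipeline. The 95% parallel fraction and the assumption that cost is simply proportional to node-hours are both hypothetical.

```python
def amdahl_speedup(parallel_fraction: float, nodes: int) -> float:
    """Speedup of a single job on `nodes` nodes under Amdahl's law.

    `parallel_fraction` is the share of the single-node runtime that can
    be spread across nodes; the rest is assumed to be strictly serial.
    """
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / nodes)


def relative_cost(parallel_fraction: float, nodes: int) -> float:
    """Node-hours consumed relative to running the job on one node.

    Assumes (hypothetically) that you are billed per node-hour, so the
    cost is nodes * runtime, i.e. nodes / speedup in relative terms.
    """
    return nodes / amdahl_speedup(parallel_fraction, nodes)


if __name__ == "__main__":
    # Hypothetical job whose runtime is 95% parallelizable.
    fraction = 0.95
    for nodes in (1, 2, 4, 8, 16, 32):
        speedup = amdahl_speedup(fraction, nodes)
        cost = relative_cost(fraction, nodes)
        print(f"{nodes:>2} nodes: {speedup:5.2f}x faster, {cost:4.2f}x the node-hours")
```

Under these assumptions a single job always costs more node-hours as nodes are added, which is the premium. Mr. Stowe's multi-user case is different: many independent jobs, each on its own node, do not share a serial portion, so total throughput can grow roughly with the number of nodes.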

I had the opportunity to meet Mr. Stowe at the XGen Congress and have talked more with him this week at Bio-IT World Conference and Expo (my talk is tomorrow at 11 a.m. EDT in Track 3: Bioinformatics and Next-Gen Data). We had a good discussion about cloud computing and its role in bioinformatics (they’ve got a cool solution to the Amazon storage problem). As you can hopefully tell from this post, we are largely in agreement: engineering is needed, but once it is done, everyone benefits. Cycle Computing certainly has a lot of good expertise in the cloud, so if you need some engineering done, shoot him an email. Unfortunately, they probably will not be able to help you access the .


Permanent campaign

April 21st, 2010

NPR has a series this week about the current level of distrust Americans have of government. The latest installment, Americans Distrust Congress? That’s No Surprise, ties some of the low opinion of Congress to the highly partisan rhetoric. What the story hints at, but does not state explicitly, is that partisan rhetoric is a positive feedback loop. Those who pay attention to politics, and therefore fund campaigns and watch political shows, tend to be more partisan. This encourages politicians and political pundits to be more partisan (to increase their base/donations and viewership, respectively). This in turn gives credence to and reinforces the more partisan views of their constituents and viewers. Unfortunately, the article fails to mention the fact that all of this partisan rhetoric is mere theater, a means to get elected and re-elected in perpetuity. It is a means to distract the general public from the fact that partisanship only exists on the fringes of the political debate. Both parties are happy to do the bidding of the same lobbyists on issues that actually matter.


Breast cancer quartet

April 15th, 2010

Today Nature published our recent research comparing a basal-like breast cancer to normal DNA, a subsequent brain metastasis, and a xenograft derived from the primary tumor (a xenograft is the implantation of a portion of the tumor biopsy into the fatty tissue of an immunodeficient mouse), along with a commentary by Joe Gray. Surprisingly, while the xenograft was derived from the primary tumor, its mutational profile had many characteristics similar to that of the brain metastasis. This finding indicates that mutations required for successful transplantation of a tumor into a mouse are perhaps similar to those required for the formation of metastasis (and provides insights into why some tumor types are not prone to metastasis and fail to grow xenografts). The article also represents the first publication of the complete sequence of an African-American female.

You can find news coverage of the article and its findings at , BusinessWeek, News24, , and the WU Record (includes video).


Twitterpated

April 12th, 2010

Just wanted to let everyone know that is now on and . So become a fan and listen to our tweets (immediately after typing that sentence I got the overwhelming feeling that I am an old dork who uses hipster jargon to try to sound cool).


Misdirection

April 9th, 2010

I know it’s two posts in a row, but here is another clip from The Daily Show worth checking out.

I would encourage you to check out Media Matters to get some sense of the breadth and depth of deception in the popular media on this and other issues.


Race to the bottom

April 7th, 2010

Yesterday on NPR there was a story reporting new details on the assassination of Salvadoran Archbishop Oscar Romero. Of course, according to one member of the Texas school board, no one knows who Oscar Romero is, so I suppose there is no need to report on things like this.

And posting that gives me an excuse to post Jon Stewart’s impression of Glenn Beck from a few weeks ago (it’s 13 minutes, but well worth watching).


Sing like a bird

April 1st, 2010

zebra finch

The reference genome and several companion papers are being published today in Nature and Genome Research. The zebra finch is a model for human vocal development and it is hoped that a better understanding of its genome and the genes involved in learning song can shed light on how humans learn language. These learnings may be helpful to researchers studying diseases that slow language development such as autism. of led the consortium to sequence the reference genome. Coverage of the publications can be found at the BBC, ABC, , NPR/, The Independent, , , and WU Record (includes a video).


Profiling aromatase inhibitor response in breast cancer

March 31st, 2010

GenomeWeb’s magazine recently posted a story about ‘s efforts, led by , to determine . The hope is that we can find certain patterns of mutations that associate with therapy response (or non-response). Then, genetic tests can be developed that probe these mutations and can be used to predict whether patients will respond to the therapy. Those who are predicted to respond will receive aromatase inhibitor therapy; those who are predicted not to respond will receive some other course of treatment. In other words, the goal is to further refine personalized medicine in breast cancer treatment. As the article states, we are going to sequence the tumor and normal genomes of 50 patients, 25 responders and 25 non-responders (we have already completed the sequencing of over 40 patients).


Ides of March

March 15th, 2010

During the 30 minutes or so between boarding a plane and when you are free to use approved electronic devices, you can get a little reading done. On my last couple of trips I had a stack of somewhat dated Newsweek magazines to pore through. Fortunately, there were several good articles, which I now pass along for your consideration (which I type on my computer during the period I am free to use approved electronic devices with the wireless features disabled).

First, a couple of articles on the topic that has been on everyone’s mind now that the Republicans are no longer “responsible” for the growing debt: Fareed Zakaria’s Defusing the Debt Bomb talks about several concrete measures that can be taken to reduce the debt, and The Real Greek Tragedy talks about why it is important to do that. Bringing a dose of reality to the debt issues is We the Problem, which talks about why the US Congress will not enact any of the needed changes (the author only gets it half right by blaming the people; lobbyists are part of the equation too).

Shifting topics to the “partisan gridlock” in Washington DC, Ezra Klein’s Stay Out Of It, Mr. President discusses how the mere act of the President, any President, supporting some legislative agenda tees it up for the opposition party to, well, oppose it. This opposition occurs even when there is not much substantive difference between the two parties’ stances on the issue or when significant proposals of the opposition party have been included in the bill (giving credence to Mr. Klein’s thesis is the fact that Republicans no longer support their proposals from the 1993 health care debate that are in the current bill). The actual distance between Republicans and Democrats on issues is discussed in How the GOP Sees It.

Finally, Google’s Orwell Moment discusses Google’s flubbed rollout of Google Buzz and . I like the Newsweek article because it actually uses Orwell’s name in an appropriate reference to 1984. Most references to 1984 use terms like “Big Brother” in a pejorative way, e.g., “another example of Big Brother watching you.” But what is most powerful about 1984 is that people saw the hyper-surveilling, truth-manipulating government not as an intrusive presence in their lives, but as a comforting one. The vast majority of people saw the government as something that brought benefits (peace and stability) and were more than happy to trade some small, meaningless rights for these benefits. What rights are you willing to trade for the benefits of social networking?


What the Crisis Nursery does

March 11th, 2010

Below is a nice interview with DiAnne Mueller, CEO of the St. Louis Crisis Nursery, talking about what the Crisis Nursery does and how you can help.

