[Bioperl-l] PubMed records (was: MeSH terms)

Brian Osborne bosborne11 at verizon.net
Sat Oct 24 15:18:20 EDT 2009


Not sure what "robust" means - would "working" suffice? Also, you  
suggested starting with a Genbank id but what I'm about to show you  
starts with Pubmed ids, at the other end. What I will do is take some  
of this and make a little script for Bioperl's examples/ directory. In  
the meantime, here is some code:

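#!/usr/bin/perl

use strict;
use warnings;
use Bio::Biblio;

# PubMed id of the record to fetch
my $pmid = 52;

# Query NCBI through the E-utilities interface
my $biblio = Bio::Biblio->new(-access => "eutils");

# get_by_id() returns the record as a string of raw XML
my $ref = $biblio->get_by_id($pmid);

print $ref, "\n";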

And what it prints is below.

Brian O.

<?xml version="1.0"?>
<!DOCTYPE PubmedArticleSet PUBLIC "-//NLM//DTD PubMedArticle, 1st  
January 2009//EN" "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/pubmed_090101.dtd 
     <MedlineCitation Owner="NLM" Status="MEDLINE">
         <Article PubModel="Print">
                 <ISSN IssnType="Print">0006-2960</ISSN>
                 <JournalIssue CitedMedium="Print">
             <ArticleTitle>Evidence of the involvement of a 50S  
ribosomal protein in several active sites.</ArticleTitle>
                 <AbstractText>The functional role of the Bacillus  
stearothermophilus 50S ribosomal protein B-L3 (probably homologous to  
the Escherichia coli protein L2) was examined by chemical  
modification. The complex [B-L3-23S RNA] was photooxidized in the  
presence of rose bengal and the modified protein incorporated by  
reconstitution into 50S ribosomal subunits containing all other  
unmodified components. Particles containing photooxidized B-L3 are  
defective in several functional assays, including (1) poly(U)-directed  
poly(Phe) synthesis, (2) peptidyltransferase activity, (3) ability to  
associate with a [30S-poly(U)-Phe-tRNA] complex, and (4) binding of  
elongation factor G and GTP. The rates of loss of the partial  
functional activities during photooxidation of B-L3 indicate that at  
least two independent inactivating events are occurring, a faster one,  
involving oxidation of one or more histidine residues, affecting  
peptidyltransferase and subunit association activities and a slower  
one affecting EF-G binding. Therefore the protein B-L3 has one or more  
histidine residues which are essential for peptidyltransferase and  
subunit association, and another residue which is essential for EF-G- 
GTP binding. B-L3 may be the ribosomal peptidyltransferase protein, or  
a part of the active site, and may contribute functional groups to the  
other active sites as well.</AbstractText>
             <AuthorList CompleteYN="Y">
                 <Author ValidYN="Y">
                     <ForeName>S R</ForeName>
                 <PublicationType>Journal Article</PublicationType>
                 <PublicationType>Research Support, U.S. Gov't,  
             <Country>UNITED STATES</Country>
                 <NameOfSubstance>Macromolecular Substances</ 
                 <NameOfSubstance>Ribosomal Proteins</NameOfSubstance>
                 <DescriptorName MajorTopicYN="N">Bacillus  
                 <QualifierName MajorTopicYN="Y">metabolism</ 
                 <DescriptorName MajorTopicYN="N">Binding Sites</ 
                 <DescriptorName MajorTopicYN="N">Hydrogen-Ion  
                 <DescriptorName MajorTopicYN="N">Kinetics</ 
                 <DescriptorName MajorTopicYN="N">Macromolecular  
                 <DescriptorName MajorTopicYN="N">Oxidation-Reduction</ 
                 <DescriptorName MajorTopicYN="N">Photochemistry</ 
                 <DescriptorName MajorTopicYN="N">Protein Binding</ 
                 <DescriptorName MajorTopicYN="N">Ribosomal Proteins</ 
                 <QualifierName MajorTopicYN="Y">metabolism</ 
                 <DescriptorName MajorTopicYN="N">Ribosomes</ 
                 <QualifierName MajorTopicYN="N">metabolism</ 
             <PubMedPubDate PubStatus="pubmed">
             <PubMedPubDate PubStatus="medline">
             <PubMedPubDate PubStatus="entrez">
             <ArticleId IdType="pubmed">52</ArticleId>
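
If it's the MeSH headings you're after rather than the whole record, the XML can be picked apart with any XML parser. Here is a minimal sketch, assuming XML::LibXML is installed. The sample XML is a hand-trimmed, hypothetical stand-in for a real record, and mesh_headings() is just a name made up for illustration, not part of Bioperl.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical, trimmed-down stand-in for the XML returned by get_by_id();
# a real record has many more elements.
my $xml = <<'XML';
<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation Owner="NLM" Status="MEDLINE">
      <PMID>52</PMID>
      <MeshHeadingList>
        <MeshHeading>
          <DescriptorName MajorTopicYN="N">Binding Sites</DescriptorName>
        </MeshHeading>
        <MeshHeading>
          <DescriptorName MajorTopicYN="N">Ribosomal Proteins</DescriptorName>
          <QualifierName MajorTopicYN="Y">metabolism</QualifierName>
        </MeshHeading>
      </MeshHeadingList>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>
XML

# Return one string per MeSH heading, as "Descriptor" or "Descriptor/qualifier".
sub mesh_headings {
    my ($xml_string) = @_;

    # Parse the string; do not try to fetch the DTD named in a DOCTYPE.
    my $doc = XML::LibXML->load_xml(
        string       => $xml_string,
        load_ext_dtd => 0,
        no_network   => 1,
    );

    my @headings;
    for my $heading ($doc->findnodes('//MeshHeading')) {
        my $descriptor = $heading->findvalue('DescriptorName');
        my @qualifiers = map { $_->textContent }
                         $heading->findnodes('QualifierName');
        if (@qualifiers) {
            push @headings, map { $descriptor . '/' . $_ } @qualifiers;
        }
        else {
            push @headings, $descriptor;
        }
    }
    return @headings;
}

print "$_\n" for mesh_headings($xml);
```

With a real record you would pass $ref from the script above instead of the sample string.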

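The "purge the list of duplicates" step in the workflow you describe (quoted below) needs nothing from Bioperl. A minimal sketch in plain Perl follows; the ids are made up, and unique_pmids() is a hypothetical helper name, not an existing function.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Merge any number of PMID lists, keeping the first occurrence of each id
# and preserving the order in which ids were first seen.
sub unique_pmids {
    my @lists = @_;    # each argument is an array reference
    my %seen;
    return grep { !$seen{$_}++ } map { @$_ } @lists;
}

# Hypothetical ids, standing in for the results of two separate lookups.
my @direct  = (52, 1001, 1002);
my @related = (1002, 52, 2003);

my @pmids = unique_pmids(\@direct, \@related);
print join(',', @pmids), "\n";
```

The merged list can then be fed one id at a time to get_by_id().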

On Oct 24, 2009, at 2:45 PM, Robert Bradbury wrote:

> I'm not sure if this is related to the MeSH question or not, but
> I've googled the documentation several times and never managed to find
> "robust" examples for how to manipulate PubMed records.
> It would seem that there ought to be code lying around which does:
>  Given Genbank ID,
>     Fetch all Pubmed records from that ID
>         Fetch all related records (via NCBI's "related" record IDs)
>     Purge the list of duplicates, then do things like fetch all of the
> abstracts or fetch all of the MeSH headings, etc. for all of those  
> records.
> Another example would include fetching all records of relatedness  
> (i.e. a
> PubMed tree of depth N (or cloud of some max N)).
> I think that one can use NCBI's fetch interface to do this (one  
> could do it
> by having NCBI email you all of the PubMed results and have an email
> harvester collect those results, parse them and setup a new set of
> queries).  Of course this seems like an overhead intensive way to do  
> this.
> Given that increasing amounts of information are becoming open to
> the public, one could even consider parsing the published papers and
> supplemental files (e.g. XLS tables) for genes of interest (as it
> seems the authors of most work, as well as the PubMed record
> processors, fail to provide or research the gene name information
> that is supposed to be in the PubMed records).
> Now it may simply be that I lack sufficient experience with the
> BioPerl documentation and am unaware of the functions/tools which do
> this type of thing.  So if anyone has any hints/pointers they would be
> appreciated.
> Thanks,
> Robert Bradbury
