[Bioperl-l] One more load_seqdatabase.pl question

gang wu gwu at molbio.mgh.harvard.edu
Fri Dec 1 10:19:42 EST 2006


Thanks Hilmar. I did include the -lookup switch on the command line. The 
warning messages say that an "INSERT" failed, not an "UPDATE", which 
suggests that no match was found, even though I was just loading the 
same Genbank file for the second time. To test whether it actually 
updated the records, I made a minor modification to one of the COMMENT 
features. Unfortunately it was not updated. By the way, the test 
Genbank file has four "COMMENT" features, but they are all different. 
Any idea what's happening there?
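
To make the symptom concrete, here is a minimal sketch in Python with an 
in-memory SQLite database. It is only an analogy: it is not the 
bioperl-db code and not the Oracle schema. The table and column names 
(entry_comment, bioentry_id, comment_text, comment_rank) are made up for 
illustration, and the unique key on (bioentry_id, comment_rank) is my 
assumption about what a constraint like XAK1COMMENT enforces. If the 
loader inserts comments without first finding the existing rows, the 
second load fails on every rank and the edited text never reaches the 
table, which matches what I am seeing.

```python
import sqlite3


def load_comments(conn, bioentry_id, comments):
    """Blindly INSERT (rank, text) pairs; return the ranks whose INSERT failed."""
    failed = []
    for rank, text in comments:
        try:
            conn.execute(
                "INSERT INTO entry_comment (bioentry_id, comment_text, comment_rank) "
                "VALUES (?, ?, ?)",
                (bioentry_id, text, rank),
            )
        except sqlite3.IntegrityError:
            # Unique key on (bioentry_id, comment_rank) rejects the duplicate.
            failed.append(rank)
    return failed


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entry_comment ("
    " bioentry_id INTEGER NOT NULL,"
    " comment_text TEXT NOT NULL,"
    " comment_rank INTEGER NOT NULL,"
    " UNIQUE (bioentry_id, comment_rank))"
)

# First load of a hypothetical entry with two comments.
original = [(1, "reannotated via Ensembl"), (2, "ids are maintained between versions")]
first_load = load_comments(conn, 389109, original)

# Second load of the same entry, with comment 1 edited.
edited = [(1, "reannotated via Ensembl (edited)"), (2, "ids are maintained between versions")]
second_load = load_comments(conn, 389109, edited)

stored = conn.execute(
    "SELECT comment_text FROM entry_comment WHERE comment_rank = 1"
).fetchone()[0]
print(first_load, second_load, stored)
```

In this toy version the first load succeeds, the second load reports a 
failure for both ranks, and the stored text for rank 1 is still the 
original one.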

I wonder if it's a bad idea to "UPDATE" a sequence. Say I got a new 
sequence version with 5 features removed, 5 features modified and 5 
new features. If only --lookup is included, then according to the POD 
the 5 new features will be inserted, the 5 modified features will be 
updated, and the 5 removed features will stay in the database 
untouched. This makes the stored sequence record a mixture of the old 
and new versions. I do not see why anyone would want a sequence like 
this. Either include -remove to replace the old version if only one 
version is needed, or put the new version under a different namespace 
if multiple versions are needed. Is my understanding of these issues 
correct?
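
The scenario above can be sketched as a toy model in Python. This only 
illustrates my reading of the POD, not what bioperl-db actually does 
internally; the function names merge_update and replace_entry and the 
feature keys (r1..r5, m1..m5, n1..n5) are hypothetical.

```python
def merge_update(stored, incoming):
    """Lookup-and-update as I read it: insert new features, overwrite
    modified ones, and leave features missing from `incoming` untouched."""
    result = dict(stored)
    result.update(incoming)
    return result


def replace_entry(stored, incoming):
    """Remove-then-insert: the old entry is dropped, only `incoming` remains."""
    return dict(incoming)


# Old version: 5 features that will be removed, 5 that will be modified.
old = {f"r{i}": "old" for i in range(1, 6)}
old.update({f"m{i}": "old" for i in range(1, 6)})

# New version: the 5 modified features plus 5 brand-new ones.
new = {f"m{i}": "new" for i in range(1, 6)}
new.update({f"n{i}": "new" for i in range(1, 6)})

merged = merge_update(old, new)
replaced = replace_entry(old, new)

# Features still stored after the merge although the new version dropped them.
stale = sorted(key for key in merged if key not in new)
print(len(merged), stale, replaced == new)
```

Under these assumptions the merged record holds 15 features, 5 of which 
(r1 to r5) exist only in the old version, while the replaced record is 
identical to the new version.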

I deeply appreciate your help.

Gang


Hilmar Lapp wrote:
> Right. You need to tell it to lookup sequences first if you know that 
> you are loading sequences which may be in the database already (see 
> the POD of load_seqdatabase.pl, switch --lookup; there are several 
> other command line options that control what will happen if a sequence 
> entry is already present in the database.).
>
> The messages in your report are warnings, not errors. It looks like 
> some of the comments are duplicated for a sequence; it doesn't look 
> like a reason for concern. It would be more worrying if errors were 
> thrown.
>
>     -hilmar
>
> On Nov 30, 2006, at 5:08 PM, gang wu wrote:
>
>> Thanks Hilmar. Do you mean the NVL() clause will make 
>> load_seqdatabase.pl not work when updating?
>>
>> I have a problem with updating. It seems load_seqdatabase.pl only 
>> tries to insert instead of update. I used one of the test genbank 
>> files that come with bioperl-db. Please take a look at the attached 
>> output.
>>
>> Thanks.
>>
>> Gang
>>
>> =========================================
>> >perl load_seqdatabase.pl -lookup -host elegans -driver Oracle 
>> -dbname sparc -dbuser biosqldb-sgowner -dbpass PASS -format genbank 
>> -namespace test 
>> /root/.cpan/build/bioperl-db-1.5.2-RC3/scripts/biosql/data/AP000868.gb
>> Loading 
>> /root/.cpan/build/bioperl-db-1.5.2-RC3/scripts/biosql/data/AP000868.gb 
>> ...
>>
>> -------------------- WARNING ---------------------
>> MSG: insert in Bio::DB::BioSQL::CommentAdaptor (driver) failed, 
>> values were ("This sequence was reannotated via the Ensembl system. 
>> Please visit the Ensembl web site, http://www.ensembl.org/ for more 
>> information. ","1") FKs (389109)
>> ORA-00001: unique constraint (BIOSQLDB_SGOWNER.XAK1COMMENT) violated 
>> (DBD ERROR: OCIStmtExecute)
>> ---------------------------------------------------
>>
>> -------------------- WARNING ---------------------
>> MSG: insert in Bio::DB::BioSQL::CommentAdaptor (driver) failed, 
>> values were ("The /gene indicates a unique id for a gene, /cds a 
>> unique id for a translation and a /exon a unique id for an exon. 
>> These ids are maintained wherever possible between versions. For more 
>> information on how to interpret the feature table, please visit 
>> http://www.ensembl.org/Docs/embl.html. ","2") FKs (389109)
>> ORA-00001: unique constraint (BIOSQLDB_SGOWNER.XAK1COMMENT) violated 
>> (DBD ERROR: OCIStmtExecute)
>> ---------------------------------------------------
>> ...
>> ...
>> ==========================================================
>> Hilmar Lapp wrote:
>>> These are the protein translations stored in the feature table as 
>>> tags of features, right? You can change the type of the column 
>>> (although there may be some issues when you update the column 
>>> because the NVL() clause won't work if I recall that correctly), but 
>>> doing so will deprive you of any 'normal' searches against that 
>>> column. (You can still use functions from the DBMS_LOB package, but 
>>> they will be much slower and are completely non-standard.) It is up 
>>> to you whether that is too big of a price to pay for having some 
>>> redundant protein translations (translating the feature's DNA 
>>> sequence should give you the same) in the database. I always trimmed 
>>> those feature tags off (using a custom SeqProcessor). An alternative 
>>> is to convert these feature tags into actual bioentries (i.e., 
>>> Bio::Seq objects; again, a custom SeqProcessor will allow you to do 
>>> that).
>>>
>>>     -hilmar
>>>
>>> On Nov 28, 2006, at 4:13 PM, gang wu wrote:
>>>
>>>> Hi everyone,
>>>>
>>>> I'm using load_seqdatabase.pl to upload some Genbank genome 
>>>> sequences to my Oracle BioSQL database. I saw some errors (see the 
>>>> attached warning message) related to seqfeature_qualifier_value 
>>>> (SG_SEQFEATURE_QUALIFIER_ASSOC.VALUE column), which has the 
>>>> VARCHAR2 data type with a maximum of 4000 bytes. Has anybody 
>>>> mentioned this issue before? Should I just change the column to a 
>>>> type able to store more data, such as LONG or CLOB?
>>>>
>>>> Thanks.
>>>>
>>>> Gang
>>>>
>>>> Log information:
>>>> ============================================
>>>> load_seqdatabase.pl -host elegans -driver Oracle -dbname sparc 
>>>> -dbuser biosqldb-sgowner -dbpass PASS -format genbank -namespace 
>>>> genbank /genomeseq/arabidopsis//NC_003070.gbk
>>>> Loading /genomeseq/arabidopsis//NC_003070.gbk ...
>>>>
>>>> -------------------- WARNING ---------------------
>>>> MSG: SimpleValueAdaptor::add_assoc: unexpected failure of statement 
>>>> execution: ORA-01461: can bind a LONG value only for insert into a 
>>>> LONG column (DBD ERROR: error possibly near <*> indicator at char 12 
>>>> in 'INSERT INTO <*>seqfeature_qualifier_value (fea_oid, trm_oid, 
>>>> value, rank) VALUES (:p1, :p2, :p3, :p4)')
>>>> name: INSERT ASSOC [2] 
>>>> Bio::SeqFeature::Generic;Bio::Annotation::SimpleValue
>>>> values: FK[Bio::SeqFeature::Generic]:14898, 
>>>> FK[Bio::Annotation::SimpleValue]:800, 
>>>> value:"MVAVTGEVLHLLRRYLGEYVHGLSTEALRISVWKGDVVLKDLKLKAEALNSLKLPVAVKSGFV 
>>>> GTITLKVPWKSLGKEPVIVLIDRVFVLAYPAPDDRTLKFFTLVGTEFAYTNYIPGGRQGKASRNQASADR 
>>>> GTSYFWLMELHGYEAETATLEARAKSKLGSPPQGNSWLGSIIATIIGNLKVSISNVHIRYEDSTRDSSEI 
>>>> LASFFSYFNNICSSNPGHPFAAGITLAKLAAVTMDEEGNETFDTSGALDKLRKSLQLERLALYHDSNSFP 
>>>> WEIEKQWDNITPEEWIEMFEDGIKEQTEHKIKSKWALNRHYLLSPINGSLKYHRLGNQERNNPEIPFERA 
>>>> SVILNDVNVTITEEQYHDWIKLVEVVSRYKTYIEISHLRPMVPVSEAPRLWWRFAAQASLQQKRLWYTRY 
>>>> IQLYANFLQQSSDVNYPEMREIEKDLDSKVILLWRLLAHAKVESVKSKEAAEQRKLKKGGWFSFNWRTEA 
>>>> EDDPEVDSVAGGSKLMEERLTKDEWKAINKLLSHQPDEEMNLYSGKDMQNMTHFLVTVSIGQGAARIVDI 
>>>> NQTEVLCGRFEQLDVTTKFRHRSTQCDVSLRFYGLSAPEGSLAQSVSSERKTNALMASFVNAPIGENIDW 
>>>> RLSATISPCHATIWTESYDRVLEFVKRSNAVSPTVALETAAVLQMKLEEVTRRAQEQLQIVLEEQSRFAL 
>>>> DIDIDAPKVRIPLRASGSSKCSSHFLLDFGNFTLTTMDTRSEEQRQNLYSRFCISGRDIAAFFTDCGSDN 
>>>> QGCSLVMEDFTNQPILSPILEKADNVYSLIDRCGMAVIVDQIKVPHPSYPSTRISIQVPNIGVHFSPTRY 
>>>> MRIMQLFDILYGAMKTYSQAPVDHMPDGIQPWSPTDLASDARILVWKGIGNSVATWQSCRLVLSGLYLYT 
>>>> FESEKSLDYQRYLCMAGRQVFEVPPANIGGSPYCLAVGVRGTDLKKALESSSTWIIEFQGEEKAAWLRGL 
>>>> VQATYQASA
>>>> PLSGDVLGQTSDGDGDFHEPQTRNMKAADLVITGALVETKLYLYGKIKNECDEQVEEVLLLKVLASGGKV 
>>>> HLISSESGLTVRTKLHSLKIKDELQQQQSGSAQYLAYSVLKNEDIQESLGTCDSFDKEMPVGHADDEDAY 
>>>> TDALPEFLSPTEPGTPDMDMIQCSMMMDSDEHVGLEDTEGGFHEKDTSQGKSLCDEVFYEVQGGEFSDFV 
>>>> SVVFLTRSSSSHDYNGIDTQMSIRMSKLEFFCSRPTVVALIGFGFDLSTASYIENDKDANTLVPEKSDSE 
>>>> KETNDESGRIEGLLGYGKDRVVFYLNMNVDNVTVFLNKEDGSQLAMFVQERFVLDIKVHPSSLSVEGTLG 
>>>> NFKLCDKSLDSGNCWSWLCDIRDPGVESLIKFKFSSYSAGDDDYEGYDYSLSGKLSAVRIVFLYRFVQEV 
>>>> TAYFMGLATPHSEEVIKLVDKVGGFEWLIQKDEMDGATAVKLDLSLDTPIIVVPRDSLSKDYIQLDLGQL 
>>>> EVSNEISWHGCPEKDATAVRVDVLHAKILGLNMSVGINGSIGKPMIREGQGLDIFVRRSLRDVFKKVPTL 
>>>> SVEVKIDFLHAVMSDKEYDIIVSCTSMNLFEEPKLPPDFRGSSSGPKAKMRLLADKVNLNSQMIMSRTVT 
>>>> ILAVDINYALLELRNSVNEESSLAHVAVRASEPNSSISWMTSLSETDLYVSVPKVSVLDIRPNTKPEMRL 
>>>> MLGSSVDASKQASSESLPFSLNKGSFKRANSRAVLDFDAPCSTMLLMDYRWRASSQSCVLRVQQPRILAV 
>>>> PDFLLAVGEFFVPALRAITGRDETLDPTNDPITRSRGIVLSEPLYKQTEDVVHLSPRRQLVADSLGIDEY 
>>>> TYDGCGKVISLSEQGEKDLNVGRLEPIIIVGHGKKLRFVNVKIKNGSLLSKCIYLSNDSSCLFSPEDGVD 
>>>> ISMLENASSNPENVLSNAHKSSDVSDTCQYDSKSGQSFTFEAQVVSPEFTFFDGTKSSLDDSSAVEKLLR 
>>>> VKLDFNFM
>>>> YASKEKDIWVRALLKNLVVETGSGLIILDPVDISGGYTSVKEKTNMSLTSTDIYMHLSLSALSLLLNLQS 
>>>> QVTGALQSGNAIPLASCTNFDRIWVSPKENGPRNNLTIWRPQAPSNYVILGDCVTSRAIPPTQAVMAVSN 
>>>> TYGRVRKPIGFNRIGLFSVIQGLEGDNVQHSHNSNECSLWMPVAPVGYTAMGCVANIGSEQPPDHIVYCL 
>>>> SIWRADNVLGAFYAHTSTAAPSKKYSPGLSHCLLWNPLQSKTSSSSDPSSTSGSRSEQSSDQTGNSSGWD 
>>>> ILRSISKATSYHVSTPNFERIWWDKGGDLRRPVSIWRPVPRPGFAILGDSITEGLEPPALGILFKADDSE 
>>>> IAAKPVQFNKVAHIVGKGFDEVFCWFPVAPPGYVSLGCVLSKFDEAPHVDSFCCPRIDLVNQANIYEASV 
>>>> TRSSSSKSSQLWSIWKVDNQACTFLARSDLKRPPSRMAFAVGESVKPKTQENVNAEIKLRCFSLTLLDGL 
>>>> HGMMTPLFDTTVTNIKLATHGRPEAMNAVLISSIAASTFNPQLEAWEPLLEPFDGIFKLETYDTALNQSS 
>>>> KPGKRLRIAATNILNINVSAANLETLGDAVVSWRRQLELEERAAKMKEESAASRESGDLSAFSALDEDDF 
>>>> QTIVVENKLGRDIYLKKLEENSDVVVKLCHDENTSVWVPPPRFSNRLNVADSSREARNYMTVQILEAKGL 
>>>> HIIDDGNSHSFFCTLRLVVDSQGAEPQKLFPQSARTKCVKPSTTIVNDLMECTSKWNELFIFEIPRKGVA 
>>>> RLEVEVTNLAAKAGKGEVVGSLSFPVGHGESTLRKVASVRMLHQSSDAENISSYTLQRKNAEDKHDNGCL 
>>>> LISTSYFEKTTIPNTLRNMESKDFVDGDTGFWIGVRPDDSWHSIRSLLPLCIAPKSLQNDFIAMEVSMRN 
>>>> GRKHATFRCLATVVNDSDVNLEISISSDQNVSSGVSNHNAVIASRSSYVLPWGCLSKDNEQCLHIRPKVE 
>>>> NSHHSYAWGYCIAVSSGCGKDQPFVDQGLLTRQNTIKQSSRASTFFLRLNQLEKKDMLFCCQPSTGSKPL 
>>>> WLSVGADAS
>>>> VLHTDLNTPVYDWKISISSPLKLENRLPCPVKFTVWEKTKEGTYLERQHGVVSSRKSAHVYSADIQRPVY 
>>>> LTLAVHGGWALEKDPIPVLDISSNDSVSSFWFVHQQSKRRLRVSIERDVGETGAAPKTIRFFVPYWITND 
>>>> SYLPLSYRVVEIEPSENVEAGSPCLTRASKSFKKNPVFSMERRHQKKNVRVLESIEDTSPMPSMLSPQES 
>>>> AGRSGVVLFPSQKDSYVSPRIGIAVAARDSDSYSPGISLLELEKKERIDVKAFCKDASYYMLSAVLNMTS 
>>>> DRTKVIHLQPHTLFINRVGVSICLQQCDCQTEEWINPSDPPKLFGWQSSTRLELLKLRVKGYRWSTPFSV 
>>>> FSEGTMRVPVPKEDGTDQLQLRVQVRSGTKNSRYEVIFRPNSISGPYRIENRSMFLPIRYRQVEGVSESW 
>>>> QFLPPNAAASFYWENLGRRHLFELLVDGNDPSNSEKFDIDKIGDYPPRSESGPTRPIRVTILKEDKKNIV 
>>>> RISDWMPAIEPTSSISRRLPASSLSELSGNESQQSHLLASEDSEFHVIVELAELGISVIDHAPEEILYMS 
>>>> VQNLFVAYSTGLGSGLSRFKLRMQGIQVDNQLPLAPMPVLFRPQRTGDKADYILKFSVTLQSNAGLDLRV 
>>>> YPYIDFQGRENTAFLINIHEPIIWRIHEMIQQANLSRLSDPNSTAVSVDPFIQIGVLNFSEVRFRVSMAM 
>>>> SPSQRPRGVLGFWSSLMTALGNTENMPVRISERFHENISMRQSTMINNAIRNVKKDLLGQPLQLLSGVDI 
>>>> LGNASSALGHMSQGIAALSMDKKFIQSRQRQENKGVEDFGDIIREGGGALAKGLFRGVTGILTKPLEGAK 
>>>> SSGVEGFVSGFGKGIIGAAAQPVSGVLDLLSKTTEGANAMRMKIAAAITSDEQLLRRRLPRAVGADSLLR 
>>>> PYNDYRAQGQVILQLAESGSFLGQVDLFKVRGKFALTDAYESHFILPKGKVLMITHRRVILLQQPSNIMG 
>>>> QRKFIPAK
>>>> DACSIQWDILWNDLVTMELSDGKKDPPNSPPSRLILYLKAKPHDPKEQFRVVKCIPNSKQAFDVYSAIDQ 
>>>> AINLYGQNALKGMVKNKVTRPYSPISESSWAEGASQQMPASVTPSSTFGTSPTTSSS", 
>>>> rank:"1"
>>>> --------------------------------------------------
>>>> =============================================
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>
> --===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================