Jay's random SeqLab notes

My random SeqLab meanderings. --Jhannah 19:23, 31 January 2008 (CST)

Typedata / Fetch

typedata genembl:drogp*       # writes to STDOUT
typedata genembl:drogpdh*
fetch genembl:drogpdh*        # writes to files
Different databases
  Nucleic
    genbank   GenBank
    embl      EMBL
    genembl   GenBank + EMBL
    nucleic   PIR-Nucleic
  Protein
    pir       PIR-Protein
    sw        Swiss-Prot
14 different subclassifications of GenEMBL
   ba         Bacteria
   in         Invertebrate
   ...
search ONLY invertebrate for drogpdh*:
  typedata in:drogpdh*
fetch genembl:drogpdh
tofasta -check
tofasta -INfile=drogpdh.gb_in -Default
tofasta -INfile=drogpdh.gb_in -BEGin=20 -END=60
Reference Searching
   lookup    stringsearch    names
Sequence Searching
   blast     netblast
Sequence Retrieval
   fetch     netfetch
lookup -check
lookup Kit -ORG="Rattus norvegicus"
stringsearch genEMBL:* catalase

stringsearch genEMBL:* catalase -MEN=A -OUT=myfile.list
lookup mouse catalase -IN=@myfile.list
reformat -RSF @myfile.list
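
To make the tofasta -BEGin=20 -END=60 call above concrete, here is a minimal Python sketch of cutting a subrange out of a sequence and writing it as FASTA. It assumes the range is 1-based and inclusive; the function name, the 60-column line width, and the example sequence are my own illustration, not anything GCG does internally.

```python
def to_fasta(name, seq, begin=1, end=None, width=60):
    """Return a FASTA record for seq[begin..end] (assumed 1-based, inclusive)."""
    if end is None:
        end = len(seq)
    if not (1 <= begin <= end <= len(seq)):
        raise ValueError("range must satisfy 1 <= begin <= end <= len(seq)")
    sub = seq[begin - 1:end]  # convert to Python's 0-based, half-open slice
    lines = [sub[i:i + width] for i in range(0, len(sub), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"


# Hypothetical 100-base sequence, just for illustration.
seq = "ACGT" * 25
record = to_fasta("example", seq, begin=20, end=60)
print(record)
```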

X Windows

From Mac OS X:

  • Launch X11
  • ssh -X gsaf.unmc.edu
  • xwindows
  • dotplot nm_022264.seq.pnt

From Windows:

  • Launch Cygwin
  • In PuTTY, click the "Enable X11 forwarding" box
  • ssh to gsaf.unmc.edu
  • xwindows
  • dotplot nm_022264.seq.pnt

netblastn

netfetch -check
netfetch drogpdh
reformat drogpdh.rsf{*}
netblast drogpdh.seq -DBNucleotideonly -LIStsize=20   # creates drogpdh.netblastn
netfetch drogpdh.netblastn -TOP=10 -TYPe=n -OUT=drogpdh.hits.rsf

netblastx, compare, dotplot

netfetch -check
netfetch drogpdh
reformat drogpdh.rsf{*}
netblast drogpdh.seq -LIStsize=20   # creates drogpdh.netblastx
netfetch drogpdh.netblastx -TOP=10 -TYPe=n -OUT=drogpdh.hits.rsf
reformat drogpdh.hits.rsf{*}
compare abb29518.seq abc96874.seq
dotplot abb29518.pnt

Homework assigned Feb 21

netfetch nm_022264
reformat nm_022264.rsf{*}
translate nm_022264.seq -BEG=45 -END=2981 -OUT=nm_022264.seq.aa
netblast nm_022264.seq.aa -LIStsize=20
netfetch nm_022264.seq.netblastp -TOP=10 -TYPe=p -OUT=nm_022264.hits.rsf
reformat nm_022264.hits.rsf{*}
compare nm_022264.seq.aa caa44354.seq
dotplot nm_022264.seq.pnt
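
The translate step above (-BEG=45 -END=2981) takes a coding range out of the mRNA and reads it three bases at a time. A rough Python sketch of that idea follows, assuming 1-based inclusive coordinates and the standard genetic code. The function, the "X" placeholder for unrecognised codons, and the toy sequence are my own; this is not how GCG implements it.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(seq, begin=1, end=None):
    """Translate seq[begin..end] (assumed 1-based, inclusive) in frame 1."""
    seq = seq.upper().replace("U", "T")
    if end is None:
        end = len(seq)
    region = seq[begin - 1:end]
    protein = []
    # Step through complete codons only; a trailing partial codon is dropped.
    for i in range(0, len(region) - 2, 3):
        protein.append(CODON_TABLE.get(region[i:i + 3], "X"))
    return "".join(protein)


# Toy example: two leading bases of "UTR", then ATG GCC TAA.
print(translate("CCATGGCCTAA", begin=3, end=11))
```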

That dotplot is pretty boring, so let's also do a dotplot from the worst hit:

netfetch ABP97102.1
reformat abp97102.rsf{*}
compare nm_022264.seq.aa abp97102.seq -WIN=5 -OUT=worst_hit.pnt
dotplot worst_hit.pnt

Wow. That's just noise. Try these:

compare nm_022264.seq.aa abp97102.seq -WIN=10 -OUT=worst_hit.pnt
compare nm_022264.seq.aa abp97102.seq -WIN=15 -OUT=worst_hit.pnt
compare nm_022264.seq.aa abp97102.seq -WIN=20 -OUT=worst_hit.pnt

And watch the line re-emerge... :)
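
Why does a bigger window clean up the plot? Here is a small Python sketch of a windowed dot plot, under my own simplifying assumptions: it counts exact identities inside each window and records a point when the count reaches a threshold. As far as I understand, compare scores protein windows with a comparison table rather than plain identity, so treat the function and its `stringency` parameter as an illustration only.

```python
def dot_matrix(a, b, window, stringency):
    """Return (i, j) window start pairs where at least `stringency`
    of the `window` aligned positions are identical."""
    points = []
    for i in range(len(a) - window + 1):
        for j in range(len(b) - window + 1):
            matches = sum(1 for k in range(window) if a[i + k] == b[j + k])
            if matches >= stringency:
                points.append((i, j))
    return points


seq = "ACGTACGT"
# Window of 1: every chance identity becomes a dot (noisy).
noisy = dot_matrix(seq, seq, window=1, stringency=1)
# Window of 4, all positions must match: only the true diagonals survive.
clean = dot_matrix(seq, seq, window=4, stringency=4)
print(len(noisy), len(clean))
```

With a short window, single chance matches between unrelated residues pass the threshold; a longer window demands a run of matches, which random similarity rarely produces.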

pileup, figure, pretty

ls *seq > seq.list
pileup @seq.list              # Creates .msf file, pileup.figure
figure -POR pileup.figure     # See the dendrogram

pretty -IN=@seq.list -CON -OUT=pretty.pretty
   # vi pretty.pretty to see consensus
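
For a feel of what a consensus line is, here is a minimal Python sketch that takes already-aligned sequences and keeps the most common residue in each column when it is in the majority. The majority rule, the `min_fraction` parameter, and the "-" placeholder are my assumptions; pretty's own consensus rules are not reproduced here.

```python
from collections import Counter


def consensus(aligned, min_fraction=0.5, gap="-"):
    """Column-wise consensus of equal-length aligned sequences.

    A column gets its most common residue if that residue is not a gap
    and occurs in more than `min_fraction` of the rows; otherwise `gap`.
    """
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("need at least one sequence, all of equal length")
    out = []
    for column in zip(*aligned):
        residue, count = Counter(column).most_common(1)[0]
        if residue != gap and count / len(column) > min_fraction:
            out.append(residue)
        else:
            out.append(gap)
    return "".join(out)


# Hypothetical three-row alignment, for illustration only.
print(consensus(["ACGT", "ACGA", "ACTT"]))
```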

map, mapplot, mapsort, plasmidmap

map gives text output, mapplot draws a pretty version of the whole sequence, and mapsort shows us all the fragments we'd end up with.

map -INfile=x02513.seq -Default
map -INfile=x02513.seq -BEGin=1 -END=7249 -ENZ=* -TAB
mapplot -INfile=gb_sy:synpbr322 -OUTfile=synpbr322.mapplot
mapplot -INfile=gb_sy:synpbr322 -OUTfile=synpbr322.mapplot -EXCL=500,2000 -DAT=pUC18_cutters.dat

These two options are very important to us. Together they find only those enzymes that cut once when we ignore some ranges:

-EXC=500,2000
-ONCe
mapsort -INfile=genbank:synpbr322 -CIR -ENZ=* -ONC -EXCL=10,200
# creates text file synpbr322.mapsort
mapsort -INfile=genbank:synpbr322 -CIR -ENZ=* -PLA -ONC -EXCL=10,200
plasmidmap synpbr322.tick   # pretty version!
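
The "cut once, ignoring some ranges" idea can be sketched in Python as below. This follows my reading in the note above (sites inside the excluded range are dropped before counting); the real -EXCl semantics may differ, so check the GCG documentation. Positions are the 1-based start of the recognition site, not the exact cut point, and the function names and test plasmid are made up for illustration.

```python
# Recognition sequences for three common enzymes.
SITES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT"}


def find_sites(seq, site, circular=True):
    """1-based start positions of `site` in `seq`, wrapping if circular."""
    seq = seq.upper()
    # For a circular molecule, append the first len(site)-1 bases so that
    # sites spanning the origin are found.
    text = seq + seq[:len(site) - 1] if circular else seq
    positions = []
    start = text.find(site)
    while start != -1:
        positions.append(start + 1)
        start = text.find(site, start + 1)
    return positions


def once_cutters(seq, sites, exclude=None, circular=True):
    """Enzymes with exactly one site left after ignoring `exclude` = (lo, hi)."""
    result = {}
    for name, site in sites.items():
        positions = find_sites(seq, site, circular)
        if exclude is not None:
            lo, hi = exclude
            positions = [p for p in positions if not lo <= p <= hi]
        if len(positions) == 1:
            result[name] = positions[0]
    return result


# Made-up 40-base "plasmid": EcoRI at 1, BamHI at 17 and 33.
plasmid = "GAATTC" + "A" * 10 + "GGATCC" + "A" * 10 + "GGATCC" + "TT"
print(once_cutters(plasmid, SITES))
print(once_cutters(plasmid, SITES, exclude=(30, 40)))
```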