MAST667-011 Intro PERL for biologists

[[BACK to Project Start|AAFCstart]]
!!!
!File Dump Subroutine
Whenever I am generating output files, I find it easier to keep track of things in folders if the secondary files I generate all start with some part of the initial data input file name. This way, in an alphabetical sort, these output files will always be adjacent to the original input file. So under the ''GLOBAL VARIABLES'' section add these two lines of code:
{{{
# Grab the file name w/o the period or extension
	$infile =~ m/^(.*)\./; # match from beginning to period
	my $fileroot = $1;     # set $fileroot to pattern match $1
}}}
The first line is a PERL regex. For example, if you define $infile as "~Arabidopsis-TAIR8-NT-cd95.ffn", then the first line says start at the beginning of $infile [^], then match any number [*] of any character [.] up to a period [\.] and save all those characters in memory [()]. Because the [.*] part is greedy, the match runs to the last period in the name; this file name has only one period, so that is also the first. The second line just says define $fileroot as the characters that are stored in that first [$1] default memory location from the previous regex match. So at this point, $fileroot will equal ''"~Arabidopsis-TAIR8-NT-cd95"''. Now we'll use this character string to define new outfiles so that all the outfiles begin with this identifier.
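To see how the match behaves on other file names, here is a minimal sketch. The helper name {{{file_root}}} and the extra test names are assumptions made for illustration; they are not part of the course scripts. It also shows one way to handle a name with no period, where the match fails and $1 would not be set.

```perl
use strict;
use warnings;

# file_root is a hypothetical helper name, not part of the course scripts
sub file_root {
    my ($infile) = @_;
    # (.*) is greedy, so it captures everything before the LAST period
    if ($infile =~ m/^(.*)\./) {
        return $1;
    }
    return $infile;    # no period: the match failed, keep the whole name
}

# The second and third names are invented to show the edge cases
foreach my $name ("Arabidopsis-TAIR8-NT-cd95.ffn", "genome.v2.ffn", "noextension") {
    print "$name -> ", file_root($name), "\n";
}
```

With a name like "genome.v2.ffn" the root comes back as "genome.v2", because only the final extension is dropped.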

So when we want to generate the output file, we first define $outfile, which in this case is going to hold the ~AAfreqs for each individual protein. Once $outfile is declared, we call a new subroutine called ~FileDump:
{{{
	my $outfile = $fileroot . "-AAfreqs.txt";  # make outfile name
	&FileDump($outfile);
}}}

!!!~FileDump
{{{
# - - - - - - - - - - - - - - - - - - - - - - - - - -
sub FileDump
{	my $file = $_[0];          # get the $outfile name from the parentheses
	pop(@AA);                     # Remove the "X" amino acid from the array (so call FileDump only once)
	open(OUT, ">", $file) or die "Cannot open $file: $!";   # stop if the file can't be created
# PRINT file header . . . . 
	print OUT "NAME";
	foreach my $a (@AA){   print OUT "\t$a";  }
	print OUT "\n";
# PRINT the AAfreqs to file . . . . . 
	foreach my $name (keys %AAfreq)
	{	print OUT "$name";
		foreach my $a (@AA){ print OUT "\t$AAfreq{$name}{$a}"; }
		print OUT "\n";
	}
	close(OUT);
}	
# - - - - - - - - - - - - - - - - - - - - - - - - - -
}}}
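The subroutine above changes the global {{{@AA}}} array each time it is called. As a hedged alternative, here is a sketch that takes its data as arguments and leaves the caller's array alone. The name {{{dump_table}}}, the toy hash values, and the output file name are all assumptions for illustration, not the course's own code.

```perl
use strict;
use warnings;

# dump_table is a hypothetical name; it takes the file name, an array
# reference of column labels, and a reference to the two-level hash
sub dump_table {
    my ($file, $aa_ref, $freq_ref) = @_;
    # copy the labels and skip "X", leaving the caller's array untouched
    my @cols = grep { $_ ne "X" } @$aa_ref;
    open(my $out, ">", $file) or die "Cannot open $file: $!";
    # header row, then one tab-delimited row per protein
    print $out join("\t", "NAME", @cols), "\n";
    foreach my $name (sort keys %$freq_ref) {
        print $out join("\t", $name, map { $freq_ref->{$name}{$_} } @cols), "\n";
    }
    close($out) or die "Cannot close $file: $!";
    return scalar @cols;
}

# Toy data, invented for illustration only
my @AA = qw(A C X);
my %AAfreq = ( prot1 => { A => 60, C => 40, X => 0 },
               prot2 => { A => 25, C => 75, X => 0 } );
dump_table("toy-AAfreqs.txt", \@AA, \%AAfreq);
```

Sorting the protein names is a design choice here: it makes the row order the same on every run, which a plain {{{keys}}} loop does not guarantee.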

[[BACK to AA Freq Count assignment|AAFreqCode]]
!!!
!Help and Hints for HW#2

''1.'' If you are having trouble copying and pasting the subroutines into a working script, here's a fully functional version that you can use right off the shelf: @@[[AAfreqCountTable]]@@

''2.'' If you can't get the program to run at all, then here's the output table it should have generated. Just download the table directly from [[HERE|05/Arabidopsis-TAIR8-NT-cd95-AAfreqs.txt]]

''3.'' If you can't generate an xy plot in Excel, here's a rough one I did for T vs. W that you can download and then email back to me: [[TvW|05/TvW.png]]. Note that Excel is slow with this many points, so it may take 30 minutes to compose the image. Before you plot, make sure you have saved any other work you have been doing on your computer.
<html><img src="05/TvW.png" style="height:200px"></html>


!
[[BACK to Project Page|AAFreqCode]]
!!!
!Calculate Amino Acid Frequencies
We've done this in AAcount. Now we just need to collect those AA freq values into a separate file so that we can work with the numbers for each protein. In this figure, the amino acid cross-correlations are compared using a data set of 26,000 proteins. The lower panels show an xy plot for each pair of AA freqs, drawn with just a black ellipse surrounding 95% of the points and a red "weighted" trend line; the upper panel shows red for negative correlations and blue for positive correlations:
<html><img src="05/ArabT8-AAfreqs.png" style="height:500px"></html>
!!!Data
Looking at just one of the amino acid correlation plots, the figure below shows the freq of Leucine and Isoleucine in the ~TAIR8 proteome. Each point is one protein, with 26,000+ proteins plotted. The contours plot point densities so you can see where the bulk of the overlaid points is located. ''What does it mean?''
<html><img src="05/ArabT8-LvIplot.png" style="height:500px"></html>
''In order to do ANY informatic work, you have to have data to explore/manipulate. Because of the inherent complexity of genomes, you need lots of data. In a plot where n > 26,000, there is a lot of potential statistical power.''
!
[[BACK to Project Page|AAFreqCode]]
!!!
!Calculate Amino Acid Frequencies
# DOWNLOAD the Arabidopsis ~TAIR8 ffn file
** This is a 35 MB file. It has been preprocessed by removing duplicate or redundant genes (>95% amino acid identity). It is in a standard fasta nt format.
** Click to download @@[[Arabidopsis-TAIR8-NT-cd95.ffn|05/Arabidopsis-TAIR8-NT-cd95.ffn]]@@
# Script Skeleton: You will need the following subroutines . . . 
** 	&ReadFasta($infile);
**	&LoadCodonTable($codontable);
***  with $codontable defined as "Standard";
**	&[[TranslateFasta|TransFasta]];
**	&[[AAfreq|AAcounter]];
**	&[[Round]];
# Edit the ''~AAfreq'' subroutine
**     We need to do some variable processing because of the length of the "header" information that is included in the ~TAIR8 fasta file.
**     Here's the page with the edited subroutine: @@AAFreqTAIR8@@
# Add the file print routine . . . . 
**     There's a new strategy for naming $outfile.
**     The filedump code is contained in a subroutine
**     Sequence notes: @@AAFCfiledump@@
# Now you should be able to generate a data file with all the AA freqs for each protein in the ~TAIR8 fasta file.
**      This is a tab-delimited file and you can open it in any spreadsheet
**      ''ASSIGNMENT:''
***     Make an xy plot of any two amino acids you choose.
***     Save the plot as an image file (jpg, png, bmp, pdf)
***     Email that plot to amarsh@udel.edu by 5 pm Monday, 06 OCT
***     It doesn't have to be a fancy plot like the one I did. Just an XY plot of the data to show me that you can do all these steps and actually do something useful with the data.
<html><img src="05/ArabT8-LvIplot.png" style="height:300px"></html>
[[BACK to AAcount assignment|AAcount]]
!!!
!AA Frequencies in a genome
We still haven't addressed the original goal of this coding exercise:
| @@Within a single genome, are all proteins created equal?@@ |
We will pursue this informatic question by focusing on a quantitative analysis of amino acid usage in the Arabidopsis (~TAIR8) proteome.  
# Project Overview @@[[AAFCoverview]]@@
# Getting Started: @@[[AAFCstart]]@@
# ''ASSIGNMENT 30SEP:''
**      After reading the overview and start pages above, it will be clear what the first task of this project will be. 
**      Quick Summary:
***     Make an xy plot of the % composition of any two amino acids for all proteins (> 60 amino acids) in the ~TAIR8 fasta file.
***     Save the plot as an image file (jpg, png, bmp, pdf).
***     Email that plot to amarsh@udel.edu by 5 pm Monday, 06 OCT.
# ''HELP FILES:'' @@[[AAFChint1]]@@

!
[[BACK to Project Start|AAFCstart]]
!!!
!Edit the ~AAfreq subroutine
The code below has two changes. First, we need to reduce the text complexity of the header information. This is accomplished by splitting the $name variable on the "|" character, then recombining only the first and third components to make a $newname. Second, we are going to put a filter in place so that we only work with proteins that are greater than 60 amino acids in length. Also note that a new hash name, {{{%AAfreq}}}, is used here just to make it clear what values it stores, and that we are calculating the frequency as a percentage (ratio * 100).
{{{
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
sub AAfreq
{   my $protcount = 0;
	foreach my $name (keys %PRTs)
	{   #!!! Arabidopsis: process $name to reduce complexity . . . . 
			my @n = split(/\|/,$name);
			my $newname = $n[0] . $n[2];
		foreach my $a (@AA)
		{	$AAprotcount{$name}{$a} = 0; }   # %AAfreq is filled only for proteins that pass the > 60 filter below
		my @aminoacids = split(//,$PRTs{$name});
		my $AAcount = 0;
		foreach my $aa (@aminoacids) 
		{	# make sure we are only counting AAs
			if ($aa =~ m/[ACDEFGHIKLMNPQRSTVWYX]/)     
			{	$AAprotcount{$name}{$aa} += 1;   
				$AAgenomecount{$aa} += 1;
				$AAcount += 1;
			}
		}
	  #!!! Only consider proteins > 60 AAs in length . . . . 
		if ($AAcount > 60)
		{	foreach my $aa1 (@AA)
			{	$AAfreq{$newname}{$aa1} = 
				&Round(100*$AAprotcount{$name}{$aa1}/$AAcount);
			}
			$protcount += 1;
		}
	}
	print "There are $protcount proteins in the current AAfreq analysis\n\n";
}
# - - - - - - - - - - - - - - - - - - - - - - - - - -
}}}
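To check what the header processing does before running it on the whole file, here is a small sketch. The helper name {{{short_name}}} and the example header are assumptions, written in the style of a "|"-separated fasta header. Unlike the subroutine above, this sketch trims the spaces around each field, joins the two parts with a single space, and falls back to the first field when there is no third one.

```perl
use strict;
use warnings;

# short_name is a hypothetical helper; the header below is an invented
# example in the style of a "|"-separated fasta header
sub short_name {
    my ($name) = @_;
    my @n = split(/\|/, $name);
    # trim the spaces that surround each "|" in the header
    s/^\s+|\s+$//g foreach @n;
    # fall back to the first field when there is no third field
    return defined $n[2] ? "$n[0] $n[2]" : $n[0];
}

my $header = "AT1G01010.1 | Symbols: ANAC001 | NAC domain protein | chr1:3631-5899 FORWARD";
print short_name($header), "\n";
```

Printing a few shortened names like this is a quick way to confirm that the fields you keep are the ones you want in the output table.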
[[BACK to Lecture 4|L04]]
[[Jump to the AA Freq Count assignment|AAFreqCode]]
!!!
''Code Work Assignment #1:'' Due 5 pm Friday 26 SEP via email to //amarsh@udel.edu//
!!!
!Amino Acid Metrics:
To start any kind of quantitative comparison, we need a well-defined question or goal to direct and focus our attention and efforts. Here, we are going to look at secondary Amino Acid metrics of sequences. For this exercise, we will address the following:
| @@Within a single genome, are all proteins created equal?@@ |
* ''Clear Computational Goals:''
## A genome has an average amino acid composition across all the proteins that are encoded by its genes.
## In microbial genomes, there are substantial shifts in amino acid composition associated with specific environmental habitats, and we conclude that amino acid composition is a component of the process of environmental adaptation.
## What we want to know, when we compare the amino acid composition of individual proteins within a genome to the overall genome average, is:
*** Which proteins are the most different?
*** Which amino acids are the most conserved and divergent?

As usual in Science, asking the questions is the easy part. So how do we begin?
# You will need to select a genome to work with that has a FFN file (fasta file with the nucleotide sequence of each annotated ORF). If you don't have a favorite genome, select an FFN file from this @@[[LIST HERE]]@@.
# Using the @@FASTAtranslate2@@ script, you can easily read the FFN fasta, make a protein translation, and generate a protein fasta file. We'll start the exercise from this point.
# The computational tasks are fairly straightforward:
** count the AA's in all the proteins
** calculate the frequency composition for each amino acid
** go back and count the AA's for each protein
** compare the frequency values for each individual protein to the genome composition
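The first two tasks in the list above can be sketched as follows. This is a minimal example under stated assumptions: the two toy proteins are invented, and {{{%AAgenomefreq}}} is a hypothetical hash name chosen here to sit alongside {{{%AAgenomecount}}}.

```perl
use strict;
use warnings;

# Toy protein hash, invented for illustration; real data would come
# from the translated fasta file
my %PRTs = ( protA => "MKLLA", protB => "MAAAK" );

my (%AAgenomecount, %AAgenomefreq);
my $total = 0;

# count every amino acid across all proteins
foreach my $name (keys %PRTs) {
    foreach my $aa (split(//, $PRTs{$name})) {
        next unless $aa =~ m/[ACDEFGHIKLMNPQRSTVWY]/;
        $AAgenomecount{$aa} += 1;
        $total += 1;
    }
}

# frequency as a percentage of all counted residues
foreach my $aa (sort keys %AAgenomecount) {
    $AAgenomefreq{$aa} = 100 * $AAgenomecount{$aa} / $total;
    printf "%s\t%d\t%.1f\n", $aa, $AAgenomecount{$aa}, $AAgenomefreq{$aa};
}
```

Keeping the running {{{$total}}} inside the same loop avoids a second pass over the sequences just to get the denominator.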

''First Code Suggestions are posted here: @@AAcount1@@'' 
''Second Code Suggestions are posted here: @@AAcount2@@'' 
''Third Code Suggestions:'' check the subroutines on CodeWorks. //A counting subroutine has been posted.//
''Fourth Code Suggestions:'' working script for [[Lecture 4|L04]] is posted: @@AAcount3@@ 
''Fifth Code Suggestions:'' script we built in class for [[Lecture 4|L04]]: @@AAcount4@@
''Sixth Code Suggestions:'' code block for doing a mean calculation: @@AAcount5@@
| Note that the calcs for AA freq values using the two different hash arrays (across genome vs. across proteins) will not be exactly equivalent: @@AAdistribution@@ |
''Seventh Code Suggestion:'' here's a fully commented version: @@MaungCode01@@
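The note above about the two hash arrays can be checked with a very small worked example. The two toy proteins below are invented, and the variable names are assumptions; the point is only that pooling all residues weights long proteins more heavily than averaging the per-protein percentages does.

```perl
use strict;
use warnings;

# Two toy proteins of different length, invented for illustration
my %PRTs = ( short => "AAAL", long => "ALLLLLLL" );

my ($pooledA, $pooledAll, $sumfreq, $nprot) = (0, 0, 0, 0);
foreach my $name (keys %PRTs) {
    my $len    = length($PRTs{$name});
    my $countA = ($PRTs{$name} =~ tr/A//);   # tr counts without changing the string
    $pooledA   += $countA;
    $pooledAll += $len;
    $sumfreq   += 100 * $countA / $len;      # this protein's own % A
    $nprot     += 1;
}

my $genomefreq = 100 * $pooledA / $pooledAll;   # across the genome
my $meanfreq   = $sumfreq / $nprot;             # across proteins
printf "across genome: %.2f%%  across proteins: %.2f%%\n", $genomefreq, $meanfreq;
```

Here the pooled value is 4 of 12 residues (33.33%), while the mean of the two per-protein values, 75% and 12.5%, is 43.75%.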

* ''What Is Due On Friday:''
** First, working code is great.
*** ''TRY TO CALCULATE THE AVERAGE AMINO ACID FREQUENCY ACROSS ALL INDIVIDUAL PROTEINS''
** Second, non-working code, but well commented to show me what you were trying to do is very good.
** Third, some code and some text description to describe the logic that you see to solve the problem even if you are unsure of how to code it in PERL is good.
** Fourth, a 500-word essay describing your attempts at PERL code is not good.
** Fifth, an interpretive dance of the PERL code is not acceptable.

Just try to solve the steps that you can. I'll cover some of this in [[Lecture 04|L04]] as well. 
 
!!!
[[BACK to Lecture 4|L04]]
!
!First Installment
[[BACK to Assignment|AAcount]] | [[Go to Next Code Hint|AAcount2]]
!!!
Here are the first steps to consider.
To get you started, here are some ideas about how to count amino acids:
{{{
       # set up an array with each amino acid letter
       my @AA = qw |A C D E F G H I K L M N P Q R S T V W Y |; 
       my %AAcount;

       foreach my $name (keys %PRTs)
       {         # split the protein sequence into an array of AA characters
                 my @aminoacids = split(//,$PRTs{$name});    
                 
                 foreach my $aa (@aminoacids) 
                 {    # make sure we are only counting AAs
                       if ($aa =~ m/[ACDEFGHIKLMNPQRSTVWY]/)     
                       {        $AAcount{$aa} += 1;   }
                 }
       }

      # check what's in %AAcount:
     foreach my $aminoacid (sort keys %AAcount) 
     {         print "$aminoacid = $AAcount{$aminoacid}\n"; }

}}}
//Note: see @@[[qw]]@@ or @@[[=~|pattern matching]]@@ for more details//
       
* Once you can count Amino Acids, then start to think about:
** how to calculate and store in memory the frequency of each amino acid
** how to print the AA data to a file so that you can actually work with it in ''~MatLab'' or ''R'' or ''~OpenOffice'' or, (sigh), even ''Microsoft Office Excel^^TM^^''.
*** hint: {{{ print OUTFILE "$aminoacid\t$AAcount{$aminoacid}\n"; }}}
*** use the tab char "\t" between values in the file so they can be easily imported into other programs
** how to do the same for the several thousands of proteins in the FFN file
** how to compare the values among 20 amino acids for several thousand proteins
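
Here is one possible sketch for the first two bullets above (frequencies in memory, then a tab-delimited file). It is only an illustration: the toy numbers in {{{%AAcount}}}, the {{{%AAfreq}}} hash, and the file name {{{AAcounts-demo.txt}}} are all made up for this example and are not part of the assignment code.
{{{
#!/usr/bin/perl
use strict;
use warnings;

# toy counts standing in for the %AAcount hash built above (made-up numbers)
my %AAcount = (A => 50, C => 10, D => 40);

# total number of amino acids counted
my $total = 0;
foreach my $aa (keys %AAcount)
{	$total += $AAcount{$aa}; }

# store the frequency (fraction of the total) of each amino acid
my %AAfreq;
foreach my $aa (keys %AAcount)
{	$AAfreq{$aa} = $AAcount{$aa} / $total; }

# write a tab-delimited table: amino acid, count, frequency
my $outfile = "AAcounts-demo.txt";      # hypothetical file name
open(OUTFILE, ">", $outfile) or die "Cannot write $outfile: $!\n";
print OUTFILE "AA\tCOUNT\tFREQ\n";
foreach my $aa (sort keys %AAcount)
{	print OUTFILE "$aa\t$AAcount{$aa}\t$AAfreq{$aa}\n"; }
close(OUTFILE);
}}}
Keeping the frequencies in their own hash means the raw counts are still available later if you need them.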
!
!Second Code Installment
[[BACK to Assignment|AAcount]] | [[Go to First Code Hint|AAcount1]]
!!!

You can keep track of the protein AA counts and the total genome AA counts at the same time. Just use two different hash arrays. For the protein counts you can use a two-dimensional hash array, where each protein name has an entry, and each of those entries holds 20 entries of its own, one for each amino acid. We can reference these values by using TWO index values, one for the protein name and one for the amino acid name, like this: 
| {{{ $AAprotcount{$name}{$aa} }}} |
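
If the two-index idea is new to you, this small stand-alone sketch may help. The protein names {{{prot1}}} and {{{prot2}}} are invented for the example; only the {{{%AAprotcount}}} name comes from the assignment.
{{{
use strict;
use warnings;

my %AAprotcount;

# two index values: protein name first, amino acid second
$AAprotcount{"prot1"}{"A"} += 1;
$AAprotcount{"prot1"}{"A"} += 1;
$AAprotcount{"prot1"}{"G"} += 1;
$AAprotcount{"prot2"}{"A"} += 1;

# walk the first index (proteins), then the second index (amino acids)
foreach my $name (sort keys %AAprotcount)
{	foreach my $aa (sort keys %{ $AAprotcount{$name} })
	{	print "$name\t$aa\t$AAprotcount{$name}{$aa}\n"; }
}
}}}
Note that an amino acid that never occurs in a protein simply has no entry, which is why the later scripts initialize every count to zero first.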

Here's a code snippet that will count amino acids for each protein and then print them to a separate text file (tab-delimited) that you can open with other spreadsheet or statistics programs. You just need to declare these variables:
{{{
my %AAprotcount;
my %AAgenomecount;
my @AA = qw |A C D E F G H I K L M N P Q R S T V W Y |; 
my $outfile = "--enter your file name--";
}}}

Now, insert this code to do all the counts in one pass through the %PRTs sequence array:
{{{
open(OUT,">$outfile");
# print header line to outfile . . . . 
print OUT "NAME\tA\tC\tD\tE\tF\tG\tH\tI\tK\tL\tM\tN\tP\tQ\tR\tS\tT\tV\tW\tY\n"; 
foreach my $name (keys %PRTs)
{  # split the protein sequence into an array of AA characters
	my @aminoacids = split(//,$PRTs{$name});   
	foreach my $aa (@aminoacids) 
	{	# make sure we are only counting AAs
		if ($aa =~ m/[ACDEFGHIKLMNPQRSTVWY]/)     
		{	$AAprotcount{$name}{$aa} += 1;   
			$AAgenomecount{$aa} += 1;
		}
	}
	# Once the AAs in each protein have been counted, print to OUTFILE
	print OUT "$name";
	foreach my $aa (@AA) 
	{	# an AA that never occurs in this protein has no entry yet, so print 0
		my $n = $AAprotcount{$name}{$aa} || 0;
		print OUT "\t$n";
	}
	print OUT "\n";
}
close(OUT);
}}}
Now open the OUTFILE you just generated in a spreadsheet or stats program and take a look at the numbers. You should be able to see how you can start to use these variable arrays to finish the calculations.
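
As a hint for the next step, here is a rough sketch of turning the per-protein counts into per-protein frequencies. The {{{%AAprotfreq}}} hash and the toy protein {{{orf1}}} are assumptions made up for this example; treat it as one way to lay out the logic, not the required solution.
{{{
use strict;
use warnings;

my @AA = qw(A C D E F G H I K L M N P Q R S T V W Y);

# toy counts for one protein; amino acids never seen have no entry at all
my %AAprotcount = ( "orf1" => { A => 3, G => 1 } );
my %AAprotfreq;       # hypothetical hash to hold the frequencies

foreach my $name (keys %AAprotcount)
{	# protein length = sum of its 20 counts (a missing count is treated as 0)
	my $length = 0;
	foreach my $aa (@AA)
	{	$length += ($AAprotcount{$name}{$aa} || 0); }
	next if $length == 0;     # skip empty proteins to avoid dividing by zero

	foreach my $aa (@AA)
	{	$AAprotfreq{$name}{$aa} = ($AAprotcount{$name}{$aa} || 0) / $length; }
}

print "A = $AAprotfreq{orf1}{A}\n";
}}}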

!
!Counting AAs
24 SEP, [[Lecture 4|L04]], incorporates current subroutines and strategies to date.
{{{
#!/usr/bin/perl
use strict;

# - - - - - H E A D E R - - - - - - - - - - - - - - -
# AGM-SEP2008
# OBJECTIVE: 
#		1. Read FASTA file, parse headers and sequences 
#		2. Translate NT sequence into PROTEIN sequence
#		3. Calculate AA %freq values for genome
#		4. Compare genome %AA numbers to protein %AA numbers

# - - - - - U S E R    V A R I A B L E S - - - - - - 
my $infile = "Methanococcus_jannaschii-PID102-cd95.ffn";
my $codontable = "Standard";           # specify which table to use

# - - - - - G L O B A L  V A R I A B L E S  - - - - -
my @FILE;           # input array to hold file contents
my %NTs;            # nucleotide orf name & sequence
my %CodonTable;     # codon translation table
my %PRTs;           # amino acid orf names & sequence
my %AAprotcount;    # aa counts in each orf
my %AAgenomecount;  # aa counts in total genome
my @AA = qw |A C D E F G H I K L M N P Q R S T V W Y |;

# - - - - - - - - - - - - - - - - - - - - - - - - - -
# - - - - - M A I N - - - - - - - - - - - - - - - - -
# - - - - - - - - - - - - - - - - - - - - - - - - - -
print "\nDance to the music . . . . \n\n";

	&ReadFasta($infile);
	&LoadCodons($codontable);
	&TranslateFasta;
	&AAfreq;
	# - - - - - - - - - - - - - - - - - - - - - - -
	# check what's in %AAgenomecount:
	foreach my $aminoacid (@AA)
	{         print "$aminoacid = $AAgenomecount{$aminoacid}\n"; }
	# - - - - - - - - - - - - - - - - - - - - - - -

print "\n\n\nDONE\n\n\n";
# - - - - - - - - - - - - - - - - - - - - - - - - - -
# - - - - - S U B R O U T I N E S - - - - - - - - - -
# - - - - - - - - - - - - - - - - - - - - - - - - - -
sub ReadFasta
{   my $file = $_[0];
	$/=">";
	open(FASTA,"<$file") or die "\n\n\n Nada $file\n\n\n";
	@FILE=<FASTA>;
	close(FASTA);
	shift(@FILE); 
	foreach my $orf (@FILE)
	{	my @Lines = split(/\n/,$orf);
		my $name = $Lines[0];
		my $seq = "";
		foreach my $i (1..$#Lines)
		{	$seq .= $Lines[$i]; }
		$seq =~ s/>//;
		$NTs{$name} = $seq;
	}
	$/="\n";  # reset to \n default before leaving subroutine
}
# - - - - - - - - - - - - - - - - - - - - - - - - - -
sub TranslateFasta
{	# Convert the NT sequence into AAs . . . . . .
	foreach my $header (keys %NTs)
	{	my $protein = "";            # set to "empty" at the start of each loop
		for (my $i=0; $i <= length($NTs{$header})-2; $i += 3)  # another FOR-loop structure
		{	my $codon = substr($NTs{$header},$i,3);             # $codon = 3 nts at a time
			my $aa = $CodonTable{$codon};       # here's the translation step
			$protein .= $aa;
		}
		$PRTs{$header} = $protein;
	}
}

# - - - - - - - - - - - - - - - - - - - - - - - - - -
sub AAfreq
{	foreach my $name (keys %PRTs)
	{  # split the protein sequence into an array of AA characters
		my @aminoacids = split(//,$PRTs{$name});   
		foreach my $aa (@aminoacids) 
		{	# make sure we are only counting AAs
			if ($aa =~ m/[ACDEFGHIKLMNPQRSTVWY]/)
			{	$AAprotcount{$name}{$aa} += 1;   
				$AAgenomecount{$aa} += 1;
			}
		}
	}
}
# ---------------------------------------------------------
sub LoadCodons
{
	$/=">";
	my $Table = shift(@_);
	my @TABLE = <DATA>;
	foreach my $j (@TABLE)
	{	if ($j =~ m/^ (\d){1,2} $Table/)
		{	my @k = split(/\n/,$j);
			$k[1] =~ s/Amino  //;
			foreach my $i (1..3)
			{	$k[$i+1] =~ s/Base$i  //; }
			my @AA = split(//,$k[1]);
			my @B1 = split(//,$k[2]);
			my @B2 = split(//,$k[3]);
			my @B3 = split(//,$k[4]);
			foreach my $i (0..63)
			{	$CodonTable{$B1[$i].$B2[$i].$B3[$i]} = $AA[$i]; }
		}
	}
	$/="\n";   # reset to default \n before leaving subroutine
}
# ---------------------------------------------------------
# - - - - - EOF - - - - - - - - - - - - - - - - - - -
# The lines below are not perl statements and are not executed as part of the 
# program.  Instead, they are available to be read as data input by the program
# using the I/O handle name "DATA". This is a default handle name for any data 
# you want to include in a script file.
__END__
> 0 Codon Translation Tables
http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi?mode=c#SG1
> 1 Standard
Amino  FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
Base1  TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
Base2  TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
Base3  TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
> 11 Bacteria and Archaea
Amino  FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
Base1  TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
Base2  TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
Base3  TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
}}}
!Counting ~AAs
24 SEP, script after [[Lecture 4|L04]], incorporates the NT check from [[Glenn01]] and the [[ROUND|Round]] subroutine. 
{{{
#!/usr/bin/perl
use strict;

# - - - - - H E A D E R - - - - - - - - - - - - - - - -
# 24SEP Lecture 4.
# Script built in class to demo the subroutines for AAcount

# - - - - - U S E R   V A R I A B L E S - - - - - - - -
my $infile = "Methanococcus_jannaschii-PID102-cd95.ffn";
my $codontable = "Bacteria";

# - - - - - G L O B A L  V A R I A B L E S  - - - - - -
my @FILE;
my %NTs; 
my %CodonTable;
my %PRTs;
my %AAprotcount; 
my %AAgenomecount;
my @AA = qw |A C D E F G H I K L M N P Q R S T V W Y X |;

# - - - - - - - - - - - - - - - - - - - - - - - - - - -
# - - - - - M A I N - - - - - - - - - - - - - - - - - -
# - - - - - - - - - - - - - - - - - - - - - - - - - - -
print "my favorite quote or song lyric goes <here> . . . . \n";

&ReadFasta($infile);
&LoadCodons($codontable);
&TranslateFasta;
&AAfreq;
&ScreenDump;

print "\n\n\n   DONE   \n\n\n";
# - - - - - - - - - - - - - - - - - - - - - - - - - - -
# - - - - - S U B R O U T I N E S - - - - - - - - - - -
# - - - - - - - - - - - - - - - - - - - - - - - - - - -

# - - - - - - - - - - - - - - - - - - - - - - - - - -
sub ReadFasta
{   my $file = $_[0];
	$/=">";
	open(FASTA,"<$file") or die "\n\n\n Nada $file\n\n\n";
	@FILE=<FASTA>;
	close(FASTA);
	shift(@FILE); 
	foreach my $orf (@FILE)
	{	my @Lines = split(/\n/,$orf);
		my $name = $Lines[0];
		my $seq = "";
		foreach my $i (1..$#Lines)
		{	$seq .= $Lines[$i]; }
		$seq =~ s/>//;
		$NTs{$name} = $seq;
	}
       $/="\n"; # reset input break back to default
}
# ---------------------------------------------------------
sub LoadCodons
{
	$/=">";
	my $Table = shift(@_);
	my @TABLE = <DATA>;
	foreach my $j (@TABLE)
	{	if ($j =~ m/^ (\d){1,2} $Table/)
		{	my @k = split(/\n/,$j);
			$k[1] =~ s/Amino  //;
			foreach my $i (1..3)
			{	$k[$i+1] =~ s/Base$i  //; }
			my @AA = split(//,$k[1]);
			my @B1 = split(//,$k[2]);
			my @B2 = split(//,$k[3]);
			my @B3 = split(//,$k[4]);
			foreach my $i (0..63)
			{	$CodonTable{$B1[$i].$B2[$i].$B3[$i]} = $AA[$i]; }
		}
	}
	# foreach my $nnn (keys %CodonTable)
	# {  print "$nnn = $CodonTable{$nnn}\n";}
	$/="\n";   # reset back to default before leaving subroutine
}

# - - - - - - - - - - - - - - - - - - - - - - - - - -
sub TranslateFasta
{	# Convert the NT sequence into AAs . . . . . .
	foreach my $header (keys %NTs)
	{	my $protein = "";           
		for (my $i=0; $i <= length($NTs{$header})-2; $i += 3)  
		{	my $codon = substr($NTs{$header},$i,3);      
			my $aa = $CodonTable{$codon};
			# Now check to see if $aa is defined within the codon table
			if (!$aa)    # unless ($aa)                          
			{  $aa = "X";}
			
			$protein .= $aa;
		}
		$PRTs{$header} = $protein;
	}
}

# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
sub AAfreq
{   
	foreach my $name (keys %PRTs)
	{  # Initialize all counts to zero
	     my $AAcount = 0;
		foreach my $a (@AA)
		{	$AAprotcount{$name}{$a} = 0; }
		
		# split the protein sequence into an array of AA characters
		my @aminoacids = split(//,$PRTs{$name});   
		foreach my $aa (@aminoacids) 
		{	# make sure we are only counting AAs
			if ($aa =~ m/[ACDEFGHIKLMNPQRSTVWYX]/)     
			{	$AAprotcount{$name}{$aa} += 1;   
				$AAgenomecount{$aa} += 1;
				$AAcount += 1;
			}
		}
		
		foreach my $aa1 (@AA)
		{	$AAprotcount{$name}{$aa1} = 
		        &Round($AAprotcount{$name}{$aa1}/$AAcount); 
		}
		
	}
}
# - - - - - - - - - - - - - - - - - - - - - - - - - -
sub Round
{ 	my $x = $_[0];                       # first argument ($_[0], not the slice @_[0])
	$x = (int(($x*10**4) + 0.5)/10**4);  # round to 4 decimal places; no second "my"
	return $x;
}
# - - - - - - - - - - - - - - - - - - - - - - - - - -
sub ScreenDump
{   # - - - - - - - - - - - - - - - - - - - - - - -
	# check what's in %AAgenomecount:
	foreach my $aminoacid (@AA)
	{         print "$aminoacid = $AAgenomecount{$aminoacid}\n"; }

	# - - - - - - - - - - - - - - - - - - - - - - -
	# check what's in %AAprotcount
	foreach my $orf (keys %AAprotcount)
	{	print "\n>>$orf: ";
		foreach my $aa1 (@AA)
		{     print "$AAprotcount{$orf}{$aa1}| ";}
		print "\n";
	}
	# - - - - - - - - - - - - - - - - - - - - - - -
}
# - - - - - - - - - - - - - - - - - - - - - - - - - -
# - - - - - - - - - - - - - - - - - - - - - - - - - -
# - - - - - EOF - - - - - - - - - - - - - - - - - - -
# - - - - - - - - - - - - - - - - - - - - - - - - - -
# - - - - - - - - - - - - - - - - - - - - - - - - - -
__END__
> 0 Codon Translation Tables
http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi?mode=c#SG1
> 1 Standard
Amino  FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
Base1  TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
Base2  TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
Base3  TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
> 11 Bacteria and Archaea
Amino  FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
Base1  TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
Base2  TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
Base3  TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
}}}
[[BACK to Lecture 4|L04]]
[[BACK to AA Count assignment|AAcount]]
!!!
!How to calculate means . . . 
Here are some example code blocks for calculating the means for AA freqs using the two array hashes: {{{%AAprotcount}}} and {{{%AAgenomecount}}}. There are lots of ways to do this. This code is provided more as a logic map of what needs to be accomplished; you can break those tasks up into separate steps.

{{{
my $N = 0;
my %aasum;
foreach my $id (keys %AAprotcount)
{	# Need to count total proteins
	$N += 1;
	# Need to sum the freq from each protein 
	foreach my $a (@AA)
	{	$aasum{$a} += $AAprotcount{$id}{$a}; }
}

my $TOTAL = 0;
my %aameanfreq;
foreach my $a (@AA)
{	# calculate each mean freq . . . . 
	$aameanfreq{$a} = &Round($aasum{$a}/$N);
	# we also need total amino acids . . . . 
	$TOTAL += $AAgenomecount{$a};
}

foreach my $a (@AA)
{	# calculate the AA fraction using genome counts
	my $genfreq = &Round($AAgenomecount{$a}/$TOTAL);
	# compare on screen the mean protein freqs against the genome freqs
	print "$a:  Protein calc > $aameanfreq{$a}  ==  $genfreq <= Genome calc\n";
}
}}}
!
!!!
[[BACK to Working Code|CodeWorks]]
!!!
!Count Amino Acids:
Here's the little subroutine to count AA's generated for the First homework assignment: @@[[AAcount]]@@
The idea is that we are going to use two arrays to hold the count data. One array {{{%AAprotcount}}} will hold the AA counts for each individual protein. The other array {{{%AAgenomecount}}} will hold the AA counts for the entire genome. Both arrays will be hashes as described in @@[[AAcount2]]@@.

''NOTE:''  This subroutine requires that the %PRTs hash array contain the amino acid sequence data and header information generated by the [[Translate FASTA|TransFasta]] subroutine. Also, the print statements shown in [[AAcount2]] have been removed for clarity. 

{{{
#  Define these global array hashes to hold the raw AA counts
        my %AAprotcount; 
        my %AAgenomecount;

#  Call subroutine:
        &AAfreq;

# Subroutine Code:
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
sub AAfreq
{
	foreach my $name (keys %PRTs)
	{  # split the protein sequence into an array of AA characters
		my @aminoacids = split(//,$PRTs{$name});   
		foreach my $aa (@aminoacids) 
		{	# make sure we are only counting AAs
			if ($aa =~ m/[ACDEFGHIKLMNPQRSTVWY]/)     
			{	$AAprotcount{$name}{$aa} += 1;   
				$AAgenomecount{$aa} += 1;
			}
		}
	}
}
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
}}}
[[BACK to AA Count assignment|AAcount]]
!!!
The MEAN AA frequency for each amino acid calculated across all proteins is not exactly equal to the average AA frequency calculated by dividing the bulk sum of each AA (in %AAgenomecount) by the total AAs in the genome. The numbers are close, but the calcs are not equivalent: the protein mean gives every protein the same weight, while the genome calc effectively weights each protein by its length, so long proteins count for more. 

So here's a spreadsheet of some test data I worked up when frustrated that I couldn't get the calcs to work. This is a hypothetical genome with 10 proteins (p1 to p10) and 3 amino acids. The larger the difference in AA representation across a protein, the larger the difference in the mean calculations between Genome AA% and Protein AA%. 
<html><img src="04/AAfreqTest.png" style="height:500px"></html>
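
A tiny made-up case shows the same effect in code. The two proteins, their lengths, and their alanine counts below are invented purely for illustration. In this example the gap between the two calcs comes from the difference in protein length: the short protein and the long protein get equal weight in the protein calc, but the long one dominates the genome calc.
{{{
use strict;
use warnings;

# hypothetical two-protein genome: counts of alanine (A) and protein lengths
my %Acount = (short => 5,  long => 10);
my %length = (short => 10, long => 100);

# Protein calc: mean of the per-protein frequencies (each protein weighted equally)
my $sumfreq = 0;
my $N = 0;
foreach my $p (keys %Acount)
{	$sumfreq += $Acount{$p} / $length{$p};
	$N += 1;
}
my $proteinmean = $sumfreq / $N;                 # (0.5 + 0.1) / 2

# Genome calc: pooled count over pooled length (long proteins dominate)
my ($totalA, $totalAA) = (0, 0);
foreach my $p (keys %Acount)
{	$totalA  += $Acount{$p};
	$totalAA += $length{$p};
}
my $genomefreq = $totalA / $totalAA;             # 15 / 110

printf "Protein calc = %.4f   Genome calc = %.4f\n", $proteinmean, $genomefreq;
}}}
If both proteins had the same length, the two numbers would come out identical.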

[[BACK to Lecture 6|L06.01]]
!!!
!Simple Script to Screen Proteins
The ~HW2 assignment (AAFCstart) generates a large table of the AA frequencies on ALL proteins in the ~TAIR8 proteome. This script opens that file and executes a simple filter to produce an output file that is a subset of that large ALL file.
{{{
#!/usr/bin/perl
use strict;

# - - - - - H E A D E R - - - - - - - - - - - - - - - -
# 08OCT lecture 6.
# Script to filter the results of the AAFreqCalc script

# - - - - - U S E R   V A R I A B L E S - - - - - - - -
my $infile = "Arabidopsis-TAIR8-NT-cd95-AAfreqs.txt";

# - - - - - G L O B A L  V A R I A B L E S  - - - - - -
my @FILE;
my @AA = qw |A C D E F G H I K L M N P Q R S T V W Y X |;
# Grab the file name w/o the period or extension
	$infile =~ m/^(.*)-AAfreqs\./; # match everything before "-AAfreqs."
	my $fileroot = $1;     # set $fileroot to pattern match $1

# - - - - - - - - - - - - - - - - - - - - - - - - - - -
# - - - - - M A I N - - - - - - - - - - - - - - - - - -
# - - - - - - - - - - - - - - - - - - - - - - - - - - -
print "\n\nMy footsteps are ticking\nLike water dripping from a tree\nWalking a hairline\nAnd stepping very carefully. . . . \n\n";

# 1. Open the AA Freq file . . . . . . 
	# use IN rather than DATA: DATA is Perl's built-in handle for text after __END__
	open(IN,"<$infile") or die "\n\n\n Nada $infile\n\n\n";
	@FILE=<IN>;
	close(IN);

# 2. Filter proteins and output subset . . . . . 
		my $outfile = $fileroot . "-Filter1008.txt";  # make outfile name
		open(OUT,">$outfile");
		pop(@AA);                     # Remove the "X" amino acid from the array
	# Set file header . . . . 
		print OUT "NAME";
		foreach my $a (@AA){ print OUT "\t$a";}
		print OUT "\n";
	# Print the AAfreqs to file . . . . .
		my $count = 0;
		foreach my $protein (@FILE)
		{	
			# - - - - - - - - - - - - -  - - - - 
			# HERE'S THE FILTER LOOP . . . . . . 
			if ($protein =~ m/NBS/)
			{
				print OUT $protein;   # each line in @FILE still ends in \n, so don't add another
				$count += 1;
			}
			# HERE'S THE FILTER LOOP . . . . . . 
			# - - - - - - - - - - - - -  - - - - 
		}
		close(OUT);
		print "There are $count proteins in \"$outfile\" \n"; 

print "\n\n\n   DONE   \n\n\n";
# - - - - - - - - - - - - - - - - - - - - - - - - - -
# - - - - - EOF - - - - - - - - - - - - - - - - - - -
# - - - - - - - - - - - - - - - - - - - - - - - - - -
}}}
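The filter above is a text match on the whole line. If you would rather filter on a number, a sketch along these lines might be a starting point. Everything specific here is an assumption for illustration: the three toy lines standing in for {{{@FILE}}}, the choice of column {{{C}}}, and the cutoff of 3.0.
{{{
use strict;
use warnings;

# toy lines standing in for @FILE (header line + two made-up proteins)
my @FILE = ("NAME\tA\tC\tD\n", "prot1\t8.1\t0.9\t5.2\n", "prot2\t6.3\t4.7\t5.0\n");

my $column = "C";       # hypothetical: amino acid column to test
my $cutoff = 3.0;       # hypothetical: keep proteins above this %freq

# find which column number holds the chosen amino acid
my $header = shift(@FILE);
chomp($header);
my @names = split(/\t/, $header);
my $index = -1;
foreach my $i (0..$#names)
{	if ($names[$i] eq $column) { $index = $i; } }
die "No column named $column\n" if $index < 0;

# keep only the rows that pass the numeric test
my @KEEP;
foreach my $protein (@FILE)
{	chomp($protein);
	my @values = split(/\t/, $protein);
	if ($values[$index] > $cutoff)
	{	push(@KEEP, $protein); }
}
print "$_\n" foreach @KEEP;
}}}
Looking the column up by its header name, instead of hard-coding a column number, keeps the filter working if the column order ever changes.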
[[BACK to AA Freq Count assignment|AAFChint1]]
!!!
!Working Code for HW#2
Here's the script to generate the AA freq table for each protein in the Arabidopsis TAIR8 proteome.
{{{
#!/usr/bin/perl
use strict;

# - - - - - H E A D E R - - - - - - - - - - - - - - - -
# 24SEP Lecture 4.
# Script built in class to demo the subroutines for AAcount

# - - - - - U S E R   V A R I A B L E S - - - - - - - -
my $infile = "Arabidopsis-TAIR8-NT-cd95.ffn";
my $codontable = "Standard";

# - - - - - G L O B A L  V A R I A B L E S  - - - - - -
my @FILE;
my %NTs; 
my %CodonTable;
my %PRTs;
my %AAprotcount; 
my %AAgenomecount;
my %AAfreq;
my @AA = qw |A C D E F G H I K L M N P Q R S T V W Y X |;
# Grab the file name w/o the period or extension
	$infile =~ m/^(.*)\./; # match from beginning to period
	my $fileroot = $1;     # set $fileroot to pattern match $1

# - - - - - - - - - - - - - - - - - - - - - - - - - - -
# - - - - - M A I N - - - - - - - - - - - - - - - - - -
# - - - - - - - - - - - - - - - - - - - - - - - - - - -
print "\n\n  Drove 100 miles today, never even left LA . . . . \n\n";

	&ReadFasta($infile);
	&LoadCodonTable($codontable);
	&TranslateFasta;
	&AAfreq;
	
	my $outfile = $fileroot . "-AAfreqs.txt";  # make outfile name
	&FileDump($outfile);

print "\n\n\n   DONE   \n\n\n";
# - - - - - - - - - - - - - - - - - - - - - - - - - - -
# - - - - - S U B R O U T I N E S - - - - - - - - - - -
# - - - - - - - - - - - - - - - - - - - - - - - - - - -
# - - - - - - - - - - - - - - - - - - - - - - - - - -
sub ReadFasta
{   my $file = $_[0];
	$/=">";
	open(FASTA,"<$file") or die "\n\n\n Nada $file\n\n\n";
	@FILE=<FASTA>;
	close(FASTA);
	shift(@FILE); 
	foreach my $orf (@FILE)
	{	my @Lines = split(/\n/,$orf);
		my $name = $Lines[0];
		my $seq = "";
		foreach my $i (1..$#Lines)
		{	$seq .= $Lines[$i]; }
		$seq =~ s/>//;
		$NTs{$name} = $seq;
	}
    $/="\n"; # reset input break back to default
}
# ---------------------------------------------------------
sub LoadCodonTable
{
	$/=">";
	my $Table = shift(@_);
	my @TABLE = <DATA>;
	foreach my $j (@TABLE)
	{	if ($j =~ m/^ (\d){1,2} $Table/)
		{	my @k = split(/\n/,$j);
			$k[1] =~ s/Amino  //;
			foreach my $i (1..3)
			{	$k[$i+1] =~ s/Base$i  //; }
			my @AA = split(//,$k[1]);
			my @B1 = split(//,$k[2]);
			my @B2 = split(//,$k[3]);
			my @B3 = split(//,$k[4]);
			foreach my $i (0..63)
			{	$CodonTable{$B1[$i].$B2[$i].$B3[$i]} = $AA[$i]; }
		}
	}
	# foreach my $nnn (keys %CodonTable)
	# {  print "$nnn = $CodonTable{$nnn}\n";}
	$/="\n";   # reset back to default before leaving subroutine
}

# - - - - - - - - - - - - - - - - - - - - - - - - - -
sub TranslateFasta
{	# Convert the NT sequence into AAs . . . . . .
	foreach my $header (keys %NTs)
	{	my $protein = "";           
		for (my $i=0; $i <= length($NTs{$header})-2; $i += 3)  
		{	my $codon = substr($NTs{$header},$i,3);      
			my $aa = $CodonTable{$codon};
			# Now check to see if $aa is defined within the codon table
			if (!$aa)    # unless ($aa)                          
			{  $aa = "X";}
			
			$protein .= $aa;
		}
		$PRTs{$header} = $protein;
	}
}

# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
sub AAfreq
{   my $protcount = 0;
	foreach my $name (keys %PRTs)
	{   # Arabidopsis: process $name to reduce complexity . . . . 
			my @n = split(/\|/,$name);
			my $newname = $n[0] . $n[2];
		foreach my $a (@AA)
		{	$AAprotcount{$name}{$a} = 0; $AAfreq{$newname}{$a} = 0; }
		my @aminoacids = split(//,$PRTs{$name});
		my $AAcount = 0;
		foreach my $aa (@aminoacids) 
		{	# make sure we are only counting AAs
			if ($aa =~ m/[ACDEFGHIKLMNPQRSTVWYX]/)     
			{	$AAprotcount{$name}{$aa} += 1;   
				$AAgenomecount{$aa} += 1;
				$AAcount += 1;
			}
		}
		# Only consider proteins > 60 AAs in length . . . . 
		if ($AAcount > 60)
		{	foreach my $aa1 (@AA)
			{	$AAfreq{$newname}{$aa1} = 
				&Round(100*$AAprotcount{$name}{$aa1}/$AAcount);
			}
			$protcount += 1;
		}
	}
	print "There are $protcount proteins in the current AAfreq analysis\n\n";
}
# - - - - - - - - - - - - - - - - - - - - - - - - - -
sub Round
{ 	my $x = $_[0];                       # first argument ($_[0], not the slice @_[0])
	$x = (int(($x*10**3) + 0.5)/10**3);  # round to 3 decimal places; no second "my"
	return $x;
}
# - - - - - - - - - - - - - - - - - - - - - - - - - -
sub FileDump
{   my $file = $_[0];
	open(OUT,">$file");
	pop(@AA);                     # Remove the "X" amino acid from the array
	# Set file header . . . . 
	print OUT "NAME";
	foreach my $a (@AA){ print OUT "\t$a";}
	print OUT "\n";
# Print the AAfreqs to file . . . . . 
	foreach my $name (keys %AAfreq)
	{	print OUT "$name";
		foreach my $a (@AA){ print OUT "\t$AAfreq{$name}{$a}"; }
		print OUT "\n";
	}
	close(OUT);
}
# - - - - - - - - - - - - - - - - - - - - - - - - - -
# - - - - - - - - - - - - - - - - - - - - - - - - - -
# - - - - - EOF - - - - - - - - - - - - - - - - - - -
# - - - - - - - - - - - - - - - - - - - - - - - - - -
# - - - - - - - - - - - - - - - - - - - - - - - - - -
__END__
> 0 Codon Translation Tables
http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi?mode=c#SG1
> 1 Standard
Amino  FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
Base1  TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
Base2  TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
Base3  TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
> 11 Bacteria and Archaea
Amino  FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
Base1  TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
Base2  TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
Base3  TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
}}}
<!--{{{-->
<div class='toolbar' macro='toolbar closeTiddler closeOthers +editTiddler > fields syncing permalink references jump'></div>
<div class='AAuse' macro='tiddler AAuseSubtopicMenu'></div><div class='title' macro='view title'></div>
<div class='viewer' macro='view text wikified'></div><div class='tagClear'></div>
<!--}}}-->
!!!
[[Back to Codon Table Discussion|Glenn01]]
!!!
Here are the nucleotide codes used for ambiguous nucleotide calls in a sequence:
<html><img src="00/ambiguousNTcodes.png" style="height:400px"></html>
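
For use inside a script, the same information can be kept in a hash. The sketch below uses the standard IUPAC ambiguity codes, which I am assuming match the table in the image; the {{{%Ambiguous}}} hash and the {{{IsAmbiguous}}} subroutine are names invented for this example.
{{{
use strict;
use warnings;

# IUPAC ambiguity codes: each letter stands for a set of possible bases
my %Ambiguous = (
	R => "AG",  Y => "CT",  S => "CG",  W => "AT",
	K => "GT",  M => "AC",  B => "CGT", D => "AGT",
	H => "ACT", V => "ACG", N => "ACGT",
);

# returns 1 if the codon has any letter other than A, C, G, or T
sub IsAmbiguous
{	my $codon = $_[0];
	if ($codon =~ m/[^ACGT]/) { return 1; }
	return 0;
}

foreach my $codon ("ATG", "ATN", "GAR")
{	my $flag = &IsAmbiguous($codon) ? "ambiguous" : "ok";
	print "$codon\t$flag\n";
}
print "R can be: $Ambiguous{R}\n";
}}}
A check like this is one way to decide when a codon should be translated as "X" instead of being looked up in the codon table.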
/***
|''Name:''|AnnotationsPlugin|
|''Description:''|Inline annotations for tiddler text.|
|''Author:''|Saq Imtiaz ( lewcid@gmail.com )|
|''Source:''|http://tw.lewcid.org/#AnnotationsPlugin|
|''Code Repository:''|http://tw.lewcid.org/svn/plugins|
|''Version:''|2.0|
|''Date:''||
|''License:''|[[Creative Commons Attribution-ShareAlike 3.0 License|http://creativecommons.org/licenses/by-sa/3.0/]]|
|''~CoreVersion:''|2.2.3|

!!Usage:
*{{{((text to annotate(annotation goes here)}}}
* To include the text being annotated, in the popup as a title, put {{{^}}} as the first letter of the annotation text.
** {{{((text to annotate(^annotation goes here)}}}

!!Examples:
Mouse over, the text below:
* ((banana(the best fruit in the world)))
* ((banana(^ the best fruit in the world)))

***/
// /%
config.formatters.unshift({name:"annotations",match:"\\(\\(",lookaheadRegExp:/\(\((.*?)\((\^?)((?:.|\n)*?)\)\)\)/g,handler:function(w){
this.lookaheadRegExp.lastIndex=w.matchStart;
var _2=this.lookaheadRegExp.exec(w.source);
if(_2&&_2.index==w.matchStart){
var _3=createTiddlyElement(w.output,"span",null,"annosub",_2[1]);
_3.anno=_2[3];
if(_2[2]){
_3.subject=_2[1];
}
_3.onmouseover=this.onmouseover;
_3.onmouseout=this.onmouseout;
_3.ondblclick=this.onmouseout;
w.nextMatch=_2.index+_2[0].length;
}
},onmouseover:function(e){
popup=createTiddlyElement(document.body,"div",null,"anno");
this.popup=popup;
if(this.subject){
wikify("!"+this.subject+"\n",popup);
}
wikify(this.anno,popup);
addClass(this,"annosubover");
Popup.place(this,popup,{x:25,y:7});
},onmouseout:function(e){
removeNode(this.popup);
this.popup=null;
removeClass(this,"annosubover");
}});
setStylesheet(".anno{position:absolute;border:2px solid #000;background-color:#DFDFFF; color:#000;padding:0.5em;max-width:15em;width:expression(document.body.clientWidth > (255/12) *parseInt(document.body.currentStyle.fontSize)?'15em':'auto' );}\n"+".anno h1, .anno h2{margin-top:0;color:#000;}\n"+".annosub{background:#ccc;}\n"+".annosubover{z-index:25; background-color:#DFDFFF;cursor:help;}\n","AnnotationStyles");


// %/
!Passing Arrays to Subroutines by Name Reference
In a standard call to a subroutine, you can simply "send" a variable to it so that it can be used in the subroutine code, like this:
{{{
my $x = 10;
my $y = 15;
my $Sum = &AddNumbers($x,$y);

# - - - - - -  - - -
sub AddNumbers
{
   my ($N1,$N2) = @_;
   my $SUM = $N1 + $N2;
   return($SUM);
}
}}}
!!Array Referencing:
But often you will have an array of numbers that you would like a subroutine to process, and there is a way to pass the "variable name" of an array (rather than all of its elements) by what is called ''Referencing''. In the example below, line 1 defines array ''@X''. Line 2 then calls subroutine ''~AddNumbers'' and sends it the name of the array (using ''\@X'') instead of sending all the elements separately. When the subroutine starts, it assigns the name of the array to the variable ''$arrayname'', and then we can access ''@X'' through ''$arrayname'' by enclosing it within braces: @@{{{@{$arrayname} }}}@@. The advantages of array referencing are: faster processing when subroutines are working with arrays; the ability for manipulations within a subroutine to change the original array; and more transportable/independent subroutines, because they access only their internally defined variables.
{{{
  my @X = (10, 25, 56, 74, 82, 96, 105, 123, 173);
  my $SUM = &AddNumbers(\@X);

# - - - - - - - -  
sub AddNumbers
{
    my $arrayname = $_[0];
    my $sum = 0;
    foreach my $num (@{$arrayname})
    {     $sum += $num;    }
    return($sum);
}
}}}
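
To see the "change the original array" advantage in action, here is a small sketch. The {{{DoubleNumbers}}} subroutine is made up for this example; the point is only that the subroutine works on ''@X'' itself, not on a copy.
{{{
use strict;
use warnings;

my @X = (10, 25, 56);
&DoubleNumbers(\@X);          # send the reference, not the elements
print "@X\n";

# - - - - - - - -
sub DoubleNumbers
{	my $arrayname = $_[0];
	# ${$arrayname}[$i] reaches element $i of the original array
	foreach my $i (0..$#{$arrayname})
	{	${$arrayname}[$i] = 2 * ${$arrayname}[$i]; }
}
}}}
Nothing is returned from the subroutine, yet the values printed from ''@X'' afterward are the doubled ones.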

This works for hashes as well (see script LODscore). Just remember that inside the subroutine the hash name now needs to be enclosed in ''{...}''. In this example, the name of the hash is stored in the variable ''$Pvalues'' inside the subroutine, so we can get to the hash elements by using: @@{{{ ${$Pvalues}{$nt} }}}@@.
{{{
my $Pcod = &Pcalc($query,\%Pcode);

#-----------------
sub Pcalc
{	my ($QS,$Pvalues) = @_;
	my @QS = split(//,$QS);
	my $prob = 1; 
	foreach my $nt (@QS)
	{	$prob = $prob * ${$Pvalues}{$nt}; }
	return $prob;
}
}}}
If you needed a foreach loop to iterate through the hash values, it would look like this within the subroutine block:
{{{
    # within subroutine . . . . .
    foreach my $x (keys %{$Pvalues})
    {        ... code to work with ${$Pvalues}{$x} ..... ; }
}}}
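Here is that loop filled in as a complete little script you can run. The {{{%Ncount}}} hash, its numbers, and the {{{SumValues}}} subroutine are invented for this example.
{{{
use strict;
use warnings;

# made-up nucleotide counts, for illustration only
my %Ncount = (A => 3, C => 2, G => 2, T => 3);
my $total = &SumValues(\%Ncount);
print "Total = $total\n";

#-----------------
sub SumValues
{	my $Pvalues = $_[0];            # holds the reference to the hash
	my $sum = 0;
	foreach my $x (sort keys %{$Pvalues})
	{	$sum += ${$Pvalues}{$x}; }
	return $sum;
}
}}}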


!
[[BACK to BLAST project page|BLASTproject]]
!!!
!Evalue and BIT analysis:
Here are the compiled evalues and bit scores for the 10-gene trial data set for //A. marina//:
<html><table><tr>
<td><img src="11/AmrinaNULLplot.png" style="height:300px"></td>
<td><img src="11/AmrinaNULLplot-log.png" style="height:300px"></td>
</tr></table></html>
!!!
Here are the frequency distributions for each variable. The sample size is too small to draw any real conclusions. 
<html><table><tr>
<td><img src="11/AmrinaNULLplot-BitDist.png" style="height:300px"></td>
<td><img src="11/AmrinaNULLplot-EvalueDist.png" style="height:300px"></td>
</tr></table></html>
!!!
Another way to think about it is that you want to generate a distribution for these values that could be represented as a boxplot . . . . what are outliers?
<html>
<img src="11/AmrinaNULLplot-BoxPlotDist.png" style="height:400px"></html>
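
If you want to put numbers on "what are outliers?", one common convention (not necessarily the one used for the plot above) is the boxplot rule: anything more than 1.5 times the interquartile range beyond the quartiles gets flagged. The sketch below assumes that rule, uses made-up bit scores, and uses a deliberately simple quartile estimate with no interpolation; the {{{Quantile}}} subroutine is invented for this example.
{{{
use strict;
use warnings;

# made-up bit scores, for illustration only
my @BITS = (27, 28, 28, 29, 30, 30, 31, 32, 33, 60);
my @sorted = sort { $a <=> $b } @BITS;

# simple quartile estimate: take the value nearest a fractional position
sub Quantile
{	my ($values, $fraction) = @_;
	my $index = int($fraction * $#{$values} + 0.5);
	return ${$values}[$index];
}

my $Q1  = &Quantile(\@sorted, 0.25);
my $Q3  = &Quantile(\@sorted, 0.75);
my $IQR = $Q3 - $Q1;
my $low  = $Q1 - 1.5 * $IQR;      # lower fence
my $high = $Q3 + 1.5 * $IQR;      # upper fence

my @outliers = grep { $_ < $low or $_ > $high } @sorted;
print "Q1=$Q1  Q3=$Q3  fences=[$low, $high]\n";
print "Outliers: @outliers\n";
}}}
Statistics packages such as R interpolate their quartiles, so expect small differences from this version on real data.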
!
[[Back to Lecture 10|L10]]
!!!
!BLAST Database Construction
A companion program to BLAST is the database formatting utility called: ''formatdb''. This program allows one to make custom ~BLAST-compatible sequence databases (either NT or AA) for use with the BLASTALL packages. This utility reads standard FASTA formatted files and extracts and compresses the sequence data into a new file.

''1.'' Upload this FASTA file to your SANDBOX folder: [[download|10/Acaryochloris-marina-PID12997-cd95.faa]]

''2.'' Upload this query sequence file to your SANDBOX folder: [[download|10/UnknownSeqs-L10.faa]]

''3.'' We can create a local database of the AA sequences from //Acaryochloris// using just two simple options:
{{engindent{A.  We specify the name of the input FASTA file with ''"-i"''}}}{{engindent{B.  We specify the name of the output database with ''"-n"''}}}{{{
prompt> formatdb -i Acaryochloris-marina-PID12997-cd95.faa -n myDB
}}}

''4.'' Now we can run a BLAST query using ''__myDB__'' with:
{{{
blastall -p blastp -d myDB -i $inseq -o $outfile -e 1e-3 -m $j -v 20 -b 20
}}}
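The {{{$inseq}}}, {{{$outfile}}}, and {{{$j}}} placeholders above are meant to be filled in from a PERL script. A rough sketch of that idea follows. The output file names and the two example values for {{{-m}}} are assumptions of mine, so check the blastall help for the output formats your version supports. The sketch only prints the commands; the {{{system}}} line is commented out so that it runs even where BLAST is not installed.
{{{
use strict;
use warnings;

my $inseq = "UnknownSeqs-L10.faa";     # query file from step 2
my @commands;

foreach my $j (0, 8)                   # example -m values only (assumed)
{	my $outfile = "myDB-results-m$j.txt";    # hypothetical output name
	my $cmd = "blastall -p blastp -d myDB -i $inseq -o $outfile -e 1e-3 -m $j -v 20 -b 20";
	push(@commands, $cmd);
	print "$cmd\n";
	# system($cmd) == 0 or die "blastall failed: $?\n";   # uncomment to actually run
}
}}}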
!
[[BACK to BLAST project page|BLASTproject]]
!!!
!BLAST Parser for E-value and BIT scores
This script was first discussed in Lecture 10 (L10) and presented here: BLASTparse_script.
{{{
#!/usr/bin/perl
use strict;
$|=1;

# - - - - - H E A D E R - - - - - - - - - - - - - - - - - - -
# BLAST parser to extract e-values from NULL distribution query
# AGM2008

# - - - - - U S E R   V A R I A B L E S - - - - - - - - - - -
my $infile  = "AmarinaNULL-1000.txt";
my $outfile = "AmarinaNULL-1000-data.txt";

# - - - - - G L O B A L  V A R I A B L E S  - - - - - -
my @BLAST;
my @EVAL;
my @BITS;

# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# - - - - - M A I N - - - - - - - - - - - - - - - - - - - - -
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
print "\n\nIt's all down hill from here . . . \n\n";

&ReadBlast($infile);

my $orfcount = 1;
foreach my $blast (@BLAST)
{	my @lines = split(/\n/,$blast);
	# Drop all the header lines before the target data . . . . 
	my $start = 0;
	my $skip = 0;
	while ($start == 0 && @lines)   # stop if the record runs out of lines (avoids an endless loop)
	{	if ($lines[0] =~ m/^Query/)
		{	print "$orfcount. $lines[0]\n"; $orfcount += 1; }
		
		if ($lines[0] =~ m/^Sequences producing significant alignments/)
		{	$start = 1; }
		elsif ($lines[0] =~ m/No hits found/)
		{	$start = 1; $skip = 1; }
		else
		{	shift(@lines); }
	}
	
	if ($skip == 0)
	{	# Cherry pick the bit and e-value . . . .
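		# Hit lines begin two lines below the "Sequences producing" header;
		# indices 2..10 cover at most 9 hits per query.
		my $count = 2;
		while ($count < 11)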
		{	chomp($lines[$count]);
			if ($lines[$count] =~ m/(\d+)   ([\.\d]+).{0,3}$/)
			{	push(@BITS, $1); #print "       >$1<\n";
				push(@EVAL, $2); #print "       >$2<\n";
				$count += 1;
			}	
			else
			{	$count = 12; }
		}
	}
}

my $count = $#BITS + 1;
print "\n\nThere are $count values in the bit and evalue arrays\n";

open(OUT,">$outfile") or die "\n\n Cannot write $outfile: $!\n\n";
print "\n\nData writing to $outfile . . . . ";
print OUT "BITS\tEVALS\n";
foreach my $i (0..$#BITS)
{	print OUT "$BITS[$i]\t$EVAL[$i]\n"; }
close(OUT);

print "\n\n\n D O N E  \n\n\n";

# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# - - - - - S U B R O U T I N E S - - - - - - - - - - - - - -
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

# - - - - - - - - - - - - - - - - - - - - - - - - - -
sub ReadBlast
{	my $file = shift;
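	local $/ = "BLASTP 2.2.18 [Mar-02-2008]";   # split records on the report banner; restored when the sub returns
	open(IN,"<$file") or die "\n\n Cannot open $file: $!\n\n";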
	@BLAST = <IN>;
	close(IN);
	shift(@BLAST);
}
# - - - - - - - - - - - - - - - - - - - - - - - - - -


# ---------------------------------------------------------
# - - - - - EOF - - - - - - - - - - - - - - - - - - - - - - 
# ---------------------------------------------------------

}}}

!
!Parsing BLAST results
BLAST output formats are rigidly predictable (computer generated). This makes them ideal text files for scripted parsing.

# Run the BLAST runner script from Lecture 9 (L09) on your Biowolf account to generate some BLAST output files. The basic difference in the output options of the 10 formats relates to how much information is included for each individual local alignment and how those alignments are formatted. 
# We will write a script to retrieve the top 10 alignment scores (both bit values and e-values) using the "0" format.
# Parsing script from class: [[BLASTparse_script]]

[[download BLAST output file|10/BLASTout-12NOV-0.txt]]
[[BACK to Lecture 10|L10]]  
[[Back to Lecture 10|BLASTparse]]
[[Back to BLAST project|BLASTproject]]
!!!
!BLAST Parser
The simple BLAST parser we worked on in class was coded to extract the bit and e-value data for the top 10 BLAST hits. Although the script is simple, the real "extraction" work is done by a complex regex:
@@{{{$B[$i] =~ m/ref\|(.+)\|.+\s+(\d+)\s+(.+)$/;}}}@@
This is what a target line in the BLAST table looks like:
{{{ref|YP_463218.1| DNA replication and repair protein RecF [Anaero...   250   3e-65}}}
...so the regex is designed to match the line so that the following quantities are captured within the parentheses and saved to memory ($1, $2, $3): gene ID (YP_463218.1), bit score (250), e-value (3e-65).
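As a quick check, the same pattern can be run against the sample line on its own. This is a minimal sketch; the variable names are illustrative and not taken from the class script.
{{{
#!/usr/bin/perl
use strict;
use warnings;

# The sample target line shown above
my $line = 'ref|YP_463218.1| DNA replication and repair protein RecF [Anaero...   250   3e-65';

my ($id, $bits, $evalue);
if ($line =~ m/ref\|(.+)\|.+\s+(\d+)\s+(.+)$/)
{	# Copy the captures right away; $1..$3 are overwritten by the next successful match
	($id, $bits, $evalue) = ($1, $2, $3);
	print "id=$id  bits=$bits  evalue=$evalue\n";
}
else
{	print "no match\n"; }
}}}
Testing the match inside an "if" matters because a failed match leaves $1, $2 and $3 holding whatever the previous successful match captured.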

Here's the script we worked on during lecture:
{{{
#!/usr/bin/perl
use strict;
$|=1;      # forces print output to be sent to screen in real-time

# - - - - - H E A D E R - - - - - - - - - - - - - - - - -
# Parse BLAST results for bit and e-values.
# - - - - - U S E R    V A R I A B L E S - - - - - - - -
my $blastin = "BLASTout-12NOV-0.txt";

# - - - - - G L O B A L  V A R I A B L E S  - - - - - -
my %BITS;
my %EVS;

print "\n\nWe all live in a yellow submarine . . . . \n\n";
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# - - - - - M A I N - - - - - - - - - - - - - - - - - - - -
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

# Open the blast output file . . . . . . 
	open(BLAST,"<$blastin") or die "\n\n Nada $blastin\n\n";
	my @B = <BLAST>;
	close(BLAST);

# Drop all the header lines before the target data . . . . 
	my $start = 0;
	while ($start == 0 && @B)   # stop if the file runs out of lines (avoids an endless loop)
	{	if ($B[0] =~ m/^Sequences producing significant alignments/)
		{	$start = 1; }
		else
		{	shift(@B); }
	}
	
# Cherry pick the bit and e-value . . . .
	foreach my $i (2..11)
	{	# Store values only when the line matches, so stale $1..$3 are never saved
		if ($B[$i] =~ m/ref\|(.+)\|.+\s+(\d+)\s+(.+)$/)
		{	$BITS{$1} = $2;
			$EVS{$1} = $3;
		}
	}

# Quick print check . . . . . 	
	foreach my $id (sort keys %BITS)
	{	print " $id  bits=$BITS{$id}   eval=$EVS{$id}\n"; }


print "\n\n    * * * D O N E * * *\n\n";
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# - - - - - S U B R O U T I N E S - - - - - - - - - - -
# - - - - - - - - - - - - - - - - - - - - - - - - - -
sub wait
{	print "\n    waiting for Godot . . . \n";
	my $wait = <STDIN>;
	return;
}
# ---------------------------------------------------------
# - - - - - EOF - - - - - - - - - - - - - - - - - - -
}}}


!
!BLAST review in Genome Biology
<html><img src="10/genomeresearchtitle.png" style="height:175px"></html>
[[download pdf|10/BLAST-GenomeBiology.pdf]]
[[BACK to Lecture 10|L10]]
!
!BLAST Code Project

''1. Create a local BLAST database for your genome''
From the BLASTdb example, we can create a local database of the AA sequences from the genome file of //Acaryochloris// ([[download|10/Acaryochloris-marina-PID12997-cd95.faa]]) using the FORMATDB command and just two simple options:
{{engindent{a.  We specify the name of the input FASTA file with ''"-i"''}}}{{engindent{b.  We specify the name of the output database with ''"-n"''}}}{{{
prompt> formatdb -i Acaryochloris-marina-PID12997-cd95.faa -n myDB
}}}

@@''TASK 1:''@@  
__//Create a local database structure as above for: ~Acaryochloris-marina-PID12997-cd95.faa  . . . . OR  . . . . use a genome of your choice.//__

!!!
''2. Generate random peptide of length N given the %GC and AA table of target genome''
The script to generate the random protein sequences is posted here: ProteinRandomizer. You will need to copy it to your Biowolf SANDBOX folder. The nucleotide sequence composition for the //Acaryochloris// genome is:
| A = 0.257, G = 0.243, T = 0.260, C = 0.250 |
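To show how a base composition can drive random peptide generation, here is a minimal sketch. It is not the ProteinRandomizer script: it draws random codons from the base frequencies above, translates them with the standard genetic code, skips stop codons, and ignores any amino acid table. The quoted frequencies sum to 1.010 rather than 1, so the sketch draws against their total; the subroutine names are hypothetical.
{{{
#!/usr/bin/perl
use strict;
use warnings;

# Base frequencies quoted above for Acaryochloris
my %FREQ = (A => 0.257, G => 0.243, T => 0.260, C => 0.250);
my $TOTAL = 0;
$TOTAL += $_ foreach (values %FREQ);

# Standard genetic code, codons ordered T,C,A,G at each position
my @base = qw(T C A G);
my $aas  = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";
my %CODON;
my $n = 0;
foreach my $b1 (@base)
{	foreach my $b2 (@base)
	{	foreach my $b3 (@base)
		{	$CODON{"$b1$b2$b3"} = substr($aas, $n, 1);
			$n += 1;
		}
	}
}

# Draw one base with probability proportional to its frequency
sub random_base
{	my $r   = rand($TOTAL);
	my $sum = 0;
	foreach my $nt (sort keys %FREQ)
	{	$sum += $FREQ{$nt};
		return $nt if ($r < $sum);
	}
	return "T";     # fallback for floating-point rounding at the upper edge
}

# Build a peptide of the requested length from random codons
sub random_peptide
{	my $len = shift;
	my $pep = "";
	while (length($pep) < $len)
	{	my $aa = $CODON{ random_base() . random_base() . random_base() };
		next if ($aa eq "*");       # skip stop codons
		$pep .= $aa;
	}
	return $pep;
}

foreach my $i (1..3)
{	print ">random_$i\n", random_peptide(30), "\n"; }
}}}
The output changes on every run because rand() is seeded differently each time; calling srand() with a fixed number at the top makes a run repeatable.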

@@''TASK 2:''@@ 
//Run the randomizer script to generate a FASTA file with 100 random protein sequences in it.// 
!!!

''3. BLAST each peptide against the genome database''
Now run a BLAST job with that protein FASTA file against the local database you've created. Set the -e option fairly high to get rough matches. Use the default output format of "0".
{{{
blastall -p blastp -d myDB -i $inseq -o $outfile -e 1 -m 0
}}}
Here's an example shell script to use for submitting a single BLAST job to the Biowolf cluster: BLASTshell

@@''TASK 3:''@@ 
//Run the BLAST search/alignment.// 
!!!

''4. Compile the //bit// and //evalue// returns''
In Lecture 10 (L10) we went over a BLAST script to retrieve bit scores and e-values from BLAST output. Here, that script is further developed to work iteratively on a BLAST output file that has multiple query sequences: BLASTgrabber
@@''TASK 4:''@@ 
//Retrieve and compile the bit and evalue scores for the random alignments.// 
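One way to work through a report that holds many queries is to change the input record separator so that each read returns one whole query report instead of one line. The sketch below shows that idea on a made-up, heavily shortened report held in a string; the banner text is assumed to match your BLAST version exactly.
{{{
#!/usr/bin/perl
use strict;
use warnings;

# Made-up stand-in for a multi-query BLAST report (real reports are much longer)
my $report = "BLASTP 2.2.18 [Mar-02-2008]\nQuery= rand1\n ***** No hits found ******\n"
           . "BLASTP 2.2.18 [Mar-02-2008]\nQuery= rand2\nSequences producing significant alignments:\n";

my @records;
{	local $/ = "BLASTP 2.2.18 [Mar-02-2008]";    # record separator, restored at the end of this block
	open(my $fh, "<", \$report) or die "Cannot read report: $!";
	@records = <$fh>;
	close($fh);
}
shift(@records);     # the first chunk holds only the banner itself

foreach my $rec (@records)
{	my ($name) = $rec =~ m/^Query=\s*(\S+)/m;
	my $status = ($rec =~ m/No hits found/) ? "no hits" : "has hits";
	print "$name: $status\n";
}
}}}
Each record ends with the banner of the next report, which is harmless here because the parsing only looks at the Query and hit-table lines.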
!!!

''5. Plot data, frequency distribution of bits and evalues''
BLAST data analysis . . . . BLASTdata1
@@''TASK 5:''@@ 
//Discover something novel about genome organization . . . . // 
!
BLAST stats

''1. Objective:''  for any given genome, we want to know the probability that a random AA sequence (of fixed N-mer length) will match a local gene domain within that genome. 

''2. Approach:''
** Create a local BLAST database for your genome
** Generate random peptide of length N given the %GC and AA table of target genome
*** //This will be provided to you//
** BLAST each peptide against the genome database
** Compile the //bit// and //evalue// returns
** Plot data, frequency distribution of bits and evalues as a function of N-mer length

''3. Description:''
** Go to the BLASTproject page
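Once the null scores are compiled, the probability in the objective can be estimated empirically as the fraction of random-query scores that reach a given value. The sketch below assumes that simple counting estimate, which is one possible reading of the objective and not a method given in class; the scores are made up and the subroutine name is hypothetical.
{{{
#!/usr/bin/perl
use strict;
use warnings;

# Made-up bit scores from random (null) queries, illustration only
my @null = (18, 19, 19, 21, 22, 22, 23, 25, 26, 30);

# Fraction of null scores that are at least as large as the observed score
sub empirical_p
{	my ($obs, @scores) = @_;
	my $hits = grep { $_ >= $obs } @scores;     # grep in scalar context returns a count
	return $hits / scalar(@scores);
}

foreach my $obs (20, 25, 40)
{	printf "P(bits >= %d) = %.2f\n", $obs, empirical_p($obs, @null); }
}}}
With only a handful of null scores the estimate is coarse: a result of 0 only means that no random query scored that high in this sample.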

[[Back to Lecture 10|L10]]
!
[[Back to Lecture 9|L09.01]]
!!!
!BLAST Runner
Batch script to send multiple BLAST jobs to the Biowolf SGE.
{{{
#!/usr/bin/perl
use strict;
$|=1;

# - - - - - H E A D E R - - - - - - - - - - - - - - - -
# Submit batch BLAST queries.

# - - - - - U S E R   V A R I A B L E S - - - - - - - -
my $inseq = "UnknownSeqs.faa";
my $datetag = "12NOV";

# - - - - - G L O B A L  V A R I A B L E S  - - - - - -
my $blastshell = "#!/bin/sh
#\$ -cwd
#\$ -S /bin/sh
#\$ -j y
#\$ -pe threaded 4
#\$ -m bae
#\$ -M amarsh\@udel.edu
#\$ -N ";

# - - - - - M A I N - - - - - - - - - - - - - - - - - -
print "\n\nSTART: Imagine all the people, living for today . . .  \n\n";

foreach my $j (0..9)
{	my $RUNname = "Bpuke-".$j;
	my $outfile = "BLASTout-".$datetag."-".$j.".txt";
	open(OUT,">BLASTshell.sh") or die "\n\n Cannot write BLASTshell.sh: $!\n\n";
	print OUT $blastshell;
	print OUT "$RUNname\n\n";
	print OUT "blastall -p blastp -d nr -i $inseq -o $outfile -e 1e-3 -m $j -v 20 -b 20\n";
	close(OUT);
	
	print `qsub BLASTshell.sh`;
}

# - - - - - S U B R O U T I N E S - - - - - - - - - - -

# - - - - - EOF - - - - - - - - - - - - - - - - - - -
}}}

!
[[BACK to BLAST project|BLASTproject]]
!!!
!BLAST shell script:

{{{
#!/bin/sh
#$ -cwd
#$ -S /bin/sh
#$ -j y
#$ -pe mpi 4
#$ -M username@udel.edu
#$ -m bae
#$ -N Bnull

blastall \
-p blastp \
-d Amarina_DB \
-i RandomSeqs-02DEC.txt \
-o AmarinaNULL-02DEC.txt \
-e 10 \
-m 0
}}}
!
!Bioinformatics Core at DBI:
''Biowolf'' is the 286-core parallel computing cluster maintained at the Delaware Biotechnology Institute. Class accounts will be assigned to any participants interested in learning how to utilize this resource.

When you are ready to start using the SUN GRID ENGINE to submit jobs to run on Biowolf, here's a detailed overview of the system provided by Doug O'Neal: 
''A Hitchhiker's Guide to Biowolf:'' http://bioit.dbi.udel.edu/howto/sge

!!!Getting STARTED with your new account:
# Your "username" and password were emailed to you. They are the first two entries in the text line. 
** Your username is "class" followed by two digits, like "class05"
** Your password is the random 6 character string of letters and numbers
# Make an __[[SSH]]__ connection in a command window
** @@{{{ ssh username@biowolf.dbi.udel.edu }}}@@
** You will be prompted for your initial password
# Change your password:
** The command prompt will look like: @@{{{classXX@biowolf ~$ }}}@@
** Type the command ''passwd'', hit <enter>, and follow the prompts. 

Now add some default folders to your home directory as follows:
{{{
    prompt> mkdir 01-DATA 
    prompt> mkdir 02-SCRIPTS 
    prompt> mkdir 03-SANDBOX
}}}
* ''01-DATA''
**  You will use this folder to store data files
* ''02-SCRIPTS''
** You will use this folder to store scripts
* ''03-SANDBOX''
** You will use this folder as your working folder for editing and running scripts
** When you are done with an analysis, you will copy the current version of your script back into "02-SCRIPTS" and move any of the data output generated into an appropriate folder in "01-DATA". Then you will delete whatever is left in "03-SANDBOX" to leave it clean for your next code session.
!!!FILE Transfer
It is important that in addition to the command window you have established above, you are also able to use SSH to move files between your Biowolf account and your own computer. 
| TASK 1: Copy all the scripts you have in this class to your "02-SCRIPTS" folder. |
This is accomplished with different interface programs for each OS platform. But basically you want a GUI showing you a current folder on your computer and a current folder on Biowolf so you can easily navigate between folders and copy files with a simple drag-drop mouse action. 
* On PCs the GUI you want to run is "SSH-client" (I think), which you can get from UD Network: [[SSH]].
* On Macs, you want the MacFuse plugin for SSH: see [[SSH]].
* On Unix/Linux, I use KDE's Konqueror, which allows for {{{fish://user@biowolf.dbi.udel.edu}}} connections. 

!!!Basic UNIX commands
Work through this simple @@[[UNIX TUTORIAL|http://www.ee.surrey.ac.uk/Teaching/Unix/unix1.html]]@@ to get an introduction to command window controls.
These are the basic commands you will need to use:
{{{
commands:
      ls  = list files aka "dir"
      rm = remove file aka "delete"
      cd .. = change directory up one level
      cd foldername = change directory down one level to foldername
      mkdir foldername = make directory foldername
}}}

!!!
Here's a skeleton outline for your PERL scripts:
{{{
#!/usr/bin/perl
use strict;

# - - - - - H E A D E R - - - - - - - - - - - - - - - - -

# - - - - - U S E R    V A R I A B L E S - - - - - - - -

# - - - - - G L O B A L  V A R I A B L E S  - - - - - -

# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# - - - - - M A I N - - - - - - - - - - - - - - - - - - - -
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# - - - - - S U B R O U T I N E S - - - - - - - - - - -
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

# - - - - - EOF - - - - - - - - - - - - - - - - - - - - - -
}}}

# SETUP: The file has to start with a "shebang" line. This line tells the OS what interpreter to use for executing the following script commands. Here, "/usr/bin/perl" designates the name and file location of the binary executable file. Then the modules or packages to be included during the run are listed. Here the "strict" package is invoked to provide better error recognition and reporting from the perl interpreter.
# HEADER: describe what the script actually does.
# USER VARS: whatever the user has to do to easily run the program should be put here.
# GLOBAL VARS: declare the most important variables that will be used up front.
# MAIN: this is the actual program.
# SUBROUTINES: functions defined within the script.
# EOF: just flags the end of the file.
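Here is a minimal sketch of the skeleton filled in with a toy task, reporting the GC fraction of a DNA string. The sequence and the subroutine name are made up for illustration; the point is only where each kind of code goes.
{{{
#!/usr/bin/perl
use strict;

# - - - - - H E A D E R - - - - - - - - - - - - - - - - -
# Report the GC fraction of a DNA string (toy example).

# - - - - - U S E R    V A R I A B L E S - - - - - - - -
my $dna = "ATGCGCTAAGGC";      # made-up sequence

# - - - - - G L O B A L  V A R I A B L E S  - - - - - -
my $gc;

# - - - - - M A I N - - - - - - - - - - - - - - - - - - - -
$gc = &GCfraction($dna);
printf "GC fraction = %.3f\n", $gc;

# - - - - - S U B R O U T I N E S - - - - - - - - - - -
sub GCfraction
{	my $seq = shift;
	$seq = uc($seq);
	my $count = ($seq =~ tr/GC//);     # tr returns the number of characters it matched
	return $count / length($seq);
}

# - - - - - EOF - - - - - - - - - - - - - - - - - - - - - -
}}}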
!
[[Go Back to Lecture 01|L01.02]]
[[Go Back to Lecture 02|L02]]
!
/***
|Name|CheckboxPlugin|
|Source|http://www.TiddlyTools.com/#CheckboxPlugin|
|Version|2.2.5|
|Author|Eric Shulman - ELS Design Studios|
|License|http://www.TiddlyTools.com/#LegalStatements <<br>>and [[Creative Commons Attribution-ShareAlike 2.5 License|http://creativecommons.org/licenses/by-sa/2.5/]]|
|~CoreVersion|2.1|
|Type|plugin|
|Requires||
|Overrides||
|Description|Add checkboxes to your tiddler content|
This plugin extends the TiddlyWiki syntax to allow definition of checkboxes that can be embedded directly in tiddler content.  Checkbox states are preserved by either:
* automatically modifying the tiddler content (deprecated)
* or, by setting/removing tags on specified tiddlers,
* or, by setting custom field values on specified tiddlers,
* or, by saving to a locally-stored cookie ID.
When an ID is assigned to the checkbox, it enables direct programmatic access to the checkbox DOM element, as well as creating an entry in TiddlyWiki's config.options[ID] internal data.  In addition to tracking the checkbox state, you can also specify custom javascript for programmatic initialization and onClick event handling for any checkbox, so you can provide specialized side-effects in response to state changes.
!!!!! Inline wiki-syntax usage
<<<
//{{{
[ ]or[_] and [x]or[X]
//}}}
Simple checkboxes using 'Inline X' storage.  The current unchecked/checked state is indicated by the character between the {{{[}}} and {{{]}}} brackets ("_" means unchecked, "X" means checked).  When you click on a checkbox, the current state is retained by directly modifying the tiddler content to place the corresponding "_" or "X" character in between the brackets.
>//''NOTE: 'Inline X' syntax has been deprecated...''  This storage format only works properly for checkboxes that are directly embedded and accessed from content in a single tiddler.  However, if that tiddler is 'transcluded' into another (by using the {{{<<tiddler TiddlerName>>}}} macro), the 'Inline X' will be ''erroneously stored in the containing tiddler's source content, resulting in corrupted content in that tiddler.''  For anything but the most simple of "to do list" uses, you should select from the various alternative storage methods described below...//
//{{{
[x=id]
//}}}
Assign an optional ID to the checkbox so you can use {{{document.getElementById("id")}}} to manipulate the checkbox DOM element, as well as tracking the current checkbox state in {{{config.options["id"]}}}.  If the ID starts with "chk" the checkbox state will also be saved in a cookie, so it can be automatically restored whenever the checkbox is re-rendered (overrides any default {{{[x]}}} or {{{[_]}}} value).  If a cookie value is kept, the "_" or "X" character in the tiddler content remains unchanged, and is only applied as the default when a cookie-based value is not currently defined.
//{{{
[x(title|tag)] or [x(title:tag)]
//}}}
Initializes and tracks the current checkbox state by setting or removing a particular tag value from a specified tiddler.  If you omit the tiddler title (and the | or : separator), the specified tag is assigned to the current tiddler.  If you omit the tag value, as in {{{(title|)}}}, the default tag, {{{checked}}}, is assumed.  Omitting both the title and tag, {{{()}}}, tracks the checkbox state by setting the "checked" tag on the current tiddler.  When tag tracking is used, the "_" or "X" character in the tiddler content remains unchanged, and is not used to set or track the checkbox state.  If a tiddler title named in the tag does not exist, the checkbox state defaults to the "inline X" value.  If this value is //checked//, or is subsequently changed to //checked//, it will automatically create the missing tiddler and then add the tag to it.  //''NOTE: beginning with version 2.1.2 of this plugin, the "|" separator is the preferred separator between the title and tag name, as it avoids syntactic ambiguity when ":" is used within tiddler titles or tag names.''//
//{{{
[x(field@tiddler)]
//}}}
Initializes and tracks the current checkbox state by setting a particular custom field value from a specified tiddler.  If you omit the tiddler title (but not the "@" separator), the specified field on the current tiddler is used.  If you omit the field name, as in {{{(@tiddler)}}}, a default fieldname of {{{checked}}} is assumed.  Omitting both the field and the tiddler title, {{{(@)}}}, defaults to setting the "checked" field on the current tiddler.  When field tracking is used, the "_" or "X" character in the tiddler content remains unchanged, and is not used to set or track the checkbox state.  If the tiddler title named in the parameter does not exist, the checkbox state defaults to the "inline X" value.  If this value is //checked// or is subsequently changed to //checked//, it will automatically create the missing tiddler and then add the field to it.
//{{{
[x{javascript}{javascript}]
//}}}
You can define optional javascript code segments to add custom initialization and/or 'onClick' handling to a checkbox.  The current checkbox state (and its other DOM attributes) can be set or read from within these code segments by reference to the default context-object, 'this'.

The first code segment will be executed when the checkbox is initially displayed, so that you can programmatically determine its starting checked/unchecked state.  The second code segment (if present) is executed whenever the checkbox is clicked, so that you can perform programmed responses or intercept and override the checkbox state based on complex logic using the TW core API or custom functions defined in plugins (e.g. testing a particular tiddler title to see if certain tags are set or setting some tags when the checkbox is clicked).

Note: if you want to use the default checkbox initialization processing with a custom onclick function, use this syntax: {{{ [x=id{}{javascript}] }}} 
<<<
!!!!! Macro usage
<<<
In addition to embedded checkboxes using the wiki syntax described above, a ''macro-based syntax'' is also provided, for use in templates where wiki syntax cannot be directly used.  This macro syntax can also be used in tiddler content, as an alternative to the wiki syntax.  When embedded in [[PageTemplate]], [[ViewTemplate]], or [[EditTemplate]] (or custom alternative templates), use the following macro syntax:
//{{{
<span macro="checkbox target checked id onInit onClick"></span>
//}}}
or, when embedded in tiddler content, use the following macro syntax:
//{{{
<<checkbox target checked id onInit onClick>>
//}}}
where:
''target''
>is either a tag reference (e.g., ''tagname|tiddlername'') or a field reference (e.g. ''fieldname@tiddlername''), as described above.
''checked'' (optional)
>is a keyword that sets the initial state of the checkbox to "checked".  When omitted, the default checkbox state is "unchecked".
''id'' (optional)
>specifies an internal config.options.* ID, as described above.  If the ID begins with "chk", a cookie-based persistent value will be created to track the checkbox state in between sessions.
''onInit'' (optional)
>contains a javascript event handler to be performed when the checkbox is initially rendered (see details above).
''onClick'' (optional)
>contains a javascript event handler to be performed each time the checkbox is clicked (see details above).
>//note: to use the default onInit handler with a custom onClick handler, use "" (empty quotes) as a placeholder for the onInit parameter//
<<<
!!!!!Examples
<<<
''checked and unchecked static default ("inline X") values:''
//{{{
[X] label
[_] label
//}}}
>[X] label
>[_] label
''document-based value (id='demo', no cookie):''
//{{{
[_=demo] label
//}}}
>[_=demo] label
''cookie-based value  (id='chkDemo'):''
//{{{
[_=chkDemo] label
//}}}
>[_=chkDemo] label
''tag-based value (TogglyTagging):''
//{{{
[_(CheckboxPlugin|demotag)]
[_(CheckboxPlugin|demotag){this.refresh.tagged=this.refresh.container=false}]
//}}}
>[_(CheckboxPlugin|demotag)] toggle 'demotag' (and refresh tiddler display)
>[_(CheckboxPlugin|demotag){this.refresh.tagged=this.refresh.container=false}] toggle 'demotag' (no refresh)
''field-based values:''
//{{{
[_(demofield@CheckboxPlugin)] demofield@CheckboxPlugin
[_(demofield@)] demofield@ (equivalent to demofield@ current tiddler)
[_(checked@CheckboxPlugin)] checked@CheckboxPlugin
[_(@CheckboxPlugin)] @CheckboxPlugin
[_(@)] @ (equivalent to checked@ current tiddler)
//}}}
>[_(demofield@CheckboxPlugin)] demofield@CheckboxPlugin
>[_(demofield@)] demofield@ (current tiddler)
>[_(checked@CheckboxPlugin)] checked@CheckboxPlugin
>[_(@CheckboxPlugin)] @CheckboxPlugin
>[_(@)] toggle field: @ (defaults to "checked@here")
>click to view current: <<toolbar fields>>
''custom init and onClick functions:''
//{{{
[X{this.checked=true}{alert(this.checked?"on":"off")}] message box with checkbox state
//}}}
>[X{this.checked=true}{alert(this.checked?"on":"off")}] message box with checkbox state
''retrieving option values:''
>config.options['demo']=<script>return config.options['demo']?"true":"false";</script>
>config.options['chkDemo']=<script>return config.options['chkDemo']?"true":"false";</script>
<<<
!!!!!Configuration
<<<
Normally, when a checkbox state is changed, the affected tiddlers are automatically re-rendered, so that any checkbox-dependent dynamic content can be updated.  There are three possible tiddlers to be re-rendered, depending upon where the checkbox is placed, and what kind of storage method it is using.
*''container'': the tiddler in which the checkbox is displayed. (e.g., this tiddler)
*''tagged'': the tiddler that is being tagged (e.g., "~MyTask" when tagging "~MyTask:done")
*''tagging'': the "tag tiddler" (e.g., "~done" when tagging "~MyTask:done")
You can set the default refresh handling for all checkboxes in your document by using the following javascript syntax either in a systemConfig plugin, or as an inline script.  (Substitute true/false values as desired):
{{{config.checkbox.refresh = { tagged:true, tagging:true, container:true };}}}

You can also override these defaults for any given checkbox by using an initialization function to set one or more of the refresh options.  For example:
{{{[_{this.refresh.container=false}]}}}
<<<
!!!!!Installation
<<<
import (or copy/paste) the following tiddlers into your document:
''CheckboxPlugin'' (tagged with <<tag systemConfig>>)
<<<
!!!!!Revision History
<<<
2007.08.06 - 2.2.5 suppress automatic refresh of any tiddler that is currently being edited.  Ensures that current tiddler edit sessions are not prematurely discarded (losing any changes).  However, if checkbox changes a tag on a tiddler being edited, update the "tags" input field (if any) so that saving the edited tiddler correctly reflects any changes due to checkbox activity... see refreshEditorTagField().
2007.07.13 - 2.2.4 in handler(), fix srctid reference (was "w.tiddler", should have been "w.tiddler.title").  This fixes broken 'inline X' plus fatal macro error when using PartTiddlerPlugin.  Thanks to cmari for reporting the problem and UdoBorkowski for finding the code error.
2007.06.21 - 2.2.3 suppress automatic refresh of tiddler when using macro-syntax to prevent premature end of tiddler editing session.
2007.06.20 - 2.2.2 fixed handling for 'inline X' when checkboxes are contained in a 'transcluded' tiddler.  Now, regardless of where an inline X checkbox appears, the X will be placed in the originating source tiddler, rather than the tiddler in which the checkbox appears.
2007.06.17 - 2.2.1 Refactored code to add checkbox //macro// syntax for use in templates (e.g., {{{macro="checkbox ..."}}}. Also, code cleanup of existing tag handling.
2007.06.16 - 2.2.0 added support for tracking checkbox states using tiddler fields via "(fieldname@tiddlername)" syntax.
2006.05.04 - 2.1.3 fix use of findContainingTiddler() to check for a non-null return value, so that checkboxes won't crash when used outside of tiddler display context (such as in header, sidebar or mainmenu)
2006.03.11 - 2.1.2 added "|" as delimiter to tag-based storage syntax (e.g. "tiddler|tag") to avoid parsing ambiguity when tiddler titles or tag names contain ":".   Using ":" as a delimiter is still supported but is deprecated in favor of the new "|" usage.  Based on a problem reported by JeffMason.
2006.02.25 - 2.1.0 added configuration options to enable/disable forced refresh of tiddlers when toggling tags
2006.02.23 - 2.0.4 when toggling tags, force refresh of the tiddler containing the checkbox.
2006.02.23 - 2.0.3 when toggling tags, force refresh of the 'tagged tiddler' so that tag-related tiddler content (such as "to-do" lists) can be re-rendered.
2006.02.23 - 2.0.2 when using tag-based storage, allow use [[ and ]] to quote tiddler or tag names that contain spaces:
{{{[x([[Tiddler with spaces]]:[[tag with spaces]])]}}}
2006.01.10 - 2.0.1 when toggling tags, force refresh of the 'tagging tiddler'.  For example, if you toggle the "systemConfig" tag on a plugin, the corresponding "systemConfig" TIDDLER will be automatically refreshed (if currently displayed), so that the 'tagged' list in that tiddler will remain up-to-date.
2006.01.04 - 2.0.0 update for ~TW2.0
2005.12.27 - 1.1.2 Fix lookAhead regExp handling for {{{[x=id]}}}, which had been including the "]" in the extracted ID.  
Added check for "chk" prefix on ID before calling saveOptionCookie()
2005.12.26 - 1.1.2 Corrected use of toUpperCase() in tiddler re-write code when comparing {{{[X]}}} in tiddler content with checkbox state. Fixes a problem where simple checkboxes could be set, but never cleared.
2005.12.26 - 1.1.0 Revise syntax so all optional parameters are included INSIDE the [ and ] brackets.  Backward compatibility with older syntax is supported, so content changes are not required when upgrading to the current version of this plugin.   Based on a suggestion by GeoffSlocock
2005.12.25 - 1.0.0 added support for tracking checkbox state using tags ("TogglyTagging")
Revised version number for official post-beta release.
2005.12.08 - 0.9.3 support separate 'init' and 'onclick' function definitions.
2005.12.08 - 0.9.2 clean up lookahead pattern
2005.12.07 - 0.9.1 only update tiddler source content if checkbox state is actually different.  Eliminates unnecessary tiddler changes (and 'unsaved changes' warnings)
2005.12.07 - 0.9.0 initial BETA release
<<<
!!!!!Credits
<<<
This feature was created by EricShulman from [[ELS Design Studios|http:/www.elsdesign.com]]
<<<
!!!!!Code
***/
//{{{
version.extensions.CheckboxPlugin = {major: 2, minor: 2, revision:5 , date: new Date(2007,8,6)};
//}}}

//{{{
config.checkbox = { refresh: { tagged:true, tagging:true, container:true } };
config.formatters.push( {
	name: "checkbox",
	match: "\\[[xX_ ][\\]\\=\\(\\{]",
	lookahead: "\\[([xX_ ])(=[^\\s\\(\\]{]+)?(\\([^\\)]*\\))?({[^}]*})?({[^}]*})?\\]",
	handler: function(w) {
		var lookaheadRegExp = new RegExp(this.lookahead,"mg");
		lookaheadRegExp.lastIndex = w.matchStart;
		var lookaheadMatch = lookaheadRegExp.exec(w.source)
		if(lookaheadMatch && lookaheadMatch.index == w.matchStart) {
			// get params
			var checked=(lookaheadMatch[1].toUpperCase()=="X");
			var id=lookaheadMatch[2];
			var target=lookaheadMatch[3];
			if (target) target=target.substr(1,target.length-2).trim(); // trim off parentheses
			var fn_init=lookaheadMatch[4];
			var fn_click=lookaheadMatch[5];
			var tid=story.findContainingTiddler(w.output);  if (tid) tid=tid.getAttribute("tiddler");
			var srctid=w.tiddler?w.tiddler.title:null;
			config.macros.checkbox.create(w.output,tid,srctid,w.matchStart+1,checked,id,target,config.checkbox.refresh,fn_init,fn_click);
			w.nextMatch = lookaheadMatch.index + lookaheadMatch[0].length;
		}
	}
} );
config.macros.checkbox = {
	handler: function(place,macroName,params,wikifier,paramString,tiddler) {
		if(!(tiddler instanceof Tiddler)) { // if no tiddler passed in try to find one
			var here=story.findContainingTiddler(place);
			if (here) tiddler=store.getTiddler(here.getAttribute("tiddler"))
		}
		var srcpos=0; // "inline X" not applicable to macro syntax
		var target=params.shift(); if (!target) target="";
		var defaultState=params[0]=="checked"; if (defaultState) params.shift();
		var id=params.shift(); if (id && !id.length) id=null;
		var fn_init=params.shift(); if (fn_init && !fn_init.length) fn_init=null;
		var fn_click=params.shift(); if (fn_click && !fn_click.length) fn_click=null;
		var refresh={ tagged:true, tagging:true, container:false };
		this.create(place,tiddler.title,tiddler.title,0,defaultState,id,target,refresh,fn_init,fn_click);
	},
	create: function(place,tid,srctid,srcpos,defaultState,id,target,refresh,fn_init,fn_click) {
		// create checkbox element
		var c = document.createElement("input");
		c.setAttribute("type","checkbox");
		c.onclick=this.onClickCheckbox;
		c.srctid=srctid; // remember source tiddler
		c.srcpos=srcpos; // remember location of "X"
		c.container=tid; // containing tiddler (may be null if not in a tiddler)
		c.tiddler=tid; // default target tiddler 
		c.refresh = {};
		c.refresh.container = refresh.container;
		c.refresh.tagged = refresh.tagged;
		c.refresh.tagging = refresh.tagging;
		place.appendChild(c);
		// set default state
		c.checked=defaultState;
		// track state in config.options.ID
		if (id) {
			c.id=id.substr(1); // trim off leading "="
			if (config.options[c.id]!=undefined)
				c.checked=config.options[c.id];
			else
				config.options[c.id]=c.checked;
		}
		// track state in (tiddlername|tagname) or (fieldname@tiddlername)
		if (target) {
			var pos=target.indexOf("@");
			if (pos!=-1) {
				c.field=pos?target.substr(0,pos):"checked"; // get fieldname (or use default "checked")
				c.tiddler=target.substr(pos+1); // get specified tiddler name (if any)
				if (!c.tiddler || !c.tiddler.length) c.tiddler=tid; // if tiddler not specified, default == container
				if (store.getValue(c.tiddler,c.field)!=undefined)
					c.checked=(store.getValue(c.tiddler,c.field)=="true"); // set checkbox from saved state
			} else {
				var pos=target.indexOf("|"); if (pos==-1) var pos=target.indexOf(":");
				c.tag=target;
				if (pos==0) c.tag=target.substr(1); // trim leading "|" or ":"
				if (pos>0) { c.tiddler=target.substr(0,pos); c.tag=target.substr(pos+1); }
				if (!c.tag.length) c.tag="checked";
				var t=store.getTiddler(c.tiddler);
				if (t && t.tags)
					c.checked=t.isTagged(c.tag); // set checkbox from saved state
			}
		}
		if (fn_init) c.fn_init=fn_init.trim().substr(1,fn_init.length-2); // trim off surrounding { and } delimiters
		if (fn_click) c.fn_click=fn_click.trim().substr(1,fn_click.length-2);
		c.init=true; c.onclick(); c.init=false; // compute initial state and save in tiddler/config/cookie
	},
	onClickCheckbox: function(event) {
		if (this.fn_init)
			// custom function hook to set initial state (run only once)
			{ try { eval(this.fn_init); this.fn_init=null; } catch(e) { displayMessage("Checkbox init error: "+e.toString()); } }
		else if (this.fn_click)
			// custom function hook to override or react to changes in checkbox state
			{ try { eval(this.fn_click) } catch(e) { displayMessage("Checkbox click error: "+e.toString()); } }
		if (this.id)
			// save state in config AND cookie (only when ID starts with 'chk')
			{ config.options[this.id]=this.checked; if (this.id.substr(0,3)=="chk") saveOptionCookie(this.id); }
		if (this.srctid && this.srcpos>0 && (!this.id || this.id.substr(0,3)!="chk") && !this.tag && !this.field) {
			// save state in tiddler content only if not using cookie, tag or field tracking
			var t=store.getTiddler(this.srctid); // put X in original source tiddler (if any)
			if (t && this.checked!=(t.text.substr(this.srcpos,1).toUpperCase()=="X")) { // if changed
				t.set(null,t.text.substr(0,this.srcpos)+(this.checked?"X":"_")+t.text.substr(this.srcpos+1),null,null,t.tags);
				if (!story.isDirty(t.title)) story.refreshTiddler(t.title,null,true);
				store.setDirty(true);
			}
		}
		if (this.field) {
			if (this.checked && !store.tiddlerExists(this.tiddler))
				store.saveTiddler(this.tiddler,this.tiddler,"",config.options.txtUserName,new Date());
			// set the field value in the target tiddler
			store.setValue(this.tiddler,this.field,this.checked?"true":"false");
			// DEBUG: displayMessage(this.field+"@"+this.tiddler+" is "+this.checked);
		}
		if (this.tag) {
			if (this.checked && !store.tiddlerExists(this.tiddler))
				store.saveTiddler(this.tiddler,this.tiddler,"",config.options.txtUserName,new Date());
			var t=store.getTiddler(this.tiddler);
			if (t) {
				var tagged=(t.tags && t.tags.find(this.tag)!=null);
				if (this.checked && !tagged) { t.tags.push(this.tag); store.setDirty(true); }
				if (!this.checked && tagged) { t.tags.splice(t.tags.find(this.tag),1); store.setDirty(true); }
			}
			// if tag state has been changed, update display of corresponding tiddlers (unless they are in edit mode...)
			if (this.checked!=tagged) {
				if (this.refresh.tagged) {
					if (!story.isDirty(this.tiddler)) story.refreshTiddler(this.tiddler,null,true); // the TAGGED tiddler in view mode
					else config.macros.checkbox.refreshEditorTagField(this.tiddler,this.tag,this.checked); // the TAGGED tiddler in edit mode (with tags field)
				}
				if (this.refresh.tagging)
					if (!story.isDirty(this.tag)) story.refreshTiddler(this.tag,null,true); // the TAGGING tiddler
			}
		}
		// refresh containing tiddler (but not during initial rendering, or we get an infinite loop!) (and not when editing container)
		if (!this.init && this.refresh.container && this.container!=this.tiddler)
			if (!story.isDirty(this.container)) story.refreshTiddler(this.container,null,true); // the tiddler CONTAINING the checkbox
		return true;
	},
	refreshEditorTagField: function(title,tag,set) {
		var tagfield=story.getTiddlerField(title,"tags");
		if (!tagfield||tagfield.getAttribute("edit")!="tags") return; // if no tags field in editor (i.e., custom template)
		var tags=tagfield.value.readBracketedList();
		if (tags.contains(tag)==set) return; // if no change needed
		if (set) tags.push(tag); // add tag
		else tags.splice(tags.indexOf(tag),1); // remove tag
		for (var t=0;t<tags.length;t++) tags[t]=String.encodeTiddlyLink(tags[t]);
		tagfield.value=tags.join(" "); // reassemble tag string (with brackets as needed)
		return;
	}
}
//}}}
!!!Script Crypt
# A skeleton file for starting a PERL script @@[[BoneCode]]@@
# Simple FASTA read script @@[[FASTAread]]@@ 
# Simple FASTA translate script @@[[FASTAtranslate]]@@
# FASTA translate script with subroutines and hashes @@[[FASTAtranslate2]]@@
# Generate AA Freq table for Arabidopsis:  @@[[AAfreqCountTable]]@@
# Word Comparison Script:
##  Single run comparison: @@[[WordCompare]]@@
##  Looping run comparison to collect benchmark stats: @@[[WordCompareLoop]]@@ 
# MidTerm: ~Needleman-Wunsch "BLAST" similarity/alignment: @@NW-AlignBlast@@
# Calculate LOD scores: @@LODscore@@
# Profile LOD scores along a genomic clone: @@LODprofiling@@
# ~Smith-Waterman implementation for local alignment scoring: @@LocalBLAST@@
# Submitting batch BLAST jobs on Biowolf: @@BLASTrunner@@
# Parsing a BLAST output file: @@BLASTparse_script@@
!!!Subroutines:
Here are just the subroutine parts that you can move between scripts. Note the variable declaration requirements and the dependence of some on others (e.g., &LoadCodonTable has to be called before &[[TranslateFasta|TransFasta]]).
# [[&ReadFasta|ReadFasta]]
# [[&LoadCodonTable|LoadCodonTable]]
# [[&TranslateFasta|TransFasta]]
# [[&AAfreq|AAcounter]]
# [[&ROUND|Round]]
# [[Array Referencing for subroutines|ArrayRef]]
!!!Code questions and discussion:
# [[Concatenation of Strings|concatenation]]
# [[Ambiguous Nucleotides and Codon Table Errors|Glenn01]]
!
!Evolutionary Model of Emergence: Color Maps
The idea is to try to develop an evolution model for system complexity in genomes using a simpler color system for the initial groundwork. Our brains are so finely tuned to pattern recognition in color images that it is far more efficient to just "eye-ball" the output of a model run as a color image map than to try to write the corresponding algorithms to process the same volume of genetic data to look for patterns in genomes.

In proteins, each amino acid is coded by 3 nucleotides. In an image map, each pixel is coded by 3 color values (r,g,b). So the mechanics of writing a program so that an image could evolve at random is similar to the code to do the same thing with a protein coding gene.

!!Gallery:
''12DEC:''
 <html><table><tr>
<td><img src="color/Boxes-081130-080x060-02850-00820-000429723.png" style="height:400px"></td>
<td> Working with a larger grid (80x60) results in an exponential increase in processing time. This image started as a random tile grid with two larger boxes on top. Run is still going as this image is only generation # 429,723. Color balance is better and edge effects are not present. Download the evolution QT movie file: <u><a href="color/BabelImage-Boxes-12DEC.mov"> Boxes</a></u></td></tr>
</table></html>

''30NOV:''
 <html><table><tr>
<td><img src="color/Babel-081128-040x030-008070-02526-0000956045.png" style="height:400px"></td>
<td> This image started as a random 30x40 tile grid. I stopped the run after 956,000 generations. Download the evolution QT movie file: <a href="color/BabelImage-29NOV-Random.mov"> RANDOM</a></td></tr>
<tr>
<td><img src="color/RedSquare-081128-040x030-007820-02458-0000933148.png" style="height:400px"></td>
<td> <b>Red Square:</b> As a test for the selection criteria applied at each generation, I started this run with a small red square in the middle of the grid to see how "stable" it would be and whether or not it would influence the organization of color tiles around it. Download evolution QT movie file: <a href="color/BabelImage-29NOV-RedSquare.mov"> RedSquare</a></td></tr>
<tr>
<td><img src="color/BlueSquare1-081128-040x030-005650-01935-0000661053.png" style="height:400px"></td>
<td><b>BlueSquare1:</b> Same idea as Red Square, but I tweaked some of the rate processes to see if I could keep the image from going so gray (i.e. equal r, g and b values). Download evolution QT movie file: <a href="color/BabelImage-29NOV-BlueSquare1.mov">BlueSquare1</a></td></tr>
<tr>
<td><img src="color/BlueSquare2-081128-040x030-004100-01318-0000592764.png" style="height:400px"></td>
<td> <b>BlueSquare2:</b> BS1 still turned out fairly gray. More tweaks for color selection to add a score penalty for when rgb values were within 10% of each other.  Download evolution QT movie file: <a href="color/BabelImage-29NOV-BlueSquare2.mov">BlueSquare2</a></td>
</tr></table></html>
Background: #fff
Foreground: #000
PrimaryPale: #ddeeaa
PrimaryLight: #ddeeaa
PrimaryMid: #666633
PrimaryDark: #014
SecondaryPale: #bbdd88
SecondaryLight: #fe8
SecondaryMid: #db4
SecondaryDark: #666633
TertiaryPale: #eee
TertiaryLight: #ccc
TertiaryMid: #999
TertiaryDark: #aacc88
Error: #f88
!Command Reference:

# [[foreach]]
# [[open]]
# [[while]]
# [[if]]
# [[print]]
# [[substr]]
# [[$/|break]]
# [[==, >, <, eq, ne, != |logop]]
# [[variables]]: strings, arrays, hash arrays
# [[=~|pattern matching]]
# [[.=|concatenation]]
# [[qw]]

!
# Group ENTRIES by a common tag, 'xxxx'.
# Create subtopic menu:
** rename as ''//xxxx//SubtopicMenu''
** enter the ENTRY titles into the table cells
# Create the viewtemplate
** rename as ''//xxxx//ViewTemplate''
** edit this line in the body, to this syntax using the 'xxxx' tag name:
{{{
<class='xxxx' macro='tiddler xxxxSubtopicMenu'>
}}}
!!SUBTOPIC PAGE CONTENT
{{{
<html>
<div style="color: rgb(100, 100, 150); font-family: Monaco;"><big><b>
TITLE OR HEADER OR DESCRIPTOR . . . .. 
</html>
1. [[x |1.11]]
2. [[y |1.12]]
3. [[z |1.13]]
!!
}}}
!!PLOT PAGE CONTENT
{{{
//header title//
<html><img src="00/xxxx.png" style="height:400px"></html>
[[BACK|XXTAGNAMEXX]]
!!!
}}}
!!
!!
[img[00/00-anim-codingpain.gif]]
''Dr. Adam G. Marsh''
645.4367
amarsh@udel.edu
!!
/***
|''Name:''|CryptoFunctionsPlugin|
|''Description:''|Support for cryptographic functions|
***/
//{{{
if(!version.extensions.CryptoFunctionsPlugin) {
version.extensions.CryptoFunctionsPlugin = {installed:true};

//--
//-- Crypto functions and associated conversion routines
//--

// Crypto "namespace"
function Crypto() {}

// Convert a string to an array of big-endian 32-bit words
Crypto.strToBe32s = function(str)
{
	var be = Array();
	var len = Math.floor(str.length/4);
	var i, j;
	for(i=0, j=0; i<len; i++, j+=4) {
		be[i] = ((str.charCodeAt(j)&0xff) << 24)|((str.charCodeAt(j+1)&0xff) << 16)|((str.charCodeAt(j+2)&0xff) << 8)|(str.charCodeAt(j+3)&0xff);
	}
	while (j<str.length) {
		be[j>>2] |= (str.charCodeAt(j)&0xff)<<(24-(j*8)%32);
		j++;
	}
	return be;
};

// Convert an array of big-endian 32-bit words to a string
Crypto.be32sToStr = function(be)
{
	var str = "";
	for(var i=0;i<be.length*32;i+=8)
		str += String.fromCharCode((be[i>>5]>>>(24-i%32)) & 0xff);
	return str;
};

// Convert an array of big-endian 32-bit words to a hex string
Crypto.be32sToHex = function(be)
{
	var hex = "0123456789ABCDEF";
	var str = "";
	for(var i=0;i<be.length*4;i++)
		str += hex.charAt((be[i>>2]>>((3-i%4)*8+4))&0xF) + hex.charAt((be[i>>2]>>((3-i%4)*8))&0xF);
	return str;
};

// Return, in hex, the SHA-1 hash of a string
Crypto.hexSha1Str = function(str)
{
	return Crypto.be32sToHex(Crypto.sha1Str(str));
};

// Return the SHA-1 hash of a string
Crypto.sha1Str = function(str)
{
	return Crypto.sha1(Crypto.strToBe32s(str),str.length);
};

// Calculate the SHA-1 hash of an array of blen bytes of big-endian 32-bit words
Crypto.sha1 = function(x,blen)
{
	// Add 32-bit integers, wrapping at 32 bits
	var add32 = function(a,b)
	{
		var lsw = (a&0xFFFF)+(b&0xFFFF);
		var msw = (a>>16)+(b>>16)+(lsw>>16);
		return (msw<<16)|(lsw&0xFFFF);
	};
	// Add five 32-bit integers, wrapping at 32 bits
	var add32x5 = function(a,b,c,d,e)
	{
		var lsw = (a&0xFFFF)+(b&0xFFFF)+(c&0xFFFF)+(d&0xFFFF)+(e&0xFFFF);
		var msw = (a>>16)+(b>>16)+(c>>16)+(d>>16)+(e>>16)+(lsw>>16);
		return (msw<<16)|(lsw&0xFFFF);
	};
	// Bitwise rotate left a 32-bit integer by 1 bit
	var rol32 = function(n)
	{
		return (n>>>31)|(n<<1);
	};

	var len = blen*8;
	// Append padding so length in bits is 448 mod 512
	x[len>>5] |= 0x80 << (24-len%32);
	// Append length
	x[((len+64>>9)<<4)+15] = len;
	var w = Array(80);

	var k1 = 0x5A827999;
	var k2 = 0x6ED9EBA1;
	var k3 = 0x8F1BBCDC;
	var k4 = 0xCA62C1D6;

	var h0 = 0x67452301;
	var h1 = 0xEFCDAB89;
	var h2 = 0x98BADCFE;
	var h3 = 0x10325476;
	var h4 = 0xC3D2E1F0;

	for(var i=0;i<x.length;i+=16) {
		var j,t;
		var a = h0;
		var b = h1;
		var c = h2;
		var d = h3;
		var e = h4;
		for(j = 0;j<16;j++) {
			w[j] = x[i+j];
			t = add32x5(e,(a>>>27)|(a<<5),d^(b&(c^d)),w[j],k1);
			e=d; d=c; c=(b>>>2)|(b<<30); b=a; a = t;
		}
		for(j=16;j<20;j++) {
			w[j] = rol32(w[j-3]^w[j-8]^w[j-14]^w[j-16]);
			t = add32x5(e,(a>>>27)|(a<<5),d^(b&(c^d)),w[j],k1);
			e=d; d=c; c=(b>>>2)|(b<<30); b=a; a = t;
		}
		for(j=20;j<40;j++) {
			w[j] = rol32(w[j-3]^w[j-8]^w[j-14]^w[j-16]);
			t = add32x5(e,(a>>>27)|(a<<5),b^c^d,w[j],k2);
			e=d; d=c; c=(b>>>2)|(b<<30); b=a; a = t;
		}
		for(j=40;j<60;j++) {
			w[j] = rol32(w[j-3]^w[j-8]^w[j-14]^w[j-16]);
			t = add32x5(e,(a>>>27)|(a<<5),(b&c)|(d&(b|c)),w[j],k3);
			e=d; d=c; c=(b>>>2)|(b<<30); b=a; a = t;
		}
		for(j=60;j<80;j++) {
			w[j] = rol32(w[j-3]^w[j-8]^w[j-14]^w[j-16]);
			t = add32x5(e,(a>>>27)|(a<<5),b^c^d,w[j],k4);
			e=d; d=c; c=(b>>>2)|(b<<30); b=a; a = t;
		}

		h0 = add32(h0,a);
		h1 = add32(h1,b);
		h2 = add32(h2,c);
		h3 = add32(h3,d);
		h4 = add32(h4,e);
	}
	return Array(h0,h1,h2,h3,h4);
};


}
//}}}
!DATA section

You can store important run data within a script by placing it at the end of the file following the special delimiter: 
<html><img src="03/end.png" style="height:50px"></html>

The FASTAtranslate file looks like this:
<html><img src="03/DATA.png" style="height:400px"></html>
!!!
[[BACK|L03]]
!
[[FrontPage]]
/***
|''Name:''|DeprecatedFunctionsPlugin|
|''Description:''|Support for deprecated functions removed from core|
***/
//{{{
if(!version.extensions.DeprecatedFunctionsPlugin) {
version.extensions.DeprecatedFunctionsPlugin = {installed:true};

//--
//-- Deprecated code
//--

// @Deprecated: Use createElementAndWikify and this.termRegExp instead
config.formatterHelpers.charFormatHelper = function(w)
{
	w.subWikify(createTiddlyElement(w.output,this.element),this.terminator);
};

// @Deprecated: Use enclosedTextHelper and this.lookaheadRegExp instead
config.formatterHelpers.monospacedByLineHelper = function(w)
{
	var lookaheadRegExp = new RegExp(this.lookahead,"mg");
	lookaheadRegExp.lastIndex = w.matchStart;
	var lookaheadMatch = lookaheadRegExp.exec(w.source);
	if(lookaheadMatch && lookaheadMatch.index == w.matchStart) {
		var text = lookaheadMatch[1];
		if(config.browser.isIE)
			text = text.replace(/\n/g,"\r");
		createTiddlyElement(w.output,"pre",null,null,text);
		w.nextMatch = lookaheadRegExp.lastIndex;
	}
};

// @Deprecated: Use <br> or <br /> instead of <<br>>
config.macros.br = {};
config.macros.br.handler = function(place)
{
	createTiddlyElement(place,"br");
};

// Find an entry in an array. Returns the array index or null
// @Deprecated: Use indexOf instead
Array.prototype.find = function(item)
{
	var i = this.indexOf(item);
	return i == -1 ? null : i;
};

// Load a tiddler from an HTML DIV. The caller should make sure to later call Tiddler.changed()
// @Deprecated: Use store.getLoader().internalizeTiddler instead
Tiddler.prototype.loadFromDiv = function(divRef,title)
{
	return store.getLoader().internalizeTiddler(store,this,title,divRef);
};

// Format the text for storage in an HTML DIV
// @Deprecated Use store.getSaver().externalizeTiddler instead.
Tiddler.prototype.saveToDiv = function()
{
	return store.getSaver().externalizeTiddler(store,this);
};

// @Deprecated: Use store.allTiddlersAsHtml() instead
function allTiddlersAsHtml()
{
	return store.allTiddlersAsHtml();
}

// @Deprecated: Use refreshPageTemplate instead
function applyPageTemplate(title)
{
	refreshPageTemplate(title);
}

// @Deprecated: Use story.displayTiddlers instead
function displayTiddlers(srcElement,titles,template,unused1,unused2,animate,unused3)
{
	story.displayTiddlers(srcElement,titles,template,animate);
}

// @Deprecated: Use story.displayTiddler instead
function displayTiddler(srcElement,title,template,unused1,unused2,animate,unused3)
{
	story.displayTiddler(srcElement,title,template,animate);
}

// @Deprecated: Use functions on right hand side directly instead
var createTiddlerPopup = Popup.create;
var scrollToTiddlerPopup = Popup.show;
var hideTiddlerPopup = Popup.remove;

// @Deprecated: Use right hand side directly instead
var regexpBackSlashEn = new RegExp("\\\\n","mg");
var regexpBackSlash = new RegExp("\\\\","mg");
var regexpBackSlashEss = new RegExp("\\\\s","mg");
var regexpNewLine = new RegExp("\n","mg");
var regexpCarriageReturn = new RegExp("\r","mg");

}
//}}}
/***
|!''Name:''|!''E''asily ''A''daptable ''S''ource ''E''ditor|
|''Description:''|this framework allows you to easily create commands that work on the current tiddler text selection in edit mode|
|''Version:''|0.1.0|
|''Date:''|13/01/2007|
|''Source:''|http://yann.perrin.googlepages.com/twkd.html#E.A.S.E|
|''Author:''|[[Yann Perrin|YannPerrin]]|
|''License:''|[[BSD open source license]]|
|''~CoreVersion:''|2.x|
|''Browser:''|Firefox 1.0.4+; Firefox 1.5; InternetExplorer 6.0|
***/
////Messages Definition
//{{{
config.messages.Ease = {
noselection:"nothing selected",
asktitle:"enter the new tiddler title",
exists:" already exists, please enter another title",
askForTagsLabel:"enter the new tiddler tags",
tiddlercreated:" tiddler created"
}
//}}}
////
//{{{
if (!window.TWkd) window.TWkd={context:{}};
if (!TWkd.Ease)
 TWkd.Ease = function (text,tooltip){
 this.text = text;
 this.tooltip = tooltip;
 this.modes = [];
 this.addMode = function(modeDefinition) {this.modes.push(modeDefinition);};
 this.handler = function(event,src,title) {
 TWkd.context.command = this;
 TWkd.context.selection=this.getSelection(title);
 if (this.modes.length==1) {
 this.modes[0].operation();
 }
 else {
 var popup = Popup.create(src);
 if(popup) {
 for (var i=0; i<this.modes.length; i++) {
 createTiddlyButton(createTiddlyElement(popup,"li"), this.modes[i].name, this.modes[i].tooltip, this.OperateFromButton, null, 'id'+i, null);
 }
 Popup.show(popup,false);
 event.cancelBubble = true;
 if (event.stopPropagation) event.stopPropagation();
 return false;
 }
 }
 };
 };

TWkd.Ease.prototype.OperateFromButton = function(e){
 var commandMode=this.getAttribute('Id').replace('id','');
 TWkd.context.command.modes[commandMode].operation();
};

TWkd.Ease.prototype.getTiddlerEditField = function(title,field){
 var tiddler = document.getElementById(story.idPrefix + title);
 if(tiddler != null){
 var children = tiddler.getElementsByTagName("*")
 var e = null;
 for (var t=0; t<children.length; t++){
 var c = children[t];
 if(c.tagName.toLowerCase() == "input" || c.tagName.toLowerCase() == "textarea"){
 if(!e) {e = c;}
 if(c.getAttribute("edit") == field){e = c;}
 }
 }
 if(e){return e;}
 }
} // closes getTiddlerEditField function definition
 
TWkd.Ease.prototype.getSelection = function(title,quiet) {
 var tiddlerTextArea = this.getTiddlerEditField(title,"text");
 var result = {};
 if (document.selection != null && tiddlerTextArea.selectionStart == null) {
 tiddlerTextArea.focus();
 var range = document.selection.createRange();
 var bookmark = range.getBookmark();
 var contents = tiddlerTextArea.value;
 var originalContents = contents;
 var marker = "##SELECTION_MARKER_" + Math.random() + "##";
 while(contents.indexOf(marker) != -1) {
 marker = "##SELECTION_MARKER_" + Math.random() + "##";
 }
 var selection = range.text;
 range.text = marker + range.text + marker;
 contents = tiddlerTextArea.value;
 result.start = contents.indexOf(marker);
 contents = contents.replace(marker, "");
 result.end = contents.indexOf(marker);
 tiddlerTextArea.value = originalContents;
 range.moveToBookmark(bookmark);
 range.select();
 }
 else {
 result.start=tiddlerTextArea.selectionStart;
 result.end=tiddlerTextArea.selectionEnd;
 }
 result.content=tiddlerTextArea.value.substring(result.start,result.end);
 result.source=title;
 if (!result.content&&!quiet) displayMessage(config.messages.Ease.noselection);
 return(result);
}//closes getSelection function definition

// replace selection or insert new content
TWkd.Ease.prototype.putInPlace=function(content,workplace) {
 var tiddlerText = this.getTiddlerEditField(workplace.source,"text");
 tiddlerText.value = tiddlerText.value.substring(0,workplace.start)+content+tiddlerText.value.substring(workplace.end);
}

// asking for title
TWkd.Ease.prototype.askForTitle = function(suggestion) {
 if (!suggestion)
 suggestion = "";
 var newtitle;
 while (!newtitle||store.tiddlerExists(newtitle))
 {
 if (store.tiddlerExists(newtitle))
 displayMessage(newtitle+config.messages.Ease.exists);
 newtitle = prompt(config.messages.Ease.asktitle,suggestion);
 if (newtitle==null)
 {
 displayMessage(config.messages.Ease.titlecancel);
 return(false);
 }
 }
 return(newtitle);
}//closes askForTitle function definition

// creation of a new tiddler
TWkd.Ease.prototype.newTWkdLibTiddler = function(title,content,from,askForTags){
 var tiddler = new Tiddler();
 tiddler.title = title;
 tiddler.modifier = config.options.txtUserName;
 tiddler.text = content;
 (from) ? tiddler.tags = [from] : tiddler.tags=[];
 if (askForTags)
 tiddler.tags = prompt(config.messages.Ease.askForTagsLabel,'[['+from+']]').readBracketedList();
 store.addTiddler(tiddler);
 //store.notifyAll();
 displayMessage(title+config.messages.Ease.tiddlercreated);
}

if (!TWkd.Mode)
 TWkd.Mode = function (name,tooltip,ask,operation) {
 this.name = name;
 this.tooltip = tooltip;
 this.ask = ask;
 this.operation = operation;
 };
//}}}
<div class="toolbar" macro="toolbar +saveTiddler closeOthers -cancelTiddler deleteTiddler"></div>
<div class="title" macro="view title"></div>
<div class="editLabel">Title</div><div class="editor" macro="edit title"></div>
<div class="editLabel">Tags</div><div class="editor" macro="edit tags"></div>
<div class="editorFooter"><span macro="message views.editor.tagPrompt"></span><span macro="tagChooser"></span></div>
<div macro='hideWhen ((tiddler.tags.contains("Contacts"))||(tiddler.title=="New Contact"))'>[[EditToolbar]]<div class='editor' macro='edit text'></div></div>
<div macro='showWhen ((tiddler.tags.contains("Contacts"))||(tiddler.title=="New Contact"))'><div class='editor'>
<table width='100%'>
<tr><th>Name</th><td><span macro='edit ContactFirstName'></span><span macro='edit ContactLastName'></span></td><td rowspan='4' width='50%' macro='edit text'></td></tr>
<tr><th>Address</th><td><span macro='edit ContactStreetNumber'></span><span macro='edit ContactStreetName'></span><span macro='edit ContactZipCode'></span><span macro='edit ContactCity'></span></td></tr>
<tr><th>Phone</th><td><span macro='edit ContactPhone'></span></td></tr>
<tr><th>Email</th><td><span macro='edit ContactMail'></span></td></tr>
</table>
</div></div>
<div macro='toolbar Format Greek Hebrew Indent Notes Color Highlighting Tables'></div>
!~Mark-Up Script Editors:
!!!1. Java Edit
''jEdit'' is a feature-rich programmer's text editor built on Java that will run on all computers with JRE (java runtime environment). To download, install, and set up jEdit as quickly and painlessly as possible, go to the [[Quick Start page | http://www.jedit.org/index.php?page=quickstart]].
<html><img src="00/jedit.png" style="height:300px"></html>

!!!2. Crimson Editor
''CE'' is a syntax mark-up editor for MS Windows. Folder navigation is pretty easy using this tool. It can be set up as a pass-through interface to the command terminal so running and editing scripts can all be done in one GUI. To download and install, go to [[Crimson Editor | http://www.crimsoneditor.com/]].
<html><img src="00/crimsonedit.png" style="height:300px"></html>

!!!3. Komodo
''KOMODO EDIT'' is a feature-rich code editor from Active State. They sell a high-octane version of the editor, Komodo-IDE, which is geared to professional coders. The freeware version, Komodo-Edit, still has more features than you will probably ever require. To download and install, go to [[Komodo | http://www.activestate.com/Products/]], and select ''Komodo-Edit'', not ''Komodo-IDE''.
<html><img src="00/komodo.png" style="height:300px"></html>

!!!4. Vim
Vim is a highly configurable text editor built to enable efficient text editing. It is an improved version of the vi editor distributed with most UNIX systems. Vim is often called a "programmer's editor," and so useful for programming that many consider it an entire IDE. It's not just for programmers, though. Vim is perfect for all kinds of text editing, from composing email to editing configuration files. Vim can be configured to work in a very simple (Notepad-like) way, called evim or Easy Vim.
[[VIM download|http://www.vim.org/index.php]]
!
[[BACK to Lecture 7|L07]]
!!!
!Entropy
''ENTROPY'' is a concept to describe the information content of a system given different states of organization. It is borrowed by molecular biologists to describe the "organization" of sequence information. The idea of information entropy was first developed for telecommunications by: Shannon and Weaver, 1949, "The mathematical theory of communication," University of Illinois Press, Urbana.

The idea is that the more information you have about a system, the more certain you are about the current state of that system. The Shannon-Weaver entropy statistic works by a fairly simple summation over the probability states of any system to establish a metric that represents how much you DON'T know about that system. Yes, it is an inverse measure in that the greater the system entropy value, the more complex a system is, and consequently there's more that you don't know about it.

Example 1: From Dweyer, let's say there is a 90% chance of rain today, a 9% chance of overcast clouds but no rain, and a 1% chance of partly cloudy, sunny skies, with no rain. This weather "system" has the following potential probabilities: 

''WEATHER~~system1~~ = [//p//(Rain), //p//(Clouds), //p//(Sun)] = [0.90, 0.09, 0.01]''

If you heard this information on the radio when you woke up, you would likely make an immediate mental note to bring an umbrella or wear a rain jacket when you left home. If however the forecast was for 33% chance of rain and a 34% chance of overcast skies with no rain and a 33% chance of partly cloudy, sunny skies, with no rain, then you would not be so quick to conclude that you needed an umbrella today. 

''WEATHER~~system2~~ = [//p//(Rain), //p//(Clouds), //p//(Sun)] = [0.33, 0.34, 0.33]''

To make a correct decision about whether or not to bring an umbrella you would need to access more weather data (doppler radar, etc.) to better ascertain the likely state of the weather.

So these two weather states have very different levels of predictability or complexity or entropy. We generally calculate an entropy statistic as the summation, over all possible states, of the probability of each state (//p//~~i~~) multiplied by the natural log of that probability, with the sign reversed so that ''H'' is positive:
''H'' = -1 * Ln( product over i of (//p//~~i~~ ^^//p//~~i~~^^) )    //#(remember the log of a product is the sum of the logs)//
| ''H'' = -1 * sum (//p//~~i~~ * Ln(//p//~~i~~))  |

''H''-WEATHER~~system1~~ = -1 * ( (0.90 * -0.1054) + (0.09 * -2.4079) + (0.01 * -4.6052) ) = __0.358__
''H''-WEATHER~~system2~~ = -1 * ( (0.33 * -1.1087) + (0.34 * -1.0788) + (0.33 * -1.1087) ) = __1.099__

So if we compare the entropy numbers, the larger value for WEATHER~~system2~~ is interpreted to mean that there is more uncertainty about whether or not it will rain or shine on this day. The higher value means that it is a more complex decision to ascertain the true state of the weather for day 2. 
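
The two weather calculations can be checked with a few lines of PERL. This is only a minimal sketch: the subroutine name {{{entropy}}} and the variable names are made up for this example, and it assumes natural logs throughout (PERL's built-in {{{log}}} is the natural log).

```perl
use strict;
use warnings;

# Shannon entropy (natural log) of a list of state probabilities.
# States with p = 0 are skipped, since p * ln(p) tends to 0 as p -> 0.
sub entropy {
    my @probs = @_;
    my $h = 0;
    foreach my $p (@probs) {
        next if $p <= 0;
        $h -= $p * log($p);    # log() in PERL is the natural log
    }
    return $h;
}

my @system1 = (0.90, 0.09, 0.01);    # p(Rain), p(Clouds), p(Sun)
my @system2 = (0.33, 0.34, 0.33);

my $h1 = entropy(@system1);
my $h2 = entropy(@system2);

printf "H system1 = %.3f\n", $h1;    # prints 0.358
printf "H system2 = %.3f\n", $h2;    # prints 1.099
```

The more evenly the probabilities are spread across the states, the larger ''H'' becomes; for three equally likely states it would reach its maximum of Ln(3), about 1.099.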

| In Bioinformatics you will often be faced with problems of trying to predict something about a DNA sequence based solely on the probabilities that nucleotide bases will be present at different frequencies. |
!!!
[[BACK to Lecture 7|L07]]
!
[[BACK to Lecture 7|L07]]
!!!
!NT Frequencies or Probabilities
Knowing that [[ENTROPY|Entropy01]] is a metric of information disorganization or uncertainty, we can apply the same logic as we did to assessing weather conditions to assessing sequence characteristics. 

| Given information about two classes of sequence elements, coding and noncoding DNA, can you look at a new sequence block and ascertain the probability that the sequence is either coding or noncoding? |

!!!EXAMPLE:
$~QuerySeq = "GACTAATAATGACGCTAGCTAGCTAGCTAGCATTATATAGGCGATATCAG";

Let's define two possible sets or models that a sequence can occupy: ''coding'' or ''noncoding''. Each of these sets is going to have unique characteristics or properties determined by the nucleotide composition (system state) of A, G, T or C nucleotides that comprise that sequence family. IF coding and noncoding domains have the following nucleotide compositions, we should then be able to calculate whether $~QuerySeq is either coding or noncoding:

SEQ~~code~~ = [//p//(A), //p//(G), //p//(T), //p//(C)]  = (0.25, 0.28, 0.21, 0.26)
SEQ~~nocode~~ = [//p//(A), //p//(G), //p//(T), //p//(C)]  = (0.15, 0.38, 0.13, 0.34)


We can calculate the likelihood of $~QuerySeq being a member of SEQ~~model-X~~ as a simple probability of the random chance of finding "G-A-C-T-A-A....G" in a sequence family or model or group: 
''//P//($~QuerySeq|SEQ~~model-X~~)''

To make this clearer, imagine that you have a bag of ''coding'' nucleotides such that whenever you reach into that bag and pull out a single NT, there is a 25% chance you will get an A, a 21% chance you will get a T, a 28% chance you will get a G, and a 26% chance you will get a C. So if you were to reach into that bag and pull out 50 nucleotides, what is the probability that you would get the query sequence above? The probability that any two independent events will occur together is the product of their separate probabilities. Here there is a 28% chance that you would pull out a G first. Then there is a 25% chance that you would pull out an A. So the chance that you would pull out a G and then an A is: p(G)*p(A) = 0.28 * 0.25 = 0.070; a 7% probability. 

But for the family of noncoding sequences, p(G)*p(A) = 0.38 * 0.15 = 0.057; about a 6% probability. This is the kind of comparison that is used to ascertain whether $~QuerySeq is a member of SEQ~~code~~ or SEQ~~nocode~~. Obviously, one needs to look at more than just 2 nt positions. 
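
The "bag of nucleotides" calculation can be sketched in PERL as follows. The hash names {{{%code}}} and {{{%nocode}}} and the subroutine {{{seq_prob}}} are made up for this example; the probabilities are the ones listed above. The models are passed to the subroutine by reference.

```perl
use strict;
use warnings;

my $QuerySeq = "GACTAATAATGACGCTAGCTAGCTAGCTAGCATTATATAGGCGATATCAG";

# Nucleotide probabilities for the two models, as given in the text
my %code   = (A => 0.25, G => 0.28, T => 0.21, C => 0.26);
my %nocode = (A => 0.15, G => 0.38, T => 0.13, C => 0.34);

# Probability of drawing the first $n nucleotides of $seq, in order,
# from a model: the product of the individual probabilities.
sub seq_prob {
    my ($seq, $n, $model) = @_;
    my $prob = 1;
    foreach my $nt (split //, substr($seq, 0, $n)) {
        $prob *= $model->{$nt};
    }
    return $prob;
}

my $code2   = seq_prob($QuerySeq, 2, \%code);      # p(G)*p(A) = 0.070
my $nocode2 = seq_prob($QuerySeq, 2, \%nocode);    # p(G)*p(A) = 0.057

printf "first 2 nt: coding %.3f, noncoding %.3f\n", $code2, $nocode2;

# The same product over all 50 positions is a very small number,
# so it is printed in scientific notation.
my $n = length $QuerySeq;
printf "all %d nt:  coding %.3e, noncoding %.3e\n",
    $n, seq_prob($QuerySeq, $n, \%code), seq_prob($QuerySeq, $n, \%nocode);
```

Multiplying many small probabilities quickly produces numbers that are awkward to read and compare, which is one practical reason for working with logs of the probabilities instead.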

!








!
[[BACK to Lecture 7|L07]]
!!!
!Entropy Ratios:
We can simplify the calculation of an LOD score for a sequence by gathering similar terms into single factors. 
''Given:''
$~QuerySeq =  "GACTAATAATGACGCTAGCTAGCTAGCTAGCATTATATAGGCGATATCAG";

''We calculated the probability value using the p-value for every nucleotide position:''
//P//($~QuerySeq|SEQ~~code~~)   =   p(G)~~code~~ * p(A)~~code~~ * ....p(G)~~code~~

''We simplify this by first just calculating the natural log of the probability:''
''ln''(//P//($~QuerySeq|SEQ~~code~~) )  =   ''ln''(p(G)~~code~~) ''+'' ''ln''(p(A)~~code~~) ''+'' .... ''ln''(p(G)~~code~~)

''Then we simplify by gathering the terms:''
''ln''(//P//($~QuerySeq|SEQ~~code~~) )  = 
|! count |! calculation |! p(NT) |! ''ln''(p(NT)) |! count * ''ln''(p(NT)) |
|  11 | ''ln''(p(G)~~code~~)| 0.28 | -1.273 | -14.00 |
| 17 | ''ln''(p(A)~~code~~)| 0.25 | -1.386 | -23.57 |
|   9 | ''ln''(p(C)~~code~~)| 0.26 | -1.347 | -12.12 |
|   13 | ''ln''(p(T)~~code~~)| 0.21 | -1.561 | -20.29 |
|>|>|>| ''ln''(//P//($~QuerySeq:SEQ~~code~~) ) = |!  ''-69.98'' |
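
The gathered-terms calculation in the table can be reproduced with a short PERL sketch. The variable names ({{{%count}}}, {{{%code}}}, {{{$lnP}}}) are chosen just for this example; the script counts each nucleotide in $~QuerySeq and sums count * ''ln''(p(NT)) under the coding model.

```perl
use strict;
use warnings;

my $QuerySeq = "GACTAATAATGACGCTAGCTAGCTAGCTAGCATTATATAGGCGATATCAG";
my %code = (A => 0.25, G => 0.28, T => 0.21, C => 0.26);

# Count how many times each nucleotide occurs in the query
my %count;
$count{$_}++ foreach split //, $QuerySeq;

# ln(P) = sum over the four nucleotides of count * ln(p)
my $lnP = 0;
foreach my $nt (qw(G A C T)) {
    my $term = $count{$nt} * log($code{$nt});
    printf "%s  count=%2d  ln(p)=%.3f  count*ln(p)=%.2f\n",
        $nt, $count{$nt}, log($code{$nt}), $term;
    $lnP += $term;
}

# Close to -70; a hand-built table can differ in the last digit
# because each ln(p) is rounded before multiplying.
printf "ln(P) = %.2f\n", $lnP;
```

Only four logs and four multiplications are needed no matter how long the sequence is, instead of one multiplication per nucleotide position.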

''Perfect Sequence:Coding model or set:''
In an IDEAL coding sequence block, we define p(G) = 0.28, p(C) = 0.26, p(A) = 0.25, and p(T) = 0.21. So in a 100 NT coding sequence, we would have 28, 26, 25 and 21 of each, respectively. We can then calculate a log likelihood, ''ln''(//L//), for any ideal sequence of length ''n'' as:

| ''ln'' (//L//(SEQ~~code~~)~~''n''~~ )  = |  (''n'' * p(G~~code~~)) * ''ln''(p(G)~~code~~) |
| +|  (''n'' * p(A~~code~~)) * ''ln''(p(A)~~code~~) |
| +|  (''n'' * p(C~~code~~)) * ''ln''(p(C)~~code~~) |
| +|  (''n'' * p(T~~code~~)) * ''ln''(p(T)~~code~~) |

 ''ln'' (//L// ) =  ''n'' * ( (0.28*-1.273) + (0.25*-1.386) + (0.26*-1.347) + (0.21*-1.561) )
 ''ln'' (//L// ) =  ''n'' * -1.381
The log likelihood value of -1.381*n simply means that each NT included in a //coding sequence// contributes -1.381 on average to the overall log likelihood; in other words, each NT carries about 1.381 information units (natural-log units, since we used ''ln''). 
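
The per-nucleotide value can be reproduced with a short Perl sketch. The variable names ({{{%Pcode}}}, {{{$perNT}}}) and the example lengths are assumptions chosen for illustration only.
{{{
use strict;
use warnings;

my %Pcode = (A => 0.25, T => 0.21, G => 0.28, C => 0.26);

# Expected contribution of one nucleotide: sum of p * ln(p).
my $perNT = 0;
foreach my $nt (keys %Pcode)
{	$perNT += $Pcode{$nt} * log($Pcode{$nt}); }

printf "per nucleotide: %.3f\n", $perNT;

# The log likelihood of an ideal sequence scales linearly with its length.
foreach my $n (50, 100, 1000)
{	printf "n = %4d   ln(L) = %9.2f\n", $n, $n * $perNT; }
}}}
The first line printed should be -1.381, and each ln(L) is simply that value multiplied by the sequence length.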
!!!Entropy:
Note the similarity between __log likelihood__ calculation for an IDEAL coding sequence and the form of the Entropy equation from [[Entropy02]]:
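| ''H'' = -1 * sum (//p//~~i~~ * ln(//p//~~i~~) ) |
If we substitute ''H'' for the sum of the //p//~~i~~ * ln(//p//~~i~~) terms in the ''log likelihood'' equation (//L// ) above, the sign flips because ''H'' carries a leading -1:
 ''ln'' (//L// ) =  -''n'' * ''H''
For the coding model ''H'' = 1.381, so ''ln''(//L//) = -1.381 * ''n'', the same value as before.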

Essentially the Entropy calculation ''H'' gives us a metric of how much information a new observation contributes towards increasing our understanding of the organization or current state of a system. Low values of ''H'' mean that new data does not contribute much, that the current state of the system is very predictable. Larger values of ''H'' imply that new observations provide a substantial contribution towards better understanding the organization or current state of the system, that the current state of the system is unpredictable and that more information has more of an impact on whether or not we can predict the current state.  
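
To see how ''H'' responds to predictability, here is a minimal Perl sketch of the entropy equation. The subroutine name {{{entropy}}} is made up for this example, and the third distribution (97% one nucleotide) is a hypothetical case invented to show a very predictable system.
{{{
use strict;
use warnings;

# Shannon entropy in natural-log units: H = -sum( p * ln(p) ).
sub entropy
{	my @probs = @_;
	my $H = 0;
	foreach my $p (@probs)
	{	next if $p == 0;         # p * ln(p) is taken as 0 when p = 0
		$H -= $p * log($p);
	}
	return $H;
}

printf "coding model:  H = %.3f\n", entropy(0.28, 0.25, 0.26, 0.21);
printf "uniform:       H = %.3f\n", entropy(0.25, 0.25, 0.25, 0.25);
printf "mostly one NT: H = %.3f\n", entropy(0.97, 0.01, 0.01, 0.01);
}}}
The coding model gives 1.381 and the uniform case gives 1.386 (the maximum for four outcomes), while the hypothetical skewed case comes out near 0.168: when one nucleotide dominates, each new observation tells us very little.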

!
!File read/write and text processing
This example is described in Lecture 01 [[(Return to that lecture)|L01.03]]
It is also covered again in Lecture 02 [[(Return to that lecture)|L02.03]]
A fully annotated version of this script is also posted here: FASTAread-NOTED

Note that in this version the FASTA read routine has been moved to the subroutine portion of the script. If you look at the MAIN section, it is just cleaner to move that "busy-work" code out of the way so you can see what the script really does.
{{{
use strict;

# - - - - - H E A D E R - - - - - - - - - - - - - - -
# AGM-SEP2008
# OBJECTIVE: 
#      1. Read FASTA file, parse headers and sequences 
#      2. Convert lowercase seq letters to uppercase. 
#      3. Write new FASTA file with uppercase sequences.

# - - - - - U S E R    V A R I A B L E S - - - - - - 
my $infile = "TestFasta.txt";  # user edited input fasta file

# - - - - - G L O B A L  V A R I A B L E S  - - - - -
my @FILE;      # input array to hold file contents
my @Names;     # array to hold each orf header
my @Seqs;      # array to hold each orf sequence

# - - - - - - - - - - - - - - - - - - - - - - - - - -
# - - - - - M A I N - - - - - - - - - - - - - - - - -
# - - - - - - - - - - - - - - - - - - - - - - - - - -
print "\nThe Hunt for Red October . . . . \n\n";

# TASK 1: READ fasta file . . . . . . . . . . . 
	&ReadFasta($infile);
	
# TASK 2: PROCESS fasta file . . . . . . . . . . . 
	foreach my $i (0..$#Seqs)
	{	$Seqs[$i] =~ tr/agct/AGCT/; }  # tr/// swaps characters one-for-one (a->A, g->G, ...); it is not a regex
	
# TASK 3: WRITE results to new fasta file . . . . . . . . . . . 
	my $outfile = "NEW-".$infile;
	open(NewFile,">$outfile") or die "\n\n\n Cannot write $outfile\n\n\n";
	foreach my $k (0..$#Seqs)
	{	my $ntcount = length($Seqs[$k]);
		print NewFile "> $Names[$k] - nt count = $ntcount\n$Seqs[$k]\n";
	}
	close(NewFile);   # flush and release the output file

print "\n\n\nDONE\n\n\n";
# - - - - - - - - - - - - - - - - - - - - - - - - - -
# - - - - - S U B R O U T I N E S - - - - - - - - - -
# - - - - - - - - - - - - - - - - - - - - - - - - - -
sub ReadFasta
{   my $file = $_[0];
	local $/ = ">";   # ">" is the record separator only while this subroutine runs
	open(FASTA,"<$file") or die "\n\n\n Nada $file\n\n\n";
	@FILE=<FASTA>;
	close(FASTA);
	shift(@FILE); 
	foreach my $orf (@FILE)
	{	my @Lines = split(/\n/,$orf);
		my $name = $Lines[0];
		my $seq = "";
		foreach my $i (1..$#Lines)
		{	$seq .= $Lines[$i]; }
		$seq =~ s/>//;
		push(@Names, $name);
		push(@Seqs, $seq);
	}
}
# - - - - - EOF - - - - - - - - - - - - - - - - - - -
}}}
[[BACK to lecture 2|L02.03]]
{{{
#!PATH-2-PERL-HERE
use strict;

# - - - - - H E A D E R - - - - - - - - - - - - - - -
# AGM-SEP2008
# OBJECTIVE: 
#      1. Read FASTA file, parse headers and sequences 
#      2. Convert lowercase seq letters to uppercase. 
#      3. Write new FASTA file with uppercase sequences.

# - - - - - U S E R    V A R I A B L E S - - - - - - 
my $infile = "TestFasta.txt";  # user edited input fasta file

# - - - - - G L O B A L  V A R I A B L E S  - - - - -
my @FILE;      # input array to hold file contents
my @Names;     # array to hold each orf header
my @Seqs;      # array to hold each orf sequence

# - - - - - - - - - - - - - - - - - - - - - - - - - -
# - - - - - M A I N - - - - - - - - - - - - - - - - -
# - - - - - - - - - - - - - - - - - - - - - - - - - -
print "\nThe Hunt for Red October . . . . \n\n";

# TASK 1: READ fasta file . . . . . . . . . . . 
	# sets the input break character to ">"
	$/=">";
	# set input stream from $infile with name FASTA
	open(FASTA,"<$infile") or die "\n\n\n Nada $infile\n\n\n";
	# Slurp all lines of FASTA into array @FILE
	@FILE=<FASTA>;
	# deactivate FASTA input stream
	close(FASTA);

	## . . . . . . . . . . . . . . . . . . . . . . .
	## PRINT CHECK A: look at @FILE elements before proceeding
	# print "Print Code Block A:\n---------\n";
	# my $k = 1;
	# foreach my $entry (@FILE)
	# {	if ($k <= 3)             # only print the first 3 @FILE elements
	# 	{ 	print "$k. [$entry]"; # screen dump, $line flanked by brackets
	# 		print "\n---------\n"; # make the break between elements clear
	# 		$k += 1;             # increment the counter by 1
	# 	}
	# }
	# die;
	## ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` `

	# Remove the first entry which is just ">" from @FILE.
	# SHIFT deletes that entry and then moves all the other entries
	#      down by 1. So the Second element now becomes the first, the 
	#      third now becomes the second, etc. . . . . . . 
	shift(@FILE); 
	my $j = 1;      # simple counter for print output
	
	# Here's the strategy for breaking down each FASTA entry and getting
	#   the information that we need into defined variables:
	#      @FILE has all the entries
	#      Each $FILE[x] is copied to a single string called $orf
	#      $orf is then split into component lines and stored in @Lines
	#      $Lines[0] is the header; $Lines[1] onward are joined into the sequence.
	foreach my $orf (@FILE)
	{	# Need to divide each entry into individual lines 
		#   using the line break \n character as the separator.
		my @Lines = split(/\n/,$orf);
		
	## . . . . . . . . . . . . . . . . . . . . . . .
	## PRINT CHECK B: look at the @lines elements before proceeding
		# print "Print Code Block B:\n---------\n";
		# my $k = 1;
		# foreach my $line (@Lines)
		# {	print "$k. [$line]";   # screen dump, $line flanked by brackets
		# 	print "\n---------\n"; # make the break between elements clear
		# 	$k += 1;               # increment the counter by 1
		# }
		# die;
	## ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` `
		
		# Put the header information into $name
		my $name = $Lines[0];
		
		# Initialize $seq to be empty, note that there is nothing between
		#    the quotes. Not even a <space>. Just nothing. Nada. Null.
		my $seq = "";
		
		# So we are going to start concatenating the sequence. Remember that
		#     the second element in @Lines is retrieved by $Lines[1], so we
		#     start the index counter $i at 1.
		foreach my $i (1..$#Lines)
		{	$seq .= $Lines[$i]; }
		
		# Now we are going to use a regex to process the sequence
		#    and remove any unwanted characters.
		# The substitution operator works like this: s/find-pattern/replacement/
		#    Here, the replacement is "empty" (nothing) so it essentially
		#    results in a simple deletion.
		$seq =~ s/>//;  # remove the ">" at the end if it is there
		
		# Now use the push function to put the $name and $seq values into
		#    arrays which you can then access in STEP 2 below. 
		push(@Names, $name);
		push(@Seqs, $seq);

	## . . . . . . . . . . . . . . . . . . . . . . .
	## PRINT CHECK C: look at $name and the length of $seq before proceeding
		# if ($j==1){print "Print Code Block C:\n---------\n";} # just print once
		# my $ntcount = length($seq);  # count NTs to check for multiples of 63
		# print "$j. $name: NT count = $ntcount";   # screen dump
		# print "\n---------\n";         # make the break between elements clear
	## ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` `		

		$j += 1; # increment the counter by 1	
	}
	
# TASK 2: PROCESS fasta file . . . . . . . . . . . 
	# The expression $#Seqs gives the index number of the last element in @Seqs
	# This foreach loop will run with $i ranging from 0 to N, (N=elements-1).
	foreach my $i (0..$#Seqs)
	{	$Seqs[$i] =~ tr/agct/AGCT/; }  # tr/// swaps characters one-for-one (a->A, g->G, ...); it is not a regex
	

# TASK 3: WRITE results to new fasta file . . . . . . . . . . . 
	my $outfile = "NEW-".$infile;
	open(NewFile,">$outfile") or die "\n\n\n Cannot write $outfile\n\n\n";
	foreach my $k (0..$#Seqs)
	{	my $ntcount = length($Seqs[$k]);
		print NewFile "> $Names[$k] - nt count = $ntcount\n$Seqs[$k]\n";
		# print "> $Names[$k]- nt count = $ntcount\n";
	}
	close(NewFile);   # flush and release the output file

# OK, Time to say good-bye and exit program . . . . . . 
print "\n\n\nDONE\n\n\n";
# - - - - - - - - - - - - - - - - - - - - - - - - - -
# - - - - - S U B R O U T I N E S - - - - - - - - - -
# - - - - - - - - - - - - - - - - - - - - - - - - - -




# - - - - - EOF - - - - - - - - - - - - - - - - - - -
}}}
This script was posted for [[Lecture 3|L03]].
{{{
#!/usr/bin/perl
use strict;

# - - - - - H E A D E R - - - - - - - - - - - - - - -
# AGM-SEP2008
# OBJECTIVE: 
#		1. Read FASTA file, parse headers and sequences 
#		2. Translate NT sequence into PROTEIN sequence
#		3. Write protein sequences in FASTA file format.

# - - - - - U S E R    V A R I A B L E S - - - - - - 
my $infile = "TestFasta-Amarina.ffn";  # user edited input fasta file

# - - - - - G L O B A L  V A R I A B L E S  - - - - -
my @FILE;       # input array to hold file contents
my @Names;      # array to hold each orf header
my @Seqs;       # array to hold each orf sequence
my %CodonTable; # Hash-Array for codons
my @Proteins;   # array to hold protein sequences

# - - - - - - - - - - - - - - - - - - - - - - - - - -
# - - - - - M A I N - - - - - - - - - - - - - - - - -
# - - - - - - - - - - - - - - - - - - - - - - - - - -
print "\nThe clowns are in your sock drawer . . . . \n\n";

# TASK 1: READ fasta file . . . . . . . . . . . 
	&ReadFasta($infile);
	
# TASK 2: Translate Sequence to PROTEIN sequence . . . . . . . . . 
	# A.  Load the AA codon table from end of program.
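	# Slurp the whole DATA section, whatever $/ is currently set to.
	my @data = split(/\n/, do { local $/; <DATA> });
	foreach my $line (@data)          
	{  	next if $line =~ /^#/;         # skip the trailing comment line in DATA
		my @codons = split(/ /,$line); # separate on "space" character
		my $AA = shift(@codons);       # $AA= amino acid, then remove from @codons
		foreach my $nnn (@codons) 
		{	$nnn =~ s/U/T/g;           # table is written with U; convert to T to match DNA input
			$CodonTable{$nnn} = $AA; print ">>> $nnn = $CodonTable{$nnn}\n";}
    }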
	
	# B. Convert the NT sequence into AAs . . . . . .
	foreach my $seq (@Seqs)
	{	my $protein = "";            # set to "empty" at the start of each loop
		for (my $i=0; $i+3 <= length($seq); $i += 3)  # stop before a partial codon at the end
		{	my $codon = uc(substr($seq,$i,3));        # $codon = 3 nts at a time
			my $aa = $CodonTable{$codon};       # here's the translation step
			$aa = "X" unless defined $aa;       # unknown codon (e.g. contains N)
			$protein .= $aa;
		}
		push(@Proteins, $protein);
	}

# TASK 3: WRITE results to new fasta file . . . . . . . . . . . 
	my $outfile = "Protein-".$infile;
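	open(NewFile,">$outfile") or die "\n\n\n Cannot write $outfile\n\n\n";
	foreach my $k (0..$#Proteins)
	{	my $aacount = length($Proteins[$k]);   # number of amino acids
		print NewFile "> $Names[$k]; AA count = $aacount\n$Proteins[$k]\n";
		print "> $Names[$k]; AA count = $aacount\n$Proteins[$k]\n";
	}
	close(NewFile);   # flush and release the output file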

	
print "\n\n\nDONE\n\n\n";
# - - - - - - - - - - - - - - - - - - - - - - - - - -
# - - - - - S U B R O U T I N E S - - - - - - - - - -
# - - - - - - - - - - - - - - - - - - - - - - - - - -
sub ReadFasta
{   my $file = $_[0];
	$/=">";
	open(FASTA,"<$file") or die "\n\n\n Nada $file\n\n\n";
	@FILE=<FASTA>;
	close(FASTA);
	shift(@FILE); 
	foreach my $orf (@FILE)
	{	my @Lines = split(/\n/,$orf);
		my $name = $Lines[0];
		my $seq = "";
		foreach my $i (1..$#Lines)
		{	$seq .= $Lines[$i]; }
		$seq =~ s/>//;
		push(@Names, $name);
		push(@Seqs, $seq);
	}
}
# - - - - - EOF - - - - - - - - - - - - - - - - - - -
# The lines below are not perl statements and are not executed as part of the 
# program.  Instead, they are available to be read as data input by the program
# using the I/O handle name "DATA". This is a default handle name for any data 
# you want to include in a script file.
__END__
A GCU GCC GCA GCG
R CGU CGC CGA CGG AGA AGG
N AAU AAC
D GAU GAC 
C UGU UGC
Q CAA CAG
E GAA GAG
G GGU GGC GGA GGG
H CAU CAC
I AUU AUC AUA
L UUA UUG CUU CUC CUA CUG
K AAA AAG
M AUG
F UUU UUC
P CCU CCC CCA CCG
S UCU UCC UCA UCG AGU AGC
T ACU ACC ACA ACG
W UGG
Y UAU UAC
V GUU GUC GUA GUG
* UAA UAG UGA
# - - - - - EOF - - - - - - - - - - - - - - - - - - -
}}}
!
!!!
[[BACK to Lecture 3|L03]]
!!!
Compartmentalize the code into subroutines. Now the MAIN program is clearly evident:
<html><img src="03/subroutines.png" style="height:300px"></html>

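Now we hash the sequence strings so they are stored with the header information in a nucleotide hash %NTs, and then in a protein hash %PRTs. Only a few simple edits are needed to change how those variables are stored and accessed. Note that each header is now a hash key, so two entries with an identical header would overwrite one another.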
{{{
#!/usr/bin/perl
use strict;

# - - - - - H E A D E R - - - - - - - - - - - - - - -
# AGM-SEP2008
# OBJECTIVE: 
#		1. Read FASTA file, parse headers and sequences 
#		2. Translate NT sequence into PROTEIN sequence
#		3. Write protein sequences in FASTA file format.

# - - - - - U S E R    V A R I A B L E S - - - - - - 
my $infile = "TestFasta-Amarina.ffn";  # user edited input fasta file

# - - - - - G L O B A L  V A R I A B L E S  - - - - -
my @FILE;        # input array to hold file contents
my %NTs;         # Hash-Array to hold each orf name & sequence
my %CodonTable;  # Hash-Array for codons
my %PRTs;        # Hash-array to hold protein sequences

# - - - - - - - - - - - - - - - - - - - - - - - - - -
# - - - - - M A I N - - - - - - - - - - - - - - - - -
# - - - - - - - - - - - - - - - - - - - - - - - - - -
print "\nThe clowns are in your sock drawer . . . . \n\n";

	&ReadFasta($infile);
	&TranslateFasta;
	&PrintFasta($infile);

print "\n\n\nDONE\n\n\n";
# - - - - - - - - - - - - - - - - - - - - - - - - - -
# - - - - - S U B R O U T I N E S - - - - - - - - - -
# - - - - - - - - - - - - - - - - - - - - - - - - - -
sub ReadFasta
{   my $file = $_[0];
	$/=">";
	open(FASTA,"<$file") or die "\n\n\n Nada $file\n\n\n";
	@FILE=<FASTA>;
	close(FASTA);
	shift(@FILE); 
	foreach my $orf (@FILE)
	{	my @Lines = split(/\n/,$orf);
		my $name = $Lines[0];
		my $seq = "";
		foreach my $i (1..$#Lines)
		{	$seq .= $Lines[$i]; }
		$seq =~ s/>//;
		$NTs{$name} = $seq;
	}
}
# - - - - - - - - - - - - - - - - - - - - - - - - - -
sub TranslateFasta
{	# A.  Load the AA codon table from end of program.
	# Slurp the whole DATA section, whatever $/ is currently set to.
	my @data = split(/\n/, do { local $/; <DATA> });
	foreach my $line (@data)          
	{  	my @codons = split(/ /,$line); # separate on "space" character
		my $AA = shift(@codons);       # $AA= amino acid, then remove from @codon
		foreach my $nnn (@codons) 
		{	$nnn =~ s/U/T/g;
			$CodonTable{$nnn} = $AA; 
			print ">>> $nnn = $CodonTable{$nnn}\n";
		}
    }
	
	# B. Convert the NT sequence into AAs . . . . . .
	foreach my $header (keys %NTs)
	{	my $protein = "";            # set to "empty" at the start of each loop
		for (my $i=0; $i+3 <= length($NTs{$header}); $i += 3)  # stop before a partial codon at the end
		{	my $codon = uc(substr($NTs{$header},$i,3));        # $codon = 3 nts at a time
			my $aa = $CodonTable{$codon};       # here's the translation step
			$aa = "X" unless defined $aa;       # unknown codon (e.g. contains N)
			$protein .= $aa;
		}
		$PRTs{$header} = $protein;
	}
}

# - - - - - - - - - - - - - - - - - - - - - - - - - -
sub PrintFasta
{	my $outfile = "Protein-".$_[0];
	open(NewFile,">$outfile") or die "\n\n\n Cannot write $outfile\n\n\n";
	foreach my $protein (sort keys %PRTs)
	{	my $count = length($PRTs{$protein});
		print NewFile "> $protein; AA count = $count\n$PRTs{$protein}\n";
		print "> $protein; AA count = $count\n";
	}
	close(NewFile);   # flush and release the output file
}
# - - - - - EOF - - - - - - - - - - - - - - - - - - -
# The lines below are not perl statements and are not executed as part of the 
# program.  Instead, they are available to be read as data input by the program
# using the I/O handle name "DATA". This is a default handle name for any data 
# you want to include in a script file.
__END__
A GCU GCC GCA GCG
R CGU CGC CGA CGG AGA AGG
N AAU AAC
D GAU GAC 
C UGU UGC
Q CAA CAG
E GAA GAG
G GGU GGC GGA GGG
H CAU CAC
I AUU AUC AUA
L UUA UUG CUU CUC CUA CUG
K AAA AAG
M AUG
F UUU UUC
P CCU CCC CCA CCG
S UCU UCC UCA UCG AGU AGC
T ACU ACC ACA ACG
W UGG
Y UAU UAC
V GUU GUC GUA GUG
* UAA UAG UGA
}}}
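The change from parallel arrays to hashes described above can be seen in miniature in the sketch below. The two headers and sequences are invented for this illustration; only the hash name {{{%NTs}}} comes from the script.
{{{
use strict;
use warnings;

# Hypothetical two-record example; the headers and sequences are made up.
my %NTs = (
	"orf2 test" => "ATGGCCTAA",
	"orf1 test" => "ATGAAATGA",
);

# A hash has no fixed order, so sort the keys for reproducible output.
foreach my $header (sort keys %NTs)
{	my $len = length($NTs{$header});
	print ">$header; nt count = $len\n$NTs{$header}\n";
}

# Lookup by header is direct, with no index bookkeeping needed.
print "Found orf1\n" if exists $NTs{"orf1 test"};
}}}
With arrays, the header and sequence stay paired only because they share an index; with a hash, the header itself retrieves the sequence, which is why the script sorts the keys before printing.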
!
/***
|Name|FontSizePlugin|
|Created by|SaqImtiaz|
|Location|http://tw.lewcid.org/#FontSizePlugin|
|Version|1.0|
|Requires|~TW2.x|
!Description:
Resize tiddler text on the fly. The text size is remembered between sessions by use of a cookie.
You can customize the maximum and minimum allowed sizes.
(only affects tiddler content text, not any other text)

Also, you can load a TW file with a font-size specified in the url.
Eg: http://tw.lewcid.org/#font:110

!Demo:
Try using the font-size buttons in the sidebar, or in the MainMenu above.

!Installation:
Copy the contents of this tiddler to your TW, tag with systemConfig, save and reload your TW.
Then put {{{<<fontSize "font-size:">>}}} in your SideBarOptions tiddler, or anywhere else that you might like.

!Usage
{{{<<fontSize>>}}} results in <<fontSize>>
{{{<<fontSize font-size: >>}}} results in <<fontSize font-size:>>

!Customizing:
The buttons and prefix text are wrapped in a span with class fontResizer, for easy css styling.
To change the default font-size, and the maximum and minimum font-size allowed, edit the config.fontSize.settings section of the code below.

!Notes:
This plugin assumes that the initial font-size is 100% and then increases or decreases the size by 10%. This stepsize of 10% can also be customized.

!History:
*27-07-06, version 1.0 : prevented double clicks from triggering editing of containing tiddler.
*25-07-06,  version 0.9

!Code
***/

//{{{
config.fontSize={};

//configuration settings
config.fontSize.settings =
{
            defaultSize : 100,  // all sizes in %
            maxSize : 200,
            minSize : 40,
            stepSize : 10
};

//startup code
var fontSettings = config.fontSize.settings;

if (!config.options.txtFontSize)
            {config.options.txtFontSize = fontSettings.defaultSize;
            saveOptionCookie("txtFontSize");}
setStylesheet(".tiddler .viewer {font-size:"+config.options.txtFontSize+"%;}\n","fontResizerStyles");
setStylesheet("#contentWrapper .fontResizer .button {display:inline;font-size:105%; font-weight:bold; margin:0 1px; padding: 0 3px; text-align:center !important;}\n .fontResizer {margin:0 0.5em;}","fontResizerButtonStyles");

//macro
config.macros.fontSize={};
config.macros.fontSize.handler = function (place,macroName,params,wikifier,paramString,tiddler)
{

               var sp = createTiddlyElement(place,"span",null,"fontResizer");
               sp.ondblclick=this.onDblClick;
               if (params[0])
                           createTiddlyText(sp,params[0]);
               createTiddlyButton(sp,"+","increase font-size",this.incFont);
               createTiddlyButton(sp,"=","reset font-size",this.resetFont);
               createTiddlyButton(sp,"–","decrease font-size",this.decFont);
}

config.macros.fontSize.onDblClick = function (e)
{
             if (!e) var e = window.event;
             e.cancelBubble = true;
             if (e.stopPropagation) e.stopPropagation();
             return false;
}

config.macros.fontSize.setFont = function ()
{
               saveOptionCookie("txtFontSize");
               setStylesheet(".tiddler .viewer {font-size:"+config.options.txtFontSize+"%;}\n","fontResizerStyles");
}

config.macros.fontSize.incFont=function()
{
               if (config.options.txtFontSize < fontSettings.maxSize)
                  config.options.txtFontSize = (config.options.txtFontSize*1)+fontSettings.stepSize;
               config.macros.fontSize.setFont();
}

config.macros.fontSize.decFont=function()
{

               if (config.options.txtFontSize > fontSettings.minSize)
                  config.options.txtFontSize = (config.options.txtFontSize*1) - fontSettings.stepSize;
               config.macros.fontSize.setFont();
}

config.macros.fontSize.resetFont=function()
{

               config.options.txtFontSize=fontSettings.defaultSize;
               config.macros.fontSize.setFont();
}

config.paramifiers.font =
{
               onstart: function(v)
                  {
                   config.options.txtFontSize = v;
                   config.macros.fontSize.setFont();
                  }
};
//}}}
/***
|''Name:''|ForEachTiddlerPlugin|
|''Version:''|1.0.8 (2007-04-12)|
|''Source:''|http://tiddlywiki.abego-software.de/#ForEachTiddlerPlugin|
|''Author:''|UdoBorkowski (ub [at] abego-software [dot] de)|
|''Licence:''|[[BSD open source license (abego Software)|http://www.abego-software.de/legal/apl-v10.html]]|
|''Copyright:''|&copy; 2005-2007 [[abego Software|http://www.abego-software.de]]|
|''TiddlyWiki:''|1.2.38+, 2.0|
|''Browser:''|Firefox 1.0.4+; Firefox 1.5; InternetExplorer 6.0|
!Description

Create customizable lists, tables etc. for your selections of tiddlers. Specify the tiddlers to include and their order through a powerful language.

''Syntax:'' 
|>|{{{<<}}}''forEachTiddler'' [''in'' //tiddlyWikiPath//] [''where'' //whereCondition//] [''sortBy'' //sortExpression// [''ascending'' //or// ''descending'']] [''script'' //scriptText//] [//action// [//actionParameters//]]{{{>>}}}|
|//tiddlyWikiPath//|The filepath to the TiddlyWiki the macro should work on. When missing the current TiddlyWiki is used.|
|//whereCondition//|(quoted) JavaScript boolean expression. May refer to the built-in variables {{{tiddler}}} and  {{{context}}}.|
|//sortExpression//|(quoted) JavaScript expression returning "comparable" objects (using '{{{<}}}','{{{>}}}','{{{==}}}'). May refer to the built-in variables {{{tiddler}}} and  {{{context}}}.|
|//scriptText//|(quoted) JavaScript text. Typically defines JavaScript functions that are called by the various JavaScript expressions (whereClause, sortClause, action arguments,...)|
|//action//|The action that should be performed on every selected tiddler, in the given order. By default the actions [[addToList|AddToListAction]] and [[write|WriteAction]] are supported. When no action is specified [[addToList|AddToListAction]]  is used.|
|//actionParameters//|(action specific) parameters the action may refer while processing the tiddlers (see action descriptions for details). <<tiddler [[JavaScript in actionParameters]]>>|
|>|~~Syntax formatting: Keywords in ''bold'', optional parts in [...]. 'or' means that exactly one of the two alternatives must exist.~~|

For details see [[ForEachTiddlerMacro]] and [[ForEachTiddlerExamples]].

!Revision history
* v1.0.8 (2007-04-12)
** Adapted to latest TiddlyWiki 2.2 Beta importTiddlyWiki API (introduced with changeset 2004). TiddlyWiki 2.2 Beta builds prior to changeset 2004 are no longer supported (but TiddlyWiki 2.1 and earlier, of course)
* v1.0.7 (2007-03-28)
** Also support "pre" formatted TiddlyWikis (introduced with TW 2.2) (when using "in" clause to work on external tiddlers)
* v1.0.6 (2006-09-16)
** Context provides "viewerTiddler", i.e. the tiddler used to view the macro. Most times this is equal to the "inTiddler", but when using the "tiddler" macro both may be different.
** Support "begin", "end" and "none" expressions in "write" action
* v1.0.5 (2006-02-05)
** Pass tiddler containing the macro with wikify, context object also holds reference to tiddler containing the macro ("inTiddler"). Thanks to SimonBaird.
** Support Firefox 1.5.0.1
** Internal
*** Make "JSLint" conform
*** "Only install once"
* v1.0.4 (2006-01-06)
** Support TiddlyWiki 2.0
* v1.0.3 (2005-12-22)
** Features: 
*** Write output to a file supports multi-byte environments (Thanks to Bram Chen) 
*** Provide API to access the forEachTiddler functionality directly through JavaScript (see getTiddlers and performMacro)
** Enhancements:
*** Improved error messages on InternetExplorer.
* v1.0.2 (2005-12-10)
** Features: 
*** context object also holds reference to store (TiddlyWiki)
** Fixed Bugs: 
*** ForEachTiddler 1.0.1 has broken support on win32 Opera 8.51 (Thanks to BrunoSabin for reporting)
* v1.0.1 (2005-12-08)
** Features: 
*** Access tiddlers stored in separated TiddlyWikis through the "in" option. I.e. you are no longer limited to only work on the "current TiddlyWiki".
*** Write output to an external file using the "toFile" option of the "write" action. With this option you may write your customized tiddler exports.
*** Use the "script" section to define "helper" JavaScript functions etc. to be used in the various JavaScript expressions (whereClause, sortClause, action arguments,...).
*** Access and store context information for the current forEachTiddler invocation (through the built-in "context" object).
*** Improved script evaluation (for where/sort clause and write scripts).
* v1.0.0 (2005-11-20)
** initial version

!Code
***/
//{{{

	
//============================================================================
//============================================================================
//		   ForEachTiddlerPlugin
//============================================================================
//============================================================================

// Only install once
if (!version.extensions.ForEachTiddlerPlugin) {

if (!window.abego) window.abego = {};

version.extensions.ForEachTiddlerPlugin = {
	major: 1, minor: 0, revision: 8, 
	date: new Date(2007,3,12), 
	source: "http://tiddlywiki.abego-software.de/#ForEachTiddlerPlugin",
	licence: "[[BSD open source license (abego Software)|http://www.abego-software.de/legal/apl-v10.html]]",
	copyright: "Copyright (c) abego Software GmbH, 2005-2007 (www.abego-software.de)"
};

// For backward compatibility with TW 1.2.x
//
if (!TiddlyWiki.prototype.forEachTiddler) {
	TiddlyWiki.prototype.forEachTiddler = function(callback) {
		for(var t in this.tiddlers) {
			callback.call(this,t,this.tiddlers[t]);
		}
	};
}

//============================================================================
// forEachTiddler Macro
//============================================================================

version.extensions.forEachTiddler = {
	major: 1, minor: 0, revision: 8, date: new Date(2007,3,12), provider: "http://tiddlywiki.abego-software.de"};

// ---------------------------------------------------------------------------
// Configurations and constants 
// ---------------------------------------------------------------------------

config.macros.forEachTiddler = {
	 // Standard Properties
	 label: "forEachTiddler",
	 prompt: "Perform actions on a (sorted) selection of tiddlers",

	 // actions
	 actions: {
		 addToList: {},
		 write: {}
	 }
};

// ---------------------------------------------------------------------------
//  The forEachTiddler Macro Handler 
// ---------------------------------------------------------------------------

config.macros.forEachTiddler.getContainingTiddler = function(e) {
	while(e && !hasClass(e,"tiddler"))
		e = e.parentNode;
	var title = e ? e.getAttribute("tiddler") : null; 
	return title ? store.getTiddler(title) : null;
};

config.macros.forEachTiddler.handler = function(place,macroName,params,wikifier,paramString,tiddler) {
	// config.macros.forEachTiddler.traceMacroCall(place,macroName,params,wikifier,paramString,tiddler);

	if (!tiddler) tiddler = config.macros.forEachTiddler.getContainingTiddler(place);
	// --- Parsing ------------------------------------------

	var i = 0; // index running over the params
	// Parse the "in" clause
	var tiddlyWikiPath = undefined;
	if ((i < params.length) && params[i] == "in") {
		i++;
		if (i >= params.length) {
			this.handleError(place, "TiddlyWiki path expected behind 'in'.");
			return;
		}
		tiddlyWikiPath = this.paramEncode((i < params.length) ? params[i] : "");
		i++;
	}

	// Parse the where clause
	var whereClause ="true";
	if ((i < params.length) && params[i] == "where") {
		i++;
		whereClause = this.paramEncode((i < params.length) ? params[i] : "");
		i++;
	}

	// Parse the sort stuff
	var sortClause = null;
	var sortAscending = true; 
	if ((i < params.length) && params[i] == "sortBy") {
		i++;
		if (i >= params.length) {
			this.handleError(place, "sortClause missing behind 'sortBy'.");
			return;
		}
		sortClause = this.paramEncode(params[i]);
		i++;

		if ((i < params.length) && (params[i] == "ascending" || params[i] == "descending")) {
			 sortAscending = params[i] == "ascending";
			 i++;
		}
	}

	// Parse the script
	var scriptText = null;
	if ((i < params.length) && params[i] == "script") {
		i++;
		scriptText = this.paramEncode((i < params.length) ? params[i] : "");
		i++;
	}

	// Parse the action. 
	// When we are already at the end use the default action
	var actionName = "addToList";
	if (i < params.length) {
	   if (!config.macros.forEachTiddler.actions[params[i]]) {
			this.handleError(place, "Unknown action '"+params[i]+"'.");
			return;
		} else {
			actionName = params[i]; 
			i++;
		}
	} 
	
	// Get the action parameter
	// (the parsing is done inside the individual action implementation.)
	var actionParameter = params.slice(i);


	// --- Processing ------------------------------------------
	try {
		this.performMacro({
				place: place, 
				inTiddler: tiddler,
				whereClause: whereClause, 
				sortClause: sortClause, 
				sortAscending: sortAscending, 
				actionName: actionName, 
				actionParameter: actionParameter, 
				scriptText: scriptText, 
				tiddlyWikiPath: tiddlyWikiPath});

	} catch (e) {
		this.handleError(place, e);
	}
};

// Returns an object with properties "tiddlers" and "context".
// tiddlers holds the (sorted) tiddlers selected by the parameter,
// context the context of the execution of the macro.
//
// The action is not yet performed.
//
// @parameter see performMacro
//
config.macros.forEachTiddler.getTiddlersAndContext = function(parameter) {

	var context = config.macros.forEachTiddler.createContext(parameter.place, parameter.whereClause, parameter.sortClause, parameter.sortAscending, parameter.actionName, parameter.actionParameter, parameter.scriptText, parameter.tiddlyWikiPath, parameter.inTiddler);

	var tiddlyWiki = parameter.tiddlyWikiPath ? this.loadTiddlyWiki(parameter.tiddlyWikiPath) : store;
	context["tiddlyWiki"] = tiddlyWiki;
	
	// Get the tiddlers, as defined by the whereClause
	var tiddlers = this.findTiddlers(parameter.whereClause, context, tiddlyWiki);
	context["tiddlers"] = tiddlers;

	// Sort the tiddlers, when sorting is required.
	if (parameter.sortClause) {
		this.sortTiddlers(tiddlers, parameter.sortClause, parameter.sortAscending, context);
	}

	return {tiddlers: tiddlers, context: context};
};

// Returns the (sorted) tiddlers selected by the parameter.
//
// The action is not yet performed.
//
// @parameter see performMacro
//
config.macros.forEachTiddler.getTiddlers = function(parameter) {
	return this.getTiddlersAndContext(parameter).tiddlers;
};

// Performs the macros with the given parameter.
//
// @param parameter holds the parameter of the macro as separate properties.
//				  The following properties are supported:
//
//						place
//						whereClause
//						sortClause
//						sortAscending
//						actionName
//						actionParameter
//						scriptText
//						tiddlyWikiPath
//
//					All properties are optional. 
//					For most actions the place property must be defined.
//
config.macros.forEachTiddler.performMacro = function(parameter) {
	var tiddlersAndContext = this.getTiddlersAndContext(parameter);

	// Perform the action
	var actionName = parameter.actionName ? parameter.actionName : "addToList";
	var action = config.macros.forEachTiddler.actions[actionName];
	if (!action) {
		this.handleError(parameter.place, "Unknown action '"+actionName+"'.");
		return;
	}

	var actionHandler = action.handler;
	actionHandler(parameter.place, tiddlersAndContext.tiddlers, parameter.actionParameter, tiddlersAndContext.context);
};

// ---------------------------------------------------------------------------
//  The actions 
// ---------------------------------------------------------------------------

// Internal.
//
// --- The addToList Action -----------------------------------------------
//
config.macros.forEachTiddler.actions.addToList.handler = function(place, tiddlers, parameter, context) {
	// Parse the parameter
	var p = 0;

	// Check for extra parameters
	if (parameter.length > p) {
		config.macros.forEachTiddler.createExtraParameterErrorElement(place, "addToList", parameter, p);
		return;
	}

	// Perform the action.
	var list = document.createElement("ul");
	place.appendChild(list);
	for (var i = 0; i < tiddlers.length; i++) {
		var tiddler = tiddlers[i];
		var listItem = document.createElement("li");
		list.appendChild(listItem);
		createTiddlyLink(listItem, tiddler.title, true);
	}
};

// Internal.
//
// When parameter[i] equals the given name, returns the (decoded) text of the
// parameter that follows it; otherwise returns null.
//
abego.parseNamedParameter = function(name, parameter, i) {
	if ((i < parameter.length) && parameter[i] == name) {
		i++;
		if (i >= parameter.length) {
			throw "Missing text behind '%0'".format([name]);
		}
		
		return config.macros.forEachTiddler.paramEncode(parameter[i]);
	}
	return null;
};

// Internal.
//
// --- The write Action ---------------------------------------------------
//
config.macros.forEachTiddler.actions.write.handler = function(place, tiddlers, parameter, context) {
	// Parse the parameter
	var p = 0;
	if (p >= parameter.length) {
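		// The handler is invoked as a plain function, so "this" is not the macro object.
		config.macros.forEachTiddler.handleError(place, "Missing expression behind 'write'.");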
		return;
	}

	var textExpression = config.macros.forEachTiddler.paramEncode(parameter[p]);
	p++;

	// Parse the "begin" option
	var beginExpression = abego.parseNamedParameter("begin", parameter, p);
	if (beginExpression !== null) 
		p += 2;
	var endExpression = abego.parseNamedParameter("end", parameter, p);
	if (endExpression !== null) 
		p += 2;
	var noneExpression = abego.parseNamedParameter("none", parameter, p);
	if (noneExpression !== null) 
		p += 2;

	// Parse the "toFile" option
	var filename = null;
	var lineSeparator = undefined;
	if ((p < parameter.length) && parameter[p] == "toFile") {
		p++;
		if (p >= parameter.length) {
			config.macros.forEachTiddler.handleError(place, "Filename expected behind 'toFile' of 'write' action.");
			return;
		}
		
		filename = config.macros.forEachTiddler.getLocalPath(config.macros.forEachTiddler.paramEncode(parameter[p]));
		p++;
		if ((p < parameter.length) && parameter[p] == "withLineSeparator") {
			p++;
			if (p >= parameter.length) {
				config.macros.forEachTiddler.handleError(place, "Line separator text expected behind 'withLineSeparator' of 'write' action.");
				return;
			}
			lineSeparator = config.macros.forEachTiddler.paramEncode(parameter[p]);
			p++;
		}
	}
	
	// Check for extra parameters
	if (parameter.length > p) {
		config.macros.forEachTiddler.createExtraParameterErrorElement(place, "write", parameter, p);
		return;
	}

	// Perform the action.
	var func = config.macros.forEachTiddler.getEvalTiddlerFunction(textExpression, context);
	var count = tiddlers.length;
	var text = "";
	if (count > 0 && beginExpression)
		text += config.macros.forEachTiddler.getEvalTiddlerFunction(beginExpression, context)(undefined, context, count, undefined);
	
	for (var i = 0; i < count; i++) {
		var tiddler = tiddlers[i];
		text += func(tiddler, context, count, i);
	}
	
	if (count > 0 && endExpression)
		text += config.macros.forEachTiddler.getEvalTiddlerFunction(endExpression, context)(undefined, context, count, undefined);

	if (count == 0 && noneExpression) 
		text += config.macros.forEachTiddler.getEvalTiddlerFunction(noneExpression, context)(undefined, context, count, undefined);
		

	if (filename) {
		if (lineSeparator !== undefined) {
			lineSeparator = lineSeparator.replace(/\\n/mg, "\n").replace(/\\r/mg, "\r");
			text = text.replace(/\n/mg,lineSeparator);
		}
		saveFile(filename, convertUnicodeToUTF8(text));
	} else {
		var wrapper = createTiddlyElement(place, "span");
		wikify(text, wrapper, null/* highlightRegExp */, context.inTiddler);
	}
};


// ---------------------------------------------------------------------------
//  Helpers
// ---------------------------------------------------------------------------

// Internal.
//
config.macros.forEachTiddler.createContext = function(placeParam, whereClauseParam, sortClauseParam, sortAscendingParam, actionNameParam, actionParameterParam, scriptText, tiddlyWikiPathParam, inTiddlerParam) {
	return {
		place : placeParam, 
		whereClause : whereClauseParam, 
		sortClause : sortClauseParam, 
		sortAscending : sortAscendingParam, 
		script : scriptText,
		actionName : actionNameParam, 
		actionParameter : actionParameterParam,
		tiddlyWikiPath : tiddlyWikiPathParam,
		inTiddler : inTiddlerParam, // the tiddler containing the <<forEachTiddler ...>> macro call.
		viewerTiddler : config.macros.forEachTiddler.getContainingTiddler(placeParam) // the tiddler showing the forEachTiddler result
	};
};

// Internal.
//
// Returns a TiddlyWiki with the tiddlers loaded from the TiddlyWiki of 
// the given path.
//
config.macros.forEachTiddler.loadTiddlyWiki = function(path, idPrefix) {
	if (!idPrefix) {
		idPrefix = "store";
	}
	var lenPrefix = idPrefix.length;
	
	// Read the content of the given file
	var content = loadFile(this.getLocalPath(path));
	if(content === null) {
		throw "TiddlyWiki '"+path+"' not found.";
	}
	
	var tiddlyWiki = new TiddlyWiki();

	// Starting with TW 2.2 there is a helper function to import the tiddlers
	if (tiddlyWiki.importTiddlyWiki) {
		if (!tiddlyWiki.importTiddlyWiki(content))
			throw "File '"+path+"' is not a TiddlyWiki.";
		tiddlyWiki.dirty = false;
		return tiddlyWiki;
	}
	
	// The legacy code, for TW < 2.2
	
	// Locate the storeArea div's
	var posOpeningDiv = content.indexOf(startSaveArea);
	var posClosingDiv = content.lastIndexOf(endSaveArea);
	if((posOpeningDiv == -1) || (posClosingDiv == -1)) {
		throw "File '"+path+"' is not a TiddlyWiki.";
	}
	// substring takes (start, end); substr would treat posClosingDiv as a length.
	var storageText = content.substring(posOpeningDiv + startSaveArea.length, posClosingDiv);
	
	// Create a "div" element that contains the storage text
	var myStorageDiv = document.createElement("div");
	myStorageDiv.innerHTML = storageText;
	myStorageDiv.normalize();
	
	// Create all tiddlers in a new TiddlyWiki
	// (following code is modified copy of TiddlyWiki.prototype.loadFromDiv)
	var store = myStorageDiv.childNodes;
	for(var t = 0; t < store.length; t++) {
		var e = store[t];
		var title = null;
		if(e.getAttribute)
			title = e.getAttribute("tiddler");
		if(!title && e.id && e.id.substr(0,lenPrefix) == idPrefix)
			title = e.id.substr(lenPrefix);
		if(title && title !== "") {
			var tiddler = tiddlyWiki.createTiddler(title);
			tiddler.loadFromDiv(e,title);
		}
	}
	tiddlyWiki.dirty = false;

	return tiddlyWiki;
};


	
// Internal.
//
// Returns a function that has a function body returning the given javaScriptExpression.
// The function has the parameters:
// 
//	 (tiddler, context, count, index)
//
config.macros.forEachTiddler.getEvalTiddlerFunction = function (javaScriptExpression, context) {
	var script = context["script"];
	var functionText = "var theFunction = function(tiddler, context, count, index) { return "+javaScriptExpression+"}";
	var fullText = (script ? script+";" : "")+functionText+";theFunction;";
	return eval(fullText);
};

// Internal.
//
config.macros.forEachTiddler.findTiddlers = function(whereClause, context, tiddlyWiki) {
	var result = [];
	var func = config.macros.forEachTiddler.getEvalTiddlerFunction(whereClause, context);
	tiddlyWiki.forEachTiddler(function(title,tiddler) {
		if (func(tiddler, context, undefined, undefined)) {
			result.push(tiddler);
		}
	});
	return result;
};

// Internal.
//
config.macros.forEachTiddler.createExtraParameterErrorElement = function(place, actionName, parameter, firstUnusedIndex) {
	var message = "Extra parameter behind '"+actionName+"':";
	for (var i = firstUnusedIndex; i < parameter.length; i++) {
		message += " "+parameter[i];
	}
	this.handleError(place, message);
};

// Internal.
//
config.macros.forEachTiddler.sortAscending = function(tiddlerA, tiddlerB) {
	var result = 
		(tiddlerA.forEachTiddlerSortValue == tiddlerB.forEachTiddlerSortValue) 
			? 0
			: (tiddlerA.forEachTiddlerSortValue < tiddlerB.forEachTiddlerSortValue)
			   ? -1 
			   : +1; 
	return result;
};

// Internal.
//
config.macros.forEachTiddler.sortDescending = function(tiddlerA, tiddlerB) {
	var result = 
		(tiddlerA.forEachTiddlerSortValue == tiddlerB.forEachTiddlerSortValue) 
			? 0
			: (tiddlerA.forEachTiddlerSortValue < tiddlerB.forEachTiddlerSortValue)
			   ? +1 
			   : -1; 
	return result;
};

// Internal.
//
config.macros.forEachTiddler.sortTiddlers = function(tiddlers, sortClause, ascending, context) {
	// To avoid evaluating the sortClause whenever two items are compared 
	// we pre-calculate the sortValue for every item in the array and store it in a 
	// temporary property ("forEachTiddlerSortValue") of the tiddlers.
	var func = config.macros.forEachTiddler.getEvalTiddlerFunction(sortClause, context);
	var count = tiddlers.length;
	var i;
	for (i = 0; i < count; i++) {
		var tiddler = tiddlers[i];
		tiddler.forEachTiddlerSortValue = func(tiddler,context, undefined, undefined);
	}

	// Do the sorting
	tiddlers.sort(ascending ? this.sortAscending : this.sortDescending);

	// Delete the temporary property that holds the sortValue.	
	for (i = 0; i < tiddlers.length; i++) {
		delete tiddlers[i].forEachTiddlerSortValue;
	}
};


// Internal.
//
config.macros.forEachTiddler.trace = function(message) {
	displayMessage(message);
};

// Internal.
//
config.macros.forEachTiddler.traceMacroCall = function(place,macroName,params) {
	var message ="<<"+macroName;
	for (var i = 0; i < params.length; i++) {
		message += " "+params[i];
	}
	message += ">>";
	displayMessage(message);
};


// Internal.
//
// Creates an element that holds an error message
// 
config.macros.forEachTiddler.createErrorElement = function(place, exception) {
	var message = (exception.description) ? exception.description : exception.toString();
	return createTiddlyElement(place,"span",null,"forEachTiddlerError","<<forEachTiddler ...>>: "+message);
};

// Internal.
//
// @param place [may be null]
//
config.macros.forEachTiddler.handleError = function(place, exception) {
	if (place) {
		this.createErrorElement(place, exception);
	} else {
		throw exception;
	}
};

// Internal.
//
// Encodes the given string.
//
// Replaces 
//	 "$))" to ">>"
//	 "$)" to ">"
//
config.macros.forEachTiddler.paramEncode = function(s) {
	var reGTGT = new RegExp("\\$\\)\\)","mg");
	var reGT = new RegExp("\\$\\)","mg");
	return s.replace(reGTGT, ">>").replace(reGT, ">");
};

// Internal.
//
// Returns the given original path (that is a file path, starting with "file:")
// as a path to a local file, in the system's native file format.
//
// Location information in the originalPath (i.e. the "#" and stuff following)
// is stripped.
// 
config.macros.forEachTiddler.getLocalPath = function(originalPath) {
	// Remove any location part of the URL
	var hashPos = originalPath.indexOf("#");
	if(hashPos != -1)
		originalPath = originalPath.substr(0,hashPos);
	// Convert to a native file format assuming
	// "file:///x:/path/path/path..." - pc local file --> "x:\path\path\path..."
	// "file://///server/share/path/path/path..." - FireFox pc network file --> "\\server\share\path\path\path..."
	// "file:///path/path/path..." - mac/unix local file --> "/path/path/path..."
	// "file://server/share/path/path/path..." - pc network file --> "\\server\share\path\path\path..."
	var localPath;
	if(originalPath.charAt(9) == ":") // pc local file
		localPath = unescape(originalPath.substr(8)).replace(new RegExp("/","g"),"\\");
	else if(originalPath.indexOf("file://///") === 0) // FireFox pc network file
		localPath = "\\\\" + unescape(originalPath.substr(10)).replace(new RegExp("/","g"),"\\");
	else if(originalPath.indexOf("file:///") === 0) // mac/unix local file
		localPath = unescape(originalPath.substr(7));
	else if(originalPath.indexOf("file:/") === 0) // mac/unix local file
		localPath = unescape(originalPath.substr(5));
	else // pc network file
		localPath = "\\\\" + unescape(originalPath.substr(7)).replace(new RegExp("/","g"),"\\");	
	return localPath;
};

// ---------------------------------------------------------------------------
// Stylesheet Extensions (may be overridden by local StyleSheet)
// ---------------------------------------------------------------------------
//
setStylesheet(
	".forEachTiddlerError{color: #ffffff;background-color: #880000;}",
	"forEachTiddler");

//============================================================================
// End of forEachTiddler Macro
//============================================================================


//============================================================================
// String.startsWith Function
//============================================================================
//
// Returns true if the string starts with the given prefix, false otherwise.
//
version.extensions["String.startsWith"] = {major: 1, minor: 0, revision: 0, date: new Date(2005,11,20), provider: "http://tiddlywiki.abego-software.de"};
//
String.prototype.startsWith = function(prefix) {
	var n =  prefix.length;
	return (this.length >= n) && (this.slice(0, n) == prefix);
};



//============================================================================
// String.endsWith Function
//============================================================================
//
// Returns true if the string ends with the given suffix, false otherwise.
//
version.extensions["String.endsWith"] = {major: 1, minor: 0, revision: 0, date: new Date(2005,11,20), provider: "http://tiddlywiki.abego-software.de"};
//
String.prototype.endsWith = function(suffix) {
	var n = suffix.length;
	return (this.length >= n) && (this.right(n) == suffix);
};


//============================================================================
// String.contains Function
//============================================================================
//
// Returns true when the string contains the given substring, false otherwise.
//
version.extensions["String.contains"] = {major: 1, minor: 0, revision: 0, date: new Date(2005,11,20), provider: "http://tiddlywiki.abego-software.de"};
//
String.prototype.contains = function(substring) {
	return this.indexOf(substring) >= 0;
};

//============================================================================
// Array.indexOf Function
//============================================================================
//
// Returns the index of the first occurrence of the given item in the array or 
// -1 when no such item exists.
//
// @param item [may be null]
//
version.extensions["Array.indexOf"] = {major: 1, minor: 0, revision: 0, date: new Date(2005,11,20), provider: "http://tiddlywiki.abego-software.de"};
//
Array.prototype.indexOf = function(item) {
	for (var i = 0; i < this.length; i++) {
		if (this[i] == item) {
			return i;
		}
	}
	return -1;
};

//============================================================================
// Array.contains Function
//============================================================================
//
// Returns true when the array contains the given item, otherwise false. 
//
// @param item [may be null]
//
version.extensions["Array.contains"] = {major: 1, minor: 0, revision: 0, date: new Date(2005,11,20), provider: "http://tiddlywiki.abego-software.de"};
//
Array.prototype.contains = function(item) {
	return (this.indexOf(item) >= 0);
};

//============================================================================
// Array.containsAny Function
//============================================================================
//
// Returns true when the array contains at least one of the elements 
// of items. Otherwise (or when items contains no elements) false is returned.
//
version.extensions["Array.containsAny"] = {major: 1, minor: 0, revision: 0, date: new Date(2005,11,20), provider: "http://tiddlywiki.abego-software.de"};
//
Array.prototype.containsAny = function(items) {
	for(var i = 0; i < items.length; i++) {
		if (this.contains(items[i])) {
			return true;
		}
	}
	return false;
};


//============================================================================
// Array.containsAll Function
//============================================================================
//
// Returns true when the array contains all the items, otherwise false.
// 
// When items is null false is returned (even if the array contains a null).
//
// @param items [may be null] 
//
version.extensions["Array.containsAll"] = {major: 1, minor: 0, revision: 0, date: new Date(2005,11,20), provider: "http://tiddlywiki.abego-software.de"};
//
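Array.prototype.containsAll = function(items) {
	// As documented above: a null (or missing) items argument yields false.
	if (!items) {
		return false;
	}
	for(var i = 0; i < items.length; i++) {
		if (!this.contains(items[i])) {
			return false;
		}
	}
	return true;
};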


} // of "install only once"

// Used Globals (for JSLint) ==============
// ... DOM
/*global 	document */
// ... TiddlyWiki Core
/*global 	convertUnicodeToUTF8, createTiddlyElement, createTiddlyLink, 
			displayMessage, endSaveArea, hasClass, loadFile, saveFile, 
			startSaveArea, store, wikify */
//}}}


/***
!Licence and Copyright
Copyright (c) abego Software ~GmbH, 2005 ([[www.abego-software.de|http://www.abego-software.de]])

Redistribution and use in source and binary forms, with or without modification,
are permitted provided that the following conditions are met:

Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.

Redistributions in binary form must reproduce the above copyright notice, this
list of conditions and the following disclaimer in the documentation and/or other
materials provided with the distribution.

Neither the name of abego Software nor the names of its contributors may be
used to endorse or promote products derived from this software without specific
prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY
EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT
SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED
TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH
DAMAGE.
***/

!Format menu
|''bold''|@@highlight@@|
|//italic//|[[hyperlink]]|
|__underline__||
!Greek menu
|{{greek{κλητοι̂ς}}}|{{gkindent{{{gkindent{{{gkindent{κλητοι̂ς}}}}}}}}}|
|{{gkindent{κλητοι̂ς}}}|{{gkindent{{{gkindent{{{gkindent{{{gkindent{κλητοι̂ς}}}}}}}}}}}}|
|{{gkindent{{{gkindent{κλητοι̂ς}}}}}}|{{gkindent{{{gkindent{{{gkindent{{{gkindent{{{gkindent{κλητοι̂ς}}}}}}}}}}}}}}}|
!!Hebrew menu
{{hebrewNoAlign{וַיָּקָם}}}
{{hebrewRightAlign{וַיָּקָם}}}
{{hebAlignAndIndent{וַיָּקָם}}}
{{hebAlignAndIndent{{{hebAlignAndIndent{וַיָּקָם}}}}}}
{{hebAlignAndIndent{{{hebAlignAndIndent{{{hebAlignAndIndent{וַיָּקָם}}}}}}}}}
{{hebAlignAndIndent{{{hebAlignAndIndent{{{hebAlignAndIndent{{{hebAlignAndIndent{וַיָּקָם}}}}}}}}}}}}
{{hebAlignAndIndent{{{hebAlignAndIndent{{{hebAlignAndIndent{{{hebAlignAndIndent{{{hebAlignAndIndent{וַיָּקָם}}}}}}}}}}}}}}}
!Indent menu
{{engindent{Text}}}
{{engindent{{{engindent{Text}}}}}}
{{engindent{{{engindent{{{engindent{Text}}}}}}}}}
{{engindent{{{engindent{{{engindent{{{engindent{Text}}}}}}}}}}}}
{{engindent{{{engindent{{{engindent{{{engindent{{{engindent{Text}}}}}}}}}}}}}}}
!Notes menu
((syntax(add note here))) &#149; ((translation(add note here))) &#149; ((text(add note here))) &#149; ((gram(add note here))) ((Popup: your text here(your popup text here)))
!Color menu
{{red{Red}}} {{blue{Blue}}} {{green{Green}}} {{gold{Gold}}} {{gray{Gray}}} {{magenta{Magenta}}} {{purple{Purple}}} {{teal{Teal}}} {{burgundy{Burgundy}}}
!Highlighting menu
@@bgcolor(#ff6666):Red@@ @@bgcolor(#ccccff):Blue@@ @@Yellow@@ @@bgcolor(#99ff99):Green@@ @@bgcolor(#cc9966):Brown@@ @@bgcolor(#cccc99):Gray@@ @@bgcolor(#ff9933):Orange@@
!Tables menu
Invisible table: {{invisiblecomm{
|!Invisible table header|!Invisible table header|!invisible table header|
|data|data|data|
|data|data|data|
|data|data|data|
}}}
Sortable table:
|sortable|k
|Header1|Header2|Header3|h
|Aa|B3|data7|
|Ab|B2|data2|
|Ac|B1|data8|
Standard table:
|!Header|!Header|!Header|
|data|data|data|
|data|data|data|
|data|data|data|
Table cell colors:
|!Below is a light gray cell|!Below is a dark gray cell|!Below are regular cells|
|bgcolor(#eeeeee):text here|||
||bgcolor(#cccccc):text there||
|||text anywhere|
<html><div style="color: rgb(100, 100, 150); font-family: Monaco;"><big><big><big><b><center>
<b>PERL: Beginning Bioinformatics.</b>
<br>
<br></big>
An introduction to scripting<br>
in PERL for biologists<br><br></big>
Fall 2008
<br>
Wednesdays, 1530-1700h, R206/C202
</b></big>
<br>
<br>
Dr. Adam G. Marsh
<br>
<i>645.4367 or amarsh@udel.edu</i>
<br>
<br>
PERL is a very efficient scripting language for handling text strings, and as such is ideally suited for the manipulation, analysis and formatting of genetic information. This course focuses on introducing biologists to working with PERL code for bioinformatic applications.
<br>
</html>
''GOALS:''  The course is designed to cover an assortment of bioinformatic PERL scripts from Rex Dwyer's [[Genomic PERL|Text Book]]. Participants will become familiar with editing low- to mid-complexity scripts so that they will be able to utilize existing script resources in their current research. Progress in the course is not determined by an ability to learn and write code de novo, but by the simple skill of understanding what code blocks do and how to edit them to suit one's own purposes.  
!
!!!
[[Back to Working Code|CodeWorks]]
!!!
!Ambiguous Nucleotide Codes
Many nucleotide sequences in ~GenBank entries contain ambiguous or unknown bases. These result from inconclusive or conflicting sequencing data, where there is evidence for 2 or more bases at one position. See @@[[AmbiguousNucleotideTable]]@@. So . . . what happens when you try to translate a sequence and extract a codon like ''"ACR"''? 
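
As a minimal sketch of what happens, the script below looks up three codons in a hash. The three-entry {{{%CodonTable}}} and the helper {{{lookup_codon}}} are hypothetical, made up for this illustration (a real script would hold all 64 codons). A codon that is not a key in the hash simply returns an undefined value, and with {{{use warnings}}} Perl reports "Use of uninitialized value" as soon as that value is printed or appended to a protein string.
{{{
use strict;
use warnings;

# Hypothetical three-entry codon table, for illustration only;
# a real script would hold all 64 codons in %CodonTable.
my %CodonTable = ( ATG => 'M', ACA => 'T', ACG => 'T' );

# Returns the amino acid for a codon, or undef when the codon is not a key.
sub lookup_codon
{	my ($codon) = @_;
	return $CodonTable{$codon};
}

foreach my $codon ('ATG', 'ACA', 'ACR')
{	my $aa = lookup_codon($codon);
	if (defined $aa)
	{	print "$codon translates to $aa\n"; }
	else
	{	print "$codon is not in the codon table (lookup returned undef)\n"; }
}
}}}
Testing with {{{defined}}} before using the value is one way to catch the ambiguous codon before it causes trouble further down the script.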

!!!From Glenn Christman
I encountered two errors running some of the counting code. They 
are the Perl equivalent of trying to access a null pointer. [//AGM: This means that the variables do not have any value or are undefined.//]

1) Nucleotide codes such as M, R, Y cause a problem because they are not in the 
codon table.  I solved this using the following in the section that does the 
translation (creating an X amino acid):
{{{
my $aa = $CodonTable{$codon};
# Now check to see if $aa is defined within the codon table
       if (! $aa)                          
       {     print "\nProblem with $header.  See codon $codon.\n\n";
             $aa = "X";
        }
# Now add $aa to protein sequence . . . . . . . . . 
$protein .= $aa;
}}}

2) The errors are also encountered when outputting the frequency table 
whenever there is a blank value in the $AAcount{$name}{$aa} hash table.  I 
solved this by first populating the table with zeros before starting to count:
{{{
foreach my $aa_code (@AA) 
{      foreach my $name_code (keys %PRTs)
	{     $AAprotcount{$name_code}{$aa_code} = 0; }
	$AAgenomecount{$aa_code} = 0;
}
}}}
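To see the effect of the zero-fill, here is a small self-contained sketch. The two short "proteins" in {{{%PRTs}}} and the three-letter {{{@AA}}} list are made-up data for illustration only; the names follow the snippet above, but the surrounding script is an assumption. Because every cell is set to zero first, amino acids that never occur in a protein still print as 0 instead of triggering an "uninitialized value" warning.
{{{
use strict;
use warnings;

# Hypothetical data, for illustration only.
my @AA   = ('A', 'C', 'D');                   # amino acid codes to report
my %PRTs = ( orf1 => 'AAD', orf2 => 'DD' );   # protein name => sequence

# Populate every cell with zero before counting.
my %AAprotcount;
foreach my $name_code (keys %PRTs)
{	foreach my $aa_code (@AA)
	{	$AAprotcount{$name_code}{$aa_code} = 0; }
}

# Count the amino acids in each protein.
foreach my $name_code (keys %PRTs)
{	foreach my $aa (split(//, $PRTs{$name_code}))
	{	$AAprotcount{$name_code}{$aa} += 1; }
}

# Every cell is now defined, so printing raises no warnings.
foreach my $name_code (sort keys %PRTs)
{	print $name_code;
	foreach my $aa_code (@AA)
	{	print "\t$AAprotcount{$name_code}{$aa_code}"; }
	print "\n";
}
}}}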
[[BACK|L06]]
!!!
!~HW2 PLOTS:
The assignment was described on: AAFCstart, with hints on AAFChint1
Student Results:
<html>
<img src="05/hw2-christman.png" style="height:500px">
<img src="05/hw2-eddie.jpg" style="height:500px">
<img src="05/hw2-grim.png" style="height:500px">
<img src="05/hw2-guida.png" style="height:500px">
<img src="05/hw2-gupta.png" style="height:500px">
<img src="05/hw2-hiras.jpg" style="height:500px">
<img src="05/hw2-huang.png" style="height:500px">
<img src="05/hw2-maung.gif" style="height:500px">
<img src="05/hw2-simon.jpg" style="height:500px">
<img src="05/hw2-zhai.png" style="height:500px">
</html>
!Last Learning Experience:

Please name any files you send to me starting with your last name, a hyphen, and a digit referencing the question to which it applies, followed by any other descriptor you like. For example, if I were submitting an answer to the first question, the file would be: "marsh-1-pieceOcake.txt"
!!!
1. Here's a question on logic control  . . . . . HappyDayAlpha
@@DUE: 5 pm today, 10DEC@@

2. Here's a question on counting nucleotides . . . . HappyThursdayBeta
@@DUE: 5 pm Thursday, 11DEC@@

3. Here's a question on analyzing AA and ~NTs  . . . NotSoHappyFridayOmega
@@DUE: 5 pm Friday, 12DEC@@ 
!
[[BACK to Main Exam|HappyDay]]
!!!

! Question 1:
''1.''   Below is a block of code from the script BLASTgrabber. Add a comment for each line to describe what it does. Be sure to comment specifically on the control logic of each //while loop//.

{{{
my $orfcount = 1;
foreach my $blast (@BLAST)
{	my @lines = split(/\n/,$blast);
	# Drop all the header lines before the target data . . . . 
	my $start = 0;
	my $skip = 0;
	while ($start == 0)
	{	if ($lines[0] =~ m/^Query/)
		{	print "$orfcount. $lines[0]\n"; $orfcount += 1; }
		
		if ($lines[0] =~ m/^Sequences producing significant alignments/)
		{	$start = 1; }
		elsif ($lines[0] =~ m/No hits found/)
		{	$start = 1; $skip = 1; }
		else
		{	shift(@lines); }
	}
	
	if ($skip == 0)
	{	# Cherry pick the bit and e-value . . . .
		my $count = 2;
		while ($count < 11)
		{	chomp($lines[$count]);
			if ($lines[$count] =~ m/(\d+)   ([\.\d]+).{0,3}$/)
			{	push(@BITS, $1); #print "       >$1<\n";
				push(@EVAL, $2); #print "       >$2<\n";
				$count += 1;
			}	
			else
			{	$count = 12; }
		}
	}
}

}}}

[[BACK to Main Exam|HappyDay]]
!!!
!Question 2:

2. Download the FASTA file of nucleotide sequences in annotated ORFs for the microbe //Anaeromyxobacter//: [[Click Here|12-Final/Anaeromyxobacter_Fw109-5-PID17729-cd95.ffn]]. Write a script that calculates the overall nucleotide frequencies (AGTC) in the coding genes of this delta-proteobacterium. Send me the script along with a copy of the output from your program reporting the frequency values.

!
/***
|Name:|HideWhenPlugin|
|Description:|Allows conditional inclusion/exclusion in templates|
|Version:|3.1 ($Rev: 3919 $)|
|Date:|$Date: 2008-03-13 02:03:12 +1000 (Thu, 13 Mar 2008) $|
|Source:|http://mptw.tiddlyspot.com/#HideWhenPlugin|
|Author:|Simon Baird <simon.baird@gmail.com>|
|License:|http://mptw.tiddlyspot.com/#TheBSDLicense|
For use in ViewTemplate and EditTemplate. Example usage:
{{{<div macro="showWhenTagged Task">[[TaskToolbar]]</div>}}}
{{{<div macro="showWhen tiddler.modifier == 'BartSimpson'"><img src="bart.gif"/></div>}}}
***/
//{{{

window.hideWhenLastTest = false;

window.removeElementWhen = function(test,place) {
	window.hideWhenLastTest = test;
	if (test) {
		removeChildren(place);
		place.parentNode.removeChild(place);
	}
};


merge(config.macros,{

	hideWhen: { handler: function(place,macroName,params,wikifier,paramString,tiddler) {
		removeElementWhen( eval(paramString), place);
	}},

	showWhen: { handler: function(place,macroName,params,wikifier,paramString,tiddler) {
		removeElementWhen( !eval(paramString), place);
	}},

	hideWhenTagged: { handler: function (place,macroName,params,wikifier,paramString,tiddler) {
		removeElementWhen( tiddler.tags.containsAll(params), place);
	}},

	showWhenTagged: { handler: function (place,macroName,params,wikifier,paramString,tiddler) {
		removeElementWhen( !tiddler.tags.containsAll(params), place);
	}},

	hideWhenTaggedAny: { handler: function (place,macroName,params,wikifier,paramString,tiddler) {
		removeElementWhen( tiddler.tags.containsAny(params), place);
	}},

	showWhenTaggedAny: { handler: function (place,macroName,params,wikifier,paramString,tiddler) {
		removeElementWhen( !tiddler.tags.containsAny(params), place);
	}},

	hideWhenTaggedAll: { handler: function (place,macroName,params,wikifier,paramString,tiddler) {
		removeElementWhen( tiddler.tags.containsAll(params), place);
	}},

	showWhenTaggedAll: { handler: function (place,macroName,params,wikifier,paramString,tiddler) {
		removeElementWhen( !tiddler.tags.containsAll(params), place);
	}},

	hideWhenExists: { handler: function(place,macroName,params,wikifier,paramString,tiddler) {
		removeElementWhen( store.tiddlerExists(params[0]) || store.isShadowTiddler(params[0]), place);
	}},

	showWhenExists: { handler: function(place,macroName,params,wikifier,paramString,tiddler) {
		removeElementWhen( !(store.tiddlerExists(params[0]) || store.isShadowTiddler(params[0])), place);
	}},

	hideWhenTitleIs: { handler: function(place,macroName,params,wikifier,paramString,tiddler) {
		removeElementWhen( tiddler.title == params[0], place);
	}},

	showWhenTitleIs: { handler: function(place,macroName,params,wikifier,paramString,tiddler) {
		removeElementWhen( tiddler.title != params[0], place);
	}},

	'else': { handler: function(place,macroName,params,wikifier,paramString,tiddler) {
		removeElementWhen( !window.hideWhenLastTest, place);
	}}

});

//}}}

{{{
<html><div style="color: rgb(100, 100, 150); font-family: Monaco;"><big><b>
abcdefghijklmnopqrstuvwxyz
</html>
}}}
/***
|''Name:''|HistoryPlugin|
|''Description:''|Limits the display to one open tiddler. Manages a history stack and provides macros to navigate this history (<<history>><<back>><<forward>>).|
|''Version:''|1.0.0|
|''Date:''|2008-03-23|
|''Source:''|http://tiddlywiki.bidix.info/#HistoryPlugin|
|''Author:''|BidiX (BidiX (at) bidix (dot) info)|
|''[[License]]:''|[[BSD open source license|http://tiddlywiki.bidix.info/#%5B%5BBSD%20open%20source%20license%5D%5D ]]|
|''~CoreVersion:''|2.3.0|
***/
//{{{
	Story.prototype.tiddlerHistory = [];
	Story.prototype.historyCurrentPos = -1;
	Story.prototype.currentTiddler = null;
	Story.prototype.maxPos = 11;

	Story.prototype.old_history_displayTiddler = Story.prototype.displayTiddler;
	Story.prototype.displayTiddler = function(srcElement,title,template,animate,slowly)
	{
		title = ((typeof title === "string") ? title : title.title);
		//SinglePageMode
		if (this.currentTiddler) this.closeTiddler(this.currentTiddler);
		if (template == 2) {
			//switch to Edit mode : don't manage
			story.old_history_displayTiddler(null,title,template,animate,slowly);
			return; 
		}
		// if same tiddler no change
		if (this.tiddlerHistory[this.historyCurrentPos] == title) {
			this.currentTiddler = title;
			story.old_history_displayTiddler(null,title,template,animate,slowly);
			return;
		}
		if (this.historyCurrentPos == this.tiddlerHistory.length -1) {
			// bottom of stack
			this.tiddlerHistory.push(title);
			if (this.tiddlerHistory.length > this.maxPos) {
				// history is full: drop the oldest entry, the position stays at the end
				this.tiddlerHistory.shift();
			} else {
				this.historyCurrentPos += 1;
			}

		} else {
			// middle of stack
		    this.historyCurrentPos += 1;
			if (this.tiddlerHistory[this.historyCurrentPos] != title) {
				// path change => cut history
				this.tiddlerHistory[this.historyCurrentPos] = title;
				var a = [];
				for(var i = 0; i <= this.historyCurrentPos;i++) {
					a[i] = this.tiddlerHistory[i];
				}
				this.tiddlerHistory = a;
			}
		}
		this.currentTiddler = title;
		story.old_history_displayTiddler(null,title,template,animate,true);
	        scrollTo(0, 1);
	}

	Story.prototype.old_history_closeTiddler = Story.prototype.closeTiddler;
	Story.prototype.closeTiddler = function(title,animate,slowly)
	{
		this.currentTiddler = null;
	    story.old_history_closeTiddler.apply(this,arguments);
	}

	config.macros.history = {};
	config.macros.history.action = function(event) {
	var popup = Popup.create(this);
		if(popup)
			{
	        if (!story.tiddlerHistory.length)
	            createTiddlyText(popup,"No history");
	        else
	           {
	           var c = story.tiddlerHistory.length;
			   for (var i=0; i<c;i++ )
	               {
					var elmt = createTiddlyElement(popup,"li");
				   	var btn = createTiddlyButton(elmt,story.tiddlerHistory[i],story.tiddlerHistory[i],config.macros.history.onClick);
					btn.setAttribute("historyPos",i);
			       }
	           }
	        }
		Popup.show(popup,false);
		event.cancelBubble = true;
		if (event.stopPropagation) event.stopPropagation();
		return false;
	}
	config.macros.history.handler = function(place,macroName,params)
	{
		createTiddlyButton(place, 'history', 'history', config.macros.history.action);
	}

	config.macros.history.onClick = function(ev)
	{
		var e = ev ? ev : window.event;
		var historyPos = this.getAttribute("historyPos");
		story.historyCurrentPos = historyPos -1;
		story.displayTiddler(null,story.tiddlerHistory[historyPos]);
		return false;
	};

	config.macros.back = {};
	config.macros.back.action = function() {
	       if (story.historyCurrentPos > 0) {
				if (story.currentTiddler) story.closeTiddler(story.currentTiddler);
				story.historyCurrentPos = story.historyCurrentPos -2;
				story.displayTiddler(null,story.tiddlerHistory[story.historyCurrentPos+1]);
			} else {
				//if (story.currentTiddler) story.old_history_displayTiddler(null,story.currentTiddler);
				};
		return false;
	}
	config.macros.back.handler = function(place,macroName,params)
	{
		createTiddlyButton(place, '<<', 'back', config.macros.back.action,"backButton");
	}

	config.macros.forward = {};
	config.macros.forward.action = function() {
	       if (story.historyCurrentPos < story.tiddlerHistory.length -1) {
				if (story.currentTiddler) story.closeTiddler(story.currentTiddler);
				//story.historyCurrentPos = story.historyCurrentPos;
				story.displayTiddler(null,story.tiddlerHistory[story.historyCurrentPos+1]);
			} else {
				//if (story.currentTiddler) story.old_history_displayTiddler(null,story.currentTiddler);
			}
		return false;
	}
	config.macros.forward.handler = function(place,macroName,params)
	{
		createTiddlyButton(place, '>>', 'forward', config.macros.forward.action, "ibutton");
	}
//}}}
!Independent Exercises:

!!!5. Generate a NULL distribution for BLAST hits against a local database:
''Code Work Assignment #5:'' Due 9 am Wednesday 10 DEC via email to //amarsh@udel.edu//
You will utilize a provided script (ProteinRandomizer) to generate a FASTA file with 100 random proteins based on the nucleotide composition of the local BLAST database you will be searching against. The goal is to identify how frequently "random" matches may arise when using BLAST. The project is described here: @@BLASTproject@@.

!!!4. Profile a 30 KB piece of genomic DNA for coding and non-coding domains:
''Code Work Assignment #4:'' Due 5 pm Friday 31 OCT via email to //amarsh@udel.edu//
Using a Log of Odds score (LOD; see [[Lecture 7|L07]]), you will calculate a P-value for coding and non-coding sequence and then plot the LOD score against nucleotide position. The project is described here: @@LODprofile@@.

!!!3. Benchmark CPU time for alignments with differing character lengths:
''Code Work Assignment #3:'' Due 5 pm Friday 10 OCT via email to //amarsh@udel.edu//

I am providing you with a fully functional script. You just need to run this script on BIOWOLF, collect the time data, generate a plot, and send the plot to me. The project is described here: @@WordCompareBenchmark@@.


!!!2. Amino Acid Frequency Plot:
''Code Work Assignment #2:'' Due 5 pm Monday 06 OCT via email to //amarsh@udel.edu//

We will continue developing the AA counting scripts you worked on in assignment #1. You've done the calculations, but now you need to start working with the numbers. The project will be developed here: @@AAFreqCode@@.



!!!1. Amino Acid Metrics:
''Code Work Assignment #1:'' Due 5 pm Friday 26 SEP via email to //amarsh@udel.edu//

To start any kind of quantitative comparisons, we need a well defined question or goal to direct and focus our attention and efforts. Here, we are going to look at secondary Amino Acid metrics of sequences. For this exercise, we will address the following:
| @@Within a single genome, are all proteins created equal?@@ |

''See @@[[Lecture 4|L04]]@@ introduction and go to the assignment page: @@AAcount@@''
''//ANSWER:// Here's a code block from E.Maung that fully comments each line of code:@@MaungCode01@@''

!
<!--{{{-->
<div class='toolbar' macro='toolbar closeTiddler closeOthers +editTiddler > fields syncing permalink references jump'></div>
<div class='INTRO' macro='tiddler INTROSubtopicMenu'></div>
<div class='title' macro='view title'></div>
<div class='viewer' macro='view text wikified'></div><div class='tagClear'></div>
<!--}}}-->
<<importTiddlers>>
{{{
<html><img src="00/xxxx.png" style="height:300px"></html>
}}}

{{{
<html><table><tr>
<td><img src="00/xxxx.png" style="height:300px"></td>
<td><img src="00/xxxx.png" style="height:300px"></td>
</tr></table></html>
}}}
/***
|Name|InlineJavascriptPlugin|
|Source|http://www.TiddlyTools.com/#InlineJavascriptPlugin|
|Documentation|http://www.TiddlyTools.com/#InlineJavascriptPluginInfo|
|Version|1.9.2|
|Author|Eric Shulman - ELS Design Studios|
|License|http://www.TiddlyTools.com/#LegalStatements <br>and [[Creative Commons Attribution-ShareAlike 2.5 License|http://creativecommons.org/licenses/by-sa/2.5/]]|
|~CoreVersion|2.1|
|Type|plugin|
|Requires||
|Overrides||
|Description|Insert Javascript executable code directly into your tiddler content.|
''Call directly into TW core utility routines, define new functions, calculate values, add dynamically-generated TiddlyWiki-formatted output'' into tiddler content, or perform any other programmatic actions each time the tiddler is rendered.
!!!!!Documentation
>see [[InlineJavascriptPluginInfo]]
!!!!!Revisions
<<<
2008.03.03 [1.9.2] corrected declaration of wikifyPlainText() for 'TW 2.1.x compatibility fallback' (fixes Safari "parse error")
2008.02.23 [1.9.1] in onclick function, use string instead of array for 'bufferedHTML' attribute on link element (fixes IE errors)
2008.02.21 [1.9.0] 'onclick' scripts now allow returned text (or document.write() calls) to be wikified into a span that immediately follows the onclick link.  Also, added default 'return false' handling if no return value provided (prevents HREF from being triggered -- return TRUE to allow HREF to be processed).  Thanks to Xavier Verges for suggestion and preliminary code.
|please see [[InlineJavascriptPluginInfo]] for additional revision details|
2005.11.08 [1.0.0] initial release
<<<
!!!!!Code
***/
//{{{
version.extensions.inlineJavascript= {major: 1, minor: 9, revision: 2, date: new Date(2008,3,3)};

config.formatters.push( {
	name: "inlineJavascript",
	match: "\\<script",
	lookahead: "\\<script(?: src=\\\"((?:.|\\n)*?)\\\")?(?: label=\\\"((?:.|\\n)*?)\\\")?(?: title=\\\"((?:.|\\n)*?)\\\")?(?: key=\\\"((?:.|\\n)*?)\\\")?( show)?\\>((?:.|\\n)*?)\\</script\\>",

	handler: function(w) {
		var lookaheadRegExp = new RegExp(this.lookahead,"mg");
		lookaheadRegExp.lastIndex = w.matchStart;
		var lookaheadMatch = lookaheadRegExp.exec(w.source)
		if(lookaheadMatch && lookaheadMatch.index == w.matchStart) {
			var src=lookaheadMatch[1];
			var label=lookaheadMatch[2];
			var tip=lookaheadMatch[3];
			var key=lookaheadMatch[4];
			var show=lookaheadMatch[5];
			var code=lookaheadMatch[6];
			if (src) { // load a script library
				// make script tag, set src, add to body to execute, then remove for cleanup
				var script = document.createElement("script"); script.src = src;
				document.body.appendChild(script); document.body.removeChild(script);
			}
			if (code) { // there is script code
				if (show) // show inline script code in tiddler output
					wikify("{{{\n"+lookaheadMatch[0]+"\n}}}\n",w.output);
				if (label) { // create a link to an 'onclick' script
					// add a link, define click handler, save code in link (pass 'place'), set link attributes
					var link=createTiddlyElement(w.output,"a",null,"tiddlyLinkExisting",wikifyPlainText(label));
					var fixup=code.replace(/document.write\s*\(/gi,'place.bufferedHTML+=(');
					link.code="function _out(place){"+fixup+"\n};_out(this);"
					link.tiddler=w.tiddler;
					link.onclick=function(){
						this.bufferedHTML="";
						try{ var r=eval(this.code);
							if(this.bufferedHTML.length || (typeof(r)==="string")&&r.length)
								var s=this.parentNode.insertBefore(document.createElement("span"),this.nextSibling);
							if(this.bufferedHTML.length)
								s.innerHTML=this.bufferedHTML;
							if((typeof(r)==="string")&&r.length) {
								wikify(r,s,null,this.tiddler);
								return false;
							} else return r!==undefined?r:false;
						} catch(e){alert(e.description||e.toString());return false;}
					};
					link.setAttribute("title",tip||"");
					var URIcode='javascript:void(eval(decodeURIComponent(%22(function(){try{';
					URIcode+=encodeURIComponent(encodeURIComponent(code.replace(/\n/g,' ')));
					URIcode+='}catch(e){alert(e.description||e.toString())}})()%22)))';
					link.setAttribute("href",URIcode);
					link.style.cursor="pointer";
					if (key) link.accessKey=key.substr(0,1); // single character only
				}
				else { // run inline script code
					var fixup=code.replace(/document.write\s*\(/gi,'place.innerHTML+=(');
					var code="function _out(place){"+fixup+"\n};_out(w.output);"
					try { var out=eval(code); } catch(e) { out=e.description?e.description:e.toString(); }
					if (out && out.length) wikify(out,w.output,w.highlightRegExp,w.tiddler);
				}
			}
			w.nextMatch = lookaheadMatch.index + lookaheadMatch[0].length;
		}
	}
} )
//}}}

// // Backward-compatibility for TW2.1.x and earlier
//{{{
if (typeof(wikifyPlainText)=="undefined") window.wikifyPlainText=function(text,limit,tiddler) {
	if(limit > 0) text = text.substr(0,limit);
	var wikifier = new Wikifier(text,formatter,null,tiddler);
	return wikifier.wikifyPlain();
}
//}}}
!Basic instructions
#Download the file to your hard drive by [[right-clicking and saving the link / target as...|webviewtw.html]] to the filename and location of your choice. Close this page and open your new file.
#Replace the title in the upper left by editing MainMenu.
#Add topics to MainMenu. Click on those topics to create the tiddlers for those topics. To your uninitiated web viewers they will appear to be separate webpages, but you and I know better!
#Edit DefaultTiddlers to include the names of the tiddlers that you want to appear when the ~TiddlyWiki is opened.
#If you want different colorpalettes than the ones provided, check [[here|http://www.giffmex.org/webviewtwexample.html#MoreColorPalettes!]] for more. Just import them from that file to this file.
#If you want to temporarily suspend the single-page-only feature, I recommend the toggle singlepage mode bookmarklet from ~TiddlyTools [[(link here)|http://www.tiddlytools.com/#InstantBookmarklets.]]
#Upload to your site using the UploadPlugin. [[Instructions here|http://www.giffmex.org/twfortherestofus.html#%5B%5BSimple%20instructions%20for%20BidiX's%20UploadPlugin%5D%5D]]
!Lecture 01
<html>
<div style="color: rgb(100, 100, 150); font-family: Monaco;">
<big><big>
<b>PERL</b><br>
</big>
<i>"Practical Extraction and Report Language"</i>
</html>
1. Basic Setup: [[Starting PERL|L01.01]]
2. What is a PERL script: [[bare skeleton|L01.02]]
3. File Read/Write: [[test fasta|L01.03]]

!

!Starting PERL
PERL is a scripting language, which means commands are directly interpreted by a program (PERL) that is running on a computer. This PERL "interpreter" has to be present in order for a script to be executed.  
 
1. To run PERL scripts on a computer, you need the PERL interpreter installed and active. You can check by entering the following into a command window:
{{{
prompt> perl -v
}}}
This should display the current version info of the PERL installed on your machine. If there is an error message, then PERL is not present. You can download it by going to: [[PERL]].

2. To run PERL scripts on a computer, you need to know the folder location of the interpreter. You can find this by entering the following into a command window:
{{{
prompt> which perl
}}}
The reply (if PERL is installed) will tell you the path and name of the perl interpreter (like "/usr/bin/perl"). You will need to remember this. In a Windows command window, the equivalent command is {{{where perl}}}.

3. To run PERL scripts on a computer, you need a good text editor. The script files are simple flat ASCII files that can be edited by any text editor. However, an editor with programming syntax highlighting is strongly recommended: see [[Editors]].

[[BACK|L01]]
!Start from scratch
# Open a command window.
# Navigate to the folder where you now want to work.
# Create a new text file in that folder.
# Open that blank text file in your editor.
# Copy and paste the ''Bare Bones Code Skeleton'' (BoneCode) into your new file.
# Save as "01-~FASTAread.pl"

[[BACK|L01]]
!
!What do you want to do?
@@. For now, 99% of the tasks you will eventually code will look something like this: .@@
<html><img src="01/flowchart.png" style="height:200px"></html>
There is a simple hierarchy of opening/reading a sequence data file or blast output file, parsing that file information to find what you are looking for, and then writing that information to another file for the next step in the analysis. Scripts should be small and TASK focused.

1. Download this sample FASTA file: 
| http://icewater.cms.udel.edu/IntroPerl/01/TestFasta.txt |

2. Copy it to your current working folder.

!Start scripting:
//The finished script we are working on is here: FASTAread//

1. Add header info:
<html><img src="01/code01.png" style="height:75px"></html>

2. Outline the MAIN program:
<html><img src="01/code02.png" style="height:225px"></html>

3. Define the obvious variables:
<html><img src="01/code03.png" style="height:150px"></html>

4. Work with the input code:
<html><img src="01/code04.png" style="height:150px"></html>

5. Work with the parsing code to get the text data into variable data:
<html><img src="01/code05.png" style="height:250px"></html>

6. Process the sequence data:
<html><img src="01/code06.png" style="height:75px"></html>

7. Write new file with the processed sequences:
<html><img src="01/code07.png" style="height:150px"></html>
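
The seven steps above are shown only as screenshots, so here is a minimal text sketch of the same read, parse, process, write flow. It is not the course's FASTAread script: the file names, the {{{parse_fasta}}} helper, and the choice of an uppercase conversion as the processing step are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical file names: substitute your own input and output files.
my $infile  = "input.fasta";
my $outfile = "output.fasta";

# PARSE: split FASTA text into [header, sequence] pairs.
sub parse_fasta {
    my ($text) = @_;
    my @records;
    foreach my $entry (split(/^>/m, $text)) {
        next unless $entry =~ /\S/;               # skip the empty chunk before the first ">"
        my ($header, @lines) = split(/\n/, $entry);
        my $seq = join('', @lines);
        $seq =~ s/\s//g;                          # drop stray spaces and "\r"
        push(@records, [$header, $seq]);
    }
    return @records;
}

# READ, PROCESS, WRITE: this part runs only if the input file is present.
if (-e $infile) {
    open(my $in, '<', $infile) or die "Cannot read $infile: $!";
    my $text = do { local $/; <$in> };            # slurp the whole file
    close($in);

    open(my $out, '>', $outfile) or die "Cannot write $outfile: $!";
    foreach my $record (parse_fasta($text)) {
        my ($header, $seq) = @$record;
        print $out ">$header\n", uc($seq), "\n";  # PROCESS: lowercase to uppercase
    }
    close($out);
}
```

Keeping the parsing in its own subroutine means the same helper can be reused when the processing step changes in later lectures.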

[[BACK to Lecture 1|L01]]
[[BACK to Lecture 2|L02]]
!

!Lecture 02
<html>
<div style="color: rgb(100, 100, 150); font-family: Monaco;">
<big><big>
<b>PERL</b><br>
</big>
<i>"Practical Extraction and Report Language"</i>
</html>
1. Basic Setup: [[Starting PERL|L02.01]]
2. What is a PERL script: [[bare skeleton|L01.02]]
3. Test Script Results: [[output interpretation|Proud Mary]]
4. What is a [[FASTA|L02.02]]?
5. File Read/Write: [[test fasta|L01.03]]

//After lecture, I have added this page for additional clarification: //
6. Annotated and dissected FASTA read/write script: [[Go There|L02.03]]
!
!Get PERL Running

# Quick Review of getting PERL running: [[Running PERL]]
# Everyone should be able to display the //print// text AND be able to navigate through the cmd window to the folder where you want to work.
# Test Script Output:
<html><img src="02/test.png" style="height:100px"></html>
And what about the [[Lucky Number 7?|Proud Mary]]
!!
[[BACK|L02]]
!
!What is a FASTA File

A FASTA file is a simple, standardized DNA/RNA/AA sequence file format.
<html><img src="02/fasta.png" style="height:300px"></html>

# Header line begins with ">" character
# Header line ends with a line break "\n"
# Sequence immediately follows the header line:
## sequence may have fixed line lengths so that there is a "\n" every 60 characters (as in the test file above)
## sequence may be one continuous string with no "\n" line breaks.
## sequence may have a space " " introduced every 10 characters (older format)
# PERL strength: @@Text processing@@
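
Because the sequence part of a FASTA entry can arrive in any of the three layouts listed above, it helps to reduce all of them to one continuous string before doing any work. The sketch below shows one way to do that; the {{{clean_sequence}}} name and the sample strings are made up for illustration.

```perl
use strict;
use warnings;

# Remove every whitespace character ("\n", "\r", spaces, tabs) from a sequence.
sub clean_sequence {
    my ($raw) = @_;
    $raw =~ s/\s+//g;
    return $raw;
}

# The same 12 nucleotides in the three layouts described above.
my $wrapped    = "ATGGCC\nTTAGGA\n";    # fixed line lengths
my $continuous = "ATGGCCTTAGGA";        # one unbroken string
my $spaced     = "ATGGCCTTAG GA\n";     # a space every 10 characters

foreach my $layout ($wrapped, $continuous, $spaced) {
    print clean_sequence($layout), "\n";   # each line prints ATGGCCTTAGGA
}
```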
 

[[BACK|L02]]
!
!Script Vivisection
Note: the following script dissection is performed without anaesthesia and is not for those of weak heart. //Abandon hope all ye who enter . . . //
!!!
The fully annotated script is on the page [[FASTAread-NOTED]]. Copy the text in the box and paste it into your editor. In this annotated code, I have added //Print Code Blocks// that look like this:
{{{
	# . . . . . . . . . . . . . . . . . . . . . . .
	# PRINT CHECK A: look at @FILE elements before proceeding
	< perl code perl code perl code perl code perl code perl code>
	# ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` `
}}}
In the file that I am passing to you, all the print code blocks are commented out. All you have to do to activate one of the blocks is uncomment the actual code lines (leave the upper and lower comment lines still commented out).

1. [[GO TO|PrintBlockA]] Print Block A to look at the initial @FILE entries.
2. [[GO TO|PrintBlockB]] Print Block B to look at separate lines within each entry of @FILE.
3. [[GO TO|PrintBlockC]] Print Block C to look at the sequence length check at the end.
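
For orientation, here is a hedged sketch of what an activated print check block could look like. It is not copied from the annotated script: the sample contents of @FILE (one FASTA entry per element, header line first) and the {{{check_entries}}} helper are assumptions.

```perl
use strict;
use warnings;

# Assumed shape of @FILE: one FASTA entry per element, header line first.
my @FILE = ("seq1 test entry\nACGTACGT\nACGT\n", "seq2\nGGCC\n");

# Build one summary line per entry: its header and how many lines it holds.
sub check_entries {
    my (@entries) = @_;
    my @report;
    foreach my $n (0..$#entries) {
        my @lines = split(/\n/, $entries[$n]);
        push(@report, "entry $n: header='$lines[0]' lines=" . scalar(@lines));
    }
    return @report;
}

# . . . . . . . . . . . . . . . . . . . . . . .
# PRINT CHECK A: look at @FILE elements before proceeding
print "$_\n" foreach check_entries(@FILE);
# ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` `
```

Once the output looks right, comment the print line out again so the script runs quietly.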

[[BACK|L02]]
!
!Character String Processing:

In Chapter 1 of [[Dwyer's book|Text Book]], he presents a simple example of starting with a DNA sequence, turning it into an RNA sequence (transcription), and then turning that into a protein sequence (translation). In this lecture we will discuss the coding behind translating a DNA sequence into a protein sequence. 

# To prepare for lecture, try to run the [[TRANSLATE|L03.01]] exercise before coming to class.
#  Let's look at [[subroutines]]
#  Let's look at the [[DATA]] section
#  Let's look at 1D-arrays and 2D-table arrays or [[hashes]]
#  Let's look at the actual [[Translate]] process
#  Final Version: all hash - all subroutine [[FASTAtranslate2]]
!
!NT to PROTEIN translation:
[[BACK to Lecture 03|L03]]
''1.'' Download and save a copy of this FASTA test file: [[TestFasta-Amarina.ffn|03/TestFasta-Amarina.ffn]]
''2.'' Paste the code below into your FASTA reader script and replace the ~Lowercase-to-Uppercase code that was there. Then save this file as "02-~FASTAtranslate.pl"
{{{
# TASK 2: Translate Sequence to PROTEIN sequence . . . . . . . . . 
	# - - - - - - - - - - - - - - - - - - - - - - - -
	# A.  Load the AA codon table from end of program.
	#     "DATA" is the default I/O handle for information put there.
	my %CodonTable;
	chomp(my @data = <DATA>);          # read ALL lines of the DATA section, strip "\n"
	foreach my $line (@data)          
	{  	next if $line =~ /^#/;         # skip the "# - - EOF" comment line in the data
		my @codons = split(/ /,$line); # separate on "space" character
		my $AA = shift(@codons);       # $AA= amino acid, then remove from @codons
		foreach my $nnn (@codons) 
		{	$CodonTable{$nnn} = $AA; print ">>> $nnn = $CodonTable{$nnn}\n";}
	}
	
	
	# - - - - - - - - - - - - - - - - - - - - - - - -
	# B. Convert the NT sequence into AAs . . . . . .
	my @Proteins;
	foreach my $seq (@Seqs)
	{	my $protein = "";            # set to "empty" at the start of each loop
		for (my $i=0; $i <= length($seq)-3; $i += 3)  # step by codon; stop before a partial codon
		{	my $codon = substr($seq,$i,3);             # $codon = 3 nts at a time
			my $aa = $CodonTable{$codon};       # here's the translation step
			$protein .= $aa;
		}
		push(@Proteins, $protein);
	}
}}}
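
The codon-stepping loop in part B can be tried on its own before the lookup table is wired in. The sketch below is an illustration only: the {{{split_codons}}} helper and the sample sequence are made up, and the T-to-U conversion assumes that the input is DNA while the codon table is written in RNA letters.

```perl
use strict;
use warnings;

# Cut a sequence into whole codons; leftover bases at the end are ignored.
sub split_codons {
    my ($seq) = @_;
    my @codons;
    for (my $i = 0; $i + 3 <= length($seq); $i += 3) {
        push(@codons, substr($seq, $i, 3));
    }
    return @codons;
}

my $dna = "ATGGCCTGATA";                 # 11 bases: 3 codons plus 2 leftover
(my $rna = uc($dna)) =~ tr/T/U/;         # DNA alphabet to RNA alphabet
print join(" ", split_codons($rna)), "\n";   # prints "AUG GCC UGA"
```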
!!!

''3.'' Now edit the variables in Task 3 so that you are printing the @Proteins data instead of the @Seqs data:
<html><img src="03/Task3.png" style="height:150px"></html>
!!!

''4. '' We are going to add the codon table data to the very end of the script in a special section that can be referenced by the Input/Output handle name "DATA". Here's the actual text that you should paste at the end of your script, followed by a screen pic of how it should look when you are done.
{{{
# - - - - - EOF - - - - - - - - - - - - - - - - - - -
# The lines below are not perl statements and are not executed as part of the 
# program.  Instead, they are available to be read as data input by the program
# using the I/O handle name "DATA". This is a default handle name for any data 
# you want to include in a script file.
__END__
A GCU GCC GCA GCG
R CGU CGC CGA CGG AGA AGG
N AAU AAC
D GAU GAC 
C UGU UGC
Q CAA CAG
E GAA GAG
G GGU GGC GGA GGG
H CAU CAC
I AUU AUC AUA
L UUA UUG CUU CUC CUA CUG
K AAA AAG
M AUG
F UUU UUC
P CCU CCC CCA CCG
S UCU UCC UCA UCG AGU AGC
T ACU ACC ACA ACG
W UGG
Y UAU UAC
V GUU GUC GUA GUG
* UAA UAG UGA
# - - - - - EOF - - - - - - - - - - - - - - - - - - -
}}}
<html><img src="03/DATA.png" style="height:400px"></html>
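
To see the table loading and the lookup working together, here is a self-contained sketch. It keeps a three-line excerpt of the table in a string and reads it through an in-memory filehandle, so that it runs without a DATA section; in the real script you would read from the DATA handle instead. The helper names and the "X" placeholder for unknown codons are assumptions.

```perl
use strict;
use warnings;

# Excerpt of the codon table, held in a string so the sketch is self-contained.
my $table_text = <<'END_TABLE';
A GCU GCC GCA GCG
M AUG
* UAA UAG UGA
END_TABLE

# Build a codon => amino acid hash, reading the table one line at a time.
sub load_codon_table {
    my ($text) = @_;
    my %table;
    open(my $fh, '<', \$text) or die "Cannot open in-memory table: $!";
    while (my $line = <$fh>) {
        chomp($line);
        next if $line =~ /^\s*(#|$)/;             # skip comment and blank lines
        my ($aa, @codons) = split(' ', $line);    # first field is the amino acid
        $table{$_} = $aa foreach @codons;
    }
    close($fh);
    return \%table;
}

# Translate an RNA string; unknown codons become "X" (an arbitrary choice).
sub translate {
    my ($rna, $table) = @_;
    my $protein = "";
    for (my $i = 0; $i + 3 <= length($rna); $i += 3) {
        my $codon = substr($rna, $i, 3);
        $protein .= exists($table->{$codon}) ? $table->{$codon} : "X";
    }
    return $protein;
}

my $table = load_codon_table($table_text);
print translate("AUGGCCUAA", $table), "\n";       # prints "MA*"
```

Reading the table line by line with a {{{while}}} loop avoids any question about how many lines a single read returns.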
!!!

A clean copy of this code file is also here: FASTAtranslate
!
!Using Comparative Metrics
It's challenging to directly compare two genomes or even just two gene sequences because of the phylogenetic and likely functional separation between each entity. In order to ensure that you are not comparing "apples" to "oranges", you have to ask very specific questions and pay a lot of attention to the method or approach in which you will pursue that question.

We are going to start by looking at some simple amino acid metrics. We know that amino acid composition is important for individual genomes. The figure below plots amino acid frequencies for 367 microbial genomes as a function of the %GC content of each. This lecture focuses on an exercise to count and describe the distribution of amino acids within a genome and then determine a meaningful way to compare those numbers: @@[[AAcount]]@@.  
<html><img src="04/R-aafreq-vs-gc.png" style="height:200px"></html>

!!!Lecture Outline:
# Work on the Amino Acid counting exercise [[AAcount]] before coming to lecture.
# We'll cover some code . . . . 
# We'll cover some metrics  . . . .
# We'll discuss the comparison . . . .

24SEP: Here's the working script to count amino acids: @@AAcount3@@. 
Incorporates code blocks and ideas from AAcount1 and AAcount2.

24SEP (After lecture): Here's the script we built in class: @@AAcount4@@.
Note the addition of the [[ROUND|Round]] subroutine and how I moved the print statements into a separate subroutine called ''&~ScreenDump'' at the end.

28SEP: Here's a fully commented version of the AAcount exercise: @@MaungCode01@@

!
!01OCT

!!Code Work Assignment:
# Due 5 pm Monday 06 OCT
# AA freq plot from Arabidopsis ~TAIR8 fasta
# @@AAFreqCode@@

!!Comparing Sequences
# Bulk metrics, %gc, aa freqs . . . .
# [[Alignments|L05.1]] 
# [[Similarity Scoring|L05.2]]
# [[Reduce computational complexity|L05.3]]

Working Alignment Code for Lecture: @@WordCompare01@@ 

!
[[BACK|L05]]|[[NEXT|L05.1a]]
!!!
!ALIGNMENTS
If two sequences are not identical, then to compare them, we generate an alignment of those sequences. The alignment process is logically clear, but computationally complex.

If we take these two sequences: __''{{{ C A T D O G }}}''__  and  __''{{{ B A T H O G }}}''__, we could quickly ascertain that two "good" alignments would be:
{{{
   Alignment #1:
          C A T D O G
          B A T H O G

   Alignment #2:
          C - A T D - O G
          - B A T - H O G
}}}
 
''What makes a good alignment?''
# Sequence homology: evolutionary relationship allows re-positioning.
# The original order of characters in both sequences must be preserved (linearity of sequence).
# Gaps "-" can be introduced to allow for homologous amino acid positions to be aligned over each other (mutations in sequence).
# The alignment of identical characters is maximized.
# The introduction of gaps into either sequence is minimized.
[[BACK|L05.1]]|[[NEXT|L05.1b]]
!!!
!!ALIGNMENTS: setting up the strings
We first start with the two strings that we want to align:
{{{
my $seq1 = "CATDOG";
my $seq2 = "BATHOG";
}}}
Both sequences contain 6 letters. The minimum-similarity alignment we could possibly have is:
{{{
     C A T D O G - - - - - - 
     - - - - - - B A T H O G
}}}
So we start by developing a generalized series of sequence-gap patterns, starting with the string: | ''{{{xxxxxx------}}}'' |, where the "x" signifies an amino acid (or nucleotide) and "-" signifies a gap. We will call each sequence pattern a CHAIN of x's and gaps. Here's the code to set up the first CHAIN:
{{{
my $N = length($seq1) + length($seq2);   # total chain length (12 here)
my @chain;
foreach (1..length($seq1))   {	push(@chain,"x"); }
foreach (length($seq1)+1..$N){	push(@chain,"-"); }
print join('',@chain);
}}}
!
[[BACK|L05.1a]]|[[NEXT|L05.1c]]
!!!
!!ALIGNMENTS: setting up chains
Now the task is to generate the possible arrangements of 6 characters and 6 gaps while keeping the 6 characters in the same order. We will store each chain pattern in an array called @Gaps.
{{{
my @Gaps;
foreach my $i (1..length($seq1))
{	for(my $j = length($seq1)-1; $j>=0; $j -= 1)
	{	foreach my $k ($j..$N-2)
		{   @chain[$k,$i+$k] = @chain[$i+$k, $k];
			my $match = 0;
			my $seq = join('',@chain);
		# Check to see if seq pattern has already been found:
			foreach my $gap (@Gaps)
			{	if ($gap eq $seq){ $match = 1; last;} }
		# Store unique chain patterns in @Gaps:
			if ($match == 0)
			{	push(@Gaps,$seq); }  
		}
	}
}
my $n = $#Gaps +1;
print "There are $n permutations in the gap set\n";
}}}

The output: ''{{{There are 176 permutations in the gap set}}}''
The elements in @Gaps look like:
{{{
xxxxx-x-----
xxxxx--x----
xxxxx---x---
xxxxx----x--
xxxxx-----x-
xxxxx------x
xxxx-x-----x
xxxx--x----x
xxxx---x---x
xxxx----x--x
xxxx-----x-x
xxxx------xx
xxx-x-----xx
xxx--x----xx
xxx---x---xx
xxx----x--xx
xxx-----x-xx
xxx------xxx
xx-x-----xxx
xx--x----xxx
xx---x---xxx
xx----x--xxx
xx-----x-xxx
xx------xxxx
x-x-----xxxx
x--x----xxxx
x---x---xxxx
x----x--xxxx
x-----x-xxxx
x------xxxxx
-x-----xxxxx
--x----xxxxx
---x---xxxxx
----x--xxxxx
-----x-xxxxx
------xxxxxx
-----xx-xxxx
-----xxxx-xx
-----xxxxxx-
-----xxxxx-x
----xx-xxx-x
----xxxx-x-x
----xxxxx--x
----xxxx--xx
----xxxx-xx-
---xx-xx-xx-
---xxxx--xx-
---xxx-x-xx-
---xxx-xxx--
---xxx-xx--x
--xx-x-xx--x
--xxx--xx--x
--xx--xxx--x
--xx-xx-x--x
--xx-xx---xx
--xx-xx--xx-
-xx--xx--xx-
-x-x-xx--xx-
-x-xxx---xx-
-x-xx--x-xx-
-x-xx--xxx--
-x-xx--xx--x
x--xx--xx--x
-xx-x--xx--x
-xx---xxx--x
-xx--xx-x--x
-xx--xx---xx
-xx---x-xxx-
-xx---xxxx--
-xx---xx-x-x
-xx---xx--xx
-xxx---x--xx
-xxxx-----xx
-xxxx--x---x
-xxxx--xx---
-x-xxx-xx---
-x--xxxxx---
-x--xx-xxx--
-x--xx--xxx-
-x--xx---xxx
---xxx---xxx
--xx-x---xxx
--xxx----xxx
--xx--x--xxx
--xx---x-xxx
--xx---xxx-x
--xx---xxxx-
x-x----xxxx-
x----x-xxxx-
x---xx--xxx-
x---xxx-x-x-
x---xxxxx---
x---xxxx---x
x---xxxx--x-
-x--xxxx--x-
---xxxxx--x-
--xx-xxx--x-
--xxx-xx--x-
--xxxxx---x-
--xxxx-x--x-
--xxxx--x-x-
--xxxx--xx--
--xxxx--x--x
--xxxx---x-x
--xxx--xx-x-
--xxx-x--xx-
--x-xxx--xx-
--x-x-xx-xx-
--x-x--xxxx-
--x-x--xx-xx
----xx-xx-xx
----xxx-x-xx
----xx--xxxx
----xx-x-xxx
-x--x--xx-xx
-x-xx---x-xx
-x-x---xx-xx
-x-xx--x--xx
-x-xx-x---xx
-x-xxx----xx
-x-xx----xxx
-x-xx-x--x-x
-x-xx-xx-x--
x-x-x-xx-x--
xx-x--xx-x--
x-xxx-x--x--
x-xx--x-xx--
x-xx-xx-x---
x-xx-x--x-x-
x--x-x-xx-x-
x--x-x-x-x-x
x-x-x--x-x-x
xx-xx--x-x--
xx-xx-x-x---
xx-xx-x--x--
x-x-xxx--x--
x-x--xx-xx--
x-x--x-x-xx-
x-x--x--xxx-
x-x--x--xx-x
x---x-x-xx-x
x--x-x--xx-x
x--xx--x-x-x
x--xx-xx-x--
xx-x--x--xx-
xx-x-x--xx--
xx-x----xx-x
xx-x---xx-x-
xxx---xx-x--
xxxx--x--x--
xxxx---x-x--
xxxx-x-x----
xxxx-x--x---
x--xx-xxx---
x--x--xxxx--
x--x-xxxx---
x--x--xxx-x-
x--x--xx-xx-
x--x---xx-xx
x---x--xx-xx
x--xx--x--xx
x--xx---x-xx
x--xx-x--x-x
---xxxxxx---
-xx-xxx-x---
-xxxxx--x---
-xxx-x--xx--
-xxx---xxx--
-xxx--xxx---
--xxx--xxx--
-xx-x--xxx--
--x-xx-xxx--
---x-xxxxx--
---xx-xxxx--
---xxx-xx-x-
---xxx--x-xx
}}}



!
[[BACK|L05.1b]]|[[BACK to Lecture 5 outline|L05]]
!!!
!!ALIGNMENTS: setting up gap sequence matrices
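The code above produced 176 unique gap-sequence patterns for 6 characters and 6 gaps. (Combinatorially there are C(12,6) = 924 ways to place 6 characters in 12 positions, so these 176 patterns are a subset of the full set.) In aligning two 6-character strings, pairing every pattern for one sequence with every pattern for the other equates to 176 x 176 = 30,976 total comparisons.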
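
As a quick check on these numbers, the sketch below computes the comparison count for the 176 patterns and, for reference, the number of ways to place 6 characters into 12 positions while keeping their order. The {{{choose}}} helper is written here for illustration and is not part of the lecture code.

```perl
use strict;
use warnings;

# Binomial coefficient "n choose k", built up one factor at a time.
sub choose {
    my ($n, $k) = @_;
    my $result = 1;
    foreach my $i (1..$k) {
        $result = $result * ($n - $k + $i) / $i;   # stays a whole number at every step
    }
    return $result;
}

my $found = 176;               # patterns produced by the chain-swapping code
my $all   = choose(12, 6);     # every placement of 6 characters in 12 slots

print "Comparisons with the $found found patterns: ", $found * $found, "\n";   # 30976
print "All order-preserving placements: $all\n";                               # 924
print "Comparisons if all were used: ", $all * $all, "\n";                     # 853776
```

The jump from 30,976 to 853,776 comparisons for two 6-letter strings shows why reducing computational complexity gets its own topic later in this lecture.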

Alignment is the wrong word to really describe what we are doing with the two sequences. It would be better to call it something like "filtering" or "sorting", because the process is passive in that there is no "aligning" or shifting of sequence characters. We start with all possible potential string results, and then just compare them all to figure out which pair combination is best. Then that one is output as the "ALIGNMENT".

We now have to translate the chain patterns in @Gaps into amino acid patterns specific for each of the target sequences we want to compare:
{{{
# Set Seq array 1 - - - - - - - - - - - 
my @Seq1;                 # gap-sequence strings for $seq1
foreach my $gap (@Gaps)
{	my @x = split(//, $gap);
	my @seq = split(//,$seq1);
	my $gapseq = "";
	foreach my $x (@x)
	{	if ($x =~ m/-/)
		{	$gapseq .= "-"; }
		else
		{	$gapseq .= shift(@seq); }
	}
	push (@Seq1, $gapseq);
	print "$gapseq\n";
}
}}}
So the elements of @Seq1 now have the same chain patterns as in @Gaps, except that each "x" in the chain has been replaced by the next amino acid from $seq1:
{{{
CATDO-G-----
CATDO--G----
CATDO---G---
CATDO----G--
CATDO-----G-
CATDO------G
CATD-O-----G
CATD--O----G
CATD---O---G
CATD----O--G
CATD-----O-G
CATD------OG
CAT-D-----OG
CAT--D----OG
CAT---D---OG
CAT----D--OG
CAT-----D-OG
CAT------DOG
CA-T-----DOG
CA--T----DOG
CA---T---DOG
CA----T--DOG
CA-----T-DOG
CA------TDOG
C-A-----TDOG
C--A----TDOG
C---A---TDOG
C----A--TDOG
C-----A-TDOG
C------ATDOG
-C-----ATDOG
--C----ATDOG
---C---ATDOG
----C--ATDOG
-----C-ATDOG
------CATDOG
-----CA-TDOG
-----CATD-OG
-----CATDOG-
-----CATDO-G
----CA-TDO-G
----CATD-O-G
----CATDO--G
----CATD--OG
----CATD-OG-
---CA-TD-OG-
---CATD--OG-
---CAT-D-OG-
---CAT-DOG--
---CAT-DO--G
--CA-T-DO--G
--CAT--DO--G
--CA--TDO--G
--CA-TD-O--G
--CA-TD---OG
--CA-TD--OG-
-CA--TD--OG-
-C-A-TD--OG-
-C-ATD---OG-
-C-AT--D-OG-
-C-AT--DOG--
-C-AT--DO--G
C--AT--DO--G
-CA-T--DO--G
-CA---TDO--G
-CA--TD-O--G
-CA--TD---OG
-CA---T-DOG-
-CA---TDOG--
-CA---TD-O-G
-CA---TD--OG
-CAT---D--OG
-CATD-----OG
-CATD--O---G
-CATD--OG---
-C-ATD-OG---
-C--ATDOG---
-C--AT-DOG--
-C--AT--DOG-
-C--AT---DOG
---CAT---DOG
--CA-T---DOG
--CAT----DOG
--CA--T--DOG
--CA---T-DOG
--CA---TDO-G
--CA---TDOG-
C-A----TDOG-
C----A-TDOG-
C---AT--DOG-
C---ATD-O-G-
C---ATDOG---
C---ATDO---G
C---ATDO--G-
-C--ATDO--G-
---CATDO--G-
--CA-TDO--G-
--CAT-DO--G-
--CATDO---G-
--CATD-O--G-
--CATD--O-G-
--CATD--OG--
--CATD--O--G
--CATD---O-G
--CAT--DO-G-
--CAT-D--OG-
--C-ATD--OG-
--C-A-TD-OG-
--C-A--TDOG-
--C-A--TD-OG
----CA-TD-OG
----CAT-D-OG
----CA--TDOG
----CA-T-DOG
-C--A--TD-OG
-C-AT---D-OG
-C-A---TD-OG
-C-AT--D--OG
-C-AT-D---OG
-C-ATD----OG
-C-AT----DOG
-C-AT-D--O-G
-C-AT-DO-G--
C-A-T-DO-G--
CA-T--DO-G--
C-ATD-O--G--
C-AT--D-OG--
C-AT-DO-G---
C-AT-D--O-G-
C--A-T-DO-G-
C--A-T-D-O-G
C-A-T--D-O-G
CA-TD--O-G--
CA-TD-O-G---
CA-TD-O--G--
C-A-TDO--G--
C-A--TD-OG--
C-A--T-D-OG-
C-A--T--DOG-
C-A--T--DO-G
C---A-T-DO-G
C--A-T--DO-G
C--AT--D-O-G
C--AT-DO-G--
CA-T--D--OG-
CA-T-D--OG--
CA-T----DO-G
CA-T---DO-G-
CAT---DO-G--
CATD--O--G--
CATD---O-G--
CATD-O-G----
CATD-O--G---
C--AT-DOG---
C--A--TDOG--
C--A-TDOG---
C--A--TDO-G-
C--A--TD-OG-
C--A---TD-OG
C---A--TD-OG
C--AT--D--OG
C--AT---D-OG
C--AT-D--O-G
---CATDOG---
-CA-TDO-G---
-CATDO--G---
-CAT-D--OG--
-CAT---DOG--
-CAT--DOG---
--CAT--DOG--
-CA-T--DOG--
--C-AT-DOG--
---C-ATDOG--
---CA-TDOG--
---CAT-DO-G-
---CAT--D-OG
}}}
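
The same threading step has to be repeated for $seq2 to fill @Seq2. One way to avoid copying the loop is to wrap it in a subroutine, as in the sketch below. The {{{thread_sequence}}} name is made up, and only two of the 176 patterns are used so that the example stays short.

```perl
use strict;
use warnings;

# Replace each "x" in a chain pattern with the next letter of the sequence.
sub thread_sequence {
    my ($pattern, $sequence) = @_;
    my @letters = split(//, $sequence);
    my $out = "";
    foreach my $slot (split(//, $pattern)) {
        $out .= ($slot eq "-") ? "-" : shift(@letters);
    }
    return $out;
}

my @Gaps = ("xxxxx-x-----", "-x--xx---xxx");   # two sample patterns from the list
my @Seq2 = map { thread_sequence($_, "BATHOG") } @Gaps;
print "$_\n" foreach @Seq2;                    # BATHO-G----- and -B--AT---HOG
```

Calling the same subroutine with "CATDOG" reproduces the corresponding @Seq1 elements.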
!
[[BACK|L05]]|[[NEXT|L05.2a]]
!!!
!!ALIGNMENTS: run the score calcs
So we now have both sequences loaded into a "permutation" array of all their possible letter/gap configurations. We then want to compare each of those gap-sequence elements for $seq1 to each and every gap-sequence for $seq2.

Each individual comparison is scored by a metric of our choice to provide a quantitative index of how good or how bad that particular comparison turns out to be. We will start by using a simple scoring algorithm:
# Scoring Sequence Comparisons: Metric version #0.00000001
**  Execute across all positions in the gap-sequence strings 
**  Add +4 points when non-gap characters are identical
**  Add -2 points when non-gap characters are not identical
**  Add -1 points when gap characters are aligned
The algorithm is easy to implement, with @c1 = char array for $seq1 and @c2 = char array for $seq2:
<html><img src="05/algoscore.png" style="height:250px"></html>

Full code block for scoring. More string-control code lines than char-scoring code lines:
{{{
# Sequence scoring - - - - - - - - - - - 
my $max = 0;
my ($t1,$t2); 
foreach my $s1 (@Seq1)
{	my @c1 = split(//,$s1);
	foreach my $s2 (@Seq2)
	{	my @c2 = split(//,$s2);
		my $score = 0;
		foreach my $i (0..$#c1)
		{	# Amino Acid matching . . . . 
			if ($c1[$i] ne "-" && $c2[$i] ne "-" ) 
			{	if ($c1[$i] eq $c2[$i])
				{	$score += 4; }
				else
				{	$score -= 2; }
			}
			# Gap penalty . . . . . 
			elsif ($c1[$i] eq "-" && $c2[$i] eq "-" )
			{	$score -= 1; }
		}
		
		if ($score >= $max)
		{	$max = $score;
			$t1 = $s1;
			$t2 = $s2;
		}
		
		if ($score > 8)
		{	print "-------------\n";
			print "score = $score\n";
			print "$s1\n";
			print "$s2\n";	
		}
	}
}

print "\n\n          MAX ALIGNMENT:\n";
print "          score = $max\n";
print "              $t1\n";
print "              $t2\n";
}}}
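The same scoring rules can be wrapped in a subroutine and checked by hand against a single pair of gap-sequences. This is only a sketch: the name {{{score_pair}}} is made up for illustration and is not part of the class script.
{{{
#!/usr/bin/perl
use strict;
use warnings;

# Score one pair of equal-length gap-sequence strings.
# +4 identical letters, -2 different letters, -1 gap over gap,
# 0 when a letter sits over a gap (same rules as the loop above).
sub score_pair
{	my ($s1,$s2) = @_;
	die "strings differ in length\n" if length($s1) != length($s2);
	my @c1 = split(//,$s1);
	my @c2 = split(//,$s2);
	my $score = 0;
	foreach my $i (0..$#c1)
	{	if ($c1[$i] ne "-" && $c2[$i] ne "-")
		{	$score += ($c1[$i] eq $c2[$i]) ? 4 : -2; }
		elsif ($c1[$i] eq "-" && $c2[$i] eq "-")
		{	$score -= 1; }
	}
	return $score;
}

print score_pair("---CAT--D-OG","-B--AT---HOG"), "\n";
}}}
For the pair shown, four identical letters (+16) and four gap-over-gap columns (-4) give a score of 12; a letter over a gap adds nothing.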


!
[[BACK|L05.2]]|[[Back to Lecture 5 outline|L05]]
!!!
!!ALIGNMENTS: output results
The code dumps the gap-sequence comparison with the highest score value.
''OUTPUT:''
{{{
          MAX ALIGNMENT:
          score = 12
              ---CAT--D-OG
              -B--AT---HOG
}}}
But it is a little more complex than just this one alignment, because there are many comparisons that generated a score of 12 (78 in total):
{{{
-------------
score = 12
---CATD--OG-
-B--AT--HOG-
-------------
score = 12
---CATD--OG-
B---AT--HOG-
-------------
score = 12
---CAT-D-OG-
-B--AT--HOG-
-------------
score = 12
---CAT-D-OG-
B---AT--HOG-
-------------
score = 12
---CAT-D-OG-
--B-ATH--OG-
-------------
score = 12
-CA--TD--OG-
B-A--T-H-OG-
-------------
score = 12
-CA--TD--OG-
B-A--T--HOG-
-------------
score = 12
-C-ATD---OG-
--BAT-H--OG-
-------------
score = 12
-C-AT--D-OG-
--BAT-H--OG-
-------------
score = 12
-C-AT--DOG--
--BATH--OG--
-------------
score = 12
-C-AT--DO--G
--BATH--O--G
-------------
score = 12
C--AT--DO--G
--BATH--O--G
-------------
score = 12
-C-ATD-OG---
B--AT-HOG---
-------------
score = 12
-C--AT--DOG-
---BATH--OG-
-------------
score = 12
-C--AT--DOG-
---BAT-H-OG-
-------------
score = 12
-C--AT--DOG-
--B-ATH--OG-
-------------
score = 12
-C--AT---DOG
---BAT--H-OG
-------------
score = 12
--CAT----DOG
-B-AT---H-OG
-------------
score = 12
--CAT----DOG
-B-AT--H--OG
-------------
score = 12
--CAT----DOG
-B-AT-H---OG
-------------
score = 12
--CAT----DOG
-B-ATH----OG
-------------
score = 12
--CAT----DOG
B--AT--H--OG
-------------
score = 12
--CAT----DOG
B--AT---H-OG
-------------
score = 12
--CA---T-DOG
-B-A---TH-OG
-------------
score = 12
--CA---T-DOG
B--A---TH-OG
-------------
score = 12
C---AT--DOG-
---BATH--OG-
-------------
score = 12
C---AT--DOG-
---BAT-H-OG-
-------------
score = 12
C---AT--DOG-
--B-ATH--OG-
-------------
score = 12
C---ATD-O-G-
---BAT-HO-G-
-------------
score = 12
--CATD--OG--
-B-AT--HOG--
-------------
score = 12
--CATD--O--G
-B-AT--HO--G
-------------
score = 12
--CATD--O--G
B--AT--HO--G
-------------
score = 12
--CATD---O-G
-B-AT-H--O-G
-------------
score = 12
--CATD---O-G
B--AT--H-O-G
-------------
score = 12
--CATD---O-G
B--AT-H--O-G
-------------
score = 12
--CAT-D--OG-
-B-ATH---OG-
-------------
score = 12
--CAT-D--OG-
-B-AT--H-OG-
-------------
score = 12
--C-ATD--OG-
---BAT-H-OG-
-------------
score = 12
--C-ATD--OG-
-B--AT--HOG-
-------------
score = 12
--C-ATD--OG-
B---AT--HOG-
-------------
score = 12
-C-AT---D-OG
--BAT----HOG
-------------
score = 12
-C-AT---D-OG
B--AT--H--OG
-------------
score = 12
-C-A---TD-OG
--BA---T-HOG
-------------
score = 12
-C-AT--D--OG
--BAT----HOG
-------------
score = 12
-C-AT--D--OG
B--AT---H-OG
-------------
score = 12
-C-AT-D---OG
--BAT----HOG
-------------
score = 12
-C-AT-D---OG
B--AT--H--OG
-------------
score = 12
-C-AT-D---OG
B--AT---H-OG
-------------
score = 12
-C-ATD----OG
--BAT----HOG
-------------
score = 12
-C-ATD----OG
B--AT--H--OG
-------------
score = 12
-C-ATD----OG
B--AT---H-OG
-------------
score = 12
-C-AT----DOG
B--AT--H--OG
-------------
score = 12
-C-AT----DOG
B--AT---H-OG
-------------
score = 12
-C-AT-D--O-G
--BATH---O-G
-------------
score = 12
-C-AT-D--O-G
B--AT--H-O-G
-------------
score = 12
C-AT--D-OG--
-BAT-H--OG--
-------------
score = 12
C-AT--D-OG--
-BAT---HOG--
-------------
score = 12
C-A--T-D-OG-
-BA--TH--OG-
-------------
score = 12
C-A--T--DOG-
-BA--TH--OG-
-------------
score = 12
C--AT--D-O-G
--BATH---O-G
-------------
score = 12
C--AT--D-O-G
-B-AT-H--O-G
-------------
score = 12
C--AT-DOG---
-B-ATH-OG---
-------------
score = 12
C--A---TD-OG
--BA---T-HOG
-------------
score = 12
C--AT--D--OG
--BAT----HOG
-------------
score = 12
C--AT--D--OG
-B-AT---H-OG
-------------
score = 12
C--AT--D--OG
-B-AT-H---OG
-------------
score = 12
C--AT--D--OG
-B-ATH----OG
-------------
score = 12
C--AT--D--OG
-B-AT----HOG
-------------
score = 12
C--AT---D-OG
--BAT----HOG
-------------
score = 12
C--AT---D-OG
-B-AT--H--OG
-------------
score = 12
C--AT---D-OG
-B-AT-H---OG
-------------
score = 12
C--AT---D-OG
-B-ATH----OG
-------------
score = 12
C--AT---D-OG
-B-AT----HOG
-------------
score = 12
C--AT-D--O-G
--BATH---O-G
-------------
score = 12
-CAT-D--OG--
B-AT--H-OG--
-------------
score = 12
-CAT---DOG--
B-AT--H-OG--
-------------
score = 12
---CAT-DO-G-
B---ATH-O-G-
-------------
score = 12
---CAT--D-OG
-B--AT---HOG
}}}

!
[[BACK|L05]]
!!!
!Computational Complexity
If you had to compare two 100-aa long sequences, there are ~10^^58^^ possible gap-sequence strings that would have to be scored.
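
One way to get a feel for a number like this is to count the ways of choosing which slots of a gapped string hold letters. The sketch below assumes the gapped string is twice as long as the sequence (as in the 12-slot strings used for the 6-letter words above) and that the count is the binomial coefficient C(slots, letters); {{{log10_choose}}} is a hypothetical helper name.
{{{
use strict;
use warnings;

# log10 of the binomial coefficient C(n,k), summed term by term
# so that the huge intermediate factorials are never formed.
sub log10_choose
{	my ($n,$k) = @_;
	my $sum = 0;
	foreach my $i (1..$k)
	{	$sum += log(($n - $k + $i)/$i) / log(10); }
	return $sum;
}

printf "6 letters in 12 slots:    10^%.2f strings\n", log10_choose(12,6);
printf "100 letters in 200 slots: 10^%.2f strings\n", log10_choose(200,100);
}}}
Under these assumptions a 100-aa sequence in a 200-slot string has on the order of 10^^59^^ arrangements, the same scale as the estimate above.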

Ideally, you want to "direct" the scoring process to only consider the most likely alignments, instead of ALL of the possible alignments. The minimum similarity alignment we could possibly have in our example is:
{{{
     C A T D O G - - - - - - 
     - - - - - - B A T H O G
}}}
But do we need to expend CPU time to actually execute this comparison?

So efforts at reducing the computational tasks of string comparisons are geared toward limiting the initial set of possibilities that are considered.

You should have an appreciation for the sophistication of programs like BLAST and their high-throughput ability for string manipulation. The actual scoring algorithms for these alignment programs are not the complex part of the code. It is the string "pre-processing" that gives them their real power.

!
!Comparing Sequences Faster
# ''Amino Acid Frequencies''
## Plots of AA freqs: [[HW2plots]]
## Reduce complexity of comparison . . . [[L06.01]]
## Less noise in analysis . . . . [[L06.01plots]]
# ''Benchmarking code''
## Getting time intervals for code execution: [[L06.02]]
## Running Code on Biowolf:
*** Home Work #3: WordCompareBenchmark
*** Example Plots: WordCompareBenchmarkPlots
## Running faster code: [[L06.02b]]
# ''~Needleman-Wunsch Alignment Algorithm''
## From Dwyer's Chapter 3
## Working Code . . . . [[NeedlemanWunschAlign]]
## Simple scoring matrix . . . [[L06.03]]
## Scoring Mismatches 
***  ~CAT-XXXX-DOG . . . [[L06.03b]]
***  ~CAT-DOG-HOUSE . . . [[L06.03c]]
***  Metal-dependent ATP binding domains . . . [[L06.03d]]
## Bioinformatic algorithms mostly optimize code execution

!
[[BACK to Lecture 6|L06]]
!!!
!Comparison Complexity:
Rather than comparing ALL proteins simultaneously, perhaps relationships among AA usage might be more readily detectable looking at a subset of specific proteins.
New script to filter the AA Frequency file that has been generated to select just specific proteins: @@[[AAfreqCalcsScreen]]@@
!!!Filter Criterion:
{{{
# - - - - - - - - - - - - -  - - - - 
# HERE'S THE FILTER LOOP . . . . . . 
if ($protein =~ m/NBS/)
{
	print OUT "$protein\n";
	$count += 1;
}
# HERE'S THE FILTER LOOP . . . . . . 
# - - - - - - - - - - - - -  - - - - 
}}}
!!!Script Output:
{{{
My footsteps are ticking
Like water dripping from a tree
Walking a harline
And stepping very carefully. . . . 

There are 125 proteins in "Arabidopsis-TAIR8-NT-cd95-Filter1008.txt" 

DONE   
}}}



[[BACK to Lecture 6|L06]]
!!!
!!RESULTS:
<html>
<img src="06/AAfreq-corrgram-NBS125.png" style="height:600px"><br>
<img src="06/AAfreq-AvL-NBS125.png" style="height:600px">
</html>
[[BACK to Lecture 6|L06]]
!!!
!Timing Code Execution
It is very important before you run any analysis that you have an expectation of what the results may be ''AND'' how long it will take for the computer to get those results.
{{{
# Declare the package:
        use Benchmark;

# Start timing by declaring a new variable
        my $Time0 = new Benchmark;

# Get difference between START time and NOW time:
        &TIME;    # must match the sub name exactly: PERL is case sensitive

# Here's the subroutine
# - - - - - - - - - - - - - - - - - - - - - - - - - -
sub TIME
{   my $t1 = new Benchmark;
    my $td = timediff($t1, $Time0);
    print "\n(Time for code execution :",timestr($td),")\n";
}
# - - - - - - - - - - - - - - - - - - - - - - - - - -
}}}
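A self-contained variation on the timer, as a sketch: here the start time is passed to the subroutine as an argument rather than read from a global variable. The name {{{elapsed_since}}} and the loop being timed are placeholders.
{{{
use strict;
use warnings;
use Benchmark;   # exports timediff() and timestr() by default

# Return a formatted string of the time elapsed since $t0.
sub elapsed_since
{	my ($t0) = @_;
	my $t1 = Benchmark->new;
	return timestr(timediff($t1,$t0));
}

my $start = Benchmark->new;
my $sum = 0;
foreach my $i (1..200_000) { $sum += sqrt($i); }   # some work to time
print "(Time for code execution : ", elapsed_since($start), ")\n";
}}}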

!

[[BACK to Lecture 6|L06]]
!!!
!Running Faster Code:
The biggest limitation on code execution time is the combinatorial diversity of all potential alignments. Given $seq1 has length ''m'' and $seq2 has length ''n'', then the running time of our present script is going to be proportional to:  ''n^^2^^ * m^^2^^'' . 

//This is really bad for any program. If execution time is proportional to an exponential function of the size of the input data, then you are screwed. You will never have enough computer time to finish a REAL, biological analysis!!!// 

So we often have to look for ways to reduce the computational complexity. They are logical shortcuts to reduce CPU time overhead, but without sacrificing any accuracy of the work.
{{{
# Alignment options: . . . . . 
	# decimal percent of identical matches
		my $MinIdentityLimit = 0.0;
	# decimal percent of how much sequence MUST be aligned
		my $OVERlap = 0.0;       
}}}
So by simply altering these two variables, we can greatly reduce the scope of the computational problem:
{{{
# Just using default zero values:
There are 261 permutations for "CATDOG"    using "xxxxxx----------"
There are 477 permutations for "BATPIGDOG" using "xxxxxxxxx-------"
------------------------------
1. Score = 35
    |-CAT---DOG|
    |B-ATPIGDOG|
------------------------------
2. Score = 35
    |C-AT---DOG|
    |-BATPIGDOG|

(Time for code execution : . . . 232.46 CPU)

# Now using MinIdentityLimit at 50% and sequence overlap at 50%:
There are 177 permutations for "CATDOG"    using "xxxxxx------"
There are 123 permutations for "BATPIGDOG" using "xxxxxxxxx---"
------------------------------
1. Score = 35
    |-CAT---DOG|
    |B-ATPIGDOG|
------------------------------
2. Score = 35
    |C-AT---DOG|
    |-BATPIGDOG|

(Time for code execution :  . . . 7.05 CPU)
}}}
|@@''Here we have changed the algorithm without changing the calculation''@@|
[[BACK to Lecture 6|L06]]
!!!
!Scoring Matrix
Rather than generating individual gap-sequence-chains for separate alignment scores, the dynamic scoring table or matrix generates this information on the fly by a simple character to character comparison. 
{{{
my $seq1 = "CATDOG";
my $seq2 = "CATDOG";
}}}
|   | - | C | A | T | D | O | G |
| - |bgcolor(yellow): 0.0 | -0.5 | -1.0 | -1.5 | -2.0 | -2.5 | -3.0 |
| C | -0.5 |bgcolor(yellow): 1.0 | 0.5 | 0.0 |-0.5 | -1.0 | -1.5 |
| A | -1.0 | 0.5 |bgcolor(yellow): 2.0 | 1.5 | 1.0 | 0.5 | 0.0 |
| T | -1.5 | 0.0 | 1.5 |bgcolor(yellow): 3.0 | 2.5 | 2.0 | 1.5 |
| D | -2.0 | -0.5 | 1.0 | 2.5 |bgcolor(yellow): 4.0 | 3.5 | 3.0 |
| O | -2.5 | -1.0 | 0.5 | 2.0 | 3.5 |bgcolor(yellow): 5.0 | 4.5 |
| G | -3.0 | -1.5 | 0.0 | 1.5 | 3.0 | 4.5 |bgcolor(yellow): 6.0 |
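The table above can be reproduced with a few lines of PERL. This sketch assumes match = +1, mismatch = -1 and gap = -0.5; these values are inferred from the numbers in the table, not copied from the class script, and {{{nw_table}}} is a hypothetical name.
{{{
use strict;
use warnings;

# Assumed scoring values (inferred from the table, not from the class script)
my ($MATCH,$MISMATCH,$GAP) = (1,-1,-0.5);

# Fill the dynamic programming table; rows follow $seq1, columns $seq2.
sub nw_table
{	my ($seq1,$seq2) = @_;
	my @a = split(//,$seq1);
	my @b = split(//,$seq2);
	my ($n,$m) = (scalar(@a),scalar(@b));
	my @M;
	$M[0][0] = 0;
	$M[$_][0] = $_ * $GAP foreach (1..$n);    # first column: all gaps
	$M[0][$_] = $_ * $GAP foreach (1..$m);    # first row: all gaps
	foreach my $i (1..$n)
	{	foreach my $j (1..$m)
		{	my $pair = ($a[$i-1] eq $b[$j-1]) ? $MATCH : $MISMATCH;
			my $best = $M[$i-1][$j-1] + $pair;    # align two letters
			my $up   = $M[$i-1][$j] + $GAP;       # letter of $seq1 over a gap
			my $left = $M[$i][$j-1] + $GAP;       # letter of $seq2 over a gap
			$best = $up   if $up   > $best;
			$best = $left if $left > $best;
			$M[$i][$j] = $best;
		}
	}
	return \@M;
}

my $M = nw_table("CATDOG","CATDOG");
foreach my $row (@$M)
{	print join(" ", map { sprintf("%5.1f",$_) } @$row), "\n"; }
}}}
Each cell keeps the best of three moves: down the diagonal (two letters aligned), from above, or from the left (a letter aligned against a gap).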
<html><img src="06/Mscore-CATDOG.png" style="height:500px"></html>

!

[[BACK to Lecture 6|L06]]
!!!
!CATXXXXDOG
{{{
my $seq1 = "CATXXXXDOG";
my $seq2 = "CATDOG";
}}}
|   | - | C | A | T | D | O | G |
| - |bgcolor(yellow): 0.0 | -0.5 | -1.0 | -1.5 | -2.0 | -2.5 | -3.0 |
| C | -0.5 |bgcolor(yellow): 1.0 | 0.5 | 0.0 |-0.5 | -1.0 | -1.5 |
| A | -1.0 | 0.5 |bgcolor(yellow): 2.0 | 1.5 | 1.0 | 0.5 | 0.0 |
| T | -1.5 | 0.0 | 1.5 |bgcolor(yellow): 3.0 | 2.5 | 2.0 | 1.5 |
| X | -2.0 | -0.5 | 1.0 | 2.5 | 2.0 | 1.5 | 1.0 |
| X | -2.5 | -1.0 | 0.5 | 2.0 | 1.5 | 1.0 | 0.5 |
| X | -3.0 | -1.5 | 0.0 | 1.5 | 1.0 | 0.5 | 0.0 |
| X | -3.5 | -2.0 | -0.5 | 1.0 | 0.5 | 0.0 | -0.5 |
| D | -4.0 | -2.5 | -1.0 | 0.5 |bgcolor(yellow): 2.0 | 1.5 | 1.0 |
| O | -4.5 | -3.0 | -1.5 | 0.0 | 1.5 |bgcolor(yellow): 3.0 | 2.5 |
| G | -5.0 | -3.5 | -2.0 | -0.5 | 1.0 | 2.5 |bgcolor(yellow): 4.0 |
<html><img src="06/Mscore-CATXDOG.png" style="height:500px"></html>
[[BACK to Lecture 6|L06]]
!!!
!~CAT-DOG-HOUSE
{{{
my $seq1 = "CATDOGHOUSE";
my $seq2 = "BATHOGBIRDHOUSE";
}}}

''OUTPUT:''
{{{
Needleman-Wunsch Dynamic Programming Table Alignment:
      Similarity score: 5
      Alignment: 
                 CATDOG----HOUSE
                 BATHOGBIRDHOUSE

    * * * D O N E * * *
}}}
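An alignment like the one above is read back out of the scoring table by starting at the bottom right cell and repeatedly stepping to whichever neighbor produced the current value. A minimal sketch, assuming match = +1, mismatch = -1 and gap = -0.5 ({{{nw_align}}} is a hypothetical name, not the routine in the class script):
{{{
use strict;
use warnings;

my ($MATCH,$MISMATCH,$GAP) = (1,-1,-0.5);   # assumed scoring values

sub nw_align
{	my ($seq1,$seq2) = @_;
	my @a = split(//,$seq1);
	my @b = split(//,$seq2);
	my ($n,$m) = (scalar(@a),scalar(@b));

	# Forward pass: fill the table
	my @M;
	$M[0][0] = 0;
	$M[$_][0] = $M[$_-1][0] + $GAP foreach (1..$n);
	$M[0][$_] = $M[0][$_-1] + $GAP foreach (1..$m);
	foreach my $i (1..$n)
	{	foreach my $j (1..$m)
		{	my $pair = ($a[$i-1] eq $b[$j-1]) ? $MATCH : $MISMATCH;
			my $best = $M[$i-1][$j-1] + $pair;
			$best = $M[$i-1][$j] + $GAP if $M[$i-1][$j] + $GAP > $best;
			$best = $M[$i][$j-1] + $GAP if $M[$i][$j-1] + $GAP > $best;
			$M[$i][$j] = $best;
		}
	}

	# Reverse pass: walk back from the bottom right corner
	my ($i,$j) = ($n,$m);
	my ($out1,$out2) = ("","");
	while ($i > 0 || $j > 0)
	{	my $pair = ($i > 0 && $j > 0 && $a[$i-1] eq $b[$j-1]) ? $MATCH : $MISMATCH;
		if ($i > 0 && $j > 0 && $M[$i][$j] == $M[$i-1][$j-1] + $pair)
		{	$out1 = $a[--$i] . $out1;  $out2 = $b[--$j] . $out2; }   # two letters
		elsif ($i > 0 && ($j == 0 || $M[$i][$j] == $M[$i-1][$j] + $GAP))
		{	$out1 = $a[--$i] . $out1;  $out2 = "-" . $out2; }        # gap in $seq2
		else
		{	$out1 = "-" . $out1;       $out2 = $b[--$j] . $out2; }   # gap in $seq1
	}
	return ($M[$n][$m],$out1,$out2);
}

my ($score,$top,$bottom) = nw_align("CATDOGHOUSE","BATHOGBIRDHOUSE");
print "Similarity score: $score\n$top\n$bottom\n";
}}}
When several paths tie, this sketch prefers the diagonal step, so the gaps may be placed differently from the output above even though the similarity score is the same.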

Here's the scoring MATRIX:<br>
<html><img src="06/Mscore-plot1.png" style="height:500px"></html>
Adding CHAR labels and contour lines. The best alignment sequences are 'built' by following the reverse path through the matrix. The MAX similarity score will ALWAYS be near the bottom right.
<html><img src="06/Mscore-plot3.gif" style="height:500px"></html>
!
[[BACK to Lecture 6|L06]]
!!!
!ATP Binding Domain Alignment

Reconsider the CPU computational time of the brute force approach: WordCompareBenchmarkPlots

{{{
my $seq1 = "LWLKKATLEFTRSRKSMSVCCTSTEDARIHSLFVKGAPEEILKRCTRIM";
my $seq2 = "KWKKEFTLEFSRDRKSMSAYCFPASGGSGAKMFVKGAPEGVLGRCTHVR";
}}}

{{{
Needleman-Wunsch Dynamic Programming Table Alignment:
      Similarity score: 7
      Alignment: 
                 LWLKK-ATLEFTRSRKSMS-VC-CTS-TEDARIHSLFVKGAPEEILKRCT--RIM
                 KW-KKEFTLEFSRDRKSMSAYCFPASGGSGA---KMFVKGAPEGVLGRCTHVR--

(Time for code execution : 0 wallclock secs ( 0.03 usr +  0.00 sys =  0.03 CPU))

    * * * D O N E * * *
}}}

<html><img src="06/Mscore-ATP.png" style="height:500px"></html>

!

!Predictive Statistics for comparing sequences

''Dwyer, //Genomic Perl//, Chapter 4, pp. 44-48''
* In this section, Dwyer gives a very good summary of two important concepts in sequence analysis: 
** ~Log-of-Odds scoring
** Sequence Entropy
* You may not be familiar with the set or probability notations used in the text, but they are very simple concepts that you should be able to understand from the narrative that is provided.
* In this lecture, after going over the midterm code, I will just introduce these concepts with a few additional examples.
!!Focus Topics:
#  [[Entropy|Entropy01]]
#  [[NT frequencies|Entropy02]]
#  [[Log of Odds|LOD]]
#  [[Sequence Entropy|Entropy03]]

!
!Alignment Scoring
Topics for Lecture 8:
# ''Homework 4:'' [[L08.01]]
# ''BLAST Alignment:'' [[L08.02]]
# ''BLAST on Biowolf:'' [[L08.03]]

!
[[Back to Lecture 8|L08]]
!!!
!Profiling LOD scores
The assignment asked you to use a moving window to generate a local domain word score.
Working script: @@LODprofiling@@

!!!Plot output:
<html><img src="08/LODprofile-plot1.png" style="height:150px"></html>

!!!Coding Domain Start:
<html><img src="08/LODprofile-plot1-ORF1.png" style="height:300px"></html>

!!!Profile ~DNA2:
<html><img src="08/LODprofile-DNA2-021.png" style="height:300px"></html>

!!!Profile ~DNA2 with larger ~121nt window:
<html><img src="08/LODprofile-DNA2-121.png" style="height:300px"></html>

!
[[Back to Lecture 8|L08]]
!!!
!BLAST
From Dwyer: Chapter 7, "//Local Alignment and the BLAST Heuristic//"

The ~Needleman-Wunsch algorithm is a "global" alignment of two sequences. That means that every position in the first query sequence is considered in the final alignment against every position in a target sequence.

The ~Smith-Waterman algorithm performs a "local" alignment of two sequences by considering just sub-components of the query and target sequences. It essentially uses the NW algorithm approach, but just adds some substring processing before running the actual alignments. 

For the midterm exam, we wrote a small "BLAST"-like script to take one query sequence and find the best alignment in a subset of Arabidopsis ~ORFs.  Now we can break this task up into smaller alignment tasks using the substring function: (see script MidTermBlast-SW)
{{{
my $seq2 = $PRTs{$protein_name};	
my $max = -9999;
foreach my $i (1..length($seq1))
{	foreach my $j (0..length($seq1)-$i)
	{	foreach my $k (0..length($seq2)-$i)
		{	
#-------------------------------------------
my $score = &Similarity(substr($seq1,$j,$i),substr($seq2,$k,$i));
if ($score > $max)
{	$max = $score;
	print "\n\n---------------------------------\n";
	print "      Alignment: $protein_name\n      Score= $score\n";
	foreach my $x (&Alignment(substr($seq1,$j,$i),substr($seq2,$k,$i))) 
	{	print "                 ",$x,"\n"; }
}
#-------------------------------------------
		}# end foreach $k
		print " . ";
	}# end foreach $j
}# end foreach $i
}}}
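For comparison, the ~Smith-Waterman method is usually written without the substring loops: the scoring table is filled as for the global alignment, except that no cell is allowed to drop below zero, and the best local score is the largest value anywhere in the table. The sketch below only returns that score, with assumed values of match = +1, mismatch = -1 and gap = -0.5; {{{sw_score}}} is a made-up name and this is not the class script.
{{{
use strict;
use warnings;

my ($MATCH,$MISMATCH,$GAP) = (1,-1,-0.5);   # assumed scoring values

# Best local alignment score: same table as the global method,
# but no cell may fall below zero and the answer is the table maximum.
sub sw_score
{	my ($seq1,$seq2) = @_;
	my @a = split(//,$seq1);
	my @b = split(//,$seq2);
	my @prev = (0) x (scalar(@b) + 1);      # row above the current one
	my $max = 0;
	foreach my $i (1..scalar(@a))
	{	my @row = (0);
		foreach my $j (1..scalar(@b))
		{	my $pair = ($a[$i-1] eq $b[$j-1]) ? $MATCH : $MISMATCH;
			my $best = 0;                   # the zero floor
			foreach my $cand ($prev[$j-1] + $pair, $prev[$j] + $GAP, $row[$j-1] + $GAP)
			{	$best = $cand if $cand > $best; }
			$row[$j] = $best;
			$max = $best if $best > $max;
		}
		@prev = @row;
	}
	return $max;
}

print sw_score("CATDOGHOUSE","BATHOGBIRDHOUSE"), "\n";
}}}
This fills one table per pair of sequences instead of running one alignment per pair of substrings.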
[[Back to Lecture 8|L08]]
[[Back to Lecture 9|L09.01]]
!!!
!Running BLAST on Biowolf
The form of the BLAST command is straightforward:
{{{blastall -p blastp -d nr -i $IN-filename -o $OUT-filename -e 1.0 -m 1 }}}

| blastall | calls the BLAST package |
| -p | option selects an individual program: here "blastp" |
| -d | option selects the database for the target sequences: here nr=non-redundant |
| -i | option for input file of query sequences in fasta format |
| -o | option for output file for BLAST results |
| -e | option for threshold value for significant alignments |
| -m | option for specific output format |
| -a | option for the number of processors to use (e.g. 4 to run BLAST on a quad-core node) |


!!!Shell Script
{{{
#!/bin/sh
#$ -cwd
#$ -S /bin/sh
#$ -j y
#$ -pe threaded 4
#$ -M amarsh@udel.edu
#$ -m bae
#$ -N Blast

blastall -p blastp -d nr -i UnknownSeqs.faa -o Blastout.txt -e 1.0 -m 1

}}}

!!!Query sequences:
[[Download the query sequence fasta file here.|08b/UnknownSeqs.faa]]
!BLAST

''A.'' In [[Lecture 08|L08]], we quickly summarized the [[Smith-Waterman algorithm|L08.02]] as a component of a BLAST-like package and quickly reviewed how to run [[BLAST on Biowolf|L08.03]].

''B.'' Polished ~Smith-Waterman BLAST: [[LocalBLAST]]
The final alignment scores are highly dependent upon the gap penalty that is utilized in the algorithm:
{{{
********************************************
 PROTEIN: AT1G53260.1 unnamed protein product 

Max Score= 4.5
                 -LVS-KIIELRP
                 DL-SHKIKEL-P
********************************************
}}}

''C.'' What is BLAST?  . . . . [[L09.01]]

''D.'' What is a "good" gap penalty? Next coding assignment . . . [[L09.02]]

''E.'' Statistics of BLAST database searches . . . . . [[L09.03]]

''F.'' CPAN: PERL module updater, LWP for http access . . . . [[L09.04]]



!
[[Back to Lecture 9|L09]]
!!!
!What is BLAST?
# ''B''asic ''L''ocal ''A''lignment ''S''earch ''T''ool
#  ''BLAST'' is a heuristic scoring procedure to rapidly screen for potential sequence matches, which are then scored by a quantitative similarity algorithm, like the Smith-Waterman approach. BLAST is so much faster than the SW algorithm because a lot of preprocessing of the sequences is done to identify the regions of comparison that are the most likely to generate significant alignments. The SW algorithm runs blindly in that it compares every sequence position to every other one.
##  BLAST first assumes that high-value alignments WILL contain one or more high-scoring matches of 3 letter words (or sequence substrings).
## BLAST next assumes that high-value alignments WILL not contain gaps around these high-scoring 3 letter words.
## WHY do these assumptions work?
# Running BLAST: 
##  [[BLAST on Biowolf|L08.03]]
## Batch running BLAST jobs: [[BLASTrunner]]
### Look at the 10 different output format options . . . . 



!
[[Back to Lecture 9|L09]]
!!!
!Optimizing GAP penalties?
How does one assess the optimal gap penalty for an alignment?
Using the LocalBLAST script, one can vary the gap penalty, but the real question is what would you look for (measure) as a function of the gap penalty? What sequences would you compare?
{{{
# With $GAP = -0.5 . . . . . . 
Max Score= 4.5
                 -LVS-KIIELRP
                 DL-SHKIKEL-P

# With $GAP = -0.1 . . . . . . 
Max Score= 7.9
                 LV----S-KI-IELRP-SIVSSR
                 LVSEDLSHKIK-EL-PK--V---
}}}
One approach would be to look at the distribution of alignment scores . . . . 
<html><table><tr>
<td><img src="09/plotgap01b.png" style="height:300px"></td>
<td><img src="09/plotgap05b.png" style="height:300px"></td>
</tr></table></html>

[[Back to Lecture 9|L09]]
!!!
!BLAST Statistics

Chapter 8, Dwyer's //Genomic PERL//

Essentially, given any query sequence, one wants to know what is the probability that a "match" or alignment would be obtained for that sequence in a database of random sequences. Note that the calculation is not based on "possibility" but on probability.

Dwyer presents an excellent summary of the probability statistics behind the BLAST e-values. But probability math is not necessarily tangible for most biologists. 

BLAST output . . . 
{{{
Sequences producing significant alignments:         (bits) Value
ref|YP_463218.1| DNA replication and repair protein    250   3e-65
ref|ZP_02172167| DNA replication and repair protein    235   1e-60
ref|YP_00137720| DNA replication and repair protein    180   3e-44
. . . . . . 
}}}

# To summarize . . . . 
##  Alignments are scored for information content which is expressed in bits. 
###  We've discussed [[Entropy|Entropy03]]
###  Chapter 4, p. 47 in Dwyer
## A high BIT score means that each position in the alignment is //informative// in terms of distinguishing from a random alignment. 
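## The e-value is the number of alignments with that BIT score (or better) that you would expect to find by chance when matching your query sequence against a random database of the same size. For small e-values this is essentially the probability of a chance match. 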
## So the BIT score really sets the probability threshold.
# 1e-15:
## p = 1 x 10^^-15^^ that the  match IS RANDOM
## OR . . . . 
## p > 99.9999% that this match IS NOT RANDOM. 

|''How could you use a BIT scoring probability to assess the most effective gap penalty?''|
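
The link between a BIT score and an e-value can be sketched with the standard relation E = m * n * 2^^-S^^, where S is the BIT score, m the query length and n the total database length, and P = 1 - e^^-E^^ is the chance of at least one such match. The lengths below are invented for illustration, and BLAST itself uses corrected ("effective") lengths, so treat the numbers as rough.
{{{
use strict;
use warnings;

# Expected number of chance alignments with bit score >= $bits
sub evalue
{	my ($bits,$m,$n) = @_;          # $m = query length, $n = database length
	return $m * $n * 2**(-$bits);
}

# Probability of at least one chance alignment, given the expected number
sub pvalue
{	my ($E) = @_;
	return 1 - exp(-$E);
}

my ($m,$n) = (250, 1e9);    # hypothetical query and database sizes
foreach my $bits (30, 50, 100)
{	my $E = evalue($bits,$m,$n);
	printf "%4d bits   E = %.3g   P = %.3g\n", $bits, $E, pvalue($E);
}
}}}
Every extra bit halves the expected number of chance matches, which is why the BIT score sets the probability threshold.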

!
[[Back to Lecture 9|L09]]
!!!
!Installing PERL modules:

There is an internal module manager for PERL that is called ''CPAN'' = 
{{engindent{//Comprehensive Perl Archive Network//}}}

You start this manager with the command: {{{prompt> cpan}}}.

You will have to go through an initial install routine to configure the CPAN engine. Just accept any defaults you may be prompted for. 

Once CPAN is running you will want to install the ''LWP'' module for internet communication protocols. Now there are several ways this could happen depending upon your platform. If you don't want to go through with this installation on your laptop, don't worry, the LWP module is on Biowolf so you can run all your http scripts there on your class account. 

With the LWP module installed, it just requires a simple ''get'' function call to retrieve the contents of a web page.
{{{
#!/usr/bin/perl
use strict;
use LWP::Simple;
# - - - - - H E A D E R - - - - - - - - - - - - - - - - -
# Simple use of LWP module to grab the contents of
#   a url and save it to disk.

# - - - - - U S E R    V A R I A B L E S - - - - - - - -
my $URL = "http://www.sciencemag.org/magazine.dtl";

# - - - - - M A I N - - - - - - - - - - - - - - - - - - - -
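print "Opening $URL\n";
my $WebPage = get($URL);
# get() returns undef if the page could not be retrieved
die "Could not retrieve $URL\n" unless defined $WebPage;
open(OUT,">ScienceMag.html") or die "Cannot write ScienceMag.html: $!\n";
print OUT $WebPage;
close(OUT);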

print "\n\n*** DONE ***\n\n";
# - - - - - EOF - - - - - - - - - - - - - - - - - - - - - -
}}}
!
!BLAST

# [[Stat overview from NCBI|NCBIblast]]
# [[Review in Genome Biology 2001|BLASTpdf]]
# [[Parsing BLAST output|BLASTparse]]
# [[Generating a local, custom BLAST database|BLASTdb]]
# [[Next coding assignment . . . . |BLASTrandom]]

!
!!Select a genome FFN:
[[Aeromonas hydrophila|04/genomes/Aeromonas_hydrophila_ATCC_7966-PID16697-cd95.ffn]]
[[Archaeoglobus fulgidus|04/genomes/Archaeoglobus_fulgidus-PID104-cd95.ffn]]
[[Bacillus subtilis|04/genomes/Bacillus_subtilis-PID76-cd95.ffn]]
[[Bartonella quintana|04/genomes/Bartonella_quintana_Toulouse-PID44-cd95.ffn]]
[[Chlamydophila pneumoniae|04/genomes/Chlamydophila_pneumoniae_AR39-PID247-cd95.ffn]]
[[Colwellia psychrerythraea|04/genomes/Colwellia_psychrerythraea_34H-PID275-cd95.ffn]]
[[Cyanobacteria bacterium|04/genomes/Cyanobacteria_bacterium_Yellowstone_A-Prime-PID16251-cd95.ffn]]
[[Haloarcula marismortui|04/genomes/Haloarcula_marismortui_ATCC_43049-PID105-cd95.ffn]]
[[Methanococcus jannaschii|04/genomes/Methanococcus_jannaschii-PID102-cd95.ffn]]
[[Nitrobacter hamburgensis|04/genomes/Nitrobacter_hamburgensis_X14-PID13473-cd95.ffn]]
!!!
[[BACK|AAcount]]
!
[[BACK to Lecture 7|L07]]
!!!
!Log of Odds Ratios
From the NT example we just covered ([[Entropy02]]), the chance that you would pull out a G and then an A from a random assortment of coding nucleotides is: p(G)*p(A) = 0.28 * 0.25 = 0.070. But for the family of noncoding sequences, p(G)*p(A) = 0.38 * 0.15 = 0.057. This kind of comparison is the logic that is used to ascertain whether $~QuerySeq is a member of SEQ~~code~~ or SEQ~~nocode~~. 

We can use these probability values to calculate the //''log likelihood''// of $~QuerySeq being a member of a family of sequence models for either "coding" or "noncoding" sequences, which is equivalent to the simple probability of $~QuerySeq being selected at random from a distribution of sequences with known frequencies of G, A, T and C:

//L//(SEQ~~model-X~~|$~QuerySeq)  = //P//($~QuerySeq|SEQ~~model-X~~)

!!!Write Script to Calculate the probabilities:
Given:
$~QuerySeq = "GACTAATAATGACGCTAGCTAGCTAGCTAGCATTATATAGGCGATATCAG";

Then:
//P//($~QuerySeq|SEQ~~code~~)   =   p(G)~~code~~ * p(A)~~code~~ * .... * p(G)~~code~~

//P//($~QuerySeq|SEQ~~nocode~~)   =   p(G)~~nocode~~ * p(A)~~nocode~~ * .... * p(G)~~nocode~~

@@[[Script Template|QuickEntropy]]@@
@@[[Working Script From Class|LODscore]]@@

!!!RESULTS:
//L//(SEQ~~code~~|$~QuerySeq)  = 4.0488 x 10^^-31^^

//L//(SEQ~~nocode~~|$~QuerySeq)  = 4.3225 x 10^^-35^^

!!!LOD score:
The chance of $~QuerySeq being a piece of coding sequence instead of noncoding sequence is calculated as the simple ratio of the two probabilities above:

//P~~code~~//  /  //P~~nocode~~//    =    4.0488 x 10^^-31^^  /  4.3225 x 10^^-35^^  =  ''9,366.80''

The odds of $~QuerySeq being coding and not noncoding are more than 9,000 : 1. 
''Taking the logarithm of this odds ratio gives the __log likelihood ratio__, or __log of odds__, or __lod score__: ln(9,366.80) = 9.14.''
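
The same calculation can be done by summing logs one nucleotide at a time, which gives the lod score directly and avoids multiplying 50 small numbers together. A sketch using the frequencies from the class example ({{{lod_score}}} is a hypothetical name):
{{{
use strict;
use warnings;

# Nucleotide frequencies from the class example
my %Pcode = (A => 0.25, G => 0.28, T => 0.21, C => 0.26);
my %Pnot  = (A => 0.15, G => 0.38, T => 0.13, C => 0.34);

# Natural-log odds of coding vs. noncoding, summed one nucleotide at a time
sub lod_score
{	my ($seq,$Pc,$Pn) = @_;
	my $lod = 0;
	foreach my $nt (split(//,$seq))
	{	$lod += log($Pc->{$nt}/$Pn->{$nt}); }
	return $lod;
}

my $QuerySeq = "GACTAATAATGACGCTAGCTAGCTAGCTAGCATTATATAGGCGATATCAG";
my $lod = lod_score($QuerySeq,\%Pcode,\%Pnot);
printf "LOD = %.3f   odds = %.1f : 1\n", $lod, exp($lod);
}}}
Taking {{{exp}}} of the sum recovers the odds ratio.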

!
__''HOMEWORK:''__ Send your PERL script and two xy plots (one for each DNA sample) by 5 pm Friday 31 OCT via email to //amarsh@udel.edu//. All files need to begin with your last name so I can keep them organized in one folder. 

!Log of Odds Scoring Profile
The background information necessary for this assignment can be found in Chapter 4 of Dwyer's //Genomic PERL// and in [[Lecture 7|L07]]. 
| Given two unknown genomic DNA sequences, each 30 KB in length, plot the LOD score for coding vs. non-coding across the length of those sequences. |

''Approach:''  In class, I worked with an example sequence named $~QuerySeq that was 50 nt. Here we have a piece of DNA that is 30,000 nt. We do not want to do a //P// calc for code and non-code sequence for ALL nts at once because there is a mix of coding and noncoding domains in that sequence. So we want to run the calculation as a moving "window" across the 30 KB sequence. 

For example, start with a window size of 21 nts. For each i^^th^^ nt position, we could use the substring function to grab a query sequence from i-10 to i+10 so that you have a 21 nt sequence centered on the i^^th^^ position. Calculate the LOD score for this position, then move to the next position (i+1) and repeat. 

The idea is that at every nt position you calculate a ''local'' LOD score of the surrounding nucleotides. Then plotting this score against nt position, you will be able to isolate positional shifts between coding and noncoding blocks of sequence. Here's an example using the test sequence data __"~DNAunknown-01.txt"__:   
<html><img src="08/LODprofile-plot1.png" style="height:200px"></html>

!Getting Started:
''1.''  The probability script we worked on in class for Lecture 7 has been organized for you and posted here: @@LODscore@@. Note that the //P-value// calculation has been put into a subroutine that you call by passing a sequence (here $~QuerySeq) and a hash array of NT frequencies (here \%Pcode [see ArrayRef for info on why the "\" is used here]). For this exercise we don't need to change the &Pcalc subroutine around.
| {{{ $P = &Pcalc($QuerySeq,\%Pcode); }}} |

''2.''  Download the 2 sequences: [[DNA1|08/DNAunknown-01.txt]] and [[DNA2|08/DNAunknown-02.txt]]
Write your script to just handle one sequence at a time. ~DNA1 is a test in which there are large differences in coding and noncoding nt freqs (see plot above). ~DNA2 is more realistic with much smaller differences in nt freqs. These text files just have sequence in them. Nothing else. No headers. No line breaks. So the best way to read one of them into your script would be as follows:
{{{
open(IN,"<$SeqFile") or die "Cannot open $SeqFile: $!\n";
my $SEQ = <IN>;
close(IN);
chomp($SEQ);    # drop a trailing newline, if the file has one
my $N = length($SEQ);
print "There are $N nucleotides in \"$SeqFile\"\n\n";
}}}

''3.'' You will need to change the NT frequencies for coding vs. noncoding domains for each of the unknown DNA sequences above. The coding and noncoding nt freq are different for the two DNAs. Just write the script to handle one DNA sequence at a time.
|!|>|!  ~DNA1 |!|>|!  ~DNA2 |
|!|! Coding |! Noncoding |! * |! Coding |! Noncoding |
| A | 0.1513 | 0.4056 || 0.2495 | 0.2466 |
| G | 0.3598 | 0.1673 || 0.2710 | 0.2611 |
| T | 0.1353 | 0.3158 || 0.2116 | 0.2860 |
| C | 0.3536 | 0.1112 || 0.2680 | 0.2063 |
Here's the code to setup the values for ~DNA1:
{{{
# Nucleotide Frequencies: p(A), p(G), p(T), p(C)
	my @Fcode = (0.1513, 0.3598, 0.1353, 0.3536);
	my @Fnot  = (0.4056, 0.1673, 0.3158, 0.1112);
}}}

''4.'' Now think about what you want at the end of the program. You will need an array of LOD score values, so let's put them into a regular array called ''@SCORE'', where $SCORE[10] will hold the LOD value for the sequence window from 0 to 20 of $SEQ. That's a 21 nt sequence block, with 10 nts on either side of position index #10 (which is actually the 11th element in the array). For the actual calculation of the LOD score, you will have something like this (note the ''log'' function in PERL returns the natural log, ln, by default):
{{{
$SCORE[$index] = log($Pcoding/$Pnotcoding);
}}}
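Because the log of a product is a sum of logs, the window score can also be kept as a running sum: when the window moves one position, add the term for the nucleotide that enters and subtract the term for the one that leaves. This is a sketch of that idea rather than the posted solution; {{{lod_profile}}} is a hypothetical name and the short test sequence is invented.
{{{
use strict;
use warnings;

# LOD profile with a running sum: add the nucleotide entering the
# window and subtract the one leaving it, instead of rescoring all $W.
sub lod_profile
{	my ($seq,$W,$Pc,$Pn) = @_;
	my @nt  = split(//,$seq);
	my $N   = scalar(@nt);
	my @lod = map { log($Pc->{$_}/$Pn->{$_}) } @nt;   # per-nucleotide LOD
	my @SCORE = (0) x $N;                             # edges stay at zero
	return @SCORE if $W > $N;
	my $half = int(($W-1)/2);
	my $sum = 0;
	$sum += $lod[$_] foreach (0..$W-1);               # first window
	$SCORE[$half] = $sum;
	foreach my $pos (1..$N-$W)
	{	$sum += $lod[$pos+$W-1] - $lod[$pos-1];       # slide by one nt
		$SCORE[$pos+$half] = $sum;
	}
	return @SCORE;
}

# Frequencies for the first unknown DNA, from the table above
my %Pcode = (A => 0.1513, G => 0.3598, T => 0.1353, C => 0.3536);
my %Pnot  = (A => 0.4056, G => 0.1673, T => 0.3158, C => 0.1112);
my @SCORE = lod_profile("GCGCCGGCATTATAATTA",5,\%Pcode,\%Pnot);
printf "%2d\t%6.2f\n", $_, $SCORE[$_] foreach (0..$#SCORE);
}}}
Each step costs two operations regardless of the window size, which matters more for the 121 nt window than for the 21 nt one.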

!!!Working Script:
The full script is now posted here: @@LODprofiling@@
More results presented as part of Lecture 8: @@[[L08.01]]@@
! 
!Profiling Script
This is one possible solution to homework assignment #4: @@LODprofile@@
{{{
#!/usr/bin/perl
use strict;

# - - - - - H E A D E R - - - - - - - - - - - - - - - -
# Homework Assignment #4.
# Use the LOD calculation to profile the likelihood
#     that a local sequence domain is an ORF.

# - - - - - U S E R   V A R I A B L E S - - - - - - - -
my $SeqFile = "DNAunknown-01.txt";

# Nucleotide Frequencies: p(A), p(G), p(T), p(C)
	my @Fcode = (0.1513, 0.3598, 0.1353, 0.3536);
	my @Fnot  = (0.4056, 0.1673, 0.3158, 0.1112);

my $WINDOW = 21;

# - - - - - G L O B A L  V A R I A B L E S  - - - - - -
my @NT = qw | A G T C |;
my %Pcode;
my %Pnot;
my $P;
foreach my $i (0..3)
{	$Pcode{$NT[$i]} = $Fcode[$i]; $Pnot{$NT[$i]} = $Fnot[$i]; }
my @SCORE;

# - - - - - - - - - - - - - - - - - - - - - - - - - - -
# - - - - - M A I N - - - - - - - - - - - - - - - - - -
# - - - - - - - - - - - - - - - - - - - - - - - - - - -
print "\n\nRUNNING . . . . \n\n";

open(IN,"<$SeqFile") or die "Cannot open $SeqFile: $!\n";
my $SEQ = <IN>;
close(IN);
chomp($SEQ);    # drop a trailing newline, if the file has one
my $N = length($SEQ);
print "There are $N nucleotides in \"$SeqFile\"\n\n";

my $W = $WINDOW;
foreach (0..$N-1){$SCORE[$_] = 0;}

foreach my $pos (0..$N-$WINDOW)
{	my $index = $pos + int(($W-1)/2);
	my $query = substr($SEQ,$pos,$W);
	my $Pcod = &Pcalc($query,\%Pcode);
	my $Pnot = &Pcalc($query,\%Pnot);
	$SCORE[$index] = log($Pcod/$Pnot);
}


open(OUT,">DNAunknown-01-profile.txt");
foreach my $i (4950..5050)   # (0..$#SCORE)
{	my $nt = substr($SEQ,$i,1);
	print OUT "$nt\t$SCORE[$i]\n";
}
close(OUT);


print "\n\n   DONE   \n\n\n";
# - - - - - - - - - - - - - - - - - - - - - - - - - - -
# - - - - - S U B R O U T I N E S - - - - - - - - - - -
# - - - - - - - - - - - - - - - - - - - - - - - - - - -
# - - - - - - - - - - - - - - - - - - - - - - - - - - -
sub Pcalc
{	my ($QS,$Pvalues) = @_;
	my @QS = split(//,$QS);
	my $prob = 1; 
	foreach my $nt (@QS)
	{	$prob = $prob * ${$Pvalues}{$nt}; }
	return $prob;
}
# - - - - - - - - - - - - - - - - - - - - - - - - - - -
# - - EOF - - - - - - - - - - - - - - - - - - - - - - -
# - - - - - - - - - - - - - - - - - - - - - - - - - - -
}}}
!Calc LOD script
Here's the script that we basically worked on in class. I have 'organized' it a little better by putting the actual P calc into a subroutine call. 
{{{
#!/usr/bin/perl
use strict;

# - - - - - H E A D E R - - - - - - - - - - - - - - - -
# 22OCT Lecture 7.
# Quick LOD Calculation

# - - - - - U S E R   V A R I A B L E S - - - - - - - -
my $QuerySeq = "GACTAATAATGACGCTAGCTAGCTAGCTAGCATTATATAGGCGATATCAG";

# Nucleotide Frequencies: p(A), p(G), p(T), p(C)
	my @Fcode = (0.25, 0.28, 0.21, 0.26);
	my @Fnot  = (0.15, 0.38, 0.13, 0.34);

# - - - - - G L O B A L  V A R I A B L E S  - - - - - -
my @NT = qw | A G T C |;
my %Pcode;
my %Pnot;
my $P;
foreach my $i (0..3)
{	$Pcode{$NT[$i]} = $Fcode[$i]; $Pnot{$NT[$i]} = $Fnot[$i]; }

# - - - - - - - - - - - - - - - - - - - - - - - - - - -
# - - - - - M A I N - - - - - - - - - - - - - - - - - -
# - - - - - - - - - - - - - - - - - - - - - - - - - - -
print "\n\nRUNNING . . . . \n\n";

$P = &Pcalc($QuerySeq,\%Pcode);
print "P(QS|Coding Seq) = $P\n";

$P = &Pcalc($QuerySeq,\%Pnot);
print "P(QS|NonCoding Seq) = $P\n";


print "\n\n   DONE   \n\n\n";
# - - - - - - - - - - - - - - - - - - - - - - - - - - -
# - - - - - S U B R O U T I N E S - - - - - - - - - - -
# - - - - - - - - - - - - - - - - - - - - - - - - - - -
# - - - - - - - - - - - - - - - - - - - - - - - - - - -
sub Pcalc
{	my ($QS,$Pvalues) = @_;
	my @QS = split(//,$QS);
	my $prob = 1; 
	foreach my $nt (@QS)
	{	$prob = $prob * ${$Pvalues}{$nt}; }
	return $prob;
}
# - - - - - - - - - - - - - - - - - - - - - - - - - - -
# - - EOF - - - - - - - - - - - - - - - - - - - - - - -
# - - - - - - - - - - - - - - - - - - - - - - - - - - -
}}}
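The subroutine call above passes each frequency hash as a reference ({{{\%Pcode}}}), and {{{Pcalc}}} then looks values up with the {{{${$Pvalues}{$nt}}}} notation. The short sketch below only illustrates that mechanism; {{{ShowRef}}} is a hypothetical sub, and the arrow notation it shows is an equivalent alternative, not something used in the class script.
{{{
#!/usr/bin/perl
use strict;
use warnings;

my %Pcode = (A => 0.25, G => 0.28, T => 0.21, C => 0.26);

# ShowRef is a hypothetical sub used only for this illustration
sub ShowRef
{	my ($nt, $Pvalues) = @_;        # $Pvalues holds a REFERENCE to the hash
	my $long  = ${$Pvalues}{$nt};   # notation used in Pcalc
	my $arrow = $Pvalues->{$nt};    # arrow notation, same value
	return ($long, $arrow);
}

my ($x, $y) = &ShowRef("G", \%Pcode);   # \%Pcode passes the hash by reference
print "long form = $x, arrow form = $y\n";
}}}
Passing a reference means the sub works on the original hash rather than a flattened copy of its keys and values, which is why the same {{{Pcalc}}} can be reused for both the coding and the non-coding frequencies.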
<html>
<div style="color: rgb(100, 100, 150); font-family: Monaco;"><big><b>
Index of Weekly Lectures
</b></big></div>
</html>
1. [[Introduction; File Input/Output |L01]]
2. [[Read/Edit/Write Files |L02]]
3. [[String Manipulation: Protein Translation |L03]]
4. [[Comparing Genes/Genomes: Sequence Metrics |L04]]
5. [[Comparing Sequences|L05]]
6. [[Comparing Sequences: Needleman-Wunsch |L06]]
| Midterm Exam: @@MidTerm@@ |
7. [[Statistical Models for examining sequences |L07]]
8. [[BLAST: Smith-Waterman |L08]]
9. [[BLAST alignment scoring |L09]]
10. [[BLAST parser & database |L10]]
11. [[BLAST null distributions |BLASTproject]]
| @@[[FINAL EXAM|HappyDay]]@@ |
!!
/***
|''Name:''|LegacyStrikeThroughPlugin|
|''Description:''|Support for legacy (pre 2.1) strike through formatting|
|''Version:''|1.0.2|
|''Date:''|Jul 21, 2006|
|''Source:''|http://www.tiddlywiki.com/#LegacyStrikeThroughPlugin|
|''Author:''|MartinBudden (mjbudden (at) gmail (dot) com)|
|''License:''|[[BSD open source license]]|
|''CoreVersion:''|2.1.0|
***/

//{{{
// Ensure that the LegacyStrikeThrough Plugin is only installed once.
if(!version.extensions.LegacyStrikeThroughPlugin) {
version.extensions.LegacyStrikeThroughPlugin = {installed:true};

config.formatters.push(
{
	name: "legacyStrikeByChar",
	match: "==",
	termRegExp: /(==)/mg,
	element: "strike",
	handler: config.formatterHelpers.createElementAndWikify
});

} //# end of "install only once"
//}}}
!!!
[[BACK to Working Code|CodeWorks]]
!!!
!Load Codon Table for Translation:
There are 19 codon translation tables that [[GenBank|http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi?mode=c#SG1]] currently has on file. The two sets that you will most likely need are included with this subroutine in the {{{ DATA "__END__"}}} section. Edit the user variable $codontable to specify which one you want to load: use "Standard" for eukaryote (nuclear) genomes and "Bacteria" for prokaryote genomes.

{{{
    # Declare this global HASH array:
          my %CodonTable;

    # Declare this user variable:
    #       where tablename is the first word on the header line
    #       of the codon table you want to use.
    #       Use "Standard" for eukaryotes, "Bacteria" for prokaryotes

         my $codontable = "---tablename--";  

    # Call the subroutine with the variable $codontable:
	&LoadCodons($codontable);

# ---------------------------------------------------------
sub LoadCodons
{
	$/=">";
	my $Table = shift(@_);
	my @TABLE = <DATA>;
	foreach my $j (@TABLE)
	{	if ($j =~ m/^ (\d){1,2} $Table/)
		{	my @k = split(/\n/,$j);
			$k[1] =~ s/Amino  //;
			foreach my $i (1..3)
			{	$k[$i+1] =~ s/Base$i  //; }
			my @AA = split(//,$k[1]);
			my @B1 = split(//,$k[2]);
			my @B2 = split(//,$k[3]);
			my @B3 = split(//,$k[4]);
			foreach my $i (0..63)
			{	$CodonTable{$B1[$i].$B2[$i].$B3[$i]} = $AA[$i]; }
		}
	}
	# foreach my $nnn (keys %CodonTable)
	# {  print "$nnn = $CodonTable{$nnn}\n";}
	$/="\n";   # reset back to default before leaving subroutine
}
# ---------------------------------------------------------
# - - - - - EOF - - - - - - - - - - - - - - - - - - -
# The lines below are not perl statements and are not executed as part of the 
# program.  Instead, they are available to be read as data input by the program
# using the I/O handle name "DATA". This is a default handle name for any data 
# you want to include in a script file.
__END__
> 0 Codon Translation Tables
http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi?mode=c#SG1
> 1 Standard
Amino  FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
Base1  TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
Base2  TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
Base3  TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
> 11 Bacteria and Archaea
Amino  FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
Base1  TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
Base2  TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
Base3  TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG

}}}
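Once the hash is loaded, translation is just a lookup of three nucleotides at a time. The following is a minimal, self-contained sketch: it rebuilds the Standard table from the same four rows shown in the DATA section (typed in as strings so it runs on its own), and the sub name {{{Translate}}}, the "X" placeholder for unknown codons, and the test sequence are assumptions made for illustration.
{{{
#!/usr/bin/perl
use strict;
use warnings;

# Standard table, same four rows as in the DATA section
my $Amino = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";
my $Base1 = "TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG";
my $Base2 = "TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG";
my $Base3 = "TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG";

my %CodonTable;
foreach my $i (0..63)
{	my $codon = substr($Base1,$i,1).substr($Base2,$i,1).substr($Base3,$i,1);
	$CodonTable{$codon} = substr($Amino,$i,1);
}

# Translate is a hypothetical helper; codons not in the table (e.g. with N) become "X"
sub Translate
{	my ($nt) = @_;
	my $protein = "";
	for (my $i = 0; $i + 3 <= length($nt); $i += 3)
	{	my $codon = uc(substr($nt,$i,3));
		$protein .= exists $CodonTable{$codon} ? $CodonTable{$codon} : "X";
	}
	return $protein;
}

print &Translate("ATGGCCAAATAG"), "\n";   # made-up test sequence, prints MAK*
}}}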
[[Back to Lecture 9|L09]]
!!!
!LocalBLAST
Here's a full implementation of the ~Smith-Waterman algorithm for generating local alignment similarity scores.
{{{
#!/usr/bin/perl
use strict;
$|=1;      # forces print output to be sent to screen in real-time

# - - - - - H E A D E R - - - - - - - - - - - - - - - - -
# Local BLAST alignment scoring with Smith-Waterman algorithm
# See Chapter 7 in Dwyer, "Genomic Perl"

# - - - - - U S E R    V A R I A B L E S - - - - - - - -
my $infile = "Arabidopsis-midterm-NT-fasta-08.ffn";  
my $TargetSeq = "LVSKIIELRPSIVSSRN";
my $codontable = "Standard";

# - - - - - G L O B A L  V A R I A B L E S  - - - - - -
my @FILE;        # input array to hold file contents
my %NTs;         # Hash-Array to hold each orf name & sequence
my %CodonTable;  # Hash-Array for codons
my %PRTs;        # Hash-array to hold protein sequences
my @M;           # alignment matrix; filled by similarity scores
my $gap = -0.5;  # gap penalty

# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# - - - - - M A I N - - - - - - - - - - - - - - - - - - - -
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
print "\n\nI don't know Toto, but I don't think we are in Kansas anymore:\n\n\n";

# 1. Process FASTA and make protein sequences . . . . 
	&ReadFasta($infile);
	&LoadCodons($codontable);
	&TranslateFasta;

# 2. Iteratively execute the similarity scoring subroutine: 
	my $max = -99999;
	my $alignment;
	my $seq1 = $TargetSeq;
	foreach my $protein_name (keys %PRTs)
	{	print "\n********************************************\n";
		my $seq2 = $PRTs{$protein_name};
		my $max = -9999;
		foreach my $i (1..length($seq1))
		{	foreach my $j (0..length($seq1)-$i)
			{	foreach my $k (0..length($seq2)-$i)
				{	
					#-------------------------------------------
					my $score = &SWblast(substr($seq1,$j,$i),substr($seq2,$k,$i));
					#-------------------------------------------
					if ($score > $max)
					{	$max = $score;
						print " * ";
						$alignment = "";
						foreach my $x (&Alignment(substr($seq1,$j,$i),substr($seq2,$k,$i))) 
						{	$alignment .= "                 ".$x."\n";	}
					}
					#-------------------------------------------
				} # end foreach $k
				print " . ";
			} # end foreach $j
		} # end foreach $i
		
		print "\n********************************************\n";
		print " PROTEIN: $protein_name\n\nMax Score= $max\n";
		print $alignment;

	} # end foreach $protein_name

print "\n\n    * * * D O N E * * *\n\n";
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# - - - - - S U B R O U T I N E S - - - - - - - - - - -
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# - - - - - - - - - - - - - - - - - - - - - - - - - -
sub ReadFasta
{   my $file = $_[0];
	$/=">";
	open(FASTA,"<$file") or die "\n\n\n Nada $file\n\n\n";
	@FILE=<FASTA>;
	close(FASTA);
	shift(@FILE); 
	foreach my $orf (@FILE)
	{	my @Lines = split(/\n/,$orf);
		my $name = $Lines[0];
		my $seq = "";
		foreach my $i (1..$#Lines)
		{	$seq .= $Lines[$i]; }
		$seq =~ s/>//;
		$NTs{$name} = $seq;
	}
	$/="\n"; # reset input break character
}
# - - - - - - - - - - - - - - - - - - - - - - - - - -
sub TranslateFasta
{	# A.  The codon table (%CodonTable) was already filled by &LoadCodons, which
	#     reads the entire DATA section, so nothing is left to load from <DATA> here.
	
	# B. Convert the NT sequence into AAs . . . . . .
	foreach my $header (keys %NTs)
	{	my $protein = "";            # set to "empty" at the start of each loop
		for (my $i=0; $i <= length($NTs{$header})-2; $i += 3)  # another FOR-loop structure
		{	my $codon = substr($NTs{$header},$i,3);             # $codon = 3 nts at a time
			my $aa = $CodonTable{$codon};       # here's the translation step
			$protein .= $aa;
		}
		$PRTs{$header} = $protein;
	}
}
# - - - - - - - - - - - - - - - - - - - - - - - - - -
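sub SWblast 
{	# Smith-Waterman similarity for LOCAL alignment
    my($s,$t) = @_;  # sequences to be aligned.
	my $SWbest = 0;  # a local alignment score is never below zero
    foreach my $i (0..length($s)) { $M[$i][0] = 0; }  # reset the borders of the shared matrix
    foreach my $j (0..length($t)) { $M[0][$j] = 0; }
    foreach my $i (1..length($s)) 
	{	foreach my $j (1..length($t)) 
		{	my $p =  &ID(substr($s,$i-1,1),substr($t,$j-1,1));
			# the leading 0 is what makes this a LOCAL alignment: a cell never goes negative
			$M[$i][$j] = &MAX(0, $M[$i-1][$j] + $gap, $M[$i][$j-1] + $gap, $M[$i-1][$j-1] + $p);
			$SWbest = &MAX($SWbest,$M[$i][$j]);
		}
    }
    return ($SWbest);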
}
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
sub ID 
{  # call &ID(char1,char2)
    my ($aa1, $aa2) = @_;
    return ($aa1 eq $aa2)?1:-1;
}
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
sub MAX
{	# find max value
	# call &MAX(default value, other values . . . )
	my ($m,@l) = @_;
    foreach my $x (@l) { $m = $x if ($x > $m); }
    return $m;
}
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
sub Alignment
{	# call &Alignment(seq1,seq2)
    my ($s,$t) = @_;  ## sequences to be aligned.
    my ($i,$j) = (length($s), length($t));
    return ( "-"x$j, $t) if ($i==0);
    return ( $s, "-"x$i) if ($j==0);
    my ($sLast,$tLast) = (substr($s,-1),substr($t,-1));
    
    if ($M[$i][$j] == $M[$i-1][$j-1] + &ID($sLast,$tLast)) 
	{ ## Case 1: last letters are paired in the best alignment
		my ($sa, $ta) = &Alignment(substr($s,0,-1), substr($t,0,-1));
		return ($sa . $sLast , $ta . $tLast );
    } 
	elsif ($M[$i][$j] == $M[$i-1][$j] + $gap) 
	{ ## Case 2: last letter of the first string is paired with a gap
		my ($sa, $ta) = &Alignment(substr($s,0,-1), $t);
		return ($sa . $sLast , $ta . "-");
    } 
	else 
	{ ## Case 3: last letter of the 2nd string is paired with a gap
		my ($sa, $ta) = &Alignment($s, substr($t,0,-1));
		return ($sa . "-" , $ta . $tLast );
    }
}
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# ---------------------------------------------------------
sub LoadCodons
{
	$/=">";
	my $Table = shift(@_);
	my @TABLE = <DATA>;
	foreach my $j (@TABLE)
	{	if ($j =~ m/^ (\d){1,2} $Table/)
		{	my @k = split(/\n/,$j);
			$k[1] =~ s/Amino  //;
			foreach my $i (1..3)
			{	$k[$i+1] =~ s/Base$i  //; }
			my @AA = split(//,$k[1]);
			my @B1 = split(//,$k[2]);
			my @B2 = split(//,$k[3]);
			my @B3 = split(//,$k[4]);
			foreach my $i (0..63)
			{	$CodonTable{$B1[$i].$B2[$i].$B3[$i]} = $AA[$i]; }
		}
	}
	# foreach my $nnn (keys %CodonTable)
	# {  print "$nnn = $CodonTable{$nnn}\n";}
	$/="\n";   # reset back to default before leaving subroutine
}
# ---------------------------------------------------------
# - - - - - EOF - - - - - - - - - - - - - - - - - - -
# The lines below are not perl statements and are not executed as part of the 
# program.  Instead, they are available to be read as data input by the program
# using the I/O handle name "DATA". This is a default handle name for any data 
# you want to include in a script file.
__END__
> 0 Codon Translation Tables
http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi?mode=c#SG1
> 1 Standard
Amino  FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
Base1  TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
Base2  TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
Base3  TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
> 11 Bacteria and Archaea
Amino  FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
Base1  TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
Base2  TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
Base3  TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG


}}}
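The core of the ~Smith-Waterman scoring step can be tried out on its own, without the FASTA and codon-table machinery. The sketch below is a stripped-down variant under stated assumptions: {{{SWscore}}} is a hypothetical sub that keeps its matrix local instead of using the global @M, it uses the same +1/-1 identity scoring as {{{&ID}}}, it takes {{{max}}} from the core List::Util module, and the second test sequence is made up.
{{{
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(max);

# SWscore is a hypothetical, self-contained variant: match = +1, mismatch = -1
sub SWscore
{	my ($s, $t, $gap) = @_;
	my @M;                                    # local matrix, rebuilt on every call
	$M[$_][0] = 0 foreach (0..length($s));    # first column = 0
	$M[0][$_] = 0 foreach (0..length($t));    # first row = 0
	my $best = 0;
	foreach my $i (1..length($s))
	{	foreach my $j (1..length($t))
		{	my $p = (substr($s,$i-1,1) eq substr($t,$j-1,1)) ? 1 : -1;
			$M[$i][$j] = max(0,                       # never drop below zero
			                 $M[$i-1][$j]   + $gap,
			                 $M[$i][$j-1]   + $gap,
			                 $M[$i-1][$j-1] + $p);
			$best = max($best, $M[$i][$j]);
		}
	}
	return $best;                             # highest cell anywhere in the matrix
}

print &SWscore("LVSKIIELRPSIVSSRN", "MMMKIIELRPAAA", -0.5), "\n";
}}}
Because the best cell can sit anywhere in the matrix, one call scores the best local match between the two full sequences, so the triple loop over every substring is not needed just to get the score.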

! 
{{menubox2{A Biologist's}}}{{menubox3{Intro Perl}}}
<<search>>
!OUTLINE:
''FrontPage''
[[1. What's New?|What's New?]]
[[2. Syllabus|Syllabus]]
[[3. Lectures|Lecture Index]]
[[4. Resources|Resource Index]]
[[5. Working Code|CodeWorks]]
[[6. Home Work|HomeWork]]
!!
{{tuduSlider{<<slider chkToolbox Toolbox 'Toolbox »'>>}}}
!AGM Tools:
[[Setup|Welcome to the Webview TiddlyWiki]]
[[SubTopics|ConfigSubTopics]]
[[Insert Figure|InFig]]
[[Emphasis Text|TextMonaco]]
!!























<!--{{{-->
<link rel='alternate' type='application/rss+xml' title='RSS' href='index.xml'/>
<!--}}}-->

<style type="text/css">#contentWrapper {display:none;}</style>
<div id="SplashScreen" style="border: 3px solid #ccc; display: block; text-align: center; width: 320px; margin: 100px auto; padding: 50px; color:#000; font-size: 28px; font-family:Tahoma; background-color:#eee;">
<b>MAST667-011</b> Intro PERL for biologists <br><br>
<span style="font-size: 14px; color:red;">
<blink><i>l o a d i n g  . . .</i></blink><br><br>
</span>
</div>
!Emily Maung's Comments for Code 01
If you have any questions about what is happening in the code lines for assignment #1, here is a script version that is HEAVILY commented and provides a lot of detailed explanation.
{{{
#!/usr/bin/perl
use strict;

# - - - - - H E A D E R - - - - - - - - - - - - - - - - -
### EMaung-2008
### Goals for this program:
###		I. Count 
###			A. total amino acids in each protein
###			B. individual amino acids in each protein				
###				- Amino Acid frequency composition = B/A
###		II. Calculate 
###			A. Frequency values for individual amino acids among proteins
###			B. Frequency values for individual amino acids within proteins 
###To achieve these goals we must: 1. read in the FASTA file (sub ReadFasta), 2. convert from NT->AA->protein, 3. count AAs (in individual proteins and across all proteins), then 4. calculate frequencies

# - - - - - U S E R    V A R I A B L E S - - - - - - 
my $infile = "cyanobacteria.ffn";  # user edited input fasta file - CYANOBACTERIA GENOME FILE

# - - - - - G L O B A L  V A R I A B L E S  - - - - -
my @FILE;        # input array to hold file contents
my %NTs;         # Hash-Array to hold each orf name & sequence
my %CodonTable;  # Hash-Array for codons
my %PRTs;        # Hash-array to hold protein sequences


my %AAprotcount; 	### hash	
my %AAgenomecount;	### hash
my @AA = qw |A C D E F G H I K L M N P Q R S T V W Y X |;	###  nice way to fill in array without having to use quotation marks around each array member

# - - - - - - - - - - - - - - - - - - - - - - - - - -
# - - - - - M A I N - - - - - - - - - - - - - - - - -
# - - - - - - - - - - - - - - - - - - - - - - - - - -
print "\nOne way to get the most out of life (and programming) is to look upon it as an adventure\n\n"; ###made it this far

print "\nAdventure 1 begins: Must read in FASTA file\n\n";
	&ReadFasta($infile);	### This subroutine will read in the cyanobacteria FASTA file  (defined above as cyanobacteria.ffn)

print "\nRead-in was a success! on to Adventure 2: Translation from codons->AA\n\n";
	&TranslateFasta;	### This subroutine: (1) loads codon->AA table located after EOF, (2) Clumps 3 nucleotides  together into a codon from file we read-in, translates them into Amino Acids (3) strings the amino acids into proteins

print "\nFile successfully translated! Adventure 3: Calculate frequencies\n\n";
	&AAfreq; ### This subroutine calculates frequencies of amino acids

print "\nSuccessfully calculated frequencies! Adventure 4 at last: Averaging frequencies\n\n";
	&Average;

print "\nOur Programming adventure is complete! yay!!!\n\n";
	

# - - - - - - - - - - - - - - - - - - - - - - - - - -
# - - - - - S U B R O U T I N E S - - - - - - - - - -
# - - - - - - - - - - - - - - - - - - - - - - - - - -
sub ReadFasta
{   my $file = $_[0]; 
	$/=">";  ### input divided up by the ">" symbol (each '>' symbol denotes a different protein name)
	open(FASTA,"<$file") or die "\n\n\nFile did not open\n\n\n"; ###opens file into place called FASTA; tells user if unable to do so
	@FILE=<FASTA>;  ### contents of "FASTA" dumped into array @FILE
	close(FASTA); ### close the FASTA
	shift(@FILE); ### removes the first element of @FILE 
	foreach my $orf (@FILE)		### does the following loop for each element of array @FILE 
	{	my @Lines = split(/\n/,$orf);	### split up $orf into the elements of array @Lines according to the presence of new line characters (\n)
		my $name = $Lines[0];	### $name is set equal to first member in array @Lines (because referring to specific element and not whole array, we use $ and not @ to refer to it)
		my $seq = "";	### set $seq equal to the empty string (no characters at all)
		foreach my $i (1..$#Lines) ### start loop at 1 and continue loop until we reach terminal member of Lines
		{	$seq .= $Lines[$i]; } ### $seq is all $Lines glued together
		$seq =~ s/>//;
		$NTs{$name} = $seq; ### creates long string of nucleotides for each protein ($NTs{$name}; where '$name' = name of the protein)
	}
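	###$/="\n"; 		#### Left commented out on purpose: sub TranslateFasta reads <DATA> in scalar context, which returns ONE record. With $/=">" still set, that one record is the whole codon table (it contains no ">"). After resetting $/ to "\n", only the first line ("A GCU GCC GCA GCG") would be read, so alanine (A) would be the only amino acid in the table.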
}
# - - - - - - - - - - - - - - - - - - - - - - - - - -
sub TranslateFasta
{	# A.  Load the AA codon table from end of program.
	my @data = split(/\n/,<DATA>);	### reads in <DATA> (i.e. the codon table) as one record, which works because $/ is still ">" from sub ReadFasta; then splits the codon table up line by line (on the "newline" character)
	foreach my $line (@data)     ###  each member of the array @data is actually a line from the codon table (thus, we call it "$line" and we do loop for every line of codon table)     
	{  	my @codons = split(/ /,$line); 	### breaks up $line into pieces every time compiler finds "space" character,;each piece called "@codons"
		my $AA = shift(@codons);       ### $AA= amino acid = first member from @codons is set equal to $AA ($AA=the first single letter, represents AA name); leftover = the multiple codons which code for that amino acid
		foreach my $nnn (@codons)  ###  does loop for each codon set (i.e. leftovers)
		{	$nnn =~ s/U/T/g;	### convert @codons read in from table from RNA->DNA
			$CodonTable{$nnn} = $AA; ### creates hash table (so that computer recognises that multiple codons ($CodonTable{GCU/GCC/...}) are all equal a specific amino acid name ($AA, i.e. A)
			### print ">>> $nnn = $CodonTable{$nnn}\n";     ### prints out ">>> codon1 = amino acid name (newline)" does so for each pass of loop (therefore each line of codon table)
		}
    }
	
	# B. Convert the NT sequence into AAs . . . . . .
	foreach my $header (keys %NTs)
	{	my $protein = "";      ### set $protein to "empty" at the start of each loop
		for (my $i=0; $i <= length($NTs{$header})-2; $i += 3)  # another FOR-loop structure
		{	my $codon = substr($NTs{$header},$i,3);   ### $codon = 3 nucleotides ($NTs) at a time; these nucleotides came from the FASTA file we read-in
			my $aa = $CodonTable{$codon};       ### here's the translation step; for each pass of loop, equates $aa to the 3 nucleotides we just glued together in previous step ($codon )
			$protein .= $aa;	### with each pass of the loop tacks current value of $aa to the end of $protein (i.e. $protein grows in length by adding amino acids; $protein = string of amino acids)
		}
		$PRTs{$header} = $protein;	### with each pass of the loop, one member of the hash %PRTs is set equal to the protein we just made ($protein); at the end of the subroutine we have a hash (%PRTs) filled with these strings of glued-together amino acids
	}
}

# - - - - - - - - - - - - - - - - - - - - - - - - - -
sub AAfreq
{   
	foreach my $name (keys %PRTs)	### do loop for all proteins in the hash $PRTs; with each pass of loop, set $name=member of $PRTs (i.e. $name=specific protein from $PRTs hash)
	{  ### Initialize counts (for $AAcount and for all members of the 2-D array $AAprotcount) to zero
	     my $AAcount = 0; 		### set our counter for number of amino acids ($AAcount) to zero
		foreach my $a (@AA)		### for each pass of this loop, will set  $a equal to an element of the list/array @AA, all of which are the abbrev. names of amino acids (e.g. A, D, V, etc.)  
		{	$AAprotcount{$name}{$a} = 0; }		### sets member of the hash table ($AAprotcount) to zero; member = $AAprotcount{name of protein}{abbrev. name of amino acid}; thus hash table members specify an amino acid within a particular protein
		
		# split the protein sequence into an array of AA characters
		my @aminoacids = split(//,$PRTs{$name}); ###  the empty pattern // splits the protein string into individual characters; each element of the array @aminoacids is one amino acid from the glued-together string we previously called $protein
		foreach my $aa (@aminoacids) 		###  set member from array @aminoacids (i.e. individual amino acids)  equal to $aa with each pass of loop 
		{	# make sure we are only counting AAs
			if ($aa =~ m/[ACDEFGHIKLMNPQRSTVWYX]/)     ### do the following lines (all of which deal with incremental counters)  iff we are working with amino acids (i.e. A,C,D,E...Y, X)
			{	$AAprotcount{$name}{$aa} += 1;   ### 2-D array to keep track of abundance of a specific amino acid ($aa) within a particular protein ($name)
				$AAgenomecount{$aa} += 1; 	### keeps track of abundance of a specific amino acid ($aa) in the entire genome
				$AAcount += 1;		### keeps track of abundance of ALL amino acids in this specific protein ($name)
			}
		
		}
		
		foreach my $aa1 (@AA)	### sets an element from amino acid list (@AA) equal to $aa1 for each pass of loop; does loop for all members of @AA
		{	$AAprotcount{$name}{$aa1} = &Round($AAprotcount{$name}{$aa1}/$AAcount); 
		###  the line above: amino acid frequency for a specific amino acid within a specific protein is calculated and passed to the subroutine &round, the result is stored in the 2-D array $AAprotcount 
		###  print "\n\n\tFreq. composition of the amino acid $aa1 in the protein $name \n\tis $AAprotcount{$name}{$aa1}\n\n"; ### check for me to view the frequencies
				
			
		}
	}
}
# - - - - - - - - - - - - - - - - - - - - - - - - - -
sub Round
{ 	my $x = $_[0]; 	### take the value passed to it from sub AAfreq (a single element of @_ is written $_[0])
	$x = (int(($x*10**4) + 0.5)/10**4);	### round to 4 decimal places: shift the decimal point 4 places to the right, add 0.5 and take the integer part, then shift the decimal point back to where it was
	return $x; 	### pass the newly rounded result back to where it came from (sub AAfreq)
}

sub Average
{
	my $N = 0;		### initialize $N to zero
	my %aasum; 		### declare %aasum
	foreach my $id (keys %AAprotcount)		### does a loop for each element of whole hash (AAprotcount); 
	{	# Need to count total proteins
		$N += 1; 		### $N is an incremental counter; keeps track of the number of proteins in entire genome
		# Need to sum the freq from each protein 
		foreach my $a (@AA) 	### do loop for every element of array @AA (array of amino acid abbrev. names, e.g. A, C, D...W,X) and refer to it as $a
		{	$aasum{$a} += $AAprotcount{$id}{$a}; } 		### hash $aasum{$a} is a sum of all frequencies of a particular amino acid {$a} from all proteins {$id}
	}

	my $TOTAL = 0;		### set $TOTAL equal to zero
	my %aamean;			### define hash %aamean (we use % instead of $ to indicate we mean the whole hash)
	foreach my $a (@AA) 	### loop occurs for each element of @AA (value will be assign to $a)
	{	# calculate each mean freq . . . . 
		$aamean{$a} = &Round($aasum{$a}/$N); 	### calculates  average frequency of an amino acid across all proteins; passes value to subroutine &Round to give us a number with less decimal places
		# we also need total amino acids . . . .  
		$TOTAL += $AAgenomecount{$a};	### $TOTAL will ultimately be the number of amino acids for whole genome; each pass of the loop adds the total of each amino acid across all proteins
	}

	foreach my $a (@AA)		### loop occurs for each element of @AA (value will be assign to $a)
	{	# calculate the AA fraction using genome counts
		my $genfreq = &Round($AAgenomecount{$a}/$TOTAL); 	### calculates the frequency of a specific amino acid across the entire genome; makes this value look prettier by passing it to the &round subroutine
		# compare on screen the mean protein freqs against the genome freqs (i.e. make a pretty table with what we've found)
		print "$a:  Protein calc > $aamean{$a}  ==  $genfreq <= Genome calc\n";
	}
}



# - - - - - EOF - - - - - - - - - - - - - - - - - - -
# The lines below are not perl statements and are not executed as part of the 
# program.  Instead, they are available to be read as data input by the program
# using the I/O handle name "DATA". This is a default handle name for any data 
# you want to include in a script file.
__END__
A GCU GCC GCA GCG
R CGU CGC CGA CGG AGA AGG
N AAU AAC
D GAU GAC 
C UGU UGC
Q CAA CAG
E GAA GAG
G GGU GGC GGA GGG
H CAU CAC
I AUU AUC AUA
L UUA UUG CUU CUC CUA CUG
K AAA AAG
M AUG
F UUU UUC
P CCU CCC CCA CCG
S UCU UCC UCA UCG AGU AGC
T ACU ACC ACA ACG
W UGG
Y UAU UAC
V GUU GUC GUA GUG
* UAA UAG UGA
}}}
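The behaviour of the input record separator {{{$/}}} decides how much a single read in scalar context returns, which matters for the {{{split(/\n/,<DATA>)}}} line in this script. The sketch below is only an illustration under stated assumptions: it reads from an in-memory copy of three codon-table lines rather than from DATA, and {{{ReadOne}}} is a hypothetical helper.
{{{
#!/usr/bin/perl
use strict;
use warnings;

my $table = "A GCU GCC GCA GCG\nR CGU CGC CGA CGG AGA AGG\nN AAU AAC\n";

# ReadOne is a hypothetical helper: it reads ONE record from an in-memory
# copy of the table, using whatever record separator it is given.
sub ReadOne
{	my ($separator) = @_;
	local $/ = $separator;             # "local" restores the old value on exit
	open(my $fh, "<", \$table) or die "Cannot open in-memory file: $!";
	my $record = <$fh>;                # scalar context = one record only
	close($fh);
	return $record;
}

my @byline  = split(/\n/, &ReadOne("\n"));    # default separator
my @bychunk = split(/\n/, &ReadOne(">"));     # separator never found in the table
my @slurped = split(/\n/, &ReadOne(undef));   # undef = read the whole file

print "newline: ", scalar(@byline),  " line(s)\n";
print "'>'    : ", scalar(@bychunk), " line(s)\n";
print "undef  : ", scalar(@slurped), " line(s)\n";
}}}
Using {{{local $/}}} inside a sub is one way to change the separator for a single read without affecting any other part of the program.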
<!--{{{-->
<div class='toolbar' macro='toolbar closeTiddler closeOthers +editTiddler > fields syncing permalink references jump'></div>
<div class='MicroGen' macro='tiddler MicroGenSubtopicMenu'></div><div class='title' macro='view title'></div>
<div class='viewer' macro='view text wikified'></div><div class='tagClear'></div>
<!--}}}-->
16 OCT: A possible code script for the midterm assignment is posted here: @@MidTermCode@@
!!!
!Alignment Scoring:
The midterm exam will consist of a simple implementation of the ~Needleman-Wunsch algorithm to find the best alignment for a target sequence in a FASTA file of known Arabidopsis proteins.

''1.'' Download the test FASTA file, which is a small subset of Arabidopsis proteins (8 MB file): [[CLICK HERE|07-midterm/Arabidopsis-midterm-NT-fasta.ffn]]

''2.'' Compose a PERL script, using any and all subroutines available to you, that will:
* Read the NT fasta file
* Translate the nt codons into protein sequences
* Then hunt through each protein and run an alignment against this target peptide sequence: @@{{{LVSKIIELRPSIVSSRN}}}@@ 

''3.'' When you are done, name your script "lastname-midterm.pl" and email it to me along with your answer to the question: __//Which protein gives the best alignment?//__ (and please include the alignment you get for that protein in your message)

!!!HINTS:
@@''a.''@@ For the alignments you can either use the subroutines that Dwyer provides in chapter 3 (which are available on this page: [[NWsubs]]) or the variants I presented in class last week ([[L06]]).

@@''b.''@@ You will need to structure an iterative loop:
{{{
	foreach my $protein_name (sort keys %PRTs)
	{	my $protein_sequence = $PRTs{$protein_name};
	 	. . . code to run alignment . . .              
	 	. . . code to print or save results . . . 
	}
}}}

@@''c.''@@  You might want to filter the results on the alignment score value. A simple ''IF'' statement like: 
{{{
              if ($scorevariable > 12)
              {      
                       . .  . now print or save result . . . ; 
              }

}}}
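To answer the question of which protein gives the best alignment, the loop also needs to remember the highest score seen so far. Here is a minimal sketch of that bookkeeping, combined with the score filter; the protein names and scores in {{{%Score}}} are made up and simply stand in for whatever your alignment subroutine returns.
{{{
#!/usr/bin/perl
use strict;
use warnings;

# Made-up scores standing in for the alignment result of each protein
my %Score = ("AT1G00001" => 9, "AT1G00002" => 14, "AT1G00003" => 11);   # hypothetical names

my $cutoff   = 12;        # same kind of filter as the IF statement above
my $bestname = "";
my $bestscore;            # undefined until the first protein has been scored
foreach my $protein_name (sort keys %Score)
{	my $score = $Score{$protein_name};
	print "$protein_name passes the filter with score $score\n" if ($score > $cutoff);
	if (!defined($bestscore) or $score > $bestscore)
	{	$bestscore = $score;
		$bestname  = $protein_name;
	}
}
print "Best alignment: $bestname (score = $bestscore)\n";
}}}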

!
[[Back to Lecture 8|L08.02]]
!!!
!Really SLOW and Simple ~Smith-Waterman BLAST routine
{{{
#!/usr/bin/perl
use strict;
$|=1;      # forces print output to be sent to screen in real-time

# - - - - - H E A D E R - - - - - - - - - - - - - - - - -
# MidTerm Exam, PERL Bioinformatics, 15 OCT 2008
#     OBJECTIVE: simple implementation of the Needleman-Wunsch algorithm 
#     to find the best alignment for a target sequence in a FASTA file of 
#     known Arabidopsis proteins.

# - - - - - U S E R    V A R I A B L E S - - - - - - - -
my $infile = "Arabidopsis-midterm-NT-fasta.ffn";  
my $TargetSeq = "LVSKIIELRPSIVSSRN";
my $codontable = "Standard";

# - - - - - G L O B A L  V A R I A B L E S  - - - - - -
my @FILE;        # input array to hold file contents
my %NTs;         # Hash-Array to hold each orf name & sequence
my %CodonTable;  # Hash-Array for codons
my %PRTs;        # Hash-array to hold protein sequences
my @M;           # alignment matrix; filled by similarity scores
my $g = 0;    # gap penalty

# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# - - - - - M A I N - - - - - - - - - - - - - - - - - - - -
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
print "\n\nI don't know Toto, but I don't think we are in Kansas anymore:\n";

# 1. Process FASTA and make protein sequences . . . . 
	&ReadFasta($infile);
	&LoadCodons($codontable);
	&TranslateFasta;

# 2. Iteratively execute the similarity scoring subroutine: 
	my $max = -99999;
	my $seq1 = $TargetSeq;
	foreach my $protein_name (keys %PRTs)
	{	
		my $seq2 = $PRTs{$protein_name};
		
		my $max = -9999;
		foreach my $i (1..length($seq1))
		{	foreach my $j (0..length($seq1)-$i)
			{	foreach my $k (0..length($seq2)-$i)
				{	
#-------------------------------------------
my $score = &Similarity(substr($seq1,$j,$i),substr($seq2,$k,$i));
if ($score > $max)
{	$max = $score;
	print "\n\n---------------------------------\n";
	print "      Alignment: $protein_name\n      Score= $score\n";
	foreach my $x (&Alignment(substr($seq1,$j,$i),substr($seq2,$k,$i))) 
	{	print "                 ",$x,"\n"; }
}
#-------------------------------------------
				}# end foreach $k
				print " . ";
			}# end foreach $j
		}# end foreach $i
		
		print " * "; # just a "*" so you know the program is still running
	} # end foreach $protein_name

print "\n\n    * * * D O N E * * *\n\n";
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# - - - - - S U B R O U T I N E S - - - - - - - - - - -
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# - - - - - - - - - - - - - - - - - - - - - - - - - -
sub ReadFasta
{   my $file = $_[0];
	$/=">";
	open(FASTA,"<$file") or die "\n\n\n Nada $file\n\n\n";
	@FILE=<FASTA>;
	close(FASTA);
	shift(@FILE); 
	foreach my $orf (@FILE)
	{	my @Lines = split(/\n/,$orf);
		my $name = $Lines[0];
		my $seq = "";
		foreach my $i (1..$#Lines)
		{	$seq .= $Lines[$i]; }
		$seq =~ s/>//;
		$NTs{$name} = $seq;
	}
	$/="\n"; # reset input break character
}
# - - - - - - - - - - - - - - - - - - - - - - - - - -
sub TranslateFasta
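{	# A.  Legacy step: load an "AA codon codon ..." table from DATA.
	#     When &LoadCodons runs first (as in MAIN) it has already read DATA to the end,
	#     so this block finds nothing and %CodonTable stays as &LoadCodons filled it.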
	my @data = split(/\n/,<DATA>);
	foreach my $line (@data)          
	{  	my @codons = split(/ /,$line); # separate on "space" character
		my $AA = shift(@codons);       # $AA= amino acid, then remove from @codon
		foreach my $nnn (@codons) 
		{	$nnn =~ s/U/T/g;
			$CodonTable{$nnn} = $AA; 
			print ">>> $nnn = $CodonTable{$nnn}\n";
		}
    }
	
	# B. Convert the NT sequence into AAs . . . . . .
	foreach my $header (keys %NTs)
	{	my $protein = "";            # set to "empty" at the start of each loop
		for (my $i=0; $i <= length($NTs{$header})-3; $i += 3)  # another FOR-loop structure; stops before an incomplete trailing codon
		{	my $codon = substr($NTs{$header},$i,3);             # $codon = 3 nts at a time
			my $aa = $CodonTable{$codon};       # here's the translation step
			$protein .= $aa;
		}
		$PRTs{$header} = $protein;
	}
}
# - - - - - - - - - - - - - - - - - - - - - - - - - -
sub Similarity 
{	# call &Similarity($seq1, $seq2)
	# Determines score of best alignment of strings $seq1 and $seq2
	# Score values are stored in @M
	# Returns max alignment score
	# Calls subroutines &ID and &MAX
	#. . . . . . . . . . . . . . . . .
    my($s,$t) = @_;  # sequences to be aligned.
    foreach my $i (0..length($s)) { $M[$i][0] = $g * $i; }
    foreach my $j (0..length($t)) { $M[0][$j] = $g * $j; }
	
    foreach my $i (1..length($s)) 
	{	foreach my $j (1..length($t)) 
		{	my $p =  &ID(substr($s,$i-1,1),substr($t,$j-1,1));
			$M[$i][$j] = &MAX($M[$i-1][$j] + $g, $M[$i][$j-1] + $g,$M[$i-1][$j-1] + $p);
		}
    }
    return ( $M[length($s)][length($t)] );
}
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
sub ID 
{  # call &ID(char1,char2)
    my ($aa1, $aa2) = @_;
    return ($aa1 eq $aa2)?1:-1;
}
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
sub MAX
{	# find max value
	# call &MAX(default value, other values . . . )
	my ($m,@l) = @_;
    foreach my $x (@l) { $m = $x if ($x > $m); }
    return $m;
}
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
sub Alignment
{	# call &Alignment(seq1,seq2)
    my ($s,$t) = @_;  ## sequences to be aligned.
    my ($i,$j) = (length($s), length($t));
    return ( "-"x$j, $t) if ($i==0);
    return ( $s, "-"x$i) if ($j==0);
    my ($sLast,$tLast) = (substr($s,-1),substr($t,-1));
    
    if ($M[$i][$j] == $M[$i-1][$j-1] + &ID($sLast,$tLast)) 
	{ ## Case 1: last letters are paired in the best alignment
		my ($sa, $ta) = &Alignment(substr($s,0,-1), substr($t,0,-1));
		return ($sa . $sLast , $ta . $tLast );
    } 
	elsif ($M[$i][$j] == $M[$i-1][$j] + $g) 
	{ ## Case 2: last letter of the first string is paired with a gap
		my ($sa, $ta) = &Alignment(substr($s,0,-1), $t);
		return ($sa . $sLast , $ta . "-");
    } 
	else 
	{ ## Case 3: last letter of the 2nd string is paired with a gap
		my ($sa, $ta) = &Alignment($s, substr($t,0,-1));
		return ($sa . "-" , $ta . $tLast );
    }
}
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# ---------------------------------------------------------
sub LoadCodons
{
	$/=">";
	my $Table = shift(@_);
	my @TABLE = <DATA>;
	foreach my $j (@TABLE)
	{	if ($j =~ m/^ (\d){1,2} $Table/)
		{	my @k = split(/\n/,$j);
			$k[1] =~ s/Amino  //;
			foreach my $i (1..3)
			{	$k[$i+1] =~ s/Base$i  //; }
			my @AA = split(//,$k[1]);
			my @B1 = split(//,$k[2]);
			my @B2 = split(//,$k[3]);
			my @B3 = split(//,$k[4]);
			foreach my $i (0..63)
			{	$CodonTable{$B1[$i].$B2[$i].$B3[$i]} = $AA[$i]; }
		}
	}
	# foreach my $nnn (keys %CodonTable)
	# {  print "$nnn = $CodonTable{$nnn}\n";}
	$/="\n";   # reset back to default before leaving subroutine
}
# ---------------------------------------------------------
# - - - - - EOF - - - - - - - - - - - - - - - - - - -
# The lines below are not perl statements and are not executed as part of the 
# program.  Instead, they are available to be read as data input by the program
# using the I/O handle name "DATA". This is a default handle name for any data 
# you want to include in a script file.
__END__
> 0 Codon Translation Tables
http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi?mode=c#SG1
> 1 Standard
Amino  FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
Base1  TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
Base2  TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
Base3  TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
> 11 Bacteria and Archea
Amino  FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
Base1  TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
Base2  TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
Base3  TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG


}}}
!
[[BACK to Midterm exam|MidTerm]]
[[FULL working code example|NW-AlignBlast]]
!!!
!Sample Code for Exam:
There are a few ways in which you could approach the task of running a string search through a group of protein sequences. The most direct approach given the material I have presented in class is described below. In general, it consists of merging the FASTA reader/translator script (FASTAtranslate2) with the ~Needleman-Wunsch Alignments script (NeedlemanWunschAlign).

''1.'' You first need all the FASTA processing code using the subroutines:
*  [[&ReadFasta|ReadFasta]]
*  [[&LoadCodonTable|LoadCodonTable]]
*  [[&TranslateFasta|TransFasta]]
Once you take care of the housekeeping task of declaring the appropriate variables ($infile and $codontable), the first part of the script looks like this:
{{{
# 1. Process FASTA and make protein sequences . . . . 
	&ReadFasta($infile);
	&LoadCodons($codontable);
	&TranslateFasta;
}}}
After ''&~TranslateFasta'' executes, the protein sequences are stored in the ''%~PRTs'' hash array, using the "name" of each protein as the index key. So if the first protein name is ''Glucokinase'', the sequence for glucokinase can be accessed with {{{ $PRTs{"Glucokinase"} }}}.
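
As a quick illustration of that lookup, here is a minimal sketch. The two entries and their sequences are made-up placeholders, not data from the course FASTA file; in the real script the hash is filled by ''&~TranslateFasta''.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the hash that &TranslateFasta fills.
my %PRTs = (
    "Glucokinase" => "MLDDRARMEA",   # made-up sequence
    "Hexokinase"  => "MIAAQLLAYY",   # made-up sequence
);

# Direct access by protein name: the hash is %PRTs, so one element is $PRTs{...}.
print "Glucokinase: $PRTs{'Glucokinase'}\n";

# Looping over every protein; the keys are sorted to get a repeatable order.
foreach my $protein_name (sort keys %PRTs)
{   printf "%-12s %d aa\n", $protein_name, length($PRTs{$protein_name}); }
```

Note that {{{keys}}} returns the names in no particular order, which is why the sketch sorts them before printing.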

''2.''  Stepping up a notch in difficulty, the next step is to run the ''&Similarity'' and ''&Alignment'' subroutines from the ~Needleman-Wunsch alignment script. Note that if you tried to use the WordCompare script from class, there is no way the program could have finished within the next 365 CPU days; that is why the goal specifically asked for a NW implementation. So you should have copied all the subroutines from NeedlemanWunschAlign and pasted them into the subroutine block of your script.
* Define the target/query  sequence as a User Variable: 
**  {{{ my $TargetSeq = "LVSKIIELRPSIVSSRN"; }}}
* Declare the ''@M'' array and the gap penalty ''$g''
* Then use Hint B to structure a ''foreach'' loop to iteratively process each protein sequence separately. The basic logic block looks like this:
{{{
foreach my $protein_name (keys %PRTs)
{      my $seq1 = $TargetSeq;
       my $seq2 = $PRTs{$protein_name};

       #------------ CODE FROM NWALIGN --------------
       # 1. Run similarity score first . . . . . . 
       print "      Similarity score: ", &Similarity($seq1,$seq2), "\n";
	
       # 2. Find the alignment for that similarity score . . . . 
       print "      Alignment: \n";
       foreach my $x (&Alignment($seq1,$seq2)) 
       {	print "                 ",$x,"\n"; }
       #------------ CODE FROM NWALIGN --------------
}
}}}
Just doing this up to step #2 was what I thought everyone should be able to achieve given the code I have posted on the web site, the homework assignments, and the code I have discussed in lecture. There is nothing new or different up to this step.


''3.'' The hardest task was to think of the logic behind how to find the maximum alignment value and then output the protein name and sequence alignment. 
* ''Brute Force Approach:''
** Hint ''C'' suggested an ''IF'' conditional test on the alignment score. The ''&Similarity'' subroutine returns the best alignment score between $TargetSeq and the protein sequence as a numerical value, so you can test that value to decide whether an alignment is worth looking at.
** You could have put an ''IF'' statement around the #2. code block above like:
{{{
# 1. Run similarity score first . . . . . . 
my $score = &Similarity($seq1,$seq2);

if ($score > XXXX)
{      # 2. Find the alignment for that similarity score . . . . 
       print "      Alignment: \n";
       foreach my $x (&Alignment($seq1,$seq2)) 
       {	print "                 ",$x,"\n"; }
       #------------ CODE FROM NWALIGN --------------
}
}}}
Here you could have started with XXXX = -1000, and then raised the value after running the program and seeing how many alignments were printed to screen. With XXXX set to a value that just let 20 alignments be printed, you could have visually found which one had the highest score.

* ''Using a MAX function''
** A maximum search loop was presented to you in the @@WordCompare@@ scripts. The idea is straightforward: a value is stored in a variable called $max, and in every iteration you compare the current calculation to the value stored in $max. If the current value is at least as large as $max, it becomes the new $max:
{{{
# 1. Run similarity score first . . . . . . 
my $score = &Similarity($seq1,$seq2);
if ($score >= $max)
{     $max = $score;
       # 2. Find the alignment for that similarity score . . . . 
       print "      Alignment: \n";
       foreach my $x (&Alignment($seq1,$seq2)) 
       {	print "                 ",$x,"\n"; }
       #------------ CODE FROM NWALIGN --------------
}
}}}
This approach prints several alignments to the screen: every time a score equals or exceeds the current $max, you get screen output. ''//What is certain is that the last alignment printed to screen will be one of the BEST, i.e. it has the maximum score.//''

The final line of output should look something like this, depending upon your print statements:
{{{
Alignment: AT1G67855.1 | Symbols:  | unknown protein | chr1: 25445979-25446261 REVERSE
Score= -8
     -----L-----------V---SKI-IELRPS-IV-S----S--R-N
     MLTTDLTMFFTRDTETTVFITS-IGI--TPSDAVGSKRVVSRVRY*
}}}

In the working code example I provide (NW-AlignBlast), the print statements are a little fancier just for clearer output.
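
A variation on the MAX approach is to remember the best hit inside the loop and print it only once, after the loop has finished. The sketch below is not the NW-AlignBlast code: {{{toy_score}}} is a hypothetical stand-in for ''&Similarity'' (it just counts identical positions) and the three sequences are made up, so that the example runs on its own.

```perl
use strict;
use warnings;

# Hypothetical stand-in for &Similarity: counts identical positions.
sub toy_score
{   my ($s, $t) = @_;
    my $n = length($s) < length($t) ? length($s) : length($t);
    my $score = 0;
    foreach my $i (0..$n-1)
    {   $score++ if (substr($s,$i,1) eq substr($t,$i,1)); }
    return $score;
}

my $TargetSeq = "LVSKI";
my %PRTs = ( "protA" => "MVSKA", "protB" => "LVSKL", "protC" => "AAAAA" );  # made-up data

my ($max, $best_name) = (-99999, "");
foreach my $protein_name (sort keys %PRTs)
{   my $score = toy_score($TargetSeq, $PRTs{$protein_name});
    if ($score > $max)
    {   $max = $score;
        $best_name = $protein_name;    # remember the hit; print nothing yet
    }
}
print "Best hit: $best_name  Score= $max\n";
```

In the real script you would call ''&Similarity'' in place of {{{toy_score}}}, and then call ''&Similarity'' once more on the winning pair before ''&Alignment'', because ''&Alignment'' reads whatever scores were last written to @M.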


!
!BLAST Stats:
''The Statistics of Sequence Similarity Scores''
http://www.ncbi.nlm.nih.gov/BLAST/tutorial/Altschul-1.html
<html>
<!-- title with bullet --> 

   <h3><img src="GIFS/bluebullet.gif" width="16" height="14">Introduction</h3>

   <!-- end of title with bullet --> 

<SPAN CLASS=TEXTWIDE>

&nbsp;&nbsp;&nbsp;To assess whether a given alignment constitutes evidence for homology, it
helps to know how strong an alignment can be expected from chance alone.
In this context, "chance" can mean the comparison of (i) real but non-homologous sequences; (ii) real sequences that are shuffled to preserve
compositional properties <A HREF="#ref1">[1-3]</A>; or (iii) sequences that are generated
randomly based upon a DNA or protein sequence model. Analytic statistical
results invariably use the last of these definitions of chance, while
empirical results based on simulation and curve-fitting may use any of
the definitions.<BR>


 <h3><img src="GIFS/bluebullet.gif" width="16" height="14"><A name="head1">The statistics of global sequence comparison</A></h3>

&nbsp;&nbsp;&nbsp;Unfortunately, under even the simplest random models and scoring systems,
very little is known about the random distribution of optimal global
alignment scores <A HREF="#ref4">[4]</A>. Monte Carlo experiments can provide rough
distributional results for some specific scoring systems and sequence
compositions <A HREF="#ref5">[5]</A>, but these can not be generalized easily. Therefore,
one of the few methods available for assessing the statistical significance
of a particular global alignment is to generate many random sequence
pairs of the appropriate length and composition, and calculate the
optimal alignment score for each <A HREF="#ref1">[1,3]</A>. While it is then possible to
express the score of interest in terms of standard deviations from the
mean, it is a mistake to assume that the relevant distribution is normal
and convert this <I>Z</I>-value into a <I>P</I>-value; the tail behavior of global
alignment scores is unknown. The most one can say reliably is that if
100 random alignments have scores inferior to that of the alignment of interest,
the <I>P</I>-value in question is likely less than 0.01. One further pitfall
to avoid is exaggerating the significance of a result found among multiple
tests. When many alignments have been generated, e.g. in a database
search, the significance of the best must be discounted accordingly.
An alignment with <I>P</I>-value 0.0001 in the context of a single trial may
be assigned a <I>P</I>-value of only 0.1 if it was selected as the best among
1000 independent trials.<BR>
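
&nbsp;&nbsp;&nbsp;The discounting described above can be sketched in a few lines of Perl. This is only an illustration, assuming the trials are independent; the variable names are hypothetical and the numbers are the ones from the example in the text.<BR>

```perl
use strict;
use warnings;

my $p_single = 0.0001;   # P-value of the best alignment as a single trial
my $trials   = 1000;     # number of independent alignments searched

# Chance that at least one of the trials scores this well by chance alone.
my $p_best = 1 - (1 - $p_single) ** $trials;

# Simple upper bound (Bonferroni): multiply by the number of trials, cap at 1.
my $p_bound = $p_single * $trials;
$p_bound = 1 if ($p_bound > 1);

printf "P for best of %d trials: %.4f (upper bound %.4f)\n", $trials, $p_best, $p_bound;
```

The exact value for independent trials comes out slightly below the simple product, which is why the product is a safe figure to quote.<BR>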


<h3><img src="GIFS/bluebullet.gif" width="16" height="14"><A name="head2">The statistics of local sequence comparison</A></h3>

&nbsp;&nbsp;&nbsp;Fortunately statistics for the scores of local alignments, unlike those of
global alignments, are well understood. This is particularly true for local
alignments lacking gaps, which we will consider first. Such alignments were
precisely those sought by the original BLAST database search programs <A HREF="#ref6">[6]</A>.<BR>
 
&nbsp;&nbsp;&nbsp;A local alignment without gaps consists simply of a pair of equal length
segments, one from each of the two sequences being compared. A modification
of the Smith-Waterman <A HREF="#ref7">[7]</A> or Sellers <A HREF="#ref8">[8]</A> algorithms will find all segment
pairs whose scores can not be improved by extension or trimming. These are
called high-scoring segment pairs or HSPs.<BR>

&nbsp;&nbsp;&nbsp;To analyze how high a score is likely to arise by chance, a model of random
sequences is needed. For proteins, the simplest model chooses the amino acid
residues in a sequence independently, with specific background probabilities
for the various residues. Additionally, the expected score for aligning a
random pair of amino acids is required to be negative. Were this not the case,
long alignments would tend to have high scores independently of whether the
segments aligned were related, and the statistical theory would break down.<BR>

&nbsp;&nbsp;&nbsp;Just as the sum of a large number of independent identically distributed
(i.i.d.) random variables tends to a normal distribution, the maximum
of a large number of i.i.d. random variables tends to an extreme value
distribution <A HREF="#ref9">[9]</A>. (We will elide the many technical points required
to make this statement rigorous.) In studying optimal local sequence
alignments, we are essentially dealing with the latter case <A HREF="#ref10">[10,11]</A>.
In the limit of sufficiently large sequence lengths <I>m</I> and <I>n</I>, the
statistics of HSP scores are characterized by two parameters, <I>K</I> and
<I>lambda</I>. Most simply, the expected number of HSPs with score at least
<I>S</I> is given by the formula<BR>

<IMG SRC="GIFS/(1).gif"  WIDTH="460" HEIGHT="50" BORDER="0"><BR><BR><BR><BR>

We call this the <I>E</I>-value for the score <I>S</I>.<BR>
&nbsp;&nbsp;&nbsp;This formula makes eminently intuitive sense. Doubling the length of
either sequence should double the number of HSPs attaining a given score.
Also, for an HSP to attain the score <I>2x</I> it must attain the score <I>x</I> twice
in a row, so one expects <I>E</I> to decrease exponentially with score. The
parameters <I>K</I> and <I>lambda</I> can be thought of simply as natural scales for
the search space size and the scoring system respectively.<BR>
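
&nbsp;&nbsp;&nbsp;Equation (1) is shown above only as an image. Assuming it is the usual Karlin-Altschul form, E = K m n e^(-lambda S), the scaling behaviour just described can be checked with a short Perl sketch. The values of K and lambda below are placeholders chosen for illustration, not parameters of any particular scoring system.<BR>

```perl
use strict;
use warnings;

# E-value under the assumed form of equation (1): E = K * m * n * exp(-lambda * S)
sub e_value
{   my ($K, $lambda, $m, $n, $S) = @_;
    return $K * $m * $n * exp(-$lambda * $S);
}

my ($K, $lambda) = (0.13, 0.32);   # placeholder statistical parameters

my $E_base   = e_value($K, $lambda, 250, 1000, 60);
my $E_double = e_value($K, $lambda, 250, 2000, 60);   # one sequence twice as long
my $E_higher = e_value($K, $lambda, 250, 1000, 70);   # raw score raised by 10

printf "E = %.3g; doubled length: %.3g; score + 10: %.3g\n", $E_base, $E_double, $E_higher;
```

Doubling one length doubles E exactly, while adding 10 to the score multiplies E by e^(-10 lambda).<BR>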


<h3><img src="GIFS/bluebullet.gif" width="16" height="14"><A name="head3">Bit scores</A></h3>

&nbsp;&nbsp;&nbsp;Raw scores have little meaning without detailed knowledge of the scoring
system used, or more simply its statistical parameters <I>K</I> and <I>lambda</I>.
Unless the scoring system is understood, citing a raw score alone is like citing a distance without specifying
feet, meters, or light years.
By normalizing a raw score using the formula<BR>

 <IMG SRC="GIFS/(2).gif" ALIGN=BOTTOM WIDTH="460" HEIGHT="65" BORDER="0"><BR><BR><BR><BR>

one obtains a "bit score" <I>S'</I>, which has a standard set of units. The <I>E</I>-value
corresponding to a given bit score is simply<BR>

<IMG SRC="GIFS/(3).gif" ALIGN=BOTTOM WIDTH="460" HEIGHT="65" BORDER="0"><BR><BR><BR><BR>

Bit scores subsume the statistical essence of the scoring system employed,
so that to calculate significance one needs to know in addition only the
size of the search space.<BR>
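
&nbsp;&nbsp;&nbsp;Equations (2) and (3) are likewise shown only as images. Assuming the standard definitions, S' = (lambda S - ln K) / ln 2 and E = m n 2^(-S'), a small Perl sketch shows that the bit score gives the same E-value as the raw score. K and lambda are placeholder values, and the lengths and score are made up.<BR>

```perl
use strict;
use warnings;

my ($K, $lambda) = (0.13, 0.32);     # placeholder statistical parameters
my ($m, $n, $S)  = (250, 1000, 60);  # made-up sequence lengths and raw score

# Assumed form of equation (2): normalize the raw score into bits.
my $bits = ($lambda * $S - log($K)) / log(2);

# Assumed form of equation (3): E-value from the bit score and search space only.
my $E_from_bits = $m * $n * 2 ** (-$bits);

# The same E-value from the raw score, for comparison.
my $E_from_raw = $K * $m * $n * exp(-$lambda * $S);

printf "bit score = %.1f, E = %.3g (from bits), %.3g (from raw score)\n",
       $bits, $E_from_bits, $E_from_raw;
```

Once the bit score is known, K and lambda no longer appear in the E-value calculation, which is the point made in the paragraph above.<BR>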


<h3><img src="GIFS/bluebullet.gif" width="16" height="14"><A name="head4">P-values</A></h3>

&nbsp;&nbsp;&nbsp;The number of random HSPs with score >= <I>S</I> is described by a Poisson
distribution <A HREF="#ref10">[10,11]</A>. This means that the probability of finding exactly
<I>a</I> HSPs with score >=<I>S</I> is given by<BR>

<IMG SRC="GIFS/(4).gif" ALIGN=BOTTOM WIDTH="460" HEIGHT="65" BORDER="0"><BR><BR><BR><BR>

where <I>E</I> is the <I>E</I>-value of <I>S</I> given by equation (1) above. Specifically the
chance of finding zero HSPs with score >=<I>S</I> is e<SUP>-E</SUP>, so the probability
of finding at least one such HSP is<BR>

<IMG SRC="GIFS/(5).gif" ALIGN=BOTTOM WIDTH="460" HEIGHT="50" BORDER="0"><BR><BR><BR><BR>

This is the <I>P</I>-value associated with the score <I>S</I>. For example, if one expects
to find three HSPs with score >= <I>S</I>, the probability of finding at least one
is 0.95. The BLAST programs report <I>E</I>-values rather than <I>P</I>-values because it
is easier to understand the difference between, for example, <I>E</I>-values of 5
and 10 than <I>P</I>-values of 0.993 and 0.99995. However, when <I>E</I> &lt; 0.01, <I>P</I>-values
and <I>E</I>-values are nearly identical.<BR>
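
&nbsp;&nbsp;&nbsp;Assuming equation (5), shown as an image above, is P = 1 - e^(-E), the numbers quoted in this section can be reproduced with a short Perl sketch. The subroutine name is hypothetical.<BR>

```perl
use strict;
use warnings;

# Probability of at least one HSP reaching the score, given its E-value.
sub p_from_e
{   my ($E) = @_;
    return 1 - exp(-$E);
}

# Small E-values first, then the three values used in the text.
foreach my $E (0.001, 0.01, 3, 5, 10)
{   printf "E = %-6s P = %.5f\n", $E, p_from_e($E); }
```

For the two small E-values the printed P is almost the same number as E, while for 5 and 10 the P-values crowd together just below 1.<BR>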


<h3><img src="GIFS/bluebullet.gif" width="16" height="14"><A name="head5">Database searches</A></h3>

&nbsp;&nbsp;&nbsp;The <I>E</I>-value of equation (1) applies to the comparison of two proteins of
lengths <I>m</I> and <I>n</I>. How does one assess the significance of an alignment that
arises from the comparison of a protein of length <I>m</I> to a database containing
many different proteins, of varying lengths? One view is that all proteins
in the database are <I>a priori</I> equally likely to be related to the query.
This implies that a low <I>E</I>-value for an alignment involving a short database
sequence should carry the same weight as a low <I>E</I>-value for an alignment
involving a long database sequence. To calculate a "database search" <I>E</I>-value,
one simply multiplies the pairwise-comparison <I>E</I>-value by the number of
sequences in the database. Recent versions of the FASTA protein comparison
programs <A HREF="#ref12">[12]</A> take this approach <A HREF="#ref13">[13]</A>.<BR>

&nbsp;&nbsp;&nbsp;An alternative view is that a query is <I>a priori</I> more likely to be related to
a long than to a short sequence, because long sequences are often composed of
multiple distinct domains. If we assume the <I>a priori</I> chance of relatedness is
proportional to sequence length, then the pairwise <I>E</I>-value involving a database
sequence of length <I>n</I> should be multiplied by <I>N/n</I>, where <I>N</I> is the total length
of the database in residues. Examining equation (1), this can be accomplished
simply by treating the database as a single long sequence of length <I>N</I>. The
BLAST programs <A HREF="#ref6">[6,14,15]</A> take this approach to calculating database <I>E</I>-values.
Notice that for DNA sequence comparisons, the length of database records is
largely arbitrary, and therefore this is the only really tenable method for
estimating statistical significance.<BR>
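
&nbsp;&nbsp;&nbsp;The two conventions can be compared in a short Perl sketch, again assuming equation (1) has the form E = K m n e^(-lambda S). The database size, sequence lengths and parameter values are all made up for illustration.<BR>

```perl
use strict;
use warnings;

sub e_value
{   my ($K, $lambda, $m, $n, $S) = @_;
    return $K * $m * $n * exp(-$lambda * $S);
}

my ($K, $lambda) = (0.13, 0.32);    # placeholder statistical parameters
my ($m, $n, $S)  = (250, 200, 60);  # query length, hit length, raw score
my $num_seqs     = 50_000;          # made-up number of database sequences
my $N            = 20_000_000;      # made-up total database length in residues

my $E_pair = e_value($K, $lambda, $m, $n, $S);

# View 1: every database sequence is equally likely to be related.
my $E_by_count = $E_pair * $num_seqs;

# View 2: relatedness is proportional to length, so multiply by N/n ...
my $E_by_length = $E_pair * $N / $n;
# ... which is the same as treating the database as one sequence of length N.
my $E_single_seq = e_value($K, $lambda, $m, $N, $S);

printf "pairwise %.3g; by count %.3g; by length %.3g\n", $E_pair, $E_by_count, $E_by_length;
```

Because the made-up hit is shorter than the average database sequence, the length-based view assigns it the larger (less significant) E-value.<BR>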


<h3><img src="GIFS/bluebullet.gif" width="16" height="14"><A name="head6">The statistics of gapped alignments</A></h3>

&nbsp;&nbsp;&nbsp;The statistics developed above have a solid theoretical foundation only
for local alignments that are not permitted to have gaps. However, many
computational experiments <A HREF="#ref14">[14-21]</A> and some analytic results <A HREF="#ref22">[22]</A> strongly
suggest that the same theory applies as well to gapped alignments. For
ungapped alignments, the statistical parameters can be calculated, using
analytic formulas, from the substitution scores and the background residue frequencies of the sequences being compared. For gapped alignments,
these parameters must be estimated from a large-scale comparison of
"random" sequences.<BR>

&nbsp;&nbsp;&nbsp;Some database search programs, such as FASTA <A HREF="#ref12">[12]</A> or various implementations
of the Smith-Waterman algorithm <A HREF="#ref7">[7]</A>, produce optimal local alignment scores
for the comparison of the query sequence to every sequence in the database.
Most of these scores involve unrelated sequences, and therefore can be used
to estimate <I>lambda</I> and <I>K</I> <A HREF="#ref17">[17,21]</A>. This approach avoids the artificiality of
a random sequence model by employing real sequences, with their attendant
internal structure and correlations, but it must face the problem of excluding
from the estimation scores from pairs of related sequences. The BLAST programs
achieve much of their speed by avoiding the calculation of optimal alignment
scores for all but a handful of unrelated sequences. They must therefore rely
upon a pre-estimation of the parameters <I>lambda</I> and <I>K</I>, for a selected set of
substitution matrices and gap costs. This estimation could be done using real
sequences, but has instead relied upon a random sequence model <A HREF="#ref14">[14]</A>, which
appears to yield fairly accurate results <A HREF="#ref21">[21]</A>.<BR>


<h3><img src="GIFS/bluebullet.gif" width="16" height="14"><A name="head7">Edge effects</A></h3>

&nbsp;&nbsp;&nbsp;The statistics described above tend to be somewhat conservative for short
sequences. The theory supporting these statistics is an asymptotic one,
which assumes an optimal local alignment can begin with any aligned pair
of residues. However, a high-scoring alignment must have some length,
and therefore can not begin near to the end of either of two sequences
being compared. This "edge effect" may be corrected for by calculating
an "effective length" for sequences <A HREF="#ref14">[14]</A>; the BLAST programs implement
such a correction. For sequences longer than about 200 residues the edge
effect correction is usually negligible.<BR>


<h3><img src="GIFS/bluebullet.gif" width="16" height="14"><A name="head8">The choice of substitution scores</A></h3>

&nbsp;&nbsp;&nbsp;The results a local alignment program produces depend strongly upon the
scores it uses. No single scoring scheme is best for all purposes, and
an understanding of the basic theory of local alignment scores can improve
the sensitivity of one's sequence analyses. As before, the theory is fully
developed only for scores used to find ungapped local alignments, so we
start with that case.<BR>

&nbsp;&nbsp;&nbsp;A large number of different amino acid substitution scores, based upon a
variety of rationales, have been described <A HREF="#ref23">[23-36]</A>. However the scores of
any substitution matrix with negative expected score can be written uniquely
in the form<BR>

<IMG SRC="GIFS/(6).gif" ALIGN=BOTTOM WIDTH="460" HEIGHT="80" BORDER="0"><BR><BR><BR><BR><BR><BR>
 
where the <I>q<SUB>ij</SUB></I>, called target frequencies, are positive numbers that sum
to 1, the <I>p<SUB>i</SUB></I> are background frequencies for the various residues, and
<I>lambda</I> is a positive constant <A HREF="#ref10">[10,31]</A>. The <I>lambda</I> here is identical to the
<I>lambda</I> of equation (1).<BR>

&nbsp;&nbsp;&nbsp;Multiplying all the scores in a substitution matrix by a positive constant
does not change their essence: an alignment that was optimal using the
original scores remains optimal. Such multiplication alters the parameter
<I>lambda</I> but not the target frequencies <I>q<SUB>ij</SUB></I>. Thus, up to a constant
scaling factor, every substitution matrix is uniquely determined by its
target frequencies. These frequencies have a special significance <A HREF="#ref10">[10,31]</A>:<BR><BR>

  <CENTER><TABLE WIDTH=400>
  <TR><TD ><SPAN CLASS=TEXTWIDE>
  A given class of alignments is best distinguished from chance by the
  substitution matrix whose target frequencies characterize the class.
  </SPAN></TD></TR>
  </TABLE></CENTER><BR>

To elaborate, one may characterize a set of alignments representing homologous
protein regions by the frequency with which each possible pair of residues is
aligned. If valine in the first sequence and leucine in the second appear in
1% of all alignment positions, the target frequency for (valine, leucine) is
0.01. The most direct way to construct appropriate substitution matrices for
local sequence comparison is to estimate target and background frequencies,
and calculate the corresponding log-odds scores of formula (6). These
frequencies in general can not be derived from first principles, and their
estimation requires empirical input.<BR>
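
&nbsp;&nbsp;&nbsp;Formula (6) is shown above only as an image. Assuming it is the usual log-odds form, s_ij = ln(q_ij / (p_i p_j)) / lambda, the valine/leucine example can be turned into a score with a short Perl sketch. The background frequencies and the value of lambda below are placeholders, not estimates from real data.<BR>

```perl
use strict;
use warnings;

# Log-odds score under the assumed form of formula (6).
sub log_odds_score
{   my ($q_ij, $p_i, $p_j, $lambda) = @_;
    return log($q_ij / ($p_i * $p_j)) / $lambda;
}

my $lambda = log(2) / 2;             # placeholder scale: scores in half-bit units
my ($p_val, $p_leu) = (0.07, 0.10);  # placeholder background frequencies
my $q_val_leu = 0.01;                # target frequency from the example in the text

my $score = log_odds_score($q_val_leu, $p_val, $p_leu, $lambda);
printf "score(V,L) = %.2f\n", $score;

# A pair seen exactly as often as chance predicts scores zero.
printf "chance-level pair = %.2f\n",
       log_odds_score($p_val * $p_leu, $p_val, $p_leu, $lambda);
```

With these placeholder numbers the pair is aligned more often than chance predicts, so the score is positive; a target frequency below p_i p_j would give a negative score.<BR>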


<h3><img src="GIFS/bluebullet.gif" width="16" height="14"><A name="head9">The PAM and BLOSUM amino acid substitution matrices</A></h3>

&nbsp;&nbsp;&nbsp;While all substitution matrices are implicitly of log-odds form, the first
explicit construction using formula (6) was by Dayhoff and coworkers <A HREF="#ref24">[24,25]</A>. From a study of observed residue replacements in closely related proteins,
they constructed the PAM (for "point accepted mutation") model of molecular
evolution. One "PAM" corresponds to an average change in 1% of all amino
acid positions. After 100 PAMs of evolution, not every residue will have
changed: some will have mutated several times, perhaps returning to their
original state, and others not at all. Thus it is possible to recognize as
homologous proteins separated by much more than 100 PAMs. Note that there
is no general correspondence between PAM distance and evolutionary time, as
different protein families evolve at different rates.<BR>

&nbsp;&nbsp;&nbsp;Using the PAM model, the target frequencies and the corresponding substitution
matrix may be calculated for any given evolutionary distance. When two
sequences are compared, it is not generally known a priori what evolutionary
distance will best characterize any similarity they may share. Closely
related sequences, however, are relatively easy to find even with non-optimal
matrices, so the tendency has been to use matrices tailored for fairly distant
similarities. For many years, the most widely used matrix was PAM-250,
because it was the only one originally published by Dayhoff.<BR>

&nbsp;&nbsp;&nbsp;Dayhoff's formalism for calculating target frequencies has been criticized
<A HREF="#ref27">[27]</A>, and there have been several efforts to update her numbers using the
vast quantities of derived protein sequence data generated since her work
<A HREF="#ref33">[33,35]</A>. These newer PAM matrices do not differ greatly from the original
ones <A HREF="#ref37">[37]</A>.<BR>

&nbsp;&nbsp;&nbsp;An alternative approach to estimating target frequencies, and the corresponding
log-odds matrices, has been advanced by Henikoff & Henikoff <A HREF="#ref34">[34]</A>. They examine
multiple alignments of distantly related protein regions directly, rather than
extrapolate from closely related sequences. An advantage of this approach is
that it cleaves closer to observation; a disadvantage is that it yields no
evolutionary model. A number of tests <A HREF="#ref13">[13,37]</A> suggest that the "BLOSUM"
matrices produced by this method generally are superior to the PAM matrices
for detecting biological relationships.<BR>


<h3><img src="GIFS/bluebullet.gif" width="16" height="14"><A name="head10">DNA substitution matrices</A></h3>

&nbsp;&nbsp;&nbsp;While we have discussed substitution matrices only in the context of protein sequence comparison, all the main issues carry over to DNA sequence comparison.
One warning is that when the sequences of interest code for protein, it is almost always better to compare the protein translations than to compare the DNA sequences directly.
The reason is that after only a small amount of evolutionary change, the DNA sequences, when compared using simple nucleotide substitution scores, contain less
information with which to deduce homology than do the encoded protein sequences
<A HREF="#ref32">[32]</A>.<BR>
&nbsp;&nbsp;&nbsp;Sometimes, however, one may wish to compare non-coding DNA sequences, at which point the same log-odds approach as before applies.
An evolutionary model in which all nucleotides are equally common and all substitution mutations are equally likely yields different scores only for matches and mismatches <A HREF="#ref32">[32]</A>.
A more complex model, in which transitions are more likely than transversions, yields different "mismatch" scores for transitions and transversions <A HREF="#ref32">[32]</A>.
The best scores to use will depend upon whether one is seeking relatively diverged or closely related sequences <A HREF="#ref32">[32]</A>.<BR>


<h3><img src="GIFS/bluebullet.gif" width="16" height="14"><A name="head11">Gap scores</A></h3>

&nbsp;&nbsp;&nbsp;Our theoretical development concerning the optimality of matrices constructed
using equation (6) unfortunately is invalid as soon as gaps and associated gap
scores are introduced, and no more general theory is available to take its
place. However, if the gap scores employed are sufficiently large, one can
expect that the optimal substitution scores for a given application will not
change substantially. In practice, the same substitution scores have been
applied fruitfully to local alignments both with and without gaps. Appropriate
gap scores have been selected over the years by trial and error <A HREF="#ref13">[13]</A>, and most
alignment programs will have a default set of gap scores to go with a default
set of substitution scores. If the user wishes to employ a different set of
substitution scores, there is no guarantee that the same gap scores will remain
appropriate. No clear theoretical guidance can be given, but "affine gap
scores" <A HREF="#ref38">[38-41]</A>, with a large penalty for opening a gap and a much smaller
one for extending it, have generally proved among the most effective.<BR>
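
&nbsp;&nbsp;&nbsp;A minimal Perl sketch of an affine gap cost follows. It assumes the convention in which a gap of k residues costs open + k * extend; some programs count the first residue differently. The penalty values and subroutine names are illustrative only.<BR>

```perl
use strict;
use warnings;

# Affine cost of one gap of $k residues (assumed convention: open + k * extend).
sub affine_gap_cost
{   my ($k, $open, $extend) = @_;
    return 0 if ($k == 0);
    return $open + $k * $extend;
}

# Linear cost for comparison: every gapped residue costs the same.
sub linear_gap_cost
{   my ($k, $per_residue) = @_;
    return $k * $per_residue;
}

my ($open, $extend) = (11, 1);   # illustrative penalties
foreach my $k (1, 5, 10)
{   printf "gap length %2d: affine %2d, linear %3d\n",
           $k, affine_gap_cost($k, $open, $extend), linear_gap_cost($k, $open + $extend);
}
```

Both schemes charge the same for a single gapped residue here, but under the affine scheme one long gap is far cheaper than many short ones.<BR>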


<h3><img src="GIFS/bluebullet.gif" width="16" height="14"><A name = "head12">Low complexity sequence regions</A></h3>

&nbsp;&nbsp;&nbsp;There is one frequent case where the random models and therefore the statistics
discussed here break down. As many as one fourth of all residues in protein
sequences occur within regions with highly biased amino acid composition.
Alignments of two regions with similarly biased composition may achieve very
high scores that owe virtually nothing to residue order but are due instead
to segment composition. Alignments of such "low complexity" regions have
little meaning in any case: since these regions most likely arise by gene
slippage, the one-to-one residue correspondence imposed by alignment is
not valid. While it is worth noting that two proteins contain similar low
complexity regions, they are best excluded when constructing alignments
<A HREF="#ref42">[42-44]</A>. The BLAST programs employ the SEG algorithm <A HREF="#ref43">[43]</A> to filter low
complexity regions from proteins before executing a database search.<BR>

<h3><img src="GIFS/bluebullet.gif" width="16" height="14"><A name = "refs">References</A></h3>


<A NAME="ref1">[1]</A> Fitch, W.M. (1983) "Random sequences." J. Mol. Biol. 163:171-176. <A HREF="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Retrieve&list_uids=6842586&dopt=Abstract">(PubMed)</A><BR><BR>
<A NAME="ref2">[2]</A> Lipman, D.J., Wilbur, W.J., Smith T.F. & Waterman, M.S. (1984) "On the
   statistical significance of nucleic acid similarities." Nucl. Acids Res.
   12:215-226. <A HREF="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Retrieve&list_uids=6694902&dopt=Abstract">(PubMed)</A><BR><BR>
<A NAME="ref3">[3]</A> 
Altschul, S.F. & Erickson, B.W. (1985) "Significance of nucleotide sequence
   alignments: a method for random sequence permutation that preserves
   dinucleotide and codon usage." Mol. Biol. Evol. 2:526-538. <A HREF="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Retrieve&list_uids=3870875&dopt=Abstract">(PubMed)</A><BR><BR>
<A NAME="ref4">[4]</A> Deken, J. (1983) "Probabilistic behavior of longest-common-subsequence
   length." In "Time Warps, String Edits and Macromolecules: The Theory and
   Practice of Sequence Comparison." D. Sankoff & J.B. Kruskal (eds.),
   pp. 55-91, Addison-Wesley, Reading, MA. <BR><BR>

<A NAME="ref5">[5]</A> Reich, J.G., Drabsch, H. & Daumler, A. (1984) "On the statistical
   assessment of similarities in DNA sequences." Nucl. Acids Res.
   12:5529-5543. <A HREF="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Retrieve&list_uids=6462914&dopt=Abstract">(PubMed)</A><BR><BR>
<A NAME="ref6">[6]</A> Altschul, S.F., Gish, W., Miller, W., Myers, E.W. & Lipman, D.J. (1990)
   "Basic local alignment search tool." J. Mol. Biol. 215:403-410. <A HREF="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Retrieve&list_uids=2231712&dopt=Abstract">(PubMed)</A><BR><BR>
<A NAME="ref7">[7]</A> Smith, T.F. & Waterman, M.S. (1981) "Identification of common molecular
   subsequences." J. Mol. Biol. 147:195-197. <A HREF="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Retrieve&list_uids=7265238&dopt=Abstract">(PubMed)</A><BR><BR>
<A NAME="ref8">[8]</A> Sellers, P.H. (1984) "Pattern recognition in genetic sequences by mismatch
   density." Bull. Math. Biol. 46:501-514.<BR><BR>

<A NAME="ref9">[9]</A> Gumbel, E. J. (1958) "Statistics of extremes." Columbia University Press,
   New York, NY.<BR><BR>

<A NAME="ref10">[10]</A> Karlin, S. & Altschul, S.F. (1990) "Methods for assessing the statistical
   significance of molecular sequence features by using general scoring
   schemes." Proc. Natl. Acad. Sci. USA 87:2264-2268.<A HREF="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Retrieve&list_uids=2315319&dopt=Abstract">(PubMed)</A><BR><BR>
<A NAME="ref11">[11]</A> Dembo, A., Karlin, S. & Zeitouni, O. (1994) "Limit distribution of maximal
   non-aligned two-sequence segmental score." Ann. Prob. 22:2022-2039.<BR><BR>

<A NAME="ref12">[12]</A> Pearson, W.R. & Lipman, D.J. (1988) "Improved tools for biological sequence
   comparison." Proc. Natl. Acad. Sci. USA 85:2444-2448. <A HREF="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Retrieve&list_uids=3162770&dopt=Abstract">(PubMed)</A><BR><BR>
<A NAME="ref13">[13]</A> Pearson, W.R. (1995) "Comparison of methods for searching protein sequence
   databases." Prot. Sci. 4:1145-1160. <A HREF="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Retrieve&list_uids=7549879&dopt=Abstract">(PubMed)</A><BR><BR>
<A NAME="ref14">[14]</A> Altschul, S.F. & Gish, W. (1996) "Local alignment statistics." Meth.
   Enzymol. 266:460-480. <A HREF="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Retrieve&list_uids=8743700&dopt=Abstract">(PubMed)</A><BR><BR>
<A NAME="ref15">[15]</A> Altschul, S.F., Madden, T.L., Sch&auml;ffer, A.A., Zhang, J., Zhang, Z., Miller, W. & Lipman, D.J. (1997) "Gapped BLAST and PSI-BLAST: a new generation of
   protein database search programs." Nucleic Acids Res. 25:3389-3402. <A HREF="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Retrieve&list_uids=9254694&dopt=Abstract">(PubMed)</A><BR><BR>
<A NAME="ref16">[16]</A> Smith, T.F., Waterman, M.S. & Burks, C. (1985) "The statistical
   distribution of nucleic acid similarities." Nucleic Acids Res. 13:645-656. <A HREF="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Retrieve&list_uids=3871073&dopt=Abstract">(PubMed)</A><BR><BR>
<A NAME="ref17">[17]</A> Collins, J.F., Coulson, A.F.W. & Lyall, A. (1988) "The significance of
   protein sequence similarities." Comput. Appl. Biosci. 4:67-71. <A HREF="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Retrieve&list_uids=3383005&dopt=Abstract">(PubMed)</A><BR><BR>
<A NAME="ref18">[18]</A> Mott, R. (1992) "Maximum-likelihood estimation of the statistical
   distribution of Smith-Waterman local sequence similarity scores." Bull.
   Math. Biol. 54:59-75. <BR><BR>

<A NAME="ref19">[19]</A> Waterman, M.S. & Vingron, M. (1994) "Rapid and accurate estimates of
   statistical significance for sequence database searches." Proc. Natl. Acad.
   Sci. USA 91:4625-4628. <A HREF="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Retrieve&list_uids=8197109&dopt=Abstract">(PubMed)</A><BR><BR>
<A NAME="ref20">[20]</A> Waterman, M.S. & Vingron, M. (1994) "Sequence comparison significance and
   Poisson approximation." Stat. Sci. 9:367-381.<BR><BR>

<A NAME="ref21">[21]</A> Pearson, W.R. (1998) "Empirical statistical estimates for sequence
   similarity searches." J. Mol. Biol. 276:71-84. <A HREF="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Retrieve&list_uids=9514730&dopt=Abstract">(PubMed)</A><BR><BR>
<A NAME="ref22">[22]</A> Arratia, R. & Waterman, M.S. (1994) "A phase transition for the score in
   matching random sequences allowing deletions." Ann. Appl. Prob. 4:200-225.<BR><BR>

<A NAME="ref23">[23]</A> McLachlan, A.D. (1971) "Tests for comparing related amino-acid sequences.
   Cytochrome c and cytochrome c-551." J. Mol. Biol. 61:409-424. <A HREF="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Retrieve&list_uids=5167087&dopt=Abstract">(PubMed)</A><BR><BR>
<A NAME="ref24">[24]</A> Dayhoff, M.O., Schwartz, R.M. & Orcutt, B.C. (1978) "A model of
   evolutionary change in proteins." In "Atlas of Protein Sequence and
   Structure," Vol. 5, Suppl. 3 (ed. M.O. Dayhoff), pp. 345-352. Natl. Biomed. Res. Found., Washington, DC.<BR><BR>

<A NAME="ref25">[25]</A> Schwartz, R.M. & Dayhoff, M.O. (1978) "Matrices for detecting distant
   relationships." In "Atlas of Protein Sequence and Structure," Vol. 5,
   Suppl. 3 (ed. M.O. Dayhoff), p. 353-358. Natl. Biomed. Res. Found.,
   Washington, DC.<BR><BR>

<A NAME="ref26">[26]</A> Feng, D.F., Johnson, M.S. & Doolittle, R.F. (1984) "Aligning amino acid
   sequences: comparison of commonly used methods." J. Mol. Evol. 21:112-125. <A HREF="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Retrieve&list_uids=6100188&dopt=Abstract">(PubMed)</A><BR><BR>
<A NAME="ref27">[27]</A> Wilbur, W.J. (1985) "On the PAM matrix model of protein evolution." Mol.
   Biol. Evol. 2:434-447. <A HREF="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Retrieve&list_uids=3870870&dopt=Abstract">(PubMed)</A><BR><BR>
<A NAME="ref28">[28]</A> Taylor, W.R. (1986) "The classification of amino acid conservation."
   J. Theor. Biol. 119:205-218. <A HREF="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Retrieve&list_uids=3461222&dopt=Abstract">(PubMed)</A><BR><BR>
<A NAME="ref29">[29]</A> Rao, J.K.M. (1987) "New scoring matrix for amino acid residue exchanges
   based on residue characteristic physical parameters." Int. J. Peptide
   Protein Res. 29:276-281. <BR><BR>

<A NAME="ref30">[30]</A> Risler, J.L., Delorme, M.O., Delacroix, H. & Henaut, A. (1988) "Amino acid
   substitutions in structurally related proteins. A pattern recognition
   approach. Determination of a new and efficient scoring matrix." J. Mol.
   Biol. 204:1019-1029. <A HREF="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Retrieve&list_uids=3221397&dopt=Abstract">(PubMed)</A><BR><BR>
<A NAME="ref31">[31]</A> Altschul, S.F. (1991) "Amino acid substitution matrices from an information
   theoretic perspective." J. Mol. Biol. 219:555-565. <A HREF="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Retrieve&list_uids=2051488&dopt=Abstract">(PubMed)</A><BR><BR>   
<A NAME="ref32">[32]</A> States, D.J., Gish, W. & Altschul, S.F. (1991) "Improved sensitivity
   of nucleic acid database searches using application-specific scoring
   matrices." Methods 3:66-70. <BR><BR>

<A NAME="ref33">[33]</A> Gonnet, G.H., Cohen, M.A. & Benner, S.A. (1992) "Exhaustive matching of the
   entire protein sequence database." Science 256:1443-1445. <A HREF="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Retrieve&list_uids=1604319&dopt=Abstract">(PubMed)</A><BR><BR>
<A NAME="ref34">[34]</A> Henikoff, S. & Henikoff, J.G. (1992) "Amino acid substitution matrices from
   protein blocks." Proc. Natl. Acad. Sci. USA 89:10915-10919. <A HREF="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Retrieve&list_uids=1438297&dopt=Abstract">(PubMed)</A><BR><BR>
<A NAME="ref35">[35]</A> Jones, D.T., Taylor, W.R. & Thornton, J.M. (1992) "The rapid generation of
   mutation data matrices from protein sequences." Comput. Appl. Biosci.
   8:275-282. <A HREF="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Retrieve&list_uids=1633570&dopt=Abstract">(PubMed)</A><BR><BR>
<A NAME="ref36">[36]</A> Overington, J., Donnelly, D., Johnson M.S., Sali, A. & Blundell, T.L.
   (1992) "Environment-specific amino acid substitution tables: Tertiary
   templates and prediction of protein folds." Prot. Sci. 1:216-226. <A HREF="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Retrieve&list_uids=1304904&dopt=Abstract">(PubMed)</A><BR><BR>
<A NAME="ref37">[37]</A> Henikoff, S. & Henikoff, J.G. (1993) "Performance evaluation of amino acid
   substitution matrices." Proteins 17:49-61. <A HREF="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Retrieve&list_uids=8234244&dopt=Abstract">(PubMed)</A><BR><BR>
<A NAME="ref38">[38]</A> Gotoh, O. (1982) "An improved algorithm for matching biological sequences."
   J. Mol. Biol. 162:705-708. <A HREF="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Retrieve&list_uids=7166760&dopt=Abstract">(PubMed)</A><BR><BR>
<A NAME="ref39">[39]</A> Fitch, W.M. & Smith, T.F. (1983) "Optimal sequence alignments." Proc. Natl.
   Acad. Sci. USA 80:1382-1386.<BR><BR>

<A NAME="ref40">[40]</A> Altschul, S.F. & Erickson, B.W. (1986) "Optimal sequence alignment using
   affine gap costs." Bull. Math. Biol. 48:603-616. <A HREF="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Retrieve&list_uids=3580642&dopt=Abstract">(PubMed)</A><BR><BR>
<A NAME="ref41">[41]</A> Myers, E.W. & Miller, W. (1988) "Optimal alignments in linear space."
   Comput. Appl. Biosci. 4:11-17. <A HREF="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Retrieve&list_uids=3382986&dopt=Abstract">(PubMed)</A><BR><BR>
<A NAME="ref42">[42]</A> Claverie, J.-M. & States, D.J. (1993) "Information enhancement methods for
   large-scale sequence-analysis." Comput. Chem. 17:191-201.<BR><BR>

<A NAME="ref43">[43]</A> Wootton, J.C. & Federhen, S. (1993) "Statistics of local complexity in
   amino acid sequences and sequence databases." Comput. Chem. 17:149-163.<BR><BR>

<A NAME="ref44">[44]</A> Altschul, S.F., Boguski, M.S., Gish, W. & Wootton, J.C. (1994) "Issues in
   searching molecular sequence databases." Nature Genet. 6:119-129. <A HREF="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Retrieve&list_uids=8162065&dopt=Abstract">(PubMed)</A><BR><BR>

  </td>



 </tr>

</table>

<!--         end of content          --> <!--         bottom of the page        --> 
</html>
!Script from Midterm Exam:
This script uses the ~Needleman-Wunsch algorithm in a "BLAST"-like implementation to find the best similarity alignment between a query sequence and a FASTA file of reference sequences. (see midterm exam [[MidTerm]]).
{{{
#!/usr/bin/perl
use strict;
$|=1;      # forces print output to be sent to screen in real-time

# - - - - - H E A D E R - - - - - - - - - - - - - - - - -
# MidTerm Exam, PERL Bioinformatics, 15 OCT 2008
#     OBJECTIVE: simple implementation of the Needleman-Wunsch algorithm 
#     to find the best alignment for a target sequence in a FASTA file of 
#     known Arabidopsis proteins.

# - - - - - U S E R    V A R I A B L E S - - - - - - - -
my $infile = "Arabidopsis-midterm-NT-fasta.ffn";  
my $TargetSeq = "LVSKIIELRPSIVSSRN";
my $codontable = "Standard";

# - - - - - G L O B A L  V A R I A B L E S  - - - - - -
my @FILE;        # input array to hold file contents
my %NTs;         # Hash-Array to hold each orf name & sequence
my %CodonTable;  # Hash-Array for codons
my %PRTs;        # Hash-array to hold protein sequences
my @M;           # alignment matrix; filled by similarity scores
my $g = -0.5;    # gap penalty

# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# - - - - - M A I N - - - - - - - - - - - - - - - - - - - -
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
print "\n\nI don't know Toto, but I don't think we are in Kansas anymore:\n";

# 1. Process FASTA and make protein sequences . . . . 
	&ReadFasta($infile);
	&LoadCodons($codontable);
	&TranslateFasta;

# 2. Iteratively execute the similarity scoring subroutine: 
	my $max = -99999;
	my $seq1 = $TargetSeq;
	foreach my $protein_name (keys %PRTs)
	{	my $seq2 = $PRTs{$protein_name};
	 	my $score = &Similarity($seq1,$seq2);
		if ($score > $max)
		{	$max = $score;
			print "\n\n---------------------------------\n";
			print "      Alignment: $protein_name\n      Score= $score\n";
			foreach my $x (&Alignment($seq1,$seq2)) 
			{	print "                 ",$x,"\n"; }
		}
		else
		{	print " . "; } # just "." so you know the program is still running
	}

print "\n\n    * * * D O N E * * *\n\n";
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# - - - - - S U B R O U T I N E S - - - - - - - - - - -
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# - - - - - - - - - - - - - - - - - - - - - - - - - -
sub ReadFasta
{   my $file = $_[0];
	$/=">";
	open(FASTA,"<$file") or die "\n\n\n Nada $file\n\n\n";
	@FILE=<FASTA>;
	close(FASTA);
	shift(@FILE); 
	foreach my $orf (@FILE)
	{	my @Lines = split(/\n/,$orf);
		my $name = $Lines[0];
		my $seq = "";
		foreach my $i (1..$#Lines)
		{	$seq .= $Lines[$i]; }
		$seq =~ s/>//;
		$NTs{$name} = $seq;
	}
	$/="\n"; # reset input break character
}
# - - - - - - - - - - - - - - - - - - - - - - - - - -
sub TranslateFasta
{	# A.  The codon table (%CodonTable) has already been filled by &LoadCodons,
	#     which reads the entire DATA section. Nothing is left on the DATA
	#     handle at this point, so no further loading is done here.
	
	# B. Convert the NT sequence into AAs . . . . . .
	foreach my $header (keys %NTs)
	{	my $protein = "";            # set to "empty" at the start of each loop
		for (my $i=0; $i <= length($NTs{$header})-3; $i += 3)  # another FOR-loop structure; stops before a partial codon
		{	my $codon = substr($NTs{$header},$i,3);             # $codon = 3 nts at a time
			my $aa = $CodonTable{$codon};       # here's the translation step
			$protein .= $aa;
		}
		$PRTs{$header} = $protein;
	}
}
# - - - - - - - - - - - - - - - - - - - - - - - - - -
sub Similarity 
{	# call &Similarity($seq1, $seq2)
	# Determines score of best alignment of strings $seq1 and $seq2
	# Score values are stored in @M
	# Returns max alignment score
	# Calls subroutines &ID and &MAX
	#. . . . . . . . . . . . . . . . .
    my($s,$t) = @_;  # sequences to be aligned.
    foreach my $i (0..length($s)) { $M[$i][0] = $g * $i; }
    foreach my $j (0..length($t)) { $M[0][$j] = $g * $j; }
	
    foreach my $i (1..length($s)) 
	{	foreach my $j (1..length($t)) 
		{	my $p =  &ID(substr($s,$i-1,1),substr($t,$j-1,1));
			$M[$i][$j] = &MAX($M[$i-1][$j] + $g, $M[$i][$j-1] + $g,$M[$i-1][$j-1] + $p);
		}
    }
    return ( $M[length($s)][length($t)] );
}
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
sub ID 
{  # call &ID(char1,char2)
    my ($aa1, $aa2) = @_;
    return ($aa1 eq $aa2)?1:-1;
}
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
sub MAX
{	# find max value
	# call &MAX(default value, other values . . . )
	my ($m,@l) = @_;
    foreach my $x (@l) { $m = $x if ($x > $m); }
    return $m;
}
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
sub Alignment
{	# call &Alignment(seq1,seq2)
    my ($s,$t) = @_;  ## sequences to be aligned.
    my ($i,$j) = (length($s), length($t));
    return ( "-"x$j, $t) if ($i==0);
    return ( $s, "-"x$i) if ($j==0);
    my ($sLast,$tLast) = (substr($s,-1),substr($t,-1));
    
    if ($M[$i][$j] == $M[$i-1][$j-1] + &ID($sLast,$tLast)) 
	{ ## Case 1: last letters are paired in the best alignment
		my ($sa, $ta) = &Alignment(substr($s,0,-1), substr($t,0,-1));
		return ($sa . $sLast , $ta . $tLast );
    } 
	elsif ($M[$i][$j] == $M[$i-1][$j] + $g) 
	{ ## Case 2: last letter of the first string is paired with a gap
		my ($sa, $ta) = &Alignment(substr($s,0,-1), $t);
		return ($sa . $sLast , $ta . "-");
    } 
	else 
	{ ## Case 3: last letter of the 2nd string is paired with a gap
		my ($sa, $ta) = &Alignment($s, substr($t,0,-1));
		return ($sa . "-" , $ta . $tLast );
    }
}
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# ---------------------------------------------------------
sub LoadCodons
{
	$/=">";
	my $Table = shift(@_);
	my @TABLE = <DATA>;
	foreach my $j (@TABLE)
	{	if ($j =~ m/^ (\d){1,2} $Table/)
		{	my @k = split(/\n/,$j);
			$k[1] =~ s/Amino  //;
			foreach my $i (1..3)
			{	$k[$i+1] =~ s/Base$i  //; }
			my @AA = split(//,$k[1]);
			my @B1 = split(//,$k[2]);
			my @B2 = split(//,$k[3]);
			my @B3 = split(//,$k[4]);
			foreach my $i (0..63)
			{	$CodonTable{$B1[$i].$B2[$i].$B3[$i]} = $AA[$i]; }
		}
	}
	# foreach my $nnn (keys %CodonTable)
	# {  print "$nnn = $CodonTable{$nnn}\n";}
	$/="\n";   # reset back to default before leaving subroutine
}
# ---------------------------------------------------------
# - - - - - EOF - - - - - - - - - - - - - - - - - - -
# The lines below are not perl statements and are not executed as part of the 
# program.  Instead, they are available to be read as data input by the program
# using the I/O handle name "DATA". This is a default handle name for any data 
# you want to include in a script file.
__END__
> 0 Codon Translation Tables
http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi?mode=c#SG1
> 1 Standard
Amino  FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
Base1  TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
Base2  TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
Base3  TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
> 11 Bacteria and Archea
Amino  FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
Base1  TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
Base2  TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
Base3  TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
}}}
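As a smaller illustration of the translation step, the sketch below builds a codon hash from the four aligned rows of the Standard table and translates one short nucleotide string. The subroutine names {{{build_codon_table}}} and {{{translate}}}, the test sequence, and the choice to mark unknown codons with "X" are assumptions made for this example; they are not part of the exam script.
{{{
#!/usr/bin/perl
use strict;
use warnings;

# The four rows of translation table 1 (Standard), copied from the DATA section.
my $amino = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";
my $base1 = "TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG";
my $base2 = "TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG";
my $base3 = "TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG";

my %codon = build_codon_table($amino, $base1, $base2, $base3);
print translate("ATGGCTTGA", \%codon), "\n";    # codons: ATG GCT TGA

sub build_codon_table
{	# column $i of the three base rows spells the codon for column $i of the amino row
	my ($aa, $b1, $b2, $b3) = @_;
	my %table;
	foreach my $i (0 .. length($aa) - 1)
	{	my $nnn = substr($b1,$i,1) . substr($b2,$i,1) . substr($b3,$i,1);
		$table{$nnn} = substr($aa,$i,1);
	}
	return %table;
}

sub translate
{	# translate complete codons only; codons missing from the table become "X"
	my ($nt, $table) = @_;
	my $protein = "";
	for (my $i = 0; $i <= length($nt) - 3; $i += 3)
	{	my $nnn = uc(substr($nt, $i, 3));
		$protein .= exists $table->{$nnn} ? $table->{$nnn} : "X";
	}
	return $protein;
}
}}}
Passing the hash by reference keeps {{{translate}}} independent of any global variable, which makes it easier to reuse with a different codon table.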
[[BACK to Exam|MidTerm]]
!!!
!Needleman-Wunsch Algorithms
Here are the subroutines that Dwyer has provided for a simple implementation of the NW alignment strategy:
{{{
#########################################
sub p 
{  
## given two amino acids or nucleotides, compares
## them and returns a match reward (+1) or mismatch
## penalty (-1).  For amino acids, it should normally
## be replaced with a more complicated function.
## RETURNS: numerical reward/penalty.
#########################################
    my ($aa1, $aa2) = @_;  ## residues/bases to be compared.
    return ($aa1 eq $aa2)?1:-1;
}
#########################################
}}}
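The comment above notes that, for amino acids, the +1/-1 comparison would normally be replaced with something more elaborate. One possible direction is sketched here: identical residues score highest, residues from the same chemical group score a little, and everything else is penalized. The subroutine name {{{p_grouped}}}, the group assignments and the score values (+2, +1, -1) are assumptions chosen for illustration only; they are not taken from a published matrix such as PAM or BLOSUM.
{{{
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical grouping of the 20 amino acids by side-chain chemistry.
my %members = (
	hydrophobic => "AVLIMFWC",
	polar       => "STNQYGP",
	acidic      => "DE",
	basic       => "KRH",
);
my %group;      # residue letter => group name
foreach my $name (keys %members)
{	foreach my $aa (split(//, $members{$name})) { $group{$aa} = $name; } }

sub p_grouped
{	# identical residues: +2; same group: +1; anything else: -1
	my ($aa1, $aa2) = map { uc($_) } @_;
	return  2 if ($aa1 eq $aa2);
	return  1 if (exists $group{$aa1} && exists $group{$aa2}
	              && $group{$aa1} eq $group{$aa2});
	return -1;
}

print "L vs L: ", p_grouped("L","L"), "\n";   # identical
print "L vs I: ", p_grouped("L","I"), "\n";   # both hydrophobic
print "D vs K: ", p_grouped("D","K"), "\n";   # acidic vs basic
}}}
Because it takes the same two arguments as {{{p}}} and returns a number, a function like this could be substituted wherever {{{p}}} is called.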
{{{
#########################################
sub max 
{
## Given any positive number of numerical arguments,
## RETURNS: the largest.
#########################################
    my ($m,@l) = @_; ## numerical values.
    foreach my $x (@l) { $m = $x if ($x > $m); }
    return $m;
}
#########################################
}}}
{{{
#########################################
sub similarity 
{
##  Determines score of best alignment of strings $s and $t
##  by filling in alignment matrix @M.
##  RETURNS: score of the best alignment (and fills @M).
#########################################
    my($s,$t) = @_;  ## sequences to be aligned.
    foreach my $i (0..length($s)) { $M[$i][0] = $g * $i; }
    foreach my $j (0..length($t)) { $M[0][$j] = $g * $j; }
    foreach my $i (1..length($s)) {
	foreach my $j (1..length($t)) {
	    my $p =  p(substr($s,$i-1,1),substr($t,$j-1,1));
	    $M[$i][$j] = 
                max($M[$i-1][$j] + $g,
                    $M[$i][$j-1] + $g,
                    $M[$i-1][$j-1] + $p);
	}
    }
    return ( $M[length($s)][length($t)] );
}
#########################################
}}}
{{{
#########################################
sub getAlignment 
{
##  Reconstructs best alignment of strings $s and $t using information
##  stored in alignment matrix @M by similarity.  Recursive.
##  RETURNS: list of two strings representing best alignments.
##     These strings are $s and $t with gap symbols inserted.
#########################################
    my ($s,$t) = @_;  ## sequences to be aligned.
    my ($i,$j) = (length($s), length($t));
    return ( "-"x$j, $t) if ($i==0);
    return ( $s, "-"x$i) if ($j==0);
    my ($sLast,$tLast) = (substr($s,-1),substr($t,-1));
    
    if ($M[$i][$j] == $M[$i-1][$j-1] + p($sLast,$tLast)) { ## Case 1
        ## last letters are paired in the best alignment
	my ($sa, $ta) = getAlignment(substr($s,0,-1), substr($t,0,-1));
	return ($sa . $sLast , $ta . $tLast );
    } elsif ($M[$i][$j] == $M[$i-1][$j] + $g) { ## Case 2
        ## last letter of the first string is paired with a gap
	my ($sa, $ta) = getAlignment(substr($s,0,-1), $t);
	return ($sa . $sLast , $ta . "-");
    } else { ## Case 3: last letter of the 2nd string is paired with a gap
	my ($sa, $ta) = getAlignment($s, substr($t,0,-1));
	return ($sa . "-" , $ta . $tLast );
    }
}
#########################################
}}}
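The recursive traceback above makes one subroutine call per aligned column, so with warnings enabled PERL will report "Deep recursion" once an alignment passes 100 columns. A loop-based variant is sketched below under the same scoring rules. The name {{{traceback}}} is an assumption for this example, and the matrix-filling code is repeated only so that the sketch runs on its own. Note that the {{{==}}} tests rely on the scores being exact, which holds for a gap penalty of -0.5 and match/mismatch scores of +1/-1.
{{{
#!/usr/bin/perl
use strict;
use warnings;

my @M;          # alignment matrix, filled by similarity
my $g = -0.5;   # gap penalty

sub p   { my ($aa1,$aa2) = @_; return ($aa1 eq $aa2) ? 1 : -1; }
sub max { my ($m,@l) = @_; foreach my $x (@l) { $m = $x if ($x > $m); } return $m; }

sub similarity
{	my ($s,$t) = @_;
	@M = ();    # discard scores from any earlier pair
	foreach my $i (0..length($s)) { $M[$i][0] = $g * $i; }
	foreach my $j (0..length($t)) { $M[0][$j] = $g * $j; }
	foreach my $i (1..length($s))
	{	foreach my $j (1..length($t))
		{	my $p = p(substr($s,$i-1,1),substr($t,$j-1,1));
			$M[$i][$j] = max($M[$i-1][$j] + $g, $M[$i][$j-1] + $g, $M[$i-1][$j-1] + $p);
		}
	}
	return $M[length($s)][length($t)];
}

sub traceback
{	# walk from the bottom-right cell back to M[0][0], building both rows right to left
	my ($s,$t) = @_;
	my ($i,$j) = (length($s), length($t));
	my ($sa,$ta) = ("","");
	while ($i > 0 || $j > 0)
	{	if ($i > 0 && $j > 0 &&
		    $M[$i][$j] == $M[$i-1][$j-1] + p(substr($s,$i-1,1),substr($t,$j-1,1)))
		{	# Case 1: the two letters are paired
			$sa = substr($s,$i-1,1) . $sa;   $ta = substr($t,$j-1,1) . $ta;
			$i--; $j--;
		}
		elsif ($i > 0 && ($j == 0 || $M[$i][$j] == $M[$i-1][$j] + $g))
		{	# Case 2: letter of the first string is paired with a gap
			$sa = substr($s,$i-1,1) . $sa;   $ta = "-" . $ta;
			$i--;
		}
		else
		{	# Case 3: letter of the 2nd string is paired with a gap
			$sa = "-" . $sa;                 $ta = substr($t,$j-1,1) . $ta;
			$j--;
		}
	}
	return ($sa,$ta);
}

my $score = similarity("CAT","AT");
my ($top,$bottom) = traceback("CAT","AT");
print "Score: $score\n$top\n$bottom\n";
}}}
For {{{CAT}}} against {{{AT}}} this gives a score of 1.5, with {{{CAT}}} on the first row and {{{-AT}}} on the second.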
[[BACK to Lecture 6|L06]]
!!!
!Dynamic Programming Tables

__REFERENCE:__
Needleman, S.B. and C.D. Wunsch. 1970. A general method applicable to the search for similarities in the amino acid sequences of two proteins. //Journal of Molecular Biology// ''48'': 443-453.

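The authors' algorithm presented in this paper belongs to a broad class of methods known as ''Dynamic Programming''. This approach solves a large problem by breaking it into smaller, overlapping subproblems, solving each subproblem once, and storing the results in a table so that they can be reused. The solution is often written recursively: a subroutine calls itself on smaller inputs, and the depth of recursion and the number of calls are determined at run time by the input. The recursive call pattern looks like this: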
{{{
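# Program calls subroutine LOOP:
    my ($i, $j) = (3, 5);
    &LOOP($i,$j);

#-------------------
sub LOOP
{     my ($x, $y) = @_;
      return if ($x == 0 || $y == 0);  # stopping rule: without one, LOOP would never end
      print "LOOP($x,$y)\n";
      my $m = $x - 1;    # stand-in for "some function of $x"
      my $n = $y - 1;    # stand-in for "some function of $y"
      &LOOP($m,$n);  # LOOP calls itself again and again until the stopping rule is met
}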
}}}
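What separates dynamic programming from plain recursion is the stored table of subproblem results. The sketch below computes the same alignment score twice, once by plain recursion and once with a hash that remembers each (i,j) result, and counts how many subproblems each version actually works through. The names {{{score_plain}}}, {{{score_memo}}} and {{{%cache}}} and the two test strings are assumptions made for this illustration.
{{{
#!/usr/bin/perl
use strict;
use warnings;

my $g = -0.5;                              # gap penalty
my ($calls_plain, $calls_memo) = (0, 0);   # how much work each version does
my %cache;                                 # "i,j" => best score (valid for one pair of strings)

sub score_plain
{	# best score for aligning the first $i letters of $s with the first $j letters of $t
	my ($s,$t,$i,$j) = @_;
	$calls_plain++;
	return $g * $j if ($i == 0);
	return $g * $i if ($j == 0);
	my $p = (substr($s,$i-1,1) eq substr($t,$j-1,1)) ? 1 : -1;
	my $best = score_plain($s,$t,$i-1,$j-1) + $p;
	foreach my $x (score_plain($s,$t,$i-1,$j) + $g, score_plain($s,$t,$i,$j-1) + $g)
	{	$best = $x if ($x > $best); }
	return $best;
}

sub score_memo
{	# same recurrence, but each (i,j) subproblem is solved only once
	my ($s,$t,$i,$j) = @_;
	return $cache{"$i,$j"} if (exists $cache{"$i,$j"});
	$calls_memo++;
	my $best;
	if    ($i == 0) { $best = $g * $j; }
	elsif ($j == 0) { $best = $g * $i; }
	else
	{	my $p = (substr($s,$i-1,1) eq substr($t,$j-1,1)) ? 1 : -1;
		$best = score_memo($s,$t,$i-1,$j-1) + $p;
		foreach my $x (score_memo($s,$t,$i-1,$j) + $g, score_memo($s,$t,$i,$j-1) + $g)
		{	$best = $x if ($x > $best); }
	}
	$cache{"$i,$j"} = $best;
	return $best;
}

my ($s,$t) = ("CATDOG","BATHOG");
my $plain = score_plain($s,$t,length($s),length($t));
my $memo  = score_memo($s,$t,length($s),length($t));
print "plain recursion: score $plain, $calls_plain subproblems solved\n";
print "with a table   : score $memo, $calls_memo subproblems solved\n";
}}}
Both versions return the same score, but the table-based one solves each of the 7 x 7 = 49 subproblems only once, while the plain version repeats them many times over.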

!Working Code:
Save this script as: ''01.0-~NeedlemanWunschAlign.pl''
{{{
#!/usr/bin/perl
use strict;
# - - - - - H E A D E R - - - - - - - - - - - - - - - - -
################################################################
# An implementation of a Dynamic Programming Table for sequence
#   alignment using the Needleman-Wunsch Algorithm.
# Source: Genomic Perl: From Bioinformatics Basics to Working Code
#         Copyright (c) 2002 Rex A. Dwyer.
################################################################

# - - - - - U S E R    V A R I A B L E S - - - - - - - -
my $seq1 = "CATDOGHOUSE";
my $seq2 = "BATHOGBIRDHOUSE";

# - - - - - G L O B A L  V A R I A B L E S  - - - - - -
my @M;        # alignment matrix; filled by similarity scores
my $g = -0.5;   # gap penalty

# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# - - - - - M A I N - - - - - - - - - - - - - - - - - - - -
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
print "\n\nNeedleman-Wunsch Dynamic Programming Table Alignment:\n";

# 1. Run similarity score first . . . . . . 
	print "      Similarity score: ", &Similarity($seq1,$seq2), "\n";
	
# 2. Find the alignment for that similarity score . . . . 
	print "      Alignment: \n";
	foreach my $x (&Alignment($seq1,$seq2)) 
	{	print "                 ",$x,"\n"; }

&Dump;

print "\n\n    * * * D O N E * * *\n\n";
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# - - - - - S U B R O U T I N E S - - - - - - - - - - -
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
sub Similarity 
{	# call &Similarity($seq1, $seq2)
	# Determines score of best alignment of strings $seq1 and $seq2
	# Score values are stored in @M
	# Returns max alignment score
	# Calls subroutines &ID and &MAX
	#. . . . . . . . . . . . . . . . .
    my($s,$t) = @_;  # sequences to be aligned.
    foreach my $i (0..length($s)) { $M[$i][0] = $g * $i; }
    foreach my $j (0..length($t)) { $M[0][$j] = $g * $j; }
	
    foreach my $i (1..length($s)) 
	{	foreach my $j (1..length($t)) 
		{	my $p =  &ID(substr($s,$i-1,1),substr($t,$j-1,1));
			$M[$i][$j] = &MAX($M[$i-1][$j] + $g, $M[$i][$j-1] + $g,$M[$i-1][$j-1] + $p);
		}
    }
    return ( $M[length($s)][length($t)] );
}
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
sub ID 
{  # call &ID(char1,char2)
    my ($aa1, $aa2) = @_;
    return ($aa1 eq $aa2)?1:-1;
}
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
sub MAX
{	# find max value
	# call &MAX(default value, other values . . . )
	my ($m,@l) = @_;
    foreach my $x (@l) { $m = $x if ($x > $m); }
    return $m;
}
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
sub Alignment
{	# call &Alignment(seq1,seq2)
    my ($s,$t) = @_;  ## sequences to be aligned.
    my ($i,$j) = (length($s), length($t));
    return ( "-"x$j, $t) if ($i==0);
    return ( $s, "-"x$i) if ($j==0);
    my ($sLast,$tLast) = (substr($s,-1),substr($t,-1));
    
    if ($M[$i][$j] == $M[$i-1][$j-1] + &ID($sLast,$tLast)) 
	{ ## Case 1: last letters are paired in the best alignment
		my ($sa, $ta) = &Alignment(substr($s,0,-1), substr($t,0,-1));
		return ($sa . $sLast , $ta . $tLast );
    } 
	elsif ($M[$i][$j] == $M[$i-1][$j] + $g) 
	{ ## Case 2: last letter of the first string is paired with a gap
		my ($sa, $ta) = &Alignment(substr($s,0,-1), $t);
		return ($sa . $sLast , $ta . "-");
    } 
	else 
	{ ## Case 3: last letter of the 2nd string is paired with a gap
		my ($sa, $ta) = &Alignment($s, substr($t,0,-1));
		return ($sa . "-" , $ta . $tLast );
    }
}

# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
sub Dump
{	# output score matrix . . . . . 
	open(OUT,">Mscore3.txt") or die "\n\n\n Cannot write Mscore3.txt\n\n\n";
	print OUT "X\tY\tM\n";
	foreach my $i (0..length($seq1))
	{	foreach my $j (0..length($seq2))
		{	print OUT "$i\t$j\t$M[$i][$j]\n"; }
	}
	close(OUT);
}
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# - - - - - EOF - - - - - - - - - - - - - - - - - - - - - -
}}}
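The {{{Dump}}} subroutine writes the score matrix in long format, one X, Y, M triple per line. To look at the dynamic programming table directly, it can also help to print it as a grid with the letters of the two sequences as row and column labels. The sketch below does this for two very short strings. The subroutine name {{{score_grid}}}, the local matrix and the one-decimal formatting are assumptions for this example and are not part of Dwyer's code.
{{{
#!/usr/bin/perl
use strict;
use warnings;

my $g = -0.5;   # gap penalty, same value as in the script above

sub score_grid
{	# fill a local matrix for $s (rows) and $t (columns), return it as tab-separated text
	my ($s,$t) = @_;
	my @m;
	foreach my $i (0..length($s)) { $m[$i][0] = $g * $i; }
	foreach my $j (0..length($t)) { $m[0][$j] = $g * $j; }
	foreach my $i (1..length($s))
	{	foreach my $j (1..length($t))
		{	my $p = (substr($s,$i-1,1) eq substr($t,$j-1,1)) ? 1 : -1;
			my $best = $m[$i-1][$j-1] + $p;
			foreach my $x ($m[$i-1][$j] + $g, $m[$i][$j-1] + $g)
			{	$best = $x if ($x > $best); }
			$m[$i][$j] = $best;
		}
	}
	my $text = join("\t", "", "-", split(//,$t)) . "\n";    # header row: letters of $t
	foreach my $i (0..length($s))
	{	my $label = ($i == 0) ? "-" : substr($s,$i-1,1);    # row label: letter of $s
		# "+ 0" turns a negative zero into plain zero before formatting
		my @cells = map { sprintf("%.1f", $_ + 0) } @{ $m[$i] };
		$text .= join("\t", $label, @cells) . "\n";
	}
	return $text;
}

print score_grid("CAT","AT");
}}}
The bottom-right cell of the grid is the similarity score, and the traceback in {{{Alignment}}} is a path from that cell back to the top-left corner.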

!
[[Back to Main Exam Page|HappyDay]]
!!!
!Question 3:

''3.'' To answer this question you will need to translate the FFN file from question 2 into amino acid sequences. Now using the script you just wrote for question 2 and the script for calculating amino acid frequencies (AAfreqCountTable) [with editing as necessary], determine whether the position of a nucleotide in a codon (1st, 2nd or 3rd) impacts the representation of a target amino acid for each gene/protein in the //Anaeromyxobacter// genome. Specifically:

{{engindent{''A.''  First identify the most frequent nucleotide in the coding genes of //Anaeromyxobacter//.}}}
{{engindent{''B.''  Then select your favorite amino acid that has three or more codons AND has the most frequent nucleotide (identified in A above) in two or more of those codons.}}}
{{engindent{''C.''  For each protein in the  //Anaeromyxobacter// fasta files, calculate the frequency of your target amino acid (from B above).}}}
{{engindent{''D.''  For each gene/protein in the  //Anaeromyxobacter// fasta files, calculate the frequency of your target nucleotide (from A above) in each codon position for each gene. So you will need to edit your NT count script so that it looks at each codon position individually. (Hint: look at the code control in the [[&TranslateFasta|TransFasta]] subroutine).}}}
{{engindent{''E.''  Submit your code work to me.}}}
{{engindent{''F.''  Submit three plots to me: }}}
{{engindent{{{engindent{1. Target AA frequency vs. Target NT frequency in codon position 1, where each point shows the data for one protein/gene.
2. Target AA frequency vs. Target NT frequency in codon position 2, where each point shows the data for one protein/gene.
3. Target AA frequency vs. Target NT frequency in codon position 3, where each point shows the data for one protein/gene.}}}}}}
{{engindent{''G.''  Submit a summary statement describing your results/conclusions (250 words max).}}}

!
//{{{
config.options.chkSearchTitles=true;
config.options.chkSearchText=true;
config.options.chkSearchTags=true;
config.options.chkSearchFields=true;
config.options.chkSearchTitlesFirst=false;
config.options.chkSearchList=true;
config.options.chkSearchByDate=false;
config.options.chkSearchIncremental=true;
config.options.chkSearchShadows=false; 
//}}}
!What is PERL:
In computer programming, Perl is a high-level, general-purpose, interpreted, dynamic programming language. Perl was originally developed by Larry Wall, a linguist working as a systems administrator for NASA, in 1987, as a general purpose Unix scripting language to make @@ report processing easier.@@ The language provides powerful text processing facilities without the arbitrary data length limits of many contemporary Unix tools, making it the ideal language for manipulating text files. (from http://en.wikipedia.org/wiki/PERL)

Does your computer have PERL installed? Most unix-type operating systems (Linux and OS X) have a PERL version installed by default. In contrast, MS Windows versions (2000, XP, Vista) do not. You can check by opening a terminal window and, at the command prompt (here just designated "prompt>"), entering "perl -v" and hitting <enter>:
{{{
prompt> perl -v
}}}
If PERL is installed, then you will get a brief summary of the version that is on your machine. If you need to install PERL, two common sources are listed below:

''PERL Sources:''
1. PERL Org - http://www.perl.org
<html><img src="00/perlorg.png" style="height:150px"></html>

2. For Windows (2000, XP, Vista): Active State - http://www.activestate.com/Products/activeperl/
<html><img src="00/activeperl.png" style="height:150px"></html>
''You will need the Windows x86 MSI file.'' Here is a local link to the current version available from Active State: [[EASY DOWNLOAD|00/ActivePerl-5.10.0.1003-MSWin32-x86-285500.msi]]
Full instructions for the install are described here: 
| [[Windows PERL Install]] |
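Once PERL is installed, a second quick check is to run a very short script. The sketch below prints a greeting together with the version of the interpreter that ran it. The file name {{{hello.pl}}} is only a suggestion.
{{{
#!/usr/bin/perl
use strict;
use warnings;

# $^V holds the version of the running interpreter; "%vd" prints it as dotted numbers
my $version = sprintf("%vd", $^V);
print "Hello from PERL version $version\n";
}}}
Save it as {{{hello.pl}}} and run it by entering {{{perl hello.pl}}} at the command prompt. If a greeting and a version number appear, the install is working.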

!

<!--{{{-->
<div id='mainMenu' refresh='content' tiddler='MainMenu'></div>
<div id='sidebar'>
<div id='sidebarOptions' refresh='content' tiddler='SideBarOptions'></div>
<div id='sidebarTabs' refresh='content' force='true' tiddler='SideBarTabs'></div>
</div>
<div id='displayArea'>
<div id='messageArea'></div>
<div id='tiddlerDisplay'></div>
</div>
<!--}}}-->
/***
|''Name:''|PasswordOptionPlugin|
|''Description:''|Extends TiddlyWiki options with non encrypted password option.|
|''Version:''|1.0.2|
|''Date:''|Apr 19, 2007|
|''Source:''|http://tiddlywiki.bidix.info/#PasswordOptionPlugin|
|''Author:''|BidiX (BidiX (at) bidix (dot) info)|
|''License:''|[[BSD open source license|http://tiddlywiki.bidix.info/#%5B%5BBSD%20open%20source%20license%5D%5D ]]|
|''~CoreVersion:''|2.2.0 (Beta 5)|
***/
//{{{
version.extensions.PasswordOptionPlugin = {
	major: 1, minor: 0, revision: 2, 
	date: new Date("Apr 19, 2007"),
	source: 'http://tiddlywiki.bidix.info/#PasswordOptionPlugin',
	author: 'BidiX (BidiX (at) bidix (dot) info',
	license: '[[BSD open source license|http://tiddlywiki.bidix.info/#%5B%5BBSD%20open%20source%20license%5D%5D]]',
	coreVersion: '2.2.0 (Beta 5)'
};

config.macros.option.passwordCheckboxLabel = "Save this password on this computer";
config.macros.option.passwordInputType = "password"; // password | text
setStylesheet(".pasOptionInput {width: 11em;}\n","passwordInputTypeStyle");

merge(config.macros.option.types, {
	'pas': {
		elementType: "input",
		valueField: "value",
		eventName: "onkeyup",
		className: "pasOptionInput",
		typeValue: config.macros.option.passwordInputType,
		create: function(place,type,opt,className,desc) {
			// password field
			config.macros.option.genericCreate(place,'pas',opt,className,desc);
			// checkbox linked with this password "save this password on this computer"
			config.macros.option.genericCreate(place,'chk','chk'+opt,className,desc);			
			// text savePasswordCheckboxLabel
			place.appendChild(document.createTextNode(config.macros.option.passwordCheckboxLabel));
		},
		onChange: config.macros.option.genericOnChange
	}
});

merge(config.optionHandlers['chk'], {
	get: function(name) {
		// is there an option linked with this chk ?
		var opt = name.substr(3);
		if (config.options[opt]) 
			saveOptionCookie(opt);
		return config.options[name] ? "true" : "false";
	}
});

merge(config.optionHandlers, {
	'pas': {
 		get: function(name) {
			if (config.options["chk"+name]) {
				return encodeCookie(config.options[name].toString());
			} else {
				return "";
			}
		},
		set: function(name,value) {config.options[name] = decodeCookie(value);}
	}
});

// need to reload options to load passwordOptions
loadOptionsCookie();

/*
if (!config.options['pasPassword'])
	config.options['pasPassword'] = '';

merge(config.optionsDesc,{
		pasPassword: "Test password"
	});
*/
//}}}
/***
|''Name:''|PlasticCalendarPlugin|
|''Description:''|This plugin creates a custom Gregorian calendar|
|''Version:''|1.3.1|
|''Date:''|Mar 13, 2007|
|''Source:''|http://www.math.ist.utl.pt/~psoares/addons.html|
|''Documentation:''|[[PlasticCalendarPlugin Documentation|PlasticCalendarPluginDoc]]|
|''Author:''|Paulo Soares|
|''License:''|[[Creative Commons Attribution-Share Alike 3.0 License|http://creativecommons.org/licenses/by-sa/3.0/]]|
|''~CoreVersion:''|2.1.0|
***/
{{{
// --------------------------------------------------------------------
// Calendar
// --------------------------------------------------------------------

config.macros.calendar = {holidays: []};
config.macros.calendar.options = {
 // day week starts from (normally 0-Su or 1-Mo)
 calendarWeekStart: 0,
 calendarToday: "Today",
 calendarHoliday: "Holiday: ",
 calendarLongDateFormat: "0DD/0MM/YYYY",
 calendarShortDateFormat: "0DD/0MM",
 calendarTag: ["journal"]
};

/***************************************************************************
** Internal functions
***************************************************************************/
var cldTag;

config.macros.calendar.calendarIsHoliday = function(date) {
 var cm = config.macros.calendar;
 var longHoliday = date.formatString(cm.options.calendarLongDateFormat);
 var shortHoliday = date.formatString(cm.options.calendarShortDateFormat);
 for(var i = 0; i < cm.holidays.length; i++) {
 if(cm.holidays[i][0] == longHoliday || cm.holidays[i][0] == shortHoliday) {
 return cm.holidays[i];
 }
 }
 return null;
}

config.macros.calendar.onClickOtherDay = function(e) {
 var day = this.getAttribute('tiddlylink');
 story.displayTiddler(null,day,DEFAULT_EDIT_TEMPLATE);
 for(var i=0; i<cldTag.length;i++){
 story.setTiddlerTag(day, cldTag[i], 0);
 }
 story.focusTiddler(day,"text");
}

config.macros.calendar.getPopupText = function(title) {
 var popup_entries = store.getTiddlerText(title).split("\n");
 var popup = popup_entries[0];
 if(popup_entries.length>1) popup += " ...";
 return popup;
}

config.macros.calendar.findCalendar = function(child) {
 var parent;
 while (child && child.parentNode) {
 parent = child.parentNode;
 if (parent.id == "calendarWrapper") {
 return parent;
 }
 child = parent;
 }
 return null;
}

config.macros.calendar.selectDate = function(e) {
 if (!e) var e = window.event;
 var cm = config.macros.calendar;
 var calendar = cm.findCalendar(this);
 if (calendar) {
 var d = this.getAttribute("date");
 if (d != null) cm.makeCalendar(calendar, new Date(new Number(d)));
 }
 e.cancelBubble = true;
 if (e.stopPropagation) e.stopPropagation();
 return false;
}

config.macros.calendar.makeCalendar = function(calendar, dt_current) {
 var cm = config.macros.calendar;
 var currentDay = new Date(new Number(calendar.getAttribute("currentDay")));
 var setControls = calendar.getAttribute("setControls");
 calendar.setAttribute("date", dt_current.valueOf());

 while (calendar.hasChildNodes())
 calendar.removeChild(calendar.firstChild);

if(setControls==1){
 // get same date in the previous year
 var dt_prev_year = new Date(dt_current);
 dt_prev_year.setFullYear(dt_prev_year.getFullYear() - 1);
 if (dt_prev_year.getDate() != dt_current.getDate())
 dt_prev_year.setDate(0);

 // get same date in the next year
 var dt_next_year = new Date(dt_current);
 dt_next_year.setFullYear(dt_next_year.getFullYear() + 1);
 if (dt_next_year.getDate() != dt_current.getDate())
 dt_next_year.setDate(0);

 // get same date in the previous month
 var dt_prev_month = new Date(dt_current);
 dt_prev_month.setMonth(dt_prev_month.getMonth() - 1);
 if (dt_prev_month.getDate() != dt_current.getDate())
 dt_prev_month.setDate(0);

 // get same date in the next month
 var dt_next_month = new Date(dt_current);
 dt_next_month.setMonth(dt_next_month.getMonth() + 1);
 if (dt_next_month.getDate() != dt_current.getDate())
 dt_next_month.setDate(0);
}

 // get first day to display in the grid for current month
 var dt_firstday = new Date(dt_current);
 dt_firstday.setDate(1);
 dt_firstday.setDate(1 - (7 + dt_firstday.getDay() - cm.options.calendarWeekStart) % 7);

 var area, header;
 var line, cell, i;

 // 1 - calendar header table
 // 2 - print weekdays titles
 // 3 - calendar days table 
calendar.cellPadding = 0;
calendar.cellSpacing = 0;
area = createTiddlyElement(calendar, "tbody");

 // 1 - calendar header table
 header = createTiddlyElement(area,"tr", "calendarHeader");
 header.cellPadding = 0;
 header.cellSpacing = 0;

if(setControls==1){ 
var headerValues = [
 [ "<<", "selectYear", dt_prev_year.valueOf() ],
 [ "<", "selectMonth", dt_prev_month.valueOf() ],
 [ config.messages.dates.months[dt_current.getMonth()] + ' ' + dt_current.getFullYear(),
 "selectToday", currentDay.valueOf() ],
 [ ">", "selectMonth", dt_next_month.valueOf() ],
 [ ">>", "selectYear", dt_next_year.valueOf() ]
 ];

 for (i = 0; i < headerValues.length; ++i) {
 var link = createTiddlyElement(header,"td", null, null, headerValues[i][0]);
 if(i==2) link.colSpan=3;
 link.onclick = cm.selectDate;
 link.setAttribute("date", headerValues[i][2]);
 }
} else {
 var link = createTiddlyElement(header,"td", null, null, 
config.messages.dates.months[dt_current.getMonth()] + ' ' + dt_current.getFullYear());
link.colSpan=7;
}

 // 2 - print weekdays titles
 line = createTiddlyElement(area, "tr", "weekNames");
 for (var n = 0; n < 7; ++n) {
 createTiddlyElement(line, "td", null, null, config.messages.dates.shortDays[(cm.options.calendarWeekStart + n)%7]);
 }

 // 3 - calendar days table
 var dt_current_day = new Date(dt_firstday);
 var day_class;
 var title;
 var holiday;
 var popup;
 var clickHandler;

 while (dt_current_day.getMonth() == dt_current.getMonth() ||
 dt_current_day.getMonth() == dt_firstday.getMonth()) {

 // print row header
 line = createTiddlyElement(area, "tr", "calendarLine", null, null);
 for (var n_current_wday = 0; n_current_wday < 7; ++n_current_wday) {
 title = dt_current_day.formatString(cm.options.calendarLongDateFormat);
 clickHandler = cm.onClickOtherDay;
 popup = null;
 holiday = cm.calendarIsHoliday(dt_current_day);

 if (holiday != null) {
 // holidays
 day_class = (holiday.length==3)? holiday[2]: "holiDay";
 popup = cm.options.calendarHoliday + holiday[1];
 } else if (dt_current_day.getDay() == 0 || dt_current_day.getDay() == 6) {
 // weekend days
 day_class = "weekDay";
 } else {
 // print working days of current month
 day_class = "workingDay";
 }

if(dt_current_day.getMonth() == dt_current.getMonth()){
 if (currentDay.valueOf() == dt_current_day.valueOf()) {
 // print current date
 if (store.tiddlerExists(title)){
 // day has a tiddler associated with it
 day_class += " currentscheduledDay";
 clickHandler = onClickTiddlerLink;
 popup = cm.options.calendarToday + ": "+ cm.getPopupText(title);
 } else {
 day_class += " currentDay";
 popup = cm.options.calendarToday;
}
}


 if (store.tiddlerExists(title) && store.getTiddler(title).isTagged(cldTag[0]))  {
 // day has a tiddler associated with it
 day_class += " scheduledDay";
 clickHandler = onClickTiddlerLink;
 popup = cm.getPopupText(title);
 }
}

 // extra formatting for days of previous or next month
 if (dt_current_day.getMonth() != dt_current.getMonth()) {
 day_class += " otherMonthDay";
 }

 var text = dt_current_day.getDate();
 var cell = createTiddlyElement(line, "td", null, day_class, text);
 cell.onclick=clickHandler;
 cell.setAttribute("date", dt_current_day.valueOf());
 cell.setAttribute("tiddlyLink", title);
 if(popup) cell.setAttribute("title", popup);
 dt_current_day.setDate(dt_current_day.getDate()+1);
 }
 }
}

config.macros.calendar.handler = function(place,macroName,params,wikifier,paramString,tiddler) {
 var start_date = new Array();
 var date = new Date();
 var cldParams = paramString.parseParams('calendarParams', null, true);
 var cldYear = (cldParams[0].year)?parseFloat(cldParams[0].year[0]): date.getFullYear();
 var cldMonth = (cldParams[0].month)?parseFloat(cldParams[0].month[0]): date.getMonth();
 var n_months = (cldParams[0].numberMonths)?parseFloat(cldParams[0].numberMonths[0]): 1;
 var n_cols = (cldParams[0].numberColumns)?parseFloat(cldParams[0].numberColumns[0]): 3;
 cldTag = (cldParams[0].tag)?cldParams[0].tag[0].split("#"): config.macros.calendar.options.calendarTag;
 for(var i = 0; i < n_months; i++){
 start_date[i] = new Date(cldYear, cldMonth+i, 1);
 }
 var n_rows = Math.max(1,Math.ceil(n_months/n_cols));
 n_cols = Math.min(n_cols,n_months);
 var setControls=(n_months>1)? 0: 1;
 var currentDay = new Date();
 currentDay = new Date(currentDay.getFullYear(), currentDay.getMonth(), currentDay.getDate());
 var holder = createTiddlyElement(place, "table", null,"calendarHolder");
 var holderTable = createTiddlyElement(holder, "tbody");
 for(var i = 0; i < n_rows; i++){
 var holderLine = createTiddlyElement(holderTable, "tr");
 for(var j = 0; j < n_cols; j++){
 var holderCell = createTiddlyElement(holderLine, "td");
 if(n_cols*i+j+1<=n_months){
 var calendar = createTiddlyElement(holderCell, "table", "calendarWrapper");
 calendar.setAttribute("name", "calendarWrapper");
 calendar.setAttribute("setControls", setControls);
 calendar.setAttribute("currentDay", currentDay.valueOf());
 config.macros.calendar.makeCalendar(calendar, start_date[n_cols*i+j]);
 }
 }
 }
}

function refreshCalendars(hint) {
 var calendars = document.getElementsByName("calendarWrapper");
 var i, c;
 for (i = 0; i < calendars.length; ++i) {
 c = calendars.item(i);
 if (c.id == "calendarWrapper") {
 config.macros.calendar.makeCalendar(c, new Date(new Number(c.getAttribute("date"))));
 }
 }
}

store.addNotification(null, refreshCalendars);

setStylesheet("/***\n!Calendar Styles\n***/\n/*{{{*/\n .viewer .calendarHolder {\n margin-left: auto;\n margin-right: auto;\n border: none;\n}\n\n .viewer .calendarHolder table {\n border: none;\n margin: 0;\n}\n\n .viewer .calendarHolder tr {\n border: none;\n vertical-align: top;\n}\n\n .viewer .calendarHolder td {\n border: none;\n vertical-align: top;\n}\n\n .viewer #calendarWrapper {\n width: 21em;\n border: 2px solid #4682b4;\n cursor: pointer;\n}\n\n #calendarWrapper #calendarLine td {\n height: 2.5em;\n}\n\n #calendarWrapper tr {\n border:none;\n}\n\n #calendarWrapper td {\n text-align: center;\n vertical-align: middle;\n width: 14.28%;\n border:none;\n}\n\n #calendarWrapper #calendarHeader td{\n color: #ffffff;\n background-color: #4682b4;\n height: 2em;\n}\n\n #calendarWrapper #weekNames td {\n color: #ffffff;\n background-color: #87cefa;\n height: 2em;\n}\n\n #calendarWrapper .weekDay {\n background-color: #ccff99;\n}\n\n #calendarWrapper .holiDay {\n background-color: #9acd32;\n}\n\n #calendarWrapper .currentDay {\n border: solid #ff0000 2px;\n font-weight: bold;\n}\n\n #calendarWrapper .currentscheduledDay {\n border: solid #ff0000 2px;\n font-weight: bold;\n}\n\n #calendarWrapper .workingDay {\n background-color: #ffffff;\n}\n\n #calendarWrapper .scheduledDay {\n border: solid orange 2px;\n}\n\n #calendarWrapper .otherMonthDay {\n background-color: #999;\n}\n\n/*}}}*/","CalendarStyles");

config.shadowTiddlers.PlasticCalendarPluginDoc="The documentation is missing. It is available [[here|http://www.math.ist.utl.pt/~psoares/addons.html#PlasticCalendarPluginDoc]].";
}}}
version.extensions.Holidays = {
 major: 1, minor: 1, revision: 0,
 date: new Date(2006, 4, 18), 
 type: 'config'
};

config.macros.calendar.holidays = [ ["01/01", "New Year's day"], ["25/12", "Christmas day", "Christian"] ];
[[BACK|L02.03]]
!!!Print Code Block A
At this point in the program the fasta text file has been slurped into the @FILE array variable: 
<html><img src="02/workcodefileread.png" style="height:150px"></html>

We want to look at just the first 3 entries in @FILE to see what's really there, so uncomment the code lines to execute the print functions. Note that the actual value of $entry will be framed between brackets ([ and ]) and that the string "\n- - - - - - - \n" is added to separate the individual elements in @FILE:
<html><img src="02/workcodeA.png" style="height:175px"></html>

The screen dump will look something like this. I have used color highlighting to better delineate what is what and, more importantly, I have ADDED {{blue{blue}}}-''bold'' line break characters "\n" to this representation to show you just where they are inside each //$entry//. The first entry in the array (which is indexed as $FILE[0]) just contains the first ">". The others (except the very last entry in the file) all end with two characters that we do not want in the final sequence: "\n" and ">". ''NOTE:'' with the formatting imposed by the line breaks removed, the value of the second element ($FILE[1]) is:
| Test Fasta 001''\n''agctcgatcgatggcgcgatatagcgcgtatagcgctagggatcgcgcgatagcgatagcgat''\n''>|

!!!
<html><img src="02/workprintA.png" style="height:300px"></html>
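A minimal sketch of the idea behind Print Code Block A, assuming the file was read with the input record separator {{{$/}}} set to ">". The two-record FASTA text and the in-memory filehandle are made up for illustration so that the sketch runs without an input file; the actual course code shown in the images may differ in its details.
{{{
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical FASTA text standing in for the real input file
my $fasta = ">Test Fasta 001\nagctcgatcg\natggcgcgat\n>Test Fasta 002\nttagcgatag\n";

my @FILE;
{
    local $/ = ">";    # each read now stops at (and includes) the next ">"
    open(my $fh, "<", \$fasta) or die "Cannot open in-memory file: $!";
    @FILE = <$fh>;
    close($fh);
}

# Print only the first 3 entries, framed by [ and ]
my $shown = 0;
foreach my $entry (@FILE) {
    print "[$entry]\n- - - - - - - \n";
    $shown += 1;
    last if $shown >= 3;
}
}}}
Because every read runs up to and including the next ">", the first element holds only ">" and the middle elements end in "\n" followed by ">".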

[[BACK|L02.03]]
!!!
[[BACK|L02.03]]
!!!Print Code Block B
At this point in the program we need to divide each entry in @FILE into separate lines so that we can get at the header information and sequence information: 
<html><img src="02/workcodelinesplit.png" style="height:100px"></html>

We want to look at the line elements (separated by \n) in each $orf within @FILE.  Uncomment the code lines in B to execute the print functions. The //''die''// statement is placed so that we will just look at the first $orf. Note that the actual values of $line will be framed between brackets ([ and ]) and that the string "\n- - - - -\n" is added to separate the individual elements in @Lines.
<html><img src="02/workcodeB.png" style="height:175px"></html>

The first thing to note is that there are no {{blue{blue}}}-''bold'' line break characters "\n" within any //$line//. The ''split'' function divides $orf at the line breaks \n but removes them when it does. The first $line is the header. The next $line values hold the sequence chunks. The last $line is the ">" (except in the very last $orf entry in @FILE - //thanks Glenn//).
!!!
<html><img src="02/workprintB.png" style="height:250px"></html>

| Here's a tweak: what would happen if you changed the split operation by substituting ''agga'' for ''\n''?  Would it break your computer? The cool thing is that the split operator can work on any character string, not just individual characters. |
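To try the tweak without touching the main program, here is a small sketch. The $orf value is a made-up entry chosen so that it contains the string "agga"; the variable names follow the course code but the data are only for illustration.
{{{
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical entry from @FILE: header, two sequence lines, trailing ">"
my $orf = "Test Fasta 001\nagctaggatcg\natggcg\n>";

my @Lines = split(/\n/, $orf);      # split at line breaks (the \n are removed)
my @Parts = split(/agga/, $orf);    # split at the 4-letter string instead

foreach my $line (@Lines) { print "[$line]\n- - - - -\n"; }
print "\nNow splitting on agga:\n";
foreach my $part (@Parts) { print "[$part]\n- - - - -\n"; }
}}}
In the second case the line breaks stay inside the pieces, and it is the "agga" that disappears.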

[[BACK|L02.03]]
!!!
[[BACK|L02.03]]
!!!Print Code Block C
At this point in the program we have concatenated the $line sequence parts into a string variable called $seq. A regular expression substitution is then used to remove the ">" character if it exists. If it doesn't exist, then nothing is done to $seq. The substitution operator works like this: {{{s/find-pattern/replace-pattern/}}}. Here, the replace pattern is empty (nothing), so the result is a simple deletion.
{{{
$seq =~ s/>//;  # remove the ">" at the end if it is there
}}}
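A small sketch of how the substitution behaves on a sequence with and without the trailing ">". The two test strings are made up for illustration; the substitution itself is the one used above.
{{{
#!/usr/bin/perl
use strict;
use warnings;

my @examples = ("agctcgatcg>", "agctcgatcg");   # hypothetical sequences
my @cleaned;

foreach my $raw (@examples) {
    my $seq  = $raw;                 # work on a copy so $raw stays intact
    my $hits = ($seq =~ s/>//);      # true only if a ">" was found and removed
    push(@cleaned, $seq);
    print "[$raw] becomes [$seq] (", ($hits ? "one > removed" : "nothing to remove"), ")\n";
}
}}}
Without a modifier the operator removes only the first ">" it finds, which is enough here because each entry carries at most one. Adding the g modifier, {{{s/>//g}}}, would remove every occurrence.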

We want to look at the length of each sequence as a way to check the code.  Uncomment the code lines in C to execute the print functions. There is no //''die''// statement here so the whole program will run. 
<html><img src="02/workcodeC.png" style="height:125px"></html>

The base sequence used in this file is 63 nucleotides in length. So in the output, we are looking for multiples of 63 ending on the tenth sequence at 630 nts.
!!!
<html><img src="02/workprintC.png" style="height:500px"></html>


[[BACK|L02.03]]
!!!
[[Back to BLAST project page|BLASTproject]]
!!!
!Random Protein Assembly
This script will generate random protein sequences for a genome with defined %NT composition data.
{{{
#!/usr/bin/perl
use strict;
$|=1;

# - - - - - H E A D E R - - - - - - - - - - - - - - - - - - -
# Given an input of nucleotide frequencies and an amino acid
# translation table, this script outputs a FASTA formatted file 
# of random protein sequences > 200 AAs in length.
# AGM2008

# - - - - - U S E R   V A R I A B L E S - - - - - - - - - - -
my $outfile = "BabelSeqs-02DEC.txt";

# Nucleotide Frequencies: p(A), p(G), p(T), p(C)
my @Freq = (0.24, 0.28, 0.22, 0.26);
my $NTgene = 2000;              # raw nt gene size
my $SizeThresh = 200;           # minimum number of amino acids
my $codontable = "Standard";

# - - - - - G L O B A L  V A R I A B L E S  - - - - - -
my @NT = qw | A G T C |;
my @SEED;
my %CodonTable;
my %P;
foreach my $i (0..3)
{	$P{$NT[$i]} = $Freq[$i]; }

# SET UP the SEED sequence for nt selection . . . . 
my $start = 0;
my $end = 0;
foreach my $nt (@NT)
{	$start = $end;
	$end = $start + $P{$nt}*1000-1;
	foreach my $i ($start..$end)
	{	push(@SEED, $nt); }
}

# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# - - - - - M A I N - - - - - - - - - - - - - - - - - - - - -
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
	&LoadCodons($codontable,\%CodonTable);
	open(OUT,">$outfile") or die "Cannot write $outfile: $!\n";
	my $count = 1;
	my $cycles = 1;
	while ($count <= 100)
	{	&Shuffle(\@SEED);
		my @SEQs = ();
		my $SEQ1 = "";
		
	# Build the target sequence . . . . . . 
		foreach my $i (0..$NTgene-1)
		{	$SEQ1 .= $SEED[int(rand(scalar(@SEED)))]; }   # any element of @SEED, including the last

	# Extract six reading frames . . . 		
		push(@SEQs,$SEQ1);
		my $SEQ2 = substr($SEQ1,1,length($SEQ1));
		push(@SEQs,$SEQ2);		
		my $SEQ3 = substr($SEQ1,2,length($SEQ1));
		push(@SEQs,$SEQ3);
		my $SEQ4 = reverse($SEQ1);
		$SEQ4 =~ tr/ATGC/TACG/;
		push(@SEQs,$SEQ4);
		my $SEQ5 = substr($SEQ4,1,length($SEQ4));
		push(@SEQs,$SEQ5);		
		my $SEQ6 = substr($SEQ4,2,length($SEQ4));
		push(@SEQs,$SEQ6);		
		
	# Look for open reading frames . . . . .  
		foreach my $SEQ (@SEQs)
		{	my $PSEQ = &TranslateSeq($SEQ);
			my $lim = int(($NTgene/3)/4); 
			$PSEQ =~ s/^.*?M/M/;             # Trim everything upstream of the first M
			$PSEQ =~ s/\*.*?$/*/;            # Truncate at the first stop (*) after it
			if ($PSEQ =~ m/^(M[A-Z]+\*$)/ && length($PSEQ)>$SizeThresh )
			{	my $len = length($PSEQ);
				my ($Hnt, $Hcd, $Haa) = &Entropy($SEQ,$PSEQ);
				print "$cycles = $count. $len-AAs, Hnt= $Hnt; Hcd= $Hcd; Haa= $Haa\n$PSEQ\n\n"; 
				print OUT "> $len-AAs, run count= $count, run cycle= $cycles, Hnt=$Hnt; Hcd=$Hcd; Haa=$Haa\n$PSEQ\n";
				$count += 1;
			}
			
		}
		$cycles += 1;
	}
	
	close(OUT);
	
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# - - - - - S U B R O U T I N E S - - - - - - - - - - - - - -
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# ---------------------------------------------------------
sub Entropy
{
	my ($seq, $pseq) = @_;
	my ($H1,$H2,$H3);
	my @AAref = qw | A C D E F G H I K L M N P Q R S T V W Y |;
# NT entropy . . . . . . 
	my %C;
	$C{"A"} = ($seq =~ tr/A/A/)/length($seq);
	$C{"T"} = ($seq =~ tr/T/T/)/length($seq);
	$C{"G"} = ($seq =~ tr/G/G/)/length($seq);
	$C{"C"} = ($seq =~ tr/C/C/)/length($seq);
	foreach my $nt (@NT)
	{	$H1 += -1*$C{$nt}*&log2($C{$nt}); }
	
# codon entropy . . . . . . 
	my %CC;
	for (my $i=0; $i <= length($seq)-3; $i += 3)     # complete codons only
	{	$CC{substr($seq,$i,3)} += 1/(length($seq)/3);  }
	foreach my $codon (keys %CodonTable)
	{	if ($CC{$codon} > 0)
		{	$H2 += -1*$CC{$codon}*&log2($CC{$codon}); }
	}
	
# amino acid entropy . . . . . 
	my %AA;
	for (my $i=0; $i < length($pseq); $i += 1)       # walk the protein, not the nt sequence
	{	$AA{substr($pseq,$i,1)} += 1/(length($pseq));  }	
	foreach my $aa (@AAref)
	{	if ($AA{$aa} > 0)
		{	$H3 += -1*$AA{$aa}*&log2($AA{$aa});}
	}
	
	return(&Round($H1),&Round($H2),&Round($H3));
}
# ---------------------------------------------------------
sub Round
{	my $n = shift;
	return(int($n * 10000 + 0.5)/10000);   # round to 4 decimal places
}
# ---------------------------------------------------------
sub log2 
{	my $n = shift;
	return (log($n)/log(2));
}
# ---------------------------------------------------------
sub LoadCodons
{
	$/=">";
	my ($Table, $Hash) = @_;
	my @TABLE = <DATA>;
	foreach my $j (@TABLE)
	{	if ($j =~ m/^ (\d){1,2} $Table/)
		{	my @k = split(/\n/,$j);
			$k[1] =~ s/Amino  //;
			foreach my $i (1..3)
			{	$k[$i+1] =~ s/Base$i  //; }
			my @AA = split(//,$k[1]);
			my @B1 = split(//,$k[2]);
			my @B2 = split(//,$k[3]);
			my @B3 = split(//,$k[4]);
			foreach my $i (0..63)
			{	${$Hash}{$B1[$i].$B2[$i].$B3[$i]} = $AA[$i]; }
		}
	}
	$/="\n"; # reset to default
}
# ---------------------------------------------------------
sub Shuffle
{
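    my $array = shift(@_);
    return unless @{$array};        # nothing to shuffle in an empty array
    my $i = @{$array};              # start at the array size so the last element can move too
    while (--$i) 
	{	my $j = int rand($i+1);
        @{$array}[$i,$j] = @{$array}[$j,$i];
    }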
}
# ---------------------------------------------------------
sub TranslateSeq
{	# Convert the NT sequence into AAs . . . . . .
	my $seq = $_[0];
	my $protein = "";           
	for (my $i=0; $i <= length($seq)-3; $i += 3)     # complete codons only
	{	$protein .= $CodonTable{substr($seq,$i,3)};  }
	return($protein);
}


# ---------------------------------------------------------
# - - - - - EOF - - - - - - - - - - - - - - - - - - - - - - 
# ---------------------------------------------------------
__END__
> 0 Codon Translation Tables
http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi?mode=c#SG1
> 1 Standard
Amino  FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
Base1  TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
Base2  TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
Base3  TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
> 11 Bacteria and Archea
Amino  FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
Base1  TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
Base2  TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
Base3  TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
# - - - - - End of DATA - - - - - - - - - - - - - - - - - - - - - -
}}}
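The six reading frames step in the script can be tried on its own with a short sequence. This is a sketch, not part of the script above: the subroutine names RevComp and SixFrames and the 9 nt test sequence are made up for illustration.
{{{
#!/usr/bin/perl
use strict;
use warnings;

# Reverse complement of an upper-case DNA string
sub RevComp
{   my $seq = reverse($_[0]);       # reverse the string
    $seq =~ tr/ATGC/TACG/;          # then swap each base for its complement
    return $seq;
}

# Frames 1-3 on the forward strand, then frames 1-3 on the reverse strand
sub SixFrames
{   my $seq = shift;
    my @frames;
    foreach my $strand ($seq, RevComp($seq))
    {   foreach my $offset (0..2)
        {   push(@frames, substr($strand, $offset)); }
    }
    return @frames;
}

my @SEQs = SixFrames("ATGAAATAG");  # hypothetical test sequence
foreach my $i (0..$#SEQs)
{   print "Frame ", $i+1, ": $SEQs[$i]\n"; }
}}}
Calling substr with only an offset returns everything from that position to the end of the string, so no length argument is needed.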
<html><img src="02/PMcover.jpg" style="height:200px"></html>
//Album: ''Bayou Country'', Creedence Clearwater Revival, 1969.//
!!
"Proud Mary" is a song written by American singer and guitarist John Fogerty. It was first recorded by rock band Creedence Clearwater Revival (in which Fogerty played lead guitar and sang lead vocals) on the 1969 album Bayou Country. Released as a single in January of 1969, it became the band's first top ten hit on the U.S. Pop chart, peaking at number two. 
(from http://en.wikipedia.org/wiki/Proud_Mary).

| Contrary to the popular opinion in the class, Tina Turner did not write this song. She did a cover version in 1971 that became one of her signature songs, but she didn't write it. |

You can see the CCR version on [[YouTube|http://www.youtube.com/watch?v=1oqSxkGCAm8]]
<html><img src="02/Fogerty.png" style="height:200px"></html>

[[BACK|L02]]
!

[[BACK to Log of Odds Page|LOD]]
!!!
!Entropy Calc Skeleton Script:
{{{
#!/usr/bin/perl
use strict;

# - - - - - H E A D E R - - - - - - - - - - - - - - - -
# 22OCT Lecture 7.
# Quick Entropy Calculation

# - - - - - U S E R   V A R I A B L E S - - - - - - - -
my $QuerySeq = "GACTAATAATGACGCTAGCTAGCTAGCTAGCATTATATAGGCGATATCAG";

# Nucleotide Frequencies: p(A), p(G), p(T), p(C)
	my @Fcode = (0.25, 0.28, 0.21, 0.26);
	my @Fnot = (0.15, 0.38, 0.13,0.34);

# - - - - - G L O B A L  V A R I A B L E S  - - - - - -
my @NT = qw | A G T C |;
my %Pcode;
my %Pnot;
foreach my $i (0..3)
{	$Pcode{$NT[$i]} = $Fcode[$i]; 
	$Pnot{$NT[$i]} = $Fnot[$i];
}

# - - - - - - - - - - - - - - - - - - - - - - - - - - -
# - - - - - M A I N - - - - - - - - - - - - - - - - - -
# - - - - - - - - - - - - - - - - - - - - - - - - - - -
print "\n\nRUNNING . . . . \n\n";
print "Query Sequence: $QuerySeq\n";
print "\nNT frequencies in coding sequence:\n";
foreach my $nt (@NT)
{	print "      $nt = $Pcode{$nt}\n"; }
print "\nNT frequencies in noncoding sequence:\n";
foreach my $nt (@NT)
{	print "      $nt = $Pnot{$nt}\n"; }
print "\n\n";
# - - - - - - - - - - - - - - - - - - - - - - - - - - -


print "\n\n   DONE   \n\n\n";
# - - - - - - - - - - - - - - - - - - - - - - - - - - -
# - - - - - S U B R O U T I N E S - - - - - - - - - - -
# - - - - - - - - - - - - - - - - - - - - - - - - - - -
# - - - - - - - - - - - - - - - - - - - - - - - - - - -

}}}
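One possible way to fill in the calculation, shown only as a sketch: it computes the Shannon entropy (the sum of -p*log2(p) over the four nucleotides) for the two frequency tables defined in the skeleton. The subroutine name ShannonEntropy is an assumption made for this example, and the lecture may have intended a different calculation.
{{{
#!/usr/bin/perl
use strict;
use warnings;

# Nucleotide Frequencies: p(A), p(G), p(T), p(C), copied from the skeleton
my @Fcode = (0.25, 0.28, 0.21, 0.26);
my @Fnot  = (0.15, 0.38, 0.13, 0.34);

sub log2
{   my $n = shift;
    return (log($n)/log(2));
}

# Entropy in bits of a list of probabilities
sub ShannonEntropy
{   my @probs = @_;
    my $H = 0;
    foreach my $p (@probs)
    {   next unless $p > 0;         # a zero probability contributes nothing
        $H -= $p * log2($p);
    }
    return $H;
}

printf "H(coding)    = %.4f bits\n", ShannonEntropy(@Fcode);
printf "H(noncoding) = %.4f bits\n", ShannonEntropy(@Fnot);
}}}
Four equally likely nucleotides give the maximum of 2 bits, so both values should come out below 2.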
!!!
[[BACK to Working Code|CodeWorks]]
!!!
!FASTA read:
Opens a nucleotide FASTA file and stores the header and sequence information for each ORF in the global %NTs hash. 

{{{
    # Must declare this global HASH array
            my %NTs; 

    # Call subroutine with the name of the input file
            &ReadFasta($infile);
        
# Subroutine Code:
# - - - - - - - - - - - - - - - - - - - - - - - - - -
sub ReadFasta
{   my $file = $_[0];
	$/=">";
	open(FASTA,"<$file") or die "\n\n\n Nada $file\n\n\n";
	my @FILE=<FASTA>;   # declared with my so the code compiles under "use strict"
	close(FASTA);
	shift(@FILE); 
	foreach my $orf (@FILE)
	{	my @Lines = split(/\n/,$orf);
		my $name = $Lines[0];
		my $seq = "";
		foreach my $i (1..$#Lines)
		{	$seq .= $Lines[$i]; }
		$seq =~ s/>//;
		$NTs{$name} = $seq;
	}
       $/="\n"; # reset input break back to default
}
# - - - - - - - - - - - - - - - - - - - - - - - - - -

}}}
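A self-contained variant for experimenting, offered as a sketch rather than a replacement. It assumes you would rather get the hash back as a return value than fill a global; the name ReadFastaToHash and the two-record test file are made up for illustration. File::Temp is a core module, so nothing extra needs to be installed.
{{{
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

sub ReadFastaToHash
{   my ($file) = @_;
    my %seqs;
    local $/ = ">";                 # restored automatically when the sub returns
    open(my $fh, "<", $file) or die "Cannot open $file: $!";
    my @records = <$fh>;
    close($fh);
    shift(@records);                # drop the first element, which is just ">"
    foreach my $orf (@records)
    {   my @lines = split(/\n/, $orf);
        my $name  = shift(@lines);  # first line is the header
        my $seq   = join("", @lines);
        $seq =~ s/>//;              # remove the trailing ">" if present
        $seqs{$name} = $seq;
    }
    return %seqs;
}

# Write a small hypothetical FASTA file so the example can run anywhere
my ($tmp, $tmpname) = tempfile(UNLINK => 1);
print $tmp ">orf1\nagct\ncgat\n>orf2\nttaa\n";
close($tmp);

my %NTs = ReadFastaToHash($tmpname);
foreach my $name (sort keys %NTs)
{   printf "%s: %d nt\n", $name, length($NTs{$name}); }
}}}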
 
!Course Resources:
1. [[Course text book Dwyer's "Genomic PERL"|Text Book]]
2. [[PERL]]
3. [[Running PERL]]
4. [[Code editors|Editors]]
5. [[DBI Bioinformatic Cluster, Biowolf|Biowolf]]
6. [[Secure Shell File Transfer|SSH]]
7. [[Emily's PC command window summary|00/EmilyBasicWindowsStuff.pdf]]
8. [[Command Reference Page|Commands]]
!

!!!
[[Back to Working Code List|CodeWorks]]
!!!
!Rounding significant digits
It's often confusing for us mere humans to cope with 12 digit numbers (and mostly meaningless as well in terms of biological significance). Here's a routine to round numbers that relies on the ''int'' function, which just drops any decimal digits. 
__''The number of digits retained is set by the power of ten (exponents) used in the calculation.''__
{{{
# Call the subroutine:
       &Round(--any numeric variable or expression--);

# Subroutine Code:
# - - - - - - - - - - - - - - - - - - - - - - - - - -
sub Round
{ 	my $x = $_[0];
	$x = (int(($x*10**4) + 0.5)/10**4);   # 10**4 keeps 4 decimal places
	return $x;
}
# - - - - - - - - - - - - - - - - - - - - - - - - - -
}}}
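A sketch of the same idea with the number of digits passed in as a second argument. The name RoundTo is made up for this example. Because ''int'' truncates toward zero, the "+ 0.5" trick only rounds correctly for non-negative numbers.
{{{
#!/usr/bin/perl
use strict;
use warnings;

sub RoundTo
{   my ($x, $digits) = @_;
    my $scale = 10**$digits;                # the power of ten sets the digits kept
    return int($x * $scale + 0.5) / $scale;
}

print RoundTo(3.14159265, 4), "\n";         # keep 4 decimal places
print RoundTo(3.14159265, 2), "\n";         # keep 2 decimal places
print sprintf("%.4f", 3.14159265), "\n";    # built-in alternative for display
}}}
When the rounded value is only needed for printing, the built-in ''sprintf'' does the job without a subroutine.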
[[Back to Assignment 3|WordCompareBenchmark]]
!!!
!BIOWOLF Shell Script
You will use the script below to send your jobs to @@[[Biowolf]]@@, the parallel Sun Grid Engine cluster at DBI. The code is listed here, but it is better if you directly download the file:
@@[[DIRECT DOWNLOAD 00-RUN.sh|05/00-RUN.sh]]@@

{{{
#!/bin/sh -x
#$ -cwd
#$ -S /bin/sh
#$ -j y
#$ -pe mpi 1
#$ -M email-address-here
#$ -m ae
#$ -N jobname-here

perl 01.4-WordCompare-LOOP.pl



}}}
!PERL on a MS Windows operating system
Active PERL ([[Active State|http://www.activestate.com/Products/activeperl/]]) is provided as a Microsoft Installer package that will handle most of the details of the install. The default install location is "C:\Perl". It doesn't matter where PERL is installed; what matters is that you know where it is. Also, the installer should add the perl path (which with a default install is "C:\Perl\bin") to the PATH environment variable. Instructions for checking this can be found [[HERE| http://www.computerhope.com/issues/ch000549.htm]]. You can check if PERL is installed by opening a terminal window and, at the command prompt (here just designated "prompt>"), entering "perl -v" and hitting <enter>:
{{{
prompt> perl -v
}}}
If PERL is installed, you will get a brief summary of the version that is on your machine.

!PERL test script
Here's the archetypical "hello world" program to test your perl skills at this point. 
''1.'' Copy the lines below and paste them into a new text file via your favorite flavor of text editor.
{{{
#!PATH-2-PERL
# - - - - - M A I N - - - - - - - - - 
print "\n\n\nLeft a good job in the city,\n";
print "workin for the man every night and day\n\n\n";
# - - - - - EOF - - - - - - - - - - - 
}}}

''2.'' EDIT the "PATH-2-PERL" string by putting in the exact path to the perl program on your machine. On a Windows PC with a default Active Perl install, perl.exe resides in "C:\Perl\bin", so the full path should be "C:\Perl\bin\perl.exe". On unix-flavor machines, you can find where perl is installed by entering the following into a command window:
{{{
prompt> which perl
}}}

''3.'' SAVE the file as "00-TestPerl.pl"

''4.'' Open a command terminal window and navigate to the folder where you just saved "00-TestPerl.pl". This is usually accomplished using these two commands (where folder = the name of a folder within the current folder where the prompt is located):
{{{
prompt> cd folder  # go down one level to folder
prompt> cd ..        # go up one level in tree
}}}
For those of you with Windows PCs, more information about navigating in command mode can be found [[HERE| http://ourworld.compuserve.com/homepages/jsuebersax/dos.htm#paths]]

''5.'' Once you are in the folder where "00-TestPerl.pl" is located, you can run the script as follows:
@@FOR WINDOWS PCs@@
Enter the full filename at the command prompt:
{{{
prompt> 00-TestPerl.pl
}}}
@@FOR UNIX-FLAVORS@@
Before the script can be run, it has to be "given" permission to run as an executable file. This is accomplished with the //chmod// command:
{{{
prompt> chmod 755 00-TestPerl.pl
prompt> ./00-TestPerl.pl
}}}

''6.'' The print statement should execute sending this message to the screen (or whatever message you put into the script):
{{{
Left a good job in the city, 
workin for the man every night and day
}}}
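If the script does not run, it can help to ask perl itself where it lives. The sketch below uses only Perl's built-in special variables; the file name "00-WhichPerl.pl" is just a suggestion. Run it with "perl 00-WhichPerl.pl" so that the first line of the file does not matter.
{{{
#!/usr/bin/perl
use strict;
use warnings;

# $^X = path of the perl program that is running this script
# $]  = version number of that perl
# $^O = name of the operating system perl was built for
print "Perl executable:  ", $^X, "\n";
print "Perl version:     ", $],  "\n";
print "Operating system: ", $^O, "\n";
}}}
The path reported on the first line is a reasonable candidate for the "PATH-2-PERL" string in step 2.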
@@ Be the 7th person to email the name of this song and you will win, . . . not 2,  . . . not 4, but ''+5'' extra credit points on your first homework assignment. @@ 

[[BACK|Resource Index]]
!
@@. ''SSH is an encrypted protocol for remote login and file transfer that replaces telnet or ftp'' .@@
!!1. For Windows OS: 
SSH and SSHclient are available from the UD Network software pages at http://www.udel.edu/network:
[img[00/udnetwork01.png]]
[img[00/udnetwork02.png]]


!!2. For Mac OS X:
It's already installed and running, but in order to mount a remote system as a system drive, you need the macfuse update (at least for Tiger): http://code.google.com/p/macfuse/
<html><img src="00/macfuse.png" style="height:400px"></html>


!!3. For linux/unix flavors:
It is already installed, running and can readily be mounted as a system drive. I like the KDE desktop file manager because it automatically mounts ssh connections as "drives".
!!!
//{{{
window.reportSearchResults=function(text,matches)
{
	var title=config.macros.search.reportTitle
	var q = config.options.chkRegExpSearch ? "/" : "'";
	var body="\n";

	// numbered list of links to matching tiddlers
	body+="\n<<<";
	for(var t=0;t<matches.length;t++) {
		var date=config.options.chkSearchByDate?(matches[t].modified.formatString('YYYY.0MM.0DD 0hh:0mm')+" "):"";
		body+="\n# "+date+"[["+matches[t].title+"]]";
	}
	body+="\n<<<\n";

	// create/update the tiddler
	var tiddler=store.getTiddler(title); if (!tiddler) tiddler=new Tiddler();
	tiddler.set(title,body,config.options.txtUserName,(new Date()),"excludeLists excludeSearch");
	store.addTiddler(tiddler); story.closeTiddler(title);

	// use alternate "search again" label in <<search>> macro
	var oldprompt=config.macros.search.label;
	config.macros.search.label="search again";

	// render/refresh tiddler
	story.displayTiddler(null,title,1);
	store.notify(title,true);

	// restore standard search label
	config.macros.search.label=oldprompt;

}

//}}}
/***
|Name|SearchOptionsPlugin|
|Source|http://www.TiddlyTools.com/#SearchOptionsPlugin|
|Documentation|http://www.TiddlyTools.com/#SearchOptionsPluginInfo|
|Version|2.6.1|
|Author|Eric Shulman - ELS Design Studios|
|License|http://www.TiddlyTools.com/#LegalStatements <br>and [[Creative Commons Attribution-ShareAlike 2.5 License|http://creativecommons.org/licenses/by-sa/2.5/]]|
|~CoreVersion|2.1|
|Type|plugin|
|Requires||
|Overrides|Story.prototype.search, TiddlyWiki.prototype.search, config.macros.search.onKeyPress|
|Description|extend core search function with additional user-configurable options|
Extend core search function with additional user-configurable options including generating a ''list of matching tiddlers'' instead of immediately displaying all matches.
!!!!!Documentation
>see [[SearchOptionsPluginInfo]]
!!!!!Configuration
<<<
<<option chkSearchTitles>> Search in titles
<<option chkSearchText>> Search in tiddler text
<<option chkSearchTags>> Search in tags
<<option chkSearchFields>> Search in data fields
<<option chkSearchShadows>> Search shadow tiddlers
<<option chkSearchTitlesFirst>> Show title matches first
<<option chkSearchByDate>> Sort matching tiddlers by date
<<option chkSearchList>> Show list of matches in [[SearchResults]]
<<option chkSearchIncremental>> Incremental (key-by-key) searching
<<<
!!!!!Revisions
<<<
2007.02.17 [2.6.1] added redefinition of config.macros.search.onKeyPress() to restore check to bypass key-by-key searching (i.e., when chkSearchIncremental==false), which had been unintentionally removed with v2.6.0
|please see [[SearchOptionsPluginInfo]] for additional revision details|
2005.10.18 [1.0.0] Initial Release
<<<
!!!!!Code
***/
//{{{
version.extensions.searchOptions = {major: 2, minor: 6, revision: 1, date: new Date(2007,1,17)}; // JavaScript months are 0-based: 1 = February

if (config.options.chkSearchTitles===undefined) config.options.chkSearchTitles=true;
if (config.options.chkSearchText===undefined) config.options.chkSearchText=true;
if (config.options.chkSearchTags===undefined) config.options.chkSearchTags=true;
if (config.options.chkSearchFields===undefined) config.options.chkSearchFields=true;
if (config.options.chkSearchTitlesFirst===undefined) config.options.chkSearchTitlesFirst=false;
if (config.options.chkSearchList===undefined) config.options.chkSearchList=false;
if (config.options.chkSearchByDate===undefined) config.options.chkSearchByDate=false;
if (config.options.chkSearchIncremental===undefined) config.options.chkSearchIncremental=true;
if (config.options.chkSearchShadows===undefined) config.options.chkSearchShadows=false;

if (config.optionsDesc) {
	config.optionsDesc.chkSearchTitles="Search in tiddler titles";
	config.optionsDesc.chkSearchText="Search in tiddler text";
	config.optionsDesc.chkSearchTags="Search in tiddler tags";
	config.optionsDesc.chkSearchFields="Search in tiddler data fields";
	config.optionsDesc.chkSearchShadows="Search in shadow tiddlers";
	config.optionsDesc.chkSearchTitlesFirst="Search results show title matches first";
	config.optionsDesc.chkSearchList="Search results show list of matching tiddlers";
	config.optionsDesc.chkSearchByDate="Search results sorted by modification date ";
	config.optionsDesc.chkSearchIncremental="Incremental searching";
} else {
	config.shadowTiddlers.AdvancedOptions += "\n<<option chkSearchTitles>> Search in tiddler titles"
		+"\n<<option chkSearchText>> Search in tiddler text"
		+"\n<<option chkSearchTags>> Search in tiddler tags"
		+"\n<<option chkSearchFields>> Search in tiddler data fields"
		+"\n<<option chkSearchShadows>> Search in shadow tiddlers"
		+"\n<<option chkSearchTitlesFirst>> Search results show title matches first"
		+"\n<<option chkSearchList>> Search results show list of matching tiddlers"
		+"\n<<option chkSearchByDate>> Search results sorted by modification date"
		+"\n<<option chkSearchIncremental>> Incremental searching";
}

if (config.macros.search.reportTitle==undefined)
	config.macros.search.reportTitle="SearchResults";

config.macros.search.onKeyPress = function(e)
{
	if(!e) var e = window.event;
	switch(e.keyCode)
		{
		case 13: // Ctrl-Enter
		case 10: // Ctrl-Enter on IE PC
			config.macros.search.doSearch(this);
			break;
		case 27: // Escape
			this.value = "";
			clearMessage();
			break;
		}
	if (config.options.chkSearchIncremental) {
		if(this.value.length > 2)
			{
			if(this.value != this.getAttribute("lastSearchText"))
				{
				if(config.macros.search.timeout)
					clearTimeout(config.macros.search.timeout);
				var txt = this;
				config.macros.search.timeout = setTimeout(function() {config.macros.search.doSearch(txt);},500);
				}
			}
		else
			{
			if(config.macros.search.timeout)
				clearTimeout(config.macros.search.timeout);
			}
	}
}
//}}}

//{{{
Story.prototype.search = function(text,useCaseSensitive,useRegExp)
{
	highlightHack = new RegExp(useRegExp ? text : text.escapeRegExp(),useCaseSensitive ? "mg" : "img");
	var matches = store.search(highlightHack,config.options.chkSearchByDate?"modified":"title","excludeSearch");
	if (config.options.chkSearchByDate) matches=matches.reverse(); // most recent changes first
	var q = useRegExp ? "/" : "'";
	clearMessage();
	if (!matches.length) {
		if (config.options.chkSearchList) discardSearchResults();
		displayMessage(config.macros.search.failureMsg.format([q+text+q]));
	} else {
		if (config.options.chkSearchList) 
			reportSearchResults(text,matches);
		else {
			var titles = []; for(var t=0; t<matches.length; t++) titles.push(matches[t].title);
			this.closeAllTiddlers(); story.displayTiddlers(null,titles);
			displayMessage(config.macros.search.successMsg.format([matches.length, q+text+q]));
		}
	}
	highlightHack = null;
}

TiddlyWiki.prototype.search = function(searchRegExp,sortField,excludeTag)
{
	var candidates = this.reverseLookup("tags",excludeTag,false,sortField);

	// scan for matching titles first...
	var results = [];
	if (config.options.chkSearchTitles) {
		for(var t=0; t<candidates.length; t++)
			if(candidates[t].title.search(searchRegExp)!=-1)
				results.push(candidates[t]);
		if (config.options.chkSearchShadows)
			for (var t in config.shadowTiddlers)
				if ((t.search(searchRegExp)!=-1) && !store.tiddlerExists(t))
					results.push((new Tiddler()).assign(t,config.shadowTiddlers[t]));
	}
	// then scan for matching text, tags, or field data
	for(var t=0; t<candidates.length; t++) {
		if (config.options.chkSearchText && candidates[t].text.search(searchRegExp)!=-1)
			results.pushUnique(candidates[t]);
		if (config.options.chkSearchTags && candidates[t].tags.join(" ").search(searchRegExp)!=-1)
			results.pushUnique(candidates[t]);
		if (config.options.chkSearchFields && store.forEachField!=undefined) // requires TW2.1 or above
			store.forEachField(candidates[t],
				function(tid,field,val) { if (val.search(searchRegExp)!=-1) results.pushUnique(candidates[t]); },
				true); // extended fields only
	}
	// then check for matching text in shadows
	if (config.options.chkSearchShadows)
		for (var t in config.shadowTiddlers)
			if ((config.shadowTiddlers[t].search(searchRegExp)!=-1) && !store.tiddlerExists(t))
				results.pushUnique((new Tiddler()).assign(t,config.shadowTiddlers[t]));

	// if not 'titles first', or if sorting by modification date, re-sort results so that title, text, tag and field matches are mixed together
	if(!sortField) sortField = "title";
	var bySortField=function (a,b) {if(a[sortField] == b[sortField]) return(0); else return (a[sortField] < b[sortField]) ? -1 : +1; }
	if (!config.options.chkSearchTitlesFirst || config.options.chkSearchByDate) results.sort(bySortField);

	return results;
}

// REPORT GENERATOR
if (!window.reportSearchResults) window.reportSearchResults=function(text,matches)
{
	var title=config.macros.search.reportTitle
	var q = config.options.chkRegExpSearch ? "/" : "'";
	var body="\n";

	// summary: nn tiddlers found matching '...', options used
	body+="''"+config.macros.search.successMsg.format([matches.length,q+"{{{"+text+"}}}"+q])+"''\n";
	body+="^^//searched in:// ";
	body+=(config.options.chkSearchTitles?"''titles'' ":"");
	body+=(config.options.chkSearchText?"''text'' ":"");
	body+=(config.options.chkSearchTags?"''tags'' ":"");
	body+=(config.options.chkSearchFields?"''fields'' ":"");
	body+=(config.options.chkSearchShadows?"''shadows'' ":"");
	if (config.options.chkCaseSensitiveSearch||config.options.chkRegExpSearch) {
		body+=" //with options:// ";
		body+=(config.options.chkCaseSensitiveSearch?"''case sensitive'' ":"");
		body+=(config.options.chkRegExpSearch?"''text patterns'' ":"");
	}
	body+="^^";

	// numbered list of links to matching tiddlers
	body+="\n<<<";
	for(var t=0;t<matches.length;t++) {
		var date=config.options.chkSearchByDate?(matches[t].modified.formatString('YYYY.0MM.0DD 0hh:0mm')+" "):"";
		body+="\n# "+date+"[["+matches[t].title+"]]";
	}
	body+="\n<<<\n";

	// open all matches button
	body+="<html><input type=\"button\" href=\"javascript:;\" ";
	body+="onclick=\"story.displayTiddlers(null,["
	for(var t=0;t<matches.length;t++)
		body+="'"+matches[t].title.replace(/\'/mg,"\\'")+"'"+((t<matches.length-1)?", ":"");
	body+="],1);\" ";
	body+="accesskey=\"O\" ";
	body+="value=\"open all matching tiddlers\"></html> ";

	// discard search results button
	body+="<html><input type=\"button\" href=\"javascript:;\" ";
	body+="onclick=\"story.closeTiddler('"+title+"'); store.deleteTiddler('"+title+"'); store.notify('"+title+"',true);\" ";
	body+="value=\"discard "+title+"\"></html>";

	// search again
	body+="\n\n----\n";
	body+="<<search \""+text+"\">>\n";
	body+="<<option chkSearchTitles>>titles ";
	body+="<<option chkSearchText>>text ";
	body+="<<option chkSearchTags>>tags";
	body+="<<option chkSearchFields>>fields";
	body+="<<option chkSearchShadows>>shadows";
	body+="<<option chkCaseSensitiveSearch>>case-sensitive ";
	body+="<<option chkRegExpSearch>>text patterns";
	body+="<<option chkSearchByDate>>sort by date";

	// create/update the tiddler
	var tiddler=store.getTiddler(title); if (!tiddler) tiddler=new Tiddler();
	tiddler.set(title,body,config.options.txtUserName,(new Date()),"excludeLists excludeSearch temporary");
	store.addTiddler(tiddler); story.closeTiddler(title);

	// use alternate "search again" label in <<search>> macro
	var oldprompt=config.macros.search.label;
	config.macros.search.label="search again";

	// render/refresh tiddler
	story.displayTiddler(null,title,1);
	store.notify(title,true);

	// restore standard search label
	config.macros.search.label=oldprompt;

}

if (!window.discardSearchResults) window.discardSearchResults=function()
{
	// remove the tiddler
	story.closeTiddler(config.macros.search.reportTitle);
	store.deleteTiddler(config.macros.search.reportTitle);
}
//}}}
<<<
# [[(default) on http://mptw.tiddlyspot.com]]
# [[(default) on http://tiddlywiki.bidix.info]]
# [[(default) on http://tw.lewcid.org]]
# [[(default) on http://www.tiddlytools.com]]
# [[01ViewTemplate]]
# [[AAuseViewTemplate]]
# [[AnnotationsPlugin]]
# [[DefaultTiddlers]]
# [[E.A.S.E]]
# [[EditTemplate]]
# [[EditToolbar]]
# [[FontSizePlugin]]
# [[ForEachTiddlerPlugin]]
# [[Formatting cheatsheet]]
# [[HideWhenPlugin]]
# [[HistoryPlugin]]
# [[INTROViewTemplate]]
# [[InlineJavascriptPlugin]]
# [[Instructions]]
# [[LegacyStrikeThroughPlugin]]
# [[MainMenu]]
# [[MarkupPreHead]]
# [[MicroGenViewTemplate]]
# [[NCBIblast]]
# [[Override SearchOptionsPlugin options]]
# [[PageTemplate]]
# [[PasswordOptionPlugin]]
# [[SearchOptions plugin tweaks]]
# [[SearchOptionsPlugin]]
# [[SelectPaletteMacro]]
# [[SideBarOptions]]
# [[SideBarTabs]]
# [[SinglePageModePlugin]]
# [[SplashScreenPlugin]]
# [[StyleSheet]]
# [[TaggedTemplateTweak]]
# [[ToggleRightSidebar]]
# [[UploadPlugin]]
# [[ViewTemplate]]
# [[Welcome to the Webview TiddlyWiki]]
# [[bluepalette]]
# [[bubblegumpalette]]
# [[easyColoredText]]
# [[easyFormat]]
# [[easyGreek]]
# [[easyHebrew]]
# [[easyHighlighting]]
# [[easyIndent]]
# [[easyNotes]]
# [[easyTableheader]]
# [[easyTables]]
# [[graypalette]]
# [[greenishgraypalette]]
# [[purplepalette]]
# [[store.php]]
# [[topic-01ViewTemplate]]
# [[topic-02ViewTemplate]]
# [[topic1SubtopicMenu]]
# [[topic1ViewTemplate]]
# [[webviewViewTemplate]]
# [[webviewindex]]
# [[z_configOptions]]
<<<
/***
Quick and dirty palette switcher for 2.1.x
<<selectPalette>>
WARNING this will overwrite your ColorPalette tiddler.
***/

//{{{

merge(config.macros,{

	setPalette: {

		handler: function(place,macroName,params,wikifier,paramString,tiddler) {
			var paletteName = params[0] ? params[0] : tiddler.title;
			createTiddlyButton(place,"apply","Apply this palette",function(e) {
				config.macros.selectPalette.updatePalette(paletteName); // honor the optional macro parameter
				return false;
			});
		}
	},

	selectPalette: {

		handler: function(place,macroName,params,wikifier,paramString,tiddler) {
			createTiddlyDropDown(place,this.onPaletteChange,this.getPalettes());
		},

		getPalettes: function() {
			var result = [
				{caption:"-select palette-", name:""},
				{caption:"(Default)", name:"(default)"}
			];
			var tagged = store.getTaggedTiddlers("palette","title");
			for(var t=0; t<tagged.length; t++)
				result.push({caption:tagged[t].title, name:tagged[t].title});
			return result;
		},

		onPaletteChange: function(e) {
			config.macros.selectPalette.updatePalette(this.value);
			return true;
		},

		updatePalette: function(title) {
			if (title != "") {
				store.deleteTiddler("ColorPalette");
				if (title != "(default)")
					store.saveTiddler("ColorPalette","ColorPalette",store.getTiddlerText(title),
								config.options.txtUserName,undefined,"");
				this.refreshPalette();
				if(config.options.chkAutoSave)
					saveChanges(true);
			}
		},

		refreshPalette: function() {
			config.macros.refreshDisplay.onClick();
		}
	}
});

//}}}
<<search>><<closeAll>><<permaview>><<newTiddler>><<newTiddler title:"tagnameSubtopicMenu" tag:"SubtopicMenu" label:"new subtopic menu" text:"{{tableindex{
|[[subtopic1]]|[[subtopic2]]|[[subtopic3]]|
}}}">><<newTiddler title:"tagnameViewTemplate" tag:"excludeLists" label:"new viewtemplate" text:"<!--{{{-->
<div class='toolbar' macro='toolbar closeTiddler closeOthers +editTiddler > fields syncing permalink references jump'></div>
<div class='xxxx' macro='tiddler xxxxSubtopicMenu'></div><div class='title' macro='view title'></div>
<div class='viewer' macro='view text wikified'></div><div class='tagClear'></div>
<!--}}}-->
">><<saveChanges>>[[Formatting cheatsheet]]<<selectPalette>><<slider chkSliderOptionsPanel OptionsPanel "options »" "Change TiddlyWiki advanced options">>

<<tabs txtMainTab "Timeline" "Timeline" TabTimeline "All" "All tiddlers" TabAll "Tags" "All tags" TabTags "More" "More lists" TabMore>>
/***
|Name|SinglePageModePlugin|
|Source|http://www.TiddlyTools.com/#SinglePageModePlugin|
|Documentation|http://www.TiddlyTools.com/#SinglePageModePluginInfo|
|Version|2.8.2|
|Author|Eric Shulman - ELS Design Studios|
|License|http://www.TiddlyTools.com/#LegalStatements <br>and [[Creative Commons Attribution-ShareAlike 2.5 License|http://creativecommons.org/licenses/by-sa/2.5/]]|
|~CoreVersion|2.1|
|Type|plugin|
|Requires||
|Overrides|Story.prototype.displayTiddler(), Story.prototype.displayTiddlers()|
|Description|Show tiddlers one at a time with automatic permalink, or always open tiddlers at top/bottom of page.|
This plugin allows you to configure TiddlyWiki to navigate more like a traditional multipage web site with only one tiddler displayed at a time.
!!!!!Documentation
>see [[SinglePageModePluginInfo]]
!!!!!Configuration
<<<
<<option chkSinglePageMode>> Display one tiddler at a time
><<option chkSinglePageKeepFoldedTiddlers>> Don't auto-close folded tiddlers
><<option chkSinglePagePermalink>> Automatically permalink current tiddler
<<option chkTopOfPageMode>> Always open tiddlers at the top of the page
<<option chkBottomOfPageMode>> Always open tiddlers at the bottom of the page
<<option chkSinglePageAutoScroll>> Automatically scroll tiddler into view (if needed)

Notes:
* The "display one tiddler at a time" option can also be //temporarily// set/reset by including a 'paramifier' in the document URL: {{{#SPM:true}}} or {{{#SPM:false}}}.
* If more than one display mode is selected, 'one at a time' display takes precedence over both 'top' and 'bottom' settings, and if 'one at a time' setting is not used, 'top of page' takes precedence over 'bottom of page'.
* When using Apple's Safari browser, automatically setting the permalink causes an error and is disabled.
<<<
!!!!!Revisions
<<<
2008.03.14 [2.8.2] in displayTiddler(), if editing specified tiddler, just move it to top/bottom of story *without* re-rendering (prevents discard of partial edits).
| Please see [[SinglePageModePluginInfo]] for previous revision details |
2005.08.15 [1.0.0] Initial Release.  Support for BACK/FORWARD buttons adapted from code developed by Clint Checketts.
<<<
!!!!!Code
***/
//{{{
version.extensions.SinglePageMode= {major: 2, minor: 8, revision: 2, date: new Date(2008,2,14)}; // JavaScript months are 0-based: 2 = March
//}}}
//{{{
config.paramifiers.SPM = { onstart: function(v) {
	config.options.chkSinglePageMode=eval(v);
	if (config.options.chkSinglePageMode && config.options.chkSinglePagePermalink && !config.browser.isSafari) {
		config.lastURL = window.location.hash;
		if (!config.SPMTimer) config.SPMTimer=window.setInterval(function() {checkLastURL();},1000);
	}
} };
//}}}
//{{{
if (config.options.chkSinglePageMode==undefined) config.options.chkSinglePageMode=false;
if (config.options.chkSinglePageKeepFoldedTiddlers==undefined) config.options.chkSinglePageKeepFoldedTiddlers=true;
if (config.options.chkSinglePagePermalink==undefined) config.options.chkSinglePagePermalink=true;
if (config.options.chkTopOfPageMode==undefined) config.options.chkTopOfPageMode=false;
if (config.options.chkBottomOfPageMode==undefined) config.options.chkBottomOfPageMode=false;
if (config.options.chkSinglePageAutoScroll==undefined) config.options.chkSinglePageAutoScroll=true;

if (config.optionsDesc) {
	config.optionsDesc.chkSinglePageMode="Display one tiddler at a time";
	config.optionsDesc.chkSinglePageKeepFoldedTiddlers="Don't auto-close folded tiddlers";
	config.optionsDesc.chkSinglePagePermalink="Automatically permalink current tiddler";
	config.optionsDesc.chkSinglePageAutoScroll="Automatically scroll tiddler into view (if needed)";
	config.optionsDesc.chkTopOfPageMode="Always open tiddlers at the top of the page";
	config.optionsDesc.chkBottomOfPageMode="Always open tiddlers at the bottom of the page";
} else {
	config.shadowTiddlers.AdvancedOptions += "\
		\n<<option chkSinglePageMode>> Display one tiddler at a time \
		\n<<option chkSinglePageKeepFoldedTiddlers>> Don't auto-close folded tiddlers \
		\n<<option chkSinglePagePermalink>> Automatically permalink current tiddler \
		\n<<option chkSinglePageAutoScroll>> Automatically scroll tiddler into view (if needed) \
		\n<<option chkTopOfPageMode>> Always open tiddlers at the top of the page \
		\n<<option chkBottomOfPageMode>> Always open tiddlers at the bottom of the page";
}
//}}}
//{{{
config.SPMTimer = 0;
config.lastURL = window.location.hash;
function checkLastURL()
{
	if (!config.options.chkSinglePageMode)
		{ window.clearInterval(config.SPMTimer); config.SPMTimer=0; return; }
	if (config.lastURL == window.location.hash) return; // no change in hash
	var tids=convertUTF8ToUnicode(decodeURIComponent(window.location.hash.substr(1))).readBracketedList();
	if (tids.length==1) // permalink (single tiddler in URL)
		story.displayTiddler(null,tids[0]);
	else { // restore permaview or default view
		config.lastURL = window.location.hash;
		if (!tids.length) tids=store.getTiddlerText("DefaultTiddlers").readBracketedList();
		story.closeAllTiddlers();
		story.displayTiddlers(null,tids);
	}
}

if (Story.prototype.SPM_coreDisplayTiddler==undefined)
	Story.prototype.SPM_coreDisplayTiddler=Story.prototype.displayTiddler;
Story.prototype.displayTiddler = function(srcElement,title,template,animate,slowly)
{
	var opt=config.options;
	if (opt.chkSinglePageMode) {
		// close all tiddlers except current tiddler, tiddlers being edited, and tiddlers that are folded (optional)
		story.forEachTiddler(function(tid,elem) {
			if (	tid==title
				|| elem.getAttribute("dirty")=="true"
				|| (opt.chkSinglePageKeepFoldedTiddlers && elem.getAttribute("folded")=="true"))
				return;
			story.closeTiddler(tid);
		});
	}
	else if (opt.chkTopOfPageMode)
		arguments[0]=null;
	else if (opt.chkBottomOfPageMode)
		arguments[0]="bottom";
	if (opt.chkSinglePageMode && opt.chkSinglePagePermalink && !config.browser.isSafari) {
		window.location.hash = encodeURIComponent(convertUnicodeToUTF8(String.encodeTiddlyLink(title)));
		config.lastURL = window.location.hash;
		document.title = wikifyPlain("SiteTitle") + " - " + title;
		if (!config.SPMTimer) config.SPMTimer=window.setInterval(function() {checkLastURL();},1000);
	}
	var tiddlerElem=document.getElementById(story.idPrefix+title); // ==null unless tiddler is already displayed
	if (tiddlerElem && tiddlerElem.getAttribute("dirty")=="true") { // editing... move tiddler without re-rendering
		var isTopTiddler=(tiddlerElem.previousSibling==null);
		if (!isTopTiddler && (opt.chkSinglePageMode || opt.chkTopOfPageMode))
			tiddlerElem.parentNode.insertBefore(tiddlerElem,tiddlerElem.parentNode.firstChild);
		else if (opt.chkBottomOfPageMode)
			tiddlerElem.parentNode.insertBefore(tiddlerElem,null);
		else this.SPM_coreDisplayTiddler.apply(this,arguments); // let CORE render tiddler
	} else
		this.SPM_coreDisplayTiddler.apply(this,arguments); // let CORE render tiddler
	var tiddlerElem=document.getElementById(story.idPrefix+title);
	if (tiddlerElem&&opt.chkSinglePageAutoScroll) {
		var yPos=ensureVisible(tiddlerElem); // scroll to top of tiddler
		var isTopTiddler=(tiddlerElem.previousSibling==null);
		if (opt.chkSinglePageMode||opt.chkTopOfPageMode||isTopTiddler)
			yPos=0; // scroll to top of page instead of top of tiddler
		if (opt.chkAnimate) // defer scroll until 200ms after animation completes
			setTimeout("window.scrollTo(0,"+yPos+")",config.animDuration+200); 
		else
			window.scrollTo(0,yPos); // scroll immediately
	}
}

if (Story.prototype.SPM_coreDisplayTiddlers==undefined)
	Story.prototype.SPM_coreDisplayTiddlers=Story.prototype.displayTiddlers;

Story.prototype.displayTiddlers = function() {
	// suspend single-page mode (and/or top/bottom display options) when showing multiple tiddlers
	var opt=config.options;
	var saveSPM=opt.chkSinglePageMode; opt.chkSinglePageMode=false;
	var saveTPM=opt.chkTopOfPageMode; opt.chkTopOfPageMode=false;
	var saveBPM=opt.chkBottomOfPageMode; opt.chkBottomOfPageMode=false;
	this.SPM_coreDisplayTiddlers.apply(this,arguments);
	opt.chkBottomOfPageMode=saveBPM;
	opt.chkTopOfPageMode=saveTPM;
	opt.chkSinglePageMode=saveSPM;
}
//}}}

~MAST667 Intro PERL
/***
|''Name:''|SparklinePlugin|
|''Description:''|Sparklines macro|
***/
//{{{
if(!version.extensions.SparklinePlugin) {
version.extensions.SparklinePlugin = {installed:true};

//--
//-- Sparklines
//--

config.macros.sparkline = {};
config.macros.sparkline.handler = function(place,macroName,params)
{
	var data = [];
	var min = 0;
	var max = 0;
	var v;
	for(var t=0; t<params.length; t++) {
		v = parseInt(params[t],10); // explicit radix so values such as "08" are read as decimal
		if(v < min)
			min = v;
		if(v > max)
			max = v;
		data.push(v);
	}
	if(data.length < 1)
		return;
	var box = createTiddlyElement(place,"span",null,"sparkline",String.fromCharCode(160));
	box.title = data.join(",");
	var w = box.offsetWidth;
	var h = box.offsetHeight;
	box.style.paddingRight = (data.length * 2 - w) + "px";
	box.style.position = "relative";
	for(var d=0; d<data.length; d++) {
		var tick = document.createElement("img");
		tick.border = 0;
		tick.className = "sparktick";
		tick.style.position = "absolute";
		tick.src = "data:image/gif,GIF89a%01%00%01%00%91%FF%00%FF%FF%FF%00%00%00%C0%C0%C0%00%00%00!%F9%04%01%00%00%02%00%2C%00%00%00%00%01%00%01%00%40%02%02T%01%00%3B";
		tick.style.left = d*2 + "px";
		tick.style.width = "2px";
		v = Math.floor(((data[d] - min)/((max-min) || 1)) * h); // "|| 1" avoids dividing by zero when every value is 0
		tick.style.top = (h-v) + "px";
		tick.style.height = v + "px";
		box.appendChild(tick);
	}
};


}
//}}}
/***

''Inspired by [[TiddlyPom|http://www.warwick.ac.uk/~tuspam/tiddlypom.html]]''

|Name|SplashScreenPlugin|
|Created by|SaqImtiaz|
|Location|http://tw.lewcid.org/#SplashScreenPlugin|
|Version|0.21 |
|Requires|~TW2.08+|
!Description:
Provides a simple splash screen that is visible while the TW is loading.

!Installation
Copy the source text of this tiddler to your TW in a new tiddler, tag it with systemConfig and save and reload. The SplashScreen will now be installed and will be visible the next time you reload your TW.

!Customizing
Once the SplashScreen has been installed and you have reloaded your TW, the splash screen html will be present in the MarkupPreHead tiddler. You can edit it and customize to your needs.

!History
* 20-07-06 : version 0.21, modified to hide contentWrapper while SplashScreen is displayed.
* 26-06-06 : version 0.2, first release

!Code
***/
//{{{
var old_lewcid_splash_restart=restart;

restart = function()
{   if (document.getElementById("SplashScreen"))
        document.getElementById("SplashScreen").style.display = "none";
      if (document.getElementById("contentWrapper"))
        document.getElementById("contentWrapper").style.display = "block";
    
    old_lewcid_splash_restart();
   
    if (splashScreenInstall)
       {if(config.options.chkAutoSave)
			{saveChanges();}
        displayMessage("TW SplashScreen has been installed, please save and refresh your TW.");
        }
}


var oldText = store.getTiddlerText("MarkupPreHead");
if (oldText.indexOf("SplashScreen")==-1)
   {var siteTitle = store.getTiddlerText("SiteTitle");
   var splasher='\n\n<style type="text/css">#contentWrapper {display:none;}</style><div id="SplashScreen" style="border: 3px solid #ccc; display: block; text-align: center; width: 320px; margin: 100px auto; padding: 50px; color:#000; font-size: 28px; font-family:Tahoma; background-color:#eee;"><b>'+siteTitle +'</b> is loading<blink> ...</blink><br><br><span style="font-size: 14px; color:red;">Requires Javascript.</span></div>';
   if (! store.tiddlerExists("MarkupPreHead"))
       {var myTiddler = store.createTiddler("MarkupPreHead");}
   else
      {var myTiddler = store.getTiddler("MarkupPreHead");}
      myTiddler.set(myTiddler.title,oldText+splasher,config.options.txtUserName,null,null);
      store.setDirty(true);
      var splashScreenInstall = true;
}
//}}}
/*{{{*/
/*FONT ADJUSTMENTS*/
body {font-family: Trebuchet MS; font-size: 10pt;}
#mainMenu .tiddlyLinkExisting, #mainMenu .tiddlyLinkNonExisting {font-family: Trebuchet MS; font-size: 10pt;}
#mainMenu {font-family: Trebuchet MS; font-size: 10pt;}
#mainMenu h1 {font-size: 10pt;}
#mainMenu th {background-color:[[ColorPalette::SecondaryPale]]; color:[[ColorPalette::SecondaryDark]];}
#mainMenu table {border:none;}
#mainMenu tr {background-color:white;}
#mainMenu {background-color:[[ColorPalette::PrimaryLight]];}
.viewer {line-height: 1.7em;}
/*WIDEN MAINMENU*/
#mainMenu {width: 14.5em;}
#mainMenu {text-align: left;}
#displayArea {margin: 0em 17em 0em 17em;}
.teeny {font-size: 9pt; text-align: center;}
/*TABLE HEADER*/
.viewer th {color: #000; background-color: #eeeeee;} 
/*TIDDLER TOPMARGIN AND BUTTON BORDER*/
a.button{border: 0;} 
.viewer { margin-top: 1em; }
/*UNORDERED and ORDERED LISTS TWEAK*/
.viewer li {padding-top: 0.0em; padding-bottom: 0.0em;} 
/*LINELESS BLOCKQUOTES*/
.viewer blockquote {border-left: 0px; margin-top:0em; margin-bottom:0em; }
/*HEADLINE COLOR, etc*/
h1,h2,h3,h4,h5 { color: #000; background: none; font-family: Trebuchet MS;}
/*TuDuSlider*/
.tuduSlider .button{font-family: Trebuchet MS; font-weight: bold; font-size: 10pt; color: black;}
/* GIFFMEX TWEAKS TO STYLESHEETPRINT (so that nothing but tiddler title and text are printed) */
@media print {#mainMenu {display: none ! important;}}
@media print {#topMenu {display: none ! important;}}
@media print {#sidebar {display: none ! important;}}
@media print {#messageArea {display: none ! important;}} 
@media print {#toolbar {display: none ! important;}}
@media print {.header {display: none ! important;}}
@media print {.tiddler .subtitle {display: none ! important;}}
@media print {.tiddler .toolbar {display: none ! important; }}
@media print {.tiddler .tagging {display: none ! important; }}
@media print {.tiddler .tagged {display: none ! important; }}
@media print {#displayArea {margin: 1em 1em 0em 1em;}}
@media print {.pageBreak {page-break-before: always;}}
/*CSS FOR BIBLE FORMATTING*/
.engindent {margin-left: 2em; display:block;}
.gkindent {font-family: Gentium; font-size: 16pt; margin-left: 2em; display:block;}
.greek {font-family: Gentium; font-size: 16pt;}
.hebrewNoAlign{font-family: Gentium; font-size: 20pt;}
.hebrewRightAlign{text-align:right; font-family: Gentium; font-size: 20pt; display:block;}
.hebAlignAndIndent{text-align:right; font-family: Gentium; font-size: 20pt; margin-right: 2em; display:block;}
.red {color: #ff3300; font-weight: bold;}
.blue {color: #0000cc; font-weight: bold;}
.green {color: #22bb00; font-weight: bold;}
.gold {color: #bbaa55; font-weight: bold;}
.purple {color: #9922ff; font-weight: bold;}
.gray {color: #777777; font-weight: bold;}
.magenta{color: #cc0066; font-weight: bold;}
.teal {color: #008888; font-weight: bold;}
.burgundy {color: #990000; font-weight: bold;}
.orange {color: #ff8866; font-weight: bold;}
/*INVISIBLE TABLE*/
.viewer .invisiblecomm table {border-color: white;}
.viewer .invisiblecomm table td { font-size: 1em; font-family: Verdana; border-color: white; padding: 10px 20px 10px 0px; text-align: left; vertical-align: top; padding-bottom: 20px;} 
.viewer .invisiblecomm table th {color:[[ColorPalette::PrimaryMid]]; background-color: white; border-color: white; font-family: Verdana; font-size: 1.2em; font-weight: bold; padding: 10px 20px 10px 0px; text-align: left; vertical-align: top;} 
.viewer .invisiblecomm table tr.leftColumn { background-color: #bbbbbb; }
/*OTHER TABLES*/
.menubox { display:block; padding:1em; -moz-border-radius:1em; border:1px solid; background:[[ColorPalette::TertiaryDark]]; color:#000; }
.menubox2 { display:block; padding: .25em; border:none; margin: 0; background:[[ColorPalette::TertiaryDark]]; color:[[ColorPalette::SecondaryDark]]; text-align: center; font-size: 1.6em;}
.menubox3 { display:block; padding:.25em; border:none; margin: 0; background:[[ColorPalette::TertiaryDark]]; color:[[ColorPalette::SecondaryDark]]; text-align: center; font-size: 2.5em;}
.viewer th {background-color:[[ColorPalette::SecondaryPale]]; color:[[ColorPalette::SecondaryDark]];}
.tableindex table, .tableindex td, .tableindex tr { font-size: 1em; border: solid white; background-color:[[ColorPalette::SecondaryPale]]; color:[[ColorPalette::SecondaryDark]];}
/*}}}*/
/*{{{*/
* html .tiddler {height:1%;}

body {font-size:.75em; font-family:arial,helvetica; margin:0; padding:0;}

h1,h2,h3,h4,h5,h6 {font-weight:bold; text-decoration:none;}
h1,h2,h3 {padding-bottom:1px; margin-top:1.2em;margin-bottom:0.3em;}
h4,h5,h6 {margin-top:1em;}
h1 {font-size:1.35em;}
h2 {font-size:1.25em;}
h3 {font-size:1.1em;}
h4 {font-size:1em;}
h5 {font-size:.9em;}

hr {height:1px;}

a {text-decoration:none;}

dt {font-weight:bold;}

ol {list-style-type:decimal;}
ol ol {list-style-type:lower-alpha;}
ol ol ol {list-style-type:lower-roman;}
ol ol ol ol {list-style-type:decimal;}
ol ol ol ol ol {list-style-type:lower-alpha;}
ol ol ol ol ol ol {list-style-type:lower-roman;}
ol ol ol ol ol ol ol {list-style-type:decimal;}

.txtOptionInput {width:11em;}

#contentWrapper .chkOptionInput {border:0;}

.externalLink {text-decoration:underline;}

.indent {margin-left:3em;}
.outdent {margin-left:3em; text-indent:-3em;}
code.escaped {white-space:nowrap;}

.tiddlyLinkExisting {font-weight:bold;}
.tiddlyLinkNonExisting {font-style:italic;}

/* the 'a' is required for IE, otherwise it renders the whole tiddler in bold */
a.tiddlyLinkNonExisting.shadow {font-weight:bold;}

#mainMenu .tiddlyLinkExisting,
	#mainMenu .tiddlyLinkNonExisting,
	#sidebarTabs .tiddlyLinkNonExisting {font-weight:normal; font-style:normal;}
#sidebarTabs .tiddlyLinkExisting {font-weight:bold; font-style:normal;}

.header {position:relative;}
.header a:hover {background:transparent;}
.headerShadow {position:relative; padding:4.5em 0em 1em 1em; left:-1px; top:-1px;}
.headerForeground {position:absolute; padding:4.5em 0em 1em 1em; left:0px; top:0px;}

.siteTitle {font-size:3em;}
.siteSubtitle {font-size:1.2em;}

#mainMenu {position:absolute; left:0; width:10em; text-align:right; line-height:1.6em; padding:1.5em 0.5em 0.5em 0.5em; font-size:1.1em;}

#sidebar {position:absolute; right:3px; width:16em; font-size:.9em;}
#sidebarOptions {padding-top:0.3em;}
#sidebarOptions a {margin:0em 0.2em; padding:0.2em 0.3em; display:block;}
#sidebarOptions input {margin:0.4em 0.5em;}
#sidebarOptions .sliderPanel {margin-left:1em; padding:0.5em; font-size:.85em;}
#sidebarOptions .sliderPanel a {font-weight:bold; display:inline; padding:0;}
#sidebarOptions .sliderPanel input {margin:0 0 .3em 0;}
#sidebarTabs .tabContents {width:15em; overflow:hidden;}

.wizard {padding:0.1em 1em 0em 2em;}
.wizard h1 {font-size:2em; font-weight:bold; background:none; padding:0em 0em 0em 0em; margin:0.4em 0em 0.2em 0em;}
.wizard h2 {font-size:1.2em; font-weight:bold; background:none; padding:0em 0em 0em 0em; margin:0.4em 0em 0.2em 0em;}
.wizardStep {padding:1em 1em 1em 1em;}
.wizard .button {margin:0.5em 0em 0em 0em; font-size:1.2em;}
.wizardFooter {padding:0.8em 0.4em 0.8em 0em;}
.wizardFooter .status {padding:0em 0.4em 0em 0.4em; margin-left:1em;}
.wizard .button {padding:0.1em 0.2em 0.1em 0.2em;}

#messageArea {position:fixed; top:2em; right:0em; margin:0.5em; padding:0.5em; z-index:2000; _position:absolute;}
.messageToolbar {display:block; text-align:right; padding:0.2em 0.2em 0.2em 0.2em;}
#messageArea a {text-decoration:underline;}

.tiddlerPopupButton {padding:0.2em 0.2em 0.2em 0.2em;}
.popupTiddler {position: absolute; z-index:300; padding:1em 1em 1em 1em; margin:0;}

.popup {position:absolute; z-index:300; font-size:.9em; padding:0; list-style:none; margin:0;}
.popup .popupMessage {padding:0.4em;}
.popup hr {display:block; height:1px; width:auto; padding:0; margin:0.2em 0em;}
.popup li.disabled {padding:0.4em;}
.popup li a {display:block; padding:0.4em; font-weight:normal; cursor:pointer;}
.listBreak {font-size:1px; line-height:1px;}
.listBreak div {margin:2px 0;}

.tabset {padding:1em 0em 0em 0.5em;}
.tab {margin:0em 0em 0em 0.25em; padding:2px;}
.tabContents {padding:0.5em;}
.tabContents ul, .tabContents ol {margin:0; padding:0;}
.txtMainTab .tabContents li {list-style:none;}
.tabContents li.listLink { margin-left:.75em;}

#contentWrapper {display:block;}
#splashScreen {display:none;}

#displayArea {margin:1em 17em 0em 14em;}

.toolbar {text-align:right; font-size:.9em;}

.tiddler {padding:1em 1em 0em 1em;}

.missing .viewer,.missing .title {font-style:italic;}

.title {font-size:1.6em; font-weight:bold;}

.missing .subtitle {display:none;}
.subtitle {font-size:1.1em;}

.tiddler .button {padding:0.2em 0.4em;}

.tagging {margin:0.5em 0.5em 0.5em 0; float:left; display:none;}
.isTag .tagging {display:block;}
.tagged {margin:0.5em; float:right;}
.tagging, .tagged {font-size:0.9em; padding:0.25em;}
.tagging ul, .tagged ul {list-style:none; margin:0.25em; padding:0;}
.tagClear {clear:both;}

.footer {font-size:.9em;}
.footer li {display:inline;}

.annotation {padding:0.5em; margin:0.5em;}

* html .viewer pre {width:99%; padding:0 0 1em 0;}
.viewer {line-height:1.4em; padding-top:0.5em;}
.viewer .button {margin:0em 0.25em; padding:0em 0.25em;}
.viewer blockquote {line-height:1.5em; padding-left:0.8em;margin-left:2.5em;}
.viewer ul, .viewer ol {margin-left:0.5em; padding-left:1.5em;}

.viewer table, table.twtable {border-collapse:collapse; margin:0.8em 1.0em;}
.viewer th, .viewer td, .viewer tr,.viewer caption,.twtable th, .twtable td, .twtable tr,.twtable caption {padding:3px;}
table.listView {font-size:0.85em; margin:0.8em 1.0em;}
table.listView th, table.listView td, table.listView tr {padding:0px 3px 0px 3px;}

.viewer pre {padding:0.5em; margin-left:0.5em; font-size:1.0em; line-height:1.0em; overflow:auto;}
.viewer code {font-size:1.0em; line-height:1.0em;}

.editor {font-size:1.1em;}
.editor input, .editor textarea {display:block; width:100%; font:inherit;}
.editorFooter {padding:0.25em 0em; font-size:.9em;}
.editorFooter .button {padding-top:0px; padding-bottom:0px;}

.fieldsetFix {border:0; padding:0; margin:1px 0px 1px 0px;}

.sparkline {line-height:1em;}
.sparktick {outline:0;}

.zoomer {font-size:1.1em; position:absolute; overflow:hidden;}
.zoomer div {padding:1em;}

* html #backstage {width:99%;}
* html #backstageArea {width:99%;}
#backstageArea {display:none; position:relative; overflow: hidden; z-index:150; padding:0.3em 0.5em 0.3em 0.5em;}
#backstageToolbar {position:relative;}
#backstageArea a {font-weight:bold; margin-left:0.5em; padding:0.3em 0.5em 0.3em 0.5em;}
#backstageButton {display:none; position:absolute; z-index:175; top:0em; right:0em;}
#backstageButton a {padding:0.1em 0.4em 0.1em 0.4em; margin:0.1em 0.1em 0.1em 0.1em;}
#backstage {position:relative; width:100%; z-index:50;}
#backstagePanel {display:none; z-index:100; position:absolute; margin:0em 3em 0em 3em; padding:1em 1em 1em 1em;}
.backstagePanelFooter {padding-top:0.2em; float:right;}
.backstagePanelFooter a {padding:0.2em 0.4em 0.2em 0.4em;}
#backstageCloak {display:none; z-index:20; position:absolute; width:100%; height:100px;}

.whenBackstage {display:none;}
.backstageVisible .whenBackstage {display:block;}
/*}}}*/
Subtopic menus are menus at the top of the topic tiddlers, like the one above, which has three subtopics: "Welcome", "Instructions", and "Subtopic menu instructions". You can have a separate subtopic menu for as many topics as you add to your mainmenu. There are three steps to creating a new subtopic menu. You may do these steps in any order you wish:
#''Create tiddlers for each of the subtopics within a topic.'' Tag them all with one appropriate tag pertaining to the topic. This will link them all so that they appear in the subtopic menu.
#''Create a subtopic menu tiddler.'' This will be the tiddler where the menu that appears above the other tiddlers is stored. In the Sidebar, click on 'new subtopic menu'. Replace 'tagname' in the title with the name of the tag you added to the tiddlers above. Then add the title of your subtopic tiddlers in the table provided, within the double brackets {{{[[ ]]}}}. Three table cells have been provided. Delete or add table cells as needed.
#''Create a custom ~ViewTemplate for your topic.'' This will tell ~TiddlyWiki to show your subtopic menu at the top of all the tiddlers that you have tagged with that topic's tag. In the Sidebar, click on 'new viewtemplate'. Replace 'tagname' with the tag you added to the tiddlers above. Do this for the title of the tiddler, as well as in the two instances of 'tagname' in the viewtemplate's code (it will look like the line shown below before you change it).
<!--{{{-->
<div class='tagname' macro='tiddler tagnameSubtopicmenu'></div>
<!--}}}-->
That's it. A menu of links to the tiddlers you have tagged and added to your subtopic menu tiddler should appear above the tiddler title of each of those tiddlers.
!Course Summary:

|! DATE |! Subject |! Chapter |! Assignment |
|SEP 03 |File Input/Output | | read FASTA file |
|SEP 10 |Regular Search Expressions | 1 | restriction map |
|SEP 17 |Compare DNA sequences | 3 | |
|SEP 24 |DNA species distances | 4 |bgcolor(yellow): Codework #1 |
|OCT 01 |AA substitution matrix | 5 | |
|OCT 08 |Sequence Databases | 6 | FTP |
|OCT 15 |>|>|bgcolor(yellow): MIDTERM |
|OCT 22 |Local BLAST | 7 | |
|OCT 29 |BLAST stat parsing | 8 |bgcolor(yellow): Codework #2 |
|NOV 05 |Sequence Alignment | 9 | |
|NOV 12 |Phylogenetic Trees | 10 | |
|NOV 19 |Protein Motifs | 12 | |
|NOV 26 |>|>|bgcolor(lightgreen): Thanksgiving |
|DEC 03 |Gene Prediction | 14 |bgcolor(yellow): Codework #3 |
|DEC 10 |>|>|bgcolor(yellow): FINAL |

!
/***
|Name|TaggedTemplateTweak|
|Source|http://www.TiddlyTools.com/#TaggedTemplateTweak|
|Documentation|http://www.TiddlyTools.com/#TaggedTemplateTweakInfo|
|Version|1.1.0|
|Author|Eric Shulman - ELS Design Studios|
|License|http://www.TiddlyTools.com/#LegalStatements <br>and [[Creative Commons Attribution-ShareAlike 2.5 License|http://creativecommons.org/licenses/by-sa/2.5/]]|
|~CoreVersion|2.1|
|Type|plugin|
|Requires||
|Overrides|Story.prototype.chooseTemplateForTiddler()|
|Description|use alternative ViewTemplate/EditTemplate for tiddlers tagged with specific tag values|
This tweak extends story.chooseTemplateForTiddler() so that ''whenever a tiddler is marked with a specific tag value, it can be viewed and/or edited using alternatives to the standard tiddler templates.'' 
!!!!!Documentation
>see [[TaggedTemplateTweakInfo]]
!!!!!Revisions
<<<
2008.01.22 [*.*.*] plugin size reduction - documentation moved to [[TaggedTemplateTweakInfo]]
2007.06.23 [1.1.0] re-written to use automatic 'tag prefix' search instead of hard coded check for each tag.  Allows new custom tags to be used without requiring code changes to this plugin.
| please see [[TaggedTemplateTweakInfo]] for previous revision details |
2007.06.11 [1.0.0] initial release
<<<
!!!!!Code
***/
//{{{
version.extensions.taggedTemplate= {major: 1, minor: 1, revision: 0, date: new Date(2007,5,23)}; // JavaScript months are 0-based: 5 = June
Story.prototype.taggedTemplate_chooseTemplateForTiddler = Story.prototype.chooseTemplateForTiddler;
Story.prototype.chooseTemplateForTiddler = function(title,template)
{
	// get default template from core
	var template=this.taggedTemplate_chooseTemplateForTiddler.apply(this,arguments);

	// if the tiddler to be rendered doesn't exist yet, just return core result
	var tiddler=store.getTiddler(title); if (!tiddler) return template;

	// look for template whose prefix matches a tag on this tiddler
	for (var t=0; t<tiddler.tags.length; t++) {
		var tag=tiddler.tags[t];
		if (store.tiddlerExists(tag+template)) { template=tag+template; break; }
		// try capitalized tag (to match WikiWord template titles)
		var cap=tag.substr(0,1).toUpperCase()+tag.substr(1);
		if (store.tiddlerExists(cap+template)) { template=cap+template; break; }
	}

	return template;
}
//}}}
!Text book for the course:
''Rex A. Dwyer, //Genomic Perl: From Bioinformatics Basics to Working Code.//''
<html><img src="00/dwyergenperl.png" style="height:200px"></html>
From Amazon.com: //In this introduction to computational molecular biology, Rex Dwyer explains many basic computational problems and gives concise, working programs to solve them in the Perl programming language. With minimal prerequisites, he covers the biological background for each problem, develops a model for the solution, and then introduces the Perl concepts needed to implement the solution. The chapters discuss pairwise and multiple sequence alignment, fast database searches for homologous sequences, protein motif identification, genome rearrangement, physical mapping, phylogeny reconstruction, satellite identification, sequence assembly, gene finding, and RNA secondary structure. Concrete examples and a step-by-step approach enable readers to grasp the computational and statistical methods.// 
{{{
<html>
<div style="color: rgb(100, 100, 150); font-family: Monaco;">
<big><big><b>
xxxx
</html>
}}}
/%
|Name|ToggleRightSidebar|
|Source|http://www.TiddlyTools.com/#ToggleRightSidebar|
|Version|1.0.0|
|Author|Eric Shulman - ELS Design Studios|
|License|http://www.TiddlyTools.com/#LegalStatements <br>and [[Creative Commons Attribution-ShareAlike 2.5 License|http://creativecommons.org/licenses/by-sa/2.5/]]|
|~CoreVersion|2.1|
|Type|script|
|Requires|InlineJavascriptPlugin|
|Overrides||
|Description|show/hide right sidebar (SideBarOptions)|

Usage: <<tiddler ToggleRightSidebar>>

Config settings:
	config.options.txtToggleRightSideBarLabelShow (◄)
	config.options.txtToggleRightSideBarLabelHide (►)
	config.options.txtToggleRightSideBarTipShow ("show right sidebar")
	config.options.txtToggleRightSideBarTipHide ("hide right sidebar")

%/<script label="show/hide right sidebar">
	var sb=document.getElementById('sidebar'); if (!sb) return;
	var show=sb.style.display=='none';
	if (!show) { sb.style.display='none'; var margin='1em'; }
	else { sb.style.display='block'; var margin=config.options.txtDisplayAreaRightMargin||''; }
	if (typeof(place)!='undefined') {
		place.innerHTML=show?
			config.options.txtToggleRightSideBarLabelHide:config.options.txtToggleRightSideBarLabelShow;
		place.title=show?
			config.options.txtToggleRightSideBarTipHide:config.options.txtToggleRightSideBarTipShow;
	}
	document.getElementById('displayArea').style.marginRight=margin;
	config.options.chkShowRightSidebar=show;
	saveOptionCookie('chkShowRightSidebar');
	var sm=document.getElementById('storyMenu'); if (sm) config.refreshers.content(sm);
	return false;
</script><script>
	if (config.options.chkShowRightSidebar==undefined)
		config.options.chkShowRightSidebar=true;
	if (!config.options.txtDisplayAreaRightMargin||!config.options.txtDisplayAreaRightMargin.length)
		config.options.txtDisplayAreaRightMargin="18em";
	if (config.options.txtToggleRightSideBarLabelShow==undefined)
		config.options.txtToggleRightSideBarLabelShow=config.browser.isSafari?"&#x25c0;":"&#x25c4;";
	if (config.options.txtToggleRightSideBarLabelHide==undefined)
		config.options.txtToggleRightSideBarLabelHide="&#x25ba;";
	if (config.options.txtToggleRightSideBarTipShow==undefined)
		config.options.txtToggleRightSideBarTipShow="show right sidebar";
	if (config.options.txtToggleRightSideBarTipHide==undefined)
		config.options.txtToggleRightSideBarTipHide="hide right sidebar";

	var show=config.options.chkShowRightSidebar;
	document.getElementById('sidebar').style.display=show?"block":"none";
	document.getElementById('displayArea').style.marginRight=show?
		config.options.txtDisplayAreaRightMargin:"1em";
	place.lastChild.innerHTML=show?
		config.options.txtToggleRightSideBarLabelHide:config.options.txtToggleRightSideBarLabelShow;
	place.lastChild.title=show?
		config.options.txtToggleRightSideBarTipHide:config.options.txtToggleRightSideBarTipShow;
	place.lastChild.style.fontWeight="normal";
</script>
|!History:|!<<back>>|>|!<<history>>|>|!<<forward>>|
|!Font size:|>|>|>|>|! <<fontSize>>|
|!Sidebar:|>|>|>|>|!<<tiddler ToggleRightSidebar>>|
!!!
[[BACK to Working Code|CodeWorks]]
!!!
!Translate NT seqs to Protein Amino Acid seqs:
Requires that the hashes %NTs (nucleotide sequences keyed by ORF name) and %CodonTable (codon => amino acid) have already been defined and loaded with appropriate data. The amino acid sequences are stored in %PRTs, with the ORF names as the hash keys.
{{{
   # Define this global hash (it is filled in by the subroutine):
         my %PRTs;

   # Call subroutine:
          TranslateFasta();

# Subroutine code:
# - - - - - - - - - - - - - - - - - - - - - - - - - -
sub TranslateFasta
{	# Convert each NT sequence into AAs . . . . . .
	foreach my $header (keys %NTs)
	{	my $protein = "";
		# Stop at length-3 so a trailing partial codon (1 or 2 nts) is skipped
		for (my $i=0; $i <= length($NTs{$header})-3; $i += 3)
		{	my $codon = substr($NTs{$header},$i,3);
			# Codons missing from the table (e.g. containing N) become 'X' (unknown)
			my $aa = exists $CodonTable{$codon} ? $CodonTable{$codon} : 'X';
			$protein .= $aa;
		}
		$PRTs{$header} = $protein;
	}
}

# - - - - - - - - - - - - - - - - - - - - - - - - - -

}}}
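
To try the translation on its own, here is a minimal, self-contained sketch. The sample sequences, the ORF names and the five-entry codon table are made up for illustration (a real %CodonTable needs all 64 codons), and using 'X' for a codon that is not in the table is an assumption of this sketch rather than part of the course code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data: in the course code these hashes are
# loaded elsewhere (a FASTA file and a full codon table).
my %NTs = (
	orf1 => "ATGGCCTAA",      # 3 complete codons
	orf2 => "ATGAAATTTGG",    # 3 complete codons + 2 leftover nts
);

# Only the codons needed for the sample sequences; '*' marks a stop codon.
my %CodonTable = (
	ATG => 'M', GCC => 'A', AAA => 'K', TTT => 'F', TAA => '*',
);

my %PRTs;

foreach my $header (sort keys %NTs)
{	my $seq     = $NTs{$header};
	my $protein = "";
	# Step through the sequence 3 nts at a time, complete codons only
	for (my $i = 0; $i <= length($seq) - 3; $i += 3)
	{	my $codon = substr($seq, $i, 3);
		# Assumption: unknown codons are written as 'X'
		$protein .= exists $CodonTable{$codon} ? $CodonTable{$codon} : 'X';
	}
	$PRTs{$header} = $protein;
}

# Print one "name <tab> protein" line per ORF
foreach my $header (sort keys %PRTs)
{	print "$header\t$PRTs{$header}\n";
}
```

With this sample data, orf1 translates to MA* and orf2 to MKF; the two leftover nucleotides at the end of orf2 are ignored.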


!
# The logic task is straightforward:
##  Get each nucleotide sequence
##  Divide it into codon sequence units
##  TRANSLATE those codons into amino acids
##  Assemble the amino acids into a protein sequence
This description is a series of serial tasks, which is just the way we think . . . step 1, step 2, step 3 . . . etc.

In code, the steps do not have to run as four separate passes over the data. They can be combined in an iterative, nested loop structure: the outer loop visits each sequence, and the inner loop visits each codon in that sequence, translating it and adding the amino acid to the protein before moving on to the next codon:
{{{
foreach my $seq (@Seqs)
{      foreach my $codon <<in $seq>> !!! <<This is not real computer code>>
       {       find amino acid;
               add to protein;
       }
}
}}}

We are going to use the specialized string function @@[[substr]]@@ ("substring") to do the actual manipulation of the NT sequence, pulling out each group of 3 nts that makes up a codon.
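
As a small sketch of how substr does this, the snippet below splits one sequence into codons. The sequence is a made-up example, not course data. substr takes the string, a starting offset (counting from 0) and a length, so the codons start at offsets 0, 3, 6, 9 . . .

```perl
use strict;
use warnings;

# Hypothetical example sequence
my $seq = "ATGGCCAAATAG";

# substr(STRING, OFFSET, LENGTH): collect each complete 3-nt codon
my @codons;
for (my $i = 0; $i <= length($seq) - 3; $i += 3)
{	push @codons, substr($seq, $i, 3);
}

print join(" ", @codons), "\n";    # prints: ATG GCC AAA TAG
```

The loop stops at length($seq) - 3 so that it never asks substr for a codon that would run past the end of the sequence.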

<html><img src="03/codontranslate.png" style="height:250px"></html>

[[BACK|L03]]
!
/***
|''Name:''|UploadPlugin|
|''Description:''|Save to web a TiddlyWiki|
|''Version:''|4.1.3|
|''Date:''|Feb 24, 2008|
|''Source:''|http://tiddlywiki.bidix.info/#UploadPlugin|
|''Documentation:''|http://tiddlywiki.bidix.info/#UploadPluginDoc|
|''Author:''|BidiX (BidiX (at) bidix (dot) info)|
|''License:''|[[BSD open source license|http://tiddlywiki.bidix.info/#%5B%5BBSD%20open%20source%20license%5D%5D ]]|
|''~CoreVersion:''|2.2.0|
|''Requires:''|PasswordOptionPlugin|
***/
//{{{
version.extensions.UploadPlugin = {
	major: 4, minor: 1, revision: 3,
	date: new Date("Feb 24, 2008"),
	source: 'http://tiddlywiki.bidix.info/#UploadPlugin',
	author: 'BidiX (BidiX (at) bidix (dot) info)',
	coreVersion: '2.2.0'
};

//
// Environment
//

if (!window.bidix) window.bidix = {}; // bidix namespace
bidix.debugMode = false;	// true to activate both in Plugin and UploadService
	
//
// Upload Macro
//

config.macros.upload = {
// default values
	defaultBackupDir: '',	//no backup
	defaultStoreScript: "store.php",
	defaultToFilename: "index.html",
	defaultUploadDir: ".",
	authenticateUser: true	// UploadService Authenticate User
};
	
config.macros.upload.label = {
	promptOption: "Save and Upload this TiddlyWiki with UploadOptions",
	promptParamMacro: "Save and Upload this TiddlyWiki in %0",
	saveLabel: "save to web", 
	saveToDisk: "save to disk",
	uploadLabel: "upload"	
};

config.macros.upload.messages = {
	noStoreUrl: "No store URL in parameters or options",
	usernameOrPasswordMissing: "Username or password missing"
};

config.macros.upload.handler = function(place,macroName,params) {
	if (readOnly)
		return;
	var label;
	if (document.location.toString().substr(0,4) == "http") 
		label = this.label.saveLabel;
	else
		label = this.label.uploadLabel;
	var prompt;
	if (params[0]) {
		prompt = this.label.promptParamMacro.toString().format([this.destFile(params[0], 
			(params[1] ? params[1]:bidix.basename(window.location.toString())), params[3])]);
	} else {
		prompt = this.label.promptOption;
	}
	createTiddlyButton(place, label, prompt, function() {config.macros.upload.action(params);}, null, null, this.accessKey);
};

config.macros.upload.action = function(params)
{
		// for missing macro parameter set value from options
		if (!params) params = {};
		var storeUrl = params[0] ? params[0] : config.options.txtUploadStoreUrl;
		var toFilename = params[1] ? params[1] : config.options.txtUploadFilename;
		var backupDir = params[2] ? params[2] : config.options.txtUploadBackupDir;
		var uploadDir = params[3] ? params[3] : config.options.txtUploadDir;
		var username = params[4] ? params[4] : config.options.txtUploadUserName;
		var password = config.options.pasUploadPassword; // for security reason no password as macro parameter	
		// for still missing parameter set default value
		if ((!storeUrl) && (document.location.toString().substr(0,4) == "http")) 
			storeUrl = bidix.dirname(document.location.toString())+'/'+config.macros.upload.defaultStoreScript;
		if (storeUrl.substr(0,4) != "http")
			storeUrl = bidix.dirname(document.location.toString()) +'/'+ storeUrl;
		if (!toFilename)
			toFilename = bidix.basename(window.location.toString());
		if (!toFilename)
			toFilename = config.macros.upload.defaultToFilename;
		if (!uploadDir)
			uploadDir = config.macros.upload.defaultUploadDir;
		if (!backupDir)
			backupDir = config.macros.upload.defaultBackupDir;
		// report error if still missing
		if (!storeUrl) {
			alert(config.macros.upload.messages.noStoreUrl);
			clearMessage();
			return false;
		}
		if (config.macros.upload.authenticateUser && (!username || !password)) {
			alert(config.macros.upload.messages.usernameOrPasswordMissing);
			clearMessage();
			return false;
		}
		bidix.upload.uploadChanges(false,null,storeUrl, toFilename, uploadDir, backupDir, username, password); 
		return false; 
};

config.macros.upload.destFile = function(storeUrl, toFilename, uploadDir) 
{
	if (!storeUrl)
		return null;
		var dest = bidix.dirname(storeUrl);
		if (uploadDir && uploadDir != '.')
			dest = dest + '/' + uploadDir;
		dest = dest + '/' + toFilename;
	return dest;
};

//
// uploadOptions Macro
//

config.macros.uploadOptions = {
	handler: function(place,macroName,params) {
		var wizard = new Wizard();
		wizard.createWizard(place,this.wizardTitle);
		wizard.addStep(this.step1Title,this.step1Html);
		var markList = wizard.getElement("markList");
		var listWrapper = document.createElement("div");
		markList.parentNode.insertBefore(listWrapper,markList);
		wizard.setValue("listWrapper",listWrapper);
		this.refreshOptions(listWrapper,false);
		var uploadCaption;
		if (document.location.toString().substr(0,4) == "http") 
			uploadCaption = config.macros.upload.label.saveLabel;
		else
			uploadCaption = config.macros.upload.label.uploadLabel;
		
		wizard.setButtons([
				{caption: uploadCaption, tooltip: config.macros.upload.label.promptOption, 
					onClick: config.macros.upload.action},
				{caption: this.cancelButton, tooltip: this.cancelButtonPrompt, onClick: this.onCancel}
				
			]);
	},
	options: [
		"txtUploadUserName",
		"pasUploadPassword",
		"txtUploadStoreUrl",
		"txtUploadDir",
		"txtUploadFilename",
		"txtUploadBackupDir",
		"chkUploadLog",
		"txtUploadLogMaxLine"		
	],
	refreshOptions: function(listWrapper) {
		var opts = [];
		for(var i=0; i<this.options.length; i++) {
			var opt = {};
			opt.option = "";
			var n = this.options[i];
			opt.name = n;
			opt.lowlight = !config.optionsDesc[n];
			opt.description = opt.lowlight ? this.unknownDescription : config.optionsDesc[n];
			opts.push(opt);
		}
		var listview = ListView.create(listWrapper,opts,this.listViewTemplate);
		for(n=0; n<opts.length; n++) {
			var type = opts[n].name.substr(0,3);
			var h = config.macros.option.types[type];
			if (h && h.create) {
				h.create(opts[n].colElements['option'],type,opts[n].name,opts[n].name,"no");
			}
		}
		
	},
	onCancel: function(e)
	{
		backstage.switchTab(null);
		return false;
	},
	
	wizardTitle: "Upload with options",
	step1Title: "These options are saved in cookies in your browser",
	step1Html: "<input type='hidden' name='markList'></input><br>",
	cancelButton: "Cancel",
	cancelButtonPrompt: "Cancel prompt",
	listViewTemplate: {
		columns: [
			{name: 'Description', field: 'description', title: "Description", type: 'WikiText'},
			{name: 'Option', field: 'option', title: "Option", type: 'String'},
			{name: 'Name', field: 'name', title: "Name", type: 'String'}
			],
		rowClasses: [
			{className: 'lowlight', field: 'lowlight'} 
			]}
};

//
// upload functions
//

if (!bidix.upload) bidix.upload = {};

if (!bidix.upload.messages) bidix.upload.messages = {
	//from saving
	invalidFileError: "The original file '%0' does not appear to be a valid TiddlyWiki",
	backupSaved: "Backup saved",
	backupFailed: "Failed to upload backup file",
	rssSaved: "RSS feed uploaded",
	rssFailed: "Failed to upload RSS feed file",
	emptySaved: "Empty template uploaded",
	emptyFailed: "Failed to upload empty template file",
	mainSaved: "Main TiddlyWiki file uploaded",
	mainFailed: "Failed to upload main TiddlyWiki file. Your changes have not been saved",
	//specific upload
	loadOriginalHttpPostError: "Can't get original file",
	aboutToSaveOnHttpPost: 'About to upload on %0 ...',
	storePhpNotFound: "The store script '%0' was not found."
};

bidix.upload.uploadChanges = function(onlyIfDirty,tiddlers,storeUrl,toFilename,uploadDir,backupDir,username,password)
{
	var callback = function(status,uploadParams,original,url,xhr) {
		if (!status) {
			displayMessage(bidix.upload.messages.loadOriginalHttpPostError);
			return;
		}
		if (bidix.debugMode) 
			alert(original.substr(0,500)+"\n...");
		// Locate the storeArea div's 
		var posDiv = locateStoreArea(original);
		if((posDiv[0] == -1) || (posDiv[1] == -1)) {
			alert(config.messages.invalidFileError.format([localPath]));
			return;
		}
		bidix.upload.uploadRss(uploadParams,original,posDiv);
	};
	
	if(onlyIfDirty && !store.isDirty())
		return;
	clearMessage();
	// save on localdisk ?
	if (document.location.toString().substr(0,4) == "file") {
		var path = document.location.toString();
		var localPath = getLocalPath(path);
		saveChanges();
	}
	// get original
	var uploadParams = new Array(storeUrl,toFilename,uploadDir,backupDir,username,password);
	var originalPath = document.location.toString();
	// If url is a directory : add index.html
	if (originalPath.charAt(originalPath.length-1) == "/")
		originalPath = originalPath + "index.html";
	var dest = config.macros.upload.destFile(storeUrl,toFilename,uploadDir);
	var log = new bidix.UploadLog();
	log.startUpload(storeUrl, dest, uploadDir,  backupDir);
	displayMessage(bidix.upload.messages.aboutToSaveOnHttpPost.format([dest]));
	if (bidix.debugMode) 
		alert("about to execute Http - GET on "+originalPath);
	var r = doHttp("GET",originalPath,null,null,username,password,callback,uploadParams,null);
	if (typeof r == "string")
		displayMessage(r);
	return r;
};

bidix.upload.uploadRss = function(uploadParams,original,posDiv) 
{
	var callback = function(status,params,responseText,url,xhr) {
		if(status) {
			var destfile = responseText.substring(responseText.indexOf("destfile:")+9,responseText.indexOf("\n", responseText.indexOf("destfile:")));
			displayMessage(bidix.upload.messages.rssSaved,bidix.dirname(url)+'/'+destfile);
			bidix.upload.uploadMain(params[0],params[1],params[2]);
		} else {
			displayMessage(bidix.upload.messages.rssFailed);			
		}
	};
	// do uploadRss
	if(config.options.chkGenerateAnRssFeed) {
		var rssPath = uploadParams[1].substr(0,uploadParams[1].lastIndexOf(".")) + ".xml";
		var rssUploadParams = new Array(uploadParams[0],rssPath,uploadParams[2],'',uploadParams[4],uploadParams[5]);
		var rssString = generateRss();
		// no UnicodeToUTF8 conversion needed when location is "file" !!!
		if (document.location.toString().substr(0,4) != "file")
			rssString = convertUnicodeToUTF8(rssString);	
		bidix.upload.httpUpload(rssUploadParams,rssString,callback,Array(uploadParams,original,posDiv));
	} else {
		bidix.upload.uploadMain(uploadParams,original,posDiv);
	}
};

bidix.upload.uploadMain = function(uploadParams,original,posDiv) 
{
	var callback = function(status,params,responseText,url,xhr) {
		var log = new bidix.UploadLog();
		if(status) {
			// if backupDir specified
			if ((params[3]) && (responseText.indexOf("backupfile:") > -1))  {
				var backupfile = responseText.substring(responseText.indexOf("backupfile:")+11,responseText.indexOf("\n", responseText.indexOf("backupfile:")));
				displayMessage(bidix.upload.messages.backupSaved,bidix.dirname(url)+'/'+backupfile);
			}
			var destfile = responseText.substring(responseText.indexOf("destfile:")+9,responseText.indexOf("\n", responseText.indexOf("destfile:")));
			displayMessage(bidix.upload.messages.mainSaved,bidix.dirname(url)+'/'+destfile);
			store.setDirty(false);
			log.endUpload("ok");
		} else {
			alert(bidix.upload.messages.mainFailed);
			displayMessage(bidix.upload.messages.mainFailed);
			log.endUpload("failed");			
		}
	};
	// do uploadMain
	var revised = bidix.upload.updateOriginal(original,posDiv);
	bidix.upload.httpUpload(uploadParams,revised,callback,uploadParams);
};

bidix.upload.httpUpload = function(uploadParams,data,callback,params)
{
	var localCallback = function(status,params,responseText,url,xhr) {
		url = (url.indexOf("nocache=") < 0 ? url : url.substring(0,url.indexOf("nocache=")-1));
		if (xhr.status == httpStatus.NotFound)
			alert(bidix.upload.messages.storePhpNotFound.format([url]));
		if ((bidix.debugMode) || (responseText.indexOf("Debug mode") >= 0 )) {
			alert(responseText);
			if (responseText.indexOf("Debug mode") >= 0 )
				responseText = responseText.substring(responseText.indexOf("\n\n")+2);
		} else if (responseText.charAt(0) != '0') 
			alert(responseText);
		if (responseText.charAt(0) != '0')
			status = null;
		callback(status,params,responseText,url,xhr);
	};
	// do httpUpload
	var boundary = "---------------------------"+"AaB03x";	
	var uploadFormName = "UploadPlugin";
	// compose headers data
	var sheader = "";
	sheader += "--" + boundary + "\r\nContent-disposition: form-data; name=\"";
	sheader += uploadFormName +"\"\r\n\r\n";
	sheader += "backupDir="+uploadParams[3] +
				";user=" + uploadParams[4] +
				";password=" + uploadParams[5] +
				";uploaddir=" + uploadParams[2];
	if (bidix.debugMode)
		sheader += ";debug=1";
	sheader += ";;\r\n"; 
	sheader += "\r\n" + "--" + boundary + "\r\n";
	sheader += "Content-disposition: form-data; name=\"userfile\"; filename=\""+uploadParams[1]+"\"\r\n";
	sheader += "Content-Type: text/html;charset=UTF-8" + "\r\n";
	sheader += "Content-Length: " + data.length + "\r\n\r\n";
	// compose trailer data
	var strailer = new String();
	strailer = "\r\n--" + boundary + "--\r\n";
	data = sheader + data + strailer;
	if (bidix.debugMode) alert("about to execute Http - POST on "+uploadParams[0]+"\n with \n"+data.substr(0,500)+ " ... ");
	var r = doHttp("POST",uploadParams[0],data,"multipart/form-data; ;charset=UTF-8; boundary="+boundary,uploadParams[4],uploadParams[5],localCallback,params,null);
	if (typeof r == "string")
		displayMessage(r);
	return r;
};

// same as Saving's updateOriginal but without convertUnicodeToUTF8 calls
bidix.upload.updateOriginal = function(original, posDiv)
{
	if (!posDiv)
		posDiv = locateStoreArea(original);
	if((posDiv[0] == -1) || (posDiv[1] == -1)) {
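		// localPath is not defined in this function; report the document location instead
		alert(config.messages.invalidFileError.format([document.location.toString()]));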
		return;
	}
	var revised = original.substr(0,posDiv[0] + startSaveArea.length) + "\n" +
				store.allTiddlersAsHtml() + "\n" +
				original.substr(posDiv[1]);
	var newSiteTitle = getPageTitle().htmlEncode();
	revised = revised.replaceChunk("<title"+">","</title"+">"," " + newSiteTitle + " ");
	revised = updateMarkupBlock(revised,"PRE-HEAD","MarkupPreHead");
	revised = updateMarkupBlock(revised,"POST-HEAD","MarkupPostHead");
	revised = updateMarkupBlock(revised,"PRE-BODY","MarkupPreBody");
	revised = updateMarkupBlock(revised,"POST-SCRIPT","MarkupPostBody");
	return revised;
};

//
// UploadLog
// 
// config.options.chkUploadLog :
//		false : no logging
//		true : logging
// config.options.txtUploadLogMaxLine :
//		-1 : no limit
//      0 :  no Log lines but UploadLog is still in place
//		n :  only the last n lines are kept
//		NaN : no limit (-1)

bidix.UploadLog = function() {
	if (!config.options.chkUploadLog) 
		return; // this.tiddler = null
	this.tiddler = store.getTiddler("UploadLog");
	if (!this.tiddler) {
		this.tiddler = new Tiddler();
		this.tiddler.title = "UploadLog";
		this.tiddler.text = "| !date | !user | !location | !storeUrl | !uploadDir | !toFilename | !backupdir | !origin |";
		this.tiddler.created = new Date();
		this.tiddler.modifier = config.options.txtUserName;
		this.tiddler.modified = new Date();
		store.addTiddler(this.tiddler);
	}
	return this;
};

bidix.UploadLog.prototype.addText = function(text) {
	if (!this.tiddler)
		return;
	// retrieve maxLine when we need it
	var maxLine = parseInt(config.options.txtUploadLogMaxLine,10);
	if (isNaN(maxLine))
		maxLine = -1;
	// add text
	if (maxLine != 0) 
		this.tiddler.text = this.tiddler.text + text;
	// Truncate to maxLine
	if (maxLine >= 0) {
		var textArray = this.tiddler.text.split('\n');
		if (textArray.length > maxLine + 1)
			textArray.splice(1,textArray.length-1-maxLine);
		this.tiddler.text = textArray.join('\n');
	}
	// update tiddler fields
	this.tiddler.modifier = config.options.txtUserName;
	this.tiddler.modified = new Date();
	store.addTiddler(this.tiddler);
	// refresh and notify for immediate update
	story.refreshTiddler(this.tiddler.title);
	store.notify(this.tiddler.title, true);
};

bidix.UploadLog.prototype.startUpload = function(storeUrl, toFilename, uploadDir,  backupDir) {
	if (!this.tiddler)
		return;
	var now = new Date();
	var text = "\n| ";
	var filename = bidix.basename(document.location.toString());
	if (!filename) filename = '/';
	text += now.formatString("0DD/0MM/YYYY 0hh:0mm:0ss") +" | ";
	text += config.options.txtUserName + " | ";
	text += "[["+filename+"|"+location + "]] |";
	text += " [[" + bidix.basename(storeUrl) + "|" + storeUrl + "]] | ";
	text += uploadDir + " | ";
	text += "[[" + bidix.basename(toFilename) + " | " +toFilename + "]] | ";
	text += backupDir + " |";
	this.addText(text);
};

bidix.UploadLog.prototype.endUpload = function(status) {
	if (!this.tiddler)
		return;
	this.addText(" "+status+" |");
};

//
// Utilities
// 

bidix.checkPlugin = function(plugin, major, minor, revision) {
	var ext = version.extensions[plugin];
	if (!
		(ext  && 
			((ext.major > major) || 
			((ext.major == major) && (ext.minor > minor))  ||
			((ext.major == major) && (ext.minor == minor) && (ext.revision >= revision))))) {
			// write error in PluginManager
			if (pluginInfo)
				pluginInfo.log.push("Requires " + plugin + " " + major + "." + minor + "." + revision);
			eval(plugin); // generate an error : "Error: ReferenceError: xxxx is not defined"
	}
};

bidix.dirname = function(filePath) {
	if (!filePath) 
		return;
	var lastpos;
	if ((lastpos = filePath.lastIndexOf("/")) != -1) {
		return filePath.substring(0, lastpos);
	} else {
		return filePath.substring(0, filePath.lastIndexOf("\\"));
	}
};

bidix.basename = function(filePath) {
	if (!filePath) 
		return;
	var lastpos;
	if ((lastpos = filePath.lastIndexOf("#")) != -1) 
		filePath = filePath.substring(0, lastpos);
	if ((lastpos = filePath.lastIndexOf("/")) != -1) {
		return filePath.substring(lastpos + 1);
	} else
		return filePath.substring(filePath.lastIndexOf("\\")+1);
};

bidix.initOption = function(name,value) {
	if (!config.options[name])
		config.options[name] = value;
};

//
// Initializations
//

// require PasswordOptionPlugin 1.0.1 or better
bidix.checkPlugin("PasswordOptionPlugin", 1, 0, 1);

// styleSheet
setStylesheet('.txtUploadStoreUrl, .txtUploadBackupDir, .txtUploadDir {width: 22em;}',"uploadPluginStyles");

//optionsDesc
merge(config.optionsDesc,{
	txtUploadStoreUrl: "Url of the UploadService script (default: store.php)",
	txtUploadFilename: "Filename of the uploaded file (default: in index.html)",
	txtUploadDir: "Relative Directory where to store the file (default: . (downloadService directory))",
	txtUploadBackupDir: "Relative Directory where to backup the file. If empty no backup. (default: ''(empty))",
	txtUploadUserName: "Upload Username",
	pasUploadPassword: "Upload Password",
	chkUploadLog: "do Logging in UploadLog (default: true)",
	txtUploadLogMaxLine: "Maximum of lines in UploadLog (default: 10)"
});

// Options Initializations
bidix.initOption('txtUploadStoreUrl','');
bidix.initOption('txtUploadFilename','');
bidix.initOption('txtUploadDir','');
bidix.initOption('txtUploadBackupDir','');
bidix.initOption('txtUploadUserName','');
bidix.initOption('pasUploadPassword','');
bidix.initOption('chkUploadLog',true);
bidix.initOption('txtUploadLogMaxLine','10');


// Backstage
merge(config.tasks,{
	uploadOptions: {text: "upload", tooltip: "Change UploadOptions and Upload", content: '<<uploadOptions>>'}
});
config.backstageTasks.push("uploadOptions");


//}}}

<!--{{{-->
<div class='toolbar' macro='toolbar closeTiddler closeOthers +editTiddler > fields syncing permalink references jump'></div>
<div class='title' macro='view title'></div>
<div class='viewer' macro='view text wikified'></div>
<div class='tagClear'></div>
<!--}}}-->
This is just a modest adaptation of ~TiddlyWiki for use as a webpage. I created it for my own use, but thought others might like an empty template of it. To see a working example of the Webview ~TiddlyWiki, [[see here|http://www.giffmex.org/webviewtwexample.html]]. ''Features of WebviewTW:''
*I have reduced as much clutter as possible, so as not to confuse first-time visitors to your site: the header is gone, the sidebar hidden, and Tiddler elements such as author, date created, tagged and tagging have been removed. The mainmenu has a toolbox, which itself can be gutted if desired, when you are ready to upload.
*Only one tiddler opens at a time.
*There is a way to create a series of tiddlers linked in a colorful subtopic menu above the tiddler titles (the three squares above are an example of a subtopic menu and include the instructions necessary to create one). These are good just as subtopic menus, but are also meant for slideshows and linear tutorials and lessons. The idea is similar to the [[PresentationPlugin|http://lewcid.googlepages.com/presentation_empty_full.html#Documentation]], but this setup operates in a different way.
*Saving options have been set to ~SaveBackup:unchecked, and Animations:disabled, and the sidebar is hidden by default. (See [[z_configOptions]] to change these)
*In edit mode there are a number of easyEdit menus. See [[Formatting cheatsheet]] for details. There are also several color palettes to choose from (found in the Sidebar).
*The UploadPlugin and SplashScreenPlugin are installed. For directions for the UploadPlugin, see [[this external link|http://www.giffmex.org/twfortherestofus.html#%5B%5BSimple%20instructions%20for%20BidiX's%20UploadPlugin%5D%5D]]. My apologies to Alan Hecht, the creator of the ~WebViewPlugin - there is no relation between this adaptation and that plugin, which is not used here.
!Synopsis of site postings
!!!
<html><div style="color: rgb(100, 100, 150); font-family: Monaco;"><big><b>
Thursday, 23OCT:</html>''Next Coding Assignment #4'': see @@[[LODprofile]]@@
* You need to produce two separate xy plots of LOD score versus nucleotide position
**   These plots as well as your script are due by email at 5 pm Friday, 31 OCT.
**   Ask questions if you get stuck on any part of this assignment.  
!!!
<html><div style="color: rgb(100, 100, 150); font-family: Monaco;"><big><b>
Friday, 04OCT:</html>''Next Coding Assignment #3'': see @@[[WordCompareBenchmark]]@@
* You need to produce an xy plot of CPU time versus sequence length
**   This is due by email at 5 pm Friday, 10 OCT.
* ALSO, help and hints for [[Home Work #2|AAFreqCode]] are now available if you need assistance: @@[[AAFChint1]]@@
!!!
<html><div style="color: rgb(100, 100, 150); font-family: Monaco;"><big><b>
Tuesday, 30SEP:</html>''Next Coding Assignment'': see @@[[AAFreqCode]]@@
* You need to produce an xy plot of amino acid frequency data
**   This is due by email at 5 pm Monday, 06 OCT.
* Emily Maung produced a very well documented script for the Code 01 exercise and if anyone is still unsure about what does what in that script, look over her comments: @@MaungCode01@@
!!!
<html><div style="color: rgb(100, 100, 150); font-family: Monaco;"><big><b>
Friday, 26SEP:</html>''Coding Assignment #1'': see @@[[AAcount]]@@
* A sixth code block for counting has been posted: @@[[AAcount5]]@@
** This shows the steps you need to calculate the mean AA freqs from the two hash arrays that are in the script.
* Also note that these freq values will not be exactly equal: @@[[AAdistribution]]@@
!!!
<html><div style="color: rgb(100, 100, 150); font-family: Monaco;"><big><b>
Wednesday, 24SEP:</html>''Coding Assignment #1'': see @@[[AAcount]]@@
* A fifth code block for counting has been posted: @@[[AAcount4]]@@
** This is the working code we built in class with a few minor additions
* Note the new @@[[&Round routine|Round]]@@ that has been added to the Working Code page. (Why do we add +0.5? . . . //hmmmm, looks like a good question for the mid term exam . . .// )
!!!
<html><div style="color: rgb(100, 100, 150); font-family: Monaco;"><big><b>
Monday, 22SEP:</html>''Coding Assignment #1'': see @@[[AAcount]]@@
* A second code block for counting has been posted: @@[[AAcount2]]@@
* Subroutine code blocks have been posted: @@[[CodeWorks]]@@
!!!
<html><div style="color: rgb(100, 100, 150); font-family: Monaco;"><big><b>
Friday, 19SEP:</html>''Coding Assignment #1'': see @@[[AAcount]]@@
* Edit script for scoring sequence metrics based on amino acid composition
!!!
<html><div style="color: rgb(100, 100, 150); font-family: Monaco;"><big><b>
Thursday, 18SEP:</html>''Command Reference Page'': see [[Commands]]
* Collection of common functions defined in one spot in the [[Resources]] section.
* I will add to this during the course as we encounter new material.
!!!
<html><div style="color: rgb(100, 100, 150); font-family: Monaco;"><big><b>
Friday, 12SEP:</html>''FASTA translate script resources'': see [[Lecture 3|L03]]
* Code editing is described here: [[L03.01]]
** just replace the Task 2 code block in your FASTA reader script and you will now have a FASTA translator script
* Full code for the translator is here: FASTAtranslate 
* AND . . . more help on using the command window on PCs:
** [[Emily's PC command window summary|00/EmilyBasicWindowsStuff.pdf]]
!!!
<html><div style="color: rgb(100, 100, 150); font-family: Monaco;"><big><b>
Wednesday, 10SEP:</html>''FASTA read script resources'': added after [[Lecture 02|L02]]
* Annotated and dissected script code:
** Here's a fully commented version of the FASTA reader script: FASTAread-NOTED
** Here's a presentation that walks through the annotated script, with screen dumps showing the values of some variables at different points in the run: [[L02.03]]
* Clean Code - ready for prime time
** New and improved FASTA reader is available [[HERE|FASTAread]]
** This version is almost identical to the annotated script above, except:
### All the comments have been stripped out
### The actual FASTA read code has been put into a SUBROUTINE format
!!!
''* * * *   Active PERL for Windows PCs   * * * *''

If you are reading this page, you are likely having significant problems getting PERL running on your PC. So . . . . . . .

@@1.@@ Let's Start Over. Even if you already have a C:\Perl directory, let's pretend you don't.

@@2.@@ Here is a local link to the current Windows-x86-MSI version available from ~ActiveState: [[EASY DOWNLOAD|00/ActivePerl-5.10.0.1003-MSWin32-x86-285500.msi]]. After clicking on this link, if you are prompted to do anything, select Save and put the file on your desktop. The file will be named: "ActivePerl-5.10.0.1003-MSWin32-x86-285500.msi"

@@3.@@ The "msi" extension on the file stands for "Microsoft Installer". Double-click the file and it will start running an automatic installation routine. Just accept the default location ("C:\Perl") and on the options screen leave the boxes checked for: "Add Perl to PATH environmental variable" and "Create Perl file extension associations."

@@4.@@ Once the installer finishes, click START, click RUN, enter the command "cmd" (without quotations) and press RETURN to open a terminal window. 

@@5.@@ At the command prompt, enter "perl -v" (w/o quotes), hit RETURN and the current version number of the perl install should be displayed in the terminal window. You are now ready to run perl scripts.

''If this worked, then you are ready to go to the test script here:'' [[Running PERL]]

//If the "perl -v" command doesn't work, then we will have to contact ~ActiveState and try to troubleshoot what's wrong with your computer.//

[[BACK|Resource Index]]
!
!Alignment of word strings
This code was presented in rough form for lecture 5. It is now "polished" by having most functions compartmentalized as subroutines. Strings of different lengths can be compared. Scoring values are input in the User Variable section.
{{{
#!/usr/bin/perl
use strict;
use Benchmark;
$|=1;

# - - - - - H E A D E R - - - - - - - - - - - - - - - -
# 01OCT Lecture 5.
# How to compare two sequences . . . . 
# First cut at running a direct brute force alignment

# - - - - - U S E R   V A R I A B L E S - - - - - - - -
my $seq1 = "CATDOG";
my $seq2 = "BATHOG";
# Alignment options: . . . . . 
	# decimal percent of identical matches
		my $MinIdentityLimit = 0.5;
	# decimal percent of how much sequence MUST be aligned
		my $OVERlap = 0.5;       
	# score values . . . . . . . . 
		my $IDmatch = 	 8;
		my $MISmatch =   4;
		my $GAPpenalty =  1;


# - - - - - G L O B A L  V A R I A B L E S  - - - - - -
my (@Gaps1,@Gaps2);
my (@Seq1, @Seq2);
my @Results;
my $N = int((1 - $OVERlap/2)*(length($seq1) + length($seq2))) + 1;

# - - - - - - - - - - - - - - - - - - - - - - - - - - -
# - - - - - M A I N - - - - - - - - - - - - - - - - - -
# - - - - - - - - - - - - - - - - - - - - - - - - - - -
my $Time0 = new Benchmark;
print "\n\nSTART: Philosophy is a walk on the slippery rocks . . . . . \n\n";

# 1. Initialize the first chain pattern; save in @GapsX . . . . . . 
  	push(@Gaps1,join('',&GapPattern($N,$seq1)));
  	push(@Gaps2,join('',&GapPattern($N,$seq2)));
		
# 2. Generate the permutation patterns . . . . .
	&Permute($N,$seq1,\@Gaps1);
	&Permute($N,$seq2,\@Gaps2);

# 3. Set the sequences into the chain patterns . . . 
	@Seq1 = &SeqChain($seq1,\@Gaps1);
	@Seq2 = &SeqChain($seq2,\@Gaps2);
	
# 4. Alignment scoring results for all gap-seqs . . .
	# @Results = "score|s1|s2|", plus maxscore is last element
	@Results = &AlignScore(\@Seq1,\@Seq2);
	
# 5. Parse the results . . . . . . 
	my $Max = pop(@Results);
	my $count = 0;
	foreach my $align (@Results)
	{	my @x = split(/\|/,$align);
		if ($x[0] == $Max)
		{	$count += 1;
			print "------------------------------\n$count. Score = $x[0]\n";
			print "    |$x[1]|\n    |$x[2]|\n";
		}
	}
&TIME;
print "\n\n\n   DONE: Don't fall in the water.   \n\n\n";
# - - - - - - - - - - - - - - - - - - - - - - - - - - -
# - - - - - S U B R O U T I N E S - - - - - - - - - - -
# - - - - - - - - - - - - - - - - - - - - - - - - - - -
# - - - - - - - - - - - - - - - - - - - - - - - - - - -

# - - - - - - - - - - - - - - - - - - - - - - - - - -
sub GapPattern
{	# call with: &GapPattern($N,$seq1)
	# returns the first seq-gap character string
	my $n = shift(@_);
	my $s = shift(@_);
	my @c;
	foreach (1..length($s))   {	push(@c,"x"); }
	foreach (length($s)+1..$n){	push(@c,"-"); }
	return @c;
}
# - - - - - - - - - - - - - - - - - - - - - - - - - -
sub Permute
{	# called with: &Permute($N,$seq1,\@Gaps1);
	# does not return a list: every new seq-gap permutation it finds is
	# pushed onto the array referenced by $gaps (e.g. @Gaps1)
	my ($n, $m, $gaps) = @_;
	my @chain = split(//,$gaps->[0]);
	foreach my $i (1..length($m))
	{	for(my $j = length($m)-1; $j>=0; $j -= 1)
		{	foreach my $k ($j..$n-2)
			{   @chain[$k,$i+$k] = @chain[$i+$k, $k];
				my $match = 0;
				my $seq = join('',@chain);
			# Check to see if seq pattern has already been found:
				foreach my $gap (@{$gaps})
				{	if ($gap eq $seq){ $match = 1; last;} }
			# Store unique chain patterns in @Gaps:
				if ($match == 0)
				{	push(@{$gaps},$seq); }  
			}
		}
	}
	my $z = $#{$gaps} + 1;
	print "There are $z permutations for \"$m\" using \"@{$gaps}[0]\"\n";	
}
# - - - - - - - - - - - - - - - - - - - - - - - - - -
sub SeqChain
{	# call with: @Seq1 = &SeqChain($seq1,\@Gaps1);
	# returns the seq-gap chains with SEQUENCE data
	my ($seq, $gaps) = @_;
	my @seqchains;
	foreach my $gap (@{$gaps})
	{	my @x = split(//, $gap);
		my @s = split(//,$seq);
		my $gapseq = "";
		foreach my $x (@x)
		{	if ($x =~ m/-/)
			{	$gapseq .= "-"; }
			else
			{	$gapseq .= shift(@s); }
		}
		push (@seqchains, $gapseq);
		# print "$gapseq\n";
	}
	return @seqchains;
}
# - - - - - - - - - - - - - - - - - - - - - - - - - -
sub AlignScore
{	# call with: @Results = &AlignScore(\@Seq1,\@Seq2);
	# returns array with all "score|gap-seq1|gap-seq2|" + $max at the end
	my ($seq1, $seq2) = @_;
	my @results;
	my $max = 0;
	my @G;
	foreach my $s1 (@{$seq1})
	{	foreach my $s2 (@{$seq2})
		{	my @c2 = split(//,$s2);    # set the @c2 seq pattern
			my @c1 = split(//,$s1);    # reset the @c1 seq pattern
			my $score = 0;
			my $distance;
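		# Check for identity threshold . . . . 
		# NOTE: $T is a raw COUNT of identical positions, while $MinIdentityLimit
		# is a decimal fraction, so any limit above 0 and up to 1 is met by a single match.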
			my $T = 0;
			foreach my $i (0..$#c1)
			{	if ($c1[$i] eq $c2[$i] && $c2[$i] ne "-" )
				{	$T += 1; }
			}
			if ($T >= $MinIdentityLimit)
			{	# Check alignment and remove overlap gaps
				my $DONE = 0;
				while (!$DONE)
				{	my $splice = 0;
					my $k = $#c1;
					foreach my $i (0..$k)
					{	if ($c1[$i] eq "-" && $c2[$i] eq "-" )
						{	splice(@c1,$i,1);
							splice(@c2,$i,1);
							$splice = 1;
						}
					}
					if ($splice == 0){ $DONE = 1; }
				}
				# Check if unique alignment . . . 
				my $str = join('',@c1)."|".join('',@c2);
				foreach my $align (@G)
				{	if ($str eq $align){ $str = "FOUND"; } }
				unless ($str eq "FOUND")
				{ 	push(@G,$str);
					my $n = $#c1 + $#c2 + 2;
					#. . . . . . . . . . . . . . . . . . . . .
					# SCORING SCORING SCORING SCORING SCORING
					foreach my $i (0..$#c1)
					{	# $distance = abs($i-$n/2); # distance from middle
						# Amino Acid matching . . . . 
						if ($c1[$i] =~ /\w/ && $c2[$i] =~ /\w/ ) 
						{	if ($c1[$i] eq $c2[$i])
							{	$score += $IDmatch; }
							else
							{	$score -= $MISmatch; }
						}
						# Gap penalty . . . . . 
						elsif ($c1[$i] eq "-" || $c2[$i] eq "-" )
						{	$score -= $GAPpenalty; }
					}
					# SCORING SCORING SCORING SCORING SCORING
					#. . . . . . . . . . . . . . . . . . . . .
					if ($score > $max) { $max = $score; }
					$str = $score."|".$str."|";
					push(@results,$str);
					
				} # unless $str is FOUND
			} # end if $T >= $MinIdentityLimit
		}# end foreach @Seq2
	} # end foreach @Seq1
	push(@results, $max);
	return @results;
}
# - - - - - - - - - - - - - - - - - - - - - - - - - -
sub TIME
{
	my $t1 = new Benchmark;
    my $td = timediff($t1, $Time0);
    print "\n(Time for code execution :",timestr($td),")\n";
}
# - - - - - - - - - - - - - - - - - - - - - - - - - -
# - - - - - - - - - - - - - - - - - - - - - - - - - -
# - - - - - EOF - - - - - - - - - - - - - - - - - - -
# - - - - - - - - - - - - - - - - - - - - - - - - - -
# - - - - - - - - - - - - - - - - - - - - - - - - - -

}}}
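To see what the SCORING block does for one pair of gapped strings, here is a minimal stand-alone sketch. The sub name {{{ScorePair}}} is made up for this example (it is not part of the script above), and it assumes both strings have the same length. The score values are the ones from the User Variable section. Columns where both strings have a gap are simply skipped here, since the script removes those columns before it scores.
{{{
#!/usr/bin/perl
use strict;
use warnings;

# Score values copied from the User Variable section of the script
my $IDmatch    = 8;
my $MISmatch   = 4;
my $GAPpenalty = 1;

# ScorePair is a hypothetical helper, for illustration only.
# It expects two gapped strings of equal length.
sub ScorePair
{	my ($s1, $s2) = @_;
	my @c1 = split(//, $s1);
	my @c2 = split(//, $s2);
	my $score = 0;
	foreach my $i (0..$#c1)
	{	if ($c1[$i] ne "-" && $c2[$i] ne "-")
		{	# two letters: reward identity, punish a mismatch
			if ($c1[$i] eq $c2[$i]) { $score += $IDmatch; }
			else                    { $score -= $MISmatch; }
		}
		elsif ($c1[$i] ne "-" || $c2[$i] ne "-")
		{	# a letter opposite a gap
			$score -= $GAPpenalty;
		}
	}
	return $score;
}

print ScorePair("CATDOG",  "BATHOG"),  "\n";   # 4 matches, 2 mismatches: prints 24
print ScorePair("CATDOG-", "-BATHOG"), "\n";   # shifted by one: 5 mismatches, 2 gaps: prints -22
}}}
The second call shows why the unshifted alignment wins: sliding one word by a single position turns four matches into mismatches.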
[[Back to Lecture 5|L05]]
!!!
!Rough Alignment Code:
This listing is just the rough code. I will work on getting the code blocks into subroutines for more convenient use.
{{{
#!/usr/bin/perl
use strict;
$|=1;

# - - - - - H E A D E R - - - - - - - - - - - - - - - -
# 01OCT Lecture 5.
# How to brute force compare two sequences . . . . 
# AGM2008
# - - - - - U S E R   V A R I A B L E S - - - - - - - -
my $seq1 = "CATDOG";
my $seq2 = "BATHOG";

# - - - - - G L O B A L  V A R I A B L E S  - - - - - -
my @Gaps;
my @Seq1;
my @Seq2;
my $N = length($seq1) + length($seq2);



# - - - - - - - - - - - - - - - - - - - - - - - - - - -
# - - - - - M A I N - - - - - - - - - - - - - - - - - -
# - - - - - - - - - - - - - - - - - - - - - - - - - - -
print "\n\n  Philosophy is a walk on the slippery rocks . . . . . \n\n";

my @chain;
foreach (1..length($seq1))   {	push(@chain,"x"); }
foreach (length($seq1)+1..$N){	push(@chain,"-"); }
push(@Gaps,join('',@chain));

foreach my $i (1..length($seq1))
{	for(my $j = length($seq1)-1; $j>=0; $j -= 1)
	{	foreach my $k ($j..$N-2)
		{   @chain[$k,$i+$k] = @chain[$i+$k, $k];
			my $match = 0;
			my $seq = join('',@chain);
		# Check to see if seq pattern has already been found:
			foreach my $gap (@Gaps)
			{	if ($gap eq $seq){ $match = 1; last;} }
		# Store unique chain patterns in @Gaps:
			if ($match == 0)
			{	push(@Gaps,$seq); }  
		}
	}
}
my $n = $#Gaps +1;
print "\nThere are $n permutations in the gap set\n";

# Set Seq array 1 - - - - - - - - - - - 
foreach my $gap (@Gaps)
{	my @x = split(//, $gap);
	my @seq = split(//,$seq1);
	my $gapseq = "";
	foreach my $x (@x)
	{	if ($x =~ m/-/)
		{	$gapseq .= "-"; }
		else
		{	$gapseq .= shift(@seq); }
	}
	push (@Seq1, $gapseq);
	#print "$gapseq\n";
}


# Set Seq array 2 - - - - - - - - - - - 
foreach my $gap (@Gaps)
{	my @x = split(//, $gap);
	my @seq = split(//,$seq2);
	my $gapseq = "";
	foreach my $x (@x)
	{	if ($x =~ m/-/)
		{	$gapseq .= "-"; }
		else
		{	$gapseq .= shift(@seq); }
	}
	push (@Seq2, $gapseq);
	# print "$gapseq\n";
}

# Sequence scoring - - - - - - - - - - - 
my $count = 0;
my $max = 0;
my ($t1,$t2); 
foreach my $s1 (@Seq1)
{	my @c1 = split(//,$s1);
	foreach my $s2 (@Seq2)
	{	my @c2 = split(//,$s2);
		my $score = 0;
		foreach my $i (0..$#c1)
		{	# Amino Acid matching . . . . 
			if ($c1[$i] ne "-" && $c2[$i] ne "-" ) 
			{	if ($c1[$i] eq $c2[$i])
				{	$score += 4; }
				else
				{	$score -= 2; }
			}
			# Gap penalty: applied only where BOTH strings have a gap;
			# a letter opposite a gap scores 0 in this rough version.
			elsif ($c1[$i] eq "-" && $c2[$i] eq "-" )
			{	$score -= 1; }
		}
		
		if ($score >= $max)
		{	$max = $score;
			$t1 = $s1;
			$t2 = $s2;
		}
		
		# if ($score == 12)
		# {	print "-------------\n";
		# 	print "score = $score\n";
		# 	print "$s1\n";
		# 	print "$s2\n";
		# 	$count += 1;
		# }
	}
}

print "\n\n          MAX ALIGNMENT:\n";
print "          score = $max\n";
print "              $t1\n";
print "              $t2\n";

print "\n\n\n   DONE   \n\n\n";
# - - - - - - - - - - - - - - - - - - - - - - - - - - -
# - - - - - S U B R O U T I N E S - - - - - - - - 
# - - - - - - - - - - - - - - - - - - - - - - - - - - -
# - - - - - - - - - - - - - - - - - - - - - - - - - - -
}}}
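A rough count shows why this brute force approach gets slow. The number of ways to place L letters (kept in order) into N positions is the binomial coefficient C(N, L), and every pattern for one word has to be scored against every pattern for the other. The sketch below only does that arithmetic, using the same rule for $N as the listing above (the two lengths added together). The sub name {{{Choose}}} is made up for this example, and the count is an upper limit on distinct patterns, not necessarily the number that the swap loops above actually find.
{{{
#!/usr/bin/perl
use strict;
use warnings;

# Choose is a hypothetical helper: binomial coefficient C(n, k)
sub Choose
{	my ($n, $k) = @_;
	return 0 if ($k < 0 || $k > $n);
	my $c = 1;
	foreach my $i (1..$k)
	{	# after step $i, $c equals C(n-k+i, i), so it is always a whole number
		$c = $c * ($n - $k + $i) / $i;
	}
	return $c;
}

foreach my $len (2, 4, 6, 8, 10)
{	my $N = 2 * $len;                       # two words of equal length
	my $patterns = Choose($N, $len);
	my $pairs    = $patterns * $patterns;   # every pattern against every pattern
	print "length $len: N = $N, up to $patterns patterns per word, $pairs pairs\n";
}
}}}
For two six-letter words this gives up to 924 patterns per word, which is already more than 850,000 pairs to score.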
!Time Benchmark Alignment Algorithms:
''GOAL:'' This project is designed to familiarize you with running scripts on BIOWOLF and collecting some data about how time-efficient the alignment code blocks may be.

''HERE IS A PLOT FROM MY TRIAL RUN OF THE SCRIPT:'' @@[[Example Plots|WordCompareBenchmarkPlots]]@@

!!!INSTRUCTIONS:
# Download the time-loop version of the ~WordCompare script: @@[[WordCompareLoop]]@@
**   Save it to your computer with the name ''01.4-~WordCompare-LOOP.pl''
**   Edit the two word sequences to something else but keep each string at a length of 10 characters.
**   This script uses a PERL package called "Benchmark" to automatically time computational processes.
# Transfer this file to your BIOWOLF account and put it into the folder "03-SANDBOX". You will be running and working with it from this location.
# Download the control shell script ''00-RUN.sh'' from @@[[RunShell]]@@
**   Save it to your computer using this name.
**   Edit it at this point so that it contains your email address and replace the yyyy with WCL, which will be the abbreviation we'll use for the Word Compare Loop script that you'll be running.
**   Also transfer this file to the 03-SANDBOX folder in your Biowolf account.
# Now start a command ssh session on Biowolf from a terminal window, where XX is your class number:
**   @@{{{prompt> ssh classXX@biowolf.dbi.udel.edu}}}@@
**   Enter your password
# You should be logged into your Biowolf account at your home folder. The screen prompt is now:
**  @@{{{classxx@biowolf ~ $ }}}@@
** The ''~'' character signifies your home folder level. The "$" is just a punctuation break signifying the end of the prompt text string and the beginning of where you start to type input.
# Now you want to move into the 03-SANDBOX folder with this command:
**  @@{{{classxx@biowolf ~ $ cd 03*}}}@@
**  The screen should then display the prompt below, showing that you are in the right folder (~/03-SANDBOX):
**  @@{{{classxx@biowolf ~/03-SANDBOX $ }}}@@
# List the contents of this folder with the "ls" command:
**  INPUT: @@{{{classxx@biowolf ~/03-SANDBOX $ ls}}}@@
**  OUTPUT: @@{{{00-RUN.sh  01.4-WordCompare-LOOP.pl}}}@@
**  This shows you that there are two files here, __00-RUN.sh__ and __01.4-~WordCompare-LOOP.pl__
# Change execution permissions on the 00-RUN.sh file so it can "run" on Biowolf:
**   INPUT: @@{{{classxx@biowolf ~/03-SANDBOX $ chmod 755 00-RUN.sh }}}@@
**   You can see file permission settings by using the "ls" command with option "-l":
***   INPUT: @@{{{classxx@biowolf ~/03-SANDBOX $ ls -l }}}@@
**   The screen listing will have one file on each line. The info for 00-RUN.sh should look like this:
***  OUTPUT: @@{{{-rwxr-xr-x 1 class00 class  236 Oct  3 11:02 00-RUN.sh}}}@@ 
***  Note that "x" = execute privilege, "r" = read privilege, "w" = write privilege.
***  After the leading "-", the first three letters are the user (you) privileges, the second three letters are the group privileges (in this case your classxx account is part of a group called "class"), and the last three letters are the world privileges (which means anyone who can log into Biowolf).  
# Jobs are submitted to Biowolf's SGE using the RUN shell script. It is important that the program name on the last line of the script (after perl) is exactly the program name in the list command above. If not, re-edit the RUN script on your local computer and then transfer a new copy of it to Biowolf. The command for submitting jobs is: ''qsub''
**   INPUT: @@{{{classxx@biowolf ~/03-SANDBOX $ qsub 00-RUN.sh }}}@@
**   OUTPUT:  @@{{{Your job <some number> ("WCL") has been submitted.}}}@@
***  Note that your job will now have the id value "WCL" (or whatever you put in the shell script, or yyyy if you didn't change it at all).
# You can check the status of your job using the command: ''qstat''
**   INPUT: @@{{{classxx@biowolf ~/03-SANDBOX $ qstat }}}@@
**   Look at the table of jobs for your class number under USER and "WCL" under NAME
# Biowolf will generate two output files while it is running: one I call the "o" file which contains all the text that would normally be printed to the screen; the other I call the "po" file which you can ignore.
# To see how the program is running, use the ''more'' command to look at the contents of the "o" file:
**   INPUT: @@{{{classxx@biowolf ~/03-SANDBOX $ more WCL.o* }}}@@ 
**   Note that the output file name starts with the job name that you gave it, followed by ".oZZZZZZ", where ZZZZZZ is the job number.
**   Tap the ~SPACE-BAR to advance 1 screen page at a time; hit enter/return to advance 1 line at a time.
# If you run into problems and need to re-submit the job, you should first delete the current WCL* text files that have been generated, using the "rm" (remove) command:
**   INPUT: @@{{{classxx@biowolf ~/03-SANDBOX $ rm WCL* }}}@@ 
# Now logout and leave. Biowolf will email you when the job is complete.
# When the job is done, log back into Biowolf and transfer the output file ''WCL.oZZZZZZ'' to your local computer. Open in a text editor and look at the time stamp for each alignment length from 1 to 10:
**  @@{{{(Time for code execution :1953 wallclock secs (1952.55 usr +  0.28 sys = 1952.83 CPU))}}}@@
**  Build a quick table of "Number of Characters" and CPU seconds (here 1952.83)
**  Convert the CPU time from seconds to hours.
**  Generate an XY plot of x= Number of Characters and y= CPU hours 
**  Convert plot to image file.
**  ''SEND IMAGE FILE TO ME''
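
The table in the last step can also be built with a few lines of PERL instead of by hand. This is only a sketch: it assumes the "o" file has been saved on your computer under the made-up name WCL.o123456 (give your real file name on the command line), and that each timing line ends with "= seconds CPU)" like the example line above. The sub name {{{CPUseconds}}} is made up for this example.
{{{
#!/usr/bin/perl
use strict;
use warnings;

# CPUseconds is a hypothetical helper: it pulls the CPU seconds out of one
# Benchmark timing line, or returns nothing if the line has no timing.
sub CPUseconds
{	my ($line) = @_;
	if ($line =~ m/=\s*([\d.]+)\s*CPU\)/)
	{	return $1; }
	return;
}

# The default file name is an assumption; pass the real one as an argument.
my $file = shift(@ARGV) || "WCL.o123456";
if (open(my $IN, "<", $file))
{	my $chars = 0;
	print "Characters\tCPU_sec\tCPU_hr\n";
	while (my $line = <$IN>)
	{	my $sec = CPUseconds($line);
		next unless defined($sec);
		$chars += 1;      # assumes one timing line per alignment length, in order
		printf("%d\t%.2f\t%.4f\n", $chars, $sec, $sec / 3600);
	}
	close($IN);
}
else
{	print "Could not open $file: $!\n"; }
}}}
The tab-separated output can be pasted straight into a spreadsheet for plotting.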

[[BACK to Lecture 6|L06]]
!!!Biowolf Commands:
''qsub shellname'' = submits shellname to be processed by the SGE cluster
''qstat -u classXX'' = checks on the status of __your__ running jobs on Biowolf, (XX is your number)
''qdel jobname'' = stops the running program jobname 

''cd folder'' = change directory
''rm filename'' = remove/delete filename
''ls -l'' = list with all info
''chmod XXX filename'' = change permission settings on filename
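
The three digits given to chmod are octal numbers, one each for user, group and world. Each digit is the sum of read = 4, write = 2 and execute = 1, so 7 = rwx and 5 = r-x. A small sketch that spells this out; the sub name {{{ModeString}}} is made up for this example.
{{{
#!/usr/bin/perl
use strict;
use warnings;

# ModeString is a hypothetical helper: turns "755" into "rwxr-xr-x"
sub ModeString
{	my ($digits) = @_;
	my $str = "";
	foreach my $d (split(//, $digits))
	{	my $n = $d + 0;                  # force a number before the bit tests
		$str .= ($n & 4) ? "r" : "-";    # read
		$str .= ($n & 2) ? "w" : "-";    # write
		$str .= ($n & 1) ? "x" : "-";    # execute
	}
	return $str;
}

foreach my $mode ("755", "644", "700")
{	# the leading "-" is what "ls -l" shows for an ordinary file
	print "chmod $mode  ->  -", ModeString($mode), "\n";
}
}}}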

!


[[BACK to Home Work 3|WordCompareBenchmark]]
[[BACK to Lecture 6|L06]]
!!!
!Regression for Alignment Time:
~HW3 assignment - plot the CPU time for the sequential alignment of two strings.  
{{{
my $seqX = "CATDOGHORSEGHOSTDEER";
my $seqY = "BATHOGHOUSEGOATSBEAR";
}}}
<html><table><tr>
<td><img src="06/AlignTime-linear.png" style="height:300px"></td>
<td><img src="06/AlignTime-log.png" style="height:300px"></td>
</tr></table></html>
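If CPU time grows roughly exponentially with string length, the points fall on a straight line when log(time) is plotted against length, and the slope of that line can be estimated by ordinary least squares. A minimal sketch follows. The sub name {{{LineFit}}} is made up for this example, and the numbers in @time are synthetic (generated from 0.01 * 4**length), NOT measurements from Biowolf; swap in your own table.
{{{
#!/usr/bin/perl
use strict;
use warnings;

# LineFit is a hypothetical helper: least-squares slope and intercept
# for two array references of equal length.
sub LineFit
{	my ($x, $y) = @_;
	my $n = scalar(@{$x});
	my ($sx, $sy, $sxx, $sxy) = (0, 0, 0, 0);
	foreach my $i (0..$n-1)
	{	$sx  += $x->[$i];
		$sy  += $y->[$i];
		$sxx += $x->[$i] * $x->[$i];
		$sxy += $x->[$i] * $y->[$i];
	}
	my $slope = ($n*$sxy - $sx*$sy) / ($n*$sxx - $sx*$sx);
	my $icept = ($sy - $slope*$sx) / $n;
	return ($slope, $icept);
}

my @len  = (1..8);
my @time = map { 0.01 * 4**$_ } @len;          # synthetic values, see note above
my @logt = map { log($_)/log(10) } @time;      # log10 of each time
my ($slope, $icept) = LineFit(\@len, \@logt);
printf("slope = %.3f   intercept = %.3f\n", $slope, $icept);
printf("time grows about %.2f-fold per added character\n", 10**$slope);
}}}
Because the synthetic times were built with a factor of 4 per character, the fit should recover a growth factor very close to 4; with real timings the value tells you how much longer each extra character costs.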
[[BACK to Assignment 3|WordCompareBenchmark]]
!Alignment Benchmarking:
Script iteratively runs alignments for two sequences and outputs the CPU time required for each additional character position. This is the script to run on BIOWOLF for ~HW3. The code is listed here, but it is better if you directly download the file:
@@[[DIRECT DOWNLOAD 01.4-WordCompare-LOOP.pl|05/01.4-WordCompare-LOOP.pl]]@@
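
Before the full listing, here is the timing pattern on its own. The Benchmark module ships with PERL: take a Benchmark object before and after the work, then use timediff and timestr to report the difference. In this stripped-down sketch the busy-work sub {{{Work}}} is made up for the example and simply stands in for the alignment steps.
{{{
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark;      # exports timediff and timestr by default

# Work is a hypothetical stand-in for the alignment code
sub Work
{	my ($n) = @_;
	my $sum = 0;
	foreach my $i (1..$n) { $sum += $i; }
	return $sum;
}

foreach my $size (10_000, 100_000, 1_000_000)
{	my $t0  = Benchmark->new;          # clock reading before the work
	my $sum = Work($size);
	my $t1  = Benchmark->new;          # clock reading after the work
	my $td  = timediff($t1, $t0);
	print "n = $size  sum = $sum  (", timestr($td), ")\n";
}
}}}
The text from timestr has the form "N wallclock secs (x usr + y sys = z CPU)"; the CPU value at the end is the number to record for the plot.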

{{{
#!/usr/bin/perl
use strict;
use Benchmark;
$|=1;


# - - - - - H E A D E R - - - - - - - - - - - - - - - -
# 01OCT Lecture 5.
# How to compare two sequences . . . . 

# - - - - - U S E R   V A R I A B L E S - - - - - - - -
my $seqX = "CATDOGHORSE";
my $seqY = "BATHOGHOUSE";
# Alignment options: . . . . . 
	# decimal percent of identical matches
		my $MinIdentityLimit = 0.0;
	# decimal percent of how much sequence MUST be aligned
		my $OVERlap = 0.0;       
	# score values . . . . . . . . 
		my $IDmatch = 	 8;
		my $MISmatch =   4;
		my $GAPpenalty =  1;

# - - - - - G L O B A L  V A R I A B L E S  - - - - - -
my (@Gaps1,@Gaps2);
my (@Seq1, @Seq2);
my @Results;

# - - - - - - - - - - - - - - - - - - - - - - - - - - -
# - - - - - M A I N - - - - - - - - - - - - - - - - - -
# - - - - - - - - - - - - - - - - - - - - - - - - - - -
print "\n\nSTART: Philosophy is a walk on the slippery rocks . . . . . \n\n";

foreach (0..length($seqX)-1)
{	my $seq1 = substr($seqX,0,$_+1);
	my $seq2 = substr($seqY,0,$_+1);
	my $N = int((1 - $OVERlap/2)*(length($seq1) + length($seq2))) + 1;
	@Gaps1 = ();
	@Gaps2 = ();
	@Seq1 = ();
	@Seq2 = ();
	@Results = ();
	
	
	my $Time0 = new Benchmark;
	print "\n$_. - - - - - - - - - - - - - - - - - - - - - - - - - - - \n";
	
	# 1. Initialize the first chain pattern; save in @GapsX . . . . . . 
		push(@Gaps1,join('',&GapPattern($N,$seq1)));
		push(@Gaps2,join('',&GapPattern($N,$seq2)));
			
	# 2. Generate the permutation patterns . . . . .
		&Permute($N,$seq1,\@Gaps1);
		&Permute($N,$seq2,\@Gaps2);
	
	# 3. Set the sequences into the chain patterns . . . 
		@Seq1 = &SeqChain($seq1,\@Gaps1);
		@Seq2 = &SeqChain($seq2,\@Gaps2);
		
	# 4. Alignment scoring results for all gap-seqs . . .
		# Return is @Results = "score|s1|s2|", plus maxscore is last element
		@Results = &AlignScore(\@Seq1,\@Seq2);
		&TIME($Time0);
	# 5. Parse the results . . . . . . 
		my $Max = pop(@Results);
		my $count = 0;
		foreach my $align (@Results)
		{	my @x = split(/\|/,$align);
			if ($x[0] == $Max)
			{	$count += 1;
				print "------------------------------\n$count. Score = $x[0]\n";
				print "    |$x[1]|\n    |$x[2]|\n";
			}
		}
}

print "\n\n\n   DONE: Don't fall in the water.   \n\n\n";
# - - - - - - - - - - - - - - - - - - - - - - - - - - -
# - - - - - S U B R O U T I N E S - - - - - - - - - - -
# - - - - - - - - - - - - - - - - - - - - - - - - - - -
# - - - - - - - - - - - - - - - - - - - - - - - - - - -

# - - - - - - - - - - - - - - - - - - - - - - - - - -
sub GapPattern
{	# call with: &GapPattern($N,$seq1)
	# returns the first seq-gap character string
	my $n = shift(@_);
	my $s = shift(@_);
	my @c;
	foreach (1..length($s))   {	push(@c,"x"); }
	foreach (length($s)+1..$n){	push(@c,"-"); }
	return @c;
}
# - - - - - - - - - - - - - - - - - - - - - - - - - -
sub Permute
{	# called with: &Permute($N,$seq1,\@Gaps1);
	# does not return a list: every new seq-gap permutation it finds is
	# pushed onto the array referenced by $gaps (e.g. @Gaps1)
	my ($n, $m, $gaps) = @_;
	my @chain = split(//,$gaps->[0]);
	foreach my $i (1..length($m))
	{	for(my $j = length($m)-1; $j>=0; $j -= 1)
		{	foreach my $k ($j..$n-2)
			{   @chain[$k,$i+$k] = @chain[$i+$k, $k];
				my $match = 0;
				my $seq = join('',@chain);
			# Check to see if seq pattern has already been found:
				foreach my $gap (@{$gaps})
				{	if ($gap eq $seq){ $match = 1; last;} }
			# Store unique chain patterns in @Gaps:
				if ($match == 0)
				{	push(@{$gaps},$seq); }  
			}
		}
	}
	my $z = $#{$gaps} + 1;
	print "There are $z permutations for \"$m\" using \"@{$gaps}[0]\"\n";	
}
# - - - - - - - - - - - - - - - - - - - - - - - - - -
sub SeqChain
{	# call with: @Seq1 = &SeqChain($seq1,\@Gaps1);
	# returns the seq-gap chains with SEQUENCE data
	my ($seq, $gaps) = @_;
	my @seqchains;
	foreach my $gap (@{$gaps})
	{	my @x = split(//, $gap);
		my @s = split(//,$seq);
		my $gapseq = "";
		foreach my $x (@x)
		{	if ($x =~ m/-/)
			{	$gapseq .= "-"; }
			else
			{	$gapseq .= shift(@s); }
		}
		push (@seqchains, $gapseq);
		# print "$gapseq\n";
	}
	return @seqchains;
}
# - - - - - - - - - - - - - - - - - - - - - - - - - -
sub AlignScore
{	# call with: @Results = &AlignScore(\@Seq1,\@Seq2);
	# returns array with all "score|gap-seq1|gap-seq2|" + $max at the end
	my ($seq1, $seq2) = @_;
	my @results;
	my $max = 0;
	my @G;
	foreach my $s1 (@{$seq1})
	{	foreach my $s2 (@{$seq2})
		{	my @c2 = split(//,$s2);    # set the @c2 seq pattern
			my @c1 = split(//,$s1);    # reset the @c1 seq pattern
			my $score = 0;
			my $distance;
		# Check for identity threshold . . . . 
			my $T = 0;
			foreach my $i (0..$#c1)
			{	if ($c1[$i] eq $c2[$i] && $c2[$i] ne "-" )
				{	$T += 1; }
			}
			if ($T >= $MinIdentityLimit)
			{	# Check alignment and remove overlap gaps
				my $DONE = 0;
				while (!$DONE)
				{	my $splice = 0;
					my $k = $#c1;
					foreach my $i (0..$k)
					{	if ($c1[$i] eq "-" && $c2[$i] eq "-" )
						{	splice(@c1,$i,1);
							splice(@c2,$i,1);
							$splice = 1;
						}
					}
					if ($splice == 0){ $DONE = 1; }
				}
				# Check if unique alignment . . . 
				my $str = join('',@c1)."|".join('',@c2);
				foreach my $align (@G)
				{	if ($str eq $align){ $str = "FOUND"; } }
				unless ($str eq "FOUND")
				{ 	push(@G,$str);
					my $n = $#c1 + $#c2 + 2;
					#. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
					# SCORING SCORING SCORING SCORING SCORING . . . . . . . . .
					foreach my $i (0..$#c1)
					{	# $distance = abs($i-$n/2); # distance from middle
						# Amino Acid matching . . . . 
						if ($c1[$i] =~ /\w/ && $c2[$i] =~ /\w/ ) 
						{	if ($c1[$i] eq $c2[$i])
							{	$score += $IDmatch; }
							else
							{	$score -= $MISmatch; }
						}
						# Gap penalty . . . . . 
						elsif ($c1[$i] eq "-" || $c2[$i] eq "-" )				
						{	$score -= $GAPpenalty; }
					}
					# SCORING SCORING SCORING SCORING SCORING . . . . . . . . .
					#. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
					if ($score > $max) { $max = $score; }
					$str = $score."|".$str."|";
					push(@results,$str);
					
				} # unless $str is FOUND
			} # end if $T >= $MinIdentityLimit
		}# end foreach @Seq2
	} # end foreach @Seq1
	push(@results, $max);
	return @results;
}
# - - - - - - - - - - - - - - - - - - - - - - - - - -
sub TIME
{   my $t0 = $_[0];
	my $t1 = new Benchmark;
    my $td = timediff($t1, $t0);
    print "\n(Time for code execution :",timestr($td),")\n";
}
# - - - - - - - - - - - - - - - - - - - - - - - - - -
# - - - - - - - - - - - - - - - - - - - - - - - - - -
# - - - - - EOF - - - - - - - - - - - - - - - - - - -
# - - - - - - - - - - - - - - - - - - - - - - - - - -
# - - - - - - - - - - - - - - - - - - - - - - - - - -

}}}
Background: #fff
Foreground: #000
PrimaryPale: #ccccff
PrimaryLight: #ccccff
PrimaryMid: #333366
PrimaryDark: #014
SecondaryPale: #bbbbff
SecondaryLight: #fe8
SecondaryMid: #db4
SecondaryDark: #333366
TertiaryPale: #eee
TertiaryLight: #ccc
TertiaryMid: #999
TertiaryDark: #9999cc
Error: #f88
!INPUT BREAK CHARACTER

The input stream/file can be split into units based on the ''input break character'', which is set with the special variable ''$/'' (in the example below, "string" stands for the character or group of characters that will ''divide'' the input). PERL's own documentation calls this variable the input record separator:
{{{
$/="string";
}}}
By default, the input break character is set to the new line character "\n". 
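
As a hedged sketch of how this can be used, the example below sets {{{$/}}} to ">" so that each read returns one whole FASTA record instead of one line. The sample data and variable names are invented for illustration, and an in-memory file handle stands in for a real file.
{{{
use strict;
use warnings;

# Made-up FASTA text; a real script would open a file instead.
my $fasta = ">gene1\nACGT\nGGCC\n>gene2\nTTAA\n";
open(my $in, "<", \$fasta) or die "Cannot open in-memory data: $!";

my @records;
{
	local $/ = ">";             # break the input at each ">" instead of "\n"
	while (my $rec = <$in>)
	{	chomp($rec);            # chomp removes a trailing ">" (the current $/)
		next if $rec eq "";     # the first read is empty (nothing precedes the first ">")
		push(@records, $rec);
	}
}
close($in);

foreach my $rec (@records)
{	my ($name, @lines) = split(/\n/, $rec);   # first line is the name
	my $seq = join("", @lines);               # remaining lines are sequence
	print "$name => $seq\n";
}
}}}
Using ''local'' inside a bare block restores the default "\n" as soon as the block ends, so later reads in the same script behave normally.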

[[BACK|Commands]]
!
Background: #fff
Foreground: #000
PrimaryPale: #ffccff
PrimaryLight: #ffccff
PrimaryMid: #ff0066
PrimaryDark: #014
SecondaryPale: #ffcccc
SecondaryLight: #fe8
SecondaryMid: #db4
SecondaryDark: #ff0066
TertiaryPale: #eee
TertiaryLight: #ccc
TertiaryMid: #999
TertiaryDark: #ff99cc
Error: #f88
!Concatenation operator "."

The "." (period) is the concatenation operator: it joins two strings end to end. Given $a and $b below, when they are concatenated, the value of $newvar is "ProudMary".
{{{
my $a = "Proud";
my $b = "Mary";
my $newvar = $a . $b;
}}}

!!!Questions
{{{
1st: In the FASTAtranslate output process, there was this code:
          my $outfile = "Protein-".$infile;
So what is the use of the $infile here? 
Does the "." between the "Protein" and $infile 
means it is the protein seq of the $infile?
}}}
//Here, $infile is equal to the name of the input fasta file, let's say "Test.txt". The value of $outfile is now "Protein-Test.txt". All this line does is concatenate "Protein-" to the existing value of $infile. It is not a form of object-oriented variable referencing.//

{{{
2nd: In the FASTA readfile subroutine, there was this code:
               foreach my $i (1..$#Lines)
		{	$seq .= $Lines[$i];}
Dose this command join all the Lines together and 
put them into one elements of the $seq? 
So whenever I wanna do this, just 
use $a .=$b[i], this kind of command?
}}}
//Yes, the current string in $Lines[$i] is concatenated (merged) to the end of $seq. Each pass through this loop adds another line of sequence data as $i is incremented from 1 to the last line number in @Lines.//

A common use of the concatenation operator is to "build" message statements during a complex program. Instead of always inserting "print" statements into the code, you can just merge what you intended to print into one string: {{{ $mssg .= "Line 74: A= $a, B=$b, C=$c\n"; }}}. 
And then later you could add: {{{ $mssg .= "Line 127: X= $x, Y=$y, Z=$z\n"; }}}
At the end of the program you would just dump $mssg to the screen: {{{ print $mssg;}}}
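
A minimal sketch of this message-building pattern is shown here. The variable names and values are hypothetical; the point is only that ".=" appends to the end of the existing string, and nothing reaches the screen until the final print.
{{{
use strict;
use warnings;

my $mssg = "";                          # start with an empty message log
my ($a1, $b1) = (3, 4);                 # hypothetical values to report
$mssg .= "Step 1: A=$a1, B=$b1\n";      # append, but do not print yet

my $sum = $a1 + $b1;
$mssg .= "Step 2: SUM=" . $sum . "\n";  # "." and ".=" can be mixed freely

print $mssg;                            # dump everything at the end
}}}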

[[BACK|Commands]]
!
/***An adaptation of [[easyFormat]]***/
//{{{
config.commands.Color = new TWkd.Ease('Color','change the color of selected text');

config.commands.Color.addMode({
 name:'Red',
 tooltip:'turns selection red',
 operation:function(){
config.commands.Color.putInPlace("{{red{"+TWkd.context.selection.content+"}}}",TWkd.context.selection);
 }
});
config.commands.Color.addMode({
 name:'Blue',
 tooltip:'turns selection blue',
 operation:function(){
config.commands.Color.putInPlace("{{blue{"+TWkd.context.selection.content+"}}}",TWkd.context.selection);
 }
});
config.commands.Color.addMode({
 name:'Green',
 tooltip:'turns selection green',
 operation:function(){
config.commands.Color.putInPlace("{{green{"+TWkd.context.selection.content+"}}}",TWkd.context.selection);
 }
});
config.commands.Color.addMode({
 name:'Gold',
 tooltip:'turns selection gold',
 operation:function(){
config.commands.Color.putInPlace("{{gold{"+TWkd.context.selection.content+"}}}",TWkd.context.selection);
 }
});
config.commands.Color.addMode({
 name:'Gray',
 tooltip:'turns selection gray',
 operation:function(){
config.commands.Color.putInPlace("{{gray{"+TWkd.context.selection.content+"}}}",TWkd.context.selection);
 }
});
config.commands.Color.addMode({
 name:'Magenta',
 tooltip:'turns selection magenta',
 operation:function(){
config.commands.Color.putInPlace("{{magenta{"+TWkd.context.selection.content+"}}}",TWkd.context.selection);
 }
});
config.commands.Color.addMode({
 name:'Purple',
 tooltip:'turns selection purple',
 operation:function(){
config.commands.Color.putInPlace("{{purple{"+TWkd.context.selection.content+"}}}",TWkd.context.selection);
 }
});
config.commands.Color.addMode({
 name:'Teal',
 tooltip:'turns selection teal',
 operation:function(){
config.commands.Color.putInPlace("{{teal{"+TWkd.context.selection.content+"}}}",TWkd.context.selection);
 }
});
config.commands.Color.addMode({
 name:'Burgundy',
 tooltip:'turns selection burgundy',
 operation:function(){
config.commands.Color.putInPlace("{{burgundy{"+TWkd.context.selection.content+"}}}",TWkd.context.selection);
 }
});
//}}}
/***
|!''Name:''|!easyFormat|
|''Description:''|the format command format selection according to your choice|
|''Version:''|0.1.0|
|''Date:''|13/01/2007|
|''Source:''|[[TWkd|http://yann.perrin.googlepages.com/twkd.html#easyFormat]]|
|''Author:''|[[Yann Perrin|YannPerrin]]|
|''License:''|[[BSD open source license]]|
|''~CoreVersion:''|2.x|
|''Browser:''|Firefox 1.0.4+; Firefox 1.5; InternetExplorer 6.0|
|''Requires:''|@@color:red;''E.A.S.E''@@|
***/
//{{{
config.commands.Format = new TWkd.Ease('Format','format selection accordingly to chosen mode');

config.commands.Format.addMode({
 name:'Bold',
 tooltip:'turns selection into bold text',
 operation:function(){
config.commands.Format.putInPlace("''"+TWkd.context.selection.content+"''",TWkd.context.selection);
 }
});
config.commands.Format.addMode({
 name:'Italic',
 tooltip:'turns selection into italic text',
 operation:function(){
config.commands.Format.putInPlace("//"+TWkd.context.selection.content+"//",TWkd.context.selection);
 }
});
config.commands.Format.addMode({
 name:'Underline',
 tooltip:'underlines selected text',
 operation:function(){
config.commands.Format.putInPlace("__"+TWkd.context.selection.content+"__",TWkd.context.selection);
 }
});
config.commands.Format.addMode({
 name:'Highlight',
 tooltip:'highlight selection',
 operation:function(){
config.commands.Format.putInPlace("@@"+TWkd.context.selection.content+"@@",TWkd.context.selection);
 }
});
config.commands.Format.addMode({
 name:'Hyperlink',
 tooltip:'turns selection into a link using double brackets',
 operation:function(){
config.commands.Format.putInPlace("[["+TWkd.context.selection.content+"]]",TWkd.context.selection);
 }
});
//}}}
/***An adaptation of [[easyFormat]]***/
//{{{
config.commands.Greek = new TWkd.Ease('Greek','formatting for Greek text');

config.commands.Greek.addMode({
 name:'Greek',
 tooltip:'formats Greek text correctly',
 operation:function(){
config.commands.Greek.putInPlace("{{greek{"+TWkd.context.selection.content+"}}}",TWkd.context.selection);
 }
});
config.commands.Greek.addMode({
 name:'GkIndent1x',
 tooltip:'formats Gk and indents text 1x',
 operation:function(){
config.commands.Greek.putInPlace("{{gkindent{"+TWkd.context.selection.content+"}}}",TWkd.context.selection);
 }
});
config.commands.Greek.addMode({
 name:'GkIndent2x',
 tooltip:'formats Gk and indents text 2x',
 operation:function(){
config.commands.Greek.putInPlace("{{gkindent{{{gkindent{"+TWkd.context.selection.content+"}}}}}}",TWkd.context.selection);
 }
});
config.commands.Greek.addMode({
 name:'GkIndent3x',
 tooltip:'formats Gk and indents text 3x',
 operation:function(){
config.commands.Greek.putInPlace("{{gkindent{{{gkindent{{{gkindent{"+TWkd.context.selection.content+"}}}}}}}}}",TWkd.context.selection);
 }
});
config.commands.Greek.addMode({
 name:'GkIndent4x',
 tooltip:'formats Gk and indents text 4x',
 operation:function(){
config.commands.Greek.putInPlace("{{gkindent{{{gkindent{{{gkindent{{{gkindent{"+TWkd.context.selection.content+"}}}}}}}}}}}}",TWkd.context.selection);
 }
});
config.commands.Greek.addMode({
 name:'GkIndent5x',
 tooltip:'formats Gk and indents text 5x',
 operation:function(){
config.commands.Greek.putInPlace("{{gkindent{{{gkindent{{{gkindent{{{gkindent{{{gkindent{"+TWkd.context.selection.content+"}}}}}}}}}}}}}}}",TWkd.context.selection);
 }
});
//}}}
/***An adaptation of [[easyFormat]]***/
//{{{
config.commands.Hebrew = new TWkd.Ease('Hebrew','formatting for Hebrew text');

config.commands.Hebrew.addMode({
 name:'HebrewNoAlign',
 tooltip:'formats Hebrew text correctly',
 operation:function(){
config.commands.Hebrew.putInPlace("{{hebrewNoAlign{"+TWkd.context.selection.content+"}}}",TWkd.context.selection);
 }
});
config.commands.Hebrew.addMode({
 name:'HebrewRightAlign',
 tooltip:'formats Hebrew text correctly and aligns text to the right',
 operation:function(){
config.commands.Hebrew.putInPlace("{{hebrewRightAlign{"+TWkd.context.selection.content+"}}}",TWkd.context.selection);
 }
});
config.commands.Hebrew.addMode({
 name:'HebIndent1x',
 tooltip:'formats Heb and indents text 1x',
 operation:function(){
config.commands.Hebrew.putInPlace("{{hebAlignAndIndent{"+TWkd.context.selection.content+"}}}",TWkd.context.selection);
 }
});
config.commands.Hebrew.addMode({
 name:'HebIndent2x',
 tooltip:'formats Heb and indents text 2x',
 operation:function(){
config.commands.Hebrew.putInPlace("{{hebAlignAndIndent{{{hebAlignAndIndent{"+TWkd.context.selection.content+"}}}}}}",TWkd.context.selection);
 }
});
config.commands.Hebrew.addMode({
 name:'HebIndent3x',
 tooltip:'formats Heb and indents text 3x',
 operation:function(){
config.commands.Hebrew.putInPlace("{{hebAlignAndIndent{{{hebAlignAndIndent{{{hebAlignAndIndent{"+TWkd.context.selection.content+"}}}}}}}}}",TWkd.context.selection);
 }
});
config.commands.Hebrew.addMode({
 name:'HebIndent4x',
 tooltip:'formats Heb and indents text 4x',
 operation:function(){
config.commands.Hebrew.putInPlace("{{hebAlignAndIndent{{{hebAlignAndIndent{{{hebAlignAndIndent{{{hebAlignAndIndent{"+TWkd.context.selection.content+"}}}}}}}}}}}}",TWkd.context.selection);
 }
});
config.commands.Hebrew.addMode({
 name:'HebIndent5x',
 tooltip:'formats Heb and indents text 5x',
 operation:function(){
config.commands.Hebrew.putInPlace("{{hebAlignAndIndent{{{hebAlignAndIndent{{{hebAlignAndIndent{{{hebAlignAndIndent{{{hebAlignAndIndent{"+TWkd.context.selection.content+"}}}}}}}}}}}}}}}",TWkd.context.selection);
 }
});
//}}}
/***
This is an adaptation of:
|!''Name:''|!easyFormat|
|''Description:''|the format command format selection according to your choice|
|''Version:''|0.1.0|
|''Date:''|13/01/2007|
|''Source:''|[[TWkd|http://yann.perrin.googlepages.com/twkd.html#easyFormat]]|
|''Author:''|[[Yann Perrin|YannPerrin]]|
|''License:''|[[BSD open source license]]|
|''~CoreVersion:''|2.x|
|''Browser:''|Firefox 1.0.4+; Firefox 1.5; InternetExplorer 6.0|
|''Requires:''|@@color:red;''E.A.S.E''@@|
***/
//{{{
config.commands.Highlighting = new TWkd.Ease('Highlighting','highlight selected text with a color chosen from the list');
config.commands.Highlighting.addMode({
 name:'Red',
 tooltip:'highlights selection red',
 operation:function(){
config.commands.Highlighting.putInPlace("@@bgcolor(#ff6666):"+TWkd.context.selection.content+"@@",TWkd.context.selection);
 }
});
config.commands.Highlighting.addMode({
 name:'Blue',
 tooltip:'highlights selection blue',
 operation:function(){
config.commands.Highlighting.putInPlace("@@bgcolor(#ccccff):"+TWkd.context.selection.content+"@@",TWkd.context.selection);
 }
});
config.commands.Highlighting.addMode({
 name:'Yellow',
 tooltip:'highlights selection yellow',
 operation:function(){
config.commands.Highlighting.putInPlace("@@"+TWkd.context.selection.content+"@@",TWkd.context.selection);
 }
});
config.commands.Highlighting.addMode({
 name:'Green',
 tooltip:'highlights selection green',
 operation:function(){
config.commands.Highlighting.putInPlace("@@bgcolor(#99ff99):"+TWkd.context.selection.content+"@@",TWkd.context.selection);
 }
});
config.commands.Highlighting.addMode({
 name:'Brown',
 tooltip:'highlights selection brown',
 operation:function(){
config.commands.Highlighting.putInPlace("@@bgcolor(#cc9966):"+TWkd.context.selection.content+"@@",TWkd.context.selection);
 }
});
config.commands.Highlighting.addMode({
 name:'Grey',
 tooltip:'highlights selection grey',
 operation:function(){
config.commands.Highlighting.putInPlace("@@bgcolor(#cccc99):"+TWkd.context.selection.content+"@@",TWkd.context.selection);
 }
});
config.commands.Highlighting.addMode({
 name:'Orange',
 tooltip:'highlights selection orange',
 operation:function(){
config.commands.Highlighting.putInPlace("@@bgcolor(#ff9933):"+TWkd.context.selection.content+"@@",TWkd.context.selection);
 }
});
//}}}
/***An adaptation of [[easyFormat]]***/
//{{{
config.commands.Indent = new TWkd.Ease('Indent','indents selected text as a blockquote');

config.commands.Indent.addMode({
 name:'Indent1x',
 tooltip:'indents text 1x',
 operation:function(){
config.commands.Indent.putInPlace("{{engindent{"+TWkd.context.selection.content+"}}}",TWkd.context.selection);
 }
});
config.commands.Indent.addMode({
 name:'Indent2x',
 tooltip:'indents text 2x',
 operation:function(){
config.commands.Indent.putInPlace("{{engindent{{{engindent{"+TWkd.context.selection.content+"}}}}}}",TWkd.context.selection);
 }
});
config.commands.Indent.addMode({
 name:'Indent3x',
 tooltip:'indents text 3x',
 operation:function(){
config.commands.Indent.putInPlace("{{engindent{{{engindent{{{engindent{"+TWkd.context.selection.content+"}}}}}}}}}",TWkd.context.selection);
 }
});
config.commands.Indent.addMode({
 name:'Indent4x',
 tooltip:'indents text 4x',
 operation:function(){
config.commands.Indent.putInPlace("{{engindent{{{engindent{{{engindent{{{engindent{"+TWkd.context.selection.content+"}}}}}}}}}}}}",TWkd.context.selection);
 }
});
config.commands.Indent.addMode({
 name:'Indent5x',
 tooltip:'indents text 5x',
 operation:function(){
config.commands.Indent.putInPlace("{{engindent{{{engindent{{{engindent{{{engindent{{{engindent{"+TWkd.context.selection.content+"}}}}}}}}}}}}}}}",TWkd.context.selection);
 }
});
//}}}
/***An adaptation of [[easyFormat]]***/
//{{{
config.commands.Notes = new TWkd.Ease('Notes','add notes and popups');

config.commands.Notes.addMode({
 name:'Syntax',
 tooltip:'adds syntax note',
 operation:function(){
config.commands.Notes.putInPlace("((syntax(add note here)))",TWkd.context.selection);
 }
});
config.commands.Notes.addMode({
 name:'Translation',
 tooltip:'adds translation note',
 operation:function(){
config.commands.Notes.putInPlace("&#149; ((translation(add note here)))",TWkd.context.selection);
 }
});
config.commands.Notes.addMode({
 name:'Text',
 tooltip:'adds textual note',
 operation:function(){
config.commands.Notes.putInPlace("&#149; ((text(add note here)))",TWkd.context.selection);
 }
});
config.commands.Notes.addMode({
 name:'Gramm.',
 tooltip:'adds grammatical note',
 operation:function(){
config.commands.Notes.putInPlace("&#149; ((gram(add note here)))",TWkd.context.selection);
 }
});
config.commands.Notes.addMode({
 name:'Popup',
 tooltip:'adds popup note to selected text',
 operation:function(){
config.commands.Notes.putInPlace("(("+TWkd.context.selection.content+"(add note here)))",TWkd.context.selection);
 }
});
//}}}
/***An adaptation of [[easyFormat]]***/
//{{{
config.commands.Tableheader = new TWkd.Ease('Tableheader','add the header row for a formatted table');

config.commands.Tableheader.addMode({
 name:'Invisible',
 tooltip:'adds the header row for a 3-column invisible table',
 operation:function(){
config.commands.Tableheader.putInPlace("XXXXX",TWkd.context.selection);
 }
});

config.commands.Tableheader.addMode({
 name:'Sortable',
 tooltip:'adds the header row for a 3-column sortable table',
 operation:function(){
config.commands.Tableheader.putInPlace("|sortable|k||||h",TWkd.context.selection);
 }
});

config.commands.Tableheader.addMode({
 name:'Standard',
 tooltip:'adds the header row for a 3-column standard table',
 operation:function(){
config.commands.Tableheader.putInPlace("|!|!|!|",TWkd.context.selection);
 }
});

//}}}
/***An adaptation of [[easyFormat]]***/
//{{{
config.commands.Tables = new TWkd.Ease('Tables','add preformatted empty tables');

config.commands.Tables.addMode({
 name:'Invisible',
 tooltip:'adds borderless table',
 operation:function(){
config.commands.Tables.putInPlace("{{invisiblecomm{\n|!|!|!|\n||||\n||||\n||||\n}}}",TWkd.context.selection);
 }
});
config.commands.Tables.addMode({
 name:'Sortable',
 tooltip:'adds a sortable table',
 operation:function(){
config.commands.Tables.putInPlace("|sortable|k\n||||h\n||||\n||||\n||||",TWkd.context.selection);
 }
});
config.commands.Tables.addMode({
 name:'Standard',
 tooltip:'adds a standard table',
 operation:function(){
config.commands.Tables.putInPlace("|!|!|!|\n||||\n||||\n||||",TWkd.context.selection);
 }
});
config.commands.Tables.addMode({
 name:'light gray cell',
 tooltip:'inserts a light gray color code into a table cell',
 operation:function(){
config.commands.Tables.putInPlace("bgcolor(#eeeeee):",TWkd.context.selection);
 }
});
config.commands.Tables.addMode({
 name:'dark gray cell',
 tooltip:'inserts a dark gray color code into a table cell',
 operation:function(){
config.commands.Tables.putInPlace("bgcolor(#cccccc):",TWkd.context.selection);
 }
});
//}}}
!!foreach
Loop control for repeatedly executing a code block with an incremented control variable.

Full form: declares 3 parts, 1) start value, $i = 0,  2) test condition, $i < X, and 3) iteration action, $i += 1. 
Literal translation: start with $i equal to zero and test whether $i is less than X. If this is true, execute the code block, then add 1 to the value of $i and test again, repeating the code block with each new value of $i. Once $i >= X, skip the code block and continue with the next lines in the program.  
{{{
for (my $i = 0; $i < X; $i += 1)
{ . . . . . code . . . . . }
}}}

Shorthand for numeric (a range includes both of its end points, so it must stop at X-1 to match the test $i < X):
{{{
foreach my $i (0..X-1)
{ . . . . code . . . . }
}}}

Shorthand for non-numeric
{{{
foreach my $value (@ValueArray)
{ . . . . code . . . . }
}}}
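
The sketch below runs the three loop forms over the same small array to show that they visit the same elements. The array contents and variable names are invented for illustration.
{{{
use strict;
use warnings;

my @codons = qw(ATG GGC TAA);   # made-up example data
my (@full, @short, @direct);

# Full form: $i runs from 0 up to, but not including, the array size
for (my $i = 0; $i < scalar(@codons); $i += 1)
{	push(@full, $codons[$i]); }

# Numeric shorthand: the range 0..$#codons includes both end points,
# and $#codons is the index of the last element (size minus 1)
foreach my $i (0..$#codons)
{	push(@short, $codons[$i]); }

# Non-numeric shorthand: no index at all, just each value in turn
foreach my $codon (@codons)
{	push(@direct, $codon); }

print "@full\n@short\n@direct\n";
}}}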

[[BACK|Commands]]
!
Background: #fff
Foreground: #000
PrimaryPale: #eeeeee
PrimaryLight: #eeeeee
PrimaryMid: #666666
PrimaryDark: #014
SecondaryPale: #cccccc
SecondaryLight: #fe8
SecondaryMid: #db4
SecondaryDark: #666666
TertiaryPale: #eee
TertiaryLight: #ccc
TertiaryMid: #999
TertiaryDark: #bbbbbb
Error: #f88
Background: #fff
Foreground: #000
PrimaryPale: #ddeeaa
PrimaryLight: #ddeeaa
PrimaryMid: #666633
PrimaryDark: #014
SecondaryPale: #bbdd88
SecondaryLight: #fe8
SecondaryMid: #db4
SecondaryDark: #666633
TertiaryPale: #eee
TertiaryLight: #ccc
TertiaryMid: #999
TertiaryDark: #aacc88
Error: #f88
!TABLE HASH

An array is a linear vector of data where elements are accessed by the index number of their position in the array. 
{{{
my @inverts = qw | Cnidarians Ophiuroids Kinorhynchs |;
}}}
So $inverts[2] equals "Kinorhynchs". Indexed arrays are quick and easy to execute, but can be awkward at times to keep track of indexed positions of specific values.

In bioinformatics, you will often have large arrays of different values that you will want to process in parallel. A good example of this is the pair of @NAME and @SEQS arrays in the FASTAreader script. If you wanted to sort the gene NAMES, you would also have to reorder the SEQS array in the same way in order to keep the element numbers equivalent.

PERL uses ''associative arrays'' or ''data tables'' or ''hash arrays'' to keep track of data like this as one unit. Instead of using a number in [] to index an array value, a key (usually a text string) in {} is used to do the same job. If I define:
{{{
my %DataTable;
$DataTable{"Gene127456"} = "acgggtcgagatcgcgcgtatatgaga";
}}}
I can retrieve that sequence at anytime just by specifying {{{$DataTable{"Gene127456"}}}}

In the FASTAtranslate script, we have used a data table or hash to store the codon information for each amino acid:
<html><img src="03/codonhash.png" style="height:50px"></html>
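
As a small sketch of why a hash removes the parallel-array sorting problem, the example below stores invented gene names and sequences in one hash and prints them in sorted name order. The data and the {{{@sorted}}} helper array are assumptions made only for this illustration.
{{{
use strict;
use warnings;

# Gene names and sequences here are invented for illustration.
my %DataTable = (
	"Gene300" => "ttgacc",
	"Gene127" => "acgggt",
	"Gene205" => "ggatcc",
);

# Sorting the keys never separates a name from its sequence,
# because each sequence is looked up through its own name.
my @sorted;
foreach my $name (sort keys %DataTable)
{	my $seq = $DataTable{$name};
	push(@sorted, "$name=$seq");
	print "$name\t$seq\n";
}
}}}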

!!
[[BACK|L03]]
!



!
!IF

Logical comparison. If the comparison is TRUE, then the code block is executed. If it is FALSE, the code block is skipped and execution continues with the next ''elsif'' or ''else'' block if there is one, otherwise with the code that follows.
{{{
if ($i > 1000)
{ . . .  code . . . . }

or

if ($i > 1000)
{ . . .  code1 . . . . }
else
{ . . .  code2 . . . . }

or 

if ($i > 10000)
{ . . .  code1 . . . . }
elsif ($i > 1000)
{ . . .  code2 . . . . }
else
{ . . . code3 . . . . }

}}}
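
When several tests are chained with ''elsif'', PERL checks them from top to bottom and runs only the first block whose test is TRUE, so the most restrictive test has to come first. The sketch below illustrates this with a hypothetical helper, {{{SizeClass}}}, that is not part of any course script.
{{{
use strict;
use warnings;

# Hypothetical helper: classify a sequence length into a size label.
sub SizeClass
{	my $len = shift(@_);
	my $label;
	if ($len > 10000)       # most restrictive test first
	{	$label = "large"; }
	elsif ($len > 1000)     # only reached when $len <= 10000
	{	$label = "medium"; }
	else                    # everything that failed both tests
	{	$label = "small"; }
	return $label;
}

foreach my $len (500, 5000, 50000)
{	print "$len is ", SizeClass($len), "\n"; }
}}}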

[[BACK|Commands]]
!
!LOGICAL OPERATORS

These are the operators used to test logical expressions:
<html><table>
<tr><td> Symbol </td><td> Operation </td></tr>
<tr><td> == </td><td> numeric equality </td></tr>
<tr><td> > </td><td> numeric greater than </td></tr>
<tr><td> < </td><td> numeric less than </td></tr>
<tr><td> <= </td><td> numeric less than or equal to </td></tr>
<tr><td> >= </td><td> numeric greater than or equal to </td></tr>
<tr><td> != </td><td> numeric inequality (does not equal) </td></tr>
<tr><td> eq </td><td> alphanumeric equality (use with strings) </td></tr>
<tr><td> ne </td><td> alphanumeric inequality (use with strings) </td></tr>
</table></html>
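
The difference between the numeric and the string operators matters whenever two values look different as text but are equal as numbers. A minimal sketch, using made-up values:
{{{
use strict;
use warnings;

my ($x, $y) = ("10", "10.0");   # two strings that hold the same number

my $numeric_equal = ($x == $y) ? 1 : 0;   # compared as numbers: 10 equals 10
my $string_equal  = ($x eq $y) ? 1 : 0;   # compared as text: "10" differs from "10.0"
my $string_differ = ($x ne $y) ? 1 : 0;   # "ne" is the string form of "!="

print "== gives $numeric_equal, eq gives $string_equal, ne gives $string_differ\n";
}}}
For sequence data such as "ACGT", always use the string operators; the numeric operators would treat such strings as the number 0 (and produce a warning).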

[[BACK|Commands]]
!
!OPEN file I/O handle

The ''open'' statement initializes an input/output buffer stream. It is used primarily to read files and to write files.

READING: //note the direction of the @@<@@ operator//
{{{
open(HandleName,"<filename");
my @FILE = <HandleName>;
}}}

WRITING: //note the direction of the @@>@@ operator//
{{{
open(HandleName,">filename");   # creates new file, deleting contents of any existing file
print HandleName "This text is now being saved to file . . . .\n";

OR

open(HandleName,">>filename");  # appends text to end of existing file
print HandleName "This text is now being added to file . . . .\n";
}}}
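
The same three modes can also be written with a separate mode argument, a variable as the file handle, and a check that the open actually worked. This is a hedged sketch rather than the form used in the course scripts; the scratch file comes from the standard File::Temp module so that no real data is touched.
{{{
use strict;
use warnings;
use File::Temp qw(tempfile);

# A scratch file so the example does not touch real data.
my ($tmp, $filename) = tempfile(UNLINK => 1);
close($tmp);

# WRITE: ">" creates the file, or empties it if it already exists
open(my $out, ">", $filename) or die "Cannot write $filename: $!";
print $out "first line\n";
close($out);

# APPEND: ">>" adds to the end of the existing file
open($out, ">>", $filename) or die "Cannot append to $filename: $!";
print $out "second line\n";
close($out);

# READ: "<" opens for input; in list context <$in> returns every line
open(my $in, "<", $filename) or die "Cannot read $filename: $!";
my @FILE = <$in>;
close($in);

print "Read ", scalar(@FILE), " lines:\n", @FILE;
}}}
The "or die" part stops the script with the system error message (held in $!) if the file cannot be opened, instead of silently reading or writing nothing.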

[[BACK|Commands]]
!
!REGEX Pattern Matching
[[BACK to Commands|Commands]]
!!!
The power of PERL lies in its "regular expressions", or @@''regex''@@ pattern-matching functions. Here's a look at just the basic kinds of things regex processing can do:

# ''Simple matching:''
## @@{{{$variable =~ m/$x/;}}}@@
## logical comparison equivalent to: 
### Does //__$variable__// contain //__$x__//
### returns TRUE or FALSE
### often used as: @@{{{ if($variable =~ m/$x/) }}}@@
## the ''m'' is optional when the delimiter is "/", and is kept just for "human" clarity
## this works the same: @@{{{ $variable =~ /$x/ }}}@@
## //__$x__// can be a single character or string expression
# ''Simple switching:''
## @@{{{$variable =~ s/$x/$y/;}}}@@
## executed statement to alter the value of //__$variable__//
## if //__$x__// is found, its first occurrence will be replaced by //__$y__//
## unlike ''m'', the ''s'' is required and cannot be left out
# ''SPECIAL OPERATORS:''
## These characters are @@HOT@@, meaning that PERL first treats them as instructions and not as plain ASCII characters. In order to use these symbols as literal characters, they have to be "escaped" by prepending a back-slash "\". 
### Example: the "^" symbol means "starts with" so "^xxx" means look for "xxx" at the very beginning
### to look for "^" as a character the expression would be "\^xxx" which means look for "^xxx" anywhere
## @@''^''@@ : match at the beginning of $variable
## @@''.''@@ : wildcard, match any character
## @@''$''@@ : match at the end of $variable
## @@''('' ... ''|'' ... '')''@@ : grouping of match possibilities
### @@{{{s/(ALA|GLY|SER)/THR/; }}}@@
### find ALA or GLY or SER and replace with THR
## @@''['' ... '']''@@ : single character groupings
### @@{{{ s/[AGS]/T/;}}}@@
### find A or G or S and replace with T
# ''SPECIAL QUANTIFIERS:''
## @@''*''@@ : match 0 or more times => {{{ m/x*/; or m/(xyz)*/;}}}
## @@''+''@@ : match 1 or more times => {{{ m/x+/; or m/(xyz)+/;}}}
## @@''?''@@ : match 1 or 0 times => {{{ m/x?/; or m/(xyz)?/;}}}
## @@''{n}''@@ : match exactly ''n'' times => {{{ m/x{n}/; or m/(xyz){n}/;}}}
## @@''{n,}''@@ : match at least ''n'' times => {{{ m/x{n,}/; or m/(xyz){n,}/;}}}
## @@''{n,m}''@@ : match at least ''n'' times but no more than ''m'' times => {{{ m/x{n,m}/; or m/(xyz){n,m}/;}}}
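
A few of the patterns listed above are exercised in the sketch below. The strings are made up for illustration, and the ''g'' modifier (replace every match, not just the first) is an addition that does not appear in the list.
{{{
use strict;
use warnings;

my $protein = "ALAGLYSERALA";        # made-up three-letter residue string

# Matching: is there an ALA at the very start?
my $starts_with_ala = ($protein =~ m/^ALA/) ? 1 : 0;

# Substitution without g changes only the first match found
(my $first_only = $protein) =~ s/(ALA|GLY|SER)/THR/;

# Adding the g modifier changes every match
(my $all = $protein) =~ s/(ALA|GLY|SER)/THR/g;

# Quantifier: two or more A characters in a row
my $dna = "GGAAAATC";
my $has_A_run = ($dna =~ m/A{2,}/) ? 1 : 0;

# Escaping: "\." matches a literal period, not "any character"
my $file = "Test.txt";
my $is_txt = ($file =~ m/\.txt$/) ? 1 : 0;

print "$starts_with_ala $first_only $all $has_A_run $is_txt\n";
}}}
Copying the string first, as in {{{(my $all = $protein) =~ s/.../.../g;}}}, leaves the original value of $protein unchanged.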

[[BACK|Commands]]
!
!PRINT

The ''print'' statement directs output to the specified I/O handle. The default is the I/O handle "STDOUT", which directs output to the screen. To print to a file, use the I/O handle that you have declared for a specific file name. See @@[[open]]@@.
{{{
open(HandleName,">filename");   # creates new file, deleting old
print HandleName "This text is now being saved to file . . . .\n";
}}}
[[BACK|Commands]]
!
Background: #fff
Foreground: #000
PrimaryPale: #ddccff
PrimaryLight: #ddccff
PrimaryMid: #5500aa
PrimaryDark: #014
SecondaryPale: #ddbbff
SecondaryLight: #fe8
SecondaryMid: #db4
SecondaryDark: #5500aa
TertiaryPale: #eee
TertiaryLight: #ccc
TertiaryMid: #999
TertiaryDark: #ccaaff
Error: #f88
!QW
The "qw" function stands for "quoted words" and is a shorthand method for entering lists of strings. If you wanted to have an array named @Nucleotides with the values "A", "G", "T", "C", the formal declaration would be:
@@{{{  my @Nucleotides = ( "A", "G", "T", "C"); }}}@@
It's tiresome to keep typing all the punctuation so the ''qw'' function tells the interpreter that the following items between the delimiters "|" should each be surrounded by quotes:
@@{{{  my @Nucleotides = qw | A G T C |; }}}@@
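
A short sketch confirming that the formal list and the ''qw'' forms build the same array. The array names are chosen only for this illustration; parentheses are shown as one alternative delimiter.
{{{
use strict;
use warnings;

my @formal = ("A", "G", "T", "C");   # every string quoted and comma-separated
my @quick  = qw | A G T C |;         # same list, split on whitespace
my @parens = qw(A G T C);            # other delimiters work the same way

print "formal: @formal\nquick:  @quick\nparens: @parens\n";
print "Number of nucleotides: ", scalar(@quick), "\n";
}}}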

[[BACK|Commands]]
!
<?php
/***
! User settings
Edit these lines according to your need
***/
//{{{
$AUTHENTICATE_USER = true;	// true | false
$USERS = array(
	'UserName1'=>'Password1', 
	'UserName2'=>'Password2', 
	'UserName3'=>'Password3'); // set usernames and strong passwords
$DEBUG = false;				// true | false
$CLEAN_BACKUP = true; 		// when backing up a file, remove surplus backups
$FOLD_JS = true; 			// if javascript files were expanded during download, then fold them
error_reporting(E_ERROR | E_WARNING | E_PARSE);
//}}}
/***
!Code
No change needed under
***/
//{{{

/***
 * store.php - upload a file in this directory
 * version :1.6.1 - 2007/08/01 - BidiX@BidiX.info
 * 
 * see : 
 *	http://tiddlywiki.bidi.info/#UploadPlugin for usage
 *	http://www.php.net/manual/en/features.file-upload.php 
 *		for details on uploading files
 * usage : 
 *	POST  
 *		UploadPlugin[backupDir=<backupdir>;user=<user>;password=<password>;uploadir=<uploaddir>;[debug=1];;]
 *		userfile <file>
 *	GET
 *
 * each external javascript file included by download.php is change by a reference (src=...)
 *
 * Revision history
 * V1.6.1 - 2007/08/01
 * Enhancement: Add javascript folding
 * V1.6.0 - 2007/05/17
 * Enhancement: Add backup management
 * V1.5.2 - 2007/02/13
 * Enhancement: Add optional debug option in client parameters
 * V1.5.1 - 2007/02/01
 * Enhancement: Check value of file_uploads in php.ini. Thanks to Didier Corbière
 * V1.5.0 - 2007/01/15
 * Correct: a bug in moving uploadFile in uploadDir thanks to DaniGutiérrez for reporting
 * Refactoring
 * V 1.4.3 - 2006/10/17 
 * Test if $filename.lock exists for GroupAuthoring compatibility
 * return mtime, destfile and backupfile after the message line
 * V 1.4.2 - 2006/10/12
 *  add error_reporting(E_PARSE);
 * v 1.4.1 - 2006/03/15
 *	add chmo 0664 on the uploadedFile
 * v 1.4 - 2006/02/23
 * 	add uploaddir option :  a path for the uploaded file relative to the current directory
 *	backupdir is a relative path
 *	make recusively directories if necessary for backupDir and uploadDir
 * v 1.3 - 2006/02/17
 *	presence and value of user are checked with $USERS Array (thanks to PauloSoares)
 * v 1.2 - 2006/02/12 
  *	POST  
 *		UploadPlugin[backupDir=<backupdir>;user=<user>;password=<password>;]
 *		userfile <file>
*	if $AUTHENTICATE_USER
 *		presence and value of user and password are checked with 
 *		$USER and $PASSWORD
 * v 1.1 - 2005/12/23 
 *	POST  UploadPlugin[backupDir=<backupdir>]  userfile <file>
 * v 1.0 - 2005/12/12 
 *	POST userfile <file>
 *
 * Copyright (c) BidiX@BidiX.info 2005-2007
 ***/
//}}}

//{{{

if ($_SERVER['REQUEST_METHOD'] == 'GET') {
	/*
	 * GET Request
	 */
?>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
	<head>
		<meta http-equiv="Content-Type" content="text/html;charset=utf-8" >
		<title>BidiX.info - TiddlyWiki UploadPlugin - Store script</title>
	</head>
	<body>
		<p>
		<p>store.php V 1.6.1
		<p>BidiX@BidiX.info
		<p>&nbsp;</p>
		<p>&nbsp;</p>
		<p>&nbsp;</p>
		<p align="center">This page is designed to upload a <a href="http://www.tiddlywiki.com/">TiddlyWiki</a>.</p>
		<p align="center">for details see : <a href="http://TiddlyWiki.bidix.info/#HowToUpload">TiddlyWiki.bidix.info/#HowToUpload</a>.</p>	
	</body>
</html>
<?php
exit;
}

/*
 * POST Request
 */
	 
// Recursive mkdir
function mkdirs($dir) {
	if( is_null($dir) || $dir === "" ){
		return false;
	}
	if( is_dir($dir) || $dir === "/" ){
		return true;
	}
	if( mkdirs(dirname($dir)) ){
		return mkdir($dir);
	}
	return false;
}

function toExit() {
	global $DEBUG, $filename, $backupFilename, $options;
	if ($DEBUG) {
		echo ("\nHere is some debugging info : \n");
		echo("\$filename : $filename \n");
		echo("\$backupFilename : $backupFilename \n");
		print ("\$_FILES : \n");
		print_r($_FILES);
		print ("\$options : \n");
		print_r($options);
	}
	exit;
}

function ParseTWFileDate($s) {
	// parse date element
	preg_match ( '/^(\d\d\d\d)(\d\d)(\d\d)\.(\d\d)(\d\d)(\d\d)/', $s , $m );
	// make a date object
	$d = mktime($m[4], $m[5], $m[6], $m[2], $m[3], $m[1]);
	// get the week number
	$w = date("W",$d);

	return array(
		'year' => $m[1], 
		'mon' => $m[2], 
		'mday' => $m[3], 
		'hours' => $m[4], 
		'minutes' => $m[5], 
		'seconds' => $m[6], 
		'week' => $w);
}

function cleanFiles($dirname, $prefix) {
	$now = getdate();
	$now['week'] = date("W");

	$hours = Array();
	$mday = Array();
	$year = Array();
	
	$toDelete = Array();

	// need files recent first
	$files = Array();
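	($dir = opendir($dirname)) || die ("can't open dir '$dirname'");
	while (false !== ($file = readdir($dir))) {
		// quote the prefix so characters such as '.' or '+' in a filename match literally
		if (preg_match('/^' . preg_quote($prefix, '/') . '/', $file)) {
			array_push($files, $file);
		}
	}
	closedir($dir);
	// readdir() returns entries in no guaranteed order; the timestamp in each
	// backup name sorts lexically, so a reverse sort puts the most recent first
	rsort($files);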
	
	// decides for each file
	foreach ($files as $file) {
		$fileTime = ParseTWFileDate(substr($file,strpos($file, '.')+1,strrpos($file,'.') - strpos($file, '.') -1));
		if (($now['year'] == $fileTime['year']) &&
			($now['mon'] == $fileTime['mon']) &&
			($now['mday'] == $fileTime['mday']) &&
			($now['hours'] == $fileTime['hours']))
				continue;
		elseif (($now['year'] == $fileTime['year']) &&
			($now['mon'] == $fileTime['mon']) &&
			($now['mday'] == $fileTime['mday'])) {
				if (isset($hours[$fileTime['hours']]))
					array_push($toDelete, $file);
				else 
					$hours[$fileTime['hours']] = true;
			}
		elseif 	(($now['year'] == $fileTime['year']) &&
			($now['mon'] == $fileTime['mon'])) {
				if (isset($mday[$fileTime['mday']]))
					array_push($toDelete, $file);
				else
					$mday[$fileTime['mday']] = true;
			}
		else {
			if (isset($year[$fileTime['year']][$fileTime['mon']]))
				array_push($toDelete, $file);
			else
				$year[$fileTime['year']][$fileTime['mon']] = true;
		}
	}
	return $toDelete;
}

function replaceJSContentIn($content) {
	if (preg_match ("/(.*?)<!--DOWNLOAD-INSERT-FILE:\"(.*?)\"--><script\s+type=\"text\/javascript\">(.*)/ms", $content,$matches)) {
		$front = $matches[1];
		$js = $matches[2];
		$tail = $matches[3];
		if (preg_match ("/<\/script>(.*)/ms", $tail,$matches2)) {		
			$tail = $matches2[1];
		}
		$jsContent = "<script type=\"text/javascript\" src=\"$js\"></script>";
		$tail = replaceJSContentIn($tail);
		return($front.$jsContent.$tail);
	}
	else
		return $content;
}

// Check if file_uploads is active in php config
if (ini_get('file_uploads') != '1') {
   echo "Error : File upload is not active in php.ini\n";
   toExit();
}

// var definitions
$uploadDir = './';
$uploadDirError = false;
$backupError = false;
$optionStr = $_POST['UploadPlugin'];
$optionArr=explode(';',$optionStr);
$options = array();
$backupFilename = '';
$filename = $_FILES['userfile']['name'];
$destfile = $filename;

// get options
foreach($optionArr as $o) {
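	// split() was removed in PHP 7; explode() with a limit keeps any '=' inside the value,
	// and array_pad() supplies an empty value for an option without '='
	list($key, $value) = array_pad(explode('=', $o, 2), 2, '');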
	$options[$key] = $value;
}

// debug activated by client
if ($options['debug'] == 1) {
	$DEBUG = true;
}

// authenticate User
if (($AUTHENTICATE_USER)
	&& ((!$options['user']) || (!$options['password']) || ($USERS[$options['user']] != $options['password']))) {
	echo "Error : UserName or Password do not match \n";
	echo "UserName : [".$options['user']. "] Password : [". $options['password'] . "]\n";
	toExit();
}



// make uploadDir
if ($options['uploaddir']) {
	$uploadDir = $options['uploaddir'];
	// path control for uploadDir   
    if (!(strpos($uploadDir, "../") === false)) {
        echo "Error: directory to upload specifies a parent folder";
        toExit();
	}
	if (! is_dir($uploadDir)) {
		mkdirs($uploadDir);
	}
	if (! is_dir($uploadDir)) {
		echo "UploadDirError : $uploadDirError - File NOT uploaded !\n";
		toExit();
	}
	// curly-brace string offsets were removed in PHP 8; substr() reads the last character
	if (substr($uploadDir, -1) != '/') {
		$uploadDir = $uploadDir . '/';
	}
}
$destfile = $uploadDir . $filename;

// backup existing file
if (file_exists($destfile) && ($options['backupDir'])) {
	if (! is_dir($options['backupDir'])) {
		mkdirs($options['backupDir']);
		if (! is_dir($options['backupDir'])) {
			$backupError = "backup mkdir error";
		}
	}
	$backupFilename = $options['backupDir'].'/'.substr($filename, 0, strrpos($filename, '.'))
				.date('.Ymd.His').substr($filename,strrpos($filename,'.'));
	rename($destfile, $backupFilename) or ($backupError = "rename error");
	// remove overmuch backup
	if ($CLEAN_BACKUP) {
		$toDelete = cleanFiles($options['backupDir'], substr($filename, 0, strrpos($filename, '.')));
		foreach ($toDelete as $file) {
			$f = $options['backupDir'].'/'.$file;
			if($DEBUG) {
				echo "delete : ".$f."\n";
			}
			unlink($f);
		}
	}
}

// move uploaded file to uploadDir
if (move_uploaded_file($_FILES['userfile']['tmp_name'], $destfile)) {
	if ($FOLD_JS) {
		// rewrite the file to replace JS content
		$fileContent = file_get_contents ($destfile);
		$fileContent = replaceJSContentIn($fileContent);
		if (!$handle = fopen($destfile, 'w')) {
	         echo "Cannot open file ($destfile)";
	         exit;
	    }
	    if (fwrite($handle, $fileContent) === FALSE) {
	        echo "Cannot write to file ($destfile)";
	        exit;
	    }
	    fclose($handle);
	}
    
	chmod($destfile, 0644);
	if($DEBUG) {
		echo "Debug mode \n\n";
	}
	if (!$backupError) {
		echo "0 - File successfully loaded in " .$destfile. "\n";
	} else {
		echo "BackupError : $backupError - File successfully loaded in " .$destfile. "\n";
	}
	echo("destfile:$destfile \n");
	if (($backupFilename) && (!$backupError)) {
		echo "backupfile:$backupFilename\n";
	}
	$mtime = filemtime($destfile);
	echo("mtime:$mtime");
} 
else {
	// the upload error code is stored per uploaded field, under 'userfile'
	echo "Error : " . $_FILES['userfile']['error']." - File NOT uploaded !\n";

}
toExit();
//}}}
?>
!Compartmentalize Code

Most code that you will be dealing with is compartmentalized into subroutines. A subroutine is a named block of code that you define in one place and call from another, which lets you move code blocks out of the main flow of the script. For the FASTAread program, most of the code in the script was just opening and processing the input file. That's not going to change, so whenever we open the script to work on it, we don't need to be looking at that code. So at that position in the code logic we call a subroutine named ''ReadFasta'' and pass it the variable ''$infile''.
<html><img src="03/Task1sub.png" style="height:75px"></html>

We then define ''ReadFasta'' at the end of the script using the subroutine syntax:
{{{
sub SubroutineName
{ . . . . code . . . . }
}}}

Now the only tricky thing here is that we want ''ReadFasta'' to execute with the filename in $infile, so we have to extract that information. The values passed to a subroutine arrive in a special PERL array named ''@_''. Since we are sending the subroutine only 1 element, $_[0] (the first element of that array) has the value of $infile:
{{{
sub ReadFasta
{      my $file = $_[0];
	$/=">";
       . . . . code . . . . 
}
}}}
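
As a rough sketch of how the pieces above fit together, the script below calls a subroutine before its definition and reads the filename from {{{$_[0]}}}. The body of the subroutine is only a guess at what the elided code might do: the demo file contents, the use of {{{File::Temp}}}, the hash {{{%seqs}}} and the choice to return a hash of sequences are all assumptions made for illustration, not the course's actual FASTAread code.
{{{
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a tiny two-record FASTA file so the example can run anywhere (made-up data).
my ($tmp, $infile) = tempfile();
print $tmp ">seq1 demo\nATGC\nGGTA\n>seq2 demo\nTTAA\n";
close $tmp;

my %seqs = ReadFasta($infile);        # call the subroutine, passing $infile
foreach my $name (sort keys %seqs)
{      print "$name\t$seqs{$name}\n"; }

sub ReadFasta
{      my $file = $_[0];              # first element of @_ is the filename
       local $/ = ">";                # read one ">"-delimited record at a time
       my %records;
       open(my $fh, "<", $file) or die "Cannot open $file: $!";
       while (my $record = <$fh>)
       {      chomp $record;          # remove the trailing ">"
              next if $record eq "";  # the text before the first ">" is empty
              my ($header, @lines) = split(/\n/, $record);
              my ($id) = split(/\s+/, $header);
              $records{$id} = join("", @lines);
       }
       close $fh;
       return %records;
}
}}}
Using {{{local $/}}} rather than a plain assignment keeps the changed record separator inside the subroutine, so the rest of the script still reads files line by line.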

[[BACK|L03]]
!
!SUBSTR

''SUBSTRING'': extracts a portion of a string variable.

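command form:  {{{substr(variable, start position, number of characters)}}}

The start position is counted from 0, so the first character of the string is at position 0.

example:
{{{
my $football = "Tampa Bay Bucs";
my $mascot = substr($football,10,4);   # the "B" of "Bucs" is at position 10
print $mascot;   # $mascot now contains the string "Bucs"
}}}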
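
The same command is handy for pulling pieces out of a sequence. The lines below are a small sketch, using a made-up sequence and hypothetical variable names, of taking the first codon, the last codon (a negative start position counts back from the end of the string) and every codon in turn:
{{{
use strict;
use warnings;

my $dna = "ATGGCCTAA";                     # made-up sequence for illustration
my $first_codon = substr($dna, 0, 3);      # positions 0, 1 and 2
my $last_codon  = substr($dna, -3);        # a negative start counts from the end
my @codons;
for (my $i = 0; $i + 3 <= length($dna); $i += 3)
{      push(@codons, substr($dna, $i, 3)); }
print "$first_codon $last_codon\n";        # prints "ATG TAA"
print join("-", @codons), "\n";            # prints "ATG-GCC-TAA"
}}}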
[[BACK|Commands]]
!
<html>
<div style="color: rgb(100, 100, 150); font-family: Monaco;"><big><b>
TITLE OR HEADER OR DESCRIPTOR . . . .. 
</b></big></div>
</html>
1. [[x |1.1.1]]
2. [[y |1.1.2]]
3. [[z |1.1.3]]
!!
<html>
<div style="color: rgb(100, 100, 150); font-family: Monaco;"><big><b>
TITLE OR HEADER OR DESCRIPTOR . . . .. 
</b></big></div>
</html>
1. [[x |1.2.1]]
2. [[y |1.2.2]]
3. [[z |1.2.3]]
!!
<html>
<div style="color: rgb(100, 100, 150); font-family: Monaco;"><big><b>
TITLE OR HEADER OR DESCRIPTOR . . . .. 
</b></big></div>
</html>
1. [[x |1.3.1]]
2. [[y |1.3.2]]
3. [[z |1.3.3]]
!
{{tableindex{
|[[Lecture Index]]|[[L01]]|[[L02]]|[[L03]]|[[L04]]|[[L05]]|[[L06]]|[[L07]]|[[L08]]|[[L09]]|[[L10]]|[[L11]]|[[L12]]|
}}}
<!--{{{-->
<div class='toolbar' macro='toolbar closeTiddler closeOthers +editTiddler > fields syncing permalink references jump'></div>
<div class='topic-01' macro='tiddler topic-01SubtopicMenu'></div><div class='title' macro='view title'></div>
<div class='viewer' macro='view text wikified'></div><div class='tagClear'></div>
<!--}}}-->
{{tableindex{
|[[Resource Index]]|[[Text Book]]|[[PERL]]|[[Running PERL]]|[[Commands]]|[[Editors]]|[[Biowolf]]|[[SSH]]|
}}}
<!--{{{-->
<div class='toolbar' macro='toolbar closeTiddler closeOthers +editTiddler > fields syncing permalink references jump'></div>
<div class='topic-02' macro='tiddler topic-02SubtopicMenu'></div><div class='title' macro='view title'></div>
<div class='viewer' macro='view text wikified'></div><div class='tagClear'></div>
<!--}}}-->
{{tableindex{
|[[Subject01]]|[[Subject02]]|[[subtopic3]]|
}}}
<!--{{{-->
<div class='toolbar' macro='toolbar closeTiddler closeOthers +editTiddler > fields syncing permalink references jump'></div>
<div class='topic1' macro='tiddler topic1SubtopicMenu'></div>
<div class='title' macro='view title'></div>
<div class='viewer' macro='view text wikified'></div>
<div class='tagClear'></div>
<!--}}}-->
!Variable types

In general, simple variables are declared by prefixing the variable name with either: 
# ''$'' = single element (string or number)
# ''@'' = multiple elements numerically indexed (array)
# ''%'' = multiple elements with associative indexing  (hash)
You declare variables before you use them so the computer knows that it is going to have to use some memory space for those values when the code is running. The declarative statements are:
{{{
my $simplestring = "This sentence is stored as a single character string";
my @String = ("These", "words", "are", "stored", "individually");
my %StringHash;
       $StringHash{"first"} = "These";
       $StringHash{"second"} = "words";
       $StringHash{"third"} = "are";
       $StringHash{"fourth"} = "stored";
       $StringHash{"fifth"} = "with";
       $StringHash{"sixth"} = "separate";
       $StringHash{"seventh"} = "keys";
}}}

Note that the declarations use the special symbol to identify the type of variable collection it is going to be (element, array, hash). BUT when you want to access a single value of any type, the ''$'' character is used to tell the interpreter that just one element is going to be used in that code line. Array positions are counted from 0, so the first word is $String[0]. So . . . 
* $String[1] is equal to "words"
* $~StringHash{"fourth"} is equal to "stored"

The idea of a hash might seem odd at first, but we use them all the time to group related pieces of information:
{{{
my %GENE;
       $GENE{"name"} = "Ornithine Decarboxylase";
       $GENE{"seq"} = "MGTSDWKLVIVIHAGIR........";
       $GENE{"organism"} = "Sterechinus neumayeri";
       $GENE{"introns"} = 6;
       $GENE{"%GC"} = 0.56;
}}}
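
To see the three variable types working together, here is a minimal sketch. The array {{{@words}}} and the shortened gene hash are stand-ins invented for this example; only the gene name, organism and intron count come from the record above.
{{{
use strict;
use warnings;

my @words = ("These", "words", "are", "stored", "individually");
my %GENE = (
       "name"     => "Ornithine Decarboxylase",
       "organism" => "Sterechinus neumayeri",
       "introns"  => 6,
);

my $second = $words[1];            # a single element, so the "$" prefix
my $count  = scalar(@words);       # number of elements in the array
print "$second is word 2 of $count\n";

foreach my $key (sort keys %GENE)  # keys come back unordered, so sort them
{      print "$key = $GENE{$key}\n"; }

if (exists $GENE{"seq"})
{      print "sequence on file\n"; }
else
{      print "no sequence stored yet\n"; }
}}}
The {{{exists}}} test is a convenient way to check whether a piece of information has been stored in the hash before trying to use it.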


[[BACK|Commands]]
!
<!--{{{-->
<div class='toolbar' macro='toolbar closeTiddler closeOthers +editTiddler > fields syncing permalink references jump'></div>
<div class='webview' macro='tiddler webviewindex'></div>
<div class='title' macro='view title'></div>
<div class='viewer' macro='view text wikified'></div>
<div class='tagClear'></div>
<!--}}}-->
{{tableindex{
|[[Welcome|Welcome to the Webview TiddlyWiki]]|[[Instructions]]|[[Subtopic menu instructions]]|
}}}
!WHILE

Loop control statement like the [[for]] loop, but without a built-in counter variable.

Simple form: repeat the code block "while" the test condition is true.
{{{
my $FLAG = 1;
my $DONE = "FALSE";     # set to "TRUE" by the code below when the work is finished
while ($FLAG == 1)
{      code
       code
       code
       if ($DONE eq "TRUE")
       {    $FLAG = 0; }
}
}}}

Note that there has to be a code step inside the loop itself that modifies the test variable (here $FLAG); otherwise the test never becomes false and the loop repeats endlessly. WHILE loops need a built-in exit signal to know when they are done.
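
As one possible runnable illustration, the sketch below collects the position of every "ATG" in a made-up sequence. The variable names and the sequence are assumptions for the example; the exit signal is {{{index}}} returning -1 once no further match is found.
{{{
use strict;
use warnings;

my $dna = "CCATGAAATGTTATGC";          # made-up sequence for illustration
my @starts;
my $pos = index($dna, "ATG");          # -1 means "not found"
while ($pos != -1)
{      push(@starts, $pos);
       $pos = index($dna, "ATG", $pos + 1);   # this update is the exit signal
}
print "ATG found at: @starts\n";       # prints "ATG found at: 2 7 12"
}}}
If the line that updates {{{$pos}}} were left out, the test would stay true forever and the loop would never end.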

[[BACK|Commands]]
!
config.options.chkSaveBackups = false;
config.options.chkEnableAnimations = false;
config.options.chkShowRightSidebar= false;
config.options.chkSinglePageMode= true;
config.options.chkSinglePagePermalink= false;