SeqCount

SeqCount is a program that counts the number of short sequences in a set of genomic data. SeqCount is available in both parallel and sequential versions.
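
The counting step can be pictured as a sliding window over the input. The following is only a minimal sketch of that idea, not the SeqCount implementation: the class name KmerCountSketch and the method countKmers are hypothetical, and a hash map is used for clarity rather than speed. Sequences that never appear in the map are the absent ones.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: count every length-k window in a sequence string. */
class KmerCountSketch {

    /** Returns a map from each length-k sequence found in the input to its number of occurrences. */
    static Map<String, Integer> countKmers(String sequence, int k) {
        Map<String, Integer> counts = new HashMap<>();
        // Slide a window of width k across the sequence, one position at a time.
        for (int i = 0; i + k <= sequence.length(); i++) {
            String kmer = sequence.substring(i, i + k);
            counts.merge(kmer, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countKmers("ACGTACGA", 3);
        // ACG starts at offsets 0 and 4, so its count is 2.
        System.out.println("ACG: " + counts.get("ACG"));
        // AAA never occurs in the input, so it has no entry.
        System.out.println("AAA present: " + counts.containsKey("AAA"));
    }
}
```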

Utilities

The following utilities are available in both C and Java:
  • Char2Null.java, char2null.c: Convert ASCII character-based nullomers to the binary nullomer format.
  • CountNulls.java, countnulls.c: Count the number of nullomers in a nullomer file.
  • Difference.java, difference.c: Calculate the set difference of two nullomer sets.
  • Intersect.java, intersect.c: Calculate the set intersection of two nullomer sets.
  • Union.java, union.c: Calculate the set union of two nullomer sets.
  • ViewNulls.java, viewnulls.c: Print the nullomers contained in a nullomer file to the screen.
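
To illustrate what the conversion and set utilities do conceptually, here is a minimal sketch. It does not reproduce the actual nullomer binary format, which is not described here. It assumes a 2-bit-per-base packing (A=0, C=1, G=2, T=3) and represents each set as a java.util.BitSet. The class and method names are hypothetical.

```java
import java.util.BitSet;

/** Hypothetical sketch: nullomer sets as bit sets indexed by a 2-bit-per-base code. */
class NullomerSetSketch {

    /**
     * Packs a DNA string into an int, 2 bits per base (assumed encoding: A=0, C=1, G=2, T=3).
     * Works for sequences of up to 15 bases, which fit in a non-negative int.
     */
    static int encode(String kmer) {
        if (kmer.length() > 15) {
            throw new IllegalArgumentException("Sketch supports at most 15 bases");
        }
        int code = 0;
        for (char base : kmer.toCharArray()) {
            int bits = "ACGT".indexOf(base);
            if (bits < 0) {
                throw new IllegalArgumentException("Not a DNA base: " + base);
            }
            code = (code << 2) | bits;
        }
        return code;
    }

    /** Builds a set with one bit turned on per listed sequence. */
    static BitSet setOf(String... kmers) {
        BitSet set = new BitSet();
        for (String kmer : kmers) {
            set.set(encode(kmer));
        }
        return set;
    }

    // Each operation copies the first operand so that neither input is modified.
    static BitSet union(BitSet a, BitSet b) {
        BitSet r = (BitSet) a.clone();
        r.or(b);
        return r;
    }

    static BitSet intersect(BitSet a, BitSet b) {
        BitSet r = (BitSet) a.clone();
        r.and(b);
        return r;
    }

    static BitSet difference(BitSet a, BitSet b) {
        BitSet r = (BitSet) a.clone();
        r.andNot(b);
        return r;
    }
}
```

With this representation, counting the members of a set is a call to cardinality(), and each set operation is a single pass over the underlying words.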


  • The C source code is available here.
  • The Java source code is available here.
  • The documentation for the Java source is available here.

Data

    We have run SeqCount on several of the organisms found on the NCBI web site. The sequence count and nullomer files can be found at http://trac.boisestate.edu/dna/results/. The general format for file names is organismNULLS_x, where organism is replaced with the name of the organism and x is the nullomer length.
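
As a small illustration of the organismNULLS_x naming convention, the sketch below splits such a name into its two parts. The class NullomerFileNameSketch and the example name ecoliNULLS_11 are hypothetical. The actual organism spellings used in the results directory may differ.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for file names of the form organismNULLS_x. */
class NullomerFileNameSketch {

    // Organism name, then the literal "NULLS_", then the nullomer length in digits.
    private static final Pattern NAME = Pattern.compile("(.+)NULLS_(\\d+)");

    private static Matcher parse(String fileName) {
        Matcher m = NAME.matcher(fileName);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not an organismNULLS_x name: " + fileName);
        }
        return m;
    }

    /** Returns the organism part of the file name. */
    static String organism(String fileName) {
        return parse(fileName).group(1);
    }

    /** Returns the nullomer length encoded at the end of the file name. */
    static int nullomerLength(String fileName) {
        return Integer.parseInt(parse(fileName).group(2));
    }

    /** Builds a file name from an organism name and a nullomer length. */
    static String fileName(String organism, int length) {
        return organism + "NULLS_" + length;
    }
}
```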

    Organism-specific nullomer sets:
  • Length 11-13 DNA sequences on September 1, 2009
  • 3,066,963 absent length 5 (E. coli) amino acid sequences on October 10, 2009
  • 598,360 absent length 5 (Homo sapiens) amino acid sequences on October 10, 2009

  • Nullomer Sequences and Probabilistic Rankings
    Columns: Date; Len5 Protein; Len6 Protein; Len15 DNA; Len16 DNA; Len13 DNA Coding Region; Len14 DNA Coding Region; Len14 DNA Non-Coding Region; Len15 DNA Non-Coding Region
    June 2009: 51 Peptoprimes, Rankings; 2,506,930 Peptoprimes; 410 Nullomers
    July 2009: 47 Peptoprimes, Rankings; 2,418,323 Peptoprimes; 358 Nullomers
    August 2009: 40 Peptoprimes, Rankings; 2,400,122 Peptoprimes; 338 Nullomers; 5,715,832 Nullomers
    September 2009: 39 Peptoprimes, Rankings; 296 Nullomers; 5,419,369 Nullomers; 37 Nullomers, Rankings*; 271,890 Nullomers, Rankings*
    October 2009: 38 Peptoprimes, Rankings; 282 Nullomers; 5,257,985 Nullomers

  • 1,023 absent length 7 promoter-region DNA sequences for August 2009


    We have also run SeqCount on all of the nucleotide and protein data found on the NCBI web site (older data):
  • 60,370 absent length 15 DNA sequences
  • 4,428 absent length 15 DNA sequences on October 24, 2008

  • 746 absent length 5 amino acid sequences in 2007
  • 80 absent length 5 amino acid sequences in October 2008

Applet

    In addition, we have created an applet that gives you access to the data and allows you to apply some set operations to it.

    Note that because the nullomer sets grow exponentially with sequence length, the applet can perform poorly when asked to manipulate nullomer sets at the longer sequence lengths. The applet is provided only as a tool to whet your appetite. If you want to work with these nullomer sets in earnest, download them to your local machine and manipulate them there.
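
One way to see the growth: the number of possible sequences of a given length, which is an upper bound on the size of any nullomer set at that length, is the alphabet size raised to the sequence length. The sketch below computes that bound. It assumes a 4-letter DNA alphabet and a 20-letter amino acid alphabet, and the class name SearchSpaceSketch is hypothetical.

```java
/** Hypothetical sketch: upper bound on nullomer set size for a given alphabet and length. */
class SearchSpaceSketch {

    /** Returns alphabetSize raised to the power length, failing loudly on overflow. */
    static long possibleSequences(int alphabetSize, int length) {
        long total = 1;
        for (int i = 0; i < length; i++) {
            total = Math.multiplyExact(total, alphabetSize);
        }
        return total;
    }

    public static void main(String[] args) {
        // DNA: each extra base multiplies the number of possible sequences by 4.
        for (int k = 11; k <= 16; k++) {
            System.out.println("DNA length " + k + ": " + possibleSequences(4, k));
        }
        // Protein: each extra residue multiplies it by 20.
        for (int k = 5; k <= 6; k++) {
            System.out.println("Protein length " + k + ": " + possibleSequences(20, k));
        }
    }
}
```

Under these assumptions there are 1,073,741,824 possible length 15 DNA sequences and 4,294,967,296 of length 16, while length 5 amino acid sequences number 3,200,000.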

    The applet requires the Java Runtime Environment, which can be downloaded from here.

    We also recommend that you use the Firefox web browser to view the applet.