A Compression Program for Chemical, Biological, and
By Bradley S. Tice
Advanced Human Design, P.O. Box 3868, Turlock, California 95381 U.S.A.
This paper introduces a compression algorithm that uses radix based number systems beyond the traditional binary, or radix 2, based system in use today. A greater level of compression is noted in these radix based number systems than in the radix 2 base when applied to sequential strings of various information. The application of this compression algorithm to both random and non-random sequences is reviewed. The natural sciences and engineering applications are the areas covered in this paper.
Keywords: Compression Algorithm, Chemistry, Biology, and Nanotechnology
A binary, or radix 2 based, system is defined as two separate characters, or symbols, that have no semantic meaning apart from not representing the other character. This is the same notion Shannon gave to the binary based system upon its publication in 1948. This paper will present research that shows how various radix based number systems have a compression value greater than that of the traditional radix 2 based system in use today. The compression algorithm will be used to compress various random and non-random sequences. The work has applications in theoretical and applied natural sciences and engineering.
The earliest definition of randomness in a string of 1's and 0's was given by von Mises, but it was Martin-Löf's paper of 1966 that gave a measure of randomness based on the patternlessness of a sequence of 1's and 0's, which could be used to define a random binary sequence in a string [3 and 4]. A non-random string can be compressed, whereas a random string of characters cannot. This is the classical measure, in Kolmogorov complexity, also known as Algorithmic Information Theory, of the randomness of a sequence found in a binary string.
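The link between compressibility and randomness can be illustrated with a short sketch. It is not the program described in this paper: it assumes Python's standard `zlib` compressor as a stand-in for a general-purpose compressor, and the names `compression_ratio`, `patterned`, and `noisy` are illustrative only. A strongly patterned sequence shrinks to a small fraction of its length, while pseudo-random bytes do not shrink.

```python
import random
import zlib


def compression_ratio(data: bytes) -> float:
    """Return compressed size divided by original size (smaller means more compressible)."""
    return len(zlib.compress(data, 9)) / len(data)


# A non-random sequence: the pattern "01" repeated (4096 bytes in total).
patterned = b"01" * 2048

# A pseudo-random sequence of the same length, seeded so the run is repeatable.
rng = random.Random(0)
noisy = bytes(rng.getrandbits(8) for _ in range(4096))

print("patterned ratio:", compression_ratio(patterned))
print("noisy ratio:    ", compression_ratio(noisy))
```

The ratio for the patterned sequence is far below 1, and the ratio for the pseudo-random sequence stays at about 1. True Kolmogorov complexity is not computable, so any practical compressor gives only an upper estimate of it.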
3. Compression Program
The compression program to be used has been termed the Modified Symbolic Space Multiplier Program. It notes the first character in a line of characters in a binary sequence of a string and sub-groups the characters into common or like groups of similar characters, all 1's grouped with 1's and all 0's grouped with 0's. Each sub-group is assigned a single character notation that represents the number of characters found in that sub-group, so that the string can be reduced (compressed) and then decompressed (expanded) back to its original length and form. In previous applications of this program, an underlined 1 or 0 is usually used as the notation symbol for the placement and character type. The underlined initial character of each sub-group to be compressed will be used for this paper.
4. Application of Theory
The compression algorithm will be used for the following radix based number systems: Radix 6, Radix 8, Radix 10, Radix 12 and Radix 16. These are traditional radix base numbers from the field of computer science and have strong applications to other fields of science and engineering due to the parsimonious nature of these low digit radix base number systems. The compression algorithm in this paper can be either a ‘universal’ compression engine, in which all members of a sequence, random or non-random, can be compressed, or a ‘specific’ compression engine, which compresses only specific types of sub-groups within a random or non-random string of a sequence.
The compression algorithm will be defined by the following properties:
1.) Starting at the far left of the string, the beginning, and moving to the right, towards the end of the string.
2.) Each sub-group of common characters, including singular characters, will be grouped into common sub-groups and marked accordingly.
3.) The notation for marking each sub-group will be underlining the initial character of that common sub-group. The remaining common characters in that marked sub-group will be removed. This results in a compressed sequential string.
4.) De-compression of the compressed string is the reverse process, restoring the complete position and character count of the original pre-compressed sequential string.
5.) This will be the same process for both random and non-random sequential strings.
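The properties above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author's implementation: the underlined initial character is modelled as a (character, count) pair, on the assumption that the mark carries the size of its sub-group, and the names `compress` and `decompress` are hypothetical.

```python
from itertools import groupby


def compress(sequence: str) -> list[tuple[str, int]]:
    """Scan left to right and replace each run of like characters with one
    marked entry: (initial character, number of characters in the sub-group).
    Singular characters become sub-groups of size 1."""
    return [(symbol, len(list(run))) for symbol, run in groupby(sequence)]


def decompress(marked: list[tuple[str, int]]) -> str:
    """Reverse the process: expand each marked entry back to its full run."""
    return "".join(symbol * count for symbol, count in marked)


# Radix 2 example: five sub-groups replace twelve characters.
binary = "000111100110"
marked = compress(binary)
print(marked)
assert decompress(marked) == binary

# The same functions apply unchanged to higher radix strings, e.g. radix 16.
hexadecimal = "AAAA77F000"
assert decompress(compress(hexadecimal)) == hexadecimal
```

Because the sketch only compares adjacent characters, it does not depend on the radix of the alphabet, and the same two functions serve random and non-random strings alike.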
Chemistry is the science of the structure, the properties and the composition of matter and its changes.
A polymer is a macromolecule, or large molecule, made up of repeating structural segments usually connected by covalent chemical bonds.
A copolymer, also known as a heteropolymer, is a polymer derived from two or more monomers.
Types of Copolymers:
1.) Alternating Copolymers: Regular alternating A and B units.
2.) Periodic Copolymers: A and B units arranged in a repeating sequence.
3.) Statistical Copolymers: Random sequences.
4.) Block Copolymers: Made up of two or more homopolymer subunits joined by covalent bonds.
5.) Stereoblock Copolymer: A structure formed from a monomer.
An example of the use of a compression algorithm on copolymers is as follows: