
Algorithmic Complexity and Plant Genetics

Research Paper, 2014, 5 pages

Biology - Genetics / Genetic Engineering

Excerpt

ABSTRACT

This paper presents a compression algorithm that compresses sequential strings of plant DNA and RNA for the storage and transmission of plant genetic information. The need to compress plant genetic data is examined both theoretically and practically, with regard to large pools of plant genetics data, Big Data, and space-saving techniques for genetic code in applied plant genetics.

Introduction

This paper presents an algorithm that can compress random and non-random sequential strings and that can be applied to plant genetics. Both plant DNA and RNA can be compressed to less than the original structure's genomic length. This has a direct application to the storage and transmission of large data pools of genetic information on agriculturally important plant stock. The algorithm used for the compression and decompression of plant genetic information was discovered by the author in 1998 and, according to the author, is the most accurate and precise measure of randomness known [1].

Compression Algorithm

The compression algorithm takes the traditional left-to-right input of a segment of a sequential string of characters, in this case the individual nucleotides of a DNA or RNA sequence. The string is then sub-grouped into like-natured characters, which can be compressed into a complete compression, a universal compression, or a ‘specific’ or ‘partial’ compression of the sequential string [2].

When like-natured genetic material on the original sequential string of a plant’s genetic code is compressed, the resulting code is reduced in length without losing either the type or the placement of the genetic information. This has applications in plant research and development, as it provides space-saving techniques for theoretical genetics and practical tools for applied genetics research.

A Compression Algorithm: Some Examples

If a sequential string of binary characters is a digital representation of a plant’s genetic code, that is, a theoretical model translated from the plant’s original analog genetic code sequence (an alpha symbol system), then it can be ‘compressed’ for storage and transmission within a digital computer communications network.
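The translation from the alpha symbol system to a binary string can be illustrated with a short sketch. The paper does not specify a mapping, so the 2-bit code used here (A=00, C=01, G=10, T=11) and the function names `dna_to_binary` and `binary_to_dna` are assumptions chosen purely for illustration.

```python
# Hypothetical 2-bit code per nucleotide; the paper does not specify a mapping.
BASE_TO_BITS = {"A": "00", "C": "01", "G": "10", "T": "11"}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}


def dna_to_binary(seq: str) -> str:
    """Translate a DNA string into a binary string, two bits per base."""
    return "".join(BASE_TO_BITS[base] for base in seq.upper())


def binary_to_dna(bits: str) -> str:
    """Invert dna_to_binary; the input length must be even."""
    if len(bits) % 2 != 0:
        raise ValueError("binary string must contain two bits per base")
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


print(dna_to_binary("GATTACA"))  # 10001111000100
```

For RNA, the same sketch would apply with U in place of T in the assumed mapping.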

Example A: The following binary sequential string is a composite of a random sequential binary string.

Example A: [1111100010001110001110000010]

If the linear sequential string of [1] and [0] characters in Example A is separated into sequentially common sub-groups, the following results:

Sub-group of Example A: [11111]+[000]+[1]+[000]+[111]+[000]+[111]+[00000]+[1]+[0]
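The separation into sub-groups can be reproduced with a minimal sketch that splits a string wherever the character changes. The function name `split_into_runs` is an assumption introduced here, not a name from the paper.

```python
from itertools import groupby


def split_into_runs(bits: str) -> list[str]:
    """Split a string into maximal sub-groups of identical adjacent characters."""
    return ["".join(group) for _, group in groupby(bits)]


EXAMPLE_A = "1111100010001110001110000010"
print("+".join(f"[{run}]" for run in split_into_runs(EXAMPLE_A)))
# [11111]+[000]+[1]+[000]+[111]+[000]+[111]+[00000]+[1]+[0]
```

Joining the sub-groups back together restores the original string, so the split itself loses nothing.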

The non-random features of the sub-groups are those that have a ‘pattern’ to the sequence of [1] or [0] characters such as these sub-groups:

Non-random sub-groups: [000]+[111]+[000]+[111]

Each is a three-character sub-group of either [1] or [0]. Together they can be compressed as 0101, where each digit is the initial character, either a [1] or a [0], of a sub-group made up of 3 identical characters.

The remaining sub-groups are not as ‘patterned’ as the non-random sub-groups and are referred to as random sub-group sequences of a sequential binary string.
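The text does not give a formal rule for what counts as a ‘pattern’. One possible reading, assumed in the sketch below, is the longest stretch of consecutive sub-groups that all share the same length. The function name `longest_equal_length_stretch` is hypothetical.

```python
from itertools import groupby


def longest_equal_length_stretch(bits: str) -> list[str]:
    """Return the longest stretch of adjacent sub-groups with equal length.

    This is an assumed interpretation of 'non-random', not the paper's rule.
    Ties are resolved in favor of the earliest stretch.
    """
    runs = ["".join(group) for _, group in groupby(bits)]
    best_start, best_count = 0, 0
    start = 0
    for i in range(1, len(runs) + 1):
        # Close the current stretch at the end or when the run length changes.
        if i == len(runs) or len(runs[i]) != len(runs[start]):
            if i - start > best_count:
                best_start, best_count = start, i - start
            start = i
    return runs[best_start:best_start + best_count]


EXAMPLE_A = "1111100010001110001110000010"
patterned = longest_equal_length_stretch(EXAMPLE_A)
print(patterned)                             # ['000', '111', '000', '111']
print("".join(run[0] for run in patterned))  # 0101
```

Under this assumption, Example A yields the same four sub-groups and the same compressed form 0101 as in the text.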

The remaining random sub-group sequences are as follows:

Random sub-group: [11111]+[000]+[1]…[00000]+[1]+[0]

The random sub-groups can be compressed by notating each sub-group of like-natured digits, either [1] or [0], as its initial digit followed by a suffix number giving the total number of characters in the sub-group.

[11111] = [1x5]

[000] = [0x3]

[1] = [1]

[00000] = [0x5]

[1] = [1]

[0] = [0]

The original sequential binary string of Example A was as follows:

Example A: 1111100010001110001110000010

The notated compressed form of Example A is as follows:

Notated Example A: 1x5 0x3 1 0x3 1x3 0x3 1x3 0x5 1 0
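The notated form can be produced and reversed with a short sketch. The function names `encode_notated` and `decode_notated` are assumptions introduced here, and the sketch assumes the character `x` never occurs as a symbol in the input string.

```python
from itertools import groupby


def encode_notated(bits: str) -> str:
    """Write each sub-group as '<char>x<count>', or just '<char>' for a single character."""
    tokens = []
    for char, group in groupby(bits):
        count = len(list(group))
        tokens.append(char if count == 1 else f"{char}x{count}")
    return " ".join(tokens)


def decode_notated(notated: str) -> str:
    """Expand a notated string back into the original sequence."""
    pieces = []
    for token in notated.split():
        char, _, count = token.partition("x")
        pieces.append(char * (int(count) if count else 1))
    return "".join(pieces)


EXAMPLE_A = "1111100010001110001110000010"
print(encode_notated(EXAMPLE_A))  # 1x5 0x3 1 0x3 1x3 0x3 1x3 0x5 1 0
```

Decoding the notated form returns the original 28-character string, so both the type and the placement of each character are preserved.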

The non-notated compressed state is as follows:

Non-notated Example A: 1010101010

This is a compressed state of 10 characters, down from the original length of 28 characters.
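The non-notated state and the character counts can be checked with a minimal sketch. The function name `non_notated_with_lengths` is hypothetical. The sketch returns the sub-group lengths alongside the non-notated string, since those lengths are what the notated form carries and what a decoder would need to rebuild the original.

```python
from itertools import groupby


def non_notated_with_lengths(bits: str) -> tuple[str, list[int]]:
    """Return the initial character of each sub-group plus each sub-group's length."""
    symbols = []
    lengths = []
    for char, group in groupby(bits):
        symbols.append(char)
        lengths.append(sum(1 for _ in group))
    return "".join(symbols), lengths


EXAMPLE_A = "1111100010001110001110000010"
symbols, lengths = non_notated_with_lengths(EXAMPLE_A)
print(symbols, len(symbols), len(EXAMPLE_A))  # 1010101010 10 28
print(lengths)                                # [5, 3, 1, 3, 3, 3, 3, 5, 1, 1]
```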

Big Data

Because large data set collections rely on digital storage and transmission, the traditional binary format of [0] and [1] is used to transcribe the analog world into the digital world of computing. The vast amount of plant data makes ‘interpreting’ that data into a comprehensible whole a growing need in the biological sciences [3]. Given the rich diversity of both natural and engineered plants, the practical problems of gaining insight into all this plant genetic data, so as to form plant genetics ‘information’ (information being the product of ‘work’ obtained from the scholarly use of intellectually ‘neutral’ plant genetic data), let alone of storing and transmitting such amounts of genetic data, are overwhelming at best [4], [5].

[...]

Details

Pages: 5
Year: 2014
ISBN (eBook): 9783656591474
ISBN (Book): 9783656591467
File size: 405 KB
Language: English
Catalog number: v268096
Keywords: algorithmic complexity plant genetics
