Uncertainty
Scientific or Technological Uncertainty
Overview
We believe that our work on data compression, encryption and data security satisfies the **‘uncertainty’** criterion for the purposes of SR&ED (Scientific Research and Experimental Development).
Achieving our original goal, data compression, is by no means certain. Information theory is a very active field; one of its chief goals is to achieve higher compression ratios, and many researchers are active in this area. The current state of the art does not allow compression ratios on the order of what our techniques would allow. Were there a certain route to higher ratios for general compression, research would already be following it. There are convincing theoretical arguments that conventional means impose upper limits on data compression. We have some empirical evidence that it is ***possible*** using our techniques to achieve higher compression ratios.
We are essentially ‘prospecting’ in vector space to find suitable transformations which will allow compression ratios which appear to be theoretically unobtainable. The open research question here is whether or not it is possible to find the appropriate transformations to make our ideas a reality.
Data Compression
The goal of data compression is simply stated: to reduce the amount of storage or transmission time required for a given message. Costs associated with storage and transmission are proportional to message size, so an increase in compression ratio (all else being equal) results in a decrease in costs.

Many techniques have been proposed and implemented, but for lossless **general** compression the horizon seems to be approximately a **2:1 ratio**. Advances made since we started our work promise some improvement, but current implementations of lossless compression remain bound by the limits of analytic techniques.
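As a rough illustration of that horizon, here is a quick check using Python's built-in zlib (DEFLATE). The sample text is ours and the exact ratio will vary with the input, but ordinary English prose typically lands in the neighbourhood of 2:1:

```python
# Quick empirical check of the ~2:1 horizon using DEFLATE (zlib).
# The sample text is illustrative; ratios vary with input redundancy.
import zlib

message = (
    b"The goal of data compression is simply stated: to reduce the "
    b"amount of storage or transmission time required for a given "
    b"message. Costs associated with storage and transmission are "
    b"proportional to message size, so an increase in compression "
    b"ratio, all else being equal, results in a decrease in costs. "
    b"Many techniques have been proposed and implemented, but for "
    b"lossless general compression the horizon seems to be roughly "
    b"a two-to-one ratio."
)

compressed = zlib.compress(message, 9)
print(f"original:   {len(message)} bytes")
print(f"compressed: {len(compressed)} bytes")
print(f"ratio:      {len(message) / len(compressed):.2f}:1")
```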
There have been claims that compression techniques outwardly similar to those we propose have achieved ratios in excess of 10:1 for lossless compression; these have been debunked. There is, in fact, a simple **‘counting argument’** showing that consistent compression of truly random data is theoretically impossible.
Our Approach
We know empirically that there exists a mapping from an **‘incompressible’ vector space** such that a source message whose length is **L1** maps to a compressible target message whose length is **L2** via a vector whose length is **L3**. We know empirically that there exist mappings whereby **(L2+L3) < L1**. Furthermore, we know that there are mappings where **(L2+L3)** becomes arbitrarily small.
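A minimal sketch of the bookkeeping, under stated assumptions: the transform below (subtracting a known linear rule so the residual becomes trivially compressible, with the rule's parameters standing in for the vector) is purely illustrative and is not the mapping technique itself:

```python
# Illustrative bookkeeping for (L2 + L3) < L1. The transform here --
# subtracting a linear rule so the residual is all zeros -- is a toy
# stand-in for the vector-space mappings described above.
import struct
import zlib

# Source message of length L1, generated by a simple linear rule.
source = bytes((7 * i + 3) % 256 for i in range(4096))
L1 = len(source)

# The 'vector' (L3): parameters describing the transform.
vector = struct.pack(">II", 7, 3)
L3 = len(vector)

# The target message: the residual after the transform, which is all
# zeros and therefore compresses to almost nothing (L2).
residual = bytes((b - (7 * i + 3)) % 256 for i, b in enumerate(source))
L2 = len(zlib.compress(residual, 9))

print(f"L1 = {L1}, L2 + L3 = {L2 + L3}")
assert L2 + L3 < L1
```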

It is easy to show that it is impossible to compress all streams of N bits to N-1 bits or fewer (the sketch after this list makes the count concrete):
- Assume a program can compress without loss all files of size ≥ N bits.
- Compress with this program all the $2^N$ files which have exactly N bits.
- All compressed files have at most N-1 bits, so there are at most $(2^N)-1$ different compressed files [$2^{(N-1)}$ files of size N-1, $2^{(N-2)}$ of size N-2, and so on, down to 1 file of size 0].
- So at least two different input files must compress to the same output file. Hence the compression program cannot be lossless. (Much stronger results about the number of incompressible files can be obtained, but the proofs are a little more complex.)
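The count can be verified directly for small N:

```python
# The counting argument for small N: 2**N inputs of exactly N bits,
# but only 2**N - 1 possible outputs of fewer than N bits.
N = 8

inputs = 2 ** N                           # files of exactly N bits
outputs = sum(2 ** k for k in range(N))   # files of 0..N-1 bits = 2**N - 1

print(f"{inputs} inputs, {outputs} possible shorter outputs")
assert outputs < inputs   # pigeonhole: two inputs must collide
```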
The argument above shows that it is not possible to find a transformation certain to compress any arbitrary stream. However, the task is not to compress arbitrary streams; it is to compress **‘streams of interest’**: any data that might plausibly form a meaningful message.
Our hypothesis is that there exists a ‘reachable’ vector space whereby incompressible data streams of interest map to compressible data streams. If we were to determine that there were ‘only’ $2^{1024}$ (a very large number) streams of interest and found a suitable algorithm, all streams could be tokenized to 128 bytes in length, regardless of the length of the source. This theoretical limit is not likely to be reachable, but **we speculate that by essentially building robots to explore various spaces, we may be able to find suitable spaces which result in gains over current methods of compression.**
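The arithmetic behind the 128-byte figure is simply the bit length of an index over the streams of interest:

```python
# Why 2**1024 streams of interest implies 128-byte tokens: an index
# able to name any one of them needs exactly 1024 bits.
streams_of_interest = 2 ** 1024
token_bits = (streams_of_interest - 1).bit_length()   # 1024

print(f"{token_bits} bits = {token_bits // 8} bytes per token")
```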

At the current time, no technology exists that is capable of finding and exploiting these spaces, and indeed it is not even known whether they exist in a form that lends itself to a general data compression tool.
Our goal is to create a technology which will allow us to find and exploit vector spaces which yield meaningful compression. Our approach is a technological/empirical one. By building tools to explore the results of transformations, we accomplish two things. First, we improve our chances of stumbling upon promising vector spaces by automating the search with compute time. Second, should other researchers discover appropriate transformations, or should the constraints of current technology change, we will have tools ready to exploit the breakthrough.
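A minimal sketch of what such a search tool might look like, under stated assumptions: the transform family (repeating XOR keys) and the compressed-size fitness function are hypothetical placeholders, not our actual transformations. For most inputs no key will beat the baseline, which is itself a fair picture of how hard the prospecting is:

```python
# A toy 'robot' that prospects a transform space. The XOR-key family
# and compressed-size fitness are hypothetical placeholders; a real
# search would explore the vector-space mappings discussed above.
import random
import zlib

def score(stream: bytes) -> int:
    """Fitness of a candidate: its compressed size (smaller is better)."""
    return len(zlib.compress(stream, 9))

def transform(stream: bytes, key: bytes) -> bytes:
    """A toy reversible transform: XOR with a repeating key."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(stream))

def prospect(stream: bytes, trials: int = 1000):
    """Randomly explore keys, charging each candidate for storing its key."""
    baseline = score(stream)
    best_key, best_total = None, baseline
    rng = random.Random(0)
    for _ in range(trials):
        key = rng.randbytes(4)
        total = score(transform(stream, key)) + len(key)
        if total < best_total:
            best_key, best_total = key, total
    return best_key, best_total, baseline

data = bytes(range(256)) * 8              # a sample stream to prospect
key, total, baseline = prospect(data)
print(f"baseline {baseline} bytes, best candidate {total} bytes")
```

On most inputs the search returns no improvement; the value of the tooling is that compute time, rather than hand analysis, does the exploring.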
In the more than 40 years since the Huffman algorithm was discovered, there has been tremendous change in computer science, particularly in hardware and software. Yet general compression today is little better than it was then. We believe this is testimony to the difficulty, and hence the uncertainty, of solving the problem.