Object storage takes space.
Sometimes the object is large and space is at a premium.
Optimizing access for usage versus storage for reduced cost requires representational transformation.
One such transformation is compression.
Many different methods of compression exist, almost all requiring their own evaluator.
Each may be simple to invoke on its own, but they become unwieldy as more of them come into use.
In addition to finding a very good compression technique, it would also be useful to have a programmatic interface that tabularizes these methods (similar to Jensen's device, or thunking): tuck/untuck for compression, and redictuless for representation conformation.
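As a minimal sketch of such a table, the following pairs up compress/expand routines under one calling convention. The table shape and the tuck/untuck names follow the text; the three stdlib compressors are stand-ins, not the methods the text has in mind.

```python
import bz2
import lzma
import zlib

# One common-semantics table: method name -> (compress, expand).
# Adding a new method is one new row, not a new calling convention.
METHODS = {
    "zlib": (zlib.compress, zlib.decompress),
    "bz2":  (bz2.compress,  bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}

def tuck(name: str, data: bytes) -> bytes:
    """Compress data with the named method."""
    compress, _ = METHODS[name]
    return compress(data)

def untuck(name: str, blob: bytes) -> bytes:
    """Expand a blob previously tucked with the same method."""
    _, expand = METHODS[name]
    return expand(blob)
```

Because every method sits behind the same two entry points, callers never grow per-method invocation logic as the table accretes entries.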
See also “The specification, factoring, defaulting, and overriding of attributes”.
Define, unify, name, specify, enumerate/explicitly list, default, s?, tabularize, override/respecify, extensibly (dunseldtore)
Table of Contents

5.1 Historical Analysis
    Archives & Archivers
        In a file system
        In a directory
        In a file
    Primitive identity
    Vampt: Value, Attribute, Mode, Property, Type
    Enumeration: Union of primitives
    Union: One alternative of many by tag
    Structure: List of dissimilars, fixed
    Set, list, array: Multiple similar, dynamic
    Internal versus external
    Transportable versus transfixed
    Bound versus unbound
    Machine versus human
    Binary versus text
    Name assign value
        Huffman byte to bit: Pack
        LZ, LZW
You can tell a lot by looking at the data structure of the resulting compressed data. Each part of the structure gives clues on how to improve it.
1) The data structure of pack is a list of self-terminating, variable-length bit codes, either recalculated at each step (as in a data stream) or calculated once across an entire file.
2) In the stream case the code for each new character differs at each step, and the rules for encoding them may vary.
3) The whole-file case builds a single table and prefixes it to the data structure.
4) Note that a fixed table for an entire file is only an average optimum, in that a different rule for populating the table might result in denser coding.
5) The dynamic one starts with varying codes, is loose in the assignment of the codes, and may not achieve optimal compression until enough of a sample is built up.
6) Different rules for how to reassign the weights to the bit strings may work better or worse in either case (full file or stream) depending on how the data actually changes from packet to packet. Having the entire file to analyze can result in finding optimal points to insert encoding changes: either according to a fixed rule, or by explicit dictionary edit.
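The whole-file case (point 3 above) can be sketched concretely: count frequencies over the entire input, build one table, and encode against it. This is generic Huffman coding, not a claim about pack's actual on-disk format.

```python
import heapq
from collections import Counter

def huffman_code(data: bytes) -> dict[int, str]:
    """Build a single whole-file Huffman table (the 'one table prefixed
    to the data' case). Returns symbol -> bit-string code."""
    freq = Counter(data)
    # Heap entries: (weight, tiebreak, tree). A tree is either a symbol
    # or a (left, right) pair; the tiebreak keeps tuples comparable.
    heap = [(w, i, sym) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    if len(heap) == 1:                      # degenerate one-symbol file
        return {heap[0][2]: "0"}
    n = len(heap)
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)
        w2, _, t2 = heapq.heappop(heap)
        n += 1
        heapq.heappush(heap, (w1 + w2, n, (t1, t2)))
    codes: dict[int, str] = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            codes[tree] = prefix
    walk(heap[0][2], "")
    return codes

data = b"abracadabra"
table = huffman_code(data)
packed = "".join(table[b] for b in data)    # 23 bits versus 88 raw
```

Because the codes are prefix-free they are self-terminating in the sense of point 1: a decoder reads bits until the accumulated string matches a table entry.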
LZW, like the earlier LZ, builds up a dictionary of strings as it goes; but it too could use a full-file analysis to prefix a dictionary to the file, or have an assumed one for various contexts.
Varying rules for changing the entries in this table could also be applied for higher density, similar to Huffman encoding in pack.
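The as-it-goes dictionary accretion is shown below in a minimal LZW sketch (no prefixed or assumed dictionary; output codes are left as integers, so packing them into fewer bits than the raw bytes is a separate step).

```python
def lzw_compress(data: bytes) -> list[int]:
    """LZW with the string dictionary built as encoding proceeds."""
    table = {bytes([i]): i for i in range(256)}   # implicit base dictionary
    out, cur = [], b""
    for byte in data:
        nxt = cur + bytes([byte])
        if nxt in table:
            cur = nxt                              # grow the current match
        else:
            out.append(table[cur])
            table[nxt] = len(table)                # accrete a new entry
            cur = bytes([byte])
    if cur:
        out.append(table[cur])
    return out

def lzw_decompress(codes: list[int]) -> bytes:
    """Rebuild the same dictionary on the fly while decoding."""
    table = {i: bytes([i]) for i in range(256)}
    prev = table[codes[0]]
    out = [prev]
    for code in codes[1:]:
        # The one special case: a code that refers to the entry being built.
        entry = table[code] if code in table else prev + prev[:1]
        out.append(entry)
        table[len(table)] = prev + entry[:1]
        prev = entry
    return b"".join(out)
```

A full-file analysis, as suggested above, would amount to seeding `table` with a precomputed dictionary before the loop on both sides.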
When the data structure itself is examined, the most noticeable condition is that each element has a very dense part (the code) and a very sparse part (the datum).
The result is that roughly 40% (8 of 20 bits) of the compressed file is not compressed at all.
As such, for a file already compressed to 50%, there should be about another 20% available if the datum part were itself compressed by half (50% of 8/20). That would take a 50% file down to 40% of the original, a 60% final size reduction.
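The arithmetic above, spelled out. The 12-bit-code plus 8-bit-datum element layout is an assumption for illustration.

```python
# Each element: 12 dense bits of code + 8 sparse bits of raw datum.
element_bits = 20
datum_bits = 8

first_pass = 0.50                            # file compressed to 50% of original
datum_share = datum_bits / element_bits      # 0.40 of the output is raw datum
extra_saving = 0.50 * datum_share            # halve the datum part: saves 0.20
final_size = first_pass * (1 - extra_saving) # 0.40 of the original
total_reduction = 1 - final_size             # 0.60 overall
```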
Ordered list of
  compression element of
    structure of
      which in itself is a dictionary of
        prefixed fixed dictionary and
        accretion growth list of
          compression element of
            structure of
              fixed assigned code
              fixed datum
5.2 New methods
  DeStufdt: Data extended system transform unified file/directory trees
  Mafs: Media archive file systems
  Vaptocs: Volumes and partitions table of contents systems
Most programs have the option of presenting the data as seen or keeping it in an internal format.
Changing back and forth between them should be an essentially transparent operation.
This can be done, as in the “dotz” and “MVFS” NFS-compliant file systems, by tying a front end program to the file operations in order to interpret the sequences involved.
This introduces new errors (“cannot convert properly” and similar), but has the side effect of allowing canonical representations to be maintained with minimal cost overhead.
What is needed is a systematic way to state these variant representations, and have a single interface of common semantics to implement this.
File systems
  Typed files
    Magic numbers
    Full data type systems
  Symbolic links
A symbolic link by default points to the file to link to by either a relative or absolute path.
What could actually be put in there is an arbitrary text program to provide the converter on open for a specific file, even one that distinguishes which program is opening the file.
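One way the idea could look in practice, as a sketch only: intercept the open, and if the link's target text carries a marker, run it as a converter instead of following it as a path. The `#!` marker and the whole converter convention are hypothetical, not an existing file-system feature.

```python
import os
import subprocess

def open_with_converter(path: str) -> bytes:
    """Read a file; if it is a symlink whose target text starts with the
    (hypothetical) '#!' marker, run the embedded command and return its
    stdout as the converted file body instead."""
    if os.path.islink(path):
        target = os.readlink(path)
        if target.startswith("#!"):
            cmd = target[2:].strip()
            return subprocess.run(cmd, shell=True,
                                  capture_output=True).stdout
    with open(path, "rb") as f:     # ordinary file or ordinary link
        return f.read()
```

Such a link is "broken" to every unaware program, which is exactly the failure mode the text notes ("cannot convert properly"): the conversion only exists for openers that go through the front end.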
Context-dependent files change what they look like, or even what they contain, on the basis of external data, such as variables in the process environment or what type of system the file is on.
Yar: yet another representor
Or Roy’s ridiculously recursive reformatting representor (r4).
Holey (Holographic object lazy evaluation ylem(inator))
Scattering the information uniformly across every bit is both an effective cryptographic technique as well as an effective compression technique.
All ya gotta do is pick a good evaluator.
Using the input data as a transform to find the coefficients of a polynomial in N space to rebuild the input data based on those coefficients, is one method. There are others.
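One concrete reading of the polynomial method, using exact rational arithmetic so the round trip is lossless: treat data[i] as the value of a polynomial at x = i, keep the Newton-form coefficients, and re-evaluate to rebuild the input. Every input byte influences every coefficient, which is the scattering property; this sketch makes no claim about compression ratio.

```python
from fractions import Fraction

def to_coefficients(data: bytes) -> list[Fraction]:
    """Newton divided-difference coefficients of the polynomial
    interpolating (i, data[i]) for i = 0..n-1, computed in place."""
    c = [Fraction(v) for v in data]
    n = len(c)
    for level in range(1, n):
        for i in range(n - 1, level - 1, -1):
            # Equally spaced points: x_i - x_{i-level} == level.
            c[i] = (c[i] - c[i - 1]) / level
    return c

def from_coefficients(c: list[Fraction], n: int) -> bytes:
    """Re-evaluate the Newton form at x = 0..n-1 (Horner-style)."""
    out = bytearray()
    for x in range(n):
        acc = c[-1]
        for i in range(len(c) - 2, -1, -1):
            acc = acc * (x - i) + c[i]
        out.append(int(acc))
    return bytes(out)
```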
Extensible lazy evaluation
The basic data structure is one I call redictuless:
Recursively extensible data inference compression tree using lazy evaluation symbol strings.
The basic idea is that the compression codes themselves recursively refer to other codes in the list.
If the table is built up by scanning the entire file, then the codes themselves may recursively refer to themselves (if careful construction is used to provide a terminator for such).
If done by stream oriented scanning, the codes would use the ones already found to add new ones to the list.
If all elements of each code have approximately the same data density, then the index of codes will be at maximum compression levels in all parts of the file or data stream.
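A toy redictuless table, invented for illustration: each code maps to a sequence whose parts are either literal bytes or integer references to other codes, and a fixed level count serves as the termination assessor.

```python
# Hypothetical table layout: code -> list of literal bytes and code refs.
TABLE = {
    0: [b"ab"],          # base entry: pure literal
    1: [0, b"ra"],       # refers to code 0 -> "abra"
    2: [1, b"cad", 1],   # "abra" + "cad" + "abra"
}

MAX_DEPTH = 8            # level-count terminator for recursive references

def expand(code: int, depth: int = 0) -> bytes:
    """Lazily expand a code; references expand only when reached."""
    if depth > MAX_DEPTH:
        raise RecursionError("runaway expansion")
    out = b""
    for part in TABLE[code]:
        out += expand(part, depth + 1) if isinstance(part, int) else part
    return out
```

The depth check is the simplest assessor; a carefully constructed table could instead guarantee termination structurally, as the text suggests.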
Ordered list of
  compression element of
    union of
      method/s, which are
        Termination assessor
If a recursive encoding is allowed, then there must be a method of terminating the recursion.
The top level codes can be used to hold a parameter block for the assessor if desired.
Level count
A fixed limit (e.g., do not recurse more than three times) can be used, or a variable one.
DETECT: Data Evaluation Test Extended Context Termination
A full functional routine may be used as a standard item, with additional ones added in.
The method may be specified by an explicit fixed code, or by an extensible startup or runtime extended list of such assessors.
The parameters may be null, at the level of the assessor, or picked from the surrounding context.
With any dynamic recursive test for end of expansion (as in any macro system: cpp, m4, or the lambda calculus, for example), some form of safety must be invoked to stop runaways on improper execution or structure encoding.
If the codes are built up as simple macros with other code references, then expansion may be terminated via macros that cease to expand, and no separate assessor may be needed; although parameters would still need to be supplied (fixed, stated, context implied, or explicitly stated).
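The macro-termination idea can be sketched as repeated substitution to a fixed point, with a step cap as the runaway safety net. The macro names here are invented.

```python
import re

# Codes as simple text macros; expansion stops naturally once no
# macro name remains in the text.
MACROS = {"$A": "ab$B", "$B": "ra"}
MAX_STEPS = 100          # safety against improper (cyclic) encodings

def expand_macros(text: str) -> str:
    pattern = re.compile("|".join(re.escape(name) for name in MACROS))
    for _ in range(MAX_STEPS):
        expanded = pattern.sub(lambda m: MACROS[m.group(0)], text)
        if expanded == text:          # macros ceased to expand
            return expanded
        text = expanded
    raise RuntimeError("expansion exceeded step limit")
```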
        (optional, implied) code/assessor
        (optional, implied) code/parameter
        Code/data, which is one of
          structure of
          list of
Adding a carefully considered method of systematizing these ad hoc usages would yield tremendous gains.