







CARDS: Compressing, Archiving, Representing Data Structures
by roy andrea crabtree   
Rated "G" by the Author.
Last edited: Sunday, October 08, 2006
Posted: Saturday, October 07, 2006



Many methods exist, and a systematic approach to integrating all of them would provide many benefits; plus a few techniques of my own. All rights reserved. Work in progress.




1         Abstract


Object storage takes space. 

Sometimes the object is large and space is at a premium.

Optimizing access for usage, versus storage for reduced cost, requires transforming the representation.

One such transformation is compression.

Many different methods of compression exist, almost all requiring their own evaluator.

Each of these may individually be simple to invoke, but they become unwieldy as more of them come into use.

In addition to finding a very good compression technique, it would also be useful to have a programmatic interface that tabularizes these methods (similar to Jensen's device, or thunking): Tuck/untuck for compression, and redictuless for representation conformation.

See also “The specification, factoring, defaulting, and overriding of attributes”.

Define, unify, name, specify, enumerate/explicitly list, default, s?, tabularize, override/respecify, extensibly (dunseldtore)


2         Table of Contents


3         Introduction


4         Overview


5         Main


5.1      Historical Analysis


5.1.1      Code/Programs


5.1.1.1  Archives & Archivers


5.1.1.1.1   Directory

5.1.1.1.1.1     In a file system

5.1.1.1.1.2     In a directory

5.1.1.1.1.3     In a file

5.1.1.1.1.3.1   Ar

5.1.1.1.1.3.2   Tar

5.1.1.1.1.3.3   Cpio

5.1.1.1.1.3.4   Pax

5.1.1.1.1.3.5   Shar

5.1.1.1.1.3.6   .EXE

5.1.1.1.1.3.7   zip

5.1.1.2  Representation


5.1.1.2.1   Structure

5.1.1.2.1.1     Name

5.1.1.2.1.2     Primitive identity

5.1.1.2.1.3     Vampt: Value, Attribute, Mode, Property, Type

5.1.1.2.1.4     Enumeration: Union of primitives

5.1.1.2.1.5     Union: One alternative of many by tag

5.1.1.2.1.6     Structure: List of dissimilars fixed

5.1.1.2.1.7     Set, list, array: Multiple similar,  dynamic

5.1.1.2.2   Data

5.1.1.2.2.1     Internal versus external

5.1.1.2.2.2     Transportable versus transfixed

5.1.1.2.2.3     Bound versus unbound

5.1.1.2.2.4     Machine versus human

5.1.1.2.2.5     Binary versus text

5.1.1.2.2.5.1   Binary

5.1.1.2.2.5.2   Text

5.1.1.2.2.5.2.1  Name assign value

5.1.1.2.2.5.2.2   

5.1.1.3  Compressors


5.1.1.3.1   Huffman byte to bit: Pack

5.1.1.3.2   LZ, LZW

5.1.1.3.2.1     Compress

5.1.1.3.2.2     Gzip

5.1.1.4     Bzip

5.1.2      Data/Structure


5.1.2.1  Archives & Archivers


5.1.2.1.1   Directory

5.1.2.1.1.1     In a file system

5.1.2.1.1.2     In a directory

5.1.2.1.1.3     In a file

5.1.2.1.1.3.1   Ar

5.1.2.1.1.3.2   Tar

5.1.2.1.1.3.3   Cpio

5.1.2.1.1.3.4   Pax

5.1.2.2  Representation


5.1.2.2.1   Structure

5.1.2.2.1.1     Name

5.1.2.2.1.2     Primitive identity

5.1.2.2.1.3     Vampt: Value, Attribute, Mode, Property, Type

5.1.2.2.1.4     Enumeration: Union of primitives

5.1.2.2.1.5     Union: One alternative of many by tag

5.1.2.2.1.6     Structure: List of dissimilars fixed

5.1.2.2.1.7     Set, list, array: Multiple similar,  dynamic

5.1.2.2.2   Data

5.1.2.2.2.1     Internal versus external

5.1.2.2.2.2     Transportable versus transfixed

5.1.2.2.2.3     Bound versus unbound

5.1.2.2.2.4     Machine versus human

5.1.2.2.2.5     Binary versus text

5.1.2.2.2.5.1   Binary

5.1.2.2.2.5.2   Text

5.1.2.3  Compression


You can tell a lot by looking at the structure of the resulting compressed data.  Each part of the structure gives clues on how to improve it.

5.1.2.3.1   Pack

1)      The data structure of pack is a list of self-terminating, variable-length bit codes, either recalculated at each step (as in a data stream) or calculated once across an entire file.

2)      Differing codes may be emitted at each step for each newly encoded character, and the rules for encoding them may vary.

3)      The latter will build a single table and prefix it to the data structure.

4)      Note that a fixed table for an entire file will only be an average optimal estimate, in that a different rule for populating the table might result in denser coding.

5)      The dynamic one will start with varying codes, be loose in the assignment of the codes, and may not achieve optimal compression until enough of a sample is built up.

6)      Different rules for how to reassign the weights to the bit strings may work better or worse in either case (full file or stream), depending on how the data actually changes from packet to packet.  Having the entire file to analyze can result in finding optimal points to insert encoding changes: either according to a fixed rule, or by explicit dictionary edit.
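The "full file" variant described above can be sketched as follows. This is an illustrative Huffman-table builder, not the actual pack implementation; all names are mine, and a real packer would also serialize the table as a prefix on the output.

```python
# Sketch: build one Huffman code table from the byte frequencies of the
# whole input (the static, whole-file variant of pack described above).
import heapq
from collections import Counter

def huffman_codes(data: bytes) -> dict:
    """Map each byte value in `data` to a prefix-free bit string."""
    if not data:
        return {}
    freq = Counter(data)
    if len(freq) == 1:  # degenerate case: only one distinct byte
        return {next(iter(freq)): "0"}
    # Heap entries are (weight, tiebreak, tree); the tiebreak integer
    # keeps the heap from ever comparing trees directly.
    heap = [(w, i, b) for i, (b, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)   # two lightest subtrees...
        w2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, count, (t1, t2)))  # ...merge them
        count += 1
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            codes[tree] = prefix          # leaf: a byte value
    walk(heap[0][2], "")
    return codes

codes = huffman_codes(b"aaaaaaaabbbc")
# The most frequent byte gets a code no longer than the rarest byte's.
assert len(codes[ord("a")]) <= len(codes[ord("c")])
```

Because the table is computed once over the whole file, it is only optimal on average, exactly as point 4 above notes; a stream-adaptive coder would instead rebuild `freq` as bytes arrive.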

5.1.2.3.2   LZW

LZW, like the earlier LZ, builds up a dictionary of strings as it goes; but it could also use a full-file analysis to prefix a dictionary to the file, or assume one for various contexts.

Varying rules for changing the entries in this table could also be applied for higher density, similar to Huffman encoding in pack.

When the data structure itself is examined, the most noticeable condition is that the individual elements have a very dense part (code) and a very sparse part (datum).

The result is that roughly 8 of every 20 bits (about 40%) of the compressed file are effectively not compressed at all.

As such, with 50% compression, about another 20% of the original size (50% of 8/20) would be available if the datum part were compressed as well.  That would take the output from 50% down to about 30% of the original, i.e. to 60% (30%/50%) of its compressed size.
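The dictionary-building behavior, and the dense/sparse split criticized above, can be seen in a textbook LZW compressor: each new table entry is an existing code (the dense part) plus one raw byte (the sparse datum). This sketch is illustrative, not any particular tool's implementation.

```python
# Sketch: classic stream-built LZW. The dictionary starts with all 256
# single bytes; every miss emits a code and adds (current string + new
# byte) as the next dictionary entry.
def lzw_compress(data: bytes) -> list:
    table = {bytes([i]): i for i in range(256)}
    next_code = 256
    out = []
    s = b""
    for b in data:
        sb = s + bytes([b])
        if sb in table:
            s = sb                    # grow the current match
        else:
            out.append(table[s])      # emit the dense code part
            table[sb] = next_code     # the new byte is the sparse datum
            next_code += 1
            s = bytes([b])
    if s:
        out.append(table[s])
    return out

out = lzw_compress(b"abababab")
assert len(out) < len(b"abababab")    # 5 codes for 8 input bytes
```

Prefixing a precomputed dictionary, as the paragraph above suggests, would amount to seeding `table` with more than the 256 single-byte entries before the loop runs.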

5.1.2.3.2.1     Ordered list of

5.1.2.3.2.2     Compression element of

5.1.2.3.2.3     Structure of

5.1.2.3.2.3.1   code

5.1.2.3.2.3.2   datum

5.1.2.3.2.4     which in itself is a dictionary of

5.1.2.3.2.5     prefixed fixed dictionary and

5.1.2.3.2.6     accretion growth list of

5.1.2.3.2.7     compression element of

5.1.2.3.2.8     structure of

5.1.2.3.2.8.1   fixed assigned code

5.1.2.3.2.8.2   fixed datum

5.2      New methods


5.2.1      Code/Programs


5.2.1.1  Archivers


5.2.1.1.1   DeStufdt: Data extended system transform unified file/directory trees

5.2.1.1.2   Mafs: Media archive file systems

5.2.1.1.3   Vaptocs:  volumes and partitions table of contents systems

5.2.1.2  Representation


Most programs have the option of presenting the data as seen or keeping it in an internal format.

Changing back and forth between them should be an essentially transparent operation.

This can be done, as in the “dotz” and “MVFS” NFS-compliant file systems, by tying a front end program to the file operations in order to interpret the sequences involved.

This introduces new classes of error (“cannot convert properly” and similar), but it has the side effect of allowing canonical representations to be maintained with minimal overhead.

What is needed is a systematic way to state these variant representations, and have a single interface of common semantics to implement this.
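A table-driven interface of the kind this paragraph asks for might look like the following sketch. The names are invented, and base64 and zlib merely stand in for real representation transforms; the point is the single pair of common entry points.

```python
# Sketch: one registry, one common pair of semantics (to_external /
# to_internal), regardless of which representation method is in use.
import base64
import zlib

CONVERTERS = {}

def register(name, encode, decode):
    """Install an encode/decode pair under a representation name."""
    CONVERTERS[name] = (encode, decode)

def to_external(name, data: bytes) -> bytes:
    return CONVERTERS[name][0](data)

def to_internal(name, data: bytes) -> bytes:
    return CONVERTERS[name][1](data)

# Two stand-in representations registered through the same interface.
register("base64", base64.b64encode, base64.b64decode)
register("zlib", zlib.compress, zlib.decompress)

assert to_internal("zlib", to_external("zlib", b"hello")) == b"hello"
```

Adding a new representation is then one `register` call; callers never need to know which evaluator sits behind the name.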

5.2.1.2.1   File systems

5.2.1.2.1.1     Typed files

5.2.1.2.1.1.1   Extensions

5.2.1.2.1.1.2   Magic numbers

5.2.1.2.1.1.3   Full data type systems

5.2.1.2.1.2     Symbolic links

A symbolic link by default points to its target by either a relative or an absolute path.

What could actually be stored there instead is an arbitrary text program that provides a converter on open for that specific file, possibly even one that distinguishes which program is opening the file.
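The premise here is that a symlink's target is just stored text, which a modified open() path could interpret as a converter specification rather than a pathname. A minimal demonstration that arbitrary text fits (the `|gunzip -c $TARGET` converter syntax is invented for illustration; nothing in a stock kernel interprets it):

```python
# Sketch: store arbitrary converter text in a symlink target and read
# it back. On POSIX systems the target string is not validated.
import os
import tempfile

d = tempfile.mkdtemp()
link = os.path.join(d, "converted")
os.symlink("|gunzip -c $TARGET", link)   # a (dangling) symlink whose
                                          # target is a command, not a path
assert os.readlink(link) == "|gunzip -c $TARGET"
```

A file system that honored such targets would run the stored program at open time, which is exactly the per-file converter hook described above.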

5.2.1.2.1.3     CDFs

Context dependent files change what they look like, or even what they contain, on the basis of external data, such as variables in the process environment or the type of system the file is on.
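The CDF idea can be sketched as a plain function whose result depends on external context rather than on the stored name. The variable name `SYSTYPE` and the variant contents here are invented for illustration:

```python
# Sketch: one "file" name, several stored variants; which one is seen
# depends on the process environment at open time.
import os

VARIANTS = {"linux": b"ELF executable", "hpux": b"PA-RISC executable"}

def open_cdf(variants, default=b""):
    """Return the variant selected by the (external) SYSTYPE context."""
    systype = os.environ.get("SYSTYPE", "")
    return variants.get(systype, default)

os.environ["SYSTYPE"] = "linux"
assert open_cdf(VARIANTS) == b"ELF executable"
```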

5.2.1.2.2   Programs

5.2.1.2.2.1     Yar: yet another representor

Or Roy’s ridiculously recursive reformatting representor (r4).

5.2.1.3  Compressors


5.2.1.3.1   Tuck/untuck

5.2.1.3.2   Holographic

5.2.1.3.3   Extensible lazy evaluation

5.2.2      Data/Structure


5.2.2.1  Archivers


5.2.2.2  Representation


5.2.2.3  Compressors


5.2.2.3.1   Tuck/untuck

5.2.2.3.2   Holey (Holographic object lazy evaluation ylem(inator))

Scattering the information uniformly across every bit is both an effective cryptographic technique as well as an effective compression technique.

All ya gotta do is pick a good evaluator.

One method is to transform the input data into the coefficients of a polynomial in N-space, then rebuild the input by evaluating that polynomial over those coefficients.  There are others.
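One reading of that method: treat the input values as points (i, value), fit an exact interpolating polynomial, and keep only its coefficients; evaluating the polynomial rebuilds the input, and every coefficient depends on every input point, giving the "scattered across every bit" property. This sketch only shows the rebuild direction, using exact rational arithmetic; a real compression scheme would need fewer coefficients than points.

```python
# Sketch: Newton divided differences at x = 0, 1, 2, ... (unit spacing)
# turn the data into polynomial coefficients; Newton-form evaluation
# turns the coefficients back into the data.
from fractions import Fraction

def fit_coefficients(values):
    """Coefficients of the exact interpolant through (i, values[i])."""
    table = [Fraction(v) for v in values]
    coeffs = [table[0]]
    for level in range(1, len(values)):
        # With unit spacing, the level-th divided difference divides
        # successive differences by `level`.
        table = [(table[i + 1] - table[i]) / level
                 for i in range(len(table) - 1)]
        coeffs.append(table[0])
    return coeffs

def rebuild(coeffs, n):
    """Evaluate the Newton-form polynomial at x = 0 .. n-1."""
    out = []
    for x in range(n):
        acc, term = Fraction(0), Fraction(1)
        for k, c in enumerate(coeffs):
            acc += c * term
            term *= (x - k)          # next Newton basis factor
        out.append(int(acc))
    return out

data = [3, 1, 4, 1, 5]
assert rebuild(fit_coefficients(data), len(data)) == data
```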

5.2.2.3.3   Extensible lazy evaluation

The basic data structure is one I call redictuless:

Recursively extensible data inference compression tree using lazy evaluation symbol strings.

The basic idea is that the compression codes themselves recursively refer to other codes in the list.

If the table is built up by scanning the entire file, then the codes themselves may recursively refer to themselves (if careful construction is used to provide a terminator for such).

If done by stream oriented scanning, the codes would use the ones already found to add new ones to the list.

If all elements of each code have approximately the same data density, then the index of codes will stay near maximum compression in all parts of the file or data stream.
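The recursive-code idea can be sketched as follows, under assumed names: each table entry is a sequence whose items are either literal bytes or references to other entries, expanded recursively. Literals are the terminators, so a stream-built table (whose entries only refer backward) always bottoms out.

```python
# Sketch of the redictuless structure: codes >= 256 are table entries
# whose items are literal byte values or other codes.
TABLE = {
    256: [ord("a"), ord("b")],       # expands to "ab"
    257: [256, 256],                 # refers to another code: "abab"
    258: [257, ord("c"), 257],       # "ababcabab"
}

def expand(code):
    if code < 256:                   # a literal byte terminates recursion
        return bytes([code])
    return b"".join(expand(item) for item in TABLE[code])

assert expand(258) == b"ababcabab"
```

A whole-file table, as noted above, could contain forward or even self references, which is what makes an explicit termination assessor necessary in that case.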

5.2.2.3.3.1     Ordered list of

5.2.2.3.3.2     Compression element of

5.2.2.3.3.3     Union of

5.2.2.3.3.4     Method/s, which are

5.2.2.3.3.4.1   Termination assessor

If a recursive encoding is allowed, then there must be a method of terminating the recursion.

5.2.2.3.3.4.1.1  Parameter

The top level codes can be used to hold a parameter block for the assessor if desired.

5.2.2.3.3.4.1.2  Level count

A fixed number (do not recurse more than three times) can be used.

Or a variable one.

5.2.2.3.3.4.1.3  DETECT: Data evaluation Test Extended Context Termination

A full functional routine may be used as a standard item, with additional ones added in.

5.2.2.3.3.4.1.3.1 Method

The method may be specified by an explicit fixed code, or by an extensible startup or runtime extended list of such assessors.

5.2.2.3.3.4.1.3.2 Parameters

The parameters may be null, at the level of the assessor, or picked from the surrounding context.

5.2.2.3.3.4.1.3.3 Safeties

With any dynamic recursive test for end of expansion (as in any macro system (cpp, m4, or the lambda calculus, for example)), some form of safety must be invoked to stop runaways caused by improper execution or structure encoding.

If the codes are built up as simple macros with other code references, then expansion may be terminated via macros that cease to expand, and no separate assessor may be needed; although parameters would still need to be supplied (fixed, stated, context implied, or explicitly stated).
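A hedged sketch of the safeties discussed above: a macro-style expander with an explicit level count, so a malformed (self-referential) table raises an error instead of running away. The names and the depth default are illustrative choices, not from any particular system.

```python
# Sketch: macro expansion with a fixed level-count safety. Items not in
# the table are literals and terminate recursion on their own.
class RunawayExpansion(Exception):
    pass

def expand_limited(table, code, max_depth=3):
    if code not in table:
        return str(code)             # literal: expands to itself
    if max_depth == 0:
        raise RunawayExpansion(f"depth limit hit expanding {code!r}")
    return "".join(expand_limited(table, item, max_depth - 1)
                   for item in table[code])

good = {"A": ["x", "y"], "B": ["A", "z", "A"]}
assert expand_limited(good, "B") == "xyzxy"

bad = {"L": ["L"]}                   # self-referential: never terminates
try:
    expand_limited(bad, "L")
    assert False, "safety should have fired"
except RunawayExpansion:
    pass
```

A DETECT-style assessor would replace the bare `max_depth == 0` test with a pluggable routine, parameterized as described in the sections above.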

5.2.2.3.3.4.2   (optional, implied) code/assessor

5.2.2.3.3.4.3   (optional, implied) code/parameter

5.2.2.3.3.4.4   Code/data, which is one of

5.2.2.3.3.4.4.1  Structure of

5.2.2.3.3.4.4.1.1 code

5.2.2.3.3.4.4.1.2 code

5.2.2.3.3.4.4.2  List of

5.2.2.3.3.4.4.2.1 Code

5.2.2.3.3.4.4.3  Etc.

 



6         Summary


7         Closing


Adding a carefully considered method of systematizing these ad hoc usages would yield tremendous benefits.


 




 

 

 


