Reference reading directly related to this post:
http://www.math.princeton.edu/matalive/Error/index.html
http://www.math.princeton.edu/matalive/Error/index.html
http://www.math.princeton.edu/matalive/Error/ErrorLab2/LZCompressor.html
http://www.math.princeton.edu/matalive/Error/ErrorLab2/LZCompressor.html
Hi all.
I've just started learning the ways of compression. I decided to start with the LZ algorithm. The resource I use is:
http://www.math.princeton.edu/matalive/Error/index.html
http://www.math.princeton.edu/matalive/Error/index.html
I understand the fundamentals of parsing the incoming text. Let's call the container that holds the unique sub-strings: a dictionary.
On the page where one can play online with a fully implemented LZ compressor, they use a number of ASCII characters to indicate the index of the unique sub-substring in the "dictionary". Index 1 is the ASCII character 33 ('!' = index 1, so 'SPACE' = not encountered before). ASCII-char 127 (DEL) is the 95th index. To me, a few steps in this online tutorial went too fast, missing a few steps.
Let's play with the online compression tool, I'll ask my questions along the way. I am accurate, so observe the spaces as well :)
I enter: "a"
output: "aa"
QUESTION: why the double a? I see only one letter, not encountered before, etcetera. So I expect to see a space then "a" => " a"
I enter: "aa"
output: "aa"
QUESTION: Why this output? To me the second "a" is a string in the dictionary on index 1. Since index 1 is ASCII-char 33, the
output to me should be: " a!".
I enter: "aaa"
output: " a!a!a"
QUESTION: this is how I think:
1) The dictonary looks like this:
index substring in_terms_of_index
"!" "a" NONE
"""" "aa" !a
So the output must be (according to me): " a!a"
I enter: "aaaa"
output: " a!a a"
QUESTION: To my logic the output should be: " a!a!" . I'll show you my dictionary:
index substring in_terms_of_index
"!" "a" NONE
"""" "aa" !a
So the output must be (according to me): SPACE for 1st "a" + "!a" for double "aa" + "!" for first index "a" = " a!a!"
And so forth.
I have another question:
Suppose my dictionary size exceeds 95? How would I code something with index 96 or higher? What is the logic? I can't just pad a
fixed amount of characters can I?
Thank your for your reading,