"Baby steps are Boring: My quest for the ultimate simple-math compression routine."
My interest in file compression began when I learned about
mp3s.
One website claimed that mp3s compress data to one-eleventh
their size without losing any information.
"How on Earth is that possible?"
Well, mp3 loses information, is how that is possible.
"Ah."
But mp3s still sound pretty close. How can you throw out ten out
of eleven pieces of data and hear more than a whisper?
Long story short, clever math.
"It's clever to YOU."
As I was saying, I'm not that good at math, but I AM clever at
finding new ways to think.
So who cares about file compression? I do. I love numbers.
They're little 'things' I can rotate and poke and blow smoke at.
So what's the big deal about file compression?
This: A five-minute CD-quality tune in 'raw data' form takes
roughly 7 hours to 'download' with a 28.8K modem.
That's a guess.
An mp3 version of the same tune takes about 45 minutes.
I am curious about finding a way of transmitting a 7-hour file
within 10 seconds.
No way, you say.
I say way.
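The download times above are guesses, so here is a rough check of the arithmetic. This is a minimal sketch that assumes standard CD audio (44,100 samples per second, 16 bits, stereo), the nominal 28,800 bit/s modem line rate with no protocol overhead, and the one-eleventh mp3 ratio quoted earlier. Under those assumptions the raw track comes out at a little over four hours. Start/stop bits and a noisy phone line would push the real figure higher.

```python
# Rough download-time arithmetic for a five-minute CD-quality track.
# Assumption: nominal 28,800 bit/s modem rate, no protocol overhead.

SAMPLE_RATE_HZ = 44_100   # CD audio samples per second, per channel
BITS_PER_SAMPLE = 16      # CD audio sample depth
CHANNELS = 2              # stereo
MODEM_BPS = 28_800        # nominal 28.8K line rate (assumed, ignores overhead)

def raw_bits(seconds: int) -> int:
    """Size of uncompressed CD audio of the given length, in bits."""
    return seconds * SAMPLE_RATE_HZ * BITS_PER_SAMPLE * CHANNELS

def download_seconds(bits: float, bps: int = MODEM_BPS) -> float:
    """Time to send `bits` at `bps` bits per second."""
    return bits / bps

track = raw_bits(5 * 60)
raw_time = download_seconds(track)
mp3_time = download_seconds(track / 11)   # the "one-eleventh" ratio from above
needed_ratio = raw_time / 10              # ratio needed to fit in 10 seconds

print(f"raw: {track / 8 / 1_000_000:.1f} MB, {raw_time / 3600:.1f} hours")
print(f"mp3 (11:1): {mp3_time / 60:.0f} minutes")
print(f"ratio needed for 10 seconds: {needed_ratio:.0f}:1")
```

Squeezing the raw track into 10 seconds at that line rate works out to a compression ratio of 1470 to one.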
--The theory--
I thought short and soft about a way to build large numbers quickly.
After all, a file "is" "just a number"; a string of ones and zeros.
In most cases, we're talking about a string of MILLIONS OF DIGITS.
For my benefit, let's think about the significance of digits for a second.
Think about the number 99. Bill Gates would immediately stop reading;
we're outside Base-II now. (saw some SNL skit)
Pick a number between 1 and 99.
Got it?
Grab one of your non-psychic friends and say "Hey what number am I thinking of?"
If you don't have any friends, just imagine how having a friend must feel.
--The significance of digits--
Now that you've had your number guessed, let's talk about hay.
Let's talk about a haystack of 99 straws of hay.
"Well that's not a haystack at all, that's what a farmer tracks indoors."
True.
This 'haystack' has 99 straws. Put a blue mark on one of those.
Close your eyes, mix them up, pick a straw. Like the guess-my-number
example, what are the chances (for non-psychics) of picking out that blue
straw on your first try? One in ninety-nine?
I guess. I think that's how that works. So what am I getting at?
The significance of digits. The significance of the NUMBER of digits.
What I am getting at is that there are only two digits in 99.
So what?
Before we move on, let's make sure we have a clear understanding of
what I just presented. Okay, we talked about hay and the number two
and 99.
Now let's skip from first grade math to algerbra. I mean algaebra.
Mmm, algae.
--Biting an apple--
Multiplication. Fun. Good.
Ten times ten is .. the same thing as ten squared!
Looking at my wrist, I see that I'm stepping up the pace.
Now, would you not say that 1 out of 99 is slim-pickins?
Kind of like a needle in a haystack?
Sort of. What about one in a thousand .. even slimmer.
One thousand, for us ten-fingered people, has only 4 digits.
FOUR DIGITS. Slim pickins.
A million has 7 digits.
I just realized that fingers have 3 digits.
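To make the digit counting concrete, here is a small sketch that counts digits by repeated division. The helper name `digit_count` is just for illustration. It also counts in base 2, since a file is really a string of ones and zeros.

```python
def digit_count(n: int, base: int = 10) -> int:
    """How many digits the positive integer n needs in the given base."""
    count = 0
    while n > 0:
        n //= base     # drop the last digit
        count += 1
    return count

for n in (99, 1_000, 1_000_000):
    print(n, "has", digit_count(n), "digits")

# Two decimal digits (00 through 99) are enough to label every straw
# in the 99-straw haystack.
print(10 ** 2, "different values fit in 2 decimal digits")

# The same number written the way a file stores it, in binary digits.
print(99, "needs", digit_count(99, 2), "binary digits")
```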
--The theory of ultra compression--
I'm going to 'make' a very large number quickly.
11^4000 voila!
That number is so large, and so incredibly, unbelievably specific,
like a needle in one of those things. But think for a moment:
11^4000 represents a HUGE STRING OF DIGITS.
This, my un-met friends, is the driving force behind an unusual
theory of lossless (also called non-lossy) file compression.
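Python's built-in integers have no fixed size limit, so they can show exactly how big 11^4000 is. This is a minimal sketch; the logarithm line is only a cross-check that gets the same digit count without building the number.

```python
import math

expression = "11^4000"
value = 11 ** 4000            # exact arbitrary-precision integer
# str() works here; recent Python versions cap int-to-str conversion
# at 4300 digits by default, and this number is under that.
digits = len(str(value))

print(f"{expression!r} is {len(expression)} characters long")
print(f"11^4000 has {digits} decimal digits")

# Cross-check without building the number: floor(4000 * log10(11)) + 1
print(math.floor(4000 * math.log10(11)) + 1)
```

A seven-character expression stands for a number with 4166 digits.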
And for y'all who might be concerned over how long it takes
to compute ^4000 and beyond,
--Compute ^4000 and beyond--
I have figured out a way to avoid computing "that high."
I'm going to call any number, any value I'm trying to compress,
a 'target.' My brain is starting to give out.
An example I keep going back to: Target = 100 (one hundred)
What if 2^somepower came REALLY CLOSE to 100?
Well let's see .. let's keep somepower whole, let's keep it real,
let's keep it at a value I can work with.
I know that 2^7 = 128 because I just had to use my fingers to
keep track of how far I was in my memorized list: 2 4 8 16 32 64 128.
Looks like 128 is closest to 100, and for now I'm going to say,
even if it's 1 over target, THROW IT OUT.
So I'm using 64, which is 2^6, which is 36 from 100.
--"Hold on!" you say--
Two to the sixth is thirty-six from one hundred.
What am I doing?
I am trying to compress the number One Hundred.
Why? BECAUSE I CAN'T GET IT OUT OF MY HEAD DAMMIT AND
I'M EXCITED ABOUT THIS
"Okay Mr. Excited you may continue."
Thank you.
As I was saying, 2^6 is 36 from 100.
"We know that, Mr. Excitement."
And ..
brain giving out.
"Mr. Excitement why are you trying to compress one hundred?
Why waste our time like this? It has only THREE digits."
Your Honor, I am doing this for myself, because I think very slowly..
"Alright I'm calling a 15-minute recess. Jury, be back in ten."
===========
--Evidence of a bumbling idiot--
2^6 is 64 which is 36 from 100
2^5 is 32 which is 4 from 36
2^2 is 4
100 can be thought of as 2^6 + 2^5 + 2^2.
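Peeling off the biggest power of 2 that fits, then repeating on what is left, is exactly how a number gets written in binary. Here is a minimal sketch of that loop; `greedy_powers_of_two` is an illustrative name. For 100 it picks 2^6, 2^5 and 2^2 (64 + 32 + 4).

```python
def greedy_powers_of_two(target: int) -> list[int]:
    """Repeatedly take the largest power of 2 that fits into what is left.
    Returns the exponents used, largest first."""
    exponents = []
    while target > 0:
        exponent = target.bit_length() - 1   # largest k with 2**k <= target
        exponents.append(exponent)
        target -= 2 ** exponent
    return exponents

print(greedy_powers_of_two(100))   # [6, 5, 2], i.e. 64 + 32 + 4
print(bin(100))                    # 0b1100100: the 1 bits sit at positions 6, 5 and 2
```

Because every whole number has exactly one binary form, this greedy loop never gets stuck and never needs more than one of each power.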
speeding up ..
"Why stick with just 2's?"
Indeed.
3^4 = 81, which is 19 from 100. "Not bad.."
Skipping further, what we have is the general idea of using powers like these:
2^(2 and higher)
3^(2 and higher)
5^(2 and higher)
6^(2 and higher)
7^(2 and higher)
10^(2 and higher)
..
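The "get REALLY CLOSE without going over" step can be tried against every base in the list at once. This is a sketch under the same rule as above (exponent 2 or higher, anything over the target is thrown out); `closest_power_not_over` is a hypothetical helper name.

```python
from typing import Optional, Tuple

BASES = [2, 3, 5, 6, 7, 10]   # the bases from the list above

def closest_power_not_over(target: int, base: int) -> Optional[Tuple[int, int]]:
    """Largest base**k with k >= 2 that does not exceed target, as (k, value).
    Returns None when even base**2 is already over the target."""
    if base * base > target:
        return None
    k, value = 2, base * base
    while value * base <= target:
        value *= base
        k += 1
    return k, value

target = 100
for base in BASES:
    hit = closest_power_not_over(target, base)
    if hit is not None:
        k, value = hit
        print(f"{base}^{k} = {value}, which is {target - value} from {target}")
```

For a target of 100, base 10 lands exactly (10^2), base 3 comes next closest (81, 19 short), and base 2 gives the 64 used above.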
Why did I skip 4, 8 and 9?
Quick example, and one of my favorites because I love this number:
16777216 is the number I want to 'compress'. My 'target' value.
16777216 is a very special number with computers. It's the number
of colors that most graphics cards these days are capable of displaying.
16777216 is 2^24 .. as well as 256^3 .. as well as 4096^2.
It's also 4^12.
The thing that 256, 4096 and 4 have in common is they are all basically
blown up versions of 2.
So, for a moment I consider: Which would I rather use..
2^24, 4^12, 8^8, 16^6 .. oh hey 8^8 .. COOL!
This is the first time I went through that far.
Usually when I toss this around I think of 2^24 'versus' 4096^2.
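Finding every base^power form of a target can be done by taking exact integer roots for each exponent. This sketch uses a binary search so no floating-point rounding sneaks in; `integer_root` and `power_forms` are illustrative names. For 16777216 it also turns up 64^4, which the list above skips. The last entry is always the smallest base, the "keep the left side small" choice.

```python
def integer_root(n: int, k: int) -> int:
    """Largest r with r**k <= n, using exact integer arithmetic (n >= 1)."""
    lo, hi = 1, 1 << (n.bit_length() // k + 1)   # hi is safely above the root
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** k <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo

def power_forms(target: int) -> list[tuple[int, int]]:
    """Every (base, exponent) with exponent >= 2 and base**exponent == target,
    ordered from the biggest base to the smallest."""
    forms = []
    for k in range(2, target.bit_length() + 1):
        r = integer_root(target, k)
        if r > 1 and r ** k == target:
            forms.append((r, k))
    return forms

for base, exp in power_forms(16777216):
    print(f"{base}^{exp}")

print(power_forms(81))   # [(9, 2), (3, 4)]
```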
Inteddesting ..
With 3^powers I do the same thing.
3^4 = 9^2 = 81.
See how quickly the left side builds? Trick is, keep the left side
as small as possible and let the not-even-flinching right side
make the big numbers.
And with that I'll preface the next section by saying that this whole
number^power infatuation helped me realize something (just a few
days ago - March 31) that turned out to be the opposite of what I
expected.
--4, 8, 9, 16, 25--
As I originally thought, 4, 8, 9, etc. are not used on the
left side of the ^ function.
2^(2 thru ..) = 4 8 16 32 64 128 256 512 1024 2048 4096
3^(2 thru ..) = 9 27 81 243 729
(4 is not used because it's a '2')
5^(2 thru ..) = 25 125 625
6^(2 thru ..) = 36 216
7^(2 thru ..) = 49 343
(8 is a '2')
(9 is a '3')
10^(2 thru..) = 100 1000
11^(2 thru..) = 121
12^(2 thru..) = 144
13^(2 thru..) = 169
Now I'm making a list
4 8 9 16 25 27 32 36 49 64 81 100 121 125 128 144 169 216
243 256 343 512 625 729 1000 1024
"ok I'm done."
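The list can be rebuilt mechanically: skip any base that is itself a power (4, 8, 9, ...), take the squares, cubes and so on of the rest, then merge and sort. This sketch uses bases 2 through 13 and a cutoff of 1024 only because those match the hand-made list above; `is_perfect_power` and `power_list` are illustrative names.

```python
def is_perfect_power(n: int) -> bool:
    """True if n == b**k for some whole numbers b >= 2 and k >= 2.
    Brute force, which is plenty for the small bases used here."""
    b = 2
    while b * b <= n:
        p = b * b
        while p < n:
            p *= b
        if p == n:
            return True
        b += 1
    return False

def power_list(max_base: int, limit: int) -> list[int]:
    """Powers b**k (k >= 2, value <= limit) for every base 2..max_base
    that is not itself a perfect power, merged and sorted."""
    values = set()
    for b in range(2, max_base + 1):
        if is_perfect_power(b):
            continue   # 4, 8, 9 are already covered by 2 and 3
        p = b * b
        while p <= limit:
            values.add(p)
            p *= b
    return sorted(values)

print([b for b in range(2, 14) if is_perfect_power(b)])   # [4, 8, 9]
print(power_list(13, 1024))
```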
--We've got this list--
And I'm going to perform a unique procedure.
1 4
2 8
3 9
4 16
5 25
6 27
7 32
8 36
9 49
10 64
11 81
12 100
13 121
14 125
15 128
And now let's try something.
I'm going to compress 100.
Ready set go
12.
That's it.
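The numbered-list trick works as long as both sides hold the same table: the sender sends a position, the receiver looks it up. This is a minimal sketch, assuming a hypothetical shared table of every perfect power up to 1024. That table includes squares like 196 = 14^2 that the hand-made list stops short of, but its first fifteen entries match the numbered list above. A number that is not in the table gets no index here.

```python
from typing import Optional

def perfect_powers(limit: int) -> list[int]:
    """All numbers b**k <= limit with b >= 2 and k >= 2, sorted, no duplicates."""
    found = set()
    b = 2
    while b * b <= limit:
        p = b * b
        while p <= limit:
            found.add(p)
            p *= b
        b += 1
    return sorted(found)

TABLE = perfect_powers(1024)   # hypothetical table shared by sender and receiver

def compress(target: int) -> Optional[int]:
    """1-based position of target in TABLE, or None if it isn't listed."""
    try:
        return TABLE.index(target) + 1
    except ValueError:
        return None

def decompress(index: int) -> int:
    """Look the number back up from its 1-based position."""
    return TABLE[index - 1]

print(TABLE[:15])
print(compress(100))     # 12
print(decompress(12))    # 100
print(compress(101))     # None: 101 is not a perfect power
```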
--What about other numbers--
.. tbc ..