Up to this point, I have been generous to you, showing examples
with a very simple, text-only image file. Realistically, though,
computers read things differently. They prefer raw binary over
human-readable numbers and text. Let's look at the
file above one more time:
PPM Image
P3
3 2
255
255 0 0 0 255 0 0 0 255
255 255 0 255 255 255 0 0 0
This is the text format of PPM. Notice how the values after the
"255" colour range identifier are human-readable decimal numbers.
What determines whether a program should read a file like this as
binary or text is its magic number at the very top of the file.
Here is a table of how the magic number works for PBM, PGM, PPM,
and PAM files:
Magic Number | File Type             | Extension | Type    |
P1           | Portable BitMap       | PBM       | ASCII   |
P2           | Portable GrayMap      | PGM       | ASCII   |
P3           | Portable PixMap       | PPM       | ASCII   |
P4           | Portable BitMap       | PBM       | Binary  |
P5           | Portable GrayMap      | PGM       | Binary  |
P6           | Portable PixMap       | PPM       | Binary  |
P7           | Portable ArbitraryMap | PAM       | Unknown |
In that table, you can clearly see that P3 is defined as a
PPM file that is text-based (ASCII). However, there are
other variants. P6 is the binary version of PPM.
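To make the magic number idea concrete, here is a minimal Python sketch of how a program might decide between text and binary. The names `NETPBM_FORMATS` and `identify` are my own inventions for illustration, and the entries are simply copied from the table above. A real reader would do far more validation than this.

```python
# Lookup table transcribed from the magic-number table above.
# NETPBM_FORMATS and identify() are hypothetical names for this sketch.
NETPBM_FORMATS = {
    b"P1": ("Portable BitMap", "PBM", "ASCII"),
    b"P2": ("Portable GrayMap", "PGM", "ASCII"),
    b"P3": ("Portable PixMap", "PPM", "ASCII"),
    b"P4": ("Portable BitMap", "PBM", "Binary"),
    b"P5": ("Portable GrayMap", "PGM", "Binary"),
    b"P6": ("Portable PixMap", "PPM", "Binary"),
    b"P7": ("Portable ArbitraryMap", "PAM", "Unknown"),
}


def identify(data: bytes):
    """Return (file type, extension, type) based on the first two bytes.

    Returns None when the magic number is not in the table.
    """
    return NETPBM_FORMATS.get(data[:2])


print(identify(b"P3\n3 2\n255\n"))  # the text file shown earlier
print(identify(b"P6\n3 2\n255\n"))  # the header of its binary sibling
```

Only the first two bytes matter here, which is why the check works before the program knows anything else about the file.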
"So Clara, what does a binary PPM file look like?"
I'm glad you asked. Chances are your web browser can't display it
here in raw binary... so I'll just show you a
hex dump of it:
Hex Dump (6colour_ppmb.ppm)
00000000 50 36 0a 33 20 32 0a 32 35 35 0a ff 00 00 00 ff |P6.3 2.255......|
00000010 00 00 00 ff ff ff 00 ff ff ff 00 00 00 |.............|
0000001d
As you can see, the first 3 lines are readable text (line break =
0x0A). Notice the magic number is P6. This isn't that bad either,
to be honest. Since the file is binary, each colour is stored in
exactly 3 bytes, instead of being spelled out in decimal digits as
in the text format: one byte for red, one for green, and one for
blue.
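If you want to reproduce those bytes yourself, here is a rough Python sketch that builds the same six-pixel image in memory. The helper name `build_p6` is made up for this example, and it assumes a maximum colour value of 255 so that every channel fits in one byte.

```python
# Six pixels, row by row, as (red, green, blue) values taken from
# the text version of the file.
pixels = [
    (255, 0, 0), (0, 255, 0), (0, 0, 255),
    (255, 255, 0), (255, 255, 255), (0, 0, 0),
]


def build_p6(width, height, pixels, maxval=255):
    """Return the bytes of a binary PPM (hypothetical helper).

    Assumes maxval <= 255, so each channel is exactly one byte.
    """
    # The header is still plain text, with 0x0A line breaks.
    header = f"P6\n{width} {height}\n{maxval}\n".encode("ascii")
    # The body is raw bytes: red, green, blue for each pixel in turn.
    body = bytes(channel for pixel in pixels for channel in pixel)
    return header + body


data = build_p6(3, 2, pixels)
print(len(data))      # 29
print(data.hex(" "))  # space-separated hex, same bytes as the dump
```

The header is 11 bytes and the six pixels take 18, which is where the 0x1d (29) at the bottom of the hex dump comes from.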
"Right... how do I read it?"
Take a look at the byte at offset 0x0A, the last 0a on the
first row...
Hex Dump (6colour_ppmb.ppm)
00000000 50 36 0a 33 20 32 0a 32 35 35 0a ff 00 00 00 ff |P6.3 2.255......|
00000010 00 00 00 ff ff ff 00 ff ff ff 00 00 00 |.............|
0000001d
This is the byte that comes right after the "255" in the file, and
it is a newline character. The colour data for each pixel starts
right after it, with the 0xFF at offset 0x0B. Here it is again,
with the pixel data running from there to the end of the file,
three bytes per pixel:
Hex Dump (6colour_ppmb.ppm)
00000000 50 36 0a 33 20 32 0a 32 35 35 0a ff 00 00 00 ff |P6.3 2.255......|
00000010 00 00 00 ff ff ff 00 ff ff ff 00 00 00 |.............|
0000001d
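Reading it back is mostly a matter of finding that newline and then slicing the rest into groups of three. Below is a minimal Python sketch, not a full parser: `read_p6` is a hypothetical name, and it assumes the header has no comment lines and that the maximum colour value is at most 255.

```python
import re


def read_p6(data: bytes):
    """Parse a simple binary PPM (hypothetical helper).

    Assumptions: no '#' comments in the header, and maxval <= 255
    so that every channel is a single byte.
    """
    # Magic number, width, height, maxval, then exactly ONE
    # whitespace byte (the newline discussed above).
    match = re.match(rb"P6\s+(\d+)\s+(\d+)\s+(\d+)\s", data)
    if match is None:
        raise ValueError("not a simple P6 file")
    width, height, maxval = (int(group) for group in match.groups())
    if maxval > 255:
        raise ValueError("this sketch only handles one byte per channel")

    # Everything after the header is pixel data.
    body = data[match.end():]
    needed = width * height * 3
    if len(body) < needed:
        raise ValueError("pixel data is truncated")
    pixels = [tuple(body[i:i + 3]) for i in range(0, needed, 3)]
    return width, height, maxval, pixels


# The 29 bytes from the hex dump above.
sample = bytes.fromhex(
    "50 36 0a 33 20 32 0a 32 35 35 0a ff 00 00 00 ff "
    "00 00 00 ff ff ff 00 ff ff ff 00 00 00"
)
width, height, maxval, pixels = read_p6(sample)
print(width, height, maxval)
for pixel in pixels:
    print(pixel)  # one (red, green, blue) tuple per pixel
```

Note that only a single whitespace byte is skipped after the "255". Skipping more would be a bug, because a pixel byte can legitimately have the same value as a space or newline.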
"Why is this a big deal? Who cares?"
In the text format, if you tried to store the number "100", you'd
be using 3 bytes for that single number, where binary needs just
1. On top of that, you'd be spending bytes on the spacing between
each number. The result is that a text file is usually around
2-4x larger than its binary variant, and takes longer to load. The
only benefit you get out of it is that it's easier to read with
your own eyes, which doesn't matter to the computer. You tell me
which is superior. I think the answer is obvious.
Don't believe me? Here's a file size comparison:
UNIX Command
UNIX> fa
DIRECTORY: .
DIRECTORY COUNT: 0
FILE COUNT: 2
-rwxr-xr-x 1 96 6colour_ppma.ppm
-rwxr-xr-x 1 29 6colour_ppmb.ppm
For the text variant, the size is 96 bytes. For the binary
variant, the size is 29 bytes. And they both store the same exact
information.
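If you'd rather measure it yourself, here is a small Python sketch that writes both variants to a temporary folder and asks the filesystem for their sizes. The file names are placeholders I picked. Also keep in mind that the size of the text variant depends entirely on how the whitespace is laid out, so this tightly packed version will not necessarily match the 96 bytes of my file.

```python
import os
import tempfile

pixels = [
    (255, 0, 0), (0, 255, 0), (0, 0, 255),
    (255, 255, 0), (255, 255, 255), (0, 0, 0),
]

# Text variant: one pixel per line, single spaces (my own layout choice).
text_data = "P3\n3 2\n255\n" + "".join(
    f"{r} {g} {b}\n" for r, g, b in pixels
)
# Binary variant: same header style, then one raw byte per channel.
binary_data = b"P6\n3 2\n255\n" + bytes(
    channel for pixel in pixels for channel in pixel
)

with tempfile.TemporaryDirectory() as folder:
    text_path = os.path.join(folder, "text_variant.ppm")      # placeholder name
    binary_path = os.path.join(folder, "binary_variant.ppm")  # placeholder name
    # newline="\n" keeps line breaks as a single 0x0A on every platform.
    with open(text_path, "w", newline="\n") as handle:
        handle.write(text_data)
    with open(binary_path, "wb") as handle:
        handle.write(binary_data)
    text_size = os.path.getsize(text_path)
    binary_size = os.path.getsize(binary_path)

print("text:  ", text_size, "bytes")
print("binary:", binary_size, "bytes")
```

Even with the whitespace squeezed to the minimum, the text file comes out at more than double the binary one.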
In a more realistic sense, my phone shoots pictures at a
resolution of 4032x3024. That's 12,192,768 pixels. Converted
to a binary PPM at 3 bytes per pixel, that'd be around 36.6
megabytes. Converted to a text PPM, it'd be (at most, with three
digits and a separator for every value) around 146 megabytes.
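The arithmetic behind those numbers is short enough to check in a few lines of Python. This sketch ignores the header (a dozen or so bytes) and assumes the worst case for text, where every value is three digits followed by one separator character.

```python
width, height = 4032, 3024
pixel_count = width * height

# Binary: one byte each for red, green, and blue.
binary_bytes = pixel_count * 3

# Text, worst case: every channel is written as "255" plus one
# separator character, so 4 bytes per channel and 12 per pixel.
text_bytes_max = pixel_count * 3 * 4

print(f"{pixel_count:,} pixels")           # 12,192,768 pixels
print(f"{binary_bytes:,} bytes binary")    # 36,578,304 bytes binary
print(f"{text_bytes_max:,} bytes text")    # 146,313,216 bytes text
```

That worst case is exactly 4x the binary size, which is the top end of the 2-4x range.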
Binary is important.