Speed and Size, bits and Bytes

Transfer speed (also known as throughput or bandwidth) is confusing in itself. But it gets extra messy when you also throw sizes into the mix (“how big is that file?”). The words used for transfer speeds and for file sizes look confusingly similar but mean very different things.
So it is not surprising that people often use the wrong term (bit instead of byte, or the other way around).

The background for the two terms is that the world of computers is binary. All information is broken down into binary digits called bits. A bit can either have the value 0 (zero) or 1 (one). Whenever you store something on your computer, such as a photo, then the picture is stored on the computer’s hard disk drive in the form of binary digits or bits.

If you think about how we use numbers in the real world, we don’t always use the most basic unit, such as “gram”. Instead, we say “kilos” or “kilograms” (1,000 grams), which is much more convenient for most day-to-day use. The same goes for the terms score, dozen and gross. They are all used for convenience instead of the specific numbers they correspond to.

In the same way, we have the word byte in the computer world. A byte is 8 bits. One reason why we started to use the word byte is that a lot of the information stored on computers required 8 bits of data. For example, a normal typed character used to require 8 bits to be stored on a computer. It then makes sense to have a separate word for the most commonly used number of bits.

So the basic formula to convert between bytes and bits is:

  • 1 byte = 8 bits

1 bit is a very small value. The same goes for 1 byte. So we have to be able to add prefixes, just like we do with weights (gram, kilogram…).

Table of metric multiples of 1,000

  • 1 Kilobit is 1,000 bits
  • 10 Kilobit is 10,000 bits
  • 100 Kilobit is 100,000 bits, which could also be written as 0.1 Megabit
  • 1,000 Kilobit is 1,000,000 bits, which could also be written as 1 Megabit
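
The table above can be sketched in a few lines of Python. This is only an illustration: the function name `format_bits` is made up for this example and does not come from any library.

```python
def format_bits(bits: int) -> str:
    """Express a bit count using the metric (base 1,000) prefixes from the table."""
    # Largest prefix first, so we pick the biggest unit that fits.
    prefixes = [(1_000_000, "Megabit"), (1_000, "Kilobit")]
    for factor, name in prefixes:
        if bits >= factor:
            # ":g" drops trailing zeros, so 1.0 is shown as "1".
            return f"{bits / factor:g} {name}"
    return f"{bits} bits"


print(format_bits(1_000))      # 1 Kilobit
print(format_bits(100_000))    # 100 Kilobit
print(format_bits(1_000_000))  # 1 Megabit
```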

But just to add some confusion: the above is only true for bits. For bytes, the formula for converting between the prefixes is different. A Kilobyte is traditionally counted as 1,024 bytes rather than exactly 1,000 bytes. We will show you further down how the conversion works for bytes.

Unit symbols

The unit symbols are really important when you talk about bits and bytes. Using the wrong unit symbol is one of the most common causes of confusion.

  • bytes commonly use the unit symbol “B”, with a capital B
  • bits can use “b” as their symbol, but that is easily confused with the capital B for byte. So it is also common to use the full word “bit”

Since a byte is 8 times as much as 1 bit it is important to keep them apart and understand the difference.

Examples:

  • MB means Megabyte
  • KB means Kilobyte
  • Mb or Mbit means Megabit
  • Gb or Gbit means Gigabit

As previously mentioned, it is very common for people to be unaware of the difference between bits and bytes, and the wrong term is used a lot of the time. This can lead to misconceptions about both file sizes and transfer speeds. People are more likely to know file sizes than transfer speeds, since you often work with files but rarely have to deal with bandwidth or throughput.

Sometimes you can see someone posting on the Internet because they are unsatisfied with their download speeds. They might have a 20 Mbps Internet connection, but they can “only download files at about 2.4 MB per second!”. Some programs that you use to download files report the download speed in bits per second, whereas other programs report the number of bytes per second. If you don’t know that bits and bytes differ, you wouldn’t even notice the difference between “Mb” and “MB”.

Based on the above, should they be dissatisfied with their Internet connection? Nope, on the contrary, they are getting really good results! Each byte is 8 bits. So how many Megabits (Mbit) is 2.4 Megabytes (MB)? It is more or less as simple as calculating 2.4 MB x 8 = 19.2 Mbit.
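
The same calculation can be written as a small Python sketch. The function names are invented for this example; the only fact they rely on is that 1 byte is 8 bits.

```python
BITS_PER_BYTE = 8


def megabytes_to_megabits(megabytes: float) -> float:
    """Convert Megabytes (MB) to Megabits (Mbit)."""
    return megabytes * BITS_PER_BYTE


def megabits_to_megabytes(megabits: float) -> float:
    """Convert Megabits (Mbit) to Megabytes (MB)."""
    return megabits / BITS_PER_BYTE


# The download program reports 2.4 MB per second.
print(f"2.4 MB/s = {megabytes_to_megabits(2.4):.1f} Mbit/s")   # 19.2 Mbit/s
# The Internet connection is sold as 20 Mbit per second.
print(f"20 Mbit/s = {megabits_to_megabytes(20):.1f} MB/s")     # 2.5 MB/s
```

So a 20 Mbit/s connection can never show more than 2.5 MB/s in a program that counts bytes.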

So in the example above the difference between being dissatisfied and satisfied is in the upper- and lower-case B!

But where did the last 0.8 Mbps go? We have a 20 Mbps Internet connection, but we are downloading at 19.2 Mbps. You will find the answer to that question further down.

Why this mix of bits and bytes?

We already touched on this subject, but for storage and for expressing space on hard disk drives it was much simpler to express file sizes using bytes. Files are big, and most files used to be text files. Since text consists of characters, and each character took up 8 bits of storage, it was easier to express storage space as the number of bytes that the storage could hold.

For data transfers and computer networks, however, it makes more sense to measure the number of bits per second that are being transferred. This is because the network equipment can typically only transfer one bit at a time.

Throughput, the simplified version

In this simplified version we skip explaining something called “overhead”, but we bring it up further down.

Let’s say we have an Internet connection handling 50 Mbps. Mbps stands for Megabit per second and is often written as either “Mbps”, “Mb/s” or “Mbit/s”. By now you will note the lower case b which stands for bit.

Now we are downloading a file that is 6.25 MB big.

  • 6.25 MB is 50 Mbit (6.25 x 8)

So under optimal conditions, it would take a second to download the 6.25MB big file if your Internet connection is 50 Mbit/s.

How about if we instead download a file that is 4.5 GB big?

  • 4.5 GB is about 4,600 MB (4.5 x 1024 = 4,608). 4,600 MB is 36,800 Mbit (4,600 x 8)

In a best case scenario, it would then take 736 seconds (36,800 Mbit / 50 Mbit/s) to download the file.
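
Both download examples can be reproduced with a short Python sketch. The function name `download_seconds` is made up for this example, and it uses the same simplification as the text: 1 MB is treated as 8 Mbit and overhead is ignored.

```python
def download_seconds(file_size_mb: float, link_speed_mbit_s: float) -> float:
    """Best-case download time in seconds, ignoring all overhead."""
    file_size_mbit = file_size_mb * 8  # 1 byte = 8 bits
    return file_size_mbit / link_speed_mbit_s


# A 6.25 MB file over a 50 Mbit/s connection.
print(f"{download_seconds(6.25, 50):.1f} s")        # 1.0 s

# A 4.5 GB file, rounded to 4,600 MB as in the text.
print(f"{download_seconds(4600, 50):.1f} s")        # 736.0 s

# The same file without rounding (4.5 x 1024 = 4,608 MB).
print(f"{download_seconds(4.5 * 1024, 50):.1f} s")  # 737.3 s
```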

There are a lot of good online calculators that can help you convert between bytes and bits. Some of them can also help you calculate things like download speeds.

As you can tell, there are good reasons why confusion often arises around bits and bytes. Not even the manufacturers themselves can agree on how they should perform the calculations.

Hard disk drives, for example, are storage areas for files, and their capacity is measured in GB or TB today. But hard disk manufacturers measure hard disk capacity in decimal, base 1,000. So according to hard disk manufacturers, 100,000 MB equals 100 GB. However, the computer OS typically uses binary, base 1024, when calculating hard disk drive capacity. So when you try to store files on the hard disk drive you can’t fit 100 GB on there.

This is how hard disk manufacturers do the math:

  • 100,000,000,000 bytes = 100,000,000 KB (divided by 1,000) = 100,000 MB (divided by 1,000) = 100 GB (divided by 1,000)

But this is how much storage space you actually get, as reported by the OS:

  • 100,000,000,000 bytes = 97,656,250 KB (divided by 1024) = 95,367 MB (divided by 1024) = 93.1 GB (divided by 1024)
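
The two ways of counting can be compared with a small Python sketch. The function name `bytes_to_gb` is invented for this example; the only thing that differs between the two calls is the base used for each step (KB, MB, GB).

```python
def bytes_to_gb(total_bytes: int, base: int) -> float:
    """Convert bytes to GB by dividing by `base` three times (KB, MB, GB)."""
    return total_bytes / base**3


drive_bytes = 100_000_000_000

# Hard disk manufacturers divide by 1,000 in each step.
print(f"Manufacturer: {bytes_to_gb(drive_bytes, 1000):.1f} GB")   # 100.0 GB
# The OS in the example above divides by 1024 in each step.
print(f"OS:           {bytes_to_gb(drive_bytes, 1024):.1f} GB")   # 93.1 GB
```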

Throughput, the advanced version

Unfortunately, it is not enough to state that an Internet connection with 50 Mbps bandwidth could transfer 50 Mbit of data per second. Those 50 Mbit per second include all of the data that has to be transferred. Not only the data that you want to transmit but everything else as well.

There are several more things that need to be covered by that bandwidth, including for example the following:

  • Overhead
  • Session setup
  • Application Data and Control Data
  • Adaptive transfer rates

Overhead and Session Setup

Overhead means information that is not the data you and I actually want to send, but which has to be transmitted anyway.

When you send a letter to somebody by mail you write the letter on a piece of paper. Then you put the paper in an envelope and you write the address of the recipient on the envelope.

The whole envelope has to be sent in the mail, even if the important message that you want to transfer is just the text on the paper inside of the envelope. The address that you put on the envelope has to be there, but it does not contain any of the important information that you wanted to transmit.

The envelope is the mail equivalent of overhead: information that is required to transmit the message, but which does not belong to the message itself.

In computer communication, the overhead consists of, for example, IP addresses on the IP packets, MAC addresses, port numbers, TCP or UDP information, and so on. All of this has to be transmitted with each packet but is not the actual data you want to transmit.

Overhead and how data is packaged in "envelopes"

When the message is being sent to the network all parts of the message are being sent including the overhead information. A 50 Mbps Internet connection can only transmit 50 Mbit per second, including that extra overhead.

If the file that we want to transfer is big, it also has to be chopped up into smaller parts, and these parts are then put in different IP packets. Each of those packets then needs the same type of overhead information with addresses on it.

Normally each packet can contain a maximum of 1,460 bytes of information. An additional 40 bytes are then used for overhead such as addressing. So about 2.7% (40/1,500) of the available bandwidth is used to transfer overhead information, and that is under optimal circumstances.
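
The overhead share can be worked out with a short Python sketch. The names below are made up for this example, and it only counts the 40 bytes of per-packet overhead mentioned above. Session setup, control data and adaptive transfer rates come on top of that, so real results will be lower.

```python
PAYLOAD_BYTES = 1460  # data per packet, taken from the text
HEADER_BYTES = 40     # overhead per packet, taken from the text


def overhead_fraction(payload: int = PAYLOAD_BYTES, header: int = HEADER_BYTES) -> float:
    """Share of every packet that is overhead rather than data."""
    return header / (payload + header)


def usable_mbit_s(link_mbit_s: float) -> float:
    """Bandwidth left for data after per-packet overhead (best case)."""
    return link_mbit_s * (1 - overhead_fraction())


print(f"Overhead: {overhead_fraction():.1%}")                      # 2.7%
print(f"Usable part of 50 Mbit/s: {usable_mbit_s(50):.1f} Mbit/s")  # 48.7 Mbit/s
```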

And if you have read the traffic example in the specialisation section then you might remember the TCP 3-way handshake we showcased there. Network communication often has to be set up using sessions, which also uses bandwidth for the messaging.

Not to mention that a lot of applications are communicating in the background non-stop without you even knowing about it. If you perform a file transfer using Windows Explorer your computer will send hundreds of messages in the background to verify data, check the file system on the other computer, browse to the correct destination, figure out how much space is available and so on. All of this is information that you never see in Windows Explorer but which is still consuming available bandwidth. These messages contain Control Data and various types of background Application Data.

Adaptive transfer rate

Most file transfers use the TCP protocol. TCP always tries to help by being nice and not sending more data than the receiver can handle, so TCP tries to find a sweet spot for how quickly to send the data. Not too fast and not too slow.

This means that a computer that is going to send a file often starts out a bit slower while TCP tests how fast the connection and the other computer can go. The speed will then gradually increase until TCP has found what it thinks is the maximum transfer speed. But it is not a perfect system. The transfer speed will vary up and down because of little adjustments here and there.

The result is that it is rare for a single file transfer to be able to utilise all of the available bandwidth.
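
To get a feeling for why the average ends up below the maximum, here is a toy model in Python. It is not how TCP is actually implemented. It only assumes a sender that raises its rate a little each round and halves it whenever it has gone over what the connection can handle, which is a heavily simplified version of the behaviour described above. The function name, the step size and the capacity of 10 are all invented for this example.

```python
def simulate_sending_rate(capacity: float, rounds: int,
                          start: float = 1.0, step: float = 1.0) -> list:
    """Toy model: probe upwards each round, halve the rate after overshooting.

    Returns the rate that actually got through in each round.
    """
    rate = start
    delivered = []
    for _ in range(rounds):
        # The connection can never carry more than its capacity.
        delivered.append(min(rate, capacity))
        if rate > capacity:
            rate = rate / 2     # sent too fast: back off
        else:
            rate = rate + step  # all good: try a little faster
    return delivered


rates = simulate_sending_rate(capacity=10.0, rounds=40)
average = sum(rates) / len(rates)
print(f"Average rate: {average:.1f} out of 10.0 available")
```

Because the sender starts slowly and keeps backing off after each overshoot, the average stays below the capacity of the connection in this model.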

However, there are some applications, such as BitTorrent, that work by using multiple simultaneous downloads from several different sources, and the different pieces are then assembled once they have been downloaded. This makes it easier to reach higher speeds even over rather bad Internet connections with high latency.
