t byfield on Mon, 27 Jul 2015 15:28:25 +0200 (CEST)



<nettime> Lori Emerson: What's Wrong With the Internet and How We Can Fix It: Interview With Internet Pioneer John Day


Via RISKS <http://catless.ncl.ac.uk/Risks/28.81.html>

<http://loriemerson.net/2015/07/23/whats-wrong-with-the-internet-and-how-we-can-fix-it-interview-with-internet-pioneer-john-day/>

   loriemerson

   July 23, 2015

What's Wrong With the Internet and How We Can Fix It: Interview With
Internet Pioneer John Day

Below is an interview I conducted with the computer scientist and
Internet pioneer John Day via email over the last six months or so.
The interview came about as a result of a chapter I've been working on
for my "Other Networks" project, called "The Net Has Never Been
Neutral." In this piece, I try to expand the materialist bent of media
archaeology, with its investment in hardware and software, to networks.
Specifically, I'm working through the importance of understanding the
technical specs of the Internet to figure out how we are unwittingly
living out the legacy of the power/knowledge structures that produced
TCP/IP. I also think through how the Internet could have been and may
still be utterly different. In the course of researching that piece, I
ran across fascinating work by Day in which he argues that "the
Internet is an unfinished demo" and that we have become blind not only
to its flaws but also to how and why it works the way it works. Below
you'll see Day expand specifically on five flaws of the TCP/IP
model that are still entrenched in our contemporary Internet
architecture and, even more fascinating, on the ways in which a more
sensible structure for handling network congestion (like the one
proposed by the French CYCLADES group) would have made the issue of net
neutrality beside the point. I hope you enjoy and many, many thanks to
John for taking the time to correspond with me.

*

Emerson: You've written quite vigorously about the flaws of the TCP/IP
model that go all the way back to the 1970s and about how our
contemporary Internet is living out the legacy of those flaws.
In particular, you've pointed out repeatedly over the years how the
problems with TCP were carried over not from the American ARPANET but
from an attempt to create a transport protocol different from
the one proposed by the French Cyclades group. First, could you explain
to readers what Cyclades did that TCP should have done?

Day: There were several fundamental properties of networks the CYCLADES
crew understood that the Internet group missed:

 * The Nature of Layers,
 * Why the Layers they had were there,
 * A complete naming and addressing model,
 * The fundamental conditions for synchronization,
 * That congestion could occur in networks, and
 * A raft of other missteps, most of which follow from the previous five,
   though some are unique.

First and probably foremost was the concept of layers. Computer
Scientists use "layers" to structure and organize complex pieces of
software. Think of a layer as a black box that does something, but the
internal mechanism is hidden from the user of the box. One example is a
black box that calculates the 24 hour weather forecast. We put in a
bunch of data about temperature, pressure and wind speed and out pops a
24 hour weather forecast. We don't have to understand how the black box
did it. We don't have to interact with all the different aspects it
went through to do that. The black box hides the complexity so we can
concentrate on other complicated problems for which the output of the
black box is input. The operating system of your laptop is a black box.
It does incredibly complex things but you don't see what it is doing.
Similarly, the layers of a network are organized that way. For the
ARPANET group, BBN [Bolt, Beranek, and Newman] built the network
and everyone else was responsible for the hosts. To the people
responsible for the hosts, the network of IMPs was a black box that
delivered packets. Consequently, for the problems they needed to solve,
their concept of layers focused on the black boxes in the hosts. So the
Internet's concept of layers was focused on the layer in the hosts,
where its primary purpose was modularity. The layers in the ARPANET
hosts were the Physical Layer (the wire); the IMP-HOST Protocol; the
NCP; and the applications, such as Telnet and maybe FTP.[1] For the
Internet, they were Ethernet, IP, TCP, and Telnet or HTTP, etc., as
applications. It is important to remember that the ARPANET was built to
be a production network to lower the cost of doing research on a
variety of scientific and engineering problems.

The CYCLADES group, on the other hand, was building a network to do
research on the nature of networks. They were looking at the whole
system to understand how it was supposed to work. They saw that layers
were not just local modularity but sets of cooperating processes
in different systems, and, most importantly, that different layers had
different scope, i.e., different numbers of elements in them. This concept of the
scope of a layer is the most important property of layers. The Internet
never understood its importance.

The layers that the CYCLADES group came up with in 1972 were the
following: 1) the Physical Layer - the wires that go between boxes. 2)
The Data Link Layer - that operates over one physical media and detects
errors on the wire and in some cases keeps the sender from overrunning
the receiver. But most physical media have limitations on how far they
can be used. The further data is transmitted on them the more likely
there are errors. So these may be short. To go longer distances, a
higher layer with greater scope exists over the Data Link Layer to
relay the data. This is traditionally called the 3) Network Layer.

But of course, the transmission of data is not just done in straight
lines, but as a network so that there are alternate paths. We can show
from queuing theory that regardless of how lightly loaded a network is
it can have congestion, where there are too many packets trying to get
through the same router at the same time. If the congestion lasts too
long, it will get worse and worse and eventually the network will
collapse. It can be shown that no amount of memory in the router is
enough, so when congestion happens packets must be discarded. To
recover from this, we need a 4) Transport Layer protocol, mostly to
recover lost packets due to congestion. The CYCLADES group realized
this which is why there is a Transport Layer in their model. They
started doing research on congestion around 1972. By 1979, there had
been enough research that a conference was held near Paris. DEC and
others in the US were doing research on it too. Those working on the
Internet didn't understand that such a collapse from congestion could
happen until 1986 when it happened to the Internet. So much for seeing
problems before they occur.

Emerson: Before we go on, can you expand more on how and why the
Internet collapsed in 1986?

Day: There are situations where too many packets arrive at a router and
a queue forms, like everyone showing up at the cash register at the
same time, even though the store isn't crowded. The network (or store)
isn't really overloaded but it is experiencing congestion. However in
the Transport Layer of the network, the TCP sender is waiting to get an
acknowledgement (known as an "ack") from the destination that indicates
the destination got the packet(s) it sent.  If the sender does not get
an ack in a certain amount of time, the sender assumes that packet and
possibly others were lost or damaged and re-transmits everything it has
sent since it sent the packet that timed out.  If the reason the ack
didn't arrive is that it was delayed too long at an intervening router
and the router has not been able to clear its queue of packets to
forward before this happens, the retransmissions will just make the
queue at that router even longer.  Now remember, this isn't the only
TCP connection whose packets are going through this router.  Many
others are too. And as the day progresses, there is more and more load
on the network with more connections doing the same thing.  They are
all seeing the same thing contributing to the length of the queue.  So
while the router is sending packets as fast as it can, its queue is
getting longer and longer.  In fact, it can get so long and delay
packets so much, that the TCP sender's timers will expire again and it
will re-transmit again, making the problem even worse. Eventually, the
throughput drops to a trickle.

As you can see, this is not a problem of not enough memory in the
router; it is a problem of not being able to get through the queue.
(Once there are more packets in the queue than the router can send
before retransmissions are triggered, collapse is assured.)  Of course
delays like that at one router will cause similar delays at other
routers.  The only thing to do is discard packets.

What you see in terms of the throughput of the network vs load is that
throughput will climb very nicely, increasing, then it begins to
flatten out as the capacity of the network is reached, then as
congestion takes hold and the queues get longer, throughput starts to
go down until it is just a trickle.  The network has collapsed.  The
Internet did not see this coming. Nagle warned them in 1984 but they
ignored it.  They were the Internet - what did someone from Ford Motor
Company know?  It was a bit like the Frank Zappa song, "It can't happen
here."  They will say (and have said) that because the ARPANET handled
congestion control, they never noticed it could be a problem.  As more
and more IP routers were added to the Internet, the ARPANET became a
smaller and smaller part of the Internet as a whole and it no longer
had sufficient influence to hold the congestion problem at bay.
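
To make this feedback loop concrete, here is a toy Python sketch: an
editorial illustration with invented numbers, not Day's code and not a
faithful TCP model. A single bottleneck router forwards a fixed number
of packets per tick; once queueing delay exceeds the senders'
retransmission timeout, everything outstanding is re-sent, so the queue
and the duplicate load explode and useful throughput collapses, just as
the throughput-vs-load curve described above suggests.

  # Toy single-bottleneck model of congestion collapse (illustrative only).
  def simulate(ticks=200, service_rate=10, new_packets_per_tick=12, rto=5):
      queue = 0        # packets waiting at the bottleneck router
      goodput = []     # non-duplicate share of packets forwarded per tick
      for _ in range(ticks):
          arrivals = new_packets_per_tick
          # If queueing delay exceeds the retransmission timeout, senders
          # naively re-send everything outstanding, adding duplicate load.
          if queue / service_rate > rto:
              arrivals += queue
          queue += arrivals
          forwarded = min(queue, service_rate)
          queue -= forwarded
          # Goodput approximated as the non-duplicate share of forwarded packets.
          goodput.append(forwarded * new_packets_per_tick / arrivals)
      return goodput

  g = simulate()
  print([round(x, 1) for x in g[:5]])    # near capacity at first
  print([round(x, 1) for x in g[-5:]])   # a trickle once collapse sets in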

This is an amazing admission. They shouldn't have needed to see it
happen to know that it could. Everyone else knew about it and had for
well over a decade. CYCLADES had been doing research on the problem
since the early 1970s.  The Internet's inability to see problems before
they occur is not unusual.  So far we have been lucky and Moore's Law
has bailed us out each time.

Emerson: Thank you - please, continue on about what Cyclades did that
TCP should have done.

Day: The other thing that CYCLADES noticed about layers in networks was
that they weren't just modules and they realized this because they were
looking at the whole network. They realized that layers in networks
were more general because they used protocols to coordinate their
actions in different computers. Layers were distributed shared states
with different scopes. Scope? Think of it as building with bricks. At
the bottom, we use short bricks to set a foundation, protocols that go
a short distance. On top of that are longer bricks, and on top of that
longer yet. So what we have is the Physical and Data Link Layer have
one scope; the Network and Transport Layers have a larger scope over
multiple Data Link Layers. Quite soon, circa 1972, researchers started
to think about networks of networks. The CYCLADES group realized that
the Internet Transport Layer was a layer of greater scope yet: it
operated over multiple networks. So by the mid-1970s, they were looking
at a model that consisted of Physical and Data Link Layers of one small
scope that are used to create networks with a Network Layer of greater
scope, and an Internet Layer over multiple networks of greater scope
yet. The Internet today has the model I described above for a network
architecture of two scopes, not an internet of 3 scopes.
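
One way to visualize the difference is to list each layer with its
scope. The sketch below is a rough editorial illustration with invented
wording for the scopes, not Day's own notation:

  # Two scopes (what Day says the Internet actually has) versus an
  # internet of three scopes (the CYCLADES-style model).
  two_scopes = [
      ("Physical / Data Link", "one wire or local media"),
      ("IP and TCP",           "every host and router, end to end"),
  ]
  three_scopes = [
      ("Physical / Data Link", "one wire or local media"),
      ("Network Layer",        "one network, e.g. a single provider"),
      ("Internet Layer",       "spans multiple networks, end to end"),
  ]
  for name, model in (("two scopes", two_scopes), ("three scopes", three_scopes)):
      print(name)
      for layer, scope in model:
          print(f"  {layer:<22} scope: {scope}")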

Why is this a problem? Because congestion control goes in that middle
scope. Without that scope, the Internet group put congestion control in
TCP, which is about the worst place to put it. That choice thwarts any
attempt to provide Quality of Service for voice and video, which must
be done in the Network Layer, and it ultimately precipitated a
completely unnecessary debate over net neutrality.

Emerson: Do you mean that a more sensible structure to handle network
congestion would have made the issue of net neutrality beside the
point? Can you say anything more about this? I'm assuming others
besides you have pointed this out before?

Day: Yes, this is my point and I am not sure that anyone else has
pointed it out, at least not clearly.  It is a little hard to see
clearly when you're "inside the Internet."  There are several points of
confusion in the net neutrality issue. One is that most non-technical
people think that bandwidth is a measure of speed when it is more a
measure of capacity.  Bits move at the speed of light (or close to it)
and they don't go any faster or slower. So bandwidth really isn't a
measure of speed. The only aspect of speed in bandwidth is how long it
takes to move a fixed number of bits, and whatever that rate is, it
consumes capacity on the link. If a link has a capacity of 100Mb/sec
and I send a
movie at 50Mb/sec, I only have another 50Mb/sec I can use for other
traffic. So to some extent, talk of a "fast lane" doesn't make any
sense. Again, bandwidth is a matter of capacity.
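
A short worked example makes the capacity point concrete. It uses
Day's own 100Mb/sec and 50Mb/sec figures; the movie size is an invented
number:

  # Bandwidth as capacity, not speed, using the 100Mb/sec link from above.
  link_capacity_mbps = 100
  movie_stream_mbps = 50
  print(link_capacity_mbps - movie_stream_mbps, "Mb/sec left for other traffic")

  # The only speed-like aspect: how long a fixed number of bits takes.
  movie_size_megabits = 12_000   # a hypothetical ~1.5 GB movie
  print(movie_size_megabits / movie_stream_mbps, "seconds to move it at 50Mb/sec")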

For example, you have probably heard the argument that Internet
providers like Comcast and Verizon want "poor little" Netflix to pay
for a higher speed, to pay for a faster lane. In fact, Comcast and
Verizon are asking Netflix to pay for more capacity! Netflix uses the
rhetoric of speed to wrap themselves in the flag of net neutrality for
their own profit and to bank on the fact that most people don't
understand that bandwidth is capacity. Netflix is playing on people's
ignorance.

From the earliest days of the Net, providers have had an agreement that
as long as the amount of traffic going between them is about the same
in both directions they don't charge each other. In a sense it would
"all come out in the wash." But if the traffic became lop-sided, if one
was sending much more traffic into one than the other was sending the
other way, then they would charge each other. This is just fair.
Suddenly, because movies consume a lot of capacity, Netflix is
generating considerable load that wasn't there before. This isn't about
blocking a single Verizon customer from getting his movie; this is
about the thousands of Verizon customers all downloading movies at the
same time, with all of that capacity being consumed at a point between
Netflix's network provider and Verizon.  It is even likely they didn't
have lines with that much capacity, so new ones had to be installed.
That is very expensive.  Verizon wants to charge Netflix or Netflix's
provider because the capacity moving from them to Verizon is now
lop-sided by a lot.  This request is perfectly reasonable and it has
nothing to do with the Internet being neutral. Here's an analogy:
imagine your neighbor suddenly installed an aluminum smelter in his
home and was going to use 10,000 times more electricity than he used to.
He then tells the electric company that they have to install much
higher capacity power lines to his house and provide all of that
electricity and his monthly electric bill should not go up. I doubt the
electric company would be convinced.

Net neutrality basically confuses two things: traffic engineering vs
discriminating against certain sources of traffic. The confusion is
created by flaws that were introduced fairly early and by what those
flaws then forced the makers of Internet equipment to do to work around
them.  Internet applications don't tell the network what kind of
service they need from the Net.  So when customers demanded better
quality for voice and video traffic, the providers had two basic
choices: over provision their networks to run at about 20% efficiency
(you can imagine how well that went over) or push the manufacturers of
routers to provide better traffic engineering. Because of the problems
in the Internet, about the only option open to manufacturers was for
them to look deeper into the packet than just making sure they routed
the packet to its destination.  However, looking deeper into a packet
also means being able to tell who sent it. (If applications start
encrypting everything, this will no longer work.)  This of course not
only makes it possible to know which traffic needs special handling,
but makes it tempting to slow down a competitor's traffic.  Had the Net
been properly structured to begin with (and in ways we knew about at
the time), then these two things would be completely distinct: one
would have been able to determine what kind of packet was being relayed
without also learning who was sending it, and net neutrality would only
be about discriminating between different sources of data, so that
traffic engineering would not be part of the problem at all.

Of course, Comcast shouldn't be allowed to slow down Skype traffic
because it is in competition with Comcast's phone service.  Or Netflix
traffic that is in competition with its on-demand video service. But if
Skype and Netflix are using more than ordinary amounts of capacity,
then of course they should have to pay for it.

Emerson: That takes care of three of the five flaws in TCP. What about
the next two?

Day: The next two are somewhat hard to explain to a lay audience but
let me try. A Transport Protocol like TCP has two major functions: 1)
make sure that all of the messages are received and put in order, and
2) don't let the sender send so fast that the receiver has no place to
put the data. Both of these require the sender and receiver to
coordinate their behavior. This is often called feedback, where the
receiver is feeding back information to the sender about what it should
be doing. We could do this by having the sender send a message and the
receiver send back a special message that indicates it was received
(the "ack" we mentioned earlier) and to send another. However, this
process is not very efficient. Instead, we like to have as many
messages as possible `in flight' between them, so they must be loosely
synchronized. However, if an ack is lost, then the sender may conclude
the messages were lost and re-transmit data unnecessarily. Or worse,
the message telling the sender how much it can send might get lost. The
sender is waiting to be told it can send more, while the receiver
thinks it told the sender it could send more. This is called deadlock.
In the early days of protocol development a lot of work was done to
figure out what sequence of messages was necessary to achieve
synchronization. Engineers working on TCP decided that a 3-way exchange
of messages (3-way handshake) could be used at the beginning of a
connection. This is what is currently taught in all of the textbooks.
However, in 1978 Richard Watson made a startling discovery: the message
exchange was not what achieved the synchronization. It was explicitly
bounding three timers. The messages are basically irrelevant to the
problem. I can't tell you what an astounding result this is. It is an
amazingly deep, fundamental result - Nobel Prize level! It not only
yields a simpler protocol, but one that is more robust and more secure
than TCP. Other protocols, notably the OSI Transport Protocol,
incorporate Watson's result, but TCP does so only partially, and not the
parts that improve security. We have also found this implies the
bounds of what is networking. If an exchange of messages requires the
bounding of these timers to work correctly, it is networking or
interprocess communication. If they aren't bounded, then it is merely a
remote file transfer. Needless to say, simplicity, working well under
harsh conditions (or robustness), and security are all hard to get too
much of.
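
A minimal sketch of the idea follows. This is an editorial illustration
of the timer bounds, not an implementation of Watson's delta-t
protocol, and the numbers are assumptions; the exact bound has constant
factors this sketch ignores. The point is that if the maximum packet
lifetime (MPL), the maximum time a sender will keep retransmitting (R),
and the maximum time a receiver may delay an acknowledgement (A) are
all bounded, then after roughly MPL + R + A of quiet time nothing from
the old conversation can still arrive, and connection state can be
discarded without any handshake.

  # Sketch of the timer bound behind Watson's result (illustrative values).
  from dataclasses import dataclass

  @dataclass
  class TimerBounds:
      mpl: float   # maximum packet lifetime in the network, seconds
      r: float     # maximum time a sender keeps retransmitting a packet
      a: float     # maximum time a receiver may delay an acknowledgement

      def quiet_time(self) -> float:
          # After roughly this much silence, no packet, retransmission, or
          # ack from the old conversation can still be in flight, so
          # per-connection state can be discarded and numbers reused.
          return self.mpl + self.r + self.a

  bounds = TimerBounds(mpl=30.0, r=10.0, a=1.0)
  print(f"state can be discarded after {bounds.quiet_time():.0f}s of quiet")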

Addressing is even more subtle and its ramifications even greater. The
simple view is that if we are to deliver a message in a network, we
need to say where the message is going. It needs an address, just like
when you mail a letter. While that is the basic problem to be solved,
it gets a bit more complicated with computers. In the early days of
telephones and even data communications, addressing was not a big deal.
The telephones or terminals were merely assigned the names of the wire
that connected them to the network. (This is sometimes referred to as
"naming the interface.") Until fairly recently, the last 4 digits of
your phone number were the name of the wire between your phone and the
telephone office (or exchange) where the wire came from. In data
networks, this often was simply assigning numbers in the order the
terminals were installed.

But addressing for a computer network is more like the problem in a
computer operating system than in a telephone network. We first saw
this difference in 1972. The ARPANET did addressing just like other
early networks. IMP addresses were simply numbered in the order they
were installed. A host address was an IMP port number, or the wire from
the IMP to the host. (Had BBN given a lot of thought to addressing? Not
really. After all, this was an experimental network. The big question
was, would it work at all!!?? Let alone could it do fancy things!
Believe me, just getting a computer that had never been intended to
talk to another computer to do that was a big job. Everyone knew that
addressing issues were important and difficult to get right, so a little
experience first would be good before we tackled them. Heck, the
maximum number of hosts was only 64 in those days.)

In 1972, Tinker AFB joined the `Net and wanted two connections to the
ARPANET for redundancy! My boss told me this one morning, and I first
said, `Great! Good ide . . . ' I didn't finish it and instead said,
`O, cr*p! That won't work!' (It was a head-slap moment!) ;-) And a half
second after that, I said, `O, not a big deal, we are operating system
guys, we have seen this before. We need to name the node.'

Why wouldn't it work? If Tinker had two connections to the network,
each one would have a different address because they connected to
different IMPs. The host knows it can send on either interface, but the
network doesn't know it can deliver on either one. To the network, it
looks like two different hosts. The network couldn't know those two
interfaces went to the same place. But as I said, the solution is
simple: the address should name the node, not the interface.[2]

Just getting to the node is not enough. We need to get to an
application on the node. So we need to name the applications we want to
talk to as well. Moreover, we don't want the name of the application to
be tied to the computer it is on. We want to be able to move the
application and still use the same name. In 1976, John Shoch put this
into words as: application names indicate what you want to talk to;
network addresses indicate where it is; and routes tell you how to get
there.
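
A toy mapping makes Shoch's three names concrete. The service, node,
and router names below are invented for illustration; the point is that
the application name stays the same when the application moves, and
that a multihomed node has one address reachable by more than one path:

  # Toy illustration of application names, node addresses, and routes.
  app_directory = {           # "what": location-independent application names
      "weather-service": "node-17",
  }
  node_interfaces = {         # one address per node, even with two interfaces
      "node-17": ["if-A via provider 1", "if-B via provider 2"],
  }
  routes = {                  # "how": more than one way to reach the node
      "node-17": [["r3", "r9", "if-A via provider 1"],
                  ["r3", "r4", "if-B via provider 2"]],
  }

  node = app_directory["weather-service"]       # what -> where
  print(node, "reachable over", len(routes[node]), "paths")
  # Moving the application changes only app_directory, not the name users use.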

The Internet still only has interface addresses. They have tried
various work-arounds to solve not having two-thirds of what is
necessary. But like many kludges, they only kind of work, as long as
there aren't too many hosts that need it. They don't really scale. But
worse, none of them achieve the huge simplification that naming the
node does. These problems are as big a threat to the future of the
Internet as the congestion control and security problems. And before
you ask, no, IPv6 that you have heard so much about does nothing to
solve them. Actually from our work, the problem IPv6 solves is a
non-problem, if you have a well-formed architecture to begin with.

The biggest problem is router table size. Each router has to know where
next to send a packet. For that it uses the address. However for years,
the Internet continued to assign addresses in order. So unlike a letter
where your local post office can look at the State or Country and know
which direction to send it, the Internet addresses didn't have that
property. Hence, routers in the core of the `Net needed to know where
every address went. As the Internet boom took off that table was
growing exponentially and was exceeding 100K routes. (This table has to
be searched on every packet.) Finally in the early 90s, they took steps
to make IP addresses more like postal addresses. However, since they
were interface addresses, they were structured to reflect which
provider's network they were associated with, i.e., the ISP becomes the
State part of the address. If one has two interfaces on different
providers, the problem above is not fixed. Actually, it needs a
provider-independent address, which also has to be in the router table.
Since even modest-sized businesses want multiple connections to the
`Net, there are a lot of places with this problem, and router table size
keeps getting bigger and bigger: it is now around 500K, and 512K is an
upper bound that we can go beyond, but doing so impairs adoption of IPv6.
In the early 90s, there was a proposal[3] to name the node rather
than the interface. But the IETF threw a temper tantrum and refused to
consider breaking with tradition. Had they done that, it would have
reduced router table size by a factor of between 3 and 4, so router
table size would be closer to 150K. In addition, naming only the
interface makes doing mobility a complex mess.
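
For readers who want to see the mechanics, here is a minimal
longest-prefix-match sketch; the prefixes and next hops are invented
(documentation addresses). A provider-aggregated block covers many
customers with one entry, while a provider-independent prefix for a
multihomed site has to stand on its own, which is what keeps inflating
the table:

  # Minimal longest-prefix-match lookup (illustrative routing table).
  import ipaddress

  routing_table = {
      # Provider-aggregated block: many customers behind a single entry.
      ipaddress.ip_network("203.0.112.0/20"): "toward provider A",
      # Provider-independent prefix of a multihomed site: its own entry,
      # which no amount of provider aggregation can remove.
      ipaddress.ip_network("198.51.100.0/24"): "toward provider A or B",
  }

  def lookup(destination: str) -> str:
      addr = ipaddress.ip_address(destination)
      matches = [net for net in routing_table if addr in net]
      if not matches:
          return "no route"
      best = max(matches, key=lambda net: net.prefixlen)  # longest prefix wins
      return routing_table[best]

  print(lookup("203.0.113.7"))    # covered by the aggregated /20
  print(lookup("198.51.100.9"))   # needs the provider-independent entry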

Emerson: I see - so, every new "fix" to make the Internet work more
quickly and efficiently is only masking the fundamental underlying
problems with the architecture itself. What is the last flaw in TCP
you'd like to touch on before we wrap up?

Day: Well, I wouldn't say `more quickly and efficiently.' We have been
throwing Moore's Law at these problems: processors and memories have
been getting faster and cheaper faster than the Internet problems have
been growing, but that solution is becoming less effective. Actually,
the Internet is becoming more complex and inefficient.

But as to your last question, another flaw with TCP is that it has a
single message type rather than separating control and data. This not
only leads to a more complex protocol but also to greater overhead. They
will argue that being able to send acknowledgements with the data in
return messages saved a lot of bandwidth. And they are right. It saved
about 35% bandwidth when using the most prevalent machine on the 'Net in
the 1970s, but that behavior hasn't been prevalent for 25 years. Today
the savings are minuscule. Splitting IP from TCP required putting packet
fragmentation in IP, which doesn't work. But if they had merely
separated control and data it would still work. TCP delivers an
undifferentiated stream of bytes, which means that applications have to
figure out what is meaningful, rather than having the same amount the
sender asked TCP to send delivered to the destination. This turns out
to be what most applications want. Also, TCP sequence numbers (to put
the packets in order) are in units of bytes, not messages. Not only
does this mean they "roll over" quickly, either putting an upper bound
on TCP speed or forcing the use of an extended sequence number option
(which is more overhead), but it also greatly complicates reassembling
messages, since there is no requirement to re-transmit lost packets
starting with the same sequence number.
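
Because TCP hands the application an undifferentiated byte stream,
every application has to invent its own framing to get message
boundaries back. A generic length-prefix scheme, sketched here as an
illustration rather than any particular application protocol, looks
like this:

  # Length-prefix framing over a byte stream: the receiver otherwise
  # cannot tell where one application message ends and the next begins.
  import struct

  def frame(message: bytes) -> bytes:
      # Prepend a 4-byte big-endian length so boundaries survive the stream.
      return struct.pack("!I", len(message)) + message

  def deframe(stream: bytes) -> list[bytes]:
      messages, offset = [], 0
      while offset + 4 <= len(stream):
          (length,) = struct.unpack_from("!I", stream, offset)
          offset += 4
          messages.append(stream[offset:offset + length])
          offset += length
      return messages

  stream = frame(b"first message") + frame(b"second")
  print(deframe(stream))          # [b'first message', b'second']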

Of the 4 protocols we could have chosen in the late 70s, TCP was (and
remains) the worst choice, but they were spending many times more money
than everyone else combined. As you know, he with the most money to
spend wins. And the best part was that it wasn't even their money.

Emerson: Finally, I wondered if you could briefly talk about RINA and
how it could or should fix some of the flaws of TCP you discuss above?
Pragmatically speaking, is it fairly unlikely that we'll adopt
RINA, even though it's a more elegant and more efficient protocol
than TCP/IP?

Day: Basically RINA picks up where we left off in the mid-70s and
extends what we were seeing then but hadn't quite recognized. What RINA
has found is that all layers have the same functions; they are just
focused on different ranges of the problem space. So in our model there
is one layer that repeats over different scopes. This by itself solves
many of the existing problems of the current Internet, including those
described here. But in addition, it is more secure, and multihoming and
mobility fall out for free. It solves the router table problem because
the repeating structure allows the architecture to scale, etc.

I wish I had a dollar for every time someone has said (in effect),
"gosh, you can't replace the whole Internet." There must be something
in the water these days. They told us that we would never replace the
phone company, but it didn't stop us and we did.

I was at a high-powered meeting a few weeks ago in London that was
concerned about the future direction of architecture. The IETF
[Internet Engineering Task Force] representative was not
optimistic. He said that within 5-10 years, the number of Internet
devices in the London area would exceed the number of devices on the
`Net today, and they had no idea how to do the routing so the routing
tables would converge fast enough.

My message was somewhat more positive. I said, I have good news and bad
news. The bad news is: the Internet has been fundamentally flawed from
the start. The flaws are deep enough that either they can't be fixed or
the socio-political will is not there to fix them. (They are still
convinced that not naming the node when they had the chance was the
right decision.) The good news is: we know the answer and how to build
it, and these routing problems are easily solved.

[1] An IMP was an ARPANET switch, or in today's terms a router. (It
stood for Interface Message Processor, but it is one of those acronyms
where the definition is more important than what it stood for.) NCP was
the Network Control Program, which managed the flows between
applications such as Telnet, a terminal device driver protocol, and
FTP, a File Transfer Protocol.

[2] It would be tempting to say "host" here rather than "node," but
one might have more than one node on a host. This is especially true
today with Virtual Machines so popular; each one is a node. Actually,
by the early 80s we had realized that naming the host was irrelevant to
the problem.

[3] Actually, it wasn't a proposal, it was already deployed in the
routers and being widely used.


Author: Lori Emerson

I am an Assistant Professor in the Department of English at the
University of Colorado at Boulder. I'm the author of Reading Writing
Interfaces: From the Digital to the Bookbound and co-editor of the
Johns Hopkins Guide to Digital Media.


#  distributed via <nettime>: no commercial use without permission
#  <nettime>  is a moderated mailing list for net criticism,
#  collaborative text filtering and cultural politics of the nets
#  more info: http://mx.kein.org/mailman/listinfo/nettime-l
#  archive: http://www.nettime.org contact: nettime@kein.org