A question that often pops up when the validity of ICN research is discussed is how its goals differ from what is already available in CDN solutions. There are many answers to this question, but I would like to focus on one: in my opinion, ICN should not include data storage. Regardless of their implementation, CDNs attempt to push data to where it will be needed next, so they incorporate both distribution and storage capabilities: a content provider makes data available and the CDN intelligently replicates these data in its servers. Current ICN architectures, on the other hand, only offer distribution capabilities, leaving storage to separate entities. There is an argument for incorporating at least some storage at ICN nodes, for example, storing data made available by applications inside the ICN core, so that it remains available even if applications terminate. But delegating storage to another entity requires trust relationships to avoid Denial of Service attacks; do we want to embed these in the network? Going further, do we want to embed CDN-like replication strategies in the network in order to make CDNs obsolete? I would answer no to both: ICN should be a better substrate for CDNs than IP (and the DNS), but it should not attempt to become a CDN. Naming data and decoupling names from locations are powerful tools that can be used in many ways to facilitate diverse CDNs, and not only CDNs.
A common starting point among ICN efforts (and clean-slate designs in general) is the mismatch between the TCP/IP core design goals and current Internet usage. Despite this common starting point, however, there have been quite a few different design proposals for ICN (DONA, CCN and CONET, to name a few), with similarities and dissimilarities among them when it comes to functional organization, the mechanisms adopted and the service models provided.
The upcoming special issue of the IEEE Communications Magazine on Information-centric networks (expected for July 2012) will feature an article that describes the PURSUIT architecture at some level of detail, filling a gap in the ICN literature and clearing the picture of what PURSUIT does for ICN. The architecture described in the article is a product of collective work that has taken place in both PURSUIT and PSIRP, PURSUIT’s ancestor project.
In the article we provide a thorough description of how several things are realized in PURSUIT, such as locating an item (commonly referred to as name-based routing), managing the network topology through an explicit topology management system and forwarding data with novel forwarding techniques such as LIPSIN. Apart from the technical description, we also point out the clear separation of the network core functions, that is, Rendezvous, Topology and Forwarding, an approach that has been a main driver in PURSUIT thinking.
Based on this architectural background, the paper proceeds to show how PURSUIT can accommodate dedicated content replication operators (similar to today’s CDNs), acting as mediators between information providers, who can offload content distribution, and access networks, which in turn minimize inter-domain traffic and improve user experience. With respect to caching, PURSUIT supports two options: (i) on-path caching, which can take place at any node along a transport path according to local policies, and (ii) off-path caching, which allows access networks to orchestrate local caching of popular information. Both forms of caching let network providers avoid needless connectivity costs and improve quality of experience for users.
Finally, PURSUIT provides native mobility support, as it decouples information resolution from data transfer in both time and space. Regarding time, information providers and consumers do not need to be simultaneously connected to the network. Regarding space, information consumers can be served by any source providing the desired information, e.g., depending on location, even for different chunks of the same item. Since caching is also an integral part of PURSUIT, it is far easier to serve mobile devices from current or recent attachment points than it is with today’s IP architecture.
A pre-print version of the article is available online.
Some time ago, when the original PSIRP prototype came out, we convinced an undergraduate student to get acquainted with the (then current) BlackHawk prototype and write a socket emulator on top of it, so that we could demonstrate backward compatibility of pub/sub networking with the Internet. The plan was to write a simple socket library that would translate at least the datagram socket calls to pub/sub calls so that we could run TFTP on top (conveniently, the student had written a short TFTP program for a networking class). What could be simpler than translating sendto() to publish() and recvfrom() to subscribe(), making each packet one publication? Or, maybe, it should be the other way around. Anyway, it seemed pretty straightforward, as long as we could understand how the prototype worked and could keep it stable for the duration of a small TFTP transfer.
Having sorted out this part, we moved to the issue of mapping the socket addresses to content identifiers. In PURSUIT (and PSIRP) we assume that each publication is identified by a scope identifier (SId) and a rendezvous identifier (RId), with the SId indicating a set of related information (with a common access control policy) and the RId indicating a specific publication in this information set. The problem is that originally we also assumed that each such pair would uniquely identify a piece of content, essentially making publications immutable (i.e. the content identified by an SId/RId pair would always be the same). This is very handy for caching (no need to check if the content is valid, as it never changes), but a lot of trouble when you need to emulate sockets!
If you think about it, a socket is an endpoint of an information stream, that is, it identifies a sequence of data items sent to (or transmitted from) that endpoint. In a way, sockets by definition correspond to mutable data. What the emulator needs to do, then, is to map a socket address indicating mutable data to SId/RId pairs indicating immutable data. How on earth can you do that? Short answer: no way! Long answer: read on (it can be done, but not nicely).
Our solution was to exploit the versioning scheme implemented by the BlackHawk prototype: a publication could be modified, leading to a new version of it becoming available. We therefore translated sendto() calls towards a remote socket to a set of versions of a single publication, using the remote socket address to produce the SId/RId pair (by hashing). We even found a way to do this for stream sockets, where the endpoint changes on the server side after a new connection is established: we hashed the socket addresses of both endpoints to indicate the connected socket. But this solution is not good: it cannot work with multiple clients talking to a single server at a well-known socket endpoint, as all of them would be publishing versions of the same publication and synchronizing version numbers between them would be too expensive.
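To make the mapping concrete, here is a minimal sketch in Python of how a socket address could be hashed into an SId/RId pair, with each sendto() becoming the next version of that publication. This is not the code of our emulator or the BlackHawk API: the function names, the 8-byte identifier length, the use of SHA-256 and the choice to hash connected sockets in (sender, receiver) order are all assumptions made for illustration.

```python
import hashlib
from typing import Dict, Tuple

ID_LEN = 8  # assumed identifier length in bytes, for illustration only

Ids = Tuple[bytes, bytes]  # (SId, RId)


def _digest(label: str, *parts: str) -> bytes:
    """Hash a label plus address parts into a fixed-length identifier."""
    h = hashlib.sha256()
    h.update(label.encode())
    for part in parts:
        h.update(b"\x00")  # separator so ("ab", "c") differs from ("a", "bc")
        h.update(part.encode())
    return h.digest()[:ID_LEN]


def datagram_ids(host: str, port: int) -> Ids:
    """Derive the SId/RId pair from the remote socket address alone."""
    addr = f"{host}:{port}"
    return _digest("sid", addr), _digest("rid", addr)


def stream_ids(sender: Tuple[str, int], receiver: Tuple[str, int]) -> Ids:
    """Derive the pair for a connected socket from both endpoint addresses.

    Hashing in (sender, receiver) order is an assumption of this sketch:
    it gives each direction of the connection its own publication.
    """
    a = f"{sender[0]}:{sender[1]}"
    b = f"{receiver[0]}:{receiver[1]}"
    return _digest("sid", a, b), _digest("rid", a, b)


class EmulatedSender:
    """Turns each sendto() into the next version of one publication."""

    def __init__(self) -> None:
        self._next_version: Dict[Ids, int] = {}

    def sendto(self, payload: bytes, remote: Tuple[str, int]):
        ids = datagram_ids(*remote)
        version = self._next_version.get(ids, 0)
        self._next_version[ids] = version + 1
        # A real emulator would hand (ids, version, payload) to publish().
        return ids, version, payload
```

The weakness described above shows up immediately: two independent EmulatedSender instances sending to the same server address both start at version 0 of the same SId/RId pair, so their publications clash unless they coordinate version numbers.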
To maintain the same basic model, i.e. that the SId/RId pair can be computed by only knowing the remote address, we tried different ways to avoid collisions on this well known SId/RId pair: we could also add the current time into the hash to avoid using the same SId/RId too long (this requires loose synchronization, but NTP can handle this) and within each such period produce multiple RIds, with the sender choosing one randomly. This reduced the chance of collisions but did not eliminate them.
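The time-based variant can be sketched in the same spirit. The Python below is illustrative only: the period length, the number of RIds per period, and the function names are assumptions, not values from our emulator. The receiver would subscribe to every candidate RId of the current period, while each sender picks one at random.

```python
import hashlib
import random
from typing import List

PERIOD_SECONDS = 60    # assumed length of one hashing period
RIDS_PER_PERIOD = 16   # assumed number of RIds available in each period


def candidate_rids(remote_addr: str, now: float) -> List[bytes]:
    """All RIds valid for remote_addr in the period containing `now`.

    Only loosely synchronized clocks are needed: both sides just have
    to agree on which period they are in.
    """
    period = int(now // PERIOD_SECONDS)
    return [
        hashlib.sha256(f"{remote_addr}|{period}|{slot}".encode()).digest()[:8]
        for slot in range(RIDS_PER_PERIOD)
    ]


def pick_rid(remote_addr: str, now: float, rng=random) -> bytes:
    """A sender picks one of the period's RIds at random."""
    return rng.choice(candidate_rids(remote_addr, now))


def collision_probability(senders: int, slots: int = RIDS_PER_PERIOD) -> float:
    """Chance that at least two senders pick the same RId in one period."""
    p_all_distinct = 1.0
    for i in range(senders):
        p_all_distinct *= max(slots - i, 0) / slots
    return 1.0 - p_all_distinct
```

With these assumed numbers, two senders collide with probability 1/16 per period, and the probability climbs quickly as more senders join, which is why randomization lowers the collision rate without removing it.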
In the end we gave up and wrote a paper arguing that even in an Information-Centric Network (ICN) one may also need the ability to subscribe/publish to channels, i.e. streams of data that change over time. An example of that would be switching on your TV to see what’s on, as opposed to asking for a specific program: in the former case, you subscribe to a channel where the SId/RId pair does not denote specific content (it depends on the time of day); in the latter, you subscribe to a document which is uniquely denoted by the SId/RId pair. The paper appears in the ICN workshop at SIGCOMM 2011, and the current PURSUIT prototype (BlackAdder) supports both channels and documents. It is an open question to what extent channels and documents should be differentiated in an ICN.
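The distinction can be illustrated with a toy, in-memory rendezvous written in Python. This is a hypothetical sketch and not the BlackAdder API: the class and method names are invented, and a real system would resolve subscriptions across a network rather than in one dictionary.

```python
from typing import Callable, Dict, List, Optional, Tuple

Key = Tuple[str, str]  # (SId, RId)


class ToyRendezvous:
    """Toy model of documents (immutable) versus channels (streams)."""

    def __init__(self) -> None:
        self._documents: Dict[Key, bytes] = {}
        self._channel_subs: Dict[Key, List[Callable[[bytes], None]]] = {}

    # Documents: the SId/RId pair always denotes the same content,
    # so a cached copy never needs to be revalidated.
    def publish_document(self, sid: str, rid: str, data: bytes) -> None:
        key = (sid, rid)
        if key in self._documents and self._documents[key] != data:
            raise ValueError("a document's content cannot change")
        self._documents[key] = data

    def subscribe_document(self, sid: str, rid: str) -> Optional[bytes]:
        return self._documents.get((sid, rid))

    # Channels: the SId/RId pair denotes a stream; what a subscriber
    # receives depends on when it is listening.
    def subscribe_channel(
        self, sid: str, rid: str, on_data: Callable[[bytes], None]
    ) -> None:
        self._channel_subs.setdefault((sid, rid), []).append(on_data)

    def publish_channel(self, sid: str, rid: str, data: bytes) -> None:
        for on_data in self._channel_subs.get((sid, rid), []):
            on_data(data)
```

In this sketch a socket endpoint maps naturally onto a channel: each sendto() becomes a publish_channel() call on the same pair, and nothing has to pretend that the pair names fixed content.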