Oct 10, 2011

Some time ago, when the original PSIRP prototype came out, we convinced an undergraduate student to get acquainted with the (then-current) BlackHawk prototype and write a socket emulator on top of it, so that we could demonstrate backward compatibility of pub/sub networking with the Internet. The plan was to write a simple socket library that would translate at least the datagram socket calls to pub/sub calls, so that we could run TFTP on top (conveniently, the student had written a short TFTP program for a networking class). What could be simpler than translating sendto() to publish() and recvfrom() to subscribe(), making each packet one publication? Or maybe it should be the other way around. Anyway, it seemed pretty straightforward, as long as we could understand how the prototype worked and keep it stable for the duration of a small TFTP transfer.
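To make the intended translation concrete, here is a minimal sketch of a datagram socket shim over pub/sub. Everything in it is an assumption made for illustration: the in-memory Broker class, the EmulatedDatagramSocket name, and the choice to publish on sendto() and subscribe on bind() are hypothetical and are not the BlackHawk API or the student's actual library.

```python
from collections import defaultdict, deque


class Broker:
    """Hypothetical in-memory stand-in for a pub/sub network."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, name):
        # Each subscriber gets its own queue of delivered publications.
        inbox = deque()
        self._subscribers[name].append(inbox)
        return inbox

    def publish(self, name, item):
        # Deliver the publication to everyone subscribed to this name.
        for inbox in self._subscribers[name]:
            inbox.append(item)


class EmulatedDatagramSocket:
    """Datagram-style calls translated to publish/subscribe (a sketch)."""

    def __init__(self, broker):
        self._broker = broker
        self._address = None
        self._inbox = None

    def bind(self, address):
        # Receiving on an address becomes a subscription to that name.
        self._address = address
        self._inbox = self._broker.subscribe(address)

    def sendto(self, data, address):
        # Each packet becomes one publication; the sender address travels
        # with it so that recvfrom() can return it.
        self._broker.publish(address, (data, self._address))
        return len(data)

    def recvfrom(self):
        if not self._inbox:
            raise BlockingIOError("no publication pending")
        return self._inbox.popleft()
```

In this toy version the socket address itself serves as the publication name, which sidesteps the question of how addresses should map to real content identifiers.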

Having sorted out this part, we moved on to the issue of mapping socket addresses to content identifiers. In PURSUIT (and PSIRP) we assume that each publication is identified by a scope identifier (SId) and a rendezvous identifier (RId), with the SId indicating a set of related information (with a common access control policy) and the RId indicating a specific publication in this information set. The problem is that originally we also assumed that each such pair would uniquely identify a piece of content, essentially making publications immutable (i.e. the content identified by an SId/RId pair would always be the same). This is very handy for caching (no need to check whether the content is still valid, as it never changes), but a lot of trouble when you need to emulate sockets!

If you think about it, a socket is an endpoint of an information stream, that is, it identifies a sequence of data items sent to (or transmitted from) that endpoint. In a way, sockets by definition correspond to mutable data. What the emulator needs to do, then, is map a socket address indicating mutable data to SId/RId pairs indicating immutable data. How on earth can you do that? Short answer: no way! Long answer: read on (it can be done, but not nicely).

Our solution was to exploit the versioning scheme implemented by the BlackHawk prototype: a publication could be modified, leading to a new version of it becoming available. We therefore translated sendto() calls towards a remote socket into a set of versions of a single publication, using the remote socket address to produce the SId/RId pair (by hashing). We even found a way to do this for stream sockets, where the endpoint changes on the server side after a new connection is established: we hashed the socket addresses of both endpoints to indicate the connected socket. But this solution is not good: it cannot work with multiple clients talking to a single server at a well-known socket endpoint, as synchronizing version numbers between them would be too expensive.
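The hashing and versioning idea can be sketched roughly as follows. This is an illustration under stated assumptions, not the emulator's code: SHA-256, the "scope|" and "rendezvous|" prefixes, the ordered hashing of the two endpoints, and the in-memory VersionedPublication store are all hypothetical choices that the original does not specify.

```python
import hashlib


def ids_for_endpoint(host, port):
    """Derive an (SId, RId) pair from a remote socket address alone."""
    addr = f"{host}:{port}".encode()
    # Assumption: distinct prefixes keep the SId and RId different.
    sid = hashlib.sha256(b"scope|" + addr).digest()
    rid = hashlib.sha256(b"rendezvous|" + addr).digest()
    return sid, rid


def ids_for_connection(sender, receiver):
    """Stream sockets: hash both endpoints to name the connected socket.

    Assumption: the pair is ordered, so each direction gets its own name.
    """
    pair = f"{sender[0]}:{sender[1]}>{receiver[0]}:{receiver[1]}".encode()
    sid = hashlib.sha256(b"scope|" + pair).digest()
    rid = hashlib.sha256(b"rendezvous|" + pair).digest()
    return sid, rid


class VersionedPublication:
    """Stand-in for one publication whose content changes by version."""

    def __init__(self):
        self.versions = []

    def publish(self, payload):
        self.versions.append(payload)
        return len(self.versions) - 1  # version number of this payload


# Hypothetical local store mapping (SId, RId) to a versioned publication.
store = {}


def sendto(payload, remote):
    """Each datagram becomes the next version of the remote's publication."""
    key = ids_for_endpoint(*remote)
    return store.setdefault(key, VersionedPublication()).publish(payload)
```

The single in-process store hides the real difficulty: in a network, every client sending to the same well-known address would have to agree on the next version number, which is exactly the synchronization cost described above.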

To maintain the same basic model, i.e. that the SId/RId pair can be computed by only knowing the remote address, we tried different ways to avoid collisions on this well-known SId/RId pair. We could add the current time into the hash to avoid using the same SId/RId pair for too long (this requires loose synchronization, but NTP can handle that), and within each such period produce multiple RIds, with the sender choosing one at random. This reduced the chance of collisions but did not eliminate them.
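A rough sketch of the time-bucketed variant follows, with an estimate of why collisions remain. The period length, the number of RIds per period, the use of SHA-256, and the function names are assumptions made for illustration; the original does not give these details. The estimate treats a collision as two or more senders independently picking the same RId within one period.

```python
import hashlib
import random
import time

PERIOD_SECONDS = 60    # assumed length of one time period
SLOTS_PER_PERIOD = 8   # assumed number of RIds produced per period


def candidate_ids(host, port, now=None):
    """Return the SId and all candidate RIds for the current period."""
    if now is None:
        now = time.time()
    # Loosely synchronized clocks land in the same period number.
    period = int(now // PERIOD_SECONDS)
    base = f"{host}:{port}|{period}".encode()
    sid = hashlib.sha256(b"scope|" + base).digest()
    rids = [
        hashlib.sha256(b"rendezvous|" + base + b"|" + str(i).encode()).digest()
        for i in range(SLOTS_PER_PERIOD)
    ]
    return sid, rids


def pick_ids(host, port, now=None, rng=random):
    """A sender picks one of the period's RIds at random."""
    sid, rids = candidate_ids(host, port, now)
    return sid, rng.choice(rids)


def collision_probability(senders, slots):
    """Chance that at least two senders pick the same RId in one period."""
    if senders > slots:
        return 1.0
    p_all_distinct = 1.0
    for i in range(senders):
        p_all_distinct *= (slots - i) / slots
    return 1.0 - p_all_distinct
```

Under these assumptions, two senders sharing eight RIds still collide with probability 1/8 in any given period, so adding slots lowers the risk without ever removing it.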

In the end we gave up and wrote a paper arguing that even in an Information-Centric Network (ICN) one may also need the ability to subscribe/publish to channels, i.e. streams of data that change over time. An example of that would be switching on your TV to see what’s on, as opposed to asking for a specific program: in the former case, you subscribe to a channel where the SId/RId pair does not denote specific content (it depends on the time of day); in the latter, you subscribe to a document which is uniquely denoted by the SId/RId pair. The paper appeared at the ICN workshop at SIGCOMM 2011, and the current PURSUIT prototype (BlackAdder) supports both channels and documents. It is an open question to what extent channels and documents should be differentiated in an ICN.
