Pushlets in Java

November 17th, 2005

Back in the olden days, there was this thing called push. Push rode a wave of hype all too familiar to those of us who have been around the block, before dropping off the face of the earth.

The protocols to support it still exist, however. They came to mind while I was thinking about a multicast publish/subscribe system in HTTP. How would one go about implementing such a system in Java today?

There are three transport strategies available, two of which are “push” strategies using “push” protocols.

First, there is “Server Push”. This is the original push mechanism concocted by Netscape way back in the days of Navigator 1.1. The original document is still the best reference.

In “Server Push” you do not send a Content-Length. Instead, you send a multipart MIME document with the content type multipart/x-mixed-replace. This is the same multipart format used to send HTML e-mail messages, with the HTML and the images in one e-mail body. The document has “parts”, and the “parts” are divided by boundaries.

In “Server Push” the server sends information in parts. When the client sees a boundary, it assembles the part that has just arrived and does something with it.

The transaction ends when either the server or the client closes the socket.
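The framing described above can be sketched as a small writer. This is a minimal sketch, not Netscape's code: the class name ServerPushWriter and the boundary string are hypothetical, and it assumes the caller sets the response's Content-Type header to the value returned by contentType() before writing any parts.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper that frames parts of a Netscape-style server push stream.
public class ServerPushWriter {
    private final OutputStream out;
    private final String boundary;

    public ServerPushWriter(OutputStream out, String boundary) throws IOException {
        this.out = out;
        this.boundary = boundary;
        // The body opens with a boundary line that precedes the first part.
        write("--" + boundary + "\r\n");
    }

    // Value for the response's Content-Type header; note there is no Content-Length.
    public String contentType() {
        return "multipart/x-mixed-replace;boundary=" + boundary;
    }

    // Writes one part, then the boundary right away, so the client
    // knows the part is complete without waiting for the next one.
    public void writePart(String partType, String body) throws IOException {
        write("Content-Type: " + partType + "\r\n\r\n");
        write(body);
        write("\r\n--" + boundary + "\r\n");
        out.flush();
    }

    private void write(String text) throws IOException {
        out.write(text.getBytes(StandardCharsets.UTF_8));
    }
}
```

There is no method to end the stream; as with the original mechanism, the transaction ends when one side closes the socket.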

Second, there is HTTP 1.1 chunked transfer encoding. To reuse a connection, HTTP 1.1 needs to know where each response ends, and normally that means sending a Content-Length.

Some applications generate documents on the fly and won’t know the size of a document until generation is complete. They’ll have to buffer the document to get its size. If the document is very large, this is going to cost memory, and it is going to delay the response to the client.

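Chunking lets the server send the document in pieces instead of sending a Content-Length for the entire document. Each chunk is preceded by its own size, written in hexadecimal, and a chunk of size zero marks the end of the document. Chunks do not carry their own HTTP headers; the headers are sent once, with Transfer-Encoding: chunked in place of the Content-Length.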
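To make the wire format concrete, here is a hypothetical ChunkedWriter that frames data the way a servlet container does on the wire. It is only an illustration; a Servlet never needs to write this framing itself.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Illustrative only: servlet containers do this framing for you.
public class ChunkedWriter {
    private static final byte[] CRLF = {'\r', '\n'};

    private final OutputStream out;

    public ChunkedWriter(OutputStream out) {
        this.out = out;
    }

    // Each chunk is its size in hexadecimal, CRLF, the data, CRLF.
    public void writeChunk(byte[] data) throws IOException {
        if (data.length == 0) {
            return; // a zero-size chunk would end the body prematurely
        }
        out.write(Integer.toHexString(data.length).getBytes(StandardCharsets.US_ASCII));
        out.write(CRLF);
        out.write(data);
        out.write(CRLF);
        out.flush();
    }

    // A zero-size chunk followed by an empty line terminates the body.
    public void finish() throws IOException {
        out.write("0\r\n\r\n".getBytes(StandardCharsets.US_ASCII));
        out.flush();
    }
}
```

Writing "hello" and then a 16-byte string produces the size lines 5 and 10, since 16 is 10 in hexadecimal.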

Java servlet containers support HTTP chunking transparently. All a Servlet has to do is call ServletOutputStream.flush() or ServletResponse.flushBuffer() without having set a Content-Length, and a chunk is on the way.

Here’s the Servlet I wrote for a distributed clock.

package com.agtrz.swag.pushlet;

import java.io.IOException;
import java.io.ObjectOutputStream;
import java.net.SocketException;
import java.util.Date;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class Pushlet
extends HttpServlet
{
    private final static long serialVersionUID = 20051117L;

    protected void doGet(HttpServletRequest request,
                         HttpServletResponse response)
    throws ServletException, IOException
    {
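        // No Content-Length is set, so on an HTTP 1.1 connection the
        // container will typically send the body chunked.
        response.setContentType("application/x-java-serialized-object");
        try
        {
            ObjectOutputStream output = new ObjectOutputStream(response.getOutputStream());
            for (;;)
            {
                output.writeObject(new Date());
                // Forget previously written objects; otherwise the stream keeps
                // a reference to every Date it has ever sent.
                output.reset();
                // Flushing pushes the serialized Date to the client as a chunk.
                output.flush();
                try
                {
                    Thread.sleep(1000);
                }
                catch (InterruptedException e)
                {
                    // Restore the interrupt status and end the response.
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }
        catch (SocketException e)
        {
            // Jetty reports a client disconnect this way; the message is container specific.
            if (!"Socket closed".equals(e.getMessage()))
            {
                throw e;
            }
        }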
    }
}

The client will end the transaction by closing the connection, which will produce an IOException of one sort or another. In the case of Jetty, this will be a SocketException and the message will be “Socket closed”, but other containers will have different messages.

Here I catch the exception to make a point, but in production, it would be better to let it find its way to the application log, where you can choose to ignore it using log filters.

On the client side, the HTTP implementation reassembles the chunks, so there is no need to look for boundaries. No application-level MIME support is needed.

The client simply reads its input normally. When it reads from the InputStream returned by HttpURLConnection.getInputStream(), the client blocks until bytes are available. The chunk framing is not visible to the application.

package com.agtrz.swag.pushlet;

import java.io.IOException;
import java.io.ObjectInputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class Pullet
implements Runnable
{
    private final URL url;

    private final int count;

    public Pullet(URL url, int count)
    {
        this.url = url;
        this.count = count;
    }

    private void tryRun()
    throws IOException, ClassNotFoundException
    {
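        HttpURLConnection connection = (HttpURLConnection) url.openConnection();
        try
        {
            ObjectInputStream objects = new ObjectInputStream(connection.getInputStream());
            for (int i = 0; i < count; i++)
            {
                // Blocks until the server flushes the next object.
                Object object = objects.readObject();
                System.out.println(object.toString());
            }
            objects.close();
        }
        finally
        {
            // Drop the socket so the server sees the client hang up.
            connection.disconnect();
        }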
    }

    public void run()
    {
        try
        {
            tryRun();
        }
        catch (Exception e)
        {
            throw new RuntimeException(e);
        }
    }
}

When the client asks for an object and one is not available, it blocks until one becomes available.

When the client has had enough objects, it closes the connection to the server.

This works out quite nicely in testing. Flushing the buffer on the server side sends an object to the client.

package com.agtrz.swag.pushlet;

import java.net.URL;

import junit.framework.TestCase;

import org.mortbay.http.HttpContext;
import org.mortbay.http.HttpServer;
import org.mortbay.jetty.servlet.ServletHandler;
import org.mortbay.util.InetAddrPort;

public class PushletTestCase
extends TestCase
{
    private HttpServer server;

    private void startJetty(String path, Class servletClass)
    throws Exception
    {
        server = new HttpServer();
        server.addListener(new InetAddrPort(8008));
        HttpContext context = server.addContext("/");
        ServletHandler handler = new ServletHandler();
        handler.addServlet(path, servletClass.getName());
        context.addHandler(handler);
        server.start();
    }

    private void stopJetty()
    throws InterruptedException
    {
        server.stop();
    }

    public void testPushlet()
    throws Exception
    {
        startJetty("/pushlet/", Pushlet.class);
        try
        {
            Thread one = new Thread(new Pullet(new URL("http://localhost:8008/pushlet/"), 3));
            Thread two = new Thread(new Pullet(new URL("http://localhost:8008/pushlet/"), 3));

            one.start();
            two.start();

            one.join();
            two.join();
        }
        finally
        {
            // Stop the server even if the test fails part way through.
            stopJetty();
        }
    }
}

Running the above unit test gives me the following.

log4j:WARN No appenders could be found for logger (org.mortbay.util.Container).
log4j:WARN Please initialize the log4j system properly.
Thu Nov 17 13:12:44 EST 2005
Thu Nov 17 13:12:44 EST 2005
Thu Nov 17 13:12:45 EST 2005
Thu Nov 17 13:12:45 EST 2005
Thu Nov 17 13:12:46 EST 2005
Thu Nov 17 13:12:46 EST 2005

The third way to implement push would be to forget about all this push nonsense and trust that keep-alive will save the connection overhead, and that the Servlet engine will quickly get a Servlet to work on your request.

The client could send a normal HTTP request and get a response that has a list of objects and a version number. After dealing with the response, it sends the next request, asking for objects created since that version number. If at any point there are no objects available, the server can hold onto the connection until objects become available.
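The server side of this scheme needs somewhere to keep objects and a way to hold a request until something new arrives. Here is a minimal in-memory sketch, assuming a single server process and using a plain wait/notifyAll monitor. VersionedQueue, Batch, publish() and since() are hypothetical names, and the version is simply the count of objects published so far.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical in-memory buffer for the keep-alive strategy.
public class VersionedQueue<T> {
    private final List<T> items = new ArrayList<>();

    // Result of a poll: objects created after the requested version,
    // plus the version the client should ask for next.
    public static final class Batch<T> {
        public final List<T> objects;
        public final long version;

        Batch(List<T> objects, long version) {
            this.objects = objects;
            this.version = version;
        }
    }

    // Publishes an object and wakes any requests waiting for new objects.
    public synchronized void publish(T object) {
        items.add(object);
        notifyAll();
    }

    // Returns objects newer than 'since', holding the request open
    // for up to timeoutMillis if nothing new is available yet.
    public synchronized Batch<T> since(long since, long timeoutMillis) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (items.size() <= since) {
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) {
                break; // timed out; the client simply asks again
            }
            wait(remaining);
        }
        int from = (int) Math.max(0, Math.min(since, items.size()));
        List<T> fresh = new ArrayList<>(items.subList(from, items.size()));
        return new Batch<>(Collections.unmodifiableList(fresh), items.size());
    }
}
```

A servlet would call since() with the version from the request and write the batch out. A real version would also trim objects every client has already seen, since this one grows without bound, and each waiting request ties up a container thread.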

The loop in the client would be similar to the loop for the “chunked” client, except that each response’s input stream would be drained and a new request opened.
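A hypothetical client for the keep-alive strategy follows. The ?since= query parameter and the line-oriented response format (version number on the first line, one object per following line) are assumptions made for this sketch, not an existing protocol.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical long-polling client. Assumed wire format: the first line of
// each response is the version number, each following line is one object.
public class LongPollClient {
    private final String baseUrl;
    private long version = 0;

    public LongPollClient(String baseUrl) {
        this.baseUrl = baseUrl;
    }

    // Builds the request for objects created since the last version seen.
    String nextUrl() {
        return baseUrl + "?since=" + version;
    }

    // Parses one response body and remembers the version for the next request.
    List<String> accept(BufferedReader body) throws IOException {
        String header = body.readLine();
        if (header == null) {
            throw new IOException("empty response");
        }
        version = Long.parseLong(header.trim());
        List<String> objects = new ArrayList<>();
        for (String line; (line = body.readLine()) != null; ) {
            objects.add(line);
        }
        return objects; // the body has been read to the end here
    }

    // One round trip: open a new request, drain the response, and close it.
    public List<String> poll() throws IOException {
        HttpURLConnection connection =
                (HttpURLConnection) URI.create(nextUrl()).toURL().openConnection();
        try (BufferedReader body = new BufferedReader(
                new InputStreamReader(connection.getInputStream(), StandardCharsets.UTF_8))) {
            return accept(body);
        }
    }

    public long version() {
        return version;
    }
}
```

A caller would loop on poll(), handling each batch in turn. Reading each body to the end before closing it is what allows HttpURLConnection to keep the socket alive for the next request.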

The Problem with Push

The problem with the push solutions is that they don’t work well with proxies. Proxies may choose to gather the chunks and resend them as larger chunks. If the chunk is not large enough to send, the proxy will wait until more chunks arrive. Our scheme to push small objects to the client in a timely fashion is thus thwarted.

This is the problem I’m facing, because I like to put my servlet engines behind mod_proxy, and mod_proxy likes to send buffers when they are full. Chunked transfer does not seem to have been designed for long-lived, one-way connections, and mod_proxy is probably doing the right thing, or at least nothing that offends the HTTP 1.1 spec.

Thus, the keep-alive strategy, which I’ll code up later, is probably the best way to go.

Product Placement

November 16th, 2005

A paragraph I just wrote made me think that with all the brand names entering our vocabulary, English is starting to look like German.

WebDAV

November 8th, 2005

Installed and running. Upgraded to run the YourKit profiler. Also, I’m moving servers this month, and I want to unify my services before I move, reducing the servers I run to a bare minimum.

Thus, I want to work with WebDAV. Instead of running integration builds on the server, I’d like to run them locally, on my OS X machine or Linux machine. Then I’d like to publish. Rather than configuring a networked Ant build, I’d like to simply copy the results to a directory, and have that directory magically appear on the server.

This could be done by serving directly out of Subversion for otherwise static things like test results, perhaps through a caching proxy if Subversion is too slow.

The crux of the idea is that all transfer is done via HTTP, WebDAV, SMTP, IMAP or SCP. Many protocols to choose from, but only five servers to run: Apache 2.0, Jetty, qmail, CourierIMAP, and sshd.

I’ve mounted one of my Subversion repositories as a file share already. There is information that I won’t want to version, and I can use standard WebDAV for that.

Trouble in paradise, already. Spotlight won’t be able to track changes to a WebDAV mounted drive, so indexing needs to be run manually. This is very slow, and chews up a bit of bandwidth. Oh, and it doesn’t work. I’m searching now that the index is built, and I’m getting no results whatsoever.

Oh, well. That’s okay. I still get drag and drop organization of Subversion repositories, which is very useful for those that hold data and documents rather than source code. I can get the Spotlight search through a checkout.

So, I can use the command line and SVN X to manage documents when editing, and use WebDAV for automatic storage and drag and drop storage. Before using WebDAV from OS X, go and turn off .DS_Store creation.

The reference I used is the Apache 2.0 mod_dav documentation. I’ve had Subversion running as DAV forever, so I have no notes on that. There is a Linux WebDAV filesystem that I am going to install on my older laptop. Then I can use that old laptop to run integrations and publish, taking pressure off the server CPU. There are some straightforward instructions for installing the filesystem.

People, Circles, and Syndicated Feeds

October 30th, 2005

The model for Think New Orleans. Three concepts.

  • A person, like you or me.
  • A circle of people, based around a school, a church, a neighborhood, a cafe, bar, or restaurant.
  • A feed of information.

These are the building blocks of the software. If it doesn’t fit within this model, it doesn’t fit within Think New Orleans.
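One way to sketch the three concepts as Java types follows. The class and field names are hypothetical, and giving each circle its own feed is an assumption made for illustration; the model above does not say how circles and feeds relate.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the three concepts; names are assumptions.
public class ThinkModel {
    // A person, like you or me. Identity is by reference in this sketch.
    public static final class Person {
        public final String name;

        public Person(String name) { this.name = name; }
    }

    // A feed of information: an ordered list of entries.
    public static final class Feed {
        public final String title;
        private final List<String> entries = new ArrayList<>();

        public Feed(String title) { this.title = title; }

        public void post(String entry) { entries.add(entry); }

        public List<String> entries() { return Collections.unmodifiableList(entries); }
    }

    // A circle of people gathered around a place, assumed here to carry its own feed.
    public static final class Circle {
        public final String place;
        public final Feed feed;
        private final Set<Person> members = new LinkedHashSet<>();

        public Circle(String place) {
            this.place = place;
            this.feed = new Feed(place);
        }

        public void join(Person person) { members.add(person); }

        public boolean includes(Person person) { return members.contains(person); }

        public Set<Person> members() { return Collections.unmodifiableSet(members); }
    }
}
```

Membership is a set, so a person joining the same circle twice is counted once.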

Hadley’s Sextant

October 30th, 2005

The GPS of yore. The sextant.

Came across this while thinking about maps. I’m linking this as an example of tech writing.

Pal’s Is Open

October 30th, 2005

Pal’s is open.

Google Maps and Think New Orleans

October 30th, 2005

I was going to make an article of this, but it went too quickly.

The documentation covers everything. The Google Maps API is simple. It fits quite nicely in a JavaScript environment. Working with Prototype makes this even easier.

Watch the Mid-City article map.

Proud to Swim Home

October 30th, 2005

There is a bumper sticker, New Orleans, Proud to Call It Home, and a play on that bumper sticker, New Orleans, Proud to Crawl Home.

New Orleans, Proud to Swim Home.

Google Base

October 28th, 2005

Ran across this in XML-DEV, and Googled it. Google Base: All your base are, in fact, belong to us, tells the story very well.

This is the right model. It’s very clever.

Google is moving past tags and into different sorts of containers. The different components of this technology are going to allow people to create ad hoc applications. We saw a lot of this after Katrina, and we’ve seen a lot of it come out of Greasemonkey hacking.

The ability to create a special page that has a few different controls that can interact with each other is enough to give any one person the ability to create a real application.

Smaller Distributables

October 28th, 2005

Set out to create smaller JAR distributables last month. My projects are breaking themselves up into smaller artifacts. Many are independent.

I read through script.aculo.us and Prototype in fifteen minutes. It is amazing that you can get so much out of so little code.

The read goes quickly, because there is not much code, and because all the script.aculo.us libraries are independent of each other, and dependent on Prototype.

It is so much easier to understand the code when you are not following shadowy cyclical dependencies. Divide and conquer.