2007-10-17

Code Review: MySQLDAO.java

(DRY, bottom-up programming)



The Code

Today's review is of a Java class called MySQLDAO.java:
MySQLDAO.java
...
public class MySQLDAO implements DAO {
  public List<Part> fetchPartsBySinkId(long sinkId,
    long begin, long end) {
    StringBuffer query = new StringBuffer("select "
      + "part.name as name, part.id as id from part, "
      + "sink, sink_part where sink.id = ? and "
      + "sink_part.sink_id = sink.id and part.id = "
      + "sink_part.part_id group by part.name, part.id");
    if (end > 0) {
      query.append(" limit ?,?");
    }
    Connection connection = getPooledConnection();
    PreparedStatement ps =
      connection.prepareStatement(query.toString());
    ps.setLong(1, sinkId);
    if (end > 0) {
      ps.setLong(2, begin);
      ps.setLong(3, end - begin);
    }
    ResultSet rs = ps.executeQuery();

    List<Part> parts = new ArrayList<Part>();
    while (rs.next()) {
      Part part = new Part();
      part.setName(rs.getString("name"));
      part.setId(rs.getLong("id"));
      parts.add(part);
    }
    return parts;
  }

  public List<Sink> fetchSinksNeedingPartId(long partId) {
    StringBuffer query = new StringBuffer("select "
     + "sink.id as id, count(part.id) as count from "
     + "sink, sink_part, part where part.id = ? and "
     + "sink_part.part_id = part.id and sink.id = "
     + "sink_part.sink_id group by sink.id");
    Connection connection = getPooledConnection();
    PreparedStatement ps =
      connection.prepareStatement(query.toString());
    ps.setLong(1, partId);

    ResultSet rs = ps.executeQuery();

    List<Sink> sinks = new ArrayList<Sink>();
    while (rs.next()) {
      Sink sink = new Sink();
      sink.setId(rs.getLong("id"));
      sink.setPartCount(rs.getLong("count"));
      sinks.add(sink);
    }
    return sinks;
  }
... 30 more of these fetch* methods.

This class is a side-effect of using some kind of semi-automatic SQL-generating ORM framework like Hibernate or entity beans in a big project. Once the complexity of what they need to fetch from the database outgrows what the framework is capable of auto-fetching, projects tend to end up with something like the above. I believe there are two major reasons: first, one of the nice things about such a framework is that it removes embedded SQL from the main-line code -- and embedded SQL in Java code is hideously ugly, as you can see above. Second, these frameworks are supposed to make your code portable between different databases -- so even if, in practice, we're committed to one particular database, you have to write the code in such a way that switching databases is conceivable (or else face the fact that the framework's complexity is probably not justified!)

The big problem I see with this code is how repetitive it has become: every single public method in here has the same shape: build query, execute query, iterate through the result set converting into collection of java objects, clean up and return the collection. This kind of structural repetition seems to be invisible to many, if not most, Java programmers. But it's just as bad as more obvious forms of repetition. Consider two situations where this repetition suddenly causes big headaches:
  1. You really do have to port this application to Oracle. All of a sudden, it is apparent that the repetitive shape is actually database agnostic, because the strategy is to copy the entire MySQLDAO anS then only modify the actual embedded queries.
  2. Certain critical parts of the application start running out of memory, and you realize that instantiating and returning potentially huge collections isn't scalable. Now it is apparent that the repetitive shape is also query agnostic, because the strategy is to make copies of the methods in which the only thing that changes is the control shape -- the query building and converting code are the same.
Either of these requirements will likely turn this code from a barely-maintainable mess into a completely unmaintainable mess, unless the underlying repetitiveness of the code is addressed.

How Can I Improve This Code?

I'll start by stating my goal: I want to remove the repetition.

My strategy is to decompose the code until the repetition is painfully obvious. In this case, I'll break out the "build query" and "convert into java objects" chunks:

MySQLDAO.java
...
public class MySQLDAO implements DAO {
  private PreparedStatement preparePartsBySinkIdQuery(
    Connection connection, long sinkId, long begin,
    long end) {
    StringBuffer query = new StringBuffer("select "
      + "part.name as name, part.id as id from part, "
      + "sink, sink_part where sink.id = ? and "
      + "sink_part.sink_id = sink.id and part.id = "
      + "sink_part.part_id group by part.name, part.id");
    if (end > 0) {
      query.append(" limit ?,?");
    }
    PreparedStatement ps =
      connection.prepareStatement(query.toString());
    ps.setLong(1, sinkId);
    if (end > 0) {
      ps.setLong(2, begin);
      ps.setLong(3, end - begin);
    }
    return ps;
  }
  private Part convertPartsBySinkIdResult(ResultSet rs) {
    Part part = new Part();
    part.setName(rs.getString("name"));
    part.setId(rs.getLong("id"));
    return part;
  }
  public List<Part> fetchPartsBySinkId(long sinkId,
    long begin, long end) {
    Connection connection = getPooledConnection();
    PreparedStatement ps =
      preparePartsBySinkIdQuery(connection, sinkId, begin,
        end);

    ResultSet rs = ps.executeQuery();

    List<Part> parts = new ArrayList<Part>();
    while (rs.next()) {
      parts.add(convertPartsBySinkIdResult(rs));
    }
    return parts;
  }
...
  public List<Sink> fetchSinksNeedingPartId(long partId) {
    Connection connection = getPooledConnection();
    PreparedStatement ps =
      prepareSinksNeedingPartIdQuery(connection, partId);

    ResultSet rs = ps.executeQuery();

    List<Sink> sinks = new ArrayList<Sink>();
    while (rs.next()) {
      sinks.add(convertSinksNeedingPartIdResult(rs));
    }
    return sinks;
  }
...

Two big observations about this revision. First, the prepare and convert methods form a logical pair, and should probably be wrapped together in a simple object which encapsulates all of the logic for working with this specific query: let's call this 'interface Query'. Second, the fetch*() methods are looking pretty damn... generic. What's the abstract version? Given an object which implements the Query interface, it should return a List of java objects. But with generics, I can do better: by parameterizing the Query interface by the type of object it returns and parameterizing the fetchList() method the same way, I get all of the type safety and none of the repetition.

At the same time, notice that once the specific queries have been factored out, the abstract fetching mechanism is actually independent of the database, so it can be moved out of the MySQL-specific code.

Finally, the MySQLDAO, with all of the generic DAO code factored out, should probably be renamed to MySQLQueryFactory.

Here's the new version:

DAO.java
...
public class DAO {
  private static QueryFactory queryFactory =
    new MySQLQueryFactory();

  public interface Query<T> {
    public PreparedStatement
      prepare(Connection connection);
    public T convert(ResultSet rs);
  }

  private <T> List<T> fetchList(Query<T> q) {
    Connection connection = getPooledConnection();
    PreparedStatement ps = q.prepare(connection);
    ResultSet rs = ps.executeQuery();

    List<T> data = new ArrayList<T>();
    while (rs.next()) {
      data.add(q.convert(rs));
    }
    return data;
  }

  public List<Part> fetchPartsById(long sinkId,
    long begin, long end) {
    return fetchList(queryFactory.partsById(sinkId, begin,
      end));
  }

  public List<Sink> fetchSinksNeedingPart(long partId) {
    return
      fetchList(queryFactory.sinksNeedingPartId(partId));
  }
...

MySQLQueryFactory.java
...
public class MySQLQueryFactory implements QueryFactory {
  public DAO.Query partsById(final long sinkId,
    final long begin, final long end) {
    return new DAO.Query() {
      public PreparedStatement prepare(Connection c) {
        StringBuffer query = new StringBuffer("select "
          + "part.name as name, part.id as id from part, "
          + "sink, sink_part where sink.id = ? and "
          + "sink_part.sink_id = sink.id and part.id = "
          + "sink_part.part_id group by part.name, "
          + "part.id");
        if (end > 0) {
          query.append(" limit ?,?");
        }
        PreparedStatement ps =
          c.prepareStatement(query.toString());
        ps.setLong(1, sinkId);
        if (end > 0) {
          ps.setLong(2, begin);
          ps.setLong(3, end);
        }
        return ps;
      }

      public Part convert(ResultSet rs) {
        Part p = new Part();
        p.setId(rs.getLong("ID"));
        p.setString(rs.getString("NAME"));
        return p;
      }
    };
  }
...

Much better!

Back to my earlier examples:
  1. Port to Oracle. Still a big job, and still probably starts with a big cut & paste hack. But there's less code, and a much higher proportion of that code is actually (potentially) database-specific.
  2. Fetch an iterator instead of a list. Easy:
DAO.java
...
public class DAO {
...
  private <T> Iterator<T> iterateList(final Query<T> q) {
    Connection connection = getPooledConnection();
    PreparedStatement ps = q.prepare(connection);
    final ResultSet rs = ps.executeQuery();

    return new Iterator<T>() {
      private T next;

      public boolean hasNext() {
        if (next == null) {
          if (! rs.next()) {
            return false;
          }
          next = q.convert(rs);
        }
        return true;
      }
      public T next() {
        if (! hasNext()) {
          throw new NoSuchElementException();
        }
        T retval = next;
        next = null;
        return retval;
      }
      public void remove() {
        throw new UnsupportedOperationException();
      }
    };
  }
...
  public Iterator<Part> iteratePartsById(long sinkId) {
    return iterateList(queryFactory.partsByIdQuery(sinkId,
      0, 0));
  }
...

And in case this isn't obvious, iterateList() can be reused on any Query object. This brings me to bottom-up programming: identifying operations which are fundamental to your software, and making them part of the language. I've gone about it backwards in this case, by factoring the "bottom" out of an existing lump of code, but the effect is the same: by recognizing that this project is going to do a lot of database fetches, adding functionality to make this easier and less repetitive to the language itself makes development that much easier.

For what it's worth, I think this code can still be improved, by looking at it through the lens of cohesion. But this is enough for now.