Developers Corner: March 2014

Sunday 30 March 2014

Servlet API - Part 1

As you may know, Java has strong support for network programming. You can work with lower-level protocols like TCP/UDP or higher-level protocols like HTTP. In this series of articles, I will only cover the Java APIs for the HTTP protocol.

Background

TCP/IP and HTTP

As usual, I will start with my favourite part first. HTTP is a protocol on top of TCP. This means it is perfectly possible to use the Java API for TCP (socket programming) to implement an HTTP server. However, no one does that except for study purposes or when they have too much time to burn. Using a higher-level API definitely makes coding simpler and much faster.

Still, if you need to deal with a protocol that has little or no built-in support, like FTP, you will need to use the low-level API to implement your application.

From Java 1.2, Java was split into 3 editions: J2SE, J2ME and J2EE, each serving a specific purpose. J2SE is the core of Java and can be used to develop desktop applications. J2ME is a special library for developing mobile applications, and it is practically extinct by now. J2EE, our main focus today, is the library for building web applications.

J2EE

When you download Java to your system, the package always includes the JRE runtime. That is J2SE.

In contrast, J2EE was not part of Java in the beginning. After seeing developers struggle to build their own tools to meet industry requirements, Sun combined the most popular ideas and APIs in the market to create J2EE. It includes these technologies:

JavaServer Pages
Enterprise JavaBeans
JDBC
JMS
JNDI
Java Transaction API
JavaMail

J2EE is a set of APIs and standards rather than a concrete implementation. Most of its contents are interfaces rather than classes. A server that implements all the J2EE APIs is a compliant J2EE application server. Today, the border between J2EE and J2SE is not so clear any more. For example, the default J2SE package already includes the JDBC and JNDI interfaces; you can see them in the rt.jar file (runtime library).

For the other J2EE APIs, you need to add them to the classpath before you can use them.

Session

HTTP is a stateless protocol. This means the server generally does not remember who you are or what you have done. However, this feature is critical if you implement authorization, or simply because business requirements demand it. To achieve that, the container normally includes a session cookie in the first response. That cookie helps the container identify the user and associate them with a server-side session. For Java, the session cookie is normally named JSESSIONID. To save memory, the container deletes a server-side session if no request arrives for a certain amount of time. When that happens, the session cookie is no longer recognized, and the server assigns a new session cookie and session object.
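To make the session mechanism more concrete, here is a minimal plain-Java sketch of how a container might map JSESSIONID values to server-side sessions and expire idle ones. It is not how Tomcat or any real container is implemented. The class SessionStoreSketch and its methods are hypothetical, time is passed in explicitly (in milliseconds) so the behaviour is easy to follow, and expiry is only checked when a session is accessed, whereas real containers typically also purge expired sessions in the background.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical, simplified session store keyed by the JSESSIONID cookie value. */
final class SessionStoreSketch {

    /** Server-side session state: attributes plus the time of the last request. */
    static final class Session {
        final Map<String, Object> attributes = new ConcurrentHashMap<>();
        volatile long lastAccessMillis;

        Session(long now) {
            this.lastAccessMillis = now;
        }
    }

    private final Map<String, Session> sessions = new ConcurrentHashMap<>();
    private final long maxIdleMillis;

    SessionStoreSketch(long maxIdleMillis) {
        this.maxIdleMillis = maxIdleMillis;
    }

    /**
     * Returns the id of a valid session: the one from the cookie if it is known
     * and not idle for too long, otherwise a new one (which the container would
     * send back to the browser in a new JSESSIONID cookie).
     */
    String resolve(String cookieId, long now) {
        if (cookieId != null) {
            Session s = sessions.get(cookieId);
            if (s != null && now - s.lastAccessMillis <= maxIdleMillis) {
                s.lastAccessMillis = now;   // the request keeps the session alive
                return cookieId;
            }
            sessions.remove(cookieId);      // expired or unknown: forget it
        }
        String newId = UUID.randomUUID().toString();
        sessions.put(newId, new Session(now));
        return newId;
    }

    Session get(String id) {
        return sessions.get(id);
    }

    public static void main(String[] args) {
        SessionStoreSketch store = new SessionStoreSketch(30_000); // 30 s idle timeout
        String id = store.resolve(null, 0);                        // first request, no cookie
        System.out.println(id.equals(store.resolve(id, 10_000)));  // true: still valid
        System.out.println(id.equals(store.resolve(id, 50_000)));  // false: idle for 40 s
    }
}
```

In a real web application you never write this yourself: you call request.getSession() and configure the idle timeout with the session-timeout element (in minutes) of web.xml.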

Servlet API

HttpServlet is not part of core Java. Hence, to do servlet programming, you need to add the Servlet API to the project classpath. The most common way is to add the server runtime to your project, because any Java web server (servlet container) ships both the Servlet API and an implementation of it. If you do not want your project classpath to include a server runtime, you can add the Servlet API jar to the classpath manually.

As mentioned above, J2EE was born with the goal of providing common interfaces for various vendor implementations. That is why the Servlet API consists mostly of interfaces (plus a few base classes such as HttpServlet), XML schemas and some specific requirements. The Servlet API became part of J2EE with version 2.2, was gradually upgraded to 2.3, 2.4 and 2.5, and was substantially revamped in version 3.0.

Servlet is a very primitive API, which is why it is not very convenient to use. You rarely see anyone using a Servlet to render web pages unless the application is super simple. Any developer working with Java EE should be familiar with the frameworks built on top of the Servlet API, like Spring MVC, Struts or JSF.

After developers got tired of using an OutputStream to render HTML content, JSP was introduced as Java's answer to PHP scripts. JSP makes HTML content much simpler to write. However, as Java is not a dynamic language, a JSP file is translated into a Servlet and compiled before it serves its first request. Gradually, as the Java world adopted Ajax and REST APIs, content is more often delivered in JSON format and server-side rendered HTML is used less often.

Servlet API 2.5

You can search for servlet-api-2.5.jar, download it and take a look at the contents of the API. The jar file includes 3 packages: javax.servlet, javax.servlet.http and javax.servlet.resources. Within the scope of a single article, I will cover the major interfaces that developers usually use to build web applications.

Filter and Servlet

The two most important interfaces to handle HTTP request are Servlet and Filter.

When an HTTP request arrives at the web container from the internet, the container creates a ServletRequest object that contains the information of the request, together with a ServletResponse object. It passes both to the handler, which renders the HTTP response by writing to the response object. When the handler is an HttpServlet, the container supplies the HTTP-specific HttpServletRequest and HttpServletResponse.

The main motivation for splitting Filter and Servlet is to let the Servlet focus on business logic while Filters handle general concerns like logging or security. HttpServlet includes support for all the common HTTP methods, such as GET, POST, HEAD, PUT and DELETE (through doGet, doPost, doHead, doPut and doDelete). However, most people only implement GET and POST. This is not surprising, though, because HTML 4.0 and XHTML 1.0 forms only support GET and POST (which means a browser of that era, like IE7, cannot send other kinds of requests from a plain HTML form).
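The split between filters and servlets follows the chain-of-responsibility pattern. The plain-Java sketch below imitates it without the Servlet API: Handler, SimpleFilter and Chain are hypothetical stand-ins for Servlet, Filter and FilterChain, and a request is reduced to its path, so treat it as an illustration of the flow rather than real container code. Each filter can do work before and after calling the rest of the chain, or stop the request entirely.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical imitation of Filter/FilterChain/Servlet using plain Java types. */
final class FilterChainSketch {

    /** Plays the role of a servlet: business logic only. */
    interface Handler {
        String handle(String path);
    }

    /** Plays the role of a filter: a cross-cutting concern wrapped around the handler. */
    interface SimpleFilter {
        String doFilter(String path, Chain chain);
    }

    /** Calls the filters in order, then the handler. One Chain is used per request. */
    static final class Chain {
        private final List<SimpleFilter> filters;
        private final Handler handler;
        private int position = 0;

        Chain(List<SimpleFilter> filters, Handler handler) {
            this.filters = filters;
            this.handler = handler;
        }

        String proceed(String path) {
            if (position < filters.size()) {
                return filters.get(position++).doFilter(path, this);
            }
            return handler.handle(path);
        }
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        SimpleFilter logging = (path, chain) -> {
            log.add("before " + path);
            String result = chain.proceed(path);   // pass the request down the chain
            log.add("after " + path);
            return result;
        };
        SimpleFilter security = (path, chain) ->
                path.startsWith("/admin") ? "403 Forbidden" : chain.proceed(path);
        Handler servlet = path -> "200 OK: " + path;

        List<SimpleFilter> filters = List.of(logging, security);
        System.out.println(new Chain(filters, servlet).proceed("/twits")); // 200 OK: /twits
        System.out.println(new Chain(filters, servlet).proceed("/admin")); // 403 Forbidden
        System.out.println(log); // [before /twits, after /twits, before /admin, after /admin]
    }
}
```

Note that the servlet never sees the /admin request: the security filter answers it, and the logging filter still records it.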

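By default, the container uses a single servlet instance to serve all requests mapped to it, which is why your Servlet implementation must be thread-safe. The HttpServletRequest and HttpServletResponse objects are method parameters confined to the thread handling that request, so they are safe to use. HttpSession is different: concurrent requests from the same user can share it, so be careful with its attributes. If you choose to add fields to your servlet, those fields are shared by all requests and must be thread-safe as well.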
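Here is a small sketch of why fields on a servlet need care. It uses a thread pool to call one shared instance concurrently, roughly the way a container calls doGet() on a single servlet. SharedHandlerSketch, handle() and simulate() are hypothetical names; the point is that the shared counter is an AtomicInteger, while local variables need no protection.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical "servlet" whose single instance is shared by many request threads. */
final class SharedHandlerSketch {

    // A field is shared by every request thread. With a plain `int hits` and
    // `hits++`, concurrent increments could be lost; AtomicInteger avoids that.
    private final AtomicInteger hits = new AtomicInteger();

    // Called concurrently, like doGet() on the single servlet instance.
    String handle(String userId) {
        // Locals such as `greeting` live on the calling thread's stack, just like
        // the request/response parameters, so they need no locking.
        String greeting = "Hello " + userId;
        return greeting + ", visitor #" + hits.incrementAndGet();
    }

    int hits() {
        return hits.get();
    }

    /** Sends `requests` calls through `threads` threads to one shared instance. */
    static int simulate(int requests, int threads) {
        SharedHandlerSketch servlet = new SharedHandlerSketch(); // single shared instance
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < requests; i++) {
            String user = "user" + i;
            pool.submit(() -> servlet.handle(user));
        }
        pool.shutdown();
        try {
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return servlet.hits();
    }

    public static void main(String[] args) {
        System.out.println(simulate(10_000, 8)); // 10000 on every run
    }
}
```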

The container uses a pool of HTTP threads to serve requests. If the container runs out of threads, it holds incoming requests until a thread in the pool becomes available. For example, Tomcat's default maximum is 200 HTTP threads (the maxThreads setting), so by default Tomcat processes at most 200 requests concurrently.

As of Servlet API 2.5, an HTTP thread stays occupied until the servlet and filters finish processing the request. Even if the thread sleeps while waiting for some resource, it is still unavailable to other requests. Hence, if you let HTTP threads hang, you effectively reduce the throughput of the system.
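The effect of blocking HTTP threads can be reproduced with a plain fixed-size thread pool standing in for the container's pool. In this hypothetical sketch, each "request" sleeps as if waiting on a slow resource. Because a sleeping task still holds its thread, no more than poolSize tasks ever run at once, and the rest wait in the queue. The pool size and sleep time are arbitrary illustration values, not Tomcat settings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical simulation of blocking request handlers on a fixed thread pool. */
final class BlockingPoolSketch {

    /** Runs `requests` blocking tasks on `poolSize` threads and returns the peak concurrency. */
    static int peakConcurrency(int poolSize, int requests, long sleepMillis) {
        ExecutorService httpThreads = Executors.newFixedThreadPool(poolSize);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        List<Future<?>> futures = new ArrayList<>();
        for (int i = 0; i < requests; i++) {
            futures.add(httpThreads.submit(() -> {
                int now = running.incrementAndGet();
                peak.accumulateAndGet(now, Math::max);
                try {
                    Thread.sleep(sleepMillis); // e.g. waiting on a slow database
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    running.decrementAndGet();
                }
            }));
        }
        try {
            for (Future<?> f : futures) {
                f.get(); // wait for every "request" to finish
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            httpThreads.shutdown();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        int peak = peakConcurrency(2, 6, 100); // peak is at most 2
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        // 6 requests, 2 threads, 100 ms each: about 3 rounds, so roughly 300 ms.
        System.out.println("peak=" + peak + ", elapsed~" + elapsedMs + " ms");
    }
}
```

Servlet 3.0 addresses this with asynchronous processing, which lets a request give its thread back to the pool while it waits.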

Deployment Descriptor

Deployment Descriptor is the fancy name for web.xml. In Servlet 2.5, every Java web application must have the file WEB-INF/web.xml (Servlet 3.0 made it optional). The container reads this file to know how to load the webapp. There are two other optional folders, WEB-INF/lib and WEB-INF/classes. Any jar file dropped into WEB-INF/lib is added to the webapp classpath. Project source code is compiled and, together with resources, placed in the WEB-INF/classes folder. Hence, if you are worried about the deployment process, this is the place to check.

Here is a minimal deployment descriptor:

<web-app xmlns="http://java.sun.com/xml/ns/javaee"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://java.sun.com/xml/ns/javaee 
       http://java.sun.com/xml/ns/javaee/web-app_2_5.xsd"
       version="2.5">
  <display-name>Servlet 2.5 Web Application</display-name>
</web-app>

Each Servlet API version has its own template (schema) for the deployment descriptor, and it is important to get this right. The javax.servlet.resources package contains all the schema definitions for the Servlet API. You can use these schemas to validate the content of the deployment descriptor file. In real life, developers rarely need to deal with these steps, as popular IDEs include schema definitions for all versions of the Servlet API. If you use Eclipse, you can check this at Window/Preferences/XML/XML Catalog.

Normally, a standard deployment descriptor contains important information like declarations of filters, servlets, welcome files, context listeners, error pages, security constraints, context params... It is essential that you know all of these concepts very well, as they are the fundamentals of building Java web applications.

Frameworks

The Servlet API is pretty simple: it is easy to understand, but not so convenient to build applications on top of. For example, I once wrote this code for a very simple purpose:

String path = (request.getPathInfo()==null) ? "" : request.getPathInfo().replaceFirst("/", "");

This is just to handle URLs like:

localhost:8080/twits/

and

localhost:8080/twits

In the above example (assuming the servlet is mapped to /twits/*), request.getPathInfo() returns "/" for the first URL and null for the second, but for my application both URLs should be identical. So this line is simply boilerplate code that I need to add just to cope with the limits of the API. Another well-known limitation is how awkward it is to read request parameters and the request body.
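If you prefer to keep that boilerplate in one place, the one-liner can be pulled into a small helper. This is a hypothetical sketch (the name normalizePathInfo is made up for this example). It relies on the fact that a non-null path info always starts with "/", so stripping the leading slash gives the same result as replaceFirst("/", "").

```java
/** Hypothetical helper that treats "/twits" and "/twits/" the same way. */
final class PathInfoSketch {

    static String normalizePathInfo(String pathInfo) {
        if (pathInfo == null) {
            return "";                      // e.g. localhost:8080/twits
        }
        // A non-null path info starts with "/", e.g. "/" for localhost:8080/twits/
        return pathInfo.startsWith("/") ? pathInfo.substring(1) : pathInfo;
    }

    public static void main(String[] args) {
        System.out.println("[" + normalizePathInfo(null) + "]");     // []
        System.out.println("[" + normalizePathInfo("/") + "]");      // []
        System.out.println("[" + normalizePathInfo("/alice") + "]"); // [alice]
    }
}
```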

Because of this, developers soon started building frameworks on top of the Servlet API to cope with these limits. MVC frameworks typically depart from the plain Servlet style by letting a single servlet serve requests for all URLs. This servlet then invokes some services to serve each request.
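The single-servlet (front controller) idea can be sketched in a few lines of plain Java: one entry point receives every request and looks up a handler by path. The registry below, its method names and the "404 Not Found" string are hypothetical; real frameworks such as Spring MVC do the same job with much richer mapping rules.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical front controller: one dispatch method for every URL. */
final class FrontControllerSketch {

    // path -> handler that turns request parameters into a response body
    private final Map<String, Function<Map<String, String>, String>> routes = new HashMap<>();

    void register(String path, Function<Map<String, String>, String> handler) {
        routes.put(path, handler);
    }

    /** What the single framework servlet would do for every incoming request. */
    String dispatch(String path, Map<String, String> params) {
        Function<Map<String, String>, String> handler = routes.get(path);
        return handler == null ? "404 Not Found" : handler.apply(params);
    }

    public static void main(String[] args) {
        FrontControllerSketch controller = new FrontControllerSketch();
        controller.register("/twits", params -> "all twits");
        controller.register("/twit", params -> "twit of " + params.get("userId"));

        System.out.println(controller.dispatch("/twit", Map.of("userId", "alice"))); // twit of alice
        System.out.println(controller.dispatch("/nope", Map.of()));                  // 404 Not Found
    }
}
```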

By the time I graduated, the most popular framework in the market was Struts. It introduced a mapping concept: the framework automatically maps request parameters to a Form object, which is a JavaBean. The form is then passed as a parameter to the method that serves the request. This is a pretty cool idea.
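The form-binding idea boils down to matching request parameter names to JavaBean setters. The sketch below is not Struts code: it uses plain reflection, handles only String properties, skips parameters that have no matching setter, and TwitForm is a hypothetical bean.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.Map;

/** Hypothetical Struts-like binding of request parameters to a form bean. */
final class FormBindingSketch {

    /** Hypothetical form bean following the JavaBean getter/setter convention. */
    public static class TwitForm {
        private String userId;
        private String text;

        public String getUserId() { return userId; }
        public void setUserId(String userId) { this.userId = userId; }
        public String getText() { return text; }
        public void setText(String text) { this.text = text; }
    }

    /** Copies each request parameter into the matching String setter of the form. */
    static <T> T bind(Map<String, String> params, T form) {
        for (Map.Entry<String, String> entry : params.entrySet()) {
            String name = entry.getKey();
            if (name.isEmpty()) {
                continue;
            }
            // JavaBean convention: property "userId" has the setter "setUserId".
            String setterName = "set" + Character.toUpperCase(name.charAt(0)) + name.substring(1);
            try {
                Method setter = form.getClass().getMethod(setterName, String.class);
                setter.invoke(form, entry.getValue());
            } catch (NoSuchMethodException e) {
                // No matching property: this sketch simply ignores the parameter.
            } catch (IllegalAccessException | InvocationTargetException e) {
                throw new IllegalStateException("Cannot set property " + name, e);
            }
        }
        return form;
    }

    public static void main(String[] args) {
        TwitForm form = bind(Map.of("userId", "alice", "text", "Hello"), new TwitForm());
        System.out.println(form.getUserId() + ": " + form.getText()); // alice: Hello
    }
}
```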

Spring MVC simplifies things even further by letting you choose any method parameter names and types. It tries to match each method parameter with the request parameter of the same name and automatically assigns the value. If the request parameter name is different, you can override the mapping with an annotation.
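Here is a rough reflection-based sketch of that name-matching idea. The @Param annotation is a hypothetical stand-in for Spring's @RequestParam, and only String and int parameters are converted. Matching on the Java parameter name itself only works when the class is compiled with javac -parameters (otherwise the names are arg0, arg1, ...), so the example annotates its parameters.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.lang.reflect.Parameter;
import java.util.Map;

/** Hypothetical binding of request parameters to handler-method arguments. */
final class ArgumentBindingSketch {

    /** Hypothetical stand-in for Spring's @RequestParam: names the request parameter. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.PARAMETER)
    @interface Param {
        String value();
    }

    /** Hypothetical controller: request parameter "user" is bound to userId. */
    public static class TwitController {
        public String show(@Param("user") String userId, @Param("limit") int limit) {
            return userId + ":" + limit;
        }
    }

    /** Finds the named public method and calls it with arguments taken from params. */
    static Object invoke(Object controller, String methodName, Map<String, String> params) {
        for (Method m : controller.getClass().getMethods()) {
            if (!m.getName().equals(methodName)) {
                continue;
            }
            Parameter[] ps = m.getParameters();
            Object[] args = new Object[ps.length];
            for (int i = 0; i < ps.length; i++) {
                Param override = ps[i].getAnnotation(Param.class);
                // Without the annotation, fall back to the Java parameter name,
                // which is only the real name when compiled with javac -parameters.
                String name = (override != null) ? override.value() : ps[i].getName();
                args[i] = convert(params.get(name), ps[i].getType());
            }
            try {
                return m.invoke(controller, args);
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Cannot call " + methodName, e);
            }
        }
        throw new IllegalArgumentException("No method named " + methodName);
    }

    /** Converts a raw request value to the parameter type (only String and int here). */
    private static Object convert(String raw, Class<?> type) {
        if (type == String.class) {
            return raw;
        }
        if (type == int.class) {
            return (raw == null) ? 0 : Integer.parseInt(raw);
        }
        throw new IllegalArgumentException("Unsupported parameter type " + type);
    }

    public static void main(String[] args) {
        Object result = invoke(new TwitController(), "show", Map.of("user", "alice", "limit", "3"));
        System.out.println(result); // alice:3
    }
}
```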

Thursday 27 March 2014

Using Git

When it comes to software version/configuration management, there are a whole lot of vendor and open-source implementations to choose from, but in recent years none has been able to match Git in terms of being development/hacking friendly.

I’ve used quite a few different forms of software management tools, from the CVS/SVN family to the Clearcase/Perforce family (which I personally feel is absolutely horrible) but it is with Git that I finally think that Software Versioning is no longer a necessary evil but something that actually helps in the software development process. Perhaps, I will need to corroborate my statement with some examples later but using Git can actually encourage developers to experiment and be creative in their code, knowing that they can always reset back any code changes without any penalty or overheads.

I would not want to spend too much time talking about Git’s history (if you are interested, you can always read Wikipedia). If you are like me and were previously using CVS/SVN or Clearcase/Perforce to manage your software, I hope this article will improve your understanding of how Git works and how it could increase your productivity as well.

How a distributed version control system works

As the name suggests, a DVCS does not require a centralised server to be present for you to use it. It’s perfectly fine to use Git as a way to store a history of your changes in your local system and it’s able to do so efficiently and conveniently. However, as with all Software management tools, one of the main benefits is to be able to collaborate effectively in a team and manage changes made to a software repository. So how does Git (or any DVCS) allow you to work in a standalone manner and yet allow you to collaborate on the same codebase?

The answer is that Git stores a copy of the whole software repository on each local machine that contains the codebase. This might seem like a very inefficient and space-consuming method, but as a matter of fact, it is not a big issue if your files are mostly text (as most source code is), because Git stores them as highly compressed blobs. So when you use Git, you are actually working within your local environment, which means that apart from a few commands that require network communication, most commands are very responsive. When you “commit” code into a Git repository, you are not in a “collaborative” mode yet, as the commit is stored only in your local system. This concept is somewhat different from other VCS systems, where a “commit” actually puts your latest changes into a common repository where everyone can sync or access it. Atlassian’s site has a diagram that illustrates this concept nicely.

So, that being said, how do I actually share my code with the rest of my team? The wonderful thing about Git is that it allows you to define your workflow. Do you want to synchronize your code directly with your peers? Or would you prefer a traditional “centralized” model where everyone updates the “centralized” server with their code? The most common way is the latter, where each developer’s workstation synchronizes its codebase with this centralized repository. This centralized repo is also the authoritative copy of the repository, which all workstations should clone from and push to.

So, for the centralized model, everyone performs a “git push” to this central repo, which every workstation nominates as its “origin” for this software repo. Before a “git push” succeeds, Git needs to ensure that no one else has modified the base copy that you retrieved from this centralized server. Otherwise, Git requires you to perform a “git pull” to merge the changes made by others into your local repository (which might sometimes result in a conflict if someone changed the same files you did). Only after this is done will you be allowed to push the new merged commit to the server.

If you have not made any changes to your local copy and there is a new commit on the “centralized” server since your last synchronization, all you need to do is perform a “git pull”, and Git does something called a “fast-forward”, which essentially brings your local copy up to the latest code from the centralized server. If all this sounds rather convoluted, it is actually simpler than it sounds. I would recommend Scott Chacon’s Pro Git book, which clearly explains how Git works.
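The rules behind “push is rejected” and “pull fast-forwards” can be expressed as an ancestry check on the commit graph. The toy model below is a deliberate simplification: each commit has a single parent and a string id, whereas real Git commits are identified by content hashes and merge commits have several parents. The class and method names are hypothetical.

```java
/** Toy commit graph illustrating when Git can fast-forward. */
final class FastForwardSketch {

    /** A commit with a single parent (null for the very first commit). */
    static final class Commit {
        final String id;
        final Commit parent;

        Commit(String id, Commit parent) {
            this.id = id;
            this.parent = parent;
        }
    }

    /** True if `ancestor` can be reached from `descendant` by following parents. */
    static boolean isAncestor(Commit ancestor, Commit descendant) {
        for (Commit c = descendant; c != null; c = c.parent) {
            if (c == ancestor) {
                return true;
            }
        }
        return false;
    }

    /** Without --force, a push is accepted only if the remote head is already in your history. */
    static boolean pushIsFastForward(Commit remoteHead, Commit localHead) {
        return isAncestor(remoteHead, localHead);
    }

    /** A pull just moves your branch forward if you have no commits the remote lacks. */
    static boolean pullIsFastForward(Commit localHead, Commit remoteHead) {
        return isAncestor(localHead, remoteHead);
    }

    public static void main(String[] args) {
        Commit a = new Commit("A", null);   // both of us cloned at A
        Commit mine = new Commit("B", a);   // my local commit
        Commit theirs = new Commit("C", a); // a teammate pushed C first

        System.out.println(pushIsFastForward(a, mine));      // true: remote still at A
        System.out.println(pushIsFastForward(theirs, mine)); // false: pull and merge first
        System.out.println(pullIsFastForward(a, theirs));    // true: no local changes, move to C
    }
}
```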

So what if it’s DVCS?

When you use Git in your software development, you will start to realize that making experimental code changes is not as painful as it used to be with other tools. The main reason for that is the ability to do a “git stash” whenever you want, or a “git branch”, which, as the name implies, creates a new branch off your current working code. Branching in Git is an extremely cheap operation, and it allows you to define an experimental branch almost instantaneously without having to explain yourself to all your teammates who are working on the same codebase. This is because you choose what you want to push to the “centralized” Git repo. Also, whenever you need to check out a version of the software from history or from a remote, you can “git stash” your work into a temporary store in the Git repository and retrieve this stash later when you are done with that version.

Ever tried creating a branch in Clearcase or Perforce? I shudder even at the thought of doing it. SVN does it better but it is still a rather slow operation which requires plenty of network transfers. Once you have done branching using Git, you will never want to go back to your old VCS tool.

Wednesday 26 March 2014

Scrum

Originally, I asked a Scrum Master to help me write something about Scrum. However, he said it might not be worth it, because there are already too many documents and books out there and whatever he said would not be original. Thinking about what he said, maybe it is better to assume that readers can do their own research on Google to find out more about Scrum, and I will only do my part by sharing my personal experience with it.

Definition of Scrum

Let's start with the definition first. This is what Wikipedia says about Scrum:

"Scrum is an iterative and incremental Agile software development framework for managing software projects and product or application development."

From my understanding, Agile is a software development method that aims to develop software incrementally. Progress is reviewed regularly, and new features are planned based on the current situation. Scrum is one well-known Agile methodology that defines practices for doing Agile development effectively. The idea behind Agile and Scrum is embracing change rather than locking requirements.

Why we need to embrace change

We need to embrace change because it is a reality that we cannot avoid. To understand Agile, we should go back to the early days of software development. There is something special about those early days: they were a big mess.

Early day of software development

Developers of that time did not think the same way we do now. Professionalism meant writing lots of documents, defining requirements as precisely as possible and spending a lot of time preparing before the actual implementation. This is what we call the Waterfall model.

Waterfall was the norm of the industry in the past, and it is still popular nowadays. When I took my degree last decade, Waterfall was the only methodology taught in the Software Engineering course. Even after I graduated, job titles in the market reflected the Waterfall model, with Project Manager, Analyst, Designer and Programmer roles.

Waterfall is based on the assumption that if we spend a lot of time thinking, preparing and analysing, we can do it right when we need to do it. In theory, this idea is quite cool. In practice, it is still popular in Asia, especially in conservative industries like banking and finance.

Another point is the important role of the project manager. In the traditional development environment, the project manager is the most important person. They care about project progress more than anyone else and play the role of negotiating with the customer and pushing developers to deliver work. If anyone else needs to interact with the customer, he/she must be an analyst or designer, or at most an architect.

The limit and the solution

As most of us know, good ideas are not necessarily applied successfully in real-life situations. What happened was a high rate of failure in the software industry. There are many kinds of failure, like failing to deliver, budget overruns or, less seriously, delivering software that no one is interested in.

However, for anyone who has stayed in the industry for a decade, this should not be a big surprise. Here are some common reasons why it is so hard to develop software the right way.

1. Pushing does not work well in the software industry.

Unlike factory work, software development requires developers to put in effort and goodwill to deliver a high-quality product. There is no machine that can write code yet. The only source is still the human brain, and it does not work very well inside the head of an unhappy person.

But it is unfair to blame it all on the project manager. Deadlines and schedules are decided by people who are not closely involved in development, and sometimes by business requirements. You are very lucky if you get a chance to work in an environment without OT (overtime).

2. Customers do not know what they want

Steve Jobs knew it and we know it too:

“people don't know what they want until you show it to them.”

Locking requirements is good. It helps to put the blame on someone else, not us, for producing crappy software that no one wants to use.

Still, I feel pity for our customers. It is incredibly difficult to know what they really need before they start using it. Unfortunately, no one lets them do that, so they continue to produce all kinds of nonsensical and unrealistic requirements. To be fair, analysts help them create the mess too.

3. How people react to failure.

Ancient wisdom says that if you did not do it right, you may not have prepared well enough. So what will a smart person do? He spends more time preparing or, put another way, less time doing. Unfortunately, by doing that, the problem gets more and more severe.

So, what is the real problem here and how should we solve it?

I feel the biggest problem we face here is the challenge of predicting what we need to build. I recall my personal experience of playing paintball not too long ago. To summarize, it was a disaster. To elaborate further, I found it hard to hide and shoot at someone at the same time. After enjoying tons of bullets hitting me, the pros told me that I was shooting the wrong way. They said not to fire single bullets, and not to waste all my bullets looking like Rambo either.

The key point is to fire 2 bullets at a time. Why?

Because the bullets fly fast, and you need 2 bullets to have enough time to see where they land. Based on that information, you adjust the direction of the gun and aim a more precise shot.

That is the point: you shoot first, get initial feedback and improve from there. This is how things should be done in software development. You build something first, see how it looks and gradually improve on it.

Creating a wonderful application is no easier than learning how to ride a bicycle. The best approach is to sit on it, ride it, fall down, stand up and sit on it again. For the sake of projects and people, not locking project requirements, constantly reviewing what you have built and adding new requirements is the formula for success.

How to apply Scrum

There are 3 roles in Scrum: the Product Owner, who represents the customer's voice; the development team; and the Scrum Master. In practice, the Product Owner role is played by the traditional Project Manager. A dedicated Scrum Master virtually does not exist; most of the time, a developer or manager plays the role of Scrum Master. Here are the activities that a Scrum team needs to apply:

Iteration/Sprint

To constantly review progress and define new work, a Scrum team divides the project schedule into iterations. Commonly, the iteration length is two weeks. This is the sweet spot: if it is shorter than 2 weeks, the management overhead is too high; if it is longer, the team cannot adapt to change fast enough.

Iteration Planning

At the beginning of each iteration, the team spends time on iteration planning. Iteration planning normally includes tasking and estimating the stories scheduled for the current iteration.

This is the fun part, where we play a poker-style estimation game. Each developer has a stack of cards, and each card shows an effort estimate. Thanks to someone's brilliant idea, the numbers on the cards are not arbitrary; they follow the Fibonacci sequence. For each story, developers show their cards at the same time to voice their view of how much effort the story needs. After debating and voting, the team commits to the story with the estimated effort.

Responsible teams even plan work for the iterations ahead. It cannot be too far ahead or too precise, but it helps us get a quick view of what is going on. They call it T+1 or T+2 planning (meaning 1 or 2 iterations ahead).

Daily stand-up meeting

Scrum emphasizes information sharing and collaboration. It requires the team to meet up regularly to share difficulties, solutions and progress. In this super short daily meeting, each team member takes turns to answer the questions below:

What did I accomplish yesterday?
What will I do today?
What obstacles are impeding my progress?

A daily stand-up makes sense when the team is small enough that people know what other people are doing. It is better to time-box it (normally to 15 minutes).

Retrospective

A retrospective meeting normally happens at the end of an iteration. It is considered less important, hence many teams choose to skip it or do it less often. Still, retrospectives play a great role in generating insight and discussing how to improve performance. There are many ways to run a retrospective, but you need to stick to its objective, which is to discuss what happened in the last iteration and suggest solutions to overcome issues.

Backlog

The backlog is where the Product Owner expresses what he wants to achieve. Basically, it is a collection of user stories. Developers help to estimate each story and note down the dependencies among stories. After that, it is the Product Owner's role to plan stories from the backlog into the next iteration. They should fit the team's capacity nicely.

The benefit of Scrum

Normally, life in a Scrum team is pretty balanced: no rush, no last-minute surprises and no cutting corners to deliver work. It is true that, due to business requirements, a product needs to be launched on time, but constantly reviewing progress gives the Product Owner early information and more room to manoeuvre when facing a blocker.

It also helps to build a better product thanks to quick feedback, and it helps to create a more realistic schedule.

I am a guy who believes in people more than in process, but if the process is wrong, life is more painful than it should be. So, for the sake of the developer world, spread these ideas so that we suffer less and the quality of software improves.

Saturday 22 March 2014

Java Tutorial 3 - Simple Servlet and Remote Repository

On the third day of your office life, you will create a simple servlet and a remote repository. SVN and Git are both popular in the developer world, but we will use Git in this tutorial as it is a distributed version control system.

Prerequisite

Completed Java Tutorial Part 2.

Setup Git Repository

1. Create GitHub account

GitHub provides free public Git repositories. If you are a developer and do not have a GitHub account yet, go to the GitHub website and create one.

I am going to put all the material related to this blog in

https://github.com/tuanngda/sgdev-blog.git

If you are using Linux or Mac, you may already have Git as part of the OS installation. If you use Windows, follow the instructions on the website and install Git Bash.

2. Create Git Repo for project

Git does not actually require your local folder to have the same name as the repository, but keeping the names identical avoids confusion. As the repository I created before is named sgdev-blog, I renamed my workspace to the same name so that the workspace maps directly onto the GitHub repository.

I did that because I want to share all projects in a single repository. If you choose to create one repository for each of your projects, this is not necessary. However, if you choose to do it the same way, kindly rename your workspace to match your repository name.

Use the console in Linux/Mac or Git Bash in Windows, and go to java/{workspaceName}.

Type the following commands:

$ git init
--> This command creates a local repository in the current folder. It creates a .git folder that contains this local repository. As our workspace is going to contain several projects, creating the repository in the workspace folder lets us share a single repository across several projects, which is good.

$ git status
--> This command shows the current status of the projects: which files/folders have not been committed yet.

$ git add sample_webapp
--> This command adds everything under the sample_webapp folder to the staging area of the repository. Adding project settings files to the repository is actually a very bad practice, as they are environment-specific; we will stop doing it after learning Maven.

$ git commit -m 'First Commit'
--> The git add command moved the files to the staged status, which indicates that these files are going to be committed. The git commit command commits all the staged files with a message.

$ git remote add origin https://github.com/tuanngda/sgdev-blog.git
$ git push origin master
--> This is the trickiest part of a distributed repository. You have a local repository, stored inside the .git folder. You also have a remote repository, stored on GitHub. You commit to your local repository and then sync your local repository to the remote repository. This sounds troublesome, but it lets you continue your work when GitHub is down. Your local repository is identical to the remote repository, so if needed you can simply find another place to use as the remote repository.

You should see your webapp in the GitHub repository after completing these steps.

Create Simple Servlet

At the end of the earlier tutorial, we created and deployed a webapp to Tomcat. Sadly, this webapp has only one HTML page, which does not justify why we need Java. Now we will make it smarter by adding a simple servlet to it.

1. Goals

As a boring developer, I want to create a local, minimal Twitter-like application so that users can post and view their current twits.

The sentence above is a user story. Writing tasks in the form of user stories is a very popular practice in the IT world today. The format of a user story is:

As [role], I want [goals] so that [benefit]

It is very concise, but it helps to identify 3 concerns:

Who will benefit from the task
What the user thinks needs to be developed
What benefit the user wants to get

In the old days, developers did what they were told, which means they only cared about the goal but not the benefit. As time went on, people realized that this is not so effective, as users sometimes do not know the best strategies to achieve what they really want. That is why modern development requires developers to get actively involved and consult users on whether they are building the right thing.

2. Acceptance Criteria

Together with the goals, in the planning meeting, developers provide estimates and finalize the acceptance criteria. Acceptance criteria are what will be tested to decide whether you have completed the story.

Assumptions:

No authentication required.
Each user has exactly one twit.
Each twit is uploaded with a unique user id.
Deleting twits is not allowed. Updates are achieved by posting a new twit with the same user id.
The application runs locally only; there is no clustering support.

Acceptance Criteria:

Provide a REST API for any user to view all twits, view a specific user's twit and upload twits.
Return HTTP status 404 if the user id or twit is not found.

Tasks:

Build a simple bean that contains all the twits.
Create a Servlet that serves the bean content.

Estimation: 0.5 man day.

3. Create your service.

Create the base package com.blogspot.sgdev_blog. By convention, the base package is the reversed internet domain name; the hyphen in sgdev-blog becomes an underscore because hyphens are not allowed in Java package names.

Create an interface for the service that we need to implement:

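package com.blogspot.sgdev_blog.service;

import java.util.Collection;

import com.blogspot.sgdev_blog.exception.RecordNotFoundException;

public interface TwitService {
 
 // Interface fields are implicitly public, static and final
 public static int MAX_LENGTH = 160;
 
 // Returns the twit of the given user, or throws if the user has none
 public String getTwit(String userId) throws RecordNotFoundException;
 
 // Stores (or replaces) the twit of the given user
 public void insertTwit(String userId, String text);
 
 // Returns the twits of all users
 public Collection<String> getAllTwits();

}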
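The interface imports RecordNotFoundException, which the article does not show. A minimal version might look like the sketch below; making it a checked exception is an assumption based on the throws clause of getTwit. In the project, declare it public and put it in the com.blogspot.sgdev_blog.exception package, as the import expects (both are omitted here so the snippet compiles on its own).

```java
/**
 * Sketch of the exception thrown when no twit exists for a user.
 * Assumption: a checked exception, because getTwit declares it with `throws`.
 */
class RecordNotFoundException extends Exception {

    private static final long serialVersionUID = 1L;

    RecordNotFoundException(String message) {
        super(message);
    }
}
```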

People still argue about whether we need an interface for each concrete class when that class happens to be the only implementation. Supporters say an interface serves as a contract: it allows you to focus on defining how a service is used first and implement it later. Declaring a service by its interface rather than its concrete class also makes future changes easier. While it takes time to build your own style, here I will create an interface for every class.

I will implement it this way:

package com.blogspot.sgdev_blog.service.impl;

import java.util.Collection;
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import com.blogspot.sgdev_blog.exception.RecordNotFoundException;
import com.blogspot.sgdev_blog.service.TwitService;

public class TwitServiceImpl implements TwitService {
 
 // userId -> twit; ConcurrentHashMap because one servlet instance serves many threads
 private final Map<String, String> twitMap = new ConcurrentHashMap<>();

 @Override
 public String getTwit(String userId) throws RecordNotFoundException {
  
  // ConcurrentHashMap rejects null keys, so guard against a null userId
  String twit = (userId == null) ? null : twitMap.get(userId); 
  
  if (twit == null){
   throw new RecordNotFoundException("No twit found for user "+userId);
  }
  
  return twit;
 }

 @Override
 public void insertTwit(String userId, String text) {
  
  if (userId==null || userId.length()==0 
                   || text==null || text.length()==0 || text.length()>MAX_LENGTH){
   throw new IllegalArgumentException("Invalid twit!");
  }
  
  // A new twit with the same user id replaces the old one
  twitMap.put(userId, text);
 }

 @Override
 public Collection<String> getAllTwits() {
  
  // Read-only view, so callers cannot modify the map through it
  return Collections.unmodifiableCollection(twitMap.values());
 }
}

There is nothing fancy in the code above: no clustering support, no authorization and no persistence. Still, it achieves the goal of the story.

4. Create a servlet

If you do not know about REST yet, study it. We are not going to create a truly RESTful API here, because a twit does not have an ID. We will, however, borrow the REST idea of using GET to retrieve records and POST to upload or update records.

http://www.restapitutorial.com/lessons/whatisrest.html

A RESTful API can be seen as part of a major shift in the software design paradigm: convention over configuration. In the early days of Java programming, developing software properly meant writing as much documentation as possible, adding as many comments as you could and making APIs as specific as possible. After a while, people started to realise two things. Developers were spending more time writing documents than writing code, and creating all kinds of contracts everywhere was too tedious.

Eventually, everyone felt that software should be developed in a smarter, less painful way. Over the last decade, most frameworks have shifted to sensible default configuration, which lets developers adjust configuration rather than create it from scratch. JSON is less strict than XML, but it is shorter and easier to extend. Moreover, developers no longer need to create the bulky remote and local interfaces. With a RESTful API, you do not need to hand over a document telling people how to use your API. You simply tell them that you will develop a REST API and hope that they are smart enough to figure out what your API looks like.

Professional developers know their IDE well. For example, this is how you create a Servlet in Eclipse:

Depending on your active Perspective, Eclipse gives you different options when you click the New button. That is why I switch between the Java EE and Java perspectives when I move from writing the service to writing the front end.

After choosing the package and the class name for your Servlet, continue to the URL mapping page.

Continue until the end. Eclipse will give you an empty TwitServlet inside the package you chose and add an entry to your web.xml.

TwitServlet

package com.blogspot.sgdev_blog;

import java.io.IOException;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

/**
 * Servlet implementation class TwitServlet
 */
public class TwitServlet extends HttpServlet {
    private static final long serialVersionUID = 1L;

    public TwitServlet() {
        super();
        // TODO Auto-generated constructor stub
    }

    protected void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
        // TODO Auto-generated method stub
    }

    protected void doPost(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
        // TODO Auto-generated method stub
    }
}

web.xml

<servlet>
<description></description>
<display-name>TwitServlet</display-name>
<servlet-name>TwitServlet</servlet-name>
<servlet-class>com.blogspot.sgdev_blog.TwitServlet</servlet-class>
</servlet>
<servlet-mapping>
<servlet-name>TwitServlet</servlet-name>
<url-pattern>/twits</url-pattern>
</servlet-mapping>

Simple, but good enough. We have a servlet, defined by servlet-class, and the url-pattern that this servlet serves. Even better, our stupid servlet already has methods for handling GET and POST requests.

5. Integrate this servlet with your service

Now we have a servlet and a service, but they are totally unrelated. Let's integrate them.

In an earlier step, Eclipse gave us an exact mapping, which does not fit our needs. We will change it to a path mapping, so that the servlet also receives URLs like /twits/{userId}:

<servlet-mapping>
<servlet-name>TwitServlet</servlet-name>
<url-pattern>/twits/*</url-pattern>
</servlet-mapping>
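The following plain-Java sketch shows what this mapping means for the servlet. It mimics how a container splits a request URI into servlet path and path info for a single path mapping such as /twits/*. This is a simplification for illustration, not Tomcat's matching code. The context path /twitter and the class PathMappingDemo are hypothetical. Under the servlet path-mapping rules, /twits itself also matches, with a null path info. That is the case TwitServlet treats as "all twits".

```java
// Hypothetical helper that mimics, for one "/prefix/*" path mapping,
// how a container splits a request URI into servletPath and pathInfo.
final class PathMappingDemo {

    /** Returns {servletPath, pathInfo}, or null if the URI does not match. pathInfo may be null. */
    static String[] match(String contextPath, String prefix, String requestUri) {
        if (!requestUri.startsWith(contextPath)) {
            return null;
        }
        String path = requestUri.substring(contextPath.length());
        if (path.equals(prefix)) {
            return new String[] {prefix, null};                             // "/twits" -> pathInfo null
        }
        if (path.startsWith(prefix + "/")) {
            return new String[] {prefix, path.substring(prefix.length())};  // "/twits/bob" -> "/bob"
        }
        return null;                                                        // e.g. "/twitsX" does not match
    }

    /** Same normalisation as TwitServlet: drop the leading "/" from pathInfo. */
    static String userIdFrom(String pathInfo) {
        return pathInfo == null ? "" : pathInfo.replaceFirst("/", "");
    }
}
```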

Next, we need to add the Gson library to our project to convert complex data types to JSON. Simply download gson-2.2.4.jar and drop it into the WEB-INF/lib folder. The container adds the JARs in this folder to the webapp class loader.

Use TwitService to serve content:

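package com.blogspot.sgdev_blog;

import java.io.IOException;
import java.io.InputStream;
import java.util.Collection;
import java.util.Scanner;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import com.blogspot.sgdev_blog.exception.RecordNotFoundException;
import com.blogspot.sgdev_blog.service.TwitService;
import com.blogspot.sgdev_blog.service.impl.TwitServiceImpl;
import com.google.gson.Gson;

public class TwitServlet extends HttpServlet {

    private static final long serialVersionUID = 1L;

    // The container creates one instance of this servlet, so both fields are shared by all request threads.
    private final TwitService twitService;

    private final Gson gson = new Gson();

    public TwitServlet() {
        super();
        twitService = new TwitServiceImpl();
    }

    // GET /twits          -> all twits as a JSON array
    // GET /twits/{userId} -> the twit of that user, or 404
    @Override
    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        response.setCharacterEncoding("UTF-8");
        String responseBody;
        String userId = userIdFromPath(request);

        if (userId.isEmpty()) {
            Collection<String> twits = twitService.getAllTwits();
            response.setContentType("application/json");
            responseBody = gson.toJson(twits);
        } else {
            try {
                responseBody = twitService.getTwit(userId);
            } catch (RecordNotFoundException e) {
                responseBody = e.getMessage();
                response.setStatus(HttpServletResponse.SC_NOT_FOUND);
            }
        }

        response.getWriter().write(responseBody);
    }

    // POST /twits/{userId} with the twit text as the request body
    @Override
    protected void doPost(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        String responseBody;
        String userId = userIdFromPath(request);

        if (userId.isEmpty()) {
            // 400 Bad Request: the URL does not say whose twit this is (401 would mean "unauthorized")
            responseBody = "Invalid usage";
            response.setStatus(HttpServletResponse.SC_BAD_REQUEST);
        } else {
            String twit = convertStreamToString(request.getInputStream());
            try {
                twitService.insertTwit(userId, twit);
                responseBody = "Twit inserted";
            } catch (IllegalArgumentException e) {
                // empty or too long twit
                responseBody = e.getMessage();
                response.setStatus(HttpServletResponse.SC_BAD_REQUEST);
            }
        }

        response.getWriter().write(responseBody);
    }

    // "/bob" -> "bob"; null (a request to /twits itself) -> ""
    private static String userIdFromPath(HttpServletRequest request) {
        String pathInfo = request.getPathInfo();
        return (pathInfo == null) ? "" : pathInfo.replaceFirst("/", "");
    }

    // "\\A" matches only the start of the input, so the single token is the whole body.
    // Assumes the client sends UTF-8.
    static String convertStreamToString(InputStream is) {
        Scanner s = new Scanner(is, "UTF-8").useDelimiter("\\A");
        return s.hasNext() ? s.next() : "";
    }
}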
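The convertStreamToString helper relies on a Scanner trick. The delimiter \A matches only at the beginning of the input, so the first token is the whole stream. The sketch below compares that trick with a plain buffered copy. Both methods take the charset explicitly, because relying on the platform default can garble non-ASCII twits. StreamUtil is a hypothetical name.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.util.Scanner;

final class StreamUtil {

    // The Scanner trick: "\\A" only matches at the start of input,
    // so the first (and only) token is everything in the stream.
    static String viaScanner(InputStream is, Charset charset) {
        Scanner s = new Scanner(is, charset.name()).useDelimiter("\\A");
        return s.hasNext() ? s.next() : "";
    }

    // Plain buffered copy: read all bytes, then decode once with the given charset.
    static String viaBuffer(InputStream is, Charset charset) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int n;
        while ((n = is.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return new String(out.toByteArray(), charset);
    }
}
```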

6. Test the webapp with Rest client

Let's test this application with a REST client. Most of us have Chrome installed, so let's use Postman. Install the Postman plugin for Chrome and open it.

Use it to send some twits to our webapp.

Discussion & Thinking

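The service and Gson objects can be shared because they are thread-safe. Gson instances are thread-safe by design. TwitServiceImpl is thread-safe as well, because its only shared state, twitMap, is a ConcurrentHashMap. A ConcurrentHashMap behaves like a normal map, but each individual operation (get, put, values) is safe to call from many threads at once. It achieves this with fine-grained locking and lock-free reads rather than by synchronizing every call, so it scales better than a fully synchronized map.
The twitService field does not need to be static. By specification, the container creates only one instance per servlet declaration (unless the servlet implements the deprecated SingleThreadModel). That means every request to this URL is served by the same servlet instance, on different threads.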
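A small stress test is a quick way to convince yourself that concurrent writes through a shared ConcurrentHashMap are safe. Many threads write to one map, as concurrent POST requests would when a single servlet instance shares one TwitServiceImpl. This is a plain-Java sketch, not a servlet test. The thread and user counts are arbitrary, and it uses Java 8 lambdas for brevity. Only individual operations are atomic. A hand-written get-then-put sequence is not, which is why methods such as putIfAbsent exist.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class ConcurrentInsertDemo {

    // Each thread inserts its own set of users into one shared map.
    static Map<String, String> insertConcurrently(int threads, int usersPerThread)
            throws InterruptedException {
        final Map<String, String> twitMap = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final int threadId = t;
            pool.execute(() -> {
                for (int u = 0; u < usersPerThread; u++) {
                    twitMap.put("user-" + threadId + "-" + u, "hello from thread " + threadId);
                }
            });
        }
        pool.shutdown();                              // no new tasks
        pool.awaitTermination(10, TimeUnit.SECONDS);  // wait for all writers to finish
        return twitMap;
    }
}
```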

Tuesday 18 March 2014

Java Tutorial 2 - Setup and Deploy Web Project

On the second day, the objective is to create and deploy a simple web project. There is not much explanation in the first few articles. The aim is to let learners get used to the popular tools for developing software. As the tutorial progresses, learners will discover faster ways of getting things done and gain more understanding of how things work.

Prerequisite

Completed Java Tutorial Part 1.

Create Simple Web Project

1. Introducing Eclipse interface

Provided that you have downloaded the proper version of Eclipse, start it up. You should then have the right tools to develop a web project.

This is how your Eclipse screen may look:

The screen is normally divided as above: the project explorer on the left, the editor in the middle, the Outline on the right, and the Console and other miscellaneous views at the bottom. The usage is quite natural: select a file in the explorer, edit it in the editor and view the result in the Console.

Each box you see is one View. The layout of Views is called a Perspective. In the top right corner of the screen, you should be able to see some default perspectives like Java EE and Java. Choosing the right Perspective temporarily customizes Eclipse to your needs.

For example, compare the Java and Java EE perspectives. The Java perspective gives you shortcuts to create a new class or interface: clicking the New button shows a menu for creating a new Class, Interface, Package, ... In contrast, the Java EE New button shows a menu for creating a Servlet, Dynamic Web Project, ... It also automatically includes the Servers view at the bottom. Depending on your needs, you will switch among different perspectives.

If you are not happy with the current Perspective, select Window -> Show View to add whatever View you need. Views can also be dragged and dropped to any part of the screen.

2. Register Jdk with Eclipse

In the earlier tutorial, we installed the JDK to "java/jdk1.7.0_51". Let's tell Eclipse about it. Select Window -> Preferences.

Navigate through the left panel to Java -> Installed JREs. As the screenshot above shows, I have already registered some JREs with my Eclipse. If you have followed this tutorial from the beginning, you should only have the default jre7 from the system. This is not what we want to develop with, because the JRE included in the JDK installation gives us better tooling support and the JDK source code. Therefore, click Add and point it to the JDK folder "java/jdk1.7.0_51".

Tick the check box so that Eclipse uses this JRE as the default for future projects.

3. Register Tomcat with Eclipse

Open Preferences again. This time, navigate to Server -> Runtime Environments.

Click Add to register a new server runtime and choose the Apache Tomcat 7 runtime.

Continue by filling in the Tomcat and JDK folders that we prepared in the last tutorial.

Too bad, this is just the runtime, not the server itself. The next step is to create a new server using this runtime. You can use the New button mentioned above, but it requires more mouse clicks because New Server is not in the default menu. The faster way is to open the Servers view and right-click to create a new Server.

Repeat the stupid step of selecting the server type.

Remember to select the server type first, because the Server runtime environment combo box only shows runtimes compatible with the selected server type. Click Finish to complete the server creation.

4. Create first webapp

In this tutorial, you will simply use Eclipse to create a new webapp. Click the New button and select Dynamic Web Project.

We are not going to use any fancy features of Servlet API 3.0 yet, so start with module version 2.5. Select the target runtime we created earlier and choose any name you like for the webapp.

You should see the project in the explorer. However, the web project created by Eclipse does not include a welcome page, so we are going to create one.

In the picture above, I used the Package Explorer to view the project. The UI looks slightly different if you use the Project Explorer. We set the runtime to Tomcat 7 when creating the project, and the default JRE is jdk1.7.0_51, so both libraries are added to the project by default. Eclipse stores all web content in the WebContent folder. We are going to add the welcome file index.html there. Select New -> Web -> HTML File, name the file index.html and put it in the WebContent folder. Put whatever content you like in the index file; I simply chose "Hello World".

5. Deploy this webapp to Tomcat

Let's deploy this webapp to Tomcat. Right-click the Tomcat server and choose "Add and Remove...". After adding sample_webapp, click Finish.

The last step is to right-click the server and choose Start. By default, Tomcat binds to port 8080. This is what is printed in the Console:

INFO: Starting Servlet Engine: Apache Tomcat/7.0.52
Mar 19, 2014 1:31:22 AM org.apache.coyote.AbstractProtocol start
INFO: Starting ProtocolHandler ["http-bio-8080"]
Mar 19, 2014 1:31:22 AM org.apache.coyote.AbstractProtocol start
INFO: Starting ProtocolHandler ["ajp-bio-8009"]
Mar 19, 2014 1:31:22 AM org.apache.catalina.startup.Catalina start
INFO: Server startup in 374 ms

The console confirms that Tomcat started on port 8080. Open your browser and enter this address:

http://localhost:8080/sample_webapp

index.html is the welcome file. It is served when you do not specify anything after the webapp name.

Congratulations!

This tutorial is already too long, so I will not torture you with any more theory. Congratulations if you had the patience to complete it.

Saturday 15 March 2014

Java Tutorial - Part 1

This article simulates what you will go through on the first day of a job. The ultimate goal of the first day is to set up your working environment.

Prerequisite

2 hands, 10 fingers
2 eyes
1 computer
1 functional brain.

Setup environment

1. Download Java

Create a folder named "java" to store all your development stuff. On my computer, it is E:\java. If you use Linux or Mac, you can set it to "~/java".

Download the latest JDK version (currently 1.7.0_51) and install it to "java/jdk1.7.0_51". Avoid installing it to "C:\Program Files", because that folder name contains a space. Keep the JDK version number in the folder name, because we will probably need multiple JDK versions in the future.

Add "jdk1.7.0_51\bin" to the system path. After this step, open a command line and type "java -version" to check that the system can find the proper Java environment.

Create a new system variable JAVA_HOME pointing to the JDK folder set up in the earlier step.

2. Download Maven

Go to http://maven.apache.org/download.cgi and follow the instructions to install Maven. As above, do not put Maven in your Program Files folder. I recommend putting it in "java\apache-maven-3.2.1".

3. Download Tomcat

Download Tomcat 7 from http://tomcat.apache.org/download-70.cgi, choose the binary, core package. Unpack Tomcat to java folder.

4. Download IDE

There are 3 well-known IDEs, and all of them are perfectly fit for Java development: Eclipse, NetBeans and IntelliJ. If you have never heard of them, I would say Eclipse is like Linux, while NetBeans and IntelliJ are like Windows. Eclipse is developed by the community; it is difficult to configure but very customizable. This tutorial will use Eclipse as the default IDE.

Go to http://www.eclipse.org/downloads/ and download "Eclipse IDE for Java EE Developers". Extract it to java\eclipse4.3.

Create folder "java/workspace". Start Eclipse, it will ask for workspace folder. Please point to the folder we just created.

Discussion & Thinking

Many popular tools and applications use JAVA_HOME to find the JRE or JDK that launches a Java process. This practice is preferred over simply calling java from the command line. It lets you keep a JRE on the system path for general use while development tools use the JDK. If you need different JDK versions for different processes, setting JAVA_HOME for each console is easier than modifying the system path.

Take a look at mvn.bat or mvn in the "java/apache-maven-3.2.1/bin" folder. Maven uses the JDK/JRE specified by JAVA_HOME.
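The convention described above fits in a few lines. Use $JAVA_HOME/bin/java when JAVA_HOME is set; otherwise fall back to the java found on the system path. This is a simplified illustration, not a port of mvn.bat or mvn. JavaLauncher and its methods are hypothetical.

```java
import java.util.Map;

final class JavaLauncher {

    // Prefer $JAVA_HOME/bin/java; otherwise rely on "java" being on the PATH.
    static String javaCommand(Map<String, String> env, boolean windows) {
        String exe = windows ? "java.exe" : "java";
        String javaHome = env.get("JAVA_HOME");
        if (javaHome == null || javaHome.isEmpty()) {
            return exe;                          // resolved through the system PATH
        }
        String sep = windows ? "\\" : "/";
        return javaHome + sep + "bin" + sep + exe;
    }

    // The same decision for the current process and operating system.
    static String javaCommand() {
        boolean windows = System.getProperty("os.name").toLowerCase().startsWith("windows");
        return javaCommand(System.getenv(), windows);
    }
}
```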

Take a look at the file eclipse.ini in the eclipse folder. This is the configuration file for Eclipse. The default memory setting for Eclipse is a bit low. You can set higher values if your computer has more RAM and you are going to work on more than one project. This is my setting:

-Xms128m

-Xmx512m

-XX:MaxPermSize=256m

Open Eclipse and navigate to Help -> About Eclipse -> Installation Details -> Configuration. Verify that your memory configuration from the earlier step really took effect.

Other References

https://examples.javacodegeeks.com/java-tutorial-for-beginners/

Wednesday 12 March 2014

Caching - part 3

In the last part of this Caching series, I would like to discuss Virtual DB, the technique we use to build effective caching for the database.

Caching on Database Tier

Caching on the database tier shares most of its attributes with caching on the business logic tier. Indeed, if you treat the DAO as a service, most of the techniques from the earlier article can be applied.

However, relational databases have one unique characteristic. Because of it, caching on the database tier has one special technique that can only be applied on this tier: Virtual DB.

What is Virtual DB?

To understand Virtual DB, let's take a look at the example below:

Imagine we have a reporting table that contains the lowest and highest temperature for every day since the beginning of this century. The system we build must allow users to enter a start date and an end date and must return the lowest and highest temperatures for that period.

With this requirement, active caching is unrealistic due to the huge number of possible queries. Passive caching is also ineffective unless, for some reason, users keep querying the same period.

The table itself is small. We only have 14 years since the beginning of this century, and for each day we only need to store 2 temperatures. It is the combinations of start and end dates that explode the cache size. Moreover, the data stored in the cache would be highly repetitive. For example, suppose we already store the lowest temperature of 2012 and the lowest of 2013. Then the lowest temperature from 2012 to 2013 is known automatically without accessing the DB.
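The 2012/2013 example can be made concrete. Once the lowest temperature of each whole year is held in memory, the lowest temperature over any run of whole years is just the minimum of the cached yearly values. Here is a sketch with a hypothetical YearlyMinCache class and made-up numbers:

```java
import java.util.HashMap;
import java.util.Map;

final class YearlyMinCache {

    // Pre-computed lowest temperature per year, as a Virtual DB might hold it.
    private final Map<Integer, Double> lowestByYear = new HashMap<>();

    void putYear(int year, double lowest) {
        lowestByYear.put(year, lowest);
    }

    // The lowest temperature over whole years fromYear..toYear is the minimum
    // of the cached yearly minima, so no database access is needed.
    double lowestBetween(int fromYear, int toYear) {
        double result = Double.POSITIVE_INFINITY;
        for (int y = fromYear; y <= toYear; y++) {
            Double v = lowestByYear.get(y);
            if (v == null) {
                throw new IllegalStateException("Year " + y + " is not cached");
            }
            result = Math.min(result, v);
        }
        return result;
    }
}
```

Usage: after putYear(2012, 21.4) and putYear(2013, 22.0), lowestBetween(2012, 2013) returns 21.4 without touching the database.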

Because of this, it is wise to build the data model inside the caching engine and apply business logic to minimize the cache size. I simply call it Virtual DB, as it functions as a minimized database in memory. It is not realistic to store the whole DB in memory. A database is built for scalability, whereas a cache does not need that; it only needs performance. Because of these different goals, we only need to store the most frequently accessed records in the Virtual DB. It is the developer's responsibility to decide which queries can run on the Virtual DB and which cannot.

Going back to the earlier example, daily temperatures do not change after being recorded. Hence, it is safe to store them in the Virtual DB without worrying about frequent refreshes. Still, the developer can poll to update the Virtual DB when there are new records. The polling should be smart enough to fetch only the additional records rather than reload the whole table.
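Here is a minimal in-memory sketch of this Virtual DB idea, assuming the daily rows fit in memory. A sorted map keyed by date answers any start/end range through subMap. lastLoadedDay() tells the poller where to resume, so it fetches only rows newer than that date. TemperatureVirtualDb and the sample readings are hypothetical.

```java
import java.time.LocalDate;
import java.util.NavigableMap;
import java.util.TreeMap;

final class TemperatureVirtualDb {

    // Only the two columns we query; humidity, rain, etc. are stripped off.
    static final class Reading {
        final double low;
        final double high;

        Reading(double low, double high) {
            this.low = low;
            this.high = high;
        }
    }

    private final NavigableMap<LocalDate, Reading> byDay = new TreeMap<>();

    /** Newest cached day (null if empty), so polling only asks the real DB for newer rows. */
    LocalDate lastLoadedDay() {
        return byDay.isEmpty() ? null : byDay.lastKey();
    }

    /** Add one row, e.g. from an incremental poll or the initial load. */
    void load(LocalDate day, double low, double high) {
        byDay.put(day, new Reading(low, high));
    }

    /** {lowest, highest} between start and end inclusive; infinities if no rows match. */
    double[] lowHigh(LocalDate start, LocalDate end) {
        double low = Double.POSITIVE_INFINITY;
        double high = Double.NEGATIVE_INFINITY;
        for (Reading r : byDay.subMap(start, true, end, true).values()) {
            low = Math.min(low, r.low);
            high = Math.max(high, r.high);
        }
        return new double[] {low, high};
    }
}
```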

It is not necessary to build the full data model. The real DB may record humidity, or whether there was a storm or rain that day, but we can strip off the data we are not interested in. It is also not necessary to store all the data. We can choose to include only the daily temperatures for the last 3 years if observation confirms that most queries fall within this period. The goal is to serve the majority of queries, not all possible queries. As the diagram below shows, queries go to both the normal DB and the Virtual DB.

How to build Virtual DB?

To build a Virtual DB, choose a database system that can use memory as storage. For Java, the most well-known one is HSQLDB. If the system crashes and restarts, simply rebuild the DB as part of the webapp start-up process. It is a bit complex, as we need to maintain 2 data sources in the DAO and choose which data source runs each query.
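The "2 data sources in the DAO" part could look roughly like the sketch below. The DAO routes a query to the in-memory store when the requested range lies inside the window the Virtual DB holds, and to the real database otherwise. TemperatureSource, RoutingTemperatureDao and the cachedFrom window are hypothetical names. The sketch assumes polling keeps the Virtual DB up to date, so only the start of the range needs checking.

```java
import java.time.LocalDate;

// Hypothetical query interface implemented by both data sources.
interface TemperatureSource {
    double lowest(LocalDate start, LocalDate end);
}

final class RoutingTemperatureDao implements TemperatureSource {

    private final TemperatureSource virtualDb;  // in-memory copy
    private final TemperatureSource realDb;     // the production database
    private final LocalDate cachedFrom;         // oldest day held in the Virtual DB

    RoutingTemperatureDao(TemperatureSource virtualDb, TemperatureSource realDb, LocalDate cachedFrom) {
        this.virtualDb = virtualDb;
        this.realDb = realDb;
        this.cachedFrom = cachedFrom;
    }

    @Override
    public double lowest(LocalDate start, LocalDate end) {
        // Only ranges fully inside the cached window can be answered from memory.
        boolean covered = !start.isBefore(cachedFrom);
        return covered ? virtualDb.lowest(start, end) : realDb.lowest(start, end);
    }
}
```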

Note that Virtual DB is different from DB replication. DB replication builds mirrors of the DB. To simplify implementation, most DB replication uses a master/slave model, where data is synchronized one way, from the master to the slave nodes. DB replication offers scalability, but in terms of performance it is far slower than Virtual DB. A Virtual DB does not even need to send a request to the database; responses are generated within system memory.

Queries to a Virtual DB can be hundreds of times faster than queries to the real DB. Therefore, if you manage to replace 80% of DB queries with Virtual DB queries, the system gets a huge performance boost. We once raised server throughput from a few hundred requests per second to above 2000 requests per second using this technique.
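Here is a back-of-the-envelope check of the 80% figure. It assumes database time dominates each request and uses k = 100 as a stand-in for "hundreds of times faster". If a fraction p of queries moves to the Virtual DB and each of those becomes k times faster, the average query time becomes

```latex
\frac{T_{\text{new}}}{T_{\text{old}}} = (1 - p) + \frac{p}{k},
\qquad p = 0.8,\; k = 100:\quad 0.2 + 0.008 = 0.208,
\qquad \text{speed-up} = \frac{1}{0.208} \approx 4.8
```

Even with an infinitely fast Virtual DB, the remaining 20% of queries cap the gain at 1/(1 - p) = 5 times. Raising p therefore matters more than raising k.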

Virtual DB can be used together with other techniques to improve performance. In the system we built that suffered the worst DDoS attack in 2011, both DB replicas and a Virtual DB were used. Generally, we do not like to mix frequently written data with frequently read data: no locking is better than both pessimistic and optimistic locking.

Conclusions

So far, we have walked through several well-known techniques for building caches. The concepts are easy to learn but not so easy to implement properly in real-life situations. Developers may need some practice and a good understanding of the business requirements to build effective caching.

Caching is best done after some investigation and analysis. In the future, I will write more articles about scalability and load analysis, which we will need to aid caching design.

Monday 10 March 2014

Caching - Part 2

In the first part, I discussed caching on the client tier. In this article, I will continue with caching on the business and database tiers.

Caching on business logic tier

Caching on the business tier is reliable in terms of cache control. The developer can decide when to flush the cache, the maximum size of the cache and how the cache should be updated. However, clustering makes cache management more difficult on the server side. Here are some of the concerns you need to tackle:

1/ Passive Caching or Active Caching

If you choose to implement passive caching, there is no need to update the cache regularly. The cache is built as a wrapper around a service and intercepts every call to it. If the cache does not contain the wanted information, it simply lets the service handle the request and memorizes the response. The next time an identical method invocation arrives, the cached response is served.
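Here is a minimal sketch of passive caching as a wrapper. CachingPriceService implements the same interface as the service it wraps and only calls through on a miss. PriceService and both classes are hypothetical. The sketch has no expiry and no size limit. invalidateAll() only clears this node's copy, which is exactly the clustering problem discussed below.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical slow service that we want to cache.
interface PriceService {
    double priceOf(String productId);
}

// Passive cache: wraps the real service, intercepts each call and memorises
// the response the first time a given argument is seen.
final class CachingPriceService implements PriceService {

    private final PriceService delegate;
    private final Map<String, Double> cache = new ConcurrentHashMap<>();

    CachingPriceService(PriceService delegate) {
        this.delegate = delegate;
    }

    @Override
    public double priceOf(String productId) {
        // computeIfAbsent only calls the real service on a cache miss.
        return cache.computeIfAbsent(productId, delegate::priceOf);
    }

    // Manual invalidation hook, e.g. after an update handled by this node.
    void invalidateAll() {
        cache.clear();
    }
}
```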

Active caching takes the initiative of generating the expected responses before any method is invoked. It also takes responsibility for regularly refreshing the cache. This is only possible if the data size is small enough.
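Here is a sketch of active caching, assuming the whole data set can be loaded with one query. refresh() builds a fresh copy and swaps it in through a volatile field, so readers never see a half-built map. startRefreshing() warms the cache before the first request and then reloads it on a timer. ActiveCache and the loader are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Active cache: the whole (small) data set is loaded up front and refreshed on a timer,
// so no request pays the cost of a miss.
final class ActiveCache<K, V> {

    private final Supplier<Map<K, V>> loader;   // hypothetical "load everything" query
    private volatile Map<K, V> snapshot = Collections.emptyMap();

    ActiveCache(Supplier<Map<K, V>> loader) {
        this.loader = loader;
    }

    // Build a new copy first, then swap the reference in one step.
    void refresh() {
        snapshot = Collections.unmodifiableMap(new HashMap<>(loader.get()));
    }

    V get(K key) {
        return snapshot.get(key);
    }

    // Warm up now, then reload every periodSeconds. Note: an exception thrown by the
    // loader stops later runs, so production code should catch and log it in refresh().
    ScheduledExecutorService startRefreshing(long periodSeconds) {
        refresh();
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        timer.scheduleAtFixedRate(this::refresh, periodSeconds, periodSeconds, TimeUnit.SECONDS);
        return timer;
    }
}
```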

The biggest advantage of active caching is that it prevents a slow response on the first hit. However, if the calls are not predictable or not repetitive enough, the system may waste resources maintaining unused cached records.

Passive caching is also ineffective if the calls are not repetitive, but it does not waste effort on caching unused records. It is also more scalable, because it can cache just a small fraction of frequently used data. However, for passive caching to be effective, the cache should be comfortably larger than the number of concurrent users, to avoid purging useful records.

Active caching is normally implemented manually by developers. Passive caching can be supported by a framework (for example, by putting an annotation on a getter method) or done manually. The benefit of manual caching is the ability to refresh the cache when data changes (for example, invalidating all cached getter results when there is a call to update data). However, this is no longer popular, because we cannot reliably detect data changes in a cluster environment (think of what happens if the update request goes to another server).

2/ How to maintain cache synchronization on clustering environment

Caching was not so challenging in the past, when clustering was not the norm. Distributed caching adds complexity to the implementation, as developers need to keep the data synchronized. Obviously, caching adds some delay before changed data is displayed. If the cached resources are not synced among servers, users may see different values of the same data within one session, which is awkward.

Here are some of methods that can be used to rectify the issue:

Use sticky sessions. If we can guarantee that all requests from a user are forwarded to the same server, there is no need to synchronize data among servers. This is not preferred, as non-sticky sessions are better for scalability and silent fail-over.
Use a distributed cache like Memcached or Terracotta. A distributed cache is slower because records are downloaded through the internal network, but it still boosts scalability, as there is no single point of data access. The link below describes some major differences between Memcached and Terracotta. Generally, Terracotta is preferred when you need to manipulate data. It also helps synchronize records between nodes, which is more scalable than Memcached.

http://debasishg.blogspot.sg/2008/09/memcached-and-terracotta-alternatives.html

For active caching, we can coordinate the refreshing of all servers' caches. The easiest way is to have a central server push cached records to all nodes. With this design, there is no duplicated reading of data and all nodes serve identical records.

3/ Memory

Memory is the next thing you need to worry about with caching. Generally, caching will not cause a memory leak, because the oldest records will be flushed to slower storage like hard disk or discarded entirely. However, you may not want that to happen, because it decreases the performance and efficiency of caching. Caching works best when records are purged because they expire rather than because of memory constraints.

Cache eviction is unwanted, but when it happens, there are 3 popular policies for choosing the record to purge: Least Recently Used, Least Frequently Used and First In First Out. If you want to know more, take a look at the EhCache documentation on cache eviction algorithms at

http://ehcache.org/documentation/apis/cache-eviction-algorithms

If you are interested in building your own eviction algorithm, refer to Wikipedia for some well-known techniques:

http://en.wikipedia.org/wiki/Cache_algorithms

In the past, heap space was used to store records, but recently EhCache introduced an off-heap store. The off-heap store resides in RAM but outside the JVM heap. It is slower than the on-heap store, as EhCache needs to serialize an object before writing it to the store and deserialize it when retrieving it. However, as it resides outside the heap, it is not affected by the intermittent pauses caused by garbage collection.

Data structures also play a big role in caching performance. For a performance-sensitive component like a cache, it is highly recommended to implement your own data structure instead of using the built-in implementations provided by Java. Your data structure needs to reflect the business requirements of how your data will be queried.

If the data queries are random, you may be interested in random sampling techniques. For example, take a look at the class SelectableConcurrentHashMap in EhCache. It is a good example of organizing data so that retrieving a random record is fast.
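The sketch below shows the idea behind sampling-based eviction. Keys are kept in an ArrayList as well as in the HashMap, so a random key can be picked in constant time. When the cache is full, a handful of random keys are sampled and the least recently used of them is evicted. This approximates LRU without maintaining a full ordering. It illustrates the technique only; it is not EhCache's SelectableConcurrentHashMap, and unlike that class, SampledLruCache is not thread-safe.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

final class SampledLruCache<K, V> {

    private static final class Entry<T> {
        T value;
        long lastAccess;
        int index;                                   // position of the key in the keys list

        Entry(T value, long lastAccess, int index) {
            this.value = value;
            this.lastAccess = lastAccess;
            this.index = index;
        }
    }

    private final Map<K, Entry<V>> entries = new HashMap<>();
    private final List<K> keys = new ArrayList<>();  // lets us pick a random key in O(1)
    private final int capacity;
    private final int sampleSize;
    private final Random random;
    private long clock;                              // logical clock for "last accessed"

    SampledLruCache(int capacity, int sampleSize, Random random) {
        if (capacity < 1 || sampleSize < 1) {
            throw new IllegalArgumentException("capacity and sampleSize must be positive");
        }
        this.capacity = capacity;
        this.sampleSize = sampleSize;
        this.random = random;
    }

    V get(K key) {
        Entry<V> e = entries.get(key);
        if (e == null) {
            return null;
        }
        e.lastAccess = ++clock;
        return e.value;
    }

    void put(K key, V value) {
        Entry<V> e = entries.get(key);
        if (e != null) {
            e.value = value;
            e.lastAccess = ++clock;
            return;
        }
        if (entries.size() >= capacity) {
            evictOne();
        }
        entries.put(key, new Entry<>(value, ++clock, keys.size()));
        keys.add(key);
    }

    int size() {
        return entries.size();
    }

    // Sample a few random keys and evict the least recently used among them.
    private void evictOne() {
        K victim = null;
        long oldest = Long.MAX_VALUE;
        for (int i = 0; i < sampleSize; i++) {
            K candidate = keys.get(random.nextInt(keys.size()));
            long t = entries.get(candidate).lastAccess;
            if (t < oldest) {
                oldest = t;
                victim = candidate;
            }
        }
        remove(victim);
    }

    // O(1) removal: move the last key into the victim's slot, then drop the last slot.
    private void remove(K key) {
        Entry<V> removed = entries.remove(key);
        int last = keys.size() - 1;
        K moved = keys.remove(last);
        if (removed.index != last) {
            keys.set(removed.index, moved);
            entries.get(moved).index = removed.index;
        }
    }
}
```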

If related or similar data queries come together, you can keep the entries inside the hash map in order (by insertion or by access), so that finding the least recently accessed data is fast. A detailed explanation of this technique is out of scope for this article, but readers can look at Java's LinkedHashMap for an example.
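As a concrete example, LinkedHashMap can keep its entries in access order and evict the eldest entry automatically. That gives a small LRU cache in a few lines. LruCache is a hypothetical name. With the default insertion order (accessOrder = false), the same class would behave as FIFO instead.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// LRU cache on top of LinkedHashMap: accessOrder = true keeps entries ordered from
// least to most recently accessed, and removeEldestEntry evicts the head once full.
// Not thread-safe; wrap it or lock externally if it is shared between threads.
final class LruCache<K, V> extends LinkedHashMap<K, V> {

    private static final long serialVersionUID = 1L;
    private final int maxEntries;

    LruCache(int maxEntries) {
        super(16, 0.75f, true);   // initial capacity, load factor, accessOrder
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }
}
```

Usage: with new LruCache<>(2), putting "a" and "b", reading "a" and then putting "c" evicts "b", because "a" was touched more recently.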

4/ Business Layer

The business logic tier is rarely flat. Even in the simplest application, you still have a service on top of DAO objects. In more complex scenarios, the system may include complex services that use several other services and DAOs. At the top, the MVC controller is in charge of invoking services and rendering responses.

The diagram above is a realistic example of how your application may be built. There are many places to implement caching, but as mentioned in the earlier article, too many layers of caching is ineffective.

As usual, the decision comes from business requirements rather than developer preference. Caching at a higher layer provides more performance gain but also increases the number of possible method invocations to cache. Let's make the simplest assumption: the top service uses three DAOs, and each DAO has only 3 possible values. Caching at the top layer then requires up to 3 × 3 × 3 = 27 records, while caching at the lowest layer requires only 3 + 3 + 3 = 9 records.

If, due to the business logic, the queries to the DAOs are always related, there may be far fewer than 27 possible records to cache. In this case, caching at the higher layer makes more sense.

Sunday 9 March 2014

Caching - Part 1

I feel caching is the most effective but least utilized feature in building applications. I think this is because no product manager would pay for it. Caching brings huge benefits to a system, but often no one takes the initiative to calculate and present how much performance the system will gain from it. Moreover, the benefit of caching is not obvious under low load. It only shows clearly when the production server is up and serving a huge load.

However, we should develop the application with the assumption that it will serve high load in the future, because if we do not, it will be too late when that happens. In this article, I will share my experience of integrating caching into systems that I have built before.

Cost and benefit of caching

Caching costs memory and a little processing time. It is worth the effort, provided that load and hit rate are both high enough and you do not let it occupy all the available memory. To simplify the calculation, we can build a simple model:

h : hit rate, the proportion of access hitting cache. 0 <= h <= 1
t1: processing time for fetching record from cache.
t2: processing time for fetching record from original source.

When we get the record from the cache, response time will be t1. When we fail to get record from the cache, the response time can be simplified as (t1+t2).

Then the average response time when having cache will be:

t = h * t1 + (1-h) * (t1+t2) = t1 + (1-h)*t2

The response time when not having cache is t2. Then the average response time change is

dt = t1 + (1-h)*t2 -t2 = t1 - h*t2.

Since t1 is normally smaller than t2 by a large margin, you do not need a very high hit rate to improve the average response time: dt is negative, i.e. caching helps, whenever h > t1/t2. However, caching does not only cost processing time; it also uses memory. Suppose the cache storage is smaller than the amount of data that concurrent users are accessing. Then records in the cache will be constantly purged for lack of storage, which greatly lowers the hit rate.
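The model fits in a few lines of code, which makes it easy to try different numbers. The values t1 = 1 ms and t2 = 50 ms below are made up for illustration. With them, caching already pays off above a 2% hit rate, and at h = 0.5 the average drops from 50 ms to 26 ms. CacheModel is a hypothetical name.

```java
// Average response time model from the text:
//   t  = h*t1 + (1-h)*(t1+t2) = t1 + (1-h)*t2
//   dt = t - t2 = t1 - h*t2   (negative dt means caching helps)
final class CacheModel {

    static double averageWithCache(double h, double t1, double t2) {
        return t1 + (1 - h) * t2;
    }

    static double change(double h, double t1, double t2) {
        return t1 - h * t2;
    }

    // dt < 0 exactly when h > t1 / t2.
    static double breakEvenHitRate(double t1, double t2) {
        return t1 / t2;
    }
}
```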

Another concern for caching is the validity of data. Depending on the mechanism used to keep data valid, it may add to the total cost of caching. If a record is stored with an expiration time, it will be refreshed periodically when the old record times out. Some developers prefer better control and regularly fetch new records from the data source. In this case, the effort of fetching new records needs to be included as well.

Because of that, developers need to balance the cost and benefit of caching. As a general rule of thumb, I would suggest implementing caching if we can achieve a hit rate of at least 50% and still have at least 50% of RAM available under modest load. Anything lower will not justify the effort.

Where to cache?

As the most popular architecture nowadays is 3-tier, I will discuss caching on this architecture.

Illustration of 3-tier Architecture by blog.simcrest.com

There is no clear guidance on the best place to implement caching; however, you still need to design your caching carefully. If you overuse caching, you may waste resources. Records in a lower-tier cache may sit idle, waiting to expire, because the data has already been served from a higher-tier cache. In this case, the cost is still incurred but no benefit is gained at all.

Caching on each tier has its own characteristics. Client-tier caching provides the greatest performance boost but is less reliable. Lower-tier caching provides less benefit but gives more control.

Let's go into the details of caching on each tier.

Caching on client tier

Caching on the client uses the browser to store data. This is the fastest cache, as it does not even require a request to be sent to the server. As it is summarised here:

"the fastest HTTP request is the one not made"

There are three places that can be used to cache different types of information: the browser cache, cookies and JavaScript objects. Each has its own life cycle and stores different kinds of resources, but they share one attribute: low reliability. This is no surprise, because the browser belongs to the user, not the developer.

1/ Browser Cache

Browser caching is a feature that is on by default in every browser. Depending on available resources and configuration, the browser may cache static resources up to a certain size. Generally, Chrome has the biggest cache, with up to 80 MB of storage, while IE and Firefox are limited to 50 MB. Through HTTP 1.1 headers, the server can tell the browser whether to cache each response and whether the resource has an expiration time. As mentioned above, the browser cache is not reliable: at any time, the user can clear the whole cache or turn off caching.
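Turning caching on or off per response comes down to a few HTTP headers. The sketch below only computes the header values so it runs without a servlet container; the class and method names are hypothetical, and the one-hour lifetime in the example is an arbitrary choice.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Sketch: the HTTP/1.1 caching headers for one response.
class CacheHeaders {
    // HTTP dates use a fixed English format, always expressed in GMT.
    private static final DateTimeFormatter HTTP_DATE =
            DateTimeFormatter.ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US);

    // Allow the browser to keep the response for maxAgeSeconds.
    static Map<String, String> cacheFor(long maxAgeSeconds, Instant now) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Cache-Control", "public, max-age=" + maxAgeSeconds);
        // Expires is the older equivalent; HTTP/1.1 caches prefer max-age when both exist.
        headers.put("Expires", HTTP_DATE.format(now.plusSeconds(maxAgeSeconds).atOffset(ZoneOffset.UTC)));
        return headers;
    }

    // Ask the browser not to store the response at all.
    static Map<String, String> noCache() {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Cache-Control", "no-store, no-cache, must-revalidate");
        headers.put("Pragma", "no-cache");   // for old HTTP/1.0 caches
        headers.put("Expires", "0");         // an invalid date is treated as already expired
        return headers;
    }
}
```

Inside a servlet, the map could be copied onto the response with `CacheHeaders.cacheFor(3600, Instant.now()).forEach(response::setHeader);`.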

To make good use of the browser cache, developers need to know how to enable or disable caching for each request. Besides the HTTP headers, here are some other useful facts related to browser caching:

GET requests can be cached; POST requests cannot. In addition, a POST request generally uses more bytes to transfer content and can be sent in two steps, the headers first and then the request body.

http://benramsey.com/blog/2008/04/http-status-100-continue/

A resource is uniquely identified by its URL. A well-known technique is to add a dynamic query parameter to force the browser to load the resource from the server, because it makes the URL unique every time. The simplest way of generating a dynamic query is to append the current time to the resource URL. This solution is built into the jQuery.ajax() API through its cache: false option (as mentioned above, it only applies to GET requests). However, be aware of the side effect of this method: the browser still caches the response, it just never uses the cached copy. So the browser storage is wasted.
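The timestamp trick is easy to reproduce on the server side when you build URLs yourself. A minimal sketch, assuming the URL has no `#fragment`; the `_` parameter name mirrors what jQuery appends with `cache: false`, and the class name is hypothetical.

```java
// Sketch of the "dynamic query" trick: append a changing parameter so every
// GET URL is unique and the browser never reuses a cached copy.
class CacheBuster {
    static String bust(String url, long nowMillis) {
        // Add to an existing query string if there is one, otherwise start one.
        String separator = url.contains("?") ? "&" : "?";
        return url + separator + "_=" + nowMillis;
    }
}
```

A typical call is `CacheBuster.bust("/data/list", System.currentTimeMillis())`. Keep the side effect in mind: every busted URL still lands in the browser cache as a new, never-reused entry.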

There is no need to worry about the expiration time of a resource if its URL uniquely identifies its content. You can achieve this with JavaScript tools like Rhino, YUI Compressor or UglifyJS.
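The underlying idea is content fingerprinting: derive part of the URL from a hash of the file, so changed content gets a new URL while unchanged content can be cached for a long time. The sketch below does this in plain Java rather than with the JavaScript tools named above; the `?v=` parameter and the 8-character hash prefix are hypothetical choices.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Sketch of content fingerprinting: the URL changes only when the content changes.
class Fingerprint {
    static String versionedUrl(String path, byte[] content) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(content);
            StringBuilder hex = new StringBuilder();
            for (int i = 0; i < 4; i++) {              // first 4 bytes = 8 hex characters
                hex.append(String.format("%02x", digest[i]));
            }
            return path + "?v=" + hex;
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support SHA-256.
            throw new IllegalStateException(e);
        }
    }

    static String versionedUrl(String path, String content) {
        return versionedUrl(path, content.getBytes(StandardCharsets.UTF_8));
    }
}
```

Because the hash is computed once per file version, this fits naturally into a build or deploy step, after which the pages reference the versioned URLs.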

There are two kinds of GET requests: conditional requests and unconditional requests. With a conditional request, the browser includes a header named If-Modified-Since, and if the resource has not changed since that time, the server simply responds with HTTP status 304 Not Modified instead of serving the full content. Normally, if the user presses F5, the browser reloads resources with conditional requests, and if the user presses Ctrl + F5, it reloads them with unconditional requests. This works well, except in IE: IE does not bother to make a conditional request as long as the cached copy is still alive.
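On the server, the 304 decision is a single comparison. The sketch below mirrors the check that `HttpServlet.service()` performs when a servlet overrides `getLastModified()`; both values are epoch milliseconds and -1 means "unknown", which is how `HttpServletRequest.getDateHeader()` reports a missing header. The class name is hypothetical.

```java
// Sketch of the server side of a conditional GET.
class ConditionalGet {
    static final int SC_OK = 200;
    static final int SC_NOT_MODIFIED = 304;

    static int status(long ifModifiedSince, long lastModified) {
        if (lastModified == -1 || ifModifiedSince == -1) {
            return SC_OK;                  // unconditional request, or no date to compare
        }
        // HTTP dates carry whole seconds only, so drop the milliseconds before comparing.
        long lastModifiedSeconds = lastModified / 1000 * 1000;
        return ifModifiedSince < lastModifiedSeconds ? SC_OK : SC_NOT_MODIFIED;
    }
}
```

In a real servlet you would not write this yourself: overriding `getLastModified(HttpServletRequest)` is enough for `HttpServlet` to answer conditional GETs with 304.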

GWT RPC uses POST requests to transfer data. Because of this, GWT RPC responses are never cached. However, caching is achievable if you construct your own HTTP requests instead of using RPC; just remember to use GET rather than POST. I myself always prefer mixing GWT with static HTML by loading it into an HtmlFrame. GWT does not cache natively, but lots of content is safe to cache and is supposed to be cached. To make caching more reliable, I use the technique above of including a hash code in every URL to avoid rendering obsolete data.

2/ Cookie

Cookies can be used to store information, but the size is limited (4 KB) and only strings can be stored. The advantage of a cookie is that it persists when the user hits F5. Because of this, you can use it to store small pieces of information. For example, if your web page has a combo box containing all phone manufacturers, it is not necessary to hit the server to load this content again.

However, kindly note that this is only theory; I have not used cookies for this kind of purpose. If the combo box is simple enough, I prefer to load its content as part of the main page HTML rather than using an Ajax request to populate it.
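For the untested cookie idea above, the main practical problems are that a cookie value cannot contain characters such as commas, spaces or quotes, and that the whole cookie must stay within about 4 KB. A minimal sketch, assuming the options never contain the `|` separator; the cookie name `manufacturers` and the class name are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;
import java.util.List;

// Sketch: pack a small list of combo box options into one cookie value.
class ComboBoxCookie {
    static final String NAME = "manufacturers";
    static final int MAX_COOKIE_BYTES = 4096;   // common per-cookie limit (name + value)

    // Base64url output only uses A-Z, a-z, 0-9, '-' and '_', all legal in a cookie value.
    static String encode(List<String> options) {
        String joined = String.join("|", options);
        return Base64.getUrlEncoder().withoutPadding()
                .encodeToString(joined.getBytes(StandardCharsets.UTF_8));
    }

    static List<String> decode(String value) {
        String joined = new String(Base64.getUrlDecoder().decode(value), StandardCharsets.UTF_8);
        return Arrays.asList(joined.split("\\|"));
    }

    // True if "name=value" stays within the limit, so the browser should keep it.
    static boolean fits(String value) {
        return NAME.length() + 1 + value.length() <= MAX_COOKIE_BYTES;
    }
}
```

Base64 inflates the data by about a third, which eats into the 4 KB budget; that is one more reason to keep this approach for genuinely small lists.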

3/ Javascript Objects

I use this often, because it is easy to write code this way if you use GWT. Even with plain JavaScript it is still easy to cache objects, but GWT offers a great benefit by providing scope for your JavaScript objects.

JavaScript objects should be used for caching only when your webapp is a rich client application with repetitive calls. When the user refreshes the web page, all JavaScript objects are deleted.
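A page-lifetime object cache can be as small as a map held by a long-lived client object. The sketch below is plain Java of the kind GWT compiles to JavaScript; the class name and the synchronous `loader` are hypothetical stand-ins. Real GWT RPC calls are asynchronous, so client code would put the result into the map inside the `AsyncCallback`'s `onSuccess` instead.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Sketch of a page-lifetime cache: the map lives as long as the page does,
// so a refresh (F5) starts over with an empty cache.
class PageCache<K, V> {
    private final Map<K, V> store = new HashMap<>();
    private final Function<K, V> loader;   // hypothetical stand-in for a server call
    int serverCalls = 0;

    PageCache(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        V value = store.get(key);
        if (value == null) {               // first request for this key: go to the server
            serverCalls++;
            value = loader.apply(key);
            store.put(key, value);
        }
        return value;                      // repeated requests are served from memory
    }
}
```

This only pays off for the repetitive calls of a rich client; a page that is reloaded often loses the cache before it ever gets a hit.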