Monday, 14 July 2014

From framework to platform

When I started my career as a Java developer close to 10 years ago, the industry was going through a revolutionary change. The Spring framework, first released in 2003, was quickly gaining ground and had become a serious challenger to the bulky J2EE platform. Having gone through that transition, I quickly found myself favouring Spring over J2EE, even though declaring beans in the earlier versions of Spring was very tedious.

What happened next was the revamping of the J2EE standard, which was later renamed Java EE. Still, this era was dominated by the use of open-source frameworks over the platform proposed by Sun. This practice gives developers full control over the technologies they use, but it inflates the deployment size. Slowly, as cloud applications became the norm for modern applications, I observed the trend of moving infrastructure services from framework to platform again. This time, however, the move is motivated by cloud applications.

Framework vs Platform

I never heard of, or had to use, any framework in school. After joining the industry, however, I found it tough to build scalable and configurable software without the help of a framework.

From my understanding, any application consists of code that implements business logic and other code that acts as helpers or utilities or sets up infrastructure. The code that is not related to business logic, and that is used repetitively across many projects, can be generalised and extracted for reuse. The output of this extraction process is a framework.

To put it briefly, a framework is any code that is not related to business logic but helps to address common concerns in applications and is fit for reuse.

By this definition, MVC, Dependency Injection, Caching, JDBC Template and ORM are all considered frameworks.

A platform is similar to a framework in that it also helps to address common concerns in applications, but in contrast to a framework, the service is provided outside the application. Therefore, a common service endpoint can serve multiple applications at the same time. The services provided by a Java EE application server or by Amazon Web Services are examples of platforms.

Comparing the two approaches, a platform is more scalable and easier to use than a framework, but it also offers less control. Because of these advantages, a platform seems to be the better approach when we build cloud applications.

When should we use platform over framework

Moving toward platforms does not mean that developers will get rid of frameworks. Rather, a platform only complements frameworks in building applications. However, on some occasions we have a choice between a platform and a framework to achieve the final goal. In my opinion, a platform is the better choice when the following conditions are met:
  • The framework is tedious to use and maintain.
  • The service has some common information to be shared among instances.
  • The service can utilize additional hardware to improve performance.

In our office, we still use Spring framework, Play framework or RoR in our applications, and this will not change any time soon. However, to move to the cloud era, we migrated some of our existing products from internal hosting to Amazon EC2 servers. In order to make the best use of the Amazon infrastructure and improve software quality, we have done some major refactoring of our current software architecture.

Here are some of the platforms that we are integrating our products with:

Amazon Simple Storage Service (Amazon S3) & Amazon CloudFront

We found Amazon CloudFront pretty useful for improving the average response time of our applications. Previously, we hosted most of the applications in our internal server farms, which are located in the UK and the US. This led to a noticeable increase in response time for customers on other continents. Fortunately, Amazon has a much larger infrastructure, with server farms built all around the world. That helps to keep delivery time consistent, no matter where the customer is located.

Currently, because setting up a new application instance takes manual effort, we feel that the best use of Amazon CloudFront is for static content, which we host separately from the application in Amazon S3. This practice gives us a double benefit in performance: the more consistent delivery time offered by the CDN, plus a separate per-host connection limit in the browser for the static content, since it is served from a different domain.

Amazon ElastiCache

Caching has never been easy in a clustered environment. The word "cluster" means that your objects cannot simply be stored in and retrieved from local system memory. Rather, they are sent and retrieved over the network. This task was quite tricky in the past because developers needed to sync the records from one node to another. Unfortunately, not all caching frameworks support this feature automatically. The best framework we found for distributed caching was Terracotta.

Now we have turned to Amazon ElastiCache because it is cheap and reliable, and it saves us the huge effort of setting up and maintaining a distributed cache. It is worth highlighting that distributed caching is never meant to replace local caching. The difference in performance suggests that we should only use distributed caching over local caching when users need to access real-time temporary data.
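To make the trade-off concrete, here is a minimal sketch of a two-tier lookup. It is only an illustration under my own assumptions: RemoteCache, InMemoryRemoteCache and TwoTierCache are hypothetical names, and the "remote" tier is an in-memory stand-in rather than a real ElastiCache client.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical abstraction over a distributed cache service (not a real client API). */
interface RemoteCache {
    Optional<String> get(String key);
    void put(String key, String value);
}

/** In-memory stand-in for the remote service so the sketch runs without a network. */
class InMemoryRemoteCache implements RemoteCache {
    private final Map<String, String> store = new ConcurrentHashMap<>();
    int reads = 0; // counts simulated network round trips

    public Optional<String> get(String key) {
        reads++;
        return Optional.ofNullable(store.get(key));
    }

    public void put(String key, String value) {
        store.put(key, value);
    }
}

/** Local cache in front of the distributed cache. */
class TwoTierCache {
    private final Map<String, String> local = new ConcurrentHashMap<>();
    private final RemoteCache remote;

    TwoTierCache(RemoteCache remote) {
        this.remote = remote;
    }

    /** For slow-changing data: answer from local memory, go remote only on a miss. */
    Optional<String> getStable(String key) {
        String cached = local.get(key);
        if (cached != null) {
            return Optional.of(cached);
        }
        Optional<String> fetched = remote.get(key);
        fetched.ifPresent(value -> local.put(key, value));
        return fetched;
    }

    /** For real-time shared data: always pay the network cost and read the shared copy. */
    Optional<String> getLive(String key) {
        return remote.get(key);
    }
}
```

In this sketch getStable() is fast but may return a stale value after another node updates the shared copy, while getLive() always sees the latest value at the price of a round trip.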

Event Logging for Data Analytics

In the past, we used Google Analytics for analysing user behaviour, but we later decided to build an internal data warehouse. One of the motivations is the ability to track events from both browsers and servers. The event tracking system uses MongoDB as its database, as it allows us to quickly store a huge number of events.

To simplify the creation and retrieval of events, we chose JSON as the event format. We cannot simply send an event directly from the browser to the event tracking server, because the browser's same-origin policy restricts cross-domain requests. For this reason, Google Analytics sends events to its server in the form of a GET request for a static resource. As we have full control over how the application is built, we chose to send the events back to the application server first and route them to the event tracking server later. This approach is much more convenient and powerful.
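For readers unfamiliar with the GET-request trick, the sketch below shows the general idea of packing a JSON event into the query string of a static resource URL. The class name EventBeacon, the query parameter "e" and the example URL are my own assumptions for illustration; they are not what Google Analytics or our system actually uses.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

/** Hypothetical helper: carries a JSON event in the query string of a static resource. */
class EventBeacon {

    /** Builds a URL such as .../pixel.gif?e=<url-encoded JSON>. */
    static String toBeaconUrl(String resourceUrl, String eventJson) {
        try {
            return resourceUrl + "?e=" + URLEncoder.encode(eventJson, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on the JVM, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    /** Server side: recovers the JSON from the raw (still encoded) query value. */
    static String fromQueryValue(String encodedValue) {
        try {
            return URLDecoder.decode(encodedValue, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A call like EventBeacon.toBeaconUrl("https://tracker.example.com/pixel.gif", json) produces a URL that a browser can request as an image, which is why this form works across domains. Routing events through our own application server avoids the need for this encoding step in the browser.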

Knowledge Portal

In the past, our applications accessed data from a database or an internal file repository. However, to scale better, we gathered all the knowledge to build a knowledge portal. We also built a query language to retrieve knowledge from this portal. This approach adds one additional layer to the knowledge retrieval process but, fortunately for us, our system does not need to serve real-time data. Therefore, we can use caching to improve performance.

Conclusion

The above is some of our experience of transforming our software architecture when moving to the cloud. Please share your experience and opinions with us.

Saturday, 5 July 2014

Common mistakes when using Spring MVC

When I started my career around 10 years ago, Struts MVC was the norm in the market. However, over the years, I observed Spring MVC slowly gaining popularity. This is no surprise to me, given the seamless integration of Spring MVC with the Spring container and the flexibility and extensibility that it offers.

In my journey with Spring so far, I have often seen people make some common mistakes when configuring the Spring framework. This happens more often than it did when people still used the Struts framework. I guess it is the trade-off between flexibility and usability. Plus, the Spring documentation is full of samples but short on explanation. To help fill this gap, this article will try to explain 3 common issues that I often see people encounter.

Declare beans in Servlet context definition file

We all know that Spring uses ContextLoaderListener to load the Spring application context. Still, when declaring the DispatcherServlet, we need to create the servlet context definition file, which by default is named "${servlet.name}-servlet.xml". Ever wonder why?

Application Context Hierarchy

Not all developers know that Spring application contexts form a hierarchy. Let us look at this method:

org.springframework.context.ApplicationContext.getParent()

It tells us that a Spring application context can have a parent. So, what is this parent for?

If you download the source code and do a quick reference search, you will find that a Spring application context treats its parent as an extension of itself. If you do not mind reading code, let me show you one example of this usage in the method BeanFactoryUtils.beansOfTypeIncludingAncestors():

if (lbf instanceof HierarchicalBeanFactory) {
    HierarchicalBeanFactory hbf = (HierarchicalBeanFactory) lbf;
    if (hbf.getParentBeanFactory() instanceof ListableBeanFactory) {
        Map parentResult =
                beansOfTypeIncludingAncestors((ListableBeanFactory) hbf.getParentBeanFactory(), type);
        ...
    }
}
return result;
}

If you go through the whole method, you will find that a Spring application context looks for beans in its own context before searching the parent context. With this strategy, the lookup effectively walks up the hierarchy from child to parent, and a bean defined locally takes precedence over a bean with the same name in the parent.
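The lookup order can be sketched without Spring at all. The class below is a simplified stand-in that I made up for illustration (SimpleContext is not a Spring class); it only mimics the "local first, then parent" behaviour described above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Simplified, hypothetical stand-in for a bean container with an optional parent. */
class SimpleContext {
    private final Map<String, Object> beans = new LinkedHashMap<>();
    private final SimpleContext parent; // null for the root context

    SimpleContext(SimpleContext parent) {
        this.parent = parent;
    }

    void register(String name, Object bean) {
        beans.put(name, bean);
    }

    /** Collects beans of the given type from this context, then from its ancestors. */
    <T> Map<String, T> beansOfTypeIncludingAncestors(Class<T> type) {
        Map<String, T> result = new LinkedHashMap<>();
        // 1. Local beans are collected first.
        for (Map.Entry<String, Object> entry : beans.entrySet()) {
            if (type.isInstance(entry.getValue())) {
                result.put(entry.getKey(), type.cast(entry.getValue()));
            }
        }
        // 2. Parent beans are added only when no local bean uses the same name.
        if (parent != null) {
            Map<String, T> parentResult = parent.beansOfTypeIncludingAncestors(type);
            for (Map.Entry<String, T> entry : parentResult.entrySet()) {
                if (!beans.containsKey(entry.getKey())) {
                    result.putIfAbsent(entry.getKey(), entry.getValue());
                }
            }
        }
        return result;
    }
}
```

If a root context registers "greeting" and "shared" and a child context registers its own "greeting", a lookup on the child returns the child's "greeting" together with the root's "shared", while a lookup on the root never sees the child's beans.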

ContextLoaderListener

This is a well-known class that every developer should know. It loads the Spring application context from a pre-defined context definition file. As it implements ServletContextListener, the Spring application context is loaded as soon as the web application is loaded. This brings a clear benefit when the Spring container contains beans with the @PostConstruct annotation or batch jobs.

In contrast, any bean defined in the servlet context definition file will not be constructed until the servlet is initialized. When is the servlet initialized? Unless load-on-startup is configured for the servlet, it is non-deterministic. In the worst case, you may need to wait until a user makes the first hit to the servlet mapping URL for the Spring context to be loaded.

With the above information, where should you declare all your precious beans? I feel the best place to do so is the context definition file loaded by ContextLoaderListener and nowhere else. The trick here is that the ApplicationContext is stored as a ServletContext attribute under the key

org.springframework.web.context.WebApplicationContext.ROOT_WEB_APPLICATION_CONTEXT_ATTRIBUTE   

Later, DispatcherServlet will load this context from the ServletContext and assign it as the parent of its own application context.

protected WebApplicationContext initWebApplicationContext() {
   WebApplicationContext rootContext =
      WebApplicationContextUtils.getWebApplicationContext(getServletContext());
   ...
}

Because of this behaviour, I highly recommend creating an empty servlet application context definition file and defining your beans in the parent context. This helps to avoid duplicated bean creation when the web application is loaded and guarantees that batch jobs are executed immediately.

Theoretically, defining a bean in the servlet application context definition file makes the bean unique and visible to that servlet only. However, in my 8 years of using Spring, I have hardly found any use for this feature except for defining web service endpoints.

Declare Log4jConfigListener after ContextLoaderListener

This is a minor bug, but it catches you when you do not pay attention. Log4jConfigListener is my preferred solution over -Dlog4j.configuration because it lets us control the log4j loading without altering the server bootstrap process.

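Obviously, this should be the first listener declared in your web.xml. Otherwise, everything logged while the earlier listeners start up, including the loading of the Spring context, will not use your logging configuration, and your effort to declare it properly will be wasted.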

Duplicated Beans due to mismanagement of bean exploration

In the early days of Spring, developers spent more time typing XML files than Java classes. For every new bean, we needed to declare it and wire its dependencies ourselves, which is clean and neat but very painful. No surprise that later versions of the Spring framework evolved toward greater usability. Nowadays, developers may only need to declare the transaction manager, data source, property source and web service endpoints, and leave the rest to component scan and auto-wiring.

I like these new features, but this great power needs to come with great responsibility; otherwise, things get messy quickly. Component scan and bean declaration in XML files are totally independent. Therefore, it is perfectly possible to have two beans of the same class in the bean container if the class is annotated for component scan and also declared manually. Fortunately, this kind of mistake should only happen to beginners.
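To show how such a duplicate arises, here is a small sketch that uses a plain map instead of the Spring container. BeanRegistry and the bean and class names are hypothetical; the point is only that the same class ends up registered under two different bean names, one from the scan and one from the XML file.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical, simplified registry: bean name -> fully qualified class name. */
class BeanRegistry {
    private final Map<String, String> definitions = new LinkedHashMap<>();

    void register(String beanName, String className) {
        definitions.put(beanName, className);
    }

    /** Returns class name -> bean names for every class registered more than once. */
    Map<String, List<String>> duplicatedClasses() {
        Map<String, List<String>> byClass = new TreeMap<>();
        for (Map.Entry<String, String> entry : definitions.entrySet()) {
            byClass.computeIfAbsent(entry.getValue(), key -> new ArrayList<>())
                   .add(entry.getKey());
        }
        // Keep only the classes that have two or more bean names.
        byClass.values().removeIf(names -> names.size() < 2);
        return byClass;
    }
}
```

Registering "userService" (the name a scan would typically derive from the class) and "userServiceBean" (an id chosen in XML) for the same class makes duplicatedClasses() report that class with both names. A check of this kind could be run over the real bean definitions at start-up, but that wiring is left out here.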

The situation gets more complicated when we need to integrate some embedded components into the final product. Then we really need a strategy to avoid duplicated bean declarations.



The above diagram shows a realistic sample of the kind of problem we face in daily life. Most of the time, a system is composed of multiple components and, often, one component serves multiple products. Each application and component has its own beans. In this case, what is the best way to declare beans so that we avoid duplicated bean declarations?

Here is my proposed strategy:

  • Ensure that each component starts with a dedicated package name. It makes our life easier when we need to do component scan.
  • Don't dictate to the team that develops a component how to declare beans in the component itself (annotation versus XML declaration). It is the responsibility of the developer who packages the components into the final product to ensure there are no duplicated bean declarations.
  • If there is a context definition file packed within the component, put it in a package rather than in the root of the classpath. It is even better to give it a specific name. For example, src/main/resources/spring-core/spring-core-context.xml is way better than src/main/resources/application-context.xml. Imagine what we can do if we pack a few components that contain the same file application-context.xml in the identical package!
  • Don't add any annotation for component scan (@Component, @Service or @Repository) if you already declare the bean in a context file.
  • Split environment-specific beans, like the data source and property source, into a separate file and reuse it.
  • Do not do component scan on a general package. For example, instead of scanning the org.springframework package, it is easier to manage if we scan several sub-packages like org.springframework.core, org.springframework.context, org.springframework.ui,...


Conclusions

I hope you find the above tips useful in your daily work. If you have any doubts or other ideas, please share your feedback.