Monday, 14 July 2014

From framework to platform

When I started my career as a Java developer close to 10 years ago, the industry was going through a revolutionary change. The Spring framework, which was released in 2003, was quickly gaining ground and had become a serious challenger to the bulky J2EE platform. Having gone through that transition, I quickly found myself favouring the Spring framework over the J2EE platform, even though declaring beans in the earlier versions of Spring was very tedious.

What happened next was the revamping of the J2EE standard, which was later renamed Java EE. Still, this era was dominated by the use of open-source frameworks over the platform proposed by Sun. This practice gives developers full control over the technologies they use, but it inflates the deployment size. Slowly, as cloud applications became the norm for modern applications, I observed the trend of moving infrastructure services from framework to platform again. However, this time, it is not motivated by Cloud application.

Framework vs Platform

I had never heard of, or had to use, any framework in school. However, after joining the industry, I found it tough to build scalable and configurable software without the help of a framework.

From my understanding, any application consists of code that implements business logic and other code that acts as helpers or utilities or sets up infrastructure. The code that is not related to business logic, being used repetitively in many projects, can be generalised and extracted for reuse. The output of this extraction process is a framework.

To make it shorter, a framework is any code that is not related to business logic but helps to address common concerns in applications and is fit to be reused.

Following this definition, MVC, Dependency Injection, Caching, JDBC Template and ORM are all considered frameworks.

A platform is similar to a framework in that it also helps to address common concerns in applications but, in contrast to a framework, the service is provided outside the application. Therefore, a common service endpoint can serve multiple applications at the same time. The services provided by a JEE application server or Amazon Web Services are examples of platforms.

Comparing the two approaches, a platform is more scalable and easier to use than a framework, but it also offers less control. Because of these advantages, a platform seems to be the better approach when we build cloud applications.

When should we use platform over framework

Moving toward platforms does not mean that developers will get rid of frameworks. Rather, platforms only complement frameworks in building applications. However, on some special occasions we have a choice between a platform and a framework to achieve the final goal. In my personal opinion, a platform is better than a framework when the following conditions are met:
  • The framework is tedious to use and maintain.
  • The service has some common information to be shared among instances.
  • The service can utilize additional hardware to improve performance.
In the office, we still use the Spring framework, the Play framework or RoR in our applications, and this will not change any time soon. However, to move to the cloud era, we migrated some of our existing products from internal hosting to Amazon EC2 servers. To make the best use of Amazon infrastructure and improve software quality, we have done some major refactoring of our current software architecture.

Here are some platforms that we are integrating our products with:

Amazon Simple Storage Service (Amazon S3) & Amazon CloudFront

We found that Amazon CloudFront is pretty useful for improving the average response time of our applications. Previously, we hosted most of the applications in our internal server farms, which are located in the UK and the US. This led to a noticeable increase in response time for customers on other continents. Fortunately, Amazon has a much greater infrastructure, with server farms built all around the world. That helps to keep delivery time consistent, no matter where the customer is located.

Currently, because setting up a new instance for our applications takes manual effort, we feel that the best use for Amazon CloudFront is with static content, which we host separately from the application in Amazon S3. This practice gives us a double benefit in performance: the more consistent delivery time offered by the CDN, plus a separate connection count in the browser for the static content.

Amazon ElastiCache

Caching has never been easy in a clustered environment. The word "cluster" means that your object will not simply be stored in and retrieved from local system memory. Rather, it is sent and retrieved over the network. This task was quite tricky in the past because developers needed to sync the records from one node to another. Unfortunately, not all caching frameworks support this feature automatically. Our best framework for distributed caching was Terracotta.

Now, we have turned to Amazon ElastiCache because it is cheap and reliable and saves us the huge effort of setting up and maintaining a distributed cache. It is worth highlighting that distributed caching is never meant to replace the local cache. The difference in performance suggests that we should only use distributed caching over local caching when users need to access real-time temporary data.
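The split between local and distributed caching can be sketched roughly as below. This is only an illustration under assumptions: DistributedCache, FakeDistributedCache and TwoTierCache are hypothetical names, and the fake implementation stands in for a real networked cache client so that the example runs without any network.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical stand-in for a networked cache client (for example one backed by ElastiCache).
interface DistributedCache {
    String get(String key);
    void put(String key, String value);
}

// In-memory fake so the sketch runs without a network; it counts the simulated round trips.
class FakeDistributedCache implements DistributedCache {
    private final Map<String, String> store = new ConcurrentHashMap<>();
    int roundTrips = 0;

    public String get(String key) { roundTrips++; return store.get(key); }
    public void put(String key, String value) { roundTrips++; store.put(key, value); }
}

class TwoTierCache {
    private final Map<String, String> local = new ConcurrentHashMap<>();
    private final DistributedCache shared;

    TwoTierCache(DistributedCache shared) { this.shared = shared; }

    // Slow-changing data: served from local memory after the first load.
    String getStable(String key, Function<String, String> loader) {
        return local.computeIfAbsent(key, loader);
    }

    // Real-time temporary data: always read from the shared cache so every node sees the same value.
    String getShared(String key) { return shared.get(key); }

    void putShared(String key, String value) { shared.put(key, value); }
}

class CacheDemo {
    public static void main(String[] args) {
        FakeDistributedCache remote = new FakeDistributedCache();
        TwoTierCache cache = new TwoTierCache(remote);

        cache.getStable("country:SG", key -> "Singapore");   // loaded once
        cache.getStable("country:SG", key -> "not used");    // served locally, loader is skipped
        cache.putShared("session:42:step", "checkout");

        System.out.println(cache.getStable("country:SG", key -> "not used"));
        System.out.println(cache.getShared("session:42:step"));
        System.out.println("simulated network round trips: " + remote.roundTrips);
    }
}
```

In this sketch only the shared entries cost a network round trip, which is the reason to reserve the distributed cache for data that must be visible to every node in real time.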

Event Logging for Data Analytics

In the past, we used Google Analytics for analysing user behaviour, but later decided to build an internal data warehouse. One of the motivations is the ability to track events from both browsers and servers. The Event Tracking system uses MongoDB as the database, as it allows us to quickly store a huge amount of events.

To simplify the creation and retrieval of events, we chose JSON as the event format. We cannot simply send an event directly from the browser to the event tracking server because browsers block cross-domain requests to prevent attacks. For this reason, Google Analytics sends events to its server in the form of a GET request for a static resource. As we have full control over how the application is built, we chose to send the events back to the application server first and route them to the event tracking server later. This approach is much more convenient and powerful.

Knowledge Portal

In the past, applications accessed data from a database or an internal file repository. However, to be able to scale better, we gathered all the knowledge to build a knowledge portal. We also built a query language to retrieve knowledge from this portal. This approach adds one additional layer to the knowledge retrieval process but, fortunately for us, our system does not need to serve real-time data. Therefore, we can utilize caching to improve performance.


Above is some of our experience of transforming software architecture when moving to the cloud. Please share your experience and opinions with us.

Saturday, 5 July 2014

Common mistakes when using Spring MVC

When I started my career around 10 years ago, Struts MVC was the norm in the market. However, over the years, I observed Spring MVC slowly gaining popularity. This is not a surprise to me, given the seamless integration of Spring MVC with the Spring container and the flexibility and extensibility that it offers.

In my journey with Spring so far, I have often seen people make some common mistakes when configuring the Spring framework. This happens more often than it did when people still used the Struts framework. I guess it is the trade-off between flexibility and usability. Plus, the Spring documentation is full of samples but lacks explanation. To help fill this gap, this article will try to elaborate on and explain 3 common issues that I often see people encounter.

Declare beans in Servlet context definition file

So, every one of us knows that Spring uses ContextLoaderListener to load the Spring application context. Still, when declaring the DispatcherServlet, we need to create the servlet context definition file with the name "${}-context.xml". Ever wonder why?

Application Context Hierarchy

Not all developers know that the Spring application context has a hierarchy. Let's look at this method:


It tells us that a Spring Application Context has a parent. So, what is this parent for?

If you download the source code and do a quick reference search, you should find that the Spring Application Context treats its parent as an extension of itself. If you do not mind reading code, let me show you one example of its usage in the method BeanFactoryUtils.beansOfTypeIncludingAncestors():

if (lbf instanceof HierarchicalBeanFactory) {
    HierarchicalBeanFactory hbf = (HierarchicalBeanFactory) lbf;
    if (hbf.getParentBeanFactory() instanceof ListableBeanFactory) {
        Map parentResult =
                beansOfTypeIncludingAncestors((ListableBeanFactory) hbf.getParentBeanFactory(), type);
        // ... (excerpt: the code that merges parentResult into result is omitted here)
    }
}
return result;

If you go through the whole method, you will find that the Spring Application Context looks for beans in its own context before searching the parent context. With this strategy, the lookup effectively walks up the hierarchy from child to parent, and a bean defined locally takes precedence over a parent bean with the same name.
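To make the lookup order concrete, here is a toy model in plain Java. It is not Spring's API: ToyContext and its methods are hypothetical names that only imitate the "local first, then parent" behaviour described above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy stand-in for a bean container with an optional parent; not Spring's API.
class ToyContext {
    private final ToyContext parent;
    private final Map<String, Object> beans = new LinkedHashMap<>();

    ToyContext(ToyContext parent) { this.parent = parent; }

    void register(String name, Object bean) { beans.put(name, bean); }

    // Look in the local context first, then fall back to the parent chain.
    Object getBean(String name) {
        if (beans.containsKey(name)) {
            return beans.get(name);
        }
        if (parent != null) {
            return parent.getBean(name);
        }
        throw new IllegalArgumentException("No bean named '" + name + "'");
    }

    // Collect local beans first; a parent bean is added only if its name is not already taken.
    Map<String, Object> beansIncludingAncestors() {
        Map<String, Object> result = new LinkedHashMap<>(beans);
        if (parent != null) {
            for (Map.Entry<String, Object> entry : parent.beansIncludingAncestors().entrySet()) {
                result.putIfAbsent(entry.getKey(), entry.getValue());
            }
        }
        return result;
    }
}

class HierarchyDemo {
    public static void main(String[] args) {
        ToyContext root = new ToyContext(null);
        root.register("dataSource", "root-dataSource");
        root.register("mailer", "root-mailer");

        ToyContext servlet = new ToyContext(root);
        servlet.register("mailer", "servlet-mailer");

        System.out.println(servlet.getBean("dataSource")); // found in the parent
        System.out.println(servlet.getBean("mailer"));     // the local bean wins
        System.out.println(servlet.beansIncludingAncestors().size());
    }
}
```

Note that the parent never sees the child's beans: asking the root context for "mailer" still returns the root's own bean.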


ContextLoaderListener is a well-known class that every developer should know. It helps to load the Spring application context from a pre-defined context definition file. As it implements ServletContextListener, the Spring application context will be loaded as soon as the web application is loaded. This brings an indisputable benefit when loading a Spring container that contains beans with the @PostConstruct annotation or batch jobs.

In contrast, any bean defined in the servlet context definition file will not be constructed until the servlet is initialized. When is the servlet initialized? It is not deterministic: unless load-on-startup is set, the container is free to decide. In the worst case, you may need to wait until a user makes the first hit to the servlet mapping URL to get the Spring context loaded.

With the above information, where should you declare all your precious beans? I feel the best place to do so is the context definition file loaded by ContextLoaderListener, and nowhere else. The trick here is that the ApplicationContext is stored as a ServletContext attribute under the key


Later, DispatcherServlet will load this context from ServletContext and assign it as the parent application context.

protected WebApplicationContext initWebApplicationContext() {
   WebApplicationContext rootContext =

Because of this behaviour, it is highly recommended to create an empty servlet application context definition file and define your beans in the parent context. This helps to avoid duplicated bean creation when the web application is loaded and guarantees that batch jobs are executed immediately.

Theoretically, defining a bean in the servlet application context definition file makes the bean unique and visible to that servlet only. However, in my 8 years of using Spring, I have hardly found any use for this feature except for defining a Web Service endpoint.

Declare Log4jConfigListener after ContextLoaderListener

This is a minor bug, but it catches you when you do not pay attention to it. Log4jConfigListener is my preferred solution over -Dlog4j.configuration, as we can control log4j loading without altering the server bootstrap process.

Obviously, this should be the first listener declared in your web.xml, because the container invokes listeners in the order in which they are declared. Otherwise, the Spring context starts before log4j is configured, and all of your effort to declare a proper logging configuration will be wasted.

Duplicated Beans due to mismanagement of bean exploration

In the early days of Spring, developers spent more time typing in XML files than in Java classes. For every new bean, we needed to declare it and wire the dependencies ourselves, which is clean and neat but very painful. It is no surprise that later versions of the Spring framework evolved toward greater usability. Nowadays, developers may only need to declare the transaction manager, data source, property source and web service endpoints, and leave the rest to component scan and auto-wiring.

I like these new features, but this great power needs to come with great responsibility; otherwise, things get messy quickly. Component scan and bean declaration in XML files are totally independent. Therefore, it is perfectly possible to have identical beans of the same class in the bean container if the bean is annotated for component scan and declared manually as well. Fortunately, this kind of mistake should only happen to beginners.

The situation gets more complicated when we need to integrate some embedded components into the final product. Then we really need a strategy to avoid duplicated bean declarations.

The above diagram shows a realistic sample of the kind of problems we face in daily life. Most of the time, a system is composed of multiple components, and often one component serves multiple products. Each application and component has its own beans. In this case, what is the best way to declare beans and avoid duplicated bean declarations?

Here is my proposed strategy:

  • Ensure that each component starts with a dedicated package name. It makes our life easier when we need to do component scan.
  • Don't dictate to the team that develops the component how to declare beans in the component itself (annotation versus XML declaration). It is the responsibility of the developer who packs the components into the final product to ensure there are no duplicated bean declarations.
  • If there is a context definition file packed within the component, put it in a package rather than in the root of the classpath. It is even better to give it a specific name. For example, src/main/resources/spring-core/spring-core-context.xml is way better than src/main/resources/application-context.xml. Imagine what would happen if we packed a few components that contain the same file application-context.xml in the identical package!
  • Don't provide any annotation for component scan (@Component, @Service or @Repository) if you have already declared the bean in a context file.
  • Split environment-specific beans like data-source and property-source into a separate file and reuse it.
  • Do not do component scan on a general package. For example, instead of scanning the org.springframework package, it is easier to manage if we scan several sub-packages like org.springframework.core, org.springframework.context, org.springframework.ui,...


I hope you find the above tips useful in your daily work. If you have any doubts or other ideas, please give feedback.

Wednesday, 18 June 2014

How to increase productivity

Unlocking productivity is one of the bigger concerns for anyone taking a management role. However, people rarely agree on the best approaches to improve performance. Over the years, I have observed different managers using opposite practices to get the best performance out of the teams they manage. Unfortunately, some work and others don't. To be more accurate, what does not increase performance actually reduces it.

In this article, I would like to review what I have seen and learnt over the years and share my personal view on the best approaches to unlock productivity.

What factors define a team's performance?

Let's start by analysing what composes a team. Obviously, a team is composed of team members, each with their own expertise, strengths and weaknesses. However, the total productivity of the team is not necessarily the sum of individual productivity. Other factors like teamwork, process and environment also have a major impact on total performance, which can be positive or negative.

To sum up, the 3 major factors discussed in this article will be technical skills, working process and culture.

Technical Skills

In a factory, we can count the total productivity as the sum of the individual productivity of each worker, but this simplicity does not apply to the IT field. The difference lies in the nature of the work. Programming is, to this day, still innovative work that cannot be automated. In the IT industry, nothing is more valuable than innovation and vision. That explains why Japan may be well known for producing high-quality cars, but the US is much more famous for producing well-known IT companies.

In contrast to a factory environment, developers in a software team do not necessarily do, or excel at, the same things. Even if they graduated from the same school and took the same job, personal preference and self-study quickly make developers' skills diverge again. For the sake of increasing total productivity, this may be a good thing. There is no use in all members being competent at the same kind of tasks. As it is too difficult to be good at everything, life will be much easier if members of the team can compensate for each other's weaknesses.

It is not easy to improve the technical skills of the team, as it takes many years for a developer to build up his/her skill set. The fastest way to pump up the team's skill set is to recruit new talent that offers what the team lacks. That is why the popular practice in the industry is to let the team recruit new members themselves. Because of this, a team that is slowly built over the years normally offers a more balanced skill set.

While recruitment is a quick, short-term solution, the long-term solution is to keep the team up to date with the latest technology trends. In this field, if you do not go forward, you go backward. There is no skill set that stays useful forever. One of my colleagues even emphasizes that upgrading developers' skills is beneficial to the company in the long run. Even if we do not count inflation, it is quite common for a company to offer a pay rise after each annual review to retain staff. If the staff do not acquire new skills, the company is effectively paying a higher price every year for a depreciating asset. It may be a good idea for the company to use monetary prizes, for example tied to KPIs, to motivate self-study and upgrading.

There are a lot of training courses in the industry, but they are not necessarily the best method for upgrading skills. Personally, I feel most of the coursework offers more branding value than real-life use. If a developer is keen to learn, there should be sufficient knowledge on the internet to pick up anything. Therefore, except for commercial APIs or products, spending money on monetary prizes should be more worthwhile than spending it on training courses.

Another well-known challenge for self-study is natural human laziness. There is nothing surprising about it. However, the best way to fight laziness is to find fun in learning new things. This can only be achieved if a developer takes programming as a hobby more than a profession. Even if not, it is quite reasonable that one should re-invest effort in one's bread-and-butter tools. One of my friends even argues that if singers and musicians take responsibility for their own training, programmers should do the same.

Sometimes, we may feel lost due to the huge amount of technologies exposed to us every year. I feel that myself too. My approach to self-study is to add a delay in absorbing concepts and ideas. I try to understand them but do not invest too much until the new concepts and ideas are reasonably accepted by the market.

Working Process

Working process can contribute greatly to team performance, positively or negatively. A great developer writes great code, but he will not be able to do so if he wastes too much effort on something non-essential. Obviously, when the process is wrong, developers may feel uncomfortable in their daily life. An unhappy developer may not perform at his best.

There is no clear guideline to judge whether the working process is well defined, but people in the environment will feel it right away if something is wrong. However, it is not easy to get it right, as the people who have the right to make decisions are not necessarily the ones who suffer from a bad process. We need an environment with effective feedback channels to improve the working process.

The common pitfall for a working process is the lack of result orientation. The process is less effective if it is too reporting-oriented, attitude-oriented or based on some unrealistic assumptions. To define the process, it may be good if the executives can decide whether they want to build an innovative company or an operation-oriented company. Examples of the former kind are Google, Facebook and Twitter, while the latter may be GM, Ford and Toyota. It is not that an operation-oriented company cannot innovate, but its process was not built with innovation as the first priority. Therefore, the metrics for measuring performance may be slightly different, which causes different results in the long term. Not all companies in the IT field are innovative companies. One counter-example is the outsourcing companies or software houses in Asia. To encourage innovation, the working process needs to focus on people, minimize hassle, and maximize collaboration and sharing.

Through my years in the industry with Waterfall, not-so-Agile and Agile companies, I feel that Agile works quite well for the IT field. It was built on the right assumption that software development is innovative work and less predictable compared to other kinds of engineering.

Company Culture

When Steve Jobs passed away in 2011, I bought his authorized biography by Walter Isaacson. The book clearly explains how Sony failed to keep its competitive edge because of internal competition amongst its departments. Microsoft suffered a similar problem due to the controversial stack ranking system that enforced internal competition. I think that the IT field is getting more complicated and we need more collaboration than in the past to implement new ideas.

It is tough to maintain collaboration when your company grows to become a multi-cultural MNC. However, it can still be done if management has the right mindset and continuously communicates its vision to the team. As above, management needs to be clear about whether it wants to build an innovative company, as that requires a distinct culture, which is more open and highly motivated.

In Silicon Valley, office life ends quite late, as most developers are geeks and they love nothing more than coding. However, that is not necessarily a good practice, as all of us have a family to take care of. It is up to each individual to define his/her own work-life balance, but the requirement is that the employee is fully charged and feels excited whenever he comes to the office. He must feel that his work is appreciated and that he has support when he needs it.


To make it short, here are the kinds of things that management can apply to increase the productivity of the team:

  • Let the team be involved in recruitment. Recruit people who take programming as a hobby.
  • Offer monetary prizes or other kinds of encouragement for self-study and self-upgrading.
  • Save money on company-sponsored courses, except for commercial products.
  • Make sure that the working process is result-oriented.
  • Apply Agile practices.
  • Encourage collaboration and eliminate internal competition.
  • Encourage sharing.
  • Encourage feedback.
  • Maintain employee work-life balance and motivation.
  • Make sure employees can find support when they need it.

Saturday, 24 May 2014

Testing effectively

Recently, there has been a heated debate regarding TDD, which started when DHH claimed that TDD is dead.
This ongoing debate managed to capture the attention of the developer world, including us.

Some mini debates have happened in our office regarding the right practices for testing.

In this article, I will present my own view.

How many kinds of tests have you seen?

Since I joined the industry, here are the kinds of tests that I have worked on:

  • Unit Test
  • System/Integration/Functional Test
  • Regression Test
  • Test Harness/Load Test
  • Smoke Test/Spider Test
The above test categories are not necessarily mutually exclusive. For example, you can create a set of automated functional tests or Smoke tests to be used as regression tests. For the benefit of newbies, let's do a quick review of these old concepts.

Unit Test

A Unit Test aims to test the functionality of a unit of code/component. In the Java world, the unit of code is the class, and each Java class is supposed to have a unit test. The philosophy of Unit Test is simple. When all the components are working, the system as a whole should work.

A component rarely works alone. Rather, it normally interacts with other components. Therefore, in order to write Unit Tests, developers need to mock other components. This is why DHH and James O Coplien criticize Unit Tests as a huge effort that gains little benefit.

System/Integration/Functional Test

There is no concrete naming, as people often use different terms to describe similar things. In contrast to Unit Test, for functional tests developers aim to test a system function as a whole, which may involve multiple components.

Normally, for functional tests, data is retrieved from and stored to a test database. Of course, there should be a pre-step to set up test data before running. DHH likes this kind of test. It helps developers test all the functions of the system without a huge effort to set up mock objects.

Functional tests may involve asserting web output. In the past, this was mostly done with HtmlUnit, but with recent improvements to Selenium Grid, Selenium has become the preferred choice.

Regression Test

In this industry, you may end up spending more time maintaining systems than developing new ones. Software changes all the time, and it is hard to avoid risk whenever making changes. Regression Tests are supposed to capture any defect caused by changes.

In the past, software houses did have an army of testers, but the current trend is automated testing. It means that developers deliver software with a full set of tests that are supposed to break whenever a function is broken.

Whenever a bug is detected, a new test case should be added to cover the new bug. Developers create the test, let it fail, and fix the bug to make it pass. This practice is called Test Driven Development.
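As a small illustration of that cycle, the sketch below assumes a hypothetical bug report: Stats.average() threw an ArithmeticException for an empty array. The test for the empty array is written first and fails against the old code; the guard clause is the fix that makes it pass. All names are made up, and plain Java checks are used instead of a test framework so the example stays self-contained.

```java
class Stats {
    // Integer average. Before the fix, an empty array caused an ArithmeticException (division by zero).
    static int average(int[] values) {
        if (values.length == 0) {
            return 0; // the fix, added only after the regression test below had failed
        }
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum / values.length;
    }
}

class StatsRegressionTest {
    // Step 1: written first to reproduce the reported bug; it failed against the old code.
    static void averageOfEmptyArrayIsZero() {
        check(Stats.average(new int[0]) == 0, "empty array should average to 0");
    }

    // Existing behaviour must keep working after the fix.
    static void averageOfValues() {
        check(Stats.average(new int[] {2, 4, 6}) == 4, "average of 2, 4, 6 should be 4");
    }

    static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }

    public static void main(String[] args) {
        averageOfEmptyArrayIsZero();
        averageOfValues();
        System.out.println("all regression tests passed");
    }
}
```

The new test stays in the suite afterwards, so the same defect cannot silently return with a later change.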

Test Harness/Load Test

A normal test case does not capture system performance. Therefore, we need to develop another set of tests for this purpose. In the simplest form, we can set a timeout for the functional tests that run on the continuous integration server. The tricky part is that this kind of test is very system dependent and may fail if the system is overloaded.

The more popular solution is to run load tests manually using a load-testing tool like JMeter, or to create our own load test app.
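The timeout idea can be sketched with nothing more than the JDK's concurrency utilities. TimedCheck and finishesWithin are hypothetical names, and the two lambdas are placeholder workloads; a real test would call the function under test instead.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class TimedCheck {
    // Runs the task on a separate thread and reports whether it finished within the time budget.
    static boolean finishesWithin(Callable<?> task, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<?> future = executor.submit(task);
        try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException tooSlow) {
            return false;
        } catch (InterruptedException | ExecutionException failure) {
            throw new IllegalStateException("task failed before the timeout", failure);
        } finally {
            executor.shutdownNow(); // also interrupts a task that is still running
        }
    }

    public static void main(String[] args) {
        // Placeholder workloads standing in for real functional tests.
        boolean fast = finishesWithin(() -> "done", 1000);
        boolean slow = finishesWithin(() -> { Thread.sleep(2000); return "done"; }, 100);
        System.out.println("fast task within budget: " + fast);
        System.out.println("slow task within budget: " + slow);
    }
}
```

Because the measured time depends on how busy the machine is, the budget usually needs generous headroom on a shared build server.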

Smoke Test/Spider Test

Smoke Test and Spider Test are two special kinds of tests that may be more relevant to us. WDS provides KAAS (Knowledge as a Service) for the wireless industry. Therefore, our applications are refreshed every day with data changes rather than business logic changes. It is specific to us that system failures may come from data changes rather than business logic.

Smoke Tests are a set of pre-defined test cases run on the integration server with production data. They help us find any potential issues for the daily LIVE deployment.

Similar to Smoke Test, Spider Test runs with real data, but it works like a crawler that randomly clicks on any link or button available. One of our systems contains so many combinations of inputs (close to 100,000) that it is not possible for humans to test them all.

Our Smoke Test randomly chooses some combinations of data to test. If it manages to run for a few hours without any defect, we proceed with our daily/weekly deployment.
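Randomly choosing input combinations could look roughly like the sketch below. It is an assumption-laden illustration: CombinationSampler and the three input dimensions are invented for the example, since the article does not describe the real inputs. The fixed seed is a design choice that lets a failing run be replayed with exactly the same combinations.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

class CombinationSampler {
    // Picks `count` random combinations, taking one value from each input dimension per combination.
    // Sampling is done with replacement, so the same combination may appear more than once.
    static List<List<String>> sample(List<List<String>> dimensions, int count, long seed) {
        Random random = new Random(seed); // fixed seed so a failing run can be replayed
        List<List<String>> picks = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            List<String> combination = new ArrayList<>();
            for (List<String> values : dimensions) {
                combination.add(values.get(random.nextInt(values.size())));
            }
            picks.add(combination);
        }
        return picks;
    }

    public static void main(String[] args) {
        // Hypothetical input dimensions, invented for this example.
        List<List<String>> dimensions = Arrays.asList(
                Arrays.asList("handset-A", "handset-B", "handset-C"),
                Arrays.asList("prepaid", "postpaid"),
                Arrays.asList("en", "fr", "de"));
        for (List<String> combination : sample(dimensions, 5, 42L)) {
            System.out.println(combination); // a real test would drive the application with these inputs
        }
    }
}
```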

The Test Culture in our environment

To make it short, WDS is a TDD temple. If you create the implementation before writing test cases, better be quiet about it. If you look at the WDS self-introduction, TDD is mentioned only after Agile and XP:

"We are:- agile & XP, TDD & pairing, Java & JavaScript, git & continuous deployment, Linux & AWS, Jeans & T-shirts, Tea & cake"

Many high-level executives in WDS started their careers as developers. That helps foster our culture as an engineering-oriented company. Requests for resources to improve test coverage or infrastructure are common here.

We do not have QA. In the worst case, the Product Owner or customers detect bugs. In the best case, we detect bugs through test cases or through teammates during the peer review stage.

Regarding the Singapore office, most of our team members grew up absorbing Kent Beck's and Martin Fowler's books and philosophy. That is why most of them are hardcore TDD worshippers.

The focus on testing in our working environment did bear fruit. The WDS production defect rate is relatively low.

My own experience and personal view with testing

That is enough self-appraisal. Now, let me share my experience with testing.

Generally, Automated Testing works better than QA 

Comparing the output of a traditional software house packed with an army of QA with that of a modern Agile team that delivers products with full test coverage, the latter normally outperforms in terms of quality and even cost effectiveness. Should QA jobs be extinct soon?

Over-monitoring may hint at a lack of quality

It sounds strange, but over the years, I developed an insecure feeling whenever I saw a project that had too many layers of monitoring. Over-monitoring may hint at a lack of confidence and, indeed, these systems crash very often for unknown reasons.

Writing test cases takes more time than developing features

DHH is definitely right on this. Writing test cases means that you need to mock inputs and assert lots of things. Unless you keep writing spaghetti code, developing features takes much less time compared to writing tests.

UI Testing with JavaScript is painful

You know it when you have done it. Life is much better if you only need to test RESTful APIs or static HTML pages. Unfortunately, the trend in modern web application development involves lots of JavaScript on the client side. For UI testing, asynchronous behaviour is evil.

Whether you go with a full-control testing framework like HtmlUnit or a more practical, generic one like Selenium, it will be a great surprise to me if you never encounter random failures.

I guess every developer knows the feeling of failing to get the build to pass at the end of the week due to randomly failing test cases.

Developers always over-estimate their software quality

This applies to me as well because I am an optimistic person. We tend to think that our implementation is perfect until the tests fail or someone helps to point out a bug.

Sometimes, we change our code to make writing test cases easier

Like it or not, we must agree with DHH on this point. In the Java world, I have seen people exposing internal variables or creating dummy wrappers for framework objects (like HttpSession, HttpRequest,...) so that it is easier to write Unit Tests. DHH finds it so uncomfortable that he chose to walk away from Unit Test.

On this part, I half agree and half disagree with him. In my view, altering the design or implementation for the sake of testing is not favourable. It is better if developers can write the code without any concern about mocking inputs.

However, abandoning Unit Testing for the sake of a simple and convenient life is too extreme. The right solution is to design the system in such a way that business logic is not tightly coupled with the framework or infrastructure.

This is what is called Domain Driven Design.

Domain Driven Design

For newbies, Domain Driven Design gives us a system with the following layers.

If you notice, the above diagram has more abstraction layers than Rails or the Java adoption of Rails, the Play framework. I understand that creating more abstraction layers can cause a bloated system, but for DDD, it is a reasonable compromise.

Let's elaborate further on the content of each layer:


The Infrastructure layer is where you keep your repository implementations and any other environment-specific concerns. For infrastructure, keep the API as simple and dumb as possible and avoid implementing any business logic here.

For this layer, Unit Test is a joke. If there is anything to write, it should be an integration test, which works with a real database.


The Domain layer is the most important layer. It contains all the system's business logic without any framework, infrastructure or environment concerns. Your implementation should look like a direct translation of user requirements. All inputs, outputs and parameters are POJOs only.

The Domain layer should be the first layer to be implemented. To fully complete the logic, you may need the interface/API of the infrastructure layer. It is best practice to keep the API in the Domain layer and the concrete implementation in the Infrastructure layer.

The best kind of test cases for Domain layer is Unit Test as your concern is not the system UI or environment. Therefore, it helps developers to avoid doing dirty works of mocking framework object. 
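As a minimal sketch of this separation, assume a made-up discount requirement. The names Order, OrderRepository, DiscountService and InMemoryOrderRepository are hypothetical and only serve the illustration. The Domain layer owns the repository interface and the business rule, while a unit test supplies a trivial in-memory stand-in instead of a real database:

```java
import java.util.HashMap;
import java.util.Map;

// Domain layer: a plain POJO with no framework dependency (hypothetical example).
final class Order {
    private final String id;
    private final long totalCents;

    Order(String id, long totalCents) {
        this.id = id;
        this.totalCents = totalCents;
    }

    String getId() { return id; }
    long getTotalCents() { return totalCents; }
}

// Domain layer: the API that the business logic needs from the infrastructure.
interface OrderRepository {
    Order findById(String id);
}

// Domain layer: the business rule, written against the interface only.
final class DiscountService {
    private final OrderRepository orders;

    DiscountService(OrderRepository orders) {
        this.orders = orders;
    }

    // Assumed rule for illustration: 10% off orders of 100.00 or more.
    long payableCents(String orderId) {
        Order order = orders.findById(orderId);
        if (order == null) {
            throw new IllegalArgumentException("Unknown order: " + orderId);
        }
        long total = order.getTotalCents();
        return total >= 10_000 ? total - total / 10 : total;
    }
}

// Test-side stand-in; a real Infrastructure implementation would talk to a database.
final class InMemoryOrderRepository implements OrderRepository {
    private final Map<String, Order> store = new HashMap<>();

    void save(Order order) { store.put(order.getId(), order); }

    @Override
    public Order findById(String id) { return store.get(id); }
}
```

A unit test can then create an InMemoryOrderRepository, save a couple of orders and assert on DiscountService.payableCents without touching any framework object.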

For mocking the internal state of an object, my preferred choice is to use a Reflection utility to set up the object rather than exposing internal variables through setters.
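A rough sketch of such a utility, using only java.lang.reflect, could look like the following. Account and ReflectionTestSupport are hypothetical names for illustration; Spring's ReflectionTestUtils offers a similar setField helper if you already depend on Spring's test module.

```java
import java.lang.reflect.Field;

// Hypothetical domain object whose state is deliberately not exposed via setters.
final class Account {
    private long balanceCents;

    boolean canWithdraw(long amountCents) {
        return amountCents > 0 && amountCents <= balanceCents;
    }
}

// Small test helper: sets a private field by name using reflection.
final class ReflectionTestSupport {
    private ReflectionTestSupport() { }

    static void setField(Object target, String fieldName, Object value) {
        try {
            // Looks only at fields declared on the object's own class, not superclasses.
            Field field = target.getClass().getDeclaredField(fieldName);
            field.setAccessible(true);
            field.set(target, value);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot set field " + fieldName, e);
        }
    }
}
```

In a test, calling ReflectionTestSupport.setField(account, "balanceCents", 500L) prepares the object without adding a setter to production code. The trade-off is that the field name is a plain string, so renaming the field breaks the test only at runtime.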

Application Layer/User Interface

The Application Layer is where you start thinking about how to present your business logic to the customer. If the logic is complex or involves many consecutive requests, it is possible to create Facades.

At this point, developers should think more about the clients than the system. The major concerns should be the customer's devices, UI responsiveness, load balancing, stateless or stateful sessions and the RESTful API. This is the place for developers to showcase their framework talent and knowledge.

For this layer, the better kinds of test cases are functional/integration tests.

As above, try your best to avoid having any business logic in the Application Layer.

Why is it hard to write Unit Tests in Rails?

Now, if you look back at Rails or the Play framework, there is no clear separation of layers like the above. The Controllers handle inputs and outputs and may contain business logic as well. Similar behaviour applies if you use the Servlet API without adding any additional layer.

The Domain object in Rails is an active record and is tightly coupled to the database schema.

Hence, for whatever unit of code developers want to write test cases for, the inputs and outputs are not POJOs. This makes writing Unit Tests tough.

We should not blame DHH for this design, as he follows another philosophy of software development with many benefits like simple design, low development effort and quick feedback. However, I myself do not follow and adopt all of his ideas for developing enterprise applications.

Some of his ideas, like convention over configuration, are great and did cause a major mindset change in the developer world, but other ideas end up as trade-offs. Being able to quickly bring up a website may later turn into trouble implementing features that Rails/Play do not support.

  • Unit Tests are hard to write if your business logic is tightly coupled to the framework.
  • Focusing on and developing the business logic first may help you create a better design.
  • Each kind of component suits a different kind of test case.

This is my own view of Testing. If you have any other opinions, please share your feedback.

Friday, 23 May 2014

Software Development and Newton's Laws of Motion


I have no idea when the word velocity found a new home in software development, but it is nevertheless popular these days. However, I am pretty sure that Mr Isaac Newton would not be happy if you talked about motion without mentioning his laws.

First Law

When viewed in an inertial reference frame, an object either remains at rest or continues to move at a constant velocity, unless acted upon by an external force.

There are a lot of external forces

  • developers are fixing bugs
  • developers are adding new features
  • developers are introducing more bugs (lol)
  • business requests to cut down the operation cost
  • third party competition is changing the market
  • users are changing
  • this list goes on and on

However, a team/product is either dead (and therefore remains at rest) or is moving at a constant velocity (let's say generating a certain amount of revenue or eating a certain amount of budget per day).

Now I declare, it is against the law to talk about team velocity, because what should you do to maintain the team's velocity? Nothing, you should do nothing!

Well, that will upset most of the managers, "I'd rather my developers do something".

So we need another law.

Second Law

F = ma. The vector sum of the forces F on an object is equal to the mass m of that object multiplied by the acceleration vector a of the object.

Acceleration is the ability to change the velocity. F is treated as a constant here because, come on, let's be honest, your team is pretty much fixed in size, unless you are Google. Your time is pretty much fixed at 24 hours per day, unless you live on Mars, where the day is slightly longer, 24.622962 hours to be exact. Now we are screwed ... there is only one variable left to play with. According to the second law, for a given force F, the acceleration is inversely proportional to the mass. Mass is the burden; it works against acceleration.

Here is a short list of how to gain some mass

  • too many good-to-have features
  • too much technical debt
  • too many abstractions, layers upon layers, ORM, DAO, service, controller, view. We need all of them to get some trivial {"user_id": 123} out of that database. Oh, and I forgot to mention, there is SQL, and NoSQL ...
  • too many processes
  • too many patterns, EnterprisyStrategyFactoryBuilderAdapterListenerInterceptor
  • too many communication delegations, business -> project manager -> business analyst -> team leader -> developer (add more roles at your own will)
  • too many frameworks. JavaEE, Spring, Hibernate, Struts, Bootstrap, jQuery, Angular.js, Ember.js. Dare to lookup JavaEE? There are 39 JSRs listed under JavaEE7!
  • too many servers. Web servers, relational database servers, NoSQL servers, cache servers, message queue servers, third party integration servers ...

Yet, in the end you do want to make a change, don't you? If your answer is NO, grats, you can stop reading here. Even if the answer is yes, you can only say so after you read the third law.

Third Law

To every action there is always opposed an equal reaction: or the mutual actions of two bodies upon each other are always equal, and directed to contrary parts.

A: "Can we remove feature XYZ? so that the codes can be greatly simplified"
R: "Please no, that is Shareholder ABC's favorite"
A: "Ooookie, nvm"

A: "Can we change to git?"
R: "Nah, zip and email is our best friend"
A: "Maybe next time"

A: "Can we upgrade java 1.4?"
R: "There are too many servers in production"
A: "Fine, let's stick to manual casting"

Aaaaah, I still want to type some more words but there is an equal reaction preventing me from doing that ... So let's call this a day.

Thanks for wasting your time reading my rants.

Happy Coding ...



Wednesday, 21 May 2014

MySQL Transaction Isolation Levels and Locks

Recently, an application that my team was working on ran into a MySQL deadlock, and it took us some time to figure out the reasons behind it. The application was deployed on a 2-node cluster, with both nodes connected to an AWS MySQL database. The MySQL tables are mostly based on InnoDB, which supports transactions (meaning all the usual commit and rollback semantics) as well as row-level locking, which the MyISAM engine does not provide. The problem arose when our users, due to a poorly designed user interface, were able to execute the same long-running operation twice on the database.
As it turned out, because we have a dual-node cluster, each of the user operations originated from a different web application instance (which in turn meant 2 different transactions running the same queries). The deadlocked query happened to be an “INSERT INTO T… SELECT FROM S WHERE” query, which sets shared locks on the records read by the SELECT. It didn’t help that both T and S in this case happened to be the same table. In effect, both the shared locks and exclusive locks were applied on the same table. A possible cause of the deadlock is laid out in the table below. It is based on the assumption that we are using the default REPEATABLE_READ transaction isolation level (I will explain the concept of transaction isolation later).
Assuming that we have a table as such
1 | Collection 1
2 | Collection 2
... | Collection N
450000 | Collection 450000
The following is a sample sequence that could possibly cause a deadlock based on the 2 transactions running an SQL query like “INSERT INTO T SELECT FROM T WHERE … “ :
Time | Transaction 1 | Transaction 2 | Comment
T1 | Statement executed | | Statement executed. A shared lock is applied to records that are read by selection
T2 | Read lock s1 on Row 10-20 | | The lock on the index across a range. InnoDB has a concept of gap locks.
T3 | | Statement executed | Transaction 2 statement executed. Similar shared lock to s1 applied by selection
T4 | | Read lock s2 on Row 10-20 | Shared read locks allow both transactions to read the records only
T5 | Insert lock x1 into Row 13 in index wanted | | Transaction 1 attempts to get exclusive lock on Row 13 for insertion but Transaction 2 is holding a shared lock
T6 | | Insert lock x2 into Row 13 in index wanted | Transaction 2 attempts to get exclusive lock on Row 13 for insertion but Transaction 1 is holding a shared lock
T7 | | | Deadlock!
The above scenario occurs only when we use REPEATABLE_READ (which introduces shared read locks). If we were to lower the transaction isolation level to READ_COMMITTED, we would reduce the chances of a deadlock happening. Of course, this would mean relaxing the consistency of the database records. In our case, the data does not have such strict requirements for strong consistency. Thus, it is acceptable for one transaction to read records that are committed by other transactions.
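For a Java application, one way to lower the level is per connection through the standard JDBC API. The sketch below assumes you already have a java.sql.Connection from your driver or connection pool; IsolationSketch and useReadCommitted are hypothetical names, and this is not necessarily how our own application applied the change.

```java
import java.sql.Connection;
import java.sql.SQLException;

final class IsolationSketch {
    private IsolationSketch() { }

    // Lowers the isolation level for this connection only.
    // Call it before the transaction starts; other connections keep their own level.
    // The equivalent MySQL statement would be:
    //   SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED
    static void useReadCommitted(Connection connection) throws SQLException {
        connection.setTransactionIsolation(Connection.TRANSACTION_READ_COMMITTED);
    }
}
```

Setting the level per connection keeps the change local to the code path that runs the long INSERT ... SELECT, instead of changing the default for the whole database server.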
So, to delve deeper into the idea of Transaction Isolation: ANSI/ISO SQL defines the following isolation levels, from highest to lowest:
  1. Serializable
    This is the highest isolation level and usually requires the use of shared read locks and exclusive write locks (as in the case of MySQL).
    What this means in essence is that any query made will require a shared read lock on the records, which prevents another transaction’s query from modifying these records. Every update statement will require an exclusive write lock.
    Also, range locks must be acquired when a select statement with a WHERE condition is used. This is implemented as a gap lock in MySQL.
  2. Repeatable Reads
    This is the default level used in MySQL. It is largely similar to Serializable except that a range lock is not used. However, the way that MySQL implements this level seems a little different to me. Based on Wikipedia’s article on Transaction Isolation, a range lock is not implemented and so phantom reads can still occur. A phantom read is the possibility that a select query returns additional records when the same query is repeated within a transaction. However, what I understand from MySQL’s documentation is that range locks are still used and the same select query made in the same transaction will always return the same records. Maybe I’m mistaken in my understanding, and if there are any mistakes in my interpretation, I stand ready to be corrected.
  3. Read Committed
    This is an isolation level that maintains write locks until the end of the transaction, but read locks are released at the end of the SELECT statement. It does not promise that a SELECT statement will find the same data if it is re-run in the same transaction. It does, however, guarantee that the data that is read is not “dirty” and has been committed.
  4. Read Uncommitted
    This is an isolation level that I doubt would be useful for most use cases. Basically, it allows a transaction to see all data that has been modified, including “dirty” or uncommitted data. This is the lowest isolation level.
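The four levels can be summarised by which read anomalies the ANSI/ISO standard allows at each one. The sketch below encodes that summary as a lookup; the enum names are my own, and it reflects the standard's definitions rather than MySQL's actual behaviour (as noted above, InnoDB's Repeatable Read appears stricter about phantoms than the standard requires).

```java
import java.util.EnumSet;
import java.util.Set;

enum ReadAnomaly { DIRTY_READ, NON_REPEATABLE_READ, PHANTOM_READ }

// Anomalies that the ANSI/ISO SQL standard permits at each level, highest level first.
enum IsolationLevel {
    SERIALIZABLE(EnumSet.noneOf(ReadAnomaly.class)),
    REPEATABLE_READ(EnumSet.of(ReadAnomaly.PHANTOM_READ)),
    READ_COMMITTED(EnumSet.of(ReadAnomaly.NON_REPEATABLE_READ, ReadAnomaly.PHANTOM_READ)),
    READ_UNCOMMITTED(EnumSet.allOf(ReadAnomaly.class));

    private final Set<ReadAnomaly> permitted;

    IsolationLevel(Set<ReadAnomaly> permitted) {
        this.permitted = permitted;
    }

    // True if the standard allows this anomaly to occur at this level.
    boolean permits(ReadAnomaly anomaly) {
        return permitted.contains(anomaly);
    }
}
```

For example, IsolationLevel.READ_COMMITTED.permits(ReadAnomaly.DIRTY_READ) is false, which matches the guarantee that only committed data is read at that level.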
Having gone through the different transaction isolation levels, we can see how the choice of Transaction Isolation level determines the kind of database locking mechanism used. From a practical standpoint, the default MySQL isolation level (REPEATABLE_READ) might not always be a good choice when you are dealing with a scenario like ours, where there is really no need for such strong consistency in the data reads. I believe that lowering the isolation level is likely to reduce the chances of your database queries running into a deadlock. It might even allow more concurrent access to your database, which improves the performance of your queries. Of course, this comes with the caveat that you need to understand how important consistent reads are for your application. If you are dealing with data where precision is paramount (e.g. your bank accounts), then it is definitely necessary to impose as much isolation as possible so that you do not read inconsistent information within your transaction.

Monday, 12 May 2014

How to build Java based cloud application

Recently, we were tasked with developing a SaaS application for big data analysis. To do data mining, the system needs to store multiple billions of public posts in the database and run the classification process on them.

Classification in our context is a slow, resource-intensive and painful process of assigning a topic or sentiment to each record in the database. The process can last up to 24 hours with our testing data.

To cope with these requirements, our obvious choice was to build a cloud application on Amazon Web Services. After working on the project for a while, I want to share my own thoughts, understanding and approach to building Java based cloud applications.

What is Cloud Computing

Let us start with Wikipedia first:

"Cloud computing involves distributed computing over a network, where a program or application may run on many connected computers at the same time."

The definition may be a bit ambiguous, but that is understandable as In The Cloud itself is more of a marketing term than a technical term. For a newbie, it is easier to understand if we define it in a more practical way:

The only difference between a traditional web application and a cloud web application is the ability to scale perfectly. A cloud application should be able to cope with an unlimited amount of work given unlimited hardware.

Cloud applications are getting popular nowadays because of the higher requirements placed on modern applications. In the past, Google was famous for building high-scale applications that contain almost all the information available on the internet. Now, however, many other corporations need to build applications that serve a similar scale of data and computation (Facebook, Youtube, LinkedIn, Twitter,.. and also the people who crawl and process their data, like us).

This amount of data and processing cannot be achieved with the traditional way of developing applications. That leads us to an entirely different approach to building applications that can scale very well. This is the cloud application.

Why the traditional approach to developing web applications does not scale well enough

Traditional Approach of developing web application

Let us take a look at why a traditional application cannot serve that scale of data.

If you have developed a traditional web application, it should look pretty much like the diagram above. There are some minor variations, such as merging the application server and web server, or having multiple enterprise servers. However, most of the time, the database is relational. Web servers are normally stateful, while enterprise servers can serve both stateless and stateful services.

There are some crucial weaknesses that keep this architecture from scaling well enough. Let us start our analysis by defining perfect scalability first.

Perfect scalability is achieved if a system can always provide an identical response time for double the amount of work, given double the bandwidth and double the hardware.

Perfect scalability cannot be achieved in real life. Rather, developers only aim to achieve near perfect scalability. For example, DNS servers are out of our control. Hence, theoretically, we cannot serve a higher number of requests than the DNS servers can. This is the upper bound for any system, even Google.


Coming back to the diagram above, the biggest weakness is database scalability. When the number of requests and the size of the data are small enough, developers should not notice any performance impact when increasing the load. As the load continues to increase, the impact can become very obvious once the CPU is 100% utilized or memory is fully occupied. At this point, the most realistic option is to add more memory and CPU to the database system. After this, the system may perform well again.

Unfortunately, this approach cannot be repeated forever whenever problems arise. There will be a limit beyond which, no matter how much RAM and CPU you have, performance will slowly get worse. This is to be expected because some records need to be created, read, updated and deleted (CRUD) by many requests. No matter whether you choose to cache them, store them in memory or use whatever trick, they are unique records persisted on a single machine, and there is a limit on the number of access requests that can be sent to a single memory address.

This limit is unavoidable because SQL is built for integrity. To ensure integrity, each piece of information in a SQL server should be unique. This characteristic still applies even after data segregation or replication is done (at least for the primary instance).

In contrast, NoSQL does not attempt to normalize data. Instead, it chooses to store aggregate objects, which may contain duplicated information. Therefore, NoSQL is only applicable if data integrity is not compulsory.

The example above shows how data is stored in a document database versus a relational database. If a family contains many members, a relational database stores only a single address for all of them, while a NoSQL database simply replicates the housing address. When a family relocates, the housing addresses of all members may not be updated in a single transaction, which causes a data integrity violation.
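To make the family example concrete, here is a small in-memory sketch. It does not use any real database; Address, NormalizedMember, MemberDocument and DuplicationSketch are hypothetical names. The normalized model shares one address object, while the document model copies the address into every member.

```java
import java.util.List;

// Relational style: one shared address record that every member points to.
final class Address {
    String street;

    Address(String street) { this.street = street; }
}

final class NormalizedMember {
    final String name;
    final Address address; // reference to the single shared record

    NormalizedMember(String name, Address address) {
        this.name = name;
        this.address = address;
    }
}

// Document style: each aggregate carries its own copy of the address.
final class MemberDocument {
    final String name;
    String street; // duplicated value, updated one document at a time

    MemberDocument(String name, String street) {
        this.name = name;
        this.street = street;
    }
}

final class DuplicationSketch {
    private DuplicationSketch() { }

    // Counts documents that still carry an address other than the expected one.
    static int countStale(List<MemberDocument> family, String expectedStreet) {
        int stale = 0;
        for (MemberDocument member : family) {
            if (!member.street.equals(expectedStreet)) {
                stale++;
            }
        }
        return stale;
    }
}
```

Changing the shared Address once updates every NormalizedMember at the same moment. With MemberDocument, the family relocates one document at a time, so countStale returns a non-zero value until the last copy is updated. That window is the temporary integrity violation described above.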

However, for our application and many others, this temporary violation is acceptable. For example, you may not need the number of page views on your social page or the number of public posts on a social website to be 100% accurate.

Data duplication effectively removes the concurrent access to a single memory address that we mentioned above and gives developers the option to store data anywhere they want, as long as the changes in one node can be slowly synced to other nodes. This architecture is much more scalable.


The next problem is the stateful service. A stateful service requires the same set of hardware to serve requests from the same client. When the number of clients increases, the best possible move is to deploy more application servers and web servers into the system. However, resource allocation cannot be fully optimized with stateful services.

For traditional applications, the load balancer does not have any information about system load and normally spreads the requests across servers using the Round Robin technique. The problem here is that not all requests are equal and not all clients send an identical number of requests. As a result, some servers are heavily overloaded while others are still idle.
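The imbalance is easy to reproduce in a toy simulation. The sketch below assumes each request has a known numeric cost, which a real load balancer usually does not know in advance, and compares Round Robin with a load-aware strategy that is included purely for contrast. LoadBalancerSketch and its methods are hypothetical names.

```java
final class LoadBalancerSketch {
    private LoadBalancerSketch() { }

    // Assigns request i to server i % serverCount, ignoring how heavy each request is.
    static long[] roundRobin(int[] requestCosts, int serverCount) {
        long[] load = new long[serverCount];
        for (int i = 0; i < requestCosts.length; i++) {
            load[i % serverCount] += requestCosts[i];
        }
        return load;
    }

    // Assigns each request to the server with the least accumulated load so far.
    static long[] leastLoaded(int[] requestCosts, int serverCount) {
        long[] load = new long[serverCount];
        for (int cost : requestCosts) {
            int target = 0;
            for (int s = 1; s < serverCount; s++) {
                if (load[s] < load[target]) {
                    target = s;
                }
            }
            load[target] += cost;
        }
        return load;
    }
}
```

With the costs {10, 1, 10, 1, 10, 1} and two servers, Round Robin sends every heavy request to the first server (loads 30 and 3), while the load-aware strategy ends at 21 and 12. A load-aware strategy is only practical when any server can handle any request, which is exactly what stateful services prevent.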

Mixing of data retrieval and processing

For traditional applications, the server that retrieves data from the database ends up processing it. There is no clear separation between processing data and retrieving data. Either of the two tasks can become a bottleneck for the system. If the bottleneck comes from data retrieval, data processing is under-utilized, and vice versa.

Rethinking best approaches to build scalable application

Looking at what has been adopted in our IT field recently, I hardly see them as new inventions. Rather, they are adoptions of practices that have been used successfully in real life to solve scalability issues. To illustrate this, let us imagine a real-life situation of tackling a scalability issue.


Assume that we have a small hospital. Our hospital mostly serves loyal customers. Each loyal customer has a personal doctor, who keeps track of his/her medical record. Because of this, customers only need to show their ICs to be served by their preferred doctors.

To make things challenging, our hospital operates before the internet era.

Stateless versus stateful

Does the description above look similar enough to a stateful service? Now, your hospital is getting famous and the number of customers suddenly surges. Provided that you have enough infrastructure, the obvious option is to hire more doctors and nurses. However, customers are not willing to try out new doctors. As a result, the new staff are idle while the old staff are busy.

To optimize this, you choose to change the hospital policy so that customers must keep their own medical records and the hospital will assign them to any available doctor. This new practice resolves all of your headaches and gives you the option to deploy more seasonal staff to cope with a sudden surge of clients.

Well, this policy may not make the customers happy, but in the IT field, stateless and stateful services provide identical results.

Data Duplication

Let us say the number of customers keeps surging and you start to consider opening more branches. At the same time, a new problem arises: customers constantly complain about the need to bring their medical records when visiting the hospital.

To solve this problem, you go back to the original policy of storing the medical records at the hospital. However, as you have more than one branch, each branch needs to store a copy of the customers' medical records. At the end of the day or the week, any record change needs to be synced to every branch.

Separation of Services

After running the hospital for a few months, you recognize that resource allocation is not very optimized. For example, you have blood test and X-ray facilities in both branch A and branch B. However, many customers do blood tests in branch A and many people take X-rays in branch B.

This causes customers to keep waiting in one branch while no one visits the other. To optimize resources, you shut down the under-utilized facilities and set up a single blood test centre and a single X-ray centre. Customers will be sent from the branches to the specialized centres for special services.

Adhoc Resource

It is hard to do resource planning for a hospital. There are seasonal diseases that only happen at a certain time of the year. Moreover, a catastrophe may happen at any time. Both cause a sudden surge of warded patients for a short period. To cope with this, you may want to sign an agreement with the city council to temporarily rent facilities when needed and hire more part-time staff.

Apply these ideas to build cloud application

Now, after looking at the example above, you may feel that most of the ideas make sense. It only took a short while before developers started to apply these ideas to building web applications.

Then, we move to the cloud application era.  

How to build cloud application

To build a cloud application, we need to find a way to apply the ideas mentioned above to our application. Here is my suggested approach.


If you start to think about building a cloud application, infrastructure is the first concern. If your platform does not support ad hoc resources (dynamically bursting the spec of an existing server or spawning new instances), it is very hard to build a cloud application.

At the moment, we choose AWS because it is the most mature platform in the market. We moved from internal hosting to AWS hosting one year ago due to some major benefits:
  • Multiple Locations: Our customers come from all 5 continents. Using Amazon Regions, we can deploy instances closer to the customer's location and thereby reduce the response time.
  • Monitoring & Auto Scaling: Amazon offers quite a decent monitoring service for their platform. Based on server load, it is possible to do Auto Scaling.
  • Content Delivery Network: Amazon CloudFront gives us the option to offload static content from our main deployment, which improves page load time. Similar to normal instances, static content can be served from the instances nearest to the customer.
  • Synchronized & Distributed Caching: MemCache has been our preferred caching solution over the years. However, one major concern is the lack of support for synchronization among the nodes. Amazon ElastiCache gives us the option to use MemCache without worrying about node synchronization.
  • Management API: This is one major advantage. Recently, we started to make use of the Management API to spawn instances for a short while to run integration tests.

Provided that you have selected the platform for developing your cloud application, the next step is selecting the right database for your system. The first decision you need to make is whether SQL or NoSQL is the right choice for your system. If the system is not data intensive, SQL should be fine; if the reverse is true, you should consider NoSQL.

Sometimes, multiple databases can be used together. For example, if we want to implement a Social Network application like Facebook, it is possible to store system settings or even user profiles in a SQL database. In contrast, user posts must be stored in a NoSQL database due to the huge volume of data. Moreover, we can choose SOLR to store public posts due to its strong searching capability and Mongo DB for storing user activities.

If possible, please choose a database system that supports clustering, data segregation and load balancing. If not, you may end up implementing all of these features yourself. For example, SOLR should be the better choice compared to Lucene unless we want to do our own data segregation.

Computing Intensive or Data Intensive

It is better if we know whether the system is data intensive or computing intensive. For example, a Social Network like Facebook is pretty much data intensive, while our big data analysis is both data intensive and computing intensive.

For a data intensive system, we can let any node in the cloud retrieve data and do the processing as well. For a computing intensive system, it is better to split data retrieval from data processing.

A data intensive system normally serves real-time data, while a computing intensive system runs background jobs to process data. Mixing these two heavy tasks in the same environment may end up reducing system effectiveness.

For a computing cloud, it is better to have a framework to monitor load, distribute tasks and collect results at the end of the computing process. If you do not need the processing to be real time, Hadoop is the best choice in the market. If real-time computation is required, please consider Apache Storm.

Design Pattern for Cloud Application

To build a successful Cloud Application, there are some things that we should keep in mind.

1. Stateless

It is a must to make all your services and servers stateless. If a service needs user data, include it as a parameter in the API.
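As a small illustration, compare a service that remembers per-user data between calls with one that receives everything it needs as parameters. Both classes are hypothetical and deliberately trivial; the point is only where the state lives.

```java
import java.util.HashMap;
import java.util.Map;

// Stateful: the item count lives inside this instance,
// so the same server must handle every request from a given user.
final class StatefulCartService {
    private final Map<String, Integer> itemsByUser = new HashMap<>();

    int addItem(String userId) {
        return itemsByUser.merge(userId, 1, Integer::sum);
    }
}

// Stateless: the caller passes the current count in, so any instance can serve the request.
final class StatelessCartService {
    int addItem(int currentItemCount) {
        return currentItemCount + 1;
    }
}
```

If two requests from the same user land on two different StatefulCartService instances, each instance reports a count of 1. With StatelessCartService, the second instance receives the count returned by the first and correctly reports 2.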

It is worth noting that to implement a Stateless Session on the Web Server, we have a few choices to consider:
  • Cookie based session
  • Distributed Cache session
  • Database Session
The solutions above are sorted from top to bottom by decreasing scalability but increasingly easy management.

2. Idempotent

For Cloud Applications, most API calls happen over the network rather than through internal method calls. Therefore, it is better if we can make the method calls safe to repeat. If you stick to the Stateless principle above, it is likely that the services you implement are already idempotent.
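When an operation is not naturally idempotent, one common technique (not something this article prescribes) is to attach a unique request id and ignore repeats. The sketch below is an in-memory, single-threaded illustration; PaymentLedger is a hypothetical name, and in a real cluster the set of processed ids would have to live in shared storage.

```java
import java.util.HashSet;
import java.util.Set;

final class PaymentLedger {
    private final Set<String> processedRequestIds = new HashSet<>();
    private long balanceCents;

    // Applying the same requestId twice has the same effect as applying it once.
    // Returns false when the request was already processed.
    boolean credit(String requestId, long amountCents) {
        if (!processedRequestIds.add(requestId)) {
            return false; // duplicate delivery, e.g. a client retry after a timeout
        }
        balanceCents += amountCents;
        return true;
    }

    long getBalanceCents() {
        return balanceCents;
    }
}
```

A client that times out and retries credit("r1", 500) leaves the balance at 500 rather than 1000, so retrying over an unreliable network is safe.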

3. Remote Facade

Remote Facade is different from the Facade pattern. They may look similar in practice but aim to fix different problems. As most of your API calls happen over the network, network latency contributes a great part of the response time. With the Remote Facade pattern, developers should build a coarse-grained API so that the number of calls can be reduced.

In layman's terms, it is better to go to the supermarket and buy 10 things in one trip than to visit 10 times and buy 1 thing each time.
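The supermarket analogy can be sketched with a stand-in for a remote service that counts round trips. RemoteCatalog is a hypothetical class and no real network is involved; each method call simply represents one request over the wire.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Stand-in for a remote service; every method call represents one network round trip.
final class RemoteCatalog {
    private final Map<String, Long> pricesCents = new LinkedHashMap<>();
    private int roundTrips;

    RemoteCatalog(Map<String, Long> pricesCents) {
        this.pricesCents.putAll(pricesCents);
    }

    // Fine-grained API: one round trip per item.
    Long priceOf(String item) {
        roundTrips++;
        return pricesCents.get(item);
    }

    // Coarse-grained API in the spirit of Remote Facade: one round trip for the whole list.
    Map<String, Long> pricesOf(List<String> items) {
        roundTrips++;
        Map<String, Long> result = new LinkedHashMap<>();
        for (String item : items) {
            result.put(item, pricesCents.get(item));
        }
        return result;
    }

    int getRoundTrips() {
        return roundTrips;
    }
}
```

Fetching three prices through priceOf costs three round trips, while a single pricesOf call returns the same data in one. When latency dominates the response time, that difference matters more than the extra payload size.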

4. Data Access Object

As you may transfer data around, be careful with the amount of data you transfer. It is best to send only the minimum data required.

5. Play Safe

This is not a design pattern, but you will thank yourself in the future for playing safe. Due to the nature of distributed computing, when something goes wrong, it is very difficult to find out which part is at fault. If possible, implement health checks, pings, thorough logging and a debug mode in every component of the system.


I hope this approach to building Cloud Applications can bring some benefit to everyone. If you have other opinions or experience, kindly give feedback and share with us.

In the next article, I will share the design of our Social Monitoring Tool.