Friday 28 February 2014

Welcome aboard, Lim Han and Da Peng

I am glad to announce that two new contributors are joining this blog, Lim Han and Dapeng. This ends my streak of solo posting, which lasted a little over a month. It eases my loneliness and should bring this blog to new heights, given the experience and expertise they bring.

Let me have the honour of introducing them to all of you.

Lim Han 

Lim Han is a senior Java developer with more than 10 years of experience. He earned a Computer Engineering degree from NTU. He is one of the most widely experienced developers I have met in my career. The amount of jargon that Han uses in his daily life may be greater than my entire English vocabulary. In terms of personality, he is warm, friendly and highly passionate about whatever he does. He cycles to the office, handles a huge number of projects, refuses to go home if there is any challenge at work and treats us to free coffee when he is in a good mood. He likes to share what he knows and started his personal blog a long time ago.

In terms of technical skills, Han is an explorer. He worships Elon Musk and is a hard-core Linux user. He keeps himself updated with technical news every day and knows about any concept or new technology even if it is still in incubation. Han is famous for taking every chance to introduce new ways of doing things in our daily work, and he brings us the endless joy of asking him to fix infrastructure. With a career full of experiments, not only in Java but also in infrastructure, dynamic languages, big data and more, I hope he will bring us a fresh view of what is going on in the world.

Liu Da Peng

Liu Dapeng graduated in the same batch as me at NTU, and we even worked on the same project once. This is quite surprising, given the negligible amount of time we spent in school. He has never failed to impress me with his technical expertise and his talent for problem solving and innovation. In terms of personality, he likes new ideas and is talkative and hyper-extroverted. Dapeng has a strong interest in NetBeans, Java EE 6+ and opposing whatever design I propose.

A few years back, we joined a start-up company and had fun building a system from scratch, which eventually received the best load test in the world.

http://www.networkworld.com/news/2011/112411-largest-ddos-attack-so-far-253462.html

That happened when someone worried that our system was not reliable enough and kindly hired an army of hackers to load test it. The load test peaked at 100 Gbps, with 250,000 infected computers joining the show. It happened 2 weeks after I left, so kudos to Dapeng and the team for holding the line and fighting well. Given this achievement, I can think of no one better to share with us about security and scalability.

Let us warmly welcome both of them!



Thursday 27 February 2014

The Answers to The Questions

Tony invited me to answer the multitude of questions he posed in an earlier blog post, so I humbly offer some of my personal thoughts on them.


Name 3 improvements/revolutions in Java world that you are most impressed with

In the Java world, revolutions are overstated given the amount of backward compatibility each release keeps with previous versions. The people behind the Java standards ensure that changes are always incremental and that deprecated methods survive at least 2 major revisions before being taken out. There's a reasonable amount of confidence that code compiled with Java 1.4 can still work with the latest JVM. So, if I were to name 3 improvements that I feel impacted the Java world most, the following would be up there amongst the most impactful.

1. The popularization of the concept of Inversion of Control (IoC)

While technically not just a Java concept, the popularisation of IoC by the Spring Framework provided an alternative to the J2EE framework that had been standardised by Sun Microsystems. The concept of assembling components at runtime provided a means for applications to be written in a structured and cogent manner while reducing the amount of coupling between components. Many concepts from Spring have since made their way into the official JEE framework (together with major technologies like Hibernate).

2. The introduction of generics and annotations in Java 5

If there's any revolution to the language, this probably is it. While I would not necessarily say I was impressed by it, I must say that it changed the way Java was written, and it provided quite a fair bit of joy to people who write code using annotations while simultaneously inducing tears in those who have to debug code with it.

3. Enter the Lambda

Java 8 is going to introduce the lambda, a construct that is familiar to fans of dynamic languages while many Java developers live in total oblivion to its existence. Millions of lines have been expended writing Swing GUI event listeners using anonymous inner classes like this:


        btnPrint.addActionListener(new java.awt.event.ActionListener() {
            public void actionPerformed(java.awt.event.ActionEvent evt) {
                btnPrintActionPerformed(evt);
            }
        });

It's now gonna look like this:


        btnPrint.addActionListener(evt -> btnPrintActionPerformed(evt));

Happy Days!


Name 3 ideas or technology trends that you believe will be popular in future.

I'm not much of a star gazer but judging from the current technological trends and patterns, I believe the following will become key drivers (or continue to become key drivers) in the technology space.

1. Migration of information from traditional locally based systems to cloud-based platforms

This is probably a no-brainer but I believe that the cloud-based platforms are going to continue driving technological trends in the next few years. We have cloud-based services, storage, applications, build environments, test environments, development environments etc...

If you are not familiar with virtualization and its accompanying technologies, then I believe you are falling behind the curve. I can argue that knowledge of major cloud platforms like AWS is particularly crucial for IT professionals who are part of a highly agile and evolutionary organisation.

2. Predictive Data Analytics 

This is probably the biggest buzzword of the past 2 years. So what if machines are probably not going to surpass human intelligence anytime soon à la I, Robot? They are probably gonna be analysing and mining more data than ever in the history of mankind, thanks to improvements in technology and hardware. Any business that is able to predict human behaviour will have an edge over its competitors. Thus, any technologies or techniques that aid in performing this kind of complex data analysis are going to be popular.

Think Big Data storage technologies that can store zillions of bytes of data (pick your favourite distributed file systems and NoSQL database servers).

Think technologies that can sift through those zillions of bytes of data. (Hello Hadoop!)

Think advanced algorithms in A.I. and Machine Learning.

When combined, these technologies are able to push food ads to you even before you realise you are hungry (OK, that could be exaggerating a bit).

So in order to truly capture the zeitgeist of our age, as IT professionals, it is our duty to be conversant at least in some of these principles, if not an expert in them.

3. RESTful Web Services and the Systems Integration of the World

Web systems will connect to each other in greater numbers and the language that they will speak to each other will probably be JSON. It is a language that is concise and lightweight compared to the verbose XML that was all the rage a decade ago. 

XML and its various incarnations have proven too complex and heavy to grow organically in an IT system. The mere mention of SOAP instills fear in the minds of many. Ditto for SOA systems. The mind-boggling array of implementation standards that germinated from the XML seed has become such a huge deterrent to implementation that it is unsurprising that the people who are really doing the development work, and not theorizing about standards, have gravitated towards JSON instead.

With the Representational State Transfer (REST) architecture proposed by Roy Fielding in this paper, web services are talking to each other with greater ease, since REST is built on top of a standard that is already well understood (HTTP).

The growing popularity of RESTful APIs will make the web the gigantic interconnected system that it was originally planned to be.

Name 3 major issues that you would want to elaborate about IT in Singapore and/or Asia.


1. Career progression in a software career is how fast you can get out of it.

In Singapore, a successful software engineer/developer/<put your favourite title> is someone who has managed to work his way "up" the career ladder so that he doesn't touch any development and instead focuses on "managing" people or analysing business needs. It's interesting how much management is needed for people who are intelligent enough to be developing software. Truth be told, it's highly unlikely that Singapore is going to have a Mark Zuckerberg or a Larry Page/Sergey Brin, given the general disdain for even doing the dreary work known as coding. For software engineers to excel in Singapore/Asia, they should be accorded the same recognition that good engineers receive in the States. Admittedly, there is still a demand for the routine IT work that makes the systems in corporations all over the world hum smoothly. However, there should be a distinction between software engineers who have the know-how, the agility and the capacity for rapid learning to embrace change and build strategic software, and the software maintainers in charge of ensuring corporate systems run smoothly. Think about software that will enable businesses to leapfrog ahead of their competitors vs your typical payroll/accounting/HR software. In fact, I would argue that a better use of resources would be to outsource all such software to 3rd party vendors who will make it their strategic software.

2. There is insufficient exposure to emerging technologies, and little vibrancy around them, due to the tepidness of the scene here.

In Singapore, the technology/startup scene is cooler than in many parts of the world right now. Despite the flourishing startup scene all over the world spearheaded by Internet-based enterprises, Singapore has been relatively slow in spawning companies that ride this Internet wave. There's a general perception that good software engineers are hard to find over here. There is probably some truth in it, given that most people in the software industry give up practical development work a few years into their careers to take on a "leadership/management/consulting/business analyst" role. The myth and perception is that people who remain in development are not moving in their careers. With this kind of perception, it's nearly impossible to get good people with sufficient experience and well-honed skills to develop systems. This will eventually lead to a brain drain at the higher echelons of development/technical experience, as good software engineers get siphoned into alternate roles that pay better.

More than anything, it is this mindset that will eventually set us back against other countries that have people with decades of experience in development and technical work.

3. Innovation is stifled

In a culture where deviation from the norm is scoffed at and where attempts to trail blaze are put down as fast as you can say SOP, it's hard for any discipline that requires a combination of technical prowess and creativity to thrive. I don't think there's any other technical discipline that comes as close as writing code to being an art form. Truly, code can be written beautifully. Any developer worth their salt can tell you the difference between well-written and ugly code.

The education process in Singapore/Asia also plays a part in inculcating such a mindset. Rote learning and memorization to game the education system have become more economically valuable skills than investigative and exploratory learning. We are trained to get straight As in the shortest amount of time. Spending more time to understand what we've learnt has become secondary to cutting and pasting code so that we reach our goals. No innovation can be made by merely aping the work of others.

If software professionals are to really make meaningful progress in Singapore/Asia, we have to address these issues first.

If you need to share 3 lessons to your junior developers, what would you say?


1. Keep learning and exploring

It may sound cliched but in this industry, you simply cannot stop learning. Read up on industry norms and practices. There is so much valuable reading material online that it would be inexcusable not to find new things to learn.

Read sites like InfoQ, Hacker News, Reddit, Ars Technica, etc. for trends in technology and development.

There are also tons of resources that focus on teaching programming / development skills.

2. Take a T-shaped approach to your self-development

A T-shaped developer is an expert in a specific field, e.g. Java/C#/JavaScript development (depth of skill), while maintaining general competence in a large number of fields (breadth of skills).

So a valuable member of a team is someone who can perform basic tasks like setting up operating systems and designing simple pages using CSS/HTML while having expert-level skills in Java development.

3. Attitude is paramount

I would argue that a developer with the right attitude would surpass a more intelligent developer in terms of contributions to a team. However, attitude is an abstract idea that is hard to distill into observable character qualities.

If I could somehow sieve it into something more concrete, I would consider the right attitude to consist of the following:


  1. Perseverance in the face of obstacles or discomfort to achieve a greater goal. Someone who gives up easily on the first encounter with resistance would not make a good developer.
  2. Someone who takes pride in his work and would not settle for mediocrity.
  3. Someone who would take up challenges and not avoid them.
  4. Someone who is open to better ideas and does not stick to dogma or their comfort zone.
  5. Someone who works to accomplish goals and not clock hours.

Name 3 common mistakes that developers tend to make.


1. Copy pasting code "that worked before"

This is subjective, as there are many proponents of code reuse, but I would argue that this is not an example of code reuse. Rather, an example of code reuse would be refactoring code so that a method can be shared between classes. Many times during the course of my career, I've seen people copy and paste code from the Internet and pray that it works, without truly understanding what it is trying to do. Most of the time such an approach may work, but when it doesn't, it's gonna be very trying to debug.

2. Inadequate understanding of multi-threaded concepts e.g. Thread Safety

I'm personally guilty of making this mistake, as concurrency and thread safety are tough concepts to grasp, and working at the level of ensuring that your code is safe from deadlocks is not something that I do often. However, it's important even in day-to-day web development to understand how different threads working concurrently could severely impact how your code runs if objects are shared between these threads. So avoid the mistake I made: get a thorough understanding of concurrency concepts if you are working in Java. For proponents of single-threaded processing paradigms like JavaScript or Node.js, I guess this isn't much of an issue.
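To make the shared-object problem concrete, here is a minimal sketch. The `HitCounter` and `SharedStateSketch` names are made up for illustration and do not come from any project mentioned here. Several threads update one counter, and the `synchronized` keyword is what keeps the total correct.

```java
// Hypothetical shared object, e.g. a hit counter used by many request threads.
class HitCounter {
    private int hits;

    // synchronized makes the read-modify-write of hits++ atomic across threads.
    synchronized void record() {
        hits++;
    }

    synchronized int total() {
        return hits;
    }
}

class SharedStateSketch {
    // Starts the given number of threads, each recording hitsPerThread hits
    // on the same counter, and returns the final total.
    static int recordConcurrently(int threads, int hitsPerThread) {
        HitCounter counter = new HitCounter();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < hitsPerThread; j++) {
                    counter.record();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return counter.total();
    }

    public static void main(String[] args) {
        System.out.println(recordConcurrently(4, 10000));
    }
}
```

If `synchronized` were removed from `record()`, the total could come out lower than expected on some runs, because `hits++` is really a read, an add and a write that two threads can interleave.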

3. Applying the wrong database solution to a problem.

Relational databases have been around for years. People are definitely more comfortable writing SQL queries than tinkering with non-relational databases. However, as I mentioned in an earlier response on technology trends, the era of Big Data is upon us. Unfortunately, I've yet to be convinced that relational databases can function reliably and speedily with large database sizes. My team is currently working with datasets that are not even in the terabyte range, and we are experiencing timeout issues with MySQL when we perform queries (and that is after some degree of optimisation work). Thus, I'm convinced that out-of-the-box MySQL definitely would not work, and even with a powerful database server that has been tuned to death, I'm not sure that fitting a terabyte of data would work in practice, even if the filesystem/table primary key size supports tables of that size. Thus, it's important to size up the use case for each problem that we have and select the most suitable database to address it. Martin Fowler's book seems to refer to such a philosophy with the concept of Polyglot Persistence. I have not read this book yet but I'm definitely gonna be picking it up.

If you need to define your programming/design style in 3 points, what would you say?

I would say my development style is to be as concise as possible while maintaining a certain degree of flexibility. This might sound like an oxymoron, given that flexibility is usually provided by increasing the amount of code, but I believe that with simple rules of thumb, e.g. coding to interfaces (instead of concrete classes) and then relying on IoC or Dependency Injection for runtime assembly, you can create flexible code without the excess baggage.
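As a rough illustration of coding to interfaces with constructor injection, consider the sketch below. All the type names (`MessageSender`, `EmailSender`, `SmsSender`, `NotificationService`, `WiringSketch`) are hypothetical, and the wiring is done by hand where a container such as Spring would normally do it.

```java
// Hypothetical names throughout; none of these types come from a real project.
interface MessageSender {
    String send(String text);
}

class EmailSender implements MessageSender {
    public String send(String text) {
        return "EMAIL: " + text;
    }
}

class SmsSender implements MessageSender {
    public String send(String text) {
        return "SMS: " + text;
    }
}

// Depends only on the interface; the concrete sender is injected from outside.
class NotificationService {
    private final MessageSender sender;

    NotificationService(MessageSender sender) {
        this.sender = sender;
    }

    String notifyUser(String text) {
        return sender.send(text);
    }
}

class WiringSketch {
    public static void main(String[] args) {
        // An IoC container would normally perform this assembly at runtime.
        NotificationService service = new NotificationService(new SmsSender());
        System.out.println(service.notifyUser("build finished"));
    }
}
```

Swapping `SmsSender` for `EmailSender` changes behaviour without touching `NotificationService`, which is the flexibility being described, at the cost of one small interface.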

I spend equal amounts of time developing in dynamic languages (Python, Ruby, JavaScript) and in Java. I can't say I am partial to either, as they all try to achieve the same goal, which is to provide a way to express code that forms reliable, useful applications. Java provides a stable, reliable way to navigate code within the IDE and, with sufficient experience, one can be very productive when working with Java. However, the number of concepts involved in mastering Java could be mind-boggling for a newbie.

On the other hand, dynamic languages like Python and Ruby can easily be picked up, and they have a very natural-language style of coding which expresses succinctly what Java might take 3 times more code to express.

I'm also a believer in using the right language for the right problems. For example, right now I'm also looking at using a little R in my quest to master machine learning concepts. For interaction with hardware or low level devices, I think it's hard to avoid using C. I'm not as experienced in these fields but if I need to solve a problem in these domains, I will not hesitate to polish up my rusty C programming skills.

Wednesday 26 February 2014

Self Interview

As many of us have been in the industry for quite a long time, I believe each individual has an interesting story to tell. I have shared my thoughts in conversation, and so have my colleagues. However, wouldn't it be more useful if we wrote down our stories and let the ideas spread and be discussed? It would benefit others and benefit us as well.

To begin, I will come up with some questions and volunteer myself to be the first to answer. Hold your breath, my friends, these questions are coming to you soon!

1. Name 3 improvements/revolutions in Java world that you are most impressed with.

2. Name 3 ideas or technology trends that you believe will be popular in future.

3. Name 3 major issues that you would want to elaborate about IT in Singapore and/or Asia.

4. If you need to share 3 lessons to your junior developers, what would you say?

5. Name 3 common mistakes that developers tend to make.

6. If you need to define your programming/design style in 3 points, what would you say?

Now, let me take my turn!

1. Name 3 improvements/revolutions in Java world that you are most impressed with.

My first candidate is the resurrection of Reflection. If you are one of the guys from the early days of J2EE who has lasted until now, you must have gone through a major self-upgrade plus a rethinking of practices and designs. The ways we developed enterprise applications in the past and now are simply too different. The differences do not come only from syntax changes; they also come from changes of concepts and philosophy. There was a time when I revisited a book of design patterns for J2EE and found that hardly any of the once-popular design patterns had managed to stand the test of time. I would say the modern design concepts (IoC, AOP, POJO, ...) would never have become real without reflection.
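To show why reflection matters for these concepts, here is a small sketch of the kind of wiring an IoC container does behind the scenes. `GreetingService` and `ReflectionSketch` are hypothetical names, and a real container does far more (scanning, scopes, proxies). The only point is that a plain POJO with no setter can still be created and populated from the outside.

```java
import java.lang.reflect.Field;

// Hypothetical POJO: no framework interfaces, no setter for the field.
class GreetingService {
    private String greeting;

    String greet(String name) {
        return greeting + ", " + name;
    }
}

class ReflectionSketch {
    // Creates an instance through its no-argument constructor and sets a
    // private field by name, roughly what a container does when wiring a POJO.
    static <T> T createAndInject(Class<T> type, String fieldName, Object value) {
        try {
            T instance = type.getDeclaredConstructor().newInstance();
            Field field = type.getDeclaredField(fieldName);
            field.setAccessible(true);
            field.set(instance, value);
            return instance;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not wire " + type.getName(), e);
        }
    }

    public static void main(String[] args) {
        GreetingService service = createAndInject(GreetingService.class, "greeting", "Hello");
        System.out.println(service.greet("Han"));
    }
}
```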

The second candidate I would like to mention is Lean Software Development. Originally, I planned to put Agile Software Development here but later decided to switch to Lean Software Development (LSD). Agile is cool but LSD is even more awesome; it is the cure for the mess we created earlier in the Java world. The early days of Java witnessed bloated frameworks plus the factory style of Java development. Some smart people may already have known that this way would not work, but being able to convince the whole world to walk away from Waterfall is impressive. I still got the chance to experience the Waterfall style at the beginning of my career, and I am happy to confirm that LSD is a much more natural and effective way of working. Developers have a better chance to voice their opinions and contribute toward the final goal. I would not take any chance to go back to the old-style working environment.

The last candidate that I would like to mention is Asynchronous Programming. I would like to include AJAX as well because it is more or less the same concept. Asynchronous programming allows us to do work concurrently and frees users from waiting for background processes. This idea shaped most of the improvements on the client side and also helps to boost performance on the server side. I bet that it will be much more popular in the future because of multi-core CPUs.
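On the server side, the idea can be sketched with `CompletableFuture`, which arrives with Java 8. The `AsyncSketch` class and its `slowSquare` workload are made up for illustration. Two slow computations are started together and their results are combined once both are done, instead of running one after the other.

```java
import java.util.concurrent.CompletableFuture;

class AsyncSketch {
    // Stand-in for a slow background task (hypothetical workload).
    static int slowSquare(int n) {
        try {
            Thread.sleep(100);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return n * n;
    }

    // Runs both computations concurrently and adds the results together.
    static int sumOfSquaresAsync(int a, int b) {
        CompletableFuture<Integer> first = CompletableFuture.supplyAsync(() -> slowSquare(a));
        CompletableFuture<Integer> second = CompletableFuture.supplyAsync(() -> slowSquare(b));
        // join() blocks only here, at the point where the combined result is needed.
        return first.thenCombine(second, Integer::sum).join();
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquaresAsync(3, 4));
    }
}
```

When enough worker threads are available, the caller waits for roughly one slow task instead of two, which is the same benefit AJAX gives a user in the browser.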

2. Name 3 ideas or technology trends that you believe will be popular in future.

My first bet would be Data Analytics. You do not need to be a developer to know that Google gives everything away for free but steals your information. It is quite obvious that information is money. We, the ones who build the systems, get first-hand access to this money and let it slip through our fingers almost all of the time. So, this needs to change soon. The margin for software development has been brutally compressed in the last few years, thanks to competition from China and India. The greener pasture is the combination of data analytics with software development. With this model, developers do not only offer applications; they also help customers understand more about consumer behaviour and provide data to aid business decisions.

My second bet is Artificial Intelligence. The same thinking applies as above. If you develop dumb applications like everyone else, then whoever offers the lowest price will get the contract. This price war will sooner or later put pressure on developers' working lives and make work-life balance seem more like a fantasy. The only viable exit strategy is to stay in front and deliver what the rest of the world does not. I find AI to be one of those things. Again, you may notice that Google is already on the front line of this. Google Prediction API is solid, and Google gives you ads that somehow relate to your online activity. Certainly, there is no one sitting there monitoring you, but from the data collected, Google's applications tweak themselves to fit you better. Sooner or later, people will adopt this idea and you will find yourself being spied on all the time.

The last bet is the popularity of UX (User Experience) Design. I am impressed with the UX designer who works with me. The project meetings are in Asia and Europe, but they fly her there to join them. The way UX designers analyse consumers, create personas and manage to produce a design that fits each persona well is amazing. It makes what I was taught in my Information Architecture course seem poor. Contrary to my original thoughts, it is not that bad to have duplication of data in your interface. Consumers don't care, so developers should not care either. However, every mouse click saved for potential users does count and increases the chance that users come back to your site.

3. Name 3 major issues that you would want to elaborate about IT in Singapore and/or Asia.

What upsets me about the IT industry in Singapore/Asia is that our mindset is locked. A software house is still treated as a factory that produces code instead of TVs or cars. Developers are white-collar workers who are supposed to follow instructions and produce products. Agile has taken root, but it grows here much more slowly than in other parts of the world. It is proven that the old style of coding does not work. What have we learnt from Facebook, Twitter, WhatsApp, Google, ...? In this industry, the winners are the innovators, not the hard-working people. Innovators work hard, but working hard does not produce innovation. OT (overtime) does more harm than good in the long run, as it sucks the happiness out of developers, and unhappy people are less innovative.

Another concern is that the way we think about management here is different from what I saw in the Western world. From what I have seen, they treat management as a profession, a kind of boring task that someone needs to take on so that the developers can focus on innovating. It has nothing to do with hierarchy or the corporate ladder. It may be surprising at first, but it seems more natural this way. Project management is a combination of resource planning and organizing meetings, and none of it needs a hierarchy to work. Giving a manager more than what he needs to do the task is redundant and less efficient. The better way is the self-managing team, as developers are educated and goal-oriented people.

My last concern is that the standard of developers in the market is too low. I get the feeling that anyone can pick up a book like "Learning Java in 7 days" and step into the labour market. This makes interviewing very tedious and training new staff hard work.

4. If you need to share 3 lessons to your junior developers, what would you say?

So, to every person who chooses Java as a career, I would emphasize understanding the fundamentals. To be honest, this is not what I observe people like to do, but that only makes it more important. Following instructions only makes you a follower, but understanding the fundamentals gives you the confidence to make changes when needed, and this is what we really lack here in Asia.

Another piece of advice I would give is to take any chance to be creative in your daily work. If you do not have fun while working, office hours are long and tiring. Unless you are a superhero who devotes his life to making business better, finding a balance is crucial in the long term. And remember what I said earlier: innovation makes people happy and happy people innovate better.

The last thing I would say is that presentation skills pay off. Do not think that communicating with computers is enough. Even if you have good ideas, if you cannot make other people feel the greatness of your idea, it may never get a chance to be tried. Moreover, an effective team requires effective communication. I do pairing every day and I would prefer to pair with extroverts over introverts.

5. Name 3 common mistakes that developers tend to make.

My first vote goes to thread safety. This is one of the things highlighted in any Java course, but I keep seeing people make this mistake. Maybe no one presented it to them as simply as "StringBuffer is thread-safe and StringBuilder is not".
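The StringBuffer versus StringBuilder rule can be checked with a short sketch like the one below (`StringBufferSketch` is a made-up name). Several threads append to one shared `StringBuffer`, whose methods are synchronized, so no append is lost.

```java
class StringBufferSketch {
    // Appends one character at a time from several threads into a single
    // shared StringBuffer and returns the final length.
    static int lengthAfterConcurrentAppends(int threads, int appendsPerThread) {
        StringBuffer shared = new StringBuffer();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < appendsPerThread; j++) {
                    shared.append('x');
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return shared.length();
    }

    public static void main(String[] args) {
        System.out.println(lengthAfterConcurrentAppends(4, 10000));
    }
}
```

Replacing `StringBuffer` with `StringBuilder` in this sketch gives no such guarantee: the final length may come up short, or an append may throw, depending on how the threads interleave. For a buffer that never leaves one thread, `StringBuilder` remains the cheaper choice.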

The next vote is for scalability. I often see people writing methods like getAll() instead of get(from, to). The problem is that you do not know how much data you will have in the DB in the future. Therefore, this kind of code is like a time bomb waiting to explode.

The last mistake I would vote for is the overuse of Ajax. Ajax does shine, but only when you use it in a proper amount. Pushing Ajax to the maximum should not be the way to go. I would recommend the resurrection of the Facade pattern in client-server communication. The code is a bit harder to write, but if you look at the debug console in your browser, the effort pays off.
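Here is a minimal sketch of the difference, using an in-memory list as a stand-in for the database. `CustomerRepositorySketch` and its methods are hypothetical; a real DAO would push the range down into the query itself instead of slicing a list.

```java
import java.util.ArrayList;
import java.util.List;

class CustomerRepositorySketch {
    // Stand-in for a database table.
    private final List<String> rows = new ArrayList<>();

    void add(String row) {
        rows.add(row);
    }

    // Unbounded: the result grows with the table, so memory use and
    // response time grow with it.
    List<String> getAll() {
        return new ArrayList<>(rows);
    }

    // Bounded: returns rows in the range [from, to), clamped to the data
    // that actually exists, so the caller controls the maximum result size.
    List<String> get(int from, int to) {
        int start = Math.max(0, Math.min(from, rows.size()));
        int end = Math.max(start, Math.min(to, rows.size()));
        return new ArrayList<>(rows.subList(start, end));
    }
}
```

A caller that pages through `get(0, 50)`, `get(50, 100)` and so on keeps each result small no matter how large the table becomes.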


6. If you need to define your programming/design style in 3 points, what would you say?



Look at the picture above; this is my illustration of DDD (Domain-Driven Design). There was a time when I was tasked by my boss to build a system that could be extended to serve mobile, web or any other protocol in the world. To do that, I built the core with POJOs and nothing else. I would call it my PRECIOUS. I would let the MVC framework, or whatever I add on top of this core, do the translation so that the world can communicate with my Precious.

Another thing is that I like to organize packages by layer rather than by business logic. Simply speaking, services sit next to services and DAOs sit next to DAOs. This goes against one of the guidelines I read, that all classes in one package need to be related. However, organizing packages this way makes my AOP simpler to write.

People complain that I like to rewrite code. I would defend myself by saying that I like to fix logic bugs. I count inefficient or unnatural code as a logic bug. Dealing with logic bugs reduces my efficiency and puts me in a bad mood, so I always try to eliminate them whenever I see them.

I like statically typed languages and like to make the fullest use of the IDE. People say dynamic languages are cool and help to shorten the learning curve. I personally do not need this. I have already gone through 10 years of coding and have plenty of samples to clone. I do not need more time than a RoR developer to start a new webapp. Moreover, most of my applications are in the heavy weight-lifting category, and this is where statically typed languages shine.

Last words: I hate long code. If I have time, I will try my best to shorten it.

Saturday 22 February 2014

Aspect Oriented Programming

To clarify, I am a big fan of Aspect Oriented Programming (AOP). Interestingly, AOP was invented by folks at Xerox PARC, whom I am working with now. As this programming paradigm is successful and widely adopted in the industry, all programmers should have a good understanding of it.

What is Aspect Oriented Programming?

If you look at Wikipedia, they say

"aspect-oriented programming (AOP) is a programming paradigm that aims to increase modularity by allowing the separation of cross-cutting concerns"

I am not very sure whether you understand what they are saying. For me, after 6 or 7 years of using AOP, I still have a hard time figuring out what this definition is trying to describe. Technically, it is perfectly accurate, but it may not give you a clue about what AOP is and why you should use it. Let me try to explain it in my own way.

When you write code, there is always business logic and some boilerplate code. Mixing boilerplate with business logic is something you do not want, because it makes the code difficult to read and leaves developers less focused on the business logic. AOP attempts to solve this issue with an innovative way of splitting boilerplate code out of the business logic.

There are two characteristics that boilerplate code needs to satisfy if it is to be removed from business logic:
  • It is generic enough to be commonly executed for various objects.
  • It must happen before and/or after the business logic.

When the first characteristic is satisfied, you can remove the boilerplate code and put it somewhere else without any harm to the functionality. When the second characteristic is satisfied, you can define a point-cut, or crossing point, on any objects that are supposed to run this boilerplate code. Then, congratulations, the rest of the work is done by the framework. It will automatically execute the boilerplate code before and/or after your business logic.
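To give a feel for what "the framework does the rest" could look like, here is a rough sketch built on the JDK's dynamic proxies. It is only an approximation of what an AOP framework does, not how any particular framework is implemented, and `OrderService`, `SimpleOrderService` and `AroundAdviceSketch` are hypothetical names.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Business interface and implementation, free of any boilerplate.
interface OrderService {
    String placeOrder(String item);
}

class SimpleOrderService implements OrderService {
    public String placeOrder(String item) {
        return "ordered " + item;
    }
}

class AroundAdviceSketch {
    // Records what the boilerplate did, so the effect is easy to inspect.
    static final List<String> events = new ArrayList<>();

    // Wraps any object behind one of its interfaces so that the same
    // boilerplate runs before and after every method call.
    static <T> T withLogging(T target, Class<T> type) {
        InvocationHandler handler = (proxy, method, args) -> {
            events.add("before " + method.getName());
            try {
                return method.invoke(target, args);
            } finally {
                events.add("after " + method.getName());
            }
        };
        Object wrapped = Proxy.newProxyInstance(type.getClassLoader(), new Class<?>[] {type}, handler);
        return type.cast(wrapped);
    }

    public static void main(String[] args) {
        OrderService service = withLogging(new SimpleOrderService(), OrderService.class);
        System.out.println(service.placeOrder("book"));
        System.out.println(events);
    }
}
```

The business class knows nothing about the logging, and the same `withLogging` call works for any interface, which matches the two characteristics above: the boilerplate is generic, and it runs strictly before and after the business logic.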

Why is AOP cool?

AOP is cool because it makes your project cleaner in many ways:
  • You do not mix boilerplate code with business logic. This makes your code easier to read.
  • You save yourself from typing the same code again and again.
  • The code base is smaller and less repetitive.
  • Your code is more manageable. You have the option of adding a behaviour to all classes/methods in your project in one shot.
To let you feel the power of AOP, imagine what would happen if you had this magic in real life. Let's say that with one command, you could make every citizen of Singapore donate 10% of their income to charity. This sounds much faster and less of a hassle than asking every individual to do it. This example shows that AOP works best when you have lots of objects in your system that share the same feature.

Fortunately, there are lots of common features like that in real applications. For example, here are some things that you will often need to implement:
  • Log the execution time of long-running methods.
  • Check permissions before executing a method.
  • Start a transaction before a method and close the transaction after the method completes.
How to implement AOP?

If you want to share boilerplate code, you can implement it the old-fashioned way. For example,

import java.util.logging.Logger;

public abstract class LoggableWork {
 
 public void doSomething(){
  long startTime = System.currentTimeMillis();
  reallyDoWork();
  long endTime = System.currentTimeMillis();
  Logger.getLogger("executableTime").info("Execution time is " + (endTime-startTime) + " ms");
 }

 protected abstract void reallyDoWork(); 
}

Then any class can extend the abstract class above and have its execution time logged. However, the world has long abandoned this approach because it is neither clean nor extensible. If you later figure out that you need transactions and security checks, you may need to create TransactionWork and SecuredWork. However, Java does not allow one class to extend more than one parent, so you are stuck with your awkward design. There are two more problems with the approach mentioned above. First, it forces developers to think about what boilerplate code they need before writing business logic. This is not natural and sometimes not predictable. Second, inheritance is not a good way of adding boilerplate code. Logging execution time is not part of the business logic, and you should not bend your business logic around whatever cool stuff you want to add to your project.

So, what is the right way of doing AOP? Focus on your business logic and create your classes/methods without worrying about logging, transactions, security or anything else. Now, let's add execution-time logging to every method in the system:

import java.util.logging.Logger;

import org.aspectj.lang.ProceedingJoinPoint;
import org.aspectj.lang.annotation.Around;
import org.aspectj.lang.annotation.Aspect;

@Aspect
public class LogAspect {

  // Around advice: wraps the execution of every public method
  @Around("execution(public * *(..))")
  public Object traceAdvice(ProceedingJoinPoint jp) throws Throwable {

    Object result;
    long startTime = System.currentTimeMillis();

    try {
      // Let the intercepted method run as usual
      result = jp.proceed();
    } finally {
      long endTime = System.currentTimeMillis();
      Logger.getLogger("executableTime").info("Execution time is " + (endTime-startTime) + " ms");
    }

    return result;
  }
}

What you have just created is called an interceptor (an around advice, in AspectJ terms). The pointcut expression inside the @Around annotation defines the places where you want your boilerplate code to be inserted. This line tells the framework to let the method execute as usual:

result = jp.proceed();

Then, you mark the time before and after the real method is executed and log the result. Similarly, we can create interceptors for transactions and security checks as well. The terms you see above (pointcut, aspect) are taken from AspectJ. One aspect represents one behaviour you want to add to your project. That is why the approach of writing business logic first and adding aspects later is called Aspect Oriented Programming.

There are thousands of ways to create a pointcut, and the above example is just one of them. You can create pointcuts based on package, annotation, class name, method name, parameter type, or a combination of them. Here is a good source for studying the AspectJ language:

https://eclipse.org/aspectj/doc/next/progguide/language.html

Want to know more about the underlying implementation of AOP?

If you just want to use AOP, the parts above are sufficient, but if you really want to understand AOP, you had better go through this part. Proxy-based AOP, the kind described here, is implemented using reflection. In the early days of Java, reflection was not recommended at runtime because of performance issues. However, as the performance of the JVM improved, reflection became popular and turned into the base technology for building many frameworks. When you use any framework that is based on reflection (almost all frameworks on the market), the Java object is no longer WYSIWYG (what you see is what you get).

The object created in the JVM still respects the contract of the class or interface that it belongs to, but it has more hidden features than what you see.


In our case, the object that executes the method doSomething() is a proxy of the contract class. The framework constructs the bean and creates a proxy to wrap around it. Therefore, any time you call doSomething() on the bean, the proxy code doBefore() and doAfter() is executed as well. The proxy can even choose to bypass the invocation of doSomething() on the inner bean (for example, if the caller fails a permission check).

It is easy to see that this only works if you let the framework create the bean for you rather than creating the bean yourself. I have encountered developers asking me why this code does not log:

Bean bean = new Bean();
bean.doSomething();

Obviously, the developer created the bean himself rather than letting the framework do it. In this case, the bean is just an ordinary Java object and not a proxy. Hence, it is out of the framework's control and no aspect can be applied to it.
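
The difference between a hand-made bean and a proxied bean can be sketched with the JDK's own java.lang.reflect.Proxy. This is only a minimal illustration under my own assumptions, not how any particular framework is implemented. The names ProxyDemo, Work, Bean and wrap are hypothetical and exist only for this example.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

public class ProxyDemo {

    // The contract that both the real bean and its proxy respect (hypothetical)
    public interface Work {
        void doSomething();
    }

    // Plain business logic, unaware of any logging (hypothetical)
    public static class Bean implements Work {
        @Override
        public void doSomething() {
            // business logic only
        }
    }

    // Roughly what a framework does on your behalf: wrap the bean in a proxy
    // that runs extra code before and after the real method.
    static Work wrap(Work target, List<String> log) {
        InvocationHandler handler = (proxy, method, args) -> {
            long startTime = System.currentTimeMillis();      // the "doBefore" part
            try {
                return method.invoke(target, args);            // the real method
            } catch (InvocationTargetException e) {
                throw e.getCause();                            // rethrow the bean's own exception
            } finally {
                long endTime = System.currentTimeMillis();    // the "doAfter" part
                log.add(method.getName() + " took " + (endTime - startTime) + " ms");
            }
        };
        return (Work) Proxy.newProxyInstance(
                Work.class.getClassLoader(), new Class<?>[] {Work.class}, handler);
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();

        Work direct = new Bean();              // created by hand: no proxy, nothing is logged
        direct.doSomething();
        System.out.println("after direct call: " + log.size() + " log entries");

        Work managed = wrap(new Bean(), log);  // created "by the framework": calls go through the proxy
        managed.doSomething();
        System.out.println("after proxied call: " + log.size() + " log entries");
    }
}
```

Both objects satisfy the same Work contract, but only the call that goes through the proxy leaves a log entry.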







Thursday 20 February 2014

A brief history of Java EE

Java was introduced in 1995. It was supposed to be an object-oriented, cross-platform language that simplified development in the World Wide Web era. As many of us joined the industry long after the birth of Java, I think it will be useful to review the progress of Java over the years.

Promise of Java and popularity of Applet

Before Java, programming languages were not cross-platform. All applications (in the context of that era, desktop applications) were developed for a specific OS platform. Windows was in its early days and most enterprise applications were built for Unix. Cross-platform software was not a real concern until the World Wide Web became more and more popular.

Developers who wanted to provide a richer experience on their websites needed to figure out ways to make web pages more interactive. The code was supposed to run on the client machine rather than on the server, and developers had no idea which OS or platform the client was using. Java applets provided a perfect solution to this problem. Leaving aside other benefits of Java such as garbage collection and full object orientation, being cross-platform made developing client-side applications much easier and cheaper. Lack of portability was the major limitation of the popular languages of that time (C++, VB, Delphi), which meant that they could not compete with Java on portable devices or on the client.

Slowly, everything turned to applets: static pages became animated and content was dynamically generated. The protocol for applets to communicate with the server was RMI. The popular architecture was to have multiple applets, running in different browsers, communicating with a single server, which might include or connect to a database.



However, as the browser wars escalated and security became a major concern, applets suffered heavily from diverging browser standards, the sandbox and firewalls. If too much code is needed to handle specific browsers, it is pointless to choose Java as the tool for developing client applications. As an applet is fully functional software that runs on the client OS, it is quite easy for a hacker to do harm to the client computer. To prevent this threat, the sandbox was introduced, so that many Java API calls (for example, accessing the file system) throw a SecurityException when used in an applet. Moreover, applet communication made use of RMI, which operates on ports other than the HTTP port and was therefore often blocked by client firewalls.

Another limitation of applets was the singleton nature of RMI. RMI binds the service to a specific port and uses a single UnicastRemoteObject to serve all clients. To make the situation worse, the Java Transaction API was not created until 1998. Without transaction isolation, the only way to make database actions thread-safe was to serialize access, which is extremely punishing in terms of performance.

J2EE

The next era of Java saw processing shift from the client back to the server. As building logic on the client was harder and RMI was pretty limited in terms of multi-threading, developers chose to build complicated logic on the server side. The introduction of the Servlet API made the task simpler. The content format is simply dynamically generated HTML, served over HTTP, which is firewall friendly. A servlet is multi-threaded but not thread-safe: a single instance serves multiple concurrent requests.

As Java matured, many other APIs were created around this time to address various concerns of building Java enterprise applications:

Java Server Pages
Enterprise Java Bean
JDBC
JMS
JNDI
Java Transaction API
Java Mail

All vendors attempted to lure customers with commercial APIs that helped to build enterprise applications. That posed a threat to the growth of the Java ecosystem. To bring Java forward, industry leaders discussed and introduced Java 2 Enterprise Edition (J2EE) in late 1999. The idea was to create a common standard that would make enterprise applications portable. J2EE was widely adopted, with many vendors providing commercial application servers that followed the standard.

However, many mistakes were made in creating J2EE, and they were soon discovered. J2EE was too ambiguous; it left out many aspects of database handling, such as connection pooling, transaction isolation and locking mechanisms. That left vendors with no choice but to build their own standards on top of J2EE. This made the "write once, run anywhere" idea less and less practical.

Moreover, the choice of specifications to include in J2EE was questionable. EJB was criticized for being bloated. The deployment descriptor was long and cumbersome. The idea of providing home and remote interfaces made the specification repetitive, as developers were likely to declare the same methods in both interfaces. The idea of letting the bean handle its own persistence made the class heavy, mixing business logic with persistence logic. It is very unfortunate that before J2EE there was already a Java technology named Java Blend that offered a clean implementation of ORM and operated on POJOs. However, EJB was chosen over Java Blend and developers needed to write a lot more code to do simple tasks.

Still, this was the period when Java became more and more popular. The introduction of .NET in late 2000 provided a solid alternative to J2EE for building enterprise applications, but Java still dominated the banking and finance industry.

A new beginning

One area in which Java outperforms .NET is its vibrant community. It is famous for challenging the norm and providing alternative solutions rather than simply adopting it. Many awesome products originated from the Java community to fix or challenge the standard solutions dictated by Sun. Let's take a look at some of them:

Log4j is globally accepted as a replacement for the Java Logging API
Ant offers a solid automation tool for Java projects
JodaTime offers a cleaner and more intuitive API than the Java Date Time API
Hibernate provides an alternative approach to persistence that operates on POJOs, pretty similar to Java Blend
Spring framework offers a cleaner approach than EJB and operates on POJOs
and many other products, ...

Tired of writing big chunks of code against cumbersome APIs, Java developers showed creativity in designing cleaner and simpler solutions that were, most of the time, easier to use and faster to run. The work contributed by the community was so impressive that it soon made J2EE obsolete. Sun slowly adapted to the situation and let the community drive the development of Java. Many of these works and ideas were gradually absorbed into later versions of Java EE.

In this era, Java saw a strong trend towards simplifying development and separating framework code from business logic. The POJO, the class you have written since the first day of Java, was given a new name: Plain Old Java Object. Inheritance became the villain because it pulls framework-related code into business logic. So how do we guide the framework to do all the hard work for us without inheriting from some troublesome abstract class? It is all done with XML and the Java Reflection API.

The Java Reflection API had been introduced much earlier, but it was not advisable to use it because of performance issues. However, JVM performance improved so much in those years that reflection finally became a viable option. Suddenly, a whole range of possible solutions opened up before designers' eyes. Dependency injection became the norm for constructing beans. Persisted objects could also be built using reflection. Also since Spring, it is recommended to expose only getters for values that you do not want clients to change. That makes the object immutable, which is safer and cleaner to pass around.

Another notable trend is the rise of AOP. There is nothing I like more than AOP when developing complicated systems. AOP separates the concerns and lets developers focus on business logic while the framework owner focuses on other concerns such as security, transactions or logging. Interceptors make upgrading or adding new features to an existing system cheaper, and the code is easier to read.

Exception handling was passed over to the framework rather than being done in business logic code. In the early days of Java, we needed to create and catch lots of exceptions. The code to handle exceptions disrupted the flow of the logic. Moreover, without the simple improvements from Java 7, the exception handling code was repetitive and troublesome to write. What changed this was the introduction of exception handlers in frameworks. Instead of catching checked exceptions, developers can let runtime exceptions flow all the way up to an exception handler, where they are processed in one common place.

Declarative programming was previously unknown in Java. If there was some work to be done, developers needed to write code to make it happen. However, thanks to annotations, it is much easier to instruct the framework to do the work for you rather than doing it yourself. This idea is not new and has long been wanted, but annotations are definitely a better place than Javadoc to put your instructions.
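
To make the idea of instructing a framework through annotations more concrete, here is a minimal sketch that uses only the JDK. The annotation @RunAtStartup, the class ReportService and the method runStartupMethods are all hypothetical names invented for this illustration; real frameworks are far more elaborate, but the reflection calls are the standard ones.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

public class AnnotationDemo {

    // Hypothetical instruction to the "framework": run this method at start-up
    @Retention(RetentionPolicy.RUNTIME)   // keep the annotation visible to reflection
    @Target(ElementType.METHOD)
    public @interface RunAtStartup {
    }

    // Business class: it only declares what it wants, it does not call anything itself
    public static class ReportService {
        final List<String> calls = new ArrayList<>();

        @RunAtStartup
        public void warmUpCache() {
            calls.add("warmUpCache");
        }

        public void generateReport() {
            calls.add("generateReport");
        }
    }

    // The "framework" side: find annotated methods with reflection and invoke them
    static int runStartupMethods(Object bean) {
        int invoked = 0;
        for (Method method : bean.getClass().getMethods()) {
            if (method.isAnnotationPresent(RunAtStartup.class)) {
                try {
                    method.invoke(bean);
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException("Could not run " + method.getName(), e);
                }
                invoked++;
            }
        }
        return invoked;
    }

    public static void main(String[] args) {
        ReportService service = new ReportService();
        int invoked = runStartupMethods(service);
        System.out.println(invoked + " start-up method(s) invoked: " + service.calls);
    }
}
```

The business class never calls warmUpCache() itself; the annotation is the whole instruction.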

Java is also adopting ideas from other languages. Convention over configuration was originally popularized by Ruby on Rails and later adopted by Java. Compared to my early days of configuring Spring 1.x, there is much less to configure with Spring 3.x. However, easier does not mean that you do not need to learn the fundamental concepts. I think developers will benefit from going through the API and learning about the things they do not normally need to write, to get a better understanding of the system.

Other than the above, there are some minor trends that I can highlight here. The preference for stateless sessions over stateful sessions emerged not too long ago. It makes scaling and deploying applications on a cluster more natural. REST APIs are now supported by Java EE and almost all other frameworks. With RESTful APIs, it is still convention over configuration: developers depend on common standards to write clients rather than on the Remote interfaces of the early days. The introduction of GWT and Android continues to push the border of Java forward and gives Java developers more advantages in the job market. On the other side, the JVM gives other languages an option of portability by compiling to bytecode. This also helps to boost their performance, as the JVM is very mature at this point in time.

Moving forward

At the time of writing, it makes little difference which framework and server you choose to build your application. The Java community has gone through a storm of ideas and reached a common consensus on how Java should be developed. Still, the path ahead is wide open. The growth of social networks sets new requirements for the scalability and performance of applications. As usual, Java is not lagging behind in pursuing new goals.

I think we will very soon see NoSQL and big data become the norm for new Java applications. I also observe the trend of bringing AI and data analytics to daily activities. Google Analytics has become a must for most of the products that I have worked on over the last few years. It is even more interesting that most of them have some kind of AI integrated to help users with data mining and even decision making.

I also want to see GWT getting more popular. It is not simply because I belong to the statically typed language camp, but also because GWT is improving quite fast. I have seen some sophisticated websites built on GWT, and it is really difficult to tell that from their look and feel.

I also welcome the introduction of lambdas in Java 8. Writing anonymous classes has long been a pain. I hope we will get away from it when Java 8 arrives. Java is supposed to provide better support for the era of multi-core CPUs with the Fork-Join framework, but I do not find its API intuitive enough. That seems to be a problem to be fixed in the future. I still get irritated by the pseudo-generics introduced in Java 5. I think it would be better to implement full generics or simply take them away.

The Java API has been criticized for being too verbose. While I appreciate how clear and easy to read Java code is, I hope Java adopts some more APIs from leaner stacks like Rails to shorten the code in some places.

If all wishes come true, then long live Java.


Sunday 16 February 2014

Comet and Web Socket

Java EE 6 introduced many improvements for Ajax processing. The new API allows the server to serve long Ajax requests without exhausting the HTTP thread pool. This major improvement makes waiting on an Ajax request less expensive in terms of server resources and makes server push (Comet) more feasible.

In this article, I would like to revisit the strategies for implementing server push over HTTP and compare them with the Java API for WebSocket 1.0.

BACKGROUND
HTTP was created to be a protocol for simple, one-way communication. The request must be initiated by the web client and the server provides the response to this request. This worked well in the early days, when the main goal of a website was to provide content. However, as the web evolved into Web 2.0 with lots of interaction, developers soon found the need to initiate requests from the server to the web client. This feature is called server push, reverse Ajax or Comet.

Generally, there are two strategies to solve this problem: hacking HTTP to implement Comet, or stepping back further and introducing another protocol that is full-duplex. The new protocol is named WebSocket, which is supported by Java from Java EE 7 (Java API for WebSocket 1.0).

SERVER PUSH METHODS

Regular Polling

In regular polling, the client regularly sends requests to the server to ask for any new messages to digest. This method is a better fit for status checks or health checks but can still be used for server push.

The main benefits are the simple implementation and early alerts of server failure. However, the drawbacks of this method are the overhead of creating multiple HTTP requests and the waste of resources when there is no message. Another problem is message delay, because a message can only be delivered when there is a polling request from the client.

Service Streaming

In service streaming, the client sends a request to the server and the server sends back a long response, piece by piece, whenever a message is available. It can be implemented using the chunked transfer encoding of HTTP 1.1. Basically, the response lasts forever, until the channel is closed.

As this method sends multiple messages in the same response, it is the client's responsibility to handle the stream and decode the messages. The best JavaScript library I can find that supports streaming is jQuery Stream (https://code.google.com/p/jquery-stream/wiki/API).

This method ensures no delay in message delivery, but the implementation is more difficult. Moreover, as the response is held open on the server, one thread is likely to be occupied handling the response stream. Fortunately, from Servlet API 3.0, the thread that handles the response does not have to be an HTTP thread. Hence, there is less impact on server scalability.

One more concern is the dedicated use of one HTTP connection. As browsers limit the number of concurrent connections to one domain, dedicating one connection to service streaming may cause serious performance issues on old browsers (IE 6 and 7 only allow 2 concurrent connections).

Long Polling

Long polling is the main transport of the Bayeux protocol, which uses a topic-based publish-subscribe scheme. Let's skip topic subscription and channels, as they are not our main concern. Long polling is pretty similar to service streaming: the client needs to initiate a request, and the server holds the response until a message is available. The main difference is that the server ends the response when delivering the message. Upon receiving the message, the client immediately sends another request and waits for the response again. If no message arrives before the request times out, the server returns an empty response and the client submits another request.

This method shares most of the advantages and disadvantages of service streaming, but the implementation is much easier for both server and client. I personally feel this method is a hybrid of regular polling and service streaming. It still consumes network resources when there is no message, but the overhead of making HTTP requests is minimized.
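
The hold-and-respond cycle of long polling can be sketched in memory with a blocking queue standing in for the HTTP exchange. This is an illustration of the idea only, with no real network involved. The names LongPollDemo, Channel, publish and handleRequest are hypothetical.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class LongPollDemo {

    // Server side: the pending messages for one client (hypothetical class)
    static class Channel {
        private final BlockingQueue<String> pending = new LinkedBlockingQueue<>();

        // Called by the server whenever it has something to push
        void publish(String message) {
            pending.add(message);
        }

        // Handles one long-polling request: hold the response until a message
        // is available or the timeout expires, then end the response.
        // An empty string plays the role of the empty response.
        String handleRequest(long timeoutMillis) {
            try {
                String message = pending.poll(timeoutMillis, TimeUnit.MILLISECONDS);
                return message == null ? "" : message;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return "";
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Channel channel = new Channel();

        // Simulated server event: a message becomes available after 200 ms
        Thread publisher = new Thread(() -> {
            try {
                Thread.sleep(200);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            channel.publish("hello");
        });
        publisher.start();

        // Client side: send the next request as soon as the previous response ends
        String response;
        do {
            response = channel.handleRequest(100);
            System.out.println(response.isEmpty()
                    ? "(empty response, polling again)"
                    : "received: " + response);
        } while (response.isEmpty());

        publisher.join();
    }
}
```

In this sketch the client wastes at most one timeout period per empty response, and a message is delivered as soon as it is published while a request is pending.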

Passive Piggyback

In passive piggyback, the server waits for the next request from the client and includes the message in the response. In real life, it is used to update the session cookie of a stateless session. It is easy to see that this method is less reliable than the methods above, but it also consumes fewer resources.

I suggest using this method only when, by the nature of your application, requests are regularly sent to the server. A clean implementation of this method is quite challenging. There are a few problems that you need to solve here.

The first thing you need to consider is how to conveniently attach your message to any response without interfering with the original response. One option is to store the message in a cookie, but with this option you always need to check the cookie value in every callback handler in JavaScript.

The second problem is intercepting every response on the server to attach the message. This can be achieved easily nowadays with a filter or AOP. However, things are more difficult on the client side. There is no built-in equivalent of AOP in JavaScript. Therefore, you need to consistently use one JavaScript framework to handle every Ajax request in your system. Otherwise, you may miss the message.

Web Socket

All of the above methods are quite hacky. They attempt to implement two-way communication over a one-way protocol. The alternative solution is to use another protocol, one that is full-duplex, as a replacement for HTTP. This protocol is WebSocket.

This is an example from Wikipedia of how the client and server establish a WebSocket connection:

Client request:
GET /mychat HTTP/1.1
Host: server.example.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: x3JJHMbDL1EzLkh9GBhXDw==
Sec-WebSocket-Protocol: chat
Sec-WebSocket-Version: 13
Origin: http://example.com
Server response:
HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: HSmrc0sMlYUkAGmm5OPpG2HaGWk=
Sec-WebSocket-Protocol: chat
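
The Sec-WebSocket-Accept value in the server response is not random. According to RFC 6455 (this detail comes from the specification, not from the example above), the server appends a fixed GUID to the client's Sec-WebSocket-Key, hashes the result with SHA-1 and Base64-encodes the digest. Here is a minimal sketch of that computation. The class and method names are hypothetical, and java.util.Base64 requires Java 8.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

public class HandshakeDemo {

    // Fixed GUID defined by RFC 6455 for the opening handshake
    private static final String WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";

    // Computes the Sec-WebSocket-Accept value for a given Sec-WebSocket-Key
    static String acceptFor(String secWebSocketKey) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            byte[] digest = sha1.digest(
                    (secWebSocketKey + WEBSOCKET_GUID).getBytes(StandardCharsets.US_ASCII));
            return Base64.getEncoder().encodeToString(digest);
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is a required algorithm on every Java platform
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Key taken from the client request above; compare the printed value
        // with the Sec-WebSocket-Accept header of the server response.
        System.out.println(acceptFor("x3JJHMbDL1EzLkh9GBhXDw=="));
    }
}
```

This is how the client can verify that the server really understood the WebSocket upgrade rather than replaying a cached HTTP response.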


Luckily, the Java API for WebSocket 1.0 has been included in Java EE from version 7. Tomcat also supports WebSocket from version 7. If you want to know in detail how to implement a WebSocket endpoint in Java, you can take a look at the example I found on this website:

http://hmkcode.com/first-time-java-api-for-websocket/

At this moment, I would suggest that you carefully consider whether you really need to use WebSocket. It is effective, easy to implement and secure, but the main concern is that it is stateful. A WebSocket is basically a channel between one server and one client; it requires all messages to be delivered to and from the same server.

The server needs to maintain the state of its connections, and if you want to install a load balancer, the load balancer needs to know how to forward the requests to the same server. Until today, session stickiness is difficult to achieve with a non-HTTP protocol. The best strategy is to use a transport-level (L4) load balancer that chooses the target server by hashing the client IP. The client IP belongs to the network layer (L3), below both HTTP and WebSocket (both L7). Hence, it is available for both protocols.


(image of 7 layer of communication from http://www.digikey.com/en-US/articles/techzone/2012/apr/communication-in-industrial-networks)


At the time of writing, even the AWS load balancer does not support stickiness for non-HTTP listeners. That is why I believe it is better to wait until WebSocket becomes more widely adopted before starting to use it.
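
For the curious, the client IP hashing strategy mentioned above can be sketched in a few lines. This is only an illustration of the idea, not the algorithm of any specific load balancer. The class name, method name and server names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public class IpHashDemo {

    // Picks a back-end server for a client purely from the client IP,
    // so the same client always lands on the same server
    static String pickServer(String clientIp, List<String> servers) {
        // floorMod keeps the index non-negative even when hashCode() is negative
        int index = Math.floorMod(clientIp.hashCode(), servers.size());
        return servers.get(index);
    }

    public static void main(String[] args) {
        List<String> servers = Arrays.asList("app-1", "app-2", "app-3");

        // The same IP is always mapped to the same server
        System.out.println(pickServer("203.0.113.7", servers));
        System.out.println(pickServer("203.0.113.7", servers));

        // Another IP may or may not land on a different server
        System.out.println(pickServer("198.51.100.23", servers));
    }
}
```

Note that this simple modulo scheme remaps most clients whenever a server is added or removed, which breaks existing WebSocket connections. That is one more reason to think twice before relying on stickiness.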


Wednesday 12 February 2014

Asynchronous Programming in Server

To clarify, in this article I discuss asynchronous programming (AP) rather than asynchronous requests; they are different. By definition, an asynchronous request is a request that lets the client do other work while the request is being processed, enhancing parallelism within an application. Asynchronous programming refers to a style of programming where you let the work execute in another thread and leave the main thread free to do other work. As asynchronous programming is getting more popular these days, I want to elaborate on the need for it and on recent improvements in Java that make it easier to use.

WHY SHOULD WE USE AP

Most of us have been used to asynchronous programming since the early days of Java when dealing with UI. Swing uses one thread for both drawing the graphics and handling events. Hence, when the system is busy processing an event, the UI virtually hangs. To avoid this, we were told to keep the event handling code as simple as possible. If we feel that the work will take a long time to run, we must let it execute in another thread.

However, as Java Swing becomes less and less popular, most developers no longer need to deal with asynchronous programming that often, and the word "asynchronous" seems more related to web protocols.

Still, asynchronous programming is not only applicable to UI. Lots of logic on the server side benefits tremendously from proper use of AP. Scheduling, input/output processing and background processing all require developers to process work in a separate thread to avoid blocking the main thread or waiting for a long time.

With hardware vendors introducing multi-core CPUs and high-bandwidth networks, asynchronous programming is becoming more and more important.

HOW SHOULD WE USE AP
Let's cover the three most popular scenarios of asynchronous programming.

1/ Fire and forget

The simplest form of asynchronous programming is to create a Runnable object that contains the business logic and create a thread to run it. Your main thread can happily terminate after this. For example, a client sends a request to the server to start a background process. Once the background process has been triggered, the response is sent back to the client.

2/ Collaboration of work

In the second scenario, you need to spawn multiple threads to complete a heavy task, but your main thread cannot exit before the task is completed. For a real-life example, think of sending bulk emails to all of your clients. The work is huge but separable and can be done concurrently. The simplest way to do it is to create an array of Runnable objects and one thread to execute each of them. However, this simple method is strongly discouraged, as we may not know how many Runnable objects we need to create. Therefore, we may create an unknown number of threads, which may cause errors (due to running out of memory or hitting the file handle limit, depending on which threshold is reached first). The better solution in this scenario is to use a thread pool executor and submit all your Runnable objects to its array blocking queue. With this strategy, the executor only uses a fixed number of threads to execute your code, and the threads can be reused to execute other tasks.

Recently, Java introduced the fork/join framework. It is supposed to outperform the executor approach above on divide-and-conquer problems. I will cover it in another article.
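
The thread pool strategy described above can be sketched as follows. This is a minimal illustration under my own assumptions: the class BulkMailDemo and the methods deliver and sendAll are hypothetical, the queue capacity of 100 is arbitrary, and the real mail-sending call is replaced by an empty placeholder.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class BulkMailDemo {

    // Hypothetical placeholder for the real mail-sending call
    static void deliver(String recipient) {
        // send the email here
    }

    // Sends one email per recipient with at most poolSize threads
    // and returns the number of emails processed
    static int sendAll(List<String> recipients, int poolSize) {
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                poolSize, poolSize,                        // fixed number of threads
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<Runnable>(100),     // bounded queue of waiting tasks
                new ThreadPoolExecutor.CallerRunsPolicy()  // queue full: the submitting thread runs the task
        );
        AtomicInteger sent = new AtomicInteger();

        for (String recipient : recipients) {
            executor.execute(() -> {
                deliver(recipient);
                sent.incrementAndGet();
            });
        }

        // The main thread must not exit before the work is completed
        executor.shutdown();
        try {
            if (!executor.awaitTermination(1, TimeUnit.MINUTES)) {
                executor.shutdownNow();
            }
        } catch (InterruptedException e) {
            executor.shutdownNow();
            Thread.currentThread().interrupt();
        }
        return sent.get();
    }

    public static void main(String[] args) {
        List<String> recipients = Arrays.asList("a@example.com", "b@example.com", "c@example.com");
        System.out.println("emails processed: " + sendAll(recipients, 2));
    }
}
```

No matter how many recipients there are, the number of threads never exceeds poolSize, and the bounded queue slows down the submitting thread instead of letting memory grow without limit.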

3/ Scheduling

The simplest way to schedule is to start up a thread, do some work, sleep for a while, then repeat the process. I would not recommend this either, because an executor is a better option. Even a single-thread executor gives you more power, as you can schedule at a fixed interval rather than with a fixed delay (sleep actually gives you a fixed delay between executions rather than a fixed interval). Using a scheduling framework like Quartz is recommended even more, as you get a declarative style of scheduling rather than coding it.

In a similar scenario, when you need to schedule work in the future but not repetitively, an executor is still the better choice. In this case, you create a FutureTask object that wraps the piece of code you need to execute in the future and submit it to the executor. This works well and is clean. Spawning a thread just to wait for the time to come to execute the task is unforgivable.
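
Both kinds of scheduling can be sketched with the JDK's ScheduledExecutorService. Note one deliberate difference from the text: instead of building a FutureTask by hand, this sketch calls schedule(), which returns a ScheduledFuture for the one-shot task. The class and method names are hypothetical.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

public class SchedulingDemo {

    // Repetitive work: run a task at a fixed interval until it has run `runs` times.
    // Returns true if all runs happened before the safety timeout.
    static boolean runAtFixedRate(int runs, long periodMillis) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        CountDownLatch remaining = new CountDownLatch(runs);
        try {
            // Fixed interval: start times are periodMillis apart,
            // regardless of how long each run takes
            scheduler.scheduleAtFixedRate(remaining::countDown, 0, periodMillis, TimeUnit.MILLISECONDS);
            return remaining.await(runs * periodMillis + 5000, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            scheduler.shutdownNow();
        }
    }

    // One-shot work: run a task once after a delay and return its result.
    // No thread sleeps while waiting for the time to come.
    static String runOnceAfter(long delayMillis) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        Callable<String> task = () -> "done";
        try {
            ScheduledFuture<String> future = scheduler.schedule(task, delayMillis, TimeUnit.MILLISECONDS);
            return future.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            scheduler.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("all runs completed: " + runAtFixedRate(3, 100));
        System.out.println("one-shot result: " + runOnceAfter(100));
    }
}
```

If you prefer a fixed delay between the end of one run and the start of the next, which is what the sleep loop gives you, the same interface offers scheduleWithFixedDelay.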

HOW TO WRITE BETTER AP CODE

1/ Know how to deal with multi-thread issues

To write better AP code, you should understand the common issues of asynchronous code. Asynchronous code is effectively multi-threaded; hence, it suffers from all multi-threading issues such as deadlock and thread safety. You should pay attention to your Runnable object. There is no point in creating multiple threads to do the work concurrently if your Runnable object has a long synchronized block, which can even cause deadlock.

2/ Control the amount of threads and reuse as much as possible

Threads are expensive to create. Hence, please use them responsibly. It is highly recommended that you always know exactly how many of the threads you created are running at any moment. Using a thread pool allows you to limit the number of threads and reuse them to execute new tasks.

3/ Avoid sleeping/unproductive time

Not all threads are active all the time. Quite often a thread wakes up, checks for a condition to execute and, if the condition is not met, goes back to idle. In this case, time is wasted whenever the condition is ready but your thread is not yet running. The basic guideline is to minimize this wasted time. There are two methods to detect whether a condition is met: polling and callback. With polling, you check at a fixed interval to see whether the condition is met. With callback, you have the luxury of waiting until someone wakes you up to do the work.

Most of the time, callbacks give you less delay. Therefore, I would advise you to use callbacks if possible. Still, kindly remember that a callback does not guarantee that you will be notified immediately if more than one listener has registered itself for the event. Some of the event handling frameworks that I have seen execute listener code sequentially in a single thread. Because of this, you may need to wait for other listeners to complete before being notified. In the worst case, your callback mechanism may perform worse than polling.

RECENT IMPROVEMENT

As you may already know, thread have it own stack memory to store local variable, method and return value. As a direct consequence, thread when executing a method will not be available for any other task, even if you put it to sleep. Unless this method is completed, it is likely that your thread is not available to reuse.

In the early days, most servers implemented thread per connection for HTTP 1.1. Needless to say, this approach puts a heavy burden on the server and is inefficient, especially when the user only makes a request once in a while. Today, most popular web servers use thread per request rather than thread per connection. The threads are provided by an HTTP thread pool, which has an upper limit on the number of HTTP threads it can issue. The HTTP thread is blocked from the time the request is constructed and passed to the HTTP servlet until the servlet returns a response. Given how a thread uses its stack, as described above, this seemed to be the limit until a recent breakthrough in the Java API.

The problem to solve is the blocking of the HTTP thread while it waits for a long process to run and produce a response. Even if we move the long-running process to a separate thread, the HTTP thread is still blocked, waiting to render the response; hence, no benefit can be seen. Because of this, changes have been made to make the HTTP thread reusable while the long-running work is happening in the background. On this matter, I want to discuss two strategies that are widely adopted in the market: Servlet API 3.0 and Play framework.

Servlet API 3.0 is a set of APIs. Therefore, it attempts to fix the problem by introducing a new API. Oracle provides this example to illustrate the usage of AsyncContext:

 AsyncContext aCtx = request.startAsync(req, res);
 ScheduledThreadPoolExecutor executor = new ScheduledThreadPoolExecutor(10);
 executor.execute(new AsyncWebService(aCtx));

The doGet/doPost method can return immediately after constructing the AsyncContext. However, no response has been generated at that point. When the asynchronous task completes, it renders the response and the web container finally responds to the user request. I think this API is clean, but it imposes a limit on sharing code between the original doGet method and the asynchronous task. If there is anything you want to share, it must be passed as a parameter to the AsyncWebService constructor.

Play framework follows a different approach. As the byte code to be executed is enhanced by Play framework, it does not bother to introduce much new API to solve the problem. Here is a sample I created using Play framework:

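import play.jobs.Job;
import play.libs.F.Promise;
import play.mvc.Controller;

public class Application extends Controller {

    public static void index() {
        String welcome = "synchronous text";
        // Start the slow work in a background job; now() returns a Promise immediately.
        Promise<String> welcomeJob = new Job<String>() {
            public String doJobWithResult() {
                try {
                    Thread.sleep(5000);
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
                return "async text";
            }
        }.now();
        // The request is suspended here and resumed once the job has a result.
        String awaitResult = await(welcomeJob);
        renderText(welcome + awaitResult);
    }
}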


Looking at the code above, you only see one method, index(), that renders the response. However, what really happens after byte code enhancement is that the method is split into two methods, one before and one after await(). The two methods can be executed by different HTTP threads. It is interesting that Play manages to restore the thread state perfectly, so I can still use the local variable "welcome" that I created on the other thread. This is quite amazing.








Tuesday 11 February 2014

Thread Safe

Thread safety is definitely one of the things I want to write about. It is simply too important to ignore in a developer's daily life. Not only is it a constant concern, it is also one of the most common sources of errors that we need to deal with.

WHAT IS THREAD SAFE
Back in the early days, C developers rarely needed to worry about threads. The C language itself had no built-in support for multithreading before C11 (threads came from libraries such as POSIX threads), and the textbooks rarely mentioned it either. Things changed when Java came to life. The language natively supports multi-threading. The same portion of code in Java can be executed concurrently by more than one thread. Unfortunately, these threads can simultaneously read and write the object state and interfere with each other. By definition, a piece of code is thread-safe if it functions correctly during simultaneous execution by multiple threads.

To illustrate the thread safe issue, we can take a look at this example.

public class Inverter {
    int value;

    public int invert(int origin) {
        value = origin;
        return value * -1;
    }
}


Please do not laugh at the stupid implementation; this sample was created just to illustrate how multi-threading can spoil the functionality of your class. Assume that we have 2 threads that make use of the same inverter:

Thread 1: inverter.invert(10)
Thread 2: inverter.invert(20)

If you are extremely unlucky, thread 1 assigns origin to the field value but has not yet returned when thread 2 comes in and executes the same assignment. In this case, inverter.invert(10) will return -20 instead of -10.

This issue is not rare. Actually, you will encounter it very often, as lots of classes in Java are not thread safe (for example SimpleDateFormat, StringBuilder, ...)
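The unlucky timing is hard to reproduce on demand, so here is a sketch that forces it. PausingInverter is a hypothetical copy of Inverter with two latches added purely to pause thread 1 between its assignment and its return; the latches are my addition and are not something you would put in real code.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical demo that forces the bad interleaving described above.
class InverterRaceDemo {

    // Same logic as Inverter; the latches exist only to control the timing.
    static class PausingInverter {
        int value;
        private final CountDownLatch firstAssigned = new CountDownLatch(1);
        private final CountDownLatch secondAssigned = new CountDownLatch(1);

        int invertAsThread1(int origin) throws InterruptedException {
            value = origin;             // thread 1 stores 10
            firstAssigned.countDown();
            secondAssigned.await();     // pause until thread 2 has overwritten value
            return value * -1;          // reads 20, not 10
        }

        int invertAsThread2(int origin) throws InterruptedException {
            firstAssigned.await();      // wait until thread 1 has stored its value
            value = origin;             // thread 2 stores 20
            secondAssigned.countDown();
            return value * -1;
        }
    }

    // Returns what thread 1 gets back from inverting 10 on the shared object.
    static int resultSeenByThread1() {
        final PausingInverter inverter = new PausingInverter();
        final int[] result = new int[1];
        Thread t1 = new Thread(() -> {
            try {
                result[0] = inverter.invertAsThread1(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        Thread t2 = new Thread(() -> {
            try {
                inverter.invertAsThread2(20);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return result[0];
    }

    public static void main(String[] args) {
        System.out.println("invert(10) on thread 1 returned " + resultSeenByThread1());
    }
}
```

With the timing forced this way, thread 1 asks to invert 10 and gets -20 back, which is exactly the failure described above.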

HOW TO PREVENT THREAD SAFE ISSUE
Thread safety issues happen easily but can also be prevented easily. If we look deep into how things happen, a thread safety issue can only occur when two conditions are both met:

1/ Multiple threads access the same variable.
2/ The code requires multiple atomic steps to complete. The code only functions properly if there is no change to the variable in the middle of the execution.

Hence, to prevent thread safety issues, we should make sure these two conditions cannot happen together. However, this prevention comes at the price of reduced performance. That is why not all classes were created thread safe in the first place. It is the developer's responsibility to prevent thread safety issues from happening.

A. No instance variable
Yep, we never need to worry about thread safety if there is nothing to share among threads. In Java, there is stack memory, heap memory and PermGen memory. PermGen should not be our concern here because it is used to store class definitions rather than variables. Generally, all Java objects and their instance variables are stored in heap space. However, return values, reference variables and local variables inside a method are stored in stack memory. Each thread has its own dedicated stack, so whatever lives there is protected from the first condition of the thread safety issue.

Let us say we fix the above class this way:

public class SafeInverter {
    public int invert(int origin) {
        int value = origin;
        return value * -1;
    }
}

This class is as stupid as the earlier class but it is thread safe. The local variable "value" and the return value of this method are stored in the current thread's stack memory. They are not accessible to any other thread. This method is highly recommended if you can achieve it. Sometimes, you do not have this luxury as instance variables are necessary for business logic.

On a side note, it is also worth highlighting that the JVM does create a return variable for each non-void method. You may never know about its existence until you encounter a finally block that overwrites your return value.
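A small sketch makes the hidden return variable visible. FinallyReturnDemo is a hypothetical class written for this post; the first method shows a finally block replacing the value that try was about to return, and the second shows that changing a local variable in finally is too late to affect it.

```java
// Hypothetical demo class for the return variable behaviour.
class FinallyReturnDemo {

    // The try block prepares 1 as the return value, but the return
    // statement in finally replaces it before the method exits.
    @SuppressWarnings("finally")
    static int overwrittenByFinally() {
        try {
            return 1;
        } finally {
            return 2;
        }
    }

    // Here finally only changes the local variable. The value to return
    // was already copied when "return x" ran, so the caller still gets 1.
    static int unaffectedByFinally() {
        int x = 1;
        try {
            return x;
        } finally {
            x = 2;
        }
    }

    public static void main(String[] args) {
        System.out.println(overwrittenByFinally()); // prints 2
        System.out.println(unaffectedByFinally());  // prints 1
    }
}
```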

B. No sharing of object 
Assume that you keep the original Inverter class but never share the inverter object. A thread safety issue cannot happen, as each thread accesses its own Inverter object.

Inverter inverter = new Inverter();
inverter.invert(10);

This method is pretty simple but it creates a burden for garbage collection, as a lot of temporary objects need to be created. Moreover, some constructors take a long time to execute.

C. Synchronize method and code
This solution aims to prevent the second condition of the thread safety issue. It simply puts a lock on a method or a block of code, so that only one thread can execute the code at a time.

public class SynchronizedInverter {
    private int value;

    public synchronized int invert(int origin) {
        value = origin;
        return value * -1;
    }
}

This method effectively disables multi-threading for the synchronized code and generally reduces performance. However, if the portion of code that needs to be synchronized is short enough and not too many threads are running, you can use this method.
There is one more variation of this method, where we create a thread-safe wrapper for the object.

public class InverterWrapper {
    Inverter inverter = new Inverter();

    public synchronized int invert(int origin) {
        return inverter.invert(origin);
    }
}

Use this solution when you do not have access to the original method or when you want to provide both thread-safe and non-thread-safe versions to the user. The Java Collections Framework uses this approach, for example with Collections.synchronizedList.
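To see the wrapper approach from the Java Collections Framework in action, here is a minimal sketch, assuming Java 8 or later. Collections.synchronizedList is the real JDK method; SynchronizedWrapperDemo and addFromManyThreads are names made up for this illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class around the JDK's synchronized wrapper.
class SynchronizedWrapperDemo {

    // Several threads add to one shared list. ArrayList alone is not thread
    // safe, so it is wrapped the same way InverterWrapper wraps Inverter.
    static int addFromManyThreads(int threadCount, final int addsPerThread) {
        final List<Integer> shared = Collections.synchronizedList(new ArrayList<Integer>());
        Thread[] workers = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < addsPerThread; i++) {
                    shared.add(i); // each add is synchronized by the wrapper
                }
            });
            workers[t].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return shared.size();
    }

    public static void main(String[] args) {
        System.out.println("size: " + addFromManyThreads(8, 1000));
    }
}
```

No element is lost because every add goes through the synchronized wrapper. Keep in mind that the wrapper only protects single calls; a sequence of calls that must happen together still needs its own synchronized block.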

D. Object Pool
Object Pool is the combination of solutions B and C. You use an object pool when you want to avoid the pain of creating new objects again and again:

import java.util.Stack;

public class InverterPool {
    static Stack<Inverter> inverters = new Stack<Inverter>();

    public static synchronized Inverter getInverter() {
        if (inverters.isEmpty()) {
            return new Inverter();
        } else {
            return inverters.pop();
        }
    }

    public static synchronized void pushBackInverter(Inverter inverter) {
        inverters.push(inverter);
    }
}

Inverter inverter = InverterPool.getInverter();
inverter.invert(10);
InverterPool.pushBackInverter(inverter);

With this implementation, you still cannot avoid using synchronized methods, but you limit them to short and simple ones. You also cannot avoid creating some Inverter objects, but you can reuse them and keep the number of new objects low. So, this method is recommended when object creation takes a lot of resources or when the code that is vulnerable to thread safety issues is long.




Thursday 6 February 2014

Stateful and Stateless application

In the beginning, web pages were stateless and static. It did not really matter how many times you visited a page, you would receive the same content. However, as web applications got more and more complicated, people found the need to provide customized, dedicated and dynamic content. In order to achieve that, authentication has become a must-have feature for most modern web applications.

However, things are not that straightforward because HTTP is an unsecured and stateless protocol. To make things worse, HTTP 1.0 does not remember any information about the web client that initiated the request.

To overcome this obstacle, the common solution is to include some kind of ID as a cookie in subsequent requests. With this simple technique, the server can identify the requests from the same client and serve dedicated content to each client. This solution has been so popular that it is automatically included and implemented in most web servers that support dynamic content. For example, if you check the cookies of a web page and see JSESSIONID, the page is most likely served by a Java application; PHP and .NET have similar default cookies (PHPSESSID and ASP.NET_SessionId). I once tested how the server recognizes a Java session by copying the JSESSIONID cookie from Chrome to Firefox, and the session was still maintained. It proves that, at least for Tomcat, the server only checks JSESSIONID, not the browser version or any other information.

As you may guess from the above, the server needs some way to recognize the session cookie of the web client. The strategy used to identify the session splits web applications into stateful and stateless applications.

Stateful sessions are supported by the Servlet API. For any web server that implements the Servlet API, the server or web application stores the session data somewhere, keyed by the session ID in the cookie, and uses it to reconstruct the HttpSession object. For better performance, most server implementations store this session information in memory and only dump it to the file system when memory is running low or to persist sessions before restarting.

Stateless sessions are not part of the Servlet API. Most of the time, developers need to implement them themselves unless they use one of the frameworks that support stateless sessions, such as Play framework. Because of this, I will go further into explaining how to implement a stateless session in this article.

To implement a stateless session, the cookie is used as the place to store all the session content. The server signs this content with a secret key, which can later be used to verify that the session cookie content has not been modified. Depending on the nature of your application, you can decide whether or not to encrypt the content of the session cookie. For Play framework, the session cookie is kept in plain text. In this case, you need to pay more attention and not store anything confidential in the session cookie. Normally, a session has a timeout; so, you may want to include timestamp information in your session cookie. In a stateful session implementation, the server needs to regularly check and clean up expired sessions, but for stateless sessions you do not need to clear anything. If the session cookie has timed out, reject it; otherwise, refresh the timestamp on every request to keep the session alive.
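The signing and timeout check described above can be sketched as follows. This is only an illustration under my own assumptions, not how Play or any other framework actually encodes its cookie: the class name SignedSessionCookie, the cookie layout (payload, timestamp and signature separated by dots) and the choice of HMAC-SHA256 are all made up for this example. The content is signed but not encrypted, so it stays readable by the client.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper: signs session content with a server-side secret.
class SignedSessionCookie {
    private static final String ALGORITHM = "HmacSHA256";
    private final byte[] secret;
    private final long timeoutMillis;

    SignedSessionCookie(String secret, long timeoutMillis) {
        this.secret = secret.getBytes(StandardCharsets.UTF_8);
        this.timeoutMillis = timeoutMillis;
    }

    // Assumed layout: base64url(payload) "." issuedAtMillis "." base64url(signature)
    String encode(String payload, long nowMillis) {
        Base64.Encoder encoder = Base64.getUrlEncoder().withoutPadding();
        String body = encoder.encodeToString(payload.getBytes(StandardCharsets.UTF_8))
                + "." + nowMillis;
        return body + "." + encoder.encodeToString(sign(body));
    }

    // Returns the payload, or null if the cookie was tampered with or has expired.
    String decode(String cookie, long nowMillis) {
        int lastDot = cookie.lastIndexOf('.');
        if (lastDot < 0) {
            return null;
        }
        String body = cookie.substring(0, lastDot);
        byte[] claimed;
        try {
            claimed = Base64.getUrlDecoder().decode(cookie.substring(lastDot + 1));
        } catch (IllegalArgumentException e) {
            return null; // signature part is not valid base64
        }
        // Constant-time comparison of the expected and the claimed signature.
        if (!MessageDigest.isEqual(sign(body), claimed)) {
            return null;
        }
        // The body is trusted from here on because its signature matched.
        int dot = body.indexOf('.');
        long issuedAt = Long.parseLong(body.substring(dot + 1));
        if (nowMillis - issuedAt > timeoutMillis) {
            return null; // session timed out
        }
        byte[] payload = Base64.getUrlDecoder().decode(body.substring(0, dot));
        return new String(payload, StandardCharsets.UTF_8);
    }

    private byte[] sign(String body) {
        try {
            Mac mac = Mac.getInstance(ALGORITHM);
            mac.init(new SecretKeySpec(secret, ALGORITHM));
            return mac.doFinal(body.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

On each request, the server would call decode() with the current time and, if the result is not null, call encode() again with the same payload to issue a cookie with a fresh timestamp. Nothing is stored on the server apart from the secret key.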

There is also one more variation of the stateless session, as in the Ruby on Rails framework, where the session is stored in the DB rather than on the server itself. With this strategy, RoR still has the silent failover of a stateless application and can still store confidential data in the session.

Now that you know about both strategies, I would like to offer some thoughts on the advantages and disadvantages of stateful and stateless sessions.

In terms of complexity, they are pretty much equal. However, as the stateful session is part of the Servlet API, you do not need to implement it. If you go stateless, you are not so lucky unless you pick one of the specific frameworks that support stateless sessions.

In terms of efficiency, both strategies have weaknesses. Stateful is very vulnerable to DoS attacks. Even if you choose the option to dump sessions to file system storage, sending a mass of requests without a session cookie can quickly fill up your session table and make the server take the pain of maintaining it. Losing sessions when a server goes down is another major weakness in a clustered environment.

Stateless sessions have their own problems. Cookie size is pretty limited (about 4 KB). Hence, you will encounter some funny exceptions if you store too much information in the session. When you need to store more things in the session, you may need to implement your own framework on top of this. Simply speaking, you may need to set up your own filter to populate the full user profile based on the limited data you have in the client-side session (you may only store the userID in the session cookie). It means that frequent DB access is required, and you had better implement some smart caching here to avoid overloading your DB. For RoR, I am still skeptical about their strategy of storing the session in the DB. The biggest compromise is performance. For most of my career, the biggest bottleneck that we needed to solve for high-scale applications has been the DB, and writing to the DB for each session creation still leaves your app vulnerable to DoS attacks. In this case, I guess they only achieve silent failover.

With this information, I hope you can decide for yourself which strategy fits your application best.