(Reading time: approximately 5 minutes)
We live in a time when developers have more technology options than ever before. There are many reasons to choose one stack over another. Some choose a stack because it is ‘new and cool’, some because of the community and pool of talent associated with it, and some because of cost. All things being equal, a technology stack should always be selected based on the business strategy of your company.
We have been fortunate that Hubba has been so embraced by the brand and retail community since our launch. However, due to this quick adoption and our anticipated growth, we decided to completely rethink our technical architecture from the ground up. We had three basic goals:
- Performance and scalability
- Ease of use
- Testability and testable system
First we defined each of our goals. These were meant to serve as practical definitions as we evaluated the various technologies.
Performance and scalability
Performance can be described as the raw speed of your application as observed by a single user. How long (on average) does a single read and write operation take? Scalability can be described as the ability of your application to continue to perform at consistent and predictable speeds as the number of users and requests grows.
Both performance and scalability go hand in hand to create blazingly fast user experiences.
Ease of use
Ease of use can be described as how readily an average developer, whatever their technology background, can adapt to the technology stack. How easily can our existing team and new developers adapt to it?
It is important to remember that individual technologies can be very easy to use but might not fit together easily, so it’s critical to measure the ease of use of the stack as a whole and not of the individual technologies. The assumption here is that a higher ‘ease of use’ score leads to higher productivity, higher job satisfaction and eventually higher retention.
Testability and testable system
Testability can be described as your ability to test each component in complete isolation. A testable system can be described as one where the entire system can be tested as a whole in a reasonable amount of time. With high testability, new features and bug fixes can be pushed out faster and more reliably.
So, armed with our goals and their definitions, we went out hunting for a new technology stack!
How we determined our new technology stack
We evaluated many different technologies and finally settled on MongoDB, Expressjs, Angularjs and Nodejs, a combination more commonly known as the ‘MEAN’ stack. Each technology was rated ‘High’, ‘Medium’ or ‘Low’ for each of the goals we set out to achieve.
| Category | Performance and Scalability | Ease of Use | Testability and Testable system | Compatibility with others |
MongoDB is an open-source document database and one of the leading NoSQL databases today.
Right away it was obvious that MongoDB’s performance was incredible, but the scalability of that performance required a lot more testing and understanding. We quickly realized that schema-less backends require even more design than relational databases, and that we hadn’t paid enough attention to our schema design. When we started experimenting with large datasets and a high number of requests, we realized that we needed to go back to the basics. This is when we started to appreciate the true power of MongoDB, including indexing, replication and sharding. All these features make MongoDB incredibly powerful and right for a large number of applications…including Hubba!
MongoDB is relatively easy to test, but it lacks a fully supported mocking framework. One can debate whether a mocking framework is needed at all, but it’s hard to argue that one wouldn’t be useful for writing unit tests. There are options like Mockgoose, but it’s hard for these frameworks to keep up with MongoDB without a lot of open source support.
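One way to unit test data-access code without a mocking framework is to pass the collection in as a dependency and substitute a small hand-written in-memory fake. The sketch below is an illustration under assumptions, not our production code: `FakeCollection`, `createProductService` and the `sku` field are hypothetical names, and the fake only imitates the two methods the service calls (`insertOne` and `findOne` with exact-match filters).

```javascript
// Hand-written in-memory stand-in for a collection (hypothetical, tests only).
// It mimics just the two async methods the service below relies on.
class FakeCollection {
  constructor() {
    this.docs = [];
    this.nextId = 1;
  }

  async insertOne(doc) {
    const stored = { _id: this.nextId++, ...doc };
    this.docs.push(stored);
    return { insertedId: stored._id };
  }

  // Supports exact-match filters only, e.g. { sku: 'A-1' }.
  async findOne(filter) {
    const match = this.docs.find((d) =>
      Object.keys(filter).every((key) => d[key] === filter[key])
    );
    return match || null;
  }
}

// The service receives its collection as an argument, so a test can pass
// the fake while production code passes a real driver collection.
function createProductService(collection) {
  return {
    async addProduct(product) {
      if (!product.sku) {
        throw new Error('sku is required');
      }
      const existing = await collection.findOne({ sku: product.sku });
      if (existing) {
        throw new Error('duplicate sku: ' + product.sku);
      }
      const result = await collection.insertOne(product);
      return result.insertedId;
    },
  };
}
```

Because the service never imports the database itself, its rules (required `sku`, no duplicates) can be tested in complete isolation, which is the kind of testability described earlier.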
MongoDB has incredible driver support; the MongoDB documentation lists the full set of available drivers. This meant that virtually any server technology we picked would play well with MongoDB.
Node performed very well in our performance and scalability tests. We experimented with handling 100K requests on a single node and on multiple nodes in a clustered environment, and monitored metrics such as CPU usage and memory impact.
With a huge number of open source modules and an amazing community behind it, nodejs felt like a great choice. A few modules that we depend on heavily are the following:
- Expressjs as a web application framework.
- Kue to act as a priority queue.
- Redis to provide a pub/sub messaging service and a backend for Kue.
- q to avoid callback hell. If you are working on a complicated Nodejs project, you will definitely want something like ‘q’.
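To show what avoiding callback hell looks like, here is a minimal sketch that uses native `Promise` objects rather than ‘q’ itself, so that it has no dependencies; the chaining idea is the same. The functions `loadUser`, `loadOrders`, `promisify` and `orderTotalFor` are hypothetical stand-ins, not part of our codebase.

```javascript
// Hypothetical callback-style functions standing in for real I/O.
function loadUser(id, callback) {
  setTimeout(() => callback(null, { id: id, name: 'Ada' }), 0);
}

function loadOrders(user, callback) {
  setTimeout(() => callback(null, [{ userId: user.id, total: 42 }]), 0);
}

// Wrap a Node-style (err, result) callback function so it returns a promise.
function promisify(fn) {
  return (...args) =>
    new Promise((resolve, reject) => {
      fn(...args, (err, result) => (err ? reject(err) : resolve(result)));
    });
}

const loadUserAsync = promisify(loadUser);
const loadOrdersAsync = promisify(loadOrders);

// Each step returns a promise, so the steps read top to bottom instead of
// nesting one callback inside another. A single .catch() added by the
// caller handles a failure in any step.
function orderTotalFor(userId) {
  return loadUserAsync(userId)
    .then((user) => loadOrdersAsync(user))
    .then((orders) => orders.reduce((sum, order) => sum + order.total, 0));
}
```

With plain callbacks, every additional step adds another level of nesting and another error check; with a chain, each step adds one more `.then()`.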
Angularjs is an open-source front end web application framework.
Angularjs provides an amazing data binding system, but too many bindings on a page can lead to performance issues. After about 2000 watchers the UI starts to lag. We gradually realized that most of the data on our pages is immutable after it is rendered the first time, so we don’t need to keep watching it. There are a few different ways to work around the performance issues. This post outlines many different ways you can improve performance in an angular app: https://www.exratione.com/2013/12/considering-speed-and-slowness-in-angularjs/
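To see why data that never changes after the first render does not need a permanent watcher, consider the toy model of dirty checking below. It is a deliberately simplified illustration and not how Angularjs is implemented: `ToyScope`, `watch`, `watchOnce` and `digest` are hypothetical names. Angularjs 1.3 and later offer a comparable built-in idea through one-time bindings written with the `::` prefix.

```javascript
// Toy model of dirty checking (hypothetical, far simpler than Angularjs).
class ToyScope {
  constructor() {
    this.watchers = [];
    this.checks = 0; // counts how many watch expressions were evaluated
  }

  // Register a watcher; returns a function that removes it again.
  watch(getValue, onChange) {
    const watcher = { getValue, onChange, last: undefined };
    this.watchers.push(watcher);
    return () => {
      this.watchers = this.watchers.filter((w) => w !== watcher);
    };
  }

  // A watcher that removes itself once it has seen a defined value.
  watchOnce(getValue, onChange) {
    const unwatch = this.watch(getValue, (value) => {
      onChange(value);
      if (value !== undefined) unwatch();
    });
    return unwatch;
  }

  // One pass over all watchers (a real digest repeats until values settle).
  digest() {
    for (const watcher of this.watchers.slice()) {
      this.checks++;
      const value = watcher.getValue();
      if (value !== watcher.last) {
        watcher.last = value;
        watcher.onChange(value);
      }
    }
  }
}

// 2000 rows whose labels never change after the first render.
const rows = Array.from({ length: 2000 }, (_, i) => ({ label: 'row ' + i }));
const scope = new ToyScope();
rows.forEach((row) => scope.watchOnce(() => row.label, () => {}));
scope.digest(); // first pass evaluates every watcher, which then deregisters
scope.digest(); // second pass has no watchers left to evaluate
```

Every regular watcher is evaluated again on each pass, so the cost of a pass grows with the number of live watchers. Watchers that deregister after the first value keep that number small.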
Much like Nodejs, Angularjs also has a huge community behind it with many amazing modules available.
How it all comes together
The Angular app interacts with multiple nodejs instances over a REST API through haproxy. Haproxy offers high availability and load balancing for HTTP based applications by spreading requests across multiple servers. Any one of the nodejs instances can handle a request that comes into the server.
The diagram above shows 3 nodejs instances, but this model can be scaled to any number of nodejs instances running on multiple boxes. If you are looking to run them on the same box, I would suggest a clustered environment instead, as described in the nodejs documentation for the cluster module.
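For the single-box case, a clustered setup can be sketched with the `cluster` module that ships with nodejs. This is a minimal illustration under assumptions, not our deployment: `startClustered` is a hypothetical helper, the response body is a placeholder, and port 3000 in the usage comment is arbitrary.

```javascript
const cluster = require('cluster');
const http = require('http');
const os = require('os');

// Hypothetical helper: the primary process forks workers, and every worker
// runs the same HTTP server. The cluster module lets the workers share the
// listening port.
function startClustered(port, workerCount = os.cpus().length) {
  // cluster.isPrimary exists on Node 16+; older versions expose isMaster.
  const isPrimary =
    cluster.isPrimary !== undefined ? cluster.isPrimary : cluster.isMaster;

  if (isPrimary) {
    for (let i = 0; i < workerCount; i++) {
      cluster.fork();
    }
    return 'primary';
  }

  http
    .createServer((req, res) => {
      res.end('handled by worker ' + process.pid + '\n');
    })
    .listen(port);
  return 'worker';
}

// Usage (left commented out so that loading this file starts nothing):
// startClustered(3000);
```

Each forked worker re-runs the same script, so the call to `startClustered` has to sit at the top level of the entry file, where both the primary and the workers reach it.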
The diagram above shows Redis as well. Redis is an open source key-value store. In this case Redis is primarily used for two purposes:
1. Scheduling jobs using Kue, a priority queue. For example, this is used for delayed and scheduled jobs such as generating the newsfeed for all clients on a time interval or sending delayed emails.
2. Publish/Subscribe messaging between nodejs instances. Redis makes this easy by providing channels: subscribers can subscribe to one or more channels and then only receive messages published on those channels.
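The channel semantics from the second point can be illustrated with a small in-process model. This is not a Redis client and does not talk to Redis; `ChannelBroker` and the channel names `newsfeed` and `emails` are hypothetical, chosen only to show that a subscriber receives messages solely from the channels it subscribed to.

```javascript
// In-process toy model of channel-based publish/subscribe (not a Redis client).
class ChannelBroker {
  constructor() {
    this.subscribers = new Map(); // channel name -> Set of handler functions
  }

  // Subscribe one handler to one or more channels.
  // Returns a function that unsubscribes it from those channels.
  subscribe(channels, handler) {
    for (const channel of channels) {
      if (!this.subscribers.has(channel)) {
        this.subscribers.set(channel, new Set());
      }
      this.subscribers.get(channel).add(handler);
    }
    return () => {
      for (const channel of channels) {
        this.subscribers.get(channel).delete(handler);
      }
    };
  }

  // Deliver a message to that channel's subscribers only.
  // Returns the number of handlers that received it.
  publish(channel, message) {
    const handlers = this.subscribers.get(channel) || new Set();
    for (const handler of handlers) {
      handler(channel, message);
    }
    return handlers.size;
  }
}

const broker = new ChannelBroker();
const received = [];
broker.subscribe(['newsfeed'], (channel, message) => {
  received.push(channel + ':' + message);
});
broker.publish('newsfeed', 'refresh'); // delivered to the subscriber
broker.publish('emails', 'send');      // no subscriber on this channel
```

In the real architecture the broker role is played by Redis, so publishers and subscribers can live in different nodejs processes on different boxes.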
The diagram above shows two replica sets. Each replica set is meant to serve as a MongoDB shard. Sharding allows horizontal scaling with MongoDB. The query router is responsible for routing the query to the correct shard. I recommend starting with a single replica set and only sharding the collections that need to be sharded. It’s also better to get an idea of your data before you pick your shard keys.
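The advice about knowing your data before picking a shard key can be made concrete with a toy router. The sketch below is an assumption-laden illustration, not MongoDB's routing: real routing works on chunk ranges with MongoDB's own hashing, whereas `hashString`, `shardFor` and `distribution` are hypothetical helpers that simply hash a key value onto a shard index. The document fields `productId` and `status` are also invented for the example.

```javascript
// Toy string hash (hypothetical); kept as an unsigned 32-bit integer.
function hashString(value) {
  let hash = 0;
  for (const char of String(value)) {
    hash = (hash * 31 + char.charCodeAt(0)) >>> 0;
  }
  return hash;
}

// Map a shard key value to a shard index.
function shardFor(keyValue, shardCount) {
  return hashString(keyValue) % shardCount;
}

// Count how many documents would land on each shard for a given key field.
function distribution(docs, keyField, shardCount) {
  const counts = new Array(shardCount).fill(0);
  for (const doc of docs) {
    counts[shardFor(doc[keyField], shardCount)]++;
  }
  return counts;
}

const docs = Array.from({ length: 1000 }, (_, i) => ({
  productId: 'p' + i,
  status: 'active',
}));

// Every document has the same status, so all of them land on one shard.
const byStatus = distribution(docs, 'status', 2);
// productId has many distinct values, so documents reach both shards.
const byProductId = distribution(docs, 'productId', 2);
```

A key with few distinct values concentrates documents on a few shards regardless of how many shards exist, which is why it pays to look at the actual value distribution of a candidate key first.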
Remember, every project is unique
Every project has different requirements and different constraints, such as resources and time availability. This post is not meant to present a right or wrong architecture; it’s simply the architecture that works best for us at Hubba given our constraints.
In less than 6 months, we evaluated our options, designed and implemented the above architecture, deployed production and preproduction environments and transitioned all existing users over to the new app with minimal downtime. We have now been live for over a month and we have achieved all our targets of performance and scalability, ease of use and testability.