Sunday, May 25, 2014

Network storage performance spanning multiple clouds

In previous posts I've looked at different factors in performance for s3 and Google storage. I looked first at the viability of s3 as a web application data store, and then compared s3's performance for this purpose with Google storage's. For all of my benchmarks I allocated virtual machines on the farm hosted by the cloud storage provider: to test s3 I used an ec2 box, and to test Google storage I used a Google compute engine box. This approach implies that one would always use a single provider for both storage and VMs (and presumably all other cloud functions, e.g., load balancing), but there are some drawbacks to relying on a single source for all of these services.

Using a single source makes you vulnerable to outages at that source. I almost never hear this discussed; I believe that's because it is considered a legitimate excuse for your application to be down when Amazon is down. Hey, if netflix puts up with it, then clearly this isn't a problem faced only by idiots, right? But wouldn't it be nice to be sufficiently decoupled that a major outage upstream doesn't sink your application?

The second drawback is the difficulty of comparing prices between providers. Although Amazon, Google, and others post prices for various things, for many essential aspects of the service it is practically impossible to compare without a running instance of your application on each provider's infrastructure. The only exception is when your need is particularly simple; if, for example, you just need to store large amounts of data persistently without a care for performance, then you can look up the price sheets and see the price per gigabyte for each provider's cheapest storage. But for most applications the needs are more varied and complex, and there is a large performance component that must be evaluated. The different cloud providers run different hardware, run their servers on networks with different capabilities, and charge for a dizzying variety of different metrics, making apples-to-apples comparisons practically impossible.

A trivial example of this billing complexity can be seen in the invoice spreadsheet Amazon sends out for even the simplest configuration: a single VM consumes resources described by hundreds of spreadsheet rows. I suspect that in Amazon's case this is an intentional obfuscation of costs, designed to impede the commodification of their business that might follow if charges were arranged so they could be compared.

My vote is to avoid the quagmire of a deep analysis of cloud service invoices. An easier approach is to install your application everywhere, measure the work each cloud provider does from within your application, and later reconcile that record of work, expressed in your application's own terms, against the cost incurred. Since any serious web application will need to track performance and volume of work anyway, it is little extra work to measure these quantities across different server farms. Then it is just a matter of weighing the work done against the expense paid, and one can have a precise, real-world reckoning of the relative value of the different cloud providers.
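A minimal sketch of the kind of in-application metering I have in mind might look like the following. The provider and operation labels and the work_log structure here are purely illustrative; this is not any real billing API.

```python
import time
from collections import defaultdict

# Tally of [call count, total seconds] per (provider, operation) pair.
# Provider and operation names are illustrative placeholders.
work_log = defaultdict(lambda: [0, 0.0])

def metered(provider, operation):
    """Decorator recording how much of this kind of work a provider did for us."""
    def wrap(fn):
        def inner(*args, **kwargs):
            start = time.monotonic()
            try:
                return fn(*args, **kwargs)
            finally:
                entry = work_log[(provider, operation)]
                entry[0] += 1
                entry[1] += time.monotonic() - start
        return inner
    return wrap

@metered("aws", "storage_read")
def fetch_user_blob(key):
    # Stand-in for a real s3 fetch.
    return b"..."

fetch_user_blob("user-123")
```

At billing time, the totals in work_log can be set against each provider's invoice to get a cost per unit of application-level work.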

Of course, determining the relative value of cloud offerings is not the only question one faces in deciding how to deploy. It is also an open question whether the performance of services mixed across multiple clouds will be viable. In my benchmarks it appears that cloud storage accessed from ec2 or gce is adequate for an interactive application. To give some context to the numbers I also benchmarked performance from my home over a Comcast line; the results with this last option were not great (though not crazily bad either).

The numbers generally follow the pattern we've already seen, with a couple of interesting variations. The biggest surprise for me was that small and medium sized s3 reads executed faster from Google's machines than from Amazon's own ec2. That result really doesn't pass the smell test; I wonder whether some temporary network anomaly, combined with the short duration of the tests, produced this not very credible finding. But I feel more comfortable answering the broader question of the viability of mixing cloud services across providers based on these numbers: although there is a performance hit to spanning cloud providers, it is not significant for small data sizes. Here are the results:

rate/s  s/op  op type  storage type  host type  chunk size
46.5    0.02  read     gs            gce        small
36.0    0.03  read     gs            gce        medium
20.5    0.05  read     gs            gce        large
14.7    0.07  read     s3            gce        small
14.1    0.07  read     s3            gce        medium
11.4    0.09  read     gs            ec2        medium
10.3    0.10  read     s3            ec2        small
10.3    0.10  read     gs            ec2        small
10.2    0.10  read     s3            ec2        medium
8.1     0.12  read     s3            comcast    small
8.0     0.12  read     gs            comcast    small
7.5     0.13  read     gs            ec2        large
7.0     0.14  write    s3            gce        small
6.2     0.16  write    s3            gce        medium
6.2     0.16  read     s3            gce        large
6.1     0.16  write    gs            gce        small
5.9     0.17  read     gs            comcast    medium
5.4     0.18  write    gs            gce        medium
5.2     0.19  write    s3            ec2        small
4.8     0.21  read     s3            comcast    medium
4.6     0.22  write    s3            comcast    small
4.5     0.22  write    s3            ec2        medium
4.5     0.22  write    gs            gce        large
4.3     0.23  read     s3            ec2        large
3.4     0.30  write    s3            gce        large
3.1     0.32  write    gs            comcast    small
2.9     0.35  write    s3            ec2        large
2.5     0.40  write    gs            ec2        medium
2.4     0.41  write    gs            comcast    medium
2.3     0.44  read     gs            comcast    large
2.1     0.47  write    gs            ec2        large
1.9     0.53  write    s3            comcast    medium
1.8     0.55  write    gs            ec2        small
1.6     0.64  read     s3            comcast    large
0.6     1.69  write    gs            comcast    large
0.4     2.25  write    s3            comcast    large

Thursday, May 22, 2014

Google storage is faster than s3

In a previous post I looked at the potential for cloud storage to serve as a web application's datastore, and found s3's performance to be quite adequate for that purpose. I was then curious to compare the performance of s3 with that of Google storage.

The crude benchmarks I ran showed Google storage to be much faster, with the difference particularly dramatic for reads, where Google's optimizations were evident: reads ran an order of magnitude faster than comparably sized writes. s3 also shows superior performance for reads over writes, but the difference is much smaller. Google storage's performance is superior across the board, and for small data sizes even Google storage writes were faster than s3 writes of any size.

rate/s  s/op  op type  storage type  host type  chunk size
46.5    0.02  read     gs            gce        small
36.0    0.03  read     gs            gce        medium
20.5    0.05  read     gs            gce        large
10.3    0.10  read     s3            ec2        small
10.2    0.10  read     s3            ec2        medium
6.1     0.16  write    gs            gce        small
5.4     0.18  write    gs            gce        medium
5.2     0.19  write    s3            ec2        small
4.5     0.22  write    s3            ec2        medium
4.5     0.22  write    gs            gce        large
4.3     0.23  read     s3            ec2        large
2.9     0.35  write    s3            ec2        large

Sunday, May 18, 2014

Could s3 be the datastore for a web application? Yes.

When writing a scalable web application, typically the biggest technical challenge is to create a scalable underlying data store. As cloud network storage becomes more common, I wondered if something like s3 or Google storage would be adequate as an application backing data store. If it did work, this would greatly simplify the implementation of a scalable web app, since a central promise of cloud storage is that it scales, from a practical point of view, infinitely.

Clearly this would not be a practical architecture for all applications -- a suitable application would be one where the user data was not pulled from a number of sources requiring lots of queries to build. Ideally there would be a single key for a given user -- perhaps a combination of that user's ID and hashed password, which would allow the retrieval of all relevant data in a single fetch. Whether this would work would hinge on the latency of that fetch, since the data size would likely not be terribly large.
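The single-key idea could look something like the sketch below. The sha256-based scheme and the "users/" prefix are my own illustrative choices rather than a recommendation; in particular, the password_hash argument is assumed to be an already-hashed stored credential, never a raw password.

```python
import hashlib

def user_storage_key(user_id: str, password_hash: str) -> str:
    # Combine the user's ID with their (already hashed) password into one
    # stable object key, so all of the user's data sits behind a single fetch.
    digest = hashlib.sha256(f"{user_id}:{password_hash}".encode()).hexdigest()
    return "users/" + digest

key = user_storage_key("alice", "5f4dcc3b")
```

Because the key is deterministic, a login handler can recompute it from the credentials presented and issue exactly one storage fetch.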

To determine whether this is feasible, I ran some simple benchmarks of reads and writes against s3, just to get a sense of the performance we can expect. The results are naturally much slower than what one would get from a local database, but in my tests the performance was adequate to support an application. I like the guidelines from Jakob Nielsen's work, which hold that an application responding in 0.1 seconds or less is perceived as instantaneous, while applications responding in a second or less at least do not disrupt the user's flow of thought. Of course an app based on s3 will add some other overhead on top of the data fetch times, but I wouldn't expect that overhead to be significant, so we can conclude that s3 is fast enough to serve this purpose. Here are the results:

rate/s  s/op  op type  storage type  host type  chunk size
10.3    0.10  read     s3            ec2        small
10.2    0.10  read     s3            ec2        medium
5.2     0.19  write    s3            ec2        small
4.5     0.22  write    s3            ec2        medium
4.3     0.23  read     s3            ec2        large
2.9     0.35  write    s3            ec2        large
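The loop behind numbers like these -- an operation rate and its per-operation latency -- can be sketched as follows. This is not my actual harness; fake_read is a stub standing in for the real storage call being timed.

```python
import time

def benchmark(op, iterations=50):
    """Time repeated calls to op() and return (ops per second, seconds per op)."""
    start = time.monotonic()
    for _ in range(iterations):
        op()
    per_op = (time.monotonic() - start) / iterations
    return 1.0 / per_op, per_op

# Stub standing in for a real storage read of a given chunk size.
def fake_read():
    time.sleep(0.001)

rate, s_per_op = benchmark(fake_read)
```

By construction the two reported figures are reciprocals of one another, which is the relationship visible between the first two columns of the tables.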

The benchmarks measure performance for small, medium, and large amounts of data, which in this context -- the problem of fetching enough data to complete a login -- represent what might normally be described as "extremely small," "quite small," and "small." Specifically, I designated my small chunks as 17 bytes, medium as 32 KB, and large as 256 KB.
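Generating those three payloads is a one-liner per size; using os.urandom here is just one convenient way to get incompressible test data, not necessarily what the original benchmarks did.

```python
import os

# The three chunk sizes used in the benchmarks.
chunks = {
    "small": os.urandom(17),           # 17 bytes
    "medium": os.urandom(32 * 1024),   # 32 KB
    "large": os.urandom(256 * 1024),   # 256 KB
}
```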

For these small s3 fetches from an ec2 machine, latency clearly dominated. There was no significant difference between small and medium reads, and only a small difference between small and medium writes. As one would expect, reading was significantly faster than writing, though not by a huge margin; the biggest difference, roughly 2 to 1, was seen for the small and medium sizes.
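That read-versus-write gap falls straight out of the per-operation latencies; a quick check with the s3-from-ec2 figures hard-coded from the results above:

```python
# Per-op latencies in seconds for s3 accessed from ec2, from the results above.
read_s = {"small": 0.10, "medium": 0.10, "large": 0.23}
write_s = {"small": 0.19, "medium": 0.22, "large": 0.35}

# How many times slower a write is than a read at each chunk size.
ratio = {size: write_s[size] / read_s[size] for size in read_s}
```

The ratio is close to 2 for the small and medium chunks and narrows for the large ones, where raw transfer time starts to matter more than per-request latency.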