When writing a scalable web application, the biggest technical challenge is typically building a scalable underlying data store. As cloud network storage becomes more common, I wondered whether something like S3 or Google Storage would be adequate as an application's backing data store. If it worked, this would greatly simplify the implementation of a scalable web app, since a central promise of cloud storage is that it scales, for practical purposes, infinitely.
Clearly this would not be a practical architecture for all applications -- a suitable application would be one where the user's data does not have to be pulled from a number of sources requiring many queries to assemble. Ideally there would be a single key for a given user -- perhaps a combination of that user's ID and hashed password -- allowing all relevant data to be retrieved in a single fetch. Whether this works would hinge on the latency of that fetch, since the data size would likely not be terribly large.
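The single-key idea could be sketched like this. The key format and hash choice below are hypothetical illustrations, not a prescription -- a real login system would want salting and a proper key-derivation function:

```python
import hashlib

def user_data_key(user_id: str, password: str) -> str:
    """Derive a single object key for a user's data blob.

    Combines the user ID with a hash of the password so that one GET
    on this key retrieves everything needed to complete a login.
    (Hypothetical scheme: no salt or KDF, for illustration only.)
    """
    pw_hash = hashlib.sha256(password.encode("utf-8")).hexdigest()
    return f"users/{user_id}/{pw_hash}"

key = user_data_key("alice", "s3cret")
```

A single fetch of the object at `key` would then return the user's entire record.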
To determine whether this is feasible, I ran some simple benchmarks of S3 reads and writes, just to get a sense of the performance we can expect. The results are naturally much slower than what one would get from a local database, but in my tests the performance was adequate to support an application. I like the guidelines from Jakob Nielsen's work, which hold that an application responding in 0.1 seconds or less is perceived as instantaneous, while an application responding within a second at least does not disrupt the user's flow of thought. An app built on S3 will add some other overhead on top of the data fetch times, but I wouldn't expect that overhead to be significant, so S3 appears fast enough to serve this purpose. Here are the results:
| rate (ops/s) | time/op (s) | op type | storage type | host type | chunk size |
|--------------|-------------|---------|--------------|-----------|------------|
| 10.3 | 0.10 | read | S3 | EC2 | small |
| 10.2 | 0.10 | read | S3 | EC2 | medium |
| 5.2 | 0.19 | write | S3 | EC2 | small |
| 4.5 | 0.22 | write | S3 | EC2 | medium |
| 4.3 | 0.23 | read | S3 | EC2 | large |
| 2.9 | 0.35 | write | S3 | EC2 | large |
The benchmarks measure performance for small, medium, and large amounts of data -- which, in the context of fetching enough data to complete a login, might more accurately be described as "extremely small," "quite small," and "small." Specifically, I used 17 bytes for small chunks, 32 KB for medium, and 256 KB for large.
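My original benchmark code isn't reproduced here, but a minimal harness along these lines would produce numbers in the same shape as the table above. It uses boto3; the bucket name, key layout, and iteration count are placeholders, not my actual setup:

```python
import time

def bench(op, n=20):
    """Time n calls of op(); return (ops_per_sec, mean_seconds_per_op)."""
    start = time.perf_counter()
    for _ in range(n):
        op()
    elapsed = time.perf_counter() - start
    return n / elapsed, elapsed / n

# Chunk sizes matching the post: 17 bytes, 32 KB, 256 KB.
CHUNKS = {
    "small": b"x" * 17,
    "medium": b"x" * 32 * 1024,
    "large": b"x" * 256 * 1024,
}

def run_s3_benchmarks(bucket):
    """Benchmark S3 writes and reads for each chunk size.

    Assumes boto3 is installed and AWS credentials are configured;
    `bucket` is a hypothetical bucket name.
    """
    import boto3  # deferred import: only needed when actually hitting S3
    s3 = boto3.client("s3")
    for name, data in CHUNKS.items():
        key = f"bench/{name}"
        rate, t = bench(lambda: s3.put_object(Bucket=bucket, Key=key, Body=data))
        print(f"write {name}: {rate:.1f}/s, {t:.2f}s per op")
        rate, t = bench(lambda: s3.get_object(Bucket=bucket, Key=key)["Body"].read())
        print(f"read  {name}: {rate:.1f}/s, {t:.2f}s per op")

# Usage (hypothetical bucket; runs against live S3):
# run_s3_benchmarks("my-benchmark-bucket")
```

Because each call fully round-trips to S3, the per-op times measure request latency plus transfer time, which is what matters for a login-sized fetch.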
For these small S3 fetches from an EC2 machine, latency clearly dominated: there was no significant difference between small and medium reads, and only a small difference between small and medium writes. As one would expect, reading was faster than writing, though not by a huge margin; the largest gap, about 2 to 1, appeared with the small chunks.