Wednesday, September 7, 2011

Review: Warehouse-Scale Computing: Entering the Teenage Decade & The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines

The video and the book provided some unique insights from people who actually use WSCs for real business and the real problems they encounter or see in this area. Here are some interesting points:

- PUE is overrated, since it only measures the efficiency of the building/cooling infrastructure; as it reaches the 1.1 range, the room for improvement diminishes, and perhaps it's time to re-focus on server utilization and power consumption (see the PUE sketch after this list).

- SSDs may be the future, since they are a very nice alternative that bridges the huge gap in storage capacity and random-access time between conventional disks and DRAM, which might help tremendously in improving IO throughput as more and more applications become bottlenecked by IO rather than CPU (see the latency sketch after this list).

- Networking has made a lot of progress in the last decade, which makes connecting a small number of servers with extremely high-bandwidth links very affordable. Even though connecting all the nodes this way is still very expensive, connecting the nodes on the same rack could bring a lot of improvement, and future research can be done on how to leverage the extremely fast connectivity within small local clusters in a large WSC.

- Resource disaggregation is the future. Since the range of resources used by various applications is getting wider and wider, it may be a good idea to disaggregate computing, storage, and networking resources from the physical machines and provide abstractions that allow developers to assume each node has, say, 128 cores or 1024 GB of memory (a toy sketch follows this list).

- The tradeoff between the price premium of high-throughput servers and server utilization is still a challenge, and the current industry standard seems to be low-end server machines.
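As a back-of-the-envelope illustration of why PUE gains diminish, here is a minimal sketch of the arithmetic (the numbers are illustrative, not from the talk):

```python
# PUE = total facility power / IT equipment power, so 1.0 is the ideal.
def pue(total_facility_kw, it_equipment_kw):
    """Power Usage Effectiveness: overhead from cooling, power delivery, etc."""
    return total_facility_kw / it_equipment_kw

# At PUE = 2.0, half the power is overhead, so cutting PUE to 1.5 saves 25%
# of the total bill. At PUE = 1.12, overhead is only ~11% of IT power, so even
# a perfect facility (PUE = 1.0) can recover at most that last ~11%.
print(pue(2000, 1000))  # 2.0
print(pue(1120, 1000))  # 1.12
```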
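To make the "huge performance gap" concrete, here are rough order-of-magnitude random-access latencies circa 2011 (ballpark figures I'm assuming, not numbers from the talk):

```python
# Ballpark random-read latencies in nanoseconds (order-of-magnitude only).
random_read_latency_ns = {
    "DRAM": 100,          # ~100 ns
    "SSD": 100_000,       # ~100 us
    "disk": 10_000_000,   # ~10 ms (seek + rotational delay)
}

# Disk is ~100,000x slower than DRAM; an SSD sits roughly in the middle,
# about 1,000x from each side, which is why it bridges the hierarchy.
for device, ns in random_read_latency_ns.items():
    print(f"{device}: {ns / random_read_latency_ns['DRAM']:.0f}x DRAM latency")
```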
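And a toy sketch of what the disaggregation abstraction could look like (a hypothetical API, not any real system):

```python
from dataclasses import dataclass

@dataclass
class LogicalNode:
    cores: int      # drawn from a shared compute pool
    memory_gb: int  # drawn from a shared memory pool

def allocate(cores: int, memory_gb: int) -> LogicalNode:
    """Compose a logical 'node' from disaggregated pools rather than a physical box."""
    return LogicalNode(cores=cores, memory_gb=memory_gb)

# A developer could then assume whatever shape fits the workload:
compute_heavy = allocate(cores=128, memory_gb=64)
memory_heavy = allocate(cores=8, memory_gb=1024)
print(compute_heavy, memory_heavy)
```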

Tuesday, September 6, 2011

Review: Above the Clouds: A Berkeley View of Cloud Computing

Summary

Cloud Computing refers to the sum of Software as a Service (SaaS), i.e., providers of application-level services to end users over the Internet, and Utility Computing, offered by Cloud Providers via the hardware and software systems in their datacenters and consumed by SaaS providers in order to deliver their services. The illusion of infinite computing resources available on demand, the elimination of an up-front commitment by Cloud users, and the ability to pay for computing resources on a short-term basis as needed are new in Cloud Computing and separate it from previous large-scale computation facilities.

Similar to the rise of TSMC, Cloud Computing provides a more economical solution to companies by leveraging statistical multiplexing of computing resources, as well as by eliminating the redundancy of every company keeping lots of IT staff to take care of expensive and complicated servers. Thus, by creating a centralized place for offering computing resources, cloud providers are able to uncover factors of 5 to 7 decreases in the cost of electricity, network bandwidth, operations, etc. Currently, the level of abstraction presented to the programmer and the level of management of resources (more specifically, the computation, storage, and communication models) are what distinguish different Cloud providers: Amazon EC2 is the lowest level, closest to the hardware; Microsoft Azure sits in the middle; and Google AppEngine is the most high-level solution mentioned in the paper.

The paper demonstrated the incentives for companies to become cloud providers and the cost savings that moving to the cloud will allow businesses to realize in their IT budgets. In addition, the paper highlighted three areas where more innovation is needed (application software, infrastructure software, and hardware systems), as well as a new range of applications made possible by Cloud Computing, including mobile interactive applications, parallel batch processing, large-scale data analytics, extensions of compute-intensive desktop applications, etc. Furthermore, another keyword for Cloud Computing, aside from cost saving, is elasticity, which refers to the "pay as you go" model and the benefits it brings to users when coping with demand variations; there is also very little or no cost penalty for using 20 times more resources for 1/20 of the time, thus allowing many tasks to be expedited cheaply. Lastly, the paper went through the top 10 obstacles to and opportunities for growth and highlighted future areas of advances as well as research directions.
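This "cost associativity" is easy to see with a toy calculation (the per-hour price here is an illustrative assumption, not a quote from the paper):

```python
# Cost associativity: N machines for 1 hour costs the same as 1 machine for N hours.
PRICE_PER_INSTANCE_HOUR = 0.10  # hypothetical on-demand rate, in dollars

def job_cost(machines, hours):
    return machines * hours * PRICE_PER_INSTANCE_HOUR

print(job_cost(1, 1000))   # $100.00, answer in ~6 weeks
print(job_cost(1000, 1))   # $100.00, answer in 1 hour
```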

Review

I really like the paper's "top 10 obstacles to and opportunities for growth" list: even though I have known the concepts for a long time, it offered many new insights into the challenges and shed light on future research directions that will better aid industry in making Cloud Computing the de facto future of large-scale computing. The paper provided a really nice theoretical foundation for thinking about cloud computing and for identifying problems with it.

There are three things that I don't quite like about the paper. Firstly, I think it would be much more convincing if more real usage data could be obtained and added: in many places the authors simply state facts x, y, and z, and the claims would benefit greatly if supporting numbers could be obtained from the Cloud Providers (I understand this can be very hard and is a general problem when studying industry technologies that move very fast, but more real usage data would make many claims much more solid). Hidden human-resource costs also seem to be ignored in many of the paper's calculations. Secondly, I am not sure about the use of elasticity to prevent DDoS: elasticity can only help against application-layer DDoS or computation/IO-exhaustion attacks, such as sending lots of computation- and IO-heavy queries; against network-bandwidth attacks, however, the links into the datacenter can be saturated before the traffic ever reaches the elastically scaled servers, so elasticity offers little protection there.

Thirdly, security and confidentiality/privacy are covered very briefly in this paper, which I think understates the importance of these two properties in the modern world. Aside from the single points of failure caused by homogeneous hardware, software stacks, shared network links, data stores, the power grid, and the collateral damage from compromised machines in the cloud, it is, more importantly, very hard to convince companies that their private and confidential data will be safe and will not be leaked. Even though the paper suggests encrypting data, key management can still be an issue, and complicated access-control policies can be hard to replicate on the service. It would be better if the paper spent a bit more space looking into these issues as well. Also, the three main models of cloud computing mentioned in the paper (the models of computation, storage, and communication) could be a useful starting point for examining the security and confidentiality implications.


Other Thoughts

Reuse of virtual machines might have interesting implications, including side-channel data leaks: for example, whether data belonging to the previous user is still accessible to the next user via existing forensic techniques, since I believe most cloud providers do not securely wipe (e.g., with DBAN) the hard disk when an instance changes owners.
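As a toy illustration of the concern, a sketch like the following could scan a freshly attached volume for leftover printable data (the device path is a hypothetical example; only run something like this on storage you own):

```python
import re

DEVICE = "/dev/xvdb"  # hypothetical volume attached to a fresh cloud instance
CHUNK = 1 << 20       # scan 1 MiB at a time

with open(DEVICE, "rb") as disk:
    offset = 0
    while True:
        chunk = disk.read(CHUNK)
        if not chunk:
            break
        # Runs of 16+ printable bytes are a crude sign of un-wiped data.
        for match in re.finditer(rb"[ -~]{16,}", chunk):
            print(offset + match.start(), match.group().decode("ascii"))
        offset += len(chunk)
```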

I think dynamic scaling and optimization is very interesting: in Hadoop's case, a lot of applications chain multiple MapReduce jobs together into an end-to-end workflow, yet some stages in the pipeline may require far fewer resources than others, so scaling each stage separately would make the most of cloud elasticity. In addition, it would be very interesting to have a tool that can profile a computation and find the resources needed to minimize cost per unit of work (useful for cost-conscious users), or the resources needed to reach some threshold of marginal utility when adding more machines (useful for time-conscious users); a sketch of the latter follows below. Beyond that, maybe some advanced architecture can be devised that allows users to specify fine-grained IO and CPU usage, giving them more elastic options for getting the resources that best fit their workload. I am also curious whether there is already any work on using program analysis and compiler techniques to automatically parallelize a program written in conventional languages such as C, C++, or Python.
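Here is a minimal sketch of the marginal-utility sizing idea, assuming a simple Amdahl-style speedup model (the model and all numbers are hypothetical, not from any real profiler):

```python
def runtime_hours(machines, serial_fraction=0.05, base_hours=100.0):
    """Amdahl-style model: a serial_fraction of the job never parallelizes."""
    return base_hours * (serial_fraction + (1 - serial_fraction) / machines)

def pick_cluster_size(threshold_hours=0.05, max_machines=1000):
    """Smallest cluster where one more machine saves less than threshold_hours."""
    for n in range(1, max_machines):
        if runtime_hours(n) - runtime_hours(n + 1) < threshold_hours:
            return n
    return max_machines

# Cost-conscious users would raise the threshold (stop adding machines sooner);
# time-conscious users would lower it (keep adding machines longer).
print(pick_cluster_size())  # 44 under these assumptions
```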

The paper described the economics behind the cloud-computing shift that made it all possible and made numerous calculations of the benefits it will bring to companies. I think it would be very interesting to ask some economists what kind of market the cloud providers will eventually form: given its unique properties, such as the extremely high up-front overhead, the leverage from statistical multiplexing, and reputation sharing, I am sure there is some economic model behind this kind of market, and maybe it would give readers some insight into what might happen in the end.

Wednesday, August 31, 2011