INTRODUCTION

Web and other distributed software systems are increasingly being deployed to support key aspects of business, including sales, customer relationship management (CRM), and data processing. Examples include: online shopping, processing insurance claims, and processing financial trades. Many of these systems are deployed on the Web or on dedicated networks using Web technologies such as J2EE.

This paper presents a model-based view of scalability in Web and other distributed applications, including which models apply and where the required measurements can be obtained. We begin with a motivating example, followed by a review of scalability measures and models. A simple case study then illustrates how these models apply to Web applications.

MOTIVATION

Consider the data in Table 1 (published on the Web) for horizontal scalability of a sample application developed using commercial Web server software.

Table 1: Horizontal Scaling Data

  Number of Nodes    Transactions per Second
  1                  66.6
  2                  126.4
  3                  178.2
  4                  235.6

The report also contains the statement: “Multiple Web servers scale near-linearly (80–90%) at this level using [Product X].” While this statement is, at a casual glance, true, it is also potentially dangerously misleading. A straightforward extrapolation to higher numbers of processors would give the scalability shown by the dashed line in Figure 1.

[Figure 1: Extrapolated Scaling. Transactions per second versus number of nodes; the dashed line is the linear extrapolation, the solid curve the more likely actual scalability.]

This type of extrapolation is all too common in our experience. However, as we will see, the actual scalability of this system is more likely to follow the solid curve. The difference between these two predictions is significant. It could mean the difference between successful deployment of the application and financial disaster.

This example underscores the importance of obtaining a thorough understanding of the scalability characteristics of a system before committing resources to its expansion. One way to accomplish this is to determine, through analysis of measured data, whether the behavior of the system fits that of a known model of scalability. Once there is a degree of confidence in how well a model describes the system, extrapolations such as the one above become less risky.

SPEEDUP AND SCALEUP

Speedup is a measure of the reduction in time required to execute a fixed workload as a result of adding resources such as processors or disks. Speedup is the most common metric for evaluating parallel algorithms and architectures. Scaleup, on the other hand, is a measure of the increase in the amount of work that can be done in a fixed time as a result of adding resources. Scaleup is a more relevant metric for Web applications, where the principal concern is whether we can process more transactions or support more users as we add resources.

Although they might appear to be very different metrics, speedup and scaleup are really two sides of the same coin. Clearly, if we can execute a transaction more quickly, we can execute more transactions in a given amount of time. More formally, the speedup is given by:

    S(p) = \frac{T(1)}{T(p)}

where T(1) is the time required to perform the work with one processor and T(p) is the time required to perform the same amount of work with p processors.

Scaleup may be expressed as a ratio of the capacity with p processors to the capacity with one processor. This ratio is sometimes known as the scaling factor; it has also been called the relative capacity, C(p) [Gunther 2000]. If we use the maximum throughput as a measure of the capacity, we can express the scaling factor or relative capacity as:

    C(p) = \frac{X_{max}(p)}{X_{max}(1)}

where X_{max}(1) is the maximum capacity with one processor and X_{max}(p) is the maximum capacity with p processors. Since the maximum throughput of a system is equal to the inverse of the demand at the bottleneck resource [Jain 1990], we have:

    X_{max}(p) = \frac{1}{D_{max}(p)}

The demand at a resource is the total time required to execute the workload at that resource (i.e., visits × service time).
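To make the scaling-factor arithmetic concrete, here is a minimal sketch (Python; the variable names are ours, not from the report) that computes C(p) from the measured maximum throughputs in Table 1:

    # Relative capacity C(p) = Xmax(p) / Xmax(1), using the Table 1 data.
    nodes = [1, 2, 3, 4]
    xmax = [66.6, 126.4, 178.2, 235.6]  # transactions per second (Table 1)

    for p, x in zip(nodes, xmax):
        c = x / xmax[0]                 # relative capacity C(p)
        print(f"p={p}: C(p)={c:.2f} ({c / p:.0%} of linear)")

At four nodes this gives C(4) of about 3.54, roughly 88% of linear, which is consistent with the report’s “near-linear (80–90%)” claim; the danger lies in extrapolating that percentage to much larger numbers of nodes.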
Thus, if we approximate the behavior of the system by a single queue/server representing the bottleneck device, we can write:

    C(p) = \frac{X_{max}(p)}{X_{max}(1)} = \frac{D_{max}(1)}{D_{max}(p)}

While this approximation should be verified (see the case study below), it is a good one in most cases.

We will use C(p) as a measure of the scalability of the system. Once the function (model) describing C(p) has been determined, the throughput of the system with p processors can be obtained by multiplying X_{max}(1) by C(p).

CATEGORIES OF SCALABILITY

We will categorize the scalability of a system based on the behavior of C(p). This classification scheme is similar to that proposed by Alba for speedup in parallel evolutionary algorithms [Alba 2002]. The categories are (a small sketch illustrating the classification appears at the end of this section):

Linear: the relative capacity is equal to the number of processors, p, i.e., C(p) = p.

Sub-linear: the relative capacity with p processors is less than p, i.e., C(p) < p.

Super-linear: the relative capacity with p processors is greater than p, i.e., C(p) > p.

Linear Scalability

Linear scalability can occur if the degree of parallelism in the application is such that it can make full use of the additional resources provided by scaling. For example, if the application is a data acquisition system that receives data from multiple sources, processes it, and prepares it for additional downstream processing, it may be possible to run multiple streams in parallel to increase capacity. In order for this application to scale linearly, the streams must not interfere with each other (for example, via contention for database or other shared resources) or require a shared state. Either of these conditions will reduce the scalability below linear.

Sub-Linear Scalability

Sub-linear scalability occurs when the system is unable to make full use of the additional resources. This may be due to properties of the application software, for example if delays waiting for a software resource such as a database lock prevent the software from making use of additional processors. It may also be due to properties of the execution environment that reduce the processing power of additional processors, for example overhead for scheduling, contention among processors for shared resources such as the system bus, or communication among processors to maintain a global state. These factors cause the relative capacity to increase more slowly than linearly.

Super-Linear Scalability

At first glance, super-linear scalability would seem to be impossible, a violation of the laws of thermodynamics. After all, isn’t it impossible to get more out of a machine than you put in? If so, how can we more than double the throughput of a computer system by doubling the number of processors?

The fact is, however, that super-linear scalability is a real phenomenon. The easiest way to see how this comes about is to recognize that, when we add a processor to a system, we are sometimes adding more than just a CPU. We often also add additional memory, disks, network interconnects, and so on. This is especially true when expanding clusters. Thus, we are adding more than just processing power, and this is why we may realize more than linear scaleup. For example, if we also add memory when we add a processor, it may be possible to cache data in main memory and eliminate database queries to retrieve it. This will reduce the demand on the processors, resulting in a scaleup that is more than can be accounted for by the additional processing power alone.
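As a simple illustration of the three categories above, here is a sketch (Python; the tolerance is our own assumption, not part of the classification scheme) that labels a measured relative capacity:

    def categorize(p, c_p, tol=0.01):
        """Classify scalability at p processors by comparing C(p) to p."""
        if abs(c_p - p) <= tol * p:  # treat near-equality as linear
            return "linear"
        return "sub-linear" if c_p < p else "super-linear"

    print(categorize(4, 4.00))  # linear
    print(categorize(4, 3.54))  # sub-linear (the Table 1 application)
    print(categorize(4, 4.30))  # super-linear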
The next section discusses several models that exhibit these categories of behavior.

MODELS OF SCALABILITY

This section discusses four models of scalability:

1. Linear scalability
2. Amdahl’s Law
3. Super-Serial Model
4. Gustafson’s Law

The Super-Serial Model is an extension of Amdahl’s Law and has appeared in the context of on-line transaction processing (OLTP) systems [Gunther 2000], in Beowulf-style clusters of computers [Brown 2003], and others. The other three were developed in the context of speedup for parallel algorithms and architectures. Amdahl’s Law and the Super-Serial Model describe sub-linear scalability. Gustafson’s Law describes super-linear scalability. These models were developed in a different context but, as we illustrate, they apply to Web applications.

Other models of scalability, such as memory-bounded speedup [Sun and Ni 1993], are available but are beyond the scope of this paper.

Linear Scalability

With linear scalability, the relative capacity, C(p), is equal to the number of processors, p:

    C(p) = \frac{X_{max}(p)}{X_{max}(1)} = p    (1)

For a system that scales linearly, a graph of C(p) versus p is a straight line with a slope of one and a y-intercept of zero [C(0) = 0]. Figure 2 shows a graph of Equation 1.

[Figure 2: Linear Scalability, C(p) = p. Relative capacity versus number of processors.]

Note that our definition of C(p) does not allow for linear scalability with a slope that is not equal to one, i.e., C(p) = kp where k ≠ 1. If this were possible, C(1) could be different from one. But, since C(1) = X_{max}(1)/X_{max}(1), C(1) must be equal to one. This means that both sub-linear and super-linear scalability must, in fact, be described by non-linear functions. While measurements at small numbers of processors may appear to be linear, measurements at higher numbers of processors will reveal the non-linearity.

This also means that linear extrapolations of “near-linear” results, such as that in our opening example, can be misleading. Since the actual function is necessarily non-linear, these extrapolations will overestimate the scalability of the system if the slope of the extrapolated line is less than one and underestimate it if the slope is greater than one.

Amdahl’s Law

Amdahl’s Law [Amdahl 1967] states that the maximum speedup obtainable from an infinite number of processors is 1/σ, where σ is the fraction of the work that must be performed sequentially. If p is the number of processors, t_s is the time spent by a sequential processor on the sequential parts of the program, and t_p is the time spent by a sequential processor on the parts of the program that can be executed in parallel, we can write the speedup for Amdahl’s Law as:

    S(p) = \frac{t_s + t_p}{t_s + t_p / p} = \frac{1}{\sigma + (1 - \sigma) / p}

where σ is the fraction of the time spent on the sequential parts of the program and π is the fraction of time spent on the parts of the program that can be executed in parallel. Since π = 1 − σ:

    S(p) = \frac{p}{1 + \sigma (p - 1)}

Or, using the equivalence between speedup and scaleup:

    C(p) = \frac{p}{1 + \sigma (p - 1)}    (2)

If σ = 0 (i.e., no portion of the workload is executed sequentially), Amdahl’s Law predicts unlimited linear scaleup (i.e., C(p) = p). For non-zero values of σ, scaleup will be less than linear. Figure 3 shows a comparison of linear scalability with Amdahl’s Law scalability for σ = 0.02.

[Figure 3: Amdahl’s Law versus Linear Scalability. Relative capacity versus number of processors for linear scalability and Amdahl’s Law with σ = 0.02.]

The maximum speedup that can be obtained, even using an infinite number of processors, is:

    C(\infty) = \lim_{p \to \infty} \frac{p}{1 + \sigma (p - 1)} = \frac{1}{\sigma}

This means that, if the serial fraction of the workload is 0.02, the maximum speedup that can be achieved is 50, and it will take an infinite number of processors to achieve that! Amdahl’s argument was that, given this limitation, a fast single-processor machine is more cost-effective than a multiprocessor machine.
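The practical force of Equation 2 is easy to see numerically. A minimal sketch:

    def amdahl_c(p, sigma):
        """Amdahl's Law relative capacity (Equation 2): C(p) = p / (1 + sigma*(p - 1))."""
        return p / (1 + sigma * (p - 1))

    sigma = 0.02
    print(amdahl_c(2, sigma))     # ~1.96: a second processor at sigma = 0.02
    print(amdahl_c(1000, sigma))  # ~47.7: a thousand processors buy less than 50x
    print(1 / sigma)              # 50.0: the asymptotic limit on C(p)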
As Figure 3 shows, there are diminishing returns for adding more processors. The penalty increases as σ increases. For σ = 0.20, Amdahl’s Law predicts that adding a second processor will yield a relative capacity of 1.67. That is, the maximum throughput with two processors will be 1.67 times that with one processor. If, instead of adding a second processor, we replace the single processor with one twice as fast, the throughput will then be exactly twice that with the slower processor. This is because the faster processor reduces the time required for both the serial and parallel portions of the workload. Because of this, it is generally more cost-effective to use a faster single processor than to add processors to achieve increased throughput in cases where Amdahl’s Law applies.

Gunther [Gunther 2000] points out that Amdahl’s Law may be optimistic in cases where there is interprocessor communication, for example to maintain cache consistency among processors. In these cases, if, when an update is needed, a processor sequentially sends updates to each of the p − 1 other processors, and the time required to process and send a message is a fraction γ of the serial execution time, we have the super-serial capacity for p processors which, after some algebra, can be written:

    C(p) = \frac{p}{1 + \sigma [(p - 1) + \gamma p (p - 1)]}    (3)

where γ is the fraction of the serial work that is used for interprocessor communication. This result is identical to the Amdahl’s Law result with an extra term in the denominator for overhead due to interprocessor communication. Gunther has called Equation 3 the Super-Serial Model [Gunther 2000].

The term in Equation 3 that contains γ grows as the square of the number of processors. This means that, even if the overhead for interprocessor communication is small, as the number of processors increases, the communication overhead will eventually cause C(p) to reach a maximum and then decrease. Figure 4 shows a comparison of Equation 3 for σ = 0.02 and various values of γ with linear scalability and Amdahl’s Law.

[Figure 4: Super-Serial Model. Relative capacity versus number of processors for σ = 0.02 with γ = 0.001, 0.005, and 0.010, compared with linear scalability and Amdahl’s Law.]
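A sketch of Equation 3 in Python. Differentiating Equation 3 with respect to p shows that the capacity peaks near p* = sqrt((1 − σ)/(σγ)), a derived detail worth knowing when extrapolating:

    import math

    def super_serial_c(p, sigma, gamma):
        """Super-Serial Model (Equation 3)."""
        return p / (1 + sigma * ((p - 1) + gamma * p * (p - 1)))

    sigma, gamma = 0.02, 0.005
    # Setting dC/dp = 0 gives the peak at p* = sqrt((1 - sigma) / (sigma * gamma)).
    p_star = math.sqrt((1 - sigma) / (sigma * gamma))
    print(round(p_star))                      # ~99 processors
    print(super_serial_c(99, sigma, gamma))   # ~25.2, the maximum capacity
    print(super_serial_c(200, sigma, gamma))  # ~22.3, declining beyond the peak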
Gustafson’s Law

For certain applications, it was found that speedups greater than that predicted by Amdahl’s Law are possible. For example, some scientific applications were found to undergo a speedup of more than 1,000 on a 1,024-processor hypercube. Gustafson [Gustafson 1988] noted that Amdahl’s Law assumes that the parallel fraction of the application (π = 1 − σ) is constant, i.e., independent of the number of processors. Yet, in many cases, the amount of parallel work increases in response to the presence of additional computational resources while the amount of serial work remains constant. For example, with more computing power, matrix manipulations can be performed on larger matrices in the same amount of time. In these cases, σ (and, therefore, π) is actually a function of the number of processors.

If t_s and t_p are the times required to execute the serial and parallel portions of the workload on a parallel system with p processors, a sequential processor would require a time of t_s + (p × t_p) to perform the same work. Gustafson termed this scaled speedup, which is described by:

    S(p) = \frac{t_s + p \, t_p}{t_s + t_p} = p - \sigma(p) (p - 1)    (4)

where σ(p) is the serial fraction of the work performed on p processors. Equation 4 is known as Gustafson’s Law or the Gustafson-Barsis Law. Gustafson’s Law describes fixed-time (scaled) speedup while Amdahl’s Law describes fixed-size speedup.

As with Amdahl’s Law, Gustafson’s Law also applies to scalability. We cannot use the formulation in Equation 4 directly, however. Equation 4 describes the speedup as the ratio of the time required to execute the workload on a system with p processors to that required to execute the same amount of work on a single processor. This is not a ratio that is likely to be measured, however. We are more likely to have measurements of the maximum throughput at various numbers of processors. Thus, to use Gustafson’s Law for web application scalability, we need to express it in terms of C(p), the ratio of the maximum throughput with p processors to the maximum throughput with one processor.

The demand with one processor is t_s(1) + t_p(1) and the maximum throughput is therefore:

    X_{max}(1) = \frac{1}{t_s(1) + t_p(1)}

Gustafson’s Law assumes that the parallel portion of the workload increases with the number of processors. Thus, the total demand with p processors is t_s(1) + (t_p(1) × p). However, this demand is spread over p processors, so the average demand per processor is:

    \frac{t_s(1) + p \, t_p(1)}{p}

Note that the average demand per processor is a decreasing function of p. This is because only one processor can execute the serial portion of the workload. If the degree of parallelism is such that the application is able to make use of this additional capacity, each processor beyond one will be able to execute more parallel work than t_p(1), resulting in super-linear scaling. This can occur in Web applications when loading a page results in multiple concurrent requests from the client to retrieve information, such as gifs and other page elements. Additional processors enable those requests to execute in parallel.

Under these conditions, the average maximum throughput per processor is:

    \frac{p}{t_s(1) + p \, t_p(1)}

and the maximum throughput for p processors is:

    X_{max}(p) = \frac{p^2}{t_s(1) + p \, t_p(1)}

Using these results, we can write the relative capacity for Gustafson’s Law as:

    C(p) = \frac{p^2}{p - \sigma(1) (p - 1)}    (5)

where σ(1) is the serial fraction of the work with one processor.

As the value of σ(1) approaches zero, this function approaches linear scalability. In fact, for small values of σ(1), Equation 5 is difficult to distinguish from linear scalability. With non-zero values of σ(1), however, the second term in the denominator is negative for values of p greater than one. This means that C(p) will increase faster than p, giving super-linear scalability. Figure 5 shows a graph of C(p) versus p for Gustafson’s Law with two values of σ(1). As the figure shows, at small values of σ(1) (e.g., 0.01) Gustafson’s Law is difficult to distinguish from linear scalability. At higher values of σ(1), however, the curve definitely shows its non-linearity.

[Figure 5: Gustafson’s Law. Relative capacity versus number of processors for σ(1) = 0.01 and σ(1) = 0.20.]
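A sketch of Equation 5 showing both regimes plotted in Figure 5: small σ(1) is nearly indistinguishable from linear, while larger σ(1) is visibly super-linear:

    def gustafson_c(p, sigma1):
        """Gustafson's Law relative capacity (Equation 5): p**2 / (p - sigma1*(p - 1))."""
        return p ** 2 / (p - sigma1 * (p - 1))

    for sigma1 in (0.01, 0.20):  # the two values shown in Figure 5
        print(sigma1, [round(gustafson_c(p, sigma1), 2) for p in (1, 2, 4, 8, 16)])
    # sigma1 = 0.01 -> 1.0, 2.01, 4.03, 8.07, 16.15 (nearly linear)
    # sigma1 = 0.20 -> 1.0, 2.22, 4.71, 9.7, 19.69 (clearly super-linear)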
The next section illustrates the applicability of these models with a case study.

CASE STUDY

This case study is based on the Avitek Medical Records (MedRec) sample application developed by BEA Systems, Inc. [BEA 2003a]. This application is an educational tool that is intended to illustrate best practices for designing and implementing J2EE applications. It has also been used to demonstrate the scalability of BEA’s WebLogic Server [BEA 2003b]. (WebLogic Server is a trademark of BEA Systems, Inc.)

MedRec is a three-tier application that consists of the following layers [BEA 2003a]:

Presentation layer: The presentation layer is responsible for all user interaction with MedRec. It accepts user input for forwarding to the application layer and displays application data to the user.

Application layer: The application layer encapsulates MedRec’s business logic. It receives user requests from the presentation layer or from external clients via Web Services and may interact with the database layer in response to those requests.

Database layer: The database layer stores and retrieves patient data.

Users interact with MedRec via a browser using HTTP requests and responses. External clients may also use MedRec as a Web Service. The presentation and application layers reside on one or more application servers. The database layer resides on a separate database server. Figure 6 shows a schematic view of the MedRec application.

[Figure 6: MedRec Application Configuration. Client, application server(s), and database server.]

This case study is based on measurements of this application published by BEA Systems, Inc. [BEA 2003b]. These measurements were made to demonstrate the scalability of BEA’s WebLogic Server. Because of this, no specific performance requirements were stated.

Table 2 shows the demand for each of the resources in the system. The application server CPU has the highest demand and is therefore the bottleneck resource.

Table 2: Resource Demand

  Resource                  Demand (sec)
  Application Server CPU    0.0227
  Database Server CPU       0.000168
  Database Server Disk      0.00138

We begin by examining the validity of the single-queue approximation for describing the behavior of the system. The following sections then explore the vertical and horizontal scaling characteristics of this application.

Single Queue Approximation

This analysis is based on measured data for an application server with a single 750 MHz processor. The database server was an 8-processor (750 MHz) machine with RAID disks. The measurements appear in Table 3. Note that the high number of transactions per second for a single client is due to using a think time of zero for these measurements. We recommend a more realistic think time for your measurement studies. There are a few other aspects of the measurements that we question, but we are unable to confirm their validity because we did not conduct these measurement studies.

Table 3: Measured Throughput versus Number of Clients

  Number of Clients    Transactions per Second
  1                    41.39
  4                    43.96
  10                   42.78
  20                   43.54
  40                   42.74
  80                   43.01
  100                  43.05

  Max TPS: 43.96

From the maximum throughput and the throughput with one client, we can calculate the demand at the bottleneck resource, D_{max}, and the total demand, D [Jain 1990]:

    D_{max} = \frac{1}{X_{max}} = \frac{1}{43.96} = 0.0227 \ \mathrm{sec}

    D = \frac{1}{X(1)} = \frac{1}{41.39} = 0.0242 \ \mathrm{sec}

Note that the total demand with one user is the sum of the demands in Table 2, 0.0242 sec. Also note that the bottleneck demand is 93.8% of the total demand. With such a large percent of the overall demand attributable to the bottleneck resource, the single-queue approximation is a good one.
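The demand arithmetic is easy to reproduce. A sketch using the Table 3 numbers (rounding the demands to four decimal places, as in the values quoted above, yields the 93.8% figure):

    x_one_client = 41.39  # TPS with one client (Table 3)
    x_max = 43.96         # maximum TPS (Table 3)

    d_total = round(1 / x_one_client, 4)  # total demand D = 0.0242 sec
    d_max = round(1 / x_max, 4)           # bottleneck demand Dmax = 0.0227 sec
    print(f"bottleneck share = {d_max / d_total:.1%}")  # 93.8%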
Using the results for D_{max} and D, we can construct a simple system model using a single queue/server that represents the bottleneck resource, in this case the application server CPU. Figure 7 shows a portion of the measured data for the MedRec application on the single-processor server along with the modeled curve for a QNM solution using a single queue/server representing the bottleneck resource. The model was constructed and solved using SPE•ED. (SPE•ED is a trademark of Performance Engineering Services Division, L&S Computer Technology, Inc.) As the figure indicates, the single queue approximation is a good one for this system (because the Application Server CPU dominates the demand).

[Figure 7: Measured versus Modeled Data. Throughput (TPS) versus number of clients (up to 20) for the measured and modeled systems.]

Vertical Scalability

To demonstrate the vertical scaling characteristics of this application, throughput versus number of simulated clients was measured for 1-, 2-, 4-, and 8-processor application server configurations. The application server is a 750 MHz platform capable of holding up to 24 processors. Utilizations for the application and database server CPUs, as well as the database server disk, were also measured. Table 4 shows the results of these measurements.

Table 4: Measured Throughput for Vertical Scaling

  Number of Processors         1        2        4         8
  Max TPS                      43.96    78.74    142.51    216.35
  Number of Clients at Max     4        40       20        40
  Appserver CPU Utilization    98%      97%      95%       91%
  DB Server CPU Utilization    1.30%    2.25%    3.91%
  DB Server Disk Utilization   6%       12%      19%       28.24%

Note that the number of clients at Max TPS decreases at 4 processors. This is one of the questionable measurements that we found. You can see in Table 3 that the maximum transactions per second fluctuates between 4 and 80 clients, but remains close to 43 transactions per second.

Amdahl’s Law

Regression analysis determines that Amdahl’s Law provides a good fit to this data with a σ of 0.0881 (R² = 0.992). This value of σ indicates that 8.8% of the workload must be performed sequentially. The maximum number of processors that can be installed in this server is 24. Thus the maximum value of C(p) that can be obtained is:

    C(24) = \frac{24}{1 + 0.0881 \times 23} = 7.93

The maximum throughput with one processor, X_{max}(1), is 43.96 transactions per second, so the maximum throughput with 24 processors would be:

    X_{max}(24) = 7.93 \times 43.96 = 348 \ \mathrm{tps}

The limit on C(p) for Amdahl’s Law is 1/σ. Thus, even with an infinite number of processors, the maximum value of C(p) that could be obtained is 11.4, for a maximum throughput of 500 transactions per second.

Super-Serial Model

Regression analysis determines that the Super-Serial Model also provides a good fit to this data. This is not surprising since the Super-Serial Model is an extension of Amdahl’s Law with an extra term for interprocessor communication. The values of the super-serial parameters obtained from a regression analysis are: σ = 0.0787 and γ = 0.0164 (R² = 0.993). This indicates that, according to the Super-Serial Model, 7.9% of the work must be performed sequentially. Approximately 1.6% of that sequential work is used for interprocessor communication.

The value of C(p) predicted by the Super-Serial Model at 24 processors is:

    C(24) = \frac{24}{1 + 0.0787 [23 + (0.0164)(24)(23)]} = 6.82

which gives a maximum throughput of 299 transactions per second.
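Such fits can be reproduced approximately. One simple possibility, sketched below in Python (the linearization is our illustration, not necessarily the regression procedure used for the results above), is to rewrite Amdahl’s Law as p/C(p) − 1 = σ(p − 1) and fit σ by least squares through the origin; with the Table 4 data this yields σ of about 0.0881:

    p = [1, 2, 4, 8]
    tps = [43.96, 78.74, 142.51, 216.35]  # Max TPS (Table 4)
    c = [x / tps[0] for x in tps]         # measured relative capacity C(p)

    # Linearized Amdahl fit: p/C(p) - 1 = sigma * (p - 1).
    xs = [pi - 1 for pi in p]
    ys = [pi / ci - 1 for pi, ci in zip(p, c)]
    sigma = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    print(f"sigma = {sigma:.4f}")  # 0.0881
    c24 = 24 / (1 + sigma * 23)
    print(f"C(24) = {c24:.2f}")    # ~7.93, i.e., Xmax(24) ~ 7.93 * 43.96 ~ 348 tps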
Overall Evaluation

Figure 8 summarizes the measured data as well as the maximum throughput versus number of processors predicted by both Amdahl’s Law and the Super-Serial Model.

[Figure 8: Summary of Modeled and Measured Data. Maximum throughput versus number of processors (up to 25) for the measured data, Amdahl’s Law, and the Super-Serial Model.]

As Figure 8 shows, there is little difference between Amdahl’s Law and the Super-Serial Model in the region covered by the measured data, and both models provide a reasonable fit. The values of R-squared from the regression analysis indicate a slightly better fit for the Super-Serial Model (R² = 0.993) than Amdahl’s Law (R² = 0.992). The difference is small, however, and may be due to the extra degree of freedom introduced by the additional parameter in the Super-Serial Model rather than an improved fit.

Without additional information, it is not possible to say which model fits the data better. Measurements at higher numbers of processors would help resolve the ambiguity. Knowledge of the software architecture could also help select the most appropriate model. For example, if we know that there is a shared state that must be maintained among the processors, the Super-Serial Model would be the most likely choice.

In view of this, we consider the Amdahl’s Law result of 348 transactions per second to be an upper limit on the capacity that can be obtained by scaling this system vertically and the Super-Serial result of 299 transactions per second to be a lower bound.

Horizontal Scalability

In this section, we explore the horizontal scaling characteristics of the MedRec application. In this study, each node contains four 400 MHz processors. Rather than adding more processors to a node, additional nodes are added to scale the system. Measurements of throughput versus number of simulated clients were made for 1-, 2-, 3-, and 4-node configurations. The database and network configurations were the same as those used for the vertical scaling study. Table 5 shows the results of these measurements.

Table 5: Measured Throughput for Horizontal Scaling

  Number of Nodes              1        2        3        4
  Max TPS                      100.49   208.66   313.06   418.76
  Number of Clients at Max     40       40       40       100
  Appserver CPU Utilization    95.09%   95.58%   95.35%   95.22%
  DB Server CPU Utilization    1.70%    3.64%    6.21%    9.70%
  DB Server Disk Utilization   13.94%   27.79%   40.58%   53.25%

Regression analysis shows that both Gustafson’s Law and the linear model provide good fits to the measured data. Regression analyses based on Amdahl’s Law and the Super-Serial Model do not yield good fits to the experimental data. In addition, both analyses give negative values for the serial parameters. Thus, Amdahl’s Law and the Super-Serial Model are not appropriate for this data.

Gustafson’s Law

Regression analysis indicates that Gustafson’s Law provides an excellent fit to the horizontal scaling measurements with σ(1) = 0.0477 (R² = 0.9999). This indicates that approximately 4.8% of the work with one processor is performed sequentially. Since this amount of work is constant, as more processors are added, this fraction decreases.

Linear Scalability

The linear model also provided an excellent fit to the horizontal scaling measurements (R² = 0.9997). The slope of the regression line is 1.039. Note that, as discussed earlier, for linear scalability the slope must be one. Thus, a slope of one was used to calculate the linear predictions in Table 6.

Table 6 shows the measured values of maximum throughput along with the values predicted by Gustafson’s Law and the linear model. Figure 9 shows a graphical comparison of the measured and modeled throughput for up to ten nodes.

Table 6: Measured versus Modeled Throughput

  Number of Nodes    Measured    Gustafson’s Law    Linear
  1                  100.49      100.49             100.49
  2                  208.66      205.89             200.98
  3                  313.06      311.36             301.47
  4                  418.76      416.86             401.96

[Figure 9: Measured versus Modeled Data. Maximum throughput versus number of nodes (up to ten) for the measured data, Gustafson’s Law, and the linear model.]

As Table 6 and Figure 9 indicate, the Gustafson’s Law estimates more nearly match the measured values. However, an additional measurement at a higher number of nodes would help distinguish between these two models.
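With the fitted σ(1) = 0.0477, Equation 5 reproduces the Table 6 predictions (to within rounding) and the ten-node extrapolation discussed below. A sketch:

    sigma1 = 0.0477  # fitted serial fraction for horizontal scaling
    x1 = 100.49      # one-node Max TPS (Table 5)

    def gustafson_c(p):
        """Gustafson's Law (Equation 5)."""
        return p ** 2 / (p - sigma1 * (p - 1))

    for p in (1, 2, 3, 4, 10):
        print(p, round(x1 * gustafson_c(p), 2), round(x1 * p, 2))
    # At p = 10: ~1,050 TPS (Gustafson) versus ~1,005 TPS (linear, slope one).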
Secondary Bottlenecks

Both models predict that we can expect considerably more throughput from the horizontal scaling strategy than from the vertical. However, it is still important to be careful when extrapolating. For example, if our performance requirement is 1,000 transactions per second, Gustafson’s Law predicts that using ten nodes will give a maximum throughput of 1,050 transactions per second and the linear model predicts a maximum throughput of 1,005 transactions per second.

These projections assume that no other bottleneck resource will limit our ability to achieve the required throughput with ten nodes, however. Looking at Table 5, we see that removing the application server CPU bottleneck leaves the database server disk as the resource with the highest utilization. The demand at this resource is 0.00132 seconds, which agrees with the value of 0.00138 obtained in the vertical scalability measurements to within experimental error. This corresponds to a maximum throughput of 757 transactions per second. Therefore, to achieve the performance requirement of 1,000 transactions per second, it will be necessary to either upgrade to faster disks or add a second disk to the database server.

Of course, adding only ten nodes will mean that we are operating at near one-hundred percent utilization on the application server CPU. This will result in unacceptably long response times, so additional nodes should be added to reduce the CPU utilization to a more reasonable number.
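The disk figures above follow from the Service Demand Law, D = U/X [Jain 1990]. A sketch using the Table 5 disk utilizations (averaging the per-configuration demands is our choice; any single column gives a similar value):

    tps = [100.49, 208.66, 313.06, 418.76]        # Max TPS (Table 5)
    disk_util = [0.1394, 0.2779, 0.4058, 0.5325]  # DB disk utilization (Table 5)

    demands = [u / x for u, x in zip(disk_util, tps)]  # Service Demand Law: D = U / X
    d_disk = sum(demands) / len(demands)
    print(f"disk demand = {d_disk:.5f} s")                 # ~0.00132 sec
    print(f"disk-limited ceiling = {1 / d_disk:.0f} tps")  # ~757 tps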
Case Study Discussion

The results of the analysis indicate a significant difference in the behavior of the system for vertical versus horizontal scaling. Since the application software is the same in both cases, the difference must be due to the platforms. The most likely explanation for this difference is the presence of the MP Effect when scaling vertically. The MP Effect is a loss of computing capacity that occurs when adding processors to a single platform. This loss of capacity is due to additional overhead and/or contention between processors for shared system resources (e.g., the system bus) [Artis 1991], [Gunther 1996]. Gunther has shown that Amdahl’s Law and the Super-Serial Model apply to SMP scaling, with super-serial effects becoming more significant as the amount of interprocessor communication increases [Gunther 1993].

It is tempting to generalize this result and conclude that horizontal scaling is superior to vertical scaling in all cases. However, the scalability of a system depends on the characteristics of both the application and the execution environment. In this case, it appears that the software was structured into units that could execute independently in parallel. If a larger percent of the workload were serial, both the vertical and horizontal scalability would have followed Amdahl’s Law or the Super-Serial Model. In addition, the good horizontal scalability indicates that back-end database interactions were effectively parallelized, so that there was little or no serial contention there either. This will not be true of all applications. In particular, for distributed applications that must maintain a common state (e.g., databases), there will be overhead to maintain coherence. In these cases, interprocessor communication will be a significant factor. For such applications, the enhanced communication efficiency of a bus versus a network may favor vertical scaling on a single platform. Each application/platform combination should be evaluated individually.

This case study also emphasizes the importance of thoroughly understanding the system before committing to a scaling strategy. Secondary bottlenecks (such as the database disk here) are a fact of life. It is important to know what they are, how they impact the scalability of the system, and how costly they are to remove before undertaking expensive system upgrades.

Finally, it should be noted that the data published by BEA Systems, Inc. provides a good example of the type of information that is useful for understanding and evaluating scalability.

ECONOMICS OF SCALABILITY

Scalability is an economic as well as a technical issue. In many cases, there are alternative scaling strategies that will meet performance requirements. The choice among them should then be based on cost. The importance of cost in meeting performance requirements is implicit in the inclusion of cost figures in many benchmark reports (see, e.g., [TPC]).

These costs are rarely simple, one-time costs. Adding more hardware means that there will be costs for purchase or lease, software licenses, maintenance contracts, additional system administration, facilities, and so on. The timing of these expenditures is likely to be highly dependent on the scaling strategy. For example, one strategy may require frequent, small upgrades while another may require fewer, more expensive ones. In order to make an unbiased comparison of the alternatives, it may be necessary to convert the costs to current dollars. Techniques for this are discussed in many places, including [Reifer 2002] and [Williams and Smith 2003].

It is important to consider all costs associated with the scaling strategy. In some cases, hardware expenditures may be dwarfed by costs for support software and middleware [Mohr 2003].

Costs of upgrading are also subject to discontinuities as the amount of hardware is increased. For example, at some point adding a server may require hiring an additional system administrator or expanding the facility to accommodate the additional footprint.

Don’t forget that scalability isn’t just a hardware issue. It is often easy to increase scalability with a different (initial) software architecture [Williams and Smith 2003].
In this case study, a design alternative that reduces the CPU consumption of the application could also improve scalability. The modification would produce a new version of the application with new scalability characteristics. Note that the most cost-effective choice for meeting a given performance requirement might not be the one with the highest overall scalability.

SUMMARY AND CONCLUSIONS

Scalability is one of the most important quality attributes of today’s distributed software systems. Yet, despite its importance, scalability in these applications is poorly understood. This paper has presented a model-based view of scalability in Web and other distributed applications that is aimed at removing some of the myth and mystery surrounding this important software quality.

We use the relative capacity or scalability factor, C(p) = X_{max}(p)/X_{max}(1), as a measure of the scalability of a system. Scalability is classified according to the behavior of C(p) as:

Linear: the relative capacity is equal to the number of processors, p, i.e., C(p) = p.

Sub-linear: the relative capacity is less than p, i.e., C(p) < p.

Super-linear: the relative capacity is greater than p, i.e., C(p) > p.

A key consequence of using C(p) as the metric for scalability is that, for linear scalability, the slope of the line must be equal to one. This means that both sub-linear and super-linear scalability must, in fact, be described by non-linear functions. While measurements at small numbers of processors may appear to be linear, measurements at higher numbers of processors will reveal the non-linearity.

This also means that linear extrapolations of “near-linear” results, such as that in our opening example, can be misleading. Since the actual function is necessarily non-linear, these extrapolations will overestimate the scalability of the system if the slope of the extrapolated line is less than one and underestimate it if the slope is greater than one.

This paper has reviewed four models of scalability that are applicable to Web and other distributed applications. These are summarized in Table 7.

Table 7: Scalability Models

  Model                 Relative Capacity C(p)
  Linear                C(p) = p
  Amdahl’s Law          C(p) = p / (1 + σ(p - 1))
  Super-Serial Model    C(p) = p / (1 + σ[(p - 1) + γp(p - 1)])
  Gustafson’s Law       C(p) = p² / (p - σ(1)(p - 1))

We have also demonstrated the applicability of these models to Web applications via a case study of a simple Web application. Analysis of measured data for the case study system indicates that its vertical scalability is best described by either Amdahl’s Law or the Super-Serial Model, while its horizontal scalability is best described by either Gustafson’s Law or linear scalability.

The case study also demonstrates that scalability is a system property. In this case, the same application exhibits different scalability properties when scaling vertically or horizontally.

As this paper demonstrates, these models are applicable to Web and other distributed systems. However, due to the complexity of such systems, it is likely that some systems will not conform to any of these models. It is therefore important to determine, via analysis of measured data, that a given system follows a known model before making decisions based on predicted scalability.

Scalability is also affected by software resource constraints such as a One Lane Bridge performance antipattern [Smith and Williams 2002]. This paper addressed only hardware bottlenecks. This paper also considered only a single dominant workload. These techniques can be adapted to cover these other aspects of the problem.

REFERENCES

[Alba 2002] E. Alba, “Parallel Evolutionary Algorithms Can Achieve Super-Linear Performance,” Information Processing Letters, vol. 82, pp. 7-13, 2002.
[Amdahl 1967] G. M. Amdahl, “Validity of the Single-Processor Approach to Achieving Large Scale Computing Capabilities,” Proceedings of AFIPS, Atlantic City, NJ, AFIPS Press, April 1967, pp. 483-485.

[Artis 1991] H. P. Artis, “Quantifying MultiProcessor Overheads,” Proceedings of CMG ’91, December 1991, pp. 363-365.

[BEA 2003a] BEA Systems, Inc., “Avitek Medical Records 1.0 Architecture Guide,” http://edocs.bea.com/wls/docs81/medrec_arch/.

[BEA 2003b] BEA Systems, Inc., “BEA WebLogic Server: Capacity Planning,” http://e-docs.bea.com/wls/docs81/capplan/.

[Brown 2003] R. G. Brown, “Engineering a Beowulf-style Compute Cluster,” Duke University, 2003, www.phy.duke.edu/resources/computing/brahma/beowulf_book/.

[Gunther 2000] N. J. Gunther, The Practical Performance Analyst, iUniverse.com, 2000.

[Gunther 1996] N. J. Gunther, “Understanding the MP Effect: Multiprocessing in Pictures,” Proceedings of CMG ’96, December 1996.

[Gunther 1993] N. J. Gunther, “A Simple Capacity Model for Massively Parallel Transaction Systems,” Proceedings of CMG ’93, San Diego, December 1993, pp. 1035-1044.

[Gustafson 1988] J. L. Gustafson, “Reevaluating Amdahl’s Law,” Communications of the ACM, vol. 31, no. 5, pp. 532-533, 1988.

[Jain 1990] R. Jain, The Art of Computer Systems Performance Analysis: Techniques for Experimental Design, Measurement, Simulation, and Modeling, New York, NY, John Wiley, 1990.

[Mohr 2003] J. Mohr, “SPE on IRS Business Systems Modernization,” Panel: “The Economics of SPE,” CMG, Dallas, 2003.

[Reifer 2002] D. J. Reifer, Making the Software Business Case: Improvement by the Numbers, Boston, Addison-Wesley, 2002.

[Smith and Williams 2002] C. U. Smith and L. G. Williams, Performance Solutions: A Practical Guide to Creating Responsive, Scalable Software, Boston, MA, Addison-Wesley, 2002.

[Sun and Ni 1993] X.-H. Sun and L. M. Ni, “Scalable Problems and Memory-Bounded Speedup,” Journal of Parallel and Distributed Computing, vol. 19, no. 1, pp. 27-37, 1993.

[TPC] Transaction Processing Council, www.tpc.org.

[Williams and Smith 2003] L. G. Williams and C. U. Smith, “Making the Business Case for Software Performance Engineering,” Proceedings of CMG, Dallas, December 2003.

[Williams and Smith 2002] L. G. Williams and C. U. Smith, “PASA: An Architectural Approach to Fixing Software Performance Problems,” Proceedings of CMG, Reno, December 2002.