/
WHERE catvar = value;    OUTPUT;/* OPTIONAL */    RETURN;/* OPTIONAL * WHERE catvar = value;    OUTPUT;/* OPTIONAL */    RETURN;/* OPTIONAL *

WHERE catvar = value; OUTPUT;/* OPTIONAL */ RETURN;/* OPTIONAL * - PDF document

celsa-spraggs
celsa-spraggs . @celsa-spraggs
Follow
435 views
Uploaded On 2016-08-24

WHERE catvar = value; OUTPUT;/* OPTIONAL */ RETURN;/* OPTIONAL * - PPT Presentation

Comparison to Subsetting IFobtained The values of FIRST and LAST may beobservations before the merge while SubIF selectsAdvantages of Subsetting IFrecord lengths In any case SubIF should appear ID: 455625

Comparison Subsetting IFobtained. The

Share:

Link:

Embed:

Download Presentation from below link

Download Pdf The PPT/PDF document "WHERE catvar = value; OUTPUT;/* OPTIO..." is the property of its rightful owner. Permission is granted to download and print the materials on this web site for personal, non-commercial use only, and to display it on your personal computer provided you do not modify the materials and that you retain all copyright notices contained in the materials. By downloading content from our website, you accept the terms of this agreement.


Presentation Transcript

WHERE catvar = value; OUTPUT;/* OPTIONAL */ RETURN;/* OPTIONAL */RUN;/* OPTIONAL */PROC FREQ DATA=libref.indata;"Asprin" and "Advil" would be selected, but so wouldnot select "Zithromax" because the CONTAINSAll missing values for the variable TREATMNT WHERE state LIKE '%C%' ;"Gastrointestinal" and "GastroIntestinal" would be� WHERE spdlimit 65 ; WHERE age BETWEEN 18 AND 34 ; Comparison to Subsetting IFobtained. The values of FIRST. and LAST. may beobservations before the merge while SubIF selectsAdvantages of Subsetting IFrecord lengths). In any case, SubIF should appear MERGE �libref.indata(WHERE=(age 35)) BEYOND BASICSACCESS View DescriptorsThere are special considerations for selectingrecords using WHERE if you use ACCESS viewdescriptors. In general, limitations exist because theWHERE clause will be passed to the DBMS forprocessing. Also note that WHERE is more efficientthan SubIF because using SubIF returns all rowsinstead of just the matching subset.For more efficient WHERE clauses for viewdescriptors, you should avoid:· �using = and · including numeric conversions· padding character strings with blanks· using % or _ in LIKE comparisons· putting arithmetic expressions on right side of· searching for NULL or NOT NULL.There are a variety of issues specific to which DBMSis involved. For instance, the reason searching forNULL can be inefficient for ORACLE is that theindex is not used in this situation. You should checkthe Technical Support notes before creatingproduction view descriptors.The Original WHEREStandard SQL WHERE clauses are implemented inPROC SQL in the CREATE, SELECT, DELETE, andUPDATE statements. In addition to allowing basicwhere-expressions, you can use much more com-plex constructions. For instance, you can matchrecords across tables just for the purpose ofselecting records. The resulting records can bestored in a data set or simply listed in a report.Obviously, a full discussion of SQL is beyond thescope of this paper.To create a simple subset with all variables, thesyntax would be:PROC SQL; CREATE TABLE subset ASNow consider an example given in a manual forPROC SQL:PROC SQL; SELECT distinct prodnamec.custnum=i.custnum AND INTERSECTe.empcity=ÕVirginia BeachÕ;The resulting records will contain the products thatwere sold in Myrtle Beach by employees who live inVirginia Beach. Using the selection criteria in aCREATE TABLE statement would allow the data tobe saved. How many steps would an equivalentDATA step program require?WHERE PROCESSINGUsing IndexesSelecting observations using WHERE or WHERE=can take advantage of indexes. SAS softwareautomatically uses an appropriate index by followinga standard algorithm that considers index availabilityand predicted machine resource requirements.Simple or complex indexes are considered given thevariables in the where-expression. Usually, a rele-vant index is used when the predicted number of theselected observations is less than one third. Whenan index applies, the resulting subset will be sortedin index order as opposed to physical order. If theNOMISS option has been used to create the index,then WHERE processing will use that index only ifmissing values do not satisfy the where-expression,e.g., DEPT='01'. In Release 6.12, you can optimizeWHERE expressions with IN or with both upper andlower bounds against an appropriate compositeEfficient StrategiesThe primary advantage of using WHERE strategiesis machine efficiency. Obviously, if you can avoid aDATA step completely then subset analyses aremore efficient both in terms of programming effortand machine resources. If appropriate indexes arecreated and maintained, WHERE processing willtake advantage of them automatically. Planningahead when creating analysis data sets is the bestway to make use of WHERE processing for optimalmachine efficiency. When DATA steps using WHERE or SubIF producethe same subset, the difference in execution logicmeans that using the WHERE strategy requires lessmachine resources. This can be true whether or notWHERE processing uses an index. The percentagedifference varies widely with the type of data setsand hardware platform but is generally in the 10% to30% range. Example 3 provides a specific compar-ison. The advantage of using WHERE processing isgreatest when a subset is a small percentage of theoriginal data set and/or the original data set is wide.Human efficiency becomes a factor when youconsider PROC SQL as an alternative. Situationsexist when SQL code is much easier to understandthat the equivalent manipulation with DATA stepcode. However, predicting which method is moremachine efficient is difficult. As always, testing withyour data in your computer environment is neces-sary to make informed choices.ProceduresUsing WHERE processing is straightforward forProcedures that allow only one input data set. Fourprocedures adjust to allow for multiple input datasets: APPEND, CALENDAR, COMPARE,DATASETS(APPEND). APPEND allows either aWHERE statement or separate WHERE=.COMPARE allows a WHERE statement that appliesto both BASE= and COMPARE= or a WHERE dataset option associated with BASE= only.End-user InterfacesWHERE functionality is also available in end-userinterfaces. With these interfaces, the WHERE clauseis constructed by pointing-and-clicking rather than bywriting code directly. Examples include the SASSystem Viewer and SAS/ASSIST.The SAS System Viewer was provided with Release6.12 for personal computers. A user can view datasets, programs, logs, and other types of SAS fileswithout having SAS software. In the viewer, you cansubset the data with a WHERE clause.SAS/ASSIST provides a lookup facility in theWHERE clause window, starting with Release 6.12.The WHERE clause window provides a point-and-click way to build the expressions. The lookup facilityprovides an added feature of listing the values forthe variable selected.EXAMPLESExample 1This example shows an automatic variable thatcannot be used with WHERE. The objective of theDATA step in Figure 1 is to merge two data sets andselect observations using IN=. Specifically, thepatient and investigator data sets are merged inorder to select only patients that match theinvestigator data set. The error message appearsbecause WHERE selection occurs before the IN=variables are defined. This situation requires SubIFExample 2The results shown in Figures 2b and 2c demonstratethe difference between WHERE and SubIF whenMERGE and BY are involved. When WHERE isused, all three sites are selected since the values ofNUMPATS are all ten before the merge is done.Since SubIF selection comes after the merge, onlySite AA is included in that subset. There are no errormessages in this situation, just the potential forerroneous results. MERGE patdata(IN= inpat)Invdata(IN= ininv) ; Example 3The table in Figure 3 shows differences in proces-sing time when comparing specific WHERE andSubIF strategies. The objective of each programwas to process a subset of an existing SAS data setin order to produce a simple frequency table. Thefive strategies were:1) DATA with Subsetting IF followed by PROC2) DATA using WHERE followed by PROC FREQ3) PROC SQL followed by PROC FREQ4) DATA using WHERE on index followed byPROC FREQ5) PROC FREQ using WHERENo data manipulation was done in the DATA stepbesides selecting observations. For the fourthmethod, a simple index of the variable used to selectrecords was created using PROC DATASETS.The programs were run on a Pentium II micro-computer using Release 6.12 under MicrosoftWindows95. The input data set had 36 variables,mostly numeric, and 27,567 observations. Thesubset contained 9189 observations (33%). Thetiming figures are averages of four repetitions.The exact figures in this example are not asimportant as the conclusion that using WHERE canreduce processing time in certain situations. UsingWHERE with a relevant index clearly can increase SITEINVNAMENUMPATSAAALBERT10ABBROWN10ACCORWIN10 SITEDATASITEDONESITEINVNAMENUMPATSSITEINVNAMENUMPATSAAALBERT10AAALBERT10ABBROWN10ABBROWN12ACCORWIN10ACCORWIN 8 SITEINVNAMENUMPATSAAALBERT10 StrategiesDATAPROCTotal1) DATA with Subsetting IF, PROC1.440.421.862) DATA with WHERE, PROC1.330.411.743) PROC SQL, PROC1.320.451.774) DATA with WHERE, index, PROC1.250.391.645) PROC using WHERE--0.440.44 machine efficiency. However, the best strategy is toplan ahead and create data sets so that you canavoid DATA steps altogether in analysis programs.This strategy assumes, of course, that machineefficiency is a priority over the human time thatadvanced planning may involve.ENHANCEMENTSVersion 7 AdditionsMost of the Version 7 enhancements to WHEREprocessing involve indexed data sets. To· enhanced processing for LIKE and NOT LIKE· WHERE expressions that contain EQ or IN cannow contain· &#x, 00;directional inequalities ()· NOT operators· truncated comparisons· IDXNAME= to use a specific index· IDXWHERE= to override automatic index use(YES or NO).Version 7 makes greater use of composite indexesfor compound optimization, which is the process ofoptimizing multiple WHERE conditions with a singlecomposite index. Version 6 includes the use ofcompound optimization for a limited number ofsituations, including where-expressions with· EQ comparison operators joined with ANDs· IN operators· Fully-bounded comparisons, such as55Version 7 expands this list to include many moreoperators and is correspondingly more complex. Thepaper by Beatrous and Clifford (1998) givesexamples of when a composite index will be used.Generally, it depends on the variables used in theWHERE, the position of those variables in the index,the comparison operators used in the WHERE, andthe logical operators used to join clauses (must beA new option, WHEREUP=, allows you to specifywhether or not to evaluate added or modifiedobservations against a WHERE clause.Frequently Requested EnhancementsWHERE is a valuable part of the SAS programmingtoolbox and is widely used. In the results from the1998 SASware Ballot, three WHERE-related itemsappeared in the top-100. These were:· Add a note indicating the number ofobservations that meet the WHERE condition· Provide the ability to display the WHERE clausein printed output (overall rank of 28)· In PROC FSEDIT, provide a selection list ofpreviously applied WHERE clauses (overall rankCONCLUSIONThe various forms of WHERE are tools worthexploring when subset creation or processing isnecessary. Because of the wide variety of areas inwhich WHERE can be found, you should use care inresearching related problems. While the onlineTechnical Support Notes are an excellent source ofinformation, they can also cause informationAs an example, searching on "WHERE" gave 761hits and searching on "WHERE Statement" gave393 hits in early 1999. Unless your goal is toexhaustively research WHERE, that is too many.Suppose your goal is to find situations whenWHERE gives incorrect results. Your WHEREclause is not giving what you expect and you want tofind out if someone else has found the sameproblem before calling Technical Support. Use"WHERE incorrect results" as a search and you get195 hits. After thinking further, you use "WHEREincorrect results ASSIST" as a search because youare working with the WHERE clause window inSAS/ASSIST. Now you get 16 hits, which is a muchmore reasonable number of issues to review.We recommend that anyone routinely working withsubsets make the effort to understand the executionlogic behind WHERE processing. Knowing whattypes of WHERE exist and how to construct where-expressions is the easy part. By consideringmachine and human efficiency, in the context ofprocessing files with or without indexes and SQLcode possibilities, you can better determine whenWHERE is the better programming choice in yourcomputer environment. Beatrous, S. and Clifford, W., (1998), ÒSometimes, Cary, NC: SAS Institute, Cary, NC: SAStuning: buffers, compression, index files.)Gilsen, B., (1998), ÒSAS Program Efficiency for, Cary, NC: SAS Institute, Cary, NC: SAS Institute Inc., 249Ð257. Cary, NC: SAS Cary, NC: SAS, Cary, NC: SASCary, NC: SAS Institute Inc. 197 pp. (SQL Cary, NC: SAS Institute Inc., 63Ð67. called I/O for Input/Output. Efficiency for humans is