Arthur Tabachneck Myqna Inc John King Ouachita Clinical Data Services Inc Ben Powell Genworth Financial Nate Derby Stakana Analytics Randy Herbison Westat Richard DeVenezia Independent Consultant ID: 587662
Download Presentation The PPT/PDF document "Copy and paste almost anything" is the property of its rightful owner. Permission is granted to download and print the materials on this web site for personal, non-commercial use only, and to display it on your personal computer provided you do not modify the materials and that you retain all copyright notices contained in the materials. By downloading content from our website, you accept the terms of this agreement.
Slide1
Copy and paste almost anything
Arthur TabachneckMyqna, Inc.John KingOuachita Clinical Data Services, Inc.Ben PowellGenworth Financial
Nate Derby
Stakana AnalyticsRandy HerbisonWestatRichard DeVeneziaIndependent Consultant
2012OrlandoFloridaApril 22-25, 2012Slide2
Suppose you found a table on the web that you needed to have in a SAS dataset
Table A. Total tax revenue as percentage of GDP
1975
1985
1990
1995
2000
2004200520062007ProvisionalKorea15.116.418.919.423.624.625.526.828.7New Zealand28.531.137.436.633.635.337.536.736.0Austria36.740.939.641.242.642.842.141.741.9Belgium39.544.442.043.644.944.844.844.5n.a.Czech Republic37.535.337.837.536.936.4Denmark138.446.146.548.849.449.050.749.148.9
Source: http://www.oecd.org/dataoecd/48/27/41498733.pdfSlide3
The table might be in the form of:
an html pagea pdf document
a spreadsheet
a word document
a page from a wiki
or any other form that you can copy to your system's clipboard
and, of course, it could come from any source .. not just the webSlide4
things that can complicate copying and pasting:
Table A. Total tax revenue as percentage of GDP
1975
1985
1990
1995
2000
2004200520062007ProvisionalKorea15.116.418.919.423.624.625.526.828.7New Zealand28.531.137.436.633.635.337.536.736.0Austria36.740.939.641.242.642.842.141.741.9Belgium39.544.442.043.644.944.844.844.5n.a.Czech Republic37.535.337.837.536.936.4Denmark138.446.146.548.849.449.050.749.148.9
some columns may not have variable names
Source: http://www.oecd.org/dataoecd/48/27/41498733.pdfSlide5
Table A. Total tax revenue as percentage of GDP
1975
1985
1990
1995
2000
2004
200520062007ProvisionalKorea15.116.418.919.423.624.625.526.828.7New Zealand28.531.137.436.633.635.337.536.736.0Austria36.740.939.641.242.642.842.141.741.9Belgium39.544.442.043.644.944.844.844.5n.a.Czech Republic37.535.337.837.536.936.4Denmark138.446.146.548.849.449.050.749.148.9variable names may take up more than one row
Row 3
Row 2
Row 1
Source: http://www.oecd.org/dataoecd/48/27/41498733.pdf
things that can complicate copying and pasting:Slide6
tables may contain one or more blank rows
things that can complicate copying and pasting:
Source: http://www.oecd.org/dataoecd/48/27/41498733.pdf
Table A. Total tax revenue as percentage of GDP
1975
1985
1990
199520002004200520062007ProvisionalKorea15.116.418.919.423.624.625.526.828.7New Zealand28.531.137.436.633.635.337.536.736.0Austria36.740.939.641.242.642.842.141.741.9Belgium39.544.442.043.644.944.844.844.5n.a.Czech Republic37.535.337.837.536.936.4Denmark138.446.146.548.849.449.0
50.7
49.1
48.9Slide7
Table A. Total tax revenue as percentage of GDP
1975
1985
1990
1995
2000
2004
200520062007ProvisionalKorea15.116.418.919.423.624.625.526.828.7New Zealand28.531.137.436.633.635.337.536.736.0Austria36.740.939.641.242.642.842.141.741.9Belgium39.544.442.043.644.944.844.844.5n.a.Czech Republic37.535.337.837.536.936.4Denmark138.446.146.548.849.449.050.749.148.9rows may have some missing values
Source: http://www.oecd.org/dataoecd/48/27/41498733.pdf
things that can complicate copying and pasting:Slide8
Table A. Total tax revenue as percentage of GDP
1975
1985
1990
1995
2000
2004
200520062007ProvisionalKorea15.116.418.919.423.624.625.526.828.7New Zealand28.531.137.436.633.635.337.536.736.0Austria36.740.939.641.242.642.842.141.741.9Belgium39.544.442.043.644.944.844.844.5n.a.Czech Republic37.535.337.837.536.936.4Denmark138.446.146.548.849.449.050.749.148.9data may contain sub or superscript values
Source: http://www.oecd.org/dataoecd/48/27/41498733.pdf
things that can complicate copying and pasting:Slide9
Table A. Total tax revenue as percentage of GDP
1975
1985
1990
1995
2000
2004
200520062007ProvisionalKorea15.116.418.919.423.624.625.526.828.7New Zealand28.531.137.436.633.635.337.536.736.0Austria36.740.939.641.242.642.842.141.741.9Belgium39.544.442.043.644.944.844.844.5n.a.Czech Republic37.535.337.837.536.936.4Denmark138.446.146.548.849.449.050.749.148.9
Source: http://www.oecd.org/dataoecd/48/27/41498733.pdf
you may want to name or rename some variables
plus, there might be things you want to change:
CountrySlide10
Table A. Total tax revenue as percentage of GDP
1975
1985
1990
1995
2000
2004
200520062007ProvisionalKorea15.116.418.919.423.624.625.526.828.7New Zealand28.531.137.436.633.635.337.536.736.0Austria36.740.939.641.242.642.842.141.741.9Belgium39.544.442.043.644.944.844.844.5n.a.Czech Republic37.535.337.837.536.936.4Denmark138.446.146.548.849.449.050.749.148.9Source: http://www.oecd.org/dataoecd/48/27/41498733.pdf
Revenue
1975
you may want to add a variable name prefix/suffix
Country
Revenue
2000
Revenue
1985
Revenue
1990
Revenue
1995
Revenue
2004
Revenue
2005
Revenue
2006
plus, there might be things you want to change:Slide11
Table A. Total tax revenue as percentage of GDP
1975
1985
1990
1995
2000
2004
200520062007ProvisionalKorea15.116.418.919.423.624.625.526.828.7New Zealand28.531.137.436.633.635.337.536.736.0Austria36.740.939.641.242.642.842.141.741.9Belgium39.544.442.043.644.944.844.844.5n.a.Czech Republic37.535.337.837.536.936.4Denmark138.446.146.548.849.449.050.749.148.9Source: http://www.oecd.org/dataoecd/48/27/41498733.pdf
you might want to add a variable label
2007
Guesstimate
plus, there might be things you want to change:Slide12
Table A. Total tax revenue as percentage of GDP
1975
1985
1990
1995
2000
2004
200520062007ProvisionalKorea15.116.418.919.423.624.625.526.828.7New Zealand28.531.137.436.633.635.337.536.736.0Austria36.740.939.641.242.642.842.141.741.9Belgium39.544.442.043.644.944.844.844.5n.a.Czech Republic37.535.337.837.536.936.4Denmark138.446.146.548.849.449.050.749.148.9Source: http://www.oecd.org/dataoecd/48/27/41498733.pdf
you might want to specify missing values
n.a.
plus, there might be things you want to change:Slide13
Table A. Total tax revenue as percentage of GDP
1975
1985
1990
1995
2000
2004
200520062007ProvisionalKorea15.116.418.919.423.624.625.526.828.7New Zealand28.531.137.436.633.635.337.536.736.0Austria36.740.939.641.242.642.842.141.741.9Belgium39.544.442.043.644.944.844.844.5n.a.Czech Republic37.535.337.837.536.936.4Denmark138.446.146.548.849.449.050.749.148.9Source: http://www.oecd.org/dataoecd/48/27/41498733.pdf
you may want to multiply a variable by a constant
.384
plus, there might be things you want to change:Slide14
Table A. Total tax revenue as percentage of GDP
1975
1985
1990
1995
2000
2004
200520062007ProvisionalKorea15.116.418.919.423.624.625.526.828.7New Zealand28.531.137.436.633.635.337.536.736.0Austria36.740.939.641.242.642.842.141.741.9Belgium39.544.442.043.644.944.844.844.5n.a.Czech Republic37.535.337.837.536.936.4Denmark138.446.146.548.849.449.050.749.148.9Source: http://www.oecd.org/dataoecd/48/27/41498733.pdf
37.5%
you may want to specify formats or informats
plus, there might be things you want to change:Slide15
Table A. Total tax revenue as percentage of GDP
1975
1985
1990
1995
2000
2004
200520062007ProvisionalKorea15.116.418.919.423.624.625.526.828.7New Zealand28.531.137.436.633.635.337.536.736.0Austria36.740.939.641.242.642.842.141.741.9Belgium39.544.442.043.644.944.844.844.5n.a.Czech Republic37.535.337.837.536.936.4Denmark138.446.146.548.849.449.050.749.148.9Source: http://www.oecd.org/dataoecd/48/27/41498733.pdf
you may want to specify which row(s) should be used to guess formats and informatsSlide16
Table A. Total tax revenue as percentage of GDP
1975
1985
1990
1995
2000
2004
200520062007ProvisionalKOREA15.116.418.919.423.624.625.526.828.7NEW ZEALAND28.531.137.436.633.635.337.536.736.0AUSTRIA36.740.939.641.242.642.842.141.741.9BELGIUM39.544.442.043.644.944.844.844.5n.a.CZECH REPUBLIC37.535.337.837.536.936.4DENMARK138.446.146.548.849.449.050.749.148.9Source: http://www.oecd.org/dataoecd/48/27/41498733.pdf
you may want
some data converted to upper case
plus, there might be things you want to change:Slide17
Table A. Total tax revenue as percentage of GDP
1975
1985
1990
1995
2000
2004
200520062007ProvisionalKorea15.116.418.919.423.624.625.526.828.7New Zealand28.531.137.436.633.635.337.536.736.0Austria36.740.939.641.242.642.842.141.741.9Belgium39.544.442.043.644.944.844.844.5n.a.Czech Republic37.535.337.837.536.936.4Denmark138.446.146.548.849.449.050.749.148.9Source: http://www.oecd.org/dataoecd/48/27/41498733.pdf
you may not want all of the data
Revenue
1975
Country
Revenue
2000
Revenue
1985
Revenue
1990
Revenue
1995
Revenue
2004
Revenue
2005
Revenue
2006
plus, there might be things you want to change:Slide18
Table A. Total tax revenue as percentage of GDP
1975
1985
1990
1995
2000
2004
200520062007ProvisionalKorea15.116.418.919.423.624.625.526.828.7New Zealand28.531.137.436.633.635.337.536.736.0Austria36.740.939.641.242.642.842.141.741.9Belgium39.544.442.043.644.944.844.844.5n.a.Czech Republic37.535.337.837.536.936.4Denmark138.446.146.548.849.449.050.749.148.9Source: http://www.oecd.org/dataoecd/48/27/41498733.pdf
you may
want to drop one or more columns
Country
plus, there might be things you want to change:Slide19
things that can complicate copying and pasting:
you might want merged cells to apply to more than one variable Slide20
so that you get a table that looks like:
things that can complicate copying and pasting:Slide21
things that can complicate copying and pasting:
the table might need to be transposed – Have:Slide22
things that can complicate copying and pasting:
Need: what you really want isSlide23
things that can complicate copying and pasting:
the table might copy in the following form – Have:ProjNum1
23
4ClaimNum419129PreAdv11(continued)Slide24
things that can complicate copying and pasting:
Need: what you really want isSlide25
Note: Ensure that use of these methods does not violate a site's terms of use
things that can complicate copying and pasting:
the data might not even be in tabular formSlide26
Note: Ensure that use of these methods does not violate a site's terms of use
things that can complicate copying and pasting:and may even be a bit more complexSlide27
Would you know how to get such data into SAS quickly, painlessly and totally accurately?
other than asking some very carefulperson to re-enter all of the data
"Excuse me. Is this the Society for Asking Stupid Questions?"Slide28
Of course you could use your mouse to highlight and copy everything to your system's clipboardSlide29
and, if only a simple extract was needed,
and SAS/Access to PC-Files was licensed,you might be able to paste it into an Excel file and then use proc importSlide30
or if the task requires some features
not currently offered with proc import?but what if one doesn't haveSAS/Access to PC-Files?Slide31
access the system clipboard
assign or rename variable names
parse a structured document
assign missing values
indicate which data rows to select
specify formats and/or informats
import multiple row variable names
proc import currently doesn’t provide a way to: transpose dataaccount for merged cells in variable namesadd a prefix or suffix to variable namesspecify variable labelschange a variable's unit of measurementupcase any variable's valuesspecify the rows to use to guess (in)formatsdrop one or more columnsSlide32
the code presented in this paper includes all of those options and capabilities
but, for tables that you can paste,Slide33
our Truth in Advertising commitment
may not work on all systems
WARNING: The code/method presented in this paper:
IS PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR NON-INFRINGEMENT. The authors shall not be liable whatsoever for any damages arising out of the use of this documentation or code, including any direct, indirect, or consequential damages. In addition, the authors will provide no support for the materials contained herein. is NOT production quality
are NOT
substitutes
for proc import
should NOT be used if such use violates any copyright or terms of agreementSlide34
copy and paste almost anything
This presentation, the code and a paper on the topic are all available on the past presentation’s page for September 16, 2011 at: www.torsas.caSlide35
copy and paste almost anythingSlide36
what you actually copy is extremely
software and system dependent
things that can complicate copying and pasting:Slide37
our proposed solution
first, declare needed macro variables%let spaces=" ";%let hrows=2;%let var_share=3~2 5~4;
%let var_renames=;%let var_prefix=;%let var_suffix=;%let var_labels=;%let first_data_row=3;
%let var_missing=10~n.a.;%let var_formats=2-5~best12.;%let var_informats=2-5~best12.;%let var_units=;%let var_drop=;%let var_upcase=;%let guessingrows=3-3;%let outfile=revenue;
account for multiple row variable names
clipbrd method lets yo
u
"paste" data from your system's clipboard, but translates tabs into spacesanddifferent systems translate tabs into different numbers of spacesaccount for rows between data and variable namesadd and recode variable namesspecify formats and informatsif needed, indicate amount data should be multiplied byspecify one variableor a range of variablesadd a prefix or suffix to any variable namespecify desired variable labelsaccount for merged cells in variable namesupcase any variablesspecify missing values for any variablesspecify which rows should be considered in guessing formats and informatsdrop any columns that you don’t wantSlide38
our proposed solution
first, declare needed macro variables%let spaces=" ";%let hrows=2;%let var_share=3~2 5~4;
%let var_renames=;%let var_prefix=;%let var_suffix=;%let var_labels=;%let first_data_row=3;
%let var_missing=10~n.a.;%let var_formats=2-5~best12.;%let var_informats=2-5~best12.;%let var_units=;%let var_upcase=;%let guessingrows=3-3;%let outfile=revenue;or any combination you might neede.g.,%let var_informats= 1~$20. 2-5~best12. 6~comma8. 7~percent8. 8~anydtdte21.
9~trailsgn8. ;Slide39
our proposed solution
first, declare needed macro variables%let spaces=" ";%let hrows=2;%let var_share=3~2 5~4;
%let var_renames=;%let var_prefix=;%let var_suffix=;%let var_labels=;%let first_data_row=3;
%let var_missing=10~n.a.;%let var_formats=2-5~best12.;%let var_informats=2-5~best12.;%let var_units=;%let var_upcase=;%let guessingrows=3-3;%let outfile=revenue;or any combination you might neede.g.,%let var_informats= 1~$20. 2-5~best12. 6~comma8. 7~percent8. 8~anydtdte21.
9~negative8. ;Note: there are additional macro variablesto address structured layouts and files that
need to be transposed
%let
transpose=YES;%let columns=5;%let rows=80;%look_for=view all editions and formats;%skip_lines=1;Slide40
our proposed solution
in the code, instructions are shown as comments. e.g.%let var_prefix=; *specify any string that you want appended before any variable name. A ~ must be used to separate variable number(s) and variable prefixes, and either a space or separate line to represent
additional entries. If you want the same prefix used for a range of variables, specify the range as #-#. E.g., if variables 2 thru 4 are named 1996, 1997 and 1998, and you want them to be named Revenue_1996, Revenue_1997 and Revenue_1998 you would specify:
%let var_prefix=2-4~Revenue_;Slide41
create macros for code that is used repeatedly
f %macro expandr (type,string);
i=1; hold_rec=""; do while (scan("&string.",i," ") ne ""); if scan(scan(scan("&string.",i," "),1,"~"),2,"-") ne "" then do;
start=scan(scan(scan("&string.",i," "),1,"~"),1,"-"); end=scan(scan(scan("&string.",i," "),1,"~"),2,"-"); end; else do; start=scan(scan("&string.",i," "),1,"~"); end=scan(scan("&string.",i," "),1,"~"); end; do j=start to end; hold_rec=catx(" ",hold_rec,cat(strip(j)||"~"|| strip(scan(scan("&string.",i," "),2,"~")))); end; i+1; end; call symput(&type.,strip(hold_rec));%mend expandr;Slide42
create macros for code that is used repeatedly
f %macro
expandr (type,string);
hold_formt=catx(" ",hold_formt,cat(strip(j)||"~"|| strip(scan(scan("&string.",i," "),2,"~")))); end; i+1; end; call symput(&type.,strip(hold_formt));%mend expandr;%macro filarray (type,string); if scan("&string.",i," ") ne "" then &type(scan(scan("&string.",i," "),1,"~"))= scan(scan("&string.",i," "),2,"~");%mend filarray;Slide43
create datastep to count number of variables
f %flipfile
data _null_; length hold_rec $32767; infile clippy; input;
_infile_=tranwrd(_infile_, &spaces., '09'x); var_count=countc(_infile_,,"H")+1; call symput('var_count',strip(put(var_count,8.))) %expandr("var_formats",&var_formats) %expandr("var_informats",&var_informats) %expandr(("var_missing",&var_missing.) %expandr("var_units",&var_units) %expandr("var_prefix",&var_prefix) %expandr("var_suffix",&var_suffix) %expandr("var_upcase",&var_upcase.)
%expandr("var_drop",&var_drop.) %expandr("var_labels",&var_labels.) %expandr("var_share",&var_share.) stop;run;Slide44
declare arrays and variables in main datastep
f d
rdata _null_; file revised lrecl=32767; infile clippy end=eof; array headers(%sysfunc(max(&hrows.,1))) $32767.; array varnames(&var_count.) $32.; array formats(&var_count.) $32.; array informats(&var_count.) $32.; array renames(&var_count.) $32.; array prefix(&var_count.) $32.; array suffix(&var_count.) $32.; array labels(&var_count.) $32.; array miss(&var_count.) $255.;
array upcases(&var_count.) $3.; array drops(&var_count.) $3.; array units(&var_count.) $32.; array share(&var_count.) $32.; array varlens(&var_count.); array vartypes(&var_count.); length hold_rec temp ivartype fvartype var_units
var_names
var_labels var_drop
$32767; length missval $255; retain headers renames varnames vartypes varlens formats informats units prefix suffix labels drops miss upcases share grows_start grows_end;Slide45
parse the header row(s) and macro variables
f d
rdata _null_; leinput;
_infile_=tranwrd(_infile_, &spaces., '09'x);if _n_ le &hrows. then headers(_n_)=tranwrd(tranwrd( tranwrd(_infile_, '%', 'percent'),'-','_to_'),'–','_to_');if _n_ eq &hrows. or (_n_ eq 1 and &hrows eq 0) then do; grows_start=scan("&guessingrows.",1,'-'); if missing(grows_start) then grows_start=&first_data_row.; grows_end=scan("&guessingrows.",2,'-'); if missing(grows_end) then grows_end=999999;
do i=1 to &var_count.;
%filarray(renames,&var_renames
.) %filarray(prefix,&var_prefix.) %filarray(suffix,&var_suffix.) %filarray(units,&var_units.) %filarray(formats,&var_formats.) %filarray(informats,&var_informats.) %filarray(upcases,&var_upcase.) %filarray(drops,&var_drop.) %filarray(labels,&var_labels.) %filarray(miss,&var_missing.) %filarray(share,&var_share.) if &hrows. eq 0 then varnames(i)=cat("Col"||strip(i));Slide46
obtain and assign variable names
f d
rdata _null_; le
%
c
else do;
varnames(i)=""; do j=1 to &hrows.; if j eq 1 and share(i) ne "" then do; if strip(scan(headers(j),share(i),,"HM")) ne "" then varnames(i)=strip(scan(headers(j),share(i),,"HM")); end; else do; if strip(scan(headers(j),i,,"HM")) ne "" then do; if strip(varnames(i)) ne "" then varnames(i)= strip(varnames(i))||"_"||strip(scan(headers(j),i,,"HM")); else varnames(i)=strip(scan(headers(j),i,,"HM")); end; end; if j eq &hrows. and varnames(i) eq "" then varnames(i)=cat("Col"||strip(i)); end;end;Slide47
create macro variables and write file (part 1 of 4)
f;f;;%;%;%; ; ;
; r; r; " ;%;
%; ; ; ; ; ; ;;%; ; ; ;%;
g
i
"); 1; ; ; ;;if renames(i) ne "" then varnames(i)=renames(i);if prefix(i) ne "" then varnames(i)= strip(prefix(i))||strip(varnames(i));if suffix(i) ne "" then varnames(i)= strip(varnames(i))||strip(suffix(i));if strip(labels(i)) eq "" then labels(i)=strip(varnames(i)); else labels(i)=tranwrd(strip(labels(i)), '^', ' ');varnames(i)=tranwrd(strip(varnames(i)),'%', 'percent');varnames(i)=tranwrd(strip(varnames(i)),'-','_to_');varnames(i)=tranwrd(strip(varnames(i)),'–','_to_');varnames(i)=tranwrd(strip(varnames(i)),'#', 'number');varnames(i)=tranwrd(strip(varnames(i)), ' ', '_');varnames(i)=compress(varnames(i),,'kn');if anydigit(substr(varnames(i),1,1)) then varnames(i)=cat("_",strip(varnames(i)));var_names=catx(" ",var_names,strip(varnames(i)));var_labels=cat(strip(var_labels)||"label "|| strip(varnames(i))||"="||quote(strip(labels(i)))||";");Slide48
create macro variables and write file (part 2 of 4)
f;f;;%;%;%; ; ;
; r; r; " ;%;
%; ; ; ; ; ; ;;%; ; ; ;%;
g
i
"); 1; ; ; ;; if units(i) ne "" then var_units= catx(" ",var_units,strip(varnames(i))||"="|| strip(varnames(i))||"*"||strip(units(i))||";"); if drops(i) eq “YES” then var_drop= catx((" ",var_drop,strip(varnames(i))); end; if var_drop ne “” then var_drop= “(drop=)||strip(var_drop)||”)”; call symput('varnames',var_names); call symput('varlabls',var_labels); call symput('varunits',var_units); call symput('vardrop',var_drop); end; if _n_ ge &first_data_row. then do; if countc(_infile_,,"H")+1 eq &var_count. then do; do i=1 to &var_count.; temp=strip(scan(_infile_,i,,"HM")); if upcase(upcases(i)) eq "YES" then temp= upcase(temp); if strip(temp) ne "" then do; if miss(i) ne "" then do; k=1;Slide49
create macro variables and write file (part 3 of 4)
f;f;;%;%;%; ; ;
; r; r; " ;%;
%; ; ; ; ; ; ;;%; ; ; ;%;
g
i
"); 1; ; ; ;; do while (scan(miss(i),k,"` ") ne ""); missval=tranwrd(strip(scan(miss(i),k,"` ")),'^',' '); temp=tranwrd(strip(temp),strip(missval), ''); k+1; end; end; if grows_start LE _n_ and grows_end GE _n_ then do; call missing(vartype); in_test = input(temp, ?? best12.); if not missing(in_test) then vartype=0; else do; in_test = input(temp, ?? anydtdte21.); if not missing(in_test) then vartype=2; else do; if index(temp,"$") then in_test = input(temp, ?? dollar21.); if not missing(in_test) then vartype=4; else do; if index(temp,",") then in_test = input(temp, ?? comma21.); if not missing(in_test) then vartype=5;Slide50
create macro variables and write file (part 4 of 4)
f;f;;%;%;%; ; ;
; r; r; " ;%;
%; ; ; ; ; ; ;;%; ; ; ;%;
g
i
"); 1; ; ; ;; else do; if index(temp,"%") then in_test = input(temp, ?? percent21.); if not missing(in_test) then vartype=3; else vartype=1; end; end; end; end; if missing(vartypes(i)) then vartypes(i)=vartype; else if vartype ne vartypes(i) then vartypes(i)=1; if missing(varlens(i)) or length(temp) gt varlens(i) then varlens(i)=length(temp); end; end; if i eq 1 then hold_rec=strip(temp); else hold_rec=cat(strip(hold_rec),"09"x,strip(temp)); end; put hold_rec;end;Slide51
assign formats and informats
f;f;;%;%;%; ; ;
; r; r; " ;%;
%; ; ; ; ; ; ;;%; ; ; ;%;
g
i
"); 1; ; ; ;; g i "); 1; ; ; ;; if eof then do; ivartype=""; fvartype=""; do i=1 to &var_count.; if vartypes(i)=1 then do; itempvar=cat("$",strip(put(varlens(i),3.)),"."); ftempvar=itempvar; end; else if vartypes(i)=2 then do; itempvar="anydtdte21."; ftempvar="date9."; end; else if vartypes(i)=3 then do; itempvar="percent."; ftempvar="percent8.2"; end; else if vartypes(i)=4 then do; itempvar="dollar."; ftempvar=cat("dollar",strip(put(varlens(i),3.)),"."); end; else if vartypes(i)=5 then do; itempvar="comma."; ftempvar=cat("comma",strip(put(varlens(i),3.)),"."); end; else do; itempvar="best12."; ftempvar="best12."; end;Slide52
bring in user specs and create macro variables
f;f;;%;%;%; ; ;
; r; r; " ;%;
%; ; ; ; ; ; ;;%; ; ; ;%;
g
i
"); 1; ; ; ;; g i "); 1; ; ; ;;i ; ; ; r; ; ; ; ; ; ; ; if strip(informats(i)) ne "" then itempvar=strip(informats(i)); if strip(formats(i)) ne "" then ftempvar= strip(formats(i)); ivartype=catx(" ",ivartype,"informat", varnames(i),itempvar,";"); fvartype=catx(" ",fvartype,"format", varnames(i),ftempvar,";"); end; call symput('informt',ivartype); call symput('formt',fvartype); end; end;run;Slide53
create the data step that pastes the data
f;f;;%;%;%; ; ;
; r; r; " ;%;
%; ; ; ; ; ; ;;%; ; ; ;%;
g
i
"); 1; ; ; ;; g i "); 1; ; ; ;;i ; ; ; r; ; ; ; ; ; ; ; = ; rdata &outfile. &vardrop.; infile revised lrecl=32767 dsd delimiter="09"x; &informt.; &formt.; &varlabls.; input &varnames.; &varunits.;run;proc delete data=work.form_varnames;run;filename clippy clear;filename revised clear;Slide54
combined, the code really isn't very intimidatingSlide55
and to complete the task
highlight the tableclick copyenter desired macro variable settingsrun the codeSlide56
however, it may not be THAT simple in all cases!
not all tables are directly copyable(i.e., without losing critical table metadata)Slide57
a useful free set of tools you might find helpful
Adobe Acrobat Reader 6http://www.oldapps.com/adobe_reader.php
Adobe Acrobat 5 TAPS Pluginhttp://www.pdfhacks.com/TAPS/Slide58
any version of Adobe Acrobat (other than Reader) may be needed to convert a table into a PDF file in order to take advantage of TAPSSlide59
how TAPS can be helpful (an example)
copy the first four lines from the table athttp://www.oecd.org/dataoecd/48/27/41498733.pdf Slide60
how TAPS can be helpful (an example)
on your monitor the table will appear as:Slide61
how TAPS can be helpful (an example)
but, if you paste it into notepad, it will appear as:
there is no indication that column #1's heading is missing
the heading for column #10 appears in column #1 on the 2nd and 3rd rows
there is no indication that column #2 is missingSlide62
how TAPS can be helpful (an example)
but if you open it with Adobe 6 Reader with TAPS:Slide63
how TAPS can be helpful (an example)
drag a rectangle around the data you want
right click on Text-Flow, click on Table, then copy
TableSlide64
how TAPS can be helpful (an example)
now, if you paste it into notepad, it will appear as:Slide65
filename clippy clipbrd;
filename revised temp;%let hrows=3;%let spaces=" ";%let first_data_row=4;%let var_renames=1~Country;%let var_labels=;%let var_prefix=2-10~Revenue_;%let var_suffix=;%let var_share=;%let var_formats=2-10~percent8.1;%let var_informats=2-10~best12.;%let var_units=2-10~.01;%let var_drop=;%let var_upcase-;%let var_missing=;%let guessingrows=;
%let outfile=revenue;
how TAPS can be helpful (an example)then, using the following settings:
variable names are to be combined from the first three rows
the data begin on row 4
the variable name for the first column should be "Country"
the 2nd thru 10th variables should all use a percent8.1 formatthe 2nd thru 10th variables should all use a best12. informatthe 2nd thru 10th variables should all be multiplied by .01the 2nd thru 10th variables should all be preceded by the string "Revenue_"Slide66
how TAPS can be helpful (an example)
you will obtain the following SAS file:Slide67
not all browsers were created equally
the method described in this paper appears to work better with Google Chromethan with Microsoft Internet Explorer
and better with a TAPS enhanced version of Adobe Acrobat Reader 6than with newer versions of Acrobat Reader Slide68
sometimes a table will copy as a single column
e.g., copy the four column table athttp://www.thelawyer.com/directory/uk-200-table-top-100/ Slide69
sometimes a table will copy as a single column
you'll find that entire columns get highlighted as you drag your mouse from left to right
although it appears as a 101 row 4 column tableit actually copies as a 404 row 1 column tableSlide70
adding the following macro variable assignments
will transpose the data as it is being "pasted"%let columns=4;%let rows=100;Slide71
importing non-tabular data
for example, what if you wanted to copy a page from a search at: http://www.worldcat.org/
Note: This site's terms of use explicitly prohibits the use of programming routines to "mine" or harvest material amounts of Data
irrelevant listing id is each entry's first record
titles appear without any identifier
author is always preceded by the word "by"
type appears without any identifier
Language is always preceded by the string "Language:" Publisher is always preceded by the string "Publisher:" Record always endswith the string "View all editions and formats" Slide72
our solution: two extra macro variables
options NOQUOTELENMAX;options datestyle=mdy;filename clippy clipbrd;filename revised temp;%let spaces=" ";%let look_for=View all editions and formats;%let skip_lines=1;%let var_renames=2~Author;
Record always ends with the string "View all editions and formats"
There is always one blank line between records
+ the same macro variables used previouslySlide73
a datastep to describe the form's layout
data form_varnames; informat varname $50.; input varname &; cards;1~Title~1~1~02~by~1~1~13~Type~1~1~04~Language:~1~1~15~Publisher:~1~1~1;
Variable number
Variable name or field header
Field's position within a row (in right to left order)
Number of rows that must be read to read the entire field
Whether or not the field has a header in the data (i.e., 0=no, 1=yes) Slide74
and to complete the task
highlight the formclick copyenter macro variable settings and field definitionsrun the codeSlide75
should work on any operating system
has a number of useful import features
tables can be pasted from any source that you can copy to your system's clipboard
modifiable (new features can be added)
benefits
avoid extra datasteps
only requires base SASSlide76
unsupported - can't complain to anyone if it doesn't work correctly
extremely software dependent
limitations
requires you to know your data
may require some additional softwareSlide77Slide78
Author Contact Information
Your comments and questions are valued and encouraged.
Contact the authors:
Richard A. DeVenezia9949 East Steuben RoadRemsen, NY 13438http://www.devenezia.com/contact.phpJohn King,Ouachita Clinical Data Services, Inc.Mount Ida, ARe-mail: iebupdte@gmail.comRandy Herbison,Senior Systems Analyst Westat1650 Research Boulevard
Rockville, MD 20850e-mail: RandyHerbison@westat.comArthur Tabachneck, Ph.D.
M
yqna, Inc.
Toronto, Ontarioe-mail: atabachneck@gmail.comNate DerbyStakana Analytics815 First Ave., Suite 287Seattle, WA 98104-1404e-mail: nderby@stakana.com Ben Powell Genworth Financial e-mail: ben.powell@genworth.com