/
Data Manipulation (with SQL) Data Manipulation (with SQL)

Data Manipulation (with SQL) - PowerPoint Presentation

kittie-lecroy
kittie-lecroy . @kittie-lecroy
Follow
484 views
Uploaded On 2017-04-28

Data Manipulation (with SQL) - PPT Presentation

HRP223 2010 October 13 2010 Copyright 19992010 Leland Stanford Junior University All rights reserved Warning This presentation is protected by copyright law and international treaties Unauthorized reproduction of this presentation or any portion of it may result in severe ID: 542536

variables data library variable data variables variable library code dataset sql click process rename set fake label reports sharing

Share:

Link:

Embed:

Download Presentation from below link

Download Presentation The PPT/PDF document "Data Manipulation (with SQL)" is the property of its rightful owner. Permission is granted to download and print the materials on this web site for personal, non-commercial use only, and to display it on your personal computer provided you do not modify the materials and that you retain all copyright notices contained in the materials. By downloading content from our website, you accept the terms of this agreement.


Presentation Transcript

Slide1

Data Manipulation (with SQL)

HRP223 – 2010

October 13, 2010

Copyright ©

1999-2010

Leland Stanford Junior University. All rights reserved.

Warning: This presentation is protected by copyright law and international treaties. Unauthorized reproduction of this presentation, or any portion of it, may result in severe civil and criminal penalties and will be prosecuted to maximum extent possible under the law.Slide2

Topics For TodayOrganization

Sharing a SAS datasetAs .sas7bdat files or other formatsRenamingDatasets

VariablesSubsetting a datasetSelect a few variablesSelect a few recordsSQL reports for a single table of data

Selecting/renaming variables

Applying labels and formats

Creating tables with SQLSlide3

Avoiding Spaghetti Code

Programmers refer to unstructured, poorly thought through, unorganized code as spaghetti code. Your EG projects will literally look like a tangled mess of spaghetti if you do not structure them in advance.Use several named process flowsUse lots of notes in the project

Include a lot of comments if you write code

This is bad.

OrganizationSlide4

Process ManagementTypically you will have a process flow that tells EG

where to find existing SAS data or it says to import from the source file(s) from a database like REDCap or from Excel and then does data

cleaning and splits the data into subsets.If you do different sets of analyses to the subsets, add in a

process flow for each subset

.

Have one of the process flows create a dataset called

analysis

that has the cleaned data with all the information used in the analyses.

OrganizationSlide5

Working with Multiple Process Flows

You can add other process flows with the File menu or by right clicking on the background of a process flow.

or click here.

Click here to move between flowcharts…Slide6

Right click on the process flow and give it a meaningful name.

You may want to link the library to the dataset.

OrganizationSlide7

The Greater Right of the LeftYour process flows should have the source of the data on the left

. The left margin should have:A note saying what the flowchart doesA code node that creates a toy dataset or a library (or libraries) that contains the data

OrganizationSlide8

A Good Process Flow

OrganizationSlide9

Organization in Programs

All my SAS code begins with the same header information.The /* */ are used to mark large comments.Slide10

Display manager deletes output text and log.

Do not show the name of the procedures in output.

Do X commands ASAP. Don’t show the date in output and reset page # to 1.

Delete graphics in the work library.

Specify where output will be stored.

Make the folder where output will be stored if it does not exist. Delete what is there if it exists. Set file path to that directory.

Make a library to store output datasets.

Make a web page to display all output.

Make pretty graphics.

Run other programs.

Turn off graphics and output.Slide11

Sharing DataYou can share SAS data sets just like Excel files.Create a library.

Copy the data into the library.If the data has formats associated with it, be sure to send the formats.More on this on a later date.

Sharing Slide12

Exporting the Point and Click WayDouble click the data set you want to export and use the Export context dependent menu.

Sharing Slide13

LibrariesRecall that a library is reference to a location on a hard drive. If you tell EG to move a data set into a library it moves it into the folder that the library “points at”.Slide14

With Code….Create a library with the GUI or use the

libname

statementlibname blah

"C:\blah"

;

Write a little program to move the data into a permanent library:

proc

copy

in

= work

out

= blah;

select

humans;

run

;

Sharing Slide15

This code is efficient.

Sharing Slide16

AlternativesNovices underuse

proc copy. Instead they typically write less efficient data steps. For example,data

blah.humans;

set

work.humans

;

run

;

Or they may write:

data

"C:\blah\humans.sas7bdat"

;

set

work.humans

;

run

;

Sharing Slide17

Sharing

Either create a library node or write this line.

Functionally the same but less efficient than

proc copy

.

Either create a library node

or

write this line.Slide18

Export Code for a Different Format

Sharing Slide19

Note that you have to manually connect the code node to the right place in the flow chart and the exported item does not show up on the process flow.

Sharing Slide20

Copy and RenameIf you want to copy and rename a file, use the GUI or write code.

Double click the data set.Choose Query Builder from the context sensitive menu.

Renaming datasetsSlide21

Renaming datasetsSlide22

With code…data

blah.test

; set

work.humans

;

run

;

Renaming datasetsSlide23

Make Some Fake DataYou can tell SAS to make an ID variable and have it be output to a file named dudes with the values from 1 to 10 like this:

by 1 is optional.

It will step by 1 by default.

The spaces before and after = are optional.Slide24

Add in a ConstantI want to add in a column to indicate that these are all of type Fake.Slide25

Common Mistakes (1)

What happens if you leave off the quotes around the value

fake?SAS thinks you want to set the variable type equal to the variable fake.Slide26

Always Search Your Log for uninitialized

If you notice an empty variable at the end of your dataset you forgot quotes or you misspelled a variable name … and SAS made it for you.

There was no fake variable so it make one for you…

I wish this was an ERROR!Slide27

Common Mistakes - Semicolons(2)Slide28
Slide29
Slide30

Common Mistakes – Dataset Spaces(3)

SAS lets you use white space to organize your program but you should not use spaces in variable names and you can’t use spaces in dataset names.

Not a syntax error but not what you wanted… a semantic error. You get two datasets.Slide31

More Bulletproof You can specify the name of the dataset you want to output into… this is a good idea.Slide32

Common Mistakes – Variable Spaces(4)Slide33

GUI InsteadYou can use the GUI to make a dataset by hand or include a program and then use the GUI to add:

Gooey = graphical user interface

4. Compute Columns

1. label the node

2

. label the dataset

3. Drag and drop the ID variableSlide34

To add in a column based on existing data:

Click New…Click Recoded Column

Click the column you are basing the new variable upon

6

7Slide35

Specify the new column is character or numberClick Add…

8

9

This is an example of bad GUI design. Commands appear out of logical order.Slide36

Add a constant

Pick from the Replace Values, Replace a Range, Replace Condition tabsSpecify what is replacing what.

We want to add in “Fake” to all records. All records are not missing and ID so use that for the request.Click OK

10

11a

11b

12Slide37

Specify what to do with all other values.Click Next>

13

14

The same bad GUI with commands appearing out of logical order.Slide38

Specify the column labelSpecify the variable name

Click Next>Click Finish

Click Close

15

16

17

Notice the poor GUI design… why is the column type shown here as radio buttons which are disabled?

If the type of variable is wrong push back and fix it!

19Slide39

A Simple 20 Step Process Push Run.

20Slide40

The SQLThis is the code that was written by your pointing and clicking:

Click to see the code.

Consider saving this block of code in your private code library out on Google sites.Slide41

Select a Few Variables From Fake DataThe next task is to select a couple of variables from a data set that has a LOT of variables.

If you get a premade dataset with lots of extra variables, you want to drop the ones you will never use. Do this as soon as you can.First I will make some fake data. The data set will have a simulated test value filled into 6 “month” variables.

Fake dataSlide42

How to make a fake subject

Fake data

Variables are added to the new dataset in the order in which they are created. New variables are created if they show up in array statement (rarely) or on the left side of an equal sign (=).

Comments can start with * and end with ; Slide43

Fake dataSlide44
Slide45

You can use the Filter and Sort context sensitive menu to select a few variables.

To rename a variable or change how it prints in reports you need to use the Query Builder or write code.

Selecting variables and renaming

Rename and label variablesSlide46

Drag and drop the variables you want into the Select Data windowpane.

Rename and label variables

Click on a variable name. Then use the properties button to change the name and the display label.

Month1 is January but for reports I want it to say First Month.Slide47

Rename and label variablesSlide48

Rename and label variables

I usually display the variable names instead of the labels.

To write code, you need the names not the labels.Slide49

What it did…

Rename and label variablesSlide50

Data Step (SAS code) Version

Notice where the ; is found. This is one long statement.

Rename and label variablesSlide51

Minimal SQLPrint a report showing the contents of variables from a single data set.

Put a comma-delimited list of variables here or * for all variables.

Specify a

library.table

here.

Note that there is no create table ____ as

SQL reportsSlide52

What variables?Typically you will use a coma delimited list but you can use an * to indicate that you want all variables selected instead of typing them all.

There is no syntax to specify variables based on position in the source files. That is, you can not specify that you want to select the 2nd and 7th

variables (from left to right) or to select the first 3 variables.

SQL reportsSlide53

Use of Minimal SQL

Note that the order of the list sets the order in the report (or the order in a new dataset).

SQL reports – selecting variablesSlide54

Renaming and LabelsYou can rename a variable in the list with an

as statement.

SQL reports – rename/label

as

creates a new variable. Without

as

SQL just copies the variable

You can also specify variable labels.Slide55

Using Formats

Labels affect column headings and similar titles, and

formats affect how values appear without changing the values themselves.

Notice the lowercase

i

. The capitalization is set when the variable is created.

SQL reports – formatSlide56

Preview of User Defined Formats

Note the $ means a character format.

SQL reports – formatSlide57

blah

SQL tables

New table.

Original tableSlide58

More TweaksThe from

line references tables which are in libraries. Complex queries require you to reference the table name over and over again. Instead of having to type the long library and dataset names repeatedly, you can refer to the files as an alias.

Print the column called dude from the table blah which is in the fakedata

library.

Here the b. is optional because dude is only in one table (the query only uses one table).

SQL reports – table aliasesSlide59

Data Step Version….

Rename label and format variables