11. NLTS2 Documentation: Data Dictionaries

Prerequisites
Recommended modules to complete before viewing this module
1. Introduction to the NLTS2 Training Modules
2. NLTS2 Study Overview
3. NLTS2 Study Design and Sampling
NLTS2 Data Sources, either
4. Parent and Youth Surveys or
5. School Surveys, Student Assessments, and Transcripts
10. NLTS2 Documentation Overview

Overview
Purpose
Data dictionary contents
File specifications
Variable prefix
Missing values
Variable documentation
Variable documentation details
Parent/youth Part 2 documentation distinctions
Transcript data documentation distinctions
Supplemental documentation
Closing
Important information

Purpose
The data dictionary section of the documentation is the most detailed source of information about individual data items.
The data dictionary includes specific information about each item, such as
Which respondents are included in the data element when skip logic is applied.
Any modification made to the data element, such as a logical assignment that changes a value.
Variable names of corresponding items in other waves.
Users should refer to the data dictionary before specifying any analysis.

Purpose
Why use the data dictionary rather than the data collection instruments?
Data collection instruments are extremely useful.
Can be a quick reference for finding an item
Show the item in the context of other items
Contain the exact wording of questions that respondents were asked
However, only the data dictionaries describe
Complex skip logic, especially from CATI instruments
Data issues, such as the addition of response categories from one wave to the next
Any programmatic modifications, assignments, or recoding of the data, such as setting a value to “yes” if a prior response is “yes”

Data dictionary contents
There is a data dictionary for every data collection source within each wave.
Every dictionary begins with a linked table of contents.
Links go to
File specifications.
Variable descriptions by section or topic area.

Data dictionary contents example

File specifications
The first section of the data dictionary is “File Specifications,” which lists
The associated file name
The data collection source
The prefix for variable names in the file
Linking variable (always “ID”)
Missing values

File specifications: Example

File specifications
Variable prefix
The prefix listed in the file specifications applies to most, but not all, variables in the file.
Some specialized variables, such as wave-specific demographic variables, use a different prefix structure.
Example: W2_Age2003 is the age of youth during the Wave 2 Parent/Youth data collection, and W2_Age2004 is the age of youth for the Wave 2 school data collection; here the prefix “W2_” identifies Wave 2.

File specifications
Missing values
Lists the missing-value codes that can be found in this file.
Note about missing values
User-defined missing values specify why a variable is missing.
Missing values are excluded from calculations in procedures unless the user specifies options to include them.
Data were developed in SAS and converted to SPSS.
Missing values are defined and stored differently in SAS and SPSS.

File specifications
Missing values in SAS
The system default missing value in SAS is “.”
User-defined missing values in SAS can range from “.a” to “.z”
In a SAS logical statement, a missing value in a numeric variable is treated as smaller than any number.
For example, the logical statement “If npr1B4 < 1” would include all cases for which the value is 0, a negative number, or a missing value.

File specifications
Missing values in SPSS
SPSS system missing is a “.” in the data and appears as “System” in SPSS analysis output under “Missing.”
SPSS allows for three distinct user-defined missing values, fewer than SAS.
With the range option, users can define a range of missing values to work around the limitation of three distinct missing values.
Missing values are represented as negative numbers in the NLTS2 SPSS data.
Values from -999 through -980 fall in the missing values range.
Unlike in SAS, missing values in SPSS do not have a numeric value in a logical statement.
For example, “IF (npr1B4 < 1) B4New = 0.” would result in a missing value in B4New if npr1B4 is missing.
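
The contrast between the two packages can be sketched in Python. This is an illustration only, not NLTS2 code: the function names are hypothetical, `None` stands in for a missing value, and the only NLTS2-specific detail is the -999 through -980 range stated above.

```python
# Range of missing-value codes in the NLTS2 SPSS data, as stated in the slide.
MISSING_LOW, MISSING_HIGH = -999, -980


def recode_missing(value):
    """Turn a code in the missing range into None; leave other values alone."""
    if value is not None and MISSING_LOW <= value <= MISSING_HIGH:
        return None
    return value


def sas_style_less_than(value, threshold):
    """SAS-like comparison: a missing value sorts below every number."""
    if value is None:
        return True
    return value < threshold


def spss_style_less_than(value, threshold):
    """SPSS-like comparison: a missing operand gives a missing (None) result."""
    value = recode_missing(value)
    if value is None:
        return None
    return value < threshold
```

Under these assumptions, `sas_style_less_than(None, 1)` is true while `spss_style_less_than(-985, 1)` is missing. Software that does not know about the declared missing range would treat -985 as an ordinary number that satisfies `< 1`, so it is worth recoding the range before making any comparison.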
Variable documentation
After “File specifications,” the dictionary lists all variables in tabular format.
The variables in the data dictionary are organized by section, matching the sections in data collection instruments (source data).
Within each section, there are two sets of variables.
Variables that come directly from the data collection instruments.
Variables created from source data within that section.
Variable descriptions include the name, variable type, variable values, source(s), and information about skip logic, assignments made, and corresponding variable names in other waves.

Variable documentation
Variables that come directly from the data collection instruments (source data)
Variable names usually have the uniform variable prefix.
Names of source data variables are drawn from the section, question number, and subitem in the source instrument.
This makes it relatively straightforward to find an item in an instrument and then locate it in the dictionary.
Example: variable name np4E2c
The “np4” prefix identifies the NLTS2 Parent/Youth Survey, Wave 4.
The “E2c” identifies Section E of the Parent/Youth instrument, Question 2, subitem c.
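
The naming convention above can be sketched as a small Python parser. The pattern is an assumption generalized from the single example np4E2c (one section letter, a question number, an optional subitem letter), and the function name is hypothetical. The prefix is passed in because it comes from the file specifications. Names in the actual dictionaries may not all fit this pattern.

```python
import re

# Assumed shape of the part after the prefix: one section letter, a question
# number, and an optional lowercase subitem letter (generalized from "E2c").
ITEM_PATTERN = re.compile(r"^([A-Z])(\d+)([a-z]?)$")


def parse_source_variable(name, prefix):
    """Split a source variable name into prefix, section, question, and subitem.

    Returns None when the name does not start with the prefix or the rest of
    the name does not fit the assumed pattern (for example, created variables).
    """
    if not name.startswith(prefix):
        return None
    match = ITEM_PATTERN.match(name[len(prefix):])
    if match is None:
        return None
    section, question, subitem = match.groups()
    return {
        "prefix": prefix,
        "section": section,
        "question": int(question),
        "subitem": subitem or None,
    }
```

For example, `parse_source_variable("np4E2c", "np4")` yields section E, question 2, subitem c, while a name with a different prefix structure, such as W2_Age2003, returns `None`.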
Variable documentation
Variables created from source data
Variables that are created using data from the associated section are listed at the end of the section.
Created variables typically have names that describe the variable rather than relate to a data collection source, but with the same prefix as the source variables.
Variable np3_JobCompNow is [np3] Parent/Youth interview Wave 3 [JobCompNow] currently competitively employed.
Collapsed variables, i.e., variables combined from two or more items, sometimes list all contributing variables in the name.
Variable np4U8a_J15a is [np4] Parent/Youth Wave 4 [U8a] question U8a [_] combined with [J15a] question J15a.
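
A minimal Python sketch of how such names break apart, assuming the prefix is known from the file specifications and that the underscore is the only separator. The function name is hypothetical.

```python
def split_created_name(name, prefix):
    """Return the parts of a created or collapsed variable name after the prefix.

    The name is split on underscores and empty pieces are dropped, so a name
    that starts with an underscore after the prefix yields a single part.
    """
    if not name.startswith(prefix):
        raise ValueError(f"{name!r} does not start with prefix {prefix!r}")
    remainder = name[len(prefix):]
    return [part for part in remainder.split("_") if part]
```

With this sketch, np4U8a_J15a splits into U8a and J15a, and np3_JobCompNow yields the single descriptive part JobCompNow. An underscore does not always mark a collapsed item: the np4F11b_a series uses it to separate a subitem, so the dictionary entry remains the authority on what a name means.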
Variable documentation
In addition to variables related to particular items from data collection instruments, there are some other key variables.
Demographic variables that are used for many NLTS2 analyses and published Web tables
Weights, including replicate weights
Linking variable “ID”
Preload, CATI, and/or sample variables
The next slide provides a quick look at the data dictionary; the slides after it cover the details.

Variable documentation: Quick look

Variable documentation: Formatting key
Bold text in the dictionary indicates a modification to questionnaire categories as a result of coding and categorizing verbatim responses.
Grey text indicates that there are no data for this item in this wave.
For example, Question R1b was asked in Waves 2 to 4 but not in Wave 5, so in Wave 5, R1b is shaded.

Variable documentation details
Variable name
Name of the variable as it appears in the data file.
In this example, there is a series of variables for item np4F11b, np4F11b_a through np4F11b_h.
Each variable in the series is listed separately.
Figure 1-A.
Note: See Figure 1, section C in Module 11 Supporting Materials.

Variable documentation details
Source
Item from data collection source.
If there are multiple instrument sources, the items from each data source are listed.
This example comes from question F11b, subitems a-g.
Figure 1-B.
Note: See Figure 1, section C in Module 11 Supporting Materials.

Variable documentation details
Variable description
Describes the variable.
Often the text of the question from the source instrument
Variable description corresponds with the variable label in the file contents.
Figure 1-C.
Note: See Figure 1, section C in Module 11 Supporting Materials.

Variable documentation details
Variable description (cont’d)
In this example, the item is described as types of life skills training, and the subitems are the individual types of life skills training listed in this question.
Subitems “a-g” come from the source and “h” is created.
Figure 1-D.
Note: See Figure 1, section C in Module 11 Supporting Materials.

Variable documentation details
Variable type and values
Shows how the variable is coded and what the codes mean.
Variable type is numeric, date, or character.
The variable values match the variable’s associated format referred to in the SAS contents.
This example is a numeric variable with yes/no values.
Figure 1-E.
Note: See Figure 1, section C in Module 11 Supporting Materials.

Variable documentation details
Notes: Assignments, modifications, or validations
Describes any changes made to a variable.
Lists the logic for making an assignment or modification to an existing variable.
Specifies the logic for how new variables were created.
An assignment might increase or decrease the base.
In this example, assignments were made to subitems np4F14_[a-g] to set values to “no” if np4F11a is “no.”
A new subitem, np4F11b_h, is created using values from np4F11a and np4F14a_f.
Figure 1-F.
Note: See Figure 1, section C in Module 11 Supporting Materials.

Variable documentation details
Base: Which respondents were asked
Logic is expressed as who is included, not who is skipped.
Explains varying n’s due to skip logic.
If “All respondents” is noted, no one is skipped.
In this example, the respondents asked this item were limited to those who had not been in secondary school in the past year and had specified this service since leaving high school.
However, the notes column on the previous slide shows that an assignment was made.
Although they were not asked this question, those who answered “no” to np4F11a were assigned a “no” for np4F14_[a-g].
Figure 1-G.
Note: See Figure 1, section C in Module 11 Supporting Materials.
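
The effect of such an assignment on the base can be sketched in Python. This is a hedged illustration, not the NLTS2 processing code: the 0/1 coding for no/yes is an assumption, records are plain dictionaries with `None` or an absent key meaning "not asked", and the function names are hypothetical. The variable names are the ones used in the example above.

```python
NO, YES = 0, 1  # assumed numeric coding; check the dictionary's values column

GATE = "np4F11a"
TARGETS = ["np4F14_" + letter for letter in "abcdefg"]


def assign_no_when_gate_is_no(record):
    """Return a copy of record with unasked subitems set to NO when the gate item is NO."""
    updated = dict(record)
    if updated.get(GATE) == NO:
        for name in TARGETS:
            if updated.get(name) is None:  # not asked, so no value yet
                updated[name] = NO
    return updated


def base_size(records, name):
    """Count the records that have a value for the named variable."""
    return sum(1 for record in records if record.get(name) is not None)
```

Applying `assign_no_when_gate_is_no` to every record and comparing `base_size` before and after shows how an assignment can increase the base beyond the respondents who were actually asked the item.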
Variable documentation details
Variable name by wave
Along with the variable name for the current wave, corresponding variable names are listed for all other waves.
There may be minor differences in the variables between waves, or an item may not have been asked in another wave.
In this example, there is no corresponding set of variables for this item in Wave 1, and the item is slightly different in Wave 5.
Figure 1-H.
Note: See Figure 1, section C in Module 11 Supporting Materials.

Variable documentation details
Some of the columns noted above contain information not found elsewhere.
“Base” and “Notes” columns are key for understanding the nature of a variable.
Provide documentation about who is included in an item and any changes made to the data.
Particularly important when using CATI data with complex skip logic.
“Variable name by wave” is a resource for finding longitudinal items.
Provides wave-by-wave variable names.
Indicates whether an item was not collected in a given wave and notes whether the item differs in other waves.

Parent/youth Part 2 documentation distinctions
The Waves 2 to 5 Parent/Youth Surveys have a Part 2 that is completed by either the youth or the parent/guardian.
Documentation for Part 2 in these waves includes all sources and variable names.
For each item, variables are listed in the following order: the youth item, the parent/guardian item, and a collapsed youth/parent item.
For collapsed items in cases where there is a value for both items, priority is given to the youth value.
Usually there is either a parent/guardian value or a youth value.
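
The priority rule for collapsed items can be sketched in a few lines of Python. The function name is hypothetical and `None` stands in for a missing value. The actual coding of each combined item is described in its dictionary entry.

```python
def collapse_youth_parent(youth_value, parent_value):
    """Combine a youth item and a parent/guardian item, giving priority to the youth value.

    A youth value of 0 is a valid response, so the test is for None rather
    than for truthiness.
    """
    if youth_value is not None:
        return youth_value
    return parent_value
```

When both respondents answered, the youth response wins. When only one value exists, that value is used. When neither exists, the collapsed item stays missing.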
Parent/youth Part 2 documentation: Quick look

Parent/youth Part 2 documentation
The item is “Youth has done volunteer or community service in the past 12 months.”
np5P8 is the youth item, np5J4 is the parent/guardian item, and np5P8_J4 is the combined youth/parent/guardian item.
Data come from interviews (youth item P8 and parent item J4) and mail questionnaires (youth A7a and parent Q20b).
Figure 2-A.
Note: See Figure 2, section C in Module 11 Supporting Materials.

Parent/youth Part 2 documentation
This example is a numeric variable that has a yes or no value.
Notes: As the previous slide showed, data come from multiple sources.
Sources include the youth interview and youth mail questionnaire, the parent/guardian interview, the abbreviated interview, and mail questionnaires.
The coding of the combined item is described.
Figure 2-B.
Note: See Figure 2, section C in Module 11 Supporting Materials.

Parent/youth Part 2 documentation
All youth respondents and all Parent Part 2 respondents were asked this question.
There was no youth interview in Wave 1, but otherwise there are corresponding variable names for each wave for youth, parent/guardian, and combined.
Figure 2-C.
Note: See Figure 2, section C in Module 11 Supporting Materials.

Transcript data documentation distinctions
Transcript data are in multiple files.
Each file is documented in a separate section in the transcript data dictionary.
Files contain either source data or data summarized from course-level transcript data.
Files can have a single record or multiple records per student depending on the type of transcript data.

Transcript data documentation
Source data files
Overall: One record per student with any transcript data.
By year: Multiple records per student with one record for every school year recorded in transcripts.
Course level: Multiple records per student with one record for every course within a grading period.
Summary data files
Overall summary: One record per student with complete transcript data summarizing course taking across all grades attended.
By grade summary: Multiple records per student; one record for every grade attended summarizing course taking within a grade.
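
Because the record structure differs by file, it can help to confirm how many records each student has before linking files on ID. The sketch below is an assumption-laden illustration: rows are plain dictionaries, the function names are hypothetical, and only the linking variable name “ID” comes from the documentation.

```python
from collections import Counter


def records_per_student(rows, id_field="ID"):
    """Count how many records each student ID has in a file."""
    return Counter(row[id_field] for row in rows)


def is_one_record_per_student(rows, id_field="ID"):
    """True when every student ID appears exactly once, as in an overall file."""
    return all(count == 1 for count in records_per_student(rows, id_field).values())
```

An overall or overall summary file would be expected to pass `is_one_record_per_student`, while by-year, course-level, and by-grade summary files would not. Such multi-record files need to be summarized or handled as one-to-many links when they are combined with one-record-per-student data.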
Transcript data documentation: Quick look

Supplemental documentation
Transcript dictionary
List of course codes and course categories
Key to composite variable names in summarized data
Parent/youth survey dictionaries
Types of medications
Job codes
Assessment dictionaries
Direct and alternate assessment references
Cross-instrument data dictionary
Decision rules for cross-instrument data

Documentation summary
The data documentation contains a wealth of information organized in a variety of ways.
It is good practice to refer to the data dictionary before proceeding with analysis.
Finding a question in a data collection instrument does not provide enough information about that item.
The data dictionary describes each item, including information about skip logic and modifications made to variable values.

Closing
Topics discussed in this module
Purpose
Data dictionary contents
File specifications
Variable prefix
Missing values
Variable documentation
Variable documentation details
Parent/youth Part 2 documentation distinctions
Transcript data documentation distinctions
Supplemental documentation
Next module:
12. NLTS2 Documentation: Quick References

Important information
NLTS2 website contains reports, data tables, and other project-related information
http://nlts2.org/
Information about obtaining the NLTS2 database and documentation can be found on the NCES website
http://nces.ed.gov/statprog/rudman/
General information about restricted data licenses can be found on the NCES website
http://nces.ed.gov/statprog/instruct.asp
E-mail address: nlts2@sri.com