Slide1
Target Selection Pipeline
Peter Nugent (LBNL/UCB), Level 3 Manager
Slide2
Outline
Overview of Deliverables, Input Data & Processing
Description of NERSC
Targeting Database
Scalability
Traceability
Development Plan, Schedule & Milestones
Slide3
Deliverables
The goal of the target selection pipeline is to deliver a set of catalogs, constructed from multi-filter observations from the optical through the IR over the DESI footprint, using the Tractor code to produce uniform photometry. We will also provide an interface to these catalogs so that targets (QSOs, ELGs, LRGs, standard stars and sky) can be selected easily and efficiently by the wider collaboration.
We will make the code for processing the data and constructing the catalogs, as well as the catalogs themselves, available to the public.
We will NOT be serving the data (cut-out images, FITS files, etc.) à la SDSS to the general public. That is well beyond the scope of the Project.
Slide4
Inputs to us…
The Science Collaboration will provide us with:
The optical and infrared photometry
Data analysis software such as Tractor, Astromatic toolkits, first-pass image processing software, etc.
Selection algorithms for ELGs, LRGs, QSOs, standard stars and blank sky.
Slide5
Input Data: DECam, Bok & WISE
DECam (g, r, z) survey: 9k sq. deg. (6.7k approved)
1 TB of raw data per band per 1000 sq. deg.
First-pass pipeline will come from current NOAO processing work (F. Valdes @ NOAO)
Bok (g, r, z): 5k sq. deg. of data
0.4 TB of raw data per band per 1000 sq. deg.
First-pass pipeline will come from the current SCUSS data reduction pipeline (Cheng Li @ SHAO)
WISE is in hand, currently on spinning disk at NERSC in the cosmo data repository (60 TB). It will grow as NEOWISE comes in.
We will rewrite the pipelines using the existing C++ (MPI + OpenMP) Image-Container code written by R. Thomas. The goal is to be able to reprocess *everything* on Edison & Cori on a timescale of days.
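The raw-data volumes implied by the per-band figures on this slide can be tallied with a little arithmetic. This is only a rough sketch: it assumes three bands (g, r, z) for both surveys and the full planned areas, and the names used are illustrative.

```python
# Rough raw-data volume implied by the per-band figures on this slide.
# Assumption: 3 bands (g, r, z) per survey and the full planned footprint.
BANDS = 3

surveys = {
    # name: (area in units of 1000 sq. deg., TB of raw data per band per 1000 sq. deg.)
    "DECam": (9.0, 1.0),
    "Bok": (5.0, 0.4),
}

def raw_volume_tb(area_kdeg2, tb_per_band_per_kdeg2, bands=BANDS):
    """Raw data volume in TB for one survey."""
    return area_kdeg2 * tb_per_band_per_kdeg2 * bands

volumes = {name: raw_volume_tb(*params) for name, params in surveys.items()}
total_tb = sum(volumes.values())

for name, tb in volumes.items():
    print(f"{name}: {tb:.1f} TB raw")
print(f"Total optical raw data: {total_tb:.1f} TB (WISE adds 60 TB)")
```

Under these assumptions the optical raw data come to about 27 TB for DECam and 6 TB for Bok, which is smaller than the 60 TB of WISE data already at NERSC.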
Slide6
NERSC Description
Hopper (N6): Cray XE6, Opteron, w/ 153,216 cores
Edison (N7): Cray XC30, Intel Ivy Bridge, w/ 133,824 cores
Cori (N8) will be one of the first large Intel KNL systems and will have unique data capabilities: 9,300 single-socket nodes with 60 cores per node, and a burst buffer (NVRAM) for the entire memory footprint.
NERSC has a Global Filesystem which is viewable from all compute systems; this is where we will purchase our disk space (40 GB/s). Very high-speed local scratch space on each of the big-iron systems (168 GB/s)
240 PB tape archive
Data Transfer nodes using ESnet
Science Gateway and Database nodes for access from outside NERSC
Access through the general DOE-HEP call for compute time at NERSC.
Also used for the spectroscopic pipeline.
Slide7
Targeting DB
Baseline db is a Postgres db with q3c for spatial queries
Based on efforts within PTF for a real-time db (which is more demanding)
Runs at NERSC on their scidb nodes: 32-core nodes on a ZFS filesystem
This currently houses the iPTF database, which has over 3M images and 3B detections and is queried in real time 24/7. The DESI needs are far below this.
ZFS is a combined file system and logical volume manager designed by Sun Microsystems. Its features include protection against data corruption, support for high storage capacities, efficient data compression, integration of the concepts of filesystem and volume management, snapshots and copy-on-write clones, continuous integrity checking and automatic repair, RAID-Z and native NFSv4 ACLs.
Slide8
q3c
Q3C is a plugin for the PostgreSQL database, designed for working with large astronomical catalogs or any catalogs of objects on the sphere. Q3C allows you to perform fast circular, elliptical or polygonal searches on the sphere, as well as fast positional cross-matches and nearest-neighbor queries.
The ideas behind Q3C are described in Koposov et al. (2006)
Figure: Q3C sky pixelization and cone selection
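To make concrete what a circular (cone) search returns, here is a minimal brute-force sketch in plain Python. It is not how Q3C works internally: Q3C uses its sky pixelization as an index so that the database does not have to scan every row, and it exposes the search as SQL functions (q3c_radial_query in its documentation). The toy catalog and function names below are hypothetical.

```python
import math

def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    # Clamp to 1.0 to guard against rounding just above the asin domain.
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))

def cone_search(catalog, ra0, dec0, radius_deg):
    """Return rows (id, ra, dec) within radius_deg of (ra0, dec0) by scanning every row."""
    return [row for row in catalog
            if angular_separation_deg(row[1], row[2], ra0, dec0) <= radius_deg]

# Hypothetical toy catalog: (object id, RA, Dec) in degrees.
catalog = [
    (1, 150.00, 2.00),
    (2, 150.05, 2.02),
    (3, 151.50, 2.00),
    (4, 359.99, -0.01),
]

matches = cone_search(catalog, ra0=150.0, dec0=2.0, radius_deg=0.1)
print([row[0] for row in matches])
```

The haversine form handles the RA wrap at 0/360 degrees without special cases, so a cone centred at (0, 0) still finds object 4. A full scan like this costs time proportional to catalog size, which is the cost a spatial index is meant to avoid at the billion-object scale.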
Slide9
DESI Processing Pipeline
[Flow diagram. Labels recoverable from the slide:]
Data sources: DECam/Bok/…, NCSA/NOAO/UA/…
Transfer: ESnet, NERSC Data Transfer Node
Processing stages: Image Processing / Detrending, Astrometric Solution, Photometric Solution, Deep Image Stacking, Catalog Generation
Products: Candidates, Standard Stars, Blank Sky
Publication: Publish to Web, Science Gateways, Web UI, Public Catalogs, Public Data Access
Annotations: ~500 GB/night; ~3.0B objects in PSQL db
Legend: Computing – I/O; Heavy DB Access; Networking Data Transfer
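As a rough check on the ~500 GB/night figure, the average network rate it implies can be computed for a few transfer windows. The windows below are assumptions chosen for illustration, not project requirements, and decimal gigabytes are assumed.

```python
# Average bandwidth needed to move one night of data within a given window.
NIGHTLY_BYTES = 500e9  # ~500 GB/night from the slide (decimal GB assumed)

def required_gbit_per_s(n_bytes, window_hours):
    """Average rate in Gbit/s needed to move n_bytes within window_hours."""
    return n_bytes * 8 / (window_hours * 3600) / 1e9

for hours in (1, 8, 24):  # assumed transfer windows, for illustration only
    rate = required_gbit_per_s(NIGHTLY_BYTES, hours)
    print(f"{hours:>2} h window: {rate:.3f} Gbit/s")
```

Even a one-hour window requires only about 1.1 Gbit/s on average.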
Slide10
DESI using “Tractor” photometry
Successfully combines data from different instruments with different PSF sizes for uniform photometry
Developed by Dustin Lang (CMU, on SDSS and DESI teams)
“Tractor” inference algorithm optimally fits raw pixel-level data in optical images + WISE images
Currently in use as primary source of targeting for SDSS-IV, combining SDSS & WISE imaging data
Astrometry currently tied to “USNO system”; will be GAIA in the future
Slide11
DESI Target selection pipeline
[Flow diagram. Labels recoverable from the slide:]
Inputs: WISE images, DECam images
DES pipeline: Object catalog (optical only)
“Tractor” pipeline: Object catalog, Object fluxes (optical + WISE)
Selection: LRG selection pipeline, QSO selection pipeline, ELG selection pipeline, Standard star selection, Blank sky selection
Output: DESI target catalog
Notes: DECam images + DES pipeline to replace SDSS; add targeting module for ELGs
Slide12
Scalability
Edison is NERSC's newest supercomputer, a Cray XC30, with a peak performance of 2.57 petaflops, 133,824 compute cores, 357 terabytes of memory, and 7.56 petabytes of disk. NERSC currently provides > 3B CPU hours/yr for DOE; HEP has ~17% of this.
Cori and Hopper++ will increase this compute power in 2016.
We will obtain an allocation sufficient to reprocess all the data several times.
Note that DES, BOSS & LSST have allocations of ~10M CPU hrs/yr each. A comparable allocation is sufficient for our needs, and has always been granted for projects of this size in DOE-HEP.
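The allocation figures above can be put in proportion with a short calculation. This is a back-of-the-envelope sketch using only the numbers on this slide; "> 3B" is treated as exactly 3B, so the fractions are upper bounds, and the whole-machine figure assumes every Edison core is used with perfect efficiency.

```python
# Back-of-the-envelope view of a ~10M CPU-hour allocation.
NERSC_HOURS_PER_YEAR = 3e9   # "> 3B" on the slide, so a lower bound
HEP_FRACTION = 0.17          # "~17%"
PROJECT_HOURS = 10e6         # "~10M", as quoted for DES, BOSS and LSST
EDISON_CORES = 133_824

hep_hours = NERSC_HOURS_PER_YEAR * HEP_FRACTION
share_of_hep = PROJECT_HOURS / hep_hours
share_of_nersc = PROJECT_HOURS / NERSC_HOURS_PER_YEAR

# Wall-clock hours if the whole of Edison ran nothing else (idealized assumption).
edison_wall_hours = PROJECT_HOURS / EDISON_CORES

print(f"HEP share of NERSC: {hep_hours / 1e6:.0f}M CPU hours/yr")
print(f"10M hours is {share_of_hep:.1%} of HEP and {share_of_nersc:.2%} of NERSC")
print(f"Equivalent to {edison_wall_hours:.1f} h ({edison_wall_hours / 24:.1f} days) of all of Edison")
```

On these numbers a 10M-hour allocation is about 2% of the HEP share, and corresponds to roughly three days of the full Edison machine.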
Slide13
Traceability
A Postgres db is a natural environment for keeping track of which codes (and versions) yielded which targets.
Codes will be stored in a version repository and well documented from the get-go.
Past efforts (in PTF) have allowed us to reuse and adapt pipelines for DECam, as well as to transition easily to new computing architectures.
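One way to record which code version yielded which targets is to give every target row a reference to a code-version row. The sketch below uses Python's built-in sqlite3 as a stand-in for Postgres so that it runs anywhere; the table names, column names and revision labels are all hypothetical and are not the project's schema.

```python
import sqlite3

# In-memory SQLite stands in for Postgres; every name below is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE code_version (
        id       INTEGER PRIMARY KEY,
        pipeline TEXT NOT NULL,   -- which selection code produced the rows
        revision TEXT NOT NULL,   -- tag or commit in the version repository
        UNIQUE (pipeline, revision)
    );
    CREATE TABLE target (
        id              INTEGER PRIMARY KEY,
        ra_deg          REAL NOT NULL,
        dec_deg         REAL NOT NULL,
        target_class    TEXT NOT NULL,
        code_version_id INTEGER NOT NULL REFERENCES code_version(id)
    );
""")

conn.executemany(
    "INSERT INTO code_version (id, pipeline, revision) VALUES (?, ?, ?)",
    [(1, "qso_selection", "v0.1"),
     (2, "qso_selection", "v0.2"),
     (3, "lrg_selection", "v0.1")],
)
conn.executemany(
    "INSERT INTO target (ra_deg, dec_deg, target_class, code_version_id) "
    "VALUES (?, ?, ?, ?)",
    [(150.00, 2.00, "QSO", 1),
     (150.05, 2.02, "QSO", 2),
     (151.50, 2.00, "LRG", 3),
     (152.10, 1.75, "QSO", 2)],
)

def provenance_summary(connection):
    """Count targets produced by each (pipeline, revision) pair."""
    return connection.execute("""
        SELECT v.pipeline, v.revision, COUNT(t.id)
        FROM code_version AS v
        LEFT JOIN target AS t ON t.code_version_id = v.id
        GROUP BY v.id
        ORDER BY v.pipeline, v.revision
    """).fetchall()

for pipeline, revision, n_targets in provenance_summary(conn):
    print(f"{pipeline} {revision}: {n_targets} targets")
```

Keeping the revision in its own table means a target list can be regenerated or audited by joining back to the exact code that produced it, rather than relying on file names or run dates.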
Slide14
Schedule & Milestones
Aug 2014: Preliminary database schema documented
Sep 2015: Preliminary parallel image analysis code working
Aug 2016: Implement pipelines to scale
May 2017: Create final db schema
May 2017: Integrate codes for target selection
Oct 2017: Verify and validate codes
Nov 2017: LBNL postdoc as interface between scientists and CSEs
[Timeline axis: 2015, 2016, 2017, 2018]
Slide15
Management
Basically a postdoc and a CSE over the course of the project, and myself. A large effort will be in interfacing with the Collaboration w.r.t. input software & data-taking and output processing & dbs.
Slide16
Conclusions
Don’t worry that this is larger and deeper than SDSS, from multiple telescopes, and needs to get done in ~3 years.
We have both the manpower and the existing expertise (both with surveys like SDSS -> BOSS and data processing at NERSC with PTF) to make sure this happens, and several fallback plans in place.