Presentation Transcript

Slide1

Target Selection Pipeline

Peter Nugent (LBNL/UCB), Level 3 Manager


Slide2

Outline

Overview of Deliverables, Input Data & Processing
Description of NERSC
Targeting Database
Scalability
Traceability
Development Plan, Schedule & Milestones

Slide3

Deliverables

The goal of the target selection pipeline is to deliver a set of catalogs, constructed from multi-filter observations from the optical through the IR over the DESI footprint, using the code Tractor to produce uniform photometry. Further, we will provide an interface to these catalogs such that targets (QSOs, ELGs, LRGs, standard stars and sky) can be selected easily and efficiently by the greater collaboration.

We will be making the code for processing the data and constructing the catalogs, as well as the catalogs themselves, available to the public.

We will NOT be serving up the data (pretty cut-out images, FITS files, etc.) à la SDSS to the general public. This is way beyond the scope of the Project.

Slide4

Inputs to us…

The Science Collaboration will provide us with:

The optical and infrared photometry
Data analysis software such as Tractor, Astromatic toolkits, first-pass image processing software, etc.
Selection algorithms for ELGs, LRGs, QSOs, standard stars and blank sky.

Slide5

Input Data: DECam, Bok & WISE

DECam (g,r,z) survey: 9k sq. deg. (6.7k approved); 1 TB of raw data per band per 1,000 sq. deg. The first-pass pipeline will come from current NOAO processing work (F. Valdes @ NOAO).

Bok (g,r,z): 5k sq. deg. of data; 0.4 TB of raw data per band per 1,000 sq. deg. The first-pass pipeline will come from the current SCUSS data reduction pipeline (Cheng Li @ SHAO).

WISE is in-hand, currently spinning at NERSC in the cosmo data repository (60 TB). It will grow as NEOWISE comes in.

We will re-write the pipelines utilizing existing C++ (MPI + OpenMP) Image-Container code written by R. Thomas. The goal is to be able to reprocess *everything* on Edison & Cori on a timescale of days.
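A back-of-the-envelope check of the raw-data volumes implied by those rates, assuming the quoted TB per band per 1,000 sq. deg. scales linearly with survey area:

```python
# Back-of-the-envelope raw-data volumes from the per-band rates quoted above,
# assuming the per-1,000-sq.-deg. rates scale linearly with survey area.

def raw_volume_tb(area_sq_deg, tb_per_band_per_1000_sq_deg, n_bands=3):
    """Total raw volume in TB for an n-band survey of the given area."""
    return (area_sq_deg / 1000.0) * tb_per_band_per_1000_sq_deg * n_bands

decam_tb = raw_volume_tb(9000, 1.0)   # ~27 TB for g,r,z over 9k sq. deg.
bok_tb = raw_volume_tb(5000, 0.4)     # ~6 TB for g,r,z over 5k sq. deg.
wise_tb = 60.0                        # already at NERSC in the cosmo repository

print(f"DECam ~{decam_tb:.0f} TB, Bok ~{bok_tb:.0f} TB, WISE {wise_tb:.0f} TB, "
      f"total ~{decam_tb + bok_tb + wise_tb:.0f} TB")
```

So the full raw dataset is of order 100 TB, which is why days-scale reprocessing at NERSC is a plausible goal.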

Slide6

NERSC Description

Hopper (N6): Cray XE6 Opteron w/ 153,216 cores
Edison (N7): Cray XC30 Intel Ivy Bridge w/ 133,824 cores
Cori (N8) will be one of the first large Intel KNL systems and will have unique data capabilities: 9,300 single-socket nodes with 60 cores per node and a burst buffer (NVRAM) for the entire memory footprint.

NERSC has a Global Filesystem viewable from all compute systems; this is where we will purchase our disk space (40 GB/s). There is very high-speed local scratch space on each of the big irons (168 GB/s).
240 PB tape archive
Data Transfer nodes using ESnet
Science Gateway and Database nodes for access outside NERSC

Access is through the general DOE-HEP call for compute time at NERSC. Also used for the spectroscopic pipeline.

Slide7

Targeting DB

The baseline db is a Postgres db with q3c for spatial queries, based on efforts within PTF for a real-time db (which is more demanding).

It runs at NERSC on their scidb nodes: 32-core nodes on a ZFS filesystem. These currently house the iPTF database, which has over 3M images and 3B detections queried in real time 24/7. The DESI needs are far below this.

ZFS is a combined file system and logical volume manager designed by Sun Microsystems. Its features include protection against data corruption, support for high storage capacities, efficient data compression, integration of the concepts of filesystem and volume management, snapshots and copy-on-write clones, continuous integrity checking and automatic repair, RAID-Z, and native NFSv4 ACLs.

Slide8

q3c

Q3C is a plugin for the PostgreSQL database, designed for working with large astronomical catalogs or any catalogs of objects on the sphere. Q3C allows you to perform fast circular, elliptical or polygonal searches on the sphere, as well as fast positional cross-matches and nearest-neighbor queries.

The ideas behind Q3C are described in Koposov et al. (2006).

[Figure: Q3C sky pixelization and cone selection]
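To make the query model concrete, here is a minimal sketch of the two q3c operations we lean on most, a cone search and a positional cross-match, via psycopg2. The connection parameters and the table/column names (candidates, optical, wise) are hypothetical placeholders, not the actual DESI schema.

```python
import psycopg2

# Minimal q3c sketch. Connection parameters and table/column names are
# hypothetical placeholders, not the actual DESI targeting schema.
conn = psycopg2.connect(dbname="desi_targets", host="scidb.example.nersc.gov")
cur = conn.cursor()

# Cone search: all objects within 0.1 deg of (ra, dec) = (150.0, 2.2).
# q3c_radial_query uses the q3c index, so this stays fast on billions of rows.
cur.execute(
    """
    SELECT id, ra, dec
    FROM candidates
    WHERE q3c_radial_query(ra, dec, %s, %s, %s)
    """,
    (150.0, 2.2, 0.1),
)
print(cur.fetchall())

# Positional cross-match of an optical catalog against WISE within 1 arcsec.
cur.execute(
    """
    SELECT o.id, w.id
    FROM optical AS o JOIN wise AS w
      ON q3c_join(o.ra, o.dec, w.ra, w.dec, 1.0 / 3600.0)
    """
)
```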

Slide9

DESI Processing Pipeline

[Flowchart of the DESI processing pipeline. Data sources: DECam/Bok/... at NCSA/NOAO/UA/..., transferred over ESnet (~500 GB/night) to the NERSC Data Transfer Node. Processing stages: image processing / detrending, astrometric solution, photometric solution, standard stars, deep image stacking, and catalog generation, yielding candidates and blank sky (~3.0B objects in a PSQL db). Outputs: public catalogs published to the web via Science Gateways, with public data access and a web UI. Legend: computing - I/O, heavy DB access, networking - data transfer.]

Slide10

DESI using “Tractor” photometry

Successfully combines data from different instruments with different PSF sizes for uniform photometry
Developed by Dustin Lang (CMU, on the SDSS and DESI teams)
The "Tractor" inference algorithm optimally fits raw pixel-level data in optical images + WISE images
Currently in use as the primary source of targeting for SDSS-IV, combining SDSS & WISE imaging data
Astrometry currently tied to the "USNO system"; will be GAIA in the future
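To illustrate the idea, here is a schematic in NumPy/SciPy (not the actual Tractor API): a single source model is rendered through each image's own PSF, and one shared position and flux are fit to all pixels at once by chi-square minimization. Fitting one model across instruments is what makes the photometry uniform.

```python
import numpy as np
from scipy.optimize import minimize

# Schematic of pixel-level forward-model photometry in the spirit of Tractor.
# Two "images" of the same point source, observed with different PSF widths.

def gaussian_psf(shape, x0, y0, sigma):
    """Unit-flux circular Gaussian PSF rendered on a pixel grid."""
    y, x = np.mgrid[0:shape[0], 0:shape[1]]
    g = np.exp(-((x - x0) ** 2 + (y - y0) ** 2) / (2 * sigma ** 2))
    return g / g.sum()

rng = np.random.default_rng(0)
shape, xs, ys, true_flux = (31, 31), 15.2, 14.8, 1000.0
sigmas, noise = [1.5, 3.0], 2.0      # per-image PSF widths and pixel noise
images = [true_flux * gaussian_psf(shape, xs, ys, s)
          + rng.normal(0, noise, shape) for s in sigmas]

def chi2(params):
    """Sum of chi-square over all images for one shared (x, y, flux)."""
    x, y, flux = params
    return sum(np.sum((im - flux * gaussian_psf(shape, x, y, s)) ** 2) / noise ** 2
               for im, s in zip(images, sigmas))

fit = minimize(chi2, x0=[15.0, 15.0, 500.0], method="Nelder-Mead")
print("fitted (x, y, flux):", fit.x)  # one flux, consistent across both PSFs
```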

Slide11

DESI Target selection pipeline

[Flowchart: WISE images and DECam images enter the system. The DES pipeline produces an object catalog (optical only) from the DECam images; the "Tractor" pipeline combines the object catalogs with the WISE images to produce object fluxes (optical + WISE). These feed the LRG, QSO and ELG selection pipelines, plus standard-star and blank-sky selection, which together produce the DESI target catalog. Changes relative to the SDSS-era flow: DECam images + the DES pipeline replace SDSS, and a targeting module for ELGs is added.]
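For flavor, here is a skeleton of what one selection module could look like. The magnitude columns and every numerical cut below are placeholders for illustration only; the real selection algorithms for ELGs, LRGs, QSOs, standard stars and sky are supplied by the Science Collaboration.

```python
import numpy as np

# Illustrative skeleton of a selection module. The columns and the numerical
# cuts are placeholders, NOT the real DESI selections, which come from the
# Science Collaboration.

def select_elg_like(catalog):
    """Toy grz color-box selection returning a boolean mask over the catalog."""
    g, r, z = catalog["g"], catalog["r"], catalog["z"]
    return (
        (g < 23.0)                              # placeholder depth cut
        & ((r - z) > 0.3) & ((r - z) < 1.5)     # placeholder color box
        & ((g - r) < 1.2 * (r - z) - 0.1)       # placeholder color-slope cut
    )

# Example with a toy catalog of three objects:
catalog = {
    "g": np.array([22.1, 23.5, 21.8]),
    "r": np.array([21.6, 22.9, 21.0]),
    "z": np.array([20.8, 22.0, 20.5]),
}
print(select_elg_like(catalog))  # mask of ELG-like placeholder targets
```

Each module returns a mask over the Tractor catalog, so swapping in the Collaboration's algorithms is a drop-in change.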

Slide12

Scalability

Edison is NERSC's newest supercomputer, a Cray XC30, with a peak performance of 2.57 petaflops, 133,824 compute cores, 357 terabytes of memory, and 7.56 petabytes of disk. NERSC currently provides >3B CPU hours/yr for DOE; HEP has ~17% of this. Cori and Hopper++ will increase this compute power in 2016.

We will obtain an allocation sufficient to re-process all the data several times.

Note that DES, BOSS & LSST have allocations of ~10M CPU hrs/yr each. A comparable allocation is sufficient for our needs, and has always been granted for projects of this size in DOE-HEP.
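A quick sanity check on those numbers:

```python
# Sanity check on the allocation figures quoted above.
nersc_total_hrs = 3.0e9   # > 3B CPU hours/yr for DOE
hep_share = 0.17          # HEP's ~17% share
per_project_hrs = 10.0e6  # ~10M CPU hrs/yr (DES, BOSS, LSST each)

hep_hrs = nersc_total_hrs * hep_share
print(f"HEP pool: ~{hep_hrs / 1e6:.0f}M CPU hrs/yr")            # ~510M
print(f"one project: {per_project_hrs / hep_hrs:.1%} of pool")  # ~2%
```

A DES-scale allocation is roughly 2% of the HEP pool, which is why granting it is routine.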

Slide13

Traceability

The Postgres db is a natural environment for keeping track of which codes (and versions) yielded which targets. Codes will be stored in a version repository and well documented from the get-go.

Past efforts (in PTF) have allowed us to re-use and adapt pipelines for DECam, as well as easily make the transition to new computing architectures.
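As a sketch of how that traceability could be wired into the db itself, here is one hypothetical pair of tables (names and columns are placeholders, not the actual DESI schema): every target row carries a foreign key to the exact code version that produced it.

```python
import psycopg2

# Hypothetical provenance schema sketch; table/column names are placeholders,
# not the actual DESI targeting schema.
conn = psycopg2.connect(dbname="desi_targets")
cur = conn.cursor()

cur.execute("""
    CREATE TABLE IF NOT EXISTS code_versions (
        id        serial PRIMARY KEY,
        name      text NOT NULL,     -- e.g. 'elg_selection'
        git_hash  text NOT NULL,     -- exact commit that produced the targets
        run_date  timestamptz NOT NULL DEFAULT now()
    );
    CREATE TABLE IF NOT EXISTS targets (
        id          bigserial PRIMARY KEY,
        ra          double precision NOT NULL,
        "dec"       double precision NOT NULL,
        target_type text NOT NULL,   -- ELG / LRG / QSO / STD / SKY
        code_id     integer NOT NULL REFERENCES code_versions(id)
    );
""")
conn.commit()
```

With this layout, "which code version selected this target?" is a single join rather than an archaeology exercise.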

Slide14

Schedule & Milestones

Aug 2014: Preliminary database schema documented
Sep 2015: Preliminary parallel image analysis code working
Aug 2016: Implement pipelines to scale
May 2017: Create final db schema
May 2017: Integrate codes for target selection
Oct 2017: Verify and validate codes
Nov 2017: LBNL postdoc as interface between scientists and CSEs

[Timeline chart spanning 2015-2018]

Slide15

Management

Basically a postdoc and a CSE over the course of the project, plus myself. A large effort will be in interfacing with the Collaboration with respect to input software & data-taking and output processing & db's.

Slide16

Conclusions

Don't worry that this is larger and deeper than SDSS, from multiple telescopes, and needs to get done in ~3 years.

We have both the manpower and the existing expertise (both with surveys like SDSS -> BOSS and data processing at NERSC with PTF) to make sure this happens, and several fallback plans in place.