SCALING FEATURE GENERATION
FROM PROTOTYPING TO PRODUCTION AT REWE
REWE Systems GmbH | March 2019 | Benjamin Greve

AGENDA
1 / Introduction
2 / Example Project: Predicting Brand Market Fit
3 / Feature Generation in Prototyping - The Problems
4 / Moving From Prototyping to Production - Lessons Learned

INTRODUCTION / ME
Benjamin Greve, Data Scientist at REWE Systems
• Mathematician
• Working in data science projects for 5+ years (2 years at REWE)
• Likes hiking, climbing, dancing, cooking (especially the eating part)

INTRODUCTION / REWE GROUP
Source: https://www.rewe-group-geschaeftsbericht.de, 14.03.2019

REWE Group facts (2017):
• 15,300 markets
• 57.8 billion euro total revenue
• 345,000 employees

INTRODUCTION / REWE SYSTEMS
• 1 billion data sets every day
• 30 locations
• 200,000 users
• 30,000 cash registers
• 1,200 IT specialists
• 7,500 markets

EXAMPLE PROJECT: PREDICTING BRAND MARKET FIT
We want to predict how well a brand will be received in a particular market to assist category managers in selecting the right brands for each market.

BRAND / CATEGORY: A set of products that form a logical group. Can contain between one and a few thousand products.

FEATURES x1, x2, …
• Popularity of wider category in market
• Number of competing articles
• Location of market, incl. demographic information
• …

TARGET VARIABLE y: Brand popularity in the current market compared to the average across all REWE markets
Technical definition: axis marked 0 and 1, labeled "less popular" and "more popular"
Example values: y = 1.13, y = 0.92, y = 1.27

FEATURE GENERATION IS A CENTRAL PART OF THE DATA SCIENCE WORKFLOW
CRISP-DM: Cross-industry standard process for data mining
Diagram labels: Data Warehouse, ?, Feature Generation, Predictive model, Predictions
"Everything you do with the data before you apply a predictive model"
https://en.wikipedia.org/wiki/Cross-industry_standard_process_for_data_mining#/media/File:CRISP-DM_Process_Diagram.png

A TYPICAL RESULT OF PROTOTYPING IS A VERY LONG SQL SCRIPT
OUTPUT #1: Very long SQL script
• 2000+ lines of SQL
• 40+ intermediate tables
• interdependencies
• inconsistent naming, chaotic code style
• not optimized for performance
• not robust
→ not production-ready or scalable
OUTPUT #2: Very long R or Python script
• Different topic for another talk
Prototyping performed by Data Scientists

AVOID LONG, MONOLITHIC SQL SCRIPTS
Weaknesses:
• Duplicate/redundant SQL for every feature
• Parameters hidden within the scripts
• Adding new features is hard due to interdependencies
Diagram labels: one very long SQL script → training data table; two very long, redundant SQL scripts → training data table, scoring data table

USE A MODULAR FEATURE GENERATION INSTEAD
Strengths:
✓ Highly modular and scalable
✓ Parameters are centrally defined
✓ Feature scripts are mostly independent, making it easy to add new features

Input script: creates the input table, i.e. the table of all market-brand combinations for which to calculate features
Derived helper tables:
• Distinct list of brands
• Mapping brands to articles
• …
Feature scripts: Feature script A, Feature script B, Feature script C
Final step: merge features into feature store
Feature Store columns: Market | Brand | Feature A1 | Feature A2 | Feature B1 | …
(sample rows as extracted, column boundaries lost: 1000000120001, 1000000220001, 1000000320001)

DECOMPOSING COMPLEX SQL WITHIN KNIME LEADS TO COMPLEX WORKFLOWS
✓ Visual, KNIME native
• Gets too complex as SQL logic grows
• Hard to translate into standard data warehouse ETLs
✓ Code
✓ Powerful version control (releases)
✓ Analysts fluent in SQL
• Harder to explain/show
(Figure: example workflow from the KNIME Examples server)
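
As a rough illustration of the code-based, modular layout (one input script, independent feature scripts, a merge into a feature store), here is a minimal sketch using Python's built-in sqlite3 module. The table names (sales, input_table, feature_a, feature_b, feature_store), the two feature definitions, and the sample rows are all assumptions made for this example; the deck does not show the actual REWE scripts or schema.

```python
import sqlite3

# Hypothetical source data; the real warehouse schema is not shown in the slides.
SETUP_SQL = """
CREATE TABLE sales (market INTEGER, brand INTEGER, article TEXT, revenue REAL);
INSERT INTO sales VALUES
    (1, 10, 'a1', 5.0),
    (1, 10, 'a2', 7.0),
    (1, 20, 'b1', 3.0),
    (2, 10, 'a1', 4.0);
"""

# Input script: all market-brand combinations for which features are calculated.
INPUT_SQL = """
CREATE TABLE input_table AS
SELECT DISTINCT market, brand FROM sales;
"""

# Feature scripts: each one depends only on the input table and the source data,
# so a new feature can be added without touching the others.
FEATURE_SCRIPTS = {
    "feature_a": """
        CREATE TABLE feature_a AS
        SELECT i.market, i.brand, SUM(s.revenue) AS brand_revenue
        FROM input_table i
        JOIN sales s ON s.market = i.market AND s.brand = i.brand
        GROUP BY i.market, i.brand;
    """,
    "feature_b": """
        CREATE TABLE feature_b AS
        SELECT i.market, i.brand, COUNT(DISTINCT s.article) AS n_articles
        FROM input_table i
        JOIN sales s ON s.market = i.market AND s.brand = i.brand
        GROUP BY i.market, i.brand;
    """,
}

# Merge step: left-join every feature table onto the input table.
MERGE_SQL = """
CREATE TABLE feature_store AS
SELECT i.market, i.brand, a.brand_revenue, b.n_articles
FROM input_table i
LEFT JOIN feature_a a ON a.market = i.market AND a.brand = i.brand
LEFT JOIN feature_b b ON b.market = i.market AND b.brand = i.brand;
"""


def build_feature_store(conn):
    """Run input script, feature scripts, and merge step; return the merged rows."""
    conn.executescript(SETUP_SQL)
    conn.executescript(INPUT_SQL)
    for script in FEATURE_SCRIPTS.values():
        conn.executescript(script)
    conn.executescript(MERGE_SQL)
    return conn.execute(
        "SELECT market, brand, brand_revenue, n_articles "
        "FROM feature_store ORDER BY market, brand"
    ).fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    for row in build_feature_store(connection):
        print(row)
    connection.close()
```

Because the merge step only joins on the market-brand key, adding a hypothetical feature_c would mean one new entry in FEATURE_SCRIPTS and one more LEFT JOIN, with no changes to the existing feature scripts.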

A DEPLOYMENT WORKFLOW PULLS SQL SCRIPTS FROM VERSION CONTROL TO KNIME SERVER
A fully automated deployment workflow copies snapshot or release versions from the version control system to the KNIME Server.
Download SQL files from version control system

IT'S VERY SIMPLE TO EXECUTE THE DEPLOYED SQL FILES
• To perform feature generation, execute a set of SQL scripts in a given order
• Just write down the names of the files to be executed in the correct order and run the workflow
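
The idea of running a hand-written, ordered list of SQL files can be sketched outside KNIME as well. The following is only an illustrative stand-in, not the KNIME workflow from the talk: the file names, the SQL inside them, and the helper `run_sql_files` are hypothetical, and an in-memory SQLite database replaces the real warehouse.

```python
import sqlite3
import tempfile
from pathlib import Path


def run_sql_files(conn, directory, ordered_names):
    """Execute the named SQL files from `directory` in exactly the given order."""
    executed = []
    for name in ordered_names:
        path = Path(directory) / name
        conn.executescript(path.read_text(encoding="utf-8"))
        executed.append(name)
    return executed


def demo():
    """Write three hypothetical scripts to a temp directory and run them in order."""
    scripts = {
        "01_input.sql": "CREATE TABLE input_table (market INTEGER, brand INTEGER);"
                        "INSERT INTO input_table VALUES (1, 10), (2, 10);",
        "02_feature_a.sql": "CREATE TABLE feature_a AS "
                            "SELECT market, brand, market * 2 AS a1 FROM input_table;",
        "03_merge.sql": "CREATE TABLE feature_store AS "
                        "SELECT i.market, i.brand, a.a1 FROM input_table i "
                        "LEFT JOIN feature_a a "
                        "ON a.market = i.market AND a.brand = i.brand;",
    }
    # The execution order is written down by hand, as described on the slide.
    order = ["01_input.sql", "02_feature_a.sql", "03_merge.sql"]

    with tempfile.TemporaryDirectory() as tmp:
        for name, sql in scripts.items():
            (Path(tmp) / name).write_text(sql, encoding="utf-8")
        conn = sqlite3.connect(":memory:")
        try:
            run_sql_files(conn, tmp, order)
            return conn.execute(
                "SELECT market, brand, a1 FROM feature_store ORDER BY market"
            ).fetchall()
        finally:
            conn.close()


if __name__ == "__main__":
    print(demo())
```

Keeping the order in a plain list makes the dependency between scripts explicit: running 02_feature_a.sql before 01_input.sql would fail because input_table does not exist yet.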

HELPFUL METANODES EXAMPLE: GET DATABASE CONNECTION
Useful metanodes help to implement a staging concept, e.g.:
Select staging environment → obtain connection to the corresponding database
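
A staging lookup of this kind could be approximated in plain code as follows. This is a sketch under stated assumptions, not the KNIME metanode itself: the environment names (dev, test, prod), the `STAGING_DATABASES` mapping, and the function `get_database_connection` are hypothetical, and in-memory SQLite databases stand in for the real staging databases.

```python
import sqlite3

# Hypothetical staging environments mapped to database targets. In-memory
# SQLite databases stand in for the real development/test/production databases.
STAGING_DATABASES = {
    "dev": ":memory:",
    "test": ":memory:",
    "prod": ":memory:",
}


def get_database_connection(environment):
    """Return a connection to the database configured for `environment`."""
    try:
        target = STAGING_DATABASES[environment]
    except KeyError:
        known = ", ".join(sorted(STAGING_DATABASES))
        raise ValueError(
            f"unknown staging environment {environment!r}; expected one of: {known}"
        ) from None
    return sqlite3.connect(target)


if __name__ == "__main__":
    conn = get_database_connection("dev")
    print(conn.execute("SELECT 1").fetchone())
    conn.close()
```

Centralizing the lookup means the feature scripts never name a database themselves; switching from one staging environment to another only changes the argument passed in.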

NEXT STEPS
Currently:
• specify scripts manually
• order manually to take dependencies into account
Next level: Feature dependency graph
• Automatic dependency resolution
• Already built into KNIME!
• specify features in arbitrary order

THANKS
Thanks for your