Data collection methods to produce new enterprise variables using new data sources

UNITED NATIONS ECONOMIC COMMISSION FOR EUROPE, CONFERENCE OF EUROPEAN STATISTICIANS. UNECE Expert meeting on Statistical Data Collection, 12-14 June 2023. Scalfati F., Bianchi G., Salamone S.

Presentation Transcript

1. Data collection methods to produce new enterprise variables using new data sources. UNITED NATIONS ECONOMIC COMMISSION FOR EUROPE, CONFERENCE OF EUROPEAN STATISTICIANS. UNECE Expert meeting on Statistical Data Collection (12-14 June 2023). Scalfati F., Bianchi G., Salamone S., ISTAT (Italy). https://statswiki.unece.org/x/MADUE

2. Outline: Objective; Data Collection strategy; Data Collection Process; Experimental results; Conclusions.

3. Objective. The aim of this work is to produce a statistical framework able to extract detailed information on the innovative capacity of enterprises and to produce new statistical variables by means of a data analytics approach.

4. Data Collection strategy (1/2). This work combines multiple sources (big data, survey data and registers) to produce indicators that profile enterprises. In particular, identifying the patenting enterprises allows them to be linked to their structural characteristics and provides additional dimensions for this goal. The data source used is the most complete and up-to-date database of patents published by the European Patent Office (EPO), which draws its data from the EPO's master bibliographic database. The target data in the EPO database are applicants based in Italy with one or more published patents.

5. Data Collection strategy (2/2). The reference population for the planned statistical output is the set of active enterprises in the Italian National Business Register (ASIA). The proposed approach for collecting statistical information on the innovative capacity of enterprises acquires European patent publications in text format using APIs and web scraping techniques. It integrates the extracted information with statistical registers and surveys and produces new statistical output using text mining and machine-learning techniques.
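The web scraping step mentioned above can be illustrated with a minimal sketch that pulls the visible text out of an HTML page using only the Python standard library. The slides do not describe the actual tooling, so the class name `VisibleTextExtractor`, the helper `extract_text` and the sample page fragment are all assumptions made for illustration; the fragment is invented and does not reproduce any real EPO page layout.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collect visible text from an HTML page, skipping script and style blocks."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped tag
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self._chunks.append(text)

    def text(self):
        return " ".join(self._chunks)


def extract_text(html):
    """Return the visible text of an HTML document as one string."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


# Hypothetical page fragment, invented for illustration only.
SAMPLE_HTML = """
<html><head><style>p {color: red}</style></head>
<body><h1>Sample title</h1><p>Applicant: ACME S.p.A.</p>
<script>var x = 1;</script><p>Claims: 1. A device.</p></body></html>
"""

if __name__ == "__main__":
    print(extract_text(SAMPLE_HTML))
```

The extracted string is the kind of raw text that a later text mining step would work on; fetching the pages themselves (through an API or an HTTP client) is deliberately left out of the sketch.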

6. Data Collection Process

7. Data characteristics. The procedure collects the following macro variables: name of the applicant, owner and inventor; localization information on the residence of the three subjects; type of patent; date of publication of the patent; patent filing date; IPC code (International Patent Classification). All data collected refer to the geographic origin of the applicant/owner (country of residence).
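One way to picture the macro variables listed above is as a small record type. The sketch below is only an assumption about how such a record might be laid out: the field names, the `publication_number` identifier, the use of two-letter country codes and the example values are hypothetical and are not taken from the slides.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Party:
    """An applicant, owner or inventor with its place of residence."""
    name: str
    country: str  # assumption: two-letter country code of residence


@dataclass(frozen=True)
class PatentRecord:
    publication_number: str  # hypothetical identifier, not named in the slides
    applicant: Party
    owner: Party
    inventor: Party
    patent_type: str
    publication_date: date
    filing_date: date
    ipc_code: str  # International Patent Classification code


def is_italian_applicant(record):
    """True when the applicant's country of residence is Italy."""
    return record.applicant.country == "IT"


# Invented example values, for illustration only.
EXAMPLE = PatentRecord(
    publication_number="P0000001",
    applicant=Party("ACME S.p.A.", "IT"),
    owner=Party("ACME S.p.A.", "IT"),
    inventor=Party("Mario Rossi", "IT"),
    patent_type="application",
    publication_date=date(2021, 5, 12),
    filing_date=date(2019, 11, 4),
    ipc_code="H04L 9/32",
)

if __name__ == "__main__":
    print(is_italian_applicant(EXAMPLE), EXAMPLE.publication_date.year)
```

Keeping the publication date as a real date makes it easy to derive the publication year needed later for matching against the business register.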

8. Data integration. The integration step is based on a record linkage procedure that matches micro-data on patent applications from the EPO server with the data available from the Italian Official Business Register (ASIA). Availability of data on an annual basis is a prerequisite for the subsequent integration phase. To match the two sources, the year of publication of the patent must be known in order to identify whether the company was active in the reference year. The data collection procedure must extract complete information, without duplicates, to allow unambiguous identification.
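The integration logic described above (deduplicate, match on name, check activity in the publication year) can be sketched as follows. The slides do not specify the linkage method, so exact matching on a normalized name is a simplifying assumption here, and the dictionary keys, the `LEGAL_FORMS` set and the sample data are all hypothetical.

```python
import re

# Assumption: a few Italian legal-form suffixes to ignore when comparing names.
LEGAL_FORMS = {"spa", "srl", "snc", "sas"}


def normalize_name(name):
    """Lowercase, drop punctuation and legal-form tokens."""
    cleaned = re.sub(r"[^a-z0-9 ]", "", name.lower().replace(".", ""))
    tokens = [t for t in cleaned.split() if t not in LEGAL_FORMS]
    return " ".join(tokens)


def deduplicate(patents):
    """Keep the first record for each publication number."""
    seen = set()
    unique = []
    for patent in patents:
        key = patent["publication_number"]
        if key not in seen:
            seen.add(key)
            unique.append(patent)
    return unique


def link(patents, register):
    """Return (publication_number, enterprise_id) pairs for unambiguous matches."""
    index = {}
    for enterprise in register:
        index.setdefault(normalize_name(enterprise["name"]), []).append(enterprise)
    links = []
    for patent in deduplicate(patents):
        candidates = index.get(normalize_name(patent["applicant"]), [])
        # Keep only enterprises that were active in the publication year.
        active = [e for e in candidates
                  if patent["publication_year"] in e["active_years"]]
        if len(active) == 1:  # skip missing or ambiguous matches
            links.append((patent["publication_number"], active[0]["enterprise_id"]))
    return links


# Invented sample data, for illustration only.
PATENTS = [
    {"publication_number": "P1", "applicant": "ACME S.p.A.", "publication_year": 2021},
    {"publication_number": "P1", "applicant": "ACME S.p.A.", "publication_year": 2021},
    {"publication_number": "P2", "applicant": "Rossi Meccanica S.r.l.", "publication_year": 2019},
    {"publication_number": "P3", "applicant": "Unknown Labs", "publication_year": 2021},
]
REGISTER = [
    {"enterprise_id": "E1", "name": "Acme SpA", "active_years": {2020, 2021}},
    {"enterprise_id": "E2", "name": "ROSSI MECCANICA SRL", "active_years": {2020, 2021}},
]

if __name__ == "__main__":
    print(link(PATENTS, REGISTER))
```

In this sample only P1 is linked: its duplicate is dropped, P2 names an enterprise that was not active in 2019, and P3 has no counterpart in the register. A production linkage would likely need fuzzy or probabilistic matching rather than exact name equality.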

9. Experimental results. In the case study, 8000 URLs were extracted from the EPO database. Each record is composed of about 40 variables: proponent (applicant, owner, inventor), personal data, type of patent, patent features, references, claims. The data refer to Italian patents. The procedure acquired the related patents from the European Publication Server. Some output indicators: rate of proponents, rate of patents, territorial distribution, thematic distribution.
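As a rough illustration of the territorial and thematic distributions mentioned above, the sketch below computes shares of patents by region and by IPC section (the first letter of an IPC code). The slides do not define how the indicators are calculated, so this is one plausible reading; the record layout, the `region` field and the sample values are assumptions.

```python
from collections import Counter


def distribution(values):
    """Return the share of each distinct value; empty input gives an empty dict."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {key: count / total for key, count in counts.items()}


def ipc_section(ipc_code):
    """Return the IPC section, i.e. the leading letter of the code."""
    return ipc_code.strip()[0].upper()


# Invented sample records, for illustration only.
RECORDS = [
    {"region": "Lazio", "ipc_code": "H04L 9/32"},
    {"region": "Lazio", "ipc_code": "A61K 31/00"},
    {"region": "Lombardia", "ipc_code": "H01L 21/00"},
    {"region": "Piemonte", "ipc_code": "B60K 1/00"},
]

territorial = distribution(r["region"] for r in RECORDS)
thematic = distribution(ipc_section(r["ipc_code"]) for r in RECORDS)

if __name__ == "__main__":
    print(territorial)
    print(thematic)
```

Grouping by IPC section is the coarsest thematic breakdown; a finer one could group by a longer prefix of the code.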

10. Conclusions. The innovative capacity of enterprises and institutions can be described with indicators that provide the profile of the enterprises. Identifying the patenting enterprises makes it possible to produce new information by linking them with structural and economic characteristics. This automatic approach reduces the burden on enterprises. Patent statistics are effective proxies for measuring and monitoring innovative activities spread across a territory. It is a difficult task because it extracts text from websites and uses text mining and machine learning techniques to produce new statistical variables in a reasonable time.

11. Contacts: Scalfati Francesco (scalfati@istat.it), Bianchi Gianpiero (gianbia@istat.it), Sergio Salamone (sesalamo@istat.it)