TurboLift: fast accuracy lifting for historical data recovery
Abstract: Historical data are frequently involved in situations where the available reports on time series are temporally aggregated at different levels, e.g., the monthly counts of people infected with measles. In real databases, the time periods covered by different reports can have overlaps (i.e., time-ticks covered by more than one report) or gaps (i.e., time-ticks not covered by any report). However, data analysis and machine learning models require reconstructing the historical events at a finer granularity, e.g., the weekly patient counts, for elaborate analysis and prediction. Thus, data disaggregation algorithms are becoming increasingly important in various domains. Time series disaggregation methods commonly utilize domain knowledge about the data, e.g., smoothness, periodicity, or sparsity, to improve the reconstruction accuracy. In this paper, we propose a novel approach, called TurboLift, which aims to improve the quality of the solutions provided by existing disaggregation methods. Starting from a solution produced by a specific method, TurboLift finds a new solution that reduces the disaggregation error and is close to the initial one. We derive a closed-form solution to the proposed formulation of TurboLift that enables us to obtain an accurate reconstruction analytically, without performing resource- and time-consuming iterations. Experiments on real data from different domains showcase the effectiveness of TurboLift in terms of disaggregation error, and outlier and anomaly detection.
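The abstract does not state the TurboLift objective, so the following is only a minimal sketch of the general idea it describes (start from an initial solution, reduce the disaggregation error, stay close to the start). It assumes a ridge-style objective, minimize ||A x - y||^2 + lam * ||x - x0||^2, whose closed form is x = x0 + A^T (A A^T + lam I)^(-1) (y - A x0). The names `aggregation_matrix`, `lift`, `error`, and `lam`, as well as the toy reports, are hypothetical and not taken from the paper.

```python
# Illustrative sketch only: NOT the formulation from the TurboLift paper.
# Assumed objective: minimize ||A x - y||^2 + lam * ||x - x0||^2.

def aggregation_matrix(reports, n_ticks):
    """One row per report; entry 1.0 where the report covers the time-tick."""
    return [[1.0 if start <= t < end else 0.0 for t in range(n_ticks)]
            for start, end in reports]

def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def solve(M, b):
    """Solve M z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b_i] for row, b_i in zip(M, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (aug[i][n] - s) / aug[i][i]
    return z

def lift(A, y, x0, lam=1e-3):
    """Closed form of the assumed objective, no iterations needed."""
    m = len(A)
    residual = [y_i - a_i for y_i, a_i in zip(y, matvec(A, x0))]
    # Gram matrix A A^T plus lam on the diagonal (m x m, one row per report).
    gram = [[sum(p * q for p, q in zip(A[i], A[j])) + (lam if i == j else 0.0)
             for j in range(m)] for i in range(m)]
    z = solve(gram, residual)
    return [x0_t + sum(A[i][t] * z[i] for i in range(m))
            for t, x0_t in enumerate(x0)]

def error(A, x, y):
    """Disaggregation error: Euclidean norm of A x - y."""
    return sum((a - b) ** 2 for a, b in zip(matvec(A, x), y)) ** 0.5

# Toy data: 6 time-ticks, two reports given as (start, end) with end exclusive.
# Tick 2 is covered by both reports (overlap); tick 5 by none (gap).
reports = [(0, 3), (2, 5)]
y = [6.0, 9.0]            # reported aggregated counts
x0 = [1.0] * 6            # initial fine-grained guess from some other method
A = aggregation_matrix(reports, n_ticks=6)
x = lift(A, y, x0)
print(error(A, x0, y), error(A, x, y))
```

Under this assumed objective, a time-tick that no report covers (tick 5 here) keeps its initial value, because the correction only moves along the rows of the aggregation matrix.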
Field | Value |
---|---|
Media type: | Article |
Year of publication: | 2020 |
Published: | 2020 |
Contained in: | To the parent record - volume:29 |
Contained in: | The VLDB journal - 29(2020), no. 5, 9 March, pages 1129-1148 |
Language: | English |
Contributors: | Yang, Fan [author] |
Links: | Full text [license required] |
Notes: | © Springer-Verlag GmbH Germany, part of Springer Nature 2020. corrected publication 2023. Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law. |
doi: | 10.1007/s00778-020-00609-6 |
PPN (catalog ID): | OLC2118993528 |
LEADER | 01000caa a22002652 4500 | ||
---|---|---|---|
001 | OLC2118993528 | ||
003 | DE-627 | ||
005 | 20240118094548.0 | ||
007 | tu | ||
008 | 230504s2020 xx ||||| 00| ||eng c | ||
024 | 7 | |a 10.1007/s00778-020-00609-6 |2 doi | |
035 | |a (DE-627)OLC2118993528 | ||
035 | |a (DE-He213)s00778-020-00609-6-p | ||
040 | |a DE-627 |b ger |c DE-627 |e rakwb | ||
041 | |a eng | ||
082 | 0 | 4 | |a 004 |q VZ |
100 | 1 | |a Yang, Fan |e verfasserin |4 aut | |
245 | 1 | 0 | |a TurboLift: fast accuracy lifting for historical data recovery |
264 | 1 | |c 2020 | |
336 | |a Text |b txt |2 rdacontent | ||
337 | |a ohne Hilfsmittel zu benutzen |b n |2 rdamedia | ||
338 | |a Band |b nc |2 rdacarrier | ||
500 | |a © Springer-Verlag GmbH Germany, part of Springer Nature 2020. corrected publication 2023. Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law. | ||
520 | |a Abstract Historical data are frequently involved in situations where the available reports on time series are temporally aggregated at different levels, e.g., the monthly counts of people infected with measles. In real databases, the time periods covered by different reports can have overlaps (i.e., time-ticks covered by more than one report) or gaps (i.e., time-ticks not covered by any report). However, data analysis and machine learning models require reconstructing the historical events at a finer granularity, e.g., the weekly patient counts, for elaborate analysis and prediction. Thus, data disaggregation algorithms are becoming increasingly important in various domains. Time series disaggregation methods commonly utilize domain knowledge about the data, e.g., smoothness, periodicity, or sparsity, to improve the reconstruction accuracy. In this paper, we propose a novel approach, called TurboLift, which aims to improve the quality of the solutions provided by existing disaggregation methods. Starting from a solution produced by a specific method, TurboLift finds a new solution that reduces the disaggregation error and is close to the initial one. We derive a closed-form solution to the proposed formulation of TurboLift that enables us to obtain an accurate reconstruction analytically, without performing resource- and time-consuming iterations. Experiments on real data from different domains showcase the effectiveness of TurboLift in terms of disaggregation error, and outlier and anomaly detection. | ||
650 | 4 | |a Historical data | |
650 | 4 | |a Information fusion | |
650 | 4 | |a Information disaggregation | |
700 | 1 | |a Almutairi, Faisal M. |4 aut | |
700 | 1 | |a Song, Hyun Ah |4 aut | |
700 | 1 | |a Faloutsos, Christos |4 aut | |
700 | 1 | |a Sidiropoulos, Nicholas D. |4 aut | |
700 | 1 | |a Zadorozhny, Vladimir |4 aut | |
773 | 0 | 8 | |i Enthalten in |t The VLDB journal |d Springer Berlin Heidelberg, 1992 |g 29(2020), 5 vom: 09. März, Seite 1129-1148 |w (DE-627)170933059 |w (DE-600)1129061-4 |w (DE-576)032856466 |x 1066-8888 |7 nnns |
773 | 1 | 8 | |g volume:29 |g year:2020 |g number:5 |g day:09 |g month:03 |g pages:1129-1148 |
856 | 4 | 1 | |u https://doi.org/10.1007/s00778-020-00609-6 |z lizenzpflichtig |3 Volltext |
912 | |a GBV_USEFLAG_A | ||
912 | |a SYSFLAG_A | ||
912 | |a GBV_OLC | ||
912 | |a SSG-OLC-MAT | ||
912 | |a GBV_ILN_30 | ||
912 | |a GBV_ILN_2018 | ||
912 | |a GBV_ILN_4277 | ||
951 | |a AR | ||
952 | |d 29 |j 2020 |e 5 |b 09 |c 03 |h 1129-1148 |