Publication details - A unified view of density-based methods for semi-supervised clustering and classification

A unified view of density-based methods for semi-supervised clustering and classification

© The Author(s) 2019, corrected publication 2020.

Semi-supervised learning is drawing increasing attention in the era of big data, as the gap between the abundance of cheap, automatically collected unlabeled data and the scarcity of labeled data that are laborious and expensive to obtain is dramatically increasing. In this paper, we first introduce a unified view of density-based clustering algorithms. We then build upon this view and bridge the areas of semi-supervised clustering and classification under a common umbrella of density-based techniques. We show that there are close relations between density-based clustering algorithms and the graph-based approach for transductive classification. These relations are then used as a basis for a new framework for semi-supervised classification based on building-blocks from density-based clustering. This framework is not only efficient and effective, but it is also statistically sound. In addition, we generalize the core algorithm in our framework, HDBSCAN*, so that it can also perform semi-supervised clustering by directly taking advantage of any fraction of labeled data that may be available. Experimental results on a large collection of datasets show the advantages of the proposed approach both for semi-supervised classification as well as for semi-supervised clustering.

Media type:	E-article

Year of publication:	2019
Published:	2019

Contained in:	To the full record - volume:33
Contained in:	Data mining and knowledge discovery - 33(2019), no. 6, day 15, pages 1894-1952

Language:	English

Contributors:	Castro Gertrudes, Jadson [author]; Zimek, Arthur [author]; Sander, Jörg [author]; Campello, Ricardo J G B [author]

Links:	Full text

Subjects:	Density-based clustering; Journal Article; Semi-supervised classification; Semi-supervised clustering

Notes:	Date Revised 28.09.2020; published: Print-Electronic; Citation Status PubMed-not-MEDLINE

doi:	10.1007/s10618-019-00651-1

PPN (catalog ID):	NLM314052313

Internal format


LEADER	01000naa a22002652 4500
001	NLM314052313
003	DE-627
005	20231225151911.0
007	cr uuu---uuuuu
008	231225s2019 xx \|\|\|\|\|o 00\| \|\|eng c
024	7		\|a 10.1007/s10618-019-00651-1 \|2 doi
028	5	2	\|a pubmed24n1046.xml
035			\|a (DE-627)NLM314052313
035			\|a (NLM)32831623
040			\|a DE-627 \|b ger \|c DE-627 \|e rakwb
041			\|a eng
100	1		\|a Castro Gertrudes, Jadson \|e verfasserin \|4 aut
245	1	2	\|a A unified view of density-based methods for semi-supervised clustering and classification
264		1	\|c 2019
336			\|a Text \|b txt \|2 rdacontent
337			\|a Computermedien \|b c \|2 rdamedia
338			\|a Online-Ressource \|b cr \|2 rdacarrier
500			\|a Date Revised 28.09.2020
500			\|a published: Print-Electronic
500			\|a Citation Status PubMed-not-MEDLINE
520			\|a © The Author(s) 2019, corrected publication 2020.
520			\|a Semi-supervised learning is drawing increasing attention in the era of big data, as the gap between the abundance of cheap, automatically collected unlabeled data and the scarcity of labeled data that are laborious and expensive to obtain is dramatically increasing. In this paper, we first introduce a unified view of density-based clustering algorithms. We then build upon this view and bridge the areas of semi-supervised clustering and classification under a common umbrella of density-based techniques. We show that there are close relations between density-based clustering algorithms and the graph-based approach for transductive classification. These relations are then used as a basis for a new framework for semi-supervised classification based on building-blocks from density-based clustering. This framework is not only efficient and effective, but it is also statistically sound. In addition, we generalize the core algorithm in our framework, HDBSCAN*, so that it can also perform semi-supervised clustering by directly taking advantage of any fraction of labeled data that may be available. Experimental results on a large collection of datasets show the advantages of the proposed approach both for semi-supervised classification as well as for semi-supervised clustering
650		4	\|a Journal Article
650		4	\|a Density-based clustering
650		4	\|a Semi-supervised classification
650		4	\|a Semi-supervised clustering
700	1		\|a Zimek, Arthur \|e verfasserin \|4 aut
700	1		\|a Sander, Jörg \|e verfasserin \|4 aut
700	1		\|a Campello, Ricardo J G B \|e verfasserin \|4 aut
773	0	8	\|i Enthalten in \|t Data mining and knowledge discovery \|d 2003 \|g 33(2019), 6 vom: 15., Seite 1894-1952 \|w (DE-627)NLM191691062 \|x 1384-5810 \|7 nnns
773	1	8	\|g volume:33 \|g year:2019 \|g number:6 \|g day:15 \|g pages:1894-1952
856	4	0	\|u http://dx.doi.org/10.1007/s10618-019-00651-1 \|3 Volltext
912			\|a GBV_USEFLAG_A
912			\|a GBV_NLM
951			\|a AR
952			\|d 33 \|j 2019 \|e 6 \|b 15 \|h 1894-1952
