Unsupervised vs. Supervised Classification Methods
Mwalimu Phiri
Unsupervised and supervised classification
are both machine learning methods. The major differences are how they’re
applied, which type of data they can use, and the parameters or conditions
involved. Unsupervised learning can take unlabeled data, without any training
samples, and create groupings or clusters. Supervised learning requires training
samples with a defined classification. Though supervised learning allows us to
define the classes, it leaves room for human error in those definitions. Unsupervised
learning, on the other hand, requires domain knowledge to ensure that valuable
information is obtained from the groupings.
Different Challenges:
The challenge with unsupervised classification
is that the machine chooses the categories of the data automatically. The result
can resemble a “black box” that defines the clusters, and the categories
may not align with the objective as expected. This is where domain and industry
knowledge is critical in validating the unsupervised classifications. Some models,
such as the popular k-means algorithm, let the user set the number of clusters.
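To make the point about setting the cluster count concrete, here is a minimal sketch of k-means in plain Python. The pixel values, the two "bands", and the choice to seed the centroids with the first k points are all assumptions made for illustration; a production implementation would use a more careful initialization.

```python
import math

def kmeans(points, k, iterations=20):
    """Group points into k clusters by repeating assignment and update steps."""
    # Assumption: seed the centroids with the first k points (deterministic, not robust).
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids

# Made-up pixel values in two hypothetical spectral bands.
pixels = [(0.10, 0.20), (0.80, 0.90), (0.15, 0.22),
          (0.82, 0.88), (0.12, 0.18), (0.79, 0.91)]
labels, centroids = kmeans(pixels, k=2)
```

The returned labels are only integer cluster IDs. Deciding whether cluster 0 means water, grassland, or nothing useful is the step that still needs domain knowledge.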
The challenge with supervised learning
is its dependence on sample data to train the model to an acceptable
level of classification accuracy. If the sample data does not share the same definitions
and parameters as the data we plan on analyzing, then the model could
classify targets erroneously. Whether the sample data needs to include the target
variable depends on the objective, for example detecting outliers
versus assigning group associations.
Examples:
Chris Banman gives an example of supervised and
unsupervised classification in image processing (Banman, 2002). For the supervised
learning approach, Banman uses a raster image with defined training sites (water,
grassland, etc.) to outline spectral signatures. The user trains the model on the
image variables. The trained model then uses statistical
measures to support its classifications based on the user’s parameters.
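The idea of training sites and spectral signatures can be sketched with a minimum-distance-to-mean rule. This is a simple stand-in chosen for illustration, not necessarily the classifier used in Banman's example, and the class names' band values below are invented.

```python
import math
from statistics import mean

def train_signatures(training_sites):
    """Average the band values of each labelled training site into one signature."""
    return {
        land_class: tuple(mean(band) for band in zip(*samples))
        for land_class, samples in training_sites.items()
    }

def classify(pixel, signatures):
    """Assign the pixel to the class whose signature is nearest (Euclidean distance)."""
    return min(signatures, key=lambda land_class: math.dist(pixel, signatures[land_class]))

# Hypothetical training sites: (band 1, band 2) values invented for illustration.
training_sites = {
    "water": [(0.05, 0.02), (0.07, 0.03), (0.06, 0.04)],
    "grassland": [(0.30, 0.60), (0.28, 0.55), (0.32, 0.65)],
}
signatures = train_signatures(training_sites)
predicted = [classify(p, signatures) for p in [(0.06, 0.05), (0.29, 0.58)]]
```

Every pixel is forced into one of the classes the user defined, so a land type missing from the training sites will be mislabeled. This is the kind of human error supervised learning leaves room for.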
The second example applies
unsupervised classification to the same image, but this time there is no
training involved (no predefined variables or groupings). Using the ISOCLUST
classification method, the user obtains similar and, in this case, even better
digital mapping results. The unsupervised classification was able to group
small patches of different land areas that a human could not
differentiate.
Thoughts:
Unsupervised learning came out on
top in this example, but that’s not always the case. Validating the
classification models seems to be the easier part of the whole process, while
pre-processing is critical for both types of classification. If the data is not defined
or scaled properly during pre-processing, it can negatively affect model
training, and validation will be misleading if the models are trained on skewed input data.
How well an unsupervised classification model can be validated depends on the developer’s
expertise in the field being classified. Though we consider it unsupervised,
there’s always some sort of iterative supervision during the
grouping or validation process. Though these forms of classification are different,
they can be used in parallel or one after the other. Unsupervised learning
can group a large data set into clusters, and supervised learning
can then be applied with a defined response variable and parameters.
The response variable could be based on the different cluster relationships, where
we could calculate the likelihood of obtaining a target land class from the
examples.
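One way to read the last point is as a per-cluster frequency estimate: take the cluster IDs from an unsupervised step, attach known land classes for a small set of checked samples, and compute how often the target class appears in each cluster. The sketch below assumes exactly that; the function name and the data are hypothetical.

```python
from collections import Counter, defaultdict

def class_share_by_cluster(cluster_ids, land_classes, target):
    """Estimate, per cluster, the fraction of labelled samples that belong to `target`."""
    counts = defaultdict(Counter)
    for cluster, land_class in zip(cluster_ids, land_classes):
        counts[cluster][land_class] += 1
    return {
        cluster: tally[target] / sum(tally.values())
        for cluster, tally in counts.items()
    }

# Hypothetical data: cluster IDs from an unsupervised step, plus land classes
# for the same samples taken from a small field-checked set.
cluster_ids = [0, 0, 0, 1, 1, 1, 1, 2]
land_classes = ["water", "water", "grassland", "grassland",
                "grassland", "grassland", "water", "water"]
shares = class_share_by_cluster(cluster_ids, land_classes, target="water")
```

These shares could then serve as a response variable, or as a feature, for a supervised model trained afterwards. With so few labelled samples per cluster they are rough estimates at best.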
Banman, C. (2002). Supervised and Unsupervised
Land Use Classification. Retrieved from Emporia State University:
http://academic.emporia.edu/aberjame/student/banman5/perry3.html#intro