Clustering is a method for analysing the textual attributes of a set of texts and grouping the texts by similarity. It helps researchers see which documents are most alike and which are most different, based on their features or attributes rather than their content.
Clustering can be used to determine whether a Content Set contains subsets or groups that are not easily uncovered by metadata, by other filtering, or perhaps by other formal textual characteristics. For instance, a Content Set composed of articles might include articles of different types or lengths, which could appear as distinct clusters despite their shared metadata values.
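The article-length example above can be illustrated with a minimal sketch. It assumes k-means, one common clustering algorithm, and is not a description of how Gale Digital Scholar Lab implements clustering. The sample texts, the two features (word count and average word length), and the helper names `features` and `kmeans` are all hypothetical and chosen only for illustration.

```python
import math

def features(text):
    """Turn a text into a small feature vector: (word count, average word length).
    This choice of features is an assumption made for the example."""
    words = text.split()
    count = len(words)
    avg_len = sum(len(w) for w in words) / count if count else 0.0
    return (float(count), avg_len)

def kmeans(points, k, iterations=20):
    """A tiny, deterministic k-means: returns one cluster label per point."""
    # Initialise centroids: start with the first point, then repeatedly add
    # the point farthest from the centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(farthest)

    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                updated.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                updated.append(centroids[c])  # keep an empty cluster's centroid in place
        if updated == centroids:
            break  # converged: assignments can no longer change
        centroids = updated
    return labels

# Hypothetical Content Set: short notices and long articles that would
# share the same metadata (for example, the same publication and year).
documents = [
    "Fire in the mill.",
    " ".join(["The committee debated the proposed railway extension at length."] * 40),
    "Ship arrives Tuesday.",
    " ".join(["Readers wrote in to dispute the account of the harvest."] * 35),
    "Market prices steady.",
]

labels = kmeans([features(d) for d in documents], k=2)
for label, doc in zip(labels, documents):
    print(label, doc[:40])
```

In this sketch the short notices and the long articles fall into separate clusters because word count dominates the distance between feature vectors. With real data, features would normally be scaled so that no single attribute overwhelms the others.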
To learn more about Document Clustering and how to use it to analyse texts in Gale Digital Scholar Lab, click here.