Autonomous Clustering
Clarifyre's Autonomous Clustering Algorithms identify and group related data on a web page based on structure, proximity, and content.
Once such data has been identified and classified, it can be captured autonomously and used in other applications.
The following are some examples of this technology. No site-specific scripts are used. Please click on the links below to see input course catalogs from three different websites (HTML pages) and their corresponding outputs (XML).
How does the technology work? The following figure shows an example of clusters within a page. For the sake of clarity, only two clusters are marked. The system (a) identifies the (hierarchical) clusters, (b) attempts to determine the meaning of each cluster in isolation, and (c) attempts to determine the meaning of groups of clusters. Accurately implementing (c) requires domain-specific models, and we have only partial solutions for it.
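To make step (a) more concrete, here is a minimal Python sketch of structure-based clustering. It is not Clarifyre's algorithm: it assumes well-formed HTML, uses only structure (not proximity or content), and treats sibling elements with the same tag and the same sequence of child tags as one cluster. The names `Node`, `TreeBuilder`, `find_clusters`, `text_of`, and `cluster_html`, and the sample course list, are hypothetical and chosen for illustration only.

```python
from collections import defaultdict
from html.parser import HTMLParser

# Tags that never have a closing tag (a partial list; an assumption of this sketch).
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class Node:
    """One element of a simplified DOM tree."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.text = ""

    def signature(self):
        # Structural fingerprint: own tag plus the tags of the direct children.
        return (self.tag, tuple(child.tag for child in self.children))


class TreeBuilder(HTMLParser):
    """Builds a Node tree; assumes every non-void tag is properly closed."""

    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        if self.current.parent is not None and self.current.tag == tag:
            self.current = self.current.parent

    def handle_data(self, data):
        self.current.text += data.strip()


def find_clusters(node, depth=0, found=None):
    """Collect (depth, signature, members) for repeated sibling structures."""
    if found is None:
        found = []
    groups = defaultdict(list)
    for child in node.children:
        groups[child.signature()].append(child)
    for signature, members in groups.items():
        if len(members) >= 2:  # a structure repeated among siblings is a cluster
            found.append((depth, signature, members))
    for child in node.children:  # recurse so nested clusters are found too
        find_clusters(child, depth + 1, found)
    return found


def text_of(node):
    """Concatenate the text of a node and all of its descendants."""
    parts = [node.text] + [text_of(child) for child in node.children]
    return " ".join(part for part in parts if part)


def cluster_html(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return find_clusters(builder.root)


# Hypothetical input resembling a small course catalog.
SAMPLE = """
<ul>
  <li><b>CS 101</b><span>Intro to Programming</span></li>
  <li><b>CS 102</b><span>Data Structures</span></li>
  <li><b>CS 201</b><span>Algorithms</span></li>
</ul>
<p>Contact the registrar.</p>
"""

for depth, signature, members in cluster_html(SAMPLE):
    print(depth, signature, [text_of(member) for member in members])
```

In this sketch the three list items form one cluster because they share a structure, while the lone paragraph does not. Steps (b) and (c), assigning meaning to a cluster or to groups of clusters, are out of scope here and would need content analysis and domain-specific models.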