Adsurgo presented “Mind the Gap: JMP on the Text Explorer Express” and co-presented a tutorial with JMP’s Chris Gotwalt titled “The U to the V: A Hitchhiker’s Guide to JMP 13 Text Explorer.”
Mind the Gap: There is an enormous gap between the massive resources that companies devote to collecting, storing and organizing unstructured data and their ability to actually discover meaningful information that affects business decisions. Text Explorer in JMP 13 will fundamentally change how users view the power of analytics. This new platform uses familiar multivariate methods to close the gap in unstructured data exploration and discovery. This session will use multiple case studies to demonstrate not only the remarkable capabilities of Text Explorer, but also its extraordinary ease of use. First, the presenters will show the simplicity of string processing in Text Explorer, focusing on common obstacles: stopwords, synonyms, term parsing and multi-word phrases. Next, the presenters will illustrate the analytical and graphical capabilities of Text Explorer that quickly uncover previously unknown information in unstructured data: term frequencies, word clouds and topic extraction. Lastly, they will show how Text Explorer can capitalize on the powerful predictive analytics capabilities already in JMP to explore relationships and build better models by using both unstructured and structured data.
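For readers who want to see the string-processing obstacles in concrete form, here is a minimal Python sketch of stopword removal, synonym recoding, phrase merging and term-frequency counting. It is not JMP code and does not reproduce how Text Explorer works internally; the stopword list, synonym map, phrase table, example documents and the `tokenize` helper are all hypothetical, chosen only for illustration.

```python
import re
from collections import Counter

# All three tables below are hypothetical examples, not JMP defaults.
STOPWORDS = {"the", "is", "a", "and", "was"}
SYNONYMS = {"batteries": "battery"}  # recode variants to one term
PHRASES = {("customer", "service"): "customer service"}  # two-word phrases


def tokenize(text):
    """Lowercase, split into words, merge known phrases, recode synonyms,
    and drop stopwords."""
    words = re.findall(r"[a-z']+", text.lower())
    merged = []
    i = 0
    while i < len(words):
        pair = tuple(words[i:i + 2])
        if pair in PHRASES:
            merged.append(PHRASES[pair])  # keep the phrase as a single term
            i += 2
        else:
            merged.append(words[i])
            i += 1
    return [SYNONYMS.get(w, w) for w in merged if w not in STOPWORDS]


# Made-up documents standing in for free-text survey responses.
docs = [
    "The battery is great",
    "Batteries died and customer service was slow",
    "Customer service replaced the battery",
]

# Term frequencies across the whole corpus.
term_counts = Counter(term for doc in docs for term in tokenize(doc))
print(term_counts.most_common(3))
```

A word cloud is essentially a picture of `term_counts`, with each term sized by its frequency.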
Hitchhiker’s Guide: JMP data explorers now have a new and powerful tool for their backpack: the JMP 13 Text Explorer! It has been suggested that the process of transforming text into interpretable and actionable structured data is simple to explain: just say “a miracle occurs.” This presentation will start with an end-to-end JMP 13 Text Explorer demonstration using actual consumer goods survey data, followed by a review of the technical material, unlocking the mystery of the “miracle.” We will show the construction and applications of the sparse document-term matrix (DTM). The singular value decomposition (SVD) of the DTM forms two important reduced-rank matrices: the V matrix associated with words (terms) and the U matrix describing the document space. Topics and themes are found by evaluating the factor loadings of the V matrix along with cluster analysis. Because each column of the V matrix is paired with the corresponding column of the U matrix, documents containing a specific theme are easily found by sorting that column of the U matrix. We will show how the SVD method allows the columns of the U matrix to be used as structured data, just like any other variable in predictive analytics methods. We will also demonstrate Latent Class Analysis (LCA), a clustering technique customized for Text Explorer that is useful for identifying groups of documents that are similar to one another.
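The DTM-to-SVD step can be sketched in a few lines of Python with NumPy. This is an illustration under stated assumptions, not the Text Explorer implementation: the toy corpus is made up, the DTM holds raw counts in a dense array (a real DTM would be stored sparsely), and any weighting, centering or rotation that JMP may apply is omitted.

```python
import numpy as np

# Hypothetical toy corpus; not the survey data from the presentation.
docs = [
    "battery life is great",
    "great battery great price",
    "shipping was slow",
    "slow shipping and late delivery",
]

# Document-term matrix: rows are documents, columns are terms, entries are counts.
vocab = sorted({word for doc in docs for word in doc.split()})
col = {term: j for j, term in enumerate(vocab)}
dtm = np.zeros((len(docs), len(vocab)))
for i, doc in enumerate(docs):
    for word in doc.split():
        dtm[i, col[word]] += 1

# Thin SVD: dtm = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(dtm, full_matrices=False)

k = 2              # number of singular vectors kept (an arbitrary choice here)
U_k = U[:, :k]     # document scores, one row per document
V_k = Vt[:k].T     # term loadings, one row per term

# Terms with the largest absolute loading on the first singular vector.
top_terms = [vocab[j] for j in np.argsort(-np.abs(V_k[:, 0]))[:3]]

# Documents sorted by the matching column of U: those carrying the theme come first.
doc_order = np.argsort(-np.abs(U_k[:, 0]))

print(top_terms)
print([docs[i] for i in doc_order[:2]])
```

The columns of `U_k` are ordinary numeric columns, one value per document, so they can be joined to the structured data and passed to any predictive model as additional predictors.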