It is estimated that approximately 80% of the data in organizations is unstructured, much of it text. We will provide an overview of easily implemented methods for finding previously unknown relationships in a collection of unstructured data. Text from e-mail, survey comments, incident reports, free-form data fields, websites, research reports, blogs, and social media can be analyzed quickly to discover themes, sentiments, word relationships, and natural groupings of documents. We will show how to combine unstructured text data with conventional row-by-column data to boost predictive modeling capabilities. Demonstrations will use several software packages, including an open-source solution in R.
- Understand where text mining can be applied.
- Know what text mining and natural language processing are and how they differ from data mining and predictive analytics.
- Appreciate how data mining techniques such as decision trees, cluster analysis, and logistic regression translate intermediate text mining data into decision-quality results.
- Understand the role of enabling technologies in the evolution of text mining methodologies.
- Understand how to extract topics and themes using Latent Dirichlet Allocation.
- Appreciate the role of Latent Semantic Analysis using Singular Value Decomposition.
- Understand how to use a standard text mining software product.
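As a rough illustration of the Latent Semantic Analysis objective above, the following minimal sketch builds a term-by-document count matrix and reduces it with a Singular Value Decomposition. The toy corpus, the variable names, the choice of Python with NumPy, and the choice of two retained dimensions are all assumptions made for illustration; they are not taken from the course materials, which use other software.

```python
import numpy as np

# Hypothetical toy corpus of survey-style comments (illustration only).
docs = [
    "late delivery poor service",
    "poor service rude staff",
    "great product fast delivery",
    "fast delivery great price",
]

# Build the vocabulary and a term-by-document count matrix A,
# where A[i, j] is the number of times term i occurs in document j.
vocab = sorted({word for doc in docs for word in doc.split()})
index = {word: i for i, word in enumerate(vocab)}
A = np.zeros((len(vocab), len(docs)))
for j, doc in enumerate(docs):
    for word in doc.split():
        A[index[word], j] += 1

# Thin SVD: A = U @ diag(s) @ Vt, singular values in descending order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Keep the k largest singular values (k = 2 is an arbitrary choice here).
k = 2
doc_coords = (np.diag(s[:k]) @ Vt[:k]).T   # one k-dimensional row per document
term_coords = U[:, :k] * s[:k]             # one k-dimensional row per term

print(doc_coords.shape, term_coords.shape)
```

Documents that share vocabulary tend to land near one another in the reduced space, so coordinates like these could be passed to a clustering or regression method alongside ordinary structured columns.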
Introduction to Text Mining
- What is it?
- How does it compare to Data Mining and Text Analytics?
- Earliest forms of Text Mining
- Enabling Technologies
- Areas of Text Analytics
- Start-to-finish example for service quality (demonstration only)
Review of Data Mining Techniques
- Data Exploration
- Logistic Regression
- Neural Networks
- Cluster Analysis
An Application of Text Mining (start-to-finish, with specifics and student participation)
- String Processing
- Natural Language Processing
- Statistical Approaches
- Unsupervised Learning Methods
Applications of Text Mining
- SAS Text Miner
- R with JMP