Adsurgo

Business Analytics, Strategic Planning

Text Mining Training

Text Mining Course Outline

Text mining introduction and historical perspectives

Data Capture – JMP Data Table
– Text files
– Excel files
– Folder of text files
– Folder of PDF and PPT files
– Emails
– Web crawling
– Twitter
Example: NTSB (JMP data table)
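
The course captures data into JMP data tables. As a rough plain-Python analogue of the "folder of text files" case, the sketch below reads every `.txt` file in a directory into one row per document. The helper name `read_text_folder` and the UTF-8 encoding are assumptions for illustration. PDF and PPT files would need a separate text extractor, which is not shown.

```python
from pathlib import Path


def read_text_folder(folder):
    """Return (file name, text) rows, one per .txt file in `folder`, sorted by name.

    Hypothetical helper: a plain-Python stand-in for importing a folder of
    text files as a one-document-per-row table.
    """
    rows = []
    for path in sorted(Path(folder).glob("*.txt")):
        # errors="replace" keeps one badly encoded file from aborting the whole import
        text = path.read_text(encoding="utf-8", errors="replace")
        rows.append((path.name, text))
    return rows
```

For example, `read_text_folder("reports")` would return one `(name, text)` pair per report. The folder name here is hypothetical.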

Basic Text Mining – Use of word frequencies
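
Word frequencies are the starting point, and a rank-frequency table is also what you inspect when checking Zipf's law. Under Zipf's law, rank times frequency stays roughly constant on large natural-language corpora. The minimal sketch below assumes the text is already cleaned and whitespace-separated. The sample sentences are made up.

```python
from collections import Counter


def rank_frequency(texts):
    """Count words across all texts; return (rank, word, count) rows, most frequent first."""
    counts = Counter(word for text in texts for word in text.split())
    # most_common() sorts by descending count; ties keep first-seen order
    return [(rank, word, n) for rank, (word, n) in enumerate(counts.most_common(), start=1)]


docs = ["the pump failed", "the valve leaked and the pump failed"]
for rank, word, n in rank_frequency(docs):
    # Zipf's law predicts rank * n to be roughly constant on large corpora
    print(rank, word, n, rank * n)
```

A toy input this small only hints at the Zipf pattern. It becomes visible with thousands of documents.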

String Processing – Bag of words
– Isolate individual words
– Remove punctuation
– Normalize case
– Remove numbers
Example: Cars slid into curb (simple)
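
The string-processing steps (isolate words, remove punctuation, normalize case, remove numbers) can be sketched as one tokenizer that produces a bag of words. This is a minimal illustration with Python's `re` module. The regular expression and the name `bag_of_words` are assumptions, not the course's JMP settings.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text, keep only letter runs, and count them.

    Matching [a-z'] runs drops punctuation and digits in one pass; apostrophes
    are kept so a contraction like "don't" stays a single token.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    # strip quote marks left at token edges, e.g. 'curb' -> curb
    tokens = [t.strip("'") for t in tokens]
    return Counter(t for t in tokens if t)


print(bag_of_words("3 cars slid into the curb; Cars slid again!"))
```

Word order is discarded on purpose. Only the counts survive into the later steps.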

Natural Language Processing – Zipf’s Law
– Stopwords
– Custom stopwords
– Collocation, synonyms (find/replace)
– Stem text
– Filter by character length
– Filter out words that appear in no more than X documents
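
The filtering steps above chain together as shown in the minimal sketch below. The steps are stopword removal (base plus custom list), collocation joining, synonym find/replace, stemming, a minimum-length filter, and a document-frequency filter. The tiny stopword list and all parameter names are hypothetical. The stemmer is a crude suffix stripper standing in for a real algorithm such as Porter's, so it only shows where stemming fits.

```python
from collections import Counter

STOPWORDS = {"the", "a", "an", "and", "of", "to", "into", "was"}  # hypothetical, deliberately tiny


def crude_stem(word):
    """Strip one common English suffix (a toy stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(docs, custom_stop=(), synonyms=None, collocations=(), min_len=3, x_docs=1):
    """docs: list of token lists. Terms found in x_docs or fewer documents are dropped."""
    stop = STOPWORDS | set(custom_stop)
    synonyms = synonyms or {}
    pairs = set(collocations)
    cleaned = []
    for tokens in docs:
        merged, i = [], 0
        while i < len(tokens):
            # join a listed two-word collocation into one term, e.g. "check_valve"
            if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in pairs:
                merged.append(tokens[i] + "_" + tokens[i + 1])
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        # drop stopwords, map synonyms to one canonical form, then stem
        terms = [crude_stem(synonyms.get(t, t)) for t in merged if t not in stop]
        cleaned.append([t for t in terms if len(t) >= min_len])
    # document frequency: number of documents containing each term at least once
    df = Counter(t for terms in cleaned for t in set(terms))
    return [[t for t in terms if df[t] > x_docs] for terms in cleaned]
```

The order matters. Collocations are joined before stopwords are removed, so a phrase that contains a stopword can still be kept.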

Document Term Matrix (DTM)
– Representing text with numbers, DTM
– Properties of the DTM
– Transformations of the DTM
— Binary
— Ternary
— Term Frequency
— Log
— tf-idf
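
A minimal sketch of building a document term matrix (one row per document, one column per term) and applying the weightings listed above. The formulas are assumptions based on common definitions:

- ternary is 0, 1, or 2 for absent, once, or more than once;
- log is ln(1 + tf);
- tf-idf is tf × ln(N / df), where N is the number of documents and df is the number of documents containing the term.

JMP and other tools may use a different log base or smoothing.

```python
import math


def build_dtm(docs):
    """docs: list of token lists. Returns (sorted vocabulary, raw-count matrix as a list of rows)."""
    vocab = sorted({t for doc in docs for t in doc})
    index = {t: j for j, t in enumerate(vocab)}
    dtm = [[0] * len(vocab) for _ in docs]
    for i, doc in enumerate(docs):
        for t in doc:
            dtm[i][index[t]] += 1
    return vocab, dtm


def weight(dtm, scheme):
    """Apply a cell-wise weighting scheme to a raw-count DTM (at least one document)."""
    n_docs = len(dtm)
    # document frequency of each column: rows with a nonzero count
    df = [sum(1 for row in dtm if row[j]) for j in range(len(dtm[0]))]

    def cell(tf, j):
        if scheme == "binary":
            return 1 if tf else 0
        if scheme == "ternary":
            return min(tf, 2)
        if scheme == "tf":
            return tf
        if scheme == "log":
            return math.log1p(tf)  # ln(1 + tf)
        if scheme == "tfidf":
            return tf * math.log(n_docs / df[j]) if tf else 0.0
        raise ValueError(f"unknown scheme: {scheme}")

    return [[cell(tf, j) for j, tf in enumerate(row)] for row in dtm]
```

Most cells in a real DTM are zero, which is why text-mining tools store it in a sparse format. The dense lists here are only for readability.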

Statistical Approaches – Latent Semantic Analysis (LSA)
– Singular Value Decomposition (SVD)
– Bivariate plots of the singular vectors
– Synonyms
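
LSA factors the (usually weighted) DTM as A ≈ U Σ Vᵀ and keeps the top k singular triplets. Rows of U give document coordinates, rows of V give term coordinates, and the first two columns, scaled by the singular values, give the bivariate plot. To stay dependency-free, this sketch finds the leading triplets by power iteration on AᵀA with deflation. In practice you would call a library routine such as `numpy.linalg.svd`. The all-ones starting vector, tolerance, and iteration cap are assumptions.

```python
import math


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def transpose(m):
    return [list(col) for col in zip(*m)]


def top_singular_triplets(a, k, iters=500, tol=1e-12):
    """Return up to k (sigma, u, v) triplets of matrix a (list of rows), largest sigma first.

    Power iteration on a^T a converges to the dominant right singular vector v;
    then sigma = ||a v|| and u = a v / sigma. Subtracting sigma * u v^T
    (deflation) exposes the next triplet.
    """
    a = [list(map(float, row)) for row in a]
    triplets = []
    for _ in range(k):
        at = transpose(a)
        v = [1.0 / math.sqrt(len(at))] * len(at)  # assumed start; fails if orthogonal to the answer
        for _ in range(iters):
            w = matvec(at, matvec(a, v))  # (a^T a) v
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:
                break  # what remains of the matrix is zero
            w = [x / norm for x in w]
            converged = max(abs(x - y) for x, y in zip(w, v)) < tol
            v = w
            if converged:
                break
        av = matvec(a, v)
        sigma = math.sqrt(sum(x * x for x in av))
        if sigma == 0:
            break
        u = [x / sigma for x in av]
        triplets.append((sigma, u, v))
        # deflation: subtract the rank-one component just found
        a = [[a[i][j] - sigma * u[i] * v[j] for j in range(len(v))] for i in range(len(u))]
    return triplets
```

Terms whose rows of V point in similar directions tend to occur in similar contexts. That is how LSA surfaces synonyms.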

Topic Analysis/Concept Extraction
– Varimax rotation of the document space
– Varimax rotation of the term space
Example: NSF
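
Varimax rotates the retained SVD dimensions so that each term or document loads heavily on as few dimensions as possible. This makes the rotated dimensions easier to read as topics. The criterion it maximizes is the sum, over dimensions, of the variance of the squared loadings (the raw form, without Kaiser normalization).

For two dimensions, the sketch below simply grid-searches the rotation angle. That brute-force search is a swapped-in simplification, not the iterative pairwise algorithm statistical packages usually implement. The 0.05° step is an arbitrary assumption.

```python
import math


def varimax_criterion(loadings):
    """Raw varimax criterion: sum over columns of the variance of squared loadings."""
    p = len(loadings)
    total = 0.0
    for j in range(len(loadings[0])):
        sq = [row[j] ** 2 for row in loadings]
        mean = sum(sq) / p
        total += sum((s - mean) ** 2 for s in sq) / p
    return total


def rotate2(loadings, angle):
    """Orthogonally rotate two-column loadings by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [[x * c + y * s, -x * s + y * c] for x, y in loadings]


def varimax2(loadings, step_deg=0.05):
    """Return (rotated loadings, angle) maximizing the criterion over [0, 90) degrees.

    A further 90-degree turn only swaps the columns and flips a sign, which
    leaves the squared loadings, and so the criterion, unchanged.
    """
    angles = (math.radians(i * step_deg) for i in range(int(round(90 / step_deg))))
    best = max(angles, key=lambda a: varimax_criterion(rotate2(loadings, a)))
    return rotate2(loadings, best), best
```

In the test data, a loading matrix with clean "simple structure" is deliberately rotated by 30°. Varimax then recovers a rotation with the original criterion value.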

Unsupervised Learning – Clustering Methods
– Ward’s method
– k-means
– Cluster words (V matrix)
– Concept linkage with cluster distances
– Cluster documents (U matrix)
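
The rows of U (documents) or V (terms) can be clustered as points. Below is Lloyd's k-means algorithm in plain Python as a minimal sketch. Seeding from the first k points is a simplification for reproducibility; real implementations use random restarts or k-means++ seeding. Ward's method is hierarchical and is not shown.

```python
def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm. points: list of equal-length tuples. Returns (labels, centroids)."""
    centroids = [list(p) for p in points[:k]]  # naive seeding: the first k points
    labels = None
    for _ in range(max_iter):
        # assignment step: nearest centroid by squared Euclidean distance
        new_labels = [
            min(range(k), key=lambda c: sum((x - m) ** 2 for x, m in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

When clustering term vectors, it is common to scale each row to unit length first. Euclidean distance then behaves like cosine distance.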

Sentiment Analysis
– Positive/negative words
– Custom sentiment analysis
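
Lexicon-based sentiment scoring counts positive and negative words, and a custom analysis adds or overrides domain-specific terms. This minimal sketch uses a hypothetical five-word lexicon and a simple positive-minus-negative score. Real lexicons are far larger, and this version ignores negation such as "not good".

```python
import re

BASE_LEXICON = {"good": 1, "great": 1, "reliable": 1, "bad": -1, "terrible": -1}  # hypothetical


def sentiment(text, custom=None):
    """Sum word polarities; entries in `custom` extend or override the base lexicon."""
    lexicon = {**BASE_LEXICON, **(custom or {})}
    return sum(lexicon.get(w, 0) for w in re.findall(r"[a-z']+", text.lower()))


print(sentiment("Great pump, terrible seal"))        # → 0
print(sentiment("The seal leaked", {"leaked": -1}))  # → -1
```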

Supervised Learning – Classification trees with structured data
– Logistic regression on structured data
– Classification tree on words
– Graphical methods and cross-tabulation
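
A classification tree on words splits documents by whether a term is present. The sketch below finds only the best first split, a one-level tree or "stump", by weighted Gini impurity over binary word-presence features. A full tree would recurse on each side. The function names and toy data are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_word_split(docs, labels):
    """docs: list of token sets; labels: class per doc.

    Returns (word, weighted Gini impurity after splitting on the word's presence).
    """
    n = len(docs)
    best = None
    for word in sorted(set().union(*docs)):
        left = [y for d, y in zip(docs, labels) if word in d]
        right = [y for d, y in zip(docs, labels) if word not in d]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if best is None or score < best[1]:
            best = (word, score)
    return best
```

A cross-tabulation of the chosen word against the class is a quick way to check the split.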

More Advanced Topics
– Probabilistic Topic Modeling
— Latent Dirichlet Allocation (Variational Expectation Maximization and Gibbs)
— Conditional Topic Modeling
– Custom start list
Example: NSF
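
Latent Dirichlet Allocation treats each document as a mixture of topics and each topic as a distribution over words. The outline lists both variational EM and Gibbs sampling; the sketch below is a bare-bones collapsed Gibbs sampler only. The hyperparameters alpha and beta, the iteration count, and the fixed seed are illustrative assumptions, and the sampler has no burn-in or convergence check.

```python
import random


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA. docs: list of token lists.

    Returns (vocab, topic_word), where topic_word[k][j] estimates P(word j | topic k).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    wid = {w: j for j, w in enumerate(vocab)}
    V = len(vocab)
    n_dk = [[0] * n_topics for _ in docs]      # topic counts per document
    n_kw = [[0] * V for _ in range(n_topics)]  # word counts per topic
    n_k = [0] * n_topics                       # total tokens per topic
    z = []                                     # topic assignment of every token
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(n_topics)        # random initial assignment
            z[d].append(k)
            n_dk[d][k] += 1
            n_kw[k][wid[w]] += 1
            n_k[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k, j = z[d][i], wid[w]
                # take this token out of the counts before resampling it
                n_dk[d][k] -= 1
                n_kw[k][j] -= 1
                n_k[k] -= 1
                # full conditional P(z = t | all other assignments), up to a constant
                weights = [(n_dk[d][t] + alpha) * (n_kw[t][j] + beta) / (n_k[t] + V * beta)
                           for t in range(n_topics)]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][j] += 1
                n_k[k] += 1
    topic_word = [[(n_kw[k][j] + beta) / (n_k[k] + V * beta) for j in range(V)]
                  for k in range(n_topics)]
    return vocab, topic_word
```

Listing the highest-probability words in each row of `topic_word` gives the usual topic summaries.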

(720) 536-0851

News

Adsurgo Presentations Recognized as the Top 2 at JMP Discovery!

Data Visualization by Adsurgo at American Society for Quality/ASA

Adsurgo develops tool for FMC Technologies to analyze text data


Video Library

Watch Statistical Methods for Establishing Equivalence Webinar!

Watch Intro to JMP Scripting Webinar!

Tips and Tricks in JMP Webinar

Adsurgo presentation on Acceptance Sampling and Quality Control

Statistical Analysis of Biosimilars

Sign Up for Free Webinars

Introduction to Text Mining Webinar

Date: August 26, 2020 Time: 10:00-11:00 PM

Data Storytelling and Visualization Best Practices

Date: August 24, 2020 Time: 1:00-2:00 PM

Adsurgo…your missing piece

Adsurgo LLC
3700 Quebec St. Unit 100, Suite 258 Denver, CO 80207-1639
Ph: (720) 536-0851 Fax: (720) 536-0852
Copyright © 2010–2021