“Big Data” is a hot topic in the business world these days. But there’s a subset of this broad field that has yet to take its turn in the spotlight. It’s called “text mining,” and you’re probably going to be hearing a lot more about it over the coming months and years. Text mining is the process of combing through countless pages of plain-language digitized text to find useful information that’s been hiding in plain sight. First developed in the 1980s as a labor-intensive manual discipline, text mining has become ever more efficient as computing power has increased. Relevant today to any number of businesses, the practice nonetheless brings with it as much potential for conflict as for opportunity. That mix is precisely why we’re going to be hearing more about it.
Unlike keyword searches and other algorithmic forms of web data analysis (such as that study, a few months back, that gauged happiness levels from the words used in tweets), text mining is more about finding unseen connections and patterns in plain-language narratives. The texts being mined could be newspaper or website articles, research papers, blog entries, or patent applications: all are fair game. A recent story in Nature, for example, details efforts to mine scientific research papers in the hope of uncovering useful but overlooked connections, such as the relationship between a particular drug compound and a specific enzyme.
Academic journals, not surprisingly, are a rich laboratory for text miners, but the Nature piece also highlights some of the field’s problems, in particular publishers’ reluctance to let researchers run wild through volumes of journals and books, especially those behind paywalls. This is true even when researchers have paid for access, since those subscription fees and site licenses were priced for regular humans downloading and reading articles, not for sophisticated “crawler” programs plowing through thousands or millions of sentences per minute, looking for clues to … well, just about anything.
Academic types are at the forefront of this effort, and at least one country is already trying to help its eggheads with their text mining needs. England’s recently established National Centre for Text Mining is the first publicly funded text mining clearinghouse in the world, with the stated aim of furthering academic research. In the private sector, meanwhile, drug companies—always on the prowl for ways to cut their astronomical R&D costs—are early players in this game.
But it’s not hard to see how almost any business could eventually profit from the ability to comb through the writings of millions of people to anticipate emerging wants and needs in entertainment, food, travel, retail, and pretty much anything else, in both the consumer and B2B markets. Trend-spotting, after all, is big business. Such opportunities always come at a cost, even if no one is quite sure yet what that cost should be. But it’s hard to imagine the Facebooks or Tumblrs of the world not finding ways to share in the profits of any data miners who want to go digging through their texts.