Comments on Applied Data Science and Machine Learning: Text Mining and Regular Expressions
Blog author: Dean Abbott (http://www.blogger.com/profile/16818000233889520746)
Feed updated: 2024-03-02 01:02 -08:00

2011-06-07 07:25 -07:00
This comment has been removed by a blog administrator.
Posted by criticpapa

2009-08-23 03:48 -07:00
Another good book is Tapping Into Unstructured Data. It's by Inmon and Anthony Nesavich.
http://www.schoolstore.com.au/pr-Tapping-Into-Unstructured-Data-Integrating-Unstructured-Data-and-Textual-Analytics-Into-Business-Intelligence-9780132360296.seo
Posted by Anonymous

2009-02-12 18:53 -08:00
I saw on Keith McCormick's <A HREF="http://www.keithmccormick.com" REL="nofollow">site</A> a <A HREF="http://gnosis.cx/TPiP/" REL="nofollow">Text processing in Python reference</A>. Though I can't vouch for it yet, it is free, but only for personal use. You can also buy the book on Amazon, which gives the author, David Mertz, some royalties. If you like what you read, please support him.
Posted by Dean Abbott

2009-02-12 18:39 -08:00
And the problem of variable human-generated content doesn't end with HTML. Part of the problem with my current text mining project is parsing out a keyword from a list of nouns.
Sometimes a comma means "new idea" (it is a stop character) and sometimes a comma is just a delimiter of nouns (and not a stop character). Determining what role it plays requires some careful thinking to switch on and off
Posted by Dean Abbott

2009-02-12 13:03 -08:00
Actually, tools that incorporate tag removal techniques to extract textual content from static web pages are readily available. <A HREF="http://www.datamystic.com/textpipe.html" REL="nofollow">TextPipe Pro</A> is one of the best commercially available tools that includes web text mining. This is a <A HREF="http://www.datamystic.com/textpipe/viewlets/web_extraction.htm" REL="nofollow">demo</A> to
Posted by Angelo_Arts

2009-02-12 12:58 -08:00
This comment has been removed by the author.
Posted by Angelo_Arts

2009-02-10 12:20 -08:00
By the way, thanks for the recommendation on the Perl book.
Posted by Dean Abbott

2009-02-10 12:19 -08:00
I have used the text mining tools in Clementine and Polyanalyst. I've had very positive experiences with both tools.

In Clementine, you can read in data from a web site and it will automatically grab all the pages from the web site and strip out the HTML tags, which is very convenient for what you are doing.
An example is shown (http://www.abbottanalytics.com/data/clem_webfeed.png)
Posted by Dean Abbott

2009-02-03 15:48 -08:00
There is a great book that deals with text mining from a regular expression approach. It is <A HREF="http://eu.wiley.com/WileyCDA/WileyTitle/productCd-0470176431.html" REL="nofollow">Practical Text Mining with Perl</A>.
I am also trying to improve the quality of text content extraction from web pages, in particular for use in clustering later. The thing is easier when dealing with dynamic web
Posted by Angelo_Arts
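
Several comments in this thread mention stripping HTML tags to get at the text of a web page. A minimal sketch of that step in Python, using only the standard library's `html.parser` module, might look like the following. The names `TextExtractor` and `strip_tags` are made up for illustration; this is not how Clementine or TextPipe Pro work internally, and real pages usually need more care (navigation menus, malformed markup, boilerplate removal).

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into characters.
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased, so <BR/> and <br> both match "br".
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "br":
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if self._skip_depth == 0:
            self.parts.append(data)


def strip_tags(html):
    """Return the text content of an HTML fragment (hypothetical helper)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)


if __name__ == "__main__":
    sample = 'It is <A HREF="http://example.com" REL="nofollow">a book</A>.<BR/>More text'
    print(strip_tags(sample))
```

Using a parser rather than a single regular expression such as `<[^>]+>` is a design choice: the regex is shorter, but it also keeps script and style contents and can misfire on a `>` inside an attribute value.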