As far as I can judge from the description, without actually seeing it at work, I think there are two problems here:
- You are only turning keywords into tags.
- The keyword extraction class this is based on has no concept of context or language; it only looks at a single text snippet.
IMHO, tagging becomes useful mainly because it adds a different perspective to the classification process. While the author of a page talks about "information retrieval", someone else adds the tag "web search", and the page will be found by both experts and laymen.
If you extract your tags from the page itself, this will not add much value, especially if you already have a search engine in place that does the weight calculation for you. Even the simple MySQL fulltext search can calculate weight in relation to frequency. Filtering tags out of the page itself by frequency seems not very useful compared to indexing the text with a search engine and letting this specialized tool do the rank calculation. Also, as information scientists have taught us, the most relevant keywords in a text are not necessarily those that are used most frequently.
Nevertheless, it is interesting to try finding a way to add tags automagically. So to add an "external" perspective, it would be better to look at the referring pages or - in the specific case of phpclasses.org - the linked classes, and try to extract tag information from there. But again, if you have a decent search engine, this is probably also implemented there. I have configured my local Mnogosearch to add the link text of HREFs on referring pages to the target page's text, which works fine.
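To make the "external perspective" idea concrete, here is a minimal Python sketch that collects the link text other pages use when they point at a given page. It is not how Mnogosearch does it internally; the class and function names, the sample pages and the URL `/package/123.html` are all made up for illustration, and the href is compared as an exact string with no URL normalisation.

```python
from html.parser import HTMLParser


class AnchorTextCollector(HTMLParser):
    """Collects the visible text of <a> elements whose href equals target_url."""

    def __init__(self, target_url):
        super().__init__()
        self.target_url = target_url
        self.anchor_texts = []
        self._inside_target_link = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        # Assumption: exact string match on href, no relative/absolute resolution.
        if tag == "a" and dict(attrs).get("href") == self.target_url:
            self._inside_target_link = True
            self._buffer = []

    def handle_data(self, data):
        if self._inside_target_link:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._inside_target_link:
            # Join nested text fragments and collapse whitespace.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.anchor_texts.append(text)
            self._inside_target_link = False


def collect_anchor_texts(referring_pages, target_url):
    """Return the link texts that the referring pages use for target_url."""
    texts = []
    for html in referring_pages:
        parser = AnchorTextCollector(target_url)
        parser.feed(html)
        parser.close()
        texts.extend(parser.anchor_texts)
    return texts


# Hypothetical referring pages, for illustration only.
referring_pages = [
    '<p>See this <a href="/package/123.html">web search</a> class.</p>',
    '<a href="/other.html">unrelated</a> '
    '<a href="/package/123.html">full-text <b>indexing</b></a>',
]

print(collect_anchor_texts(referring_pages, "/package/123.html"))
```

The collected phrases are candidate tags written by someone other than the page's author, which is the point of looking at referring pages in the first place.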
The only difference to the tagging solution is that with tagging the keywords become visible, whereas with the internal magic of a search engine the process is not transparent to the user. Still, search engine software can do a much better job of calculating weight, ranking and "related pages", because it can take the entire text collection into account, whereas the keyword extraction class can base its calculation only on a single text.
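A rough Python sketch of why the whole collection matters: it compares the most frequent word in one text with the word that scores highest under a plain TF-IDF weighting computed over three texts. TF-IDF is used here only as a familiar stand-in for collection-aware weighting; it is not claimed to be what any particular search engine or the keyword extraction class does, and the sample texts and function names are invented.

```python
import math
from collections import Counter


def tokenize(text):
    # Very naive tokenizer: lowercase, split on whitespace, keep alphabetic words.
    return [w for w in text.lower().split() if w.isalpha()]


def tf_idf(docs):
    """Return one {term: score} dict per document, using tf * log(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))  # count each term once per document
    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        scores.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in term_freq.items()
        })
    return scores


# Invented sample texts, for illustration only.
docs = [
    "the class parses the feed and the class caches the feed",
    "the class sends the mail and the class logs the mail",
    "the form validates the input and the form shows the errors",
]

scores = tf_idf(docs)
most_frequent = Counter(tokenize(docs[0])).most_common(1)[0][0]
most_distinctive = max(scores[0], key=scores[0].get)

print(most_frequent)     # the
print(most_distinctive)  # feed
```

Looking at the first text alone, "the" wins on frequency. Only with the other texts available does it become clear that "the" appears everywhere and "feed" is what sets this text apart.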
So the bottom line is - I would try to get tags from somewhere else than the text itself ...
|2007-03-28 17:56:09 - In reply to message 1 from Ulrich Babiak|
|Thank you for your comments.|
First let me clarify that the "searchonomy" implemented in the site did not use the Automatic Keyword Generator class of Ver Pangolio.
Maybe it is not clear enough in the text, but the keywords used to tag class packages are taken from the top searched keywords entered by the users in the site search pages. Thus the name: searchonomy = search + folksonomy.
I also would like to clarify that this feature was not meant to be an alternative to page tagging based on keywords assigned explicitly by the users. As I mentioned, that feature may be implemented later.
The idea is to suggest other class packages that may be related to the package the user is currently viewing, in case it is not exactly what they are looking for.
So, the site may list several tags in each package page. The tags are linked to pages that list the top user rated packages with the same keywords. This way the site provides suggestions about related packages that happen to be the most appreciated by the site users that voted on them.
As you may understand by now, the searchonomy feature is actually retrieving the suggested keywords from the search engine results as you propose.
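For readers who want to picture the mechanics, here is a small Python sketch of the general idea as described above: take the most searched keywords, attach each one to the packages the search returns for it, and let each tag page list the tagged packages by user rating. This is only an illustration and not the site's actual code; the function names, the sample packages, ratings and query log are invented, and `search` is a naive substring match standing in for the real search engine.

```python
from collections import Counter


def top_searched_keywords(query_log, limit):
    """Most frequently entered search keywords, normalised to lowercase."""
    counts = Counter(q.strip().lower() for q in query_log if q.strip())
    return [keyword for keyword, _ in counts.most_common(limit)]


def search(packages, keyword):
    # Stand-in for the site search engine: naive substring match (assumption).
    return [
        name for name, info in packages.items()
        if keyword in info["description"].lower()
    ]


def build_tags(packages, query_log, limit=3):
    """Tag each package with the top searched keywords that find it."""
    tags = {name: [] for name in packages}
    for keyword in top_searched_keywords(query_log, limit):
        for name in search(packages, keyword):
            tags[name].append(keyword)
    return tags


def tag_page(packages, tags, keyword):
    """Packages carrying the tag, best user rating first."""
    tagged = [name for name in packages if keyword in tags[name]]
    return sorted(tagged, key=lambda name: packages[name]["rating"], reverse=True)


# Invented sample data, for illustration only.
packages = {
    "FeedReader": {"description": "Parse RSS and Atom feeds", "rating": 4.1},
    "FastRSS": {"description": "Cached RSS parser", "rating": 4.7},
    "MailQueue": {"description": "Queue and send e-mail messages", "rating": 3.9},
}
query_log = ["rss", "RSS", "mail", "rss", "mail", "pdf"]

tags = build_tags(packages, query_log, limit=2)
print(tags["FeedReader"])
print(tag_page(packages, tags, "rss"))
```

In this sketch the tags come from what users typed into the search box, not from word counts inside the package description, which is the distinction being made in this reply.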
|2007-03-28 23:03:34 - In reply to message 2 from Manuel Lemos|
|Thanks for the clarification, I probably got distracted by the keyword extraction class. Having suffered from bad user input, I tend to put more trust into methods of automatic classification, so I will certainly keep an eye on your solution to see how it works out on a larger scale and over a longer period of time ...|