Hi,
as far as I can judge from the description without actually seeing it at work, I think there are two problems here:
- you are only turning keywords into tags
- the keyword extraction class this is based on has no concept of context or language; it only looks at a single text snippet.
IMHO, tagging becomes useful especially because it adds a different
perspective to the classification process. So while the author of a
page talks about information retrieval, someone else adds the tag
"web search" and the page will be found by both experts and laymen.
Now if you extract your tags from the page itself, this will not add
much value, especially if you already have a search engine in place that does the weight calculation for you. Even the simple MySQL fulltext search can calculate weight in relation to frequency. Filtering tags out of the page itself by frequency does not seem very useful compared to indexing the text with a search engine and letting this specialized tool do the rank calculation. Also, as information scientists have taught us, the most relevant keywords in a text are often not the ones that are used most frequently.
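To illustrate the point about frequency, here is a minimal Python sketch of a TF-IDF style weighting, which is one common way (not necessarily what any particular search engine does internally) to weigh a term against the whole collection instead of a single text. The function names and the three sample texts are made up for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def top_terms(docs, index, n=3):
    """Return the n best (term, score) pairs for docs[index].

    score = term frequency in the document * log(N / document frequency),
    so a term that occurs in every document of the collection scores zero.
    """
    tokenized = [tokenize(d) for d in docs]
    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    tf = Counter(tokenized[index])
    total = len(docs)
    scores = {t: count * math.log(total / df[t]) for t, count in tf.items()}
    # Sort by descending score, then alphabetically for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


# Hypothetical mini collection, invented for illustration only.
docs = [
    "the class extracts keywords from the text and the keywords become tags",
    "the search engine ranks the pages in the collection",
    "the user adds the tags to the page",
]

most_frequent = Counter(tokenize(docs[0])).most_common(1)[0][0]
best_weighted = top_terms(docs, 0, n=1)[0][0]
print(most_frequent, best_weighted)
```

In the first text the most frequent word is "the", but since it occurs in every text of the collection its weight drops to zero and "keywords" comes out on top. A class that only sees one snippet cannot make that distinction.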
Nevertheless, it is interesting to try to find a way to add tags automagically. To add an "external" perspective, it would be better to look at the referring pages or - in the specific case of phpclasses.org - the linked classes and try to extract tag information from there. But again, if you have a decent search engine, this is probably also implemented there. I have configured my local mnoGoSearch to add the link text of links on referring pages to the text of the page they point to, which works fine.
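As a rough idea of what collecting tags from referring pages could look like, here is a small Python sketch using the standard html.parser module. It only gathers the link text per target URL. The class name, the sample HTML and the URLs are invented for this example, and it is not how mnoGoSearch does it internally.

```python
from collections import defaultdict
from html.parser import HTMLParser


class AnchorTextCollector(HTMLParser):
    """Collect the text inside <a href="..."> elements, grouped by href."""

    def __init__(self):
        super().__init__()
        self.anchors = defaultdict(list)  # href -> list of link texts
        self._href = None                 # href of the link being read
        self._text = []                   # text pieces inside that link

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text while we are inside a link that has an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse whitespace and lowercase the link text.
            text = " ".join("".join(self._text).split()).lower()
            if text:
                self.anchors[self._href].append(text)
            self._href = None


# Hypothetical referring pages; the URLs are placeholders.
referring_pages = [
    '<p>See this <a href="/package/1.html">web search</a> class.</p>',
    '<p>A good <a href="/package/1.html">Information  Retrieval</a> tool, '
    'and a <a href="/package/2.html">mail sender</a>.</p>',
]

collector = AnchorTextCollector()
for html in referring_pages:
    collector.feed(html)

candidate_tags = dict(collector.anchors)
print(candidate_tags)
```

The collected link texts are only tag candidates. They would still need filtering (for example dropping "click here" style links) before being shown to users.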
The only difference to the tagging solution is that by tagging the
keywords become visible, whereas with the internal magic of a search engine, the process is not transparent to the user. Still, search engine software can do a much better job of calculating weight, ranking and "related pages", because it can take the entire text collection into account, whereas the keyword extraction class can base its calculation only on a single text.
So the bottom line is - I would try to get tags from somewhere other than the text itself ...
Best regards,
Ulrich