Discussion Topic |
|
This thread has been locked |
lazide
Big Wall climber
Bay Area, CA
|
|
Topic Author's Reply - Jul 31, 2006 - 10:08pm PT
|
Melissa: kind of like a 'if its been dead for a week already, beating it again doesn't count?'... hmmmm... That might be more complicated, but i'll see what I can do.
Here is my current query to get 'top thread killers'
SELECT
author.name,
count(*) topics
FROM
(SELECT
topic_id,
MAX(posted) latestpost,
count(*) posts
FROM
post
GROUP BY topic_id ) AS most_recent_topics,
post,
author
WHERE
post.topic_id = most_recent_topics.topic_id
AND post.posted = most_recent_topics.latestpost
AND most_recent_topics.posts >= 5
AND post.author_id = author.id
GROUP BY author_id
ORDER BY topics DESC LIMIT 50
The top 10....
+------------------+--------+
| name | topics |
+------------------+--------+
| Ouch! | 147 |
| locker | 146 |
| Karl Baba | 138 |
| Jaybro | 117 |
| dirtineye | 108 |
| Jody | 105 |
| WBraun | 104 |
| Lambone | 91 |
| toulomne*rainbow | 87 |
| radical | 86 |
+------------------+--------+
Whoa!
Once I do some tuning, I'll generate a graph for it :)
|
|
Tahoe climber
climber
Texas to Tahoe
|
|
Jul 31, 2006 - 10:14pm PT
|
That's a tough statistic.
Sometimes, it's not a person that is a thread killer - it just dies of natural causes, i.e., all that's interesting has been covered, and not the last person to post that's the reason.
For instance, it stands to reason that Werner, as the most prolific poster, would be the most common thread killer, simply because of the volume of his posts.
Still fascinating data, though!
|
|
lazide
Big Wall climber
Bay Area, CA
|
|
Topic Author's Reply - Jul 31, 2006 - 10:24pm PT
|
Jody: when a thread is deleted (unless it is deleted AFTER the crawler sees it, which is rare since the crawler is so new), all posts that were on that thread are 'invisible' to the crawler.
This means that users who have posted on deleted threads will be undercounted by however many posts they had on the deleted thread.
There is a way I can dredge them out, but it involves looking at every possible topic_id, which is overly resource intensive for supertopo (it would mean looking at over 230k possible threads to see if they have been deleted, and salvage their posts)
It is also possible that my script is not properly reading all threads (in some situations the HTML is different, and with over 230k posts, some might of fallen through the cracks), though my error checking should of turned up most of those cases.
Edit to clarify:
To save load on the supertopo servers, I look at the list of topics (that you see when you first click on 'forum'), and go through all of them. When you delete a topic, it is removed from that list, but all the replies are left intact (albeit invisible unless you know exactly where to go)
When the supertopo system looks at the number of posts for a user, it just looks at all posts for all time - since it has the raw data (which I don't), it can find them easily.
|
|
TradIsGood
Trad climber
Gunks end of country
|
|
Jul 31, 2006 - 10:28pm PT
|
Can you hook it up to a natural language tool to find the most words per thought? <-- just kidding.
Also how about posting the top ten misspellers by ratio of words spelled wrong to total words.
|
|
k-man
Gym climber
SCruz
|
|
Jul 31, 2006 - 10:34pm PT
|
Finally, a good use for computers and graphs (Exclamation point!)
My favorite stats:
Total posts: 193,066
Total exclamation points in all posts: 99,220
and of course one of the all-time most popular subjects:
AKclimber claims Bachar is a fake! 207
* We live in extroadanary times. * In fact, the most popular
post is happening right now. Git over there and add to it (EP)
|
|
lazide
Big Wall climber
Bay Area, CA
|
|
Topic Author's Reply - Jul 31, 2006 - 10:34pm PT
|
Tahoe climber, that is how it is looking. When I ran the query to exclude 'dead horse beating', i.e. requires that the previous to last post happened within the last 2 days, the 'offenders' don't seem all that offensive ;)
Here is the top 10 with the most recent query. I did some spot checking, and it SEEMS to be doing the right checks (but the query is very complex)
+-------------+--------+
| name | topics |
+-------------+--------+
| Karl Baba | 68 |
| Jody | 63 |
| Jaybro | 52 |
| clustiere | 50 |
| WBraun | 50 |
| locker | 47 |
| dirtineye | 46 |
| Ed Hartouni | 44 |
| Lambone | 40 |
| nature | 39 |
+-------------+--------+
10 rows in set (27.49 sec)
The query (for those that are interested...)
SELECT
author.name,
count(*) topics
FROM
(SELECT
topic_id,
MAX(posted) latestpost,
count(*) posts
FROM
post
GROUP BY topic_id
) AS most_recent_topics,
post,
author
WHERE
post.topic_id = most_recent_topics.topic_id
AND post.posted = most_recent_topics.latestpost
AND DATE_SUB(post.posted,INTERVAL 1 WEEK) >= (
SELECT
posted
FROM post
WHERE
topic_id = most_recent_topics.topic_id
AND posted < most_recent_topics.latestpost
LIMIT 1
)
AND most_recent_topics.posts >= 5
AND post.author_id = author.id
GROUP BY author_id
ORDER BY topics DESC LIMIT 50
|
|
happiegrrrl
Trad climber
New York, NY
|
|
Jul 31, 2006 - 10:36pm PT
|
I was worried I'd be a top exclaimation pointer....
But I can tell you this - There's no need to look for the top elipse user. I MUST I have that contest sewn up!
|
|
lazide
Big Wall climber
Bay Area, CA
|
|
Topic Author's Reply - Jul 31, 2006 - 10:46pm PT
|
t*r: if you think this is for anything but laughs, you are mistaken! No offense intended for anyone. :)
Disclaimer: all numbers are prone to analysis bias, sampling bias, and plain ole I SCREWED UP bias! :P Any stats that don't atleast include a standard deviation don't mean much. :)
P.S. I was thinking about some more graphs!
- who uses the most 'long words' (average word length)
- word frequency counts - what is the most favorite word on supertopo?
The spelling thing is interesting, but probably prone to error (lots of legit words wouldn't be in it)
ponder....
|
|
lazide
Big Wall climber
Bay Area, CA
|
|
Topic Author's Reply - Jul 31, 2006 - 10:49pm PT
|
Jody: do you have an example? I'll take a look (but shouldn't happen).
Only thing I can think of is that maybe supertopo 'loses' what people posted things sometimes? (disappearing 'edit' button)
I go off the actual 'name' (i.e. Jody), and there are a LOT of posts with no link to the authors profile (maybe that means supertopo doesn't know who they are?)
|
|
TradIsGood
Trad climber
Gunks end of country
|
|
Jul 31, 2006 - 10:51pm PT
|
When you put your dictionary in for the spell checking (or maybe you could just use the Google spell checker) put in the bad spelling of Tuolumne just so T*R doesn't get an error for every single post just because she could not spell her own name!
T*R - byte mi.
|
|
TradIsGood
Trad climber
Gunks end of country
|
|
Jul 31, 2006 - 11:05pm PT
|
It has been a while since I looked at the HTML directly. But I believe the user id (a number) is contained in each post. Perhaps using that will help the counting of posts by poster.
There is also the issue of posts that are "orphans". Still reachable by searching, but the OP was deleted.
|
|
lazide
Big Wall climber
Bay Area, CA
|
|
Topic Author's Reply - Jul 31, 2006 - 11:05pm PT
|
Jody:
pre-registration is definitely true. If you go back to the really old topics none of the names have links to their profile (presumably because the system doesn't know they have one). Hardman Knott was one of the first posters and if you were too that would definitely explain it.
I'll look into it a bit more later.
TiG: user id is not included in the normal forum posts unfortunately, rather it points to a lookup that (apparently?) uses the message id to look up the user.
THAT page then includes a user id, but you need a login to get that far (and I haven't written the spider to try)
|
|
Slakkey
Trad climber
From a Quiet Place by the Lake
|
|
Jul 31, 2006 - 11:11pm PT
|
Lazide,
You have to much time on your hands. Interesting stuff Though.
T*R dont worry, I know that there are those that think this forum should be nothing other than about climbing. I must admit I do get a little tired of the political rants but, look at the stats that show the number of true climbers who participate in OT topics.
I say post away people.
|
|
TradIsGood
Trad climber
Gunks end of country
|
|
Jul 31, 2006 - 11:23pm PT
|
lazide, perhaps you should put your login into the spider... Those id's are sequential. You could compare the post count using your existing method, to the post count using the id (logged in, you can look at every post by an individual user).
That may help you to find the discrepancies that Jody mentions.
|
|
Crimpergirl
Sport climber
St. Louis
|
|
Jul 31, 2006 - 11:35pm PT
|
!!!!!!!!
|
|
Mighty Hiker
Social climber
Vancouver, B.C.
|
|
Jul 31, 2006 - 11:42pm PT
|
Fascinating trivia! Thank you!!!
I suspect Werner would come out on top in the top 15 for unwordiness. ? He topped the charts for number of posts, but I wonder how that compares with the number of threads he starts? Of the top 15 (or top 50) posters, I suspect he starts the fewest.
Just like hockey trivia - except the subject is interesting.
Anders
|
|
Ouch!
climber
|
|
Jul 31, 2006 - 11:45pm PT
|
"!!!!!!!!"
Cool! You scored 8 more.
|
|
bonafide
climber
|
|
For thread killer, should probably express the result as a percentage of total posts. That would normalize the effect that frequent posters might have on that parameter, and possible uncover the true problem.
I can't accept the explanation that the thread dies simply becuase the topic has been exhausted. That has been disproven too many times in this forum. Its the lack of good material (post quality) that does it, or (my theory) the work of nefarious thread killers.
|
|
dirtineye
Trad climber
the south
|
|
THis is pretty funny.
I would like to see a plot of posts vs time for each user you have decided to track.
I know for instance my posting went WAY up when I got sick, and I suspect right after I got home from surgery and then during the chemo weeks it is extreme, but while I was in the hospital it woudl have been zero for those days.
Now that I FINALLY have the damned recording software working correctly, I don't think I'll be posting all that much, but I still think ST is the most hilarious and fun site for climbers (or non-climbers of a certain ilk), and I have really had a lot of fun here, and I'll try to say something funny every now and then, and keep in touch with the friends I've made.
OF course, if my thoery of internet addiction is correct, I'll still be here every day. We'll see what comes off my keys when the next dose of chemo, steroids, and marinol hits, hahaha!
I never would have figured that I was one of the exclam kings.
Thread killer, or last worder, or Johnny come lately? You need to figure out how to distinguish all three.
|
|
|
SuperTopo on the Web
|