Supertopo stats! (includes pretty graphs!)

Search
Go

Discussion Topic

Return to Forum List
This thread has been locked
Messages 21 - 40 of total 65 in this topic << First  |  < Previous  |  Show All  |  Next >  |  Last >>
lazide

Big Wall climber
Bay Area, CA
Topic Author's Reply - Jul 31, 2006 - 10:08pm PT
Melissa: kind of like a 'if its been dead for a week already, beating it again doesn't count?'... hmmmm... That might be more complicated, but i'll see what I can do.

Here is my current query to get 'top thread killers'
SELECT
author.name,
count(*) topics
FROM
(SELECT
topic_id,
MAX(posted) latestpost,
count(*) posts
FROM
post
GROUP BY topic_id ) AS most_recent_topics,
post,
author
WHERE
post.topic_id = most_recent_topics.topic_id
AND post.posted = most_recent_topics.latestpost
AND most_recent_topics.posts >= 5
AND post.author_id = author.id
GROUP BY author_id
ORDER BY topics DESC LIMIT 50


The top 10....

+------------------+--------+
| name | topics |
+------------------+--------+
| Ouch! | 147 |
| locker | 146 |
| Karl Baba | 138 |
| Jaybro | 117 |
| dirtineye | 108 |
| Jody | 105 |
| WBraun | 104 |
| Lambone | 91 |
| toulomne*rainbow | 87 |
| radical | 86 |
+------------------+--------+


Whoa!

Once I do some tuning, I'll generate a graph for it :)
Tahoe climber

climber
Texas to Tahoe
Jul 31, 2006 - 10:14pm PT
That's a tough statistic.
Sometimes, it's not a person that is a thread killer - it just dies of natural causes, i.e., all that's interesting has been covered, and not the last person to post that's the reason.

For instance, it stands to reason that Werner, as the most prolific poster, would be the most common thread killer, simply because of the volume of his posts.

Still fascinating data, though!
lazide

Big Wall climber
Bay Area, CA
Topic Author's Reply - Jul 31, 2006 - 10:24pm PT
Jody: when a thread is deleted (unless it is deleted AFTER the crawler sees it, which is rare since the crawler is so new), all posts that were on that thread are 'invisible' to the crawler.

This means that users who have posted on deleted threads will be undercounted by however many posts they had on the deleted thread.

There is a way I can dredge them out, but it involves looking at every possible topic_id, which is overly resource intensive for supertopo (it would mean looking at over 230k possible threads to see if they have been deleted, and salvage their posts)

It is also possible that my script is not properly reading all threads (in some situations the HTML is different, and with over 230k posts, some might of fallen through the cracks), though my error checking should of turned up most of those cases.

Edit to clarify:
To save load on the supertopo servers, I look at the list of topics (that you see when you first click on 'forum'), and go through all of them. When you delete a topic, it is removed from that list, but all the replies are left intact (albeit invisible unless you know exactly where to go)

When the supertopo system looks at the number of posts for a user, it just looks at all posts for all time - since it has the raw data (which I don't), it can find them easily.


TradIsGood

Trad climber
Gunks end of country
Jul 31, 2006 - 10:28pm PT
Can you hook it up to a natural language tool to find the most words per thought? <-- just kidding.

Also how about posting the top ten misspellers by ratio of words spelled wrong to total words.
k-man

Gym climber
SCruz
Jul 31, 2006 - 10:34pm PT
Finally, a good use for computers and graphs (Exclamation point!)


My favorite stats:

Total posts: 193,066
Total exclamation points in all posts: 99,220

    and of course one of the all-time most popular subjects:

AKclimber claims Bachar is a fake! 207




* We live in extroadanary times. * In fact, the most popular
post is happening right now. Git over there and add to it (EP)

lazide

Big Wall climber
Bay Area, CA
Topic Author's Reply - Jul 31, 2006 - 10:34pm PT
Tahoe climber, that is how it is looking. When I ran the query to exclude 'dead horse beating', i.e. requires that the previous to last post happened within the last 2 days, the 'offenders' don't seem all that offensive ;)

Here is the top 10 with the most recent query. I did some spot checking, and it SEEMS to be doing the right checks (but the query is very complex)

+-------------+--------+
| name | topics |
+-------------+--------+
| Karl Baba | 68 |
| Jody | 63 |
| Jaybro | 52 |
| clustiere | 50 |
| WBraun | 50 |
| locker | 47 |
| dirtineye | 46 |
| Ed Hartouni | 44 |
| Lambone | 40 |
| nature | 39 |
+-------------+--------+
10 rows in set (27.49 sec)


The query (for those that are interested...)

SELECT
author.name,
count(*) topics
FROM
(SELECT
topic_id,
MAX(posted) latestpost,
count(*) posts
FROM
post
GROUP BY topic_id
) AS most_recent_topics,
post,
author
WHERE
post.topic_id = most_recent_topics.topic_id
AND post.posted = most_recent_topics.latestpost
AND DATE_SUB(post.posted,INTERVAL 1 WEEK) >= (
SELECT
posted
FROM post
WHERE
topic_id = most_recent_topics.topic_id
AND posted < most_recent_topics.latestpost
LIMIT 1
)
AND most_recent_topics.posts >= 5
AND post.author_id = author.id
GROUP BY author_id
ORDER BY topics DESC LIMIT 50


happiegrrrl

Trad climber
New York, NY
Jul 31, 2006 - 10:36pm PT
I was worried I'd be a top exclaimation pointer....

But I can tell you this - There's no need to look for the top elipse user. I MUST I have that contest sewn up!
lazide

Big Wall climber
Bay Area, CA
Topic Author's Reply - Jul 31, 2006 - 10:46pm PT
t*r: if you think this is for anything but laughs, you are mistaken! No offense intended for anyone. :)

Disclaimer: all numbers are prone to analysis bias, sampling bias, and plain ole I SCREWED UP bias! :P Any stats that don't atleast include a standard deviation don't mean much. :)

P.S. I was thinking about some more graphs!

- who uses the most 'long words' (average word length)
- word frequency counts - what is the most favorite word on supertopo?

The spelling thing is interesting, but probably prone to error (lots of legit words wouldn't be in it)

ponder....
lazide

Big Wall climber
Bay Area, CA
Topic Author's Reply - Jul 31, 2006 - 10:49pm PT
Jody: do you have an example? I'll take a look (but shouldn't happen).

Only thing I can think of is that maybe supertopo 'loses' what people posted things sometimes? (disappearing 'edit' button)

I go off the actual 'name' (i.e. Jody), and there are a LOT of posts with no link to the authors profile (maybe that means supertopo doesn't know who they are?)
TradIsGood

Trad climber
Gunks end of country
Jul 31, 2006 - 10:51pm PT
When you put your dictionary in for the spell checking (or maybe you could just use the Google spell checker) put in the bad spelling of Tuolumne just so T*R doesn't get an error for every single post just because she could not spell her own name!

T*R - byte mi.
TradIsGood

Trad climber
Gunks end of country
Jul 31, 2006 - 11:05pm PT
It has been a while since I looked at the HTML directly. But I believe the user id (a number) is contained in each post. Perhaps using that will help the counting of posts by poster.

There is also the issue of posts that are "orphans". Still reachable by searching, but the OP was deleted.
lazide

Big Wall climber
Bay Area, CA
Topic Author's Reply - Jul 31, 2006 - 11:05pm PT
Jody:
pre-registration is definitely true. If you go back to the really old topics none of the names have links to their profile (presumably because the system doesn't know they have one). Hardman Knott was one of the first posters and if you were too that would definitely explain it.

I'll look into it a bit more later.

TiG: user id is not included in the normal forum posts unfortunately, rather it points to a lookup that (apparently?) uses the message id to look up the user.

THAT page then includes a user id, but you need a login to get that far (and I haven't written the spider to try)
Slakkey

Trad climber
From a Quiet Place by the Lake
Jul 31, 2006 - 11:11pm PT
Lazide,

You have to much time on your hands. Interesting stuff Though.

T*R dont worry, I know that there are those that think this forum should be nothing other than about climbing. I must admit I do get a little tired of the political rants but, look at the stats that show the number of true climbers who participate in OT topics.

I say post away people.
TradIsGood

Trad climber
Gunks end of country
Jul 31, 2006 - 11:23pm PT
lazide, perhaps you should put your login into the spider... Those id's are sequential. You could compare the post count using your existing method, to the post count using the id (logged in, you can look at every post by an individual user).

That may help you to find the discrepancies that Jody mentions.
Crimpergirl

Sport climber
St. Louis
Jul 31, 2006 - 11:35pm PT
!!!!!!!!
Mighty Hiker

Social climber
Vancouver, B.C.
Jul 31, 2006 - 11:42pm PT
Fascinating trivia! Thank you!!!

I suspect Werner would come out on top in the top 15 for unwordiness. ? He topped the charts for number of posts, but I wonder how that compares with the number of threads he starts? Of the top 15 (or top 50) posters, I suspect he starts the fewest.

Just like hockey trivia - except the subject is interesting.

Anders
Ouch!

climber
Jul 31, 2006 - 11:45pm PT
"!!!!!!!!"

Cool! You scored 8 more.
Ouch!

climber
Aug 1, 2006 - 12:42am PT
bonafide

climber
Aug 1, 2006 - 08:47am PT
For thread killer, should probably express the result as a percentage of total posts. That would normalize the effect that frequent posters might have on that parameter, and possible uncover the true problem.

I can't accept the explanation that the thread dies simply becuase the topic has been exhausted. That has been disproven too many times in this forum. Its the lack of good material (post quality) that does it, or (my theory) the work of nefarious thread killers.
dirtineye

Trad climber
the south
Aug 1, 2006 - 09:21am PT
THis is pretty funny.

I would like to see a plot of posts vs time for each user you have decided to track.

I know for instance my posting went WAY up when I got sick, and I suspect right after I got home from surgery and then during the chemo weeks it is extreme, but while I was in the hospital it woudl have been zero for those days.


Now that I FINALLY have the damned recording software working correctly, I don't think I'll be posting all that much, but I still think ST is the most hilarious and fun site for climbers (or non-climbers of a certain ilk), and I have really had a lot of fun here, and I'll try to say something funny every now and then, and keep in touch with the friends I've made.

OF course, if my thoery of internet addiction is correct, I'll still be here every day. We'll see what comes off my keys when the next dose of chemo, steroids, and marinol hits, hahaha!

I never would have figured that I was one of the exclam kings.

Thread killer, or last worder, or Johnny come lately? You need to figure out how to distinguish all three.
Messages 21 - 40 of total 65 in this topic << First  |  < Previous  |  Show All  |  Next >  |  Last >>
Return to Forum List
 
Our Guidebooks
spacerCheck 'em out!
SuperTopo Guidebooks

guidebook icon
Try a free sample topo!

 
SuperTopo on the Web

Recent Route Beta