Discussion:
computer bootlaces
Jonathan de Boyne Pollard
2011-09-16 15:37:29 UTC
The side effects of publishing in this manner cause serious problems.
Actually, the side effects come from M. Speed putting that information
into the From: headers of xyr posts, not from other people. The UBM
senders who scrape NOV information get to obtain xyr mailbox name.
Those same people do not get to find mailbox names embedded in the
bodies of messages, as was the case here.
It was, and still is, very, very bad manners to post access
information in an ASCII format in this medium.
That's just nonsense. Think about headers.
Jonathan de Boyne Pollard
2011-09-20 16:19:29 UTC
I suspect, in any case, that the e-mail address harvesters no longer
bother with newsgroups. For a while they were doing better grabbing
addresses from web sites. These days, I imagine, they rely mostly on
trojans that dig into people's address books.
I disagree. I suspect that they use option D: all of the above. Why
would they throw any method away?
Jonathan de Boyne Pollard
2011-09-20 16:33:10 UTC
I still have spam knocking on my MTA, addressed to article-ids. (Yep,
the harvesters are *that* stupid.)
Whilst I agree that thinking of UBM senders as total thickheads does
give one a warm sense of superiority, I suspect that you'll find that
it's not the harvesters doing this, but the people who republish Usenet
as if they were hosting their own WWW discussion forum, and turn
anything that even vaguely matches a <*@*> pattern into a mailto:
hyperlink. (It's not as widespread a practice as it once was, but it
still happens. Ironically, there are now also the fake discussion forum
sites that err in the other direction, and replace all message IDs in
messages with "(mailbox removed)" or some such.) All that harvesters do
is scrape those WWW pages.
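
A minimal sketch of the over-eager matching described above, assuming a republishing site applies one naive regular expression to everything it displays. The pattern, the function name linkify, and both sample header values are hypothetical; no particular site's code is being reproduced.

```python
import re

# Hypothetical naive pattern: anything shaped like <something@something>.
NAIVE_ADDRESS = re.compile(r"<([^<>@\s]+@[^<>@\s]+)>")


def linkify(text: str) -> str:
    """Wrap every <x@y> match in a mailto: hyperlink, as a careless republisher might."""
    return NAIVE_ADDRESS.sub(r'<a href="mailto:\1">\1</a>', text)


# Both header values are made up for illustration.
header_lines = (
    "From: Example Poster <poster@example.invalid>\n"
    "Message-ID: <4e7a1b2c$0$1234$abcdef@news.example.invalid>\n"
)

linked = linkify(header_lines)
print(linked)
```

The pattern cannot tell a mailbox from a message ID, so both end up as mailto: targets on the resulting WWW page, where a scraper would then find them.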
Bernd Felsche
2011-09-21 03:53:00 UTC
Post by Jonathan de Boyne Pollard
I still have spam knocking on my MTA, addressed to article-ids. (Yep,
the harvesters are *that* stupid.)
Whilst I agree that thinking of UBM senders as total thickheads does
give one a warm sense of superiority, I suspect that you'll find that
it's not the harvesters doing this, but the people who republish Usenet
as if they were hosting their own WWW discussion forum, and turn
anything that even vaguely matches a <*@*> pattern into a mailto:
hyperlink. (It's not as widespread a practice as it once was, but it
Not verifiable. I've checked many of the spammed addresses and
Google gets no hits at all on them.
Post by Jonathan de Boyne Pollard
still happens. Ironically, there are now also the fake discussion forum
sites that err in the other direction, and replace all message IDs in
messages with "(mailbox removed)" or some such.) All that harvesters do
is scrape those WWW pages.
--
/"\ Bernd Felsche - Innovative Reckoning, Perth, Western Australia
\ / ASCII ribbon campaign | For every complex problem there is an
X against HTML mail | answer that is clear, simple, and wrong.
/ \ and postings | --HL Mencken
Jonathan de Boyne Pollard
2011-09-22 18:48:30 UTC
I speculate that searching it indirectly via what appears on the
WWW is either (a) a means of re-using Google Web mechanisms for
Google Groups, or (b) an attempt to apply the page ranking idea to
Usenet. The former seems the more likely.
Since the archive belongs to Google and is maintained in any manner
Google sees fit, I'm not clear on what you're trying to say here.
That's because you're not thinking about the database. Think about the
database that Google has. It comprises a whole lot of stuff from
DejaNews, more old articles taken from various sources, all of the
Usenet traffic that Google has encountered directly, all of Google's
*own* non-Usenet discussion fora, and various third party WWW discussion
fora. But it's almost certainly *not* in the form of a news spool. As
I said, it's probably in a form that allows Google to re-use its WWW
searching and indexing mechanisms (spider+index servers+doc servers).
As I also said, it's also possible that Google wanted to employ some
sort of equivalent to the page ranking mechanism that it has for the
WWW. Given that WWW discussion fora are involved, I suspect that Google
tries to treat all of these things as if they were WWW discussion fora,
and then employs its WWW mechanisms upon them.
Most Usenet servers purge the articles after they become a predefined
age; [...]
This particular piece of folk wisdom hasn't been true for some years,
now, note. Several of the major Usenet nodes simply don't expire
non-binaries postings at all nowadays. Their abilities to store posts
have far outstripped the size of the text portion of a full Usenet feed,
which is only a tiny proportion of the full 10TiB/day feed. I remember
that when I last looked at Highwinds Media it had articles in some
newsgroups going back to 2006 or so. Power Usenet is currently
advertising "3013+ days text retention". In other words, it hasn't
expired a non-binaries posting for *eight years*. I haven't expired any
non-binary posts from my node in that time, either. We've all
effectively just turned non-binaries expiry off, half a decade or more ago.
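
The "eight years" figure follows from simple arithmetic on the advertised retention. A quick sketch, assuming the date of this post stands in for "today" (that choice, and the constant names, are illustrative only):

```python
from datetime import date, timedelta

ADVERTISED_RETENTION_DAYS = 3013  # figure quoted in the post
POST_DATE = date(2011, 9, 22)     # assumption: the post date is used as "today"

# Convert days to years using the average Julian year length.
retention_years = ADVERTISED_RETENTION_DAYS / 365.25

# Date of the oldest article that would still be retained.
oldest_retained = POST_DATE - timedelta(days=ADVERTISED_RETENTION_DAYS)

print(f"{retention_years:.2f} years; oldest retained article dated about {oldest_retained}")
```

That works out to a little over eight years, with the oldest retained text articles dating from mid-2003.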
Rod Speed
2011-09-22 21:35:53 UTC
Post by Jonathan de Boyne Pollard
I speculate that searching it indirectly via what appears on the
WWW is either (a) a means of re-using Google Web mechanisms for
Google Groups, or (b) an attempt to apply the page ranking idea to
Usenet. The former seems the more likely.
Since the archive belongs to Google and is maintained in any manner
Google sees fit, I'm not clear on what you're trying to say here.
That's because you're not thinking about the database. Think about
the database that Google has. It comprises a whole lot of stuff from
DejaNews, more old articles taken from various sources, all of the
Usenet traffic that Google has encountered directly, all of Google's
*own* non-Usenet discussion fora, and various third party WWW
discussion fora. But it's almost certainly *not* in the form of a
news spool. As I said, it's probably in a form that allows Google to re-use its WWW
searching and indexing mechanisms (spider+index servers+doc servers).
As I also said, it's also possible that Google wanted to employ some
sort of equivalent to the page ranking mechanism that it has for the
WWW. Given that WWW discussion fora are involved, I suspect that
Google tries to treat all of these things as if they were WWW
discussion fora, and then employs its WWW mechanisms upon them.
Most Usenet servers purge the articles after they become a predefined age; [...]
This particular piece of folk wisdom hasn't been true for some years, now, note.
It's still true of plenty of them.
Post by Jonathan de Boyne Pollard
Several of the major Usenet nodes simply don't expire non-binaries postings at all nowadays.
And hordes of the non-major ones still purge. Plenty of the major ones do too.
Post by Jonathan de Boyne Pollard
Their abilities to store posts have far outstripped the size of the text portion of a full Usenet feed, which is only
a tiny proportion of the full 10TiB/day feed. I remember that when I last looked at Highwinds Media it had articles
in some newsgroups going back to 2006 or so. Power Usenet is currently advertising "3013+ days text retention". In
other words, it hasn't expired a non-binaries posting for *eight years*. I haven't expired any non-binary posts from
my node in that time, either. We've all effectively just turned non-binaries expiry off, half a decade or more ago.
Doesn't mean that everyone has done that; hordes haven't.
MotoFox
2011-09-26 23:27:00 UTC
"And hordes of the non major ones still purge. Plenty of the major ones
do too."

AIOE and Eternal-September both purge. AIOE purges after a month and
E-S does it after a few months (generally, although there are currently
some boards on there with posts going back to 2010!)
--
MotoFox
Originator of the word "enubulous"

I just tell everybody to run Linux, myself.

The "users are idiots and are confused by functionality" approach of
Apple is a disease. If you design your OS for idiots, only idiots will
use it. I don't use a Macintosh, because in striving to be so simple,
they simply can't do what I need them to do.

Please, just tell everybody to go to Linux.
Jonathan de Boyne Pollard
2011-09-23 13:28:19 UTC
Their role is not to archive, their role is the interface, maybe even so
they can offer "forums" without actually having their own forums.
The role of those sites is indeed to archive. They archive newsgroups
that they are interested in.
No, it isn't. Their role is to provide discussion forums, and their use
of Usenet postings is to pretend that they already have an active
membership. I just did a search for my name on Google Web, and picked
one of these sites at random. Here it is:

http://www.rhinocerus.net/forum/lang-asm-x86/684275-re-paper-tape-bootstraps.html

As you can see, I am, apparently, a "Guest" at a place that I'd never
even heard of until today. And directly beneath the copy of my message
is an invitation to other people to register.
Jonathan de Boyne Pollard
2011-09-23 13:42:52 UTC
I've just done some more tests. I can successfully search on the e-mail
address I'm now using, and Google even displays it unmunged (perhaps
because of the ".invalid"). Likewise, I can search on the address of
others here, and it works.
What doesn't work is a search on any of the e-mail addresses I was using
in 1991. That fails, even if I first find an article and then search on
an e-mail address contained within that article.
It's possible that the searches are failing because the new Google
Groups "Advanced Search" doesn't offer an option to search Usenet
groups. The choice is between web pages in google groups, or web pages
in another domain. The phrase "web pages" occurs literally. There's no
option for searching for something that's not on the web.
Actually, the difference is more likely down to the fact that the two
sets of postings were imported into the archive in two different ways.
One, no doubt, came from one of the various collections of pre-DejaNews
postings that Google imported into its database. The other probably
came from Google importing directly from a Usenet feed. It's very
plausible that the former import mechanism didn't index mailbox names,
whereas the latter does.
Jonathan de Boyne Pollard
2011-09-23 13:59:26 UTC
It seems to have been a stable cutoff of 1/1/2000 for some time.
That's likely something different. Bear in mind when Google took over
DejaNews. And consider the operational and mechanical differences
between importing another company's half-decade database of Usenet
postings, from whatever format that database is structured, and
importing directly from a Usenet feed.
Jonathan de Boyne Pollard
2011-09-23 14:03:35 UTC
With
ucbvax!shasta!amadeus!evan
I get three articles (from 1985) when sorting by relevance, but none
when sorting by date.
Articles from the 1980s came into the archive through yet another route,
most famously from archives on magnetic tapes owned by Henry Spencer and
others. The processing of whatever article posting/arrival dates that
accompanied those data, if any in fact accompanied them at all, may well
have been different yet again.