Discussion:
vague plans for one more "news server" software
Ivan Shmakov
2012-01-17 08:12:13 UTC
[Cross-posting to news:news.misc, for it's no longer just about
NNTP and IMAP.]
With netnews and NNTP being more or less moribund, this is
primarily of historical interest these days.
There seem to be a few still-active newsgroups, actually.
Consider, e. g., news:sci.electronics.design, news:comp.os.vms,
news:comp.arch.embedded, news:alt.russian.z1, etc.
I'm currently working on a new NNTP server -- so it's not entirely
moribund ;-).
I'm contemplating developing a kind of a “news caching proxy”,
actually. However, I consider making somewhat radical changes
to the usual news processing procedures.

In particular, I aim for a better support of presentation of
netnews as (apart from having them accessible via IMAP and NNTP)
both Atom feeds (including Atom Posting Protocol support) and
Web pages (XHTML.) To this end, I believe it's essential to
outrightly discard non-ASCII article headers, /and/ non-ASCII
article bodies of the articles lacking proper MIME headers, as
both are ambiguous as to what character encoding is used.

Also, I consider using an RDBMS to store particular
(Message-ID:, References:), if not all, of the message headers,
so to allow for instant threading. Perhaps this storage could
be used to implement some “extended search” facilities within
either or both of the NNTP and IMAP interfaces as well, but I
have no specific plans on that at this moment.
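
A minimal sketch of the threading idea above, assuming SQLite through Python's sqlite3 module. The table, column, and function names are hypothetical and only illustrate how Message-ID: and References: could be stored so that a thread lookup becomes a single indexed query.

```python
import sqlite3

def open_store(path=":memory:"):
    """Create a hypothetical schema: one row per article, one per reference."""
    db = sqlite3.connect(path)
    db.executescript("""
        CREATE TABLE IF NOT EXISTS article (
            message_id TEXT PRIMARY KEY,
            subject    TEXT
        );
        CREATE TABLE IF NOT EXISTS article_ref (
            message_id TEXT NOT NULL,    -- the referring article
            ref_id     TEXT NOT NULL,    -- one entry of its References: header
            pos        INTEGER NOT NULL  -- 0 = oldest ancestor, last = parent
        );
        CREATE INDEX IF NOT EXISTS article_ref_by_ref ON article_ref (ref_id);
    """)
    return db

def add_article(db, message_id, subject, references=""):
    """Store an article; return False if the Message-ID was already known."""
    with db:  # one transaction for the article and its references
        cur = db.execute("INSERT OR IGNORE INTO article VALUES (?, ?)",
                         (message_id, subject))
        if cur.rowcount == 0:
            return False
        db.executemany("INSERT INTO article_ref VALUES (?, ?, ?)",
                       [(message_id, ref, i)
                        for i, ref in enumerate(references.split())])
    return True

def descendants(db, message_id):
    """Message-IDs of all stored articles that list message_id in References:."""
    rows = db.execute(
        "SELECT DISTINCT message_id FROM article_ref"
        " WHERE ref_id = ? ORDER BY message_id", (message_id,))
    return [row[0] for row in rows]
```

Whether this beats a conventional overview file depends on the workload; the sketch only shows that the lookup itself is a plain index scan.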

Perhaps the RDBMS could also be used for MIME parts below a certain
size threshold (in octets; or perhaps characters, for text/*
MIME parts, and for strictly ASCII non-MIME article bodies.)

The MIME parts are to be stored separately (from each other and
from the header), in 8-bit, even if originally coming as, e. g.,
Base64 or quoted-printable. The parts above the size threshold
would be stored separately on the filesystem. If possible, a
few more transformations would also be implemented. In
particular, it may be possible to transform any armored OpenPGP
signatures into their MIME-based counterparts.

I also intend to compute message digests (SHA-256, SHA-1) for
the MIME parts over some trivial size (1024 octets or so),
and aggressively replace duplicates with links. If the message
(or a part) is digitally signed with a known key, the digests
computed would be verified against the signature, and the
message discarded on failure.
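
The de-duplication step described above could look roughly like the following sketch, which uses Python's hashlib. The PartStore class and the MIN_DEDUP_SIZE constant are hypothetical names; signature verification is deliberately left out.

```python
import hashlib

MIN_DEDUP_SIZE = 1024  # octets; the "trivial size" threshold mentioned above

class PartStore:
    """Hypothetical in-memory, content-addressed store for decoded MIME parts."""

    def __init__(self):
        self.blobs = {}  # SHA-256 hex digest -> part octets

    def put(self, data: bytes):
        """Store a part and return its digest, which acts as the link.

        Parts at or below the threshold are not digested; None tells the
        caller to keep such a part inline instead.
        """
        if len(data) <= MIN_DEDUP_SIZE:
            return None
        key = hashlib.sha256(data).hexdigest()
        # setdefault keeps the first copy, so a duplicate costs no extra storage
        self.blobs.setdefault(key, data)
        return key
```

Two articles carrying the same attachment would then both hold the same digest, and only one copy of the octets would be kept.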

As for the “caching” part, this agent would both allow
conventional feeds (for input), as well as periodical and
on-demand fetching of articles (like, e. g., suck(1), but also
allowing for partial retrieval of the newsgroups' articles,
based on criteria specified over the XOVER data.) It will also
maintain the “last access time” for the articles, to be taken
into account in the “expiration” process.

The “proxy” part would mean that the server could be instructed
to preserve the Xref: header of its “primary” (“backing”) source
(though only for the specific newsgroups or hierarchies, and
still allowing for different sources to be used to actually
fetch the articles.) The posting in proxy mode will be
performed synchronously, with no reply being sent to the user
agent before the “backing” source itself replies, or a timeout.

For the implementation of the prototype, I'm now considering the
Perl language, as there's a sheer amount of extensions
available, providing support for a variety of protocols and data
presentations.
For now it is transit only (no readers), but I would like to
eventually add reader support. I'd already intended to support both
NNTP and IMAP access once that happens.
However, given the lack of any way to post new messages via IMAP, I'm
not sure it would be very useful for Usenet.
The IMAP protocol allows for both retrieval /and/ storage of
messages. In particular, certain MUA's allow for the copies of
the messages sent via e-mail to be saved into a dedicated IMAP
mailbox. Although unconventional, I guess that the MUA's could
be changed to use such a functionality to post new messages.
--
FSF associate member #7257
River Tarnell
2012-01-17 09:05:27 UTC
[-comp.mail.imap]
Post by Ivan Shmakov
To this end, I believe it's essential to
outrightly discard non-ASCII article headers, /and/ non-ASCII
article bodies of the articles lacking proper MIME headers, as
both are ambiguous as to what character encoding is used.
This will discard all yEnc-encoded binaries, although you may not care
about that.
Post by Ivan Shmakov
Also, I consider using an RDBMS to store particular
(Message-ID:, References:), if not all, of the message headers,
so to allow for instant threading.
This is called the overview database, and it's already implemented for
the OVER (or XOVER) command, which most clients use for threading
without having to download the entire message. See RFC3977 section 8.3.
Unfortunately there's no way to handle non-ASCII text in the overview.
Post by Ivan Shmakov
If possible, a
few more transformations would also be implemented. In
particular, it may be possible to transform any armored OpenPGP
signatures into their MIME-based counterparts.
I think you need to be careful modifying message bodies. For example,
when I send PGP-signed messages on Usenet, I always use the legacy PGP
encoding[0]. If you converted this into MIME, and it offended someone who
dislikes MIME, then they would blame me for it.

Also note that when you relay messages to another server, you are not
allowed to make *any* changes to the message except for Path and Xref
(RFC 5537, section 3.6).

[0] Firstly because my client doesn't support OpenPGP, but also because
MIME messages are fairly ugly for people using non-MIME-capable reader
software.
Post by Ivan Shmakov
I also intend to compute message digests (SHA-256, SHA-1) for
the MIME parts over some trivial size (1024 octets or so),
and aggressively replace duplicates with links.
Most people already do this for the entire message, but duplicates are
discarded rather than de-duped. (Used to filter EMP.)
Post by Ivan Shmakov
However, given the lack of any way to post new messages via IMAP, I'm
not sure it would be very useful for Usenet.
The IMAP protocol allows for both retrieval /and/ storage of
messages. In particular, certain MUA's allow for the copies of
the messages sent via e-mail to be saved into a dedicated IMAP
mailbox. Although unconventional, I guess that the MUA's could
be changed to use such a functionality to post new messages.
It might be possible to implement, but that requires the project be
active and have developers willing to spend time on Usenet improvements,
which is not always the case. AIUI, even Mozilla's NNTP support is
fairly decrepit, even though that's a very active project.

I think the version of trn I'm using was released around 2001.

Regards,
--
-- river. | Free Usenet: http://news.rt.uk.eu.org/
Non-Reciprocal Laws of Expectations: | PGP: 2B9CE6F2
Negative expectations yield negative results.
Positive expectations yield negative results.
Ivan Shmakov
2012-01-17 09:58:07 UTC
Post by River Tarnell
To this end, I believe it's essential to outrightly discard
non-ASCII article headers, /and/ non-ASCII article bodies of the
articles lacking proper MIME headers, as both are ambiguous as to
what character encoding is used.
This will discard all yEnc-encoded binaries, although you may not
care about that.
It's hardly of any value to me, but the primary question is
whether I'd be able to unambiguously detect the use of a
particular encoding? For sure, I'd prefer using MIME
exclusively for this end.
Post by River Tarnell
Also, I consider using an RDBMS to store particular (Message-ID:,
References:), if not all, of the message headers, so to allow for
instant threading.
This is called the overview database, and it's already implemented
for the OVER (or XOVER) command, which most clients use for threading
without having to download the entire message.
Indeed, and this database will be used to back the XOVER command
in particular. (Note that I've explicitly mentioned XOVER in
the original article.)
Post by River Tarnell
See RFC3977 section 8.3. Unfortunately there's no way to handle
non-ASCII text in the overview.
Yes there is, as long as RFC 2047 is used.

While RFC 3977 says that the header contents "SHOULD be in
UTF-8", it doesn't seem to suggest any way to discern proper
UTF-8-encoded headers from similarly-looking octet sequences.

OTOH, RFC 5322 (which, as long as this issue is concerned, could
be considered a sibling of a kind to both RFC 1036 and RFC 3977)
explicitly defines the ("unstructured") header fields' bodies as
"any printable US-ASCII characters plus white space characters".

As RFC 5322 definition of the message header allows for much
less ambiguity in this respect, I'm inclined to disallow
non-7-bit message headers completely.
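
As a rough illustration of this policy, the sketch below rejects any header field body containing octets outside the 7-bit range and decodes RFC 2047 encoded-words for display, using Python's standard email.header module. The function names header_is_7bit and display_value are hypothetical.

```python
from email.header import decode_header, make_header

def header_is_7bit(raw: bytes) -> bool:
    """True if every octet of the header field body is below 0x80."""
    return all(octet < 0x80 for octet in raw)

def display_value(raw: bytes) -> str:
    """Decode a 7-bit header field body, expanding RFC 2047 encoded-words.

    Raises ValueError for 8-bit input, since its charset is unknown.
    """
    if not header_is_7bit(raw):
        raise ValueError("8-bit header field body: charset is ambiguous")
    return str(make_header(decode_header(raw.decode("ascii"))))
```

An overview database built this way could store either the raw 7-bit form or the decoded text, depending on which interface it serves.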

(Internally, the components of the system I propose would use a
protocol different to NNTP for communication, so non-ASCII text
anywhere, and even raw binary data, won't be a problem.)
Post by River Tarnell
If possible, a few more transformations would also be implemented.
In particular, it may be possible to transform any armored OpenPGP
signatures into their MIME-based counterparts.
I think you need to be careful modifying message bodies. For
example, when I send PGP-signed messages on Usenet, I always use the
legacy PGP encoding[0]. If you converted this into MIME, and it
offended someone who dislikes MIME, then they would blame me for it.
Actually, that's precisely what OpenPGP signatures are for: to
be able to detect if the message was tampered with. (Which makes me
doubt whether such a transformation is possible at all.)
Post by River Tarnell
Also note that when you relay messages to another server, you are not
allowed to make *any* changes to the message except for Path and Xref
(RFC 5537, section 3.6).
For this reason, I don't consider relaying at this moment at
all. (Not that I've stated that the software in question is
aimed at strict RFC 5537 compliance.)
Post by River Tarnell
[0] Firstly because my client doesn't support OpenPGP, but also
because MIME messages are fairly ugly for people using
non-MIME-capable reader software.
Somehow, I'm inclined to think that it's the support for legacy
software that draws Usenet towards its demise. (Though I may
easily be wrong.)
Post by River Tarnell
I also intend to compute message digests (SHA-256, SHA-1) for the
MIME parts over some trivial size (1024 octets or so), and
aggressively replace duplicates with links.
Most people already do this for the entire message, but duplicates
are discarded rather than de-duped. (Used to filter EMP.)
ACK, thanks.

For individual MIME parts, however, I believe that de-duping may
be more appropriate.
Post by River Tarnell
However, given the lack of any way to post new messages via IMAP,
I'm not sure it would be very useful for Usenet.
The IMAP protocol allows for both retrieval /and/ storage of
messages. In particular, certain MUA's allow for the copies of the
messages sent via e-mail to be saved into a dedicated IMAP mailbox.
Although unconventional, I guess that the MUA's could be changed to
use such a functionality to post new messages.
It might be possible to implement, but that requires the project be
active and have developers willing to spend time on Usenet
improvements, which is not always the case. AIUI, even Mozilla's
NNTP support is fairly decrepit, even though that's a very active
project.
Agreed.
Post by River Tarnell
I think the version of trn I'm using was released around 2001.
--
FSF associate member #7257
River Tarnell
2012-01-17 14:16:25 UTC
Post by Ivan Shmakov
Post by River Tarnell
This will discard all yEnc-encoded binaries, although you may not
care about that.
It's hardly of any value to me, but the primary question is
whether I'd be able to unambiguously detect the use of a
particular encoding? For sure, I'd prefer using MIME
exclusively for this end.
You can detect yEnc by looking for a line that begins with "=ybegin" and
contains the strings "line=", "size=" and "name=", and is followed some
time later by a line that begins with "=yend" and contains the string
"size=". (Yes, this is a terrible format.) There's also some sort of
"standard" for what the subject line should be, at least for multi-part
binaries.
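
The heuristic just described can be written down as a short sketch. The function name looks_like_yenc is hypothetical, and the check is only as strict as the description above; it does not validate sizes or checksums.

```python
def looks_like_yenc(body: bytes) -> bool:
    """Heuristic yEnc detection on a raw article body.

    Looks for a line starting with "=ybegin" that contains "line=",
    "size=" and "name=", followed later by a line starting with "=yend"
    that contains "size=".
    """
    seen_begin = False
    for line in body.splitlines():
        if not seen_begin:
            if line.startswith(b"=ybegin") and all(
                    key in line for key in (b"line=", b"size=", b"name=")):
                seen_begin = True
        elif line.startswith(b"=yend") and b"size=" in line:
            return True
    return False
```

A body with a begin line but no matching end line is reported as not yEnc, which seems the safer default when the result decides whether an article is kept.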

yEnc has no "encoding" as such apart from that of the encoded file,
which is usually binary data.

If you feel clever, you could extract the filename from the =ybegin line
and convert the data into a MIME attachment ;-) (of type
application/octet-stream, unless you want to guess from the filename --
yEnc, of course, doesn't say what the file type is).
Post by Ivan Shmakov
Post by River Tarnell
See RFC3977 section 8.3. Unfortunately there's no way to handle
non-ASCII text in the overview.
Yes there is, as long as RFC 2047 is used.
Yes, I forgot about that. But I think its use is less widespread in
Usenet than email (since most mail clients support MIME nowadays, while
many newsreaders do not).
Post by Ivan Shmakov
Post by River Tarnell
If possible, a few more transformations would also be implemented.
In particular, it may be possible to transform any armored OpenPGP
signatures into their MIME-based counterparts.
I think you need to be careful modifying message bodies. For
example, when I send PGP-signed messages on Usenet, I always use the
legacy PGP encoding[0]. If you converted this into MIME, and it
offended someone who dislikes MIME, then they would blame me for it.
Actually, that's precisely what OpenPGP signatures are for: to
be able to detect if the message was tampered with. (Which makes me
doubt whether such a transformation is possible at all.)
But people aren't going to verify the signature; most people who cared
about that would use a UA that supported OpenPGP to start with. They'll
just look at the message and say "oh, an annoying MIME message".

Of course anyone could post an annoying message claiming to be from me,
but that's a bit different to software generating such a message on
purpose.

PS: I would like to see Usenet keep going as well. I have no idea how
to achieve that, though. I think most people just expect forums to be
on the Web nowadays; for many topics there are many more Web forums than
mailing lists.

Regards,
--
-- river. | Free Usenet: http://news.rt.uk.eu.org/
Non-Reciprocal Laws of Expectations: | PGP: 2B9CE6F2
Negative expectations yield negative results.
Positive expectations yield negative results.
Ivan Shmakov
2012-02-03 05:43:32 UTC
Post by River Tarnell
Post by River Tarnell
This will discard all yEnc-encoded binaries, although you may not
care about that.
It's hardly of any value to me, but the primary question is whether
I'd be able to unambiguously detect the use of a particular
encoding? For sure, I'd prefer using MIME exclusively for this end.
[...]
Post by River Tarnell
yEnc has no "encoding" as such apart from that of the encoded file,
which is usually binary data.
In MIME parlance, yEnc /is/ an encoding. The mapping of octets
to characters is called charset in MIME specifications.
Post by River Tarnell
If you feel clever, you could extract the filename from the =ybegin
line and convert the data into a MIME attachment ;-) (of type
application/octet-stream, unless you want to guess from the filename
-- yEnc, of course, doesn't say what the file type is).
In the absence of an OpenPGP signature, it may be sensible to do
just that. And also for non-MIME Base64- and UUE-encoded data.
Post by River Tarnell
Post by River Tarnell
See RFC3977 section 8.3. Unfortunately there's no way to handle
non-ASCII text in the overview.
Yes there is, as long as RFC 2047 is used.
Yes, I forgot about that. But I think its use is less widespread in
Usenet than email (since most mail clients support MIME nowadays,
while many newsreaders do not).
If there's a list, I could check if these user agents could be
easily fixed.

[...]
Post by River Tarnell
PS: I would like to see Usenet keep going as well. I have no idea
how to achieve that, though. I think most people just expect forums
to be on the Web nowadays; for many topics there are many more Web
forums than mailing lists.
That's my intent: to allow for better interoperability between
Web, netnews, and perhaps even certain social networking sites.
--
FSF associate member #7257
Ralf Döblitz
2012-01-17 17:11:30 UTC
Post by River Tarnell
[-comp.mail.imap]
Post by Ivan Shmakov
To this end, I believe it's essential to
outrightly discard non-ASCII article headers, /and/ non-ASCII
article bodies of the articles lacking proper MIME headers, as
both are ambiguous as to what character encoding is used.
This will discard all yEnc-encoded binaries, although you may not care
about that.
Discarding that junk could also be considered as a nice feature.

Ralf
--
[Taking down crucifixes in computer rooms]
Not advisable. Normally somebody is already hanging on those crucifixes.
Was Jesus LARTed?
  – Jens Chr. Bachem in de.alt.sysadmin.recovery
Mark Crispin
2012-01-17 16:48:01 UTC
Post by Ivan Shmakov
To this end, I believe it's essential to
outrightly discard non-ASCII article headers, /and/ non-ASCII
article bodies of the articles lacking proper MIME headers, as
both are ambiguous as to what character encoding is used.
Good luck on this. This was a battle that was fought, and lost, about 20
years ago. Search for "just send 8-bits" in old old flamewars. Prepare to
be nauseated.
Post by Ivan Shmakov
Also, I consider using an RDBMS
You're a glutton for punishment, I see. Lots of people have had this idea
in the past...

I'll just say "good luck".

However, I will give you an important hint. To do IMAP in any sort of
reasonable way, you MUST have access to the message as a single RFC *822
format char* string.

Don't assume that you can assemble on the fly from a database. EVERYBODY
who has ever tried has failed, miserably. Whatever you do to store the
message in separate parts, you MUST have the message as the big char*.
You'll regret doing otherwise.

-- Mark --

http://panda.com/mrc
Democracy is two wolves and a sheep deciding what to eat for lunch.
Liberty is a well-armed sheep contesting the vote.
Russ Allbery
2012-01-17 17:09:48 UTC
Post by Mark Crispin
To this end, I believe it's essential to outrightly discard
non-ASCII article headers, /and/ non-ASCII article bodies of the
articles lacking proper MIME headers, as both are ambiguous as to
what character encoding is used.
Good luck on this. This was a battle that was fought, and lost, about 20
years ago. Search for "just send 8-bits" in old old flamewars. Prepare
to be nauseated.
Believe it or not, it's actually gotten better since the old flamewars.
Not to the point where you can just assume this, of course, but more and
more clients just quietly do the right thing, and people seem to cope.
--
Russ Allbery (***@stanford.edu) <http://www.eyrie.org/~eagle/>

Please post questions rather than mailing me directly.
<http://www.eyrie.org/~eagle/faqs/questions.html> explains why.
Xavier Roche
2012-01-17 18:37:04 UTC
Post by Russ Allbery
Believe it or not, it's actually gotten better since the old flamewars.
Not to the point where you can just assume this, of course, but more and
more clients just quietly do the right thing, and people seem to cope.
One of the greatest battles won on fr.* recently was to *accept* UTF-8 as
charset (rather than the venerable ISO-8859 family members). I'd say
that things are getting better, but a bit slowly :)
Julien ÉLIE
2012-01-17 19:48:16 UTC
Hi Xavier,
Post by Xavier Roche
One of the greatest battles won on fr.* recently was to *accept* UTF-8 as
charset (rather than the venerable ISO-8859 family members). I'd say
that things are getting better, but a bit slowly :)
:-)

The next great step will be to send monthly charters or recommendations
with a Subject: header field body in MIME! (And, in a few years, in UTF-8.)

I am still amazed to read non-MIME 7-bit subjects like:

Subject: [DOC] Tables de caracteres utilisees dans la hierarchie fr.*


(Note for non-French speakers: the above subject should be written in
MIME or UTF-8 so as to properly appear in a news client as
Subject: [DOC] Tables de caractères utilisées dans la hiérarchie fr.*
)
--
Julien ÉLIE

« It is a virgin forest where the hand of man has never set
foot. »
Xavier Roche
2012-01-17 20:12:30 UTC
Post by Julien ÉLIE
The next great step will be to send monthly charters or recommendations
with a Subject: header field body in MIME! (And, in a few years, in UTF-8.)
To be honest, I have always considered this RFC 2047 "hack" clumsy,
and especially painful to handle, particularly in scripts. (Darn, why do
I have to play with substrings, and decode base64 and big5, to extract
the relevant information?)

(Yes, I know that a "Header-Charset" mechanism would have been difficult
for many reasons, including adding headers to existing headers; but an
ASCII => UTF-8 transition for headers would probably have been better.)
Ivan Shmakov
2012-01-18 04:22:55 UTC
[Setting Followup-To: to drop news.software.nntp.]
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
BTW, is this "format=flowed" bit intentional? Somehow, I
thought that it's "incompatible" with >-quoting.

[...]
Post by Ivan Shmakov
Also, I consider using an RDBMS
You're a glutton for punishment, I see. Lots of people have had this
idea in the past...
I'll just say "good luck".
However, I will give you an important hint. To do IMAP in any sort
of reasonable way, you MUST have access to the message as a single
RFC *822 format char* string.
Why is it so?
Don't assume that you can assemble on the fly from a database.
EVERYBODY who has ever tried has failed, miserably.
To my mind, the biggest problem here is to preserve "message
boundary" sequences, which may be essential for digital
signatures to work.
Whatever you do to store the message in separate parts, you MUST have
the message as the big char*. You'll regret doing otherwise.
Assuming that "char *" means an "octet sequence" here (and not a
"sequence of characters", as it doesn't make sense for, say,
application/octet-stream parts), I guess that I may do it the
other way around, by storing the offsets of the individual MIME
parts within the octet sequence.
--
FSF associate member #7257
Mark Crispin
2012-01-18 05:55:18 UTC
Post by Ivan Shmakov
However, I will give you an important hint. To do IMAP in any sort
of reasonable way, you MUST have access to the message as a single
RFC *822 format char* string.
Why is it so?
IMAP allows you to fetch a part of a message in many ways, including "<n>
characters starting at position <m>" for arbitrary body parts and/or the
entire message.

If you don't have an exact representation of the message, or at least have
it easily (and precisely) calculable, you are in for a world of hurt.
Post by Ivan Shmakov
Assuming that "char *" means an "octet sequence" here (and not a
"sequence of characters", as it doesn't make sense for, say,
application/octet-stream parts), I guess that I may do it the
other way around, by storing the offsets of the individual MIME
parts within the octet sequence.
If, by this, you mean "have an octet sequence of the entire message, with
the individual MIME parts being recorded as offset/length within that
octet sequence", then you are on the right track.
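
The arrangement agreed on here, one octet sequence per message with parts recorded as offset/length pairs, might be sketched as follows. Both function names are hypothetical, and the scanner is a simplification: it handles only the top-level parts of a well-formed multipart message whose boundary string is already known.

```python
from typing import List, Tuple

def index_parts(raw: bytes, boundary: bytes) -> List[Tuple[int, int]]:
    """Return (offset, length) of each top-level part within raw."""
    delim = b"\r\n--" + boundary  # the leading CRLF belongs to the delimiter
    spans = []
    pos = raw.find(delim)
    while pos != -1:
        after = pos + len(delim)
        line_end = raw.find(b"\r\n", after)
        if raw.startswith(b"--", after) or line_end == -1:
            break                        # close delimiter, or truncated message
        start = line_end + 2             # first octet of the part
        nxt = raw.find(delim, line_end)  # the next delimiter may begin at line_end
        if nxt == -1:
            break
        spans.append((start, max(0, nxt - start)))
        pos = nxt
    return spans

def fetch_partial(raw: bytes, span: Tuple[int, int],
                  first: int, count: int) -> bytes:
    """Return up to count octets of a part, starting at octet first."""
    offset, length = span
    first = min(first, length)
    return raw[offset + first:offset + min(length, first + count)]
```

Because the original octets are never reassembled, a partial fetch is a plain slice, and boundary lines stay exactly as received.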

-- Mark --

http://panda.com/mrc
Democracy is two wolves and a sheep deciding what to eat for lunch.
Liberty is a well-armed sheep contesting the vote.
Ivan Shmakov
2012-01-18 06:18:02 UTC
Post by Mark Crispin
Post by Ivan Shmakov
However, I will give you an important hint. To do IMAP in any sort
of reasonable way, you MUST have access to the message as a single
RFC *822 format char* string.
Why is it so?
IMAP allows you to fetch a part of a message in many ways, including
"<n> characters starting at position <m>" for arbitrary body parts
and/or the entire message.
ACK, thanks!

However, I presume it's "octets", and not "characters"?
Post by Mark Crispin
If you don't have an exact representation of the message, or at least
have it easily (and precisely) calculable, you are in for a world
of hurt.
The other problem is that the MIME parts may be nested rather
arbitrarily, which isn't that easy to tackle when using the
relational database model (and in particular, SQL.)
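
One possible way around the nesting problem, offered only as a sketch, is an adjacency list (each part row points at its parent) walked with a recursive query. The example assumes SQLite, whose WITH RECURSIVE support Python's sqlite3 module exposes; the schema and function names are hypothetical.

```python
import sqlite3

def open_parts_db():
    """Create a hypothetical table holding one row per MIME part."""
    db = sqlite3.connect(":memory:")
    db.execute("""
        CREATE TABLE part (
            id           INTEGER PRIMARY KEY,
            parent_id    INTEGER REFERENCES part (id),  -- NULL for top-level parts
            ordinal      INTEGER NOT NULL,              -- 1-based position among siblings
            content_type TEXT NOT NULL
        )""")
    return db

def add_part(db, parent_id, ordinal, content_type):
    """Insert a part and return its row id, for use as a child's parent_id."""
    cur = db.execute(
        "INSERT INTO part (parent_id, ordinal, content_type) VALUES (?, ?, ?)",
        (parent_id, ordinal, content_type))
    return cur.lastrowid

def section_paths(db):
    """Return (dotted path, content type) for every part, e.g. ('1.2', 'text/html')."""
    rows = db.execute("""
        WITH RECURSIVE tree (id, path, content_type) AS (
            SELECT id, CAST(ordinal AS TEXT), content_type
              FROM part WHERE parent_id IS NULL
            UNION ALL
            SELECT p.id, tree.path || '.' || p.ordinal, p.content_type
              FROM part AS p JOIN tree ON p.parent_id = tree.id
        )
        SELECT path, content_type FROM tree ORDER BY path""")
    return rows.fetchall()
```

The ORDER BY here compares paths as text, so it would misorder siblings once an ordinal reaches 10; a real implementation would need a numeric sort key.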
Post by Mark Crispin
Post by Ivan Shmakov
Assuming that "char *" means an "octet sequence" here (and not a
"sequence of characters", as it doesn't make sense for, say,
application/octet-stream parts), I guess that I may do it the other
way around, by storing the offsets of the individual MIME parts
within the octet sequence.
If, by this, you mean "have an octet sequence of the entire message,
with the individual MIME parts being recorded as offset/length within
that octet sequence", then you are on the right track.
ACK, thanks.
--
FSF associate member #7257
Mark Crispin
2012-01-18 06:27:53 UTC
Post by Ivan Shmakov
However, I presume it's "octets", and not "characters"?
Yes, you are correct. I am from an older generation that sloppily uses the
two terms as synonyms.

-- Mark --

http://panda.com/mrc
Democracy is two wolves and a sheep deciding what to eat for lunch.
Liberty is a well-armed sheep contesting the vote.
Russ Allbery
2012-01-17 17:08:46 UTC
Post by Ivan Shmakov
In particular, I aim for a better support of presentation of
netnews as (apart from having them accessible via IMAP and NNTP)
both Atom feeds (including Atom Posting Protocol support) and Web
pages (XHTML.) To this end, I believe it's essential to
outrightly discard non-ASCII article headers, /and/ non-ASCII
article bodies of the articles lacking proper MIME headers, as
both are ambiguous as to what character encoding is used.
I highly recommend talking to larsi about what he did with GMANE, since
that's a large part of what you're trying to do here and it seems to work
quite well (and is *very* widely used in some parts of the free software
community).

I think a lot of the backend code is written in Lisp, since, well, it's
larsi, which may make it challenging to reuse depending on your preferred
languages. But the ideas at least are there.
Post by Ivan Shmakov
For the implementation of the prototype, I'm now considering the
Perl language, as there's a sheer amount of extensions available,
providing support for a variety of protocols and data
presentations.
Speaking IMAP properly is going to be the hardest part of this problem, I
think.
--
Russ Allbery (***@stanford.edu) <http://www.eyrie.org/~eagle/>

Please post questions rather than mailing me directly.
<http://www.eyrie.org/~eagle/faqs/questions.html> explains why.
Julien ÉLIE
2012-01-17 20:27:40 UTC
Hi Russ and Ivan,
Post by Russ Allbery
Post by Ivan Shmakov
In particular, I aim for a better support of presentation of
netnews as (apart from having them accessible via IMAP and NNTP)
both Atom feeds (including Atom Posting Protocol support) and Web
pages (XHTML.) To this end, I believe it's essential to
outrightly discard non-ASCII article headers, /and/ non-ASCII
article bodies of the articles lacking proper MIME headers, as
both are ambiguous as to what character encoding is used.
I highly recommend talking to larsi about what he did with GMANE, since
that's a large part of what you're trying to do here and it seems to work
quite well (and is *very* widely used in some parts of the free software
community).
I think a lot of the backend code is written in Lisp, since, well, it's
larsi, which may make it challenging to reuse depending on your preferred
languages. But the ideas at least are there.
In case it could be of help, an improved version of Newsportal
http://amrhein.eu/newsportal/
with UTF-8 support, and even MIME attachments (but not yEnc support) can
be seen here:
http://iulius.dinauz.org/usenet/webnews-test/thread.php?group=news.software.nntp
http://iulius.dinauz.org/usenet/webnews-test/

Written in PHP.
--
Julien ÉLIE

« It is a virgin forest where the hand of man has never set
foot. »
Ivan Shmakov
2012-01-18 04:00:33 UTC
Post by Russ Allbery
In particular, I aim for a better support of presentation of netnews
as (apart from having them accessible via IMAP and NNTP) both Atom
feeds (including Atom Posting Protocol support) and Web pages
(XHTML.) To this end, I believe it's essential to outrightly
discard non-ASCII article headers, /and/ non-ASCII article bodies of
the articles lacking proper MIME headers, as both are ambiguous as
to what character encoding is used.
I highly recommend talking to larsi about what he did with GMANE,
since that's a large part of what you're trying to do here and it
seems to work quite well (and is *very* widely used in some parts of
the free software community).
Well, I'd rather check the published sources behind Gmane first.

However, it was my understanding that they've chosen to use an
NNTP "server" (namely, INN) as the backend. An approach which
seems to have certain drawbacks.
Post by Russ Allbery
I think a lot of the backend code is written in Lisp, since, well,
it's larsi, which may make it challenging to reuse depending on your
preferred languages. But the ideas at least are there.
ACK, thanks.
Post by Russ Allbery
For the implementation of the prototype, I'm now considering the
Perl language, as there's a sheer amount of extensions available,
providing support for a variety of protocols and data presentations.
Speaking IMAP properly is going to be the hardest part of this
problem, I think.
Perhaps even further complicated by the fact that I have little
experience with IMAP whatsoever. (Apart from keeping Dovecot
running here and there.)
--
FSF associate member #7257
Russ Allbery
2012-01-18 04:04:24 UTC
Post by Ivan Shmakov
Well, I'd rather check the published sources behind Gmane first.
However, it was my understanding that they've chosen to use an
NNTP "server" (namely, INN) as the backend. An approach which
seems to have certain drawbacks.
Only because he never got around to writing his own in Lisp, but he had some
of the pieces and a bunch of ideas. That's part of why it would be good
to talk to him and not only look at the source. But starting with the
source is good, of course.
Post by Ivan Shmakov
Perhaps even further complicated by the fact that I have little
experience with IMAP whatsoever. (Apart from keeping Dovecot
running here and there.)
IMAP is easily many times harder to implement properly than NNTP. All
that extra useful functionality comes at a cost. :)
--
Russ Allbery (***@stanford.edu) <http://www.eyrie.org/~eagle/>

Please post questions rather than mailing me directly.
<http://www.eyrie.org/~eagle/faqs/questions.html> explains why.
Ivan Shmakov
2012-01-18 04:30:20 UTC
Post by Russ Allbery
Post by Ivan Shmakov
Well, I'd rather check the published sources behind Gmane first.
However, it was my understanding that they've chosen to use an NNTP
"server" (namely, INN) as the backend. An approach which seems to
have certain drawbacks.
Only because he never got around to writing his own in Lisp, but he had
some of the pieces and a bunch of ideas. That's part of why it would
be good to talk to him and not only look at the source. But starting
with the source is good, of course.
ACK, thanks. Though I'd note that NNTP /server/ functionality
probably wouldn't be a priority for me. Perhaps I'd focus on
IMAP and Atom (with Atom Posting Protocol) instead.
Post by Russ Allbery
Post by Ivan Shmakov
Perhaps even further complicated by the fact that I have little
experience with IMAP whatsoever. (Apart from keeping Dovecot
running here and there.)
IMAP is easily many times harder to implement properly than NNTP.
All that extra useful functionality comes at a cost. :)
The question is, could the Net::IMAP::Server Perl module [1] (as
available from CPAN) relieve most of the implementor's pain?

[1] http://search.cpan.org/~alexmv/Net-IMAP-Server-1.30/
--
FSF associate member #7257