Discussion:
HTML (XHTML) in netnews
Ivan Shmakov
2012-04-30 05:49:11 UTC
[Cross-posting to news:news.misc, for the discussion is not
quite about the newsreading software.]

[…]
Actually, I tend to advocate /for/ the use of HTML (or, rather, XHTML)
in e-mail and news, even though my user agent of choice has poor
support for it. (And I think of it as an opportunity to improve the
latter.)
At the very least, it allows one to avoid such an ad-hockery as
/slashes/ /for/ /slanted/, and *stars* *for* *bold*. Not to mention
"proper" tables, formatted vs. "preformatted" text distinction, and
inline MathML formulae and SVG graphics.
I have never seen an HTML e-mail that wouldn't have become better by
rewriting it in plain text.
People that use HTML in e-mail tend to be unaware of the underlying
medium of all "them fancy formatting features" and abuse them badly
and without taste. They paste lists and tables into Outlook directly
from MS Word and are happy about it, while the result is outright
horrible. Preformatted text viewed in a monospace font is always
better than that.
As I've said before, there're many different ways to author
“poor” HTML. It doesn't mean that any HTML is /necessarily/
bad.
As you have shown, it is possible to highlight words in plain text.
Yes, but then my Gnus shows /foo/ as highlighted foo, because it
thinks that slashes mean slanted text here, not the Unix
directory separators they in fact are.
Formulas can be expressed like they are in programming languages or
LaTeX source,
I disagree.

If LaTeX code were a preferable form of presentation, I guess
we'd see a lot of scientific journals switching to it from the
now-ubiquitous mathematical notation.

Using * to mean ⋅ or ×, or introducing “new” functions, such as
abs, expt, pow, sqrt, etc., doesn't seem sane, either.

That being said, Unicode-based plain-text allows ❘x❘, x¹²³, and
even √x̅. However, I don't know of any easy way to author such
formulae, while there're a few translators from a subset of
LaTeX to MathML.
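As an illustration of how small such an authoring aid could be, here is a hypothetical converter (the function name `superscript` and the `^digits` input convention are assumptions, not an existing tool) that turns a caret followed by an optionally signed integer into Unicode superscript characters. Letters are deliberately not handled.

```python
import re

# Superscript forms of the digits and signs. Note that 1, 2 and 3 live in
# Latin-1 (U+00B9, U+00B2, U+00B3); the others are in U+2070..U+207B.
SUPERSCRIPTS = str.maketrans({
    "0": "\u2070", "1": "\u00b9", "2": "\u00b2", "3": "\u00b3",
    "4": "\u2074", "5": "\u2075", "6": "\u2076", "7": "\u2077",
    "8": "\u2078", "9": "\u2079", "+": "\u207a", "-": "\u207b",
})

def superscript(text: str) -> str:
    """Turn '^' plus an optionally signed integer into superscripts.

    Anything else after a caret, e.g. '^(sin x)', is left untouched,
    since this sketch only maps digits and the two signs.
    """
    return re.sub(
        r"\^([+-]?[0-9]+)",
        lambda m: m.group(1).translate(SUPERSCRIPTS),
        text,
    )

print(superscript("x^123 + y^-2"))  # x¹²³ + y⁻²
```

An exponent such as e^(sin x) passes through unchanged, so the caret notation remains the fallback.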
and graphics should be converted to ASCII-art,
What for?
linked, or attached, depending on the circumstances.
The graphics, both linked and provided as separate MIME parts,
could be presented in a specific place of an HTML-based message
referencing it. Not so for a message based on plain-text.
Apart from the aesthetic reasons, I like plain text for its utter
simplicity, transparency, and suitability for automated searching and
processing.
I disagree. None of the automated formatters I know can handle
hyphenated text, and none of them could discern “verbatim” text
(such as a source code fragment, or ASCII art) from the
“normal”, formatted text.

Or think of the slanted (or bold) face examples above. What if
I wish to drop all the slanted face markup, while leaving Unix
absolute filenames alone?
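The ambiguity is easy to reproduce with the kind of pattern a newsreader might use for /slanted/ emphasis. The regular expression below is an assumption for illustration, not the one Gnus actually uses: it requires whitespace (or the string edge) around a single slash-delimited word, which correctly skips /usr/bin/ but still cannot tell a directory such as /tmp/ from emphasis.

```python
import re

# Hypothetical emphasis pattern: one word between slashes, with
# whitespace or the start/end of the string on both sides.
SLANTED = re.compile(r"(?<!\S)/([^/\s]+)/(?!\S)")

def slanted_words(text: str) -> list[str]:
    """Return the words the pattern would render as slanted."""
    return SLANTED.findall(text)

def strip_slanted(text: str) -> str:
    """Drop the slashes around every match, keeping the word itself."""
    return SLANTED.sub(r"\1", text)

sample = "Use /only/ the files in /usr/bin/ and /tmp/ here."
print(slanted_words(sample))  # ['only', 'tmp']
print(strip_slanted(sample))  # Use only the files in /usr/bin/ and tmp here.
```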
One doesn't have to use a parser to comfortably read and operate
plain-text files.
I disagree. Think of parsing a formatted table, like:

Column 1                Columns 2, 3
                        Column 2   Column 3

Whatever there is       1          Foo bar
And here's the other    2          Baz

(If that's not hard enough, consider adding various box drawing
characters to the formatting, like it was common for the DOS-era
plain text.)

While it's easy to convert (or format) an HTML table to a
plain-text presentation like the above (think of Lynx or
html2text), it's much harder the other way around.

XML is much more suitable for automated processing. Consider,
e. g., XPath, XQuery, XSLT, xmlstarlet(1), etc.
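To illustrate the asymmetry: going from XHTML to a plain-text table takes only a few lines once the markup is well-formed XML. The sketch below is a toy under stated assumptions (plain `<tr>`/`<th>`/`<td>` with text-only cells, no `colspan`, no XHTML namespace), and it uses Python's standard `xml.etree.ElementTree`, whose `findall` accepts a limited XPath subset, in place of the XPath/XSLT tools mentioned above.

```python
import xml.etree.ElementTree as ET

XHTML_TABLE = """
<table>
  <tr><th>Column 1</th><th>Column 2</th><th>Column 3</th></tr>
  <tr><td>Whatever there is</td><td>1</td><td>Foo bar</td></tr>
  <tr><td>And here's the other</td><td>2</td><td>Baz</td></tr>
</table>
"""

def table_to_text(markup: str) -> str:
    """Render a simple, namespace-free XHTML table as aligned plain text."""
    root = ET.fromstring(markup.strip())
    # ".//tr" is one of the XPath forms ElementTree understands.
    rows = [
        [(cell.text or "").strip() for cell in tr if cell.tag in ("th", "td")]
        for tr in root.findall(".//tr")
    ]
    # Each column is as wide as its longest cell.
    widths = [max(len(row[i]) for row in rows) for i in range(len(rows[0]))]
    lines = ["  ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()
             for row in rows]
    return "\n".join(lines)

print(table_to_text(XHTML_TABLE))
```

The reverse direction has no such shortcut: a parser for the plain-text version would have to guess column boundaries from whitespace.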

[…]
--
FSF associate member #7257
Anton Shepelev
2012-04-30 22:12:02 UTC
I have never seen an HTML e‐mail that wouldn’t
have become better by rewriting it in plain
text.
Sorry, I couldn’t find them. Those are MessageIDs,
right?
People that use HTML in e‐mail tend to be un‐
aware of the underlying medium of all "them fan‐
cy formatting features" and abuse them badly and
without taste. They paste lists and tables into
Outlook directly from MS Word and are happy
about it, while the result is outright horrible.
Preformatted text viewed in a monospace font is
always better than that.
As I’ve said before, there’re many different ways
to author “poor” HTML. It doesn’t mean that any
HTML is /necessarily/ bad.
Agree, but I wanted to show you the correlation and
that plain‐text stops people from abusing HTML.
As you have shown, it is possible to highlight
words in plain text.
Yes, but then my Gnus shows /foo/ as highlighted
foo, because it thinks that slashes mean slanted
text here, not the Unix directory separators they
in fact are.
I didn’t mean that those symbols should be
interpreted on the reader’s side. They are OK as they
are.
Formulas can be expressed like they are in
programming languages or LaTeX source,
I disagree.
If LaTeX code were a preferable form of
presentation, I guess we’d see a lot of scientific
journals switching to it from the now‐ubiquitous
mathematical notation.
But the mediums of e‐mail and Usenet are quite dif‐
ferent from that of printed publications. The lat‐
ter has the following great drawbacks:

a. The need to use heavy and non‐editable for‐
mats, like PDF and PostScript.

b. Great effort and special software required to
produce a well‐typeset document, which on the
reader side necessitates heavy and more
complicated software and results in higher latency
and lesser efficiency of communication.

HTML, as a compromise solution, has the drawbacks of
both typographical and plain‐text mediums. For
example, it loses the simplicity and transparency,
yet its presentation is platform‐dependent, meaning
that the author does not have control over how
his work will look in different programs, while
plain text viewed in a monospace font _always_ looks
the same.
Using * to mean ⋅ or ×, or introducing “new” func‐
tions, such as abs, expt, pow, sqrt, etc., doesn’t
seem sane, either.
That being said, Unicode‐based plain‐text allows
❘x❘, x¹²³, and even √x̅. However, I don’t know of
any easy way to author such formulae, while
there’re a few translators from a subset of LaTeX
to MathML.
Such functions and conventions were introduced a
long time ago, and everybody with basic math and
computer knowledge will understand them. They are
few and easy to learn and remember. Unicode math
symbols allow one to write only very, very short
formulas. Even e^(sin x) is impossible.
and graphics should be converted to ASCII‐art,
What for?
In order to make it transmittable in plain‐text, of
course. Bear in mind that with simple diagrams and
schematics this conversion is very easy and the re‐
sult is quite pleasant.
linked, or attached, depending on the circum‐
stances.
The graphics, both linked and provided as separate
MIME parts, could be presented in a specific place
of an HTML‐based message referencing it. Not so
for a message based on plain‐text.
I was pointing out ways to cope with images in
plain‐text. Of course, in HTML it is more straight‐
forward. One could also create a page on the web
and link it, or typeset a nice PDF and attach it,
but 99% of Usenet or e‐mail messages are so
far from being book‐long that the inclusion of a
couple of links or binary attachments is not a prob‐
lem, and the advantages offered by HTML are not
worth its overhead.

Taking into account that about 1‐5% of messages ref‐
erence a file or a picture, the number of plain‐text
messages that would benefit from multipart/HTML
handling of attachments is a second‐order (O(x^2))
infinitesimal.
Apart from the aesthetic reasons, I like plain
text for its utter simplicity, transparency, and
suitability for automated searching and processing.
I disagree. None of the automated formatters I
know can handle hyphenated text, and none of them
could discern “verbatim” text (such as a source
code fragment, or ASCII art) from the “normal”,
formatted text.
Or think of the slanted (or bold) face examples
above. What if I wish to drop all the slanted
face markup, while leaving Unix absolute filenames
alone?
Yes, I was talking about the simplest case of
ragged‐right and unhyphenated. Several programs go
a step further and treat everything that has an
indent as verbatim. And with reading/searching my
statement holds even for preformatted text.
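The indent-as-verbatim heuristic (reflow unindented paragraphs, pass indented lines through) fits in a few lines of Python with `textwrap`. This is a hypothetical sketch, not the algorithm of any particular newsreader. As noted above, it still breaks on hyphenated words and on verbatim text that happens not to be indented.

```python
import textwrap

def refill(text: str, width: int = 60) -> str:
    """Reflow unindented paragraphs; pass indented lines through verbatim."""
    out: list[str] = []
    para: list[str] = []

    def flush() -> None:
        # Join the collected prose lines and wrap them to the new width.
        if para:
            out.append(textwrap.fill(" ".join(para), width=width))
            para.clear()

    for line in text.splitlines():
        if not line.strip():          # blank line ends a paragraph
            flush()
            out.append("")
        elif line[0] in " \t":        # indented: treat as verbatim
            flush()
            out.append(line)
        else:                         # ordinary prose line
            para.append(line.strip())
    flush()
    return "\n".join(out)

message = """Plain text is simple,
transparent and easy to search.

    for f in /usr/bin/*; do echo "$f"; done

That loop above survives refilling."""
print(refill(message, width=72))
```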

Besides, I think I am the only one who uses hyphen‐
ation in e‐mail and Usenet, and I keep groff source
of important articles. Being a typesetting system,
it has a language much lighter and more comfortable
than HTML, so I can easily strip the bare text from
it or use groff to typeset a message as PDF or HTML,
if need be. groff’s code is also much easier to
read than HTML.
One doesn’t have to use a parser to comfortably
read and operate plain‐text files.
I disagree. Think of parsing a formatted table,
Column 1                Columns 2, 3
                        Column 2   Column 3

Whatever there is       1          Foo bar
And here’s the other    2          Baz
You are right again, but it is only 1‐2% of messages
that must have tables, and even more rare are the
cases when one needs to parse them. If a table is
intended for parsing, it is meet to post it as tab‐
or comma‐separated, either in the body of the
message or as an attachment.

Such complicated tables are rare, even rarer does
one need to automatically parse them. Most of the
time it is a set of key‐value pairs or n‐tuples ‐‐ a
regular linear structure, which is easier parsed
from plain‐text (CSV) than from HTML.
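For the regular, linear case described here, the standard library already does the parsing. A minimal sketch, assuming the table is posted as comma-separated values in the message body (the sample data is made up):

```python
import csv
import io

body = """name,count,comment
Whatever there is,1,Foo bar
And here's the other,2,Baz
"""

# DictReader takes the field names from the first row.
rows = list(csv.DictReader(io.StringIO(body)))
for row in rows:
    print(row["name"], "->", int(row["count"]))
```

For tab-separated data, passing `delimiter="\t"` to `csv.DictReader` is the only change needed.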
(If that’s not hard enough, consider adding vari‐
ous box drawing characters to the formatting, like
it was common for the DOS‐era plain text.)
While it’s easy to convert (or format) an HTML ta‐
ble to a plain‐text presentation like the above
(think of Lynx or html2text), it’s much harder the
other way around.
XML is much more suitable for automated process‐
ing. Consider, e. g., XPath, XQuery, XSLT, xml‐
starlet(1), etc.
Human‐oriented and machine‐oriented content should
be formatted differently. In my own experience,
most of the time I posted tables for viewing, so I
used groff’s tbl to typeset them.

As for the XML tools you mentioned, I am afraid ev‐
ery HTML message will require a special script to
process it, because HTML is not structured in the
sense that instead of defining the function of an
element (like, it is a list, it is a header, and so
on), it defines its appearance, and the former is
not easily inferred from the latter! On the other
hand, *roff and LaTeX use structured approach.

Anton

P.S.: Several times I have appealed to the infre‐
quency of various situations you have men‐
tioned. I find it a valid argument because
changing the whole medium in order to satisfy
several percent of posters at the expense of
imparting overload unto the heads of the
remaining 99 percent would be wrong.

‐‐
() ascii ribbon campaign ‐ against html e‐mail
/\ www.asciiribbon.org ‐ against proprietary attachments
tlvp
2012-04-30 23:41:46 UTC
Post by Anton Shepelev
Even e^(sin x) is impossible.
Really? Well, congratulations, then: you just did the impossible :-) !

Cheers, -- tlvp
--
Avant de repondre, jeter la poubelle, SVP.
Anton Shepelev
2012-05-01 19:53:06 UTC
Post by Anton Shepelev
Even e^(sin x) is impossible.
Really? Well, congratulations, then: you just did
the impossible :‐) !
Heh, that’s the plain‐text ASCII notation that I
think is the best for Usenet and e‐mail and that
Ivan Shmakov doesn’t like, preferring either Unicode
symbols or a full‐fledged equation description lan‐
guage interpreted by newsreaders...

‐‐
() ascii ribbon campaign ‐ against html e‐mail
/\ www.asciiribbon.org ‐ against proprietary attachments
tlvp
2012-05-01 23:39:56 UTC
Post by Anton Shepelev
Post by Anton Shepelev
Even e^(sin x) is impossible.
Really? Well, congratulations, then: you just did
the impossible :‐) !
Heh, that’s the plain‐text ASCII notation that I
think is the best for Usenet and e‐mail and that
Ivan Shmakov doesn’t like, preferring either Unicode
symbols or a full‐fledged equation description lan‐
guage interpreted by newsreaders...
I guess if you want NG participants to *see* it the way it might look in a
book, you just have to make a .png or an .svg or a .gif or a .pdf out of it
and *attach* it to your post (and hope that the NG accepts attachments).

But, like you, I'd rather do only what *can* easily be done :-) .

Cheers, -- tlvp
--
Avant de repondre, jeter la poubelle, SVP.
Gene E. Bloch
2012-05-02 00:07:13 UTC
Post by tlvp
Post by Anton Shepelev
Post by Anton Shepelev
Even e^(sin x) is impossible.
Really? Well, congratulations, then: you just did
the impossible :‐) !
Heh, that’s the plain‐text ASCII notation that I
think is the best for Usenet and e‐mail and that
Ivan Shmakov doesn’t like, preferring either Unicode
symbols or a full‐fledged equation description lan‐
guage interpreted by newsreaders...
I guess if you want NG participants to *see* it the way it might look in a
book, you just have to make a .png or an .svg or a .gif or a .pdf out of it
and *attach* it to your post (and hope that the NG accepts attachments).
But, like you, I'd rather do only what *can* easily be done :-) .
Cheers, -- tlvp
Another use for YouSendIt and similar sites that allow a user to upload
a file and give other people access to the file.

This is not needed if you have a site of your own you can use for the
purpose.

You can upload the item and put a URL in your post.
--
Gene E. Bloch (Stumbling Bloch)
Ivan Shmakov
2012-05-02 05:20:00 UTC
[Cross-posting to news:comp.mail.misc, for the question is about
MIME, and dropping news:news.misc from Followup-To:.]

[…]
Post by Gene E. Bloch
Post by tlvp
I guess if you want NG participants to *see* it the way it might
look in a book, you just have to make a .png or an .svg or a .gif or
a .pdf out of it and *attach* it to your post (and hope that the NG
accepts attachments).
But, like you, I'd rather do only what *can* easily be done :-) .
Another use for YouSendIt and similar sites that allow a user to
upload a file and give other people access to the file.
This is not needed if you have a site of your own you can use for the
purpose.
You can upload the item and put a URL in your post.
Somehow, I've had the impression that MIME allows for “external”
parts. I wonder if I could wrap a URI into such a part, so
that the user agent on the recipient's end would download such a
part as soon as the user chooses to operate on it?
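MIME does define such a mechanism: the `message/external-body` media type (RFC 2046) carries an `access-type` parameter, and RFC 2017 adds `access-type=URL`. Support for it in user agents varies. The sketch below only builds and re-parses such a part with Python's `email` package; the URL and Content-ID are placeholders, and the message is assembled by hand rather than through a dedicated helper.

```python
from email import message_from_string

# Placeholder location of the file being referenced.
URL = "http://example.org/figure.svg"

# The outer headers say "the data lives elsewhere"; the inner ("phantom")
# headers describe the data that would be retrieved from that URL.
raw = (
    "MIME-Version: 1.0\n"
    f'Content-Type: message/external-body; access-type=URL; URL="{URL}"\n'
    "\n"
    "Content-Type: image/svg+xml\n"
    "Content-ID: <figure-1@example.org>\n"
    "\n"
)

msg = message_from_string(raw)
print(msg.get_content_type())        # message/external-body
print(msg.get_param("access-type"))  # URL
print(msg.get_param("URL"))          # http://example.org/figure.svg
```

In a real article this part would sit inside a multipart container next to the text body.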
--
FSF associate member #7257
Ivan Shmakov
2012-05-02 05:21:17 UTC
[…]
Post by tlvp
I guess if you want NG participants to *see* it the way it might
look in a book, you just have to make a .png or an .svg or a .gif or
a .pdf out of it and *attach* it to your post (and hope that the NG
accepts attachments).
That'd be NTA, not the group, BTW.
Post by tlvp
But, like you, I'd rather do only what *can* easily be done :-) .
Given the right tool, why won't it be easy, anyway?

I guess one may think of Google Drive and Google+. (Though I've
never used either of them myself.)
--
FSF associate member #7257
tlvp
2012-05-03 03:23:46 UTC
Post by Ivan Shmakov
I guess one may think of Google Drive and Google+. (Though I've
never used either of them myself.)
Don't ask me why, but I'd prefer not to :-) . Cheers, -- tlvp
--
Avant de repondre, jeter la poubelle, SVP.
Ivan Shmakov
2012-05-03 08:00:59 UTC
Post by tlvp
Post by Ivan Shmakov
Post by Ivan Shmakov
Post by tlvp
I guess if you want NG participants to *see* it the way it might
look in a book, you just have to make a .png or an .svg or a .gif
or a .pdf out of it and *attach* it to your post (and hope that
the NG accepts attachments).
That'd be NTA, not the group, BTW.
I guess one may think of Google Drive and Google+. (Though I've
never used either of them myself.)
Don't ask me why, but I'd prefer not to :-)
Neither would I.

However, that wasn't quite the point. The current state of
affairs is that I prefer the text formatted like:

--cut--
In computing, plain text is the contents of an ordinary
sequential file readable as textual material without much
processing, usually opposed to formatted text and to "binary
files" in which some portions must be interpreted as binary
objects (encoded integers, real numbers, images, etc.).
--cut--

Naturally, I format my messages just like that.

Then, he prefers the text formatted like:

--cut--
The encoding has traditionally been either ASCII,
one of its many derivatives such as ISO/IEC 646
etc., or sometimes EBCDIC. Unicode-based encodings
such as UTF-8 and UTF-16 are gradually replacing the
older ASCII derivatives limited to 7 or 8 bit codes.
--cut--

Naturally, he formats his messages just like that.

Each of us, acting in good faith, produces a result that the
other one finds mildly unpleasant to read.

Certain newsreaders allow one to reformat the article being
displayed (Gnus has M-x gnus-article-fill-cited-article, AKA
W w, for instance.) However, doing that would likely destroy
any "pre-formatted" parts of the message (such as: code
fragments, tables, ASCII art, etc.) How nice would it be if it
were possible for a newsreader to somehow discern formatted text
from pre-formatted text!

And HTML allows just that.

With HTML (which, I may note, requires more changes to /my/
newsreader setup than to his), we'd be able to see the messages
formatted just as we like: he'd see a message of mine formatted
as he prefers, while I'd see his formatted as I prefer.
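Once formatted text (`<p>`) and pre-formatted text (`<pre>`) are marked up differently, each reader can pick a wrap width without harming the verbatim parts. A minimal sketch, assuming a namespace-free XHTML fragment that uses only those two elements, with Python's standard `xml.etree.ElementTree` and `textwrap`; the function name and widths are hypothetical:

```python
import textwrap
import xml.etree.ElementTree as ET

ARTICLE = """<div>
<p>In computing, plain text is the contents of an ordinary
sequential file readable as textual material.</p>
<pre>  +-----+
  | box |
  +-----+</pre>
</div>"""

def render(markup: str, width: int) -> str:
    """Wrap <p> text to the reader's width; copy <pre> text unchanged."""
    blocks = []
    for elem in ET.fromstring(markup):
        text = elem.text or ""
        if elem.tag == "p":
            # Collapse the author's line breaks, then rewrap.
            blocks.append(textwrap.fill(" ".join(text.split()), width=width))
        elif elem.tag == "pre":
            blocks.append(text)
    return "\n\n".join(blocks)

print(render(ARTICLE, width=40))  # one reader's preference
print(render(ARTICLE, width=72))  # another reader's preference
```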

Without HTML, each of us is forced to read text formatted to the
other's preferences.
--
FSF associate member #7257
Ted S.
2012-05-02 01:33:08 UTC
Post by Anton Shepelev
Heh, that’s the plain‐text ASCII notation that I
think is the best for Usenet
And yet you're needlessly using a Unicode hyphen in "plain-text".
--
Ted S.
fedya at hughes dot net
Now blogging at http://justacineast.blogspot.com
tlvp
2012-05-02 02:32:30 UTC
Post by Ted S.
Post by Anton Shepelev
Heh, that’s the plain‐text ASCII notation that I
think is the best for Usenet
And yet you're needlessly using a Unicode hyphen in "plain-text".
Good grief, Ted, you're right: " ... that’s the plain‐text ... " -- it
even has a fancy Unicode apostrophe before that (!). Thank goodness for
"Raw message" view in Dialog, I'd never have noticed otherwise :-) .
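Without a raw-message view, such characters can also be spotted by listing every non-ASCII character in a text along with its Unicode name. A minimal sketch using Python's `unicodedata`; the function name is hypothetical and the sample string is the quoted line above:

```python
import unicodedata

def non_ascii(text: str) -> list[tuple[str, str]]:
    """Return (code point, Unicode name) for each distinct non-ASCII char."""
    seen = []
    for ch in text:
        if ord(ch) > 0x7F:
            item = (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
            if item not in seen:
                seen.append(item)
    return seen

for code, name in non_ascii("that\u2019s the plain\u2010text ASCII notation"):
    print(code, name)
# U+2019 RIGHT SINGLE QUOTATION MARK
# U+2010 HYPHEN
```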

Cheers, -- tlvp
--
Avant de repondre, jeter la poubelle, SVP.
Ted S.
2012-05-02 11:28:33 UTC
Post by tlvp
Post by Ted S.
And yet you're needlessly using a Unicode hyphen in "plain-text".
Good grief, Ted, you're right: " ... that’s the plain‐text ... "
-- it even has a fancy Unicode apostrophe before that (!). Thank
goodness for "Raw message" view in Dialog, I'd never have noticed
otherwise :-) .
I notice it because that part of Unicode triggers a different font from
Dialog on my system. It's also the reason I hate hate hate Relf's posts
and the responses: those trigger the Unicode font *in the headerlist
pane*, and that really slows down Dialog.
--
Ted S.
fedya at hughes dot net
Now blogging at http://justacineast.blogspot.com
tlvp
2012-05-03 03:27:52 UTC
Post by Ted S.
Post by tlvp
Post by Ted S.
And yet you're needlessly using a Unicode hyphen in "plain-text".
Good grief, Ted, you're right: " ... that’s the plain‐text ... "
-- it even has a fancy Unicode apostrophe before that (!). Thank
goodness for "Raw message" view in Dialog, I'd never have noticed
otherwise :-) .
I notice it because that part of Unicode triggers a different font from
Dialog on my system. It's also the reason I hate hate hate Relf's posts
and the responses: those trigger the Unicode font *in the headerlist
pane*, and that really slows down Dialog.
Is it *that* that makes Dialog act at times as if it's dragging a sled
through molasses? I'll have to ... yup, seems to be! Good grief!
Nice thing then that Dialog lets one just scratch unwanted posts out :-) .

Cheers, -- tlvp
--
Avant de repondre, jeter la poubelle, SVP.
Ted S.
2012-05-03 11:59:25 UTC
Post by tlvp
Is it *that* that makes Dialog act at times as if it's dragging a sled
through molasses? I'll have to ... yup, seems to be! Good grief! Nice
thing then that Dialog lets one just scratch unwanted posts out :-) .
Well, as you can tell by my headers, I'm using Hamster to get rid of
that stuff.

On the other hand, it's always nice to know it's not just my out-of-date
computer with not enough memory that has the problem.
--
Ted S.
fedya at hughes dot net
Now blogging at http://justacineast.blogspot.com
Anton Shepelev
2012-05-02 08:06:31 UTC
And yet you're needlessly using a Unicode hyphen
in "plain-text".
Yes, indeed. I switched my groff and tin configura-
tion to Unicode to test tin's handling of RFC 2047.
By default, groff's utf8 device maps ASCII hyphen
and apostrophe to their specific Unicode
representations, and I forgot to turn it off. It
only requires commenting out two lines in

.../share/groff/<version>/tmac/unicode.tmac

Anton
Shmuel (Seymour J.) Metz
2012-05-02 12:41:00 UTC
In <***@g{oogle}mail.com>, on
05/02/2012
Post by Anton Shepelev
By default, groff's utf8 device maps ASCII hyphen
and apostrophe to their specific Unicode representations,
No; the specific Unicode representation of ASCII '2D'x is '002D'x and
of '27'x is '0027'x. Translating them to, e.g., Left Single Quote,
Right Single Quote, Em Dash, En Dash, Soft Hyphen involves guesses
that will often be wrong.
--
Shmuel (Seymour J.) Metz, SysProg and JOAT <http://patriot.net/~shmuel>

Unsolicited bulk E-mail subject to legal action. I reserve the
right to publicly post or ridicule any abusive E-mail. Reply to
domain Patriot dot net user shmuel+news to contact me. Do not
reply to ***@library.lspace.org
Anton Shepelev
2012-05-03 15:04:32 UTC
Post by Shmuel (Seymour J.) Metz
By default, groff's utf8 device maps ASCII hyphen and
apostrophe to their specific Unicode representations,
No; the specific Unicode representation of ASCII '2D'x is
'002D'x and of '27'x is '0027'x. Translating them to,
e.g., Left Single Quote, Right Single Quote, Em Dash, En
Dash, Soft Hyphen involves guesses that will often be
wrong.
It is true in the sense that Unicode is backwards compatible
with ASCII, but in typography a hyphen and a minus are
different symbols, just as ASCII 27 (hex) is not a real
apostrophe. In order to make the usage of ASCII source
files more convenient, groff defaults to mapping ASCII 27 to
the apostrophe and ASCII 2D to hyphen, because the latter is
generally more frequently accessed than the minus. The
minus is accessed as \(mi. It is just a matter of
optimization: make frequently used elements shorter in order
to decrease the average word length.
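A toy model of the input convention described here may make the disagreement concrete. It is not groff: it is a hypothetical Python stand-in that maps ASCII `-` to U+2010 HYPHEN, ASCII `'` to U+2019 RIGHT SINGLE QUOTATION MARK, and the escape `\(mi` to U+2212 MINUS SIGN, as described above.

```python
# Toy model of the convention described above; real groff does far more.
ASCII_MAP = str.maketrans({"-": "\u2010", "'": "\u2019"})

def to_utf8(source: str) -> str:
    """Map ASCII '-' and "'" and expand the \\(mi escape to MINUS SIGN."""
    # Expand the escape first; it contains neither mapped character.
    return source.replace("\\(mi", "\u2212").translate(ASCII_MAP)

print(to_utf8("a well-known author's \\(mi5 degrees"))
print(to_utf8("x - 1"))  # x ‐ 1 (U+2010 HYPHEN, not a minus sign)
```

Text written with the convention in mind comes out right. ASCII text that uses `-` as a minus sign, such as a quoted formula, silently gets a hyphen instead.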
--
() ascii ribbon campaign - against html e-mail
/\ www.asciiribbon.org - against proprietary attachments
Shmuel (Seymour J.) Metz
2012-05-03 17:03:19 UTC
In <***@g{oogle}mail.com>, on
05/03/2012
It is true in the sense that Unicode is backwards compatible with
ASCII, but in typography a hyphen and a minus are different
symbols,
That's understood, but the issue is correct rendering of ASCII text.
In order to make the usage of ASCII source files more convenient,
It stops being convenient the first time it guesses wrong. It may be
convenient for text written to be processed using that convention, but
it's definitely not convenient for text that uses the ASCII - as minus
signs or that uses the ASCII ' as both left and right apostrophes.
--
Shmuel (Seymour J.) Metz, SysProg and JOAT <http://patriot.net/~shmuel>

Unsolicited bulk E-mail subject to legal action. I reserve the
right to publicly post or ridicule any abusive E-mail. Reply to
domain Patriot dot net user shmuel+news to contact me. Do not
reply to ***@library.lspace.org
Anton Shepelev
2012-05-04 09:10:53 UTC
Post by Shmuel (Seymour J.) Metz
Post by Anton Shepelev
It is true in the sense that Unicode is backwards
compatible with ASCII, but in typography a hyphen and a
minus are different symbols,
That's understood, but the issue is correct rendering of
ASCII text.
That has never been a problem, but conveniently accessing
more than 127 symbols from an ASCII source file has...

If you want correct ASCII, either use the ascii output
device, as opposed to utf8, or don't map ['] to single right
quote and [-] to hyphen. Unicode source is another option.
Post by Shmuel (Seymour J.) Metz
Post by Anton Shepelev
In order to make the usage of ASCII source files more
convenient,
It stops being convenient the first time it guesses wrong.
It may be convenient for text written to be processed
using that convention, but it's definitely not convenient
for text that uses the ASCII - as minus signs or that uses
the ASCII ' as both left and right apostrophes.
There's no guessing -- it's up to the author to choose how
he will access non-ASCII symbols. When quoting code in
paper-oriented documents, I use an environment without these
mappings, like:

Normal text with correct apostrophes, hyphens and minuses
.(VERBATIM
My code...
.)VERBATIM
Back to normal text.

You can map ASCII [-] to minus if you want, and even make it
for only specific environments.
--
() ascii ribbon campaign - against html e-mail
/\ www.asciiribbon.org - against proprietary attachments
Ivan Shmakov
2012-05-04 09:31:45 UTC
[...]
In order to make the usage of ASCII source files more convenient,
It stops being convenient the first time it guesses wrong. It may
be convenient for text written to be processed using that
convention, but it's definitely not convenient for text that uses
the ASCII - as minus signs or that uses the ASCII ' as both left and
right apostrophes.
There's no guessing -- it's up to the author to choose how he will
access non-ASCII symbols. When quoting code in paper-oriented
Normal text with correct apostrophes, hyphens and minuses
.(VERBATIM
My code...
.)VERBATIM
Back to normal text.
Then, a similar environment should be used for quoting non-code
just as well, as it's unlikely that the author of the quote was
expecting such a substitution performed on his or her text.
You can map ASCII [-] to minus if you want, and even make it for only
specific environments.
On second thought, there're reasons /not/ to use this feature.
--
FSF associate member #7257
Anton Shepelev
2012-05-04 11:23:22 UTC
Post by Ivan Shmakov
Post by Anton Shepelev
Normal text with correct apostrophes, hyphens and minuses
.(VERBATIM
My code...
.)VERBATIM
Back to normal text.
Then, a similar environment should be used for quoting
non-code just as well, as it's unlikely that the author of
the quote was expecting such a substitution performed on
his or her text.
In e-mail and Usenet -- yes. In typography -- no.
Post by Ivan Shmakov
Post by Anton Shepelev
You can map ASCII [-] to minus if you want, and even
make it for only specific environments.
On second thought, there're reasons /not/ to use this
feature.
OK, so in groff you'll say "5\(mi10-year-old".
--
() ascii ribbon campaign - against html e-mail
/\ www.asciiribbon.org - against proprietary attachments
Ivan Shmakov
2012-05-04 11:59:16 UTC
[Cross-posting to news:comp.text, for there's too much *roff.]
Post by Anton Shepelev
Post by Anton Shepelev
Normal text with correct apostrophes, hyphens and minuses
.(VERBATIM
My code...
.)VERBATIM
Back to normal text.
Then, a similar environment should be used for quoting non-code just
as well, as it's unlikely that the author of the quote was expecting
such a substitution performed on his or her text.
In e-mail and Usenet -- yes. In typography -- no.
The distinction should be /not/ typography vs. e-mail, but “I do
care to proofread all the quoted text for possible wrong
substitutions” vs. “I don't.”

(Occasionally, I'd replace typewriter-like "double quotes" with
“proper ones” even in e-mail, for instance.)
Post by Anton Shepelev
Post by Anton Shepelev
You can map ASCII [-] to minus if you want, and even make it for
only specific environments.
On second thought, there're reasons /not/ to use this feature.
OK, so in groff you'll say "5\(mi10-year-old".
I'd rather say 5\(en10-year-old for a proper EN DASH (U+2013),
which is /not/ the same as the MINUS SIGN (U+2212.)
--
FSF associate member #7257
Anton Shepelev
2012-05-04 13:26:54 UTC
Post by Ivan Shmakov
Post by Anton Shepelev
In e-mail and Usenet -- yes. In typography -- no.
The distinction should be /not/ typography vs. e-mail, but
"I do care to proofread all the quoted text for possible
wrong substitutions" vs. "I don't."
(Occasionally, I'd replace typewriter-like "double quotes"
with "proper ones" even in e-mail, for instance.)
Not so with me, because in electronic communication I _do_
prefer to use ASCII for everything except the characters of
a foreign language.
Post by Ivan Shmakov
Post by Anton Shepelev
OK, so in groff you'll say "5\(mi10-year-old".
I'd rather say 5\(en10-year-old for a proper EN DASH
(U+2013), which is /not/ the same as the MINUS SIGN
(U+2212.)
Yes!
--
() ascii ribbon campaign - against html e-mail
/\ www.asciiribbon.org - against proprietary attachments
Ivan Shmakov
2012-05-04 13:59:36 UTC
Post by Anton Shepelev
Post by Anton Shepelev
In e-mail and Usenet -- yes. In typography -- no.
The distinction should be /not/ typography vs. e-mail, but "I do
care to proofread all the quoted text for possible wrong
substitutions" vs. "I don't."
(Occasionally, I'd replace typewriter-like "double quotes" with
"proper ones" even in e-mail, for instance.)
Not so with me,
I see. You do it the other way around.
Post by Anton Shepelev
because in electronic communication I _do_ prefer to use ASCII for
everything except the characters of a foreign language.
The only problem I see with using full Unicode these days is
that it has plenty of characters which aren't that easy to
discern from one another, /especially/ when using a fixed-width
font. That's why I'd prefer to spell EN DASH as &ndash; (or
\[en], etc.), and it's also why I think that "ASCII to Unicode"
substitutions are flawed (be it *roff's - to HYPHEN, LaTeX's ---
to EM DASH, or the "smart quotes" feature, as implemented in the
contemporary office productivity suites.)
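Spelling look-alike characters by name, as with &ndash; above, can be automated. A minimal sketch using the HTML 4 entity table that ships with Python (`html.entities.codepoint2name`). Characters with no HTML 4 entity name, such as U+2010 HYPHEN, fall back to a numeric reference. The function name is hypothetical, and ASCII (including a literal `&`) is passed through as-is.

```python
from html.entities import codepoint2name

def spell_out(text: str) -> str:
    """Replace every non-ASCII character with a named or numeric entity."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)                        # ASCII: leave alone
        elif cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")  # e.g. &ndash;
        else:
            out.append(f"&#x{cp:X};")             # no HTML 4 name
    return "".join(out)

print(spell_out("5\u201310 vs \u22125, well\u2010known"))
# 5&ndash;10 vs &minus;5, well&#x2010;known
```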

I make an exception for national language characters that look
similar to the ASCII ones, mainly because they tend to occur in
sequences, but also because one of my preferred fixed-width
fonts, koi8b-8x16, actually has /distinct/ shapes for the
similar looking Latin and Cyrillic glyphs.

However, this whole issue of Unicode characters'
indistinguishability would be solved with the adoption of HTML
for e-mail and netnews. And for me, it's one more reason to
support it.

[...]
--
FSF associate member #7257
Shmuel (Seymour J.) Metz
2012-05-04 10:27:49 UTC
In <***@g{oogle}mail.com>, on
05/04/2012
Post by Anton Shepelev
There's no guessing
What happens when you quote ASCII text? Remember that the context is
using groff to compose news articles.
--
Shmuel (Seymour J.) Metz, SysProg and JOAT <http://patriot.net/~shmuel>

Unsolicited bulk E-mail subject to legal action. I reserve the
right to publicly post or ridicule any abusive E-mail. Reply to
domain Patriot dot net user shmuel+news to contact me. Do not
reply to ***@library.lspace.org
Anton Shepelev
2012-05-04 13:29:24 UTC
Post by Shmuel (Seymour J.) Metz
Post by Anton Shepelev
There's no guessing
What happens when you quote ASCII text? Remember that the
context is using groff to compose news articles.
It depends on one's groff setup. With my current one, no
character translations occur, but the text is reflowed.
--
() ascii ribbon campaign - against html e-mail
/\ www.asciiribbon.org - against proprietary attachments
Ivan Shmakov
2012-05-05 15:18:54 UTC
[Cross-posting to news:comp.text and dropping news:news.misc and
news:news.software.readers from Followup-To:, for the question
has little or no relation to the netnews technology whatsoever.]

[...]
When quoting code in paper-oriented documents, I use an environment
Normal text with correct apostrophes, hyphens and minuses
.(VERBATIM
My code...
.)VERBATIM
Back to normal text.
What macro package defines this one, BTW? All that I've found
so far are the -mm's (as bundled with GNU Troff 1.20.1) .VERBON
and .VERBOFF macros.

PS. Is there a macro similar to LaTeX's \today for GNU Troff? Or,
actually, I'd be more interested in an ISO 8601-based version,
akin to the following \todayiso one for LaTeX.

\newcommand \twodigits [1] {% zero-pad numbers below 10
\ifnum #1<10 0\fi
\number #1}

\newcommand \todayiso {% today's date as YYYY-MM-DD
\number \year-\twodigits \month-\twodigits \day}

[...]
--
FSF associate member #7257
Adam H. Kerman
2012-05-05 16:09:59 UTC
Post by Ivan Shmakov
[Cross-posting to news:comp.text and dropping news:news.misc and
news:news.software.readers from Followup-To:, for the question
has little or no relation to the netnews technology whatsoever.]
No, I'm not crossposting to comp.text.

Allow me to spell out the criticism of this little stunt. Followup-To
is an instruction to a different author posting a followup, not you.
Adding this header doesn't make your article on topic in every newsgroup
you crossposed it to. If you actually believe that your article is
off topic in some of the groups you're crossposting it in, then cut
those groups from the crosspost.

It's actually kind of shitty to drop all of the original groups from
the crosspost to direct a conversation into a group that hadn't
participated in it at all. I'd even call this trollish behavior.
Ivan Shmakov
2012-05-05 16:24:25 UTC
Post by Adam H. Kerman
Post by Ivan Shmakov
[Cross-posting to news:comp.text and dropping news:news.misc and
news:news.software.readers from Followup-To:, for the question has
little or no relation to the netnews technology whatsoever.]
No, I'm not crossposting to comp.text.
Allow me to spell out the criticism of this little stunt.
Followup-To is an instruction to a different author posting a
followup, not you.
May I note that it's a suggestion, and not an order?
Post by Adam H. Kerman
Adding this header doesn't make your article on topic in every
newsgroup you crossposted it to. If you actually believe that your
article is off topic in some of the groups you're crossposting it in,
then cut those groups from the crosspost.
It's actually kind of shitty to drop all of the original groups from
the crosspost to direct a conversation into a group that hadn't
participated in it at all. I'd even call this trollish behavior.
I'm pretty sure that such a vile behavior should've been
mentioned in a netiquette guide of some sort.

Care to share a link?
--
FSF associate member #7257
Adam H. Kerman
2012-05-05 19:33:30 UTC
Post by Ivan Shmakov
Post by Adam H. Kerman
Post by Ivan Shmakov
[Cross-posting to news:comp.text and dropping news:news.misc and
news:news.software.readers from Followup-To:, for the question has
little or no relation to the netnews technology whatsoever.]
No, I'm not crossposting to comp.text.
Allow me to spell out the criticism of this little stunt.
Followup-To is an instruction to a different author posting a
followup, not you.
May I note that it's a suggestion, and not an order?
Post by Adam H. Kerman
Adding this header doesn't make your article on topic in every
newsgroup you crossposted it to. If you actually believe that your
article is off topic in some of the groups you're crossposting it in,
then cut those groups from the crosspost.
It's actually kind of shitty to drop all of the original groups from
the crosspost to direct a conversation into a group that hadn't
participated in it at all. I'd even call this trollish behavior.
I'm pretty sure that such a vile behavior should've been
mentioned in a netiquette guide of some sort.
Care to share a link?
Sure, just as soon as I figure out who said what in the quoted text since
someone is posting with proprietary left-hand margins and attribution
lines that don't match quoting levels.

I'm sure I read "Don't make it annoying to post followups to your own
articles" somewhere, too. While I'm looking, perhaps you could post the
netiquette guide that recommends your posting style.

Ivan Shmakov
2012-05-02 05:14:03 UTC
Post by Anton Shepelev
Even e^(sin x) is impossible.
Really? Well, congratulations, then: you just did the impossible
:‐) !
Heh, that’s the plain‐text ASCII notation that I think is the best
for Usenet and e‐mail and that Ivan Shmakov doesn’t like, preferring
either Unicode symbols or a full‐fledged equation description
language interpreted by newsreaders...
Newsreaders need to understand a mathematical notation language
as much as they need to understand Ext2.

(IOW, they can delegate this task to the underlying OS.)
--
FSF associate member #7257
Ivan Shmakov
2012-05-02 06:32:22 UTC
[Cross-posting to news:comp.text, for there isn't a separate
newsgroup for *roff.]
Post by Anton Shepelev
I have never seen an HTML e‐mail that wouldn’t have become better
by rewriting it in plain text.
What’s
BTW, there I had the proper APOSTROPHE (U+0027) character, but
it was changed somehow to RIGHT SINGLE QUOTATION MARK (U+2019),
which is a change in the meaning.

Do I understand it correctly that the formatter's configuration
you use doesn't distinguish between these two?
[…]
Post by Anton Shepelev
Would LaTeX code be a preferable form of presentation, I guess we’d
see a lot of scientific journals switching to it from the
now‐ubiquitous
Also, there HYPHEN-MINUS (U+002D) was replaced by
HYPHEN (U+2010), so I guess that the formatter doesn't
distinguish these two, either. (Note that while in this
particular case this change doesn't affect the meaning, such a
change done for, e. g., a code fragment, would be destructive.)
Post by Anton Shepelev
mathematic notation.
[…]
Post by Anton Shepelev
‐‐
There, the signature delimiter was changed from the customary
“-- ”, or (U+002D, U+002D, U+0020), to “‐‐” (or U+2010, U+2010),
which, I believe, most of the newsreaders currently in use won't
recognize as such a delimiter.

“-- ” is an element of “news markup language”, after all.
Changing it to “‐‐” has roughly the same effect as changing <pre />
to 〈pre /〉 in HTML.
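To make the point concrete, here is a minimal sketch of how a newsreader might recognize the signature delimiter, assuming the usual convention (described, e.g., in RFC 3676, section 4.3) that the delimiter line is exactly “-- ”. The `strict` flag is a hypothetical knob modelling readers that, like Gnus, also tolerate a bare “--”; the `describe` helper uses Python's standard `unicodedata` module to show why U+2010 HYPHEN never matches.

```python
import unicodedata


def is_sig_delimiter(line: str, strict: bool = True) -> bool:
    """Return True if `line` is a Usenet signature delimiter.

    strict=True accepts only "-- " (two HYPHEN-MINUS and a SPACE);
    strict=False also accepts a bare "--", as some readers do.
    """
    line = line.rstrip("\r\n")  # ignore the line terminator
    if strict:
        return line == "-- "
    return line in ("-- ", "--")


def describe(s: str) -> list[str]:
    """Name every code point, e.g. to spot U+2010 posing as U+002D."""
    return [f"U+{ord(c):04X} {unicodedata.name(c, '?')}" for c in s]


for candidate in ["-- ", "--", "\u2010\u2010"]:
    print(repr(candidate),
          is_sig_delimiter(candidate),
          is_sig_delimiter(candidate, strict=False),
          describe(candidate))
```

Even the lenient mode rejects “‐‐”, since no reader can be expected to treat two HYPHEN characters as HYPHEN-MINUS.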
Post by Anton Shepelev
() ascii ribbon campaign ‐ against html e‐mail
/\ www.asciiribbon.org ‐ against proprietary attachments
(Ironically, neither U+2010 nor U+2019 is ASCII.)
--
FSF associate member #7257
Anton Shepelev
2012-05-02 08:32:19 UTC
BTW, there I had the proper APOSTROPHE (U+0027)
character, but it was changed somehow to RIGHT
SINGLE QUOTATION MARK (U+2019), which is a change
in the meaning.
Do I understand it correctly that the formatter's
configuration you use doesn't distinguish between
these two?
It distinguishes the two symbols, but behaves
according to groff_char(7), mapping the ASCII
apostrophe to the right single quote symbol. This
can be changed by commenting out the relevant line
in the unicode.tmac file in groff's tmac directory:

change: .char ' \[cq]
to: .\".char ' \[cq]

--
() ascii ribbon campaign - against html e-mail
/\ www.asciiribbon.org - against proprietary attachments
Anton Shepelev
2012-05-02 14:51:26 UTC
BTW, there I had the proper APOSTROPHE (U+0027)
character, but it was changed somehow to RIGHT
SINGLE QUOTATION MARK (U+2019), which is a change
in the meaning.
Do I understand it correctly that the formatter's
configuration you use doesn't distinguish between
these two?
Looks like the real true apostrophe is the same
glyph as the right single quotation mark:

http://en.wikipedia.org/wiki/Apostrophe
Also, there HYPHEN-MINUS (U+002D) was replaced by
HYPHEN (U+2010), so I guess that the formatter
doesn't distinguish these two, either. (Note that
while in this particular case this change doesn't
affect the meaning, such a change done for, e. g.,
a code fragment, would be destructive.)
The HYPHEN-MINUS character is used to denote, well,
the minus and number intervals, because its width is
equal to that of a digit glyph, while the shorter
HYPHEN is used in running text, as in "self-evident",
but I am writing in ASCII right now, so you'll see
no difference in my post.

In groff, when preparing output for the utf8 device,
the minus is accessed as either \(mi in ASCII
sources or by using this symbol directly in Unicode
sources; and I didn't take it into account when
switching from ASCII to Unicode.

Anton

--
() ascii ribbon campaign - against html e-mail
/\ www.asciiribbon.org - against proprietary attachments
Ivan Shmakov
2012-05-02 16:47:01 UTC
BTW, there I had the proper APOSTROPHE (U+0027) character, but it
was changed somehow to RIGHT SINGLE QUOTATION MARK (U+2019), which
is a change in the meaning.
Do I understand it correctly that the formatter's configuration you
use doesn't distinguish between these two?
Looks like the real true apostrophe is the same glyph as the right
http://en.wikipedia.org/wiki/Apostrophe
Indeed, thanks.

Still, such a change is /not/ safe when done automatically,
especially when quoting code fragments, etc.

--cut: http://unicode.org/Public/UNIDATA/NamesList.txt --
0027 APOSTROPHE
= apostrophe-quote (1.0)
= APL quote
* neutral (vertical) glyph with mixed usage
* 2019 is preferred for apostrophe
* preferred characters in English for paired quotation marks are 2018 & 2019
--cut: http://unicode.org/Public/UNIDATA/NamesList.txt --
Also, there HYPHEN-MINUS (U+002D) was replaced by HYPHEN (U+2010),
so I guess that the formatter doesn't distinguish these two, either.
(Note that while in this particular case this change doesn't affect
the meaning, such a change done for, e. g., a code fragment, would
be destructive.)
The HYPHEN-MINUS character is used to denote, well, the minus and
number intervals, because its width is equal to that of a digit
glyph, while the shorter HYPHEN
I wonder how these considerations may apply to text intended
to be rendered in a monospace font?
is used in running text, as in "self-evident", but I am writing in
ASCII right now, so you'll see no difference in my post.
I understand it differently. Namely, HYPHEN-MINUS is used as a
substitute for either HYPHEN or a MINUS SIGN (in math), but
/not/ for an EN DASH, which is typically used for numeric
intervals.

Also, HYPHEN-MINUS is commonly used in programming languages to
mean subtraction, negation, comment (SQL), or otherwise, so it
is /not/ safe to automatically replace it with a HYPHEN (or
MINUS SIGN), either.
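A short sketch of why such a blanket substitution is destructive for code: the hypothetical `NAIVE` table below maps APOSTROPHE to U+2019 and HYPHEN-MINUS to U+2010 everywhere, roughly what the thread reports the formatter doing, and Python's built-in `compile` is used as a stand-in for "is this still valid code?".

```python
# Hypothetical "typographic" pass applied to everything, code included.
NAIVE = str.maketrans({"'": "\u2019", "-": "\u2010"})


def still_valid_python(src: str) -> bool:
    """Return True if `src` still compiles as a Python expression."""
    try:
        compile(src, "<fragment>", "eval")
        return True
    except SyntaxError:  # U+2010 and U+2019 are not Python operators/quotes
        return False


fragment = "abs(3 - 5) + len('ab')"
mangled = fragment.translate(NAIVE)

print(still_valid_python(fragment))
print(still_valid_python(mangled))
```

The same holds for an SQL “--” comment or a shell option such as “-n”: the substituted text may look identical in many fonts, yet it no longer means the same thing.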

--cut: http://unicode.org/Public/UNIDATA/NamesList.txt --
002D HYPHEN-MINUS
= hyphen or minus sign
* used for either hyphen or minus sign
x (hyphen - 2010)
x (non-breaking hyphen - 2011)
x (figure dash - 2012)
x (en dash - 2013)
x (minus sign - 2212)
x (roman uncia sign - 10191)
--cut: http://unicode.org/Public/UNIDATA/NamesList.txt --
In groff, when preparing output for the utf8 device, the minus is
accessed as either \(mi in ASCII sources or by using this symbol
directly in Unicode sources; and I didn't take it into account when
switching from ASCII to Unicode.
--
AIUI, the convention is to use “-- ” (i. e., with a trailing
blank), /not/ “--” (though Gnus seems to accept it either way.)
--
FSF associate member #7257
Anton Shepelev
2012-05-02 21:08:18 UTC
Post by Ivan Shmakov
Looks like the real true apostrophe is the same
http://en.wikipedia.org/wiki/Apostrophe
Indeed, thanks.
Still, such a change is /not/ safe when done
automatically, especially when quoting code
fragments, etc.
--cut: http://unicode.org/Public/UNIDATA/NamesList.txt --
0027 APOSTROPHE
= apostrophe-quote (1.0)
= APL quote
* neutral (vertical) glyph with mixed usage
* 2019 is preferred for apostrophe
* preferred characters in English for paired quotation marks are 2018 & 2019
--cut: http://unicode.org/Public/UNIDATA/NamesList.txt --
Yes, that's why you can have a verbatim ASCII-based
environment for code and a normal one for text.
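The split between a verbatim environment and a normal one can be sketched as a pre-pass that applies typographic substitution only outside code blocks. This is a hypothetical illustration, not groff's actual mechanism; it borrows the `.(VERBATIM` / `.)VERBATIM` markers quoted earlier in the thread, whose defining macro package is not identified here.

```python
# Hypothetical pre-pass: only running text gets typographic apostrophes.
TYPOGRAPHIC = str.maketrans({"'": "\u2019"})


def typographic_pass(lines: list[str]) -> list[str]:
    out: list[str] = []
    verbatim = False
    for line in lines:
        if line.startswith(".(VERBATIM"):
            verbatim = True    # entering code; the marker itself is dropped
        elif line.startswith(".)VERBATIM"):
            verbatim = False   # back to running text; marker dropped
        elif verbatim:
            out.append(line)   # code is copied byte for byte
        else:
            out.append(line.translate(TYPOGRAPHIC))
    return out


source = [
    "It's running text.",
    ".(VERBATIM",
    "print('still ASCII')",
    ".)VERBATIM",
    "Back to normal text.",
]
for line in typographic_pass(source):
    print(line)
```

The design choice is simply that the author, not the formatter, marks where substitution is forbidden; everything else can be typeset freely.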
Post by Ivan Shmakov
Post by Ivan Shmakov
Also, there HYPHEN-MINUS (U+002D) was replaced
by HYPHEN (U+2010), so I guess that the
formatter doesn't distinguish these two, either.
(Note that while in this particular case this
change doesn't affect the meaning, such a
change done for, e.g., a code fragment, would
be destructive.)
The HYPHEN-MINUS character is used to denote,
well, the minus and number intervals, because
its width is equal to that of a digit glyph,
while the shorter HYPHEN
I wonder how these considerations may apply to the
text intended to be rendered in a monospace font?
No matter what the font is, the glyphs can be
different. Furthermore, groff is a universal
typesetting engine and it has a general mechanism to
manage characters and their representations, which
is independent of the output device.
Post by Ivan Shmakov
is used in running text, as in "self-evident",
but I am writing in ASCII right now, so you'll
see no difference in my post.
I understand it differently. Namely, HYPHEN-MINUS
is used as a substitute for either HYPHEN or a
MINUS SIGN (in math), but /not/ for an EN DASH,
which is typically used for numeric intervals.
Also, HYPHEN-MINUS is commonly used in programming
languages to mean subtraction, negation, comment
(SQL), or otherwise, so it is /not/ safe to
automatically replace it with a HYPHEN (or
MINUS SIGN), either.
--cut: http://unicode.org/Public/UNIDATA/NamesList.txt --
002D HYPHEN-MINUS
= hyphen or minus sign
* used for either hyphen or minus sign
x (hyphen - 2010)
x (non-breaking hyphen - 2011)
x (figure dash - 2012)
x (en dash - 2013)
x (minus sign - 2212)
x (roman uncia sign - 10191)
--cut: http://unicode.org/Public/UNIDATA/NamesList.txt --
In typography, real symbols are preferred to
substitutes. When it comes to programming languages,
you are correct. For the sake of convenience they
tend to use as much ASCII and as little non-ASCII as
possible. The same applies to good e-mail and Usenet
messages.
Post by Ivan Shmakov
--
AIUI, the convention is to use “-- ” (i. e., with
a trailing blank), /not/ “--” (though Gnus seems to
accept it either way.)
Thanks for the correction. Applied.

--
() ascii ribbon campaign - against html e-mail
/\ www.asciiribbon.org - against proprietary attachments
Ivan Shmakov
2012-05-02 08:29:55 UTC
[This one got too long, so I'm splitting it into two parts. In
this part, I'm focusing on why one'd use XHTML to format
“simple” Usenet and e-mail messages; the images, formulae, and
the like are not considered.]
I have never seen an HTML e‐mail that wouldn’t have become better
by rewriting it in plain text.
Sorry, I couldn’t find them. Those are MessageIDs, right?
The part next to news: is indeed a Message-ID:. As a whole,
these are news: scheme URIs, as per RFC 5538.
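A rough sketch of telling the two common kinds of news: URI apart, assuming the simplification that Message-IDs always contain “@” while newsgroup names never do. The function name is hypothetical, and RFC 5538 details such as wildcards are deliberately not handled.

```python
from urllib.parse import unquote


def classify_news_uri(uri: str) -> tuple[str, str]:
    """Classify a news: URI as naming a newsgroup or an article.

    Simplified sketch: relies on "@" appearing only in Message-IDs.
    """
    scheme, sep, rest = uri.partition(":")
    if not sep or scheme.lower() != "news":
        raise ValueError(f"not a news: URI: {uri!r}")
    if rest.startswith("//"):
        # news://server/target form: drop the server part.
        rest = rest[2:].partition("/")[2]
    target = unquote(rest)  # undo %-encoding, if any
    if "@" in target:
        # The URI carries the Message-ID without its angle brackets.
        return ("message-id", f"<{target}>")
    return ("newsgroup", target)


print(classify_news_uri("news:comp.text"))
print(classify_news_uri("news:abc.123@example.org"))
```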

[…]
As I’ve said before, there’re many different ways to author “poor”
HTML. It doesn’t mean that any HTML is /necessarily/ bad.
Agree, but I wanted to show you the correlation and that plain‐text
stops people from abusing HTML.
I'd only accept this argument if plain text would also prevent
them from overquoting and top-posting!

(My ongoing recommendation is to “do it right or not at all.”)

[…]
Apart from the aesthetic reasons, I like plain text for its utter
simplicity, transparency, and suitability for automated searching and
processing.
I disagree. None of the automated formatters I know can handle
hyphenated text, and none of them could discern “verbatim” text
(such as a source code fragment, or ASCII art) from the “normal”,
formatted text.
[…]
Yes, I was talking about the simplest case of ragged‐right and
unhyphenated. Several programs go a step further and treat everything
that has an indent as verbatim.
Unfortunately, my personal preference is to keep the /formatted/
text indented, while the “verbatim” text is put “as is”, in part
to make it easy to copy & paste it. (Think of code fragments,
for instance, where indent /may/ be an element of syntax, as in
the case of Python or Fortran 77.)
And with reading/searching my statement holds even for preformatted
text.
While I definitely can read plain text, I'm certainly able to
imagine a better “reading experience” for me.

For instance, I can imagine a newsreader reformatting the
articles to match /my/ preferences, like, for formatted text:
left margin at column 8, right margin at column 72, “reflown”
and “ragged right” (while for “verbatim” text these'd be: no
left margin, preserve line breaks and whitespace.)

While Gnus allows for “reflowing” (and thus setting of the
right margin) of the text of the article being displayed, it
doesn't discern (and, frankly, how could it?) between
“formatted” and “verbatim” text, and thus, while reading a
poorly (or simply not to my preferences) formatted message, I
have to choose between suffering an extra eye strain and ruining
the code examples. (And Gnus isn't smart enough to drop the
hyphens, BTW.)

Obviously, the use of just /two/ of the XHTML elements, namely
<p /> and <pre />, would resolve this whole issue!

To summarize, the very benefit of XHTML is that it lets the
/reader/ decide upon the final appearance. (Or at least it
gives the reader much more freedom in this respect than both the
plain text and PDF.)
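As a sketch of what the <p /> vs. <pre /> distinction buys the reader: the hypothetical `render` function below reflows paragraphs to the reader's own margins (8 columns of indent, lines of at most 72 characters, per the preferences described above) and copies preformatted blocks untouched. It uses only Python's standard `xml.etree.ElementTree` and `textwrap`; other elements are simply skipped in this simplified version.

```python
import textwrap
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"


def render(body_xml: str, left: int = 8, right: int = 72) -> str:
    """Reflow <p> to the reader's margins; copy <pre> verbatim."""
    root = ET.fromstring(body_xml)
    blocks = []
    for el in root:
        tag = el.tag.replace(XHTML, "")  # accept namespaced or bare tags
        text = "".join(el.itertext())
        if tag == "p":
            # Formatted text: collapse whitespace, then wrap and indent.
            blocks.append(textwrap.fill(" ".join(text.split()),
                                        width=right,
                                        initial_indent=" " * left,
                                        subsequent_indent=" " * left))
        elif tag == "pre":
            # Verbatim text: keep line breaks and indentation as they are.
            blocks.append(text.strip("\n"))
    return "\n\n".join(blocks)


sample = ('<div xmlns="http://www.w3.org/1999/xhtml">'
          '<p>Some   long\n text.</p><pre>  if x:\n      y()</pre></div>')
print(render(sample))
```

Changing `left` and `right` is all a reader needs to do to get a different layout, with no risk to the code examples.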
Besides, I think I am the only one who uses hyphenation in e‐mail and
Usenet, and I keep groff source of important articles.
There comes a small tradeoff. When using XHTML, the version of
the article read at the other end and the version archived are
one and the same. There's no “unpublished source” to be lost.
(Think of FAQ's, or the like, where the document being posted
needs to be maintained just as well.)
Being a typesetting system, it has a language much lighter and more
comfortable than HTML, so I can easily strip the bare text from it
Note that there's software (html2text(1), Lynx, etc.) that
allows one to do the very same with HTML. Indeed, I'm a heavy
user of Lynx, and most of the time, the Web is “plain-texty”
enough for me.
or use groff to typeset a message as PDF or HTML, if need be.
groff’s code is also much easier to read than HTML.
Indeed, I agree on that!

Note, however, that *roff, as well as LaTeX and PostScript, and
/unlike/ XHTML and PDF, is /not/ a markup language or data
format, but rather a programming language, much like ECMAScript
or Java. Thus, the use of the former on the Web and within
messages (be it e-mail, netnews, XMPP, or whatever else) has at
least the following drawbacks:

• obvious security implications; (think of infinite loops, for
instance);

• it's easy to “convert” the former into the latter; not the
other way around.

Fortunately, there're plenty of “simplified” markup languages,
such as Markdown, reStructuredText, Textile, Creole, etc.,
which, although not nearly as powerful, seem like a good fit for
e-mail and netnews (and are already widely used in various Web
forums, which, arguably, constitute a communication medium close
enough to Usenet.)

And note that posting HTML generated from either Markdown or
*roff has the very benefit of having the <p /> vs. <pre />
distinction /still present/. Which makes my dreams of easy to
re-flow mail & news messages come true! (Contrary to the
posting of plain text produced from the very same *roff source.)

The only part which is (AFAIK) currently missing is the ability
to convert HTML (or, rather, a subset of it) back into an
“editable” format, such as Markdown.

[…]
As for the XML tools you mentioned, I am afraid every HTML message
will require a special script to process it, because HTML is not
structured in the sense that instead of defining the function of an
element (like, it is a list, it is a header, and so on), it defines
its appearance, and the former is not easily inferred from the
latter! On the other hand, *roff and LaTeX use structured approach.
I disagree with both of these statements. On the one hand,
contemporary HTML (HTML5, XHTML, “strict” HTML 4.01) delegates
the formatting almost exclusively to CSS (and isn't <section />
structured enough, anyway?) On the other, LaTeX has plenty of
easy to abuse commands, such as, e. g., \bf, \sf, etc. And
indeed, there're LaTeX users (though most probably a minority)
who'd happily use, say, \noindent \large \bf Foo for the section
heading.

One more benefit of XHTML-based netnews is the ability to
specify the /language/ for the parts of the text, which improves
accessibility. (Think of speech synthesis software.)
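A minimal sketch of how software such as a speech synthesizer front end could recover those language tags: the hypothetical `language_runs` function walks an XHTML fragment with `xml.etree.ElementTree`, honouring both `xml:lang` and `lang`, and lets each element inherit its parent's language. The default "und" (BCP 47 "undetermined") is an assumption.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def language_runs(xhtml: str, default: str = "und") -> list[tuple[str, str]]:
    """List (language, text) runs; language is inherited from ancestors."""
    runs: list[tuple[str, str]] = []

    def walk(el: ET.Element, inherited: str) -> None:
        lang = el.get(XML_LANG) or el.get("lang") or inherited
        if el.text and el.text.strip():
            runs.append((lang, el.text.strip()))
        for child in el:
            walk(child, lang)
            # Text following a child belongs to the parent's language.
            if child.tail and child.tail.strip():
                runs.append((lang, child.tail.strip()))

    walk(ET.fromstring(xhtml), default)
    return runs


msg = ('<p xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">'
       'The word <span xml:lang="ru">привет</span> means hello.</p>')
for lang, text in language_runs(msg):
    print(lang, text)
```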
P. S.: Several times I have appealed to the infrequency of various
situations you have mentioned. I find it a valid argument because
changing the whole medium in order to satisfy several percent of
posters at the expense of imparting overload unto the heads of the
remaining 99 percent would be wrong.
The inherent flaw in this argument is that by making a feature
difficult in some technology we effectively draw those who may
benefit from such a feature /away/ from this technology.
Therefore, in the long term, the only users of the latter would
be those who don't need that feature.

In the end, it's not the community that chooses the technology,
but the technology that selects its users.

And then, without enough newcomers, any technology falls into
oblivion.
--
FSF associate member #7257
Ivan Shmakov
2012-05-03 07:19:03 UTC
[This one got too long, so I'm splitting it into two parts. In
this, second, part, I'm focusing on what benefits XHTML has
when formatting more complex Usenet and e-mail messages,
including those with images, tables, formulae, and the like.]

[…]
Post by Ivan Shmakov
As you have shown, it is possible to highlight words in plain text.
Yes, but then my Gnus shows /foo/ as highlighted foo, because it
thinks that slashes mean slanted text here, not the Unix directory
separators they in fact are.
I didn’t mean that those symbols should be interpreted on the
reader’s side. They are OK as they are.
Why, people would ask, do they have to learn that slashes mean
italics? Why can't the computer just display the italics?

Being a long time fan of VGA text modes (indeed, I'm using an
80×30 one right now), I'd accept the argument of “complexity” of
displaying the italics. But I wonder if the “common people”
will find such an argument plausible.
Post by Ivan Shmakov
Formulas can be expressed like they are in programming languages or
LaTeX source,
I disagree.
Would LaTeX code be a preferable form of presentation, I guess we’d
see a lot of scientific journals switching to it from the
now‐ubiquitous mathematic notation.
But the mediums of e‐mail and Usenet are quite different from that of
printed publications.
Indeed, I agree with that, but not with the specific points
presented below, which I deem rather superficial. My comments
on them are therefore /not/ to be taken too seriously.
a. The need to use heavy and non‐editable formats, like PDF and
It was my understanding that the /printed/ publications use
/paper/ as the media and the format. Its heaviness depends
primarily on the density of the paper, the number of pages, and
the particular format (i. e., paper size) chosen.

Indeed, these formats are mostly non-editable.
PostScript.
Unlike PDF, PostScript is not quite a file format. Rather, it's
a programming language, much like ECMAScript and Java, which is
the reason to recommend /against/ any of these three on the Web.

One of my favorite hacks in PostScript is the one drawing the
Mandelbrot set, BTW.

Unfortunately, I forgot the URI, but then I've found this one:

http://www.physics.uq.edu.au/people/foster/postscript.html
b. Great effort and special software required to produce a
well‐typeset document, which on the reader side necessitates heavy and
more complicated software and results in higher latency and lesser
efficiency of communication.
Agreed.
HTML, as a compromise solution, has the drawbacks of both
typographical and plain‐text mediums. For example, it loses the
simplicity and transparency,
… for certain definitions thereof.
yet its presentation is platform‐dependent, meaning that the
author does not have control over how his work will look in different
programs, while plain text viewed in a monospace font _always_ looks
the same.
No, it doesn't. In particular, there're different monospace
fonts, and different meanings associated with the ASCII HT (AKA
“TAB”, \t, \x9) code. (I'd mention also the different
conventions to represent line breaks, or boldface and underline
conventions, but by this time these are mostly irrelevant.)

Anyway, being such a “compromise”, HTML gives the reader (and
not the platform, BTW) /more/ freedom in choosing the
presentation than PDF and, arguably, plain text.

For instance, it isn't that trivial to change the base font size
(so to solve accessibility issues, for instance) or paper size
(say, A4 to A5 or vice versa) of a PDF document, irrespective of
how “well” it's typeset. This way, PDF gives the author way too
much control over the presentation, while HTML provides a useful
“compromise.”
Post by Ivan Shmakov
Using * to mean ⋅ or ×, or introducing “new” functions, such as abs,
expt, pow, sqrt, etc., doesn’t seem sane, either.
That being said, Unicode‐based plain‐text allows ❘x❘, x¹²³, and even
√x̅. However, I don’t know of any easy way to author such formulae,
while there’re a few translators from a subset of LaTeX to MathML.
Such functions and conventions were introduced a long time ago,
It's not an argument. The question is: how widely they're
known, especially among the audience without strong “computing
background”?
and everybody with basic math and computer knowledge will understand
them.
So, the basic computer knowledge includes the Fortran notation
these days? (Or is it basic math?)
They are few and easy to learn and remember. Unicode math symbols
allow one to write only very, very short formulas.
s/short/simple/, perhaps?
Even e^(sin x) is impossible.
And that'd be one more reason to use MathML. (Note, however,
that since “exp” is widely used in mathematical texts, this
specific example could be written as exp (sin x) just as well.)
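For comparison, the formulae mentioned in this exchange, written in LaTeX math notation (the kind of input that the LaTeX-to-MathML translators mentioned above accept, at least for a subset of the language):

```latex
\[
  |x|, \qquad x^{123}, \qquad \sqrt{x}, \qquad
  e^{\sin x} = \exp(\sin x)
\]
```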
Post by Ivan Shmakov
and graphics should be converted to ASCII‐art,
What for?
In order to make it transmittable in plain‐text, of course.
SVG is plain text, as well as are certain varieties of the PNM
raster graphics format.
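To illustrate that claim, here is a small sketch producing Netpbm's plain ("P1") PBM format, which is nothing but ASCII text: a magic number, the width and height, then one 0/1 value per pixel (1 meaning black). The `#`/`.` input convention and the function name are assumptions made for the example.

```python
def to_plain_pbm(bitmap: list[str]) -> str:
    """Encode rows of '#' (black) and '.' (white) as a plain PBM image."""
    height = len(bitmap)
    width = len(bitmap[0]) if bitmap else 0
    lines = ["P1", f"{width} {height}"]  # magic number, then dimensions
    for row in bitmap:
        if len(row) != width:
            raise ValueError("all rows must have the same width")
        lines.append(" ".join("1" if c == "#" else "0" for c in row))
    return "\n".join(lines) + "\n"


print(to_plain_pbm(["#...#",
                    ".#.#.",
                    "..#..",
                    ".#.#.",
                    "#...#"]), end="")
```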

The question is, are we solving a technical issue here, or?

Representing graphics with glyphs of a monospace font was
justified when this was pretty much the least common denominator
of the computing systems in use. (And it may still be justified
for those preferring VGA text modes.) Nowadays, it isn't
worth the effort.
Bear in mind that with simple diagrams and schematics this conversion
is very easy and the result is quite pleasant.
So simple that it could be performed automatically with
software? Which one would you recommend, then? (I'd use it to
complement Lynx.)
Post by Ivan Shmakov
linked, or attached, depending on the circumstances.
The graphics, both linked and provided as separate MIME parts, could
be presented in a specific place of an HTML‐based message
referencing it. Not so for a message based on plain‐text.
I was pointing out ways to cope with images in plain‐text. Of
course, in HTML it is more straightforward.
That's the point.
One could also create a page on the web and link it, or typeset a
nice PDF and attach it,
PDF is not an alternative to HTML precisely because it puts way
too much control over the final presentation in the hands of the
author.
but 99% of Usenet or e‐mail messages are so far from being
book‐long that the inclusion of a couple of links or binary
attachments is not a problem, and the advantages offered by HTML are
not worth its overhead.
What overhead is in question here?

Well-authored HTML gives little overhead when it comes to the
bandwidth, and neither does it require any “additional”
software, beyond those already available on most of the
“desktop” (“laptop”, “palmtop”, etc.) systems nowadays.

And authoring HTML isn't much overhead, either: not a bit more
than with *roff, precisely because the latter could be
automatically converted into the former. (Although I'd prefer
Markdown here, with <http://example.org/> for the links.)

For precisely the same reason, MathML implies no more overhead
than the LaTeX notation, as there're existing tools to convert
(a subset of) the LaTeX math language into MathML.

[…]
Post by Ivan Shmakov
One doesn’t have to use a parser to comfortably read and operate
plain‐text files.
Column 1 Columns 2, 3
Column 2 Column 3
Did you note that the table was quoted the wrong way, BTW?
That's one of the issues that HTML would've helped to avoid.
Post by Ivan Shmakov
Whatever there is 1 Foo bar
And here’s the other 2 Baz
You are right again, but it is only 1‐2% of messages that must have
tables, and even more rare are the cases when one needs to parse
them.
Copying a plain-text table from a message to a non-plain text
document would imply some non-trivial parsing; HTML makes this
task easier.
If a table is intended for parsing, it is meet to post it as tab‐ or
comma‐separated, either in the body of the message or as an
attachment.
This would put extra burden on the reader (copying CSV into a
file to open with a spreadsheet software, scrolling or switching
between the windows showing the text and the table.)
Such complicated tables are rare, even rarer does one need to
automatically parse them. Most of the time it is a set of key‐value
pairs or n‐tuples ‐‐
Two HYPHENs (U+2010) don't add up to a proper EM DASH (U+2014),
BTW.
a regular linear structure, which is easier parsed from plain‐text
(CSV) than from HTML.
Actually, CSV isn't much easier to parse than XHTML, so it
warrants using a library for that. Then, there isn't much
difference in using an XML library vs. a CSV one.

Consider, e. g., the following single-row, three-column CSV
dataset:

1,"a","well, ""hello, world"", as an example"

This issue was recently discussed in news:fido7.ru.unix.bsd [1].

[1] news:***@ddt.demos.su
http://groups.google.com/group/fido7.ru.unix.bsd/browse_thread/thread/ee7bdde4f5241891/0378d3455ffa4046
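The record above is a good test case. A sketch using Python's standard `csv` module (default dialect, where a doubled quote inside a quoted field stands for one literal quote) next to a naive comma split:

```python
import csv
import io

record = '1,"a","well, ""hello, world"", as an example"'

# Naive approach: ignores quoting, so commas inside fields split them.
naive = record.split(",")

# A real CSV parser honours quoted fields and "" escapes.
parsed = next(csv.reader(io.StringIO(record)))

print(len(naive), naive)
print(len(parsed), parsed)
```

The naive split yields six pieces with stray quotes; the parser yields the intended three fields. Either way a library does the work, which is the argument for not treating CSV as simpler than XHTML.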

[…]
As for the XML tools you mentioned, I am afraid every HTML message
will require a special script to process it, because HTML is not
structured in the sense that instead of defining the function of an
element (like, it is a list, it is a header, and so on), it defines
its appearance, and the former is not easily inferred from the
latter! On the other hand, *roff and LaTeX use structured approach.
I've provided some counterexamples in the other part, but here are
two more arguments.

First of all, the HTML "table" elements have well-defined
structured meaning, and while the location (in the DOM tree) of
a particular table may vary between documents, it'd likely be
only a matter of changing an XPath expression to “adapt” the
code from one document to the other.
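A sketch of that idea with Python's standard `xml.etree.ElementTree`, which supports a subset of XPath. The function name and default path are hypothetical; adapting it to a differently structured document means passing another `table_path`, e.g. one selecting the table inside a particular <div />.

```python
import xml.etree.ElementTree as ET

NS = {"h": "http://www.w3.org/1999/xhtml"}
CELL_TAGS = {f"{{{NS['h']}}}td", f"{{{NS['h']}}}th"}


def table_rows(xhtml: str, table_path: str = ".//h:table") -> list[list[str]]:
    """Return the cell texts, row by row, of the first table matched by
    `table_path` (ElementTree's XPath subset, prefix "h" = XHTML)."""
    root = ET.fromstring(xhtml)
    table = root.find(table_path, NS)
    if table is None:
        return []
    return [
        ["".join(cell.itertext()).strip() for cell in tr if cell.tag in CELL_TAGS]
        for tr in table.iterfind(".//h:tr", NS)
    ]


page = ('<html xmlns="http://www.w3.org/1999/xhtml"><body><table>'
        '<tr><td>Whatever there is</td><td>1</td><td>Foo bar</td></tr>'
        '<tr><td>And here’s the other</td><td>2</td><td>Baz</td></tr>'
        '</table></body></html>')
for row in table_rows(page):
    print(row)
```

Unlike a table laid out with spaces, the rows and columns here survive quoting, reflowing, and a change of font.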

For the second argument, I'd refer to Eric Raymond's “DocBook
Demystification HOWTO”:

--cut: http://tldp.org/HOWTO/DocBook-Demystification-HOWTO/intro.html --
[…] The advocates of XML-based “structural markup” (as opposed to
the older style of “presentation markup” exemplified by troff, Tex,
and Texinfo) seem to have won the theoretical battle. […]
--cut: http://tldp.org/HOWTO/DocBook-Demystification-HOWTO/intro.html --

[…]
--
FSF associate member #7257
Anton Shepelev
2012-05-03 14:28:48 UTC
Ivan Shmakov,

I have too little time to give a fuller answer, so for now I
Post by Ivan Shmakov
For the second argument, I'd refer to Eric Raymond's
--cut: http://tldp.org/HOWTO/DocBook-Demystification-HOWTO/intro.html --
[…] The advocates of XML-based "structural markup" (as opposed to
the older style of "presentation markup" exemplified by troff, Tex,
and Texinfo) seem to have won the theoretical battle. […]
--cut: http://tldp.org/HOWTO/DocBook-Demystification-HOWTO/intro.html --
I have looked into what that FAQ calls structural and
presentational markup, and I must say its author is plain
wrong about groff and (La)TeX, although he seems to be
correct about DocBook, probably because he knows it better:

As an example: In a presentation-markup language, if
you want to emphasize a word, you might instruct the
formatter to set it in boldface. In troff(1) this
would look like so:

All your base
.B are
belong to us!

In a structural-markup language, you would tell the
formatter to emphasize the word:

All your base <emphasis>are</emphasis> belong to us!

The "<emphasis>" and "</emphasis>" in the line above are
called markup tags, or just tags for short. They are
the instructions to your formatter.

In a structural-markup language, the physical
appearance of the final document would be controlled
by a stylesheet. It is the stylesheet that would tell
the formatter "render emphasis as a font change to
boldface". One advantage of structural-markup
languages is that by changing a stylesheet you can
globally change the presentation of the document (to
use different fonts, for example) without having to
hack all the individual instances of (say) .B in
the document itself.

Both LaTeX, a macro package for TeX, and a majority of groff
macro packages are structured markup languages, in which a
document is defined in terms of its structure (title, TOC,
abstract, headers and subheaders, lists, references and
indexes...), while the "presentation" (i.e. the formatting)
of these structural elements depends upon the underlying
macro package, so one can reformat the same document just by
changing the implementation of a macro or switching
to another macro package with a compatible interface.

Many packages for both *roff and TeX provide the means
for deep customization of the "presentation", which can be
effected without changing the "structure" (i.e. the
document's source). This is called separation of concerns.
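The separation described above can be shown with a toy model in Python. This is an illustration under simplifying assumptions, not how LaTeX or any *roff package is implemented. The document records only its structure, and a "style sheet" (here just a dictionary mapping element names to formatting functions) decides the presentation. Swapping the dictionary reformats the document without touching its source. The names DOCUMENT, PLAIN_TEXT, XHTML and render are hypothetical.

```python
import html

# The document records only structure: blocks made of inline fragments.
DOCUMENT = [
    ("title", [("text", "Greetings")]),
    ("para", [("text", "All your base "), ("emphasis", "are"),
              ("text", " belong to us!")]),
]

# Two interchangeable "style sheets": each maps an element name to a formatter.
PLAIN_TEXT = {
    "title": str.upper,
    "para": lambda s: s,
    "text": lambda s: s,
    "emphasis": lambda s: "*" + s + "*",
}
XHTML = {
    "title": lambda s: "<h1>" + s + "</h1>",
    "para": lambda s: "<p>" + s + "</p>",
    "text": html.escape,
    "emphasis": lambda s: "<em>" + html.escape(s) + "</em>",
}

def render(document, sheet):
    """Format every block with the given style sheet; the source is untouched."""
    blocks = []
    for block_kind, fragments in document:
        inline = "".join(sheet[kind](text) for kind, text in fragments)
        blocks.append(sheet[block_kind](inline))
    return "\n".join(blocks)

print(render(DOCUMENT, PLAIN_TEXT))
print(render(DOCUMENT, XHTML))
```

The same DOCUMENT comes out as "GREETINGS" plus "All your base *are* belong to us!" with the first sheet, and as <h1>/<p>/<em> markup with the second.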

Both TeX and roff are low-level typesetting languages that
are presentation-oriented, but they are used as the basis for
high-level structured mark-up languages implemented in
so-called macro packages. Just open up any LaTeX tutorial
and you'll see structured mark-up at work:

http://tinyurl.com/7obyloe
(a PDF document)
(see section 2.1)

As for groff, see, for example, the documentation for the mom
macro package:

http://www.schaffter.ca/mom/mom-01.html

and especially this little section describing just what Eric
Raymond says groff and TeX can't do:

http://www.schaffter.ca/mom/mom-02.html#global

It is actually a feature that both LaTeX and groff are proud
of. I don't know whether it is due to ignorance or bad
intent that Eric Raymond has published such disinformation.

I will explain the groff example in more detail.

All your base
.B are
belong to us!

The roff language has no .B request, but many of its macro
packages do have such a macro. The package implementor or
user may (re)define this macro however he wants, and may even
have several alternative definitions at the same time,
so that the active one depends on the environment, the
context, etc., or is specified explicitly in a macro call.
Similarly, you could define an .EMPH macro and use it just
as the author describes.
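To show why the meaning of .B lies in the macro package rather than in the language, here is a toy roff-like interpreter in Python. It is a hypothetical sketch, not groff. Lines starting with "." call a macro looked up in a swappable table, and text lines are simply joined with spaces as a crude stand-in for filling. The \fB, \fI and \fP font escapes are real troff escapes. The run function, the BOLD and ITALIC tables and the .EMPH macro are made up for the example.

```python
def run(source, macros):
    """Toy roff-like interpreter: a line starting with '.' calls a macro.

    macros maps a macro name to a function taking the rest of the line.
    Text lines are joined with spaces, a crude stand-in for filling.
    """
    words = []
    for line in source.splitlines():
        if line.startswith("."):
            name, _, argument = line[1:].partition(" ")
            words.append(macros[name](argument))
        else:
            words.append(line)
    return " ".join(words)

SOURCE = "All your base\n.B are\nbelong to us!"

# One "package": .B switches to bold with the \fB ... \fP font escapes.
BOLD = {"B": lambda arg: r"\fB" + arg + r"\fP"}

# Another "package": the same .B interface, redefined to use italics, plus a
# hypothetical .EMPH macro bound to the same formatter, named for what the
# text is rather than how it looks.
ITALIC = {"B": lambda arg: r"\fI" + arg + r"\fP"}
ITALIC["EMPH"] = ITALIC["B"]

print(run(SOURCE, BOLD))    # All your base \fBare\fP belong to us!
print(run(SOURCE, ITALIC))  # All your base \fIare\fP belong to us!
```

The source text never changes. Only the table of macro definitions does.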

In other words, roughly:

-- roff and TeX correspond to the rendering engine and
interpreter

-- the interface of a macro package corresponds to a
structured markup language

-- the settings of a macro package correspond to a style
sheet.
--
() ascii ribbon campaign - against html e-mail
/\ www.asciiribbon.org - against proprietary attachments
Shmuel (Seymour J.) Metz
2012-05-03 16:57:02 UTC
In <***@g{oogle}mail.com>, on
05/03/2012
Post by Anton Shepelev
In a structural-markup language, you would tell the
All your base <emphasis>are</emphasis> belong to us!
No; that's still marking up the text for presentation. With structural
markup you tag the text to indicate its semantics, e.g.,
":title.Return of the foo". That might cause emphasis, quoting or
something else. It might cause automatic indexing. It might do
something else. And it might do different things depending on the
context.
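The distinction drawn here can be made concrete with a toy Python sketch. It is not a real GML processor. The tag syntax is simplified to one ":tag.text" per line, and the "body" and "bibliography" contexts are invented for the example. The tag says only what the text is. The processor decides, per context, whether that means quotes or underscore emphasis, and it also collects an index entry from the same tag.

```python
import re

# A tag at the start of a line in the style of ":title.Return of the foo".
# Simplified: one tag per line, no end tags, no attributes.
TAG = re.compile(r"^:(\w+)\.(.*)$")

def process(lines, context):
    """Return (rendered lines, index entries); :title. depends on context."""
    rendered, index = [], []
    for line in lines:
        match = TAG.match(line)
        if match is None:
            rendered.append(line)
            continue
        tag, text = match.groups()
        if tag != "title":
            raise ValueError("unknown tag: " + tag)
        index.append(text)  # the same tag also produces an index entry
        if context == "bibliography":
            rendered.append('"' + text + '"')  # quoted in a reference list
        else:
            rendered.append("_" + text + "_")  # emphasized elsewhere
    return rendered, index

print(process([":title.Return of the foo"], "body"))
print(process([":title.Return of the foo"], "bibliography"))
```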
Post by Anton Shepelev
In a structural-markup language, the physical
appearance of the final document would be controlled
by a stylesheet.
That's one possibility; there are others.
Post by Anton Shepelev
-- the interface of a macro package corresponds to a
structured markup language
Some macro packages are strictly presentational.
--
Shmuel (Seymour J.) Metz, SysProg and JOAT <http://patriot.net/~shmuel>

Unsolicited bulk E-mail subject to legal action. I reserve the
right to publicly post or ridicule any abusive E-mail. Reply to
domain Patriot dot net user shmuel+news to contact me. Do not
reply to ***@library.lspace.org
Anton Shepelev
2012-05-04 13:44:25 UTC
Post by Shmuel (Seymour J.) Metz
Post by Anton Shepelev
In a structural-markup language, you would tell the
All your base <emphasis>are</emphasis> belong to us!
No; that's still marking up the text for presentation.
With structural markup you tag the text to indicate its
semantics, e.g., ":title.Return of the foo". That might
cause emphasis, quoting or something else. It might cause
automatic indexing. It might do something else. And it
might do different things depending on the context.
I think you are nitpicking here, because the tag <emphasis>
may be endowed with any meaning by means of a style sheet.
Similarly you could have a <title> tag.
Post by Shmuel (Seymour J.) Metz
Post by Anton Shepelev
In a structural-markup language, the physical appearance
of the final document would be controlled by a
stylesheet.
That's one possibility; there are others.
Agree.
Post by Shmuel (Seymour J.) Metz
Post by Anton Shepelev
the interface of a macro package corresponds to a
structured markup language
Some macro packages are strictly presentational.
Yes. Structural mark-up is an abstraction from the lower-
level presentational mark-up. For example, TeX is
presentational and LaTeX is structural.
--
() ascii ribbon campaign - against html e-mail
/\ www.asciiribbon.org - against proprietary attachments
Ivan Shmakov
2012-05-04 14:21:36 UTC
[Cross-posting to news:comp.text, and dropping news:news.misc
from Followup-To:.]
In a structural-markup language, you would tell the formatter to
All your base <emphasis>are</emphasis> belong to us!
No; that's still marking up the text for presentation. With
structural markup you tag the text to indicate its semantics, e. g.,
":title.Return of the foo". That might cause emphasis, quoting or
something else. It might cause automatic indexing. It might do
something else. And it might do different things depending on the
context.
I think you are nitpicking here, because the tag <emphasis> may be
endowed with any meaning by means of a style sheet. Similarly you
could have a <title> tag.
There's one more issue with the TeX approach, which is not so
much of the structural vs. presentational kind as of the code
vs. data one.

Namely, while it's possible to parse DocBook, documents in
TeX-based markup are essentially /unparsable/. For instance,
while it's possible to extract all the section headings from a
DocBook document, it's impossible to do so, in general, for a
LaTeX one, as the LaTeX document in question can introduce its
own commands all along the way. Consider, e. g.:

\let \sec=\section

The same applies to *roff, and it's precisely the reason that
various *roff "viewers" have to either rely on an implementation
of the language (such as GNU Troff), or support only a
particular macro package (as in the case of Emacs' M-x woman.)

By contrast, software working with DocBook documents
doesn't have to rely upon, say, the DocBook XSL stylesheets.
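This difference can be demonstrated with a small Python experiment. The sample documents and function names are invented for illustration. A regular-expression extractor that knows about \section misses the heading introduced through the \sec alias. The DocBook version can be queried with an ordinary XML parser, because element names are fixed by the markup's syntax rather than defined by the document.

```python
import re
import xml.etree.ElementTree as ET

LATEX = r"""
\let\sec=\section
\section{Introduction}
\sec{Why markup matters}
"""

DOCBOOK = """<article xmlns="http://docbook.org/ns/docbook" version="5.0">
<section><title>Introduction</title></section>
<section><title>Why markup matters</title></section>
</article>"""

def latex_headings(source):
    """Naive extractor: knows only the standard \\section command."""
    return re.findall(r"\\section\{([^}]*)\}", source)

def docbook_headings(source):
    """Ask an XML parser for every section title; no macros to expand."""
    root = ET.fromstring(source)
    ns = {"db": "http://docbook.org/ns/docbook"}
    return [title.text for title in root.findall(".//db:section/db:title", ns)]

print(latex_headings(LATEX))      # ['Introduction']: the \sec heading is missed
print(docbook_headings(DOCBOOK))  # ['Introduction', 'Why markup matters']
```

Teaching the extractor about \let would only move the problem. In general, finding the headings requires expanding the document's own macros, which means running TeX itself.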

[...]
the interface of a macro package corresponds to a structured markup
language
Some macro packages are strictly presentational.
Yes. Structural mark-up is an abstraction from the lower-level
presentational mark-up. For example, TeX is presentational and LaTeX
is structural.
Actually, LaTeX is structural, plain TeX is presentational, and
TeX is the macro processing language in which both of them are
implemented (as are, e. g., ConTeXt and certain GNU Texinfo
"conversions.")
--
FSF associate member #7257
Anton Shepelev
2012-05-05 08:41:13 UTC
[cross-posting to news.misc]
What's the reason to have different Newsgroups and
Followup-To headers? Isn't it logical to reply to the
same groups to which the article was posted?
When the discussion has drifted away sufficiently from the
original topic, it may be necessary to direct it into a
different newsgroup, or a set of them. Then, it's
customary to post the followup preserving Newsgroups:,
while also setting Followup-To: (and announcing so), so
that the next followup will be directed to the now-
appropriate newsgroups.
[...]
The reason for preserving Newsgroups: is that NNTP doesn't
allow one to follow a /thread/ (or a /person/, BTW), so
it's necessary to warn the readers of the thread of the
pending Newsgroups: change, to give them a chance to adjust
their subscriptions.
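The custom described above can be expressed as a small Python helper. The helper is hypothetical, since real newsreaders build these headers themselves, and the group names in the example are only illustrative. Header values are comma-separated group names, as in Netnews article headers.

```python
def redirect_followup(current_groups, target_groups):
    """Newsgroups:/Followup-To: values for a followup that moves a thread.

    Newsgroups: keeps every group the thread is in now (adding the new ones),
    so readers there see the announcement; Followup-To: lists only the
    target groups, so replies to this article go there.
    """
    newsgroups = list(current_groups)
    for group in target_groups:
        if group not in newsgroups:
            newsgroups.append(group)
    followup_to = list(target_groups)
    # The custom described above: Followup-To: is a subset of Newsgroups:.
    assert set(followup_to) <= set(newsgroups)
    return {"Newsgroups": ",".join(newsgroups),
            "Followup-To": ",".join(followup_to)}

print(redirect_followup(["news.misc"], ["comp.text"]))
# {'Newsgroups': 'news.misc,comp.text', 'Followup-To': 'comp.text'}
```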
Thank you for the explanation, Ivan.

You mentioned the lack of support for following
threads/authors, but I think it can be done on the client
side: Pegasus and Dialog seem to have a "thread watch"
function working at least within a newsgroup. Coupled with
the lack of HTML, doesn't it make Usenet an obsolete medium
beyond amendment, in your opinion? And won't the appropriate
updates turn it into yet another heavy and bloated web
forum, sans the distributed architecture? That's another of
my points against such innovations.
--
() ascii ribbon campaign - against html e-mail
/\ www.asciiribbon.org - against proprietary attachments
Ivan Shmakov
2012-05-05 15:02:40 UTC
Post by Anton Shepelev
What's the reason to have different Newsgroups and Followup-To
headers? Isn't it logical to reply to the same groups to which the
article was posted?
When the discussion has drifted away sufficiently from the original
topic, it may be necessary to direct it into a different newsgroup,
or a set of them. Then, it's customary to post the followup
preserving Newsgroups:, while also setting Followup-To: (and
announcing so), so that the next followup will be directed to the
now-appropriate newsgroups.
The reason for preserving Newsgroups: is that NNTP doesn't allow one
to follow a /thread/ (or a /person/, BTW), so it's necessary to warn
the readers of the thread of the pending Newsgroups: change, to give
them a chance to adjust their subscriptions.
Thank you for the explanation, Ivan.
You mentioned the lack of support for following threads/authors, but
I think it can be done on the client side: Pegasus and Dialog
seem to have a "thread watch" function working at least within a
newsgroup.
And that's precisely why the aforementioned "Followup-To: is a
subset of Newsgroups:" hack is necessary.

Trying to apply such a feature across all the newsgroups within
a news server that has a decent feed is likely to induce a
considerable load on the latter, which, in turn, may result in a
permanent ban for an unlucky offender, at least should he or she
try it on a free-of-charge news service.
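A hypothetical Python sketch of such a client-side "thread watch" shows where the load comes from. An article belongs to the watched thread if its References: header mentions the thread's root Message-ID. To find such articles in every newsgroup, though, the client first has to download overview data (for example via the NNTP OVER command) for every group on the server. The OVERVIEW dictionary below stands in for that downloaded data, and its contents are made up.

```python
def watched_thread(overview, root_id):
    """Return (group, Message-ID) pairs of articles in the thread rooted at root_id.

    overview maps each group name to (message_id, references) pairs, as a
    client could build from NNTP overview data; references is the raw
    References: header (space-separated Message-IDs).
    """
    found = []
    for group, articles in overview.items():  # every group must be fetched
        for message_id, references in articles:
            if message_id == root_id or root_id in references.split():
                found.append((group, message_id))
    return found

# Hypothetical overview data for two groups.
OVERVIEW = {
    "news.misc": [("<a@example.invalid>", ""),
                  ("<b@example.invalid>", "<a@example.invalid>")],
    "comp.text": [("<c@example.invalid>", "<a@example.invalid> <b@example.invalid>"),
                  ("<d@example.invalid>", "")],
}

print(watched_thread(OVERVIEW, "<a@example.invalid>"))
```

The matching itself is cheap. The cost lies in filling OVERVIEW for every group the server carries.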
Post by Anton Shepelev
Coupled with the lack of HTML, doesn't it make Usenet an obsolete
medium beyond amendment, in your opinion?
Who am I to talk about obsolescence? (May I remind you that I
still use VGA text modes for most of my time at a terminal?)

But, yes, it makes Usenet less convenient than it might be.

That being said, it's not the technology that's of the utmost
importance to me. It's the community that is.

(And, BTW, I'm not concerned with the choice of the markup
language, either, as long as it allows me to reflow the
formatted text while leaving the "pre-formatted" one alone, has
a registered MIME type, and a wide variety of processing tools.
But I seem to contradict myself.)
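The first of those requirements, reflowing formatted text while leaving "pre-formatted" text alone, is easy to meet with XHTML. Here is a minimal Python sketch. It assumes a well-formed XHTML body whose children are only <p> and <pre> elements, and it silently skips anything else, so real messages would need much more handling. Paragraphs are re-wrapped to the reader's width with textwrap, and <pre> blocks are copied verbatim. The function name and the sample are hypothetical.

```python
import textwrap
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"

def to_plain_text(xhtml_source, width):
    """Reflow <p> text to `width` columns; copy <pre> text unchanged."""
    body = ET.fromstring(xhtml_source).find(XHTML + "body")
    blocks = []
    for element in body:
        text = "".join(element.itertext())
        if element.tag == XHTML + "pre":
            blocks.append(text.strip("\n"))  # keep line breaks and spacing
        elif element.tag == XHTML + "p":
            # Collapse the author's line breaks, then wrap for this reader.
            blocks.append(textwrap.fill(" ".join(text.split()), width))
    return "\n\n".join(blocks)

SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml"><body>
<p>The formatted text may be reflowed
to whatever width the reader prefers.</p>
<pre>
host   load
alpha   0.5
</pre>
</body></html>"""

print(to_plain_text(SAMPLE, 20))
```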
Post by Anton Shepelev
And won't the appropriate updates turn it into yet another heavy
and bloated web forum, sans the distributed architecture?
I'm sure they won't. Why should they?
Post by Anton Shepelev
That's another of my points against such innovations.
Care to share which innovations you would welcome?
--
FSF associate member #7257
Ivan Shmakov
2012-05-05 16:18:39 UTC
[Cross-posting to news:comp.text and dropping news:news.misc
from Followup-To:.]
Post by Shmuel (Seymour J.) Metz
In a structural-markup language, you would tell the formatter to
All your base <emphasis>are</emphasis> belong to us!
No; that's still marking up the text for presentation.
Yes, somewhat.

--cut: http://docbook.org/tdg5/en/html/emphasis.html --
An emphasis is often used wherever its typographic presentation is
desired, even when other markup might theoretically be more
appropriate.
--cut: http://docbook.org/tdg5/en/html/emphasis.html --

I also tend to think of <emphasis /> as an "I know that this
fragment of text stands out semantically, but I can't find a
suitable element for it right now" element.

[...]
Post by Shmuel (Seymour J.) Metz
In a structural-markup language, the physical appearance of the
final document would be controlled by a stylesheet.
That's one possibility; there are others.
For instance?
Post by Shmuel (Seymour J.) Metz
-- the interface of a macro package corresponds to a structured
markup language
Some macro packages are strictly presentational.
E. g., most of the LaTeX packages that provide additional
glyphs.
--
FSF associate member #7257