Language-tagged strings

Language-tagged strings currently have a special status in RDF: "RDF 1.1 Concepts and Abstract Syntax currently contains many caveats to accommodate the idiosyncratic nature of language-tagged strings."
https://lists.w3.org/Archives/Public/semantic-web/2018Nov/0090.html

"It is a real pain to create these 3 component literals and to query for different languages and datatypes in SPARQL.
And worse still, if you want to query for strings that may or may not have language tags on, you need to do some real messing about."
https://lists.w3.org/Archives/Public/semantic-web/2018Nov/0098.html

"Using a general way to make statements about literals sounds good to me.
For geographical data I also see too many statements being squashed into a
single literal.  It is difficult to process and to store. . . . Why have a standard provision for
indicating the language of a text string and not its pronunciation for
example?"
https://lists.w3.org/Archives/Public/semantic-web/2018Nov/0102.html

"language codes do matter, but are pretty inconvenient for multiple reasons:
- comparability with untyped/plain strings (of course, and most obviously
and counter-intuitive to RDF novices),
- complexity (BCP47 defines (a) complex selection rules among ISO 639
language tags, and (b) complex rules for composition, e.g., with script and
region codes), and
- confusability (having 2-letter codes aside with 3-letter codes for the
same language can let people used to work with 3-letter codes choose
2-letter codes, which is an easy error to make, but can result in failure
to compare, e.g., "cat"@eng and "cat"@en. Not sure what should happen when
you compare "рука"@sr-Cyrl with "рука"@sr. Both are identical, the first is
just more explicit in stating that this is Cyrillic.)
- coverage (for many applications, ISO639 simply isn't fine-grained or
well-defined enough, and its extension is slow, bureaucratic and doubtful)."
https://lists.w3.org/Archives/Public/semantic-web/2018Nov/0116.html

"RDF seems to violate its own doctrine by having separate
systems for data types and languages of literals."
https://lists.w3.org/Archives/Public/semantic-web/2018Nov/0143.html 

## IDEA: Eliminate the special status of language-tagged strings
"would it be possible to do away with the special status of language-tagged strings? . . . Would it be possible to define a regular lexical space, e.g., containing "hello@en"^^rdf:langString, together with a value-2-lexical and a lexical-2-value mapping?  The N3 and SPARQL notation "hello"@en will of course still be available, and will be syntactic sugar for "hello@en"^^rdf:langString."
https://lists.w3.org/Archives/Public/semantic-web/2018Nov/0090.html
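
To make the proposal concrete, here is a minimal Python sketch of what the two mappings might look like. It assumes the single-string lexical form `text@tag` suggested in the quote; the names `LangString`, `lexical_to_value` and `value_to_lexical` are hypothetical and not part of any RDF specification. It splits on the last `@`, since language tags cannot contain `@` but the text can.

```python
from typing import NamedTuple


class LangString(NamedTuple):
    """Value of the proposed (hypothetical) single-string lexical form."""
    text: str
    lang: str  # language tag, normalised to lower case


def lexical_to_value(lexical: str) -> LangString:
    # Split on the LAST "@": the tag never contains "@", but the text may.
    text, sep, lang = lexical.rpartition("@")
    if not sep or not lang:
        raise ValueError(f"no language tag in {lexical!r}")
    return LangString(text, lang.lower())


def value_to_lexical(value: LangString) -> str:
    # Inverse mapping: rebuild the canonical lexical form.
    return f"{value.text}@{value.lang}"


print(lexical_to_value("hello@en"))
print(lexical_to_value("user@example.org@EN-gb"))
print(value_to_lexical(LangString("hello", "en")))
```

One consequence of this design is that `"hello@EN"` and `"hello@en"` become two lexical forms for the same value, so the value-to-lexical mapping has to pick a canonical one (lower case in this sketch).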

"Surely languages and datatypes should simply be RDF properties of Literals, which are 1 component things?
Much easier to explain to developers, and for them to use."
https://lists.w3.org/Archives/Public/semantic-web/2018Nov/0098.html

"That also fits in nicely with making it easier to represent property graphs."
https://lists.w3.org/Archives/Public/semantic-web/2018Nov/0101.html

"it would be much more efficient to declare the language used only once, at the class and/or metadata level. Using plain properties to indicate language enables doing that."
https://lists.w3.org/Archives/Public/semantic-web/2018Nov/0145.html 

**CONCERN:** "The RDF 1.1 WG did spend some time [on language tags] - both on putting the langtag 
into the lexical space and putting the lang tag into the datatype.  Both 
are not so easy; in the end the rdf:langString at least meant all 
literals had a datatype."
https://lists.w3.org/Archives/Public/semantic-web/2018Nov/0097.html

**CONCERN:** "chat"@en and "chat"@fr are different.
  "chat" rdf:lang "en" .
  "chat" rdf:lang "fr" .
makes every use of "chat" both @en and @fr.
https://lists.w3.org/Archives/Public/semantic-web/2018Nov/0148.html 
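
The concern can be illustrated with a small Python sketch that models a graph as a set of plain `(subject, predicate, object)` tuples. The `rdf:lang` property is the hypothetical one from the example above, and the `ex:` resources and helper functions are made up for illustration. Because the literal `"chat"` is a single node, language statements about it apply to every triple that uses it.

```python
# A graph as a set of (subject, predicate, object) tuples.
# "rdf:lang" is the hypothetical property from the concern, not a real RDF term.
graph = {
    ('"chat"', "rdf:lang", "en"),
    ('"chat"', "rdf:lang", "fr"),
    ("ex:doc1", "rdfs:label", '"chat"'),  # author meant the English word
    ("ex:doc2", "rdfs:label", '"chat"'),  # author meant the French word
}


def languages_of(graph, literal):
    """All languages asserted about a literal used as a subject."""
    return {o for (s, p, o) in graph if s == literal and p == "rdf:lang"}


def label_languages(graph, resource):
    """Languages that apply to the labels of a resource."""
    labels = {o for (s, p, o) in graph if s == resource and p == "rdfs:label"}
    return {lang for label in labels for lang in languages_of(graph, label)}


print(sorted(label_languages(graph, "ex:doc1")))  # ['en', 'fr']
print(sorted(label_languages(graph, "ex:doc2")))  # ['en', 'fr']
```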

"I think the only way to avoid this would be if subject literals are
taken as a notational short-hand for a blank node that carries the literal
as an rdf:value. (And, in a separate step, a problem-specific bnode
skolemization routine could be provided to give it a proper URI.)"
https://lists.w3.org/Archives/Public/semantic-web/2018Nov/0156.html
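
A rough Python sketch of this blank-node reading, again using plain tuples as triples: each tagged occurrence gets its own fresh node that carries the text via `rdf:value`, so the two uses of "chat" no longer share language statements. The `rdf:lang` property, the `ex:` resources and the `tagged_literal` helper are assumptions made for illustration.

```python
from itertools import count

_ids = count(1)  # source of fresh blank-node labels


def tagged_literal(graph, text, lang):
    """Add a fresh blank node carrying `text` as rdf:value and `lang` as rdf:lang."""
    node = f"_:b{next(_ids)}"
    graph.add((node, "rdf:value", text))
    graph.add((node, "rdf:lang", lang))  # hypothetical property
    return node


def label_languages(graph, resource):
    """Languages attached to the label nodes of a resource."""
    nodes = {o for (s, p, o) in graph if s == resource and p == "rdfs:label"}
    return {o for (s, p, o) in graph if s in nodes and p == "rdf:lang"}


graph = set()
graph.add(("ex:doc1", "rdfs:label", tagged_literal(graph, "chat", "en")))
graph.add(("ex:doc2", "rdfs:label", tagged_literal(graph, "chat", "fr")))

print(label_languages(graph, "ex:doc1"))  # {'en'}
print(label_languages(graph, "ex:doc2"))  # {'fr'}
```

The cost is two extra triples and one extra node per tagged string, which is the trade-off the skolemization remark in the quote tries to soften.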

"I really don't have a problem with every instance of "chat"^^xsd:string being both en and fr if someone has asserted that using rdf:lang. . . . Basically I think language tags are trying to avoid having to say in RDF what should be in the RDF."
https://lists.w3.org/Archives/Public/semantic-web/2018Nov/0164.html

## IDEA: Use W3C OntoLex / Lemon as a basis for language tagging
"[It] is possible already [to declare language only once, at the class and/or metadata level] (using the pointers to ISO639 URIs in my earlier mail), and it is recommended practice to do so in OntoLex/lemon . . . . OntoLex is . . . a W3C community group report, but it would
be the most suitable basis for future standardization efforts in this direction."
https://www.w3.org/2016/05/ontolex/#lexicon-and-lexicon-metadata
https://lists.w3.org/Archives/Public/semantic-web/2018Nov/0145.html 

## IDEA: Use URIs to identify language
"A much more convenient solution would be to identify the language by means
of a URI. This can be an ISO 639 category (see under
http://id.loc.gov/vocabulary/iso639-2.html and
http://id.loc.gov/vocabulary/iso639-1.html; for ISO 639, cf.
http://www.lexvo.org/), or provided by another authority (e.g.,
https://glottolog.org/). Other properties (e.g., xsd datatypes) could also
be stated about a literal. Two strings could be considered identical if the
values are the same and the properties of one are a proper subset of the
properties of the other."
https://lists.w3.org/Archives/Public/semantic-web/2018Nov/0116.html
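
The comparison rule in the last sentence of the quote can be sketched in Python as follows. The `Literal` type, the `same_string` function and the `ex:language` / `ex:script` property names are hypothetical; a real design would presumably use language URIs such as those listed above. The sketch uses a non-strict subset test so that literals with identical properties also compare as equal.

```python
from typing import FrozenSet, NamedTuple, Tuple


class Literal(NamedTuple):
    """A string value plus a set of (property, value) statements about it."""
    value: str
    props: FrozenSet[Tuple[str, str]] = frozenset()


def same_string(a: Literal, b: Literal) -> bool:
    # Same value, and one property set contained in the other.
    if a.value != b.value:
        return False
    return a.props <= b.props or b.props <= a.props


sr = Literal("рука", frozenset({("ex:language", "sr")}))
sr_cyrl = Literal("рука", frozenset({("ex:language", "sr"), ("ex:script", "Cyrl")}))
ru = Literal("рука", frozenset({("ex:language", "ru")}))
plain = Literal("рука")

print(same_string(sr, sr_cyrl))  # True: sr-Cyrl only adds detail
print(same_string(sr, ru))       # False: conflicting languages
print(same_string(plain, sr), same_string(plain, ru))  # True True
```

Note that under this rule the relation is not transitive: the untagged literal matches both the Serbian and the Russian one, while those two do not match each other. Any standardization of this idea would need to decide whether that is acceptable.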

"a downward-compatible notation is possible:
- take @ as a short-hand for ^^xsd:string, with language identifiers
following
- if the language identifier is not a URI, it must be BCP47
- BCP47 codes can be decomposed in the background into their sub-properties
- permit multiple language URIs/BCP47 codes (if you want to provide both a
BCP47 code [indicating region and script] and a URI [unambiguously
identifying the language])
- let plain literals be untyped"
https://lists.w3.org/Archives/Public/semantic-web/2018Nov/0119.html
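
As a rough illustration of decomposing BCP47 codes "in the background", the Python sketch below splits a tag into language, script and region sub-properties. It is deliberately simplified: it handles only the `language[-script][-region]` shape and ignores variants, extensions, private-use and grandfathered tags, and it does not check the subtags against any registry. The function name `decompose` and the dictionary keys are assumptions.

```python
import re

# Simplified shape only: language, optional script, optional region.
_TAG = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?$"
)


def decompose(tag: str) -> dict:
    """Split a simple BCP47 tag into its sub-properties."""
    match = _TAG.match(tag)
    if match is None:
        raise ValueError(f"unsupported or malformed tag: {tag!r}")
    parts = {k: v for k, v in match.groupdict().items() if v is not None}
    # Apply the conventional casing: language lower, Script title, REGION upper.
    parts["language"] = parts["language"].lower()
    if "script" in parts:
        parts["script"] = parts["script"].title()
    if "region" in parts:
        parts["region"] = parts["region"].upper()
    return parts


print(decompose("sr-Cyrl"))     # {'language': 'sr', 'script': 'Cyrl'}
print(decompose("zh-hant-tw"))  # {'language': 'zh', 'script': 'Hant', 'region': 'TW'}
```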


**CONCERN:** "No. All literals MUST have a type, so that queries can have a 
unique response when they ask for the type or specify the type. 
The RDF 1.1 WG spent a lot of time and effort on this. Allowing 
untyped plain literals in RDF 2004 was a bug. Please do not screw 
this up again. Plain literals are syntactically legal (to 
preserve backward compatibility) but they now have type xsd:string."
https://lists.w3.org/Archives/Public/semantic-web/2018Nov/0149.html

"But this only means that "рука" entails [a xsd:string] . . . .
As far as comparisons between strings are concerned, this makes no
difference to the example, as the subset relation between the (implicit)
properties of "рука"@sr and "рука" still holds"
https://lists.w3.org/Archives/Public/semantic-web/2018Nov/0152.html 
