Internet-Draft trust.txt August 2024
Brown Expires 7 February 2025
Workgroup: Internet Engineering Task Force
Internet-Draft: draft-org-trust-relationship-protocol-00
Published: August 2024
Intended Status: Experimental
Expires: 7 February 2025
Author: RW. Brown, Ed., Brown Wolf Consulting

Organization Trust Relationship Protocol

Abstract

This document specifies the "Organization Trust Relationship Protocol" method for service owners (organizations) to express in a standard format their trusted relationships with other organizations, as well as identify the social networks they control. This method was originally defined by Scott Yates in 2020 for this purpose.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 7 February 2025.

1. Introduction

This document applies to services that provide resources that clients can access through URIs as defined in [RFC3986]. For example, in the context of HTTP, a browser is a client that displays the content of a web page.

Crawlers are automated clients. Search engines, for instance, have crawlers to recursively traverse links for indexing as defined in [RFC8288].

This document specifies the "trust.txt" method for service owners to declare their trusted organizational relationships. This protocol can be used by any organization to declare, in a standard format, their trusted relationships with other organizations, e.g., industry association memberships, organizational ownership or control, customer/vendor relationships, and social media accounts.

The concept of trust for many industries with a presence on the web (e.g., healthcare, journalism, etc.) has been under assault through the spread of disinformation by anyone ranging from profiteers seeking financial gain to nation-state actors exploiting the reputation legitimate organizations have invested in building over decades.

While there are many worthwhile efforts to build up networks of trust, those efforts are largely invisible to search engines, platforms, advertisers, researchers and others. They exist in the offline world as associations, cooperatives, and other affiliations based on commonalities, but modern consumers of current websites do not get all of the benefit of the work that is already being done.

This protocol seeks to make those offline networks of trust visible online.

The concept of a trust.txt file borrows heavily from two previous very successful efforts improving the overall experience of the internet: robots.txt [RFC9309] and ads.txt [IAB_Tech_Lab]. With both, website publishers are able to create a small and very manageable file that they have full control over that helps platforms and advertisers improve the overall ecosystem, and thereby the experience for users. So it will be with trust.txt.

2. Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

3. Scope

It is not in the scope of this document to define, for example, what is "truth" or to ascribe any level of credibility to an organization and its website.

This is a technical document providing a method for any web publisher or group of publishers to produce and post a small text file on their own website indicating how they chose to associate and detailing what related URLs they publish on, what they choose to disclose about policies regarding things such as generative Artificial Intelligence, and doing so in a way that can be uniformly visible to platforms, advertisers, and other interested parties.

A key attribute is that the file is posted to the web serving system of the Publisher, thus proving that the controllers of that website created the file.

4. Description of Roles

For this document, the following words describe specific roles. While these roles are broadly described, there is an implied trust relationship between organizations that fall into these respective roles. This trust relationship MAY be based on a legal agreement executed by the respective parties (e.g., a membership agreement or a purchase agreement).

Table 1
Role | Description
Publisher | Any organization or individual who has control of a website that is generally available to the public on the internet. If part of that site is behind a paywall, that is allowed as long as some information is available at the root domain.
Association | Any group of Publishers, as defined by that group. This may include traditional state-level associations, buying collectives, titles held by one owner, news-sharing organizations, etc. An Association will typically have a membership agreement that a Publisher will need to execute to be considered a member of the Association.
Control | Allowable attributes in the file include control and controlledby. This allows, for example, an ownership group to express in the trust.txt file control over online publications, and vice versa. So, The New York Times owns and has control over thewirecutter.com. The Times is a member of the Associated Press, but is not controlled by the AP.
Vendor | Any organization that sells products or services to a Customer.
Customer | Any Publisher or Association that purchases products or services from a Vendor. A Customer will typically execute a purchase agreement with the Vendor for products or services provided.
Data Consumer | Any organization or individual who ingests data from any trust.txt file. This may take the form of a crawler, a robot, an agent, or any automated script. It may also include human users who want to look at the file using a browser.

Crawlers are automated clients. Search engines, for instance, have crawlers to recursively traverse links for indexing. This specification is not a form of access authorization.

This document specifies a format for encoding instructions in a plain-text file available to Data Consumers. Robots may retrieve these instructions before visiting other URLs on the site. Owners of those robots may use the data to learn about Associations and other affiliations of a web Publisher. The use of that data is completely up to the consumer of that data.

5. Access Method

The use of the "Well Known Uniform Resource Identifiers" according to [RFC8615] is required.

All of the instructions in that standard MUST be followed. For example, for the domain "example.com" the corresponding "trust.txt" well-known URI on "http://www.example.com/" would be "http://www.example.com/.well-known/trust.txt". Optionally, a redirect from "http://www.example.com/trust.txt" to "http://www.example.com/.well-known/trust.txt" MAY be used for backward compatibility with older crawlers or robots.

The declarations MUST be accessible via HTTP and/or HTTPS from the website that the instructions are to be applied to, under a standard relative path on the server host: "/.well-known/trust.txt", with an HTTP response header containing "Content-Type: text/plain". It may be advisable to additionally use "Content-Type: text/plain; charset=utf-8" to signal UTF-8 support.

Crawlers or robots SHOULD prefer HTTPS connections over HTTP when crawling trust.txt files. In any case where data is available at an HTTPS and an HTTP connection for the same URL, the data from HTTPS should be preferred.

If the server response indicates Success (HTTP 2xx Status Code) the Data Consumer SHOULD read the content, parse it, and use the declarations.

If the server response indicates an HTTP/HTTPS redirect (301, 302, 307 status codes), the Data Consumer SHOULD follow the redirect and consume the data as authoritative for the source of the redirect, if and only if the redirect is within scope of the original root domain as defined above. Up to three redirects are valid as long as each redirect location remains within the original root domain. For example, an HTTP to HTTPS redirect within the same root domain is valid.

Any other redirect SHOULD be interpreted as an error and ignored.
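
The redirect rules above can be sketched as a small check over a chain of Location values. This is only an illustration under stated assumptions: the helper names are hypothetical, and "root domain" is approximated as the last two DNS labels of the host, which is not correct for suffixes such as "co.uk" (a real Data Consumer would likely need a public-suffix-aware comparison).

```python
from urllib.parse import urljoin, urlsplit

MAX_REDIRECTS = 3  # "Up to three redirects are valid"


def naive_root_domain(host: str) -> str:
    """Return the last two DNS labels of host.

    ASSUMPTION: this is a simplification and is not public-suffix aware.
    """
    labels = host.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def redirect_chain_is_valid(start_url: str, locations: list[str]) -> bool:
    """Check a chain of Location header values against the rules above."""
    if len(locations) > MAX_REDIRECTS:
        return False
    root = naive_root_domain(urlsplit(start_url).hostname or "")
    current = start_url
    for location in locations:
        # A Location value may be relative, so resolve it first.
        current = urljoin(current, location)
        if naive_root_domain(urlsplit(current).hostname or "") != root:
            return False
    return True
```

For instance, an HTTP to HTTPS redirect on "www.example.com" passes this check, while a redirect to a host under a different root domain does not.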

If the server response indicates the resource is restricted (HTTP 401) the Data Consumer SHOULD seek direct contact with the site for authorization keys or clarification. Lacking direct contact, the Data Consumer should assume no declarations are being made under this system.

If the server response indicates the resource does not exist (HTTP Status Code 404), the Data Consumer MAY assume no declarations exist. For any other HTTP error encountered for a URL at which the crawler or robot previously found data, the Data Consumer should assume that previous declarations by the Publisher are no longer valid.

If the trust.txt file is unreachable due to server or network errors, this means the file is undefined and the Data Consumer MAY assume no declarations exist. For example, in the context of HTTP, an unreachable trust.txt has a response code in the 500-599 range. For other undefined status codes, the Data Consumer SHOULD assume the file does not exist.
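
One possible reading of the response handling in this section is sketched below as a mapping from status code to action. The names `Action` and `action_for_status` are hypothetical, and the order in which the rules are applied (in particular, that an error on a URL with previously stored data invalidates that data before the "assume no declarations" fallback) is an assumption, not something this section states explicitly.

```python
from enum import Enum


class Action(Enum):
    USE_DECLARATIONS = "parse the body and use the declarations"
    FOLLOW_REDIRECT = "follow the redirect if it stays within the root domain"
    SEEK_CONTACT = "contact the site; otherwise assume no declarations"
    NO_DECLARATIONS = "assume no declarations exist"
    INVALIDATE_PREVIOUS = "treat previously stored declarations as invalid"


def action_for_status(status: int, had_previous_data: bool = False) -> Action:
    """Map an HTTP status code to the action suggested in this section."""
    if 200 <= status <= 299:
        return Action.USE_DECLARATIONS
    if status in (301, 302, 307):
        return Action.FOLLOW_REDIRECT
    if status == 401:
        return Action.SEEK_CONTACT
    if status == 404:
        return Action.NO_DECLARATIONS
    # Any other status, including other redirects, is treated as an error.
    if had_previous_data:
        return Action.INVALIDATE_PREVIOUS
    return Action.NO_DECLARATIONS
```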

6. Trust URI

A Publisher MAY place a trust URI of the form "trust://<domain>!" in the HTML of the social network pages they control (identified by the "social=" entries in their trust.txt file). This will advertise their trust.txt file in their social network pages. The corresponding trust.txt file can be retrieved by replacing the "trust://" scheme with "https://", by removing the trailing character "!", and by appending "/.well-known/trust.txt" to the path. For example, if the Trust URI is "trust://example.com!" the resulting trust.txt file URL is "https://example.com/.well-known/trust.txt".
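
The conversion described above can be sketched in a few lines. The function name is hypothetical, and the decision to reject malformed input with an exception is an assumption made for this illustration.

```python
def trust_uri_to_url(trust_uri: str) -> str:
    """Convert "trust://<domain>!" into the URL of the trust.txt file."""
    prefix, suffix = "trust://", "!"
    if not (trust_uri.startswith(prefix) and trust_uri.endswith(suffix)):
        raise ValueError(f"not a trust URI: {trust_uri!r}")
    # Drop the "trust://" scheme and the trailing "!" character.
    domain = trust_uri[len(prefix):-len(suffix)]
    if not domain:
        raise ValueError("trust URI has an empty domain")
    return f"https://{domain}/.well-known/trust.txt"
```

Calling `trust_uri_to_url("trust://example.com!")` yields "https://example.com/.well-known/trust.txt", matching the example in the paragraph above.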

When visiting a page with a trust URI, a Data Consumer MAY fetch the corresponding trust.txt and verify that this page is listed in the retrieved trust file, therefore confirming that the Publisher controlling the origin domain also controls the referenced "social" URI.
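
A minimal sketch of that verification step follows. It assumes the trust.txt body has already been fetched, and it compares URLs after a normalization that this document does not define: the scheme is ignored, the host is lowercased, a leading "www." is dropped, and a trailing slash on the path is ignored. Both function names are hypothetical.

```python
from urllib.parse import urlsplit


def normalize(url: str) -> tuple[str, str]:
    """Reduce a URL to (host, path) for comparison (ASSUMED normalization)."""
    parts = urlsplit(url.strip())
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return host, parts.path.rstrip("/")


def page_is_listed(trust_txt: str, page_url: str) -> bool:
    """Return True if page_url matches a "social=" entry in trust_txt."""
    target = normalize(page_url)
    for line in trust_txt.splitlines():
        attribute, separator, value = line.partition("=")
        if separator and attribute.strip() == "social":
            if normalize(value) == target:
                return True
    return False
```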

See Appendix B for information on where to place the Trust URI on various social media platforms.

It should be noted that client-side parsers might have issues in attempting to retrieve the trust.txt files referenced by the Trust URI depending on the server's Cross Origin Resource Sharing (CORS) settings and the client browser security settings. Server-side crawlers on the other hand will not have these CORS issues.

7. File Format

The instructions are encoded as a formatted plain text object, described here.

The format logically consists of a non-empty set of records, separated by blank lines, carriage returns, line feeds, or other end-of-line markers. The records consist of a set of lines of the form:

<attribute> "=" <value>

Comments are allowed anywhere in the file, and consist of optional whitespace, followed by a comment character '#' followed by the comment, terminated by the end-of-line.

8. File Content

Not all of the attributes here need to be used, but all of them are available. All Fields MAY be used more than once with the exception of controlledby and datatrainingallowed, which SHALL each be used only once. For instance, an Association will in nearly all cases have many members. Each one will get its own line in the file. All Data Consumers SHOULD store all valid data with each URI.

Note that there is no distinction in the file between Publishers and Associations. This is by design. An organization may both have members, and be a member of other Associations.

Table 2
Field | Valid Data | Notes
member | URL | Included here will be the URL for a member of any Association. One line for each member.
belongto | URL | This is the place to list an Association or other organization that a Publisher may belong to. One line for each organization.
control | URL | A domain directly controlled by one entity. For use by ownership groups or other similar organizational units.
controlledby | URL | Domain of owner or other controlling entity. There should be only one listing for the controlling organization.
social | URI | Any social media account directly controlled by the Publisher, see Appendix B for examples.
vendor | URL | Included here will be the URL for a Vendor to any Association or Publisher. One line for each Vendor.
customer | URL | Included here will be the URL for a Customer to any Vendor. One line for each Customer.
disclosure | Directory on base URL | If a Publisher has, for example, an ethics policy, it can publish the URI for that.
contact | Contact information that can be in any form, including physical or email addresses, a URI, etc. | As part of full transparency, Publishers or Associations may want to associate contact data so that people who are part of Data Consumer organizations can make contact with questions.
datatrainingallowed | "yes" or "no" | This is a directive to any scraper from an AI, a large language model, or any other tool designed to collect data from the site of the publisher to be used in forms other than referring users to the site of origin. A "yes" reply means that tools can scrape the data without restriction. A "no" means that a tool can not do that without a legally binding contract in place before collecting any data.
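
The constraints in this section (each of controlledby and datatrainingallowed used at most once, and datatrainingallowed limited to "yes" or "no") can be checked with a short routine like the sketch below. It assumes the file has already been split into (attribute, value) pairs. The function name is hypothetical, and reporting unknown field names is an extra choice made for this illustration; this document does not say how unknown fields are to be treated.

```python
from collections import Counter

KNOWN_FIELDS = {
    "member", "belongto", "control", "controlledby", "social",
    "vendor", "customer", "disclosure", "contact", "datatrainingallowed",
}
SINGLE_USE_FIELDS = {"controlledby", "datatrainingallowed"}


def validate_records(records: list[tuple[str, str]]) -> list[str]:
    """Return a list of human-readable problems found in the records."""
    problems = []
    counts = Counter(attribute for attribute, _ in records)
    for attribute in sorted(counts):
        if attribute not in KNOWN_FIELDS:
            # ASSUMPTION: unknown fields are reported, not rejected.
            problems.append(f"unknown field: {attribute}")
        elif attribute in SINGLE_USE_FIELDS and counts[attribute] > 1:
            problems.append(f"{attribute} used {counts[attribute]} times")
    for attribute, value in records:
        if attribute == "datatrainingallowed" and value not in ("yes", "no"):
            problems.append(f"datatrainingallowed has invalid value {value!r}")
    return problems
```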

9. File Methods

9.1. Attribute Declaration Records

Any line containing a pattern of <ATTRIBUTE>=<VALUE> SHOULD be interpreted as an attribute declaration and the crawler or robot SHOULD store the data associated with the root domain.

The <ATTRIBUTE> is a string identifier without internal whitespace. The only supported separator is the equals sign '='. The <VALUE> is an open string that may contain arbitrary data.

The declaration line is terminated by the end-of-line marker. The Data Consumer should liberally interpret CR, CRLF, etc., as a line separator. For human readability it is recommended that attributes be declared at the end of the file, but this is not a strict requirement and SHOULD NOT be assumed.
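
A minimal parser for these declaration records might look like the sketch below. Several choices are assumptions rather than requirements of this document: only lines whose first non-whitespace character is '#' are treated as comments (a '#' inside a value is kept, since URLs may contain fragments), surrounding whitespace is trimmed, the line is split at the first '=', and lines that do not match the pattern are skipped silently. The function name is hypothetical.

```python
def parse_trust_txt(text: str) -> list[tuple[str, str]]:
    """Parse trust.txt content into a list of (attribute, value) pairs."""
    records = []
    # str.splitlines() accepts CR, LF, and CRLF as line separators.
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        attribute, separator, value = line.partition("=")
        attribute = attribute.strip()
        if not separator or not attribute:
            continue  # not an <ATTRIBUTE>=<VALUE> declaration
        if any(ch.isspace() for ch in attribute):
            continue  # attributes have no internal whitespace
        records.append((attribute, value.strip()))
    return records
```

Because the parser keeps every declaration in file order, it does not depend on where in the file the attributes appear.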

9.2. Expiration

Data Consumers MAY independently store files, but if they do, it is recommended that they regularly verify their own cache. Standard HTTP cache-control mechanisms can be used by both the origin server and robots to influence the caching of the trust.txt file. Specifically, consumers and replicators should take note of the HTTP Expires header set by the origin server.
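
As an illustration of honoring the Expires header, the sketch below decides whether a stored copy can still be used. The function name is hypothetical, and treating a missing or unparseable header as "not fresh" is an assumption; a fuller implementation would likely also consider Cache-Control.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def cache_is_fresh(expires_header: Optional[str],
                   now: Optional[datetime] = None) -> bool:
    """Return True if a cached trust.txt has not passed its Expires time."""
    if not expires_header:
        return False  # ASSUMPTION: no header means the cache is re-verified
    try:
        expires_at = parsedate_to_datetime(expires_header)
    except (TypeError, ValueError):
        return False  # unparseable date is treated as already expired
    if expires_at.tzinfo is None:
        expires_at = expires_at.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return now < expires_at
```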

9.3. Limits

Crawlers or robots MAY impose a parsing limit, but it is recommended that the limit be at least 500 kilobytes (KB).

9.4. Where to Place the File

As detailed above, place the file in the well-known directory and, for backward compatibility, in the top-level directory of your web server.

When a robot looks for the trust.txt file for a URL, it strips the path component from the URL (everything from the first single slash) and puts "/.well-known/trust.txt" in its place.

For example, for "http://www.example.com/news/index.html" it will remove the "/news/index.html" and replace it with "/.well-known/trust.txt", and will end up with "http://www.example.com/.well-known/trust.txt".
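
The path replacement just described can be sketched with the standard URL helpers. The function name is hypothetical; dropping any query string or fragment from the input URL is an assumption made for this illustration.

```python
from urllib.parse import urlsplit, urlunsplit

WELL_KNOWN_PATH = "/.well-known/trust.txt"


def trust_txt_url(page_url: str) -> str:
    """Replace the path of page_url with the well-known trust.txt path."""
    parts = urlsplit(page_url)
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"not an absolute URL: {page_url!r}")
    # Keep scheme and host; discard path, query, and fragment.
    return urlunsplit((parts.scheme, parts.netloc, WELL_KNOWN_PATH, "", ""))
```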

So, as a web site owner you need to put it in the right place on your web server for that resulting URL to work. Usually that is a ".well-known" directory in the same place where you put your web site's main "index.html" welcome page. Where exactly that is, and how to put the file there, depends on your web server software.

Remember to use all lower case for the filename: "trust.txt", not "Trust.TXT".

10. Note About Base URI for Publishers on Other Platforms

There are some Publishers who work on a platform for which they do not control the base URI, for example a video Publisher working exclusively on YouTube. In that case, the Publisher would be advised to create a single-page website. The traditional public-facing homepage of that site can be just a link to the YouTube page. A trust.txt file can then be published on that site, and a Trust URI, as described in Section 6, can be placed on the YouTube page.

Another example is a local, independent newsroom that is part of a chain that uses one URL, for example www.bizjournals.com. In that case it will be up to the Publisher to have all trust.txt information in one file, or to set up individual URIs for each publication. Either one is acceptable.

11. Note on Reliability of Signals

Search engines and social networks never reveal exactly what goes into their algorithms for determining search engine results or placement in a social feed. To reveal that would be an invitation to fraudulent manipulation. That said, they clearly rely on "signals" from pages, and from the way those pages are referenced and used. This document is designed in large part to create a new and useful signal.

That said, while the intention of the trust.txt system is to increase the trust of legitimate Publishers, the existence of a file on a site should not be regarded a priori as a signal of trust on its own. Also, the lack of a trust.txt file should not on its own be regarded as a negative indicator.

The platforms and social networks must weigh for themselves the trustworthiness of any individual Publisher or Association.

The goal of this document and the underlying framework is to make the inference of trust by affiliation much more accessible and scalable.

12. Note on Importance of Network Effects

While every organization publishes a trust.txt file completely at its own discretion, the importance of networked connections is vital to make the signal valuable to algorithms assessing the value of the information in the file.

In other words, if a Publisher says that it belongs to an Association, but that Association does not publish a trust.txt file confirming that the Publisher is indeed a member, the strength of that signal may be lost, or even become negative. If a Publisher feels participation in an Association is a positive signal, that organization should strongly encourage the Association to publish its own trust.txt file.
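
The two-sided confirmation described above can be sketched as follows. This is an illustration only: the function names are hypothetical, both trust.txt bodies are assumed to be already fetched, and entries are compared by host name after lowercasing and dropping a leading "www.", a matching rule this document does not define.

```python
from urllib.parse import urlsplit


def host_of(url: str) -> str:
    """Return the lowercased host of url without a leading "www."."""
    host = (urlsplit(url.strip()).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def hosts_for(trust_txt: str, attribute: str) -> set[str]:
    """Collect the hosts named by every "<attribute>=" line."""
    hosts = set()
    for line in trust_txt.splitlines():
        name, separator, value = line.partition("=")
        if separator and name.strip() == attribute:
            hosts.add(host_of(value))
    return hosts


def membership_is_confirmed(publisher_url: str, publisher_txt: str,
                            association_url: str, association_txt: str) -> bool:
    """True if the Publisher's belongto is matched by the Association's member."""
    claimed = host_of(association_url) in hosts_for(publisher_txt, "belongto")
    confirmed = host_of(publisher_url) in hosts_for(association_txt, "member")
    return claimed and confirmed
```

How a Data Consumer weighs an unconfirmed claim is left to that consumer; the sketch only reports whether both sides agree.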

13. IANA Considerations

The well-known resource "trust.txt" has been registered with IANA in accordance with [RFC8615].

14. Security Considerations

This document should not affect the security of the Internet. Section 6, specifying the Trust URI, describes potential Cross Origin Resource Sharing issues that might arise in client-side parsers.

15. References

15.1. Normative References

[RFC2119]
Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, <https://www.rfc-editor.org/info/rfc2119>.
[RFC3986]
Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform Resource Identifier (URI): Generic Syntax", STD 66, RFC 3986, DOI 10.17487/RFC3986, <https://www.rfc-editor.org/info/rfc3986>.
[RFC8174]
Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, <https://www.rfc-editor.org/info/rfc8174>.
[RFC8288]
Nottingham, M., "Web Linking", RFC 8288, DOI 10.17487/RFC8288, <https://www.rfc-editor.org/info/rfc8288>.
[RFC8615]
Nottingham, M., "Well-Known Uniform Resource Identifiers (URIs)", RFC 8615, DOI 10.17487/RFC8615, <https://www.rfc-editor.org/info/rfc8615>.

15.2. Informative References

[RFC9309]
Koster, M., Illyes, G., Zeller, H., and L. Sassman, "Robots Exclusion Protocol", RFC 9309, DOI 10.17487/RFC9309, <https://www.rfc-editor.org/info/rfc9309>.
[IAB_Tech_Lab]
IAB Tech Lab, "ads.txt Version 1.1", <https://iabtechlab.com/ads.txt/>.
[JournalList.net]
JournalList.net, "Specification for trust.txt file and underlying system Version 1.5", <https://journallist.net/reference-document-for-trust-txt-specifications/>.

Appendix A. Examples of Files

These examples were created by JournalList, so the data is not accurate. They illustrate files that individual organizations would generate, place on their own URL, and control.

This is the file that might be created by a Publication, The Durango Herald, of Colorado:

# Durango Herald trust.txt file from Ballantine Communications Inc.
#
# For more information on trust.txt see:
# 1. Home of the trust.txt specification - https://journallist.net
# 2. IETF RFC 8615 - Well-Known Uniform Resource Identifiers (URIs) -
# https://datatracker.ietf.org/doc/html/rfc8615
# 3. IANA's list of registered Well-Known URIs -
# https://iana.org/assignments/well-known-uris/well-known-uris.xhtml
#
belongto=https://coloradopressassociation.com
belongto=https://www.ap.org/
belongto=https://www.journallist.net/
control=http://www.adventurepro.us/
control=http://www.directoryplus.com/
control=http://www.doradomagazine.com/
control=http://www.dgomag.com/
control=http://the-journal.com/
control=http://pinerivertimes.com/
datatrainingallowed=no
social=https://facebook.com/TheDurangoHerald
social=https://twitter.com/durangoherald
social=https://instagram.com/durango_herald
social=https://www.youtube.com/channel/UCSfC3ozxDs8aOVDaMnaUAQA
contact=https://durangoherald.com/contact_us/staff

This is the file that might be created by a Publication that is owned by the owner of The Durango Herald, but does not have any other association memberships. This example, taken from the Herald's file, is called Adventure Pro Magazine:

# Adventure Pro Magazine trust.txt file from Ballantine
# Communications Inc.
#
# For more information on trust.txt see:
# 1. Home of the trust.txt specification - https://journallist.net
# 2. IETF RFC 8615 - Well-Known Uniform Resource Identifiers (URIs) -
# https://datatracker.ietf.org/doc/html/rfc8615
# 3. IANA's list of registered Well-Known URIs -
# https://iana.org/assignments/well-known-uris/well-known-uris.xhtml
#
controlledby=http://www.durangoherald.com/
datatrainingallowed=no
social=https://www.facebook.com/AdventureProMag
social=https://twitter.com/AdventureProMag
social=https://www.instagram.com/adventurepromagazine/
social=https://www.youtube.com/channel/UCm0EL3_uRC6BFBtCud8mw7Q
social=https://flipboard.com/@AdventurePro
contact=https://adventurepro.us/about/

The Durango Herald could place the following trust URI on its X (Twitter) page (e.g., in the bio of its "durangoherald" account): "trust://durangoherald.com!". A Data Consumer visiting the durangoherald X/Twitter account would then be able to verify that this X/Twitter account is listed in the Durango Herald's trust.txt file.

This is the (shortened) file that might be created by an Association, The Colorado Press Association:

# CPA trust.txt file
#
# For more information on trust.txt see:
# 1. Home of the trust.txt specification - https://journallist.net
# 2. IETF RFC 8615 - Well-Known Uniform Resource Identifiers (URIs) -
# https://datatracker.ietf.org/doc/html/rfc8615
# 3. IANA's list of registered Well-Known URIs -
# https://iana.org/assignments/well-known-uris/well-known-uris.xhtml
#
belongto=http://newspapers.org/
belongto=https://www.nammembers.com/
belongto=https://coloradofoic.org/
belongto=https://www.journallist.net/
member=http://www.akronnewsreporter.com/
member=http://www.alamosanews.com/
member=http://www.theflume.com/
member=http://arvadapress.com/
member=http://www.aspendailynews.com/
member=http://www.aspentimes.com/
member=http://www.hightimbertimes.com/
member=http://www.aurorasentinel.com/
datatrainingallowed=yes
social=https://www.facebook.com/coloradopressassociation/
social=https://twitter.com/ColoradoPress
social=https://www.linkedin.com/company/colorado-press-association/
social=https://www.youtube.com/channel/UCDHXPIQtH1ze7UM3aT8ivKA/

This is the (shortened) file that might be created by the Associated Press:

# Associated Press trust.txt file
#
# For more information on trust.txt see:
# 1. Home of the trust.txt specification - https://journallist.net
# 2. IETF RFC 8615 - Well-Known Uniform Resource Identifiers (URIs) -
# https://datatracker.ietf.org/doc/html/rfc8615
# 3. IANA's list of registered Well-Known URIs -
# https://iana.org/assignments/well-known-uris/well-known-uris.xhtml
#
belongto=https://iptc.org/
belongto=https://journallist.net/
#
member=https://www.hearst.com/
member=https://scripps.com/
member=https://www.jsonline.com/
member=https://www.swiftcom.com/
member=https://www.spokesman.com/
member=https://www.nytimes.com/
member=https://www.ogdennews.com/
#
social=https://www.facebook.com/APNews
social=https://www.instagram.com/apnews/
social=https://twitter.com/ap
social=https://www.linkedin.com/company/associated-press
social=https://www.youtube.com/ap
#
contact=https://www.ap.org/contact-us/

Appendix B. Trust URI Placement Guidelines

The trust.txt specification is platform agnostic. In order to improve interoperability one SHOULD follow these guidelines when creating Trust URI entries on the following platforms.

Acknowledgements

Many thanks to these people who reviewed the original trust.txt specification. (Reviewing does not imply endorsement):

Claire Wardle, First Draft News; Ralph Brown; John Daniszewski, Heather Edwards, Associated Press; Tom Brand, NAFB; Justin Sasso, Colo. Association of Broadcasters; Mickey Osterreicher, NPPA; Bill Skeet and Cedar Milazzo, Trustie; Sandro Hawke, W3C fellow; Jill Fraschman, Colo. Press Association; Connie Moon Sehat, NewsQ/Credibility Coalition; Gabriel Altay, Kensho; Sean La Roque-Doherty, lawyer, writer and IEEE P.7011 participant; Andres Rodriguez, Instituto Tecnológico de Buenos Aires (ITBA) and IEEE P.7011 participant; Brendan Quinn, IPTC; Scott Cunningham, original ads.txt advisor; Ed Bice, Scott Hale and Megan Marrelli, Meedan; Jason Kint, Chris Pedigo, Digital Content Next; Kati London and Christian Paquin, Microsoft; Thad Swiderski, eTypeServices; Michael W. Kearney, Ph.D., AI & Digital Media Expert; Kenny Katzgrau, CEO, BroadStreet Ads; Laura Ellis and Chris Needham, BBC.

Contributors

Thanks to Scott Yates for the contribution of the original trust.txt specification [JournalList.net] and Christian Paquin for the contribution of the Trust URI specification.

Author's Address

Ralph W Brown (editor)
Brown Wolf Consulting
1355 S Foothills Hwy
Boulder, CO 80305-7319
United States of America