Internet-Draft | trust.txt | August 2024 |
Brown | Expires 7 February 2025 | [Page] |
This document specifies the "Organization Trust Relationship Protocol" method for service owners (organizations) to express in a standard format their trusted relationships with other organizations, as well as identify the social networks they control. This method was originally defined by Scott Yates in 2020 for this purpose.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 7 February 2025.¶
Copyright (c) 2024 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶
This document applies to services that provide resources that clients can access through URIs as defined in [RFC3986]. For example, in the context of HTTP, a browser is a client that displays the content of a web page.¶
Crawlers are automated clients. Search engines, for instance, have crawlers to recursively traverse links for indexing as defined in [RFC8288].¶
This document specifies the "trust.txt" method for service owners to declare their trusted organizational relationships. This protocol can be used by any organization to declare, in a standard format, their trusted relationships with other organizations, e.g., industry association memberships, organizational ownership or control, customer/vendor relationships, and social media accounts.¶
The concept of trust for many industries with a presence on the web (e.g., healthcare, journalism) has been under assault through the spread of disinformation by actors ranging from profiteers seeking financial gain to nation-state actors exploiting the reputations that legitimate organizations have spent decades building.¶
While there are many worthwhile efforts to build up networks of trust, those efforts are largely invisible to search engines, platforms, advertisers, researchers, and others. They exist in the offline world as associations, cooperatives, and other affiliations based on commonalities, but people using websites today do not get the full benefit of the work that is already being done.¶
This protocol seeks to make those offline networks of trust visible online.¶
The concept of a trust.txt file borrows heavily from two previous, very successful efforts to improve the overall experience of the internet: robots.txt [RFC9309] and ads.txt [IAB_Tech_Lab]. With both, website publishers are able to create a small and very manageable file, over which they have full control, that helps platforms and advertisers improve the overall ecosystem and thereby the experience for users. So it will be with trust.txt.¶
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.¶
It is not in the scope of this document to define, for example, what is "truth" or to ascribe any level of credibility to an organization and its website.¶
This is a technical document providing a method for any web publisher or group of publishers to produce and post a small text file on their own website indicating how they choose to associate, which related URLs they publish on, and what they choose to disclose about policies regarding matters such as generative Artificial Intelligence, in a way that can be uniformly visible to platforms, advertisers, and other interested parties.¶
A key attribute is that the file is posted to the web serving system of the Publisher, thus proving that the controllers of that website created the file.¶
For this document, the following words describe specific roles. While these roles are broadly described, there is an implied trust relationship between organizations that fall into these respective roles. This trust relationship MAY be based on a legal agreement executed by the respective parties (e.g., a membership agreement or a purchase agreement).¶
Publisher | Any organization or individual who has control of a website that is generally available to the public on the internet. If part of that site is behind a paywall, that is allowed as long as some information is available at the root domain. |
Association | Any group of Publishers, as defined by that group. This may include traditional state-level associations, buying collectives, titles held by one owner, news-sharing organizations, etc. An Association will typically have a membership agreement that a Publisher will need to execute to be considered a member of the Association. |
Control | Allowable attributes in the file include control and controlledby. These allow, for example, an ownership group to express in the trust.txt file its control over online publications, and vice versa. For example, The New York Times owns and controls thewirecutter.com. The Times is a member of the Associated Press, but is not controlled by the AP. |
Vendor | Any organization that sells products or services to a Customer. |
Customer | Any Publisher or Association that purchases products or services from a Vendor. A Customer will typically execute a purchase agreement with the Vendor for products or services provided. |
Data Consumer | Any organization or individual who ingests data from any trust.txt file. This may take the form of a crawler, a robot, an agent, or any automated script. It may also include human users who want to look at the file using a browser. |
This specification is not a form of access authorization.¶
This document specifies a format for encoding instructions in a plain-text file available to Data Consumers. Robots may retrieve these instructions before visiting other URLs on the site. Owners of those robots may use the data to learn about Associations and other affiliations of a web Publisher. The use of that data is completely up to the consumer of that data.¶
The use of the "Well Known Uniform Resource Identifiers" according to [RFC8615] is required.¶
All of the instructions in that standard MUST be followed. For example, for the domain "example.com" the corresponding "trust.txt" well-known URI on "http://www.example.com/" would be "http://www.example.com/.well-known/trust.txt". Optionally, a redirect from "http://www.example.com/trust.txt" to "http://www.example.com/.well-known/trust.txt" MAY be used for backward compatibility with older crawlers or robots.¶
The declarations MUST be accessible via HTTP and/or HTTPS from the website to which the instructions are to be applied, under the standard relative path "/.well-known/trust.txt" on the server host, and served with an HTTP response header containing "Content-Type: text/plain". It may be advisable to additionally use "Content-Type: text/plain; charset=utf-8" to signal UTF-8 support.¶
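A Data Consumer might check the media type and decode the body along these lines. This is a minimal Python sketch, assuming the response's Content-Type value and raw body bytes are already in hand; the function name `decode_trust_txt` is hypothetical, and falling back to UTF-8 when no usable charset is given is an assumption rather than a requirement of this document.

```python
from typing import Optional


def decode_trust_txt(content_type: str, body: bytes) -> Optional[str]:
    """Return the decoded trust.txt text, or None if the media type is not text/plain.

    Assumption: if no usable charset parameter is present, UTF-8 is used.
    """
    media_type, *params = [part.strip() for part in content_type.split(";")]
    if media_type.lower() != "text/plain":
        return None
    charset = "utf-8"
    for param in params:
        name, _, value = param.partition("=")
        if name.strip().lower() == "charset" and value.strip():
            charset = value.strip().strip('"')
    try:
        return body.decode(charset, errors="replace")
    except LookupError:
        # Unknown charset label: fall back to UTF-8 (assumption).
        return body.decode("utf-8", errors="replace")
```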
Crawlers or robots SHOULD prefer HTTPS connections over HTTP when crawling trust.txt files. In any case where data is available over both HTTPS and HTTP for the same URL, the data from HTTPS should be preferred.¶
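A sketch of the HTTPS preference, assuming the crawler has already attempted both schemes and recorded each result by URL (the body text, or None on failure). The names `candidate_trust_urls` and `choose_declarations` are hypothetical.

```python
from typing import Dict, List, Optional
from urllib.parse import urlunsplit

WELL_KNOWN_PATH = "/.well-known/trust.txt"


def candidate_trust_urls(host: str) -> List[str]:
    """Well-known trust.txt URLs for a host, HTTPS first because it is preferred."""
    return [urlunsplit((scheme, host, WELL_KNOWN_PATH, "", "")) for scheme in ("https", "http")]


def choose_declarations(host: str, fetched: Dict[str, Optional[str]]) -> Optional[str]:
    """Pick the body to use from fetched results (URL -> body, or None on failure)."""
    for url in candidate_trust_urls(host):
        body = fetched.get(url)
        if body is not None:
            return body
    return None
```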
If the server response indicates Success (HTTP 2xx Status Code) the Data Consumer SHOULD read the content, parse it, and use the declarations.¶
If the server response indicates an HTTP/HTTPS redirect (301, 302, 307 status codes), the Data Consumer SHOULD follow the redirect and consume the data as authoritative for the source of the redirect, if and only if the redirect is within scope of the original root domain as defined above. Up to three redirects are valid as long as each redirect location remains within the original root domain. For example, an HTTP-to-HTTPS redirect within the same root domain is valid.¶
Any other redirect SHOULD be interpreted as an error and ignored.¶
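The redirect rules above can be checked as follows. This sketch assumes the caller already knows the site's root domain (deriving it properly generally requires the Public Suffix List, which is out of scope here) and treats subdomains of that root as in scope; the function names are hypothetical.

```python
from typing import List
from urllib.parse import urljoin, urlsplit

MAX_REDIRECTS = 3


def host_in_root_domain(host: str, root_domain: str) -> bool:
    """True if host is root_domain itself or one of its subdomains."""
    host = host.lower().rstrip(".")
    root = root_domain.lower().rstrip(".")
    return host == root or host.endswith("." + root)


def redirect_chain_is_valid(start_url: str, root_domain: str, locations: List[str]) -> bool:
    """Check the Location values followed from start_url, in order."""
    if len(locations) > MAX_REDIRECTS:
        return False
    current = start_url
    for location in locations:
        current = urljoin(current, location)  # resolve relative Location values
        if not host_in_root_domain(urlsplit(current).hostname or "", root_domain):
            return False
    return True
```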
If the server response indicates the resource is restricted (HTTP 401) the Data Consumer SHOULD seek direct contact with the site for authorization keys or clarification. Lacking direct contact, the Data Consumer should assume no declarations are being made under this system.¶
If the server response indicates the resource does not exist (HTTP Status Code 404), the Data Consumer MAY assume no declarations exist. For any other HTTP error encountered at a URL where the crawler or robot previously found data, the Data Consumer should assume that previous declarations by the Publisher are no longer valid.¶
If the trust.txt file is unreachable due to server or network errors, this means the file is undefined and the Data Consumer MAY assume no declarations exist. For example, in the context of HTTP, an unreachable trust.txt has a response code in the 500-599 range. For other undefined status codes, the Data Consumer SHOULD assume the file does not exist.¶
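One possible reading of the status-code rules above, written as a small Python classifier. The `Outcome` names are hypothetical, and the way 5xx responses interact with previously stored data is an interpretation: this sketch applies the "previous declarations are no longer valid" rule to any HTTP error other than 404 when earlier data exists, and treats a missing status (server unreachable) as "no declarations".

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    USE = "read, parse, and use the declarations"
    FOLLOW_REDIRECT = "follow the redirect if it stays within the root domain"
    CONTACT_SITE = "seek authorization; otherwise assume no declarations"
    NO_DECLARATIONS = "assume no declarations exist"
    DROP_PREVIOUS = "treat previously stored declarations as no longer valid"


def classify_response(status: Optional[int], had_previous_data: bool = False) -> Outcome:
    """status is None when the server or network could not be reached."""
    if status is not None and 200 <= status <= 299:
        return Outcome.USE
    if status in (301, 302, 307):
        return Outcome.FOLLOW_REDIRECT
    if status == 401:
        return Outcome.CONTACT_SITE
    if status == 404:
        return Outcome.NO_DECLARATIONS
    # Remaining cases: other redirects (treated as errors), other 4xx codes,
    # 5xx codes, undefined codes, and unreachable servers (status None).
    if status is not None and had_previous_data:
        return Outcome.DROP_PREVIOUS
    return Outcome.NO_DECLARATIONS
```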
A Publisher MAY place a trust URI of the form "trust://<domain>!" in the HTML of the social network pages they control (identified by the "social=" entries in their trust.txt file). This will advertise their trust.txt file in their social network pages. The corresponding trust.txt file can be retrieved by replacing the "trust://" scheme with "https://", by removing the trailing character "!", and by appending "/.well-known/trust.txt" to the path. For example, if the Trust URI is "trust://example.com!" the resulting trust.txt file URL is "https://example.com/.well-known/trust.txt".¶
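The mapping from a Trust URI to its trust.txt URL can be sketched as below, together with a helper that finds Trust URIs in a profile's text. The regular expression and the function names are assumptions: the pattern accepts only ASCII letters, digits, dots, and hyphens in <domain>, so internationalized domains would need to be in their ASCII (punycode) form.

```python
import re
from typing import List

# Hypothetical pattern for "trust://<domain>!"; group 1 captures <domain>.
TRUST_URI = re.compile(r"trust://([A-Za-z0-9.-]+)!")


def find_trust_uris(text: str) -> List[str]:
    """Find Trust URIs embedded in page text such as a profile bio."""
    return [match.group(0) for match in TRUST_URI.finditer(text)]


def trust_uri_to_url(trust_uri: str) -> str:
    """Map "trust://<domain>!" to "https://<domain>/.well-known/trust.txt"."""
    match = TRUST_URI.fullmatch(trust_uri)
    if match is None:
        raise ValueError(f"not a Trust URI: {trust_uri!r}")
    return "https://" + match.group(1) + "/.well-known/trust.txt"
```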
When visiting a page with a trust URI, a Data Consumer MAY fetch the corresponding trust.txt file and verify that this page is listed in it, thereby confirming that the Publisher controlling the origin domain also controls the referenced "social" URI.¶
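A minimal sketch of that verification step, assuming the page URL and the fetched trust.txt text are already available. The URL comparison rule (ignore scheme, host case, a leading "www.", and trailing slashes) is an assumption, not something this document specifies; platforms that have changed domains (for example twitter.com and x.com) would need extra handling. The function names are hypothetical.

```python
import re
from typing import List
from urllib.parse import urlsplit

# Assumption: "#" starts a comment at line start or after whitespace.
_COMMENT = re.compile(r"(^|\s)#.*$")


def social_urls(trust_txt: str) -> List[str]:
    """Values of all "social=" declarations, ignoring comments."""
    urls = []
    for raw in trust_txt.splitlines():
        attr, sep, value = _COMMENT.sub("", raw).strip().partition("=")
        if sep and attr.strip().lower() == "social":
            urls.append(value.strip())
    return urls


def _comparable(url: str) -> str:
    # Assumed comparison rule: ignore scheme, host case, a leading "www.",
    # and any trailing slash.
    parts = urlsplit(url.strip())
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[len("www."):]
    return host + parts.path.rstrip("/")


def page_is_listed(page_url: str, trust_txt: str) -> bool:
    """True if page_url matches one of the social= entries in trust_txt."""
    target = _comparable(page_url)
    return any(_comparable(url) == target for url in social_urls(trust_txt))
```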
See Appendix B for information on where to place the Trust URI on various social media platforms.¶
It should be noted that client-side parsers might have issues retrieving the trust.txt files referenced by the Trust URI, depending on the server's Cross-Origin Resource Sharing (CORS) settings and the client browser's security settings. Server-side crawlers, on the other hand, are not subject to these CORS restrictions.¶
The instructions are encoded as a formatted plain text object, described here.¶
The format logically consists of a non-empty set of records, separated by blank lines, carriage returns, line feeds, or other end-of-line markers. The records consist of a set of lines of the form:¶
<attribute> "=" <value>
¶
Comments are allowed anywhere in the file, and consist of optional whitespace, followed by a comment character '#' followed by the comment, terminated by the end-of-line.¶
Not all of the attributes here need to be used, but all of them are available. All fields MAY be used more than once, with the exception of controlledby and datatrainingallowed, which SHALL each be used only once. For instance, an Association will in nearly all cases have many members, and each one will get its own line in the file. All Data Consumers SHOULD store all valid data with each URI.¶
Note that there is no distinction in the file between Publishers and Associations. This is by design. An organization may both have members, and be a member of other Associations.¶
Field | Valid Data | Notes |
---|---|---|
member | URL | Included here will be the URL for a member of any Association. One line for each member. |
belongto | URL | This is the place to list an Association or other organization that a Publisher may belong to. One line for each organization. |
control | URL | A domain directly controlled by one entity. For use by ownership groups or other similar organizational units. |
controlledby | URL | Domain of owner or other controlling entity. There should be only one listing for the controlling organization. |
social | URI | Any social media account directly controlled by the Publisher, see Appendix B for examples. |
vendor | URL | Included here will be the URL for a Vendor to any Association or Publisher. One line for each Vendor. |
customer | URL | Included here will be the URL for a Customer to any Vendor. One line for each Customer. |
disclosure | Directory on base URL | If a Publisher has, for example, an ethics policy, it can publish the URI for that. |
contact | Contact information that can be in any form, including physical or email addresses, a URI, etc. | As part of full transparency, Publishers or Associations may want to associate contact data so that people who are part of Data Consumer organizations can make contact with questions. |
datatrainingallowed | "yes" or "no" | This is a directive to any scraper from an AI, a large language model, or any other tool designed to collect data from the site of the Publisher to be used in forms other than referring users to the site of origin. A "yes" value means that tools can scrape the data without restriction. A "no" means that a tool cannot do that without a legally binding contract in place before collecting any data. |
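Data Consumers may want to sanity-check declarations against the table above. This hypothetical `validate` function takes (attribute, value) pairs, for example from a parser, and reports unknown fields, repeated single-use fields, and values other than "yes" or "no" for datatrainingallowed. Treating field names case-insensitively and flagging unknown fields are assumptions; this document does not say how either should be handled.

```python
from typing import Dict, List, Tuple

KNOWN_FIELDS = {
    "member", "belongto", "control", "controlledby", "social",
    "vendor", "customer", "disclosure", "contact", "datatrainingallowed",
}
SINGLE_USE_FIELDS = ("controlledby", "datatrainingallowed")


def validate(declarations: List[Tuple[str, str]]) -> List[str]:
    """Return human-readable problems found in (attribute, value) pairs."""
    problems: List[str] = []
    counts: Dict[str, int] = {}
    for attr, value in declarations:
        name = attr.lower()  # assumption: field names are case-insensitive
        if name not in KNOWN_FIELDS:
            problems.append(f"unknown field: {attr}")
            continue
        counts[name] = counts.get(name, 0) + 1
        if name == "datatrainingallowed" and value.strip().lower() not in ("yes", "no"):
            problems.append(f"datatrainingallowed must be 'yes' or 'no', got {value!r}")
    for name in SINGLE_USE_FIELDS:
        if counts.get(name, 0) > 1:
            problems.append(f"{name} used {counts[name]} times; at most once is allowed")
    return problems
```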
Any line containing a pattern of <ATTRIBUTE>=<VALUE> SHOULD be interpreted as an attribute declaration, and the crawler or robot SHOULD store the data associated with the root domain.¶
The <ATTRIBUTE> is a string identifier without internal whitespace. The only supported separator is the equals sign '='. The <VALUE> is an open string that may contain arbitrary data.¶
The declaration line is terminated by the end-of-line marker. The Data Consumer should liberally interpret CR, CRLF, etc., as a line separator. For human readability it is recommended that attributes be declared at the end of the file, but this is not a strict requirement and SHOULD NOT be assumed.¶
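A minimal parser for the format described above might look like this. It is a sketch, not a reference implementation: the name `parse_trust_txt` is hypothetical, and lowercasing attribute names, tolerating whitespace around "=", and treating "#" as a comment start only at the beginning of a line or after whitespace (so that URL fragments such as "page#frag" survive) are all assumptions.

```python
import re
from typing import Dict, List

# Assumption: "#" starts a comment at line start or after whitespace.
_COMMENT = re.compile(r"(^|\s)#.*$")


def parse_trust_txt(text: str) -> Dict[str, List[str]]:
    """Parse trust.txt text into {attribute: [values, ...]} in file order."""
    declarations: Dict[str, List[str]] = {}
    # splitlines() accepts CR, LF, and CRLF (and other Unicode line boundaries).
    for raw in text.splitlines():
        line = _COMMENT.sub("", raw).strip()
        if not line:
            continue  # blank or comment-only line
        attr, sep, value = line.partition("=")
        attr = attr.strip()
        if not sep or not attr or any(ch.isspace() for ch in attr):
            continue  # not an <ATTRIBUTE>=<VALUE> declaration
        declarations.setdefault(attr.lower(), []).append(value.strip())
    return declarations
```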
Data Consumers MAY independently store files, but if they do, it is recommended that they regularly verify their own cache. Standard HTTP cache-control mechanisms can be used by both origin servers and robots to influence the caching of the trust.txt file. Specifically, consumers and replicators should take note of the HTTP Expires header set by the origin server.¶
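As an illustration of honoring those headers, this sketch computes when a cached copy should be re-verified. It follows the general HTTP caching rule that Cache-Control max-age takes precedence over Expires; the one-day fallback `DEFAULT_TTL` and the function names are hypothetical, and treating naive datetimes as UTC is an assumption.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping

DEFAULT_TTL = timedelta(days=1)  # hypothetical re-verification interval


def _as_utc(moment: datetime) -> datetime:
    # Treat naive datetimes as UTC so all results are comparable.
    return moment if moment.tzinfo is not None else moment.replace(tzinfo=timezone.utc)


def cache_expiry(headers: Mapping[str, str], fetched_at: datetime) -> datetime:
    """Return when a cached trust.txt copy should be re-verified."""
    fetched_at = _as_utc(fetched_at)
    lowered = {name.lower(): value for name, value in headers.items()}
    # Cache-Control max-age takes precedence over Expires in HTTP caching.
    for directive in lowered.get("cache-control", "").split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.strip().isdigit():
            return fetched_at + timedelta(seconds=int(value.strip()))
    expires = lowered.get("expires")
    if expires is not None:
        try:
            return _as_utc(parsedate_to_datetime(expires))
        except (TypeError, ValueError):
            # An invalid Expires value means the copy is already stale.
            return fetched_at
    return fetched_at + DEFAULT_TTL
```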
Crawlers or robots MAY impose a parsing limit, but it is recommended that the limit be at least 500 kilobytes (KB).¶
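A parsing limit can be applied before parsing, for example as below. The name `limit_body` is hypothetical, and interpreting 500 KB as 500 * 1024 bytes and discarding a partial final line at the cut are assumptions of this sketch.

```python
PARSE_LIMIT_BYTES = 500 * 1024  # recommended minimum; KB read as 1024 bytes (assumption)


def limit_body(body: bytes, limit: int = PARSE_LIMIT_BYTES) -> bytes:
    """Truncate body to at most `limit` bytes without keeping a partial final line."""
    if len(body) <= limit:
        return body
    cut = body[:limit]
    last_break = max(cut.rfind(b"\n"), cut.rfind(b"\r"))
    # Keep everything up to and including the last complete line ending.
    return cut[: last_break + 1] if last_break >= 0 else b""
```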
As detailed above, the file belongs in the well-known directory; for backward compatibility, it may also be made available from the top-level directory of your web server.¶
When a robot looks for the trust.txt file for a URL, it strips the path component from the URL (everything from the first single slash) and puts "/.well-known/trust.txt" in its place.¶
For example, for "http://www.example.com/news/index.html" it will remove the "/news/index.html" and replace it with "/.well-known/trust.txt", and will end up with "http://www.example.com/.well-known/trust.txt".¶
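The path substitution described above can be written with Python's standard `urllib.parse` module. The function name is hypothetical; this sketch keeps the scheme and authority (including any port) and drops the path, query, and fragment.

```python
from urllib.parse import urlsplit, urlunsplit

WELL_KNOWN_PATH = "/.well-known/trust.txt"


def trust_txt_url_for(page_url: str) -> str:
    """Replace the path, query, and fragment of page_url with the well-known path."""
    parts = urlsplit(page_url)
    # scheme and netloc (host plus any port) are kept unchanged.
    return urlunsplit((parts.scheme, parts.netloc, WELL_KNOWN_PATH, "", ""))
```

For example, `trust_txt_url_for("http://www.example.com/news/index.html")` yields the URL given in the paragraph above.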
So, as a website owner, you need to put the file in the right place on your web server for that resulting URL to work. Usually that is a ".well-known" directory inside the same directory where you put your website's main "index.html" welcome page. Where exactly that is, and how to put the file there, depends on your web server software.¶
Remember to use all lower case for the filename: "trust.txt", not "Trust.TXT".¶
There are some Publishers who work on a platform for which they do not control the base URI, for example a video Publisher working exclusively on YouTube. In that case, the Publisher would be advised to create a single-page website whose public-facing homepage can be just a link to the YouTube page. The Publisher can then place a trust.txt file on that website and place a Trust URI, as described above in Section 6, on the YouTube page.¶
Another example is a local, independent newsroom that is part of a chain that uses one URL, for example www.bizjournals.com. In that case it will be up to the Publisher to have all trust.txt information in one file, or to set up individual URIs for each publication. Either one is acceptable.¶
Search engines and social networks never reveal exactly what goes into their algorithms for ranking search results or placement in a social feed. To reveal that would be an invitation to fraudulent manipulation. That said, they clearly rely on "signals" from pages, and from the way those pages are referenced and used. This document is designed in large part to create a new and useful signal.¶
However, while the intention of the trust.txt system is to increase the trust of legitimate Publishers, the existence of a file on a site should not be regarded a priori as a signal of trust on its own. Likewise, the lack of a trust.txt file should not on its own be regarded as a negative indicator.¶
The platforms and social networks must weigh for themselves the trustworthiness of any individual Publisher or Association.¶
The goal of this document and the underlying framework is to make the inference of trust by affiliation much more accessible and scalable.¶
While every organization publishes a trust.txt file completely at its own discretion, networked connections are vital to making the signal valuable to algorithms assessing the value of the information in the file.¶
In other words, if a Publisher says that it belongs to an Association, but that Association does not publish a trust.txt file confirming that the Publisher is indeed a member, the strength of that signal may be lost, or even become negative. If a Publisher feels participation in an Association is a positive signal, that organization should strongly encourage the Association to publish its own trust.txt file.¶
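A sketch of how a Data Consumer might check that claimed relationships are confirmed by the other party's file. It assumes each site's declarations are already parsed into an attribute-to-values mapping and indexed by host name with any leading "www." removed. Pairing vendor with customer, member with belongto, and control with controlledby follows the field descriptions in this document; the function names and the data in the test below are hypothetical.

```python
from typing import Dict, List, Optional
from urllib.parse import urlsplit

Declarations = Dict[str, List[str]]  # attribute -> values, as produced by a parser

# A claim made with the key attribute is confirmed when the other party's file
# points back using the value attribute.
RECIPROCAL = {
    "belongto": "member",
    "member": "belongto",
    "control": "controlledby",
    "controlledby": "control",
    "vendor": "customer",
    "customer": "vendor",
}


def site_key(url: str) -> str:
    """Host name of url, lowercased, without a leading "www."."""
    host = (urlsplit(url).hostname or "").lower()
    return host[len("www."):] if host.startswith("www.") else host


def check_reciprocity(own_url: str, own: Declarations,
                      others: Dict[str, Declarations]) -> Dict[str, Optional[bool]]:
    """Map each claim "attr=url" to True (confirmed), False (not confirmed),
    or None (the other party's file is not available)."""
    me = site_key(own_url)
    results: Dict[str, Optional[bool]] = {}
    for attr, back_attr in RECIPROCAL.items():
        for url in own.get(attr, []):
            other = others.get(site_key(url))
            claim = f"{attr}={url}"
            if other is None:
                results[claim] = None
            else:
                results[claim] = any(site_key(u) == me for u in other.get(back_attr, []))
    return results
```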
The well known resource "trust.txt" has been registered with IANA in accordance with [RFC8615].¶
This document should not affect the security of the Internet. Section 6, specifying the Trust URI, describes potential Cross-Origin Resource Sharing issues that might arise in client-side parsers.¶
These examples (created by JournalList, so the data is not accurate) show files that would be generated by individual organizations and placed on their own websites, under their own control.¶
This is the file that might be created by a Publication, The Durango Herald, of Colorado:¶
   # Durango Herald trust.txt file from Ballantine Communications Inc.
   #
   # For more information on trust.txt see:
   # 1. Home of the trust.txt specification - https://journallist.net
   # 2. IETF RFC 8615 - Well-Known Uniform Resource Identifiers (URIs) -
   # https://datatracker.ietf.org/doc/html/rfc8615
   # 3. IANA's list of registered Well-Known URIs -
   # https://iana.org/assignments/well-known-uris/well-known-uris.xhtml
   #
   belongto=https://coloradopressassociation.com
   belongto=https://www.ap.org/
   belongto=https://www.journallist.net/
   control=http://www.adventurepro.us/
   control=http://www.directoryplus.com/
   control=http://www.doradomagazine.com/
   control=http://www.dgomag.com/
   control=http://the-journal.com/
   control=http://pinerivertimes.com/
   datatrainingallowed=no
   social=https://facebook.com/TheDurangoHerald
   social=https://twitter.com/durangoherald
   social=https://instagram.com/durango_herald
   social=https://www.youtube.com/channel/UCSfC3ozxDs8aOVDaMnaUAQA
   contact=https://durangoherald.com/contact_us/staff¶
This is the file that might be created by Adventure Pro Magazine, a Publication that is owned by the owner of The Durango Herald (it appears in a control entry in the Herald's file) but does not have any other association memberships:¶
   # Adventure Pro Magazine trust.txt file from Ballantine
   # Communications Inc.
   #
   # For more information on trust.txt see:
   # 1. Home of the trust.txt specification - https://journallist.net
   # 2. IETF RFC 8615 - Well-Known Uniform Resource Identifiers (URIs) -
   # https://datatracker.ietf.org/doc/html/rfc8615
   # 3. IANA's list of registered Well-Known URIs -
   # https://iana.org/assignments/well-known-uris/well-known-uris.xhtml
   #
   controlledby=http://www.durangoherald.com/
   datatrainingallowed=no
   social=https://www.facebook.com/AdventureProMag
   social=https://twitter.com/AdventureProMag
   social=https://www.instagram.com/adventurepromagazine/
   social=https://www.youtube.com/channel/UCm0EL3_uRC6BFBtCud8mw7Q
   social=https://flipboard.com/@AdventurePro
   contact=https://adventurepro.us/about/¶
The Durango Herald could place the following trust URI on its X (Twitter) page (e.g., in the bio of its "durangoherald" account): "trust://durangoherald.com!". A Data Consumer visiting the durangoherald X/Twitter account would then be able to verify that this X/Twitter account is listed in the Durango Herald's trust.txt file.¶
This is the (shortened) file that might be created by an Association, The Colorado Press Association:¶
   # CPA trust.txt file
   #
   # For more information on trust.txt see:
   # 1. Home of the trust.txt specification - https://journallist.net
   # 2. IETF RFC 8615 - Well-Known Uniform Resource Identifiers (URIs) -
   # https://datatracker.ietf.org/doc/html/rfc8615
   # 3. IANA's list of registered Well-Known URIs -
   # https://iana.org/assignments/well-known-uris/well-known-uris.xhtml
   #
   belongto=http://newspapers.org/
   belongto=https://www.nammembers.com/
   belongto=https://coloradofoic.org/
   belongto=https://www.journallist.net/
   member=http://www.akronnewsreporter.com/
   member=http://www.alamosanews.com/
   member=http://www.theflume.com/
   member=http://arvadapress.com/
   member=http://www.aspendailynews.com/
   member=http://www.aspentimes.com/
   member=http://www.hightimbertimes.com/
   member=http://www.aurorasentinel.com/
   datatrainingallowed=yes
   social=https://www.facebook.com/coloradopressassociation/
   social=https://twitter.com/ColoradoPress
   social=https://www.linkedin.com/company/colorado-press-association/
   social=https://www.youtube.com/channel/UCDHXPIQtH1ze7UM3aT8ivKA/¶
This is the (shortened) file that might be created by the Associated Press:¶
   # Associated Press trust.txt file
   #
   # For more information on trust.txt see:
   # 1. Home of the trust.txt specification - https://journallist.net
   # 2. IETF RFC 8615 - Well-Known Uniform Resource Identifiers (URIs) -
   # https://datatracker.ietf.org/doc/html/rfc8615
   # 3. IANA's list of registered Well-Known URIs -
   # https://iana.org/assignments/well-known-uris/well-known-uris.xhtml
   #
   belongto=https://iptc.org/
   belongto=https://journallist.net/
   #
   member=https://www.hearst.com/
   member=https://scripps.com/
   member=https://www.jsonline.com/
   member=https://www.swiftcom.com/
   member=https://www.spokesman.com/
   member=https://www.nytimes.com/
   member=https://www.ogdennews.com/
   #
   social=https://www.facebook.com/APNews
   social=https://www.instagram.com/apnews/
   social=https://twitter.com/ap
   social=https://www.linkedin.com/company/associated-press
   social=https://www.youtube.com/ap
   #
   contact=https://www.ap.org/contact-us/¶
The trust.txt specification is platform agnostic. To improve interoperability, one SHOULD follow these guidelines when creating Trust URI entries on the following platforms.¶
Facebook: for accounts of the form https://www.facebook.com/<accountname>, place the Trust URI on the Facebook account page, in the account page's Intro field.¶
GitHub: for accounts of the form https://github.com/<accountname>, place the Trust URI on the account page, in the GitHub account page's Bio field.¶
Instagram: for accounts of the form https://www.instagram.com/<accountname>, place the Trust URI on the account page, in the Instagram account page's Bio field.¶
LinkedIn: for accounts of the form https://www.linkedin.com/<type>/<accountname>/, where <type> is "in" (for individuals), "school", or "company", place the Trust URI on the LinkedIn account page: for individuals, in the LinkedIn account page's About field; for schools and companies, in the account page's Overview section. This will appear on the About page (<url>/about/) of the account.¶
Medium: for accounts of the https://<accountname>.medium.com form (depending on the owner's Medium account settings), place the Trust URI on the account page: preferably in the Medium account page's Bio field, optionally in the Medium About page's description.¶
Rumble: for channels of the form https://rumble.com/c/<accountname>, for individuals, place the Trust URI in the user's Rumble channel Description field (per channel). This will appear on the About page (<url>/about/) of the account.¶
Telegram: for accounts and channels of the form https://t.me/<accountname>: for accounts, place the Trust URI in the Telegram account bio field; for channels, in the Telegram channel description field.¶
Threads: for accounts of the form https://www.threads.net/@<accountname>, place the Trust URI in the Threads account page's Bio field.¶
TikTok: for accounts of the form https://www.tiktok.com/@<accountname>, place the Trust URI on the TikTok account page, in the account page's Bio field.¶
Vimeo: for accounts of the form https://vimeo.com/c/<accountname>, place the Trust URI on the Vimeo account page, in the account page's Bio field.¶
X: for accounts of the form https://x.com/<accountname>, place the Trust URI on the X account page, in the account page's Bio field.¶
YouTube: for accounts of the form https://youtube.com/@<accountname>, place the Trust URI on the YouTube account page, in the About page's description. This will appear on the About page (<url>/about/) of the account.¶
Many thanks to these people who reviewed the original trust.txt specification. (Reviewing does not imply endorsement):¶
Claire Wardle, First Draft News; Ralph Brown; John Daniszewski, Heather Edwards, Associated Press; Tom Brand, NAFB; Justin Sasso, Colo. Association of Broadcasters; Mickey Osterreicher, NPPA; Bill Skeet and Cedar Milazzo, Trustie; Sandro Hawke, W3C fellow; Jill Fraschman, Colo. Press Association; Connie Moon Sehat, NewsQ/Credibility Coalition; Gabriel Altay, Kensho; Sean La Roque-Doherty, lawyer, writer and IEEE P.7011 participant; Andres Rodriguez, Instituto Tecnológico de Buenos Aires (ITBA) and IEEE P.7011 participant; Brendan Quinn, IPTC; Scott Cunningham, original ads.txt advisor; Ed Bice, Scott Hale and Megan Marrelli, Meedan; Jason Kint, Chris Pedigo, Digital Content Next; Kati London and Christian Paquin, Microsoft; Thad Swiderski, eTypeServices; Michael W. Kearney, Ph.D., AI & Digital Media Expert; Kenny Katzgrau, CEO, BroadStreet Ads; Laura Ellis and Chris Needham, BBC.¶
Thanks to Scott Yates for the contribution of the original trust.txt specification [JournalList.net] and Christian Paquin for the contribution of the Trust URI specification.¶