1. Introduction

In this tutorial, we’ll introduce the concept of a URI (Uniform Resource Identifier). We’ll examine the components of a URI and discuss their purpose. Additionally, we’ll describe how URIs relate to Uniform Resource Locators (URLs) and Uniform Resource Names (URNs).

2. Uniform Resource Identifier (URI)

A Uniform Resource Identifier (URI) is a sequence of characters that identifies a resource. A resource can be abstract or physical, and it can already exist or be created in the future. The URI syntax is flexible enough to cover all those cases.

2.1. URI General Syntax

The syntax of a generic URI defines a URI as a sequence of components we refer to as the scheme, authority, path, query, and fragment:

Figure: URI syntax diagram, showing scheme:[//authority]path[?query][#fragment]
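As a rough illustration of this five-part structure, the sketch below splits a URI with urlsplit from Python’s standard urllib.parse module. The helper name uri_components and the sample URI are assumptions made up for this example.

```python
from urllib.parse import urlsplit

def uri_components(uri: str) -> dict:
    """Split a URI into the five generic components (hypothetical helper)."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        "authority": parts.netloc,  # urllib calls the authority "netloc"
        "path": parts.path,
        "query": parts.query,        # without the leading "?"
        "fragment": parts.fragment,  # without the leading "#"
    }

# Sample URI invented for illustration
print(uri_components("https://example.org:8080/wiki/Main_Page?action=history#top"))
```

Components that are absent from the input come back as empty strings, which mirrors the optional parts of the syntax.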

2.2. URI Components

Now, let’s look at those components in more detail.

The scheme is the first component. It’s a sequence of characters beginning with a letter, followed by any combination of letters, digits, plus (+), period (.), or hyphen (-) signs. In the canonical form, the letters are lowercase, although the syntax is case-insensitive. The scheme is mandatory, so we can’t omit it.
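The scheme rule above translates directly into a regular expression. The following is a minimal sketch; the function names is_valid_scheme and canonical_scheme are hypothetical and exist only for this example.

```python
import re

# A letter, followed by any mix of letters, digits, "+", "." or "-"
SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*")

def is_valid_scheme(scheme: str) -> bool:
    """Return True if the whole string matches the scheme syntax."""
    return SCHEME_RE.fullmatch(scheme) is not None

def canonical_scheme(scheme: str) -> str:
    """Schemes are case-insensitive; the canonical form is lowercase."""
    return scheme.lower()

print(is_valid_scheme("svn+ssh"))   # valid: starts with a letter
print(is_valid_scheme("1http"))     # invalid: starts with a digit
print(canonical_scheme("HTTP"))
```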

The second component, the authority, is optional. When present, it follows a double slash (//) and consists of up to three parts: optional user information, a host, and an optional port, with the following syntax:

[username"@"]host[":"port]
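To make the authority syntax concrete, here is a small sketch that pulls the three parts out with Python’s urllib.parse. The helper split_authority, the user name alice, and the port 8443 are assumptions chosen for illustration.

```python
from urllib.parse import urlsplit

def split_authority(uri: str):
    """Return (username, host, port); missing parts are None."""
    parts = urlsplit(uri)
    return parts.username, parts.hostname, parts.port

# All three parts present
print(split_authority("https://alice@example.org:8443/docs"))

# Only the host present
print(split_authority("http://example.org/"))
```

Note that hostname is returned in lowercase and port as an integer, so the result is a normalized view of the authority rather than the raw text.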

The third component is the path. It’s a sequence of path segments separated by slashes (/).

Then follows the query, which is an optional component containing a query string preceded by a question mark (?). Often, it consists of a sequence of attribute-value pairs separated by delimiters such as ampersands (&) or semicolons (;).
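As a sketch of how attribute-value pairs can be extracted from a query string, the example below uses parse_qsl from Python’s urllib.parse, which splits on ampersands by default. The helper query_pairs and the sample URI are made up for this illustration.

```python
from urllib.parse import urlsplit, parse_qsl

def query_pairs(uri: str) -> list:
    """Return the query as a list of (attribute, value) tuples."""
    return parse_qsl(urlsplit(uri).query)

# Sample URI invented for illustration
print(query_pairs("http://example.org/search?q=uri&lang=en"))
```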

Finally, the fragment is an optional component that contains an identifier of a secondary resource preceded by a hash sign. For instance, we can use fragments to refer to a section heading on a web page.

3. URI Examples

Let’s now check some examples of how to use a URI as a locator, a name, or both.

3.1. URL Examples

Let’s take a look at the following URI:

Figure: URI example with its components

It’s an example of a Uniform Resource Locator (URL). URLs are the subset of URIs that identify resources by their network location and also specify a mechanism for retrieving them. For example, the URL:

http://example.org/wiki/Main_Page

refers to the resource /wiki/Main_Page. The resource is in the HTML format and is obtainable via the Hypertext Transfer Protocol (http:) from a network host named example.org.

Characters that aren’t allowed in a URL must be percent-encoded. Some languages, such as Java, provide dedicated utilities for encoding and decoding URLs.
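For illustration, here is how encoding and decoding might look in Python rather than Java, using quote and unquote from the standard urllib.parse module. The sample strings are assumptions invented for this sketch.

```python
from urllib.parse import quote, unquote

# A space is not allowed in a path, so it becomes %20; "/" is kept by default
encoded_path = quote("/wiki/Main Page")
print(encoded_path)

# Passing safe="" also encodes reserved characters such as "&" and "="
encoded_value = quote("a&b=c", safe="")
print(encoded_value)

# Decoding restores the original text
print(unquote(encoded_path))
```

Encoding a single query value with safe="" keeps delimiters inside the value from being mistaken for separators between pairs.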

3.2. URN Examples

On the other hand, Uniform Resource Names (URNs) are a subset of URIs intended to be globally unique and persistent. They remain valid even if the resource becomes unavailable or ceases to exist.

The general format of a URN is:

urn:<namespace identifier>:<namespace-specific string>

For example, the URN:

urn:isbn:0451450523

identifies the book “The Last Unicorn” by its International Standard Book Number (ISBN).

In the same way, the URN:

urn:isan:0000-0000-2CEA-0000-1-0000-0000-Y

identifies the 2002 movie “Spider-Man” by its International Standard Audiovisual Number (ISAN).
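The URN format can be taken apart with a simple split on colons. The sketch below is a simplified reading of the format and does not validate the namespace-specific part; the function name parse_urn is hypothetical.

```python
def parse_urn(urn: str):
    """Split a URN into (namespace identifier, namespace-specific string)."""
    pieces = urn.split(":", 2)  # at most three pieces: "urn", NID, NSS
    if len(pieces) != 3 or pieces[0].lower() != "urn":
        raise ValueError(f"not a URN: {urn!r}")
    _, nid, nss = pieces
    # The namespace identifier is case-insensitive, so normalize it
    return nid.lower(), nss

print(parse_urn("urn:isbn:0451450523"))
print(parse_urn("urn:isan:0000-0000-2CEA-0000-1-0000-0000-Y"))
```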

3.3. Other URI Schemes

The two best-known schemes are probably http and https.

The http scheme identifies resources served over the Hypertext Transfer Protocol (HTTP), an application-layer protocol that uses port 80 by default. It doesn’t encrypt traffic or use certificates to verify the server’s identity, so data can be intercepted or altered in transit.

In contrast, the https scheme identifies resources served over Hypertext Transfer Protocol Secure (HTTPS), which encrypts traffic and uses certificates to verify the identity of the website. It’s therefore designed to resist eavesdropping and tampering.
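Because each of these schemes implies a default port, a client can work out which port to contact even when the URI omits it. The following is a minimal sketch: the helper effective_port and the DEFAULT_PORTS table are assumptions for this example, and the value 443 for https is the well-known default that the text above doesn’t state.

```python
from urllib.parse import urlsplit

# Well-known defaults; only these two schemes are covered in this sketch
DEFAULT_PORTS = {"http": 80, "https": 443}

def effective_port(uri: str):
    """Return the explicit port if given, else the scheme's default (or None)."""
    parts = urlsplit(uri)
    if parts.port is not None:
        return parts.port
    return DEFAULT_PORTS.get(parts.scheme)

print(effective_port("http://example.org/wiki/Main_Page"))
print(effective_port("https://example.org:8443/"))
```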

There are many other schemes: tel for phone numbers, mailto for e-mail addresses, skype for Skype calls, and so on.

4. A Bit More on the URI Schemes

As we said, the URI scheme is the first component in a URI. It allows parsers to identify the type of resource the URI represents. Furthermore, the scheme indicates how to proceed with the syntax analysis of the URI and which semantics apply to it.
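The idea that the scheme selects the interpretation rules can be sketched as a simple dispatch on the scheme. The function describe, the sample addresses, and the wording of the returned strings are all hypothetical; a real application would plug in its own handlers.

```python
from urllib.parse import urlsplit

def describe(uri: str) -> str:
    """Interpret the rest of the URI according to its scheme."""
    parts = urlsplit(uri)  # urlsplit lowercases the scheme for us
    if parts.scheme in ("http", "https"):
        return f"web resource on host {parts.hostname} at path {parts.path}"
    if parts.scheme == "mailto":
        return f"e-mail address {parts.path}"
    if parts.scheme == "tel":
        return f"phone number {parts.path}"
    return f"unhandled scheme {parts.scheme!r}"

print(describe("http://example.org/wiki/Main_Page"))
print(describe("mailto:user@example.org"))
print(describe("tel:+1-201-555-0123"))
```

The same text after the colon means different things under different schemes, which is why the scheme has to be read first.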

In many cases, a naming authority defines the URI scheme, the syntax rules for URIs of that type, and the semantics associated with them.

Many URI schemes are registered with the Internet Assigned Numbers Authority (IANA), the body that coordinates the elements of various Internet standards. However, not all schemes currently in use are registered there.

5. Conclusion

In this article, we explained URIs, URLs, and URNs, as well as the components of a URI.

The most important of these is the URI scheme, the first component in the URI definition. It allows us to determine the application and rules for processing the URI and accessing the resource associated with it.

