URLs are one of the most widely used web standards. Every time you enter a domain in the browser or click a link on a webpage, you are accessing a URL. Often we want to compare URLs. When we do, we need to remember that URLs are more than mere strings. A URL has a structure and contains data. When comparing URLs, all parts of their structure and data must be compared.
Looking at structure when comparing URLs
URL = [protocol]://[host]/[path]?[query]#[fragment]
The template above has the following parts, where the host consists of a domain and an optional port:
Protocol: The protocol most used on the web is https. Other protocols include http. Most websites support both https and http, so two otherwise identical URLs with different protocols might still return the same data.
Port: The default port is 80 for http and 443 for https. Most websites are accessed without an explicit port, because the default port can be omitted. This means that two URLs, one with the default port written out and one without it, still refer to the same data.
Domain: It is common for a website today to have multiple domains. A site is often accessible at the root domain and also at the www subdomain. In that case, the two URLs still reference the same resource.
Path: Two paths can return the same page. For example, it is common for a resource to be reachable at a path with an ending slash (/) and at one without. When the URLs are compared as strings, the one with the extra / is different from the other. However, they are still accessing the same resource.
Query: Queries contain keys and values. Two URLs can have the same keys and values but list them in a different order. For this reason, it is important to account for differences in key order when comparing.
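The structural points in this list can be sketched in Python with the standard urllib.parse module. This is a minimal sketch, not a definitive implementation: the names DEFAULT_PORTS, structural_key and same_structure are made up for illustration, and the sketch assumes that a trailing slash never changes the resource, which holds for many sites but not all. It deliberately treats http and https, and the root domain and its www subdomain, as different, because whether those are equivalent depends on the individual site.

```python
from urllib.parse import urlsplit, parse_qsl

# Default ports per protocol; writing the default port out adds no information.
DEFAULT_PORTS = {"http": 80, "https": 443}


def structural_key(url: str) -> tuple:
    """Build a tuple that is equal for URLs differing only in default port,
    trailing slash, host letter case, or query key order (hypothetical helper)."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()

    # Drop the port when it is the default for the protocol.
    port = parts.port
    if port == DEFAULT_PORTS.get(scheme):
        port = None

    # Assumption: a trailing slash does not change the resource.
    path = parts.path.rstrip("/") or "/"

    # Sort the key/value pairs so that their order does not matter.
    query = tuple(sorted(parse_qsl(parts.query, keep_blank_values=True)))

    return (scheme, host, port, path, query, parts.fragment)


def same_structure(a: str, b: str) -> bool:
    """Compare two URLs part by part instead of as raw strings."""
    return structural_key(a) == structural_key(b)


print(same_structure("https://example.com:443/docs/?b=2&a=1",
                     "https://EXAMPLE.com/docs?a=1&b=2"))  # True
print(same_structure("http://example.com", "https://example.com"))  # False
```

Comparing tuples of parts rather than rebuilding a normalized string keeps each rule visible and makes it easy to relax or tighten a single part, for example ignoring the fragment.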
URLs can be encoded. The same URL, when encoded, contains different characters; for example, a space is written as %20.
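One way to compare an encoded and an unencoded form is to decode both and re-encode them the same way. The sketch below uses quote and unquote from Python's urllib.parse; the helper name normalize_encoding and the sample paths are assumptions for illustration. It is a simplification: decoding everything also turns an escaped slash (%2F) into a real path separator, which a stricter comparison would want to avoid.

```python
from urllib.parse import quote, unquote


def normalize_encoding(component: str) -> str:
    """Decode any percent-escapes, then re-encode in one consistent way
    (hypothetical helper, not part of any library)."""
    # safe="/" leaves path separators unescaped; everything else that needs
    # escaping is escaped identically on both sides of the comparison.
    return quote(unquote(component), safe="/")


encoded = "/caf%C3%A9/menu%20du%20jour"
plain = "/café/menu du jour"

print(encoded == plain)  # False
print(normalize_encoding(encoded) == normalize_encoding(plain))  # True
```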
When comparing URLs, the comparison process should look at both structure and encoding. You can use our URL Comparison tool for comparing URLs.