There’s a Lot You Can Learn From 404s

The Value of 404 Errors: What Can Be Learned

Changes to URLs on a WordPress site, especially for older posts, may lead to broken links. If someone had previously linked to these posts, clicking on those links would now result in a 404 error. This is common on older sites, where things evolve over time, causing links to become outdated.

A simple script was added to the site to track these 404 errors. Server logs could have been used, but it was easier to add code directly to the theme's 404.php template. This produced a basic log containing only the necessary 404 information: when the request happened, which crawler (if any) made it, and which URL was requested.

Tracking 404 Errors Using PHP Code

Here’s the PHP code used to log 404 errors:

<?php
// Path that was requested and could not be found.
$url = $_SERVER["REQUEST_URI"] ?? "";

// Tag the major crawlers with a short label; two spaces means "anything else".
$agent = "  ";
$ua = $_SERVER['HTTP_USER_AGENT'] ?? ""; // the header may be absent
if (strpos($ua,"Googlebot") !== FALSE) $agent = "go";
if (strpos($ua,"bingbot") !== FALSE) $agent = "bi";
if (strpos($ua,"DuckDuckBot") !== FALSE) $agent = "du";
if (strpos($ua,"YandexBot") !== FALSE) $agent = "yb";
if (strpos($ua,"Yahoo! Slurp") !== FALSE) $agent = "ya";
if (strpos($ua,"Baiduspider") !== FALSE) $agent = "ba";
if (strpos($ua,"Sogou") !== FALSE) $agent = "so";

// Append one tab-separated line per 404: timestamp, agent tag, URL.
if ($f = fopen(ABSPATH."404log.txt","a")) {
	fwrite($f,date("ymd H:i:s")."\t". $agent."\t". $url."\n");
	fclose($f);
}
?>

The code checks the user agent string and tags the major web crawlers (Googlebot, bingbot, and so on), which shows whether these bots are still requesting old URLs. Reducing the user agent to a two-letter label such as “go” for Google or “ya” for Yahoo keeps the log easy to read and interpret.
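As an aside, the same tagging could be written as a lookup table, which makes it easier to add or remove crawlers later. This is only a sketch: the function name tag_agent is hypothetical and not part of the original script, and unlike the chain of if statements (where the last match wins), it returns on the first match.

```php
<?php
// Hypothetical helper: map a User-Agent string to the two-letter tag used in the log.
function tag_agent(string $ua): string
{
    // Substring => tag, mirroring the crawlers checked in the 404.php code.
    $crawlers = [
        'Googlebot'    => 'go',
        'bingbot'      => 'bi',
        'DuckDuckBot'  => 'du',
        'YandexBot'    => 'yb',
        'Yahoo! Slurp' => 'ya',
        'Baiduspider'  => 'ba',
        'Sogou'        => 'so',
    ];
    foreach ($crawlers as $needle => $tag) {
        if (strpos($ua, $needle) !== false) {
            return $tag; // first matching crawler wins
        }
    }
    return '  '; // two spaces: not a recognized crawler
}
```

In 404.php this would be called as tag_agent($_SERVER['HTTP_USER_AGENT'] ?? '').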

Discoveries from the 404 Log

A review of the log showed that some renamed URLs were still being requested and would keep producing 404s. To fix these broken links, URL rewrites were added to the .htaccess file. The log also revealed several other important findings.
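One way to decide which old URLs deserve a rewrite is to count how often each one appears in the log. The following is a minimal sketch, not the script actually used on the site: the function name count_404_urls is hypothetical, and it assumes every log line has the tab-separated form written by the logging code (timestamp, agent tag, URL).

```php
<?php
// Hypothetical helper: count how often each URL appears in the 404 log.
// Assumes each line is "timestamp<TAB>agent<TAB>url".
function count_404_urls(array $lines): array
{
    $counts = [];
    foreach ($lines as $line) {
        // Strip the trailing newline only; the agent field may be two spaces.
        $parts = explode("\t", rtrim($line, "\r\n"), 3);
        if (count($parts) !== 3) {
            continue; // skip malformed lines
        }
        $url = $parts[2];
        $counts[$url] = ($counts[$url] ?? 0) + 1;
    }
    arsort($counts); // most frequently requested URLs first
    return $counts;
}

// Example usage, assuming the log file already exists at this path:
// $lines = file(ABSPATH . '404log.txt', FILE_IGNORE_NEW_LINES) ?: [];
// print_r(array_slice(count_404_urls($lines), 0, 20, true));
```

URLs near the top of the result that correspond to renamed posts are the natural candidates for a rewrite rule.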

Missing Files and Unknown Requests

The log showed requests for a missing “apple-app-site-association” file. iOS fetches this file to verify that a domain is associated with an app, which is what makes Universal Links work: a link to the site can then open directly in the app. This had been overlooked in previous app development efforts. Additional information about this file can be found in the Universal Links support documentation.

Apple Touch Icons and Bookmarking

The log also revealed requests for “apple-touch-icon,” the image used as the icon when a web page is saved to the home screen on an iPhone or iPad. The process is detailed in the “Look ma, no HTML!” section. The takeaway is that any web page can become a home screen bookmark with its own icon, without needing an app or a developer relationship with Apple.

Bots and Security Threats

An alarming pattern in the log was the large number of requests from bots and hackers for plugins that were not installed on the site. These requests suggest automated probing for known plugin vulnerabilities that could be used to exploit the site. Watching them shows which plugins and files malicious actors are currently targeting.
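To see which plugins are being probed, the logged URLs can be grouped by plugin directory. This is a rough sketch under stated assumptions: the function name probed_plugins is hypothetical, the site is assumed to use the default WordPress layout with plugins under /wp-content/plugins/, and the plugin names in the usage note are made up.

```php
<?php
// Hypothetical helper: count requests per plugin directory among 404'd URLs.
// Assumes the default WordPress layout: /wp-content/plugins/<plugin-name>/...
function probed_plugins(array $urls): array
{
    $counts = [];
    foreach ($urls as $url) {
        // Capture the path segment right after /wp-content/plugins/.
        if (preg_match('#/wp-content/plugins/([^/?]+)#', $url, $m)) {
            $counts[$m[1]] = ($counts[$m[1]] ?? 0) + 1;
        }
    }
    arsort($counts); // most frequently probed plugins first
    return $counts;
}
```

For example, passing the URLs '/wp-content/plugins/foo/readme.txt' and '/wp-content/plugins/foo/x.php?a=1' would report two requests for the made-up plugin “foo”. Any name in the result that matches an installed plugin is worth checking for pending updates.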

New Discoveries in the “.well-known” Directory

Another interesting finding in the 404 errors was related to the “.well-known” directory. This led to the discovery of Well-Known Uniform Resource Identifiers (URIs): a standardized path prefix, “/.well-known/”, under which clients expect to find site-wide metadata at predictable locations. Learning about this helped identify which of these resources the site should provide and how the site might be accessed in new ways.

Conclusion: The Importance of Monitoring 404 Errors

Tracking 404 errors is about more than fixing broken links. It provides insights into potential security issues, user behavior, and the site’s technical needs. The information gathered from these errors helped improve the site’s performance and security, and ongoing logging continues to surface emerging issues and opportunities to improve the user experience.