A slug is an SEO-friendly, human-readable version of a URL. Most blog software builds slugs for permalinks from a post’s title (exactly like my blog here), but you can slug basically any string you want to turn into a friendly URL. Sure, you could just use PHP’s urlencode or rawurlencode (or another language’s equivalent), but then you’re stuck with unfriendly characters translated into hex codes such as %2F and %20.
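To make the problem concrete, here is a quick sketch comparing PHP’s two URL-encoding functions. The sample title is made up for illustration. Note that urlencode turns spaces into + while rawurlencode turns them into %20; neither result reads well in a permalink.

```php
<?php
// Hypothetical post title, used only for illustration.
$title = 'Tips / Tricks';

// urlencode() turns the slash into %2F and each space into "+".
$encoded = urlencode($title);

// rawurlencode() turns the slash into %2F and each space into %20.
$rawEncoded = rawurlencode($title);

echo $encoded, "\n";    // Tips+%2F+Tricks
echo $rawEncoded, "\n"; // Tips%20%2F%20Tricks
```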
The problem is greater when the content you want to slug is UTF-8 encoded and contains non-ASCII characters. How do you slug a word like Iñtërnâtiônàlizætiøn?
My ongoing redo of Footstops, which now creates slug’d URLs from UTF-8 user-generated content, has taken me into exactly this territory, so I’ll share my slug method with you. The one caveat is that its power relies on the awesome iconv library, which has been enabled by default since PHP 5.0.0 and is easy to install in PHP 4.2+, so make sure you have it. If you don’t, remove the iconv line: the method still works, just not nearly as well. I also assume that your data is encoded in UTF-8, which is a fairly safe choice because UTF-8 is backward compatible with ASCII, but if you are working in a different charset, please adjust as necessary.
The method is short and sweet.
1. First we use iconv to transliterate the UTF-8 string into ASCII. This converts the string to ASCII and also translates non-ASCII characters into their closest-looking ASCII equivalents: ë becomes e.
iconv('UTF-8', 'ASCII//TRANSLIT//IGNORE', $string);
2. We then remove all the unwanted characters from the URL, keeping only letters, digits, spaces, and hyphens.
preg_replace('/[^a-zA-Z0-9 -]/', '', $url);
3. We convert it to lowercase (just a preference for consistency), truncate it to our maximum length (we don’t want a 64-character slug; 40-50 characters is probably plenty), and remove any surrounding whitespace.
trim(substr(strtolower($url), 0, $maxLength));
4. Finally, we replace any run of whitespace or separator characters with a single instance of the separator, which removes multiples. I prefer an underscore as the word separator rather than a dash (the traditional slug separator), because a dash may conflict with an actual hyphen in the string, but in the final version you’ll see it’s easy to default to your own preference.
preg_replace('/[\s' . $separator . ']+/', $separator, $url);
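To see what steps 2 through 4 do, here is a small trace on a made-up input. The sample string is an assumption for illustration, and it is already plain ASCII, so step 1 is left out to keep the result independent of the local iconv build.

```php
<?php
$maxLength = 40;
$separator = '_';

// Hypothetical sample input; already ASCII, so the iconv step is skipped.
$url = 'Hello,   World! A well-known   fact ';

// Step 2: drop everything except letters, digits, spaces and hyphens.
$url = preg_replace('/[^a-zA-Z0-9 -]/', '', $url);
// "Hello   World A well-known   fact "

// Step 3: lowercase, truncate to $maxLength, trim surrounding whitespace.
$url = trim(substr(strtolower($url), 0, $maxLength));
// "hello   world a well-known   fact"

// Step 4: collapse each run of whitespace/separator into one separator.
$url = preg_replace('/[\s' . $separator . ']+/', $separator, $url);

echo $url, "\n"; // hello_world_a_well-known_fact
```

With an underscore as the separator, the real hyphen in "well-known" survives untouched, which is the reason given above for preferring it over a dash.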
So we put it all together with a couple of options:
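public static function ToSlug($string, $maxLength=40, $separator='_')
{
    // 1. Transliterate UTF-8 into its closest ASCII equivalent.
    $url = iconv('UTF-8', 'ASCII//TRANSLIT//IGNORE', $string);
    // 2. Keep only letters, digits, spaces and hyphens.
    $url = preg_replace('/[^a-zA-Z0-9 -]/', '', $url);
    // 3. Lowercase, truncate to the maximum length, trim whitespace.
    $url = trim(substr(strtolower($url), 0, $maxLength));
    // 4. Collapse runs of whitespace/separator into a single separator.
    $url = preg_replace('/[\s' . $separator . ']+/', $separator, $url);
    return $url;
}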
Calling ToSlug('Iñtërnâtiônàlizætiøn is the greatest! ') produces the slug:
internationalizaetion_is_the_greatest
perfect for a friendly URL!
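If you want to try this outside a class, or with a different separator, the following is a minimal sketch of a standalone variant. The function name slugify() is hypothetical, not part of the original method. It makes two assumptions beyond the steps described here: it falls back to the untouched input when iconv is missing or returns false, and it passes the separator through preg_quote() so that a separator such as "/" cannot break the pattern.

```php
<?php
// Hypothetical standalone variant of ToSlug(); the name slugify() is an assumption.
function slugify(string $string, int $maxLength = 40, string $separator = '_'): string
{
    // Step 1: transliterate when iconv is available; @ hides the notice
    // iconv raises on input it cannot convert.
    $url = function_exists('iconv')
        ? @iconv('UTF-8', 'ASCII//TRANSLIT//IGNORE', $string)
        : $string;
    if ($url === false) {
        $url = $string; // Same effect as removing the iconv line.
    }

    // Step 2: keep only letters, digits, spaces and hyphens.
    $url = preg_replace('/[^a-zA-Z0-9 -]/', '', $url);

    // Step 3: lowercase, truncate, trim.
    $url = trim(substr(strtolower($url), 0, $maxLength));

    // Step 4: collapse runs of whitespace/separator into one separator.
    return preg_replace('/[\s' . preg_quote($separator, '/') . ']+/', $separator, $url);
}

echo slugify('Hello, World!  This is -- a test', 40, '-'), "\n"; // hello-world-this-is-a-test
echo slugify('Hello, World!'), "\n";                             // hello_world
```

Both sample inputs are plain ASCII on purpose: how accented characters are transliterated depends on the iconv implementation and locale on your system, so results for a string like Iñtërnâtiônàlizætiøn may differ from one machine to another.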


