Member Avatar for diafol

Hi all. Am having a little issue with regex. Not one of my strongest skills!

I'm trying to produce a simple translation system that doesn't require arrays, gettext etc. It basically has this structure:

$content=<<<CONTENT
<p>{{Dyma destun||Here's some text}}</p>
...
CONTENT;

So the page content is held in a var via heredoc syntax. That's all well and good.

My index.php is this:

include('common/config.php');
include('common/functions.php');
include($page_server);
include('templates/common.php');

echo translate($header);
echo translate($content);
echo $footer;

The translate function in the functions.php file is this:

function translate($content){
    $l = $_SESSION['lang'];
    $pattern = array('en'=>'/\{\{.[^(\}\})]*\|\|/','cy'=>'/\|\|.[^(\{\{)]*\}\}/');
    $brackets = array('en'=>'}}','cy'=>'{{');
    $content = preg_replace($pattern[$l],'',$content);
    $content = str_replace($brackets[$l],'',$content);
    return $content;
}

So the function searches for '{{...||' if English is the chosen language. It then deletes this and all '}}'.
Likewise, if Welsh is the chosen language, it searches for '||...}}', deletes it, and then deletes all '{{'.

Now I know this isn't the most efficient way of doing a bilingual site, but it serves my purposes perfectly.
Ok, now the background's out the way, here's the problem:

There's something wrong with the regex since it messes up when it encounters a '()' or '}' or '{' - I get the

Any comments/insight/suggestions would be greatly appreciated.
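
A likely cause, sketched under the assumption that the patterns are exactly as posted: square brackets define a character class, so `[^(\}\})]` does not mean "anything except `}}`". It means "any single character except `(`, `}` or `)`", which would explain why parentheses or a lone brace in the text stop the match early. The sample strings below are made up for illustration.

```php
<?php
// The 'en' pattern from the question, copied as posted.
$enPattern = '/\{\{.[^(\}\})]*\|\|/';

// Illustrative inputs (not taken from the real site).
$plain      = "{{Dyma destun||Here's some text}}";
$withParens = "{{Dyma destun (cy example)||Here's some text}}";

// No parentheses in the Welsh half: the pattern finds '{{Dyma destun||'.
var_dump(preg_match($enPattern, $plain));      // int(1)

// A '(' in the Welsh half is excluded by the class, so the engine never reaches '||'.
var_dump(preg_match($enPattern, $withParens)); // int(0)

// The class on its own: '(' is rejected, an ordinary letter is accepted.
var_dump(preg_match('/^[^(\}\})]$/', '('));    // int(0)
var_dump(preg_match('/^[^(\}\})]$/', 'a'));    // int(1)
```

When the match fails, the Welsh half is left in place and only the '}}' is stripped, so both languages end up in the output.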

That was my first task when I was hired: bring the site up to i18n standards.
But I was a noob at the time, so maybe my approach isn't the best.

I did it differently, though.

1) Check the cookie; if the cookie is null, detect the language and set the cookie.

2) A drop-down list sets the cookie to a different language.

3) I had en.php, fr.php and spa.php files, each containing the text required for the pages: content, nav bar, promos and so on. Basically, I put the same variable names in all the language PHP files, so when you ask for $greeting on your page, the output depends on which file was loaded with require_once. Example: $greeting is Hi in en.php, Salut in fr.php and Hola in spa.php.

But while I was writing spa.php, they had me take it down and restore the old site, because they lost SEO performance.

Sorry if I'm not much help.
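
The three steps above could be sketched roughly as follows. This is an illustration only: `chooseLanguage()` is a hypothetical helper, the language codes are assumed, the strings live in one array instead of separate files so the snippet runs on its own, and the `Accept-Language` parsing ignores q-values.

```php
<?php
// Hypothetical helper: use the cookie if valid, else the browser header, else a default.
function chooseLanguage(?string $cookieLang, string $acceptHeader, array $supported, string $default): string
{
    if ($cookieLang !== null && in_array($cookieLang, $supported, true)) {
        return $cookieLang;
    }
    // Take the two-letter prefix of each header entry, e.g. "fr-CA;q=0.9" becomes "fr".
    foreach (explode(',', $acceptHeader) as $entry) {
        $code = strtolower(substr(trim($entry), 0, 2));
        if (in_array($code, $supported, true)) {
            return $code;
        }
    }
    return $default;
}

// Same keys for every language, standing in for en.php, fr.php and spa.php.
$strings = [
    'en' => ['greeting' => 'Hi'],
    'fr' => ['greeting' => 'Salut'],
    'es' => ['greeting' => 'Hola'],
];

$cookie = $_COOKIE['lang'] ?? null;
$header = $_SERVER['HTTP_ACCEPT_LANGUAGE'] ?? '';
$lang   = chooseLanguage($cookie, $header, array_keys($strings), 'en');

// On a real page, setcookie('lang', $lang) would be called here, before any output.
echo $strings[$lang]['greeting'], "\n";
```

The drop-down in step 2 would then only need to overwrite the same cookie.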

How about something like:

<?php

session_start();
$_SESSION['lang'] = 'en';

$content=<<<CONTENT
<p>{{Dyma destun||Here's some (more) text}}</p>
<p>{{Cymru||Wales}}</p>
...
CONTENT;

function translate($content)
{
    $lang = isset($_SESSION['lang']) ? $_SESSION['lang'] : 'en';
    $pattern = '/\{\{([^\|\|]*)\|\|(.*)\}\}/';
    $replacement = ($lang == 'cy') ? '$1' : '$2';
    return preg_replace($pattern, $replacement, $content);
}

echo translate($content);

Remove the () from the regex, seems to work for me:

<?php

function translate($content,$lang){
    $l = $lang;
    $pattern = array('en'=>'/\{\{.[^\}\}]*\|\|/','cy'=>'/\|\|.[^\{\{]*\}\}/'); # change this
    $brackets = array('en'=>'}}','cy'=>'{{');
    $content = preg_replace($pattern[$l],'',$content);
    $content = str_replace($brackets[$l],'',$content);
    return $content;
}

$content=<<<CONTENT
<p>{{Dyma destun (cy example) rand ||Here's some text en (example) rand}}</p>
CONTENT;

echo translate($content,'en');

echo "\n";

?>

or I'm missing the problem? o_o'


@blocblue - almost perfect, except that multiple replacements on the same line don't work. For example, I get this:

<li><a href="/cy/tgau</a></li>

From:

<li><a href="{{/cy/tgau||/en/gcse}}">{{TGAU||GCSE}}</a></li>

My fault, I failed to mention that I had this scenario. Thanks very much - optimised my code along your suggestion - your code is far more eloquent than mine.
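
The truncated link above is what a greedy `(.*)` would produce: it runs to the last `}}` on the line, so everything between the first `||` and the final `}}` lands in the second group. Here is a small sketch comparing the greedy pattern with a lazy `(.*?)` variant. The lazy pattern is just one possible fix, offered as an illustration, and the sample line assumes the href attribute is properly quoted.

```php
<?php
$line = '<li><a href="{{/cy/tgau||/en/gcse}}">{{TGAU||GCSE}}</a></li>';

// Greedy second group, as in the earlier suggestion.
$greedy = '/\{\{([^\|\|]*)\|\|(.*)\}\}/';
// Lazy groups (an assumed alternative): each stops at the first delimiter that follows.
$lazy = '/\{\{(.*?)\|\|(.*?)\}\}/';

// '$1' selects the Welsh half of each marker.
$greedyOut = preg_replace($greedy, '$1', $line);
$lazyOut   = preg_replace($lazy, '$1', $line);

echo $greedyOut, "\n"; // <li><a href="/cy/tgau</a></li>
echo $lazyOut, "\n";   // <li><a href="/cy/tgau">TGAU</a></li>
```

Since `.` does not match a newline by default, the lazy variant assumes each marker sits on a single line.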

@cereal - I don't believe it! I thought I'd tried the bracketless example (been racking my tiny brain for ages!!) - obviously not - as your regex seems to work perfectly in all scenarios thus far. Thanks very much.

@DarkMonarch - thanks for the info. I've been developing bi/multi-lingual sites for some time, via CMS, Frameworks, lang array files, gettext + PO/POT files, separate language files for each page (as you mentioned) - you name it, I've probably tried and trashed it somewhere along the line :)
BUT all have one drawback - updating.
Although the approach I'm pursuing at the moment doubles the amount of data being delivered before being snipped, I find that having 'side-by-side' text is far preferable to array-based lang files, which often lose meaning and become extremely unwieldy once they start growing. The separate file scenario is OK, but lacks the immediacy of editing just the one file. This is especially true of the highly static site that I'm building at the moment. It's a high school chemistry revision site, which strangely for me, doesn't depend on a DB (1995 revisited!).

With regard to structure, I'm using a DIY 'templater'. Just trying to keep it lightweight and simple compared with engines like Twig, Smarty etc. So out of interest...

Here's my stuff (still rough and w/out error trapping at the mo):

index.php (all calls routed through this - .htaccess file rewrites 'lang' and 'page' querystring parameters - e.g. example.com/en/gcse/ = example.com/index.php?lang=en&page=gcse)

<?php
include('common/config.php');
include('common/functions.php');

include($page_server);
include('templates/common.php');

echo translate($header);
echo translate($content);
echo $footer;
?>

config.php

session_start();
$lang = (isset($_GET['lang']) && in_array($_GET['lang'],array('en','cy'))) ? $_GET['lang'] : 'en';
$_SESSION['lang'] = $lang;

//Top Navigation
$navArray = array(
    'index' => array('pagelabel'=>'{{Hafan||Home}}','pagerewrite'=>'{{hafan||home}}'),
    'gcse' => array('pagelabel'=>'{{TGAU||GCSE}}','pagerewrite'=>'{{tgau||gcse}}'),
    'aslevel' => array('pagelabel'=>'{{Safon UG||AS Level}}','pagerewrite'=>'{{safon-ug||as-level}}'),
    'alevel' => array('pagelabel'=>'{{Safon U2||A2 Level}}','pagerewrite'=>'{{safon-u2||a2-level}}'),
    'coursework' => array('pagelabel'=>'{{Gwaith Cwrs||Coursework}}','pagerewrite'=>'{{gwaithcwrs||coursework}}'),
    'tests' => array('pagelabel'=>'{{Profion||Tests}}','pagerewrite'=>'{{profion||tests}}')
);

//hardcoded for now - function will provide the include file
$page_server = "pages/index.php";

functions.php

//will optimise this along blocblue's code
function translate($content){
    $l = $_SESSION['lang'];
    $pattern = array('en'=>'/\{\{.[^\}\}]*\|\|/','cy'=>'/\|\|.[^\{\{]*\}\}/');
    $brackets = array('en'=>'}}','cy'=>'{{');
    $content = preg_replace($pattern[$l],'',$content);
    $content = str_replace($brackets[$l],'',$content);
    return $content;
}

function createNavBar($navArray,$parent){
    $nav = "\n<ul class=\"nav\">";
    $l= $_SESSION['lang'];
    foreach($navArray as $page => $navitem){
        $active = ($page == $parent) ? ' class="active"' : '';
        $nav .= "\n\t<li$active><a href=\"/$l/{$navitem['pagerewrite']}\">{$navitem['pagelabel']}</a></li>";
    }
    $nav .= "\n\t<li><a href=\"/$l/{{ynghylch||about}}\">&copy; My Name</a></li>\n</ul>";
    return $nav;
}

common.php (based on the Twitter bootstrap - http://twitter.github.com/bootstrap/index.html and blueimp's styling: http://blueimp.github.com/jQuery-File-Upload/)

$nav = createNavBar($navArray,$parent);
$headscripts = createHeadScripts(); //in functions.php (not shown)
$footscripts = createFootScripts(); //in functions.php (not shown)

$header = <<<HEADER
<!DOCTYPE HTML>
<html lang="$lang">
<head>
<!-- Force latest IE rendering engine or ChromeFrame if installed -->
<!--[if IE]><meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1"><![endif]-->
<meta charset="utf-8">
<title>$page_title</title>
<meta name="description" content="$page_description">
<meta name="viewport" content="width=device-width">
$headscripts
</head>
<body>

<div class="navbar navbar-fixed-top">
    <div class="navbar-inner">
        <div class="container">
            <a class="btn btn-navbar" data-toggle="collapse" data-target=".nav-collapse">
                <span class="icon-bar"></span>
                <span class="icon-bar"></span>
                <span class="icon-bar"></span>
            </a>
            <a class="brand" href="index.php">AlCemegol</a>
            <div class="nav-collapse">
                $nav
            </div>
        </div>
    </div>
</div>
<div class="container">
HEADER;

$footer = <<<FOOTER
</div>
$footscripts
</body>
</html>
FOOTER;
?>

Example include file: pages/index.php

$parent = 'index';
$page_title = "{{Hafan||Home}}";
$page_description = "{{Dyma disgrifiad||This is a description}}";

$content=<<<CONTENT
    <div class="page-header">
        <h1>{{Croeso i AlCemegol||Welcome to Alcemegol, (pr. <em>al-kɛm-e:´gɔl</em>)}}</h1>
    </div>
    <!-- plenty more - but snipped here - you get the idea -->
CONTENT;

If anybody has any comments on the above, I'd be very interested to learn more. It seems to work surprisingly well at the mo, but I'm always looking to improve.

Oops, I overlooked multiple translations on a single line. This should do it:

function translate($content)
{
    $lang = isset($_SESSION['lang']) ? $_SESSION['lang'] : 'cy';
    $pattern = '/\{\{([^\|\|]*)\|\|([^\}\}]*)\}\}/';
    $replacement = ($lang == 'cy') ? '$1' : '$2';
    return preg_replace($pattern, $replacement, $content);
}

echo translate($content);
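
A quick check of the revised pattern against the kinds of lines discussed in this thread. This is only a sketch: `translateFor()` is a hypothetical variant that takes the language as a parameter instead of reading `$_SESSION`, so it can run on its own. Note that `[^\|\|]` and `[^\}\}]` are character classes, equivalent to `[^|]` and `[^}]`, so this version assumes neither translation contains a `|` or `}` character.

```php
<?php
// Hypothetical stand-alone variant of translate(): language passed in explicitly.
function translateFor(string $content, string $lang): string
{
    // Pattern copied from the post above.
    $pattern = '/\{\{([^\|\|]*)\|\|([^\}\}]*)\}\}/';
    $replacement = ($lang == 'cy') ? '$1' : '$2';
    return preg_replace($pattern, $replacement, $content);
}

// Two markers on one line, plus a line with parentheses in the text.
$navLine  = '<li><a href="{{/cy/tgau||/en/gcse}}">{{TGAU||GCSE}}</a></li>';
$textLine = "<p>{{Dyma destun||Here's some (more) text}}</p>";

echo translateFor($navLine, 'cy'), "\n";  // <li><a href="/cy/tgau">TGAU</a></li>
echo translateFor($navLine, 'en'), "\n";  // <li><a href="/en/gcse">GCSE</a></li>
echo translateFor($textLine, 'en'), "\n"; // <p>Here's some (more) text</p>
```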

My only comment as to your approach is that it can be resource intensive to perform regular expression matches over large swathes of content.

Another option would be to have an index_en.php and index_cy.php file and to load the appropriate file based on the current language selection.

Either way, kudos for trying your own method. It works well, as you say.

Thanks for the update.
Wrt the intensive replacements - yes that's something to bear in mind. I'll keep an eye on the load times as the site progresses, and I'll try to compare it with traditional methods. It definitely won't be as quick as the 'two pages method' as you suggest, but having used this method in the past, I found updating info in them was a nightmare, especially if pages were really long and complicated. Again, thanks for the feedback.
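
One rough way to keep an eye on the cost is to time the replacement over a deliberately large page. The sketch below assumes a parameterised variant of the function (`translateFor()`, a hypothetical name) and a synthetic page of 10,000 paragraphs. The figure it prints depends entirely on the machine, so treat it as a relative guide only.

```php
<?php
// Hypothetical stand-alone variant of translate(): language passed in explicitly.
function translateFor(string $content, string $lang): string
{
    $pattern = '/\{\{([^\|\|]*)\|\|([^\}\}]*)\}\}/';
    $replacement = ($lang == 'cy') ? '$1' : '$2';
    return preg_replace($pattern, $replacement, $content);
}

// Synthetic page: the same marked-up paragraph repeated 10,000 times.
$page = str_repeat("<p>{{Dyma destun||Here's some text}}</p>\n", 10000);

// Wall-clock time for a single pass over the whole page.
$start     = microtime(true);
$out       = translateFor($page, 'en');
$elapsedMs = (microtime(true) - $start) * 1000;

printf("Translated %d bytes in %.2f ms\n", strlen($page), $elapsedMs);
```

Running the same loop against a plain include of a single-language file would give a baseline for the 'two pages' comparison.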
