Hi all,

I am looking to extract lines of address text from a Word document into arrays. The only part I really need to extract is the postcode.

Any help would be appreciated.

Thanks

David

Velta - The Underfloor Heating Company, Wombwell School, S73 8AX
Speedy Asset Services Limited, C/O Arnold Clark, CW9 5GG
Owen Brown Limited, Metal & Wood Shop, DE74 2NL
Getrag Ford Transmission, Britain, Off Speke Boulevard, L24 9LE
PDM Ltd, Stoke Lane, NG14 5HJ
Robert Kirkland (Blyth) Ltd, Belsay Close, NE61 6XG
MCS Ltd, C/O Carillion Advanced Works Project, BS10 5NB
MCS Ltd, C/O Carillion Advanced Works Project, BS10 5NB
Gasrec Biomethane Plant, Albury Landfill Site, GU5 9BW
Tioxide Europe Ltd, Greatham Works, TS25 2DD
Tioxide Europe Ltd, Greatham Works, TS25 2DD
BFF NONWOVENS LTD, Bath Road, TA6 4NZ

Hi, by "word document" are you referring to a Microsoft Word file (doc or docx)? Or is it just a plain-text file?

Hi Cereal, yes, a Word .doc file. I can copy it into a Notepad file.

Thanks

David

If you can use a plain-text file it will be easier, because you can use the file() function to read each line of the file into an array, then loop through it with preg_match() to extract the postal codes. If these are UK postcodes, the pattern should match only the valid combinations. Here's an example:

<?php

$source = file('source_uk.txt');

$result = array();
$i = 0;
$pattern = "/\b([A-PR-UWYZa-pr-uwyz]([0-9]{1,2}|([A-HK-Ya-hk-y][0-9]|[A-HK-Ya-hk-y][0-9]([0-9]|[ABEHMNPRV-Yabehmnprv-y]))|[0-9][A-HJKS-UWa-hjks-uw])\ {0,1}[0-9][ABD-HJLNP-UW-Zabd-hjlnp-uw-z]{2}|([Gg][Ii][Rr]\ 0[Aa][Aa])|([Ss][Aa][Nn]\ {0,1}[Tt][Aa]1)|([Bb][Ff][Pp][Oo]\ {0,1}([Cc]\/[Oo]\ )?[0-9]{1,4})|(([Aa][Ss][Cc][Nn]|[Bb][Bb][Nn][Dd]|[BFSbfs][Ii][Qq][Qq]|[Pp][Cc][Rr][Nn]|[Ss][Tt][Hh][Ll]|[Tt][Dd][Cc][Uu]|[Tt][Kk][Cc][Aa])\ {0,1}1[Zz][Zz]))\b/";

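foreach($source as $line)
{
    // file() keeps the trailing newline on each line, so trim it off
    $result[$i]['address'] = trim($line);
    // always set the key, so a line without a postcode gets null instead of a missing index
    $result[$i]['postcode'] = null;
    if(preg_match($pattern, $line, $match) === 1)
    {
        // $match[0] holds the full text matched by the pattern
        $result[$i]['postcode'] = trim($match[0]);
    }

    $i++;
}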

print_r($result);

It will output:

Array
(
    [0] => Array
        (
            [address] => Velta - The Underfloor Heating Company, Wombwell School, S73 8AX
            [postcode] => S73 8AX
        )

    [1] => Array
        (
            [address] => Speedy Asset Services Limited, C/O Arnold Clark, CW9 5GG
            [postcode] => CW9 5GG
        )

    [2] => Array
        (
            [address] => Owen Brown Limited, Metal & Wood Shop, DE74 2NL
            [postcode] => DE74 2NL
        )

    [3] => Array
        (
            [address] => Getrag Ford Transmission, Britain, Off Speke Boulevard, L24 9LE
            [postcode] => L24 9LE
        )

    [4] => Array
        (
            [address] => PDM Ltd, Stoke Lane, NG14 5HJ
            [postcode] => NG14 5HJ
        )

    [5] => Array
        (
            [address] => Robert Kirkland (Blyth) Ltd, Belsay Close, NE61 6XG
            [postcode] => NE61 6XG
        )

    [6] => Array
        (
            [address] => MCS Ltd, C/O Carillion Advanced Works Project, BS10 5NB
            [postcode] => BS10 5NB
        )

    [7] => Array
        (
            [address] => MCS Ltd, C/O Carillion Advanced Works Project, BS10 5NB
            [postcode] => BS10 5NB
        )

    [8] => Array
        (
            [address] => Gasrec Biomethane Plant, Albury Landfill Site, GU5 9BW
            [postcode] => GU5 9BW
        )

    [9] => Array
        (
            [address] => Tioxide Europe Ltd, Greatham Works, TS25 2DD
            [postcode] => TS25 2DD
        )

    [10] => Array
        (
            [address] => Tioxide Europe Ltd, Greatham Works, TS25 2DD
            [postcode] => TS25 2DD
        )

    [11] => Array
        (
            [address] => BFF NONWOVENS LTD, Bath Road, TA6 4NZ
            [postcode] => TA6 4NZ
        )

)

For each line you get the full address and the separated postcode.
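
If only the postcodes are wanted, the same idea can be condensed. The following is a minimal sketch, not a validator: the simplified pattern is an assumption that only checks the general shape of a UK postcode (one or two letters, a digit, an optional digit or letter, an optional space, then a digit and two letters), and the function name extract_postcodes is hypothetical.

```php
<?php

/**
 * Return every postcode-shaped token found in the given address lines.
 * Assumption: the loose pattern below checks shape only, not validity.
 */
function extract_postcodes(array $lines): array
{
    // outward code: 1-2 letters, a digit, then an optional digit or letter
    // inward code:  a digit followed by two letters
    $pattern = '/\b[A-Z]{1,2}[0-9][0-9A-Z]?\s?[0-9][A-Z]{2}\b/i';

    $postcodes = array();
    foreach ($lines as $line) {
        // preg_match() stops at the first match on the line
        if (preg_match($pattern, $line, $match) === 1) {
            // normalise to upper case so results compare consistently
            $postcodes[] = strtoupper($match[0]);
        }
    }

    return $postcodes;
}

$lines = array(
    'PDM Ltd, Stoke Lane, NG14 5HJ',
    'Gasrec Biomethane Plant, Albury Landfill Site, GU5 9BW',
    'No postcode on this line',
);

print_r(extract_postcodes($lines));
```

For the three sample lines this returns NG14 5HJ and GU5 9BW; the last line is skipped because nothing on it has the postcode shape. The loose pattern will accept some strings that are not real postcodes, so the stricter pattern above is the better choice when validity matters.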

Reference: the regular expression pattern was written by Faisal Khan in the comments of this page:

If instead you want to use a .doc file, you will probably need the COM object library, which seems to be available only on Windows environments:
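
As a rough sketch of that route, assuming a Windows machine with Microsoft Word installed and PHP's COM extension enabled, the document text could be read into an array of lines like this. The function name read_word_lines is hypothetical and this has not been tested against your file. It returns null when the COM class is not available, so it does nothing harmful on other systems.

```php
<?php

/**
 * Read the text of a Word document as an array of lines via COM.
 * Returns null when the COM extension is not available (e.g. non-Windows).
 * Assumptions: Microsoft Word is installed and $path is an absolute path.
 */
function read_word_lines(string $path): ?array
{
    if (!class_exists('COM')) {
        return null;
    }

    $word = new COM('Word.Application');
    try {
        // keep the Word window hidden
        $word->Visible = 0;
        // arguments: FileName, ConfirmConversions, ReadOnly
        $document = $word->Documents->Open($path, false, true);
        $text = (string) $document->Content->Text;
        // close without saving changes
        $document->Close(false);
    } finally {
        // always shut down the Word process we started
        $word->Quit();
    }

    // Word marks the end of each paragraph with a carriage return
    $lines = preg_split('/\R/', $text, -1, PREG_SPLIT_NO_EMPTY);

    return array_map('trim', $lines);
}
```

The returned array can then be looped through with preg_match() in the same way as the result of file().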

Hi Cereal, Thanks for the very comprehensive reply

I will keep the post open just in case I have another question.

Cheers

David

Thank you both for your replies. Cereal, it worked a treat, just as you demonstrated. Just to help me understand: where is the pattern match for the postcode?

iamthwee, thanks for the useful link

David
