Hi,

I have a text file that contains a bunch of phone numbers. Some contain invalid characters and some are too long. How do I read this file into a list using regex? I only want valid phone numbers to populate the list, and the invalid ones should be skipped.

This is what I have so far:

phoneNumbers = []
file2read = open('PhoneNumbers.txt', 'r')
for currentline in file2read:

    phoneNumbers.append(currentline.rstrip())
    
file2read.close()

What country do you want valid phone numbers for?

Or give an example of what should match and what should not.
Match:
(+44)(0)20-12341234
02012341234

No match:
(44+)020-12341234
1-555-5555

Sorry, I should have been more specific. I'm validating US phone numbers.

Here's what the txt file looks like:


4616186224
3501292628
2698109000
4398248508
8462632398
5414846117
9167449701
5097458418
4F882945714
456729994744
4563249

I need to filter out phone numbers that have invalid characters or that have too many or too few characters. I'm guessing that regex is not needed for this. This can be done with a list comprehension, but how?

I can write a solution with regex tomorrow (away now).
It's not so difficult to filter out the numbers you need.

#!/usr/bin/env python
import re
DataDir = '/home/david/Programming/Python'
phonefile = DataDir + '/' + 'PhoneNumbers.txt'

"""Read each line from file. If it starts with
a string of exactly 10 consecutive digits, assume
this is a phone number and print it."""

ph_nbr_pattern = r'^(\d{10})(?:\s|$)'
compile_obj = re.compile(ph_nbr_pattern)

# 'with' closes the file automatically, even if an error occurs
with open(phonefile, 'r') as file2read:
    for currentline in file2read:
        match_obj = compile_obj.search(currentline)
        if match_obj:
            print(currentline.rstrip())
"""Output is:
4616186224
3501292628
2698109000
4398248508
8462632398
5414846117
9167449701
5097458418
"""

Awesome! Works like a charm, Thanks!
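
The original question asked for a list rather than printed output. Here is a minimal sketch of the same idea that collects the matches instead of printing them. The helper name `read_valid_numbers` and the sample lines are made up for illustration, and it assumes one number per line with nothing else on the line (a bit stricter than the pattern above, which allows trailing text after whitespace):

```python
import re

# Assumption: a valid number is exactly ten digits and nothing else,
# as in the sample file (no punctuation, no country code).
TEN_DIGITS = re.compile(r'\d{10}')

def read_valid_numbers(lines):
    """Return the stripped lines that consist of exactly ten digits."""
    valid = []
    for line in lines:
        candidate = line.strip()
        # fullmatch() only succeeds if the whole string matches the pattern
        if TEN_DIGITS.fullmatch(candidate):
            valid.append(candidate)
    return valid

sample = ['4616186224\n', '4F882945714\n', '456729994744\n', '4563249\n']
print(read_valid_numbers(sample))
```

An open file object can be passed in place of `sample`, since iterating over a file yields its lines.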

AMERICAN PHONE NUMBER!

import re
phonepattern = re.compile(r"(\d{3})\D*(\d{3})\D*(\d{4})")

I remember reading about it in the Dive Into Python book. Too lazy to check the actual regex pattern, but this is what I came up with off the top of my head.

@ultimatebuster
That regex will match "456729994744"

import re

test_input = '456729994744'

if re.match(r'(\d{3})\D*(\d{3})\D*(\d{4})', test_input):
    print ('That is a valid input')  #That is a valid input  
else:
    print ('This is not a valid input')

A fix would be this:
^(\d{3})\D*(\d{3})\D*(\d{4})$
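
A quick way to see the difference between the two patterns is to run both against a few of the sample values. This is only a sketch; the variable names and the list of candidates are chosen for illustration:

```python
import re

unanchored = re.compile(r'(\d{3})\D*(\d{3})\D*(\d{4})')
anchored = re.compile(r'^(\d{3})\D*(\d{3})\D*(\d{4})$')

def compare(candidate):
    """Return (unanchored result, anchored result) as booleans."""
    return bool(unanchored.match(candidate)), bool(anchored.match(candidate))

for candidate in ['4616186224', '461-618-6224', '456729994744', '4563249']:
    # match() always starts at the beginning of the string; only the
    # anchored pattern also requires the string to end after the last group
    print(candidate, compare(candidate))
```

The 12-digit value passes the unanchored pattern but not the anchored one. Note that the anchored pattern must start with a digit, so a value such as (573)8841878 is still rejected by it.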

It now seems that this list only contains numbers that need to be checked on length,
so there is no need to match numbers such as (573)8841878 or 322-3223-222.
In that case d5e5's solution works fine.

Or do without re:

## tennumber filter without re

test="""4616186224
3501292628
2698109000
4398248508
8462632398
5414846117
9167449701
5097458418
4F882945714
456729994744
4563249
"""

def tennumbers(a):
    sep = [x for x in a if not x.isdigit()]
    if sep != []:  ## not numbers
        return ""
    elif len(a) == 10:  ## ten digits (splitlines already removed the newline)
        return a + '\n'  ## maybe more useful than True/False
    else:
        return ""

print(list(filter(tennumbers, test.splitlines())))
print('Or')
for i in test.splitlines():
    print(tennumbers(i), end='')  ## discarded numbers print nothing

print('Or like this:')
for i in test.splitlines():
    try:
        if 1e9 <= int(i) < 1e10:
            print(i)
    except ValueError as e:
        #print(e)
        pass
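
To answer the list comprehension question directly, here is a minimal sketch. The helper name `is_ten_digits` and the sample lines are made up for illustration, and it assumes one number per line. Unlike the `int()` range check, it keeps ten-digit strings that start with 0 and rejects input such as '+4616186224', which `int()` would happily convert:

```python
def is_ten_digits(text):
    """True if text is exactly ten characters long and all of them are digits."""
    return len(text) == 10 and text.isdigit()

# Stand-in for the lines of PhoneNumbers.txt (iterating a file gives the same shape)
lines = ['4616186224\n', ' 3501292628 \n', '4F882945714\n',
         '456729994744\n', '4563249\n']

# Strip each line first, then keep only the ones that pass the check
phone_numbers = [s for s in (line.strip() for line in lines) if is_ten_digits(s)]
print(phone_numbers)
```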

What about extensions? That's why my regex has no $.

The sample input only contained plain 10-digit numbers to accept, nothing else. If you need to match something more, I need an example of the format to accept, and then we can change the matching function.
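
Purely as an illustration of how the matching function could be changed, here is a sketch that keeps the `$` anchor but allows an optional extension. The extension format ("x" or "ext"/"ext." followed by 1 to 5 digits) is an assumption, since no example format was given in this thread, and `parse_phone` is a made-up name:

```python
import re

# Hypothetical format: ten digits (optionally separated by non-digits),
# optionally followed by "x" or "ext"/"ext." and 1-5 extension digits.
PHONE_WITH_EXT = re.compile(
    r'^(\d{3})\D*(\d{3})\D*(\d{4})(?:\s*(?:x|ext\.?)\s*(\d{1,5}))?$',
    re.IGNORECASE,
)

def parse_phone(text):
    """Return (area, exchange, line, extension) or None if text does not match."""
    match = PHONE_WITH_EXT.match(text.strip())
    return match.groups() if match else None

for candidate in ['4616186224', '461-618-6224 x123', '456729994744']:
    print(candidate, parse_phone(candidate))
```

The extension group is `None` when no extension is present, and the 12-digit value is still rejected because the extra digits are not preceded by an extension marker.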
