I have several lists being generated by a mapper function in this format:

>>> mapper("b.txt" , i["b.txt"])
[('thats', 1), ('one', 1), ('small', 1), ('step', 1), ('for', 1), ('a', 1), ('man', 1), ('one', 1), ('giant', 1), ('leap', 1), ('for', 1), ('mankind', 1)]

>>> mapper("c.txt" , i["b.txt"])
[('thats', 1), ('one', 1), ('small', 1), ('step', 1), ('for', 1), ('a', 1), ('man', 1), ('one', 1), ('giant', 1), ('leap', 1), ('for', 1), ('mankind', 1)]

I want to merge the lists generated by these two calls so that when I encounter a common element, the augmented list stores its combined count,
e.g. ('for', 2) in this case, since 'for' is common to both results of the mapper function, and the remaining unique elements are stored in the augmented list as they are.

PS: the mapper function is one I wrote myself.

My personal preference here is to use a dictionary, with the word as the key pointing to its count. Convert the first list to a dictionary, then loop through the second list and, if the word is found in the dictionary, add to its count: http://www.greenteapress.com/thinkpython/html/book012.html#toc120 Post back with any code you are having problems with for additional help.
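
A minimal sketch of that idea, assuming the mapper output is a list of (word, count) pairs as shown in the question. The name merge_counts and the shortened sample lists are made up for illustration. It adds counts for every pair in every list, so repeats inside a single list are summed as well:

```python
def merge_counts(*pair_lists):
    """Merge any number of lists of (word, count) pairs into one dict of totals."""
    totals = {}
    for pairs in pair_lists:
        for word, count in pairs:
            # get() returns 0 for a word that has not been seen yet
            totals[word] = totals.get(word, 0) + count
    return totals

# Shortened, hypothetical mapper outputs
list_b = [('one', 1), ('small', 1), ('step', 1), ('for', 1)]
list_c = [('one', 1), ('giant', 1), ('leap', 1), ('for', 1)]

merged = merge_counts(list_b, list_c)
```

Here merged['for'] ends up as 2 because the word appears once in each list, while words that appear in only one list keep a count of 1.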

It may be just me, but it seems something is wrong!

Can you explain a little further?

You want to know the words that exist in the two files, is that it?

Or do you want to count the occurrences in each file?


Well, the complete picture is that I am implementing a MapReduce framework. I didn't find any (in Python) that suited me, so I decided to code it myself. I have come as far as writing the mapper function and got stuck on this list-to-dictionary conversion.

For simplicity I have mentioned two files here. The actual thing will have more than 100 files, each approximately 5 MB (pure text only), and will then run the mapper and reduce functions in a multi-threaded environment. That's the plan as of now.

@woooee: your link is very helpful.
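
A rough sketch of how that plan could be laid out, assuming Python 3 (for concurrent.futures) and in-memory strings in place of the real files. The names map_file, reduce_counts and texts are hypothetical stand-ins, not the actual mapper:

```python
from concurrent.futures import ThreadPoolExecutor

def map_file(item):
    """Hypothetical mapper: turn (filename, text) into a list of (word, 1) pairs."""
    name, text = item
    return [(word, 1) for word in text.split()]

def reduce_counts(pair_lists):
    """Sum the counts of all (word, count) pairs into one dictionary."""
    totals = {}
    for pairs in pair_lists:
        for word, count in pairs:
            totals[word] = totals.get(word, 0) + count
    return totals

# Stand-in for the real file contents
texts = {
    "b.txt": "thats one small step for a man",
    "c.txt": "one giant leap for mankind",
}

# Run the map step in a small thread pool, then reduce in the main thread
with ThreadPoolExecutor(max_workers=4) as pool:
    mapped = list(pool.map(map_file, texts.items()))

totals = reduce_counts(mapped)
```

Since the reduce step runs after all map results are collected, no locking around the shared dictionary is needed in this layout. Threads in CPython mostly help with the file I/O rather than the counting itself.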


I have coded the first part and converted the list to a dictionary:

>>> l
[('a', 0), ('c', 2), ('b', 1), ('e', 4), ('d', 3)]
>>> dd= {}
>>> i = 0
>>> while i < len(l):
...  s = l[i]
...  dd[s[0]] = s[1]
...  i = i + 1
>>> dd
{'a': 0, 'c': 2, 'b': 1, 'e': 4, 'd': 3}

Now, if I get dictionaries of different lengths from different files, how do I merge them? Is there a module that does this directly?

I think this should give some insight, if I'm understanding what you are trying to do.

def merge_dic(merged_dic, wordlist):
    # add one to the count of every word in wordlist
    for item in wordlist:
        if item in merged_dic:
            merged_dic[item] += 1
        else:
            merged_dic[item] = 1

file1 = 'this is a dummy sample file for example as sample'
file2 = 'this is another dummy sample file also created for example with \
some samples repeated for example'

list1 = file1.split(' ')
list2 = file2.split(' ')

all_count = {}
file_lists = []
file_lists.extend(list1)
file_lists.extend(list2)

merge_dic(all_count, file_lists)

print 'all_count =', all_count

file_uniques = {}
file_lists = []                 # Converting lists to sets is the fastest and
file_lists.extend(set(list1))   # simplest way that I know of to eliminate
file_lists.extend(set(list2))   # duplicates in a list when position doesn't matter

merge_dic(file_uniques, file_lists)

print 'file_uniques =', file_uniques

Happy coding!

## Your way
l = [('a', 0), ('c', 2), ('b', 1), ('e', 4), ('d', 3)]
dd= {}
i = 0
while i < len(l):
      s = l[i]
      dd[s[0]] = s[1]
      i = i + 1
print dd
#{'a': 0, 'c': 2, 'b': 1, 'e': 4, 'd': 3}
## shorter way
dd = dict(l)
print dd

{'a': 0, 'c': 2, 'b': 1, 'e': 4, 'd': 3}
{'a': 0, 'c': 2, 'b': 1, 'e': 4, 'd': 3}

Use a function and pass the file name to it. Some pseudo-code:

def mapper_dict(fname, word_dict):
    ## assumes word is first element after split()
    with open(fname, "r") as fp:
        for rec in fp:
            substrs = rec.split()
            if not substrs:
                continue    ## skip blank lines
            word = substrs[0]
            if word not in word_dict:
                word_dict[word] = 0
            word_dict[word] += 1
    return word_dict

word_dict = {}
for fname in ["/a/b/abc", "/d/e/def", "/g/h/ghi"]:
    word_dict = mapper_dict(fname, word_dict)
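
If every word on every line should be counted rather than only the first one, the same pattern could be extended along these lines. This is only a sketch: count_words is a made-up name, and the two small temporary files are created just so the example runs on its own:

```python
import os
import tempfile

def count_words(fname, word_dict):
    """Add the count of every whitespace-separated word in fname to word_dict."""
    with open(fname, "r") as fp:
        for line in fp:
            for word in line.split():
                word_dict[word] = word_dict.get(word, 0) + 1
    return word_dict

# Hypothetical sample files, written to a temporary directory
tmp_dir = tempfile.mkdtemp()
fnames = []
for name, text in [("b.txt", "one small step\nfor a man\n"),
                   ("c.txt", "one giant leap\nfor mankind\n")]:
    path = os.path.join(tmp_dir, name)
    with open(path, "w") as fp:
        fp.write(text)
    fnames.append(path)

# One shared dictionary accumulates the counts across all files
word_dict = {}
for fname in fnames:
    word_dict = count_words(fname, word_dict)
```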


Both of our codes have a bug. Unquestionably your method is the shortest; I felt like a moron when I saw that the conversion already existed. But for example:

>>> l1
[('the', 1), ('quick', 1), ('brown', 1), ('fox', 1), ('jumped', 1), ('over', 1), ('the', 1), ('lazy', 1), ('grey', 1), ('dogs', 1)]
>>> d1 = dict(l1)
>>> d1
{'brown': 1, 'lazy': 1, 'jumped': 1, 'over': 1, 'fox': 1, 'grey': 1, 'quick': 1, 'the': 1, 'dogs': 1}

You see, although the pair ('the', 1) appears twice in the list, the dictionary accepts it only once, and rather than updating the count to 2 it ignores the second occurrence. Even my initial code gives the same result :(

I tried writing it this way, but I don't know why it isn't working:
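
To make the difference concrete, here is a small sketch on a shortened, made-up list. dict() keeps one value per key, so a later pair with the same word simply replaces the earlier one, whereas a loop can add the counts up:

```python
l1 = [('the', 1), ('quick', 1), ('the', 1)]

# dict() keeps a single value per key; the second ('the', 1) replaces the first
overwritten = dict(l1)

# Accumulating by hand adds the counts instead of replacing them
summed = {}
for word, count in l1:
    summed[word] = summed.get(word, 0) + count
```

After this, overwritten['the'] is 1 and summed['the'] is 2.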

>>> while i < len(l1):
...  s = l1[i]
...  if s not in d1.keys():
...    d1[s[0]] = s[1]
...  else:
...    d1[s[0]] += 1
...  i = i + 1
>>> d1
{'brown': 1, 'lazy': 1, 'jumped': 1, 'over': 1, 'fox': 1, 'grey': 1, 'quick': 1, 'the': 1, 'dogs': 1}

@woooee and @beat_slayer: I am working on your code (thanks for providing it).

This should be:

## changed to s[0] and d1; d1.keys() is not necessary
        if s[0] not in d1:
>>> while i < len(l1):
...  s = l1[i]
...  if s[0] not in d1:
...    d1[s[0]] = s[1]
...  else:
...    d1[s[0]] += 1
...  i = i + 1
>>> d1

getting an empty dictionary .. :(
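
One possible cause, assuming the loop was rerun in the same interpreter session after d1 was emptied: i is still equal to len(l1) from the earlier run, so the while condition is false straight away and the body never executes. A sketch with both names reset first (the list is shortened, and s[1] is added instead of 1 so counts other than 1 also work):

```python
l1 = [('the', 1), ('quick', 1), ('the', 1), ('lazy', 1)]

d1 = {}   # start from an empty dictionary
i = 0     # reset the index; a leftover i == len(l1) would skip the loop
while i < len(l1):
    s = l1[i]
    if s[0] not in d1:
        d1[s[0]] = s[1]
    else:
        d1[s[0]] += s[1]
    i = i + 1
```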

How about this?

class Word_Counter():

    def __init__(self):
        self.count = {}

    def add_string(self, s):
        # split the string into words and count them
        word_list = s.split(' ')
        self.add_list(word_list)

    def add_list(self, wl):
        for item in wl:
            if item in self.count:
                self.count[item] += 1
            else:
                self.count[item] = 1

    def add_mapper(self, ml):
        # ml is a list of (word, count) pairs, as produced by mapper
        for item in ml:
            if item[0] in self.count:
                self.count[item[0]] += item[1]
            else:
                self.count[item[0]] = item[1]

str1 = 'the quick brown fox jumps over the lazy dog'

d = Word_Counter()
d.add_string(str1)

print d.count

{'brown': 1, 'lazy': 1, 'over': 1, 'fox': 1, 'dog': 1, 'quick': 1, 'the': 2, 'jumps': 1}

list1 = ('the', 'quick', 'blue', 'cat', 'jumps', 'over', 'the', 'lazy', 'turtle')
d.add_list(list1)

print d.count

{'blue': 1, 'brown': 1, 'lazy': 2, 'turtle': 1, 'over': 2, 'fox': 1, 'dog': 1, 'cat': 1, 'quick': 2, 'the': 4, 'jumps': 2}

map1 = [('the', 1), ('quick', 1), ('brown', 1), ('fox', 1), ('jumped', 1), ('over', 1), ('the', 1), ('lazy', 1), ('grey', 1), ('dogs', 1)]
d.add_mapper(map1)

print d.count

{'blue': 1, 'brown': 2, 'lazy': 3, 'turtle': 1, 'grey': 1, 'jumped': 1, 'over': 3, 'fox': 2, 'dog': 1, 'cat': 1, 'dogs': 1, 'quick': 3, 'the': 6, 'jumps': 2}
>>> while i < len(l1):
...  s = l1[i]
...  if s[0] not in d1:
...    d1[s[0]] = s[1]
...  else:
...    d1[s[0]] += 1
...  i = i + 1
>>> d1

getting an empty dictionary .. :(

words=[('the', 1), ('quick', 1), ('brown', 1), ('fox', 1), ('jumped', 1), ('over', 1), ('the', 1), ('lazy', 1), ('grey', 1), ('dogs', 1)]
dict_of_words = {}
for word,count in words:
    dict_of_words[word] = dict_of_words[word]+count if word in dict_of_words else count

print dict_of_words
{'brown': 1, 'lazy': 1, 'jumped': 1, 'over': 1, 'fox': 1, 'grey': 1, 'quick': 1, 'the': 2, 'dogs': 1}

Using a dictionary, it's very simple.
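
As a possible alternative, assuming Python 2.7 or later is available, collections.Counter from the standard library is a dictionary subclass made for this kind of tallying. A minimal sketch (merge_with_counter and the sample lists are made up for illustration):

```python
from collections import Counter

def merge_with_counter(*pair_lists):
    """Sum (word, count) pairs from any number of lists into a Counter."""
    totals = Counter()
    for pairs in pair_lists:
        for word, count in pairs:
            totals[word] += count   # missing keys start at 0
    return totals

words = [('the', 1), ('quick', 1), ('the', 1), ('lazy', 1)]
more_words = [('the', 1), ('dogs', 1)]
totals = merge_with_counter(words, more_words)

# Counters built separately, e.g. one per file, can be added together
combined = Counter({'the': 2}) + Counter({'the': 1, 'fox': 1})
```

totals.most_common(1) would then give the most frequent word together with its count.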

For example:
I have a list like this: ["thing1", "thing2", "thing3", "thing4", "thing1"]
If I understood you correctly, you want thing1 to have a 2 associated with it.

This code would do it:

li = ["thing1", "thing2", "thing3", "thing4", "thing1"]

def maplist(li):
    d = dict()
    for item in li:
        value = d.setdefault(item, 0)
        value += 1
        d[item] = value
    return d

print maplist(li)

To fit your situation, add this to the above code:

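def formatconverter(yourformatli):
    li = []
    for tu in yourformatli:
        # tu is a (word, count) pair; this assumes every count is 1,
        # as in the mapper output, so only the word is kept
        li.append(tu[0])

    return maplist(li)

Also, I got too lazy to read the replies, so I don't know if this applies.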
