Augment elements of a list

Question

rahul8590 71 Posting Whiz

14 Years Ago

I have several lists generated by a mapper function, in this format:

>>> mapper("b.txt" , i["b.txt"])
[('thats', 1), ('one', 1), ('small', 1), ('step', 1), ('for', 1), ('a', 1), ('man', 1), ('one', 1), ('giant', 1), ('leap', 1), ('for', 1), ('mankind', 1)]

>>> mapper("c.txt" , i["b.txt"])
[('thats', 1), ('one', 1), ('small', 1), ('step', 1), ('for', 1), ('a', 1), ('man', 1), ('one', 1), ('giant', 1), ('leap', 1), ('for', 1), ('mankind', 1)]

I want to merge the lists returned by these two calls so that, when a word appears in both, the merged list stores the combined count: ('for', 2) in this case, since 'for' is common to both results of the mapper function. Words that appear in only one list should be stored in the merged list unchanged.

PS: mapper is a function I wrote myself.

python


All 13 Replies

woooee 814 Nearly a Posting Maven

14 Years Ago

My personal preference is to use a dictionary, with the word as the key and the count as the value. Convert the first list to a dictionary, then loop through the second list and, if the word is already in the dictionary, add to its count: http://www.greenteapress.com/thinkpython/html/book012.html#toc120 Post back with any code you are having problems with for additional help.
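
A minimal sketch of the dictionary approach described above, in Python 3 syntax (the thread itself uses Python 2). The names merge_counts, first, and second are made up for illustration, and the two short lists only stand in for the output of mapper. Both lists go through the same loop so that a word repeated inside one list is also summed.

```python
def merge_counts(first, second):
    """Merge two lists of (word, count) pairs into one dict of totals."""
    totals = {}
    for word, count in first + second:
        if word in totals:
            totals[word] += count   # word seen before: add to its count
        else:
            totals[word] = count    # first time this word appears
    return totals

# Stand-in data, not the real mapper output.
first = [("one", 1), ("small", 1), ("step", 1), ("for", 1)]
second = [("one", 1), ("giant", 1), ("leap", 1), ("for", 1)]

merged = merge_counts(first, second)
print(merged)
```

If the (word, count) list format is needed again afterwards, list(merged.items()) converts the dictionary back.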

woooee 814 Nearly a Posting Maven

14 Years Ago

Use a function and pass the file name to it. Some pseudo-code:

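def mapper_dict(fname, word_dict):
    ## assumes the word is the first element after split()
    ## "with" closes the file when the block ends
    with open(fname, "r") as fp:
        for rec in fp:
            substrs = rec.split()
            if not substrs:
                ## blank line: nothing to count
                continue
            word = substrs[0]
            if word not in word_dict:
                word_dict[word] = 0
            word_dict[word] += 1
    return word_dict

## one dictionary accumulates the counts from every file
word_dict = {}
for fname in ["/a/b/abc", "/d/e/def", "/g/h/ghi"]:
    word_dict = mapper_dict(fname, word_dict)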

Edited 14 Years Ago by woooee because: n/a

woooee 814 Nearly a Posting Maven

14 Years Ago

This should be:

## changed to s[0] and d1; d1.keys() is not necessary
        if s[0] not in d1:

ultimatebuster 14 Posting Whiz in Training

14 Years Ago

Using a dictionary, it's very simple.

For example:
I have a list like this: ["thing1", "thing2", "thing3", "thing4", "thing1"]
If I understood you correctly, you want thing1 to have a 2 associated with it.

This code would do it:

li = ["thing1", "thing2", "thing3", "thing4", "thing1"]

def maplist(li):
    d = dict()
    for item in li:
        value = d.setdefault(item, 0)
        value += 1
        d[item] = value
    return d

print maplist(li)

To fit your situation, add this to the above code:

def formatconverter(yourformatli):
    li = []
    for tu in yourformatli:
        li.append(tu[0])

    return maplist(li)

Also, I got too lazy to read the replies, so I don't know if this applies.
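
One thing to keep in mind with the converter above: it keeps only the word from each pair, so a pair such as ('for', 2) would be counted once. A hedged variant that adds up the stored counts instead, in Python 3 syntax; sum_pairs and the sample pairs are hypothetical names and data, not part of the original post.

```python
def sum_pairs(pairs):
    """Total the counts in a list of (word, count) pairs."""
    totals = {}
    for word, count in pairs:
        # get() returns 0 for a word that has not been seen yet.
        totals[word] = totals.get(word, 0) + count
    return totals

# Made-up input where one pair already carries a count above 1.
pairs = [("for", 2), ("one", 1), ("for", 1)]
result = sum_pairs(pairs)
print(result)
```

When every count is 1, as in the mapper output shown in the question, both versions give the same totals.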

Edited 14 Years Ago by ultimatebuster because: n/a


Beat_Slayer 17 Posting Pro in Training · Answer 1 · 2010-08-03T22:17:47+00:00

It may be just me, but something seems wrong!

Can you explain a little further?

You want to know the words that exist in both files, is that it?

Or do you want to count the occurrences in each file?

rahul8590 71 Posting Whiz · Answer 2 · 2010-08-03T22:34:43+00:00

@Beat_Slayer

Well, the complete picture is that I am implementing a MapReduce framework. I didn't find one in Python that suited me, so I decided to code it myself. I have come as far as writing the mapper function and got stuck on this list-to-dictionary conversion.

For simplicity I have mentioned 2 files here. The actual job will have more than 100 files, each approximately 5 MB of pure text, with the mapper and reduce functions running in a multi-threaded environment. That's the plan as of now.

@woooee: your link is very helpful.

rahul8590 71 Posting Whiz · Answer 3 · 2010-08-03T23:05:47+00:00

@woooee:

I have coded the first part and converted the list to a dictionary:

>>> l
[('a', 0), ('c', 2), ('b', 1), ('e', 4), ('d', 3)]
>>> dd= {}
>>> i = 0
>>> while i < len(l):
...  s = l[i]
...  dd[s[0]] = s[1]
...  i = i + 1
... 
>>> dd
{'a': 0, 'c': 2, 'b': 1, 'e': 4, 'd': 3}

Now, if I get dictionaries of different lengths from different files, how do I merge them? Is there a module that does this directly?
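
One possible answer to the "direct module" question is collections.Counter from the standard library (available from Python 2.7 and 3.1). The sketch below is in Python 3 syntax; merge_dicts and the per_file data are invented for illustration and are not part of the original mapper code.

```python
from collections import Counter

def merge_dicts(dicts):
    """Add up the counts from any number of word -> count dictionaries."""
    total = Counter()
    for d in dicts:
        # Counter.update() adds counts instead of overwriting them.
        total.update(d)
    return dict(total)

# Made-up per-file dictionaries of different lengths.
per_file = [
    {"the": 2, "fox": 1},
    {"the": 1, "dog": 1, "lazy": 1},
    {"fox": 3},
]

merged = merge_dicts(per_file)
print(merged)
```

Note that the plain dict.update() method would not work here: it replaces the value for a key that already exists rather than adding to it.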

Beat_Slayer 17 Posting Pro in Training · Answer 4 · 2010-08-04T02:45:26+00:00

I think this should give some insight, if I'm understanding what you are trying to do.

def merge_dic(merged_dic, wordlist):
    for item in wordlist:
        if merged_dic.has_key(item):
            merged_dic[item] += 1
        else:
            merged_dic[item] = 1

file1 = 'this is a dummy sample file for example as sample'
file2 = 'this is another dummy sample file also created for example with \
some samples repeated for example'

list1 = file1.split(' ')
list2 = file2.split(' ')

all_count = {}
file_lists = []
file_lists.extend(list1)
file_lists.extend(list2)

merge_dic(all_count, file_lists)

print 'all_count =', all_count

file_uniques = {}
file_lists = []                 # Converting lists to sets is the fastest and
file_lists.extend(set(list1))   # simplest way I know of to eliminate
file_lists.extend(set(list2))   # duplicates in a list when order doesn't matter

merge_dic(file_uniques, file_lists)

print 'file_uniques =', file_uniques

Happy coding!

TrustyTony 888 ex-Moderator Team Colleague Featured Poster · Answer 5 · 2010-08-04T03:29:19+00:00

## Your way
l = [('a', 0), ('c', 2), ('b', 1), ('e', 4), ('d', 3)]
dd= {}
i = 0
while i < len(l):
    s = l[i]
    dd[s[0]] = s[1]
    i = i + 1

print dd
#{'a': 0, 'c': 2, 'b': 1, 'e': 4, 'd': 3}
## shorter way
dd=dict(l)
print dd

"""Output:
{'a': 0, 'c': 2, 'b': 1, 'e': 4, 'd': 3}
{'a': 0, 'c': 2, 'b': 1, 'e': 4, 'd': 3}
"""

rahul8590 71 Posting Whiz · Answer 6 · 2010-08-04T21:01:13+00:00

@tonyjv:

Both of our versions have a bug. Unquestionably your method is the shortest, and I felt like a moron when I saw that the conversion already existed, but for example:

>>> l1
[('the', 1), ('quick', 1), ('brown', 1), ('fox', 1), ('jumped', 1), ('over', 1), ('the', 1), ('lazy', 1), ('grey', 1), ('dogs', 1)]
>>> d1 = dict(l1)
>>> d1
{'brown': 1, 'lazy': 1, 'jumped': 1, 'over': 1, 'fox': 1, 'grey': 1, 'quick': 1, 'the': 1, 'dogs': 1}

You see, although the pair ('the', 1) appears twice in the list, the dictionary keeps the key only once, and instead of the count for 'the' becoming 2 it stays at 1. Even my initial code gives the same result :(

I tried writing it this way, but I don't know why it isn't working:

>>> while i < len(l1):
...  s = l1[i]
...  if s not in d1.keys():
...    d1[s[0]] = s[1]
...  else:
...    d1[s[0]] += 1
...  i = i + 1
... 
>>> d1
{'brown': 1, 'lazy': 1, 'jumped': 1, 'over': 1, 'fox': 1, 'grey': 1, 'quick': 1, 'the': 1, 'dogs': 1}

@woooee and @Beat_Slayer: I am working through your code (thanks for providing it).
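
A short demonstration of the two behaviours behind the results above, in Python 3 syntax with a shortened, made-up list. First, dict() keeps one value per key, so a repeated key is overwritten rather than added. Second, the keys of d1 are words, so testing a whole (word, count) tuple against them never matches, which sends every item down the assignment branch.

```python
pairs = [("the", 1), ("quick", 1), ("the", 1)]

# dict() keeps one value per key; the second ("the", 1) overwrites the first.
d1 = dict(pairs)
print(d1)

# The keys are words, so the whole tuple is never found among them.
s = pairs[2]
tuple_found = s in d1       # tests ("the", 1) against the keys
word_found = s[0] in d1     # tests "the" against the keys
print(tuple_found, word_found)
```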

rahul8590 71 Posting Whiz · Answer 7 · 2010-08-04T22:53:25+00:00

>>> while i < len(l1):
...  s = l1[i]
...  if s[0] not in d1:
...    d1[s[0]] = s[1]
...  else:
...    d1[s[0]] += 1
...  i = i + 1
... 
>>> d1
{}

Getting an empty dictionary :(
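
The session above does not show how i and d1 were set before the loop, so the following is only one plausible explanation: if d1 was reset to {} but i still held len(l1) from an earlier loop, the while condition is false from the start and the body never runs. A sketch in Python 3 syntax; count_from is a hypothetical wrapper around the same loop, and the list is shortened.

```python
def count_from(l1, d1, i):
    """Run the loop from the post above, starting at index i."""
    while i < len(l1):
        s = l1[i]
        if s[0] not in d1:
            d1[s[0]] = s[1]
        else:
            d1[s[0]] += 1
        i = i + 1
    return d1

l1 = [("the", 1), ("quick", 1), ("the", 1)]

# Assumed situation: i is left at len(l1) by an earlier loop.
stale = count_from(l1, {}, len(l1))
# With i reset to 0 the loop body actually runs.
fresh = count_from(l1, {}, 0)
print(stale)
print(fresh)
```

A for loop over the list avoids the problem entirely, because there is no index variable to forget to reset.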

Beat_Slayer 17 Posting Pro in Training · Answer 8 · 2010-08-04T23:18:43+00:00

How about this?

class Word_Counter():

    def __init__(self):
        self.count = {}

    def add_string(self, s):
        word_list = s.split(' ')
        self.add_list(word_list)

    def add_list(self, wl):
        for item in wl:
            if self.count.has_key(item):
                self.count[item] += 1
            else:
                self.count[item] = 1

    def add_mapper(self, ml):
        for item in ml:
            if self.count.has_key(item[0]):
                self.count[item[0]] += item[1]
            else:
                self.count[item[0]] = item[1]
    


str1 = 'the quick brown fox jumps over the lazy dog'

d = Word_Counter()

d.add_string(str1)

print d.count

"""
{'brown': 1, 'lazy': 1, 'over': 1, 'fox': 1, 'dog': 1, 'quick': 1, 'the': 2, 'jumps': 1}
"""

list1 = ('the', 'quick', 'blue', 'cat', 'jumps', 'over', 'the', 'lazy', 'turtle') 

d.add_list(list1)

print d.count

"""
{'blue': 1, 'brown': 1, 'lazy': 2, 'turtle': 1, 'over': 2, 'fox': 1, 'dog': 1, 'cat': 1, 'quick': 2, 'the': 4, 'jumps': 2}
"""

map1 = [('the', 1), ('quick', 1), ('brown', 1), ('fox', 1), ('jumped', 1), ('over', 1), ('the', 1), ('lazy', 1), ('grey', 1), ('dogs', 1)]

d.add_mapper(map1)

print d.count

"""
{'blue': 1, 'brown': 2, 'lazy': 3, 'turtle': 1, 'grey': 1, 'jumped': 1, 'over': 3, 'fox': 2, 'dog': 1, 'cat': 1, 'dogs': 1, 'quick': 3, 'the': 6, 'jumps': 2}
"""

TrustyTony 888 ex-Moderator Team Colleague Featured Poster · Answer 9 · 2010-08-04T23:38:43+00:00

>>> while i < len(l1):
...  s = l1[i]
...  if s[0] not in d1:
...    d1[s[0]] = s[1]
...  else:
...    d1[s[0]] += 1
...  i = i + 1
... 
>>> d1
{}

Getting an empty dictionary :(

words=[('the', 1), ('quick', 1), ('brown', 1), ('fox', 1), ('jumped', 1), ('over', 1), ('the', 1), ('lazy', 1), ('grey', 1), ('dogs', 1)]
dict_of_words={}
for word,count in words:
     dict_of_words[word] = dict_of_words[word]+count if word in dict_of_words else count

print dict_of_words
"""Output:
{'brown': 1, 'lazy': 1, 'jumped': 1, 'over': 1, 'fox': 1, 'grey': 1, 'quick': 1, 'the': 2, 'dogs': 1}
"""
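
For comparison, a sketch of the same loop using collections.defaultdict from the standard library, in Python 3 syntax. This is an alternative spelling, not the code from the post above: defaultdict(int) supplies 0 for a missing word, so the conditional expression is no longer needed.

```python
from collections import defaultdict

words = [('the', 1), ('quick', 1), ('brown', 1), ('fox', 1), ('jumped', 1),
         ('over', 1), ('the', 1), ('lazy', 1), ('grey', 1), ('dogs', 1)]

# A missing key starts at int(), which is 0.
dict_of_words = defaultdict(int)
for word, count in words:
    dict_of_words[word] += count

print(dict(dict_of_words))
```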
