Thread: Count, filter out duplicates for dictionaries

  1. #1


    I have an iterator that yields three values per item: item_name, item_size, user_name.

    What is the best way to:
    • Collate similar item names into a single line
    • For each collated name, count the number of versions
    • For each collated name, also display the user_names associated with those versions and the size each user takes up, in descending order


    Currently I am using several dictionaries, and I am not sure this is the best approach.
    Code:
    from collections import defaultdict

    gen_dict = {}
    size_dict = defaultdict(list)
    # my_iterator is the iterator mentioned above
    for result in my_iterator:
        gen_dict[result['object_name']] = result['user']
        # Append each size to that user's list (one list per user)
        size_dict[result['user']].append(result['dir_size'])

    # Filter out duplicates, count versions
    asset_user_dict = defaultdict(set)
    asset_count = defaultdict(int)
    user_ver_count = defaultdict(lambda: defaultdict(int))

    for vers_name, artist_alias in gen_dict.iteritems():
        # Drop the last three characters (the version digits)
        strip_version_name = vers_name[:-3]
        asset_user_dict[strip_version_name].add(artist_alias)
        asset_count[strip_version_name] += 1
        user_ver_count[artist_alias][strip_version_name] += 1

    # Sum all item sizes for each user
    for user_name, user_size in size_dict.items():
        size_dict[user_name] = sum(user_size)

    for version_name, version_count in sorted(asset_count.iteritems()):
        # convert_size_query is a helper defined elsewhere (not shown)
        user_vers_cnt = ', '.join(
            '{0}({1}v, {2})'.format(user,
                                    user_ver_count[user][version_name],
                                    convert_size_query(size_dict[user]))
            for user in asset_user_dict[version_name])
        print "| {0:<100} | {1:>12} | {2:>90} |".format(version_name + "(xxx)",
                                                         version_count,
                                                         user_vers_cnt)
    With dictionaries I can handle almost all three points above, but I am stuck on point #3: either I can't get the users sorted in order, or the size derived for a user comes out as the same value everywhere. Is that because I am using multiple dictionaries? Any advice is greatly appreciated!

    By the way, my output currently is:
    Code:
    Suppose my data is something like:
    (1 MiB) "item_C_v001" : "jack"
    (5 MiB) "item_C_v002" : "kris"
    (1 MiB) "item_A_v003" : "john"
    (1 MiB) "item_B_v006" : "peter"
    (2 MiB) "item_A_v005" : "john"
    (1 MiB) "item_A_v004" : "dave"
    
    
    Item Name | No. of Vers. | User
    item_A    | 3            | dave(1, 1MiB), john(2, 3MiB)
    item_B    | 1            | peter(1, 1MiB)
    item_C    | 2            | kris(1, 5MiB), jack(1, 1MiB)
    Last edited by xenas; 02-16-2017 at 09:13 PM.

  2. #2
    Claudio A

    Splitting this into three dictionaries is probably overkill here (I might be wrong). You can probably get away with a single dict or OrderedDict and collect everything in a single pass, which would probably be more efficient.

    From the snippet you included I assume your iterator returns a dictionary for each 'result'...

    Code:
    from collections import OrderedDict
    
    data = OrderedDict()
    
    for result in my_iterator:
        item_name = result.get('object_name', None)
        item_user = result.get('user', None)
        item_size = result.get('dir_size', 0)
    
        if item_name not in data:
            data[item_name] = {item_user: [item_size]}
    
        else:
            if item_user not in data[item_name]:
                data[item_name][item_user] = [item_size]
            else:
                data[item_name][item_user].append(item_size)
    Your resulting data OrderedDict should be structured so that a single mapped item_name is as follows:

    Code:
    # data[item_name1] = {item_user1: [item_size1, item_size2], item_user2: [item_size1, item_size2], etc...}
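
    As a concrete illustration, here is a small sketch that builds that structure from the sample data in the first post. The `records` list and the `strip_version` and `collect` helpers are made up for the example, and the helper assumes version suffixes always look like `_v` followed by digits. Unlike the loop above, it keys on the name with the version suffix stripped, so that versions of the same item collapse into one entry. It is written to run on both Python 2 and 3.

```python
import re
from collections import OrderedDict

# Hypothetical stand-in for my_iterator, using the sample data from post #1.
# Sizes are in MiB here; the real iterator may well report bytes.
records = [
    {'object_name': 'item_C_v001', 'user': 'jack', 'dir_size': 1},
    {'object_name': 'item_C_v002', 'user': 'kris', 'dir_size': 5},
    {'object_name': 'item_A_v003', 'user': 'john', 'dir_size': 1},
    {'object_name': 'item_B_v006', 'user': 'peter', 'dir_size': 1},
    {'object_name': 'item_A_v005', 'user': 'john', 'dir_size': 2},
    {'object_name': 'item_A_v004', 'user': 'dave', 'dir_size': 1},
]


def strip_version(name):
    """Remove a trailing '_v<digits>' suffix (assumed naming scheme)."""
    return re.sub(r'_v\d+$', '', name)


def collect(results):
    """Group sizes by base item name, then by user, in a single pass."""
    data = OrderedDict()
    for result in results:
        base_name = strip_version(result['object_name'])
        # Create the per-item mapping on first sight, then the per-user list.
        users = data.setdefault(base_name, OrderedDict())
        users.setdefault(result['user'], []).append(result['dir_size'])
    return data


data = collect(records)
for base_name, users in data.items():
    print('{0}: {1}'.format(base_name, dict(users)))
```

    Each size ends up stored under both its item and its user, so nothing is overwritten when a user owns several versions of the same item.
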
    From this you can derive the information you need.

    Code:
    for item_name, item_users in data.iteritems():
        # Total number of entries recorded for this item, across all users
        count = sum(len(item_sizes) for item_sizes in item_users.values())
        print('Item Name: {} Count: {} User(s): {}'.format(item_name, count, ', '.join(item_users.keys())))

    DISCLAIMER: untested!
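
    For point #3 (per-user sizes in descending order), one possible sketch on top of the same nested structure is below. The `data` literal, the `summarize` and `format_row` helpers, and the MiB unit are assumptions made for the example, not anything from the original iterator. It runs on both Python 2 and 3.

```python
# Hypothetical nested mapping in the shape described above:
# {item_name: {user: [size, size, ...]}}, with sizes assumed to be in MiB.
data = {
    'item_A': {'john': [1, 2], 'dave': [1]},
    'item_B': {'peter': [1]},
    'item_C': {'jack': [1], 'kris': [5]},
}


def summarize(item_users):
    """Return (version_count, [(user, n_versions, total_size), ...]).

    The per-user list is sorted by total size, largest first; ties are
    broken alphabetically by user name.
    """
    per_user = [(user, len(sizes), sum(sizes))
                for user, sizes in item_users.items()]
    per_user.sort(key=lambda entry: (-entry[2], entry[0]))
    version_count = sum(n for _, n, _ in per_user)
    return version_count, per_user


def format_row(item_name, item_users):
    """Build one output line for an item."""
    version_count, per_user = summarize(item_users)
    users = ', '.join('{0}({1}, {2}MiB)'.format(user, n, size)
                      for user, n, size in per_user)
    return '{0:<10} | {1:>3} | {2}'.format(item_name, version_count, users)


for item_name in sorted(data):
    print(format_row(item_name, data[item_name]))
```

    Because the sizes are summed per item and per user rather than per user alone, a user who appears under several items gets a separate total for each one, which may be what was going wrong with the separate size dictionary.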

    Hopefully this helps you look at it under another angle.
