Discussion and/or suggestions to manage and merge large file collection.

    I have a fairly large music collection. It is well organized and properly named and tagged because my Plex server requires it, and it is all in either FLAC or MP3. I recently inherited an additional music collection from a deceased friend. It is NOT well organized and includes OGG, M4A, WMA, and WAV formats. There are duplicate files and albums, misfiled albums, incorrect tags, damaged files, misspellings, bad folder names, etc. Before I can even consider adding to my collection, I need to clean out any duplicates (both within this collection and then by comparing it to my own), re-catalog anything I'm interested in, check/correct the tags, and so on. This is going to require many separate and lengthy tasks. I haven't found a single tool that will accomplish more than one of them. I did find a replacement for the older FSlint program that is current and available as a Flatpak: Czkawka

    Here are/were the tasks ahead of me:

    CONVERT FILE TYPES:
    I used find and ffmpeg to batch convert all the files to either MP3 or FLAC and then deleted the sources. That was pretty easy, except I could never figure out how to get the file extensions right. For example, converting "somefile.wma" resulted in "somefile.wma.flac". I finally gave up and left it that way. Here's the command I used:
    Code:
    find . -type f -iname '*.m4a' -exec ffmpeg -i {} {}.flac \;
    I haven't tried to fix this yet.
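    One possible way to clean up the doubled extensions afterwards is sketched below. It is not something from this thread: `fix_double_ext` is a made-up helper name, and the list of inner extensions (wma, wav, ogg, m4a) is an assumption taken from the formats mentioned above. For future conversions, the target name can be built with shell parameter expansion instead of `{}.flac`, as noted in the first comment.

```bash
#!/bin/bash
# For new conversions, "${1%.*}" strips the old extension before adding the new one:
#   find . -type f -iname '*.m4a' -exec sh -c 'ffmpeg -i "$1" "${1%.*}.flac"' _ {} \;

# fix_double_ext [DIR]: rename e.g. "song.wma.flac" to "song.flac".
# Only the inner extensions listed in the case pattern are stripped, so a
# name such as "01. Song.flac" is left alone.
fix_double_ext() {
    find "${1:-.}" -type f \( -iname '*.flac' -o -iname '*.mp3' \) -exec sh -c '
        for f do
            ext=${f##*.}      # final extension, e.g. flac
            stem=${f%.*}      # name without it, e.g. ./a/song.wma
            case $stem in
                *.[Ww][Mm][Aa]|*.[Ww][Aa][Vv]|*.[Oo][Gg][Gg]|*.[Mm]4[Aa]) ;;
                *) continue ;;
            esac
            new=${stem%.*}.$ext
            if [ -e "$new" ]; then
                echo "skipped, target exists: $new" >&2
            else
                mv -- "$f" "$new"
            fi
        done' sh {} +
}
```

    Running `fix_double_ext .` from the top of the collection would rename in place; trying it on a copy of a few folders first seems prudent.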

    To delete them after conversion, I used "find" to:
    • list all the extensions
    • count and make a note of the number of each type
    • run the ffmpeg conversion
    • verify the conversion succeeded by doing the math
      • i.e. n(flac)+n(wav) before conversion should equal n(flac) after the wav conversion
    • delete the converted source files with the "-delete" option of find
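    The counting and the "do the math" check could be scripted roughly as follows. This is only a sketch: `count_ext` is a hypothetical helper, and it assumes a find that supports `-iname` and `-print0` (GNU and BSD find both do).

```bash
#!/bin/bash
# count_ext EXT [DIR]: number of files under DIR whose name ends in .EXT
# (case-insensitive). One NUL byte is emitted per match, so unusual
# characters in file names cannot skew the count.
count_ext() {
    n=$(find "${2:-.}" -type f -iname "*.$1" -print0 | tr -dc '\000' | wc -c)
    echo $((n))
}

# Intended use around a wav-to-flac conversion:
#   flac_before=$(count_ext flac); wav_before=$(count_ext wav)
#   ... run the ffmpeg conversion ...
#   if [ "$(count_ext flac)" -eq $((flac_before + wav_before)) ]; then
#       find . -type f -iname '*.wav' -delete
#   fi
```

    The delete step only runs when the numbers add up, which mirrors the manual check in the list above.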

    TAG FILES CORRECTLY:
    I use "MusicBrainz Picard" and it works very well until it encounters a mistagged group of files. Then I have to manually search through the MusicBrainz database and try to find a match. This is very time consuming. Thankfully, 60-70% are close enough that Picard identifies them. Unfortunately, the search function on the MusicBrainz website is not very helpful. For example, the smallest difference in the album name will fail the search: searching "Face Value" finds the correct album, but searching "Face Value [Deluxe Version]" fails completely. One small plus here is that Picard renames the files when they are cataloged, so the ".wav.flac" extensions are corrected during the tagging process.

    FIND DUPLICATE FILENAMES / ALBUMS:
    It helps having only two file types (FLAC and MP3), but there are still duplicate files that are of different file types. I need to figure out how to compare file names without looking at extensions, then determine which albums I should keep or delete. I haven't solved this one quite yet, but I just discovered that "Czkawka" (the FSlint replacement) has this ability by matching on name only and no other criteria.

    The hard part is deciding on criteria solid enough that I can automate the task. For example, I prefer FLAC over MP3 if I have the same song in both, so in that case the MP3 is the copy to delete.
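    As a first, narrow step toward that rule, the sketch below lists MP3 files that have a FLAC with the same name (minus extension) in the same folder. `list_redundant_mp3` is a hypothetical helper name; it only prints candidates and deletes nothing, and it does not catch duplicates that live in different folders.

```bash
#!/bin/bash
# list_redundant_mp3 [DIR]: print every .mp3 that sits next to a .flac
# with the same base name. Nothing is deleted; the output is a review list.
list_redundant_mp3() {
    find "${1:-.}" -type f -iname '*.mp3' -exec sh -c '
        for f do
            stem=${f%.*}      # path without the .mp3 extension
            if [ -e "$stem.flac" ] || [ -e "$stem.FLAC" ]; then
                printf "%s\n" "$f"
            fi
        done' sh {} +
}
```

    The output could be saved to a file and checked by eye before anything is removed.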

    Ultimately, I would prefer:
    • FLAC over MP3
    • My current collection over the new one UNLESS the new one holds FLAC and my current is MP3.
    • Setting aside, for later review, any files tagged so poorly that they don't register on MusicBrainz.
    • Naming folders as the Artist is named: Joe Brown not Brown, Joe and The Cult not Cult, The or just Cult.
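    The first two preferences can be written down as a tiny decision function. This is just an illustration of the rule as stated; `choose_copy` and its "mine"/"new" labels are invented for the example.

```bash
#!/bin/bash
# choose_copy MINE_EXT NEW_EXT: print which copy to keep, "mine" or "new".
# Rule from the list above: keep the current collection's copy unless the
# new collection has FLAC where the current one only has MP3.
choose_copy() {
    mine=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    new=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
    if [ "$new" = flac ] && [ "$mine" = mp3 ]; then
        echo new
    else
        echo mine
    fi
}
```

    For example, `choose_copy mp3 flac` prints `new`, while `choose_copy flac flac` prints `mine`.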
    So, in order, the tasks ahead of me are:
    1. Fixing folder names
    2. Listing album names AND file types of both music sets
    3. Comparing albums and eliminating duplicates
    4. Merging desired additions
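    Steps 2 and 3 might be approached along these lines. The sketch assumes both collections are laid out as ROOT/Artist/Album/tracks (an assumption, not something confirmed above), and the function names are made up. Matching is by lower-cased "artist/album" folder name only, so it will miss albums whose folders are spelled differently.

```bash
#!/bin/bash
# list_albums ROOT: print lower-cased "artist/album" for every folder two
# levels below ROOT, sorted and de-duplicated.
list_albums() {
    ( cd "$1" && find . -mindepth 2 -maxdepth 2 -type d ) |
        sed 's|^\./||' | tr '[:upper:]' '[:lower:]' | LC_ALL=C sort -u
}

# album_types ALBUM_DIR: comma-separated list of the file extensions found
# in one album folder (this includes non-audio files such as cover art).
album_types() {
    find "$1" -type f -name '*.*' | sed 's/.*\.//' |
        tr '[:upper:]' '[:lower:]' | LC_ALL=C sort -u | paste -s -d, -
}

# common_albums MINE NEW: albums present (by folder name) in both trees.
common_albums() {
    tmp_a=$(mktemp) tmp_b=$(mktemp)
    list_albums "$1" > "$tmp_a"
    list_albums "$2" > "$tmp_b"
    LC_ALL=C comm -12 "$tmp_a" "$tmp_b"
    rm -f "$tmp_a" "$tmp_b"
}
```

    Each line printed by `common_albums` is a candidate duplicate; running `album_types` on both copies then shows whether one side is FLAC and the other MP3.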

    I have some work ahead of me!
    Last edited by oshunluvr; Jan 01, 2025, 12:32 PM.

    #2
    Working on fixing folder names, I've started with this as a test:
    Code:
    f='Last, First' ; g=`echo ${f} |grep -aob ',' |grep  -oE '[0-9]+'` ;  echo -n ${f:(-$g-2)}' ' ; echo ${f:0:$g}
    Running this on f='Last, First' results in 'First Last' so that works, but not other lengths so I've got some mathing to do.

    Once it's working, I'm thinking I'll put it into a service menu and then use Dolphin to select the folders to modify.
    Last edited by oshunluvr; Jan 02, 2025, 09:05 AM.

      #3
      OK, this seems to work for any length:
      Code:
      f='Brooks, Garth' ; g=`echo ${f} |grep -aob ',' |grep  -oE '[0-9]+'` ;  echo -n ${f:$g+2:99}' ' ; echo ${f:0:$g}
      I turned this into a bash script including checking for an already existing folder name and some popup messages:

      Code:
      #!/bin/bash
      # Flip a folder named "Last, First" to "First Last".
      
      f="$1"
      path="${f%/*}/"       # parent directory, with trailing slash
      folder="${f##*/}"     # folder name without the path
      
      if [[ "$folder" == *,* ]]; then
      
      # byte offset of the first comma in the folder name
      g=$(printf '%s\n' "${folder}" | grep -aob ',' | grep -oE '[0-9]+' | head -n 1)
      
      # text after ", " followed by the text before the comma
      h="${folder:$g+2}"
      h="${h} ${folder:0:$g}"
      
      if [ -d "${path}${h}" ]; then
        kdialog --title "Flip name" --passivepopup "${path}${h} already exists."
        exit 1
      fi
      
      mv "${1}" "${path}${h}"
      
      kdialog --title "Flip name" --passivepopup "Folder renamed $path $folder"
      
      exit 0
      
      else
      
      kdialog --title "Flip name" --passivepopup "$f comma not found"
      
      fi
      exit 1
      I launch the above from a Dolphin ServiceMenu:
      Code:
      [Desktop Entry]
      Type=Service
      Actions=flip;
      X-KDE-ServiceTypes=KonqPopupMenu/Plugin,inode/directory
      MimeType=inode/directory;
      X-KDE-StartupNotify=false
      X-KDE-Priority=TopLevel
      
      [Desktop Action flip]
      Name=Flip folder name at comma
      Icon=object-flip-horizontal.svg
      Exec=$HOME/.local/share/kio/servicemenus/flipname "%u"
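      The comma flip in the script above could also be done without grep and offsets, using only shell parameter expansion. The following is a sketch, not the script actually in use: `flip_name` is a hypothetical function, it expects a comma followed by a space, and it deliberately leaves names with more than one comma untouched for manual review.

```bash
#!/bin/bash
# flip_name NAME: print "First Last" for "Last, First" (and "The Cult" for
# "Cult, The"). Names without ", " or with several commas are printed
# unchanged.
flip_name() {
    case $1 in
        *,*,*)  printf '%s\n' "$1" ;;                            # ambiguous: leave alone
        *", "*) printf '%s %s\n' "${1#*, }" "${1%%, *}" ;;       # after-comma part first
        *)      printf '%s\n' "$1" ;;
    esac
}
```

      For example, `flip_name 'Brooks, Garth'` prints `Garth Brooks`, and `flip_name 'Cult, The'` prints `The Cult`.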
      Last edited by oshunluvr; Jan 02, 2025, 11:22 AM.

        #4
        This new collection had many file types, duplicates, and other things to manage.

        First, I converted any "lossless" types to FLAC and "lossy" types to MP3. These are my preferred file types. The "find" command is wonderful for these types of tasks.

        For example, I ran this first to list all the file types in the collection:
        Code:
        find . -type f | perl -ne 'print $1 if m/\.([^.\/]+)$/' | sort -u
        This showed me that there were wav, ogg, wma, and m4a files along with flac and mp3.

        Then this converted wma to mp3 en masse:
        Code:
        find . -type f -iname '*.wma' -exec ffmpeg -i {} {}.mp3 \;
        This left both a wma file and a matching mp3 file. So I deleted all the wma files with:
        Code:
        find . -type f -iname '*.wma' -delete
        I'm sure there's a way to combine the actions, but I couldn't get it to work.
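        One way the two actions might be combined is to run a small shell snippet per file that deletes the source only if ffmpeg exits successfully. This is an untested sketch under that assumption (a zero exit status is taken to mean the conversion worked); `convert_then_delete` is a made-up name. It also writes "name.mp3" rather than "name.wma.mp3".

```bash
#!/bin/bash
# convert_then_delete SRC_EXT DST_EXT [DIR]
# Convert every *.SRC_EXT under DIR with ffmpeg and remove each source file
# only when its conversion exits cleanly.
convert_then_delete() {
    src_ext=$1 dst_ext=$2
    find "${3:-.}" -type f -iname "*.$src_ext" -exec sh -c '
        dst_ext=$1; shift
        for src do
            dst=${src%.*}.$dst_ext
            if [ -e "$dst" ]; then
                echo "skipped, target exists: $dst" >&2
            elif ffmpeg -nostdin -loglevel error -i "$src" "$dst"; then
                rm -- "$src"          # delete the source only after a clean exit
            else
                echo "conversion failed, kept: $src" >&2
                rm -f -- "$dst"       # drop any partial output
            fi
        done' sh "$dst_ext" {} +
}
```

        It would be called as, for example, `convert_then_delete wma mp3 .` from the top of the collection. Sources whose conversion fails are kept and reported on stderr.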

        Once I had all the file types the way I wanted, I worked for a while trying to figure out how to cull as much duplication as possible: albums in his collection that I already had, and some duplicated within his collection. This proved to be very difficult. There was just too much mislabeling and not enough correct folder names for any comparison tools to be of any value. For example, some albums were duplicated in "The Band", "Band, The" and just "Band".

        I decided I had to at least get rid of duplicates and get the file metatags right. So I plowed through the collection for hours over several days with MusicBrainz Picard and deleted as many matching albums as I could find. I also compared his collection to mine as I went down the alphabet and deleted anything I already had. Anything that was not easily identifiable I moved to a "needs work" folder, which will probably be deleted as well. This resulted in a 15-20% reduction in the number of files.

        Next steps will be to try to automate album name matching so I can be sure I have no duplicates already in my collection. I suspect I can cull another 10-20% before I will be satisfied.
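        For the name matching, one possible starting point is to normalize names before comparing them, so that "The Band", "Band, The" and "Band" all compare equal and a suffix like "[Deluxe Version]" is ignored. The rules below are assumptions chosen for illustration and `normalize_name` is a hypothetical helper; it only lower-cases ASCII letters and will certainly need tuning against real folder names.

```bash
#!/bin/bash
# normalize_name NAME: print a comparison key for an artist or album name.
# Steps: lower-case, drop [...] and (...) groups, drop a leading "the " or
# a trailing ", the", squeeze runs of whitespace, trim the ends.
normalize_name() {
    printf '%s\n' "$1" |
        tr '[:upper:]' '[:lower:]' |
        sed -E -e 's/[[(][^])]*[])]//g' \
               -e 's/^the[[:space:]]+//' \
               -e 's/,[[:space:]]*the[[:space:]]*$//' \
               -e 's/[[:space:]]+/ /g' \
               -e 's/^ //; s/ $//'
}
```

        Two names would then be treated as the same album or artist when their keys are equal, with anything ambiguous left for manual review.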

