Just started with linux mint, need help with a command

skaarl@feddit.nl · edit-2 2 days ago


harsh3466@lemmy.ml · 2 days ago

You’re welcome! Happy I could help.

One other quick note: do the filenames or directories have spaces in them? If they do, that will cause a problem with the command as it is, and it will need some additional modification. I accounted for possible spaces in the directory names in the find command, but not in xargs. I just realized that as I was looking it over again.
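For reference, here is a minimal sketch of the usual space-safe pattern. The earlier command isn't quoted in this thread, so this assumes it was find piped into xargs rm. The names top-dir and '*.tmp' are placeholders, and everything happens in a throwaway mktemp directory.

```shell
#!/bin/bash
# Throwaway sandbox so the demo never touches real files.
demo="$(mktemp -d)"
trap 'rm -rf "$demo"' EXIT

# Placeholder tree: a directory and files with spaces in their names.
mkdir -p "$demo/top-dir/sub dir"
touch "$demo/top-dir/sub dir/old file.tmp" "$demo/top-dir/keep me.txt"

# -print0 ends each found name with a NUL byte instead of a newline,
# and xargs -0 splits only on NUL, so spaces stay part of the name.
# -r (GNU xargs) skips running rm at all if find matched nothing.
find "$demo/top-dir" -name '*.tmp' -print0 | xargs -0 -r rm --
```

Using find's own -exec rm -- {} + would also sidestep the problem, since no text splitting happens between find and rm.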

skaarl@feddit.nl · 2 days ago

That was it! Thank you. I got rid of over 150 files in 127 directories with a lot fewer clicks than through the file explorer.

Luckily this time there were no spaces in the names. Spaces in names are a PITA at my stage of learning, and I’m never sure if I should use ' or ".

Btw, new challenge in the edited original post, if you haven’t yet exhausted your thinking quota for the day lol.
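On the ' versus " question, here's a quick sketch run in a throwaway directory with a made-up filename. Roughly: single quotes take everything literally, while double quotes still expand variables but stop spaces from splitting a name apart.

```shell
#!/bin/bash
# Work in a throwaway directory so the demo files don't land anywhere real.
cd "$(mktemp -d)" || exit 1
name="my file.html"

echo '$name'    # single quotes: nothing expands, prints: $name
echo "$name"    # double quotes: $name expands, space kept, prints: my file.html

touch $name     # unquoted: the space splits it into two files, "my" and "file.html"
touch "$name"   # quoted: one file, "my file.html"
ls -1           # three entries: file.html, my, my file.html
```

A workable rule of thumb: single quotes for patterns you want passed along untouched (like '*.html' for find), double quotes around variables.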

harsh3466@lemmy.ml · 1 day ago

I’d tackle this as two separate commands rather than trying to come up with a one-liner that handles moving and renaming with incrementing numbers all in one go. It could be done in one go, but to accomplish that I’d probably write a bash script rather than a one-liner.

So we’ll start with moving them, which is the easy part using find:

find html-dir -name '*.html' -exec mv {} /path/to/new/dir/ \;

Let’s look at each argument:

find - calls find

html-dir - is the path of the top level directory that contains all of the html files and/or other directories that contain the html files you want to move. This command will go recursively through all directories it finds inside of this top level directory.

-name - this flag tells find to use the following expression to match filenames

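'*.html' - This is the expression being used to match filenames. It’s using globbing to match any file that has the .html extension. The single quotes stop your shell from expanding the glob itself, so find receives the pattern as-is and does the matching.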

-exec - this is kind of like xargs, but built into find. It allows you to take what has been found and run a command on it.

mv - the move command to move the files to the new directory

{} - this is the placeholder that find replaces with each result (the path of the file it found)

/path/to/new/dir/ - the directory you want to move the .html files to

\; - this terminates the -exec action. The backslash escapes the semicolon so the shell passes it to find instead of treating it as the end of the whole command. You can use two different termination characters, ; and +. The semicolon tells -exec to run the command once for each result, while the plus tells -exec to pass as many results as it can to a single run of the command.
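A small sandboxed sketch of both terminators. One catch with +: find requires {} to be the last thing before the +, so the destination can't come after it. GNU mv (which Linux Mint ships) has -t (--target-directory) to name the destination first. The html-dir and new-dir names are placeholders.

```shell
#!/bin/bash
# Throwaway sandbox with placeholder directories.
demo="$(mktemp -d)"
trap 'rm -rf "$demo"' EXIT
mkdir -p "$demo/html-dir/sub" "$demo/new-dir"
touch "$demo/html-dir/a.html" "$demo/html-dir/sub/b.html" "$demo/html-dir/sub/c.html"

# \; : find runs a separate mv for every match (mv .../a.html .../new-dir/)
find "$demo/html-dir" -name 'a.html' -exec mv {} "$demo/new-dir/" \;

# + : find hands many matches to ONE mv call. {} must sit right before +,
# so GNU mv's -t names the destination up front: mv -t new-dir/ b.html c.html
find "$demo/html-dir" -name '*.html' -exec mv -t "$demo/new-dir/" {} +

ls "$demo/new-dir"    # a.html, b.html and c.html
```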

Once all of the files are in the new folder, cd into that folder and use this to number the files:

n=1; for file in *; do pad="$(printf '%02d' $n)"; mv "$file" "$pad$file"; ((n++)); done

This is a one-liner similar to what you were trying to do.

n=1; - sets our first number for the enumeration with a semicolon to terminate and set up the next command

for file in *; do - Sets up our for loop. file will be the variable in the loop representing each file as it loops. * expands to every file in the directory you are currently in, which here should only be the .html files you just moved. ; do terminates the setup and opens the loop.

pad="$(printf '%02d' $n)"; - You may or may not want to use this. I like to pad my numbers so they list properly (without padding, 10a.html sorts before 2a.html). This creates a new variable inside the loop called pad. Using command substitution, the printf command will take our $n variable and pad it. The padding is dictated by the '%02d', which will turn 1 into 01, and so on. If there will be more than 99 .html files and you want padding to three digits, change '%02d' to '%03d'. Again the semicolon terminates and sets up the next command.

mv $file $pad$file; - this is pretty straightforward. It takes the file being acted on in the loop, as represented by the variable $file, and moves it within the same directory with the mv command, giving it a new enumerated name by concatenating the $pad and $file variables into a single filename. So a.html would become 01a.html. This command is also terminated with a semicolon.

((n++)); - this is an easy way in bash to increment a numeric variable, again terminated by a semicolon.

done - ends the loop, and the command.
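Here's a sketch of that loop tried in a scratch directory on made-up names, one of them with a space. It quotes "$file" and "$pad$file" and narrows the glob to *.html so any non-html files in the folder are left alone; those two tweaks matter once spaces or other files show up.

```shell
#!/bin/bash
# Throwaway directory with made-up names, one containing a space.
cd "$(mktemp -d)" || exit 1
touch a.html b.html "c d.html"

n=1
for file in *.html; do               # the glob expands once, before the loop starts
    pad="$(printf '%02d' "$n")"      # 1 -> 01, 2 -> 02, ...
    mv -- "$file" "$pad$file"        # quotes keep "c d.html" as one name
    ((n++))
done

ls -1    # 01a.html, 02b.html, 03c d.html
```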

Let me know how it works for you!

skaarl@feddit.nl · edit-2 18 hours ago

😍😍😍😍 thanks harsh!! I’ll study this and report back. I really appreciate your time and effort. There is a lot to learn here, and actually the padding is on my list of things to learn, so thank you sensei! As to your question about the integers, the files need to be in alphabetical order before getting the integer prepended to them, so like

folder1/file1
folder1/file2
folder2/file1
folder2/file2
folder2/file3

turns to

folder1/001file1
folder1/002file2
folder2/003file1
folder2/004file2
folder2/005file3

that way in the folder when it’s all said and done I’ll have

001file1
002file2
003file1
004file2
005file3

I’ll check if your method works out of the box for that or if I have to use the sort function like you showed me last time. Thanks again!

harsh3466@lemmy.ml · 17 hours ago

It won’t work the way you need it to as is, since the find command will first move all of those files together into the same folder, and if there are identical names in different folders, like:

folder1/file1
folder2/file1

Then it will throw errors and/or overwrite files.

I’m at work so I can’t look at this right now, but I’ll look at it later when I get home.

skaarl@feddit.nl · 16 hours ago

Yea, I just came back to say this. Since cp overwrites by default (I tried copying first before trying moving), and each folder has files named index001, index002, etc., the folder where they all go ends up with only ONE index001.html, ONE index002.html, etc. So I think what I need to do is find each html file, rename it with a unique integer in front of the name, and move it to the common folder.

harsh3466@lemmy.ml · 5 hours ago

Okay, took me a while to write everything up. The script itself is pretty short, but it’s still much easier to do in a script than to try to make this a one-line command.

I tested this by creating a top level directory, and then creating three subdirectories inside it with a different number of html files in each, and it worked perfectly. I’m going to break down exactly what’s going on in the script, but do note the two commented commands. I set this script up so you can test it before actually running it on your files. In the breakdown of the script I’m going to ignore the testing command as if it were not in the script.
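For anyone wanting to reproduce that kind of test, a sketch of building a similar tree. The folder names and file counts are made up; ~/test and ~/done are the paths the script uses. It reuses index00N names across folders on purpose, so the duplicate-name problem shows up.

```shell
#!/bin/bash
# Hypothetical test layout: three subdirectories with different numbers of
# html files, plus the destination directory. mkdir -p and touch never
# delete or overwrite existing content.
mkdir -p ~/test/folder1 ~/test/folder2 ~/test/folder3 ~/done
touch ~/test/folder1/index00{1..2}.html    # index001.html index002.html
touch ~/test/folder2/index00{1..3}.html    # index001.html .. index003.html
touch ~/test/folder3/index001.html
```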

The script:

#! /bin/bash

# script to move all html files from a series of directories into a single directory, and rename them as they are moved by adding a numeric indicator
# to the beginning of the filename, while keeping files of the same folder grouped.

fileList="$(find ~/test -name '*.html' | sort)"

num=1

while IFS= read -r line; do

	pad="$(printf '%03d' $num)"

	#The below echo command will test the script to ensure it works. 
	#The output to the terminal will show the mv command for each file.
	#If the results are what you want, you can comment this line out to disable the command, or delete it entirely.

	echo mv "$line" ~/done/"$pad${line##*/}"

	#This commented out mv command will actually move and rename all of the files.
	#When you are certain based on the testing that the script will work as desired
	#uncomment this line to allow the command to run and move & rename the files.

	# mv "$line" ~/done/"$pad${line##*/}"
	((num++))

done<<<"$fileList"

The breakdown of the script is in the reply comment. Lemmy wouldn’t let me post it as one comment.

harsh3466@lemmy.ml · 5 hours ago

The breakdown:

#! /bin/bash - This line, called the shebang, heads every bash script and tells your system which interpreter to use when you run the script directly (e.g. ./script.sh). In this case we’re using /bin/bash to execute the script.

fileList="$(find ~/path/to/dir/with/html/files -name '*.html' | sort)" - What this command is doing is creating a variable called fileList using command substitution. Command substitution encloses a command in "$()" to tell bash to execute the command(s) contained within the substitution group and save the output of the command(s) to the variable. In this case the commands are a find command piped into a sort command.

find ~/path/to/dir/with/html/files -name '*.html' | sort - So this is the command set that will execute and the output of this command will be saved to the variable fileList.

find - invokes the find tool for finding files

~/path/to/dir/with/html/files - This tells find where to start looking for files. You’ll want to change this to the top level directory containing all the subdirectories with the html files.

-name - tells find to match file names using the expression that follows

'*.html' - This is the expression find will use to match files. In this case it’s using globbing to find all files with a file extension of .html. The * glob means to match any number of characters (including no characters at all). So when you combine the glob with the file extension for *.html, you’re telling find to find any files that have any characters at all in the filename as long as that filename ends with .html

| - The pipe sends the output of the find command, which in this case is a list of the matching files with their full paths, to the sort command, which sorts them alphanumerically. That sorted list is then saved to the variable fileList.
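To see what ends up in fileList, here's a sketch using the folder1/folder2 example from earlier in the thread, built in a throwaway directory. The files are created out of order on purpose; after sort, each folder's files come out together and in order, which is what keeps the numbering grouped by folder.

```shell
#!/bin/bash
demo="$(mktemp -d)"
trap 'rm -rf "$demo"' EXIT
mkdir -p "$demo/folder1" "$demo/folder2"

# Created out of order on purpose; find itself doesn't promise any order.
touch "$demo/folder2/file3.html" "$demo/folder1/file2.html" \
      "$demo/folder2/file1.html" "$demo/folder1/file1.html" "$demo/folder2/file2.html"

fileList="$(find "$demo" -name '*.html' | sort)"
printf '%s\n' "$fileList"
# .../folder1/file1.html
# .../folder1/file2.html
# .../folder2/file1.html
# .../folder2/file2.html
# .../folder2/file3.html
```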

num=1 - Here we’re creating a variable called num with a value of 1. This is for adding sequential numbers to the files as they are moved from their source directory to the destination directory.

while IFS= read -r line; do - This script uses a while loop to process each item saved to the fileList variable. Within the while loop, the moving and renaming of the files will take place.

while - Invokes the while command. What while does is repeat all of the commands contained inside the loop as long as the given condition is true. In this case the condition is the read command: as long as read manages to read another line from fileList, the loop keeps processing. When there are no more lines to read, read fails, the condition becomes false, and the loop terminates.

IFS= - This sets the Internal Field Separator to an empty value, just for this read command. IFS is the set of characters the shell uses to split text into separate items; by default it is Space, Tab, and Newline. When read runs with the default IFS, it strips any leading and trailing spaces or tabs from each line. Setting IFS to nothing keeps each line exactly as it is, so a filename that starts or ends with a space survives intact. Taking the input one line at a time is something read does on its own, because it stops at each newline.

read -r - The read command does what it says and reads the input given, which in this case is our variable fileList (we’ll get to how we make the while loop read the variable below). The -r flag tells read not to treat the backslash character as an escape character; if it finds one anywhere, it just keeps it as a normal part of the input.

line - this is the variable that each line will be saved in as they are worked through the loop. You can call this whatever you want. I just used line since it’s working through lines of input.

; do - The semicolon terminates the while setup, and do opens the loop for the commands we want to run using the input saved in line.
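A quick sketch of what IFS= and -r buy you, on made-up input whose second line starts with two spaces and contains a backslash.

```shell
#!/bin/bash
# Made-up input: the second line starts with two spaces and has a backslash.
input=$'/dir/a.html\n  b\\c.html'

echo "IFS= read -r keeps each line exactly:"
while IFS= read -r line; do
    printf '[%s]\n' "$line"    # [/dir/a.html] then [  b\c.html]
done <<< "$input"

echo "plain read trims the spaces and eats the backslash:"
while read line; do
    printf '[%s]\n' "$line"    # [/dir/a.html] then [bc.html]
done <<< "$input"
```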

pad="$(printf '%03d' $num)" - This is another variable being created using command substitution. What the command in the substitution group does is take the num variable and pad it with zeroes to be a three digit number.

printf '%03d' $num - This is the command that runs inside the substitution set.

printf - calls the printf command, which is similar to echo in that it prints output to standard out (the terminal), but printf has more options for manipulating that output.

'%03d' $num - '%03d' is a format specifier that you use with printf, and $num is the argument it formats. The % indicates that what follows is a format specifier. The 0 is a flag that tells printf to pad with zeros; without it, printf pads with spaces. The 3 is the minimum width, in this case formatting the number to at least three digits, and the d indicates that what’s being formatted is an integer. The combined format specifier %03d then formats the argument that follows it, the variable num, so 1 becomes 001.
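Since padding was on your to-learn list, here are a few printf calls to try; the numbers are arbitrary.

```shell
#!/bin/bash
num=7
pad="$(printf '%03d' "$num")"
echo "$pad"                  # 007

printf '%03d\n' 1 42 123     # printf reuses the format for each argument: 001, 042, 123
printf '[%3d]\n' 7           # no 0 flag, so it pads with spaces instead: [  7]
printf '%03d\n' 1234         # the width is a minimum, nothing gets cut off: 1234
```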

mv $line /path/to/dest/"$pad${line##*/}" - This command actually moves and renames the file.

mv - Invokes the mv command to move some files

$line - this argument is the file to be moved. Because it’s the variable line, it expands to the full file path of the html file currently being worked on in the while loop.

/path/to/dest/"$pad${line##*/}" - This argument is the destination and new name of the file being moved. The path part is pretty self-explanatory: replace it with the path to your desired destination. The filename bit needs its own explanation.

"$pad${line##*/}" - This is the bit that renames the file. What this is doing is concatenating (joining) two different variables to create the new filename. $pad is the num variable formatted as a zero-padded three-digit number, which is added at the beginning of the new filename. ${line##*/} is the line variable modified using parameter expansion, which is indicated by the curly braces.

Inside the curly braces you have three parts. The first is the parameter, which in this case is the line variable. This is followed by the special characters that modify the expansion, and then, following the modification characters, the pattern to be matched.

Without modification, the line variable will look something like: /full/path/to/file1.html.

For the purposes of this mv command, we don’t want the full file path in the destination argument. If the destination argument were /path/to/dest/$pad$line, the expanded result would be /path/to/dest/001/full/path/to/file1.html. That’s obviously no good.

What we want here is /path/to/dest/001file1.html.

To get that, we use the ## modification characters, which take the pattern that follows, match it against the beginning of the parameter, and delete the longest stretch that matches. (A single # would delete the shortest match instead.)

After the ## special characters is the actual pattern to be matched, which is */. This is a glob pattern, the same kind of pattern as the '*.html' used with find, not a regular expression. As mentioned above, the * means to match any number of characters (including none), so on its own it means literally:

“Match any characters at all, any number of times”.

The final character of the pattern is the literal /. Both characters together (*/) mean:

“Match any characters at all, up to and including a forward slash”.

When you combine the ## modification characters with the */ pattern like so: ##*/ it means:

“Starting from the beginning, find the longest stretch of characters that ends in a forward slash (everything up to and including the last forward slash), and delete it”.

When you put all of that together you have ${line##*/}, which will take /full/path/to/file1.html and output file1.html.

Finally, when you combine the pad variable with it like so: "$pad${line##*/}" (enclosed in double quotes to ensure correct expansion), you get 001file1.html as the new name for the html file as it is moved to the new destination.
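A sketch you can paste into a terminal to watch the expansion work on the example path from above; the path itself is made up.

```shell
#!/bin/bash
line="/full/path/to/file1.html"
pad="$(printf '%03d' 1)"

echo "${line##*/}"        # ## removes the longest match of */ from the front: file1.html
echo "${line#*/}"         # # removes the shortest match (just the first /): full/path/to/file1.html
echo "$pad${line##*/}"    # 001file1.html
```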

((num++)) - Is a nice easy way to increment a number variable in bash

done<<<"$fileList" - Is three parts. done indicates the end of the while loop. <<< is a here-string, a close relative of the heredoc (<<), which lets you feed a string straight into a command’s standard input. Here it feeds the contents of fileList into the while loop, where read takes it one line at a time. Finally, there is the variable holding the chunk of text we want fed into the while loop, which is the fileList variable (double quoted to ensure it is passed exactly as stored, including spaces and other nonstandard characters).
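A small sketch of the here-string in action, with a made-up three-line list. The comparison with a pipe is a side note not from the script: it shows one reason people often prefer this form over find ... | while ..., since variables changed inside the loop keep their values afterwards.

```shell
#!/bin/bash
list=$'one\ntwo\nthree'

# Fed with a here-string: the loop runs in the current shell,
# so changes to num are still there after done.
num=0
while IFS= read -r line; do
    num=$((num + 1))
done <<< "$list"
echo "here-string: $num"    # here-string: 3

# Same loop fed through a pipe: bash runs it in a subshell by default,
# so the count is lost when the loop ends.
num=0
printf '%s\n' "$list" | while IFS= read -r line; do
    num=$((num + 1))
done
echo "pipe: $num"           # pipe: 0
```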

And that’s the script! Let me know how it works for you.

harsh3466@lemmy.ml · 16 hours ago

Agree. I think for that a bash script will be a better approach. I’ll be off work in a few hours. I’ll take a look then.

harsh3466@lemmy.ml · 1 day ago

Okay, for the HTML files, do the integers need to be applied to the files in a certain order, like, alphanumerically from the folders they’re coming from?