Regular Expressions?

Eric_Chadwick · March 28, 2017, 9:18pm

I need to batch-replace all PHP calls in HTML files with simple HTML. Maybe regular expressions are the way to go?

Background: I’ve been tasked with dumbing down our Mambo- and Gallery2-generated website into vanilla HTML, as a temporary measure to stop the crazy CPU load being generated by one or more CGIs on the site. Our web designer is unavailable, and frankly I don’t want the guy touching the site ever again.

Anyhow, I used a spider to convert the PHP calls into HTML pages, so the code I need to replace is something like this:

Replace this:
<param name="filename" value="gallery2/main****.html?g2_view=core.DownloadItem&g2_itemId=928&g2_serialNumber=3&g2_GALLERYSID=9953fe8cc7591acadb87b39d2ac13d81">');
With this:
<param name="filename" value="gallery2/main****.wmv">');
Where **** is a different series of alphanumeric characters per HTML file, so it needs to remain the same, but I want to change html to wmv (or other ext), and I need to strip everything else up to the end quote.

I tried using the regex code in a batch-replace script for HTML-Kit, an HTML editor, but I can’t figure it out; I keep getting “expected this” errors.

Eric_Chadwick · March 28, 2017, 9:27pm

Currently going through this tutorial…
http://www.regular-expressions.info/tutorial.html

Hopefully I’ll figure it out before someone chimes in here.

Adam_Pletcher · March 28, 2017, 9:27pm

Regex should work well for this. A regular expression like this should do the trick: “.html(.*)”.

… then do a “sub” or “substitute” on the match. I’m not sure what language you’re using, but in Python it would go like this:

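>>> import re
>>> # Sample line; WHATEVER stands in for the part that differs per file.
>>> text = '''<param name="filename" value="gallery2/mainWHATEVER.html?g2_view=core.DownloadItem&g2_itemId=928&g2_serialNumber=3&g2_GALLERYSID=9953fe8cc7591acadb87b39d2ac13d81">');'''
>>> # Match ".html" plus everything after it on the line.
>>> reObj = re.compile('.html(.*)')
>>> # Replace the whole match with the new extension and the tag's original ending.
>>> newStr = reObj.sub('.wmv">\');', text)
>>> print(newStr)
<param name="filename" value="gallery2/mainWHATEVER.wmv">');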

If you never memorize any complicated regex syntax (like me), you can still get a long way with (.*), which just matches an arbitrarily long sub-string.
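
A quick sketch of how far that arbitrarily long match reaches, assuming Python's re module. The sample line and its trailing text are made up for illustration. By default the dot matches anything except a newline, so the greedy form runs to the end of the line, while a negated character class stops at the first double quote:

```python
import re

# Hypothetical sample line; the text after the tag is made up for illustration.
line = '<param name="filename" value="gallery2/mainABC.html?g2_itemId=928"> trailing text'

# (.*) is greedy and stops only at a newline, so it captures the rest of the line.
# The dot before "html" is escaped here so it matches a literal "." only.
greedy = re.search(r'\.html(.*)', line)
rest_of_line = greedy.group(1)

# A negated character class stops at the first double quote instead.
bounded = re.search(r'\.html([^"]*)', line)
query_only = bounded.group(1)

print(repr(rest_of_line))
print(repr(query_only))
```

Which form fits depends on whether the replacement should also rewrite whatever follows the closing quote.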

Also, there’s a little program called The Regex Coach that’s amazingly useful. It lets you type and edit regexes on the fly and interactively see how the matches work against a test string.

Eric_Chadwick · March 28, 2017, 9:27pm

Thanks Adam, that tool’s great!

Seems I need to make sure that the find/replace tool “remembers” the chars between main and .html, so it can restore them during the replace. These chars are different in each of the 100 or so files I have to process, but each refers to a specific file so I need those chars intact.

Someone mentioned using the variables ($1) ($2) etc. for storing strings in memory. Is WHATEVER a variable in Python that remembers a string, so I can put it back into my string during the replace?

Still working through that tut, it’s starting to dawn on me.

Adam_Pletcher · March 28, 2017, 9:27pm

“Whatever” was just what I typed to indicate a string. Same as **** in your original post.

You shouldn’t have to store that string anywhere in the example I gave, it should remain untouched by the replace.

What that example does, specifically, is first look for a “.html” inside your string and create a match that includes that plus everything that comes after it. That match (from “.html” onward) is then replaced with the new “.wmv…” string.

The part that comes ahead of “.html” is left alone, including the **** parts that change for each line.
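
To make that concrete, here is a minimal sketch assuming Python's re module. The sample is a shortened version of the line from the first post, with a made-up ABC123 standing in for ****. It also shows the capture-group route that was asked about: in Python, \1 plays the role that $1 plays in some other tools. Both routes should give the same result here, so the group is optional.

```python
import re

# Shortened sample line; ABC123 is a made-up stand-in for the **** part.
sample = ('<param name="filename" value="gallery2/mainABC123.html'
          '?g2_view=core.DownloadItem&g2_itemId=928">\');')

# New ending that replaces everything from ".html" onward.
NEW_TAIL = '.wmv">\');'

pattern = re.compile(r'\.html(.*)')
match = pattern.search(sample)

# Everything before the match is never touched by the substitution.
untouched = sample[:match.start()]
matched = match.group(0)

# Route 1: replace only the matched tail.
result = pattern.sub(NEW_TAIL, sample)

# Route 2: capture the changing part and put it back with \1.
with_group = re.sub(r'(main\w+)\.html.*',
                    lambda m: m.group(1) + NEW_TAIL,
                    sample)

print(untouched)
print(matched)
print(result)
```

The lambda in route 2 is equivalent to a replacement string starting with \1; it is used here only to keep the quoting readable.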

Eric_Chadwick · March 28, 2017, 9:27pm

Ah OK, I think I got it now. Regex Coach is indeed very useful.

Find:
.html[^"\n]*

Replace:
.wmv

Thanks Adam!
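
For anyone who would rather script the batch step, a rough Python sketch of the same find/replace across a folder follows. It is not what HTML-Kit does internally. The *.html glob, the UTF-8 encoding, and the function names are all assumptions. The find pattern as posted would also rewrite ordinary links to .html pages, so this version anchors on gallery2/main and treats **** as \w+, which is another assumption about the files.

```python
import re
from pathlib import Path

# Assumption: only links under gallery2/main should change; \w+ stands for ****.
LINK = re.compile(r'(gallery2/main\w+)\.html[^"\n]*')


def convert_links(text: str, ext: str = "wmv") -> str:
    """Swap .html plus any trailing query string for the given extension."""
    return LINK.sub(lambda m: f"{m.group(1)}.{ext}", text)


def convert_folder(folder: Path, ext: str = "wmv") -> int:
    """Rewrite every *.html file in folder in place; return how many changed."""
    changed = 0
    for path in sorted(folder.glob("*.html")):
        original = path.read_text(encoding="utf-8")
        updated = convert_links(original, ext)
        if updated != original:
            path.write_text(updated, encoding="utf-8")
            changed += 1
    return changed
```

Calling convert_folder on a copy of the site first would be the cautious way to try it, since the files are rewritten in place.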

Rob_Galanakis · March 28, 2017, 9:27pm

Started a wiki page on Regular expressions; don’t forget to update it when you’re done here! Good thread. I’ve always known about regular expressions but never used them, so Regex Coach will be really useful.