| |
About
Some days you get the data, some days the data gets you …
|
|

Photo by jeremybarwick
What is a Location Independent Freelance Lifestyle? I would say that, for some, location refers to where the money is; others will understand it to be where they hang their mouse at night.
Either way, freedom is the fuel that fires the freelancer’s soul and location independence is the tinder.
Knocking Down Doors

Photo by recursion_see_recursion
I call myself the Digital Tinker. Historically, tinkers went from town to town, fixing small items before moving on. Talk about a nomadic lifestyle!
My entrepreneurial spirit is suited to location independence. I’ve always been involved in sales, from SA8 soap, to term insurance; from legal plans to real estate; from radio advertising to website development. Each of these endeavors had one common thread: I had to go to the prospects. So, I am comfortable going “on-site”.
However, as a life-style, I’m still growing into it.
Knocking Down Windows

Photo by jglsongs
It all began the day my computer died. My son rebuilt a desktop computer from spare parts, but I didn’t like being disconnected from the software and data that define my freelance work. I began researching portable applications.
There is a wealth of open source and commercial software that perform tasks at least as well as my desktop versions. I also discovered that quite a few of the desktop applications could be used in a portable environment.
Encouraged by these two facts, I began to transfer everything onto an 8GB (gigabyte) USB key. Next, I backed up the key onto a portable USB hard drive. Finally, I installed the portable version of RoboForm onto the key, which contained all the passwords to websites I visit.
Once the crucial files were set up, I simply treated my replacement computer as a public machine. It turned out to be good practice, as I began to spend a lot of time doing client research at the library.
Knocking Down Walls

Photo by robinfensom
One casualty of the nascent mobile movement was my ODesk account. Similar to sites like Elance, Rent-A-Coder and Guru, ODesk brings buyers and sellers together for the purpose of exchanging digital services for money. Unlike my account at Guru.com, ODesk requires desktop monitoring software. I cheerfully gave that up, as I don’t like being monitored, anyway!
For too many years, I relied on creaky versions of Microsoft® Office. Prospective clients usually had newer versions. I struggled to adapt to new gadgets, such as the accursed ribbon and new file formats. Eventually, I managed to become more productive.
Email and file backup took the hardest hit. These things are like toothbrushes: I can’t use yours and new ones take some getting used to. In fact, just this past week (with Sharon’s help!), I completed the transition from desktop email to Google Apps. Gmail will require a whole new outlook on email (sorry, I couldn’t resist.)
Backup is like a religion to me. I’m still searching for enlightenment in this new nomadic age.
Opportunity Knocks

Photo by Ben Zvan
I use what I call the Three Ps to build my freelance business:
- Profile
- Portfolio
- Promote
I have profiles scattered on so many websites I need a profile of my profiles! Thankfully, Google does that.
My Google Profile
A portfolio shouldn’t be a dry list of completed projects. I try to keep readers engaged, even if they couldn’t care less about the actual project.
My Engaging Portfolio
Promotion is a daily, natural part of networking. I use FaceBook, Twitter, forums and blogs to discuss business and personal topics. I only follow one rule: no blatant advertising. However, if you ask …
By doing the little things, I’ve been able to establish credibility, remain visible and build momentum.
Don’t Knock It ‘Til You Try It

The location independent lifestyle may have recently sprung up on me, but the mindset and infrastructure have been in place for years. I’ve already talked about the mindset on my main blog. The infrastructure is simply the human and computer networks upon which I rely. At least once a month, the human network feeds me a referral. The computer networks make it possible for me to act on them.
|
There is a time to cut costs and a time to ensure quality at any price.
Right now is not the right time for either service extreme.
Like many small business owners, my company is experiencing a crunch. While looking for ways to remain relevant and competitive, I joined oDesk.com,which is a marketplace for freelance programmers and writers.
oDesk has a very structured hourly-centric system that exposes the service extreme:
Offshore freelancers are willing to work for $4.00 per hour
New freelancers and those with lower feedback scores
Freelancers with high feedback scores are charging a premium for their services.
The group of freelancers in the middle constitutes a thick cluster of fierce bidding competition.
All of this is good for the freelance marketplace. The cream rises, the crud curdles and the best of the middle joins the elite after building some feedback.
Take a look at the freelance question from the viewpoint of your business. Can you afford to cut costs, taking a chance on offshore talent? On the other hand, can you really afford to pay a premium for the elite freelancer? oDesk clearly states that you are paying for time, not a product. If you get poor results, you’ll just have to hire someone else to fix the mess. If you went for the low-end solution, chances are, you won’t want another bargain basement solution. If you were disappointed with the boutique brand, you’ll wish you had gone to the middle. Either way, the fix means yet another expense.
What if you could present your problem without risk? Instead of paying for time, why not pay a fixed price for quality?
My Digital Tinker service initial consultation will ensure that I can even do the job. The proposed solution is one of a menu of fixed cost services. You won’t pay a dime until the job is done.
How does this benefit you?
You get to pre-qualify the service.
You control the cost.
You control your cash flow.
Your business needs every dollar to be well-spent. Contact me for help with your data management challenges.
Download the Digital Tinker Data Cleansing Packet.
|
The following testimonial was solicited from Rachel Tolbert, a Fingerprint Identification Supervisor at the Philadelphia Police Department.
1. What one thing has the database(s) helped you accomplish that was more difficult before?
The databases that were created by Mitchell Allen have helped me accomplish a lot in a shorter amount of time. The one I use the most often creates an alphabetical listing of various pieces of information that are needed to fax to other departments. It creates a neat, orderly, professional account of the information that is required.
Also an added advantage of one of the databases is it does the math for you. Occasionally, the tallied numbers may be incorrectly totaled. As long as you verify the information was input correctly, the program is going to give you the necessary correct totals.
These figures are used biweekly and monthly, and decisions are made based on these accurate figures, so this database is producing very important information.
2. How has your work day benefited from using the database?
I can do a preview on it prior to printing, to determine if all my information was input correctly. This has also helped me to send the most accurate information. It has been an invaluable tool over the years. I use it every single day.
3. Would you recommend Mitchell to others?
Definitely. I hate to think about how long it would take us everyday to organize the information that has been done for us- as we input it onto the page he created just for our purposes. It really makes this one aspect of our job so much easier.
4. Why would you recommend him?
I would recommend him because he is thorough, competent and very professional.
Read more testimonials
|

Some software ideas are best left in the ad hoc bin. This is another way of saying that, just because I can create something, I don’t necessarily want to.
My friend Erin was looking for a quick way to determine which forums on her At Home Resources website were performing well and which topics should be abandoned.
I did an unsolicited analysis - just for fun.
Once I had completed the project, Erin was so happy with the reports, I started thinking about ways to make this service available to others. Just as quickly, I turned away from the idea. There were two main reasons for not forging ahead:
1. I’m quite sure that forum software already provides this information
2. I didn’t think it would be sustainable
So, the software sits in my archive folder, a dim reminder of a concept that did its job and doesn’t need to be developed further.
Four Microsoft ® Access queries, one report and no front end form. A classic example of a quick-and-dirty solution.
|
Here’s a change of pace. In this article, I’ll show you how I used Microsoft ® Word, Microsoft ® Excel and Notepad to clean up an old blog post for republishing.
Project Purpose
For my personal blog, I decided to do a links posts, which is used to highlight older posts that new visitors may enjoy and long-time readers can revisit. There are some SEO benefits, so that incentive was there as well.
On my old blog, I regularly maintained one such post. It was called the Ambient Orb Blog Index and, rather than just having a few selected posts, I put all of them in this one post! I wouldn’t recommend that today, but back then, the website had no other way of effectively categorizing and showcasing blog posts.
Anyway, to give a sense of context for my new links post, I wanted to share the original Ambient Orb Blog Index. Of course, that meant that I would have to remove all of the dead links.
Examining the Raw Data
The original post was saved in Microsoft ® Word format.
Here are a few lines from the file:
<li><font size=2 color=blue><b><a href=”http://www.writingup.com/mitchell_allen/am_i_an_artist_or_an_engineer”>OOO</a>: Am I an Artist or an Engineer?</b></font>
<li><font size=2 color=blue><b><a href=”http://www.writingup.com/mitchell_allen/beethoven_hearing_aids_and_suicide”>OOO</a>: Beethoven, Hearing Aids and Suicide</b></font>
<li><font size=2 color=blue><b><a href=”http://www.writingup.com/mitchell_allen/a_tortured_history_of_mass_storage”>OOO</a>: A Tortured History of Mass Storage</b></font>
As each link was unique, I couldn’t do a mass search and replace. My first idea was to “bracket” the data that I wanted to delete. I figured that would allow me to use Microsoft ® Word’s wildcard search. Bracketing is a term I use to describe using a weird character to enclose or delimit (separate) substrings within a longer string. Usually, bracketing involves searching for a common substring and either appending the weird character to the end of the common substring or, if the common substring is unneeded, replacing that common substring altogether.
First Cleansing
For this project, I wanted to get rid of bits of unneeded text while bracketing the links. So, the first thing I did was to change every occurrence of
<a href=”http://www.writingup.com/mitchell_allen/
to
~
To complete the bracket and get rid of even more useless text, I changed every occurrence of
“>OOO</a>:
to the same ~.
Here are the modified lines at this point:
<li><font size=2 color=blue><b>~am_i_an_artist_or_an_engineer~Am I an Artist or an Engineer?</b></font>
<li><font size=2 color=blue><b>~beethoven_hearing_aids_and_suicide~Beethoven, Hearing Aids and Suicide</b></font>
<li><font size=2 color=blue><b>~a_tortured_history_of_mass_storage~A Tortured History of Mass Storage</b></font>
Now, I had to get rid of the bracketed text.
Weak Soap
In order to understand the wildcard search and replace, let me digress a moment. Look at this nonsense text:
frogface
frogs in the field
swollen river flows peacefully
Let’s perform a wildcard search and replace. To change every occurrence of text bracketed by the letter f to the (equally nonsensical) string consisting of fishf, we do the following:
1. Click Edit on the menu and select Replace … (or press Ctrl+H)
2. In the Find what: box, type f*f
3. Press the More button
4. Check Use wildcards
5. In the Replace with: box, type fishf
6. Press the Replace All button
This is the result of the wildcard search and replace:
fishface
fishfield
swollen river fishfully
The asterisk (*) is a wildcard operator that takes the place of zero or more arbitrary chararacters. When you don’t know or care what’s being matched, asterisk is your friend.
On the first line of nonsense text, the asterisk matches rog. On the second line, it matches rogs in the . Finally, on the third line, it matches lows peace.
Keep in mind, the search and replace function does things one at a time, so we don’t wind up with our asterisk matching ace and ield swollen river.
By the way, if you’d like to see all of the wildcard search operators, check the wildcard box and click the Special button.
Now, back to the project. I had fully expected to be able to replace ~*~ with a character string of my choice (actually, just deleting it would have been best, but I didn’t want to confuse you). However, no matches were found! I never figured out what I did wrong, but I must have introduced a spurious space character to the end of the search string because, when I recreated the files for this article, the wildcard search worked perfectly. So, instead of being done with this project, I had to try another tack.
After muttering a few choice curses, I went to my industrial-strength solution.
Strong Soap
Microsoft ® Excel has powerful conversion utilities. Once I had bracketed my data, I could literally show Microsoft ® Excel how to read it and convert it into a spreadsheet.
First, I pasted the slightly scrubbed data into Notepad and saved the file. I didn’t want any formatting characters to be imported into the spreadsheet. Next, I opened Microsoft ® Excel, clicked Data and selected Get External Data.
This invoked a powerful wizard that walked me through choosing the Notepad file, selecting the ~ as a delimiter and choosing the destination on the spreadsheet.
In seconds, I had a worksheet that looked like this:

Click to view full size
Now that the dead links were separated, all I had to do was concatenate column A with column C and place the result in column D.
=A1 & C1
Here is a screen shot, with column B hidden:

Click to view full size
Squeaky Clean
The only thing I had to do after all that, was to paste column D into a new Microsoft ® Word document and fix the syntax of the font tag by adding quotes (which weren’t needed on the old blog software).
The result can be viewed at my blog: Ambient Orb - Blog Index.
After years of manually processing documents, I wrote a program that greatly simplifies - and speeds up - tedious tasks such as this one.
Visit www.parsermonster.com for more information.
|
Jumbalaya Jam
Before I learned about HTML in 2003, I learned how to get rid of the tags that define it.
One of my passions is designing games and I wanted to recreate Boggle® on the computer.
My wife loved to play an online game called Jumbalaya, which is similar to Boggle®.
I decided to borrow her word lists for my Boggle® dictionary, so that it would reject nonsense words.
Because she played so many games in one sitting, I wrote a program in Visual Basic to read through the Internet Explorer Cache, locate the pages, read each one and extract just the words.
Real Estate Ads
A more recent project involved sifting through online advertising for real estate. I wanted a neat list that I could store in my own database.
This time, I wrote a program to copy the actual web page (just like selecting a portion of text to paste into another location).
I felt guilty because I thought I was doing something called scraping a website. However, when I finally learned the true meaning of scraping, which is to republish other people’s website content as if it were your own, I realized that I wasn’t doing anything unethical.
Parsing Power
The real power behind both mini-projects was the parsing engine.
To parse means to separate a string of data into parts.
It is very useful to parse an HTML file into its tags and values. Your web browser does this in order to display the web page on your screen.
It is also useful to be able to parse a string of characters, in order to find specific patterns. Google does this when it separates your search into keywords, by finding the spaces and punctuation between the words.
You don’t always have to use a programming language: Microsoft® Access and Microsoft® Excel provide string manipulation functions that can help you parse strings.
Keep in mind, though, these two programs don’t always agree on the names or even how to use the function!
For example, suppose you want to see if a string contains a comma. If it does, you want to know where it starts (this is called the string position).
Here is the way each program expects you to use their function:
Access: InStr([start, ]string1, string2)
string1 is the string being searched
string2 is the smaller string being sought
start is an optional parameter which tells the function where in string1 to begin looking for string2
Excel: =SEARCH (find_text,within_text,start_num)
within_text is the string being searched
find_text is the smaller string being sought
start_num is an optional parameter which tells the function where in within_text to begin looking for find_text
If the search function does not find the string being sought, Access returns a 0, but Excel gives you a #VALUE error.
This means that you have to go through hoops to get Excel to parse arbitrary strings.
To understand my suggested solution below, open up Excel and type Digital,Tinker into cell A1.
In cell A2, type or paste the following formula:
=SEARCH(”,”, A1)
When you press Enter, A2 will display 8.
If you remove the comma from A1, A2 will display #VALUE.
Experts Only
I am always learning and never claim to know all the answers. I hope that, if you consider yourself an expert, you’ll do two things:
- Share your techniques with us
- Be opened-minded about unusual techniques you find by other readers and by me
Now, consider the reason the Excel function fails and think of the easiest way to avoid it.
Here’s what I think: just add the smaller string to the end of the string being searched!
Adding one string to another string is called concatenating and is symbolized by the ampersand ( & )
In cell A3, type or paste the following formula:
=SEARCH(”,”, A1 & “,”)*(SEARCH(”,”, A1 & “,”) <LEN(A1))
When you press Enter, A3 displays a 0.
Add the comma to A1, press Enter and both A2 and A3 will display the string position correctly.
This works because of a little-known feature of nearly all programming languages. Normally, when you evaluate an equation, the result is TRUE or FALSE.
However, if you use that result in another equation, Excel treats it as a 1 if TRUE or a 0 if FALSE.
To see this, type or paste just the evaluation into A4:
=(SEARCH(”,”, A1 & “,”) <LEN(A1))
A4 will display either TRUE or FALSE, depending on whether a comma is in A1.
Now, change A4 to multiply that evaluation by 1:
=(SEARCH(”,”, A1 & “,”) <LEN(A1))*1
Now you’ll see a 1 or a 0 in A4, depending on the presence of a comma in A1.
Looking again at cell A3, you can now see that by multiplying the result of the search by the result of the evaluation, we’re basically saying, “multiply the search result by either 0 or 1″.
As you may recall from elementary school mathematics, any number multiplied by 0 equals 0, while any number multiplied by 1 equals that number.
(In some programming languages, such as Visual Basic and Access, -1 is used. You could negate the final result or use the ABS( ) function to change it into a positive number.)
As for the mysterious LEN( ), Excel returns the length of the string of text in A1. So, the evaluation does the SEARCH, compares the result to the length of the original string and returns 1 (TRUE) if the string position of the comma is less than the length of the string in A1. That would mean that a comma was found before the one we concatenated to the end!
By the way, in Excel, you are always working with a copy of something in your formulas. We never really concatenate a comma to the end of A1. We are using a copy of A1 in our other cells.
If you come up with anything shorter than this suggestion, I’d love to hear it!
After years of manually processing documents, I wrote a program that greatly simplifies - and speeds up - tedious tasks such as this one.
Visit www.parsermonster.com for more information.
|
One of the best things you can do with your contact list is to create a standard for capitalization.
Whether your data is spread across multiple fields or jammed into one (shame on you!), a parsing function can make your reports look more professional.
proper(”John D. ROCKWELL, S. Kinnickinnic Av and Lake Pkwy, milwaukee, Wi”)
becomes
John D. Rockwell, S. Kinnickinnic Av & Lake Pkwy, Milwaukee, WI
Way back in 1999, I modified a function that I found in the Microsoft ® Access Solutions Database.
If you look closely, you’ll see several items relevant to Philadelphia and law enforcement.
This function is written for Access but, if you know how to add functions in Excel, it should work with spreadsheet data, too.
The basic idea is to read each character and look at what comes before and after it. Based on what it finds, the function will capitalize the character and move on to the next. The default behavior is to capitalize the first letter of each word.
In addition to the basic functionality, you can add all kinds of rules for capitalizing location indicators (Rd., Ln, Drive), states (VA instead of Va) and organizations (FBI, CIA, NBC).
For example, I added a rule to change and to & - making sure not to change Anderson to &erson!
There is one limitation in my version:
Names such as O’Brien and Wilson-Smythe are properly capitalized, but MacDonald is changed to Macdonald, and van Buren to Van Buren
If your data has many similar names that are not properly capitalized, just add a new rule to look for them!
This tip is published below for educational purposes only. If you use it, make sure you only work with sample data!
I didn’t doctor it for this blog: what you see is how I used it. It doesn’t win any clarity awards, but it gets the job done!
‘ *** ProperNouns
‘
‘ *** This Module can be exported to MODULES.MDB
‘
‘ The ProperNouns Module contains functions which clean up names and addresses.
‘
‘ ———————————————————————————–
Function Proper(x)
‘ Capitalize first letter of every word in a field.
‘ Use in an event procedure in AfterUpdate of control;
‘ for example, [Last Name] = Proper([Last Name]).
‘ Names such as O’Brien and Wilson-Smythe are properly capitalized,
‘ but MacDonald is changed to Macdonald, and van Buren to Van Buren.
‘ Note: For this function to work correctly, you must specify
‘ Option Compare Database in the Declarations section of this module.
‘ *** This routine was copied from the Solution Database and modified
‘ *** to ignore ordinal numbers (1st, 2nd, 3rd, 4th).
‘ *** Additionally, the abbreviation for states are capitalized.
‘ ***
‘ *** Revision History
‘ *** 09/09/1999 - apostrophes are checked to prevent Upper case ”s”
‘ *** 11/20/1999 - spaces are added between most symbols - fixed ordinal bug
‘ *** 11/29/1999 - ”And” converted to ”&”
‘ *** 01/10/2000 - fixed ”And” bug, which changed Anderson to &erson
‘ *** 09/12/2000 - double quotes converted to single quotes (fixes John”s)
‘ *** 09/12/2000 - Capitalized specific three-letter strings (see org)
‘ *** 02/22/2002 - Fixed bug in three-letter capitalization. Added misc. two-letter strings
‘ *** 04/08/2002 - Created a ”Scrub” function to eliminate extraneous characters
Dim raw As String, temp As String, c As String, OldC As String
Dim aa As String, ab As String
Dim i As Integer, j As Integer, k As Integer
If IsNull(x) Then
Exit Function
End If
Const symbols = ”#&/()” ’ these should have spaces around them
Const digits = ”0123456789″
Const ordinal = ”StNdRdTh”
Const state = ”AL AS AR AZ CA CO CT DC DE FL GA HI ID IL IN IA KS KT LA ME ” _
& ”MD MA MI MN MS MO MT NE NV NH NJ NM NY NC ND OH OK OR PA RI ” _
& ”SC SD TN TX UT VT VA WA WV WI WY ”
Const org = ”ATF CJC DAO DEA FBI IAD III IRS JFK PHA PNC PPL UPS VII ”
Const misc2 = ”II IV IX VI GP”
’ Add missing spaces (e.g.: change Phila.,PA to Phila., PA) and (P/O to P / O)
raw = Scrub(x)
For i = 1 To Len(raw)
aa = Mid$(raw, i, 1)
ab = Mid$(raw, i + 1, 1)
If InStr(symbols, aa) Then
temp = temp & aa
If ab ” ” Then
temp = temp & ” ”
End If
ElseIf InStr(symbols, ab) Then
If aa ” ” Then
temp = temp & aa & ” ”
Else
temp = temp & aa
End If
Else
temp = temp & aa
If InStr(1, ”0123456789′ -”, aa) = 0 Then
If (aa ”z”) And ab ” ” Then
If Not (aa = ”.” And ab = ”,”) Then
temp = temp & ” ”
End If
End If
End If
End If
Next i
’ Initialize OldC to a single space because first
’ letter needs to be capitalized but has no preceding letter.
OldC = ” ”
For i = 1 To Len(temp)
c = Mid(temp, i, 1)
If c >= ”a” And c <= ”z” And (OldC ”z”) _
And Not (OldC = ”‘” And c = ”s”) Then
Mid(temp, i, 1) = UCase(c)
End If
OldC = c
Next i
raw = Trim(temp) & ” ”
temp = ”"
For j = 1 To Len(raw) - 2 ’ Check ordinals
aa = Mid$(raw, j, 1)
ab = Mid$(raw, j + 1, 2)
temp = temp & aa
If InStr(1, digits, aa) > 0 Then
x = InStr(1, ordinal, ab)
If x > 0 Then
temp = temp & LCase(Mid$(ordinal, x, 2))
j = j + 2 ’ skip ordinal in Temp
End If
End If
Next j
For k = 1 To Len(state) Step 3 ’ Check States
x = InStr(1, temp, Mid(state, k, 2))
If x > 2 Then
If Mid(temp, x - 2, 2) = ”, ” Then
If x + 2 > Len(temp) Or _
(Mid(temp, x + 2, 1) = ” ”) Then
temp = Left(temp, x - 1) & UCase(Mid(temp, x, 2)) & Mid(temp, x + 2)
End If
End If
End If
Next k
For k = 1 To Len(org) Step 4 ’ Check Organizations and other 3-letter initials
x = InStr(1, temp, Mid(org, k, 3))
Select Case x
Case Is = 1 ’ check for space BEFORE 3-letter initials
If Len(temp) = 3 Then
temp = UCase(Mid(temp, x, 3)) & Mid(temp, x + 3)
ElseIf Mid(temp, x + 3, 1) = ” ” Then
temp = UCase(Mid(temp, x, 3)) & Mid(temp, x + 3)
End If
Case Is > 1 ’ check for space BEFORE and AFTER 3-letter initials
If Mid(temp, x - 1, 1) = ” ” And _
(x + 3 > Len(temp) Or _
Mid(temp, x + 3, 1) = ” ”) Then
temp = Left(temp, x - 1) & UCase(Mid(temp, x, 3)) & Mid(temp, x + 3)
End If
End Select
Next k
For k = 1 To Len(misc2) Step 3 ’ Check miscellaneous 2-letter initials
x = InStr(1, temp, Mid(misc2, k, 2))
Select Case x
Case Is = 1 ’ check for space BEFORE 2-letter initials
If Len(temp) = 2 Then
temp = UCase(Mid(temp, x, 2)) & Mid(temp, x + 2)
ElseIf Mid(temp, x + 2, 1) = ” ” Then
temp = UCase(Mid(temp, x, 2)) & Mid(temp, x + 2)
End If
Case Is > 1 ’ check for space BEFORE and AFTER 2-letter initials
If Mid(temp, x - 1, 1) = ” ” And _
(x + 2 > Len(temp) Or _
Mid(temp, x + 2, 1) = ” ”) Then
temp = Left(temp, x - 1) & UCase(Mid(temp, x, 2)) & Mid(temp, x + 2)
End If
End Select
Next k
’ Convert And to ampersand
x = InStr(1, temp, ” And ”)
If x > 0 Then
temp = Left$(temp, x) & ”& ” & Mid$(temp, x + 5)
End If
’ process names beginning with Mc or Mac
x = InStr(temp, ” mc”)
If x Then
aa = UCase(Mid$(temp, x + 3, 1))
i = 4
If aa = ” ” Then
aa = UCase(Mid$(temp, x + 4, 1))
i = 5 ’ skip the space
End If
temp = Left(temp, x) & ”Mc” & aa & Mid$(temp, x + i)
End If
Proper = temp
End Function
After years of manually processing documents, I wrote a program that greatly simplifies - and speeds up - tedious tasks such as this one.
Visit www.parsermonster.com for more information.
|
No, this isn’t a copywriting article. It’s a plea to all office managers who send insertion text to webmasters.
This is also a horror story, so if you have a weak stomach, click away now!
Julie is the owner and publisher of the Fredericksburg Guidebook and its companion website, fredericksburgguidebook.com.
It is a quarterly publication. Advertisers pay for inclusion in the guidebook and can pay for a listing online.
Each quarter, the office manager solicits new business, while updating and deleting existing accounts.
Naturally, I get the results for updating the website.
“It was a dark and stormy night. Suddenly, my cell phone rang. Julie wanted an update to the list of sponsors on her website.”
You Want it WHEN?
The quarterly list of advertiser insertions, deletions and modifications was on its way to my in-box.
As usual, I was getting it at the last minute. The pressure was on! (I had never missed a deadline and I wasn’t about to start, now.)
I had just under a week to update the website.
This is NOT What I Asked For!
The first update I ever received from the office manager was in a Microsoft Excel spreadsheet. Cruella (not her real name) explained her system to me and I told her that I could work with that. I had written a custom database program that reads text and pumps out web pages. But for this project, I modified it to import the spreadsheet data into a database table.
Unfortunately, after just two updates, Cruella changed her system and started using Microsoft Word! So I had to re-tool my factory, so to speak. I added some widgets and doo-dads and fairy dust that allowed me to feed the required text file for this project.
Now, all I had to do was convert the Word document into the proper text format.
Are You Feeding Me A Line?
Long-time Word users know that there are three ways to generate a new line of text:
- Natural break (you just keep on typing and Word “wraps” the text)
- Manual line break (hold the Shift key down and press the Enter key)
- Paragraph break (just press the Enter key)
Well, Cruella managed to get all three of them into one document! This simply means that she had no fixed format for her data. Actually, in most cases, there was a manual line break between the ad copy and the contact information. In addition, there were usually two paragraph breaks between advertisers. To make things interesting, she inserted three paragraph breaks between categories of advertisers. For example, all of the restaurants were listed together, followed by all of the beauty salons.
That helped.
I used Word’s search and replace to convert all the manual line breaks to paragraph breaks.
Next, I converted every paragraph break to this pair of characters:
~p
This removes all the formatting information, which is stored with the paragraph mark. (Believe me, you’ll go blind alternating between 14 pt headers and 10 pt body text!)
Then, I went through the document, looking for groups of these and stamping out inconsistencies.
Once I ensured that all the advertisers were separated from each other by exactly one ~p, I made sure each category was separated by two ~p ~p.
Finally, I converted each ~p back into a paragraph break.
I pasted the whole kit and caboodle into a plain text file and fed it to my Frankenstein monster.
Hyper Text
One gritty detail that I’ve hidden from you, gentle reader, is that fredericksburgguidebook.com originally had tons of duplicated ads listed in different categories.
Julie insisted that this is what her clients demanded. If you wanted to visit a restaurant for a juicy steak, you might miss Pablo’s Ice Cream Land, who serves steak, pizzas and ice cream! Poor Pablo would have to plaster his ad in three categories!
Pablo, a fictional businessman, had nothing on the real restaurant owners, some of whom had 14 copies of the same ad throughout the site.
After that update debacle, I insisted that the office manager learn to do her own updates!
WordPress to the Rescue!
My son Sheldon tweaked a WordPress theme so that the advertisers’ categories and subcategories would show up in the sidebar.
Meanwhile, I opened up the Frankenstein monster and plopped in yet another converter. Instead of pumping out web pages, I made it pump out WordPress static pages with no HTML (except for special characters).
We ended up condensing 380 insertions to 86 unique advertisers. We manually categorized each one and uploaded the lot to fredericksburgguidebook.com/guide.
Technically Speaking
My job was only half done. If Cruella couldn’t do the updates herself, my mission would have been a failure.
So, I wrote a user guide: Manage Your Online Directory.
Cruella used it.
The new office manager uses it.
I have more free time.
After years of manually processing documents, I wrote a program that greatly simplifies - and speeds up - tedious tasks such as this one.
Visit www.parsermonster.com for more information.
|
Believe it or not, I started out in the sub-cellar of my old personal computer, cleaning up MP3 files that I had either downloaded or ripped from my CD collection. In the olden days, there was a clever website called CDDB.com (now gracenote.com) that had a database of CDs based on the number of tracks and the length of time for each track. MusicMatch Jukebox, the software I was using then, would connect to CDDB and provide the information from a newly ripped CD. If CDDB found a match, it would return all of the data for the CD!
Once I learned the method that CDDB was using to store its information, I marveled at the elegance of the service.
First of all, this was very much like giving a CD its own fingerprint, since it is highly unlikely that any two CDs would have the exact same number of tracks, with the exact minutes and seconds for each track in the exact same order! Indeed, the only time CDDB returned multiple matches was when the CD was rereleased. (That was just my experience, though. I’m sure there were false positives.)
Secondly, the user community that updated the database was very altruistic. As far as I know, the only interaction was through MusicMatch Jukebox: when a CD match was not found, I would laboriously type in the song titles and submit them to the CDDB database.
read more about CDDB on wikipedia.
During all of this activity, I learned about the extra data that gets stored along with the actual music in an MP3:
- ID3v1 - the original MP3 tag stored a few items, such as title, artist, album, track number and genre
- ID3v2 - an enhanced version allows more space for the ID3v1 items and has even more items, such as lyrics, cover art
As you might expect (or know from experience), downloaded music did not always have the correct tag information.
One of the benefits of using MusicMatch Jukebox to store my music was the ability to create a library and sort my music by ID3 tags.
Naturally, selecting a genre is subjective and I spent a lot of time changing ID3 tags.
MusicMatch Jukebox had a primitive ID3 tagging function that relied on the physical file name of the MP3. This became quite a chore, as I could never decide how best to name the files.
My first data cleansing project, therefore, was to buy a program called Dr.Tag. Since it was a dedicated application, it did a better job than the MusicMatch Jukebox software. After learning the nuances of renaming physical files based on ID3 tag information versus updating ID3 tags based on the file name, I was able to clean up my music library in a few hours.
|
|
|