brthanmathwoag

Posted on • Originally published at blog.tznvy.eu on

Hands-on Powershell: Pruning an exported VCF contact list

Recently, while hiking in the mountains, I got caught in a raincloud and got completely soaked. When I eventually arrived at the alpine hut, I found out that the power switch on my phone no longer worked. The phone would power on just fine, but once the screen turned off, it was impossible to turn it on again. Luckily, this was just enough to back up all my data while I was looking for a new phone.

When the new phone arrived in the mail and I was about to import the contact list, I remembered I had always wanted to clean it up a bit. See, back when I set up my previous phone, I logged in to my Google account, which caused the whole Gmail address book to download. Thanks, Google, I guess, but I don't even do email on my phone. Some emails were appended to existing contacts, but quite a few did not match, resulting in duplicate entries for some people and junk entries for individuals and companies that I messaged only once, 10 years ago or so.

So I thought, this would be a good opportunity to get rid of them.

I looked at the exported contact list and luckily it turned out to be just a flat text file with concatenated vCards. This is what it looked like:

BEGIN:VCARD
VERSION:2.1
N:;Lastname;Firstname;;
FN:Nickname
TEL;CELL;PREF:123456789
END:VCARD
BEGIN:VCARD
VERSION:2.1
N:Surname;Firstname;;;
FN:Firstname Surname
EMAIL;PREF:email@test.com
PHOTO;ENCODING=BASE64;JPEG:/9j/4AAQSkZJRgABAQAAAQABAAD/2wCEAAUDBAoJCAsJCQk
 ...
 LS5apuBYSw02xdyPEsRPJxMByx33iEoQZdohkFlugEIHljtYiFfNHtnp/9k=

END:VCARD
BEGIN:VCARD
VERSION:2.1
N:;Nickname1;;;
FN:Nickname1
TEL;CELL;PREF:456789123
END:VCARD
BEGIN:VCARD
VERSION:2.1
N:;Nickname2;;;
FN:Nickname2
TEL;CELL;PREF:789123456
END:VCARD

etc...

So what I needed to do was to split the text into separate vCard records, then filter out those which don't contain a line beginning with TEL;.

In PowerShell, this could be achieved with:

$in = 'unfiltered.vcf'
$out = 'filtered.vcf'

Get-Content $in `
    | Out-String `
    | Select-String -pattern '(?s)BEGIN:VCARD.*?END:VCARD' -AllMatches `
    | % { $_.Matches} `
    | % { $_.Value } `
    | ? { $_.Contains("`nTEL;") } `
    | Out-File -Encoding ascii $out

(Pardon the broken highlighting: there is currently no PowerShell highlighter in pandoc. The upcoming release will have loadable language definitions, though, so I will be able to use m0t0k1's kate-powershell then.)

Now, there are some interesting parts here. Because Select-String cannot match across multiple pipeline elements, we have to pass the Get-Content output through Out-String so that the whole file is processed as one string rather than line by line. (?s) enables single-line mode, in which . also matches newlines, just like the /s modifier in Perl regular expressions. Also, because we are processing one huge string, -AllMatches has to be set so that PowerShell doesn't stop after finding the first match, similar to /g in Perl. Finally, we need to set the encoding on Out-File explicitly. Otherwise, the file would be saved as UTF-16LE with a BOM (the default in Windows PowerShell), which, at least on my phone, caused a generic Could not import contacts error.
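To see what (?s) and -AllMatches each contribute, here is a minimal sketch that runs the same pattern over a tiny inline string instead of a real file. The variable names ($text, $pattern, $withoutS, $withS, $firstOnly) and the two sample cards are made up for illustration:

```powershell
# Two tiny sample cards joined into one string, standing in for the Out-String output.
$text = "BEGIN:VCARD`nFN:A`nTEL;CELL:1`nEND:VCARD`nBEGIN:VCARD`nFN:B`nEMAIL:b@test.com`nEND:VCARD`n"
$pattern = 'BEGIN:VCARD.*?END:VCARD'

# Without (?s), '.' does not match newlines, so no card is found at all.
$withoutS = @($text | Select-String -Pattern $pattern -AllMatches | % { $_.Matches })

# With (?s) and -AllMatches, every card in the string is found.
$withS = @($text | Select-String -Pattern "(?s)$pattern" -AllMatches | % { $_.Matches })

# With (?s) but without -AllMatches, only the first card is found.
$firstOnly = @($text | Select-String -Pattern "(?s)$pattern" | % { $_.Matches })

"{0} / {1} / {2}" -f $withoutS.Count, $withS.Count, $firstOnly.Count   # 0 / 2 / 1
```

Piping through % { $_.Matches } rather than reading .Matches directly keeps the count at zero when Select-String finds nothing.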

Now, the above code ran in under a second on my machine for a 200-some KB file, but if your contact list contains many base64-encoded photos, slurping the whole file at once and applying a regex to it might not be the best idea. Instead, we could read the file line by line and buffer the lines up manually:

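$in = 'unfiltered.vcf'
$out = 'filtered.vcf'

$curr = @()         # lines of the card currently being read
$has_tel = $false   # whether that card has a TEL; line
Get-Content $in | % {
    $curr += $_
    if ($_.StartsWith("TEL;")) {
        $has_tel = $true
    }
    if ($_ -eq 'END:VCARD') {
        # End of card: emit it only if a phone number was seen
        if ($has_tel) {
            $curr
        }
        # Reset the buffer and the flag for the next card
        $curr = @()
        $has_tel = $false
    }
} | Out-File -Encoding ascii $out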

Less clever and longer, but it probably scales better.
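Whichever variant you run, it may be worth checking how many cards survived before importing the result. A rough sketch, assuming every card starts with a BEGIN:VCARD line as in the sample above; Get-VCardCount is a hypothetical helper name, not a built-in cmdlet:

```powershell
# Hypothetical helper: counts the cards in a VCF file
# by counting its BEGIN:VCARD lines.
function Get-VCardCount {
    param([string]$Path)
    @(Get-Content $Path | ? { $_ -eq 'BEGIN:VCARD' }).Count
}

# Usage, with the file names from the scripts above:
# "{0} of {1} cards kept" -f (Get-VCardCount 'filtered.vcf'), (Get-VCardCount 'unfiltered.vcf')
```

If the kept count looks too low or too high, the filtered file can be inspected by hand before it goes anywhere near the phone.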

This post was originally published on blog.tznvy.eu
