Using Anaconda from PowerShell

After installing Anaconda, you can run conda init to initialize settings for PowerShell, including PowerShell Core. The problem with this approach is that Anaconda is initialized every time PowerShell is launched. This is fine if you use Python every time you launch PowerShell; in my case, however, I often use PowerShell without any Python at all, so this is inefficient.

I’ve decided to create a cmdlet called Initialize-Conda. Here’s the content. Replace C:/Path/to/Conda with your Anaconda location to adapt this to your use. (Available for Windows and Linux as a Gist.)

function Initialize-Conda
{
    $CONDA_ROOT_DIR = "C:/Path/to/Conda" # Change this

    # Set the variables conda expects for the current process only
    [System.Environment]::SetEnvironmentVariable("CONDA_EXE", "$CONDA_ROOT_DIR/Scripts/conda.exe", [System.EnvironmentVariableTarget]::Process)
    [System.Environment]::SetEnvironmentVariable("_CE_M", "", [System.EnvironmentVariableTarget]::Process)
    [System.Environment]::SetEnvironmentVariable("_CE_CONDA", "", [System.EnvironmentVariableTarget]::Process)
    [System.Environment]::SetEnvironmentVariable("_CONDA_ROOT", "$CONDA_ROOT_DIR", [System.EnvironmentVariableTarget]::Process)
    [System.Environment]::SetEnvironmentVariable("_CONDA_EXE", "$CONDA_ROOT_DIR/Scripts/conda.exe", [System.EnvironmentVariableTarget]::Process)

    # Import Anaconda's own PowerShell module and activate the base environment
    Import-Module -Scope Global "$Env:_CONDA_ROOT/shell/condabin/Conda.psm1"
    conda activate base
}

The environment variable initialization looks a bit long. This is because the variable scope does not propagate outside the script scope, so I had to use .NET’s SetEnvironmentVariable to set the environment variables for the scope of the process. (Under Linux, CONDA_EXE and _CONDA_EXE should be changed to $CONDA_ROOT_DIR/bin/conda.)

It is also possible to use Add-CondaEnvironmentToPrompt to prefix the prompt with the current Conda environment. I’ve omitted this as I found it to be unreliable.

Here’s the corresponding PSD1 file. (Also available as a Gist.)

@{
    RootModule = 'Conda.psm1'
    ModuleVersion = '1.0.0.0'
    FunctionsToExport = @(
            'Initialize-Conda'
        )
    CmdletsToExport   = '*'
    VariablesToExport = '*'
    AliasesToExport   = '*'
    GUID = '23421dee-ca6f-4847-9c93-1268c629964a'
    Author = 'Hideki Saito'
    Description = 'Anaconda Initializer'
    PowerShellVersion = '6.0'
    CompatiblePSEditions = 'Core'
    Copyright = '(c) 2019 Hideki Saito. All rights reserved.'
    PrivateData = @{
        PSData = @{
            ProjectUri = ''
            ReleaseNotes = ''
        }
    }
}

Take these two files, Conda.psm1 and Conda.psd1, and place them under Documents\PowerShell\Modules\Conda (under Linux, ~/.local/share/powershell/Modules/Conda), and you should then be able to run Initialize-Conda.
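
Here’s a quick sanity check after placing the files; this is a minimal sketch, assuming the module folder is named Conda as above so PowerShell’s module auto-loading can find it:

# Confirm the module is discoverable, then initialize Conda only when needed
Get-Module -ListAvailable Conda

Initialize-Conda      # imports Anaconda's Conda.psm1 and activates "base"
conda env list        # conda commands are now available in this session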

PowerShell Core + Event Management

I have been staffing Sakura-Con Guest Relations since 2002, and throughout the years I have been experimenting with numerous technologies to support event planning. Previously I have used Org-mode and some small Python scripts; this year I have been building the system using PowerShell Core. I have already been using PowerShell for various applications, like tabulating Karaoke data, and I wanted to expand its use to this area as well.

Motivations

My goal for this year was to build a centralized solution to:

  • Provide an automated way of tabulating event schedule data and create a data set that I can work with.
  • Provide conversion from the event schedule data to other useful data types, such as calendar formats.

The reasons I picked PowerShell Core are:

  • I wanted a solution that works across multiple platforms. I use Linux and Windows, and it’s a big plus if it runs on Android. What makes this great is that PowerShell Core can run in an Android environment through an app like UserLAnd. (If you are doing this on Android, a keyboard such as this one helps a lot.)
  • I wanted something I can use offline. During the event, the internet connection may be degraded to the point it’s unusable.

Components and Setup

The first module I created is a data fetcher that retrieves schedule data from the scheduling system. The scheduling system unfortunately lacked an API that would let me retrieve the data, so I used Selenium instead.

Retrieving data using Selenium (the scheduling solution’s name is obscured for confidentiality reasons)

Obtaining the data through Selenium takes about 30 seconds; there are about 800 entries (though I regularly interact with only about an eighth of them). The Selenium module navigates to the appropriate calendar ranges through the UI and grabs the result out of the DOM. This is coded in C#. Since Selenium provides drivers for both Linux and Windows, the same module works cross-platform as well. (Probably not on Android, but if there is a driver for Android, this could potentially be done there too.)

The system exposes the following data, which is exported as an array:

   TypeName: SakuraCon.Relations.Data.Event
Name        MemberType     Definition
----        ----------     ----------
End Time    AliasProperty  End Time = EndTime
Event Title AliasProperty  Event Title = EventTitle
Start Time  AliasProperty  Start Time = StartTime
Equals      Method         bool Equals(System.Object obj)
GetHashCode Method         int GetHashCode()
GetType     Method         type GetType()
ToString    Method         string ToString()
EndTime     Property       datetime EndTime {get;set;}
EventId     Property       string EventId {get;set;}
EventTitle  Property       string EventTitle {get;set;}
Notes       Property       string Notes {get;set;}
Rating      Property       string Rating {get;set;}
StartTime   Property       datetime StartTime {get;set;}
Type        Property       string Type {get;set;}
Venue       Property       string Venue {get;set;}
Duration    ScriptProperty System.Object Duration {get=($this.EndTime - $this.StartTime);}

This is coupled with a simple types.ps1xml:

<Types>
    <Type>
        <Name>SakuraCon.Relations.Data.Event</Name>
        <Members>
            <MemberSet>
                <Name>PSStandardMembers</Name>
                <Members>
                    <PropertySet>
                        <Name>DefaultDisplayPropertySet</Name>
                        <ReferencedProperties>
                            <Name>Event Title</Name>
                            <Name>Venue</Name>
                            <Name>Type</Name>
                            <Name>Start Time</Name>
                            <Name>End Time</Name>
                            <Name>Duration</Name>
                            <Name>Notes</Name>
                        </ReferencedProperties>
                    </PropertySet>
                </Members>
            </MemberSet>
            <AliasProperty>
                <Name>Event Title</Name>
                <ReferencedMemberName>EventTitle</ReferencedMemberName>
            </AliasProperty>
            <AliasProperty>
                <Name>Start Time</Name>
                <ReferencedMemberName>StartTime</ReferencedMemberName>
            </AliasProperty>
            <AliasProperty>
                <Name>End Time</Name>
                <ReferencedMemberName>EndTime</ReferencedMemberName>
            </AliasProperty>
            <ScriptProperty>
                <Name>Duration</Name>
                <GetScriptBlock>($this.EndTime - $this.StartTime)</GetScriptBlock>
            </ScriptProperty>
        </Members>
    </Type>
</Types>

Usually, I export this data into CLIXML; this way, I can retrieve the contents later as needed. PowerShell provides a convenient cmdlet to do this.

$schedule | Export-Clixml schedule.xml

Importing this is easy:

$schedule = Import-Clixml schedule.xml

This allows offline access to the data, as the exported XML is essentially a snapshot of the obtained data.

From this, I can easily export the data as CSV (used for tabulating schedule information) and as an iCalendar file that can be imported into Google Calendar.
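
As a rough sketch of both conversions (the property names come from the Event type above, while the output file names and the minimal set of iCalendar fields are just assumptions for illustration):

# CSV export is a one-liner thanks to Export-Csv
$schedule | Export-Csv schedule.csv -NoTypeInformation

# A very small iCalendar conversion; only a handful of VEVENT fields are emitted
$ics = @("BEGIN:VCALENDAR", "VERSION:2.0")
foreach ($ev in $schedule) {
    $ics += "BEGIN:VEVENT"
    $ics += "UID:$($ev.EventId)"
    $ics += "SUMMARY:$($ev.EventTitle)"
    $ics += "LOCATION:$($ev.Venue)"
    $ics += "DTSTART:$($ev.StartTime.ToString('yyyyMMddTHHmmss'))"
    $ics += "DTEND:$($ev.EndTime.ToString('yyyyMMddTHHmmss'))"
    $ics += "END:VEVENT"
}
$ics += "END:VCALENDAR"
$ics | Set-Content schedule.ics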

Something I like about this data structure is that I can use Where-Object to retrieve the desired information. If I want all the events that happen in room 6C, I would query:

$schedule | ?{ $_.Venue -eq "6C" }

I’ve combined this with the following script:

function Get-ScNextEvent {
    param(
        [parameter(Mandatory = $true, ValueFromPipeline = $true)]
        $Schedule,
        [parameter(Mandatory = $false)]
        [uint] $Hour = 1
    )
    # Events that start from now and finish within the next $Hour hours
    $result = $Schedule | Where-Object { ($_.StartTime -ge [DateTime]::Now) -and ($_.EndTime -le [DateTime]::Now.AddHours($Hour)) }

    return ($result | Sort-Object StartTime)
}

function Get-ScCurrentEvent {
    param(
        [parameter(Mandatory = $true, ValueFromPipeline = $true)]
        $Schedule
    )
    # Events that have already started and have not yet ended
    $result = $Schedule | Where-Object { ($_.StartTime -le [DateTime]::Now) -and ($_.EndTime -ge [DateTime]::Now) }

    return ($result | Sort-Object StartTime)
}

These will retrieve the upcoming events and the events currently taking place.
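
For example, against a previously exported snapshot (a minimal usage sketch):

# Load the snapshot, then ask what is happening now and within the next two hours
$schedule = Import-Clixml schedule.xml

Get-ScCurrentEvent -Schedule $schedule
Get-ScNextEvent -Schedule $schedule -Hour 2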

Other PowerShell Core Applications

The schedule is not the only place I have utilized PowerShell Core. Another area is generating form letters. Since the letter source is LaTeX, it is a matter of passing the personalized arguments through parameters. And since PowerShell can convert a CSV into a data structure, the list of names is maintained as CSV.

function process {
    param(
        [Parameter(position = 0, mandatory = $true)]
        $Name,
        [Parameter(position = 1, mandatory = $true)]
        $Identifier
    )
    # Switch $OutputEncoding for the native TeX tools (see the encoding note below)
    $OutputEncoding = [System.Text.Encoding]::GetEncoding('shift_jis')
    $docGuid = [guid]::NewGuid().ToString()

    # Inject the personalized name via \scguestname, then convert the DVI to PDF
    uplatex -kanji=utf8 --no-guess-input-enc -jobname="Welcome_$Identifier" "\newcommand\scguestname{$Name}\input{2019_welcome.tex}"
    dvipdfmx "Welcome_$Identifier"

    # Clean up intermediate files and restore the encoding
    Remove-Item @("Welcome_${Identifier}.aux", "Welcome_${Identifier}.log", "Welcome_${Identifier}.out", "Welcome_${Identifier}.dvi") -ErrorAction SilentlyContinue
    $OutputEncoding = New-Object System.Text.ASCIIEncoding
}

$names = Import-Csv welcome.csv

foreach($item in $names)
{
    $currentloc = (Get-Location).Path
    $identifier = $item.Name -replace " ","_"
    Write-Host "$identifier"
    $fileExist = Test-Path (Join-Path -Path $currentloc -ChildPath "Welcome_${identifier}.pdf")
    Pop-Location
    if($fileExist -eq $False)
    {
        process -Identifier $identifier -Name $item.JapaneseName
    }
    else {
        Write-Host -BackgroundColor Red "File exists, please delete the PDF file if you really need to update this file."
    }
}

Because of the way the command line arguments are passed, this is the area where I struggled running things under Windows (because of the way the frontend handles character encoding) and had to use Windows Subsystem for Linux (WSL) to generate the Japanese letters; but since PowerShell Core is available for both Linux and Windows, the same components and scripts are used unmodified.

This will take about 30 seconds to generate 20 letters.

Conclusion

PowerShell Core provided a cross-platform and consistent environment to support data wrangling within the limited area of relations tasks I was handling.

I am planning to improve the system to support schedule conflict detection and workload verification, as well as staffing.

Should we give Firefox another chance?

Mozilla Firefox vs. Google Chrome is a topic that comes up once in a while, though recently the debate has been dwarfed by Chrome’s close to 60% market share. Recently there was an article called It’s time to give Firefox another chance.

The article emphasizes that the next version of Firefox, Quantum, will be a major release for Firefox.

I have been using Chrome as my primary web browser for a long time, whereas before that I used Firefox. But recently, I find myself using Firefox more often.

Here are my thoughts on this topic from a few different perspectives. With this article, I am not asserting that one browser is better than the other, but laying out a foundation so they can be compared from time to time.

Platform Support

Chrome and Firefox both support Windows, Linux, and Mac, so they are on par from this perspective.

Feature Comparisons

Browser Sync

Chrome supports syncing via Google Sync, and Firefox via a Firefox Account. They offer similar functionality, and both offer encrypted sync. Perhaps Firefox’s sync is a little more comprehensive, as it supports history syncing while Chrome seems to be limited in this regard.

Security

A good number of security issues have been discovered in both Chrome and Firefox (and it is important that they are addressed in a timely manner, too!). Discussing these could be an article of its own, so here I am focusing on security features.

Chrome has supported sandboxing for a while, and the most recent versions of Firefox support it as well. Given Chrome’s more robust multi-process design (more on that later), Chrome perhaps has a bit of an edge over Firefox in this area.

One thing I like about Firefox that Chrome lacks is support for a master password, which lets you lock the password store.

Firefox also has a useful extension called Firefox Multi-Account Containers, which isolates session cookies and makes it easier to confine certain sites (like Facebook) to reduce tracking.

Chrome has recently moved toward hiding security information from the browser interface, perhaps to make it more user friendly. One illustration of this problem is that they make you jump through some hoops (via the developer tools) to look up certificate information. While this can be turned back on through experimental features, it is a bit concerning that they are hiding this information for the sake of user friendliness, if that is indeed the reason.

Operating System Integration

Chrome uses the operating system’s own certificate management where available, which makes it easy to integrate in environments where internal certificates (not a best practice, but it happens!) or internal Active Directory authentication are used.

Firefox, in contrast, uses Network Security Services (NSS), which is essentially a security store independent of the operating system. This makes it less integrated with what is configured through the system. While configuring Active Directory authentication for Firefox is certainly possible, it is not a simple one-click process.

In this regard, Chrome seems to integrate a little better with the operating system.

Google Service Integration

This is a somewhat silly comparison, since Chrome is designed by Google, who also owns those Google services. One major blocking factor is whether Chromecast is involved, as at least from the desktop there is no practical way to control it outside of Chrome.

Other than that, aside from some services that heavily expect Chrome, most Google services should work in both browsers.

Mobile Versions

With its extension support, Firefox for Android (also known as Fennec) feels like a mobile port of the desktop version rather than a scaled-down one, which is something I can’t help noticing every time I use Chrome on mobile.

Internal Designs

Chrome was designed from the ground up with multi-process use in mind, whereas this development is relatively recent in Firefox. While they work similarly, Chrome perhaps has a slight stability advantage in this regard. Multi-process support in Firefox, called Electrolysis, can be disabled when incompatible features (like accessibility) or extensions are present. This problem usually comes from the older type of extensions, and will perhaps go away soon once Firefox 57 only supports WebExtensions.

Other Features I Love

Firefox features a reader mode. This mode is very useful when reading long articles, as it removes a lot of clutter from the document. (It is supported on the mobile version as well!)


On Characteristics of Machine Learning

In my previous article, Do Neural Networks Dream of Pokémon?, I inferred that it was not a very effective use of machine learning technology, and I wanted to follow up on that aspect.

First of all, machine learning can be defined as technology that allows a machine to determine an outcome without being explicitly programmed to cover all the cases. Generally, in the context of normal application programs, programmers have already specified what to do when certain things happen. For instance, if you type a URL into your web browser, it displays the page at that URL, because the programmers of your web browser programmed it to do that.

Therefore, with Pokémon elements, where all the combinations of elements are already known, it is more reliable to simply program all the cases (and Pokémon is indeed programmed to do so).
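
To make the contrast concrete, here is a tiny sketch in PowerShell; the few matchups encoded here are only examples, not the full chart. A plain lookup table answers every known case exactly, with no learning involved.

# A small slice of the effectiveness chart as a plain lookup table
$effectiveness = @{
    "Fire->Grass"      = "Super Effective"
    "Water->Fire"      = "Super Effective"
    "Electric->Ground" = "No Effect"
}

function Get-Effectiveness($Attack, $Defense) {
    $key = "$Attack->$Defense"
    # Anything not listed falls back to the default outcome
    if ($effectiveness.ContainsKey($key)) { $effectiveness[$key] } else { "Normal" }
}

Get-Effectiveness Fire Grass    # Super Effective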

Unlike the case above, machine learning is effective in cases like these:

  • The number of input parameters and their combinations is large, and it is not practical to program for all of them.
  • There are unknowns and ambiguities in the expected input.
  • The rate of change and the characteristics of the input are very subtle. (This is related to the characteristics of the input parameters, since a lot more data would be required to determine a certain outcome.)

For example, the self-driving car is a very difficult problem, because it is not possible to explicitly program the virtually infinite cases it has to deal with. (For instance, say the self-driving car has to determine whether someone is standing on the road; it is not realistic to program for all the situations, which can involve varying weather, lighting conditions, where the person is standing, and how they are moving.)

However, there are ways such ambiguities can be reduced, for example through traffic infrastructure such as traffic signals and communication between cars. While current self-driving technologies focus on co-existence with today’s cars and environment, the opposite approach of revising the traffic infrastructure should also be taking place.

Google’s CEO once said, “It’s a bug that cars were invented before computers,” and if he meant that, had computers come first, the traffic infrastructure would have been designed accordingly, I think he was right on.

Back to Pokémon: there are 18 elements, each of which can overlap with up to two, so it is not a large amount of information to process and can be programmed without too much effort. However, if this became hundreds or thousands, with varying overlaps, it would become a very difficult problem to program. (Even then, it is possible to program for their patterns, so it is not something that has to be programmed one by one.)

The Pokémon neural network I experimented with the other day is a very simple case, but image recognition and other advanced recognition tasks are mere extensions of it.

Democratization of machine learning is certainly a big topic for the near future, and I intend to continue experimenting with it.

Do Neural Networks Dream of Pokémon?

One thing I wanted to experiment with in TensorFlow (and neural networks) is some simple numerical processing, and I decided to put it to the test.

I was looking for a subject, and then I noticed something about Pokémon Sun & Moon. There is a Festival Plaza feature in the game where players can play a set of minigames. In one of them, the player attempts to match the “most effective” element given a single element or a set of elements. For example, given Bug and Grass, the answer is Fire, which gives Super Effective and earns you points. There are four classes of effectiveness; from none to most effective they are No Effect, Not Very Effective, Normal, and Super Effective.

There is a truth table for how this is laid out, but I wanted to see if a learning-from-mistakes approach could predict the outcome of this minigame.

Using a dozen festival tickets, I collected around 80 log entries. The next quest was to research how this data could be normalized for testing. I am not yet very familiar with how TensorFlow works in this regard, and with so much focus on image recognition it is surprisingly hard to find information about processing numerical data. Then I stumbled upon an existing effort by mtitg, which uses TensorFlow to see if it can predict the survivors of the Titanic based on multiple parameters.

mtitg’s example uses many parameters, 8 to be exact, but in my case I will be using three. mtitg’s code is a great starting point, as it covers how textual data can be prepared and converted to numerical values for TensorFlow to process.

I have adapted mtitg’s code to work with my test data.

Long story short, with my limited testing on 80 entries, the best accuracy I could get was about 50%. I think the reasons are as follows:

  • Not enough variety of data; since it was obtained through a manual process, and considering the fair complexity of Pokémon’s elements, the data represented is certainly not an exhaustive set.
  • Too many “Normal” outcomes. Normal is the default for most outcomes that are not affected by certain elements (for example, the fire element for bug is normal), so the data tends to revert to Normal, which provided more examples where the outcome is Normal than the other three classes.
  • Perhaps a neural network is not a very good approach for this problem; simple logistic regression and/or Bayesian algorithms might work better.

Conclusion

As I wrote earlier, since the actual element pairing data exists, there is no practical reason for this attempt; it’s really for learning and fun, after all.

With further optimization and research, TensorFlow and machine learning methods have great potential for making sense of data and providing added value from datasets that may already be present. Machine vision and self-driving cars are very cool applications of machine learning technologies, but we shouldn’t forget about adopting this technology on the personal computers we already have.

Jupyter the Python Based Notebook System

There is a tool called Jupyter that I started using fairly recently. Its characteristics are:

  • It is based on Python; it was originally called IPython, which provided an interactive Python environment.
    • It now supports more than Python, and is thus developed under Project Jupyter. IPython, however, is still developed to provide the interactive Python environment, which Jupyter uses for its Python execution.
  • For formatting, it is possible to use Markdown (and LaTeX!) in the documentation.
  • Code can be executed within the notebook.

and more. (I really only use this with Python, and I cannot say much about the parts where other languages are used.) I personally like:

  • It can use many features of Python. For instance, libraries such as TensorFlow can be used. Depending on the purpose, there are libraries like SciPy and NumPy for advanced calculation and plotting, SymPy to use it as a CAS (Computer Algebra System) for symbolic math, and Pandas and Python’s built-in libraries to interact with databases.
  • Notebooks can be saved and shared. There is nbviewer, which enables people to view notebooks right in the browser, and services like GitHub use it for Gist, which can display notebooks.
  • Code can be saved as Python, which makes it a development environment that can execute Python expressions interactively. Written notes are exported as comments.

I have installed it on both Linux and Windows, and found Anaconda to be an easy way to install it. (Anaconda uses it as a base for its commercial platform, and additional features are available for purchase, making it a scalable platform for those who need these modules and support and have the budget for it. It is, however, a bit too expensive for personal use.) On Linux, you can just use pip to install Jupyter on top of an existing Python installation. I will write more details as I feel like it.
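
For reference, on a machine that already has Python, the pip route is as simple as the following (run from any shell):

# Install Jupyter into an existing Python environment, then start the notebook server
pip install jupyter
jupyter notebook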

Why 28-backspace Vulnerability is not so Serious?

There has been news circulating regarding the 28-backspace vulnerability in the Linux bootloader Grub2.

Why is this not a serious problem? Because relying on a bootloader for your security is utterly stupid to begin with. If someone has physical access to your machine, they can easily bypass any limitations set by a bootloader. It is a very simple feat to create a thumbdrive with a bootloader that has no such authentication in place, whether this vulnerability exists or not. Therefore, the existence of this vulnerability makes no difference from a security standpoint.

To protect a system against this type of issue, you’ll have to take steps like:

  • Limit physical access to the device.
  • Encrypt the root partition.

If the only thing this Grub2 feature provides is an illusion of security, maybe we are better off without it. (If someone is so concerned about giving access to the rescue shell so easily in casual operation, then remove the rescue shell from the Grub2 installation. If the system administrator ever needs one, they can just boot from external media with rescue shell access.)

T-mobile Roaming in Japan

I was traveling in Japan earlier, from March 23rd to March 30th. This was my first time traveling with my phone set to roam. My last trip before that was precisely 7 years, 1 month, and 24 days earlier, and I did not roam back then for two reasons: first, it was cost prohibitive, and second, I did not have a phone capable of roaming. At the time I was using a T-Mobile MDA, essentially a rebranded HTC Wizard, which was only EDGE capable, which basically meant no roaming was available in Japan. (Japan is one of the few countries that never saw a GSM deployment.)

Fast-forward 7 years, 1 month, and 24 days: now I have a Nexus 5, which supports LTE and UMTS. This meant I was finally capable of trying out roaming in Japan. Coincidentally, as part of T-Mobile’s Un-Carrier initiative, they have made free roaming (albeit with some limitations) available to Simple Choice plan subscribers, which gave me a chance to try it out.

The premise is that I get data and text roaming free of charge (with data locked to 128Kbps) and calling at $0.20/minute. The latter pricing is actually not too bad, as some Japanese domestic plans come very close to or even exceed it by a large margin. (For example, Softbank would charge 20 yen, roughly 17 cents, per 30 seconds on their White Plan, and DOCOMO would also charge up to 20 yen, though generally more like 11 to 15 yen, that is 9 to 12 cents, per 30 seconds. In Japan, receiving a call is free since the caller picks up that bill, but when roaming, both making and receiving a call incur charges, just like domestic calls in the US.)

Getting off the plane, I waited for a while until the phone received a signal. Actually, the phone force-restarted, but that probably had nothing to do with roaming; perhaps it was related to an Android 5.1 issue. Once that was over, it was a fairly smooth ride. The phone received a signal, showing JP DOCOMO as the carrier. Then I received the following two messages from the number 156:

Free T-Mobile Msg: Welcome to Japan. Unlimited text incl with your global coverage. Talk $0.20/min. More info http://t-mo.co/tc

Free T-Mobile Msg: Unlimited web included as part of your global coverage. To purchase high speed data please visit: http://t-mo.co/4G-Data

After a while, I also received the following from the number 889:

Free T-mobile Msg: For faster web browsing, you can purchase a high speed data pass at: http://t-mo.co/4G-Data

The data pass, which I did not purchase, comes in three increments: 100MB for $15.00 (expires in a day), 200MB for $25.00 (expires in a week), and 500MB for $50.00 (expires in two weeks). Again, I did not purchase these, so I can’t speak of the experience of using those add-ons. They are still bargains, considering a non-Simple Choice plan would incur $15.00 per MB. While connected to the Japanese network, the phone was locked to UMTS; I am not sure whether that would remain the case with these optional packages, or whether purchasing one of them would unlock access to the LTE network.

Pulling the IP address of the device showed that it was in T-Mobile USA’s IP range, which was not surprising considering the packets still go through their access point, even though they are piped through the Japanese network. At 128Kbps I was worried that my data experience would suck; however, it held up fairly well, especially after I turned off most background photo backups. It handled things like Facebook posts and Instagram fairly well.

Messaging applications naturally did not have any issues. One thing I noticed is that when I jumped from one carrier to another (the networks I could roam on were DOCOMO as well as Softbank), and occasionally when emerging from a no-signal area into coverage, for some reason it took a while for the phone to realize it had a data connection. This intermittent connection issue was somewhat annoying, but not critical for most of my applications.

Google Maps worked and helped a lot in navigating the massive network of Tokyo train systems. The browsing experience was not too bad either; this was probably also helped by the fact that Google Chrome compresses data for me in the background.

Softbank and DOCOMO mobile cell towers -- my device probably roamed on both of those towers at some point!

In any case, having some form of connectivity was way better than having nothing, and my first try of T-Mobile’s free roaming was a very pleasant one. Now I can’t wait to go back again! 🙂