Wednesday, November 5, 2014

Hybris Commerce workspace settings for Eclipse/STS IDE


Recently I took up the task of setting up a Hybris Commerce development environment, and in the process I had to set up the Hybris workspace on my Eclipse and STS IDEs. Hybris comes with a ton of extensions and modules, and I ended up with at least 30 different projects in my workspace just to resolve the build errors. With the additional projects came the requirement of increased heap memory; otherwise STS and Eclipse would either freeze or crash during the Ant build phase with an out-of-memory error.

So here are a couple of steps that will help you maintain a clean Hybris Commerce workspace.
There is a very detailed document on the Hybris wiki covering recommended Eclipse and STS settings, so treat the steps below as an addendum to the official Hybris documentation.

64-bit JVM

No matter which IDE you use, it is important to make sure your system is running a 64-bit version of the Oracle JVM. A 32-bit JVM caps the usable heap at roughly 2 GB, so to break this ceiling we need a 64-bit JVM. You can validate this by running the "java -version" command; output like the following indicates a 64-bit JVM.

$java -version
java version "1.7.0"
Java(TM) SE Runtime Environment (build 1.7.0-b147)
Java HotSpot(TM) 64-Bit Server VM (build 21.0-b17, mixed mode)

64-bit IDE and Java Heap Setting (Eclipse or STS)

Make sure you download the 64-bit version of the Eclipse or STS IDE. Once you have extracted and installed the respective IDE, edit the eclipse.ini or STS.ini file to increase the max heap size; the file below from my toolkit setup can serve as a reference.
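The screenshot from my setup is not reproduced here; as an illustration, the relevant STS.ini/eclipse.ini entries could look like the following (the exact values are assumptions; size them to your machine):

-vmargs
-Dosgi.requiredJavaVersion=1.6
-XX:MaxPermSize=512m
-Xms512m
-Xmx4096m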




Java Heap Setting for the Ant Task

The heap setting defined for the IDE in the previous step is allocated to the IDE only. Something most of us misunderstand here is that this setting does not automatically apply to processes launched from within the IDE; for instance, if we launch an Ant build task, its heap is still capped at 2 GB.

To increase the max heap for Ant within the IDE, you have to set the VM arguments for the Ant build in the Run -> External Tools -> External Tools Configurations dialog. Click the JRE tab and set the usual -Xmx and -Xms parameters as needed.
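The dialog screenshot is not reproduced here; as an illustration, the VM arguments box on the JRE tab could hold something like the following (the exact values are assumptions; size them to your build):

-Xms512m -Xmx4096m -XX:MaxPermSize=512m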



Restart the IDE and perform an Ant build; monitor the heap usage from the Task Manager and you should notice the IDE crossing the 2 GB threshold now.


Sunday, November 2, 2014

Head first Chef


As a beginner it is quite challenging to figure out where to begin with Chef. In this blog series on Chef I attempt to share my learning path, and hopefully this will help others as well. Chef has evolved considerably in the last two years, and so have the tools surrounding this technology; recently the folks at Chef did a great job of putting together best-of-breed tools and technologies into an SDK referred to as ChefDK. In this first part we will walk through the ChefDK installation, the creation of simple cookbooks, and spinning up a virtual machine in under 5 minutes. Yeah, that's all it takes to get a head start with ChefDK.

Installation
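The original post embedded the download links here; in short, grab the ChefDK package for your platform from the Chef downloads page and install it with the system package manager. A minimal sketch for an RPM-based system (the file name is an assumption; use the package you actually downloaded):

# after downloading the ChefDK package from the Chef downloads page:
sudo rpm -Uvh chefdk-*.rpm
# verify the toolchain is on the PATH
chef --version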

Usage

The Chef client can operate in two modes.

Local Mode – simulates a full Chef server instance in memory; any data that would have been saved on the server is saved locally. This mode is used for rapid Chef recipe development, and it is the one we will use for our Chef learning.

Client Mode – in this mode it is assumed that we have a Chef server running on another system; this mode is used in production or production-like environments.

Exercise 1: Automating VM Creation 

"chef" utility was released along with the ChefDK and this streamlines various workflows associated with chef development. We will be using this utility for our cookbook generation.

>> chef generate app test_vm

This command generates the basic template for cookbook development; it creates a folder named test_vm with quite a few files and sub-folders in it. As a beginner we can focus on just a couple of files.

Take a look at the generated .kitchen.yml file and edit it as follows; we are going to spin up a CentOS VM, hence the platform name "centos-7.0".
Refer to the Chef docs at https://docs.getchef.com/config_yml_kitchen.html for more details on the various parameters used in the .kitchen.yml file.

---
driver:
  name: vagrant

provisioner:
  name: chef_zero

platforms:
  - name: centos-7.0

suites:
  - name: default
    run_list:
      - recipe[test_vm::default]
    attributes:

Now type the following command; it should list the instances this cookbook supports, and in this case you should see centos in the output.

>> kitchen list
Instance          Driver   Provisioner  Last Action
default-centos70  Vagrant  ChefZero     <Not Created>

Create the virtual machine:
>> kitchen converge
Once the machine is created you should be able to log in from the command prompt:
>> kitchen login


Exercise 2: Automating VM Creation + Apache Server Setup

In the previous exercise we did not really write a cookbook; all we did was tweak the generated .kitchen.yml file to spin up a virtual machine.
We will now edit the recipes/default.rb file to write our first cookbook, which installs an Apache web server.


execute "update Packages" do
  command 'yum list updates'
end

package 'httpd' do
  action :install
end

service 'httpd' do
  action [ :enable, :start ]
end

cookbook_file "/var/www/html/index.html" do
  source 'index.erb'
  mode '0644'
end

Create "index.erb" under templates/default folder, we will use this as the welcome page for the Apache server.

Hello World!! from <%= node['hostname'] %>
Create the virtual machine:

>> kitchen converge

Access the HTTP server running inside the newly created virtual machine; since we set up a port forward (see the note below), we should be able to reach it from the host machine at http://localhost:8080/
You should see a page with the following content:
"Hello World!! from default-centos-70"

In this blog we looked at the basics of ChefDK setup and a few basic examples of cookbook creation. In the next set of articles we will look at more advanced cookbook authoring, such as using a community cookbook and creating custom cookbooks.


Reference

ChefDK - https://docs.getchef.com/#chef-dk-title
Test Kitchen - http://kitchen.ci/


Thursday, July 3, 2014

Google blocks custom chrome extensions

Google cited security as the main driver behind this, but on the forums you can already read tech enthusiasts ranting against Google forcing everyone to host their plugins on the Chrome Web Store through this trick.
https://sites.google.com/a/chromium.org/dev/developers/extensions-deployment-faq

I think there is some merit in the security concern; however, Google could have included an option for users to opt in to installing custom extensions. With the latest security update they have closed the door on custom extensions that could previously be distributed outside of the Chrome Web Store.

I think a lot of intranet Chrome extensions fall into this category: why would a company want to host something on the Chrome Web Store if the extension is meant to be used only within the organization?

Anyway, knowing Google, we are left with limited options for custom extensions.

Option 1: Host your extension on the Google Chrome Web Store :)

Option 2: Uninstall the extension and re-install it through a simple drag and drop of the .crx file; the extension will work until you close and reopen your browser.

Option 3: Switch to Firefox extensions :)
https://blog.mozilla.org/addons/2014/06/05/how-to-develop-firefox-extension/

Sunday, April 27, 2014

Hosting sites for free on Github

In this blog I would like to explore how we can use GitHub Pages to host websites for free. And not just hosting: GitHub also makes it easy to manage your pages using the lightweight prose.io CMS.

Our software stack will be as follows:

Github Pages hosting
http://prose.io/ - a lightweight CMS to edit pages
Google Analytics
Disqus for blog post comments
At a minimum you need to buy a domain name if you would like to avoid accessing your site via the GitHub Pages URL; otherwise you can access the site at http://USERNAME.github.io/
Prerequisite (Optional)

I have decided to manage my blog using Jekyll, but this is not mandatory; you may create a basic HTML site and host it on GitHub. GitHub internally uses the Jekyll engine, hence you have the luxury of coding the pages in Jekyll as well.

Since in this example we will be building a blog, and Jekyll makes it very easy to build a blog-aware site, first install Ruby and Jekyll.

Install RubyGems followed by Jekyll; on Ubuntu I had to run the following commands:

sudo apt-get install ruby rubygems
sudo gem install jekyll

It is a good idea to begin with a template. I decided to use jekyll-bootstrap-3, which is a Bootstrap 3 clone of Jekyll-Bootstrap:

root@search:/home/search/git# git clone https://github.com/dbtek/jekyll-bootstrap-3 hariinfo.github.io

I got the following error while installing Jekyll without root privileges, so make sure you run the gem install with sudo:

ERROR:  While executing gem ... (Errno::EACCES) Permission denied - /var/lib/gems

Prepare your system to host a Jekyll-powered blog locally

$ mkdir jekyll_workspace
$ cd jekyll_workspace/
~/jekyll_workspace/ $ sudo gem install jekyll jekyll-import

-- This will take a while as it fetches all the libraries

~/jekyll_workspace $ jekyll new gblog

Migrating the existing Blog

First we need to install the jekyll-import gem module as follows

gem install jekyll-import

You can read more about this module here https://github.com/jekyll/jekyll-import

Blogger provides an option to export the entire blog as an Atom feed, but unfortunately, as of this writing, there is no Jekyll importer for an Atom feed.

Export the blog as an RSS feed instead:
http://<NAME>.blogspot.com/feeds/posts/default?alt=rss&max-results=<RECORDS>
Replace <NAME> with your blog name and <RECORDS> with the number of posts you would like to export.
E.g. I used the following to export up to 500 posts from my blog; there isn't a nice way to export everything, so you will have to count your posts before exporting all records.

curl -o gdata.xml "http://techhari.blogspot.com/feeds/posts/default?alt=rss&max-results=500"

ruby -rubygems -e 'require "jekyll-import";JekyllImport::Importers::RSS.run({"source" => "gdata.xml"})'
Start the server and access localhost:4000 to visit the new Jekyll-powered blog.


~/jekyll_workspace/gblog $ jekyll serve
Configuration file: /home/hari/jekyll_workspace/gblog/_config.yml
            Source: /home/hari/jekyll_workspace/gblog
       Destination: /home/hari/jekyll_workspace/gblog/_site
      Generating... done.
    Server address: http://0.0.0.0:4000
  Server running... press ctrl-c to stop.

Host your site / blog on the internet

Github Pages supports Jekyll websites, so it is easy for us to now publish the locally created blog to Github so the entire world can access your new site.

Create a new repository on Github with the following naming convention:
USERNAME.github.io

where USERNAME is your Github username.
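To publish, push the generated site to that repository; a minimal sketch, assuming you are publishing the gblog site and the repository name above:

~/jekyll_workspace/gblog $ git init
~/jekyll_workspace/gblog $ git add . && git commit -m "Initial Jekyll blog"
~/jekyll_workspace/gblog $ git remote add origin https://github.com/USERNAME/USERNAME.github.io.git
~/jekyll_workspace/gblog $ git push -u origin master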

Saturday, March 22, 2014

Testing Cassandra durability

The commit log, memtable, and SSTable are the three core components of Cassandra; they work in tandem to guarantee Cassandra's durability. This is more or less similar to RDBMS databases, where commit logs are used to replay a transaction in case of a DB server crash.

Here is a very basic, high-level life cycle of data from the time it is written by a Cassandra client to the time it is persisted to SSTables.
The complexity is slightly higher in a cluster setup; however, for starters this is good enough to understand the internals of the Cassandra write and update flow.



Step 1: The request is received by a random node in the cluster.
Step 2: The node writes the data into its local commit log file sequentially.
Step 3: The memtable gets updated asynchronously.
Step 4: The memtable flushes the data to SSTables periodically; SSTables are the final persistence store for the data.
Step 5: Once data makes its way to SSTables, the corresponding references to the record in the commit log and memtable are flushed out.

Cassandra tool "nodetool" provides an option to explicitly flush the data in commit log or Memtable into SSTables,
This tool is supposed to be used for maintenance, however it is a nice utility that can be used during maintenance to ensure all pending transactions are
flushed out to the SSTables before a node shutdown.
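For example, to flush the demo keyspace and table created below:

./bin/nodetool flush ecommerce orders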

Let us put Cassandra's durability to the test with a real-world ecommerce use case: assume we have a keyspace that manages user carts or orders, and consider a scenario of node failure.

We will do the following: load some sample data into Cassandra and shut down the database before Cassandra performs a flush to SSTables.

Make sure the Cassandra server is running; start it in the foreground using ./bin/cassandra -f
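The original post embedded the load script; here is a minimal sketch reconstructed from the query output shown below (column types are assumptions inferred from the data):

CREATE KEYSPACE ecommerce WITH replication =
  {'class': 'SimpleStrategy', 'replication_factor': 1};

CREATE TABLE ecommerce.orders (
  orders_id int PRIMARY KEY,
  users_id int,
  emails set<text>,
  first_name text,
  last_name text,
  order_comments map<text, timestamp>,
  order_log map<text, timestamp>,
  order_status text,
  order_total float,
  promotions_total float,
  shipping_total float,
  tax_total float
);

INSERT INTO ecommerce.orders (orders_id, users_id, emails, first_name, last_name,
  order_status, order_total, promotions_total, shipping_total, tax_total)
VALUES (321, 123, {'a@gmail.com', 'b@gmail.com'}, 'hariharan', 'vadivelu',
  'Pending', 20.3, 5, 2, 1);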


Once the data is loaded you can exit cqlsh and check your data directory; the location of the data directory is defined by the "data_file_directories" key in the ./conf/cassandra.yaml file.

E.g. if mapped to your home directory, go to ~/cassandra/data/ecommerce/orders and you should not notice any files in this directory; usually you will find a couple of SSTable-related files in this location once the flush operation has completed.

We can terminate Cassandra at this point to replicate a situation where the data is not yet flushed to SSTables and is only available in the commit log and the in-memory memtables.

Now bring Cassandra back up and you should notice a few interesting log messages indicating a replay of pending records from the commit log to SSTables.
Once this operation is complete, check the ~/cassandra/data/ecommerce/orders folder and you should now see the data inserted before the server crash.

Completed flushing /home/search/cassandra/data/system/compaction_history/system-compaction_history-jb-1-Data.db (237 bytes) for commitlog position ReplayPosition(segmentId=1395543674692, position=271)
You should be able to query the same from cqlsh as well.


cqlsh> select * from ecommerce.orders;

 orders_id | users_id | emails                         | first_name | last_name | order_comments                                                                     | order_log                                                                              | order_status | order_total | promotions_total | shipping_total | tax_total

-----------+----------+--------------------------------+------------+-----------+------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------+--------------+-------------+------------------+----------------+-----------

      4321 |     1234 | {'a@gmail.com', 'b@gmail.com'} |  hariharan |  vadivelu | {'comment_1': '2013-06-13 11:42:12-0500', 'comment_2': '2013-06-13 11:42:12-0500'} | {'created_on': '2013-06-13 11:42:12-0500', 'last_updated': '2013-06-13 11:42:12-0500'} |      Pending |        20.3 |                5 |              2 |         1

       321 |      123 | {'a@gmail.com', 'b@gmail.com'} |  hariharan |  vadivelu | {'comment_1': '2013-06-13 11:42:12-0500', 'comment_2': '2013-06-13 11:42:12-0500'} | {'created_on': '2013-06-13 11:42:12-0500', 'last_updated': '2013-06-13 11:42:12-0500'} |      Pending |        20.3 |                5 |              2 |         1

(2 rows)

Or you can also check the data in the SSTables using the ./bin/sstable2json utility:

./sstable2json /home/search/cassandra/data/ecommerce/orders/ecommerce-orders-jb-1-Data.db

[

{"key": "000010e1","columns": [["1234:","",1395543482883000], ["1234:emails","1234:emails:!",1395543482882999,"t",1395543482], ["1234:emails:6140676d61696c2e636f6d","",1395543482883000], ["1234:emails:6240676d61696c2e636f6d","",1395543482883000], ["1234:first_name","hariharan",1395543482883000], ["1234:last_name","vadivelu",1395543482883000], ["1234:order_comments","1234:order_comments:!",1395543482882999,"t",1395543482], ["1234:order_comments:636f6d6d656e745f31","0000013f3e6a76a0",1395543482883000], ["1234:order_comments:636f6d6d656e745f32","0000013f3e6a76a0",1395543482883000], ["1234:order_log","1234:order_log:!",1395543482882999,"t",1395543482], ["1234:order_log:637265617465645f6f6e","0000013f3e6a76a0",1395543482883000], ["1234:order_log:6c6173745f75706461746564","0000013f3e6a76a0",1395543482883000], ["1234:order_status","Pending",1395543482883000], ["1234:order_total","20.3",1395543482883000], ["1234:promotions_total","5.0",1395543482883000], ["1234:shipping_total","2.0",1395543482883000], ["1234:tax_total","1.0",1395543482883000]]},

{"key": "00000141","columns": [["123:","",1395543482804000], ["123:emails","123:emails:!",1395543482803999,"t",1395543482], ["123:emails:6140676d61696c2e636f6d","",1395543482804000], ["123:emails:6240676d61696c2e636f6d","",1395543482804000], ["123:first_name","hariharan",1395543482804000], ["123:last_name","vadivelu",1395543482804000], ["123:order_comments","123:order_comments:!",1395543482803999,"t",1395543482], ["123:order_comments:636f6d6d656e745f31","0000013f3e6a76a0",1395543482804000], ["123:order_comments:636f6d6d656e745f32","0000013f3e6a76a0",1395543482804000], ["123:order_log","123:order_log:!",1395543482803999,"t",1395543482], ["123:order_log:637265617465645f6f6e","0000013f3e6a76a0",1395543482804000], ["123:order_log:6c6173745f75706461746564","0000013f3e6a76a0",1395543482804000], ["123:order_status","Pending",1395543482804000], ["123:order_total","20.3",1395543482804000], ["123:promotions_total","5.0",1395543482804000], ["123:shipping_total","2.0",1395543482804000], ["123:tax_total","1.0",1395543482804000]]}

]




Further Reading

https://wiki.apache.org/cassandra/MemtableSSTable



Saturday, February 22, 2014

Cassandra cluster setup in 2 minutes


Cassandra is a very popular distributed NoSQL DBMS and is among the most scalable, fastest, and most robust NoSQL databases. The steps documented in this post are very basic in nature and you should consider tuning them for a production-grade cluster setup; however, they are good enough to quickly stand up a cluster and explore Cassandra's capabilities.

Basic Cluster Configuration:


Step 1: Setting up on a single node.


Replace the download URL with your closest mirror.
Here is a sample command for version 2.0.5; it will download, extract, and rename the folder.

wget http://mirrors.gigenet.com/apache/cassandra/2.0.5/apache-cassandra-2.0.5-bin.tar.gz && tar xvzf apache-cassandra-2.0.5-bin.tar.gz && mv apache-cassandra-2.0.5 cassandra25_node1_dc1

Step 2 (Optional): Edit the configuration to modify the following as per your standards.


conf/cassandra.yaml
data_file_directories:
    - /home/cassandra/data
commitlog_directory: /home/cassandra/data/commitlog
saved_caches_directory: /home/cassandra/saved_caches

conf/log4j-server.properties
log4j.appender.R.File: /home/cassandra/system.log

Repeat Steps 1 and 2 on another machine/VM.

At this point we have a basic setup configured and you should be able to launch the nodes independently; however, the nodes are not yet clustered and cannot communicate with each other.
./bin/cassandra -f

Step 3: Cluster nodes


We need to make a few more changes to our configuration file to let the nodes form a cluster.
conf/cassandra.yaml
Provide a logical name for your cluster, e.g.
cluster_name: 'hari_cassandra_ring'

Seeds - For a Cassandra node to participate in a cluster it has to know about at least one other node in the datacenter; this is called a "seed" node in the Cassandra config file. It can be a comma-separated list of servers, and the documentation suggests avoiding a chicken-and-egg reference when defining the seed node:
http://wiki.apache.org/cassandra/GettingStarted

E.g.
seeds: "192.168.0.119"

listen_address - This should be the private address that nodes connect to for inter-node communication;
for a simple configuration we can leave this as the IP address or hostname of the node.
listen_address: 192.168.0.108

rpc_address - This is the RPC communication interface; for a basic configuration we will leave this the same as listen_address.
rpc_address: 192.168.0.108

initial_token - This is another important aspect of cluster configuration and governs load distribution across nodes. For the purposes of this demo I will leave it blank; you may refer to the Cassandra documentation on how this can be defined based on the number of nodes within the datacenter.
http://www.datastax.com/documentation/cassandra/2.0/cassandra/configuration/configGenTokens_c.html
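Putting it together, the relevant cassandra.yaml entries for the node at 192.168.0.108 might look like this (the addresses are the sample values used above; adjust to your network):

cluster_name: 'hari_cassandra_ring'
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "192.168.0.119"
listen_address: 192.168.0.108
rpc_address: 192.168.0.108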

Step 4: Test the cluster setup


You can now fire up one node at a time as follows: "cassandra25_node1_dc1/bin/cassandra -f"

As you bring up more nodes you should see messages similar to the following, indicating the cluster node handshake.

INFO 22:06:20,974 Handshaking version with /192.168.0.108
 INFO 22:06:23,023 Node /192.168.0.108 is now part of the cluster
 INFO 22:06:23,047 Handshaking version with /192.168.0.108
 INFO 22:06:23,061 InetAddress /192.168.0.108 is now UP
 INFO 22:06:23,207 InetAddress /192.168.0.108 is now DOWN
 INFO 22:06:23,212 Handshaking version with /192.168.0.108
 INFO 22:06:24,037 InetAddress /192.168.0.108 is now UP
 INFO 22:06:53,449 [Stream #6e422a30-99e4-11e3-858d-e535fdb952e8] Received streaming plan for Bootstrap
 INFO 22:06:53,590 [Stream #6e422a30-99e4-11e3-858d-e535fdb952e8] Session with /192.168.0.108 is complete

Another way to check cluster/node status is the nodetool command:

./cassandra25_node1_dc1/bin/nodetool status

Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens  Owns (effective)  Host ID                               Rack
UN  192.168.0.108  68.61 KB   256     100.0%            005d1cea-aa68-41b0-9a75-0051dd431930  rack1
UN  192.168.0.119  73.14 KB   256     100.0%            8ca40713-2eb5-44df-8a52-6cd838a492e3  rack1

Sunday, February 2, 2014

WCS SQL Cheat Sheet


I have compiled a list of frequently used WCS SQLs, organized by subsystem.
Please do leave a comment if you have a useful snippet you would like to share with the community; I will update the corresponding gist frequently to keep the list up to date.
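The gist carries the full list; to give a flavor of what belongs there, here is a typical Order-subsystem query (table and column names follow the standard WCS schema; adjust to your environment):

-- Recent orders for a given logon ID
SELECT O.ORDERS_ID, O.STATUS, O.TOTALPRODUCT, O.TIMEPLACED
  FROM ORDERS O, USERREG U
 WHERE O.MEMBER_ID = U.USERS_ID
   AND U.LOGONID = 'wcsadmin'
 ORDER BY O.TIMEPLACED DESC;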


Sunday, January 19, 2014

Faceting using Elasticsearch Aggregations


Facets are probably one of the most compelling reasons to use a search engine for an ecommerce site. They can be used to render pages such as search results, category browsing, etc. We will take a look at a real-world example of how ES aggregations (a.k.a. facets in older versions) can be used to build category navigation for an ecommerce site.
Similar results can be achieved using ES facets, but in this blog we will look at "aggregations", which seem to be the way forward and offer additional flexibility compared to ES "facets".

ES aggregations will eventually replace ES facets, but the nice thing is they have a lot in common, so learning or migrating from "facets" to "aggregations" should not be complicated.
You can read more about the new feature on this GitHub issue tracker: https://github.com/elasticsearch/elasticsearch/issues/3300

Mock User Interface

The mock screen in this example is a typical category or search-term navigation page on an ecommerce site; at a very basic level a search results page has 4 components.

Facets - Facets help users narrow down or filter a search result; the facet set is built based on the search context.

Sort Order - The sort order affects the search results component; it defines the order in which the results are listed on the page. For instance, a user may sort by lowest to highest price or by product ratings.

Pagination of Results - The pagination component allows a user to navigate back and forth through the search results; it also dictates the number of records that should be returned by the ES query.

Search Result - Restricted to the number of records that should be displayed on the landing page; this will likely be configurable based on your application's needs.



Sample Data


We will begin with a schemaless version of our products index. Schemaless support happens to be a nice thing about ES for getting started quickly with design and testing; you can always add a schema later on for a production-quality index and to better control the behavior. For the purpose of this demo we will go with the following sample for our products index.
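The sample data was embedded in the original post; as an illustration, a document in the products index could look like the following (the field names match the snippets used below; the remaining fields are assumptions):

curl -XPOST 'http://localhost:9200/products/product' -d '{
    "title": "Slim fit crew neck tee",
    "Brand": "diesel",
    "size": "small",
    "category": "mens",
    "offerprice": 9.99
}'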


Sort Results

A user can sort the search or category navigation results by "lowest to highest" price or by popular products; we can combine the "sort" element with aggregations to achieve this.
In our example we have used "sort" by lowest to highest price as follows.

"sort" : [{"offerprice" : {"order" : "asc", "mode" : "avg", "ignore_unmapped":true, "missing":"_last"}},"_score"]

Sorting within Facets

The results within in the facets can be sorted using order types within the term definition, for instance in our example we are sorting the Brand  Facet by total count of each brand in descending order.
"order": { "_count" : "desc" }
Similarly we are sorting size facet in ascending using "order": { "_count" : "asc" }
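Put together, the aggregations portion of the query body could look like this (a sketch; the field names assume the sample document above and that the fields are not analyzed):

"aggs": {
    "brands": { "terms": { "field": "Brand", "order": { "_count": "desc" } } },
    "sizes":  { "terms": { "field": "size",  "order": { "_count": "asc" } } }
}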

Pagination Component

Pagination of results can be achieved using the "from" and "size" fields; these can be passed either in the query body or as URL params. In our sample we have passed them in the JSON body as follows:
"from" : 0, "size" : 5 (or "from" : 5, "size" : 5 for the next page)

They can also be passed as URL params as follows:
curl -XGET 'http://localhost:9200/products/_search?pretty=true&from=5&size=5'


Facet Selection

Search results are also influenced by the facet selection; for instance, a user wants to see all products in the men's category that are from brand "diesel" and are of size "small". This can be achieved by using an "and" filter as follows.

....
"and": [
    {
        "term": {
            "Brand": "diesel"
        }
    },
    {
        "term": {
            "size": "small"
        }
    }
]
....

What is missing

I could be completely wrong, but I have not been able to achieve the following within the ES query. Of course there are alternative ways of doing this within application code, but I would love to see these added to ES aggregations in the future.

#1 As you can see in our example, we are defining the price ranges in the ES query, but in a typical ecommerce model the price range may vary dynamically based on the browsed category. So instead of defining fixed price ranges like 1 to 5, 6 to 10, etc., there should be a way to get an even spread by simply specifying the number of buckets.


Wednesday, January 1, 2014

Installing Spring STS or Eclipse IDE on Linux Mint 16

The steps in this section are for Spring STS, but they can be used for Eclipse as well.
The instructions involve installing the Spring IDE or Eclipse from the downloaded archive and then creating a startup menu entry so it can be launched from the menu.

Download the latest version of eclipse or Spring STS

curl -O http://download.springsource.com/release/STS/3.4.0/dist/e4.3/spring-tool-suite-3.4.0.RELEASE-e4.3.1-linux-gtk-x86_64.tar.gz
tar -xvzf spring-tool-suite-3.4.0.RELEASE-e4.3.1-linux-gtk-x86_64.tar.gz

sudo mv springsource /opt
sudo chown -R root:root /opt/springsource
sudo chmod -R +r /opt/springsource
sudo cp /opt/springsource/sts-3.4.0.RELEASE/icon.xpm /usr/share/pixmaps/sts.xpm

Run this command to create a file with startup options for STS or Eclipse. (Note: tee is used instead of a plain ">>" redirect, since with sudo the redirect would otherwise run with your user's permissions rather than root's.)

sudo tee -a /usr/share/applications/sts.desktop > /dev/null <<EOF
[Desktop Entry]
Encoding=UTF-8
Name=Spring IDE
Comment=Spring IDE
Exec=/opt/springsource/sts-3.4.0.RELEASE/STS
Icon=/usr/share/pixmaps/sts.xpm
Terminal=false
Type=Application
Categories=GNOME;Application;Development;
StartupNotify=true
EOF


Navigate to Menu -> Programming -> Spring IDE.
This should launch the Eclipse or Spring IDE instance.