Buddha Tree

Wednesday 1 October 2014

Python – Excellent for Data Analysis if You Know R

If you work in analytics, you have probably come across tools such as SAS, SPSS, R and many more. I have worked with three of them: SAS, R and Python. SAS is often the first choice for data analysis because it is easy to use, widely accepted, and there are plenty of experts in the market.

Since this is the era of free software, I will talk about R and Python. Their primary advantage is that they are free, so a company does not need to pay for licences. Besides, most of the advanced analytical methods that cost a lot of money in paid software are available in Python or R.

Why choose Python over R:

1. Multiprocessing support in Python. R is single-threaded by default, so complex methods take longer to execute. R also depends heavily on RAM; handling 1 GB of data on a system with 2 GB of RAM is sometimes quite tough. For example, some studies have reported that Random Forest in Python takes about 1/10th of the time it takes in R.

2. Memory management: In my experience, Python does not require a very high-configuration computer, whereas R depends heavily on RAM. For example, a table of 2 GB or more is hard to work with in R, but can be handled more easily in Python.

3. Python has even more ready-made functions for data preparation and charting than R.

Python has a long list of functions for every stage of a data mining project. I have listed a few examples below. For statistical modeling you need the libraries below.

1. import pandas as pd

2. import statsmodels.api as sm

3. import pylab as pl

4. import numpy as np

Here are a few functions available for each stage of data mining. I have also mentioned similar methods in SAS or R.

1. Reading data from sources (df is data frame):

a. df = pd.read_csv("path of file/file_name.csv")

b. Similarly read_table, read_fwf and read_clipboard; data from XML or MongoDB can also be loaded.

c. Lots of options to customize how files are read, such as the header, the number of rows to read, the delimiter, etc.
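As a minimal sketch of the reading options mentioned above, the snippet below reads semicolon-delimited text with read_csv. The sample data and its column names are assumptions made up for illustration; an in-memory string stands in for a real file path.

```python
import io
import pandas as pd

# Hypothetical sample data standing in for a file on disk.
csv_text = "id;default;rating_score\n1;0;A\n2;1;B\n3;0;A\n4;1;C\n"

# sep sets the delimiter, header picks the header row, nrows limits the rows read.
df = pd.read_csv(io.StringIO(csv_text), sep=";", header=0, nrows=3)

print(df.shape)          # (3, 3)
print(list(df.columns))  # ['id', 'default', 'rating_score']
```

With a real file, the first argument would simply be the path to the file.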

2. Data Cleaning

a. df.describe() – like PROC SUMMARY in SAS or summary() in R

b. df.dtypes.value_counts() – lists the number of columns of each data type (df.get_dtype_counts() in older pandas versions).

c. pd.crosstab(df['Default'], df['Rating_score'], rownames=['Default']) – like table() in R or like Proc freq in SAS.

d. df.hist(), pl.show() – Draw the data distribution of all the columns. Like Proc Univariate in SAS.

e. dummy_rating = pd.get_dummies(df['rating_score'],prefix='R')

f. df.head() – shows the first 5 rows, like head() in R

g. Binning of data: new_col = pd.cut(data, 4) # cut into 4 equal-width bins; use pd.qcut(data, 4) for quartiles
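The cleaning functions listed above can be tried on a small data frame. This is only a sketch: the data and the column names (default, rating_score, income) are hypothetical, chosen so that the difference between equal-width bins and quartiles is visible.

```python
import pandas as pd

# Hypothetical loan data; the column names are for illustration only.
df = pd.DataFrame({
    "default": [0, 1, 0, 1, 0, 0, 1, 0],
    "rating_score": ["A", "B", "A", "C", "B", "A", "C", "B"],
    "income": [10, 20, 30, 40, 50, 60, 70, 1000],
})

summary = df.describe()                                   # numeric summary
dtype_counts = df.dtypes.value_counts()                   # columns per data type
freq = pd.crosstab(df["default"], df["rating_score"])     # frequency table
dummies = pd.get_dummies(df["rating_score"], prefix="R")  # one indicator column per rating

# cut makes 4 equal-width bins; qcut makes 4 bins with equal row counts.
width_bins = pd.cut(df["income"], 4)
quartiles = pd.qcut(df["income"], 4)

print(freq)
print(width_bins.value_counts())
print(quartiles.value_counts())
```

Because of the single large income value, the equal-width bins put almost every row in the first bin, while qcut spreads the rows evenly.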

3. Missing and outlier treatment

a. There are a number of methods to identify missing values and treat them with the mean, mode, median or any other value. A few examples:

· df['col_name'].fillna('missing') - fill NULL with any user-defined value

· df.ffill() - fill NULL with the value from the row above (df.fillna(method='pad') in older pandas versions).

· df.dropna(axis=0) - drop rows that have any null value

· df.dropna(axis=1) - drop columns that have any null value

b. It is easy to find outlier values in the data, either visually or by filtering on some characteristic.

· df[(np.abs(df) > 10000).any(axis=1)] – selects rows having values that exceed some threshold.

· Replacing a list of values, which could be missing values or outliers: df.replace([np.nan, 100], [np.nan, 500])
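The missing-value and outlier steps above can be sketched on a tiny data frame. The column names, the 10000 threshold and the replacement values are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data with missing values and one extreme value.
df = pd.DataFrame({
    "balance": [100.0, np.nan, 250.0, 20000.0],
    "limit": [500.0, 600.0, np.nan, 700.0],
})

filled_mean = df["balance"].fillna(df["balance"].mean())  # fill with the column mean
filled_prev = df.ffill()                                  # fill with the value from the row above
complete_rows = df.dropna(axis=0)                         # keep rows without any NULL
complete_cols = df.dropna(axis=1)                         # keep columns without any NULL

# Rows where any value exceeds 10000 in absolute terms.
outliers = df[(np.abs(df) > 10000).any(axis=1)]

# Replace a list of values in one call: 100 becomes 500, 20000 becomes 10000.
capped = df.replace([100.0, 20000.0], [500.0, 10000.0])

print(outliers)
print(capped)
```

Here every column contains a NULL, so dropna(axis=1) leaves no columns at all, which is worth checking before dropping columns on real data.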

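4. Creating train and test datasets

5. Different ways to randomly select data for train and test

a. df = df.sample(frac=1, random_state=0) – reshuffle the rows of the dataset (numpy.random.shuffle is meant for arrays, not data frames)

b. n = int(0.8 * len(df)); train, test = df.iloc[:n], df.iloc[n:] – split in an 80/20 ratio.

· Another method, splitting in an 80/20 ratio with scikit-learn (from sklearn.model_selection import train_test_split):

c. df_train, df_test = train_test_split(df, test_size=0.2, random_state=0)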
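A minimal sketch of the shuffle-and-slice split using pandas only. The 100-row data frame and its columns are made up for illustration.

```python
import pandas as pd

# Hypothetical data frame with 100 rows.
df = pd.DataFrame({"x": range(100), "default": [i % 2 for i in range(100)]})

# Shuffle the rows reproducibly, then cut at the 80% mark.
shuffled = df.sample(frac=1, random_state=0).reset_index(drop=True)
n_train = int(0.8 * len(shuffled))
train, test = shuffled.iloc[:n_train], shuffled.iloc[n_train:]

print(len(train), len(test))  # 80 20
```

Fixing random_state makes the split repeatable, which helps when comparing models on the same train and test rows.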

6. Running predictive modeling

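a. logit = sm.Logit(df['default'], df[predictor_cols]); result = logit.fit() – logistic regression, where predictor_cols is the list of predictor column names.

b. clf = RandomForestClassifier(n_estimators=10, max_depth=None, min_samples_split=2, random_state=0); scores = cross_val_score(clf, X, y) – where X holds the predictor variables and y the response variable. Both functions come from scikit-learn (sklearn.ensemble and sklearn.model_selection).

c. All the advanced analytics methodologies are supported.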
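As a rough sketch of the Random Forest step, the snippet below cross-validates a classifier with scikit-learn. The data is synthetic and the 5-fold setting is an assumption; with real data, X would be the predictor columns and y the response column.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic data (an assumption for illustration): the response depends on the first predictor.
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 3))    # predictor variables
y = (X[:, 0] > 0).astype(int)    # response variable

clf = RandomForestClassifier(n_estimators=10, max_depth=None,
                             min_samples_split=2, random_state=0)
scores = cross_val_score(clf, X, y, cv=5)  # one accuracy score per fold

print(scores.mean())
```

Note that the predictors are passed before the response, and that min_samples_split must be at least 2 in current scikit-learn versions.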

Wednesday 30 July 2014

Selenium: How to handle Stale Element Exception error

Common Causes

A stale element reference exception is thrown in one of two cases:

The element has been deleted entirely, or the element/page has been refreshed.
The element is no longer attached to the DOM.

Solution

The simplest solution is to use the Java PageFactory. It creates a proxy WebElement that finds the element again every time you use it. There is a slim chance that the element will be removed in the couple of milliseconds between it being found and you performing an action on it; in that case, use an explicit wait for the element to become stale before finding it again.
Use try/catch to handle this exception and find the element again in the catch block.
Add a short sleep between FindElement and the action method.
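The try/catch approach can be sketched as a small retry helper. This is an illustration in Python, not Selenium code: StaleElementError, retry_on_stale, find_button and click are hypothetical names, and the element is simulated so the snippet runs without a browser. With Selenium's Python bindings, one would presumably pass selenium.common.exceptions.StaleElementReferenceException as stale_exc.

```python
import time

class StaleElementError(Exception):
    """Hypothetical stand-in for Selenium's stale element exception."""

def retry_on_stale(find, act, attempts=3, delay=0.0, stale_exc=StaleElementError):
    """Find the element and act on it; on a stale error, find it again and retry."""
    for attempt in range(1, attempts + 1):
        element = find()        # look the element up fresh on every attempt
        try:
            return act(element)
        except stale_exc:
            if attempt == attempts:
                raise           # give up after the last attempt
            time.sleep(delay)   # short pause before locating the element again

# Simulated element that is stale the first time it is used.
calls = {"count": 0}

def find_button():
    calls["count"] += 1
    return calls["count"]

def click(element):
    if element == 1:
        raise StaleElementError("element is no longer attached to the DOM")
    return "clicked on attempt %d" % element

result = retry_on_stale(find_button, click)
print(result)  # clicked on attempt 2
```

The key point is that the element is located again inside the loop, so a retry never reuses the stale reference.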

Thursday 10 July 2014

Cucumber vs TestNG : Which is better?

I have been a long-time user of TestNG. A few months back, I started exploring Cucumber, because my client wanted an automation framework built with it. I searched the web and found many blogs on Cucumber and its implementation. I started with the assumption that Cucumber would replace TestNG. After working with it for a few months, I came to the following conclusions:

Cucumber

Cucumber is a collaboration tool that lets non-technical people write executable specifications. Those executable specifications test your app from the outside, like a black box.
Cucumber is not meant to be used as a unit testing tool.
It lets you combine automated acceptance tests, functional requirements and software documentation in one format that both non-technical people and testing tools can understand.
You can implement your tests using the same language you use to discuss them with the business.
Cucumber adds the overhead of converting plain English (or another natural language) into executable code.
Good for acceptance testing.

TestNG

TestNG is a unit testing tool. It is great for testing individual classes, but not great for executable specifications that are readable (and writable) by non-technical people.
It makes it easy to test individual classes.
You can group tests using tags.
TestNG supports a lot of advanced features such as priorities, grouping, listeners, etc.
Useful when you have to automate a large number of test cases.

Monday 10 February 2014

How to Verify Broken Links using Selenium WebDriver

This program reads all the links from a web page, opens a connection to each URL and checks the response code. Based on the response code, broken links are identified and printed on the console.

import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.List;

import org.openqa.selenium.By;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.firefox.FirefoxDriver;
import org.testng.annotations.Test;

public class Links_Broken {

    @Test
    public void saveAllLinks() {
        FirefoxDriver firefoxDriver = new FirefoxDriver(); // starts the Firefox browser
        try {
            firefoxDriver.navigate().to("http://google.co.in"); // opens the web page

            // find all link elements and store them in a list
            List<WebElement> linksList = firefoxDriver.findElements(By.tagName("a"));

            // check each link in the collection
            for (WebElement linkElement : linksList) {
                String link = linkElement.getAttribute("href");
                // only HTTP(S) URLs can be cast to HttpURLConnection below
                if (link != null && link.startsWith("http")) {
                    verifyLinkActive(link);
                }
            }
        } finally {
            firefoxDriver.quit(); // closes the Firefox browser even if a check fails
        }
    }

    /**
     * Requests the given URL and prints its response status.
     * Links that return 404 are reported as broken.
     * @param linkUrl - link (URL) to check
     */
    public void verifyLinkActive(String linkUrl) {
        HttpURLConnection httpURLConnect = null;
        try {
            URL url = new URL(linkUrl);
            httpURLConnect = (HttpURLConnection) url.openConnection();
            httpURLConnect.setConnectTimeout(3000);
            httpURLConnect.connect();
            int responseCode = httpURLConnect.getResponseCode(); // read the status once
            if (responseCode == HttpURLConnection.HTTP_OK) {
                System.out.println(linkUrl + " - " + httpURLConnect.getResponseMessage());
            } else if (responseCode == HttpURLConnection.HTTP_NOT_FOUND) {
                System.out.println(linkUrl + " - " + httpURLConnect.getResponseMessage() + " - " + HttpURLConnection.HTTP_NOT_FOUND);
            }
        } catch (MalformedURLException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            if (httpURLConnect != null) {
                httpURLConnect.disconnect(); // release the connection
            }
        }
    }
}

Tuesday 4 February 2014

Working with Selenium WebDriver and Cucumber without Maven

Download jar files

The jar files needed to configure Cucumber are as follows:

1. cucumber-core-1.1.5.jar

2. cucumber-html-0.2.3.jar

3. cucumber-java-1.1.5.jar

4. cucumber-junit-1.1.5.jar

5. cucumber-jvm-deps-1.0.3.jar

6. gherkin-2.12.2.jar

7. hamcrest-core-1.3.jar

8. jchronic-0.2.6.jar

9. junit-4.11.jar

10. selenium-server-standalone-2.39.0.jar

Note: if you already have the jar files listed above, you may skip the download step.

Create Cucumber Project in Eclipse

1. Go to [File -> New -> Java Project]

2. Enter project name in [Project Name] field

3. Click on [Finish] button

4. Delete the [src] folder of this project

5. [Right click on Project folder] & select [New -> Source Folder]

6. Enter value [src/test/resources] in [Folder Name] field

7. Click on [Finish] button

8. [Right click on Project folder] & select [New -> Source Folder]

9. Enter value [src/test/java] in [Folder Name] field

10. Click on [Finish] button

11. [Right click on Project folder] & select [New -> Source Folder]

12. Enter value [src/main/java] in [Folder Name] field

13. Click on [Finish] button

14. Add the jar files downloaded above to the build path of your project.
