Monday, September 5, 2016

Periscope data tutorial

1.     Databases Supported by Periscope (with default ports):
·         MySQL: 3306
·         Postgres: 5432
·         Redshift: 5439
·         Vertica: 5433
·         SQL Server: 1433
·         Oracle: 1521
·         MemSQL: 3306
·         Sybase: 5000
·         Exasol: 8563
·         Amazon RDS
·         Microsoft Azure SQL Server

2.     Connecting to a SQL Server Database
·         Whitelist the database's TCP port through your firewall so that Periscope can reach it and the server responds to TCP connections.
3.     Drilldowns in periscope data
Drilldowns are supported in Periscope Data.
4.     Dependent Filters in Periscope Data
Creating dependent filters in Periscope Data is straightforward: a dependent filter's list of options is driven by the current selection of another filter.
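For example, a dependent filter is typically driven by an options query that itself references the parent filter, so the child filter's choices narrow as the parent selection changes (the table, column, and filter names below are hypothetical):
-- Hypothetical options query for a "Store" filter that depends on a "City" filter
select store_name
from stores
where [city=City]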
5.     Creating Charts
Charts can be created directly from the results of any SQL query.
6.     Dashboard and Chart Refresh Mechanics
The refresh rate for dashboards and charts can be configured.
7.     Organizing Charts or Dashboards
Dashboards can be organized top-down or by theme.
8.     Pivot Table Creation in Periscope Data
Pivot tables can be created from any table of query results; they are an extension of the standard table chart.
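As a rough sketch (the purchases columns below are hypothetical), the underlying query just returns plain grouped rows, which the pivot table then spreads across rows and columns:
-- Hypothetical source query; the pivot can then spread product across columns
select city, product, count(*) as orders
from purchases
group by city, product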
9.     Auto Time Zone Conversions
Periscope Data converts the timestamp used in the daterange clause to your site's display time zone and compares it to the selected range.
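For example, in a chart query like the following (table name as used elsewhere in this tutorial), the created_at values are converted to the site's display time zone before being compared to the selected range:
select count(*)
from users
where [created_at=daterange]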
10.  Doubling Query Throughput with Read Replicas
Queries can be pointed at multiple replicas of the same master, and Periscope will distribute the queries over them to improve throughput.
11.  Scheduled Email Reports
Emailing reports allows a user to schedule the time and day when they would like to send out a dashboard report. Dashboards that are emailed are always sent out with the default filters. 
12.  Easy Embedding With Shared Dashboards
You can quickly remove the Periscope Data header in your shared dashboard by adding ?embed=true at the end of your shared link.
13.  Embed API
Periscope's Embed API lets you programmatically specify filters and their values in the URL when embedding dashboards.
The embed API URLs take the following form:
https://www.periscopedata.com/api/embedded_dashboard?data={data}&signature={signature}
14.  Map Chart Overview
Map Charts can be plotted by selecting a column of latitude points and a column of longitude points in that order. The Map Chart type currently only supports latitude, longitude points. Options for plotting in Points, Bubbles, Heatmaps, and Clusters are available in the Settings Tab.
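A minimal sketch of a query that feeds a Map Chart (table and column names are hypothetical); note that the latitude column must come first:
select latitude, longitude
from deliveries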
15.  Graph-like Chart Types and Special Settings
In addition to generalized chart settings, individual graph-like charts have special settings based on their type.
16.  Mixing Chart Types
The "Series" tab can be used to set individual columns and segments as different chart types. The "Series Type" dropdown tab supports the following Chart Types: Area, Line, Bar, and Scatter. This lets users plot multiple segment types on the same chart.
17.  Heatmaps
18.  Cohort Grids
Cohort Grids are the Periscope Data equivalent of Excel Pivot Tables or Two-way Tables. One series is plotted horizontally and another vertically, and a third series of values fills each cell, aggregated (for example, summed) at each intersection.
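As an illustration (table and column names are hypothetical), a cohort grid expects three columns: one for each axis and one supplying the cell values:
-- signup_week plots on one axis, weeks_active on the other,
-- and the summed revenue fills each intersecting cell
select signup_week, weeks_active, sum(revenue) as revenue
from user_cohorts
group by signup_week, weeks_active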
19.  Tables
Tables are one of the most straightforward methods to display data and represent the underlying values returned by any query. 
20.  GeoJSON
GeoJSONs are JSON blobs that describe arbitrary geometries in geographic space. They are great for plotting specific geographies or the routes of airplanes, ships and cars.
21.  Satellite Maps
To make a map chart, first select latitudes and longitudes and choose the Map chart type. Then change the map tiles to satellite imagery by going to the Settings tab and checking the "Satellite" checkbox in "Map Tile Settings".
22.  Clusters and Bubbles
The "Clusters" and "Bubbles" display settings for Map-based charts are useful for grouping together many data points on the map. 
23.  Cross-Database Joins
If you're on our Enterprise Plan then you have full access to perform cross-database joins. All that you need is for the tables you'd like to join to be on the Periscope Cache.
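Assuming both tables are already on the Periscope Cache, a cross-database join then reads like any ordinary SQL join; the database prefixes and tables below are hypothetical:
-- users lives in the app database, events in a separate analytics database
select u.id, count(e.id) as event_count
from app_db.users u
join analytics_db.events e on e.user_id = u.id
group by u.id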
24.  Materialized SQL Views
SQL Views can be materialized on the Periscope Data Cache. This means that the query must be in Redshift-supported syntax. Once a view has been materialized for the first time, you can call it in a query and see results almost immediately. It is important to note that Periscope searches for view names prior to table names when building the query.
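Once materialized, the view is queried like any table (the view name below is hypothetical); because view names are resolved before table names, the view wins if both exist:
select count(*)
from active_users_view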
25.  Dollars and Percentages
To format a numeric column as dollars, use ":$" in your formatter:
select [sum(price):$] from purchases;
This will give you a comma-delimited number with a dollar sign and two decimal places. Similarly, use ":%" in the formatter to display a ratio as a percentage:
select [cast(count(distinct purchases.user_id) as numeric)/count(1):%]
from purchases left join users on user_id = users.id
26.  Composing SQL Formatters
SQL Formatters can be chained together, one after another. For example, to convert a timestamp to Pacific Time and then aggregate to date, do:
select [created_at:pst:date] from users;
27.  Querying the Periscope Cache
Queries automatically run on the Periscope Cache. If the query fails on the Cache, Periscope Data will run it directly against your database.
28.  Caching Your Data
Customers on the Enterprise Plan have the option of having their data cached on the Periscope Cache.
When Periscope Data caches a table, all of the table's rows are copied into the cache. Three different cache population strategies are used to keep the cache's version of your data in sync with your databases. Many tables use multiple caching strategies in parallel on different schedules to maximize freshness and minimize differences.
29.  Parameterized SQL Snippets
Parameterized snippets allow you to pass values directly into your snippets, so you can cut down on SQL writing. They also help to make your queries more dynamic, and give you a way to imitate passing variables into a function.
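Periscope's exact snippet syntax is described in its documentation; purely as an illustrative sketch (the snippet name, parameter, and bracket invocation below are hypothetical), a parameterized snippet lets one fragment be reused with different values:
-- Hypothetical snippet recent_signups(days) whose body expands to:
--   created_at > current_date - days
select count(*)
from users
where [recent_signups(30)]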
30.  SQL Views
Periscope's SQL Views let you define a table once that your whole team will use. If you're using the Periscope Data Cache on the Periscope Enterprise plan, Periscope optimizes the view by keeping it hot in the cache, and keeps it up-to-date at all times.
31.  SSO Overview
Periscope supports Okta, OneLogin, and Google Apps SAML 2.0 SSO providers.
32.  Managing User Permissions
User groups let you manage your users' view and edit permissions easily. Admins can quickly set up different user types by modifying the user groups that users belong to and the settings of those groups.
33.  Dashboard Permissions
Once you've set up your user groups and user types, you are ready to manage your per-dashboard permissions.
Dashboard permissions allow you to specify which user groups have view and edit permissions on a per-dashboard basis. Admins and the dashboard owner always have both view and edit access to dashboards.
34.  Two-Factor Authentication Overview
Periscope supports Two-Factor Authentication (2FA) via QR code. 2FA can be required for the entire organization, or turned on for a specific user. Recommended applications include Authy, Google Authenticator, or Windows Phone Authenticator.
35.  User Types
Periscope supports three different user types: admin, write-access user, and read-only user. These are set up primarily through the use of user groups.
36.  Connecting to a Private Network with an SSH Tunnel
To connect to a database in a private network, create an SSH tunnel using the following steps:
Download our public key and whitelist 107.23.195.228 and 54.236.224.46 through your firewall.

Friday, December 5, 2014

How to Configure Report Scheduler to work on your Jasper Server

Introduction
To get the most out of JasperReports Server's report scheduling capability, it is important to become familiar with how this feature can be configured. This tutorial covers some of the most commonly used configuration parameters associated with the scheduling feature.
Pre-Requisites
  • Access to the file system of, and the ability to stop and start, an operational installation of JasperReports Server.
  • Access to a running outbound mail server.
Setting Up the Connection to your Outbound Email Server
In order for the scheduler to successfully send email notifications and distribute reports, it must be configured to connect to an outbound email server.

Step 1: Open the following configuration file in your preferred text editor: /WEB-INF/js.quartz.properties
Step 2: Set the value for the mail.sender.host property to your outbound mail server.
Example: report.scheduler.mail.sender.host=outbound.yourdomain.com
Step 3: Set the value for the mail.sender.protocol property to the protocol used by your outbound mail server.
Example: report.scheduler.mail.sender.protocol=smtp
Step 4: Set the value for the mail.sender.port property to the port that your outbound mail server listens on.
Example: report.scheduler.mail.sender.port=25
Step 5: Set the mail.sender.username and mail.sender.password properties using valid login credentials for your outbound mail server.
Example:
report.scheduler.mail.sender.username=myusername
report.scheduler.mail.sender.password=mypassword
Step 6: Restart JasperReports Server for the configuration to take effect.
Setting the Outbound Email Address for the Scheduler
When the scheduler sends out emails, these emails need to come from a specific email address as the sender. This should be a valid email that is monitored for email bounce backs, etc. so that appropriate action can be taken.
The following steps cover how to define the email address that should be used as the sender of emails coming from the scheduler jobs.
Step 1: Open the following configuration file in your preferred text editor: /WEB-INF/js.quartz.properties
Step 2: Set the value for the mail.sender.from property to the email address you would like emails to come from.
Example: report.scheduler.mail.sender.from=yourname@yourdomain.com
Step 3: Restart JasperReports Server for the configuration to take effect.

Defining the URI for Emailed Report Links
When a link to a report is sent out in an email from a scheduler job, this link must refer the user back to the appropriate place on the server to access the output. In order for this to occur, the scheduler must be aware of the URI for JasperReports Server.
Example: If the default login page for JasperReports Server is accessed by going to http://www.yourdomain.com/jasperserver-pro/login.html, the URI for the scheduler should be set to http://www.yourdomain.com/jasperserver-pro.
The following steps outline how to set the URI properly.

Step 1: Open the following configuration file in your preferred text editor: /WEB-INF/js.quartz.properties
Step 2: Set the value for the web.deployment.uri property to the URI of your JasperReports Server installation.
Example: report.scheduler.web.deployment.uri=http://www.yourdomain.com/jasperserver-pro
Step 3: Restart JasperReports Server for the configuration to take effect.

Defining the Number of Scheduler Threads per Server
In order to keep the scheduler jobs from using too many resources on a particular server, it is important to control the number of threads used to process scheduled jobs.
  • The thread count controls the maximum number of concurrent scheduler jobs that will run on a server at once.
  • Increasing this count increases the throughput of schedules that can be processed, but may adversely impact the resources available for end-users that are interacting directly with the application.
The following steps cover how to configure the number of threads per server that will process scheduled jobs.

Step 1: Open the following configuration file in your preferred text editor: /WEB-INF/js.quartz.base.properties
Step 2: Set the value for the threadPool.threadCount property to the number of threads that should process scheduled jobs per server.
Example: org.quartz.threadPool.threadCount=3
Step 3: Restart JasperReports Server for the configuration to take effect.

Defining the Job Misfire Threshold
The misfire threshold sets the amount of time that can pass before a missed or delayed scheduler job is skipped.
  • Increasing the misfire threshold will allow reports to run even if schedules are delayed due to server downtime or a backlog of jobs scheduled at the same time.
  • Decreasing this threshold is useful if you have jobs that are scheduled frequently.
Example: If a report is scheduled every hour, you may not want the 8am report to run if it is already 9am and the 9am report would give you the same information.
The following steps show how to set the misfire threshold.

Step 1: Open the following configuration file in your preferred text editor: /WEB-INF/js.quartz.base.properties
Step 2: Set the value for the jobStore.misfireThreshold property to the number of milliseconds beyond the scheduled time after which a delayed job should be skipped.
Example: for 30 minutes, org.quartz.jobStore.misfireThreshold=1800000
Step 3: Restart JasperReports Server for the configuration to take effect.

Disabling the Report Scheduler on a Server
Under certain circumstances, it may be necessary to disable the report schedules completely on a server.
Example: It may make sense to have dedicated servers for end users separate from those that execute scheduler jobs.
The following steps cover how to disable the report scheduler on a particular server.

Step 1: Open the following configuration file in your preferred text editor: /WEB-INF/applicationContext-report-scheduling.xml
Step 2: Find the following lines:
<bean class="com.jaspersoft.jasperserver.api.engine.scheduling.quartz.QuartzSchedulerControl"
      depends-on="reportSchedulingService"
      id="quartzSchedulerControl"
      init-method="start">
    <property name="scheduler" ref="quartzScheduler" />
</bean>
Step 3: Remove the init-method definition: init-method="start"
Step 4: Restart JasperReports Server for the configuration to take effect.

How to Configure Report Scheduler to work on your Jasper Server


Another BI technology I'm exploring right now is Jaspersoft. It's a powerful Open Source BI tool which is a bit similar to Oracle Hyperion. And since it uses Open Source technology, this means it's way cheaper and easier to sell. lol

Going back to the topic, I'd like to share how easy it is to configure your Jasper Server to use its report scheduler facility.

Step 1 : 
Look for your WEB-INF folder in your Jaspersoft folder. I'm using a later Jaspersoft version, so mine is located at ..\jasperserver-pro\apache-tomcat\webapps\jasperserver-pro\WEB-INF.


Step 2:
In the js.quartz.properties file, add your SMTP and email credentials. If you are using Gmail, be sure to use port 587, as port 465 does not work well with Gmail.

For example :
report.scheduler.mail.sender.host= your mail server (smtp.gmail.com for Gmail)
report.scheduler.mail.sender.username= your email's username
report.scheduler.mail.sender.password= your email's password
report.scheduler.mail.sender.from= your email address
report.scheduler.mail.sender.protocol=smtp
report.scheduler.mail.sender.port=587

report.scheduler.web.deployment.uri=http://localhost:8080/jasperserver-pro (This URL should be the same URL you see when you open your JasperReports Server login page; you only need to remove login.html.)


Step 3 (Optional) : 

If you'd like to increase your thread pool, you may do so by modifying the file js.quartz.base.properties.

Locate the line org.quartz.threadPool.threadCount and increase the thread count.

For example:
org.quartz.threadPool.threadCount = 3


Step 4 :
In the applicationContext-report-scheduling.xml file, locate the bean with id reportSchedulerMailSender. Modify it and add the following prop keys under the javaMailProperties property.

For Example:

<prop key="mail.smtp.auth">true</prop>
<prop key="mail.smtp.starttls.enable">true</prop>


Step 5:
Restart your JasperReports Server and schedule your report.



"repo:ILN_REPORT_dp_summary.jrxml"


$P{SUBREPORT_DIR} + "ILN_REPORT_dp_summary.jasper"

Sunday, November 2, 2014

Data Warehousing - Testing

Introduction

Testing is very important for data warehouse systems to make them work correctly and efficiently. There are three basic levels of testing, listed below:

  • Unit Testing
  • Integration Testing
  • System Testing

UNIT TESTING


  • In unit testing, each component is tested separately.
  • In this kind of testing, each module (i.e., procedure, program, SQL script, Unix shell script) is tested.
  • This testing is performed by the developer.

INTEGRATION TESTING


  • In this kind of testing, the various modules of the application are brought together and then tested against a number of inputs.
  • It is performed to test whether the various components work well together after integration.

SYSTEM TESTING


  • In this kind of testing, the whole data warehouse application is tested together.
  • The purpose of this testing is to check whether the entire system works correctly together or not.
  • This testing is performed by the testing team.
  • Since the whole data warehouse is very large, it is usually only possible to perform minimal system testing before the test plan proper is enacted.

Test Schedule


  • First of all, the test schedule is created as part of developing the test plan.
  • In it, we estimate the time required for testing the entire data warehouse system.

DIFFICULTIES IN SCHEDULING THE TESTING


  • There are different methodologies available, but none of them is perfect, because the data warehouse is very complex and large, and the data warehouse system is evolving in nature.
  • A simple problem may involve a very large query that takes a day or more to complete, i.e., the query does not complete in the desired time scale.
  • There may be hardware failures, such as losing a disk, or human errors, such as accidentally deleting a table or overwriting a large table.
Note: Due to the above-mentioned difficulties, it is recommended to always double the amount of time you would normally allow for testing.

Testing Backup Recovery

This is very important testing that needs to be performed. Here is the list of scenarios for which this testing is needed:

  • Media failure.
  • Loss or damage of table space or data file.
  • Loss or damage of redo log file.
  • Loss or damage of control file.
  • Instance failure.
  • Loss or damage of archive file.
  • Loss or damage of table.
  • Failure during data movement.

Testing Operational Environment

A number of aspects need to be tested. These aspects are listed below.

  • Security - A separate security document is required for security testing. This document contains the list of disallowed operations and the tests devised for each.
  • Scheduler - Scheduling software is required to control the daily operations of the data warehouse, and it needs to be tested during system testing. The scheduling software requires an interface with the data warehouse, which will need the scheduler to control the overnight processing and the management of aggregations.
  • Disk Configuration - The disk configuration also needs to be tested to identify I/O bottlenecks. The test should be performed multiple times with different settings.
  • Management Tools - All the management tools need to be tested during system testing. Here is the list of tools that need to be tested:
o    Event manager
o    System manager
o    Database manager
o    Configuration manager
o    Backup recovery manager

Testing the Database

There are three sets of tests, listed below:
  • Testing the database manager and monitoring tools - To test the database manager and the monitoring tools, they should be used in the creation, running, and management of a test database.
  • Testing database features - Here is the list of features that we have to test:
o    Querying in parallel
o    Creating indexes in parallel
o    Loading data in parallel
  • Testing database performance - Query execution plays a very important role in data warehouse performance measures. There is a set of fixed queries that need to be run regularly, and they should be tested. To test ad hoc queries, one should go through the user requirement document and understand the business completely. Take the time to test the most awkward queries that the business is likely to ask against different index and aggregation strategies; a representative example is sketched below.
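As a sketch (the star schema below is hypothetical), a fixed performance query is typically a representative fact-table aggregate that is re-run on a regular schedule and timed against a baseline:
-- Representative fixed query: revenue by month and region over one year of facts
select d.month, s.region, sum(f.revenue) as revenue
from sales_fact f
join date_dim d on f.date_key = d.date_key
join store_dim s on f.store_key = s.store_key
where d.year = 2014
group by d.month, s.region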

Testing the Application


  • All the managers should be integrated correctly and work together in order to ensure that the end-to-end load, index, aggregate, and queries work as per expectations.
  • Each function of each manager should work correctly.
  • It is also necessary to test the application over a period of time.
  • Week-end and month-end tasks should also be tested.

Logistics of the Test

There is a question: what are you really testing? The answer is that you are testing a suite of data warehouse application code.
The aim of the system test is to test all of the following areas:

  • Scheduling software
  • Day-to-day operational procedures
  • Backup recovery strategy
  • Management and scheduling tools
  • Overnight processing
  • Query performance
Note: The most important point is to test scalability. Failure to do so will leave us with a system design that does not work when the system grows.

Data Warehousing - Future Aspects

Following are the future aspects of Data Warehousing.

  • As we have seen, the size of open databases has approximately doubled in magnitude in the last few years. This change in magnitude is of great significance.
  • As the size of databases grows, the estimate of what constitutes a very large database continues to grow.
  • The hardware and software available today do not make it easy to keep a large amount of data online. For example, a telco call-record store requires 10 TB of data to be kept online, and that is just the size of one month's records. If records of sales, marketing, customers, employees, etc. are also kept, the size will be more than 100 TB.
  • Records contain not only textual information but also some multimedia data. Multimedia data cannot be manipulated as easily as text data, and searching multimedia data is not an easy task, whereas textual information can be retrieved by the relational software available today.
  • Apart from size, planning, building, and running ever-larger data warehouse systems is very complex. As the number of users increases, the size of the data warehouse also increases, and these users will also require access to the system.
  • With the growth of the internet, users increasingly require online access to data.