MySQL to AWS RedShift Replication using Talend for PowerBI.

Talend is powerful and so many ways. It’s open source and most of the functionality is free to use. There is a powerful community which provides excellent support. We used AWS Redshift as a backend for the data warehouse. Client had a requirement to test the PowerBI for their IOT data from the sensor.

We initially set the maximum capacity to just one but later on increased it to 12 due to poor performance with PowerBI. We also ended up tuning several queries coming from PowerBI which helped tremendously.

There are other considerations such as how much load you anticipate? What is the size of your warehouse going to be in next year or two? How many users will be using PowerBI? Are the users going to be in the system all day or just periodic?

Based on all these questions, you might want to adjust the size to begin with. Otherwise, you can test with few users and create a benchmark. Most of the customers we have worked with wants to start with minimal and increase capacity as they go. If you start with big, AWS cost might surprise you for such a large EC2 instance.

So in our opinion, you should start with smaller cluster and increase the capacity based on the performance.

This is the Redshift cluster

Below Windows Machine holds all the client softwares. Such as Talend, PowerBI and TeamSQL.

This Machine was used mainly for Talend to schedule a windows job to pull the data from MySQL.

We created below Jobs in Talend to pull data from MySQL which was on premise.

Under Metadata below connections were created.

Task Scheduler – There is a batch job program named redshift-sync which runs every hour.

TeamSQL was configured to query both MySQL and RedShift.

Finally PowerBI was configured to access AWS RedShift.

Contact us for more