breaking into the unknown…

Leave a comment

raw sql for rails joins

There are time when ActiveRecord query will not server our purpose and you needs to write raw sql query.

Lets understands with examples when such case arise and how to write raw sql. We will also see how irresponsibly written code crash your system.

One of our application facing a lots of out of memory exception in Heroku and also some of payments not at all getting render in UI as Heroku kills any process which take more than 30 seconds.

Now question is why any request will need more than 30 seconds to load ?.

I was given the task to find the issue and fix it.

The model association in the Application look something as below:

class Payment < ActiveRecord::Base
  has_many :ledgers, dependent: :destroy, autosave: true
  has_many :invoices, through: :ledgers, autosave: true
  has_many :remittances, dependent: :destroy, autosave: true

class Ledger < ActiveRecord::Base
  belongs_to :invoice
  belongs_to :payment

class Invoice < ActiveRecord::Base belongs_to :customer_information has_many :ledgers has_many :payments, :through => :ledgers

class Remittance < ActiveRecord::Base
  belongs_to :payment

The business requirement look straight forward, say you have raised invoice for any service and than client will send you Payment for a bunch of invoices. Ledger is the join table between invoice and payment.

Remittance is something like you received extra payment which you return back to Client.
Here the Remittance table do not have direct link with Invoice table. The money returned may be part on a invoice or without it. The Remittance table do have a name column which keep Invoice number and possibly match to a invoice in Invoice table.

Now let see, how current code is written and Problem with it.

Case 1 : Display all the invoices with a payment on UI

Existing Code look as below:

payment_ledgers = payment.ledgers.includes(:invoice).where(matched: true)
payment_ledgers.each do |ledger|
  payment_invoice = ledger.invoice
  if payment_invoice && payment_invoice.remaining_amount > 0        
    customer = payment_invoice.customer_information.customer
    customer_name = customer.legal_name.present? ?  customer.legal_name :         
    date ="%m/%d/%Y") if payment_invoice.try(:date)
    invoice_hash = {
      number: payment_invoice.number, date: date,
      customer_name: customer_name, remaining_amount: payment_invoice.remaining_amount.to_f.round(2),
      total_amount: payment_invoice.original_amount,applied_amount: payment_invoice.applied_amount,
    invoice_hash.merge!({ledger_id:, ledger_applied_amount: ledger.applied_amount.to_f.round(2)})
    invoices << invoice_hash

So what developer did here is that. He fetched all the ledgers with a payment and eager loaded invoices so he do not needs to hit a query again on DB.He looping over them as he needs data from 2 other tables also – invoices and customer_informations.

But, this approach have below issue:
It will return say 500 ledgers and than 500 invoices also. So there will be 1000 object occupying your memory. Worst part is you are looping over the ledgers, so the memory will be not released till process complete.

The optimized code look as below:

# Find Payment's Ledger Invoices whose remaining amount greater than zero
# Query will pluck required columns from all the three tables - Ledger, Invoice, CustomerInformation
invoices_data = Ledger.includes(invoice: :customer_information)
                .where("ledgers.payment_id = ?  and matched = ? and invoices.remaining_amount > ?",, true, 0)
                .pluck('', 'ledgers.applied_amount', 'customer_informations.legal_name', '',
                 '', 'invoices.number', '', 'invoices.remaining_amount', 'invoices.original_amount', 'invoices.applied_amount'
#Convert invoices_data to array of hash
invoices_data.each do |inv_data|
  invoices << {
    ledger_id: inv_data[0], 
    ledger_applied_amount: inv_data[1].to_f.round(2),
    customer_name: inv_data[2].present? ? inv_data[2] : inv_data[3],
    id: inv_data[4],
    number: inv_data[5], 
    date: inv_data[6],
    remaining_amount: inv_data[7].to_f.round(2),
    total_amount: inv_data[8].to_f.round(2),
    applied_amount: inv_data[9].to_f.round(2)

The above Code with includes generate below sql query:
#The Sql Query above will generate is:
SELECT ledgers.applied_amount, invoices.number, customer_informations.legal_name FROM “ledgers” LEFT OUTER JOIN “invoices” ON “invoices”.”id” = “ledgers”.”invoice_id” LEFT OUTER JOIN “customer_informations” ON “customer_informations”.”id” = “invoices”.”customer_information_id” WHERE “ledgers”.”payment_id” = $1 [[“payment_id”, 4]]

So what we do is:

  1. we removed any looping in ruby code.
  2. We fetched the needed data from 3 tables from DB query itself.
  3. We are not fetching ActiveRecord object but set of info as strings.

You can see that we manged to write DB query with ActiveRecord itself. We do not need any raw sql here…. Rails is great !

Case 2 : Show all Remittances with a payment on UI

Existing Code look as below:

payment_rmh = []
payment.remittances.order('remittance_sequence_number').each do |rmh|
  invoice_id = Invoice.find_by('lower(number) = ? ', if
  payment_rmh << {name:, amount: rmh.amount,invoice_id: invoice_id}

So what developer is doing here is first finding all remittances with Payment, looping over each remittance. Finding Invoice whose number match with remittance name.

Problem here is that Your DB query will become proportional to number of remittances. If 1000 remittances you will make 1000 call to DB. Your system start chocking with memory and execution time will increase.

NOTE: Never ever make DB query within a loop

Now, lets try to optimized the code by eager loading approach in last case:

Remittance.includes(invoice: :customer_information)
Remittance Load (31.0ms) SELECT “remittances”.* FROM “remittances”
ActiveRecord::AssociationNotFoundError: Association named ‘invoice’ was not found on Remittance; perhaps you misspelled it?

So we can see that, includes will not work here unlike last case, as there is no direct association between remittances and invoices. So ActiveRecord failed to generated the Query for us.

Here we need to move out of ActiveRecord queries and build our own raw sql query.

The optimized code look as below:

# Find Payment RMH
rmh_raw_query = "SELECT, ri.amount, FROM remittances ri LEFT JOIN invoices inv ON LOWER( = LOWER(inv.number) WHERE ri.payment_id = #{} ORDER BY remittance_sequence_number ASC"
rmh_records = ActiveRecord::Base.connection.exec_query(rmh_raw_query).rows
payment_rmh ={|rme| {name: rme[0], amount: rme[1].to_f.round(2),invoice_id: rme[2]}}

Here the way to execute raw sql query with rails to call the exec_query method of ActiveRecord::Base.connection.

We have called row method on result returned by exec_query method which basically will return all column you need as an array.

With the above optimization :

  1. 1000 remittances took less than half seconds as compared to 20 seconds earlier.
  2. It will be independent of number of children.Unlike earlier approach, it will always take less than 1 second even if no of remittances reach 1 lakhs or more.
  3. There will be no load on memory.

1 Comment

escaping special characters & < etc in xml

some charecter like & , > or < etc have special meaning in xml structure. So If you are generating xml payload from user inputs for some API call, you must take care of these characters, or the generated xml payload will become corrupt and break your application.

This is sample code from one of my library which generate xml payload from user input to submit the content to an other API

require 'rest_client'
require 'base64'

module LoanPath
  class Application
    def apply_credit(detail)
      uri = ""
      payload =<<-eos
      <soap11:Envelope xmlns:soap11="">
     rest_resource =, {:user => "arun", :password => "secret", 
                                                   :timeout => 90 s, :open_timeout => 90s}) payload, :content_type => "application/xml"

I trigger, apply credit method in my controller as below # params contain value filled by user for name city state etc.

Now, say if a user fill the street address as “23 & main street”, the application will crash due to internal server error, which is basically due to generation of wrong XML.

The solution is to, replace all the special character with there correct XML notation

"&" with "&amp";
"<"; with "&lt";  
">" with "&gt"; 
and so on.....

Well, you do not need to do much hard work. You just need to encode all the users value By using REXML Text module.

NOTE : ruby have its own library called REXML to do all the stuff with XML. just look up this document whenever, you need some solution for your XML need.

So the correct code will look like this

require 'rest_client'
require 'base64'

module LoanPath
  class Application
    def apply_credit(detail)
      uri = ""
      payload =<<-eos
      <soap11:Envelope xmlns:soap11="">
                  <firstName>#{[:first_name], false, nil, false)}</firstName>
                  <lastName>#{[:last_name], false, nil, false)}</lastName>
                  <street>#{[:street], false, nil, false)}</street>
                  <city>#{[:city], false, nil, false)}</city>
                  <state>#{[:state], false, nil, false)}</state>
     rest_resource =, {:user => "arun", :password => "secret", 
                                                   :timeout => 90 s, :open_timeout => 90s}) payload, :content_type => "application/xml"

So basically, you need to replace the value of the node with below method of RXML. see the detail documentation here“your string”, false, nil, false)

Below, are the description of the parameter above.

First Argument

“your string”  is the first argument and will hold your string you want to escape. we careful that your string can be empty but not nil

1.9.3p194 :022 > nil, false, nil, false) # will throw below error  for nil
NoMethodError: undefined method `type’ for nil:NilClass

Also, the first argument should be a string. Not apply it to integer 1234, false, nil, false)
NoMethodError: undefined method `type’ for 1234:Fixnum

Second Argument

It have boolean value for white space, if set true, white space will be retained, for false it will be removed.

1.9.3p194 :018 > “&     arun”, true, nil, false) # see that white space is retained in the string
=> “&     arun”

Third Argument:

Keep it nil or pass the parent object. I always keep it nil, have never tried with parent object

Fourth Argument:

raw (nil) This argument can be given three values.

If true, then the value of used to construct this object is expected to contain no unescaped XML markup, and REXML will not change the text.

1.9.3p194 :011 > “Iam <&”, false, nil, true ) # setting last argument true will not escape < & , so throw error
RuntimeError: Illegal character ‘<‘ in raw string “Iam <&”

If this value is false, the string may contain any characters, and REXML will escape any and all defined entities whose values are contained in the text.

1.9.3p194 :011 > “Iam <&”, false, nil, true )  # setting last argument false will escape < &  etc
“Iam <&”

If this value is nil (the default), then the raw value of the parent will be used as the raw value for this node. If there is no raw value for the parent, and no value is supplied, the default is false.

NOTE : always set the fourth argument to false if want to escape the special characters

Leave a comment

json parsing in ruby

Sometime we want to convert json response to  Hash or array , so that we can easily use it. Ruby provide, json parsing as its core feature. The detail documentation is available here. Let us assume that our json response is stored in sample_json file at  /db/data/sample_json of your project root. Some sample data is available here.  Add one of the sample data to this file, I used the first one. Now, we will parse it on the console

$rails c #move to the console

1.9.3p194 :005 >json_file =“db”, “data”, “sample_json”)) # It will open the json file

1.9.3p194 :006 > json_data = # file value is read and stored in json data, this is basically our response, which you will get from some API cals, some methods etc.

1.9.3p194 :007 > a=JSON.parse(json_data) # your response wil be as below

=> {“glossary”=>{“title”=>”example glossary”, “GlossDiv”=>{“title”=>”S”, “GlossList”=>{“GlossEntry”=>{“ID”=>”SGML”, “SortAs”=>”SGML”, “GlossTerm”=>”Standard Generalized Markup Language”, “Acronym”=>”SGML”, “Abbrev”=>”ISO 8879:1986”, “GlossDef”=>{“para”=>”A meta-markup language”, “GlossSeeAlso”=>[“GML”, “XML”]}, “GlossSee”=>”markup”}}}}}

You can also pass, different option to the parser as below

1.9.3p194 :023 > b=JSON.parse(json_d, :max_nesting => false,:allow_nan => true, :symbolize_names => true)
=> {:glossary=>{:title=>”example glossary”, :GlossDiv=>{:title=>”S”, :GlossList=>{:GlossEntry=>{:ID=>”SGML”, :SortAs=>”SGML”, :GlossTerm=>”Standard Generalized Markup Language”, :Acronym=>”SGML”, :Abbrev=>”ISO 8879:1986″, :GlossDef=>{:para=>”A meta-markup language”, :GlossSeeAlso=>[“GML”, “XML”]}, :GlossSee=>”markup”}}}}}

The complete list of option for parse method is as below

parse(source, opts = {}) #Parse the JSON document source into a Ruby data structure 
and return it.

opts can have the following keys:

max_nesting: #The maximum depth of nesting allowed in the parsed data structures. 
Disable depth checking with :max_nesting => false. It defaults to 19.

allow_nan: #If set to true, allow NaN, Infinity and -Infinity in defiance of RFC 4627
to be parsed by the Parser. This option defaults to false.

symbolize_names: #If set to true, returns symbols for the names (keys) in a JSON object.
Otherwise strings are returned.Strings are the default.

create_additions: #If set to false, the Parser doesn't create additions even if a matching class 
and ::create_id was found.This option defaults to true.

object_class: #Defaults to Hash

array_class: #Defaults to Array

Leave a comment

Sample json data

Sometime, I want to experiment with json data and feel the need of some rough data to work on. I used to generate the data myself while creating array or hash or array of hash and then converting that to json data by using ruby to_json method which basically convert array or hash object to a json object. However, I come across some sample data here and copied here for my future use.


    "glossary": {
        "title": "example glossary",
		"GlossDiv": {
            "title": "S",
			"GlossList": {
                "GlossEntry": {
                    "ID": "SGML",
					"SortAs": "SGML",
					"GlossTerm": "Standard Generalized Markup Language",
					"Acronym": "SGML",
					"Abbrev": "ISO 8879:1986",
					"GlossDef": {
                        "para": "A meta-markup language",
						"GlossSeeAlso": ["GML", "XML"]
					"GlossSee": "markup"


{"web-app": {
  "servlet": [   
      "servlet-name": "cofaxCDS",
      "servlet-class": "org.cofax.cds.CDSServlet",
      "init-param": {
        "configGlossary:installationAt": "Philadelphia, PA",
        "configGlossary:adminEmail": "",
        "configGlossary:poweredBy": "Cofax",
        "configGlossary:poweredByIcon": "/images/cofax.gif",
        "configGlossary:staticPath": "/content/static",
        "templateProcessorClass": "org.cofax.WysiwygTemplate",
        "templateLoaderClass": "org.cofax.FilesTemplateLoader",
        "templatePath": "templates",
        "templateOverridePath": "",
        "defaultListTemplate": "listTemplate.htm",
        "defaultFileTemplate": "articleTemplate.htm",
        "useJSP": false,
        "jspListTemplate": "listTemplate.jsp",
        "jspFileTemplate": "articleTemplate.jsp",
        "cachePackageTagsTrack": 200,
        "cachePackageTagsStore": 200,
        "cachePackageTagsRefresh": 60,
        "cacheTemplatesTrack": 100,
        "cacheTemplatesStore": 50,
        "cacheTemplatesRefresh": 15,
        "cachePagesTrack": 200,
        "cachePagesStore": 100,
        "cachePagesRefresh": 10,
        "cachePagesDirtyRead": 10,
        "searchEngineListTemplate": "forSearchEnginesList.htm",
        "searchEngineFileTemplate": "forSearchEngines.htm",
        "searchEngineRobotsDb": "WEB-INF/robots.db",
        "useDataStore": true,
        "dataStoreClass": "org.cofax.SqlDataStore",
        "redirectionClass": "org.cofax.SqlRedirection",
        "dataStoreName": "cofax",
        "dataStoreDriver": "",
        "dataStoreUrl": "jdbc:microsoft:sqlserver://LOCALHOST:1433;DatabaseName=goon",
        "dataStoreUser": "sa",
        "dataStorePassword": "dataStoreTestQuery",
        "dataStoreTestQuery": "SET NOCOUNT ON;select test='test';",
        "dataStoreLogFile": "/usr/local/tomcat/logs/datastore.log",
        "dataStoreInitConns": 10,
        "dataStoreMaxConns": 100,
        "dataStoreConnUsageLimit": 100,
        "dataStoreLogLevel": "debug",
        "maxUrlLength": 500}},
      "servlet-name": "cofaxEmail",
      "servlet-class": "org.cofax.cds.EmailServlet",
      "init-param": {
      "mailHost": "mail1",
      "mailHostOverride": "mail2"}},
      "servlet-name": "cofaxAdmin",
      "servlet-class": "org.cofax.cds.AdminServlet"},

      "servlet-name": "fileServlet",
      "servlet-class": "org.cofax.cds.FileServlet"},
      "servlet-name": "cofaxTools",
      "servlet-class": "org.cofax.cms.CofaxToolsServlet",
      "init-param": {
        "templatePath": "toolstemplates/",
        "log": 1,
        "logLocation": "/usr/local/tomcat/logs/CofaxTools.log",
        "logMaxSize": "",
        "dataLog": 1,
        "dataLogLocation": "/usr/local/tomcat/logs/dataLog.log",
        "dataLogMaxSize": "",
        "removePageCache": "/content/admin/remove?cache=pages&id=",
        "removeTemplateCache": "/content/admin/remove?cache=templates&id=",
        "fileTransferFolder": "/usr/local/tomcat/webapps/content/fileTransferFolder",
        "lookInContext": 1,
        "adminGroupID": 4,
        "betaServer": true}}],
  "servlet-mapping": {
    "cofaxCDS": "/",
    "cofaxEmail": "/cofaxutil/aemail/*",
    "cofaxAdmin": "/admin/*",
    "fileServlet": "/static/*",
    "cofaxTools": "/tools/*"},

  "taglib": {
    "taglib-uri": "cofax.tld",
    "taglib-location": "/WEB-INF/tlds/cofax.tld"}}}

If some value is not there, you can put them as NULL, they are equivalent to <separator/> in XML
{"menu": {
    "header": "SVG Viewer",
    "items": [
        {"id": "Open"},
        {"id": "OpenNew", "label": "Open New"},
        {"id": "ZoomIn", "label": "Zoom In"},
        {"id": "ZoomOut", "label": "Zoom Out"},
        {"id": "OriginalView", "label": "Original View"},
        {"id": "Quality"},
        {"id": "Pause"},
        {"id": "Mute"},
        {"id": "Find", "label": "Find..."},
        {"id": "FindAgain", "label": "Find Again"},
        {"id": "Copy"},
        {"id": "CopyAgain", "label": "Copy Again"},
        {"id": "CopySVG", "label": "Copy SVG"},
        {"id": "ViewSVG", "label": "View SVG"},
        {"id": "ViewSource", "label": "View Source"},
        {"id": "SaveAs", "label": "Save As"},
        {"id": "Help"},
        {"id": "About", "label": "About Adobe CVG Viewer..."}

Leave a comment

undefined method `content=’ for nokogiri

while, traversing the xml file and retrieving value of one of the node , I was getting this error :  undefined method `content=’ for #<Nokogiri::XML::NodeSet:0xc0755b0>. Actually, I want to modify the node value on run time. The idea is to retrieve the node value, manipulate it and reinsert it.

My xml structure is as below

I want to modify the loanId value to append current_user id to it. I   used the below code
                      doc.xpath(“//list/request”).each do |doc|
                                 loan_number = doc.xpath(“loanId”).text
                                doc.xpath(“loanId”).content = loan_number +
                                puts doc.xpath(“id”).text

But, the code is throwing below error

undefined method `content=’ for #<Nokogiri::XML::NodeSet:0xc0755b0>

actually , xpath return an array of element, and so the content method ,  is called for array rather then the node. So, we should use at_xpath
and change the culprit line as below

doc.at_xpath(“loanId”).content = loan_number +
This will work now