I posted this question on IT Toolbox, but thought I'd post it here as well.
I'm working on a project where the company wants to score
model(s) in real time (transactional type data). They also would
like to remain vendor-independent. With these in mind, they have
considered using PMML. However, they are having a hard time
finding vendors that have a Scoring Engine that runs PMML (many
software products have this, if you want to use those products).
We want a standalone option so that no matter what tool is used to build
the models, we can just drop in the PMML code and run it.
I've discussed the options of running source code (C or Java),
but they also want to be able to update models on the fly
without a recompile.
Anyone have experiences with PMML in production out there?
Friday, May 04, 2007
22 comments:
Well, you can always add an orchestration layer that recompiles automatically, instead of recompiling by hand.
Recently I have been deploying models as SQL but I guess you are then limited in the types of models you can use.
I have not used PMML but I know R has a PMML package which is used by Rattle.
Interesting. I haven't come across too many products that offer a SQL export option. KXEN is the only one I can think of right now.
Are you hand coding the models into SQL or are you using a product?
Is anyone else using methods for scoring the model closer to the database (database vendors with data mining functions aside)?
Yeah I was talking about manually hand coding it. Another which has exporting functionality is Tiberius (code and SQL export also I believe). I don't know if it does PMML.
MicroStrategy has a scoring engine integrated into its BI platform. Users can import PMML models and generate scores on the fly. It supports models from SPSS, SAS, KXEN and any other product supporting the PMML standard. Check it out at http://www.microstrategy.com/Software/Products/Service_Modules/DataMining_Services/. Good luck!
I realize that this is only indirectly related to the original question, but this thread of responses reminds one of the value of "rolling" one's own code.
All of my data mining work is now done in my programming language of choice, and I regularly generate code for deployment platforms, not only for the actual model, but for variable transformations, missing value handling, etc.
It is a simple matter to include dynamically generated comments, with things like sample statistics, original data source names, model creation dates, etc.
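To make the idea of generated deployment code concrete, here is a minimal sketch in Python. It assumes a logistic regression and emits a SQL scoring query with a generated comment header and simple missing-value handling. The model layout, the `id` column, and the table and column names are all hypothetical, chosen only for illustration.

```python
def generate_scoring_sql(model):
    """Build a SQL scoring query for a logistic regression.

    `model` is a plain dict (hypothetical layout):
      name, created, source: strings used in the comment header
      intercept: float
      predictors: {column: (coefficient, fill_value_for_missing)}
    """
    lines = [
        f"-- Model: {model['name']}",
        f"-- Created: {model['created']}",
        f"-- Source table: {model['source']}",
    ]
    terms = [repr(model["intercept"])]
    for col, (coef, fill) in model["predictors"].items():
        # Document each variable in the generated code itself.
        lines.append(f"-- {col}: coefficient {coef}, missing replaced by {fill}")
        # COALESCE substitutes the fill value when the column is NULL.
        terms.append(f"{coef!r} * COALESCE({col}, {fill!r})")
    linear = " + ".join(terms)
    # Logistic link: score = 1 / (1 + exp(-linear predictor)).
    lines.append(f"SELECT id, 1.0 / (1.0 + EXP(-({linear}))) AS score")
    lines.append(f"FROM {model['source']};")
    return "\n".join(lines)


EXAMPLE_MODEL = {
    "name": "churn_v1",
    "created": "2007-05-04",
    "source": "customers",
    "intercept": -1.5,
    "predictors": {"age": (0.02, 40.0), "calls": (-0.3, 2.0)},
}

if __name__ == "__main__":
    print(generate_scoring_sql(EXAMPLE_MODEL))
```

The same generator could target C or Java instead of SQL. The only parts that change are the expression syntax and the comment markers.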
And, adding to Will's comments, developing a generator of source code for models is not rocket science. Most data mining software allows for export of C code for example. I have done this on many occasions as well.
In this case, there is a requirement to update models without recompiling code. So, I recommended a "control file" of sorts that is just a file of neural network coefficients or regression coefficients, etc. But in the end, this is no more than an engine like the PMML engine I am looking to implement here.
It is always nicer if someone else has done the work of developing that driver program. But it isn't too difficult to do yourself, given some time and expertise.
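A rough sketch of what such a "control file" driver might look like, in Python. It assumes the control file is JSON holding an intercept, a dict of coefficients, and an optional link function. That layout and every name here are hypothetical, not a description of what was actually built. The scorer rereads the file whenever its modification time changes, so a new model can be dropped in without recompiling or restarting.

```python
import json
import math
import os


def load_model(path):
    """Read a control file of regression coefficients (hypothetical JSON layout)."""
    with open(path) as f:
        return json.load(f)


def score(model, record):
    """Apply the linear or logistic regression described by the control file."""
    z = model["intercept"] + sum(
        coef * record[name] for name, coef in model["coefficients"].items()
    )
    if model.get("link") == "logit":
        return 1.0 / (1.0 + math.exp(-z))
    return z


class HotReloadingScorer:
    """Scores records, reloading the control file when its mtime changes."""

    def __init__(self, path):
        self.path = path
        self._mtime = None
        self._model = None

    def score(self, record):
        mtime = os.stat(self.path).st_mtime_ns
        if mtime != self._mtime:
            # The file was replaced or edited: pick up the new coefficients.
            self._model = load_model(self.path)
            self._mtime = mtime
        return score(self._model, record)
```

A production version would also need to validate the file and guard against reading it while it is half written, for example by writing to a temporary name and renaming it into place.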
This could also be of use:
http://sourceforge.net/projects/augustus
Thanks to all for the comments.
In addition to my other comment, a few more here...
Anonymous: SQL is becoming a much more common export option in my experience, which is a very good thing. In this case, we would want to build a few (perhaps expanding to dozens of) models, post them to a scoring server, and away it goes, with no manual intervention at all. So no hand-coding if we can at all help it. That's the nice thing about PMML (or source code for that matter).
I'm also becoming more aware of how many products have PMML real-time scoring engines inside the product. We just don't want to be tied to a product (yet), so scoring with a stand-alone engine is what we want for this phase.
Shane: Thanks for the tip on augustus--this looks promising and is the kind of solution we are looking for right now. I have to download it and look at it to see more about what it does.
I'll post once we find a solution... by the way, there has also been some investigation of JDM options, and looking at Java tools like Weka to see what they can do.
PMML is easy to generate, but not as easy to apply. You need to implement all of it to be able to say that your software reads it, and that's a lot of code. I believe this is why so many modeling solutions can save PMML, but relatively few can apply it. IBM has something that does both (they keep renaming their products, so I don't know what it's called today).
I agree that SQL seems to be easier to integrate. That's why ArrowModel generates SQL, but not PMML.
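To illustrate why applying PMML is more work than writing it, here is a minimal Python sketch that scores only one small corner of the standard: a RegressionModel with NumericPredictor terms. The embedded document is a trimmed, hypothetical fragment (a valid PMML file also carries a Header, DataDictionary and MiningSchema), and everything else in the spec (other model types, transformations, categorical predictors, missing-value rules) is simply not handled. The `{*}` namespace wildcard assumes Python 3.8 or later.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical PMML fragment for illustration only.
PMML_DOC = """<PMML version="3.2" xmlns="http://www.dmg.org/PMML-3_2">
  <RegressionModel functionName="regression">
    <RegressionTable intercept="1.5">
      <NumericPredictor name="age" coefficient="0.25"/>
      <NumericPredictor name="income" exponent="2" coefficient="0.5"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""


def score_pmml_regression(pmml_text, record):
    """Score one record against the first RegressionTable in a PMML document."""
    root = ET.fromstring(pmml_text)
    # "{*}" matches any namespace, so the PMML version does not matter here.
    table = root.find(".//{*}RegressionTable")
    if table is None:
        raise ValueError("no RegressionTable found; other model types are not handled")
    total = float(table.get("intercept"))
    for pred in table.findall("{*}NumericPredictor"):
        exponent = int(pred.get("exponent", "1"))  # exponent defaults to 1
        total += float(pred.get("coefficient")) * record[pred.get("name")] ** exponent
    return total


if __name__ == "__main__":
    print(score_pmml_regression(PMML_DOC, {"age": 4.0, "income": 2.0}))
```

Because the model lives in the XML text rather than in the code, replacing the file changes the model without a recompile, which is the property the original question asks for.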
Well, Fair Isaac's business rules management system, Blaze Advisor, imports PMML for scorecards, regression models and neural nets and then lets you manage and deploy them as business rules. Deployment options include .NET, Java or COBOL code. I blogged about this here.
In the interest of fair disclosure I work at Fair Isaac but the product really does support PMML and we know 'cos we have a modeling product that outputs it as well as a PMML-based integration with Teradata etc.
JT
www.edmblog.com
James: the customer I'm working with needs to run PMML in conjunction with an Oracle database (the data comes from Oracle and has to be accessed transactionally in real time). So two questions:
1) is this possible with Blaze Advisor?
2) what is the cost of Blaze Advisor?
I used to work at SPSS (left a year ago).
SPSS do have a PMML scoring engine they have created (it was named 'SmartScore' or simply 'scoring engine'). It is integrated within many of the SPSS products that score predictive models (SPSS, Clementine etc).
I believe it is available as a toolkit or SDK, although probably not openly advertised by the sales channels.
We (i now work for a telco in Australia) use SPSS Clementine as a front-end to a Teradata warehouse. Most of our analysis is performed as SQL, and our models are scored as SQL automatically by Clementine (it converts many models such as CART and C5 decision trees and neural nets into SQL). This offers us very good performance and the ability to score our entire customer base quickly (Teradata is fast).
Cheers
Tim Manns
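For readers wondering what scoring a tree as SQL can look like, here is a minimal Python sketch that turns a small binary decision tree into a nested SQL CASE expression. This is not how Clementine does it internally; the tree layout, column names and scores are hypothetical and only illustrate the general technique.

```python
def tree_to_sql(node, depth=0):
    """Render a binary decision tree as a nested SQL CASE expression.

    Leaves are dicts with a "score"; internal nodes have "column",
    "threshold", "left" (taken when column <= threshold) and "right".
    """
    pad = "  " * depth
    if "score" in node:
        return f"{pad}{node['score']}"
    return "\n".join([
        f"{pad}CASE WHEN {node['column']} <= {node['threshold']} THEN",
        tree_to_sql(node["left"], depth + 1),
        f"{pad}ELSE",
        tree_to_sql(node["right"], depth + 1),
        f"{pad}END",
    ])


# Hypothetical churn tree: short-tenure customers with many calls score highest.
TREE = {
    "column": "tenure", "threshold": 12,
    "left": {
        "column": "calls", "threshold": 3,
        "left": {"score": 0.2},
        "right": {"score": 0.7},
    },
    "right": {"score": 0.1},
}

if __name__ == "__main__":
    print(f"SELECT\n{tree_to_sql(TREE, 1)} AS score\nFROM customers;")
```

The whole tree becomes a single expression evaluated inside the database, so the data never has to leave the warehouse to be scored.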
oh btw - i also had a bit of fun with a simple visualisation of neural network pmml. I'm not much of a programmer, but set myself the task of learning a bit of VB.net and tried to create a way to view neural net PMML.
I've supplied the vb.net executable and all the source code.
see;
http://www.kdkeys.net/forums/thread/6495.aspx
of course, also the PMML homepages;
http://www.dmg.org
http://sourceforge.net/forum/forum.php?forum_id=187860
cheers
Tim Manns
I own Tiberius Data Mining, and we develop output 'scoring' code for our models in any language that we have requests for. This currently stands at about 13 different formats, including SQL, SAS, SPSS script, VB.net etc.
We've never had any request for PMML or ever come across anyone who uses it.
I'd say SQL is a more useable generic code format, so long as all the different flavours are catered for. As Shane says though, it can get hard to put some models in SQL. We've done neural nets in SQL but are finding Support Vector Machines a challenge.
Phil
Will,
I am curious, has your customer considered Oracle Data Mining (ODM)? It is in the Oracle RDBMS, so your customer does not need to move the data outside or hardcode SQL versions of the models. It is easy to update models as well, and the necessary transformations are generated for you.
Phil, ODM has Support Vector Machine as one of its algorithms.
Full disclosure, I work for Oracle. Hope this does not come across as a pitch. I am truly curious about your views on deploying code to an Oracle DB versus using the techniques available in the RDBMS.
-Marcos
Marcos: I assume that your comment was directed at me (Dean) and the original question. (By the way--hi to you. I don't think I've seen you since we crossed paths in Oakland several years ago.)
The company I was consulting with was looking at Oracle (at my recommendation, I might add) as they were an Oracle shop. And they were getting support from Oracle (the folks up in Nashua).
I cycled off the project, so don't know how they eventually deployed the solution. If I find out how they deployed, I'll post it.
Please check out ADAPA, our lightweight J2EE deployment engine which executes a variety of predictive models in real time. It supports PMML and you can simply upload one or more models into the engine and then execute them, e.g., via web services.
For more details, please see
http://www.zementis.com/adapa.htm
I am in the process of developing a proposal and prototype for a data quality system covering market data and accounting data. This will be using WebSphere and potentially Web Services. I am a big fan of Python but my employer (a large financial institution of 40,000) only endorses Perl.
I will check out the various posts here starting with Augustus. I will post to this site with updates on my progress. I hope this will be valuable to future adopters of PMML.
As far as I know in the past, I can export Clementine models into PMML. However, I'm not sure which version of Clementine can convert those models into SQL automatically without any programming needed. Can you tell how to score the model as SQL automatically in Clementine and which version of Clementine is it?
I just discovered Augustus and it seems to provide the exact functionality we want. Is there anyone following this discussion who'd like to share their experience with Augustus?
Sorry, I am here for technical purposes.